
Structural magnetic resonance imaging for the early diagnosis of dementia due to Alzheimer's disease in people with mild cognitive impairment


Background

Mild cognitive impairment (MCI) due to Alzheimer's disease is the symptomatic predementia phase of Alzheimer's disease dementia, characterised by cognitive and functional impairment not severe enough to fulfil the criteria for dementia. In clinical samples, people with amnestic MCI are at high risk of developing Alzheimer's disease dementia, with annual rates of progression from MCI to Alzheimer's disease dementia estimated at approximately 10% to 15%, compared with base incidence rates of Alzheimer's disease dementia of 1% to 2% per year.

Objectives

To assess the diagnostic accuracy of structural magnetic resonance imaging (MRI) for the early diagnosis of dementia due to Alzheimer's disease in people with MCI versus the clinical follow‐up diagnosis of Alzheimer's disease dementia as a reference standard (delayed verification).

To investigate sources of heterogeneity in accuracy, such as the use of qualitative visual assessment versus quantitative volumetric measurement (including manual or automatic MRI techniques), the length of follow‐up, and the age of participants.

We evaluated MRI as an add‐on test to the clinical diagnosis of MCI, with the aim of improving the early diagnosis of dementia due to Alzheimer's disease in people with MCI.

Search methods

On 29 January 2019 we searched Cochrane Dementia and Cognitive Improvement's Specialised Register and the following databases: MEDLINE, Embase, BIOSIS Previews, Science Citation Index, PsycINFO, and LILACS. We also searched the reference lists of all eligible studies identified by the electronic searches.

Selection criteria

We considered cohort studies of any size that included prospectively recruited people of any age with a diagnosis of MCI. We included studies that compared the diagnostic test accuracy of baseline structural MRI versus the clinical follow‐up diagnosis of Alzheimer's disease dementia (delayed verification). We did not exclude studies on the basis of length of follow‐up. We included studies that used either qualitative visual assessment or quantitative volumetric measurements of MRI to detect atrophy in the whole brain or in specific brain regions, such as the hippocampus, medial temporal lobe, lateral ventricles, entorhinal cortex, medial temporal gyrus, lateral temporal lobe, amygdala, and cortical grey matter.

Data collection and analysis

Four teams of two review authors each independently reviewed titles and abstracts of articles identified by the search strategy. Two teams of two review authors each independently assessed the selected full‐text articles for eligibility, extracted data, and resolved disagreements by consensus. Two review authors independently assessed the quality of studies using the QUADAS‐2 tool. We used the hierarchical summary receiver operating characteristic (HSROC) model to fit summary ROC curves and to obtain overall measures of relative accuracy in subgroup analyses. We also used these models to obtain pooled estimates of sensitivity and specificity when sufficient data sets were available.
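For reference, the HSROC model is usually written in the Rutter and Gatsonis parameterisation shown below. The review text does not give its model specification, so this is the standard textbook form, not the authors' exact model code or settings. Here π_ij is the probability of a positive MRI result in study i for group j, θ_i is a study‐specific positivity (threshold) parameter, α_i is an accuracy parameter (equal to the log diagnostic odds ratio when β = 0), and β lets the summary ROC curve be asymmetric.

```latex
\begin{aligned}
\operatorname{logit}(\pi_{ij}) &= \left(\theta_i + \alpha_i X_{ij}\right) e^{-\beta X_{ij}},
\qquad
X_{ij} =
\begin{cases}
-\tfrac{1}{2} & \text{participants who did not progress to Alzheimer's disease dementia}\\
+\tfrac{1}{2} & \text{participants who progressed}
\end{cases}\\
\theta_i &\sim \mathcal{N}\!\left(\Theta, \sigma_\theta^2\right),
\qquad
\alpha_i \sim \mathcal{N}\!\left(\Lambda, \sigma_\alpha^2\right)\\
\operatorname{logit}(\mathrm{Se}_i) &= \left(\theta_i + \tfrac{\alpha_i}{2}\right) e^{-\beta/2},
\qquad
\operatorname{logit}(1 - \mathrm{Sp}_i) = \left(\theta_i - \tfrac{\alpha_i}{2}\right) e^{\beta/2}
\end{aligned}
```

Adding a study‐level covariate (for example, manual versus automatic MRI technique) to the mean threshold Θ, mean accuracy Λ and shape β is the usual way such a model yields measures of relative accuracy between subgroups.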

Main results

We included 33 studies, published from 1999 to 2019, with 3935 participants of whom 1341 (34%) progressed to Alzheimer's disease dementia and 2594 (66%) did not. Of the participants who did not progress to Alzheimer's disease dementia, 2561 (99%) remained stable MCI and 33 (1%) progressed to other types of dementia. The median proportion of women was 53% and the mean age of participants ranged from 63 to 87 years (median 73 years). The mean length of clinical follow‐up ranged from 1 to 7.6 years (median 2 years). Most studies were of poor methodological quality due to risk of bias for participant selection or the index test, or both.

Most of the included studies reported data on the volume of the total hippocampus (pooled mean sensitivity 0.73 (95% confidence interval (CI) 0.64 to 0.80); pooled mean specificity 0.71 (95% CI 0.65 to 0.77); 22 studies, 2209 participants). This evidence was of low certainty due to risk of bias and inconsistency.

Seven studies reported data on the atrophy of the medial temporal lobe (mean sensitivity 0.64 (95% CI 0.53 to 0.73); mean specificity 0.65 (95% CI 0.51 to 0.76); 1077 participants) and five studies on the volume of the lateral ventricles (mean sensitivity 0.57 (95% CI 0.49 to 0.65); mean specificity 0.64 (95% CI 0.59 to 0.70); 1077 participants). This evidence was of moderate certainty due to risk of bias.

Four studies with 529 participants analysed the volume of the total entorhinal cortex and four studies with 424 participants analysed the volume of the whole brain. We did not estimate pooled sensitivity and specificity for the volume of these two regions because available data were sparse and heterogeneous.

We could not statistically evaluate the volumes of the lateral temporal lobe, amygdala, medial temporal gyrus, or cortical grey matter, which were assessed only in small individual studies.

We found no evidence of a difference between studies in the accuracy of total hippocampal volume with regard to duration of follow‐up or age of participants, but the manual MRI technique was superior to automatic techniques in mixed (mostly indirect) comparisons. We did not assess the relative accuracy of the volumes of different brain regions measured by MRI because only indirect comparisons were available, studies were heterogeneous, and the overall accuracy of all regions was moderate.

Authors' conclusions

The volume of hippocampus or medial temporal lobe, the most studied brain regions, showed low sensitivity and specificity and did not qualify structural MRI as a stand‐alone add‐on test for an early diagnosis of dementia due to Alzheimer's disease in people with MCI. This is consistent with international guidelines, which recommend imaging to exclude non‐degenerative or surgical causes of cognitive impairment and not to diagnose dementia due to Alzheimer's disease. In view of the low quality of most of the included studies, the findings of this review should be interpreted with caution. Future research should not focus on a single biomarker, but rather on combinations of biomarkers to improve an early diagnosis of Alzheimer's disease dementia.

How accurate is magnetic resonance imaging for the early diagnosis of dementia due to Alzheimer's disease in people with mild cognitive impairment?

Why is improving Alzheimer's disease diagnosis important?

Cognitive impairment is when people have problems remembering, learning, concentrating and making decisions. People with mild cognitive impairment (MCI) generally have more memory problems than other people of their age, but these problems are not severe enough to be classified as dementia. Studies have shown that people with MCI and loss of memory are more likely to develop Alzheimer's disease dementia (approximately 10% to 15% of cases per year) than people without MCI (1% to 2% per year). Currently, the only reliable way of diagnosing Alzheimer's disease dementia is to follow people with MCI and assess cognitive changes over the years. Magnetic resonance imaging (MRI) may detect changes in the brain structures that indicate the beginning of Alzheimer's disease. Early diagnosis of MCI due to Alzheimer's disease is important because people with MCI could benefit from early treatment to prevent or delay cognitive decline.

What was the aim of this review?

To assess the diagnostic accuracy of MRI for the early diagnosis of dementia due to Alzheimer's disease in people with MCI.

What was studied in the review?

The volume of several brain regions was measured with MRI. Most studies (22 studies, 2209 participants) measured the volume of the hippocampus, a region of the brain that is associated primarily with memory.

What are the main results in this review?

Thirty‐three studies were eligible; they included 3935 participants with MCI who were followed up for two or three years to see if they developed Alzheimer's disease dementia. About a third of them progressed to Alzheimer's disease dementia; the others either remained stable or developed other types of dementia.

We found that MRI is not accurate enough to identify people with MCI who will develop dementia due to Alzheimer's disease. Based on hippocampal volume, in a group of 1000 people with MCI of whom 300 go on to develop Alzheimer's disease dementia, the correct prediction of Alzheimer's disease would be missed in 81 of these 300 people (false negatives), and a wrong prediction of Alzheimer's disease would be made in 203 of the 700 people who do not develop it (false positives). As a result, people with a false‐negative result would be falsely reassured and would not prepare themselves to cope with Alzheimer's disease, while those with a false‐positive result would suffer from the wrongly anticipated diagnosis.

How reliable are the results of the studies?

The included studies diagnosed Alzheimer's disease dementia by assessing all participants with standard clinical criteria after two or three years' follow‐up. We had some concerns about how the studies were conducted, since the participants were mainly selected from clinical registries and referral centres, and we also had concerns about how studies interpreted MRI. Moreover, the studies were conducted differently from each other, and they used different methods to select people with MCI and perform MRI.

Who do the results of this review apply to?

The results do not apply to people with MCI in the community, but only to people with MCI who attend memory clinics or referral centres.

What are the implications of this review?

MRI, as a single test, is not accurate for the early diagnosis of dementia due to Alzheimer's disease in people with MCI since one in three or four participants received a wrong diagnosis of Alzheimer's disease. Future research should not focus on a single test (such as MRI), but rather on combinations of tests to improve an early diagnosis of Alzheimer's disease dementia.

How up to date is this review?

This evidence is up to date to 29 January 2019.

Authors' conclusions

Implications for practice

Structural magnetic resonance imaging (MRI) of the hippocampus or medial temporal lobe, the most studied brain regions, showed low sensitivity and specificity and did not reach the standard required to be a stand‐alone add‐on test for the early diagnosis of dementia due to Alzheimer's disease in people with MCI. This is consistent with international guidelines, which recommend structural MRI to exclude non‐degenerative or surgical causes of cognitive impairment but not to diagnose dementia due to Alzheimer's disease. Medial temporal lobe atrophy or hippocampal volume measured by structural MRI cannot be recommended in clinical practice for the early diagnosis of dementia due to Alzheimer's disease in people with MCI.

Implications for research

Research priorities include the definition of what is considered to be a 'positive' result of volumetric assessment of brain regions measured by structural MRI. Research is essential for the development of accurate criteria for a timely diagnosis of Alzheimer's disease dementia. Frisoni and colleagues proposed a research framework to assess the analytical and clinical validity of biomarkers for Alzheimer's disease and their clinical utility. To achieve these objectives, research priorities include the standardisation of the readout of biomarker assays and thresholds for normality, the evaluation of their performance in detecting early disease, the development of diagnostic algorithms comprising combinations of biomarkers, and the development of clinical guidelines for the use of biomarkers in qualified memory clinics (Frisoni 2017a). Implementation of these proposed research topics is expected to provide useful results over the medium term.

We identified several weaknesses in the included studies using the QUADAS‐2 quality assessment tool. We recommend that future studies consider:

  1. including large prospective cohorts of consecutive or random samples of people with a definite diagnosis of MCI;

  2. using a diagnostic accuracy study design that adheres to the recommendations of the STARDdem Initiative 'Reporting standards for studies of diagnostic test accuracy in dementia' (Noel‐Storr 2014);

  3. incorporating the QUADAS‐2 tool into the study design (Whiting 2011);

  4. providing a clear, pre‐specified definition of what is a 'positive' result of the index test;

  5. assessing interobserver and intraobserver variability; and

  6. evaluating long‐term outcomes and cost effectiveness of implementing the index test.

Summary of findings

Summary of findings: Whole brain volume or volume of specific brain regions for early Alzheimer's disease dementia diagnosis in people with mild cognitive impairment

Whole brain volume versus volume of specific brain regions for early Alzheimer's disease dementia diagnosis in people with mild cognitive impairment

Patient or population: people with mild cognitive impairment (MCI)

Setting: memory clinics or registry data (e.g. ADNI)

New test: volume of total hippocampus, medial temporal lobe, total entorhinal cortex, lateral ventricles, and whole brain. Volume measured with either quantitative manual or automated MRI technique

Cut‐off value: not reported

Prevalence 30%. Typically seen in participants with MCI after 2 to 3 years of follow‐up

| Test | Number of participants (number of studies) | True positives per 1000 tested (95% CI) | False negatives per 1000 tested (95% CI) | True negatives per 1000 tested (95% CI) | False positives per 1000 tested (95% CI) | Pooled sensitivity (95% CI) | Pooled specificity (95% CI) | Certainty of the evidence (GRADE) |
|---|---|---|---|---|---|---|---|---|
| Total hippocampus | 2209 (22) | 219 (192 to 240) | 81 (60 to 108) | 497 (455 to 539) | 203 (161 to 245) | 0.73 (0.64 to 0.80) | 0.71 (0.65 to 0.77) | ⊕⊕⊝⊝ Low (a,b) |
| Medial temporal lobe | 1077 (7) | 192 (159 to 219) | 108 (81 to 141) | 455 (357 to 532) | 245 (168 to 343) | 0.64 (0.53 to 0.73) | 0.65 (0.51 to 0.76) | ⊕⊕⊕⊝ Moderate (a,c) |
| Lateral ventricles | 1077 (5) | 171 (147 to 195) | 129 (105 to 153) | 448 (413 to 490) | 252 (210 to 287) | 0.57 (0.49 to 0.65) | 0.64 (0.59 to 0.70) | ⊕⊕⊕⊝ Moderate (a,c) |
| Total entorhinal cortex | 529 (4) | Meta‐analyses not conducted due to sparse and heterogeneous data | | | | Range: 0.50 to 0.88 | Range: 0.60 to 1.00 | ⊕⊝⊝⊝ Very low (a,d) |
| Whole brain | 424 (4) | Meta‐analyses not conducted due to sparse and heterogeneous data | | | | Range: 0.33 to 0.92 | Range: 0.41 to 1.00 | ⊕⊝⊝⊝ Very low (a,d) |

The table displays normalised numbers of participants within a hypothetical cohort of 1000 people at a prevalence of Alzheimer's disease dementia (pre‐test probability) of 30%. We selected this prevalence value on the basis of the prevalence observed in people with MCI after 2 to 3 years of follow‐up. We derived the confidence intervals around these numbers from the confidence intervals of the pooled sensitivity and specificity.

ADNI: Alzheimer's Disease Neuroimaging Initiative; CI: confidence interval; MCI: mild cognitive impairment; MRI: magnetic resonance imaging

GRADE Working Group grades of evidence

High certainty: we are very confident that the true effect lies close to the estimate of the effect.

Moderate certainty: we are moderately confident in the effect estimate: the true effect is likely to be close to the estimate of the effect, but there is a possibility that it is substantially different.

Low certainty: our confidence in the effect estimate is limited: the true effect may be substantially different from the estimate of the effect.

Very low certainty: we have very little confidence in the effect estimate: the true effect is likely to be substantially different from the estimate of effect.

a. Risk of bias: most studies were at high risk of bias for participant selection (registry data), the index test, or both. We downgraded the certainty of the evidence by one level.
b. Imprecision: wide 95% confidence intervals. We downgraded the certainty of the evidence by one level.
c. Imprecision: wide 95% confidence intervals; however, the upper limits for both sensitivity and specificity are below 0.75, which indicates modest performance. We did not downgrade.
d. Inconsistency and imprecision: sparse and inconsistent data. We downgraded the certainty of the evidence by one level each for inconsistency and imprecision.
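The per‐1000 figures in the summary of findings table follow directly from the pooled estimates. At the assumed 30% pre‐test probability, expected true positives are sensitivity × 300 and expected true negatives are specificity × 700; false negatives and false positives are the remainders. The minimal Python sketch below reproduces that arithmetic. The function name and the rounding to whole people are illustrative choices, not part of the review's methods. Passing a confidence limit instead of a point estimate gives the corresponding interval bound (for example, a sensitivity of 0.64 gives the lower bound of 192 true positives for total hippocampus).

```python
# Sketch: convert pooled sensitivity/specificity into expected counts in a
# hypothetical cohort of 1000 people with MCI at a 30% pre-test probability.
# The pooled point estimates below are copied from the Summary of findings table.

def outcomes_per_1000(sensitivity, specificity, prevalence=0.30, cohort=1000):
    """Return expected TP, FN, TN and FP counts, rounded to whole people."""
    progressors = round(prevalence * cohort)    # develop AD dementia at follow-up
    non_progressors = cohort - progressors      # do not develop AD dementia
    tp = round(sensitivity * progressors)       # progressors flagged by MRI
    tn = round(specificity * non_progressors)   # non-progressors not flagged
    return {"TP": tp, "FN": progressors - tp, "TN": tn, "FP": non_progressors - tn}

POOLED = {
    "Total hippocampus": (0.73, 0.71),
    "Medial temporal lobe": (0.64, 0.65),
    "Lateral ventricles": (0.57, 0.64),
}

if __name__ == "__main__":
    for region, (sens, spec) in POOLED.items():
        counts = outcomes_per_1000(sens, spec)
        misclassified = counts["FN"] + counts["FP"]
        print(region, counts, "misclassified per 1000:", misclassified)
```

For total hippocampus this gives 219 true positives, 81 false negatives, 497 true negatives and 203 false positives, so 284 of 1000 people with MCI would be misclassified, roughly one in three or four.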

Background

The shift from normal aging to Alzheimer's disease dementia is a continuous process in which the transitional state between normal cognition and Alzheimer's disease dementia progressively involves, to a variable extent and in different stages, episodic memory (i.e. the ability to learn and retain new information), executive functions (e.g. set‐shifting, reasoning, problem‐solving, planning), language (e.g. naming, fluency, expressive speech, and comprehension), visuospatial skills, attention and perceptual speed (Bäckman 2004). The criteria for a clinical diagnosis of probable or possible Alzheimer's disease dementia were proposed in 1984 by the National Institute of Neurological and Communicative Disorders and Stroke (NINCDS) and the Alzheimer's Disease and Related Disorders Association (ADRDA) (the NINCDS‐ADRDA criteria; Appendix 1). A diagnosis of definite Alzheimer's disease dementia requires the clinical criteria for probable Alzheimer's disease plus histopathological evidence obtained from a biopsy or autopsy, which is not applicable in daily clinical practice (McKhann 1984). The NINCDS‐ADRDA criteria were updated in 2011 by the National Institute on Aging (NIA) and the Alzheimer's Association (AA), known as the NIA‐AA criteria (McKhann 2011). Consistent with the NINCDS‐ADRDA criteria, the NIA‐AA criteria require significant interference with the ability to function at work or in usual daily activities. The presence of a positive biomarker (e.g. medial temporal lobe atrophy detected by MRI) is not essential for the diagnosis but is useful to investigate the "biomarker probability of AD [Alzheimer's disease] dementia etiology" (McKhann 2011).

The NIA‐AA criteria for the diagnosis of MCI due to Alzheimer's disease define MCI as the symptomatic pre‐dementia phase of Alzheimer's disease dementia, and include two sets of criteria: (1) core clinical criteria, which comprise evidence of concern about a change in cognition in comparison with the person's previous level; lower performance in one or more cognitive domains than would be expected for the person's age and educational background; preservation of independence in functional abilities; and no evidence of a significant impairment in social or occupational functioning; and (2) the use of biomarkers based on imaging and cerebrospinal fluid measures in clinical research settings (Albert 2011; Appendix 2). Single or multiple cognitive domains may be affected in a person with MCI. If memory is affected, alone or together with other domains, MCI is defined as 'amnestic'. When one or more cognitive domains other than memory are affected and memory is spared, MCI is defined as 'non‐amnestic'. In clinical series, people with amnestic MCI are at high risk of developing Alzheimer's disease dementia, with annual rates of progression from amnestic MCI to Alzheimer's disease dementia estimated at 10% to 15%, compared with base incidence rates of Alzheimer's disease dementia of 1% to 2% per year (Petersen 2009). In the general population, people with amnestic MCI are also at high risk of progression to Alzheimer's disease dementia over three years (Palmer 2008). Progression rates are high in the first few years following MCI diagnosis (Mitchell 2009). However, people diagnosed with MCI may remain stable or revert to normal cognition over time, while some of them may develop non‐Alzheimer's disease dementia (Palmer 2008).
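To relate these annual rates to the multi‐year follow‐up of the included studies, the rates can be compounded over time. The sketch below is a rough illustration only: it assumes a constant annual progression rate with no dropout, reversion or competing risks, simplifications that are not made in the source. The function name `cumulative_progression` is hypothetical.

```python
# Illustrative only: compounds a constant annual progression rate over several
# years. Assumes no dropout, no reversion and no competing risks (hypothetical
# simplifications, not part of the review's analysis).

def cumulative_progression(annual_rate: float, years: int) -> float:
    """Probability of progressing within `years` at a constant annual rate."""
    if not 0.0 <= annual_rate <= 1.0:
        raise ValueError("annual_rate must be a proportion between 0 and 1")
    return 1.0 - (1.0 - annual_rate) ** years

if __name__ == "__main__":
    # Annual rates quoted in the text (Petersen 2009)
    rates = [("amnestic MCI, low", 0.10), ("amnestic MCI, high", 0.15),
             ("base incidence, low", 0.01), ("base incidence, high", 0.02)]
    for label, rate in rates:
        for years in (2, 3):
            print(f"{label}: {years} years -> {cumulative_progression(rate, years):.1%}")
```

Under these assumptions, 10% to 15% per year compounds to roughly 19% (10%, two years) to 39% (15%, three years), a range that contains the 30% pre‐test probability used in the summary of findings table, whereas 1% to 2% per year gives only about 3% to 6% over three years.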

People with early cognitive impairments are increasingly presenting to both primary and secondary care (NICE 2018). Since early intervention could be more effective in delaying the development of dementia, these people may represent a suitable target for future disease‐modifying therapies. A survey conducted among members of the American Academy of Neurology who had an aging, dementia, or behavioural neurology practice focus found that the majority of respondents recognised MCI as a clinical diagnosis and used its diagnostic code for billing purposes. When seeing these patients, most respondents routinely communicated the dementia risk and sometimes prescribed cholinesterase inhibitors (Roberts 2010). While our protocol was under development, the ClinicalTrials.gov registry (clinicaltrials.gov) contained 230 references to completed or ongoing trials of medication as well as non‐medication approaches for treating MCI.

In 2018 the NIA‐AA published a "Research framework: towards a biological definition of Alzheimer's disease", which defined Alzheimer's disease on the basis of biomarkers as a proxy for its neuropathology (Jack 2018). The recommended biomarkers are markers of amyloid deposition (A), markers of tau neurofibrillary tangles (T), and markers of neurodegeneration (N). For each category, both a cerebrospinal fluid and a neuroimaging biomarker were suggested. Under this biological definition, cognitive symptoms can be added to the ATN system but are not mandatory for the diagnosis. The NIA‐AA emphasised that it is premature and inappropriate to use this research framework in clinical practice. In a published comment, Cochrane Dementia and Cognitive Improvement reported that the biomarkers described in the NIA‐AA framework are neither sensitive nor specific to the diagnosis of Alzheimer's disease dementia (McCleery 2019).

In this context, the objective of this review is to determine the accuracy of structural MRI for the early diagnosis of dementia due to Alzheimer's disease in people with MCI.

Target condition being diagnosed

The primary target condition is dementia due to Alzheimer's disease, a degenerative disease of the brain that accounts for 60% to 80% of dementia cases. In 2019, Alzheimer's Disease International (ADI) estimated that over 50 million people were living with dementia globally, a number set to increase to 152 million by 2050, primarily driven by increased longevity (ADI 2019).

Index test(s)

This review assesses the diagnostic accuracy of structural MRI in detecting atrophy in the whole brain and in specific brain regions, such as the hippocampus, lateral ventricles, entorhinal cortex, amygdala, medial temporal lobe, lateral temporal lobe, medial temporal gyrus, and cortical grey matter. Structural MRI assesses the structure of brain tissue, as opposed to functional MRI, which assesses brain activity. Atrophy is a loss of tissue volume.

MRI does not involve X‐rays or the use of ionising radiation. It is non‐invasive and has no significant adverse health effects. Patients are at risk of injury from MRI if they have metal objects in their bodies, such as pacemakers, clips or metallic prostheses. Since individuals with fear of confined spaces may become anxious during MRI, the test is contraindicated in people with claustrophobia.

Clinical pathway

Alzheimer's disease dementia has an insidious onset characterised by progressive decline of cognitive functions such as memory, thinking, comprehension, calculation, language, learning capacity and judgement, sufficient to impair personal activities of daily living (McKhann 1984). This disease needs to be clearly differentiated from age‐related cognitive decline. The onset of Alzheimer's disease dementia is usually after 65 years of age, though earlier onset is not uncommon. As age advances, the incidence increases rapidly (it roughly doubles every five years). As life expectancy increases in the population, the total number of individuals affected by dementia is expected to rise. Dementia due to Alzheimer's disease has economic as well as quality‐of‐life consequences, not only for patients but also for their families.

People who present with symptoms of cognitive decline are generally evaluated first by a general practitioner, who obtains information from the patient or a family member. The National Institute for Health and Care Excellence (NICE) guideline recommends that if dementia is suspected after the initial clinical judgement, a physical examination, appropriate blood and urine tests to exclude reversible causes of cognitive decline, and a cognitive assessment should be undertaken. If dementia is still suspected, the person should be referred to a specialist dementia diagnostic service (such as a memory clinic or community old age psychiatry service). Specialists have to confirm cognitive decline, rule out reversible causes and, when possible, diagnose the dementia subtype. Brain computed tomography (CT) or MRI should be used to rule out reversible causes of cognitive decline and to assist the subtype diagnosis, unless dementia is well established and the subtype is clear (NICE 2018).

Role of index test(s)

We evaluated the potential role of structural MRI in improving an early diagnosis of dementia due to Alzheimer's disease in people with MCI when MRI is used in addition to clinical judgement or cognitive test performance or both (add‐on test). Hippocampal atrophy measured by MRI has been qualified by the European Medicines Agency (EMA) for enrichment in regulatory clinical trials in the pre‐dementia stage of Alzheimer's disease (European Medicines Agency 2011). The Food and Drug Administration (FDA) issued a letter supporting the role of a low baseline hippocampal volume as a prognostic biomarker for enrichment (US Food and Drug Administration 2015). Hippocampal or medial temporal lobe atrophy measured on MRI has been included as a marker of neuronal injury in the recommendations of the NIA‐AA on the diagnosis of MCI due to Alzheimer's disease (Albert 2011). Although no treatment is currently available to cure MCI due to Alzheimer's disease, an early diagnosis of Alzheimer's disease dementia could be of significant support for patients and their families. For example, lifestyle interventions to prevent or postpone the onset of dementia or inclusion in clinical trials might be suggested to people with a diagnosis of MCI at risk of progression to Alzheimer's disease dementia.

Alternative test(s)

We did not include an alternative test in the review. An initial single‐test review is a preliminary step before comparative reviews or reviews of test combinations. The accuracy of other biomarkers (cerebrospinal fluid (CSF) biomarkers, plasma biomarkers, amyloid positron emission tomography (PET), and fluorodeoxyglucose (FDG) PET) for the longitudinal prediction of dementia due to Alzheimer's disease and other dementias in people with cognitive decline but no dementia is presented in other Cochrane Reviews (Ritchie 2014; Ritchie 2017; Martínez 2017; Smailagic 2015).

Rationale

MCI is considered either a risk factor or a symptomatic pre‐dementia phase. MCI represents a target for better understanding the mechanisms underlying dementia onset and progression, and a clinical condition in which to test preventive strategies or early intervention. The Lancet Commission on prevention and management of dementia reported a large body of research evidence showing that interventions targeting modifiable risk factors might have the potential to delay or prevent a third of dementia cases (Livingston 2017). Early diagnosis of dementia due to Alzheimer's disease would facilitate timely referral to education, counselling and support services for people with cognitive impairment and their carers, and would likely allow patients to contribute to their care plans. An early differential diagnosis is also important to identify treatable medical causes of cognitive impairment, such as depression, metabolic conditions, and cardiovascular or cerebrovascular disease. Moreover, early diagnosis would allow people to participate in treatment trials aimed at preventing or delaying cognitive decline (Livingston 2017). Currently, more than 200 treatments are under investigation (www.clinicaltrials.gov), and there is broad consensus on the hypothesis that the earlier the intervention takes place, the greater the protection against further neuronal damage. Disease‐modifying approaches for people with MCI require better knowledge of the accuracy of the diagnostic tests used in clinical trials. The new criteria for the diagnosis of MCI due to Alzheimer's disease (Albert 2011) incorporate biomarkers based on imaging and CSF measures in order to increase the probability of identifying MCI due to Alzheimer's disease. These biomarkers, used together with clinical judgement, might increase the sensitivity or specificity of a testing strategy. However, the accuracy of each biomarker must first be assessed individually before it is used as an add‐on test in clinical practice.

Objectives

To assess the diagnostic accuracy of structural MRI for the early diagnosis of dementia due to Alzheimer's disease in people with MCI versus the clinical follow‐up diagnosis of Alzheimer's disease dementia as a reference standard (delayed verification).

Secondary objectives

To investigate sources of heterogeneity in the diagnostic accuracy of structural MRI for the early diagnosis of dementia due to Alzheimer's disease in people with MCI. Potential sources of heterogeneity included the following.

  1. Setting: referral centres versus population cohorts

  2. Patient spectrum: mean or median age and amnestic versus non‐amnestic MCI

  3. Mean or median duration of follow‐up: less than three years versus three years or longer

  4. MRI region of interest: medial temporal lobe versus other structures and, if possible, hippocampus versus other structures, entorhinal cortex versus other structures, and temporoparietal regions versus other structures

  5. MRI technology: magnetic field strength less than 1 Tesla versus 1 Tesla or higher

  6. MRI techniques: visual versus manual versus automatic and semiautomatic computer‐based methods.

Methods

Criteria for considering studies for this review

Types of studies

We included studies if they:

  1. were prospective cohort studies with a clinical follow‐up as a reference standard for diagnosis of dementia due to Alzheimer's disease. In the cohort design, participants are enrolled and undergo the index test before the final outcome (presence or absence of Alzheimer's disease dementia) is known

  2. contained sufficient data to construct 2 x 2 contingency tables expressing MRI results by disease status

  3. were conducted in any healthcare setting, that is, population‐based studies or clinical settings

  4. were published in any language.

We excluded case series or case‐control studies, which lead to inflated estimates of disease prevalence and test accuracy (Lijmer 1999; Whiting 2004). We excluded retrospective studies when participants were selected through a retrospective review of clinical records. We also excluded studies reported only in abstract form or in conference proceedings for which the full text was not available and study authors did not respond to our request to clarify study eligibility.

Participants

Study participants included people with a diagnosis of MCI, based on a decline in memory objectively verified by neuropsychological tests in combination with a precise history reported by the patient, an informant, or both (Petersen 2004). We also included participants with a decline in other cognitive domains who did not meet the criteria for dementia, as defined by the Diagnostic and Statistical Manual of Mental Disorders (American Psychiatric Association 2000; American Psychiatric Association 2013). We included all subtypes of MCI (amnestic single domain, amnestic multiple domain, non‐amnestic single domain, non‐amnestic multiple domain). Since clinical criteria for the diagnosis of MCI have changed over the past 20 years, we accepted the diagnostic criteria reported by the study authors, for example, a Clinical Dementia Rating (CDR) score of 0.5 (Morris 1993), a Global Deterioration Scale score of 3 (Reisberg 1982), “questionable dementia” (Galton 2005) or “minimal dementia” (Visser 1999), "cognitive impairment, no dementia, as the presence of objective cognitive impairment in any tested domain, with performance falling between the two extremes of normality and dementia" (Graham 1997). We accepted only studies in which the MCI diagnosis was based exclusively on clinical judgement or cognitive test performance, or both. We included all people with MCI for whom clinicians would suspect initial dementia and who would undergo MRI in clinical practice (Differences between protocol and review).

We excluded studies reporting results of MRI in healthy people or in people with subjective cognitive decline in the absence of objective cognitive dysfunction. We also excluded papers that based the definition of MCI on biomarker results. Finally, to avoid overlapping participants, when more than one study was performed on the same database (e.g. ADNI, AddNeuroMed) and reported results for the same brain regions, we included only the paper reporting the highest number of participants.
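The de‐duplication rule can be sketched as "group candidate reports by database and brain region, and keep the largest". The sketch below is illustrative only: the `Report` record and its fields are hypothetical, it ignores the study period (which the review also considered) and it cannot resolve cases of uncertain overlap such as Khan 2015. The example sample sizes are taken from Table 2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Report:
    """Hypothetical record: one paper's result for one brain region."""
    study_id: str
    database: str        # e.g. "ADNI", "AddNeuroMed", or a single-centre cohort
    region: str          # e.g. "hippocampus", "lateral ventricles"
    n_participants: int


def select_non_overlapping(reports: list[Report]) -> list[Report]:
    """For each (database, region) pair keep only the report with the most participants.

    Ties keep the report seen first; study period is not modelled in this sketch.
    """
    best: dict[tuple[str, str], Report] = {}
    for r in reports:
        key = (r.database, r.region)
        if key not in best or r.n_participants > best[key].n_participants:
            best[key] = r
    return list(best.values())


candidates = [
    Report("Chupin 2009", "ADNI", "hippocampus", 210),
    Report("Wolz 2011", "ADNI", "hippocampus", 405),
    Report("Heister 2011", "ADNI", "hippocampus", 192),
    Report("Yang 2012", "ADNI", "lateral ventricles", 111),
    Report("Ledig 2018", "ADNI", "lateral ventricles", 343),
]
for r in select_non_overlapping(candidates):
    print(r.study_id, r.region, r.n_participants)
```

With these inputs the sketch keeps Wolz 2011 for the hippocampus and Ledig 2018 for the lateral ventricles, which matches the decisions recorded in Table 2 for those comparisons.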

Index tests

We assessed the diagnostic accuracy of structural MRI in detecting atrophy in the whole brain or in specific brain regions: hippocampus, medial temporal lobe, lateral ventricles, entorhinal cortex, medial temporal gyrus, lateral temporal lobe, amygdala, and cortical grey matter.

For the interpretation of atrophy patterns, visual rating scales have been validated and reported (Scheltens 1992; Ten Kate 2017b), as have quantitative volumetric measures (Frisoni 2017a). Methods of image quantification vary among research groups and are constantly being refined. A minimum set of MRI sequences for the evaluation of memory clinic patients consists of 3D T1‐weighted imaging, fluid‐attenuated inversion recovery (FLAIR), turbo spin‐echo or fast spin‐echo T2‐weighted imaging, diffusion‐weighted imaging (DWI) and T2‐weighted gradient‐recalled echo (GRE) imaging (Vernooij 2019).

We included studies that used either visual assessment or quantitative volumetric measurements, including manual outlining of brain structures and computer‐based semi‐automated or automated segmentation methods that allow anatomical identification of brain areas. We included studies that used an 'automatic classifier' of MRI data only when accuracy results were based on the volume of individual brain regions. We included any magnetic field strength (e.g. 0.5, 1, 1.5 or 3 Tesla).

We considered studies only if they reported diagnostic accuracy per participant ('participant‐level' analysis) and reported data in sufficient detail to construct 2 x 2 contingency tables.
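A participant‐level 2 x 2 table is what allows the per‐study indices named under Statistical analysis and data synthesis (sensitivity, specificity, LR+ and LR−) to be computed. A minimal Python sketch follows; the counts are hypothetical, and the Wilson score interval is our assumption, because the review does not state which interval method was used for individual studies.

```python
import math


def wilson_ci(k: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score interval for the proportion k/n (assumed method)."""
    p = k / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half


def accuracy_from_2x2(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Per-study accuracy of baseline MRI against conversion to AD dementia.

    tp/fn: converters with a positive/negative MRI;
    fp/tn: non-converters with a positive/negative MRI.
    """
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return {
        "sensitivity": sens,
        "sensitivity_95ci": wilson_ci(tp, tp + fn),
        "specificity": spec,
        "specificity_95ci": wilson_ci(tn, tn + fp),
        # This sketch raises ZeroDivisionError if spec is exactly 1 (LR+) or 0 (LR-).
        "lr_pos": sens / (1 - spec),
        "lr_neg": (1 - sens) / spec,
    }


# Hypothetical study: 50 converters and 100 non-converters
print(accuracy_from_2x2(tp=40, fn=10, fp=30, tn=70))
```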

We excluded studies that reported a single MRI accuracy estimate derived from multiple volumetric measures (e.g. multiple regions of interest (ROI)), because the number and areas of the brain considered in such studies varied widely. We also excluded studies that reported an MRI‐derived index such as the spatial pattern of abnormalities for recognition of early Alzheimer's disease (SPARE‐AD).

In order to estimate the accuracy of a purely volumetric index test, we excluded studies that assessed:

  1. a “mixed index test”, i.e. one combining volumetric and cortical thickness measures of the brain

  2. an "MRI‐derived index", in which volumetric data were summed with or divided by other values

  3. sub‐volumes of brain regions

  4. a voxel‐based morphometry (VBM) test, which can detect regional atrophy at the group level but cannot provide reliable information for single‐subject diagnosis (Teipel 2013).

We excluded studies that used more than one volumetric technique (i.e. manual and automated) without reporting separate results for each technique. We excluded studies that reported the accuracy of MRI combined with other methods to diagnose dementia due to Alzheimer's disease (e.g. neuropsychological tests or genetic data). We did not include longitudinal changes in the volumes of brain regions.

Target conditions

Dementia due to Alzheimer's disease was the target condition. We excluded studies in which the diagnosis of Alzheimer's disease dementia was not the primary outcome of the study and separate data for Alzheimer's disease dementia were not available. We also excluded studies in which findings of the baseline MRI index test formed the basis of selection for the reference standard because this was likely to distort any assessment of the diagnostic value of MRI.

Reference standards

The reference standard for this review was the clinical diagnosis of Alzheimer's disease dementia during follow‐up using the NINCDS‐ADRDA criteria (McKhann 1984; delayed verification). The gold standard for the diagnosis of Alzheimer's disease dementia is biopsy or autopsy; however, clinical diagnosis represents the best available reference standard in clinical practice. In recent years, several biomarkers have been proposed to support the diagnosis of dementia due to Alzheimer's disease. In this regard, we considered the updated NIA‐AA diagnostic criteria for Alzheimer's disease dementia (McKhann 2011) an acceptable reference standard only if the core clinical criteria for Alzheimer's disease were used, because these substantially correspond to the NINCDS‐ADRDA criteria (McKhann 1984).

Search methods for identification of studies

We developed the search strategy in collaboration with Cochrane Dementia and Cognitive Improvement's Information Specialists, according to recommendations provided in the Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy (De Vet 2008).

Electronic searches

We searched:

  1. Cochrane Dementia and Cognitive Improvement's Specialised Register for diagnostic test accuracy reviews;

  2. MEDLINE (Ovid SP; Ovid MEDLINE Epub Ahead of Print, In‐Process & Other Non‐Indexed Citations, Ovid MEDLINE Daily and Ovid MEDLINE) 1946 to 29 January 2019;

  3. Embase (Ovid SP) 1974 to 29 January 2019;

  4. BIOSIS Citation Index (ISI Web of Science) 1926 to 29 January 2019;

  5. Web of Science Core Collection (ISI Web of Science) 1945 to 29 January 2019;

  6. PSYCINFO (Ovid SP) 1806 to 29 January 2019;

  7. LILACS (Bireme) to 29 January 2019.

See Appendix 3 for a list of the sources searched and the search strategies used.

We did not apply any language restrictions to the electronic searches. In addition, we did not use methodological search filters designed to increase specificity, because published filters have been shown to lack sensitivity and can therefore miss potentially relevant studies (De Vet 2008; Doust 2005). We performed the most recent search for this review on 29 January 2019.

Searching other resources

We handsearched the reference lists of all relevant publications (retrieved full texts of key articles and reviews).

Data collection and analysis

Selection of studies

Four teams of two review authors each (GF and GCa; AGB and CL; GC and GL; EC and GCo) independently reviewed the titles and abstracts of articles identified by our search to select potentially relevant studies. If study eligibility was unclear from the abstract, or if no abstract was available but the title suggested a potentially relevant study, we obtained the full text of the article and assessed its eligibility against the criteria listed above under Criteria for considering studies for this review. We also screened the reference lists of systematic reviews and included articles to identify any studies missed by the electronic database search. We prepared a manual to assist review authors in the selection of studies, and we resolved any disagreement over abstracts or full‐text articles by discussion. We stored all abstracts and full‐text articles in a database designed for the review. When articles reported on a cohort (same database) that overlapped with a cohort in another paper, we used the study with the larger sample size. We included articles reporting results for different MRI techniques in the same study population, or separate results for relevant subgroups of participants. If we included more than one article from the same study authors, we verified the absence of overlap using the reported recruitment periods or by contacting the study authors directly to clarify study eligibility. For excluded studies, we documented the reasons for exclusion (Characteristics of excluded studies).

Data extraction and management

Two teams of two review authors each (AGB and EC; GL and GC) independently extracted data and resolved disagreements by consensus. When required, we contacted study authors for missing data. We designed a data extraction form for the review and pilot‐tested it on five studies; the form was implemented in a Microsoft Access 2003 database. Review authors who extracted data were not blinded to the journal, the names of the study authors, or their institutions. We prepared a manual to assist review authors in data extraction and management. We extracted the following data from eligible studies.

  1. Study characteristics: identity number (ID), first author, country, language, year of publication, journal name, additional bibliographic references linked to the study

  2. Characteristics of study participants: multicentric study (item A0), inclusion and exclusion criteria (items A1 to A11), clinical characteristics (items A12 to A22), co‐pathologies and treatments (items A23 to A25)

  3. Features of the index test (items B1 to B5)

  4. Features of the reference standard, including the follow‐up length (items C1 to C8)

  5. Data tables and missing data (items D1 to D19)

  6. Numbers of true positives (TP), false negatives (FN), true negatives (TN) and false positives (FP) were used to construct a 2 x 2 table for the index test. If studies did not report these values, we contacted the study authors or attempted to reconstruct 2 x 2 tables from the accuracy estimates reported in the article.

  7. Notes
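When a paper reported only sensitivity, specificity and the numbers of converters and non‐converters, the 2 x 2 table can be back‐calculated. The sketch below assumes rounding to the nearest integer (halves rounded up); the function name and the example figures are hypothetical, and because published estimates are themselves rounded, a reconstructed table should be checked against any other figures in the paper.

```python
import math


def reconstruct_2x2(sens: float, spec: float,
                    n_converters: int, n_non_converters: int) -> dict[str, int]:
    """Back-calculate TP, FN, TN and FP from reported accuracy and group sizes."""
    tp = math.floor(sens * n_converters + 0.5)      # converters flagged by MRI
    tn = math.floor(spec * n_non_converters + 0.5)  # non-converters MRI-negative
    return {"TP": tp, "FN": n_converters - tp, "TN": tn, "FP": n_non_converters - tn}


# Hypothetical report: sensitivity 0.80, specificity 0.70, 30 converters, 100 non-converters
table = reconstruct_2x2(0.80, 0.70, 30, 100)
print(table)
```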

Assessment of methodological quality

We used QUADAS‐2, a modified version of the QUADAS (Quality Assessment of Diagnostic Accuracy Studies) tool, to assess the methodological quality of each included study (Whiting 2011). We present the review‐specific QUADAS‐2 tool and an explanatory document in Appendix 4. We judged each paper as having a 'low', 'high' or 'unclear' risk of bias in each of the following four domains: patient selection; index test; reference standard; flow and timing. We assessed concerns about applicability in three domains: patient characteristics and setting; index test; reference standard. We considered a study to be of low quality if it had a high or unclear risk of bias in at least one QUADAS‐2 domain. Two review authors (GL and GC) independently assessed each included study and resolved disagreements by consensus; any disagreement that could not be resolved by consensus was referred to a third author (GF).
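The low‐quality rule above can be expressed as a one‐line check over the four risk‐of‐bias domains. This is a minimal sketch of our reading of the rule: the domain labels and the dictionary layout are illustrative, and applicability concerns are treated as separate judgements that do not enter the check.

```python
RISK_OF_BIAS_DOMAINS = (
    "patient selection",
    "index test",
    "reference standard",
    "flow and timing",
)


def is_low_quality(risk_of_bias: dict[str, str]) -> bool:
    """Low quality = 'high' or 'unclear' risk of bias in at least one QUADAS-2 domain."""
    missing = set(RISK_OF_BIAS_DOMAINS) - set(risk_of_bias)
    if missing:
        raise ValueError(f"no judgement for: {sorted(missing)}")
    return any(risk_of_bias[d] in ("high", "unclear") for d in RISK_OF_BIAS_DOMAINS)


example = {d: "low" for d in RISK_OF_BIAS_DOMAINS}
example["flow and timing"] = "unclear"
print(is_low_quality(example))
```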

Statistical analysis and data synthesis

We used data from the 2 x 2 tables of structural MRI performance (TP, FN, FP, TN) to summarise the accuracy estimates of each primary study. We estimated sensitivity, specificity, and positive and negative likelihood ratios (LR+ and LR−), with their 95% confidence intervals (CIs). We plotted the sensitivity and specificity estimates of each study with their 95% CIs in both forest plots and receiver operating characteristic (ROC) space. We used the hierarchical summary ROC (HSROC) model proposed by Rutter and Gatsonis (Rutter 2001) and described in Chapter 10 of the Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy (Macaskill 2010) to estimate pooled accuracy in the absence of specified thresholds, and to investigate relative diagnostic odds ratios (DORs) in subgroup analyses (assuming parallel ROC curves on the logit scale). We used this model to plot the summary ROC curve, and we also calculated pooled point estimates of sensitivity and specificity, because studies yielded heterogeneous estimates and no clear threshold effect was apparent, either graphically or statistically, in the analyses with more data. We used the METADAS user‐written macro in the Statistical Analysis System (SAS) statistical package (version 9.4; SAS Institute Inc., Cary, NC, USA) for the analyses (Takwoingi 2010).
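For reference, the HSROC model is usually written as below. This is the standard Rutter and Gatsonis formulation, not a transcription of the review's METADAS code: θ_i is study i's positivity threshold, α_i its accuracy (equal to the log DOR when β = 0), and β a shape parameter that allows the DOR to vary along the curve.

```latex
% Study i, group j: X_{ij} = +1/2 for participants who progressed to AD dementia,
% X_{ij} = -1/2 for those who did not; \pi_{ij} = probability of a positive MRI.
\operatorname{logit}(\pi_{ij}) = \left(\theta_i + \alpha_i X_{ij}\right) e^{-\beta X_{ij}},
\qquad \theta_i \sim N(\Theta, \sigma_\theta^2), \quad \alpha_i \sim N(\Lambda, \sigma_\alpha^2)

% Setting \alpha_i = \Lambda and eliminating \theta_i between the two groups
% gives the summary ROC curve:
\operatorname{logit}(\mathrm{Se}) = \Lambda\, e^{-\beta/2} + e^{-\beta}\, \operatorname{logit}(1 - \mathrm{Sp})

% Subgroup analysis with covariate Z on accuracy and a common \beta:
\alpha_i \sim N(\Lambda + \gamma Z_i, \sigma_\alpha^2)
```

Because β is shared, the subgroup curves have the same slope e^(−β) on the logit scale and are therefore parallel; in the special case β = 0, e^γ is the relative DOR between subgroups.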

Very few studies reported MRI data obtained with both manual and automated methods; for these, we decided post hoc to use the data from manual methods, for consistency with the majority of studies.

If an individual study reported results for more than one follow‐up period, we reported accuracy estimates for all periods but used only the estimate at three years (or longer) in the meta‐analysis. This choice was based on the assumption that the rate of conversion to Alzheimer's disease dementia is higher in the first few years following MCI diagnosis and declines thereafter, and that short‐term MRI accuracy is therefore the most relevant information for patients and clinicians. Moreover, most data were available at two or three years of follow‐up, and very few studies reported a follow‐up period longer than three years.
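The follow‐up selection rule can be sketched as follows. Only the "three years or longer" preference comes from the text; the tie‐breaking among several long follow‐ups and the fallback for studies with no estimate at three years or more are our own hypothetical choices, labelled as such in the code. In this review only Gaser 2013 reported more than one follow‐up period (one and three years).

```python
Table = tuple[int, int, int, int]  # (TP, FN, FP, TN)


def pick_follow_up(tables_by_years: dict[float, Table]) -> tuple[float, Table]:
    """Choose which follow-up's 2 x 2 table enters the meta-analysis."""
    at_least_three = [t for t in tables_by_years if t >= 3.0]
    if at_least_three:
        # Rule from the text: use the three-year (or longer) estimate.
        # Hypothetical tie-break: the follow-up closest to three years.
        chosen = min(at_least_three)
    else:
        # Hypothetical fallback: the longest follow-up the study reported.
        chosen = max(tables_by_years)
    return chosen, tables_by_years[chosen]


# Hypothetical counts for a study reporting one- and three-year results
print(pick_follow_up({1.0: (10, 5, 8, 30), 3.0: (20, 6, 9, 25)}))
```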

If estimates of sensitivity and specificity, or sufficient data to construct a 2 x 2 table of test performance, were not available, we wrote to the authors of the primary study to request individual participant data. If we received individual participant data, we calculated the sensitivity and specificity corresponding to the threshold closest to the upper left corner of the ROC curve. We were aware that this data‐driven method of threshold selection could lead to an overestimate of diagnostic accuracy (Leeflang 2008). However, there are no accepted thresholds for defining a positive MRI a priori, and published accuracy estimates are also likely to be based on data‐driven threshold selection.
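The "closest to the upper left corner" rule picks, among all candidate cut‐offs, the one whose point (1 − specificity, sensitivity) has the smallest Euclidean distance to (0, 1). A minimal sketch, assuming that smaller regional volume means more atrophy (so MRI is positive when volume ≤ cut‐off); the volumes and outcomes are hypothetical.

```python
import math


def closest_to_top_left(volumes: list[float],
                        converted: list[bool]) -> tuple[float, float, float]:
    """Return (cut_off, sensitivity, specificity) nearest to the ROC corner (0, 1).

    Assumes MRI is 'positive' when volume <= cut_off; ties keep the smallest cut-off.
    """
    n_pos = sum(converted)
    n_neg = len(converted) - n_pos
    best = None
    for cut in sorted(set(volumes)):
        tp = sum(1 for v, c in zip(volumes, converted) if c and v <= cut)
        tn = sum(1 for v, c in zip(volumes, converted) if not c and v > cut)
        sens, spec = tp / n_pos, tn / n_neg
        dist = math.hypot(1 - sens, 1 - spec)  # distance to the top-left corner
        if best is None or dist < best[0]:
            best = (dist, cut, sens, spec)
    return best[1], best[2], best[3]


# Hypothetical hippocampal volumes (cm^3) and conversion status at follow-up
vols = [2.1, 2.3, 2.4, 2.6, 2.8, 3.0, 3.1, 3.3]
conv = [True, True, False, True, False, False, False, False]
print(closest_to_top_left(vols, conv))
```

Because the cut‐off is chosen on the same data used to estimate accuracy, the resulting sensitivity and specificity are optimistic, which is the overestimation risk noted above (Leeflang 2008).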

If the primary study authors did not provide data (e.g. we could not locate their contact details, they did not reply, or they replied that the requested information was unavailable), we excluded the study from the review.

Investigations of heterogeneity

We initially assessed heterogeneity by visually examining forest plots of sensitivity and specificity and ROC plots. We planned to explore heterogeneity formally with a likelihood ratio test comparing the model without the covariate with the model including the test‐type covariate. We stated potential sources of heterogeneity under Secondary objectives.
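The likelihood ratio test compares twice the difference in maximised log‐likelihoods of the nested models with a chi‐squared distribution whose degrees of freedom equal the number of extra parameters. A self‐contained sketch using closed‐form chi‐squared tail probabilities for integer degrees of freedom; the log‐likelihood values are hypothetical, and the choice of 3 extra parameters (a binary covariate on θ, α and β of the HSROC model) is an assumption for illustration.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-squared variate with integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= (x / 2) / k
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite series in sqrt(x) (1, 3, 5, ... denominators)
    p = math.erfc(math.sqrt(x / 2))
    term = math.sqrt(2 * x / math.pi) * math.exp(-x / 2)
    for r in range(1, (df - 1) // 2 + 1):
        p += term
        term *= x / (2 * r + 1)
    return p


def likelihood_ratio_test(loglik_without: float, loglik_with: float,
                          extra_params: int) -> tuple[float, float]:
    """LR test of the model with the covariate against the nested model without it."""
    stat = 2.0 * (loglik_with - loglik_without)
    return stat, chi2_sf(stat, extra_params)


# Hypothetical maximised log-likelihoods; 3 extra parameters assumed
print(likelihood_ratio_test(-120.0, -115.0, 3))
```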

Sensitivity analyses

We planned to conduct sensitivity analyses to assess the impact of the methodological quality of included studies on MRI accuracy estimates by excluding studies at high risk of bias (see Assessment of methodological quality). However, we were not able to do this because we judged almost all studies to be at high risk of bias.

We therefore performed separate analyses by brain region, considering MRI technique, length of follow‐up and age of participants as covariates.

'Summary of findings' table

We presented the main results of the review in a 'Summary of findings' table, according to the recommendations described in Chapter 11 of the Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy (version 0.9; Bossuyt 2013). We graded the quality of evidence according to the GRADE system for diagnostic tests, considering study limitations (risk of bias), indirectness, inconsistency, imprecision, and risk of publication bias (Schunemann 2008; Schünemann 2016). Using the GRADEpro GDT software, we assigned one of four levels of quality of evidence: high, moderate, low, or very low.

Results

Results of the search

A flow chart describes the results of the selection process (Figure 1). The literature search identified 29,335 references. We screened titles and abstracts to exclude duplicates (n = 5064) and irrelevant studies (n = 23,962). We retrieved the full texts of the remaining 309 references and assessed them for eligibility. Ultimately, 33 studies that were eligible according to the inclusion criteria provided data for the review; we excluded 276 studies. We reported the list and descriptions of excluded studies under Characteristics of excluded studies. For 112 studies the index test was outside the inclusion criteria, reporting data for a combination of multiple volumetric measures, or a test comprehensive of both volumetric and cortical thickness measures of the brain, or a voxel‐based‐morphometry test. We excluded an additional 43 studies as they were of retrospective, case‐control, or cross‐sectional design. Thirty‐one studies were not diagnostic test accuracy studies and focused on technical aspects of the test. We excluded another 24 studies as they reported on a cohort that overlapped with a cohort in another included paper. We excluded 23 studies as they enrolled healthy participants or participants with dementia. Twenty studies presented insufficient descriptions of study results needed to construct 2 x 2 tables and we were unable to contact study authors. We could not extract data for 2 x 2 tables from 14 studies and authors did not reply to our request (Frisoni 2010a [pers comm]; Frisoni 2010d [pers comm]; Frisoni 2010f [pers comm]; Frisoni 2010g [pers comm]; Frisoni 2010k [pers comm]; Frisoni 2010m [pers comm]; Frisoni 2010n [pers comm]; Frisoni 2010o [pers comm]; Frisoni 2016a [pers comm]; Frisoni 2016b [pers comm]; Frisoni 2016c [pers comm]; Frisoni 2016e [pers comm]; Frisoni 2016f [pers comm]) or answered but provided no information (Frisoni 2012 [pers comm]). In four excluded studies, the reference standard was outside the inclusion criteria. 
We excluded four unpublished studies. We excluded one additional study that reported outcomes for number of MRI, not number of participants.
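The counts in the two paragraphs above reconcile exactly. The short tally below copies the numbers from the text (the category labels are paraphrased) and checks that the full texts assessed equal the identified references minus duplicates and irrelevant records, and that the exclusion reasons add up to the 276 excluded studies.

```python
# Counts copied from the Results of the search; labels paraphrased
identified = 29_335
duplicates = 5_064
irrelevant = 23_962
full_text_assessed = identified - duplicates - irrelevant
included = 33

exclusion_reasons = {
    "index test outside inclusion criteria": 112,
    "retrospective, case-control or cross-sectional design": 43,
    "not a diagnostic test accuracy study": 31,
    "cohort overlapping with another included paper": 24,
    "healthy participants or participants with dementia": 23,
    "insufficient results, authors not contactable": 20,
    "no 2 x 2 data, authors did not reply or gave no information": 14,
    "reference standard outside inclusion criteria": 4,
    "unpublished": 4,
    "outcomes per MRI scan, not per participant": 1,
}
excluded = sum(exclusion_reasons.values())

print(full_text_assessed, excluded, full_text_assessed - included)
```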


Figure 1. Flow of studies identified in literature search for systematic review on structural magnetic resonance imaging for an early diagnosis of dementia due to Alzheimer's disease in people with mild cognitive impairment

We reported the list and details of the included studies under Characteristics of included studies and in Table 1. The 33 included studies involved 3935 participants, with a median of 43 participants per study (range 13 to 480). All included studies were conducted at tertiary referral centres, and 16 (48%) were multicentric. Nineteen studies were conducted in Europe, nine in North America, three in North America and Europe, one in Taiwan and one in Australia. The articles were published from 1999 to 2019. The median proportion of women was 53% (range 26% to 71%), and the mean age of participants ranged from 63 to 87 years (median 73 years). At baseline, participants had a median Mini Mental State Examination (MMSE) score of 27 (range 22 to 29) and a mean level of education (years of schooling) of 12 years (range 5 to 16 years). Of the 3935 participants, 1341 (34%) progressed to Alzheimer's disease dementia and 2594 (66%) did not. Of those who did not progress to Alzheimer's disease dementia, 2561 (99%) remained stable with MCI and 33 (1%) converted to other types of dementia. Across the included studies, the percentage of participants who remained stable with MCI ranged from 31% to 81%, and the percentage who converted to other types of dementia ranged from 1% to 19%. The mean length of follow‐up ranged from 1 to 7.6 years (median 2 years), and the percentage of participants who progressed to Alzheimer's disease dementia during follow‐up ranged from 19% to 69%. All included studies reported accuracy estimates for one follow‐up period, except Gaser 2013, which reported data at one and three years' follow‐up; we included the data from the three‐year follow‐up.

Table 1. Participants: sociodemographic and clinical characteristics

Study

Country

Multicentricᵃ

Age (years)

mean ± SD

Number of participantsᵇ

(% female)

Education (years)

mean ± SD

Baseline MMSE

mean ± SD

Mean follow‐up (years)

No. of MCI converters to AD dementia (%)

No. of stable MCI (%)

No. of MCI who converted to other dementia (%)

Carmichael 2007

USA

No

86.6 ± 5.9

29 (69%)

91.6 ± 5.5ᶜ

3.2

12 (45%)

12 (41%)

4 (14%)

Caroli 2007

Italy

No

70.2 ± 6.7

23 (43%)

9.7 ± 4.75

26.9 ± 2.0

1.6

9 (39%)

14 (61%)

1 (4%)ᵈ

Clerx 2013a

Europe

Yes

70.6 ± 7.6

328 (52%)

10.0 ± 3.8

27.0 ± 2.5

2.0

91 (28%)

225 (69%)

12 (3%)ᵉ

deToledo‐Morell 2004
USA

No

81.7 ± 6.9

27 (56%)

16.4 ± 3.1

27.3 ± 1.8

3.0

10 (37%)

17 (63%)

Devanand 2007

USA

No

66.8 ± 9.7

139 (56%)

15.2 ± 4.2

27.5 ± 2.2

3.0

35 (25%)

104 (75%)

2 (1%)ᵈ

Eckerstrom 2008
Sweden

No

67.9 ± 6.7

42 (57%)

11.4 ± 3.6

2.0

13 (31%)

21 (50%)

8 (19%)

Eckerstrom 2013
Sweden

No

69.6 ± 6.9

42 (57%)

(34 included in analysis)

10.2 ± 3.2

27.7 ± 2.6

2.0

13 (31%)

21 (50%)

8 (19%)

Erten‐Lyons 2006

USA

No

86.9 ± 6.6

37 (70%)

13.7 ± 3.7

27.3 ± 1.5

7.6

22 (59%)

14 (38%)

1 (3%)

Frolich 2017

Germany

Yes

65.7 ± 9

115 (42%)

9.5 ± 1.9

27.0 ± 2.1

2.2

28 (24%)

87 (76%)

Galton 2005

UK

No

63.7 ± 9.9

29 (48%)

26.9 ± 2.4

1.6

11 (38%)

18 (62%)

2 (6%)ᵈ

Gaser 2013

USA, Canada

Yes

75.2 ± 6.9

195 (33%)

16.0 ± 2.7

27.0 ± 1.8

3.0

133 (68%)

62 (32%)

Herukka 2008

Finland

No

71.2 ± 4.5

21 (67%)

4.2

8 (38%)

13 (62%)

Jack 2000

USA

No

77.6 ± 8.2

43 (53%)

13.6 ± 3.2

25.7 ± 3.3

3.0

18 (42%)

25 (58%)

Jang 2018

USA, Canada

Yes

71.3 ± 7.4

340 (47%)

16 (14‐18)ᶠ

29 (27‐29)ᶠ

3.0ᶠ

69 (20%)

271 (80%)

Khan 2015

USA, Canada, Europe

Yes

74.9 ± 6.9

447 (40%)

14.2 ± 4.5

27.0 ± 1.4

1.0

90 (20%)

357 (80%)

Ledig 2018

USA, Canada

Yes

74.3 in MCI c, 74.4 in MCI ncᶠ

343 (41%)

26 in MCI c, 28 in MCI ncᶠ

2.0

177 (52%)

166 (48%)

Liu 2010

Europe

Yes

73.6 ± 5.8

100 (53%)

9.0 ± 4.0

27.0 ± 2.0

1.0

21 (21%)

79 (79%)

Monge Argilés 2014
Spain

No

72.9 ± 6.9

30 (60%)

23.5 ± 2.0

2.0

15 (50%)

15 (50%)

Nesteruk 2016

Poland

No

63.2 ± 9.6

40 (55%)

13.9 ± 2.9

27.5 ± 1.7

2.0

9 (22%)

31 (78%)

Ong 2015

Australia

Yes

72.7 ± 6.6

45 (‐)

13.6 ± 3.7

27.3 ± 1.9

2.0

20 (44%)

21 (47%)

4 (9%)

Pereira 2014

USA, Canada, Europe

Yes

74.9 ± 7.3

480 (40%)

13.9 ± 4.6

27.0 ± 1.4

1.0

95 (20%)

385 (80%)

Platero 2019

Spain

Yes

74.1 ± 5.2

97 (63%)

8.5 ± 4.3

26.5 ± 2.7

3.0

36 (37%)

61 (63%)

Prestia 2013

Italy, Netherlands, Sweden

Yes

66.2 ± 9.4

73 (56%)

27.2 ± 1.5

2.4

29 (40%)

44 (60%)

Not reportedᵉ

Prestia 2013 (ADNI)
USA, Canada, Italy
(only data from Italy were used, see Table 2)

Yes

73.6 ± 8.6

93 (47%)

(36 included in analysis)

26.9 ± 1.7

2.7

18 (50%)

18 (50%)

Not reportedᵉ

Prieto del Val 2016

Spain

Yes

69.0 ± 7.0

34 (65%)

7.5 ± 5.7

26.6 ± 2.4

2.0

16 (47%)

18 (53%)

Rhodius‐Meester 2016

Netherlands

No

70.6 ± 7.3

171 (46%)

5.0 ± 1.0

26.7 ± 1.9

3.0ᶠ

104 (61%)

67 (39%)

23ᵉ

VanderFlier 2005
Netherlands

No

75.0 ± 7.0

15 (71%)

10.0 ± 3.0

26.0 ± 2.0

1.8

9 (60%)

6 (40%)

Visser 1999
Netherlands

No

78.8 ± 4.5

13 (54%)

7.4 ± 2.3

22.4 ± 2.3

3.0

9 (69%)

4 (31%)

Visser 2002
Netherlands

No

64.9 ± 9.5

29 (42%)

10.7 ± 3.2

27.7 ± 1.8

1.9

7 (23%)

20 (67%)

3 (1%)

Wang 2006

Taiwan

No

76.3 ± 4.0

58 (26%)

11.8 ± 4.3

25.9 ± 2.9

1.8

19 (33%)

39 (67%)

Westman 2011

Europe (6 countries)

Yes

74.0 ± 5.8

101 (52%)

8.7 ± 4.3

27.2 ± 1.6

1.0

19 (19%)

82 (81%)

Wolz 2011

USA, Canada

Yes

74.7 ± 7.9

405 (35%)

15.6 ± 3.2

27.0 ± 1.9

1.5

167 (41%)

238 (59%)

Wood 2016

UK

Yes

69.1 ± 4.5

15 (27%)

11.7 ± 1.0

27.7 ± 1.3

2.0

9 (60%)

6 (40%)

AD: Alzheimer's disease; MCI: mild cognitive impairment; MCI c: MCI converted to AD; MCI nc: MCI not converted to AD; MMSE: Mini Mental State Examination; SD: standard deviation

ᵃ All studies were conducted at memory clinics or tertiary centres.
ᵇ The numbers of participants reported in this table are those used in the meta‐analysis.
ᶜ Modified Mini Mental State Examination.
ᵈ Cases excluded from the analysis.
ᵉ Cases excluded a priori from the study.
ᶠ Median value was available instead of mean.

During the literature screening, we assessed the eligibility of several Alzheimer's Disease Neuroimaging Initiative (ADNI) papers. ADNI is an ongoing multicentre project involving 50 medical centres and university sites across the USA and Canada. Its primary objective is to collect, validate and use data acquired serially over two to three years of follow‐up, including structural MRI and positron emission tomography (PET) images, genetic data, cognitive tests, and cerebrospinal fluid (CSF) and blood biomarkers, as predictors of Alzheimer's disease dementia. To avoid overlapping participants, when several studies used the ADNI database in the same period and focused on the same brain region, we included only the study with the largest sample size and excluded the others (Table 2). Among the 33 included papers, we identified seven eligible ADNI studies, from which we extracted sixteen 2 x 2 contingency tables (Gaser 2013; Jang 2018; Khan 2015; Ledig 2018; Pereira 2014; Prestia 2013 (ADNI); Wolz 2011). We applied the same selection criteria to studies performed on the databases of the DESCRIPA study (Development of Screening Guidelines and Clinical Criteria for Predementia Alzheimer's disease), the VUmc study (University Medical Center Amsterdam) and the AddNeuroMed study. Study authors focused on one to eight different brain regions (hippocampus, entorhinal cortex, amygdala, medial temporal lobe, lateral temporal lobe, lateral ventricles, medial temporal gyrus, cortical grey matter) or the whole brain, using several MRI techniques.

Table 2. Included and excluded studies assessed for overlapping risk

Study

Data sets

Study period

MRI (Tesla)

MRI technique
(V, M, A)ᵃ

MRI scale or software

MRI regionᵇ

No. of participants

Follow‐up mean years

Participants' overlapping risk with other included studies

Decision on inclusion or exclusion of the MRI region

Bouwmann 2007

VUMC

2001‐2004

1

V‐Scheltens

MTL

59

1.6

Rhodius‐Meester 2016

Excluded

Caroli 2007

Brescia (Italy)

2002‐2005

1

1

V‐Scheltens

M‐DISPLAY

MTL

H total, H left, H right

23

23

1.6

1.6

No

No

Included

Included

Chupin 2009

ADNI

NR

1.5

1.5

sA‐SNT

A‐SACHA

H total

H total

210

210

1.5

1.5

Wolz 2011

Wolz 2011

Excluded

Excluded

Clerx 2013a

DESCRIPA +

VUMC

NR

1‐1.5

V‐Scheltens

M‐Show_Images 3.7.0

A‐LEAP

SIENAX

MTL

H total

H total

LV

328

328

328

328

2

2

2

2

No

No

No

No

Included

Included

Included

Included

Cuignet 2011

ADNI

NR

1.5

A‐SACHA

A‐Freesurfer

H total

H total

104

104

1.5

1.5

Wolz 2011

Wolz 2011

Excluded

Excluded

Dickerson 2013

ADNI

NR

1.5

A‐Freesurfer

H total

111

3

Wolz 2011

Excluded

Eckerstrom 2008

Gothenburg (Sweden)

NR

0.5

M‐Hipposegm

H total

42

2

No

Included

Eckerstrom 2013

Gothenburg (Sweden)

NR

0.5

M‐Hipposegm

H left and right

42

2

No

Included

Ewers 2012  

ADNI

NR

1.5

A‐Freesurfer

H left and right

130 (45)

2.3

Gaser 2013

Excluded

Gaser 2013

ADNI

NR

1.5

A‐Freesurfer

H left and right

H left

H right

195

195

195

1

3

3

No

No

Ledig 2018

Included

Included

Excluded

Gomar 2011

ADNI

Downloaded from ADNI on August 3, 2009

1.5

A‐Freesurfer

H left

H right

LV

WB

320

2

Gaser 2013

Ledig 2018

Ledig 2018

Ledig 2018

Excluded

Excluded

Excluded

Excluded

Gómez‐Sancho 2018

ADNI

NR

1.5

A‐Freesurfer

H total

183

3

Ledig 2018

Excluded

Heister 2011

ADNI

October 14, 2010

1.5

A‐NeuroQuant

H total

192

3

Wolz 2011

Excluded

Jang 2018

ADNI

Data downloaded in December 2017

3

V‐CVRS scale

(Scheltens for MTL)

MTL

GCA (more than one region)

LV

340

340

340

3

Pereira 2014

No

No

Excluded

Excluded

Included

Khan 2015ᶜ

ADNI +

AddNeuroMed

NR

1.5

A‐Freesurfer

H total

447

1

Wolz 2011 and Liu 2010

Included

Landau 2010

ADNI

NR

1.5

A‐Freesurfer

H total

85

2

Wolz 2011

Excluded

Ledig 2018

ADNI

NR

1.5‐3

A‐MALPEM

H total

H right

EC total

A left

A right

A total

MTG

WB

LV

cGM

343

343

343

343

343

343

343

343

343

343

2

2

2

2

2

2

2

2

2

2

Wolz 2011

Gaser 2013

No

No

No

No

No

No

No

No

Excluded

Included

Included

Included

Included

Included

Included

Included

Included

Included

Lehman 2013

ADNI

Downloaded from ADNI in June 2011

1.5

V‐Scheltens

MTL

394

3

Pereira 2014

Excluded

Lillemark 2014

ADNI

NR

1.5

A‐Freesurfer

WB

H total

240

1

Ledig 2018

Wolz 2011

Excluded

Excluded

Liu 2010ᶜ

AddNeuroMed

NR

1.5

A‐Fischl

H total

100

1

Khan 2015

Included

Liu 2013  

ADNI

NR

NR

V‐Scheltens

MTL

387

3

Pereira 2014

Excluded

Minhas 2017

ADNI

NR

1.5

A‐Freesurfer

H total

EC total

LV

52

52

52

3

3

3

Wolz 2011

Ledig 2018

Ledig 2018

Excluded

Excluded

Excluded

Pereira 2014

ADNI

AddNeuroMed

NR

1.5‐3

V‐Scheltens

MTL

480

1

No

Included

Prestia 2013

Brescia (Italy) +

VUMC +

Stockholm

NR

1

1.5

3

A‐Freesurfer

H total (the smallest between left and right H)

73

2.4

No

Included

Prestia 2013 (ADNI)

ADNI

ADNI

Brescia

Brescia

NR

NR

From 2006

From 2006

1.5‐3

1.5‐3

1

1

A‐Freesurfer

sA‐SNT

A‐Freesurfer

M‐DISPLAY

H total

H total

H total

H total

57

57

36

36

3

3

2.2

2.2

Wolz 2011

Wolz 2011

Prestia 2013

No

Excluded

Excluded

Included

Included

Prestia 2015

Brescia +

VUMC +

Stockholm

NR

1

1.5

3

A‐Freesurfer

H total

73

2.4

Prestia 2013

Excluded

Rhodius‐Meester 2016

VUMC

2000‐2012

1‐1.5

V‐Scheltens

MTL

171

3

No

Included

Sørensen 2016

ADNI

28 September 2012

1.5

A‐Freesurfer

H total

233

2

Wolz 2011

Excluded

Suppa 2015a

ADNI

NR

1.5

A‐VBM+mask

H total

198

1

2

3

Wolz 2011

Excluded

Tang 2015

ADNI

NR

1.5

A‐Freesurfer

H total

222

3

Wolz 2011

Excluded

VanderFlier 2005

VUMC

NR

1.5

M‐DISPLAY

H total, H left, H right, MTL total, MTL left, MTL right

15

1.8

No

Included

Varon 2015

ADNI

27 June 2013

1.5

A‐FreeSurfer

V‐Scheltens

A‐FreeSurfer

H total

MTA

EC total

89

3.2

Wolz 2011

Pereira 2014

Ledig 2018

Excluded

Excluded

Excluded

Vasta 2016

ADNI

NR

1.5

A‐Freesurfer

H total

121

1.5

Wolz 2011

Excluded

Visser 1999

AMSTEL study

NR

0.6

M‐developed in house software

H total

LTL

13

3

No

No

Included

Visser 2002

Maastricht Memory Clinic

NR

1.5

V‐Scheltens

M‐ShowImage

MTL

H total

30

30

1.9

1.9

No

No

Included

Included

Vos 2012

DESCRIPA+

VUMC

2003‐2005

1‐1.5

A‐LEAP

H total

153

2

Clerx 2013a

Excluded

Westman 2011

AddNeuroMed

NR

1.5

V‐Scheltens

M‐HERMES software

MTL

H total

101

101

1

1

Pereira 2014

No

Excluded

Included

Wolz 2011ᶜ

ADNI

Follow‐up stopped in 2011

1.5

A‐Lotjonen (fast and robust multi‐atlas segmentation)

H total

405

1.5

Khan 2015

Included

Yang 2012

ADNI

NR

1.5

1.5

A‐Freesurfer

A‐Freesurfer

H total

LV

111

111

2

2

Wolz 2011

Ledig 2018

Excluded

Excluded

Yu 2012

ADNI

June 2010

1.5

NR

EC

LV

H left

H right

63

63

63

63

2

Ledig 2018

Zhang 2012b

ADNI

NR

3

V‐Scheltens

MTL

53

2

Pereira 2014

Excluded

ADNI: Alzheimer's Disease Neuroimaging Initiative; MMSE: Mini Mental State Examination; NR: not reported; SD: standard deviation; VUMC: University Medical Centre, Amsterdam

ᵃ MRI technique: V: visual; M: manual; A: automated.
ᵇ MRI region: A: amygdala; cGM: cortical grey matter; EC: entorhinal cortex; GCA: global cortical atrophy; H: hippocampus; MTL: medial temporal lobe; LV: lateral ventricles; MTG: medial temporal gyrus; WB: whole brain.
ᶜ Uncertain risk of overlap between these studies (Khan 2015 did not specify the number of participants drawn from each of the ADNI and AddNeuroMed studies).

In response to our requests (Frisoni 2010b [pers comm]; Frisoni 2010c [pers comm]; Frisoni 2010e [pers comm]; Frisoni 2010h [pers comm]; Frisoni 2010i [pers comm]; Frisoni 2010j [pers comm]; Frisoni 2010l [pers comm]; Frisoni 2010p [pers comm]; Frisoni 2010q [pers comm]; Frisoni 2010r [pers comm]; Frisoni 2010s [pers comm]; Frisoni 2016d [pers comm]; Frisoni 2017b [pers comm]), the authors of 12 included studies sent us the data needed to complete the 2 x 2 tables (Carmichael 2007; Caroli 2007; deToledo‐Morell 2004; Devanand 2007; Eckerstrom 2008; Erten‐Lyons 2006; Herukka 2008; Jack 2000; Prestia 2013; VanderFlier 2005; Visser 2002; Wang 2006).

Fifteen studies analysed one brain region, six studies considered two regions, and the remaining studies considered three or more regions (Table 3). Twenty‐four studies measured the volume of brain regions with quantitative manual or automated methods, four studies used both visual and quantitative methods, and five studies used only the visual method (Table 3).

Studies generally measured the volume of the hippocampus and the entorhinal cortex with a quantitative manual method, whereas they mainly used a visual method based on the Scheltens scale (Scheltens 1992; Scheltens 1997) to assess medial temporal lobe atrophy. The threshold value for medial temporal lobe atrophy varied between studies. The authors of three studies (Caroli 2007; Clerx 2013a; Visser 2002) did not specify a cut‐off value, while Pereira 2014 and Rhodius‐Meester 2016 used a cut‐off of 1.5 or more for the averaged left and right medial temporal lobe scores. One study (Monge Argilés 2014) used a cut‐off based on the sum of the left and right medial temporal lobe atrophy scores (≥ 3.0).

The most commonly used software for manual assessment of brain region volume was DISPLAY (Caroli 2007; Prestia 2013 (ADNI); VanderFlier 2005), and for automated assessment it was Freesurfer (Gaser 2013; Khan 2015; Nesteruk 2016; Prestia 2013; Prieto del Val 2016). Some study authors used software developed in house.

Philips, Siemens and General Electric manufactured the MRI scanners in most of the included studies (29 of 33); several studies, such as the ADNI studies, used scanners from all three manufacturers. Two studies used Toshiba (Nesteruk 2016) or Technicare (Visser 1999) scanners, and two studies did not report the manufacturer. Twenty‐six (79%) of the included studies performed MRI at 1.5 Tesla, one study at 3.0 Tesla (Jang 2018), and two studies at 0.5 Tesla (Eckerstrom 2008; Eckerstrom 2013). Only Visser 1999 used MRI at 0.6 Tesla. One study did not report field strength.
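The two medial temporal lobe atrophy rules described above (averaged score ≥ 1.5 versus summed score ≥ 3.0) can be compared directly. The Python sketch below assumes hemispheric Scheltens ratings on the usual 0 to 4 scale and uses hypothetical function names; it enumerates every left/right score pair and counts how often the two rules disagree. Because the mean of two scores is simply half their sum, the rules are expected to classify every participant identically, even though the cut‐offs are written as different numbers.

```python
def positive_by_mean(left: float, right: float, cutoff: float = 1.5) -> bool:
    """Hypothetical rule: positive if the averaged left/right MTA score reaches the cut-off."""
    return (left + right) / 2 >= cutoff


def positive_by_sum(left: float, right: float, cutoff: float = 3.0) -> bool:
    """Hypothetical rule: positive if the summed left/right MTA score reaches the cut-off."""
    return left + right >= cutoff


# Every possible pair of hemispheric Scheltens scores (assumed to be integers 0-4).
pairs = [(left, right) for left in range(5) for right in range(5)]
disagreements = [p for p in pairs if positive_by_mean(*p) != positive_by_sum(*p)]
print(f"{len(pairs)} score pairs checked, {len(disagreements)} disagreements")
```

The check reports no disagreements, so any difference between these particular studies would have to come from elsewhere (raters, populations, follow‐up) rather than from the form of the cut‐off.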

Table 3. Index test: description and common abbreviations

Study

Manufacturer of MRI scanners

Field strength (Tesla)

Brain regions (a)

MRI‐B or MRI‐L (b)

Technique: visual; quantitative manual; quantitative semi‐automated or automated

Carmichael 2007 (c)

General Electric

1.5

LV, WB

MRI‐B + MRI‐L

Quantitative automated

Caroli 2007 (c)

General Electric

1.0

H left, H right, H total

MRI‐B

Quantitative manual

MTL

MRI‐B

Visual

Clerx 2013a

Siemens, Philips

1 or 1.5

H total

MRI‐B

Quantitative manual

H total

MRI‐B

Quantitative automated

MTL

MRI‐B

Visual

LV

MRI‐B

Quantitative automated

deToledo‐Morell 2004 (c)

General Electric

1.5

H total, EC total

MRI‐B

Quantitative manual

Devanand 2007 (c)

General Electric

1.5

H left, H right, H total, EC left, EC right, EC total

MRI‐B

Quantitative manual

Eckerstrom 2008 (c)

Philips

0.5

H total

MRI‐B + MRI‐L

Quantitative manual

Eckerstrom 2013

Philips

0.5

H left, H right

MRI‐B

Quantitative manual

Erten‐Lyons 2006 (c)

Not reported

1.5

H total, LV, WB

MRI‐B + MRI‐L

Quantitative semiautomated

Frolich 2017

Siemens, Philips

1.5

H total

MRI‐B

Quantitative automated

Galton 2005

General Electric

1.5

H left, H right, LTL right

MRI‐B

Visual

Gaser 2013

Several (ADNI scanners)

1.5

H left

H right

MRI‐B

Quantitative automated

Quantitative automated (d)

Herukka 2008 (c)

Siemens

1.5

H left, H right, H total, EC left, EC right, EC total

MRI‐B

Quantitative manual

Jack 2000 (c)

General Electric

1.5

H total

MRI‐B + MRI‐L

Quantitative manual

Jang 2018

Several (ADNI scanners)

3

MTL

MRI‐B

Visual (d)

GCA

MRI‐B

Visual (d)

LV

MRI‐B

Visual

Khan 2015

Several (ADNI and AddNeuroMed scanners)

1.5

H total

MRI‐B

Quantitative automated

Ledig 2018

Several (ADNI scanners)

1.5‐3

H total

MRI‐B

Quantitative automated (d)

H right, A total, A left, A right, MTG, EC total, WB, LV, cGM

MRI‐B + MRI‐L

Quantitative automated

Liu 2010

Several (ADNI and AddNeuroMed scanners)

1.5

H total

MRI‐B

Quantitative automated

Monge Argilés 2014

General Electric

1.5

MTL

MRI‐B

Visual

Nesteruk 2016

Toshiba

1.5

H left, H right, EC left

MRI‐B

Quantitative automated

Ong 2015

Not specified

Not reported

H total

MRI‐B + MRI‐L

Quantitative automated

Pereira 2014

Several (ADNI and AddNeuroMed scanners)

1.5 or 3

MTL

MRI‐B

Visual

Platero 2019

General Electric

1.5

H total

MRI‐B

Quantitative automated

Prestia 2013 (ADNI)

Several (ADNI scanners)

1.5 or 3

H total

MRI‐B

Quantitative automated (d) and semiautomated (d)

Philips (TOMC)

1.0

H total

MRI‐B

Quantitative manual and automated

Prestia 2013 (c)

Philips, Siemens (TOMC, VUmc, KUHH)

1.0 or 1.5 or 3.0

H total (the smallest between left and right volumes)

MRI‐B

Quantitative automated

Prieto del Val 2016

Philips

1.5

A right

MRI‐B

Quantitative automated

Rhodius‐Meester 2016

Siemens, General Electric

1.0 or 1.5

MTL

MRI‐B

Visual

VanderFlier 2005 (c)

Philips

1.5

H left, H right, H total, MTL left, MTL right, MTL total

MRI‐B

Quantitative manual

WB

MRI‐B

Quantitative semiautomated

Visser 1999 (c)

Teslacon II (Technicare)

0.6

H total, LTL

MRI‐B

Quantitative manual

Visser 2002 (c)

Philips

1.5

H total

MRI‐B

Quantitative manual

MTL

MRI‐B

Visual

Wang 2006 (c)

Siemens

1.5

H left, H right, H total, A left, A right, A total

MRI‐B

Quantitative manual

Westman 2011

Several (AddNeuroMed scanners)

1.5

H total

MRI‐B

Quantitative manual

> 1 region

MRI‐B

Quantitative automated (d)

MTL

MRI‐B

Visual (d)

Wolz 2011

Several (ADNI scanners)

1.5

H total

MRI‐B

Quantitative automated

Wood 2016

Siemens

1.5

H total

MRI‐B

Quantitative automated

(a) A: amygdala; cGM: cortical grey matter; EC: entorhinal cortex; GCA: global cortical atrophy; H: hippocampus; LTL: lateral temporal lobe; LV: lateral ventricles; MTG: medial temporal gyrus; MTL: medial temporal lobe; WB: whole brain.

(b) MRI‐B: MRI‐baseline; MRI‐L: MRI‐longitudinal.
(c) Data received from the study authors.
(d) Data not used for the analysis (see Table 2).

The majority of the included studies used the NINCDS‐ADRDA criteria as the reference standard (McKhann 1984). Three studies (Nesteruk 2016; Rhodius‐Meester 2016; Wood 2016) used the NIA‐AA diagnostic criteria (McKhaan 2011).

Methodological quality of included studies

We present details of the methodological quality of the included studies in the QUADAS‐2 results summary (Figure 2). We judged all studies to be of low quality because we rated each of them as having at least one domain at high or unclear risk of bias.


Risk of bias and applicability concerns summary: review authors' judgements about each domain for each included study


Participant selection

Only one study (VanderFlier 2005) was at low risk of bias for participant selection; we judged one study (Visser 1999) at unclear risk of bias and all other included studies at high risk. The main reasons for a high risk judgement were non‐consecutive enrolment or the use of registry data: the ADNI and AddNeuroMed studies, for example, although they are prospective registries, imposed specific participant selection criteria, such as the availability of multiple biomarkers. Moreover, in prospective registries it was unclear whether inappropriate exclusions (e.g. depression, vascular lesions on MRI) were avoided. In the other included cohort studies, the absence of clearly defined inclusion and exclusion criteria was the reason for judging them at high risk of bias.

Index test

We judged 24 (73%) studies at high risk of bias, six (18%) at unclear risk, and three (9%) (Galton 2005; Monge Argilés 2014; Pereira 2014) at low risk of bias for this domain. Overall, 30 studies did not provide sufficient details on the index test, because they lacked a clear, pre‐specified definition of a 'positive' MRI result, did not report blinding of radiologists to the reference standard, or both. Twenty‐four studies had unclear criteria for a positive MRI result; only nine studies reported a clear, pre‐specified definition (Galton 2005; Khan 2015; Monge Argilés 2014; Ong 2015; Pereira 2014; Prestia 2013; Prestia 2013 (ADNI); Rhodius‐Meester 2016; Wolz 2011).

Thresholds were specified mainly for MRI assessed by the visual and automated methods. Cut‐off values for the visual method were based on the Scheltens scale (Scheltens 1992; Scheltens 1997) but were set at different levels between studies: for example, a cut‐off of 1.5 or higher based on the mean medial temporal lobe atrophy score of both hemispheres (Pereira 2014; Rhodius‐Meester 2016), or a cut‐off of 3 or higher based on the sum of the right and left medial temporal lobe atrophy scores (Monge Argilés 2014). Clerx 2013a, Frolich 2017, Prieto del Val 2016 and Wood 2016 defined a positive MRI test using Youden's index, which has the advantage of being a single measure but loses the distinction between false positives and false negatives (Hilden 1996).
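Youden's index is J = sensitivity + specificity − 1, and the 'Youden‐optimal' threshold is the one that maximises J. The sketch below is a minimal illustration, not the procedure of any included study: it assumes hypothetical hippocampal volumes and conversion labels, treats a volume at or below the cut‐off as a positive (atrophic) result, and scans every observed value as a candidate threshold. Because J simply adds sensitivity and specificity, one missed converter and one false alarm cost exactly the same.

```python
from typing import Sequence


def youden_cutoff(volumes: Sequence[float], converted: Sequence[bool]) -> tuple[float, float, float, float]:
    """Return (cutoff, J, sensitivity, specificity) for the cut-off maximising Youden's J.

    Assumption: a volume at or below the cut-off counts as test-positive.
    """
    n_pos = sum(converted)
    n_neg = len(converted) - n_pos
    best = None
    for cutoff in sorted(set(volumes)):
        tp = sum(1 for v, c in zip(volumes, converted) if c and v <= cutoff)
        tn = sum(1 for v, c in zip(volumes, converted) if not c and v > cutoff)
        sens, spec = tp / n_pos, tn / n_neg
        j = sens + spec - 1  # misses and false alarms weighted equally
        if best is None or j > best[1]:
            best = (cutoff, j, sens, spec)
    return best


# Hypothetical hippocampal volumes (arbitrary units) and conversion status at follow-up.
volumes = [2.1, 2.4, 2.6, 2.9, 3.0, 3.2, 3.4, 3.5]
converted = [True, True, False, True, False, False, False, False]
cutoff, j, sens, spec = youden_cutoff(volumes, converted)
print(f"cut-off {cutoff}: J = {j:.2f}, sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

In this toy example the optimum is 2.9 (sensitivity 1.00, specificity 0.80). Note that a cut‐off is chosen on the same data used to estimate accuracy, which tends to make the resulting sensitivity and specificity optimistic.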

Blinding of radiologists to the clinical diagnosis of Alzheimer's disease dementia was unclear in 18 (55%) studies. To evaluate blinding, we applied the same criteria to the visual, manual and automated methods. However, we acknowledge that a lack of blinding of radiologists, or interpretation of MRI by different radiologists for different participants, may be less of a concern when the automated method is used than when the visual or manual methods are used. In three studies (Clerx 2013a; Erten‐Lyons 2006; Jang 2018), more than one radiologist interpreted MRI scans for different participants, whereas in 16 studies it was unclear whether one or more radiologists interpreted the scans separately or in a joint session. Thirteen studies assessed interobserver or intraobserver variability in the whole cohort or in a randomly selected subset of participants (Clerx 2013a; deToledo‐Morell 2004; Devanand 2007; Eckerstrom 2008; Eckerstrom 2013; Erten‐Lyons 2006; Herukka 2008; Jang 2018; Monge Argilés 2014; Rhodius‐Meester 2016; VanderFlier 2005; Visser 2002; Westman 2011).
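The review does not state which agreement statistic the included studies used for interobserver or intraobserver variability. As one common option, the sketch below computes unweighted Cohen's kappa for two raters scoring the same scans; the ratings are invented for illustration. For an ordinal scale such as the Scheltens scale a weighted kappa, and for continuous volumes an intraclass correlation coefficient, may be more appropriate.

```python
from typing import Sequence


def cohens_kappa(rater_a: Sequence[int], rater_b: Sequence[int]) -> float:
    """Unweighted Cohen's kappa for two raters' categorical ratings of the same scans."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("ratings must be non-empty and paired")
    n = len(rater_a)
    categories = set(rater_a) | set(rater_b)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal category frequencies.
    expected = sum((rater_a.count(c) / n) * (rater_b.count(c) / n) for c in categories)
    if expected == 1:  # both raters used one identical category throughout
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical visual atrophy scores (0-4) from two radiologists on ten scans.
radiologist_1 = [0, 1, 2, 2, 3, 1, 0, 2, 4, 1]
radiologist_2 = [0, 1, 2, 3, 3, 1, 1, 2, 4, 1]
print(f"kappa = {cohens_kappa(radiologist_1, radiologist_2):.2f}")
```

Here the raters agree on 8 of 10 scans, but after correcting for chance agreement kappa is about 0.74.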

Reference standard

Twenty‐nine (88%) studies were at low risk of bias in the 'reference standard' domain, and we classified four at unclear risk. We judged one study (Prestia 2013 (ADNI)) at unclear risk of bias because participants' baseline biomarker results were available to the clinicians who diagnosed Alzheimer's disease dementia. As specified in the Methods section, we accepted the new diagnostic criteria for dementia due to Alzheimer's disease (McKhaan 2011) if only the Alzheimer's disease core clinical criteria were used, as in Rhodius‐Meester 2016. When this information was not available, we judged the study at unclear risk of bias and at unclear concern about incorporation of MRI into the diagnosis of Alzheimer's disease during follow‐up (incorporation risk) (Nesteruk 2016; Wood 2016). We judged one study (Erten‐Lyons 2006) at unclear risk of bias because of insufficient information about the reference standard.

Flow and timing

Nine (27%) studies (Carmichael 2007; Caroli 2007; Devanand 2007; Galton 2005; Platero 2019; Prestia 2013 (ADNI); VanderFlier 2005; Visser 1999; Visser 2002) were at high risk of bias in the 'flow and timing' domain, two (6%) were at unclear risk, and 22 (67%) were at low risk. We classified a study at high risk of bias when its authors did not adequately explain withdrawals or losses to follow‐up, or excluded from the analysis participants who progressed to non‐Alzheimer's disease dementia. Erten‐Lyons 2006 did not report whether all participants received the same reference standard, so we judged it at unclear risk. Eckerstrom 2013 was at unclear risk of bias because it did not specify whether non‐Alzheimer's disease dementia cases were included in the analysis. The median interval between the MRI test and the reference standard was two years (range 1 to 7.6 years).

Concerns regarding applicability

We had no applicability concerns for any study in the 'patient selection' and 'index test' domains: participants and the index test in the included studies did not differ from those targeted by the review question. Three studies (Erten‐Lyons 2006; Nesteruk 2016; Wood 2016) raised unclear concern for the 'reference standard' domain, and the remaining 30 studies raised low concern.

Findings

We present findings for five main brain regions (summary of findings Table).

Hippocampus

Twenty‐two studies (Caroli 2007; Clerx 2013a; deToledo‐Morell 2004; Devanand 2007; Eckerstrom 2008; Erten‐Lyons 2006; Frolich 2017; Herukka 2008; Jack 2000; Khan 2015; Liu 2010; Ong 2015; Platero 2019; Prestia 2013; Prestia 2013 (ADNI); VanderFlier 2005; Visser 1999; Visser 2002; Wang 2006; Westman 2011; Wolz 2011; Wood 2016) measured total hippocampal volume in a total of 2209 participants, 687 (31%) of whom progressed to Alzheimer's disease dementia. The studies used different MRI techniques: manual (11 studies, 512 participants); semiautomatic or automatic (nine studies, 1334 participants); and both manual and semiautomatic or automatic (two studies, 421 participants). Sample sizes ranged from 13 participants (Visser 1999) to 447 participants (Khan 2015). Sensitivity ranged from 0.28 to 1.00, and specificity from 0.43 to 0.94. Forest plots showed a high degree of heterogeneity between the included studies and wide confidence intervals for both sensitivity and specificity (Figure 3). Two studies (Clerx 2013a; Prestia 2013 (ADNI)) used two techniques for total hippocampal volume; for consistency with the other studies, we used the results of the manual technique from these two studies. The mean sensitivity and specificity (summary operating point) were 0.73 (95% CI 0.64 to 0.80) and 0.71 (95% CI 0.65 to 0.77), respectively (Figure 4). The positive likelihood ratio was 2.53 (95% CI 2.09 to 3.06) and the negative likelihood ratio was 0.38 (95% CI 0.29 to 0.50). The certainty of the evidence (summary of findings Table) was low for both sensitivity and specificity, downgraded for risk of bias (−1) and for inconsistency due to heterogeneous study results (−1).
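Likelihood ratios follow directly from sensitivity and specificity (LR+ = sensitivity / (1 − specificity); LR− = (1 − sensitivity) / specificity), and they update the probability of progression on the odds scale. The sketch below applies these standard formulas to the summary operating point for total hippocampal volume. The function names are hypothetical, and using the 31% pooled progression proportion as the pre‐test probability is an illustrative assumption, not a figure the review applies to any particular clinic.

```python
def likelihood_ratios(sens: float, spec: float) -> tuple[float, float]:
    """Positive and negative likelihood ratios from sensitivity and specificity."""
    return sens / (1 - spec), (1 - sens) / spec


def post_test_probability(pre_test: float, lr: float) -> float:
    """Bayes' theorem on the odds scale: post-test odds = pre-test odds x LR."""
    pre_odds = pre_test / (1 - pre_test)
    post_odds = pre_odds * lr
    return post_odds / (1 + post_odds)


# Summary operating point for total hippocampal volume.
lr_pos, lr_neg = likelihood_ratios(0.73, 0.71)
print(f"LR+ ~ {lr_pos:.2f}, LR- ~ {lr_neg:.2f}")

# Assumed pre-test probability: 31% of participants progressed to AD dementia.
print(f"after a positive MRI: {post_test_probability(0.31, 2.53):.2f}")
print(f"after a negative MRI: {post_test_probability(0.31, 0.38):.2f}")
```

Computed from the rounded summary point, LR+ comes out at about 2.52 rather than the reported 2.53, a difference that presumably reflects rounding. Under these assumptions, a positive result raises the probability of progression from 0.31 to about 0.53, and a negative result lowers it to about 0.15.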


Forest plot of total hippocampal volume measured by structural magnetic resonance imaging (MRI) for early diagnosis of dementia due to Alzheimer's disease in people with mild cognitive impairment. The plot shows study‐specific estimates of sensitivity and specificity (squares) with 95% confidence intervals (black lines), by study. Studies are ordered by their estimates of sensitivity. TP: true positive; FP: false positive; FN: false negative; TN: true negative

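Each study's point in the forest plot comes from its 2 × 2 table: sensitivity is TP / (TP + FN) and specificity is TN / (TN + FP). The sketch below computes both with 95% confidence intervals from invented counts. It uses the Wilson score interval as an illustrative choice; the review does not state which interval method its plots use, so the intervals may differ from those shown in the figure.

```python
import math


def wilson_ci(successes: int, total: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score confidence interval for a binomial proportion (95% by default)."""
    p = successes / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def accuracy_from_2x2(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Study-level sensitivity and specificity, each with a Wilson interval."""
    return {
        "sensitivity": (tp / (tp + fn), wilson_ci(tp, tp + fn)),
        "specificity": (tn / (tn + fp), wilson_ci(tn, tn + fp)),
    }


# Hypothetical counts for a single study (not taken from the review).
result = accuracy_from_2x2(tp=22, fp=15, fn=8, tn=55)
for name, (point, (lo, hi)) in result.items():
    print(f"{name}: {point:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

Small studies produce wide intervals with this or any other method, which is one reason the forest plots for the hippocampus look so spread out.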


Summary receiver operating characteristic (ROC) plot of total hippocampus volume measured by structural magnetic resonance imaging (MRI) for early diagnosis of dementia due to Alzheimer's disease in people with mild cognitive impairment. Each point represents the pair of sensitivity and specificity from a study. The solid black circle represents the pooled sensitivity and specificity, which is surrounded by a 95% confidence region (dashed line)


Investigation of heterogeneity

Potential sources of heterogeneity are outlined in the Secondary objectives section. We were able to assess whether MRI technique, duration of follow‐up and participant age affected the diagnostic accuracy of MRI. Table 4 reports the numbers of participants in the subgroup analyses. Because very few studies followed participants for more than three years, we split follow‐up time for the subgroup analyses into 'less than three years' versus 'at least three years'. All these comparisons were between studies, that is, indirect.

Table 4. Numbers of participants in subgroup analysis

Brain region

Number of studies

Sample size

Converted to AD dementia (%)

Sensitivity (95% CI)

Specificity (95% CI)

LR+ (95% CI)

LR‐ (95% CI)

Hippocampus total (a)

22

2209

687 (31%)

0.73
(0.64 to 0.80)

0.71
(0.65 to 0.77)

2.53

(2.09 to 3.06)

0.38

(0.29 to 0.50)

Automatic or semiautomatic technique

11

1698

531 (31%)

0.59

(0.48 to 0.70)

0.66

(0.56 to 0.74)

1.72

(1.25 to 2.36)

0.62

(0.46 to 0.85)

Manual technique

13

551

156 (31%)

0.82

(0.69 to 0.90)

0.74

(0.67 to 0.81)

3.21

(2.42 to 4.27)

0.31

(0.14 to 0.70)

≥ 3 years' follow‐up

8

413

156 (38%)

0.71

(0.54 to 0.84)

0.76

(0.67 to 0.82)

2.94

(2.11 to 4.11)

0.38

(0.22 to 0.64)

< 3 years' follow‐up

14

1796

513 (29%)

0.74

(0.65 to 0.81)

0.69

(0.61 to 0.76)

2.39

(1.90 to 3.00)

0.31

(0.21 to 0.47)

≥ 70 years old

16

1796

566 (32%)

0.73
(0.64 to 0.81)

0.69
(0.62 to 0.75)

3.10
(2.15 to 4.48)

0.41
(0.16 to 1.03)

< 70 years old

6

413

121 (29%)

0.72
(0.54 to 0.84)

0.77
(0.67 to 0.84)

3.10
(2.15 to 4.48)

0.41
(0.16 to 1.03)

Hippocampus left

8

359

113 (31%)

0.71

(0.62 to 0.79)

0.76

(0.67 to 0.83)

2.95

(2.14 to 4.06)

0.38

(0.28 to 0.51)

Hippocampus right

8

359

113 (31%)

0.81

(0.73 to 0.88)

0.71

(0.61 to 0.80)

2.82

(2.01 to 3.96)

0.23

(0.11 to 0.46)

Medial temporal lobe total

7

1077

330 (31%)

0.64

(0.53 to 0.73)

0.65

(0.51 to 0.76)

1.81

(1.41 to 2.32)

0.56

(0.46 to 0.67)

Entorhinal cortex total

4

529

229 (43%)

range: 0.50 to 0.88

range: 0.60 to 1.00

Not computed because no meta‐analysis was conducted

Lateral ventricles

5

1077

371 (34%)

0.57

(0.49 to 0.65)

0.64

(0.59 to 0.70)

1.61

(1.39 to 1.87)

0.66

(0.57 to 0.78)

Whole brain

4

424

220 (52%)

range: 0.33 to 0.92

range: 0.41 to 1.00

Not computed because no meta‐analysis was conducted

AD: Alzheimer's disease; CI: confidence interval; LR+: positive likelihood ratio; LR‐: negative likelihood ratio.

(a) Two studies (Clerx 2013a; Prestia 2013 (ADNI)) used both manual and automatic techniques for total hippocampal volume.

Sensitivity was 0.82 (95% CI 0.69 to 0.90) for the manual technique (13 studies) and 0.59 (95% CI 0.48 to 0.70) for the automatic or semiautomatic technique (11 studies); specificity was 0.74 (95% CI 0.67 to 0.81) and 0.66 (95% CI 0.56 to 0.74), respectively (Table 4). The relative diagnostic odds ratio (DOR) was 4.83 (95% CI 1.82 to 12.8), suggesting better accuracy with the manual than with the automatic technique.
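The diagnostic odds ratio (DOR) summarises sensitivity and specificity in a single number, DOR = [sensitivity / (1 − sensitivity)] / [(1 − specificity) / specificity], which equals LR+ divided by LR−; a relative DOR is the ratio of two subgroups' DORs. The sketch below applies these formulas to the subgroup summary points in Table 4 (function name hypothetical). It is only a crude check: the reported relative DOR of 4.83 presumably comes from the review's meta‐regression model with technique as a covariate, so a ratio of rounded point estimates need not match it exactly.

```python
def diagnostic_odds_ratio(sens: float, spec: float) -> float:
    """DOR = (sens / (1 - sens)) / ((1 - spec) / spec), equivalently LR+ / LR-."""
    return (sens / (1 - sens)) / ((1 - spec) / spec)


# Summary points for total hippocampal volume by MRI technique (Table 4).
dor_manual = diagnostic_odds_ratio(0.82, 0.74)
dor_automatic = diagnostic_odds_ratio(0.59, 0.66)
crude_relative_dor = dor_manual / dor_automatic
print(f"DOR manual ~ {dor_manual:.1f}, automatic ~ {dor_automatic:.1f}")
print(f"crude relative DOR (manual vs automatic) ~ {crude_relative_dor:.1f}")
```

The crude ratio is roughly 4.6, in the same direction and of similar size to the model‐based estimate. The wide confidence interval reported for the relative DOR (1.82 to 12.8) is a reminder that this indirect comparison is imprecise.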

Sensitivity was 0.74 (95% CI 0.65 to 0.81) for less than three years' follow‐up (14 studies) and 0.71 (95% CI 0.54 to 0.84) for at least three years' follow‐up (8 studies); specificity was 0.69 (95% CI 0.61 to 0.76) and 0.76 (95% CI 0.67 to 0.82), respectively. We found no difference in accuracy between longer and shorter follow‐up (relative DOR 0.80, 95% CI 0.34 to 1.92).

Sensitivity was 0.72 (95% CI 0.54 to 0.84) for studies including participants with a mean age of less than 70 years (6 studies) and 0.73 (95% CI 0.64 to 0.81) for studies with a mean age of 70 years or more (16 studies); specificity was 0.77 (95% CI 0.67 to 0.84) and 0.69 (95% CI 0.62 to 0.75), respectively. We found no difference in accuracy between younger and older participants (relative DOR 1.38, 95% CI 0.57 to 3.31).

We were not able to explore amnestic versus non‐amnestic MCI, or the medial temporal lobe, hippocampus, entorhinal cortex or temporoparietal regions versus other structures, as potential sources of heterogeneity, because too few studies provided these data for a meaningful analysis. Furthermore, we did not assess heterogeneity by setting or by MRI field strength, because all included participants had been referred to tertiary centres and most of the included studies used a magnetic field strength of 1.5 Tesla.

Direct comparisons of left and right hippocampal volumes

Seven studies, including 298 participants, directly compared left and right hippocampal volumes measured by a manual technique (Caroli 2007; Devanand 2007; Eckerstrom 2013; Herukka 2008; VanderFlier 2005; Wang 2006) or an automatic technique (Nesteruk 2016). Galton 2005, with 29 participants, used a visual technique and found sensitivity and specificity of 0.64 (95% CI 0.31 to 0.89) and 0.89 (95% CI 0.65 to 0.99) for the right hippocampus, and 0.91 (95% CI 0.59 to 1.00) and 0.89 (95% CI 0.65 to 0.99) for the left hippocampus. Because we considered that the visual technique should not be pooled with quantitative manual or automatic techniques, we excluded this study from the following direct comparisons of right versus left hippocampus.

  1. Left hippocampal volume: sensitivity varied from 0.44 to 0.89, while specificity varied from 0.64 to 1.00 (Data table 2; Data table 3). The mean sensitivity and specificity (summary operating point) were, respectively, 0.71 (95% CI 0.62 to 0.79) and 0.76 (95% CI 0.67 to 0.83; Figure 5). Positive likelihood ratio was 2.95 (95% CI 2.14 to 4.06) while negative likelihood ratio was 0.38 (95% CI 0.28 to 0.51).

  2. Right hippocampal volume: sensitivity ranged from 0.61 to 1.00, specificity from 0.43 to 0.81 (Figure 6). The mean sensitivity and specificity were, respectively, 0.81 (95% CI 0.73 to 0.88) and 0.71 (95% CI 0.61 to 0.80; Figure 5). Positive likelihood ratio was 2.82 (95% CI 2.01 to 3.96) while negative likelihood ratio was 0.23 (95% CI 0.11 to 0.46).


Summary receiver operating characteristic curve (ROC) presenting direct comparisons of hippocampus left and hippocampus right



Summary receiver operating characteristic (ROC) plot of total medial temporal lobe volume measured by structural MRI for early diagnosis of dementia due to Alzheimer's disease in people with mild cognitive impairment. Each point represents the pair of sensitivity and specificity from a study. The solid black circle represents the pooled sensitivity and specificity, which is surrounded by a 95% confidence region (dashed line)


Figure 5 shows the paired ROC plot with studies directly comparing right versus left hippocampus. The relative DOR suggested no overall difference in accuracy (1.37, 95% CI 0.63 to 2.99).

Medial temporal lobe

Seven studies (Caroli 2007; Clerx 2013a; Monge Argilés 2014; Pereira 2014; Rhodius‐Meester 2016; VanderFlier 2005; Visser 2002) assessed total medial temporal lobe volume in a total of 1077 participants, 330 (31%) of whom progressed to Alzheimer's disease dementia (Data table 4). Six studies used a visual method and one (VanderFlier 2005) a quantitative manual method. The smallest study (VanderFlier 2005) recruited 15 participants, while the largest (Pereira 2014) enrolled 480 participants. Sensitivities ranged from 0.40 to 0.86 and specificities from 0.44 to 0.85. The mean sensitivity and specificity (summary operating point) were 0.64 (95% CI 0.53 to 0.73) and 0.65 (95% CI 0.51 to 0.76), respectively (Figure 6). The positive likelihood ratio was 1.81 (95% CI 1.41 to 2.32) and the negative likelihood ratio was 0.56 (95% CI 0.46 to 0.67). The certainty of the evidence (summary of findings Table) was moderate for both sensitivity and specificity, downgraded for risk of bias (−1). We did not downgrade for imprecision: although the confidence intervals were wide, their upper limits (0.73 for sensitivity and 0.76 for specificity) still indicate only modest performance.

VanderFlier 2005 analysed the left and right medial temporal lobes separately and reported sensitivity and specificity of 0.89 (95% CI 0.52 to 1.00) and 0.33 (95% CI 0.04 to 0.78) for the left lobe, and 0.22 (95% CI 0.03 to 0.60) and 1.00 (95% CI 0.54 to 1.00) for the right lobe (Data table 5; Data table 6).

Lateral ventricles

Five studies (Carmichael 2007; Clerx 2013a; Erten‐Lyons 2006; Jang 2018; Ledig 2018) measured the volume of the lateral ventricles in a total of 1077 participants, 371 (34%) of whom progressed to Alzheimer's disease dementia (Data table 7). Four studies used an automatic or semi‐automatic technique and one study (Jang 2018) used a visual method. The smallest study (Carmichael 2007) recruited 29 participants and the largest (Ledig 2018) recruited 343 participants. Sensitivities ranged from 0.51 to 0.75 and specificities from 0.47 to 0.73. The mean sensitivity and specificity (summary operating point) were 0.57 (95% CI 0.49 to 0.65) and 0.64 (95% CI 0.59 to 0.70), respectively (Figure 7). The positive likelihood ratio was 1.61 (95% CI 1.39 to 1.87) and the negative likelihood ratio was 0.66 (95% CI 0.57 to 0.78). The certainty of the evidence (summary of findings Table) was moderate for both sensitivity and specificity, downgraded for risk of bias (−1). We did not downgrade for imprecision: although the confidence intervals were wide, their upper limits were below 0.75 for both sensitivity and specificity, indicating only modest performance.


Summary receiver operating characteristic (ROC) plot of volume of lateral ventricles measured by structural magnetic resonance imaging (MRI) for early diagnosis of dementia due to Alzheimer's disease in people with mild cognitive impairment. Each point represents the pair of sensitivity and specificity from a study. The solid black circle represents the pooled sensitivity and specificity, which is surrounded by a 95% confidence region (dashed line)


Entorhinal cortex

Four studies (deToledo‐Morell 2004; Devanand 2007; Herukka 2008; Ledig 2018) measured total entorhinal cortex volume in a total of 529 participants, 229 (43%) of whom progressed to Alzheimer's disease dementia. Three studies used a manual method and one (Ledig 2018) an automated method. The smallest study (Herukka 2008) recruited 21 participants, while the largest (Ledig 2018) recruited 343 participants. Sensitivities ranged from 0.50 to 0.88 and specificities from 0.60 to 1.00. We did not perform a meta‐analysis because the data were sparse and heterogeneous; they suggested moderate accuracy. The certainty of the evidence (summary of findings Table) was very low, downgraded for risk of bias (−1), imprecision (−1) and inconsistency due to heterogeneous study results (−1).

Whole brain

Four studies (Carmichael 2007; Erten‐Lyons 2006; Ledig 2018; VanderFlier 2005) measured whole brain volume in a total of 424 participants, 220 (52%) of whom progressed to Alzheimer's disease dementia. All four studies used an automatic or semi‐automatic technique. The smallest study (VanderFlier 2005) recruited 15 participants, while the largest (Ledig 2018) recruited 343 participants. Sensitivities ranged from 0.33 to 0.92 and specificities from 0.41 to 1.00. We did not perform a meta‐analysis because the data were sparse and heterogeneous; they suggested moderate accuracy, particularly in the largest study (Ledig 2018). The certainty of the evidence (summary of findings Table) was very low for both sensitivity and specificity, downgraded for risk of bias (−1), imprecision (−1) and inconsistency due to heterogeneous study results (−1).

Medial temporal gyrus, lateral temporal lobe, amygdala, cortical grey matter

Because of the limited number of studies, we did not calculate summary estimates for these regions (Data table 12; Data table 13; Data table 14; Data table 15; Data table 16; Data table 17; Data table 18). Visser 1999 studied the total lateral temporal lobe, and Galton 2005 analysed the right lateral temporal lobe. Wang 2006 and Ledig 2018 evaluated the total, left and right amygdala, whereas Prieto del Val 2016 evaluated only the right amygdala. Amygdala volume was measured manually in Wang 2006 and with an automated method in Ledig 2018. Ledig 2018 also evaluated the medial temporal gyrus and cortical grey matter.

Discussion

Summary of main results

This review analysed the diagnostic accuracy of structural MRI for the early diagnosis of dementia due to Alzheimer's disease in people with MCI. We used the clinical diagnosis of dementia due to Alzheimer's disease at follow‐up as the reference standard. We analysed data from 3935 participants in 33 primary studies published from 1999 to 2019; all participants had a diagnosis of MCI at baseline, underwent structural MRI, and were followed up for at least one year. Sociodemographic and clinical characteristics of participants are presented in Table 1, and key results in the summary of findings Table.

We assumed an add‐on role for structural MRI, that is, a test used in addition to clinical judgement, cognitive test performance, or both, to improve the timely diagnosis of Alzheimer's disease dementia in people with MCI. The results of this review show that structural MRI did not meet the sensitivity and specificity criteria required of an add‐on test, which should be both highly sensitive and highly specific. This evidence was of low certainty for total hippocampal volume, the region reported by the largest number of studies; of moderate certainty for the volumes of the medial temporal lobe and lateral ventricles; and of very low certainty for the entorhinal cortex and whole brain volumes. False positives should be few, because a false diagnosis of Alzheimer's disease dementia can place a heavy burden on the patient and their family, lead to inappropriate treatment with medications for Alzheimer's disease, or deprive patients of proper therapy for potentially treatable causes of cognitive impairment. Moreover, false positives have a significant impact on health and social care costs. False negatives should also be few, because a timely diagnosis of Alzheimer's disease dementia, made when people first seek help because they are worried about changes in cognition, behaviour or functioning, allows them to receive counselling about lifestyle modifications that may help to slow the progression of cognitive impairment. A timely diagnosis also allows people to participate in clinical trials of new drugs for dementia due to Alzheimer's disease.
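To make the cost of false positives and false negatives concrete, sensitivity and specificity can be translated into expected counts in a hypothetical cohort. The sketch below assumes 1000 people with MCI, a 31% progression rate (the pooled proportion in the total hippocampal volume analysis) and the summary operating point for total hippocampal volume (sensitivity 0.73, specificity 0.71); the function name is hypothetical and the counts are illustrative, not observed.

```python
def per_thousand(sens: float, spec: float, prevalence: float, n: int = 1000) -> dict:
    """Expected 2 x 2 counts when a test with given accuracy is applied to n people."""
    converters = n * prevalence
    non_converters = n - converters
    return {
        "true positives": round(sens * converters),
        "false negatives": round((1 - sens) * converters),
        "false positives": round((1 - spec) * non_converters),
        "true negatives": round(spec * non_converters),
    }


# Assumptions: total hippocampal volume summary point and 31% progression.
counts = per_thousand(0.73, 0.71, 0.31)
for outcome, count in counts.items():
    print(f"{outcome}: {count}")
```

Under these assumptions, about 200 of every 1000 people with MCI would receive a false‐positive result and about 84 people who later progress would be missed, which illustrates why the review judges these accuracy levels insufficient for an add‐on test.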

The results of this review cannot be considered conclusive because the included studies were heterogeneous and at high or unclear risk of bias, and the data were not sufficient to compare test accuracy between different brain regions or different types of MCI (for example, amnestic or non‐amnestic MCI). We found no significant differences in the sensitivity or specificity of total hippocampal volume between included studies with regard to follow‐up length or participant age, but overall accuracy was better for manual than for automatic MRI techniques in mixed (mostly indirect) comparisons.

In a qualitative review, the authors concluded that, for the early diagnosis of Alzheimer's disease dementia, entorhinal cortex volume provided better diagnostic accuracy than the volume of other brain regions, such as the hippocampus (Leandrou 2018). However, key aspects of this qualitative review undermine its conclusion. The results were based on two studies (deToledo‐Morell 2004; Killiany 2000). We included deToledo‐Morell 2004 and judged it to be at high risk of bias for patient selection and the index test; in that study, the sensitivity and specificity of entorhinal cortex volume were 0.50 and 1.00, respectively. We excluded Killiany 2000 because its participants were people with normal cognition or “questionable AD dementia”. Leandrou and colleagues did not assess the quality of the evidence arising from their review (Leandrou 2018).

Strengths and weaknesses of the review

Strengths of this review include the following.

  1. We conducted an extensive, comprehensive, and sensitive literature search of several electronic databases, and we assessed whether participants overlapped across the eligible studies.

  2. Two teams of two review authors each independently extracted data and two independent review authors used the QUADAS‐2 tool for quality assessments of the included studies.

  3. We included only prospective studies of participants who underwent structural MRI before diagnosis of dementia due to Alzheimer's disease, minimising the risk of bias in interpretation of the index test results.

  4. We approached authors of studies in an attempt to obtain missing information.

Limitations of this review include the following.

  1. Only heterogeneous, small studies were available, and few studies were available for some brain regions. This undermined our confidence in the pooled estimates of structural MRI diagnostic accuracy and likely contributed to the great variability in sensitivity and specificity observed in the included studies.

  2. We judged most of the included studies to be at high or unclear risk of bias, which contributed to the low certainty of the evidence presented in this review for the region with the most studies, total hippocampal volume.

  3. The studies varied with respect to the included participants and the definition of MCI. Moreover, most of the included studies did not report whether participants were enrolled consecutively or how they were recruited. We considered participant selection to be at high risk of bias in 31 out of 33 included studies.

  4. Twenty‐four studies did not provide sufficient information about the index test, so we judged them to be at high risk of bias in this domain.

  5. The studies varied with respect to protocols for structural MRI. Most of the included studies (24 out of 33) described the MRI findings but did not provide a clear, pre‐specified definition of what was considered a 'positive' result of structural MRI.

  6. Only 13 studies addressed interobserver and intraobserver variability for MRI.

  7. A definitive diagnosis of dementia due to Alzheimer's disease would require histopathological confirmation, but this is not feasible in clinical practice. The clinical diagnosis of Alzheimer's disease dementia at follow‐up is a delayed verification test, which is an imperfect reference standard and could have introduced bias. Furthermore, the experience of clinicians and the clinical pathway were poorly reported in most of the included studies.

Additional limitations of this review may be the following.

  1. We excluded studies that reported MRI accuracy obtained from multiple volumetric brain regions. However, some studies reported the highest diagnostic accuracies when both entorhinal cortex and hippocampus were combined in the analysis (Leandrou 2018), or when hippocampal subvolumes and presubiculum volume were combined (Khan 2015).

  2. We addressed accuracy of structural MRI alone and not as a component of a combination of tests. Other reviews reported that assessment of hippocampal volume or medial temporal lobe atrophy in isolation for the early diagnosis of dementia due to Alzheimer's disease in people with MCI is not supported by the current evidence (Frisoni 2013; Frisoni 2017a; Payton 2018; Ten Kate 2017b). These authors recommended that clinical research should focus on assessing the impact of combinations of biomarkers. Neuropsychological tests and multiple putative biomarkers including neuroimaging (MRI or PET) have been proposed, but the clinical usefulness of these biomarkers is still under evaluation.

  3. We have presented pooled estimates of sensitivity and specificity even though explicit volume cut‐offs were not reported, which limits the clinical usefulness of the summary estimates. Nonetheless, these estimates are consistent with poor accuracy.
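Why an unreported threshold matters can be illustrated with a toy example. The sketch below treats a baseline hippocampal volume below a chosen cut‐off as a positive (atrophy) result and shows how sensitivity and specificity shift as the cut‐off moves. The volumes, cut‐offs, and function name are hypothetical values invented for illustration, not data from the included studies.

```python
def sens_spec_at_cutoff(converters, stable, cutoff):
    """A volume below the cut-off counts as a positive (atrophy) result."""
    tp = sum(v < cutoff for v in converters)  # converters correctly flagged
    tn = sum(v >= cutoff for v in stable)     # stable MCI correctly not flagged
    return tp / len(converters), tn / len(stable)

# Hypothetical baseline hippocampal volumes (mL), for illustration only.
converters = [2.6, 2.8, 3.0, 3.1, 3.4]
stable = [2.9, 3.2, 3.3, 3.5, 3.6, 3.8]

for cutoff in (2.85, 3.15, 3.45):
    sens, spec = sens_spec_at_cutoff(converters, stable, cutoff)
    print(f"cut-off {cutoff:.2f} mL: sensitivity {sens:.2f}, specificity {spec:.2f}")
```

With these made‐up values, raising the cut‐off from 2.85 mL to 3.45 mL increases sensitivity from 0.40 to 1.00 while specificity falls from 1.00 to 0.50. The same data therefore yield very different accuracy pairs depending on the threshold, which is why summary estimates without stated cut‐offs are hard to apply in practice.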

Applicability of findings to the review question

We did not judge any studies as having high concerns about applicability in the 'patient selection' and 'index test' domains. Participants and the index test in the included studies did not differ from those targeted by the review question. Three studies presented unclear concern for the 'reference standard' domain and the remaining 30 studies demonstrated low concern. Structural imaging techniques and the expertise needed to measure the volume of brain areas, although potentially applicable, are not widely used in routine clinical practice. In an Italian study, the choice of neuroimaging technique (CT, MRI, or PET) in the clinical pathway of dementia was driven as much by test availability, physicians' familiarity with the technology, and waiting times for patients as by the patient's age, severity of cognitive impairment, or the diagnostic question (e.g. clinical suspicion of cerebrovascular disease; Frisoni 2017a).

Figure 1. Flow of studies identified in literature search for systematic review on structural magnetic resonance imaging for an early diagnosis of dementia due to Alzheimer's disease in people with mild cognitive impairment

Figure 2. Risk of bias and applicability concerns summary: review authors' judgements about each domain for each included study

Figure 3. Forest plot of total hippocampal volume measured by structural magnetic resonance imaging (MRI) for early diagnosis of dementia due to Alzheimer's disease in people with mild cognitive impairment. Plot shows study‐specific estimates of sensitivity and specificity (squares) with 95% confidence intervals (black lines). Studies are ordered according to the estimates of sensitivity. TP: true positive; FP: false positive; FN: false negative; TN: true negative
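The study‐specific estimates and intervals in a forest plot like Figure 3 come from each study's 2×2 table (TP, FP, FN, TN). A minimal sketch of that calculation follows. The counts are hypothetical and not taken from any included study, and the Wilson score interval is our choice for illustration: the caption does not state which interval method the review's software used (an exact Clopper-Pearson interval is another common option).

```python
import math

def wilson_ci(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (assumed method)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)

def study_accuracy(tp, fp, fn, tn):
    """Sensitivity and specificity, each with a 95% interval, from a 2x2 table."""
    sens = tp / (tp + fn)  # converters with a positive MRI result
    spec = tn / (tn + fp)  # non-converters with a negative MRI result
    return sens, wilson_ci(tp, tp + fn), spec, wilson_ci(tn, tn + fp)

# Hypothetical 2x2 counts, for illustration only.
sens, sens_ci, spec, spec_ci = study_accuracy(tp=9, fp=5, fn=3, tn=13)
print(f"sensitivity {sens:.2f} ({sens_ci[0]:.2f} to {sens_ci[1]:.2f})")
print(f"specificity {spec:.2f} ({spec_ci[0]:.2f} to {spec_ci[1]:.2f})")
```

Small studies give wide intervals with either method, which is one visible source of the spread of squares and lines across the plot.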

Figure 4. Summary receiver operating characteristic (ROC) plot of total hippocampal volume measured by structural magnetic resonance imaging (MRI) for early diagnosis of dementia due to Alzheimer's disease in people with mild cognitive impairment. Each point represents the pair of sensitivity and specificity from a study. The solid black circle represents the pooled sensitivity and specificity, which is surrounded by a 95% confidence region (dashed line)

Figure 5. Summary receiver operating characteristic (ROC) plot presenting direct comparisons of left and right hippocampal volume

Figure 6. Summary receiver operating characteristic (ROC) plot of total medial temporal lobe volume measured by structural MRI for early diagnosis of dementia due to Alzheimer's disease in people with mild cognitive impairment. Each point represents the pair of sensitivity and specificity from a study. The solid black circle represents the pooled sensitivity and specificity, which is surrounded by a 95% confidence region (dashed line)

Figure 7. Summary receiver operating characteristic (ROC) plot of volume of lateral ventricles measured by structural magnetic resonance imaging (MRI) for early diagnosis of dementia due to Alzheimer's disease in people with mild cognitive impairment. Each point represents the pair of sensitivity and specificity from a study. The solid black circle represents the pooled sensitivity and specificity, which is surrounded by a 95% confidence region (dashed line)

Test 1. Hippocampus total.

Test 2. Hippocampus left.

Test 3. Hippocampus right.

Test 4. Medial temporal lobe total.

Test 5. Medial temporal lobe left.

Test 6. Medial temporal lobe right.

Test 7. Lateral ventricles.

Test 8. Entorhinal cortex total.

Test 9. Entorhinal cortex left.

Test 10. Entorhinal cortex right.

Test 11. Whole brain.

Test 12. Medial temporal gyrus.

Test 13. Lateral temporal lobe total.

Test 14. Lateral temporal lobe right.

Test 15. Amygdala total.

Test 16. Amygdala left.

Test 17. Amygdala right.

Test 18. Cortical grey matter.

Summary of findings Whole brain volume or volume of specific brain regions for early Alzheimer's disease dementia diagnosis in people with mild cognitive impairment

Whole brain volume versus volume of specific brain regions for early Alzheimer's disease dementia diagnosis in people with mild cognitive impairment

Patient or population: people with mild cognitive impairment (MCI)

Setting: memory clinics or registry data (e.g. ADNI)

New test: volume of total hippocampus, medial temporal lobe, total entorhinal cortex, lateral ventricles, and whole brain. Volume was measured with either a quantitative manual or an automated MRI technique

Cut‐off value: not reported

Number of results per 1000 participants tested (95% CI)

Prevalence 30%. Typically seen in participants with MCI after 2 to 3 years of follow‐up

| Test | Number of participants (number of studies) | True positives | False negatives | True negatives | False positives | Pooled sensitivity (95% CI) | Pooled specificity (95% CI) | Certainty of the evidence (GRADE) |
|---|---|---|---|---|---|---|---|---|
| Total hippocampus | 2209 (22) | 219 (192 to 240) | 81 (60 to 108) | 497 (455 to 539) | 203 (161 to 245) | 0.73 (0.64 to 0.80) | 0.71 (0.65 to 0.77) | ⊕⊕⊝⊝ Lowᵃ,ᵇ |
| Medial temporal lobe | 1077 (7) | 192 (159 to 219) | 108 (81 to 141) | 455 (357 to 532) | 245 (168 to 343) | 0.64 (0.53 to 0.73) | 0.65 (0.51 to 0.76) | ⊕⊕⊕⊝ Moderateᵃ,ᶜ |
| Lateral ventricles | 1077 (5) | 171 (147 to 195) | 129 (105 to 153) | 448 (413 to 490) | 252 (210 to 287) | 0.57 (0.49 to 0.65) | 0.64 (0.59 to 0.70) | ⊕⊕⊕⊝ Moderateᵃ,ᶜ |
| Total entorhinal cortex | 529 (4) | Meta‐analysis not conducted due to sparse and heterogeneous data | | | | Range: 0.50 to 0.88 | Range: 0.60 to 1.00 | ⊕⊝⊝⊝ Very lowᵃ,ᵈ |
| Whole brain | 424 (4) | Meta‐analysis not conducted due to sparse and heterogeneous data | | | | Range: 0.33 to 0.92 | Range: 0.41 to 1.00 | ⊕⊝⊝⊝ Very lowᵃ,ᵈ |

The table displays the normalised number of participants within a hypothetical cohort of 1000 people with a prevalence of Alzheimer's disease dementia (pre‐test probability) of 30%. We selected this prevalence value based on the prevalence observed in people with MCI after 2 to 3 years of follow‐up. We estimated confidence intervals from those around the point estimates for pooled sensitivity and specificity.
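The per‐1000 figures follow directly from the pooled estimates. At 30% prevalence, 300 of 1000 people have the target condition; true positives are sensitivity × 300, true negatives are specificity × 700, and false negatives and false positives are the remainders. The following minimal sketch of this arithmetic reproduces the total hippocampus row. The function and type names are our own.

```python
from typing import NamedTuple

class Estimate(NamedTuple):
    point: float
    lower: float
    upper: float

def per_1000(sens, spec, prevalence=0.30, cohort=1000):
    """Natural frequencies (point, lower, upper) in a hypothetical cohort."""
    diseased = round(prevalence * cohort)  # 300 people who progress
    healthy = cohort - diseased            # 700 people who do not
    tp = tuple(round(x * diseased) for x in sens)  # ordered (point, lower, upper)
    tn = tuple(round(x * healthy) for x in spec)
    # The lower bound of FN pairs with the upper bound of TP, and vice versa.
    fn = (diseased - tp[0], diseased - tp[2], diseased - tp[1])
    fp = (healthy - tn[0], healthy - tn[2], healthy - tn[1])
    return {"TP": tp, "FN": fn, "TN": tn, "FP": fp}

hippocampus = per_1000(Estimate(0.73, 0.64, 0.80), Estimate(0.71, 0.65, 0.77))
for outcome, (point, lower, upper) in hippocampus.items():
    print(f"{outcome}: {point} ({lower} to {upper})")
```

Substituting the medial temporal lobe or lateral ventricle estimates reproduces those rows in the same way.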

ADNI: Alzheimer's Disease Neuroimaging Initiative; CI: confidence interval; MCI: mild cognitive impairment; MRI: magnetic resonance imaging

GRADE Working Group grades of evidence

High certainty: we are very confident that the true effect lies close to the estimate of the effect.

Moderate certainty: we are moderately confident in the effect estimate: the true effect is likely to be close to the estimate of the effect, but there is a possibility that it is substantially different.

Low certainty: our confidence in the effect estimate is limited: the true effect may be substantially different from the estimate of the effect.

Very low certainty: we have very little confidence in the effect estimate: the true effect is likely to be substantially different from the estimate of effect.

ᵃ Risk of bias: most studies were at high risk of bias for participant selection (registry data), the index test, or both. We downgraded the certainty of the evidence by one level.
ᵇ Imprecision: wide 95% confidence intervals. We downgraded the certainty of the evidence by one level.
ᶜ Imprecision: wide 95% confidence intervals; however, the upper limits for both sensitivity and specificity are below 0.75, indicating modest performance. We did not downgrade.
ᵈ Inconsistency and imprecision: sparse and inconsistent data. We downgraded the certainty of the evidence by one level each for inconsistency and imprecision.
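The certainty ratings in the table follow from a simple counting rule: the evidence starts at the top of a four‐level scale and loses one level for each serious concern. The following minimal sketch encodes the per‐footnote deductions stated above (footnote c deducts nothing; footnote d deducts two levels). It assumes a 'High' starting level for these accuracy studies, and the names are our own.

```python
# Four GRADE certainty levels, from highest to lowest.
LEVELS = {4: "High", 3: "Moderate", 2: "Low", 1: "Very low"}

# Levels deducted per footnote, as stated in footnotes a to d.
DOWNGRADES = {"a": 1, "b": 1, "c": 0, "d": 2}

def grade_certainty(footnotes, start=4):
    """Return the certainty label after applying the listed downgrades."""
    level = start - sum(DOWNGRADES[note] for note in footnotes)
    return LEVELS[max(level, 1)]  # certainty cannot fall below 'Very low'

for outcome, notes in [("Total hippocampus", "ab"), ("Medial temporal lobe", "ac"),
                       ("Lateral ventricles", "ac"), ("Total entorhinal cortex", "ad"),
                       ("Whole brain", "ad")]:
    print(f"{outcome}: {grade_certainty(notes)}")
```

The labels this produces match the certainty column of the Summary of findings table.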

Table 1. Participants: sociodemographic and clinical characteristics

| Study | Country | Multicentricᵃ | Age (years), mean ± SD | Number of participantsᵇ (% female) | Education (years), mean ± SD | Baseline MMSE, mean ± SD | Mean follow‐up (years) | No. of MCI converters to AD dementia (%) | No. of stable MCI (%) | No. of MCI who converted to other dementia (%) |
|---|---|---|---|---|---|---|---|---|---|---|
| Carmichael 2007 | USA | No | 86.6 ± 5.9 | 29 (69%) | | 91.6 ± 5.5ᶜ | 3.2 | 12 (45%) | 12 (41%) | 4 (14%) |
| Caroli 2007 | Italy | No | 70.2 ± 6.7 | 23 (43%) | 9.7 ± 4.75 | 26.9 ± 2.0 | 1.6 | 9 (39%) | 14 (61%) | 1 (4%)ᵈ |
| Clerx 2013a | Europe | Yes | 70.6 ± 7.6 | 328 (52%) | 10.0 ± 3.8 | 27.0 ± 2.5 | 2.0 | 91 (28%) | 225 (69%) | 12 (3%)ᵉ |
| deToledo‐Morell 2004 | USA | No | 81.7 ± 6.9 | 27 (56%) | 16.4 ± 3.1 | 27.3 ± 1.8 | 3.0 | 10 (37%) | 17 (63%) | |
| Devanand 2007 | USA | No | 66.8 ± 9.7 | 139 (56%) | 15.2 ± 4.2 | 27.5 ± 2.2 | 3.0 | 35 (25%) | 104 (75%) | 2 (1%)ᵈ |
| Eckerstrom 2008 | Sweden | No | 67.9 ± 6.7 | 42 (57%) | 11.4 ± 3.6 | | 2.0 | 13 (31%) | 21 (50%) | 8 (19%) |
| Eckerstrom 2013 | Sweden | No | 69.6 ± 6.9 | 42 (57%) (34 included in analysis) | 10.2 ± 3.2 | 27.7 ± 2.6 | 2.0 | 13 (31%) | 21 (50%) | 8 (19%) |
| Erten‐Lyons 2006 | USA | No | 86.9 ± 6.6 | 37 (70%) | 13.7 ± 3.7 | 27.3 ± 1.5 | 7.6 | 22 (59%) | 14 (38%) | 1 (3%) |
| Frolich 2017 | Germany | Yes | 65.7 ± 9 | 115 (42%) | 9.5 ± 1.9 | 27.0 ± 2.1 | 2.2 | 28 (24%) | 87 (76%) | |
| Galton 2005 | UK | No | 63.7 ± 9.9 | 29 (48%) | | 26.9 ± 2.4 | 1.6 | 11 (38%) | 18 (62%) | 2 (6%)ᵈ |
| Gaser 2013 | USA, Canada | Yes | 75.2 ± 6.9 | 195 (33%) | 16.0 ± 2.7 | 27.0 ± 1.8 | 3.0 | 133 (68%) | 62 (32%) | |
| Herukka 2008 | Finland | No | 71.2 ± 4.5 | 21 (67%) | | | 4.2 | 8 (38%) | 13 (62%) | |
| Jack 2000 | USA | No | 77.6 ± 8.2 | 43 (53%) | 13.6 ± 3.2 | 25.7 ± 3.3 | 3.0 | 18 (42%) | 25 (58%) | |
| Jang 2018 | USA, Canada | Yes | 71.3 ± 7.4 | 340 (47%) | 16 (14‐18)ᶠ | 29 (27‐29)ᶠ | 3.0ᶠ | 69 (20%) | 271 (80%) | |
| Khan 2015 | USA, Canada, Europe | Yes | 74.9 ± 6.9 | 447 (40%) | 14.2 ± 4.5 | 27.0 ± 1.4 | 1.0 | 90 (20%) | 357 (80%) | |
| Ledig 2018 | USA, Canada | Yes | 74.3 in MCI c, 74.4 in MCI ncᶠ | 343 (41%) | | 26 in MCI c, 28 in MCI ncᶠ | 2.0 | 177 (52%) | 166 (48%) | |
| Liu 2010 | Europe | Yes | 73.6 ± 5.8 | 100 (53%) | 9.0 ± 4.0 | 27.0 ± 2.0 | 1.0 | 21 (21%) | 79 (79%) | |
| Monge Argilés 2014 | Spain | No | 72.9 ± 6.9 | 30 (60%) | | 23.5 ± 2.0 | 2.0 | 15 (50%) | 15 (50%) | |
| Nesteruk 2016 | Poland | No | 63.2 ± 9.6 | 40 (55%) | 13.9 ± 2.9 | 27.5 ± 1.7 | 2.0 | 9 (22%) | 31 (78%) | |
| Ong 2015 | Australia | Yes | 72.7 ± 6.6 | 45 (‐) | 13.6 ± 3.7 | 27.3 ± 1.9 | 2.0 | 20 (44%) | 21 (47%) | 4 (9%) |
| Pereira 2014 | USA, Canada, Europe | Yes | 74.9 ± 7.3 | 480 (40%) | 13.9 ± 4.6 | 27.0 ± 1.4 | 1.0 | 95 (20%) | 385 (80%) | |
| Platero 2019 | Spain | Yes | 74.1 ± 5.2 | 97 (63%) | 8.5 ± 4.3 | 26.5 ± 2.7 | 3.0 | 36 (37%) | 61 (63%) | |
| Prestia 2013 | Italy, Netherlands, Sweden | Yes | 66.2 ± 9.4 | 73 (56%) | | 27.2 ± 1.5 | 2.4 | 29 (40%) | 44 (60%) | Not reportedᵉ |
| Prestia 2013 (ADNI) | USA, Canada, Italy (only data from Italy were used, see Table 2) | Yes | 73.6 ± 8.6 | 93 (47%) (36 included in analysis) | | 26.9 ± 1.7 | 2.7 | 18 (50%) | 18 (50%) | Not reportedᵉ |
| Prieto del Val 2016 | Spain | Yes | 69.0 ± 7.0 | 34 (65%) | 7.5 ± 5.7 | 26.6 ± 2.4 | 2.0 | 16 (47%) | 18 (53%) | |
| Rhodius‐Meester 2016 | Netherlands | No | 70.6 ± 7.3 | 171 (46%) | 5.0 ± 1.0 | 26.7 ± 1.9 | 3.0ᶠ | 104 (61%) | 67 (39%) | 23ᵉ |
| VanderFlier 2005 | Netherlands | No | 75.0 ± 7.0 | 15 (71%) | 10.0 ± 3.0 | 26.0 ± 2.0 | 1.8 | 9 (60%) | 6 (40%) | |
| Visser 1999 | Netherlands | No | 78.8 ± 4.5 | 13 (54%) | 7.4 ± 2.3 | 22.4 ± 2.3 | 3.0 | 9 (69%) | 4 (31%) | |
| Visser 2002 | Netherlands | No | 64.9 ± 9.5 | 29 (42%) | 10.7 ± 3.2 | 27.7 ± 1.8 | 1.9 | 7 (23%) | 20 (67%) | 3 (1%) |
| Wang 2006 | Taiwan | No | 76.3 ± 4.0 | 58 (26%) | 11.8 ± 4.3 | 25.9 ± 2.9 | 1.8 | 19 (33%) | 39 (67%) | |
| Westman 2011 | Europe (6 countries) | Yes | 74.0 ± 5.8 | 101 (52%) | 8.7 ± 4.3 | 27.2 ± 1.6 | 1.0 | 19 (19%) | 82 (81%) | |
| Wolz 2011 | USA, Canada | Yes | 74.7 ± 7.9 | 405 (35%) | 15.6 ± 3.2 | 27.0 ± 1.9 | 1.5 | 167 (41%) | 238 (59%) | |
| Wood 2016 | UK | Yes | 69.1 ± 4.5 | 15 (27%) | 11.7 ± 1.0 | 27.7 ± 1.3 | 2.0 | 9 (60%) | 6 (40%) | |

AD: Alzheimer's disease; MCI: mild cognitive impairment; MCI c: MCI converted to AD; MCI nc: MCI not converted to AD; MMSE: Mini Mental State Examination; SD: standard deviation

ᵃ All studies were conducted at memory clinics or tertiary centres.
ᵇ Numbers of participants reported in this table are those used in the meta‐analysis.
ᶜ Modified Mini Mental State Examination.
ᵈ Cases excluded from the analysis.
ᵉ Cases excluded a priori from the study.
ᶠ Median value was available instead of mean.

Table 2. Included and excluded studies assessed for overlapping risk

Study

Data sets

Study period

MRI (Tesla)

MRI technique
(V, M, Aa)

MRI scale or software

MRI regionb

No. of participants

Follow‐up mean years

Participants' overlapping risk with other included studies

Decision on inclusion or exclusion of the MRI region

Bouwmann 2007

VUMC

2001‐2004

1

V‐Scheltens

MTL

59

1.6

Rhodius‐Meester 2016

Excluded

Caroli 2007

Brescia (Italy)

2002‐2005

1

1

V‐Scheltens

M‐DISPLAY

MTL

H total, H left, H right

23

23

1.6

1.6

No

No

Included

Included

Chupin 2009

ADNI

NR

1.5

1.5

sA‐SNT

A‐SACHA

H total

H total

210

210

1.5

1.5

Wolz 2011

Wolz 2011

Excluded

Excluded

Clerx 2013a

DESCRIPA +

VUMC

NR

1‐1.5

V‐Scheltens

M‐Show_Images 3.7.0

A‐LEAP

SIENAX

MTL

H total

H total

LV

328

328

328

328

2

2

2

2

No

No

No

No

Included

Included

Included

Included

Cuignet 2011

ADNI

NR

1.5

A‐SACHA

A‐Freesurfer

H total

H total

104

104

1.5

1.5

Wolz 2011

Wolz 2011

Excluded

Excluded

Dickerson 2013

ADNI

NR

1.5

A‐Freesurfer

H total

111

3

Wolz 2011

Excluded

Eckerstrom 2008

Gothenburg (Sweden)

NR

0.5

M‐Hipposegm

H total

42

2

No

Included

Eckerstrom 2013

Gothenburg (Sweden)

NR

0.5

M‐Hipposegm

H left and right

42

2

No

Included

Ewers 2012  

ADNI

NR

1.5

A‐Freesurfer

H left and right

130 (45)

2.3

Gaser 2013

Excluded

Gaser 2013

ADNI

NR

1.5

A‐Freesurfer

H left and right

H left

H right

195

195

195

1

3

3

No

No

Ledig 2018

Included

Included

Excluded

Gomar 2011

ADNI

Downloaded from ADNI on August 3, 2009

1.5

A‐Freesurfer

H left

H right

LV

WB

320

2

Gaser 2013

Ledig 2018

Ledig 2018

Ledig 2018

Excluded

Excluded

Excluded

Excluded

Gómez‐Sancho 2018

ADNI

NR

1.5

A‐Freesurfer

H total

183

3

Ledig 2018

Excluded

Heister 2011

ADNI

October 14, 2010

1.5

A‐NeuroQuant

H total

192

3

Wolz 2011

Excluded

Jang 2018

ADNI

Data downloaded in December 2017

3

V‐CVRS scale

(Scheltens for MTL)

MTL

GCA (more than one region)

LV

340

340

340

3

Pereira 2014

No

No

Excluded

Excluded

Included

Khan 2015c

ADNI +

AddNeuroMed

NR

1.5

A‐Freesurfer

H total

447

1

Wolz 2011 and Liu 2010

Included

Landau 2010

ADNI

NR

1.5

A‐Freesurfer

H total

85

2

Wolz 2011

Excluded

Ledig 2018

ADNI

NR

1.5‐3

A‐MALPEM

H total

H right

EC total

A left

A right

A total

MTG

WB

LV

cGM

343

343

343

343

343

343

343

343

343

343

2

2

2

2

2

2

2

2

2

2

Wolz 2011

Gaser 2013

No

No

No

No

No

No

No

No

Excluded

Included

Included

Included

Included

Included

Included

Included

Included

Included

Lehman 2013

ADNI

Downloaded from ADNI in June 2011

1.5

V‐Scheltens

MTL

394

3

Pereira 2014

Excluded

Lillemark 2014

ADNI

NR

1.5

A‐Freesurfer

WB

H total

240

1

Ledig 2018

Wolz 2011

Excluded

Excluded

Liu 2010c

AddNeuroMed

NR

1.5

A‐Fischl

H total

100

1

Khan 2015

Included

Liu 2013  

ADNI

NR

NR

V‐Scheltens

MTL

387

3

Pereira 2014

Excluded

Minhas 2017

ADNI

NR

1.5

A‐Freesurfer

H total

EC total

LV

52

52

52

3

3

3

Wolz 2011

Ledig 2018

Ledig 2018

Excluded

Excluded

Excluded

Pereira 2014

ADNI

AddNeuroMed

NR

1.5‐3

V‐Scheltens

MTL

480

1

No

Included

Prestia 2013

Brescia (Italy) +

VUMC +

Stockholm

NR

1

1.5

3

A‐Freesurfer

H total (the smallest between left and right H)

73

2.4

No

Included

Prestia 2013 (ADNI)

ADNI

ADNI

Brescia

Brescia

NR

NR

From 2006

From 2006

1.5‐3

1.5‐3

1

1

A‐Freesurfer

sA‐SNT

A‐Freesurfer

M‐DISPLAY

H total

H total

H total

H total

57

57

36

36

3

3

2.2

2.2

Wolz 2011

Wolz 2011

Prestia 2013

No

Excluded

Excluded

Included

Included

Prestia 2015

Brescia +

VUMC +

Stockholm

NR

1

1.5

3

A‐Freesurfer

H total

73

2.4

Prestia 2013

Excluded

Rhodius‐Meester 2016

VUMC

2000‐2012

1‐1.5

V‐Scheltens

MTL

171

3

No

Included

Sørensen 2016

ADNI

28 September 2012

1.5

A‐Freesurfer

H total

233

2

Wolz 2011

Excluded

Suppa 2015a

ADNI

NR

1.5

A‐VBM+mask

H total

198

1

2

3

Wolz 2011

Excluded

Tang 2015

ADNI

NR

1.5

A‐Freesurfer

H total

222

3

Wolz 2011

Excluded

VanderFlier 2005

VUMC

NR

1.5

M‐DISPLAY

H total, H left, H right, MTL total, MTL left, MTL right

15

1.8

No

Included

Varon 2015

ADNI

27 June 2013

1.5

A‐FreeSurfer

V‐Scheltens

A‐FreeSurfer

H total

MTA

EC total

89

3.2

Wolz 2011

Pereira 2014

Ledig 2018

Excluded

Excluded

Excluded

Vasta 2016

ADNI

NR

1.5

A‐Freesurfer

H total

121

1.5

Wolz 2011

Excluded

Visser 1999

AMSTEL study

NR

0.6

M‐developed in house software

H total

LTL

13

3

No

No

Included

Visser 2002

Maastricht Memory Clinic

NR

1.5

V‐Scheltens

M‐ShowImage

MTL

H total

30

30

1.9

1.9

No

No

Included

Included

Vos 2012

DESCRIPA+

VUMC

2003‐2005

1‐1.5

A‐LEAP

H total

153

2

Clerx 2013a

Excluded

Westman 2011

AddNeuroMed

NR

1.5

V‐Scheltens

M‐HERMES software

MTL

H total

101

101

1

1

Pereira 2014

No

Excluded

Included

Wolz 2011c

ADNI

Follow‐up stopped in 2011

1.5

A‐Lotjonen (fast and robust multi‐atlas segmentation)

H total

405

1.5

Khan 2015

Included

Yang 2012

ADNI

NR

1.5

1.5

A‐Freesurfer

A‐Freesurfer

H total

LV

111

111

2

2

Wolz 2011

Ledig 2018

Excluded

Excluded

Yu 2012

ADNI

June 2010

1.5

NR

EC

LV

H left

H right

63

63

63

63

2

Ledig 2018

Zhang 2012b

ADNI

NR

3

V‐Scheltens

MTL

53

2

Pereira 2014

Excluded

ADNI: Alzheimer's Disease Neuroimaging Initiative; MMSE: Mini Mental State Examination; NR: not reported; SD: standard deviation; VUMC: University Medical Centre, Amsterdam

aMRI technique: V: visual; M: manual; A: automated
bMRI region: A: amygdala; cGM: cortical grey matter; EC: entorhinal cortex; GCA: global cortical atrophy; H: hippocampus; MTL: medial temporal lobe; LV: lateral ventricles; MTG: medial temporal gyrus; WB: whole brain.
cUncertain risk of overlap between these studies (Khan 2015 did not specify the number of participants in both ADNI and AddNeuroMed studies).

Table 3. Index test: description and common abbreviations

Study

Manufacturer of MRI scanners

Field strength (Tesla)

Brain regionsa

MRI‐B or MRI‐Lb

Technique: visual; quantitative manual; quantitative semi‐automated or automated

Carmichael 2007c

General Electric

1.5

LV, WB

MRI‐B + MRI‐L

Quantitative automated

Caroli 2007c

General Electric

1.0

H left, H right, H total

MRI‐B

Quantitative manual

MTL

MRI‐B

Visual

Clerx 2013a

Siemens, Philips

1 or 1.5

H total

MRI‐B

Quantitative manual

H total

MRI‐B

Quantitative automated

MTL

MRI‐B

Visual

LV

MRI‐B

Quantitative automated

deToledo‐Morell 2004c

General Electric

1.5

H total, EC total

MRI‐B

Quantitative manual

Devanand 2007c

General Electric

1.5

H left, H right, H total, EC left,

EC right, EC total

MRI‐B

Quantitative manual

Eckerstrom 2008c

Philips

0.5

H total

MRI‐B + MRI‐L

Quantitative manual

Eckerstrom 2013

Philips

0.5

H left, H right

MRI‐B

Quantitative manual

Erten‐Lyons 2006c

Not reported

1.5

H total, LV, WB

MRI‐B + MRI‐L

Quantitative semiautomated

Frolich 2017

Siemens, Philips

1.5

H total

MRI‐B

Quantitative automated

Galton 2005

General Electric

1.5

H left, H right, LTL right

MRI‐B

Visual

Gaser 2013

Several (ADNI scanners)

1.5

H left

H right

MRI‐B

Quantitative automated

Quantitative automatedd

Herukka 2008c

Siemens

1.5

H left, H right, H total, EC left,

EC right, EC total

MRI‐B

Quantitative manual

Jack 2000c

General Electric

1.5

H total

MRI‐B + MRI‐L

Quantitative manual

Jang 2018

Several (ADNI scanners)

3

MTL

MRI‐B

Visuald

GCA

MRI‐B

Visuald

LV

MRI‐B

Visual

Khan 2015

Several (ADNI and AddNeuroMed scanners)

1.5

H total

MRI‐B

Quantitative automated

Ledig 2018

Several (ADNI scanners)

1.5‐3

H total

MRI‐B

Quantitative automatedd

H right, A total, A left, A right, MTG, EC total, WB, LV, cGM

MRI‐B + MRI‐L

Quantitative automated

Liu 2010

Several (ADNI and AddNeuroMed scanners)

1.5

H total

MRI‐B

Quantitative automated

Monge Argilés 2014

General Electric

1.5

MTL

MRI‐B

Visual

Nesteruk 2016

Toshiba

1.5

H left, H right, EC left

MRI‐B

Quantitative automated

Ong 2015

Not specified

Not reported

H total

MRI‐B + MRI‐L

Quantitative automated

Pereira 2014

Several (ADNI and AddNeuroMed scanners)

1.5 or 3

MTL

MRI‐B

Visual

Platero 2019

General Electric

1.5

H total

MRI‐B

Quantitative automated

Prestia 2013 (ADNI)

Several (ADNI scanners)

1.5 or 3

H total

MRI‐B

Quantitative automatedd and semiautomatedd

Philips (TOMC)

1.0

H total

MRI‐B

Quantitative manual and automated

Prestia 2013c

Philips, Siemens (TOMC, VUmc, KUHH)

1.0 or 1.5 or 3.0

H total (the smallest between left and right volumes)

MRI‐B

Quantitative automated

Prieto del Val 2016

Philips

1.5

A right

MRI‐B

Quantitative automated

Rhodius‐Meester 2016

Siemens, General Electric

1.0 or 1.5

MTL

MRI‐B

Visual

VanderFlier 2005c

Philips

1.5

H left, H right, H total, MTL left, MTL right, MTL total

MRI‐B

Quantitative manual

WB

MRI‐B

Quantitative semiautomated

Visser 1999c

Teslacon II (Technicare)

0.6

H total, LTL

MRI‐B

Quantitative manual

Visser 2002c

Philips

1.5

H total

MRI‐B

Quantitative manual

MTL

MRI‐B

Visual

Wang 2006c

Siemens

1.5

H left, H right, H total, A left,

A right, A total

MRI‐B

Quantitative manual

Westman 2011

Several (AddNeuroMed scanners)

1.5

H total

MRI‐B

Quantitative manual

> 1 region

MRI‐B

Quantitative automatedd

MTL

MRI‐B

Visuald

Wolz 2011

Several (ADNI scanners)

1.5

H total

MRI‐B

Quantitative automated

Wood 2016

Siemens

1.5

H total

MRI‐B

Quantitative automated

aA: amygdala; cGM: cortical grey matter; EC: entorhinal cortex; GCA: global cortical atrophy; H: hippocampus; MTL: medial temporal lobe; LTL: lateral temporal lobe; LV: lateral ventricles; WB: whole brain.

bMRI‐B: MRI‐baseline; MRI‐L: MRI‐longitudinal.
cData received from the study authors.
dData not used for the analysis (see Table 2).

Table 4. Numbers of participants in subgroup analysis

| Brain region | Number of studies | Sample size | Converted to AD dementia (%) | Sensitivity (95% CI) | Specificity (95% CI) | LR+ (95% CI) | LR‐ (95% CI) |
|---|---|---|---|---|---|---|---|
| Hippocampus totalᵃ | 22 | 2209 | 687 (31%) | 0.73 (0.64 to 0.80) | 0.71 (0.65 to 0.77) | 2.53 (2.09 to 3.06) | 0.38 (0.29 to 0.50) |
| Hippocampus total: automatic or semiautomatic technique | 11 | 1698 | 531 (31%) | 0.59 (0.48 to 0.70) | 0.66 (0.56 to 0.74) | 1.72 (1.25 to 2.36) | 0.62 (0.46 to 0.85) |
| Hippocampus total: manual technique | 13 | 551 | 156 (31%) | 0.82 (0.69 to 0.90) | 0.74 (0.67 to 0.81) | 3.21 (2.42 to 4.27) | 0.31 (0.14 to 0.70) |
| Hippocampus total: ≥ 3 years' follow‐up | 8 | 413 | 156 (38%) | 0.71 (0.54 to 0.84) | 0.76 (0.67 to 0.82) | 2.94 (2.11 to 4.11) | 0.38 (0.22 to 0.64) |
| Hippocampus total: < 3 years' follow‐up | 14 | 1796 | 513 (29%) | 0.74 (0.65 to 0.81) | 0.69 (0.61 to 0.76) | 2.39 (1.90 to 3.00) | 0.31 (0.21 to 0.47) |
| Hippocampus total: ≥ 70 years old | 16 | 1796 | 566 (32%) | 0.73 (0.64 to 0.81) | 0.69 (0.62 to 0.75) | 3.10 (2.15 to 4.48) | 0.41 (0.16 to 1.03) |
| Hippocampus total: < 70 years old | 6 | 413 | 121 (29%) | 0.72 (0.54 to 0.84) | 0.77 (0.67 to 0.84) | 3.10 (2.15 to 4.48) | 0.41 (0.16 to 1.03) |
| Hippocampus left | 8 | 359 | 113 (31%) | 0.71 (0.62 to 0.79) | 0.76 (0.67 to 0.83) | 2.95 (2.14 to 4.06) | 0.38 (0.28 to 0.51) |
| Hippocampus right | 8 | 359 | 113 (31%) | 0.81 (0.73 to 0.88) | 0.71 (0.61 to 0.80) | 2.82 (2.01 to 3.96) | 0.23 (0.11 to 0.46) |
| Medial temporal lobe total | 7 | 1077 | 330 | 0.64 (0.53 to 0.73) | 0.65 (0.51 to 0.76) | 1.81 (1.41 to 2.32) | 0.56 (0.46 to 0.67) |
| Entorhinal cortex total | 4 | 529 | 229 | Range: 0.50 to 0.88 | Range: 0.60 to 1.00 | Not computed since no meta‐analysis was conducted | |
| Lateral ventricles | 5 | 1077 | 371 | 0.57 (0.49 to 0.65) | 0.64 (0.59 to 0.70) | 1.61 (1.39 to 1.87) | 0.66 (0.57 to 0.78) |
| Whole brain | 4 | 424 | 220 | Range: 0.33 to 0.92 | Range: 0.41 to 1.00 | Not computed since no meta‐analysis was conducted | |

AD: Alzheimer's disease; CI: confidence interval; LR+: positive likelihood ratio; LR‐: negative likelihood ratio.

a Two studies (Clerx 2013a; Prestia 2013 (ADNI)) used both manual and automatic techniques for total hippocampal volume.
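Likelihood ratios and the diagnostic odds ratio (DOR) are simple functions of sensitivity and specificity: LR+ = sensitivity / (1 − specificity), LR− = (1 − sensitivity) / specificity, and DOR = LR+ / LR−. The sketch below applies these textbook formulas to the rounded pooled point estimates in Table 4. Table 4's likelihood ratios were presumably derived within the meta‐analytic model rather than from the rounded estimates, so the results are close to, but not identical with, the tabulated values (for example, about 2.52 versus 2.53 for LR+ of total hippocampal volume).

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-, diagnostic odds ratio) from a sensitivity/specificity pair."""
    lr_pos = sensitivity / (1 - specificity)  # factor by which a positive result raises the odds
    lr_neg = (1 - sensitivity) / specificity  # factor by which a negative result lowers the odds
    return lr_pos, lr_neg, lr_pos / lr_neg

# Rounded pooled point estimates (sensitivity, specificity) copied from Table 4.
subgroups = {
    "Hippocampus total": (0.73, 0.71),
    "Automatic or semiautomatic technique": (0.59, 0.66),
    "Manual technique": (0.82, 0.74),
}
for name, (sensitivity, specificity) in subgroups.items():
    lr_pos, lr_neg, dor = likelihood_ratios(sensitivity, specificity)
    print(f"{name}: LR+ {lr_pos:.2f}, LR- {lr_neg:.2f}, DOR {dor:.1f}")
```

For the two technique subgroups this gives DORs of roughly 2.8 (automatic or semiautomatic) and 13 (manual), which is in line with the better overall accuracy of manual techniques found in this review.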

Table Tests. Data tables by test

| Test | No. of studies | No. of participants |
|---|---|---|
| 1 Hippocampus total | 22 | 2209 |
| 2 Hippocampus left | 8 | 525 |
| 3 Hippocampus right | 8 | 673 |
| 4 Medial temporal lobe total | 7 | 1077 |
| 5 Medial temporal lobe left | 1 | 15 |
| 6 Medial temporal lobe right | 1 | 15 |
| 7 Lateral ventricles | 5 | 1077 |
| 8 Entorhinal cortex total | 4 | 529 |
| 9 Entorhinal cortex left | 3 | 199 |
| 10 Entorhinal cortex right | 2 | 159 |
| 11 Whole brain | 4 | 424 |
| 12 Medial temporal gyrus | 1 | 343 |
| 13 Lateral temporal lobe total | 1 | 13 |
| 14 Lateral temporal lobe right | 1 | 29 |
| 15 Amygdala total | 2 | 401 |
| 16 Amygdala left | 2 | 401 |
| 17 Amygdala right | 3 | 435 |
| 18 Cortical grey matter | 1 | 343 |
