
Diagnostic accuracy of telehealth assessment for dementia and mild cognitive impairment


Background

Millions of people with dementia worldwide are undiagnosed, which has a negative impact on access to care and treatment and on rational service planning. Telehealth (the use of information and communication technologies (ICT) to provide health services at a distance) could be one way of increasing access to specialist assessment for people with suspected dementia, especially those living in remote or rural areas. It has also been used widely during the COVID-19 pandemic. It is important to know whether diagnoses made by telehealth assessment are as accurate as those made in conventional, face-to-face clinical settings.

Objectives

Primary objective: to assess the diagnostic accuracy of telehealth assessment for dementia and mild cognitive impairment.

Secondary objectives: to identify the quality and quantity of the relevant research evidence; to identify sources of heterogeneity in the test accuracy data; to identify and summarise any data on patient or clinician satisfaction, resource use, costs, or the feasibility of the telehealth assessment models in the included studies.

Search methods

On 4 November 2020, we searched multiple databases and clinical trial registries for published and 'grey' literature and registered trials. We applied no search filters or language restrictions. We screened the retrieved citations in duplicate and assessed in duplicate the full texts of papers considered potentially relevant.

Selection criteria

We included cross-sectional studies with 10 or more participants who had been referred to a specialist service for assessment of a suspected cognitive disorder. Within a period of one month or less, each participant had to undergo two clinical assessments designed to diagnose dementia or mild cognitive impairment (MCI): a telehealth assessment (the index test) and a conventional face-to-face assessment (the reference standard). The telehealth assessment could be informed by some data collected face-to-face, e.g. by nurses working in primary care, but all contact between the patient and the specialist clinician responsible for synthesising the information and making the diagnosis had to take place remotely using ICT.

Data collection and analysis

Two review authors independently extracted data from the included studies. Extracted data covered study design, setting, participants, details of the index test and the reference standard, and results in the form of numbers of participants given diagnoses of dementia or MCI. We also sought data on diagnoses of dementia subtypes and on quantitative measures of patient or clinician satisfaction, resource use, costs, and feasibility. We assessed the risk of bias and applicability of each included study using QUADAS-2. We entered results into 2x2 tables to calculate the sensitivity and specificity of telehealth assessment for the diagnosis of all-cause dementia, MCI, and any cognitive syndrome (combining dementia and MCI). We presented the results of the included studies narratively because there were too few studies to derive summary estimates of sensitivity and specificity.
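The 2x2 calculation described above can be sketched as follows. This is an illustrative example, not code from the review; the counts passed in at the end are hypothetical, chosen only to be roughly consistent with the MCI result reported later (sensitivity ~0.71, specificity ~0.73).

```python
def accuracy_from_2x2(tp: int, fp: int, fn: int, tn: int) -> tuple[float, float]:
    """Return (sensitivity, specificity) from a 2x2 table of index-test
    results cross-classified against the reference standard.

    tp: index test positive, reference standard positive
    fp: index test positive, reference standard negative
    fn: index test negative, reference standard positive
    tn: index test negative, reference standard negative
    """
    sensitivity = tp / (tp + fn)  # proportion of true cases detected
    specificity = tn / (tn + fp)  # proportion of non-cases correctly ruled out
    return sensitivity, specificity

# Hypothetical counts for illustration only (not reported in the review):
sens, spec = accuracy_from_2x2(tp=29, fp=16, fn=12, tn=43)
```

Sensitivity and specificity are conditional on the reference standard, which is why an imperfect face-to-face reference standard (discussed in the conclusions) limits how close to 1.00 agreement can be expected.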

Main results

Three studies with 136 participants were eligible for inclusion. Two studies (20 and 100 participants) took place in community settings in Australia, and one study (16 participants) was conducted in veterans' homes in the USA. Participants were referred from primary care with undiagnosed cognitive symptoms or were identified as being at high risk of dementia on a screening test in residential homes. Dementia and MCI were the target conditions in the largest study; the other studies targeted dementia diagnosis only. Only one small study used a 'pure' telehealth model, i.e. one including no elements of face-to-face assessment.

Overall, the studies were well conducted. We considered two studies to be at high risk of incorporation bias because a substantial amount of information collected face-to-face by nurses was used to inform both the index test and the reference standard assessments. One study was at unclear risk of selection bias.

For the diagnosis of all-cause dementia, the sensitivity of telehealth assessment ranged from 0.80 to 1.00 and the specificity from 0.80 to 1.00. We considered this to be very low-certainty evidence due to imprecision, inconsistency between studies, and risk of bias. For the diagnosis of MCI, data were available from only one study (100 participants), giving a sensitivity of 0.71 (95% CI 0.54 to 0.84) and a specificity of 0.73 (95% CI 0.60 to 0.84). We considered this to be low-certainty evidence due to imprecision and risk of bias. For the diagnosis of any cognitive syndrome (dementia or MCI), data from the same study gave a sensitivity of 0.97 (95% CI 0.91 to 0.99) and a specificity of 0.22 (95% CI 0.03 to 0.60). Most diagnostic disagreements concerned the distinction between MCI and dementia, and occurred approximately equally in both directions. There was also a tendency for patients identified as cognitively healthy at face-to-face assessment to be diagnosed with MCI at telehealth assessment (but the numbers were small).

There were insufficient data to assess the accuracy of diagnosis of dementia subtypes.

One study provided a small amount of data indicating a good level of satisfaction with the telehealth model among clinicians and, especially, patients. There were no data on resource use, costs, or feasibility.

Authors' conclusions

We found only a very small number of eligible studies with small numbers of participants. An important difference between the studies providing data for the analyses was whether the target condition was dementia only (two studies) or dementia and MCI (one study). The data suggest that telehealth assessment may be highly sensitive and specific for the diagnosis of all-cause dementia when assessed against a reference standard of conventional face-to-face assessment, but the estimates are imprecise due to small sample sizes and between-study heterogeneity, and may apply mainly to telehealth models which incorporate a considerable amount of face-to-face contact with healthcare professionals other than the clinician responsible for making the diagnosis. For the diagnosis of MCI by telehealth assessment, the best estimates of both sensitivity and specificity were somewhat lower, but were based on a single study. Errors occurred at the cognitively healthy/MCI and MCI/dementia boundaries. However, there is no evidence that diagnostic disagreements were more frequent than would be expected from the known variation between clinicians' opinions when assigning a diagnosis of dementia.

Accuracy of telehealth assessment for diagnosing dementia and mild cognitive impairment

Background

Dementia is a condition in which memory and other thinking skills decline to the point that a person can no longer manage their daily activities without help. If memory and thinking problems are milder, so that independence is not affected, the condition is described as mild cognitive impairment (MCI). Both conditions mainly affect older people. It is considered important that people with dementia or MCI can get an accurate diagnosis at a time and place that suits them, so that they and their families can understand the problem and access treatment and support. However, millions of people with dementia worldwide never receive a diagnosis. There are many reasons for this, but one may be a lack of accessible diagnostic services, particularly for people living in rural areas or those who have difficulty travelling. During the COVID-19 pandemic, many face-to-face services closed. Telehealth (the use of information and communication technologies (ICT) to provide health services at a distance) could be one way of increasing access to specialist assessment for people with suspected dementia who cannot easily attend clinics. However, it is important to be sure that increased accessibility does not come at the cost of diagnostic accuracy.

Review question

Our review question was how accurate diagnoses of dementia and MCI made by telehealth are, compared with diagnoses made when patients attend traditional clinics for a face-to-face assessment.

What we did

We searched medical research databases up to 4 November 2020 for studies in which people underwent two assessments for suspected dementia or MCI: a telehealth assessment and a conventional face-to-face assessment. Both assessments were performed by specialists and took place within a month of each other. For the telehealth assessment, all contact between the patient and the specialist making the diagnosis had to take place remotely, using ICT, but some of the information needed to make the diagnosis could be collected by other members of the healthcare team who saw the patient in person. We then assessed how well the results of the telehealth assessments agreed with those of the face-to-face assessments.

What we found

We included three studies (136 participants) of people with suspected dementia. One small study (16 participants) took place in veterans' homes in the USA; the other two studies were conducted in community services in Australia. All used videoconferencing systems for the telehealth assessments. All three studies aimed to make diagnoses of dementia, but only one also aimed to diagnose MCI. The quality of the studies was generally good. In two studies, nurses who saw the patients in person played an important role in collecting the information used in both assessments, which could bias those studies towards close agreement between the assessments.

The studies found that telehealth assessment correctly identified between 80% and 100% of the people who had been diagnosed with dementia at the face-to-face assessment, and also correctly identified between 80% and 100% of the people who did not have dementia.

Only one study (100 participants) attempted to diagnose MCI. In this study, 71% of the participants who had MCI and 73% of the participants who did not have MCI were correctly identified using telehealth assessment.

Telehealth assessment in this study correctly identified 97% of the participants who had either MCI or dementia, but correctly identified only 22% of those who had neither, although again this result was very uncertain because of the very small number of people in this category.

It is important to bear in mind that diagnoses of dementia and MCI made by two specialists who both see patients face-to-face will not agree 100% of the time. Perfect agreement between telehealth and face-to-face assessments cannot, therefore, be expected.

What we concluded

Based on the evidence we found, telehealth assessment for the diagnosis of dementia appears to have a good level of accuracy compared with face-to-face assessment, although the small number of studies and participants, and differences between the included studies, mean there is considerable uncertainty about this result. Telehealth appeared to be slightly less accurate for diagnosing MCI than for diagnosing dementia. Agreement between two face-to-face assessments is also imperfect, and we cannot say that disagreements between telehealth and face-to-face diagnoses were any more frequent.

Authors' conclusions

Implications for practice

Although a number of services using telehealth for dementia diagnosis have published evaluations showing that their telehealth model is feasible and acceptable to patients and clinicians, we found only three studies with 136 participants which were designed rigorously to assess the accuracy of this method of diagnosis. Only one study (100 participants) replicated clinical practice in seeking to diagnose both dementia and mild cognitive impairment (MCI). Most of the data came from studies using a model in which both telehealth and conventional face‐to‐face assessments were informed by a substantial amount of clinical information collected face‐to‐face by nursing staff.

The available data suggest that telehealth assessment may be highly sensitive and specific for the diagnosis of all‐cause dementia when assessed against a reference standard of conventional face‐to‐face assessment, but the estimates are very imprecise due to small sample sizes and between‐study heterogeneity, and may apply mainly to telehealth models which incorporate a considerable amount of face‐to‐face contact with healthcare professionals. It is not possible to draw conclusions about the accuracy of diagnosis of dementia subtypes. For the diagnosis of MCI by telehealth assessment, best estimates of both sensitivity and specificity were somewhat lower than for all‐cause dementia, but were based on a single study. Errors occurred at the healthy / MCI and MCI/dementia boundaries. However, there is no evidence that diagnostic disagreements were more frequent than would be expected due to the known imperfect inter‐rater reliability of the reference standard.

Implications for research

Further research into the accuracy of a range of telehealth models (including a 'pure' telehealth model) would be valuable to reduce uncertainty about the diagnostic accuracy. Studies should attempt to replicate the range of diagnoses made in standard services, including both MCI and dementia subtypes. We recommend the study design used by Martin‐Khan 2012, which compared agreement between telehealth and face‐to‐face assessments with agreement between paired face‐to‐face assessments; this aids interpretation by quantifying the impact of imperfect inter‐rater reliability of the reference standard.

In some remote and rural settings, telehealth may be the only option for service provision. In other areas, services developed during the pandemic may be retained. Hence, research into how to improve telehealth assessment, comparing different models, is also needed.

In all studies, data on resource use, costs, feasibility and user experience are important to inform service developments.

Summary of findings

Summary of findings 1. Summary of findings table: Telehealth assessment for diagnosis of dementia

Setting: secondary care

Reference test: criterion‐based diagnosis of dementia at conventional face‐to‐face diagnostic assessment

Outcome | No. of studies (patients) | Study design | Effect per 100 patients tested (pre-test probability of 50% a) | Certainty of evidence | Comment
True positives (patients correctly classified as having dementia) | 3 studies (136 patients) | Cross-sectional (cohort-type accuracy study) | 40 to 50 | Very low b,c,d | Sensitivity 0.80 to 1.00
False negatives (patients incorrectly classified as not having dementia) | 3 studies (136 patients) | Cross-sectional (cohort-type accuracy study) | 0 to 10 | Very low b,c,d |
True negatives (patients correctly classified as not having dementia) | 3 studies (136 patients) | Cross-sectional (cohort-type accuracy study) | 40 to 50 | Very low b,c,d | Specificity 0.80 to 1.00
False positives (patients incorrectly classified as having dementia) | 3 studies (136 patients) | Cross-sectional (cohort-type accuracy study) | 0 to 10 | Very low b,c,d |

GRADE Working Group grades of evidence
High certainty: we are very confident that the true effect lies close to that of the estimate of the effect.
Moderate certainty: we are moderately confident in the effect estimate; the true effect is likely to be close to the estimate of the effect, but there is a possibility that it is substantially different.
Low certainty: our confidence in the effect estimate is limited; the true effect may be substantially different from the estimate of the effect.
Very low certainty: we have very little confidence in the effect estimate; the true effect is likely to be substantially different from the estimate of effect.

a Prevalence of dementia taken from largest included study (Martin‐Khan 2012)

b Downgraded due to risk of bias: two studies (116/136 participants) were at high risk of bias in the index test domain (incorporation bias)

c Downgraded due to inconsistency between studies

d Downgraded due to imprecision: small sample size

Summary of findings 2. Summary of findings table: Telehealth assessment for diagnosis of mild cognitive impairment

Setting: secondary care

Reference test: criterion‐based diagnosis of mild cognitive impairment (MCI) at conventional face‐to‐face diagnostic assessment

Outcome | No. of studies (patients) | Study design | Effect per 100 patients tested (pre-test probability of 40% a) | Certainty of evidence | Comment
True positives (patients correctly classified as having MCI) | 1 study (100 patients) | Cross-sectional (cohort-type accuracy study) | 28 (95% CI 22 to 34) | Low b,c | Sensitivity 0.71 (95% CI 0.54 to 0.84)
False negatives (patients incorrectly classified as not having MCI) | 1 study (100 patients) | Cross-sectional (cohort-type accuracy study) | 12 (95% CI 6 to 18) | Low b,c |
True negatives (patients correctly classified as not having MCI) | 1 study (100 patients) | Cross-sectional (cohort-type accuracy study) | 44 (95% CI 36 to 50) | Low b,c | Specificity 0.73 (95% CI 0.60 to 0.84)
False positives (patients incorrectly classified as having MCI) | 1 study (100 patients) | Cross-sectional (cohort-type accuracy study) | 16 (95% CI 10 to 24) | Low b,c |


a Prevalence of MCI taken from largest included study (Martin‐Khan 2012)

b Downgraded due to risk of bias: high risk of bias in index test domain (incorporation bias)

c Downgraded due to imprecision: single study, small sample size
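The 'effect per 100 patients' rows in the summary of findings tables follow directly from sensitivity, specificity, and the pre-test probability. A minimal sketch of the arithmetic (illustrative only; the function name and rounding to whole patients are assumptions, not part of the review):

```python
def per_100_patients(sensitivity: float, specificity: float,
                     pretest_prob: float, n: int = 100) -> tuple[int, int, int, int]:
    """Return (TP, FN, TN, FP) counts per n patients tested, given a
    pre-test probability of the target condition."""
    cases = n * pretest_prob              # patients with the condition per n tested
    non_cases = n - cases                 # patients without the condition
    tp = round(cases * sensitivity)       # true positives
    fn = round(cases - cases * sensitivity)           # false negatives
    tn = round(non_cases * specificity)   # true negatives
    fp = round(non_cases - non_cases * specificity)   # false positives
    return tp, fn, tn, fp

# MCI table: sensitivity 0.71, specificity 0.73, pre-test probability 40%
tp, fn, tn, fp = per_100_patients(0.71, 0.73, 0.40)
# tp=28, fn=12, tn=44, fp=16 (the point estimates in the table above)
```

The same arithmetic applied to the dementia table's lower bounds (sensitivity 0.80, specificity 0.80, pre-test probability 50%) gives 40 true positives, 10 false negatives, 40 true negatives, and 10 false positives, matching the ends of the reported ranges.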

Background

Dementia is a clinical syndrome which can be the end result of a large number of neurodegenerative diseases and other causes of brain damage. Its defining features are a multidomain decline in cognitive function and the loss of the ability to manage daily activities independently. Its prevalence increases sharply with age from about 65 years onwards (van der Flier 2005).

More common than dementia in late life are milder cognitive impairments, which exceed the normal cognitive changes of ageing but do not affect functional independence. These conditions are grouped together under the label of mild cognitive impairment (MCI). Some cases of MCI represent the earliest clinically detectable stages of progressive neurodegenerative diseases and will develop into dementia within a few years, but this is not inevitable (Tampi 2015).

An estimated 50 million people were living with dementia worldwide in 2017. Based on demographic forecasts, a doubling of prevalence every 20 years has been predicted, which would translate to more than 130 million people by 2050. The majority of people with dementia already live in low‐ and middle‐income countries, and this proportion is also increasing. As the numbers have grown, so has the political profile of dementia, partly due to pressure from older people and their families, and partly to the immense economic impact of dementia, which in 2015 was around 1% of global GDP, mainly in the costs of informal and formal social care (Alzheimer's Disease International 2000).

Most people living with dementia have not had a formal diagnosis. This was the focus of the World Alzheimer Report in 2011 (Prince 2011). At that time, even in high‐income countries, 50% to 80% of cases were thought to remain undiagnosed and undocumented in primary care, with an even larger ‘treatment gap’ (perhaps more accurately, a recognition gap) in low‐ and middle‐income countries. Diagnosis is considered to be vitally important both to inform service planning and, at an individual level, to allow access to information, services, and treatment for patients and their caregivers. Guidelines produced around the world emphasise the importance of access to high‐quality, timely diagnosis made by doctors or other healthcare professionals with specialist training. The World Health Organization (WHO) has set a target that by 2025 at least 50% of the estimated number of people with dementia in 50% of countries should have had a diagnosis (WHO 2017). This target will be challenging in many countries, and innovative ways to increase access to assessment will be needed, particularly for older people living outside urban centres.

At the time of writing, the COVID‐19 pandemic has greatly disrupted access to health services, especially for older people, who are amongst the most clinically vulnerable to COVID‐19. Adjustments made by memory assessment services to increase safety are likely to lead to reductions in service capacity. It is not at all clear when such disruption will end; even in countries which were initially relatively successful in lowering transmission rates, the risk of intermittent re‐imposition of local restrictions on face‐to‐face contacts remains, with implications for continuity of service provision. ‘Remote’ assessment models, which allow high‐quality diagnosis without face‐to‐face contact between patient and doctor, may be one way both of widening access to specialist diagnostic services for dementia, and of providing this safely in the pandemic context.

Target condition being diagnosed

The target conditions of interest for this review are dementia (all‐cause dementia and all subtypes of dementia), MCI, and any cognitive diagnosis (combining dementia and MCI in a single category).

Dementia diagnosis is operationalised in various classification systems, such as the WHO’s International Classification of Diseases (ICD), WHO 2010, and the American Psychiatric Association’s Diagnostic and Statistical Manual of Mental Disorders (DSM), APA 2013 (although the most recent iteration of this (DSM‐5) refers to ‘Major Neurocognitive Disorder’ rather than to dementia). The key features of all criteria, as expressed in the ICD, are that dementia is an acquired organic mental disorder involving a loss of intellectual abilities of sufficient severity to interfere with social or occupational functioning.

Dementia can be subtyped according to the causative brain disorder. The most common subtypes are currently considered to be: dementia due to Alzheimer’s disease (ADD); vascular dementia (VaD); the related conditions of dementia with Lewy bodies (DLB) and dementia in Parkinson’s disease (PDD); the various frontotemporal dementias (FTD); and mixed pathologies. There are rarer causes. There are also other pathologies that may be common but which have only recently been described and whose prevalence and contribution to the population burden of dementia is not yet well understood (e.g. Nelson 2019). Specialist memory assessment services usually aim to make a subtype diagnosis in order to better tailor information, support, and treatment to the individual patient. Validated criteria for each of the common subtypes are available. These are based on a combination of clinical symptoms and signs and special investigations, such as neuroimaging. Most allow a distinction between ‘probable’ and ‘possible’ subtype cases.

MCI does not have such well‐operationalised diagnostic criteria as dementia. However, there has been broad agreement that the term should be used to refer to a condition that includes (a) objective evidence of cognitive decline beyond that expected for age, and (b) preserved independence in daily activities. Criteria have been published for mild neurocognitive disorder (the DSM‐5 equivalent of MCI) (APA 2013), for MCI due to Alzheimer’s disease (Albert 2011) and, most recently, for the diagnosis of MCI due to Lewy body disease (LBD) (McKeith 2020), although the latter are currently intended for research purposes only. In this review we will consider all‐cause (undifferentiated) MCI, but will note if a particular subtype is reported.

Index test(s)

The index test being assessed in this review is a diagnostic assessment for dementia or MCI by a specialist, secondary care‐level clinician in which the person with suspected cognitive impairment and the clinician responsible for making or excluding the diagnosis do not meet face‐to‐face, but communicate using telephone or videoconference technologies. This approach is often described as ‘telehealth’ or ‘telemedicine’. Different telehealth models for specialist dementia assessment are possible. For example, all aspects of the assessment may be conducted from a distance, or the patient’s local primary care services may contribute to the assessment to varying degrees. The key point is that the person responsible for synthesising the clinical information and making a diagnostic judgement has only ‘remote’ contact with the patient.

Several services have published descriptions of their approaches to remote dementia assessment, including data demonstrating feasibility and acceptability to patients (e.g. Barton 2011; Dang 2018; Weiner 2011).

Clinical pathway

In many healthcare systems, the first assessment of cognitive symptoms is usually undertaken outside a specialist service, often in primary care. A suspicion of cognitive impairment may be raised by the patient or their family and friends, or as a result of opportunistic or targeted screening, such as when accessing primary or secondary healthcare services for another reason.

The initial, non‐specialist assessment will typically involve history‐taking, examination, and a short cognitive screening test. This assessment will aim, amongst other things, to try to confirm the presence of an objective cognitive problem and to exclude reversible causes of cognitive decline. When the non‐specialist clinician suspects dementia or MCI, then the patient is referred to a specialist assessment clinic (sometimes referred to as a memory clinic or cognitive disorders clinic; for consistency, we will use the term 'memory clinic' throughout this review). Prior to being seen in the memory clinic, patients will typically have a set of blood tests as part of the process of exclusion of alternative causes of cognitive decline, and may have neuroimaging (computed tomography (CT) or magnetic resonance imaging (MRI) according to a local dementia protocol).

Specialist memory clinics are most commonly provided in secondary care and may be hospital or community based. However, they may also be provided as an enhanced primary care service by primary care staff with additional, specialist training who receive referrals in a similar way to a secondary care clinic; for the purposes of this review, we will consider such a model to be at the level of secondary care.

In some healthcare systems, it may be possible for patients who are concerned about cognitive symptoms to refer themselves directly to a specialist assessment clinic, a model involving direct access to secondary care. The population in memory clinics with direct public access may resemble the pre‐referral primary care population in settings where primary care gate‐keeping is standard practice.

The standard model of assessment for suspected dementia or MCI in memory clinics in higher‐income settings is a face‐to‐face assessment by a team involving a medical specialist (a geriatrician, psychiatrist, or neurologist). The memory clinic team is often multidisciplinary in nature, with a variety of other healthcare professionals (nurse, occupational therapist, neuropsychologist) contributing to the cognitive and functional assessments. Synthesising the information gathered during assessment and reaching a diagnosis is usually the responsibility of the doctor, but may sometimes be delegated to other healthcare professionals who have received appropriate training. Assessment at a memory clinic involves a history from the patient and an informant, physical examination as indicated, cognitive testing, and a selection of investigations, typically including blood tests if these have not been done before referral. Structural neuroimaging using CT or MRI may form part of some assessments where indicated and available. Such investigations are not always available or readily accessible even in high‐income countries, and are usually available only to a small minority of the population in low‐ and middle‐income countries. Use of specialised imaging and tissue biomarkers to improve diagnosis and pathological subtyping is gaining traction in research and is recommended in the clinical guidelines of certain countries. At present, these tests are not considered routine clinical practice and their application is usually restricted to tertiary referral centres.

Alternative test(s)

None.

Rationale

The World Alzheimer Report 2011 identified “that lack of detection is a significant barrier to improving lives of people with Alzheimer’s disease and other dementias, their families and carers” (Prince 2011). There are, in turn, barriers to detection of dementia, including too few specialists and concentration of specialists in urban areas. Most recently, the reduced access to services caused by the COVID‐19 pandemic has been an additional obstacle. Remote consultations using communication technologies (telehealth) may be one approach to overcoming some of these barriers. However, this should not be done at the cost of accuracy and quality of diagnosis. In this review, we aim to provide the best available evidence about the accuracy of diagnoses of dementia made using telehealth assessment in order to inform service development and future research.

Objectives

To assess the diagnostic accuracy of telehealth assessment for dementia and mild cognitive impairment (MCI).

Secondary objectives

  1. To identify the quality and quantity of the research evidence describing the accuracy of telehealth assessment for dementia and MCI.

  2. To identify sources of heterogeneity in the test accuracy. Potential sources of heterogeneity to be explored if data permit include: referral pathway (referral after a non‐specialist assessment or self‐referral), special populations (e.g. post‐stroke), severity of dementia, subtype of dementia, telehealth model (type of technology, degree of face‐to‐face contact), and proportion of participants undergoing neuroimaging.

  3. To identify and synthesise any data on patient or clinician satisfaction, resource use, costs or feasibility of the telehealth assessment models in the included studies.

Methods

Criteria for considering studies for this review

Types of studies

We included cross‐sectional studies where the index test(s) and reference standard were administered within one month of each other.

We excluded studies with a small number of cases (fewer than 10), as these studies are prone to various selection biases.

Case‐control studies in which participants are selected to undergo a telehealth assessment on the basis of their cognitive diagnostic status were not eligible for inclusion.

Participants

Our population of interest is any adult (aged over 18 years) with a suspected cognitive impairment who has presented to a specialist service to be assessed for dementia or mild cognitive impairment (MCI).
Studies in populations with and without a routine non‐specialist assessment before presentation at the specialist assessment service were eligible. This was considered to be a potential source of heterogeneity (since pre‐assessment probability of dementia will differ).

Studies of selected populations (e.g. studies in traumatic brain injury or in stroke) were also eligible, again noting this as a potential source of heterogeneity.

Index tests

Our index test of interest is a criterion‐based diagnosis of dementia or MCI made by means of an assessment in which all contact between the person with suspected cognitive impairment and the clinician responsible for making or excluding the diagnosis of dementia or MCI is by telephone or videoconference technologies. Although there is no face‐to‐face contact, the diagnostic assessment must involve a human‐to‐human interface; diagnostic assessments made using computer‐based algorithms without human contact were not eligible for inclusion.

Some defined elements of the assessment (assistance with technology, standardised cognitive testing, physical assessment) may be conducted face‐to‐face by a trained person (e.g. a nurse, primary healthcare worker), as long as this person is not responsible for synthesising the clinical information and making a diagnostic judgement. The degree of face‐to‐face contact in the index test assessment process was considered as a source of heterogeneity. If the same information obtained face‐to‐face is also used in the reference standard assessment, then we considered the effect of possible incorporation bias as part of our quality assessment.

Target conditions

Our target conditions are as follows.

  1. All‐cause dementia, diagnosed using validated criteria (e.g. ICD or DSM).

  2. All subtypes of dementia, diagnosed using validated criteria.

  3. Mild cognitive impairment (MCI) diagnosed using any system which incorporates evidence of cognitive decline and preserved functional independence, including DSM‐5 criteria for Minor Neurocognitive Disorder and National Institute on Aging/Alzheimer’s Association (NIA‐AA) criteria for MCI due to Alzheimer’s disease.

Reference standards

Our reference standard is a diagnosis of dementia or MCI made by means of an assessment which is conducted primarily face‐to‐face and, in particular, which involves face‐to‐face contact with the clinician responsible for making or excluding a diagnosis of dementia or MCI. We anticipated that this assessment would be the usual standard of care in the study setting, and we accepted any assessment protocol which is sufficient to make a diagnosis according to validated diagnostic criteria for dementia. Criteria for making a diagnosis of MCI are less well‐established, but the same assessment protocol is required. All reference standard assessments have to include as a minimum: a history from the patient and, wherever possible, from a knowledgeable informant, including an account and/or structured assessment of daily functioning; an objective assessment of cognition; and a mental state examination sufficient to exclude alternative psychiatric causes of cognitive symptoms, such as depression. Blood tests and special investigations, such as neuroimaging, if done, are often conducted prior to memory clinic assessment. So that patients do not have unnecessary repetition of invasive investigations, any such investigations are expected to contribute to diagnostic decisions made at both face‐to‐face and telehealth assessments, and are therefore not considered to be part of either the index test or the reference standard. We recorded how many patients in each study underwent neuroimaging and considered this as a potential source of between‐study heterogeneity.

The reference standard assessment had to take place in secondary care. Settings described as primary care were only eligible if the assessment was provided as an enhanced service (e.g. by general practitioner (GP) with a special interest who has undertaken additional training and who receives referrals from other GPs); we considered this model to be equivalent to a secondary care model.

The assessment may involve a multidisciplinary team, that is the person making the diagnosis need not conduct all aspects of the assessment personally. Use of information derived from pre‐assessment questionnaires and remote contact with an informant who is not able to attend in person are acceptable components of a reference standard assessment. If these materials also featured in the index test, then we considered the effect of incorporation bias as part of our quality assessment.

Masking of the reference standard assessment team to the index test outcome was an important part of the quality assessment.

Studies that make a postmortem diagnosis or base diagnosis on imaging or other biomarkers without a comprehensive clinical assessment were not eligible.

Search methods for identification of studies

Electronic searches

We searched MEDLINE (OvidSP) (1946 to present), Embase (OvidSP) (1974 to present), Web of Science Core Collection (Clarivate Analytics) (1900 to present), PsycINFO (Ovid) (1806 to present), and LILACS (BIREME) (Latin American and Caribbean Health Science Information database) (1982 to present) on 4 November 2020. These databases cover grey literature that is appropriate to the topic of this review. We searched the following trial registers: US National Institutes of Health Ongoing Trials Register ClinicalTrials.gov (www.clinicaltrials.gov/) and World Health Organization International Clinical Trials Registry Platform (apps.who.int/trialsearch/). (See Appendix 1 for the search strategy we ran in MEDLINE). We used controlled vocabulary such as MeSH terms and EMTREE where appropriate. In the searches developed, we made no attempt to restrict studies on the basis of sampling frame or setting. This approach is intended to maximise sensitivity and allow inclusion on the basis of population‐based sampling to be assessed at screening (see Selection of studies). We did not use search filters (collections of terms aimed at reducing the number needed to screen) as an overall limiter because those that are published have not proved sensitive enough (Whiting 2008). We did not apply any language restriction to the electronic searches, using translation services as necessary. We searched ALOIS, the CDCIG Specialized Register, which includes both intervention and diagnostic test accuracy (DTA) studies in dementia. We performed forward and backward citation searching from potentially relevant or included papers.

Searching other resources

We included only papers published in peer‐reviewed scientific journals, so our searching of the grey literature was limited to the electronic databases described.

Data collection and analysis

Selection of studies

The search was run by the Cochrane Dementia & Cognitive Improvement Group's Information Specialist, who removed duplicates and conducted a 'first pass' assessment to exclude obviously irrelevant records. She imported the remaining citations into Covidence. Two review authors (JMcC and JL), working independently, first assessed the titles and abstracts of the citations in Covidence, and then assessed the full text of any article thought by either review author to be potentially eligible for inclusion in the review.

For consistency with our other DTA titles, we adopted a hierarchical approach to exclusion, first excluding on the basis of index test and reference standard, then on the basis of study design, and then for any other reason.

Where the available material (trial registry entry, conference abstract, or full paper) did not include the data required to judge the eligibility of a study or the data needed for analyses, we contacted the lead author to request the necessary information.

We detailed the study selection process in a PRISMA flow diagram (Figure 1).


Study flow diagram.


Data extraction and management

Two review authors (JMcC and JL) independently extracted the data from eligible papers to a bespoke data extraction form. The form was tested on two papers and modified as necessary.

The data extraction form covered: population (including country, demographics, setting, eligibility criteria, pre‐referral screening processes, special characteristics (e.g. specific health conditions)); details of index test (including person conducting and setting of all components of assessment); details of reference standard (including person conducting and setting of all components of assessment); order of implementation of index test and reference standard. We sought numbers of potentially eligible participants who were excluded or who declined participation to get an idea of generalisability and acceptability.

We extracted data on categorical diagnoses of all‐cause dementia, subtypes of dementia (divided into probable and possible if reported), and MCI.

We sought and extracted data on any quantitative measures of patient or clinician satisfaction, resource use (e.g. time taken), feasibility (e.g. the number of incomplete assessments) and cost.

We did not seek to collect qualitative data on users' experience of remote and face‐to‐face assessments for this review.

The two review authors compared their extracted data and resolved initial disagreements through discussion.

Assessment of methodological quality

Two review authors (JMcC and JL), blinded to each other’s scores, assessed risk of bias (internal validity) and generalisability (external validity) using a modified version of the Quality Assessment of Diagnostic Accuracy Studies (QUADAS‐2) tool (www.bristol.ac.uk/population-health-sciences/projects/quadas/quadas-2/).

QUADAS‐2 assessment covers issues relating to patient selection, index test, reference standard, and participant flow. Each domain is assessed for issues which may introduce bias, and the first three domains are also assessed for generalisability concerns. We based our risk of bias assessment on a previously described, modified version of the generic QUADAS‐2 tool that was developed for studies of neuropsychological tests in dementia, and includes operationalised scoring rules at item and domain level (Davis 2013). This tool was further tailored to this review by adding a signalling question to the reference standard domain to address risk of incorporation bias, and by removing from the index test domain a signalling question on prespecifying test thresholds which is not relevant to this review (Appendix 2).

A particular concern for this review is incorporation bias: are the telehealth and face‐to‐face assessments based on the same data? The optimal study design for our purposes is where all the data informing the two assessments are acquired by different means for the index test and the reference standard. However, in some circumstances, there may be a degree of overlap (e.g. cognitive testing conducted face‐to‐face in an index test assessment, or informant history obtained remotely in some reference standard assessments) (see Index tests and Reference standards). In such circumstances, it is possible that the same data will inform both assessments. We considered incorporation bias as part of our QUADAS‐2 assessment by means of a specific signalling question in the index test domain. We also considered any element of face‐to‐face assessment in the index test as a potential source of heterogeneity to be examined using subgroup analysis.

We present QUADAS‐2 results as graphical displays and as a narrative within the text. There were not enough included studies to perform a sensitivity analysis limited to those papers with low concern for risk of bias in the index test and reference standard domains (pre‐specified because of the critical importance of index test and reference standard assessments being interpreted ‘blind’).

Statistical analysis and data synthesis

In practice, the diagnosis of dementia and MCI is made at various levels of precision, for example differentiating dementia and non‐dementia, assigning a pathological subtype, and describing the certainty of the formulation (possible or probable subtype diagnoses). We proposed a hierarchy of analyses so as to make the best use of all available information. In practice, the data allowed only analyses at the first level of the hierarchy.

  • First, we described accuracy at the level of dementia versus not‐dementia and MCI versus not‐MCI, where dementia covers all‐cause (undifferentiated) dementia and all dementia subtypes, and MCI is undifferentiated by cause. These were our primary analyses. We also conducted an analysis looking at the accuracy of identification of any cognitive impairment syndrome (dementia and MCI combined) versus no cognitive impairment syndrome.

  • Second, we planned to perform an analysis limited to those papers that describe dementia subtypes, assessing accuracy of the subtyping.

  • Third, if data permitted, we planned to perform analyses that looked at both dementia diagnosis and certainty (e.g. possible and probable Alzheimer’s disease dementia).

In all analyses, a 'positive' result is a clinician's criterion‐based diagnosis of dementia or a subtype of dementia or of MCI (as defined above in Target conditions), and a 'negative' result is the absence of such a diagnosis. We constructed 2x2 tables for each study and used the data to calculate sensitivities and specificities, with 95% confidence intervals, employing Review Manager 2014 software. We present the results graphically in forest plots of sensitivity and corresponding specificity.
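As an illustration of the calculation described above (a sketch for readers, not the review's own analysis code), the sensitivity or specificity from a 2x2 table is simply a proportion with an exact (Clopper‐Pearson) binomial confidence interval; with this method the code below reproduces the intervals reported for the included studies, e.g. sensitivity 1.00 (95% CI 0.74 to 1.00) from 12/12 true positives in Loh 2007:

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def clopper_pearson(x, n, alpha=0.05):
    """Exact (Clopper-Pearson) 100*(1-alpha)% CI for the proportion x/n."""
    def solve(f):
        lo, hi = 0.0, 1.0
        for _ in range(60):  # bisection: halve the bracketing interval each step
            mid = (lo + hi) / 2
            if f(mid):
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2
    # lower limit: smallest p with P(X >= x) = alpha/2; upper: largest p with P(X <= x) = alpha/2
    lower = 0.0 if x == 0 else solve(lambda p: 1 - binom_cdf(x - 1, n, p) < alpha / 2)
    upper = 1.0 if x == n else solve(lambda p: binom_cdf(x, n, p) >= alpha / 2)
    return lower, upper

# Loh 2007: 12/12 true positives and 8/8 true negatives
print(clopper_pearson(12, 12))  # sensitivity 1.00, 95% CI 0.74 to 1.00
print(clopper_pearson(8, 8))    # specificity 1.00, 95% CI 0.63 to 1.00
```

The same function applied to 40/50 (Martin‐Khan 2012) yields the reported 0.80 (95% CI 0.66 to 0.90).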

We did not calculate summary estimates of sensitivity and specificity because of the small number of included studies and the clinical differences between them. (See Differences between protocol and review).

The data needed to populate a 2x2 table were not available for one study. For this study, we report the Kappa statistic for agreement between face‐to‐face and telehealth diagnoses, as presented in the original study publication.
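For readers unfamiliar with it, the kappa statistic quantifies agreement between two raters beyond that expected by chance: kappa = (p_observed − p_chance) / (1 − p_chance). A minimal sketch with illustrative counts (not data from any included study):

```python
def cohens_kappa(table):
    """Cohen's kappa for a square agreement table (rows: rater 1, cols: rater 2)."""
    k = len(table)
    n = sum(sum(row) for row in table)
    # observed agreement: proportion of cases on the diagonal
    p_observed = sum(table[i][i] for i in range(k)) / n
    # chance agreement: product of each category's marginal proportions, summed
    p_chance = sum(
        sum(table[i]) * sum(row[i] for row in table) for i in range(k)
    ) / n**2
    return (p_observed - p_chance) / (1 - p_chance)

# Hypothetical 2x2 agreement table (dementia / no dementia):
# rows = face-to-face diagnosis, columns = telehealth diagnosis
table = [[40, 10],
         [5, 45]]
print(cohens_kappa(table))  # 0.70: substantial agreement beyond chance
```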

Investigations of heterogeneity

There were insufficient data to conduct any of the planned subgroup analyses to explore the following potential areas of heterogeneity:

  1. population: special populations, referral pathway (self‐referral or referral after a non‐specialist assessment), severity of dementia, proportion undergoing neuroimaging;

  2. index test: technology used, degree of face‐to‐face assessment;

  3. reference standard: subtype of dementia.

Sensitivity analyses

We did not conduct any sensitivity analyses.

Assessment of reporting bias

We did not plan to perform a quantitative assessment of reporting bias. We recognise that there is a lack of consensus about the most robust approach to assessment of reporting bias in DTA (Wilson 2015), and uncertainty about how to apply standard approaches such as funnel plots (van Enst 2014).

Results

Results of the search

The search retrieved 3474 unique records (after de‐duplication). The Cochrane Dementia & Cognitive Improvement Group Information Specialist conducted a 'first pass' assessment and excluded 2825 of these as obviously irrelevant, leaving 649 records. Two review authors, JMcC & JL, independently examined these 649 titles (and abstracts where available). We discarded 634 as irrelevant and selected 15 to examine in full text. We identified one more article for full‐text assessment from reference lists. Of the 16 articles examined in full text, we excluded 12 and included four, which reported on three separate studies. At each stage, any disagreements between review authors were resolved by discussion. The three included studies were: Loh 2007, Martin‐Khan 2012 and Shores 2004. The study selection process is depicted in Figure 1.

Methodological quality of included studies

We assessed the risk of bias in each of four domains using the QUADAS‐2 tool. Our judgements are presented graphically in Figure 2.


Risk of bias and applicability concerns summary: review authors' judgements about each domain for each included study


Patient selection. Loh 2007 did not report their sampling method or give any detail about exclusion criteria. We judged this study to be at unclear risk of bias in this domain. The other two studies used consecutive (or complete) samples and described appropriate exclusion criteria. We judged them to be at low risk of selection bias. We had no concerns about applicability in this domain.

Index test. In all three studies, there was evidence that the index tests were interpreted (i.e. diagnoses were made on the basis of the telehealth assessment) without knowledge of the result of the face‐to‐face reference standard assessment. In Loh 2007, there was little overlap between the index test and reference standard assessments and we judged the risk of bias in this domain to be low. However, in the other two studies, there was a substantial amount of information gathered face‐to‐face before the diagnostic interviews which was used to inform both assessments (see Table 1). This included some cognitive testing and informant questionnaires in both studies. In Shores 2004, it also included a neurological examination. We considered that this degree of overlap created a high risk of incorporation bias (towards agreement between the two assessments) and judged both of these studies to be at high risk of bias in this domain. We had no applicability concerns about the index test in any of the studies.

Reference standard. In all three studies, there was evidence that the reference standard results were interpreted without knowledge of the index test results. In all three studies, diagnoses of dementia were made on the basis of validated diagnostic criteria (DSM‐IV or ICD‐10). In Martin‐Khan 2012, clinicians were also able to make a diagnosis of 'Cognitive impairment ‐ no dementia' in participants who met some but not all of the DSM‐IV criteria for dementia. We considered that the risk of bias related to misclassification was low in all studies. We had no applicability concerns related to the reference standard.

Flow and timing. In all three studies, the interval between index test and reference standard assessments was short. The order of assessments was either random or alternating. All participants were included in analyses. We judged all studies to be at low risk of bias in this domain.

Table 1. Components of assessments

Shores 2004

Conducted face‐to‐face prior to both diagnostic assessments (used to inform both): Medical history; Lab tests; Neurological exam; MMSE; Lawton‐Brody IADL; Lawton‐Brody PMS.

Conducted during both reference standard and index test assessments: Psychiatric exam; Focused neurological exam* (observation of gait, eye movements, tremor and frontal release signs); Short cognitive tests**.

Conducted in reference standard assessment only: none reported.

Notes: Face‐to‐face components prior to both diagnostic assessments were conducted by nursing staff at the veterans’ homes. * Focused neurological exam during index test assessment was also conducted with assistance of a member of staff at the care home. ** Short Blessed, three‐word recall, clock drawing. Neuroimaging is not mentioned in the paper.

Loh 2007

Conducted face‐to‐face prior to both diagnostic assessments (used to inform both): Lab tests; Neuroimaging#.

Conducted during both reference standard and index test assessments: sMMSE; GDS; Katz ADL; IADL; IQCODE.

Conducted in reference standard assessment only: Physical exam.

Notes: # Not clear how many participants had neuroimaging. No specific information given about the content of the interviews with the diagnosing clinicians.

Martin‐Khan 2012

Conducted face‐to‐face prior to both diagnostic assessments (used to inform both): Lab tests; Neuroimaging#; sMMSE; RUDAS; Clock drawing; FAS fluency; Animal fluency; GDS‐15; IQCODE; NPI‐Q; DAD.

Conducted during both reference standard and index test assessments: none specified.

Conducted in reference standard assessment only: none specified.

Notes: # Not clear how many participants had neuroimaging. No specific information given about the content of the interviews with the diagnosing clinicians.

Abbreviations: DAD: Disability Assessment for Dementia; GDS: Geriatric Depression Scale; IADL: Instrumental Activities of Daily Living; IQCODE: Informant Questionnaire on Cognitive Decline in the Elderly; Katz ADL: Katz assessment of Activities of Daily Living; NPI‐Q: Neuropsychiatric Inventory Questionnaire; PMS: Physical Self‐Maintenance Scale; sMMSE: standardised Mini‐Mental State Examination.

Findings

Description of included studies

The included studies are described in the Characteristics of included studies tables. The assessments making up the index tests and reference standards are described in detail in Table 1.

Study design

All of the included studies were cross‐sectional diagnostic test accuracy studies with short intervals between the two diagnostic assessments (< 1 day to mean 8.2 (SD 2.3) days). Two of the included studies (Shores 2004; Loh 2007) were very small (16 and 20 participants, respectively). The third study (Martin‐Khan 2012) included 100 participants relevant to this review. (Martin‐Khan 2012 also investigated the reliability of conventional diagnostic assessment for dementia by including a group of participants who had two face‐to‐face assessments from different clinicians).

Setting

Shores 2004 was a study of the residents of two veterans' homes in the USA. Loh 2007 and Martin‐Khan 2012 were conducted in community settings in Australia by researchers with an interest in services to remote and rural areas.

Participants

Participants in all studies had either been identified as being at intermediate or high risk of having dementia on a screening test in a care home (Shores 2004), or had been referred by GPs to secondary care services because of cognitive symptoms. The mean age of participants in the studies was 75 to 79 years. Fifteen of 16 participants in Shores 2004 were male; in the community‐based studies, there was a fairly even split between the sexes. Overall, the participants were broadly representative of the patients seen by secondary care dementia assessment services in high‐income countries.

Target condition

The target condition in all of the included studies was a diagnosis of dementia made by a specialist physician at a face‐to‐face assessment, but the criteria used to make the diagnoses varied between studies.

Loh 2007 stated that their target condition was ICD‐10 dementia in Alzheimer's disease (AD), but they also recorded diagnoses of mixed dementia (AD and vascular) according to ICD‐10 among their 20 participants. We combined these into a category of all‐cause dementia.

Martin‐Khan 2012 and Shores 2004 used a target condition of DSM‐IV dementia. Martin‐Khan 2012 also identified participants with "Cognitive impairment no dementia", which they defined as evidence of impairment meeting some, but not all, DSM‐IV criteria for dementia, including amnestic disorder and cognitive impairment not otherwise specified. We treated this group of participants as having mild cognitive impairment (MCI) for the purposes of this review.

Index test and reference standard assessments

In all three studies the telehealth assessment was conducted using a videoconference system.

The components of the diagnostic assessments in the included studies are described in detail in Table 1.

Table 1 includes a column for those assessment elements which were conducted face‐to‐face prior to both diagnostic interviews and which were used to inform both index test and reference standard assessments. In Shores 2004, these elements were conducted by nursing staff in the participants' care homes; in Martin‐Khan 2012, where the index test and reference standard assessments took place on the same day during a clinic visit, they were conducted by clinic nursing staff. The shared elements in both of these studies included collection of structured informant history using questionnaires and some cognitive testing (Mini Mental State Examination (MMSE) or standardised Mini Mental State Examination (sMMSE) in both studies; additional cognitive tests in Martin‐Khan 2012). In Shores 2004, a neurological examination was also conducted by local staff at the care home.

Also in Table 1 is a column for the elements reported in each paper to have made up the remote and face‐to‐face interviews with the diagnosing clinicians, i.e. the core of the index test and reference standard assessments. Shores 2004 was the only paper which specified that both diagnostic interviews included a psychiatric examination, although this may have constituted part of both assessments in the other studies too. In Loh 2007 and Shores 2004, there was specified cognitive testing during the clinician interviews; in Martin‐Khan 2012, all cognitive testing was done face‐to‐face prior to the clinician interviews. In Shores 2004, the clinicians observed gait, eye movements, tremor and frontal release signs (directly in the reference standard assessment and with the help of care home nursing staff in the index test assessment); in Loh 2007, a physical examination formed part of the reference standard but not the index test assessment; in Martin‐Khan 2012, there was no physical examination.

All papers specified that the assessing clinicians were given results of pre‐assessment lab tests. Shores 2004 did not mention neuroimaging. Loh 2007 and Martin‐Khan 2012 stated that clinicians had access to neuroimaging results where these were available, but no paper specified what proportion of participants had had neuroimaging or the nature of the scans.

Results of analyses

Sensitivity and specificity results are presented for three analyses ‐ telehealth assessment for the diagnosis of dementia (versus not‐dementia), MCI (versus not‐MCI) and any cognitive syndrome (versus no cognitive syndrome) ‐ in Forest plots in Figure 3, Figure 4 and Figure 5, respectively.


Forest plot of 1 Dementia / not‐dementia.

Forest plot of 2 MCI / not‐MCI.

Forest plot of 3 Any cognitive syndrome / none.

Loh 2007 enrolled 20 participants who had been referred by a GP for evaluation of cognitive symptoms. The telehealth assessment in this study was fully remote, i.e. it was not informed by any data collected face‐to‐face prior to the remote interview. This study was at unclear risk of bias in the patient selection domain because of a lack of information about the sampling process, but at low risk of bias in other respects.

Twelve participants were diagnosed with ICD‐10 dementia at face‐to‐face assessment: dementia in AD (9 participants) or mixed AD and vascular dementia (3 participants). Telehealth assessment had a sensitivity of 1.00 (95% CI 0.74 to 1.00) and a specificity of 1.00 (95% CI 0.63 to 1.00) for the diagnosis of all‐cause dementia. One of the patients diagnosed with mixed dementia at the face‐to‐face assessment was diagnosed with AD dementia at the telehealth assessment.

Martin‐Khan 2012 enrolled 100 participants who had been referred by a GP to a memory clinic for evaluation of undiagnosed cognitive symptoms. In this study, a significant amount of data on participants was collected face‐to‐face by clinic nurses and used to inform both face‐to‐face and telehealth diagnostic assessments. This study was at high risk of bias in the index test domain because of a substantial risk of incorporation bias, but at low risk of bias in other domains.

Fifty participants were diagnosed at face‐to‐face assessment with all‐cause dementia according to DSM‐IV criteria and 41 with 'Cognitive impairment ‐ no dementia' (CIND) because of cognitive impairment which did not meet full DSM‐IV criteria for dementia. We considered the latter group to have MCI for the purposes of this review. Telehealth assessment had a sensitivity of 0.80 (95% CI 0.66 to 0.90) and a specificity of 0.80 (95% CI 0.66 to 0.90) for the diagnosis of all‐cause dementia; a sensitivity of 0.71 (95% CI 0.54 to 0.84) and a specificity of 0.73 (95% CI 0.60 to 0.84) for the diagnosis of CIND/MCI; and a sensitivity of 0.97 (95% CI 0.91 to 0.99) but a specificity of 0.22 (95% CI 0.03 to 0.60) for the diagnosis of any cognitive syndrome (dementia or CIND/MCI).

Of the nine participants identified as cognitively normal at face‐to‐face assessment, two were considered cognitively normal and seven were diagnosed with CIND/MCI at the telehealth assessment. Of the 41 participants diagnosed with CIND/MCI at face‐to‐face assessment, two were considered cognitively normal and 10 were diagnosed with dementia at the telehealth assessment. Of the 50 participants diagnosed with dementia at face‐to‐face assessment, one was considered cognitively normal and nine were diagnosed with CIND/MCI at the telehealth assessment.
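As an arithmetic check (a sketch, not code from the study), the cross‐classification above can be collapsed into the 2x2 table underlying the reported dementia accuracy figures:

```python
# Cross-classification reported by Martin-Khan 2012
# rows: face-to-face diagnosis; columns: telehealth diagnosis
cross = {
    "normal":   {"normal": 2, "mci": 7,  "dementia": 0},
    "mci":      {"normal": 2, "mci": 29, "dementia": 10},
    "dementia": {"normal": 1, "mci": 9,  "dementia": 40},
}

# Collapse to dementia versus not-dementia (face-to-face = reference standard)
tp = cross["dementia"]["dementia"]                              # telehealth agrees: dementia
fn = cross["dementia"]["normal"] + cross["dementia"]["mci"]     # dementia missed by telehealth
fp = cross["normal"]["dementia"] + cross["mci"]["dementia"]     # dementia over-called
tn = sum(cross[r][c] for r in ("normal", "mci") for c in ("normal", "mci"))

sensitivity = tp / (tp + fn)   # 40/50 = 0.80, as reported
specificity = tn / (tn + fp)   # 40/50 = 0.80, as reported
```

The same collapse for MCI versus not‐MCI gives 29/41 ≈ 0.71 sensitivity and 43/59 ≈ 0.73 specificity, matching the figures reported for CIND/MCI.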

This paper did not report data on dementia subtype diagnoses, but stated that there was "no statistically significant disagreement ... in relation to Alzheimer disease (or) vascular dementia".

Shores 2004 enrolled 16 residents of veterans' homes who were identified as being at intermediate or high risk of dementia on a screening test. In this study, a significant amount of data was collected face‐to‐face by care home staff and used to inform both face‐to‐face and telehealth assessments. This study was at high risk of bias in the index test domain, due to a substantial risk of incorporation bias, but at low risk of bias in other domains.

Twelve participants were diagnosed at face‐to‐face assessment with all‐cause dementia according to DSM‐IV criteria. Telehealth assessment had a sensitivity of 1.00 (95% CI 0.74 to 1.00) and a specificity of 1.00 (95% CI 0.40 to 1.00) for the diagnosis of all‐cause dementia. The authors also reported perfect agreement for subtype diagnoses (seven dementia in AD, three DLB, one mixed dementia, one dementia in multiple sclerosis).

Other data

Among the included studies, only Shores 2004 reported any data on user satisfaction. Both patients and clinicians were asked to answer five questions using a five‐point Likert scale. Patients showed a high level of satisfaction on all questions, which covered understanding what they were told, privacy, time saved, likelihood of choosing telemedicine again and preferring telemedicine to attending a clinic. Where a score of 5 indicated strong agreement with a positive statement about telemedicine, the mean scores on these questions ranged from 4.5 to 4.8 (all SDs ≤ 0.6). Clinicians were asked two questions about the utility of the telehealth consultation (whether it led to recognition of problems not apparent to the patient and to an effect on the care plan) and three questions about the technology (whether it worked well, whether they preferred the teleconference system to a telephone call, and whether the quality of audio and video was adequate). Mean scores for all questions indicated clinician satisfaction (range 3.7 to 4.1).

We did not find any quantitative data on resource use, costs or feasibility from the included studies.

Discussion

Summary of main results

Three studies with 136 participants were included in the review. Two of these studies were very small (16 and 20 participants).

Dementia was a target condition in all of the included studies. Only one study (100 participants) also assessed the accuracy of diagnosis of cognitive impairment ‐ no dementia (CIND), which we considered equivalent to mild cognitive impairment (MCI) for the purposes of the review.

The included studies were generally well‐conducted. One small study was at unclear risk of selection bias. We considered two studies to be at high risk of bias in the index test domain because a significant amount of information used to make the diagnoses was gathered face‐to‐face and contributed to both telehealth and face‐to‐face diagnostic assessments. If the question of interest is the accuracy of a 'pure' telehealth assessment, then this may be considered a source of bias, likely to exaggerate agreement between telehealth and face‐to‐face diagnoses. However, a model incorporating a substantial amount of face‐to‐face contact is a valid model, which may be suitable for some services in some settings. Hence this could also be viewed as an issue of the applicability of the evidence (see further discussion below).

For the diagnosis of all‐cause dementia, sensitivity of telehealth assessment ranged from 0.80 to 1.00 and specificity from 0.80 to 1.00. We considered the evidence on dementia diagnosis to be very low certainty because of imprecision, inconsistency between studies and risk of bias.

For the diagnosis of MCI, data were available from only one study (100 participants) giving a sensitivity of 0.71 (95% CI 0.54 to 0.84) and a specificity of 0.73 (95% CI 0.60 to 0.84) for MCI diagnosis based on telehealth assessment. We considered this to be low‐certainty evidence because of imprecision and risk of bias. For diagnosis of any cognitive syndrome (dementia or MCI), data from the same study gave a sensitivity of 0.97 (95% CI 0.91 to 0.99) and a specificity of 0.22 (95% CI 0.03 to 0.60).
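These proportions translate into the natural frequencies used in the summary of findings tables: applying the sensitivity to the patients who have the condition (pre‐test probability) and the specificity to those who do not gives expected counts per 100 patients tested. A minimal sketch (the function name is illustrative, not from the review):

```python
def per_100(sensitivity, specificity, pretest_prob):
    """Convert sensitivity/specificity and a pre-test probability into
    expected counts per 100 patients tested (rounded to whole patients)."""
    with_condition = 100 * pretest_prob
    without_condition = 100 - with_condition
    tp = round(with_condition * sensitivity)
    fn = round(with_condition) - tp          # the rest of those with the condition
    tn = round(without_condition * specificity)
    fp = round(without_condition) - tn       # the rest of those without it
    return {"TP": tp, "FN": fn, "TN": tn, "FP": fp}

# MCI diagnosis (Martin-Khan 2012): sensitivity 0.71, specificity 0.73,
# pre-test probability 40%.
print(per_100(0.71, 0.73, 0.40))
```

This reproduces the point estimates in Summary of findings 2: 28 true positives, 12 false negatives, 44 true negatives and 16 false positives per 100 patients tested.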

The perfect agreement between telehealth and face‐to‐face diagnoses of all‐cause dementia reported by two very small studies (sensitivity 1.00 and specificity 1.00 in both studies) may result both from their small size and from the fact that they aimed only to distinguish dementia from not‐dementia. Accuracy appeared lower in the larger study which also allowed diagnoses of MCI ‐ a much closer reflection of clinical practice. Most disagreements in this study were at the boundary between MCI and dementia (in both directions). There was also a tendency for the telehealth assessment to result in more diagnoses of MCI in the small number of participants considered cognitively healthy at face‐to‐face assessment, as shown in the low specificity estimate (with wide confidence intervals) for telehealth diagnosis of any cognitive syndrome.

Strengths and weaknesses of the review

The principal weakness of the review is the small number of studies and participants.

Only one of the included studies gave clinicians the option of diagnosing mild cognitive impairment (or cognitive impairment falling short of dementia criteria) although this is a diagnosis commonly made in memory clinics. The presence or absence of this option may influence the 'threshold' of impairment at which a clinician makes a diagnosis of dementia, i.e. the application of the reference standard, and may contribute to between‐study heterogeneity in the review.

A general limitation of dementia test accuracy research is the constraint on accuracy imposed by the imperfect reference standard (Cerrullo 2021). Inter‐ and intra‐rater variability in dementia diagnosis is reduced but not eliminated by the use of standardised diagnostic criteria, all of which still involve the exercise of clinical judgement, particularly in assessing whether a patient's functional decline passes a rather subjective threshold. This raises the question of what should be considered 'adequate' accuracy for diagnoses made using telehealth methods, which this review does not directly address.

Martin‐Khan 2012 and Martin‐Khan 2007 (a smaller study of similar design, excluded from the review because its dementia diagnoses were not made strictly according to validated diagnostic criteria) also included groups of participants who had two face‐to‐face assessments from different clinicians; inter‐rater agreement within these groups can be thought of as an upper limit on the agreement achievable between telehealth and face‐to‐face assessments in the same studies. Martin‐Khan 2007 reported kappa statistics of 0.53 (95% CI 0.3 to 0.8) for diagnoses of all‐cause dementia made by paired face‐to‐face assessments and 0.63 (95% CI 0.4 to 0.9) for telehealth and face‐to‐face assessments, but commented that, by chance in a small study, the paired face‐to‐face group included more borderline and complex cases, which could have reduced agreement. Martin‐Khan 2012 addressed the imperfect reference standard by designing its analysis to test whether telehealth assessment was non‐inferior to standard face‐to‐face assessment. This study reported a weighted kappa of 0.50 (95% CI 0.36 to 0.65) for overall diagnostic agreement in the paired face‐to‐face group and 0.52 (95% CI 0.39 to 0.66) in the telehealth / face‐to‐face group. The authors used a chi‐squared test to show that these estimates were consistent with the null hypothesis of no difference between the assessment methods.
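Cohen's kappa, the statistic underlying these agreement figures, compares observed agreement with the agreement expected by chance given each rater's marginal totals. A minimal sketch of the unweighted version for a 2×2 (dementia / not‐dementia) agreement table, using hypothetical counts rather than data from the included studies:

```python
def cohens_kappa(a, b, c, d):
    """Unweighted Cohen's kappa for a 2x2 agreement table:
    a = both raters diagnose dementia, d = both diagnose not-dementia,
    b and c = the two kinds of disagreement."""
    n = a + b + c + d
    p_observed = (a + d) / n
    # Chance agreement from the marginal totals of each rater
    p_chance = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2
    return (p_observed - p_chance) / (1 - p_chance)

# Hypothetical example: 40 agreed dementia, 45 agreed not-dementia,
# 15 disagreements among 100 paired assessments -> kappa = 0.70.
print(round(cohens_kappa(40, 10, 5, 45), 2))

# A table with no disagreements (as in the two small included studies)
# gives kappa = 1.0, provided chance agreement is below 1.
print(cohens_kappa(12, 0, 0, 4))
```

The weighted kappa reported by Martin‐Khan 2012 extends this idea by giving partial credit to near‐miss disagreements (e.g. MCI versus dementia) rather than treating all disagreements equally.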

The small amount of data did not allow exploration of any potentially important sources of heterogeneity, including the target conditions allowed in each study (as discussed above), referral pathway, severity of dementia, details of the assessment model and use of neuroimaging (which was poorly reported).

The data in these studies do not allow assessment of the accuracy of telehealth assessment for making dementia subtype diagnoses.

Applicability of findings to the review question

Participants in the included studies had a mean age of 75 to 79 years and had either been referred by GPs to secondary care services because of a suspected cognitive syndrome or had been identified by a screening test in a care home as being at intermediate or high risk of having dementia. They were broadly representative of the patients seen by secondary care dementia assessment services in high‐income countries. The results do not necessarily apply to special populations, e.g. patients assessed after stroke.

All studies used videoconferencing systems for the telehealth assessments, which broadly reflects likely current practice. Although the available technology may have improved since the studies were conducted, there was no reason to think that technological limitations affected the results.

One small study (20 participants) used a 'pure' telehealth model, but most of the data came from studies using a model in which the telehealth assessment relied heavily on clinical data collected by nurses during face‐to‐face contacts with patients and informants. This latter model is well suited to services in remote and rural areas, but the results may be less applicable to the clinical models established during the COVID‐19 pandemic in many countries, the point of which was to avoid face‐to‐face contacts.

Figure 1. Study flow diagram.

Figure 2. Risk of bias and applicability concerns summary: review authors' judgements about each domain for each included study.

Figure 3. Forest plot of 1 Dementia / not‐dementia.

Figure 4. Forest plot of 2 MCI / not‐MCI.

Figure 5. Forest plot of 3 Any cognitive syndrome / none.

Test 1. Dementia / not‐dementia.

Test 2. MCI / not‐MCI.

Test 3. Any cognitive syndrome / none.

Summary of findings 1. Telehealth assessment for diagnosis of dementia

Setting: secondary care

Reference test: criterion‐based diagnosis of dementia at conventional face‐to‐face diagnostic assessment

| Outcome | No. of studies (patients) | Study design | Effect per 100 patients tested (pre‐test probability of 50% a) | Certainty of evidence | Comment |
| --- | --- | --- | --- | --- | --- |
| True positives (patients correctly classified as having dementia) | 3 studies (136 patients) | Cross‐sectional (cohort‐type accuracy study) | 40 to 50 | Very low b,c,d | Sensitivity 0.80 to 1.00 |
| False negatives (patients incorrectly classified as not having dementia) | | | 0 to 10 | | |
| True negatives (patients correctly classified as not having dementia) | 3 studies (136 patients) | Cross‐sectional (cohort‐type accuracy study) | 40 to 50 | Very low b,c,d | Specificity 0.80 to 1.00 |
| False positives (patients incorrectly classified as having dementia) | | | 0 to 10 | | |

GRADE Working Group grades of evidence
High certainty: we are very confident that the true effect lies close to that of the estimate of the effect.
Moderate certainty: we are moderately confident in the effect estimate; the true effect is likely to be close to the estimate of the effect, but there is a possibility that it is substantially different.
Low certainty: our confidence in the effect estimate is limited; the true effect may be substantially different from the estimate of the effect.
Very low certainty: we have very little confidence in the effect estimate; the true effect is likely to be substantially different from the estimate of effect.

a Prevalence of dementia taken from largest included study (Martin‐Khan 2012)

b Downgraded due to risk of bias: two studies (116/136 participants) were at high risk of bias in the index test domain (incorporation bias)

c Downgraded due to inconsistency between studies

d Downgraded due to imprecision: small sample size

Summary of findings 2. Telehealth assessment for diagnosis of mild cognitive impairment

Setting: secondary care

Reference test: criterion‐based diagnosis of mild cognitive impairment (MCI) at conventional face‐to‐face diagnostic assessment

| Outcome | No. of studies (patients) | Study design | Effect per 100 patients tested (pre‐test probability of 40% a) | Certainty of evidence | Comment |
| --- | --- | --- | --- | --- | --- |
| True positives (patients correctly classified as having MCI) | 1 study (100 patients) | Cross‐sectional (cohort‐type accuracy study) | 28 (95% CI 22 to 34) | Low b,c | Sensitivity 0.71 (95% CI 0.54 to 0.84) |
| False negatives (patients incorrectly classified as not having MCI) | | | 12 (95% CI 6 to 18) | | |
| True negatives (patients correctly classified as not having MCI) | 1 study (100 patients) | Cross‐sectional (cohort‐type accuracy study) | 44 (95% CI 36 to 50) | Low b,c | Specificity 0.73 (95% CI 0.60 to 0.84) |
| False positives (patients incorrectly classified as having MCI) | | | 16 (95% CI 10 to 24) | | |

GRADE Working Group grades of evidence
High certainty: we are very confident that the true effect lies close to that of the estimate of the effect.
Moderate certainty: we are moderately confident in the effect estimate; the true effect is likely to be close to the estimate of the effect, but there is a possibility that it is substantially different.
Low certainty: our confidence in the effect estimate is limited; the true effect may be substantially different from the estimate of the effect.
Very low certainty: we have very little confidence in the effect estimate; the true effect is likely to be substantially different from the estimate of effect.

a Prevalence of MCI taken from largest included study (Martin‐Khan 2012)

b Downgraded due to risk of bias: high risk of bias in index test domain (incorporation bias)

c Downgraded due to imprecision: single study, small sample size

Table 1. Components of assessments

The original table assigned each component to one of three categories: conducted face‐to‐face prior to both diagnostic assessments (and used to inform both); conducted during both the reference standard and index test assessments; or conducted in the reference standard assessment only.

Shores 2004

Assessment components: Medical history; Lab tests; Neurological exam; MMSE; Lawton‐Brody IADL; Lawton‐Brody PMS; Psychiatric exam; Focused neurological exam* (observation of gait, eye movements, tremor and frontal release signs); Short cognitive tests**.

Notes: Face‐to‐face components prior to both diagnostic assessments were conducted by nursing staff at the veterans' homes. * The focused neurological exam during the index test assessment was also conducted with the assistance of a member of staff at the care home. ** Short Blessed, three‐word recall, clock drawing. Neuroimaging is not mentioned in the paper.

Loh 2007

Assessment components: Lab tests; Neuroimaging#; sMMSE; GDS; Katz ADL; IADL; IQCODE; Physical exam.

Notes: # Not clear how many participants had neuroimaging. No specific information was given about the content of the interviews with the diagnosing clinicians.

Martin‐Khan 2012

Assessment components: Lab tests; Neuroimaging#; sMMSE; RUDAS; Clock drawing; FAS fluency; Animal Fluency; GDS‐15; IQCODE; NPI‐Q; DAD.

Notes: # Not clear how many participants had neuroimaging. No specific information was given about the content of the interviews with the diagnosing clinicians.

Abbreviations: IADL: Instrumental Activities of Daily Living; PMS: Physical Self‐Maintenance Scale; GDS: Geriatric Depression Scale; Katz ADL: Katz assessment of Activities of Daily Living; IQCODE: Informant Questionnaire on Cognitive Decline in the Elderly; NPI‐Q: Neuropsychiatric Inventory Questionnaire; sMMSE: standardised Mini‐Mental State Examination; RUDAS: Rowland Universal Dementia Assessment Scale; DAD: Disability Assessment for Dementia.
Table Tests. Data tables by test

| Test | No. of studies | No. of participants |
| --- | --- | --- |
| 1 Dementia / not‐dementia | 3 | 136 |
| 2 MCI / not‐MCI | 1 | 100 |
| 3 Any cognitive syndrome / none | 1 | 100 |