
Memory rehabilitation for people with multiple sclerosis


Abstract

Background

Problems with cognitive function, especially memory, are common in people with multiple sclerosis (MS); they can affect the ability to complete everyday activities and have a negative impact on quality of life. In recent years, the number of randomised controlled trials (RCTs) of memory rehabilitation in MS has increased considerably. To guide clinicians and researchers, this review provides an overview of the effectiveness of memory rehabilitation for people with MS.

Objectives

To determine whether people with MS who received memory rehabilitation, compared with those who received no treatment or an active control, showed better immediate, intermediate, or longer‐term outcomes in:

1. memory functions,

2. other cognitive abilities, and

3. functional abilities, in terms of activities of daily living, mood, and quality of life.

Search methods

We searched CENTRAL, which includes ClinicalTrials.gov, the World Health Organization (WHO) International Clinical Trials Registry Platform, Embase and PubMed (MEDLINE), as well as the following databases (6 September 2020): CINAHL, LILACS, the NIHR Clinical Research Network Portfolio database, The Allied and Complementary Medicine Database, PsycINFO, and CAB Abstracts.

Selection criteria

We selected RCTs or quasi‐randomised trials of memory rehabilitation or cognitive rehabilitation for people with MS in which a memory rehabilitation treatment group was compared with a control group. Selection was first conducted independently and then confirmed through group discussion. We excluded studies that included participants whose memory deficits were not the result of MS, unless a subgroup of participants with MS with separate results could be identified.

Data collection and analysis

Eight review authors contributed to this update in terms of study selection, quality assessment, data extraction, and manuscript review. Where necessary, we contacted the investigators of primary studies for further information. Data analysis and synthesis were conducted in line with Cochrane methodology. We performed a 'best evidence' synthesis based on the methodological quality of the included primary studies. Outcomes were considered separately for the 'immediate' (within the first month after completing the intervention), 'intermediate' (one to six months), and 'longer‐term' (more than six months) time points.
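The three follow‐up windows defined above can be expressed as a small helper. This is only an illustrative sketch: the function name `classify_follow_up` is hypothetical, and the handling of the exact boundaries (treating exactly one month as 'intermediate' and exactly six months as 'intermediate') is an assumption, since the review text does not state how boundary cases were assigned.

```python
def classify_follow_up(months_after_intervention: float) -> str:
    """Map time since the end of the intervention to a follow-up window.

    Windows follow the review's wording: 'immediate' (within the first
    month), 'intermediate' (one to six months), 'longer-term' (more than
    six months). Boundary handling is an assumption of this sketch.
    """
    if months_after_intervention < 0:
        raise ValueError("follow-up cannot precede the end of the intervention")
    if months_after_intervention < 1:
        return "immediate"
    if months_after_intervention <= 6:
        return "intermediate"
    return "longer-term"


if __name__ == "__main__":
    for months in (0.5, 3, 12):
        print(months, classify_follow_up(months))
```

A helper of this kind would only be useful when sorting assessment points from individual trials into the three windows before pooling; it says nothing about how the review authors actually coded their data.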

Main results

We added 29 studies in this update, giving a total of 44 studies involving 2714 participants. Interventions included various memory rehabilitation techniques, such as computerised programmes and training in the use of internal and external memory aids. Control groups varied in format, ranging from assessment‐only groups and discussion and games groups to non‐specific cognitive rehabilitation and attention or visuospatial training. The risk of bias of the included studies was generally low, but eight studies were found to have high risk of bias related to certain aspects of their methodology.

In this abstract, we report outcomes at the intermediate time point only (i.e. between one and six months). We found a slight difference between groups for subjective memory (SMD 0.23, 95% CI 0.11 to 0.35; 11 studies; 1045 participants; high‐quality evidence) and quality of life (SMD 0.30, 95% CI 0.02 to 0.58; six studies; 683 participants; high‐quality evidence) favouring the memory rehabilitation group. There was a small difference between groups for verbal memory (SMD 0.25, 95% CI 0.11 to 0.40; six studies; 753 participants; low‐quality evidence) and information processing (SMD 0.27, 95% CI 0.00 to 0.54; eight studies; 933 participants; low‐quality evidence) favouring the memory rehabilitation group.

We found little to no difference between groups for visual memory (SMD 0.20, 95% CI ‐0.11 to 0.50; six studies; 751 participants; moderate‐quality evidence), working memory (SMD 0.16, 95% CI ‐0.09 to 0.40; eight studies; 821 participants; moderate‐quality evidence), or activities of daily living (SMD ‐0.06, 95% CI ‐0.36 to 0.24; four studies; 400 participants; high‐quality evidence).
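As a reading aid for the intermediate‐time‐point estimates reported above, the following sketch checks which 95% confidence intervals exclude zero and recovers an approximate standard error from each interval. It assumes the intervals are symmetric normal‐approximation intervals, so that SE ≈ (upper − lower) / (2 × 1.96); the class and function names are hypothetical. The activities of daily living estimate is entered as ‐0.06, following the Summary of findings table, which reports the mean as 0.06 standard deviations lower.

```python
from dataclasses import dataclass

Z_95 = 1.96  # normal quantile for a two-sided 95% interval (assumption)


@dataclass(frozen=True)
class Estimate:
    outcome: str
    smd: float
    ci_low: float
    ci_high: float

    def excludes_zero(self) -> bool:
        # A bound reported as 0.00 is treated as touching zero, not excluding it.
        return self.ci_low > 0 or self.ci_high < 0

    def approx_se(self) -> float:
        # Back-calculated from the CI width under a normal approximation.
        return (self.ci_high - self.ci_low) / (2 * Z_95)


# Intermediate time point (1 to 6 months), values as reported in the abstract.
INTERMEDIATE = [
    Estimate("subjective memory", 0.23, 0.11, 0.35),
    Estimate("quality of life", 0.30, 0.02, 0.58),
    Estimate("verbal memory", 0.25, 0.11, 0.40),
    Estimate("information processing", 0.27, 0.00, 0.54),
    Estimate("visual memory", 0.20, -0.11, 0.50),
    Estimate("working memory", 0.16, -0.09, 0.40),
    Estimate("activities of daily living", -0.06, -0.36, 0.24),
]

if __name__ == "__main__":
    for e in INTERMEDIATE:
        flag = "excludes 0" if e.excludes_zero() else "includes or touches 0"
        print(f"{e.outcome}: SMD {e.smd:+.2f}, SE ~ {e.approx_se():.3f}, CI {flag}")
```

Because the published bounds are rounded to two decimals, the back‐calculated standard errors are approximate, and a lower bound printed as 0.00 (information processing) cannot be classified with certainty from the rounded figures alone.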

Authors' conclusions

There is evidence to support the effectiveness of memory rehabilitation on some of the outcomes assessed in this review at the intermediate follow‐up time point. The evidence suggests that memory rehabilitation results in between‐group differences favouring the memory rehabilitation group at the intermediate time point for subjective memory, verbal memory, information processing, and quality of life outcomes, suggesting that memory rehabilitation is beneficial and meaningful to people with MS. There are differential effects of memory rehabilitation depending on the quality of the trials, with studies at high risk of bias inflating (positive) outcomes. Further robust, large‐scale, multicentre RCTs, with better quality of reporting, using ecologically valid outcome assessments (including health economic outcomes) assessed at longer‐term time points, are still needed to be certain about the effectiveness of memory rehabilitation in people with MS.


Memory rehabilitation in multiple sclerosis

Review question

Did people with multiple sclerosis (MS) who received memory rehabilitation, compared with those who received no treatment or an active control, show better immediate, intermediate, or longer‐term outcomes in:

1. memory functions,

2. other cognitive abilities, and

3. functional abilities, in terms of activities of daily living, mood, and quality of life?

Background

People with multiple sclerosis (MS) often have memory problems that can cause difficulties in their everyday lives. Memory rehabilitation is offered to help people cope with memory problems, improve their ability to carry out everyday activities, and increase independence by reducing forgetting. Such rehabilitation can include specific techniques and strategies to change the way a person tries to encode, store, or retrieve memories. However, it is unclear whether memory rehabilitation is effective in reducing forgetting or improving the performance of everyday activities. Historically, there were few good‐quality studies investigating the effectiveness of memory rehabilitation in people with MS, but large studies have been conducted recently. We therefore wanted to find out whether the evidence for the effectiveness of memory rehabilitation has changed since the previous version of the review.

Study characteristics

This review included 44 studies with 2714 participants who received various types of memory rehabilitation techniques; some studies used restorative techniques (e.g. computerised programmes) and others used compensatory techniques (e.g. memory aids such as diaries or calendars).

Key results and quality of the evidence

Considerable progress has been made since the last update of this review, and the results indicate that there is evidence to support the use of memory rehabilitation in people with MS. Participants who received memory rehabilitation reported better memory functioning and quality of life than those who did not, and these differences were found immediately after completing the intervention and some time afterwards. However, those who received memory rehabilitation did not appear to improve in terms of their anxiety symptoms or activities of daily living. Large, good‐quality studies have been added in this update on which to base the results, so the evidence supporting the effectiveness of memory rehabilitation is stronger than in the previous update. Nevertheless, large, good‐quality studies examining the long‐term effect of memory rehabilitation, and studies evaluating the cost‐effectiveness of memory rehabilitation in people with MS, are still needed.

Authors' conclusions

Implications for practice

In the last two decades increasing attention has been given to memory problems as a frequent complaint for people with multiple sclerosis (MS). Memory rehabilitation programmes are offered to some people with MS, but their effectiveness has been questionable. Small studies using a mixture of internal and external memory aids, errorless learning, and environmental manipulation have yielded positive results, with many of these using computer‐delivered interventions. Large randomised controlled trials (RCTs) mostly use group‐based and computer‐delivered interventions and have also yielded positive results, with improvements in outcomes often being maintained at follow‐up. The positive results in trials using computerised interventions have important implications for clinical practice in the current COVID‐19 pandemic, as cognitive rehabilitation may have to be delivered virtually for the foreseeable future. This review examined the evidence from RCTs and quasi‐RCTs and found evidence to suggest that memory rehabilitation is effective in improving memory performance on subjective and objective (verbal, visual, and working memory) assessments across immediate and intermediate follow‐ups, in improving quality of life in the immediate, intermediate, and longer term, and in reducing depression (but only immediately after the intervention). Memory rehabilitation did not have an effect, at any time point, on activities of daily living or anxiety. There appeared to be no indication of harm caused by the interventions, but several studies did not routinely report adverse effects.

Implications for research

The research base from which to draw inferences for clinical practice regarding the effectiveness of memory rehabilitation for MS has improved since the previous review (das Nair 2016b). RCTs tended to be of modest sample size, and mostly used impairment‐level outcome measures, which have limited value in assessing the functional effects of neurorehabilitation. These trials did not always adhere to the CONSORT guidelines (Moher 2001), which makes it difficult to get a full and true picture of the studies, and therefore limits the reader's ability to make an informed decision regarding the fidelity of their conclusions. Missing information from such reports also makes collating information for a meta‐analysis difficult. Furthermore, results from ‘positive’ trials may be difficult to implement in clinical practice if sufficient details about the actual intervention are not clearly spelt out. The TiDieR checklist and other more specific guidance for reporting of memory rehabilitation trials may help improve the quality of reporting trials of complex interventions (Hoffman 2014; Martin 2015). Given that memory rehabilitation is a complex intervention and four studies assessed the fidelity of the intervention, we would encourage trialists to consider intervention fidelity assessments in future memory rehabilitation trials. The results of this review indicate that more research is required to arrive at a definitive answer as to whether or not memory rehabilitation for MS is effective in reducing activity limitations or restrictions to participation. It also highlights the need for more well‐conceptualised, well‐executed, and well‐reported RCTs of memory rehabilitation that take into consideration the issues raised in this review.

Summary of findings

Summary of findings 1. Memory rehabilitation for people with multiple sclerosis

Memory rehabilitation for people with multiple sclerosis

Patient or population: people with multiple sclerosis
Settings: clinic and home‐based
Intervention: memory rehabilitation

Comparison: active control or no treatment

| Outcomes | Illustrative comparative risks* (95% CI): assumed risk (control) | Illustrative comparative risks* (95% CI): corresponding risk (memory rehabilitation), mean outcome in the intervention groups | Relative effect (95% CI) | No of participants (studies) | Quality of the evidence (GRADE) | Comments |
|---|---|---|---|---|---|---|
| Subjective memory measures ‐ intermediate; EMQ, MSNQ, CFQ, MFQ (a); follow‐up: 1 to 6 months | | 0.23 standard deviations higher (0.11 to 0.35 higher) | | 1045 (11 studies) | ⊕⊕⊕⊕ high | Immediate follow‐up: SMD 0.32 (0.05 to 0.58). Longer‐term follow‐up: SMD 0.16 (0.02 to 0.30) |
| Objective verbal memory measures ‐ intermediate; CVLT, AVLT, HVLT, VLT, SRT, MUSIC (a); follow‐up: 1 to 6 months | | 0.25 standard deviations higher (0.11 to 0.40 higher) | | 753 (6 studies) | ⊕⊕⊝⊝ low (b, c) | Immediate follow‐up: SMD 0.40 (0.22 to 0.58). Longer‐term follow‐up: SMD 0.13 (‐0.03 to 0.29) |
| Objective visual memory measures ‐ intermediate; BVMT‐R, SPART, CMT, ROCF; follow‐up: 1 to 6 months | | 0.20 standard deviations higher (0.11 lower to 0.50 higher) | | 751 (6 studies) | ⊕⊕⊕⊝ moderate (e) | Immediate follow‐up: SMD 0.42 (0.25 to 0.60). Longer‐term follow‐up: SMD 0.12 (‐0.13 to 0.37) |
| Objective working memory measures ‐ intermediate; PASAT, WAIS; follow‐up: 1 to 6 months | | 0.16 standard deviations higher (0.09 lower to 0.40 higher) | | 821 (8 studies) | ⊕⊕⊕⊝ moderate (f) | Immediate follow‐up: SMD 0.45 (0.18 to 0.72). Longer‐term follow‐up: SMD 0.04 (‐0.11 to 0.2) |
| Information processing ‐ intermediate; SDMT; follow‐up: 1 to 6 months | | 0.27 standard deviations higher (0.00 to 0.54 higher) | | 933 (8 studies) | ⊕⊕⊝⊝ low (g, h) | Immediate follow‐up: SMD 0.51 (0.19 to 0.82). Longer‐term follow‐up: SMD 0.21 (‐0.03 to 0.45) |
| Quality of life ‐ intermediate; MSIS, MSQOL, SF‐36, SF‐12, SWLS, EQ‐5D‐5L (a); follow‐up: 1 to 6 months | | 0.30 standard deviations higher (0.02 to 0.58 higher) | | 683 (6 studies) | ⊕⊕⊕⊕ high | Immediate follow‐up: SMD 0.42 (0.15 to 0.68). Longer‐term follow‐up: SMD 0.17 (0.02 to 0.32) |
| Activities of daily living ‐ intermediate; EADL (a); follow‐up: 1 to 6 months | | 0.06 standard deviations lower (0.36 lower to 0.24 higher) | | 400 (4 studies) | ⊕⊕⊕⊕ high | Immediate follow‐up: SMD 0.02 (‐0.26 to 0.29). Longer‐term follow‐up: SMD ‐0.11 (‐0.49 to 0.27) |

*The basis for the assumed risk (e.g. the median control group risk across studies) is provided in footnotes. The corresponding risk (and its 95% confidence interval) is based on the assumed risk in the comparison group and the relative effect of the intervention (and its 95% CI).
CI: confidence interval; SMD: standardised mean difference

GRADE Working Group grades of evidence
High quality ⊕⊕⊕⊕: further research is very unlikely to change our confidence in the estimate of effect.
Moderate quality ⊕⊕⊕⊝: further research is likely to have an important impact on our confidence in the estimate of effect and may change the estimate.
Low quality ⊕⊕⊝⊝: further research is very likely to have an important impact on our confidence in the estimate of effect and is likely to change the estimate.
Very low quality ⊕⊝⊝⊝: we are very uncertain about the estimate.

Please note: As per Cochrane guidelines, we only report seven outcomes here. Details of our other outcomes can be found in Table 1

a CMT: Contextual Memory Test; EAQ: Emotional Awareness Questionnaire; EMQ: Everyday Memory Questionnaire; HADS: Hospital Anxiety and Depression Scale; STAI: State Trait Anxiety Inventory; MSNQ: Multiple Sclerosis Neuropsychological Screening Questionnaire; MFQ: Memory Functioning Questionnaire; RBMT: Rivermead Behavioural Memory Test; CVLT: California Verbal Learning Test; AVLT: Auditory Verbal Learning Test; HVLT: Hopkins Verbal Learning Test; VLT: Verbal Learning Test; LNNB: Luria‐Nebraska Neuropsychological Battery; BRBNT: Brief Repeatable Battery of Neuropsychological Tests; GHQ: General Health Questionnaire; BDI: Beck Depression Inventory; BDI‐FS: Beck Depression Inventory‐Fast Screen; EADL: Extended Activities of Daily Living; MSIS: Multiple Sclerosis Impact Scale; FAMS: Functional Assessment of Multiple Sclerosis; MSQOL: Multiple Sclerosis Quality of Life; PASAT: Paced Auditory Serial Addition Test; SF‐36: 36‐Item Short Form Health Survey; SF‐12: 12‐Item Short Form Health Survey.
b 1 of 10 studies had possible risk of bias related to random sequence generation, and in 2 of the 10 it was unclear. Allocation concealment was a possible source of bias in 1 study, and unclear in 3 of the 10 studies. Blinding was a potential source of bias in 2 studies, and unclear in 1 of the 10 studies. Incomplete outcome data may have introduced bias in 1 study, and this was unclear in 3 of the 10 studies. Selective reporting may have introduced bias in 1 study.

Downgraded by 1 due to 95% confidence intervals including no effect, and the upper or lower confidence interval limit crossing an effect size of 0.5 in either direction.

b All or nearly all of the studies used a list‐learning task as an objective measure of verbal memory, which has poor ecological validity.
c 2 of the 6 studies showed unclear risk of bias relating to random sequence generation. 1 study had unclear potential risk of allocation concealment bias. 4 studies had potential risk of bias related to blinding. 3 studies had unclear risk of bias due to incomplete outcome data. 1 study had unclear risk of other bias.

e 2 of 6 studies showed unclear potential risk of bias related to random sequence generation. 1 study showed unclear potential risk of bias related to allocation concealment. 4 of 6 studies showed potential risk of bias related to blinding. 3 of 6 studies showed unclear risk of bias related to incomplete outcome data.

f 5 of 12 studies showed unclear potential risk of bias related to random sequence generation. 6 of 12 studies showed unclear risk of bias related to allocation concealment. 7 of 12 studies showed possible risk of bias related to blinding procedures. 1 study showed potential risk of bias related to incomplete data, and 3 of 12 studies were at unclear risk of bias. 1 study had potential risk of bias related to selective reporting.

g 3 of 8 studies showed unclear risk of bias related to random sequence generation. 1 study showed potential risk of bias related to allocation concealment, and 2 of 8 studies showed unclear risk of bias. 4 of 8 studies showed potential risk of bias related to blinding procedures, and 1 study showed unclear risk of bias. 3 of 8 studies showed unclear risk of bias related to incomplete data.

h Inconsistency of results; statistical heterogeneity > 50%.

1. Summary of findings continued

 

| Outcomes | Illustrative comparative risks (95% CI): assumed risk (control) | Illustrative comparative risks (95% CI): corresponding risk (memory rehabilitation), mean outcome in the intervention groups | Relative effect (95% CI) | No of participants (studies) | Quality of evidence (GRADE) | Comments |
|---|---|---|---|---|---|---|
| Subjective memory ‐ immediate; EMQ, MSNQ, CFQ, MFQ (a); follow‐up: within one month | | 0.32 standard deviations higher (0.05 to 0.58 higher) | | 568 (10 studies) | ⊕⊕⊕⊝ moderate (b) | SMD 0.32 (0.05 to 0.58) |
| Subjective memory ‐ longer‐term; EMQ, MSNQ, CFQ, MFQ (a); follow‐up: 6 months+ | | 0.16 standard deviations higher (0.02 to 0.30 higher) | | 775 (5 studies) | ⊕⊕⊕⊕ high | SMD 0.16 (0.02 to 0.30) |
| Verbal memory ‐ immediate; CVLT, AVLT, HVLT, VLT, SRT, MUSIC (a); follow‐up: within one month | | 0.40 standard deviations higher (0.22 to 0.58 higher) | | 922 (19 studies) | ⊕⊕⊝⊝ low (c, d) | SMD 0.40 (0.22 to 0.58) |
| Verbal memory ‐ longer‐term; CVLT, AVLT, HVLT, VLT, SRT, MUSIC (a); follow‐up: 6 months+ | | 0.13 standard deviations higher (0.03 lower to 0.29 higher) | | 619 (4 studies) | ⊕⊕⊕⊝ moderate (d) | SMD 0.13 (‐0.03 to 0.29) |
| Visual memory ‐ immediate; BVMT‐R, SPART, CMT, ROCF; follow‐up: within one month | | 0.42 standard deviations higher (0.25 to 0.60 higher) | | 799 (16 studies) | ⊕⊕⊕⊝ moderate (f) | SMD 0.42 (0.25 to 0.60) |
| Visual memory ‐ longer‐term; BVMT‐R, SPART, CMT, ROCF; follow‐up: 6 months+ | | 0.12 standard deviations higher (0.13 lower to 0.37 higher) | | 619 (4 studies) | ⊕⊕⊕⊕ high | SMD 0.12 (‐0.13 to 0.37) |
| Working memory ‐ immediate; PASAT, WAIS; follow‐up: within one month | | 0.45 standard deviations higher (0.18 to 0.72 higher) | | 655 (12 studies) | ⊕⊕⊝⊝ low (h, p) | SMD 0.45 (0.18 to 0.72) |
| Working memory ‐ longer‐term; PASAT, WAIS; follow‐up: 6 months+ | | 0.04 standard deviations higher (0.11 lower to 0.2 higher) | | 665 (5 studies) | ⊕⊕⊕⊕ high | SMD 0.04 (‐0.11 to 0.2) |
| Information processing ‐ immediate; SDMT; follow‐up: within one month | | 0.51 standard deviations higher (0.19 to 0.82 higher) | | 808 (15 studies) | ⊕⊕⊝⊝ low (j, p) | SMD 0.51 (0.19 to 0.82) |
| Information processing ‐ longer‐term; SDMT; follow‐up: 6 months+ | | 0.21 standard deviations higher (0.03 lower to 0.45 higher) | | 723 (5 studies) | ⊕⊕⊕⊝ moderate (l) | SMD 0.21 (‐0.03 to 0.45) |
| Depression (mood) ‐ immediate; GHQ, BDI, BDI‐FS, Chicago Multiscale Depression Inventory, HADS, EAQ, CES‐D, MADRS (a); follow‐up: within one month | | 0.34 standard deviations higher (0.15 to 0.53 higher) | | 853 (16 studies) | ⊕⊕⊕⊝ moderate (m) | SMD 0.34 (0.15 to 0.53) |
| Depression (mood) ‐ intermediate; GHQ, BDI, BDI‐FS, Chicago Multiscale Depression Inventory, HADS, EAQ, CES‐D, MADRS; follow‐up: 1 to 6 months | | 0.20 standard deviations higher (0.06 lower to 0.45 higher) | | 1003 (10 studies) | ⊕⊕⊕⊝ moderate (m) | SMD 0.20 (‐0.06 to 0.45) |
| Depression (mood) ‐ longer‐term; GHQ, BDI, BDI‐FS, Chicago Multiscale Depression Inventory, HADS, EAQ, CES‐D, MADRS (a); follow‐up: 6 months+ | | 0.15 standard deviations higher (0.04 lower to 0.34 higher) | | 891 (7 studies) | ⊕⊕⊕⊕ high | SMD 0.15 (‐0.04 to 0.34) |
| Anxiety (mood) ‐ immediate; GHQ, EAQ, STAI, HADS; follow‐up: within one month | | 0.29 standard deviations higher (0.01 lower to 0.59 higher) | | 178 (4 studies) | ⊕⊕⊕⊕ high | SMD 0.29 (‐0.01 to 0.59) |
| Anxiety (mood) ‐ intermediate; GHQ, EAQ, STAI, HADS; follow‐up: 1 to 6 months | | 0.16 standard deviations higher (0.15 lower to 0.46 higher) | | 502 (4 studies) | ⊕⊕⊕⊕ high | SMD 0.16 (‐0.15 to 0.46) |
| Anxiety (mood) ‐ longer‐term; GHQ, EAQ, STAI, HADS; follow‐up: 6 months+ | | 0.27 standard deviations higher (0.12 lower to 0.65 higher) | | 502 (4 studies) | ⊕⊕⊕⊕ high | SMD 0.27 (‐0.12 to 0.65) |
| Quality of life ‐ immediate; MSIS, MSQOL, SF‐36, SF‐12, SWLS, EQ‐5D‐5L (a); follow‐up: within one month | | 0.42 standard deviations higher (0.15 to 0.68 higher) | | 371 (8 studies) | ⊕⊕⊕⊕ high | SMD 0.42 (0.15 to 0.68) |
| Quality of life ‐ longer‐term; MSIS, MSQOL, SF‐36, SF‐12, SWLS, EQ‐5D‐5L (a); follow‐up: 6 months+ | | 0.17 standard deviations higher (0.02 to 0.32 higher) | | 687 (5 studies) | ⊕⊕⊕⊝ moderate (o) | SMD 0.17 (0.02 to 0.32) |
| Activities of daily living ‐ immediate; EADL (a); follow‐up: within one month | | 0.02 standard deviations higher (0.26 lower to 0.29 higher) | | 265 (4 studies) | ⊕⊕⊕⊕ high | SMD 0.02 (‐0.26 to 0.29) |
| Activities of daily living ‐ longer‐term; EADL (a); follow‐up: 6 months+ | | 0.11 standard deviations lower (0.49 lower to 0.27 higher) | | 369 (3 studies) | ⊕⊕⊕⊕ high | SMD ‐0.11 (‐0.49 to 0.27) |


a CMT: Contextual Memory Test; EAQ: Emotional Awareness Questionnaire; EMQ: Everyday Memory Questionnaire; HADS: Hospital Anxiety and Depression Scale; STAI: State Trait Anxiety Inventory; MSNQ: Multiple Sclerosis Neuropsychological Screening Questionnaire; MFQ: Memory Functioning Questionnaire; RBMT: Rivermead Behavioural Memory Test; CVLT: California Verbal Learning Test; AVLT: Auditory Verbal Learning Test; HVLT: Hopkins Verbal Learning Test; VLT: Verbal Learning Test; LNNB: Luria‐Nebraska Neuropsychological Battery; BRBNT: Brief Repeatable Battery of Neuropsychological Tests; GHQ: General Health Questionnaire; BDI: Beck Depression Inventory; BDI‐FS: Beck Depression Inventory‐Fast Screen; EADL: Extended Activities of Daily Living; MSIS: Multiple Sclerosis Impact Scale; FAMS: Functional Assessment of Multiple Sclerosis; MSQOL: Multiple Sclerosis Quality of Life; PASAT: Paced Auditory Serial Addition Test; SF‐36: 36‐Item Short Form Health Survey; SF‐12: 12‐Item Short Form Health Survey.
b 1 of 10 studies had possible risk of bias related to random sequence generation, and in 2 of the 10 it was unclear. Allocation concealment was a possible source of bias in 1 study, and unclear in 3 of the 10 studies. Blinding was a potential source of bias in 2 studies, and unclear in 1 of the 10 studies. Incomplete outcome data may have introduced bias in 1 study, and this was unclear in 3 of the 10 studies. Selective reporting may have introduced bias in 1 study.

Downgraded by 1 due to 95% confidence intervals including no effect, and the upper or lower confidence interval limit crossing an effect size of 0.5 in either direction.

c 1 study had possible risk of bias related to random sequence generation, and in 5 of 19 studies this was unclear. Allocation concealment was potentially biased in 1 study, and unclear in 6 of 19 studies. Blinding was a potential source of bias in 7 studies. Incomplete outcome data may have biased 2 of 19 studies and was unclear in 6 of 19 studies. Selective reporting may have introduced bias in 1 study. There may have been other sources of bias in 1 study, and this was unclear in 1 study.
d All or nearly all of the studies used a list‐learning task as an objective measure of verbal memory, which has poor ecological validity.
f 5 of 16 studies showed unclear potential risk of bias related to random sequence generation. 6 of 16 studies showed unclear potential risk of bias related to allocation concealment. 7 of 16 studies showed potential risk of bias related to blinding. 1 study showed potential risk of bias related to incomplete outcome data, and 4 of the 16 studies showed unclear risk of bias. There may have been another source of bias in 1 study.

h 5 of 12 studies showed unclear potential risk of bias related to random sequence generation. 6 of 12 studies showed unclear risk of bias related to allocation concealment. 7 of 12 studies showed possible risk of bias related to blinding procedures. 1 study showed potential risk of bias related to incomplete data, and 3 of 12 studies were at unclear risk of bias. 1 study had potential risk of bias related to selective reporting.

j 5 of 15 studies showed unclear risk of bias related to random sequence generation. 6 of 15 studies showed unclear risk of bias related to allocation concealment. 8 of 15 studies showed potential risk of bias related to blinding procedures. 2 of 15 studies showed potential risk of bias related to blinding procedures, and 3 of 15 were at unclear risk of bias. 1 study showed potential risk of bias related to incomplete data. 1 study showed potential risk of bias related to other bias.

l 2 of 5 studies showed unclear risk of bias related to random sequence generation. 1 study showed potential risk of bias related to allocation concealment, and 1 study showed unclear risk of bias. 3 of 5 studies showed potential risk of bias related to blinding procedures, and 1 study showed unclear risk of bias. 1 study showed potential risk of bias related to incomplete data, and 1 study showed unclear risk of bias. 1 study showed potential risk of bias related to selective reporting.

m 2 of 16 studies showed potential risk of bias related to random sequence generation, and 3 of 16 studies showed unclear risk. 1 study showed potential risk of bias relating to allocation concealment, and 6 of 16 studies showed unclear risk of bias. 5 of 16 studies showed potential risk of bias relating to blinding procedures. 3 of 16 studies showed potential risk of bias relating to incomplete data, and 3 of 13 studies showed unclear risk of bias. 1 study showed potential risk of bias relating to selective reporting. 1 study showed potential risk of bias relating to other bias.

o 1 study showed unclear risk of bias related to random sequence generation, blinding procedures, and incomplete outcome data, as well as high risk of bias relating to allocation concealment. 1 study showed high risk of bias relating to blinding procedures, incomplete data, and selective reporting.

p Inconsistency of results; statistical heterogeneity > 50%.

Background

Description of the condition

Multiple sclerosis (MS) is an inflammatory disease of the central nervous system that can cause physical and cognitive disturbances. The prevalence of these cognitive problems, which include dysfunctions in memory, attention, speed of information processing, and executive functions, varies, with estimates of up to 70% (Julian 2011). Rao 1993 reported that impaired memory functions were evident in 40% to 60% of people with MS. Impairments in cognitive functions are also related to low mood (Chiaravalloti 2008; Gilchrist 1994), and have the potential to hamper functions related to activities of daily living (ADL) (Kalmar 2008; Langdon 1996).

Description of the intervention

Cognitive rehabilitation is a specialised facet of neuropsychological rehabilitation that assists in the development of functional independence and adjustment of individuals with cognitive problems through targeted intervention or focused stimulation (Robertson 1993). Robertson 2008 defined cognitive rehabilitation as a "structured, planned experience derived from an understanding of brain function which ameliorates dysfunctional cognitive and brain processes caused by disease or injury and improves everyday life function" (p. 565). Memory rehabilitation is a major component of the management of people with memory problems and is implemented either as part of a cognitive rehabilitation programme or as a stand‐alone intervention, depending on the needs and neuropsychological profile of the patient, or clinical services available.

How the intervention might work

There is uncertainty about the precise mechanisms by which memory rehabilitation interventions work. However, it is widely believed that they provide people with knowledge of, and information about, their memory problems by teaching them to use internal and external memory aids, different strategies to pay attention, and alternative ways of encoding, storing, and retrieving information. Targeted, repeated stimulation of certain brain areas using drill‐and‐practice cognitive exercises is thought to trigger the activation of neural networks. For group‐based interventions, the therapeutic effects of being with others with similar problems may also help (Carr 2014; das Nair 2013; Klein 2019). Some of these behavioural strategies (referred to as 'restitution' or 'compensation') are believed to map onto the neural networks engaged in performing memory functions.

Why it is important to do this review

Studies have examined the effectiveness of memory rehabilitation using different methods. Single‐case and small‐group studies have reported positive results of memory rehabilitation, but the results obtained from randomised controlled trials (RCTs) and some systematic reviews have been less positive and reported inconclusive evidence. Some reviews (for example Cicerone 2005; Cicerone 2011; Cicerone 2019) have concluded that there is compelling evidence for memory strategy training with participants with mild memory problems, that the use of external memory aids may be beneficial for people with moderate to severe memory problems, and that errorless learning may be effective for those with severe memory impairments (albeit with limited generalisability to new tasks or overall memory problems). Cicerone 2019 also suggests that group‐based interventions may be considered as part of a comprehensive neuropsychological rehabilitation of memory deficits. However, these reviews focused mainly on people with traumatic brain injury. Cochrane Reviews by Majid 2000 and das Nair 2016a found insufficient evidence to support or refute the effectiveness of memory rehabilitation following stroke. Some reviews have focused on generic psychological interventions for people with MS (Thomas 2006), or neuropsychological interventions for people with MS (Rosti‐Otajärvi 2014); however, these were not specific to memory rehabilitation. The Thomas 2006 review did not consider grey literature and was unable to draw any "definite conclusions". The Rosti‐Otajärvi 2014 review focused on neuropsychological rehabilitation across multiple cognitive domains, as well as associated health‐related factors and emotional well‐being. The Goverover 2018b review was similar in that it focused on cognitive rehabilitation in six cognitive domains: attention, learning and memory, processing speed and working memory, executive functioning, metacognition, or nonspecific/combined.
The current systematic review is focused solely on the effectiveness of memory rehabilitation for people with MS; databases were searched that were not searched as part of the Rosti‐Otajärvi 2014 or Goverover 2018b reviews, and studies are included that were not in these reviews. This is an update of the Cochrane Review ‘Memory rehabilitation for people with multiple sclerosis’ (first published in the Cochrane Library 2012, Issue 3; das Nair 2012).

Objectives

To determine whether people with multiple sclerosis (MS) who received memory rehabilitation compared to those who received no treatment, or an active control, showed better immediate, intermediate, or longer‐term outcomes in their:

  1. memory functions,

  2. other cognitive abilities, and

  3. functional abilities, in terms of activities of daily living, mood, and quality of life.

Methods

Criteria for considering studies for this review

Types of studies

For inclusion in the review, we sought randomised and quasi‐randomised controlled trials, as defined by the Cochrane Handbook for Systematic Reviews of Interventions (Higgins 2019), and the pre‐cross‐over component of randomised cross‐over trials with people with MS, in which a memory treatment was compared to a control. Where papers were based on the same sample, or subset of a larger sample, we included only the study with the full sample to avoid double counting. If a study was available through both grey literature (for example conference abstract) and a peer‐reviewed publication, we used the peer‐reviewed publication.

Types of participants

Trials included in this review were limited to those with people with MS (including relapsing remitting, secondary progressive, and primary progressive MS). We excluded trials with participants whose memory deficits were the result of traumatic brain injury, brain tumour, stroke, epilepsy, or any other neurological condition, unless we could define a subgroup of people with MS of at least 75% for which there were separate data. Included studies based a diagnosis of MS on well‐established diagnostic criteria, for example Paty 1988 and Poser 1983, and revised versions of the McDonald criteria (Polman 2005; Polman 2011; Thompson 2017). We did not define the type of memory deficits participants needed to have in advance, because we assumed that those people with MS who were given treatment for impaired memory had memory deficits. We placed no restrictions on the type of memory deficits participants reported.

Types of interventions

We included trials in which there was a comparison between a treatment group that received memory rehabilitation strategies, and a control group that received either a comparable standard of treatment (active control) or no memory intervention. We considered rehabilitation to take place over more than a single session; therefore, we did not consider laboratory‐based experiments (such as single‐session list‐recall or mnemonic strategy training) as rehabilitation. Control groups needed to have people with MS, or a subgroup of people with MS amongst those with other diagnoses, for whom separate data were available. We considered memory rehabilitation to be any attempt to modify memory function by means of drill‐and‐practice, or by the use of internal and/or external memory aids, or by teaching people with MS strategies to cope with their memory problems. We did not include pharmacological studies.

Types of outcome measures

We included trials in which the intervention group either received memory rehabilitation or comprehensive cognitive rehabilitation with a memory component. We considered all trials that met the listed inclusion criteria and did not discriminate based on the type of memory outcomes or other cognitive outcomes they used. We considered memory outcomes to be any questionnaire or test that measures general memory or a specific domain such as verbal memory. The nine outcomes listed below were decided before the analysis was conducted to avoid bias.

Primary outcomes

Primary outcomes were measures of the extent of memory problems in everyday life. There are several ways in which this is assessed in clinical practice and research, but we only included measures that directly assessed this construct. Where multiple tests were used to assess the same construct, we followed a hierarchy that we developed prior to data analysis. We included the following commonly used tests.

  1. For subjective reports of memory: we considered Everyday Memory Questionnaire (EMQ) (Sunderland 1983), over the Cognitive Failures Questionnaire (Broadbent 1982), over the Subjective Memory Questionnaire (Davis 1995), over the Memory Assessment Clinics Questionnaire (Crook 1992). If more than one questionnaire was used, we used the following hierarchy: memory problems in daily life, over general forgetting, over domain‐specific questions. If a questionnaire was used that was not in this hierarchy, we arrived at a consensus through discussion prior to data extraction to avoid bias.

  2. For objective verbal measures of memory: we considered California Verbal Learning Test (CVLT‐II) (Delis 2000) over Selective Reminding Test (SRT) (Buschke 1973), over Doors and People Test (Baddeley 1994). For neuropsychological test batteries, we used verbal domain‐specific scores over composite scores.

  3. For objective visual measures of memory: we considered Brief Visuospatial Memory Test – Revised (BVMT‐R) (Benedict 1996), 10/36 Spatial Recall Test (SPART) (Rao 1990), Contextual Memory Test (CMT) (Toglia 2004), Rey‐Osterrieth Complex Figure Test (ROCF) (Rey 1941 and Osterrieth 1944). For neuropsychological test batteries, we used visual domain‐specific scores over composite battery scores.

  4. For objective measures of working memory: we considered Rivermead Behavioural Memory Test (RBMT) (Wilson 1985 or newer versions of this test), over Wechsler Memory Scale (WMS) (Wechsler 1997 or newer versions of this test), over Cambridge Test of Prospective Memory (Wilson 2005), over Doors and People Test (Baddeley 1994).

  5. For information processing measures: we considered the Symbol Digit Modalities Test (SDMT) (Smith 1973) over other measures, because we were aware that this is one of the most frequently used tests of information processing in MS research (Benedict 2017).

Where studies included more than one test for each outcome group, we used a hierarchy based on the tests' degree of sensitivity to assess everyday memory problems and the tests’ ecological validity. If we were unsure about which outcome measure to consider in the analysis, the review authors reached a consensus through discussion on which measure to treat as the primary outcome. This was done before the statistical analyses were conducted, to minimise bias.
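The hierarchy rule described above can be illustrated with a minimal sketch. It assumes each study is represented by the set of measures it reported; the abbreviations CFQ, SMQ, and MAC-Q and the function name are hypothetical shorthand introduced here for illustration, not taken from the review.

```python
# Illustrative hierarchy for subjective memory reports, ordered as in the list above.
# The abbreviations other than EMQ are hypothetical shorthand for this sketch.
SUBJECTIVE_MEMORY_HIERARCHY = ["EMQ", "CFQ", "SMQ", "MAC-Q"]


def select_measure(reported, hierarchy):
    """Return the highest-ranked measure that a study reported, or None.

    None signals that no listed measure was used, which in the review
    would be resolved by consensus among the review authors.
    """
    for measure in hierarchy:
        if measure in reported:
            return measure
    return None


# A study reporting both CFQ and SMQ would contribute its CFQ data.
chosen = select_measure({"SMQ", "CFQ"}, SUBJECTIVE_MEMORY_HIERARCHY)
```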

Secondary outcomes

  1. Mood ‐ depression, such as the General Health Questionnaire (GHQ) (Goldberg 1988), Hospital Anxiety and Depression Scale (HADS) (Zigmond 1983), Beck Depression Inventory‐Fast Screen (Beck 2003). General mood outcomes such as the GHQ were included in both the depression and anxiety scales of the mood outcome.

  2. Mood ‐ anxiety, such as the General Health Questionnaire (GHQ) (Goldberg 1988), Hospital Anxiety and Depression Scale (HADS) (Zigmond 1983), State Trait Anxiety Inventory (STAI) (Spielberger 1983).

  3. Functional abilities, such as the Functional Independence Measure (FIM) (Uniform Data System for Medical Rehab 1993), Functional Assessment Measure (FAM) (Hal 1997), Nottingham Extended Activities of Daily Living (EADL) (Nouri 1987).

  4. Quality of life, such as the Multiple Sclerosis Impact Scale (MSIS) (Hobart 2001), World Health Organization Quality of Life assessment (WHO‐QoL) (The WHOQOL Group 1993), 36‐item Short Form Health Survey (SF‐36) (Ware 2001).

We also considered non‐standardised measures, such as return to work and goal attainment, if studies had included these as a measure of outcome. If more than one of these scales was reported for each domain, we used the first scale in the list.

We categorised all outcomes into three separate time‐points: “immediate”, “intermediate”, and “longer‐term” and conducted separate analyses for each of these. We defined immediate as assessments conducted within the first month after completing the intervention, intermediate as assessments conducted between one and six months later, and longer‐term as any assessments conducted more than six months later.
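As a minimal sketch of this categorisation, the function below maps a follow‐up interval to one of the three time‐points. The handling of the exact boundaries (an assessment at precisely one or six months) is an assumption made for illustration; the review does not state how boundary cases were treated.

```python
def classify_time_point(months_after_intervention: float) -> str:
    """Map a follow-up interval (in months) to a time-point category.

    Assumption for this sketch: assessments at exactly 1 month count as
    immediate, and assessments at exactly 6 months count as intermediate.
    """
    if months_after_intervention < 0:
        raise ValueError("follow-up cannot precede the end of the intervention")
    if months_after_intervention <= 1:
        return "immediate"
    if months_after_intervention <= 6:
        return "intermediate"
    return "longer-term"
```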

Search methods for identification of studies

We conducted an electronic search with no restriction, and two review authors (LT, RdN) identified all potential studies.

Electronic searches

The Information Specialist involved in the previous update was not available, so all searches were conducted by the review authors (LT, RdN). We searched the Cochrane Central Register of Controlled Trials (CENTRAL) (2 June 2015 to 6 September 2020) which contains records from the following databases.

  • MEDLINE (PubMed).

  • Embase (Embase.com).

  • Cumulative Index to Nursing and Allied Health Literature (CINAHL) (EBSCO host).

  • ClinicalTrials.gov.

  • World Health Organization (WHO) International Clinical Trials Registry Portal (http://apps.who.int/trialsearch/).

The keywords used to search for studies for this review are listed in Appendix 1.

We also searched the following databases.

  • The NIHR Clinical Research Network database (2 June 2015 to 6 September 2020).

  • PsycINFO (2 June 2015 to 6 September 2020).

  • Allied and Complementary Medicine Database (AMED) (2 June 2015 to 6 September 2020).

  • Latin American and Caribbean Health Science Information Database (LILACS) (Bireme) (2 June 2015 to 6 September 2020).

  • CAB Abstracts (2 June 2015 to 6 September 2020).

Searching other resources

We citation tracked all primary study articles and scanned reference lists from book chapters and review articles. We also examined studies identified by the Rosti‐Otajärvi 2014 and Thomas 2006 MS reviews for inclusion. We only handsearched the reference lists of identified studies, not the full scientific journals, as until the early 1990s cognitive impairments were not universally recognised as a common complaint in MS (Rao 1991), and most RCTs have been reported (or updated) on electronic databases or journals. Furthermore, we would have found relevant trials from the search of the CENTRAL database, for which handsearching is carried out periodically, and we did not wish to duplicate this effort. Where necessary, we contacted authors of relevant trials to enquire whether their registered trials had been published, and to solicit more data where data required for the meta‐analysis were not presented in the published paper in a format that could be used.

We accessed grey literature by searching GreyNet (http://www.greynet.org/) and the British Library’s EThOS database (http://ethos.bl.uk/Home.do). Grey literature is "a field in Library and Information Science that deals with the production, distribution, and access to multiple document types produced on all levels of government, academics, business, and organisation in electronic and print formats not controlled by commercial publishing i.e. where publishing is not the primary activity of the producing body" (GreyNet 2011).

Data collection and analysis

Selection of studies

One review author (RdN) developed the search strategy in consultation with a senior librarian and the Cochrane Multiple Sclerosis and Rare Diseases of the CNS Group. Another review author (LT) evaluated abstracts of the studies obtained by this search strategy and identified trials for inclusion in the review using four inclusion criteria (types of trials, participants, interventions, and outcome measures). Five review authors (RdN, NE, DW, JMM, LS) cross‐checked the search strategy, independently appraised the protocol, and confirmed the inclusion and exclusion of studies.

We eliminated articles based on the following exclusion criteria hierarchy:

  • not MS, or a mixed‐aetiology group without at least 75% of the sample being people with MS,

  • not an RCT or quasi‐RCT,

  • not an adult population,

  • not a memory rehabilitation study, or did not have a separate memory component if within the context of a larger "cognitive rehabilitation" (or "cognitive retraining" or "neuropsychological rehabilitation") study, or

  • not a rehabilitation intervention study (not more than one session).

Data extraction and management

LT and another review author (NE, DW, JMM, or LS) independently assessed the methodological quality of each of the selected trials and rated them according to the guidelines of The Cochrane Collaboration. In case of disagreement, a third review author (RdN) arbitrated, and a verdict was reached. Our main considerations were whether participant allocation had been random and adequately concealed, and whether outcome assessments were performed blind to group allocation. We conducted the review using the Cochrane Review Manager software version 5.4.1 (RevMan 2020). The data extraction tool employed by the das Nair and Lincoln Cochrane review (das Nair 2016b) was used in this study and is therefore not replicated here.

Assessment of risk of bias in included studies

Review author LT and another review author (NE, DW, JMM, or LS) independently graded the included trials and completed the risk of bias table as described in the Cochrane Handbook for Systematic Reviews of Interventions (Higgins 2019).

The table includes the following domains.

  • Random sequence generation

  • Allocation concealment

  • Blinding (of participants, personnel, and outcome assessors)

  • Incomplete outcome data

  • Selective outcome reporting

  • Other sources of bias

On the basis of the information provided in the studies or by the authors of the primary studies, five review authors (LT, NE, DW, JMM, LS) independently judged each of these domains as being low or high risk of bias, or unclear if information was insufficient. Any disagreements were arbitrated by another review author (RdN). As review authors working in the field of memory rehabilitation, we are familiar with the studies published in this area, and thus we could not be blinded to the names of the authors, institutions, or the publishing journal of the included trials. We made an evaluation of the overall risk of bias, based on the relative importance of the various domains listed. In addition to the risk of bias table, we used the GRADE approach to assessing quality of studies (GRADE Working Group 2004). This was completed across outcomes and is found in the summary of findings table. This approach allows for judgements to be made about the quality of the studies included in each outcome.

Measures of treatment effect

We planned to use odds ratio (OR) with 95% confidence intervals (CIs) for binary outcomes if reported. We used standardised mean difference (SMD) with 95% CIs for the continuous outcomes.
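To make the SMD concrete, the following is a minimal sketch of one common formulation, Hedges' adjusted g, with an approximate 95% CI. The review does not state which SMD variant was computed, so the small‐sample correction and the standard error formula used here are assumptions, and the function name is hypothetical.

```python
import math


def hedges_g(mean_e, sd_e, n_e, mean_c, sd_c, n_c):
    """Standardised mean difference (Hedges' adjusted g) with a 95% CI.

    Assumed formulation for this sketch: pooled SD across both groups,
    small-sample correction 1 - 3 / (4N - 9), and a normal-approximation CI.
    """
    n_total = n_e + n_c
    pooled_sd = math.sqrt(
        ((n_e - 1) * sd_e**2 + (n_c - 1) * sd_c**2) / (n_total - 2)
    )
    correction = 1 - 3 / (4 * n_total - 9)
    g = (mean_e - mean_c) / pooled_sd * correction
    se = math.sqrt(n_total / (n_e * n_c) + g**2 / (2 * (n_total - 3.94)))
    return g, (g - 1.96 * se, g + 1.96 * se)
```

Because the difference is divided by the pooled SD, studies that measure the same construct on different scales can be expressed in common units.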

Unit of analysis issues

We included parallel‐group, cluster‐randomised, and cross‐over RCTs, and quasi‐RCTs, and included the data from all these types of studies for the meta‐analysis. For cross‐over studies (as mentioned under Types of studies section), we only included the pre‐cross‐over phase of these trials. We did not combine the first and second phases of the cross‐over studies because of uncertainty about the carry‐over effects in such trials, given that they are psychological interventions, where the washout period is difficult to determine.

We included trials with more than two intervention groups and analysed them by pooling together the data on all the treatment groups (if appropriate) and compared them with the control group. If there was more than one control group, the results from the control groups were pooled together and compared with treatment.
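One way to pool two arms into a single group is the standard formula for combining sample sizes, means, and SDs. The sketch below assumes that formula; the review does not specify the exact procedure that was used, and the function name is hypothetical.

```python
import math


def combine_groups(n1, m1, sd1, n2, m2, sd2):
    """Merge two groups into one, returning (n, mean, sd).

    The combined SD accounts for both the within-group variances and the
    distance between the two group means.
    """
    n = n1 + n2
    mean = (n1 * m1 + n2 * m2) / n
    variance = (
        (n1 - 1) * sd1**2
        + (n2 - 1) * sd2**2
        + (n1 * n2 / n) * (m1 - m2) ** 2
    ) / (n - 1)
    return n, mean, math.sqrt(variance)
```

For more than two arms, the function can be applied repeatedly, merging the running result with the next arm.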

We conducted separate analyses for the various outcomes and for the three different time‐points (i.e. immediate, intermediate, and longer‐term).

Dealing with missing data

Where data were unavailable from, or unclear in, the reports, we contacted the corresponding author of the studies in question for further information. We assessed the rates of attrition and missing data from the included studies (where available) and explored how these may have affected the results of the studies. If following several attempts to contact the study author we had not received a response, the missing data were not included in the analysis. Furthermore, if standard deviations (SDs) were not available from the papers, these values were imputed using methods specified in section 16.1.3.1 of the Cochrane Handbook for Systematic Reviews of Interventions (Higgins 2019).

Assessment of heterogeneity

We considered heterogeneity by comparing the distribution of important participant factors between trials (age, gender, type of MS), and trial factors (sequence generation, allocation concealment, blinding, losses to follow‐up). We employed the I² statistic to statistically assess heterogeneity (Higgins 2019; Huedo‐Medina 2006). We further scrutinised the studies to explore the reasons for the heterogeneity if the I² statistic was 50% or higher.

Assessment of reporting biases

We considered reporting bias by conducting an exhaustive search of the literature that included but was not limited to the CENTRAL database, Embase, PsycINFO, LILACS, grey literature, reference lists of included studies and relevant reviews. We also considered reporting bias by deciding what outcomes would be assessed and reported before the meta‐analysis was conducted.

Data synthesis

We consulted the Cochrane Handbook for Systematic Reviews of Interventions to plan the data synthesis (Higgins 2019), and followed the procedures outlined therein. As most psychological and neuropsychological outcome measures in memory rehabilitation tend to be ordinal‐level measures, we treated these as continuous data (as recommended by Higgins 2019). The SMD was used as a summary statistic, using a random‐effects model, because we predicted that multiple trials would use various outcome measures to assess memory and because of the heterogeneity of sampling.

If low scores represented a better outcome, the valence of the score was changed from positive to negative. In situations where studies combined scores from scales in which high scores are in some instances good outcomes and in some instances poor outcomes, the signs of the discrepant scores were reversed to keep them consistent. For the meta‐analysis, we only considered data that we deemed to be similar or comparable enough to meaningfully pool, based on the outcome measures used.
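A minimal sketch of this sign reversal follows. It assumes each scale is tagged with its direction, and the function name is hypothetical. Only the group means are negated; the SDs are unaffected by a change of sign.

```python
def align_direction(mean: float, higher_is_better: bool) -> float:
    """Return the mean oriented so that larger values indicate a better outcome."""
    return mean if higher_is_better else -mean


# Example: on a questionnaire where lower scores mean fewer memory problems,
# both group means are negated before the SMD is computed.
treatment_mean = align_direction(12.5, higher_is_better=False)
control_mean = align_direction(15.0, higher_is_better=False)
```

After alignment, a positive difference between treatment and control favours the treatment group regardless of how the original scale was scored.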

Subgroup analysis and investigation of heterogeneity

We planned subgroup analyses where at least two trials had separate data available for people with different subtypes of MS. Where significant heterogeneity was observed, we attempted to determine the causes of heterogeneity and explain this in our discussion. We did not plan on conducting subgroup analyses based on heterogeneity.

Sensitivity analysis

We considered sensitivity analyses to assess the impact of study quality (whether there was a difference between studies using an intention‐to‐treat analysis and an on‐treatment analysis) where data needed to perform such analyses were available from the included papers. We also considered a sensitivity analysis to assess the influence of methodological quality on the intervention effect for each outcome by comparing the outcomes of those trials with low risk of bias with the outcomes of all the included studies. Following the Cochrane Handbook for Systematic Reviews of Interventions (Higgins 2019), we made only informal comparisons (see Table 2), and did not produce individual forest plots for each sensitivity analysis, but provided a summary table. Sensitivity analysis was also conducted to assess the impact of imputing the SD values as advised in the Cochrane Handbook for Systematic Reviews of Interventions (Higgins 2019).

Table 2. Sensitivity analysis

| Outcome | No. of studies | No. of participants | Effect size, SMD (95% CI) | Heterogeneity (I²) | Test for overall effect |
|---|---|---|---|---|---|
| Subjective memory ‐ immediate | 2 | E = 127, C = 117 | 0.03 [‐0.24, 0.31] | 10% | Z = 0.22 (P = 0.82) |
| Subjective memory ‐ intermediate | 6 | E = 396, C = 343 | 0.25 [0.11, 0.40] | 0% | Z = 3.39 (P = 0.0007) |
| Subjective memory ‐ longer‐term | 4 | E = 325, C = 294 | 0.19 [0.03, 0.36] | 0% | Z = 2.33 (P = 0.03) |
| Verbal memory ‐ immediate | 5 | E = 100, C = 96 | 0.72 [0.24, 1.19] | 59% | Z = 2.96 (P = 0.003) |
| Verbal memory ‐ intermediate | 2 | E = 254, C = 209 | 0.22 [0.03, 0.40] | 0% | Z = 2.32 (P = 0.02) |
| Verbal memory ‐ longer‐term | N/A | | | | |
| Visual memory ‐ immediate | 5 | E = 100, C = 94 | 0.27 [‐0.01, 0.56] | 0% | Z = 1.86 (P = 0.06) |
| Visual memory ‐ intermediate | 2 | E = 251, C = 209 | ‐0.11 [‐0.29, 0.08] | 0% | Z = 1.14 (P = 0.25) |
| Visual memory ‐ longer‐term | N/A | | | | |
| Working memory ‐ immediate | 2 | E = 49, C = 42 | 0.46 [‐0.68, 1.59] | 84% | Z = 0.79 (P = 0.43) |
| Working memory ‐ intermediate | 4 | E = 284, C = 236 | ‐0.06 [‐0.28, 0.15] | 11% | Z = 0.59 (P = 0.56) |
| Working memory ‐ longer‐term | 2 | E = 229, C = 193 | ‐0.02 [‐0.21, 0.17] | 0% | Z = 0.18 (P = 0.86) |
| Information processing ‐ immediate | 4 | E = 131, C = 120 | 0.29 [‐0.04, 0.62] | 40% | Z = 1.72 (P = 0.05) |
| Information processing ‐ intermediate | 4 | E = 294, C = 245 | 0.02 [‐0.14, 0.19] | 0% | Z = 0.28 (P = 0.78) |
| Information processing ‐ longer‐term | N/A | | | | |
| Depression ‐ immediate | 4 | E = 93, C = 87 | 0.55 [0.03, 1.07] | 65% | Z = 2.07 (P = 0.04) |
| Depression ‐ intermediate | 6 | E = 392, C = 350 | 0.29 [‐0.10, 0.67] | 79% | Z = 1.45 (P = 0.15) |
| Depression ‐ longer‐term | 4 | E = 328, C = 270 | 0.14 [‐0.20, 0.48] | 63% | Z = 0.80 (P = 0.42) |
| Anxiety ‐ immediate | N/A | | | | |
| Anxiety ‐ intermediate | 3 | E = 257, C = 214 | 0.29 [0.11, 0.48] | 0% | Z = 3.11 (P = 0.002) |
| Anxiety ‐ longer‐term | 3 | E = 255, C = 193 | 0.27 [‐0.12, 0.65] | 43% | Z = 1.37 (P = 0.17) |
| Quality of life ‐ immediate | 4 | E = 101, C = 96 | 0.49 [0.06, 0.91] | 54% | Z = 2.25 (P = 0.02) |
| Quality of life ‐ intermediate | 5 | E = 340, C = 317 | 0.31 [‐0.01, 0.62] | 64% | Z = 1.90 (P = 0.06) |
| Quality of life ‐ longer‐term | 3 | E = 295, C = 259 | 0.12 [‐0.05, 0.30] | 5% | Z = 1.37 (P = 0.17) |
| Activities of daily living ‐ immediate | N/A | | | | |
| Activities of daily living ‐ intermediate | 2 | E = 100, C = 86 | ‐0.13 [‐0.60, 0.33] | 37% | Z = 0.56 (P = 0.57) |
| Activities of daily living ‐ longer‐term | 2 | E = 100, C = 86 | ‐0.33 [‐0.63, ‐0.03] | 0% | Z = 2.18 (P = 0.03) |

E: Experimental; C: Control; SMD: Standardised mean difference.

Summary of findings and assessment of the quality of the evidence

We used the GRADE approach to interpret findings and present them in a summary of findings table, as advised in the Cochrane Handbook for Systematic Reviews of Interventions (Higgins 2019). We considered seven key outcomes, each at one specific time point at which it was measured, to be important, and thus present them in the summary of findings table.

The GRADE approach allows the quality of the evidence to be assessed clearly and without bias using four possible ratings: high, moderate, low and very low (Schünemann 2013). This rating system measures the degree of confidence that the true effect is close to that of the estimate of the effect, with high indicating very confident and very low indicating little confidence in the effect estimate. There are several factors that can lead to the downgrading of evidence such as risk of bias in included studies, inconsistency in results, and imprecision of effect estimates. If an outcome was downgraded, the reasons for this are detailed in the footnotes below the summary of findings table.

Summary of findings and assessment of the certainty of the evidence

We used the GRADE approach to interpret findings and present them in a 'Summary of findings' table, as advised in the Cochrane Handbook (Higgins 2019). We considered all of our outcomes, at each time point at which they were measured, to be important, and thus present them in the 'Summary of findings' table.

The GRADE approach allows the quality of the evidence to be assessed clearly and without bias using four possible ratings: high, moderate, low and very low (Schünemann 2013). This rating system measures the degree of confidence that the true effect is close to that of the estimate of the effect, with high indicating very confident and very low indicating little confidence in the effect estimate. There are several factors that can lead to the downgrading of evidence such as risk of bias in included studies, inconsistency in results, and imprecision of effect estimates. If an outcome is downgraded, the reasons for this are detailed in the footnotes below the 'Summary of findings' table.

Results

Description of studies

Results of the search

We identified a total of 29 studies using the above‐mentioned search strategy. Fifteen studies from the previous review were added to the 29 new studies in the final analysis. Please see Figure 1.


Flow diagram showing article screening process

Included studies

Forty‐four studies, comprising 2714 participants in total, met the inclusion criteria for this review (Campbell 2016; Carr 2014; Chiaravalloti 2005; Chiaravalloti 2013; Chiaravalloti 2019a; Chiaravalloti 2019b; Chmelařová 2020; Arian Darestani 2020; das Nair 2012; Naeeni Davarani 2020; De Luca 2019; Ernst 2015; Ernst 2018; Gich 2015; Goodwin 2020; Goverover 2018a; Hancock 2015; Hanssen 2015; Hildebrandt 2007; Huiskamp 2016; Impellizzeri 2020; Lincoln 2002; Lincoln 2020; Maggio 2020; Mani 2018; Mattioli 2016; Mendozzi 1998; Messinis 2017; Messinis 2020; Mousavi 2018a; Mousavi 2018b; Pedulla 2016; Perez‐Martin 2017; Pusswald 2014; Rahmani 2020; Rilo 2018; Shahpouri 2019; Solari 2004; Stuifbergen 2012; Stuifbergen 2018; Tesar 2005; Vilou 2020). The Charvet 2017, De Luca 2019, and Jønsson 1993 studies were included in the review, but excluded from the meta‐analysis because raw data were unattainable.

Twenty‐eight of the included studies were undertaken in Europe (Austria, Denmark, Germany, Italy, Norway, Spain, Greece, Czech Republic, the UK), seven were from Iran, and nine were from the USA. All the European studies recruited participants at hospital clinics or rehabilitation centres, with seven of these European studies recruiting from multiple centres (Goodwin 2020; Goverover 2018a; Lincoln 2020; Mattioli 2016; Messinis 2017; Perez‐Martin 2017; Solari 2004). The maximum number of recruitment sites used was 10 (Mattioli 2016). Seven of the USA studies recruited participants from both clinic and community settings, with two of these USA studies recruiting from multiple centres (Chiaravalloti 2019a; Stuifbergen 2018).

There were nine multicentre trials (Goodwin 2020; Lincoln 2020; Hancock 2015; Mattioli 2016; Messinis 2017; Messinis 2020; Perez‐Martin 2017; Solari 2004; Stuifbergen 2018). In terms of randomisation and stratification by site, Solari 2004, Messinis 2017, and Messinis 2020 used a site‐stratified schedule. Lincoln 2020 had a 6:5 randomisation ratio, stratified by site and minimised by type of MS. Stuifbergen 2018 used a closed envelope method but did not specify stratification. Chmelařová 2020, Perez‐Martin 2017, and Rahmani 2020 did not specify their method of randomisation. Chiaravalloti 2019a, Goodwin 2020, and Mattioli 2016 used random number generators, but did not specify stratification. Hancock 2015 used a block‐stratified randomisation procedure to ensure that equal numbers of each MS subtype were included in the intervention and control groups.

There were 35 single‐centre trials (Arian Darestani 2020Carr 2014Campbell 2016Charvet 2017Chiaravalloti 2005Chiaravalloti 2013Chiaravalloti 2019aChiaravalloti 2019bChmelařová 2020Naeeni Davarani 2020De Luca 2019Ernst 2015Ernst 2018Gich 2015Goverover 2018aHanssen 2015Hildebrandt 2007Huiskamp 2016Impellizzeri 2020Jønsson 1993Lincoln 2002Maggio 2020Mani 2018Mattioli 2016Mendozzi 1998Mousavi 2018aMousavi 2018bPedulla 2016Pusswald 2014Rahmani 2020Rilo 2018Shahpouri 2019Stuifbergen 2012Tesar 2005Vilou 2020). Five studies did not mention the method of generating the random schedule (Arian Darestani 2020Ernst 2015Mendozzi 1998Pedulla 2016Tesar 2005). One study reported that randomisation was quote: “performed by a lottery by the director of the rehabilitation centre” (Hanssen 2015). Four studies used quasi‐randomisation: Chiaravalloti 2005 used odd‐even random allocation, Hildebrandt 2007 and Pusswald 2014 allocated by alternating between intervention and control, and Arian Darestani 2020 quote: “divided [the participants] into control (n = 30) and experimental (n = 30) groups”. Six trials reported independent randomisation (Carr 2014Chiaravalloti 2013das Nair 2012Lincoln 2002Solari 2004Tesar 2005), and Jønsson 1993 and Stuifbergen 2012 used a closed‐envelope method. Mendozzi 1998 randomised the first 30 participants, and purposefully assigned the last 30 to balance age, gender, and education between groups; all data were included in our analysis. Gich 2015 stratified by level of cognitive impairment.

Participants were diagnosed with MS using the Poser criteria (Poser 1983) in seven studies, the McDonald criteria (McDonald 2001) in 18 studies, and the Schumacher criteria (Schumacher 1965) in one study (Jønsson 1993). Eighteen studies did not report the criteria used to diagnose MS, but merely stated that participants had clinically‐definite MS. Twenty‐six studies included participants with mixed types of MS (relapsing remitting MS (RRMS) and secondary progressive MS (SPMS) in Campbell 2016, das Nair 2012, Gich 2015, Lincoln 2002, Maggio 2020, Mendozzi 1998, Mousavi 2018b, Pedulla 2016, and Tesar 2005; and RRMS, SPMS, and primary progressive MS (PPMS) in Carr 2014, Charvet 2017, Chiaravalloti 2005, Chiaravalloti 2013, Chiaravalloti 2019b, Goverover 2018a, Hancock 2015, Hanssen 2015, Jønsson 1993, Impellizzeri 2020, Lincoln 2020, Perez‐Martin 2017, Rilo 2018, and Shahpouri 2019). Eight studies included participants with RRMS only (Ernst 2015; Ernst 2018; Hildebrandt 2007; Mani 2018; Mattioli 2016; Messinis 2017; Rahmani 2020; Vilou 2020). Two studies included participants with RRMS, SPMS, PPMS and progressive‐relapsing MS (Chiaravalloti 2019a; Huiskamp 2016), and one further study included these types as well as participants with benign MS and “unknown” types (Goodwin 2020). The type of MS was not reported in eight studies (Chmelařová 2020; Arian Darestani 2020; Naeeni Davarani 2020; De Luca 2019; Mousavi 2018a; Pusswald 2014; Solari 2004; Stuifbergen 2012).

The number of participants in the studies ranged from 16, in Huiskamp 2016, to 449, in Lincoln 2020, and the number of participants in treatment or control groups ranged from seven, in das Nair 2012 and Huiskamp 2016, to 245, in Lincoln 2020. Most participants were in their 40s. Varied gender ratios were reported, with the percentage of women ranging from 36.7%, in Impellizzeri 2020, to 100%, in Mani 2018 and Rahmani 2020. The participants had a minimum of elementary education in most studies, with the participants in Chiaravalloti 2019b having the highest number of years of education (16.07 in intervention, 16.46 in control). De Luca 2019 and Perez‐Martin 2017 had the lowest number of years of education (10.8 in intervention and 11.3 in control; and 10.2 in intervention and 11.6 in control, respectively). Six studies did not report education (Chmelařová 2020; Mousavi 2018a; Mousavi 2018b; Shahpouri 2019; Tesar 2005; Vilou 2020).

The groups were comparable on assessed baseline characteristics in 32 studies, and in the other studies where differences were observed, they were statistically corrected (Chiaravalloti 2005das Nair 2012Hancock 2015Hildebrandt 2007Huiskamp 2016Jønsson 1993Solari 2004), with the exception of Mendozzi 1998 and Stuifbergen 2012. Two studies appeared to be matched for baseline characteristics, but no statistics were reported (Arian Darestani 2020Rahmani 2020), and one study had unequal groups due to stratification requirements but overall was well‐matched (Charvet 2017).

Thirty‐seven studies used two‐group comparisons (treatment versus control), and six studies used three‐group comparisons (das Nair 2012Ernst 2015Lincoln 2002Mendozzi 1998Mousavi 2018aMousavi 2018b). Lincoln 2002 used assessment versus assessment plus feedback versus assessment plus feedback and treatment; Mendozzi 1998 examined specific cognitive retraining versus non‐specific cognitive retraining versus control; and das Nair 2012 investigated restitution versus compensation versus self‐help control. Rahmani 2020 used computer‐based versus manual‐based versus mixed cognitive rehabilitation versus placebo versus control, five groups in total.

Twenty‐nine studies used individual treatment, including clinic‐based and home‐based interventions (Campbell 2016Charvet 2017Chiaravalloti 2019aChiaravalloti 2019bChmelařová 2020Arian Darestani 2020Naeeni Davarani 2020De Luca 2019Ernst 2015Ernst 2018Gich 2015Goodwin 2020Goverover 2018aHancock 2015Hildebrandt 2007Huiskamp 2016Jønsson 1993Lincoln 2002Maggio 2020Mattioli 2016Mendozzi 1998Messinis 2017Messinis 2020Pedulla 2016Pusswald 2014Rahmani 2020Solari 2004Stuifbergen 2018Vilou 2020), and 13 had group interventions (Carr 2014Chiaravalloti 2005Chiaravalloti 2013das Nair 2012Impellizzeri 2020Lincoln 2020Mani 2018Mousavi 2018aMousavi 2018bRilo 2018Shahpouri 2019Stuifbergen 2012Tesar 2005). One study used a mix of both group and individual sessions (Hanssen 2015), and another used both group sessions and individual computerised sessions (Stuifbergen 2012).

The structure and content of the treatment programmes varied. Most interventions were of four to eight weeks duration (Chiaravalloti 2005Chiaravalloti 2013Chiaravalloti 2019aChiaravalloti 2019bChmelařová 2020Arian Darestani 2020Naeeni Davarani 2020Goodwin 2020Hancock 2015Hanssen 2015Hildebrandt 2007Huiskamp 2016Impellizzeri 2020Jønsson 1993Lincoln 2002Maggio 2020Mani 2018Mendozzi 1998Messinis 2020Mousavi 2018aMousavi 2018bPedulla 2016Pusswald 2014Solari 2004Stuifbergen 2012Stuifbergen 2018Tesar 2005Vilou 2020). Carr 2014das Nair 2012Lincoln 2020Messinis 2017 had 10‐week programmes, Charvet 2017 and Perez‐Martin 2017 had 12‐week programmes, Mattioli 2016 had a 15‐week programme, Rahmani 2020 had a 21‐week programme, and Gich 2015 used a six‐month programme. Four studies did not specify set durations of their treatment but either selected a number of sessions to be completed when the participants were available (Ernst 2015Ernst 2018Shahpouri 2019) or specified a timeframe for the sessions to be completed in (Lincoln 2002).

Sessions ranged from 30 minutes (Hildebrandt 2007; Pedulla 2016; Pusswald 2014) to two hours (Hanssen 2015; Mani 2018; Shahpouri 2019), and participants met one to six times a week in all studies except Mendozzi 1998, where the treatment was bi‐weekly. The Goodwin 2020 study lasted for two months and the session frequency varied, as it depended on the types of text message reminders each participant required. Similarly, the Lincoln 2002 study specified a six‐month timeframe in which the sessions had to be completed, but the frequency of sessions depended on individual need. The lowest number of total sessions was six (Ernst 2015; Ernst 2018; Goverover 2018a) and the highest number of total sessions was 60 (Charvet 2017). Fourteen studies had between eight and 10 sessions (Carr 2014; Chiaravalloti 2005; Chiaravalloti 2013; Chiaravalloti 2019a; Chiaravalloti 2019b; Arian Darestani 2020; Naeeni Davarani 2020; Hanssen 2015; Huiskamp 2016; Lincoln 2020; Mani 2018; Mousavi 2018a; Mousavi 2018b; Shahpouri 2019). Nine studies had between 12 and 18 sessions (Campbell 2016; das Nair 2012; Jønsson 1993; Mendozzi 1998; Perez‐Martin 2017; Pusswald 2014; Solari 2004; Tesar 2005; Vilou 2020). Seven studies had between 20 and 30 sessions (De Luca 2019; Hildebrandt 2007; Maggio 2020; Messinis 2017; Messinis 2020; Rahmani 2020; Stuifbergen 2018). Six studies had between 32 and 48 sessions (Chmelařová 2020; Gich 2015; Hancock 2015; Impellizzeri 2020; Rilo 2018; Stuifbergen 2012). For two studies the frequency of sessions depended on individual need (Goodwin 2020; Lincoln 2002).

In three studies, the contents of the treatment programmes were individualised (Goodwin 2020; Hanssen 2015; Lincoln 2002), depending on the needs of the participant. Seven studies used comprehensive memory rehabilitation programmes (including teaching participants to use internal and external memory aids) (Carr 2014; das Nair 2012; Jønsson 1993; Lincoln 2002; Lincoln 2020; Pusswald 2014; Tesar 2005). Sixteen studies used computerised memory‐ and attention‐retraining packages (Campbell 2016; Charvet 2017; Chmelařová 2020; Arian Darestani 2020; Naeeni Davarani 2020; Gich 2015; Hancock 2015; Hildebrandt 2007; Mendozzi 1998; Messinis 2020; Pedulla 2016; Pusswald 2014; Solari 2004; Stuifbergen 2012; Stuifbergen 2018; Vilou 2020), and Chiaravalloti 2005, Chiaravalloti 2013, and Chiaravalloti 2019b used the Story Memory Technique, which involved the use of imagery and story generation. De Luca 2019 used both computerised and paper‐and‐pencil training strategies, but did not explain the specifics of what this entailed. Goodwin 2020 used mobile phones to deliver reminders throughout the day, and the number of messages delivered varied depending on each person’s needs.

Studies that had a sham or attention control group reported having ensured that these groups had minimal memory content, thereby reducing contamination (Chiaravalloti 2005Chiaravalloti 2013das Nair 2012Ernst 2018Hancock 2015Jønsson 1993Mousavi 2018aMousavi 2018bRahmani 2020Solari 2004).

Lincoln 2020 assessed fidelity of intervention using three methods: firstly, the cognitive rehabilitation followed a manual that was developed and tested in a pilot study (Carr 2014); secondly, the training was delivered by psychology graduates with clinical experience, who received training from a clinical psychologist as well as monthly teleconferences to discuss specific challenges; and thirdly, the intervention sessions were recorded and coded by an independent researcher using a time‐sampling procedure, which indicated that the intervention was delivered as intended. However, only three other studies assessed fidelity of intervention.

The 44 included studies used a range of outcome measures. All studies included at least one measure of learning or memory, with the exception of Hanssen 2015, where outcomes were related to psychological functioning and impact of disease.

Seventeen studies used subjective measures of memory. Six studies (Carr 2014das Nair 2012Goodwin 2020Lincoln 2002Lincoln 2020Shahpouri 2019) used the Everyday Memory Questionnaire (EMQ) (Sunderland 1983), and das Nair 2012 used the Internal and External Memory Aids Questionnaires based on the Memory Aids Questionnaire (Wilson 1984). Four studies (Chiaravalloti 2005Chiaravalloti 2019aGoverover 2018aMani 2018) used the Memory Failures Questionnaire (MFQ) (Gilewski 1990); and three studies (Mousavi 2018bPerez‐Martin 2017Stuifbergen 2012) used the Multiple Sclerosis Neuropsychological Questionnaire (MSNQ) (Benedict 2004); one study (Chiaravalloti 2019a) used the Awareness Questionnaire (AQ) (Sherer 2004); one study (Chmelařová 2020) used the Cognitive Failures Questionnaire (CFQ) (Broadbent 1982); one study (Stuifbergen 2018) used the strategy subscale of the Multifactorial Memory Questionnaire (MMQ) (Troyer 2017).

Twenty‐five trials used list‐learning tasks: Hopkins Verbal Learning Task‐Revised (HVLT‐R) (Benedict 1998) (Chiaravalloti 2005; Rilo 2018); Verbal Learning Test (VLT) (Sturm 1999a) (Tesar 2005); California Verbal Learning Task‐II (CVLT) (Delis 2000) (Arian Darestani 2020; Campbell 2016; Chiaravalloti 2013; Chiaravalloti 2019a; Chiaravalloti 2019b; Goverover 2018a; Hildebrandt 2007; Stuifbergen 2012; Stuifbergen 2018; Vilou 2020); Greek Verbal Learning Trial (GVLT) (Messinis 2020); Auditory Verbal Learning Test (AVLT) (Lezak 2004) (Hancock 2015); Selective Reminding Task (Rao 1993) (De Luca 2019; Gich 2015; Impellizzeri 2020; Lincoln 2020; Mattioli 2016; Messinis 2017; Pedulla 2016; Perez‐Martin 2017); and the list‐learning task used by one study was not specified (Jønsson 1993). Seven studies used neuropsychological test batteries or subtests of these. One study, Mendozzi 1998, used the memory scale of the Luria‐Nebraska Neuropsychological Battery (LNNB) consisting of 13 items (Golden 1980). Subtests from other test batteries included the Buschke Selective Reminding Test from an Italian version of the Brief Repeatable Battery of Neuropsychological Tests (BRBNT) (Solari 2002), unspecified tests from the Rivermead Behavioural Memory Test (RBMT‐E) (Wilson 1999), and the Doors and People Test (Baddeley 1994). Pusswald 2014 used the MUSIC assessment (Calabrese 2004), and Jønsson 1993 used an unspecified battery. Non‐verbal memory was assessed using individual tests or part of a battery. Individual tests included the Nonverbaler Lerntest (NVLT) (Sturm 1999b) (Tesar 2005), and an unspecified 50‐faces recognition test (Jønsson 1993).

Seventeen trials used visual objective memory measures: Brief Visuospatial Memory Test (BVMT‐R) (Benedict 1996) (Campbell 2016Chiaravalloti 2019aChiaravalloti 2019bMessinis 2017Messinis 2020Stuifbergen 2012Stuifbergen 2018Vilou 2020); 10/36 Spatial Recall Test (SPART) (Rao 1990) (De Luca 2019Impellizzeri 2020Lincoln 2020Maggio 2020Mattioli 2016Pedulla 2016Perez‐Martin 2017); Contextual memory test (CMT) (Toglia 2004) (Goverover 2018a); Rey‐Osterrieth complex figure (ROCF) (Rey 1941) (Maggio 2020).

Fourteen trials used working memory measures: Paced auditory serial addition test (PASAT) (Rao 1990) (Naeeni Davarani 2020De Luca 2019Impellizzeri 2020Lincoln 2020Maggio 2020Mattioli 2016Pedulla 2016Perez‐Martin 2017Rahmani 2020Stuifbergen 2018); N‐back test (Huiskamp 2016Pedulla 2016); Digit span WAIS subtest (Rilo 2018Shahpouri 2019).

In terms of ‘other cognitive outcomes’, the most frequently assessed cognitive domain was information processing. Nineteen studies included information processing measures: Symbol Digit Modalities Test (SDMT) (Campbell 2016De Luca 2019Hancock 2015Impellizzeri 2020Lincoln 2020Mattioli 2016Messinis 2017Messinis 2020Naeeni Davarani 2020Pedulla 2016Perez‐Martin 2017Rilo 2018Solari 2004Stuifbergen 2012Stuifbergen 2018Vilou 2020); Stroop colour test (SCWT) (Stroop 1938) (Rahmani 2020); Trail Making Test (TMT) (Chmelařová 2020); Behaviour Rating Inventory of Executive Function – Adult version (BRIEF‐A) (Hanssen 2015).

The most frequently used mood measure was the Beck Depression Inventory (BDI) (Beck 1987), used in 11 studies (Chiaravalloti 2005; Chmelařová 2020; De Luca 2019; Hancock 2015; Hildebrandt 2007; Impellizzeri 2020; Maggio 2020; Messinis 2020; Mousavi 2018a; Shahpouri 2019; Tesar 2005). Six studies (Carr 2014; das Nair 2012; Goodwin 2020; Lincoln 2002; Lincoln 2020; Mousavi 2018a) used the General Health Questionnaire (GHQ‐28) (Goldberg 1988). Three studies (Chiaravalloti 2013; Chiaravalloti 2019b; Goverover 2018a) used the Chicago Multiscale Depression Inventory (CMDI) (Nyenhuis 1998), and two of these (Chiaravalloti 2019b; Goverover 2018a) also used the STAI (depression and anxiety subscale); another study, Solari 2004, used the Italian version of the CMDI (Solari 2003). One study used the Montgomery‐Asberg Depression Rating Scale (MADRS, depression and anxiety subscale) (Mattioli 2016), and one used the Centre for Epidemiological Studies Depression scale (CES‐D) (Stuifbergen 2018).

Nine studies (De Luca 2019; Hancock 2015; Impellizzeri 2020; Lincoln 2002; Maggio 2020; Mattioli 2016; Perez‐Martin 2017; Shahpouri 2019; Solari 2004) assessed quality of life using the Multiple Sclerosis Quality of Life questionnaire (MSQOL‐54) (Vickrey 1995); three studies (Carr 2014; Hanssen 2015; Lincoln 2020) used the Multiple Sclerosis Impact Scale (MSIS‐29) (Hobart 2001); two studies (Goodwin 2020; Messinis 2020) used the EQ‐5D‐5L; one (Chiaravalloti 2019a) used the SF‐36; and one (Goverover 2018a) used the Satisfaction with Life Scale (SWLS).

Only two studies examined whether their rehabilitation programme affected instrumental ADL (das Nair 2012 and Lincoln 2002), by using the Extended Activities of Daily Living scale (EADL) (Nouri 1987). Four studies (Campbell 2016Chiaravalloti 2013Chiaravalloti 2019aGoverover 2018a) assessed functional independence with the Functional Assessment of Multiple Sclerosis (FAMS) (Cella 1996). One study, Stuifbergen 2018, used the Instrumental Activities of Daily Living (IADL).

Eighteen studies were observer‐blinded RCTs or quasi‐randomised trials (Carr 2014; das Nair 2012; Gich 2015; Hildebrandt 2007; Impellizzeri 2020; Jønsson 1993; Goodwin 2020; Lincoln 2002; Lincoln 2020; Maggio 2020; Mani 2018; Mendozzi 1998; Messinis 2020; Mousavi 2018b; Perez‐Martin 2017; Stuifbergen 2012; Tesar 2005; Vilou 2020), and 14 stated that they were observer‐ and participant‐blinded (Charvet 2017; Chiaravalloti 2005; Chiaravalloti 2013; Chiaravalloti 2019a; Chiaravalloti 2019b; De Luca 2019; Goverover 2018a; Hancock 2015; Huiskamp 2016; Mousavi 2018a; Naeeni Davarani 2020; Shahpouri 2019; Solari 2004; Stuifbergen 2018). One study reported that blinding of participants was not possible due to the nature of the intervention, and there was no mention of observer blinding (Hanssen 2015). However, all outcomes were self‐report questionnaire‐based, therefore blinding was not deemed necessary. Twelve studies either did not use blinding procedures or were unclear in their methodology (Arian Darestani 2020; Campbell 2016; Chmelařová 2020; Ernst 2015; Jønsson 1993; Mattioli 2016; Messinis 2017; Pedulla 2016; Pusswald 2014; Rahmani 2020; Rilo 2018; Tesar 2005), therefore we determined these studies to be at high risk of bias. One study reported that while the main scorer was not blinded, a blinded rater verified the scoring accuracy for 20% of memories randomly chosen, with a reliability of 0.95 when assessed with intraclass correlations; we therefore determined this to be a low risk of bias (Ernst 2018).

Excluded studies

We excluded 64 studies based on the exclusion criteria specified for this review. Two were studies of Alzheimer’s disease, i.e. not MS (Akhtar 2006; Loewenstein 2004); four were unrelated to memory (comparative study of Barthel Index and Functional Independence Measure in van der Putten 1999, and falls in Aisen 1994, Canellopoulou 1998, and Flavia 2010); and one was a systematic review, not an intervention study (Thomas 2006). Sixteen studies were not specific to memory, but addressed general neuropsychological rehabilitation, attention, or information processing (Amato 2014; Bhargav 2016; Cabrera‐Gomez 2010; Canellopoulou 1998; Chiaravalloti 2018; De Giglio 2014; Flavia 2010; Goreover 2011; Grasso 2017; Hanssen 2016; Mattioli 2012; Mäntynen 2014; Rosti‐Otajärvi 2013a; Rosti‐Otajärvi 2013b; Veldkamp 2019; Zimmer 2018). Seven studies used healthy controls instead of an MS control group (Aguirre 2019; Aldrich 1995; Chiaravalloti 2003; Ernst 2013; Lamargue 2020; Vogt 2009; Wilson 2001), and Wilson 2001 also did not distinguish between results for people with MS and others with acquired progressive brain injury. Eleven studies were not RCTs or quasi‐RCTs (one quasi‐experimental waiting‐list control: Rodgers 1996; one small group study: Allen 1998; six without random allocation: Barker 2019; Barbarulo 2018; Brenk 2008; Brissart 2013; Pineau 2019; Shatil 2010; three with no control group: Bove 2019; Brissart 2010; Güçlü Altun 2015). One study was a brain imaging study and had an active control group (Bonavita 2015). One study used a "music intervention" (Thaut 2014). One study was not considered to be a rehabilitation study according to our inclusion criteria because it only involved one hour‐long session of memory retraining (Moore 2008). Three studies used the same sample, or a subgroup of the sample, of Chiaravalloti 2013 (Chiaravalloti 2012; Dobryakova 2014; Leavitt 2014), and another, Martin 2014, was a subgroup analysis of das Nair 2012, and was therefore not included.
Two studies had abstracts in English but no full‐text in English available (Fiorotto 2015; Jimenez‐Morales 2017). Four studies were study protocols and therefore had no results attached to them (Guijarro‐Castro 2017; Harand 2019; Lincoln 2015; Nauta 2017). Finally, 13 studies were conference poster presentations, and/or no full texts could be found (Bove 2019; Campbell 2015; das Nair 2017; Harand 2017; Iaffaldano 2015; Kavaklioglu 2017; Messinis 2015; Penner 2018; Perez‐Martin 2016; Rilo 2015; Rilo 2016; Rilo 2017; Nurova 2014).

Risk of bias in included studies

The risk of bias in the 44 included studies was generally low (see Figure 2 and Figure 3 for individual study risk of bias assessments). However, we found a high risk of bias in the following domains: random sequence generation in four studies, allocation concealment in two studies, blinding procedures in 14 studies, incomplete outcome data in four studies, and possible selective reporting in four studies. Furthermore, we judged the risk of bias to be unclear in some instances due to insufficient reporting of methods for: random sequence generation in 11 studies, allocation concealment in 14 studies, blinding procedures in two studies, and incomplete outcome data in 10 studies.


Methodological quality graph: review authors' judgements about each methodological quality item presented as percentages across all included studies.



Methodological quality summary: review authors' judgements about each methodological quality item for each included study.


Random sequence generation

Seventeen studies were judged to have a low risk of selection bias due to adequate random sequence generation, having used a computerised random number generator operated by an independent unit (Campbell 2016; Carr 2014; Chiaravalloti 2013; Chiaravalloti 2019a; Chiaravalloti 2019b; das Nair 2012; Ernst 2018; Huiskamp 2016; Impellizzeri 2020; Lincoln 2002; Mani 2018; Mattioli 2016; Messinis 2017; Messinis 2020; Rilo 2018; Shahpouri 2019; Solari 2004); two used a random number sequence from the study data analyst that was created prior to recruitment and kept in sealed envelopes (Stuifbergen 2012; Stuifbergen 2018); and six used block randomisation generated by a blinded statistician (Charvet 2017; De Luca 2019; Goodwin 2020; Goverover 2018a; Lincoln 2020; Maggio 2020). Three studies used “randomised software” to randomly assign participants to three groups and were therefore determined to have a low risk of bias (Mousavi 2018a; Mousavi 2018b; Vilou 2020). Four studies were judged not to have adequate sequence generation and therefore to be at high risk of bias: three of these studies involved quasi‐random 'odd‐even' or alternating allocation (Chiaravalloti 2005; Hildebrandt 2007; Pusswald 2014), and one randomised only half the sample, with no generation method stated (Mendozzi 1998). The method used for random sequence generation, and hence the risk of bias, was unclear in 11 other studies (Arian Darestani 2020; Chmelařová 2020; Ernst 2015; Gich 2015; Hanssen 2015; Jønsson 1993; Naeeni Davarani 2020; Pedulla 2016; Perez‐Martin 2017; Rahmani 2020; Tesar 2005).

Allocation

We judged 19 studies to have a low risk of selection bias due to effectively concealing allocation into groups using a computerised random number generator by an independent unit (Campbell 2016; Carr 2014; Chiaravalloti 2013; Chiaravalloti 2019a; Chiaravalloti 2019b; das Nair 2012; Ernst 2018; Huiskamp 2016; Impellizzeri 2020; Lincoln 2002; Mani 2018; Mattioli 2016; Messinis 2017; Messinis 2020; Mousavi 2018a; Mousavi 2018b; Rilo 2018; Shahpouri 2019; Solari 2004), a closed envelope system (Jønsson 1993; Stuifbergen 2012; Stuifbergen 2018; Vilou 2020), or having a separate staff member who was not otherwise involved in the study complete allocation (Charvet 2017; Goodwin 2020; Goverover 2018a; Lincoln 2020; Tesar 2005). We judged two studies as not having concealed allocation to groups, suggesting a high risk of bias: one having used "odd‐even" allocation completed by the principal investigator (Chiaravalloti 2005), and one stating that allocation concealment was not possible (Hanssen 2015). Fourteen studies were unclear in their explanation of allocation concealment: one informing participants whether they were to receive the intervention or assessment only (Hildebrandt 2007); one in which the principal investigator allocated groups and what other involvement they had in the study was not clearly explained (Mendozzi 1998); and 12 studies not mentioning allocation concealment (Arian Darestani 2020; Chmelařová 2020; De Luca 2019; Ernst 2015; Gich 2015; Hancock 2015; Maggio 2020; Naeeni Davarani 2020; Pedulla 2016; Perez‐Martin 2017; Pusswald 2014; Rahmani 2020).

Blinding

Sixteen studies were observer blinded (Carr 2014; das Nair 2012; De Luca 2019; Gich 2015; Goodwin 2020; Hildebrandt 2007; Impellizzeri 2020; Lincoln 2002; Lincoln 2020; Maggio 2020; Mani 2018; Mendozzi 1998; Messinis 2020; Perez‐Martin 2017; Stuifbergen 2012; Vilou 2020). Fourteen studies stated they were “double blind” (Charvet 2017; Chiaravalloti 2005; Chiaravalloti 2013; Chiaravalloti 2019a; Chiaravalloti 2019b; Goverover 2018a; Hancock 2015; Huiskamp 2016; Mani 2018; Mousavi 2018a; Naeeni Davarani 2020; Shahpouri 2019; Solari 2004; Stuifbergen 2018); however, three of these studies (Mousavi 2018a; Naeeni Davarani 2020; Stuifbergen 2018) were judged to be at high risk of bias due to lack of evidence of how they blinded the personnel or participants, or both. One study reported that while the main scorer was not blinded, a blinded rater verified the scoring accuracy for 20% of memories randomly chosen, with a reliability of 0.95 when assessed with intraclass correlations; we therefore determined this to be a low risk of bias (Ernst 2018). One study reported that blinding of participants was not possible due to the nature of the intervention (Hanssen 2015), and there was no mention of observer blinding, but because the outcomes were self‐report questionnaires, we deemed this study to have an unclear risk of bias. One study was rated as having an unclear risk of bias due to discrepancies in its blinding procedures: the study stated that "Healthcare providers were not told of patients' allocation, but a few words would have given it away" (Jønsson 1993). It was not clear whether this occurred or whether the authors made any attempt to prevent it by asking patients not to discuss their experience with the assessors. Eleven studies either did not use any blinding procedures or were unclear in their methodology, suggesting a high risk of bias (Arian Darestani 2020; Campbell 2016; Chmelařová 2020; Ernst 2015; Mattioli 2016; Messinis 2017; Pusswald 2014; Pedulla 2016; Rahmani 2020; Rilo 2018; Tesar 2005).
One study stated that the patients and statisticians were blind to group allocation, but it is unclear whether the assessor was blind, suggesting an unclear risk of bias (Mousavi 2018b).

Incomplete outcome data

We deemed four studies to be at high risk of attrition bias: in three studies (Chiaravalloti 2013; Chmelařová 2020; Mattioli 2016), there was a post‐randomisation attrition rate of 12%, 25% and 21%, respectively and/or no discussion of how missing data were dealt with, and the study did not use intention‐to‐treat analysis; in the other study, the post‐randomisation attrition level was 44% (Hancock 2015). Eight studies did not address incomplete outcome data and did not use intention‐to‐treat analysis, which we deemed to be at unclear risk of bias: one study reported one dropout (Chiaravalloti 2005), two studies reported two dropouts (Hanssen 2015; Rilo 2018), three studies reported three dropouts (Campbell 2016; Ernst 2015; Mani 2018), two studies reported six dropouts (Chiaravalloti 2019b; Naeeni Davarani 2020); one study reported seven dropouts (Arian Darestani 2020); in another, participant outcome data were replaced with mid‐trial data if a participant dropped out (Mendozzi 1998); and two studies did not explain how dropout data were handled (Jønsson 1993; Tesar 2005). One study conducted analyses on data for those participants who completed the outcome assessments (Lincoln 2002), one used list‐wise deletion and baseline data imputed for any missing follow‐up data (das Nair 2012), and in two studies (Solari 2004; Stuifbergen 2012), missing values were imputed according to the last observation carried forward method. In one study, where less than 10% of items were missed on a questionnaire, these were replaced with the mean for the questionnaire (Carr 2014).
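Two of the approaches to missing data described above, last observation carried forward and mean substitution when fewer than 10% of questionnaire items are missing, can be illustrated with a minimal Python sketch. The function names, the data layout, and the example scores are hypothetical and are not taken from any included study; the sketch only shows the general logic, not how the trial authors implemented it.

```python
from typing import List, Optional, Sequence

def locf(values: Sequence[Optional[float]]) -> List[Optional[float]]:
    """Last observation carried forward: fill each gap with the most recent observed value."""
    filled: List[Optional[float]] = []
    last: Optional[float] = None
    for value in values:
        if value is not None:
            last = value
        filled.append(last)  # stays None if nothing has been observed yet
    return filled

def mean_substitute(items: Sequence[Optional[float]],
                    max_missing_fraction: float = 0.10) -> List[Optional[float]]:
    """Replace missing items with the mean of the observed items,
    but only when fewer than max_missing_fraction of the items are missing."""
    missing = sum(1 for item in items if item is None)
    if missing == 0 or missing / len(items) >= max_missing_fraction:
        return list(items)  # nothing to fill, or too many gaps to impute
    observed = [item for item in items if item is not None]
    mean = sum(observed) / len(observed)
    return [mean if item is None else item for item in items]

# Hypothetical scores at baseline, post-treatment, and two follow-ups (last two missing).
print(locf([42.0, 45.0, None, None]))
```

In this sketch a participant who drops out after the second assessment keeps their second score at every later time point, which is why last observation carried forward can understate change over time.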

Selective reporting

We deemed four studies to have a high risk of reporting bias (Hancock 2015; Mattioli 2016; Mousavi 2018a; Mousavi 2018b). One study only reported on the memory outcomes, despite other outcomes having been assessed at follow‐up, and data were only reported for "good adherers" to the intervention (Hancock 2015). One study did not report outcome comparisons for the control group, only for the intervention group (Pedulla 2016). Three studies did not report several of their outcomes (Mattioli 2016; Mousavi 2018a; Mousavi 2018b).

Other potential sources of bias

We judged 39 studies to have a low risk of other potential sources of bias (Carr 2014Campbell 2016Charvet 2017Chiaravalloti 2005Chiaravalloti 2013Chiaravalloti 2019aChiaravalloti 2019bChmelařová 2020das Nair 2012De Luca 2019Ernst 2015Ernst 2018Gich 2015Goodwin 2020Goverover 2018aHancock 2015Hanssen 2015Hildebrandt 2007Huiskamp 2016Impellizzeri 2020Jønsson 1993Lincoln 2002Lincoln 2020Maggio 2020Messinis 2020Mousavi 2018aMousavi 2018bNaeeni Davarani 2020Pedulla 2016Perez‐Martin 2017Pusswald 2014Rahmani 2020Rilo 2018Shahpouri 2019Solari 2004Stuifbergen 2012Stuifbergen 2018Tesar 2005Vilou 2020). One study had a potential source of bias, as one participant in the treatment group discontinued cognitive retraining and was replaced by a new entry without further explanation (Mendozzi 1998). One study had a potential source of bias as it was unclear what the control group were told about the study (Arian Darestani 2020). One study did not collect six‐month follow‐up data for the control group, therefore we determined this to be high risk of potential bias (Messinis 2017).

Effects of interventions

See: Summary of findings 1 Memory rehabilitation for people with multiple sclerosis

In this section, we first present study‐specific information regarding intervention effect on memory outcomes, and then present the meta‐analysis, synthesising results on various domains.

Nine studies concluded that there were no significant differences between the treatment and control groups on measures of memory, particularly after adjustments were made for multiple testing (Campbell 2016; Carr 2014; Chiaravalloti 2005; Chiaravalloti 2019a; das Nair 2012; Hancock 2015; Jønsson 1993; Lincoln 2002; Solari 2004), and Goodwin 2020 reported no significant within‐group improvements for the intervention group. Twenty‐nine studies reported significant differences on memory measures favouring the treatment groups (Arian Darestani 2020; Chiaravalloti 2013; Chiaravalloti 2019b; Chmelařová 2020; De Luca 2019; Gich 2015; Goverover 2018a; Hildebrandt 2007; Impellizzeri 2020; Lincoln 2020; Maggio 2020; Mani 2018; Mattioli 2016; Mendozzi 1998; Messinis 2017; Messinis 2020; Mousavi 2018a; Mousavi 2018b; Naeeni Davarani 2020; Pedulla 2016; Perez‐Martin 2017; Pusswald 2014; Rahmani 2020; Rilo 2018; Shahpouri 2019; Stuifbergen 2012; Stuifbergen 2018; Tesar 2005; Vilou 2020). One study did not use memory outcomes (Hanssen 2015). Gich 2015 reported significant differences favouring treatment on some subtests of the Battery of Neuropsychological Tests (BRBN) (Rao 1993), although no significant differences were reported on the list‐learning task of the BRBN used in this meta‐analysis. Campbell 2016 showed no significant improvement on the California Verbal Learning Test (CVLT‐II) or Brief Visuospatial Memory Test (BVMT). Chiaravalloti 2019b showed significant improvements for the intervention group in the CVLT‐II at immediate follow‐up but not in the Rivermead Behavioural Memory Test (RBMT). Chmelařová 2020 showed significant improvement for the intervention group in the immediate memory component of the Repeatable Battery for the Assessment of Neuropsychological Status (RBANS) (Randolph 1998), but showed no improvement on the Cognitive Failures Questionnaire (CFQ). Goverover 2018a showed significant improvement for the intervention group on the CVLT‐II and the Contextual Memory Test (CMT), but not on the Memory Failures Questionnaire (MFQ).
Hildebrandt 2007 reported improvements for the treatment group in the Learning Trials and Long Delay Free Recall subtests of the CVLT (Niemann 2003). Stuifbergen 2012 reported improvements in the CVLT total both over time and by group, and showed significantly more use of memory strategies in the intervention compared with control. Chiaravalloti 2013 and Chiaravalloti 2019a showed a greater learning slope for the treatment group compared to the control on the CVLT‐II (Delis 2000). Lincoln 2020 showed significant group differences in Everyday Memory Questionnaire (EMQ‐p) at both six and 12 months, in the Selective Reminding Test (SRT) total recall at six months and delayed recall at 12 months with no other significant group differences. Maggio 2020 showed significant group improvement in both groups with greater improvement in the intervention group for Spatial Recall Test (SPART) and a significant Group*Time interaction. Messinis 2017 showed significant improvement for the intervention group in SRT and BVMT‐R. Messinis 2020 showed significant improvement in everyday memory at immediate follow‐up but this was not sustained at longer‐term follow‐up. Pedulla 2016 showed significant Group*Time interaction for six out of 10 subtests of the BRBN. Tesar 2005 reported improvements on the computer‐aided card‐sorting test (CKV), Drühe‐Wienholt 1998, and the Mosaic Test of the Hamburg Wechsler Intelligence Test (HAWIE‐R), Tewes 1991, for the treatment group. Chiaravalloti 2005 observed no significant difference between the treatment and control groups on their list‐learning task (HVLT‐R) (Benedict 1998), but on subgroup analysis, we observed significant improvement on this task for the moderate‐to‐severe memory‐impaired subgroup, but not for other groups. However, this subgroup analysis was carried out only on the treatment group, which had 14 participants. 
Mendozzi 1998 reported improvement in the specific cognitive‐retraining group on seven measures of memory (Spatial Span from the Corsi, Digit Span Forward and Backward, Visual Reproduction, and Paired Associates‐Hard from the Italian Weschler Memory Scale (WMS), Wechsler 1945, and the LNNB, Golden 1980. There was an improvement in Digit Span Forward only in the non‐specific cognitive rehabilitation group.

Outcome 1: Subjective memory measures

Fifteen studies included subjective measures of participants’ memory functioning. Ten of these studies provided immediate outcomes (Chiaravalloti 2005; Chiaravalloti 2019b; Chmelařová 2020; Goodwin 2020; Goverover 2018a; Mani 2018; Mousavi 2018b; Perez‐Martin 2017; Stuifbergen 2012; Stuifbergen 2018); 11 of these studies provided intermediate outcomes (Carr 2014; Chiaravalloti 2005; Chiaravalloti 2019b; das Nair 2012; Lincoln 2002; Lincoln 2020; Mani 2018; Mousavi 2018b; Shahpouri 2019; Stuifbergen 2012; Stuifbergen 2018); and five of these studies provided longer‐term outcomes (Carr 2014; das Nair 2012; Lincoln 2002; Lincoln 2020; Stuifbergen 2018). We found small to moderate differences between groups for subjective reports of memory at immediate (SMD 0.32, 95% confidence interval (CI) 0.05 to 0.58; 568 participants, moderate‐quality evidence) Analysis 1.1, intermediate (standardised mean difference (SMD) 0.23, 95% CI 0.11 to 0.35; 1045 participants, high‐quality evidence) Analysis 1.2, and longer‐term follow‐up (SMD 0.16, 95% CI 0.02 to 0.30; 775 participants, high‐quality evidence) Analysis 1.3. The intervention group performed better than the control group at each follow‐up.
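As a reading aid for the pooled estimates reported throughout this section, the following is a minimal sketch of how a standardised mean difference (SMD) and its 95% CI relate to a standard error and a test of overall effect. It assumes a symmetric, normal‐approximation interval; the function name `summarise_smd` is hypothetical, and the review's own software may compute these quantities differently. The input values are the subjective memory results reported above.

```python
from statistics import NormalDist


def summarise_smd(smd, ci_low, ci_high, level=0.95):
    """Recover the standard error, z statistic and two-sided p value
    implied by a reported SMD and its confidence interval.

    Assumption of this sketch: the interval is symmetric and based on
    a normal approximation (SMD +/- z_crit * SE).
    """
    z_crit = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    se = (ci_high - ci_low) / (2 * z_crit)
    z = smd / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return {
        "se": se,
        "z": z,
        "p": p,
        # An interval that does not cross zero favours one group.
        "excludes_zero": ci_low > 0 or ci_high < 0,
    }


# SMD and 95% CI as reported for subjective memory (Analyses 1.1 to 1.3).
subjective_memory = {
    "immediate": (0.32, 0.05, 0.58),
    "intermediate": (0.23, 0.11, 0.35),
    "longer-term": (0.16, 0.02, 0.30),
}

for label, (smd, low, high) in subjective_memory.items():
    result = summarise_smd(smd, low, high)
    print(
        f"{label}: SE={result['se']:.3f}, z={result['z']:.2f}, "
        f"p={result['p']:.3f}, CI excludes zero: {result['excludes_zero']}"
    )
```

Because the reported values are rounded to two decimal places, the recovered standard errors and p values are approximate and should not be read as the review's exact test statistics.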

Outcome 2: Objective verbal memory measures

Twenty‐one studies included objective verbal memory measures of participants’ memory functioning. Nineteen of these studies provided immediate outcomes (Arian Darestani 2020; Campbell 2016; Chiaravalloti 2005; Chiaravalloti 2013; Chiaravalloti 2019a; Chiaravalloti 2019b; Gich 2015; Goverover 2018a; Hancock 2015; Impellizzeri 2020; Messinis 2017; Messinis 2020; Pedulla 2016; Perez‐Martin 2017; Rilo 2018; Stuifbergen 2012; Stuifbergen 2018; Tesar 2005; Vilou 2020), six of these studies provided intermediate outcomes (Arian Darestani 2020; Campbell 2016; Lincoln 2020; Stuifbergen 2012; Stuifbergen 2018; Tesar 2005), and four of these studies provided longer‐term outcomes (Chiaravalloti 2019b; Lincoln 2020; Mattioli 2016; Stuifbergen 2018). We found small to moderate differences between groups for objective measures of verbal memory at immediate (SMD 0.40, 95% CI 0.22 to 0.58; 922 participants, low‐quality evidence) Analysis 2.1 and intermediate follow‐up (SMD 0.25, 95% CI 0.11 to 0.40; 753 participants, low‐quality evidence) Analysis 2.2, but little to no difference at longer‐term follow‐up (SMD 0.13, 95% CI ‐0.03 to 0.29; 619 participants, moderate‐quality evidence) Analysis 2.3. The intervention group performed better than the control group at immediate and intermediate follow‐up.

Outcome 3: Objective visual memory measures

Nineteen studies included objective visual measures of participants’ memory functioning. Sixteen of these studies provided immediate outcomes (Campbell 2016; Chiaravalloti 2019a; Chiaravalloti 2019b; Chmelařová 2020; Goverover 2018a; Impellizzeri 2020; Maggio 2020; Messinis 2017; Messinis 2020; Naeeni Davarani 2020; Pedulla 2016; Perez‐Martin 2017; Stuifbergen 2012; Stuifbergen 2018; Tesar 2005; Vilou 2020), six of these studies provided intermediate outcomes (Campbell 2016; Lincoln 2020; Naeeni Davarani 2020; Stuifbergen 2012; Stuifbergen 2018; Tesar 2005), and four of these studies provided longer‐term outcomes (Chiaravalloti 2019b; Lincoln 2020; Mattioli 2016; Stuifbergen 2018). We found a moderate difference between groups for objective measures of visual memory at immediate follow‐up (SMD 0.42, 95% CI 0.25 to 0.60; 799 participants, moderate‐quality evidence) Analysis 3.1, but little to no between‐group differences at intermediate (SMD 0.20, 95% CI ‐0.11 to 0.50; 751 participants, moderate‐quality evidence) Analysis 3.2, and longer‐term follow‐up (SMD 0.12, 95% CI ‐0.13 to 0.37; 619 participants, high‐quality evidence) Analysis 3.3. The intervention group performed better than the control group at immediate follow‐up.

Outcome 4: Objective working memory measures

Thirteen studies included objective working memory measures of participants’ memory functioning. Twelve of these studies provided immediate outcomes (Chiaravalloti 2019b; Chmelařová 2020; Impellizzeri 2020; Maggio 2020; Mousavi 2018a; Naeeni Davarani 2020; Pedulla 2016; Perez‐Martin 2017; Rahmani 2020; Rilo 2018; Stuifbergen 2012; Stuifbergen 2018), eight of these studies provided intermediate outcomes (das Nair 2012; Huiskamp 2016; Lincoln 2020; Mousavi 2018a; Naeeni Davarani 2020; Rahmani 2020; Stuifbergen 2012; Stuifbergen 2018), and five of these studies provided longer‐term outcomes (Chiaravalloti 2019b; das Nair 2012; Lincoln 2020; Mattioli 2016; Stuifbergen 2018). We found a moderate difference between groups for objective measures of working memory at immediate follow‐up (SMD 0.45, 95% CI 0.18 to 0.72; 655 participants, low‐quality evidence) Analysis 4.1, but little to no between‐group differences at intermediate (SMD 0.16, 95% CI ‐0.09 to 0.40; 821 participants, moderate‐quality evidence) Analysis 4.2, or longer‐term follow‐up (SMD 0.04, 95% CI ‐0.11 to 0.20; 665 participants, moderate‐quality evidence) Analysis 4.3. The intervention group performed better than the control group at immediate follow‐up.

Outcome 5: Information processing

In terms of ‘other cognitive outcomes’, the most frequently assessed cognitive domain was information processing. Nineteen studies included information processing measures. Fifteen studies reported immediate outcomes (Campbell 2016; Chiaravalloti 2019b; Chmelařová 2020; Hancock 2015; Messinis 2017; Messinis 2020; Naeeni Davarani 2020; Pedulla 2016; Perez‐Martin 2017; Rahmani 2020; Rilo 2018; Solari 2004; Stuifbergen 2012; Stuifbergen 2018; Vilou 2020), eight studies reported intermediate outcomes (Campbell 2016; Hanssen 2015; Lincoln 2020; Naeeni Davarani 2020; Rahmani 2020; Solari 2004; Stuifbergen 2012; Stuifbergen 2018), and five studies reported longer‐term outcomes (Hanssen 2015; Lincoln 2020; Mattioli 2016; Pedulla 2016; Stuifbergen 2018). We found moderate between‐group differences for information processing measures at immediate (SMD 0.51, 95% CI 0.19 to 0.82; 808 participants, low‐quality evidence) Analysis 5.1, and intermediate follow‐up (SMD 0.27, 95% CI 0.00 to 0.54; 933 participants, low‐quality evidence) Analysis 5.2, but little to no difference at longer‐term follow‐up (SMD 0.21, 95% CI ‐0.03 to 0.45; 723 participants, moderate‐quality evidence) Analysis 5.3. The intervention group performed better than the control group at immediate and intermediate follow‐up.

Outcome 6: Mood ‐ Depression

Twenty‐two studies included measures of depression. Sixteen of these studies provided immediate outcomes (Campbell 2016; Chiaravalloti 2005; Chiaravalloti 2013; Chmelařová 2020; Goodwin 2020; Goverover 2018a; Hancock 2015; Hildebrandt 2007; Impellizzeri 2020; Maggio 2020; Messinis 2017; Messinis 2020; Perez‐Martin 2017; Solari 2004; Stuifbergen 2018; Tesar 2005), 10 of these studies provided intermediate outcomes (Campbell 2016; Carr 2014; Chiaravalloti 2005; das Nair 2012; Lincoln 2002; Lincoln 2020; Shahpouri 2019; Solari 2004; Stuifbergen 2018; Tesar 2005), and seven of these studies provided longer‐term outcomes (Carr 2014; Chiaravalloti 2013; das Nair 2012; Lincoln 2002; Lincoln 2020; Mattioli 2016; Stuifbergen 2018). We found a moderate difference between groups for mood measures of depression at immediate follow‐up (SMD 0.34, 95% CI 0.15 to 0.53; 853 participants, moderate‐quality evidence) Analysis 6.1, but little to no difference at intermediate (SMD 0.20, 95% CI ‐0.06 to 0.45; 1003 participants, moderate‐quality evidence) Analysis 6.2, or longer‐term follow‐up (SMD 0.15, 95% CI ‐0.04 to 0.34; 891 participants, high‐quality evidence) Analysis 6.3. The intervention group performed better than the control group at immediate follow‐up.

Outcome 7: Mood ‐ Anxiety

Seven studies included measures of anxiety. Four of these studies provided immediate outcomes (Campbell 2016; Goodwin 2020; Goverover 2018a; Perez‐Martin 2017), four of these studies provided intermediate outcomes (Campbell 2016; Carr 2014; das Nair 2012; Lincoln 2020), and three of these studies provided longer‐term outcomes (Carr 2014; das Nair 2012; Lincoln 2020). We found little to no between‐group differences for mood measures of anxiety at immediate (SMD 0.29, 95% CI ‐0.01 to 0.59; 178 participants, high‐quality evidence) Analysis 7.1, intermediate (SMD 0.16, 95% CI ‐0.15 to 0.46; 502 participants, high‐quality evidence) Analysis 7.2, or longer‐term follow‐up (SMD 0.27, 95% CI ‐0.12 to 0.65; 448 participants, high‐quality evidence) Analysis 7.3.

Outcome 8: Quality of life (QoL)

Eleven studies included QoL measures. Eight of these studies provided immediate outcomes (Goodwin 2020; Goverover 2018a; Hancock 2015; Maggio 2020; Messinis 2020; Perez‐Martin 2017; Shahpouri 2019; Solari 2004), six of these studies provided intermediate outcomes (Carr 2014; Hanssen 2015; Lincoln 2002; Lincoln 2020; Shahpouri 2019; Solari 2004), and five of these studies provided longer‐term outcomes (Carr 2014; Hanssen 2015; Lincoln 2002; Lincoln 2020; Mattioli 2016). We found small to moderate between‐group differences for quality of life measures at immediate (SMD 0.42, 95% CI 0.15 to 0.68; 371 participants, high‐quality evidence) Analysis 8.1, intermediate (SMD 0.30, 95% CI 0.02 to 0.58; 683 participants, high‐quality evidence) Analysis 8.2, and longer‐term follow‐up (SMD 0.17, 95% CI 0.02 to 0.32; 687 participants, high‐quality evidence) Analysis 8.3. The intervention group performed better than the control group at every follow‐up.

Outcome 9: Functional abilities / Activities of daily living (ADL)

Six studies included ADL measures of participants' daily functioning. Four of these studies provided immediate outcomes (Campbell 2016; Chiaravalloti 2019a; Goverover 2018a; Stuifbergen 2018), four of these studies provided intermediate outcomes (Campbell 2016; das Nair 2012; Goverover 2018a; Stuifbergen 2018), and three of these studies provided longer‐term outcomes (das Nair 2012; Lincoln 2002; Stuifbergen 2018). We found little to no between‐group differences for ADL at immediate (SMD 0.02, 95% CI ‐0.26 to 0.29; 265 participants, high‐quality evidence) Analysis 9.1, intermediate (SMD ‐0.06, 95% CI ‐0.36 to 0.24; 400 participants, high‐quality evidence) Analysis 9.2, and longer‐term follow‐up (SMD ‐0.11, 95% CI ‐0.49 to 0.27; 369 participants, high‐quality evidence) Analysis 9.3.

Discussion

Summary of main results

In the last two decades, research groups globally have begun to address memory problems associated with multiple sclerosis (MS). However, the literature base examining the effectiveness of memory rehabilitation for MS has been weak. While single‐case and uncontrolled studies have found memory rehabilitation to be effective in reducing memory or psychological problems, these results had not been consistently replicated in randomised controlled trials (RCTs). However, more recently, we have seen larger, more methodologically‐robust trials published in this area.

We included 44 RCTs or quasi‐randomised trials in this review. These studies were either memory rehabilitation studies or cognitive rehabilitation trials with a specific memory component that included a memory intervention. These trials were mostly of relatively moderate quality, with many still not adhering to the Consolidated Standards of Reporting Trials (CONSORT) guidelines (Moher 2001). Descriptions of the randomisation protocol, blinding, and content of treatment and control groups were poor in approximately 50% of studies. Studies generally had modest sample sizes and used impairment‐level outcome assessments to determine the effectiveness of the intervention.

Twenty‐nine individual studies reported positive results on memory outcomes from their memory rehabilitation groups (Arian Darestani 2020; Chiaravalloti 2013; Chiaravalloti 2019b; Chmelařová 2020; De Luca 2019; Gich 2015; Goverover 2018a; Hildebrandt 2007; Impellizzeri 2020; Lincoln 2020; Maggio 2020; Mani 2018; Mattioli 2016; Mendozzi 1998; Messinis 2017; Messinis 2020; Mousavi 2018a; Mousavi 2018b; Naeeni Davarani 2020; Pedulla 2016; Perez‐Martin 2017; Pusswald 2014; Rahmani 2020; Rilo 2018; Shahpouri 2019; Stuifbergen 2012; Stuifbergen 2018; Tesar 2005; Vilou 2020). However, these results need to be interpreted in the context of the methodological limitations and the measures used to assess effectiveness, which may have influenced the outcome. In fact, most of the studies that reported a positive memory outcome for the participants in the intervention group were also rated as having a high or unclear risk of bias in at least one area, with the exception of four studies (Lincoln 2020; Shahpouri 2019; Stuifbergen 2012; Vilou 2020).

Between‐group differences were found for quality of life outcomes in favour of the intervention group compared to the control group at each follow‐up point, suggesting that memory rehabilitation can lead to positive change in the overall perception of quality of life of people with MS. It should be noted that this positive finding has not been observed in many other cognitive rehabilitation reviews for people with MS (e.g. Rosti‐Otajärvi 2014), or reviews investigating cognitive rehabilitation in other cohorts such as post‐stroke patients (e.g. das Nair 2016a). This could be because more recent trials have had a broader focus on the ‘impact’ of cognitive problems in MS, and are therefore more likely to affect quality of life, whereas older trials focused mainly on memory impairments. It could also be that memory problems are more detrimental to the quality of life of people with MS than to that of people with other neurological conditions, and therefore the tools to cope with and self‐manage their problems (which they are taught during memory and attention rehabilitation) lead to a greater improvement in their quality of life compared to other patient groups.

The results of this review suggest there is substantial evidence to support the effectiveness of memory rehabilitation on subjective memory measures at immediate follow‐up in favour of the intervention condition, and this result is sustained at intermediate and longer‐term follow‐ups of up to one year. This is a significant change compared to the previous version of this review, which found no evidence that memory rehabilitation had a positive effect on subjective memory measures. In the current review, between‐group differences favouring the intervention group were also seen in the following outcomes: objective measures of verbal memory at immediate and intermediate follow‐ups; objective measures of visual memory at immediate follow‐up; objective measures of working memory at immediate follow‐up; information processing at immediate and intermediate follow‐ups; mood measures of depressive symptoms at immediate follow‐up; and quality of life measures at immediate, intermediate and longer‐term follow‐ups. Little to no between‐group differences were found in activities of daily living measures or measures of anxiety.

One large, well‐designed study (Lincoln 2020) found a significant effect for memory outcomes at six‐month follow‐up but did not find evidence of this effect at 12‐month follow‐up, suggesting that the positive effects of the intervention cannot necessarily be expected to last in the long run. This supports the overall trend of these results, in that only two outcomes (subjective memory and quality of life) maintained their significant effects at longer‐term follow‐up. This suggests that once the core intervention has been completed, maintenance plans (such as booster sessions) should be put in place to ensure the techniques learnt during the intervention are not forgotten or inconsistently used over time.

High heterogeneity (I² ≥ 50%) was seen in four statistically significant outcomes (working memory at immediate follow‐up (I² = 62%), quality of life at intermediate follow‐up (I² = 55%), and information processing at immediate (I² = 77%) and intermediate (I² = 69%) follow‐up), and thus these findings need to be treated with caution and explored further. Firstly, there do not appear to be one or two primary studies contributing to the increased heterogeneity for working memory measures at immediate follow‐up; this could therefore be due to the wide variation in the type of intervention used by each study. Two studies (Stuifbergen 2012; Stuifbergen 2018) used the same Memory Attention and Problem‐Solving Skills in Multiple Sclerosis (MAPSS‐MS) intervention, whereas the other 10 studies all used different interventions from each other. We found a large variation in both the type of methods used, e.g. computerised versus face‐to‐face, and the frequency at which the intervention was delivered, e.g. ranging from four to 12 weeks in duration and from once to six times per week. These results suggest that variation in both type of intervention and frequency of delivery contributed to the high heterogeneity. This theory is supported by the sensitivity analysis, which shows that the heterogeneity drops to 0% when all but the two studies that used the same intervention methods are removed.

Secondly, for the high heterogeneity for information processing outcomes measured at immediate follow‐up, it appears that there are three main studies contributing towards this (Campbell 2016; Messinis 2020; Naeeni Davarani 2020). All three studies used RehaCom software for their interventions, which took place at home, and there was some variation in the frequency of sessions between studies. The type of intervention used in these studies may have contributed to the high heterogeneity; however, without a meta‐regression we cannot be certain of the cause, and thus these results should be interpreted with caution.

Lastly, for the high heterogeneity seen in quality of life outcomes measured at intermediate follow‐up, one study appeared to contribute towards this (Shahpouri 2019). One possible cause could be that the outcomes were measured "within 3 months after cognitive rehabilitation therapy" (p. 113). This suggests that some participants may have had their outcome assessments immediately after treatment and others three months later, which could account for the large clinical variance in outcome scores.
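For readers less familiar with the I² statistic used in this heterogeneity discussion, the sketch below shows the standard definition of I² from Cochran's Q and its degrees of freedom (number of studies minus one), together with the 50% threshold applied above. The function names and the Q values are hypothetical and purely illustrative; they are not taken from the analyses in this review.

```python
def i_squared(q, df):
    """Return I² as a percentage: max(0, (Q - df) / Q) * 100.

    q  : Cochran's Q statistic for the meta-analysis
    df : degrees of freedom (number of studies minus one)
    """
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100.0


def is_high_heterogeneity(i2, threshold=50.0):
    """Apply the threshold used in this section (I² of 50% or more)."""
    return i2 >= threshold


# Hypothetical Q values for a meta-analysis of 11 studies (df = 10).
for q in (20.0, 8.0):
    i2 = i_squared(q, df=10)
    print(f"Q={q}: I²={i2:.0f}%, high heterogeneity: {is_high_heterogeneity(i2)}")
```

When Q is no larger than its degrees of freedom, I² is truncated at 0%, which is why removing studies in a sensitivity analysis can make the reported heterogeneity drop to 0%.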

However, these results must be viewed in relation to the quality of the evidence for these outcomes: the GRADE rating was low for information processing at both immediate and intermediate follow‐up, low for working memory at immediate follow‐up, but high for quality of life at intermediate follow‐up (GRADE Working Group 2004). Furthermore, improvements in outcomes are only maintained at longer‐term follow‐up for subjective memory measures and quality of life, which suggests that regular booster sessions of cognitive rehabilitation are necessary to maintain the improvements made; without them, participants appear to revert to where they started. The degree to which this has the potential to generalise to everyday life, given the varying ecological validity of these tests, is questionable. However, it is important to note that the methodological quality of studies included in this review has improved compared to the previous review.

Overall completeness and applicability of evidence

The size of the literature base examined in this review allowed us to address the research questions in as much depth as possible. The variety of outcomes in the trials enabled us to investigate domain‐specific memory (visual, verbal and working memory); thus, this review not only shows the positive effects for general memory but also identifies which domains are improved by the rehabilitation intervention programmes and which of these improvements are maintained. This review fully investigated all types of studies, participants, interventions and outcome measures as stated in the methods. The positive results in trials using computerised interventions have important implications for clinical practice in the current COVID‐19 pandemic, as cognitive rehabilitation may have to be delivered virtually for the foreseeable future. This review examined the evidence from RCTs and quasi‐RCTs and found evidence to suggest that memory rehabilitation is effective in improving memory performance on subjective and objective (verbal, visual and working memory) assessments across immediate and intermediate follow‐ups, in improving quality of life in the immediate, intermediate and longer term, and in reducing depression (but only immediately after the intervention). However, this evidence should be interpreted in the context of the methodological quality, as reported in Summary of findings Table 1, before it is applied to a clinical setting.

Quality of the evidence

We identified 44 RCTs of memory rehabilitation for people with MS, and all but five had small sample sizes (Charvet 2017; Hanssen 2015; Lincoln 2002; Lincoln 2020; Stuifbergen 2018). However, studies included in this review were more methodologically sound than the memory rehabilitation RCTs included in systematic reviews of stroke or traumatic brain injury literature (das Nair 2007). Despite this, the CONSORT statement and guidelines were not always followed in these trials.

The randomisation protocol was inadequate or poorly reported for 15 studies (Arian Darestani 2020; Chiaravalloti 2005; Chmelařová 2020; Ernst 2015; Gich 2015; Hanssen 2015; Hildebrandt 2007; Jønsson 1993; Mendozzi 1998; Naeeni Davarani 2020; Pedulla 2016; Perez‐Martin 2017; Pusswald 2014; Rahmani 2020; Tesar 2005). Gich 2015, Hanssen 2015, and Tesar 2005 did not clearly mention how the randomisation list was created or what procedures were undertaken; Jønsson 1993 used closed envelopes but did not mention who created the random lists; Chiaravalloti 2005 employed odd‐even random allocation; and Hildebrandt 2007 and Pusswald 2014 used alternating allocation. These two latter forms of allocation are not always considered acceptable in RCTs (Glanville 2006), but trials using them are classed by Cochrane as quasi‐randomised (Higgins 2019) and were therefore included in this review. Mendozzi 1998 randomised only half the sample, with no stated random generation method. Twenty‐seven studies reported their randomisation protocols adequately (Campbell 2016; Carr 2014; Charvet 2017; Chiaravalloti 2013; Chiaravalloti 2019a; Chiaravalloti 2019b; das Nair 2012; De Luca 2019; Ernst 2018; Goodwin 2020; Goverover 2018a; Hancock 2015; Huiskamp 2016; Impellizzeri 2020; Lincoln 2002; Lincoln 2020; Maggio 2020; Messinis 2017; Messinis 2020; Mousavi 2018a; Mousavi 2018b; Rilo 2018; Shahpouri 2019; Solari 2004; Stuifbergen 2012; Stuifbergen 2018; Vilou 2020). The 29 studies we have added in this update have improved in terms of quality of reporting of trials; however, more work is needed to ensure that trialists follow the CONSORT statement (Moher 2001).

Furthermore, given that memory rehabilitation is a complex intervention (Craig 2008), much more detail is required about what participants experience in both the intervention and the control arms of the trial. The description of the interventions was adequate in most studies; however, the control groups were much less well described. Recently published guidelines such as the Template for Intervention Description and Replication (TIDieR) and the Criteria for Reporting the Development and Evaluation of Complex Interventions in healthcare: revised guideline (Hoffman 2014; Möhler 2015), alongside more specific guidance for memory rehabilitation (Martin 2015), may help improve the quality of reporting of trials of complex interventions.

Given the complex nature of the intervention, it is important to determine whether the intervention was delivered as intended. Only four studies (Carr 2014; Lincoln 2020; Stuifbergen 2012; Stuifbergen 2018) out of 44 reported whether a fidelity assessment was completed. Where it was assessed, authors found that the intervention was delivered with fidelity. Future trials should consider including fidelity assessments.

Inclusion and exclusion criteria were relatively well‐defined. While most studies described the flow of participants through the trial, one did not (Tesar 2005), and only 14 of the 44 studies had flowcharts (Carr 2014; Campbell 2016; Chiaravalloti 2005; Chiaravalloti 2013; das Nair 2012; Gich 2015; Goodwin 2020; Hancock 2015; Hanssen 2015; Lincoln 2002; Lincoln 2020; Pusswald 2014; Solari 2004; Stuifbergen 2012).

Because MS is found in demographically diverse populations, we expected to see better description of the samples. Only five out of 44 papers described the ethnicity of the sample, while 38 out of 44 papers described the level of education. No studies reported whether or not participants were drawn from economically‐disadvantaged groups or whether they had co‐morbid conditions. While these factors could be balanced out through randomisation, we need to know whether the effects of the intervention are the same for these groups. Future trials should collect and report these details. Furthermore, while many studies recruited samples including people with different types of MS, we note that several studies only included people with relapsing remitting multiple sclerosis (RRMS). We recommend that future trials consider including people with other MS subtypes and report outcomes separately for each subtype.

Most trials opted to use impairment‐level measures or tests with modest ecological validity and minimal chance of generalisation of treatment effects to activities of daily living. Fifteen studies employed subjective measures of memory (Carr 2014; Chiaravalloti 2005; Chiaravalloti 2019a; Chiaravalloti 2019b; Chmelařová 2020; das Nair 2012; Goodwin 2020; Goverover 2018a; Lincoln 2002; Lincoln 2020; Mani 2018; Mousavi 2018b; Perez‐Martin 2017; Shahpouri 2019; Stuifbergen 2012), which is a big improvement from the five studies in the last update as these measures have some degree of ecological validity and were activity‐level measures. However, these are prone to subjective reporting biases common to most Patient‐Reported Outcome Measures (PROMs). Furthermore, the cultural appropriateness of outcomes has improved since the previous review, with more studies including translated and adapted assessment tools such as the GVLT which is the Greek adaptation of the California Verbal Learning Test (CVLT‐II).

There has also been a shift in focus in some of the more recent trials from assessing only cognitive outcomes to including other outcomes such as mood and quality of life. This, we believe, is a positive step forward in memory and cognitive rehabilitation. This review highlights the importance of including these quality of life measures as key outcomes, as in the Lincoln 2020 trial. Only three studies assessed adverse events following memory rehabilitation (Chiaravalloti 2013; Chiaravalloti 2019a; Lincoln 2020). While the likelihood of such adverse events is remote, trials should assess them to be certain of this.

Both parametric and nonparametric statistical tests were used to compare groups. Change scores were compared in six studies (Campbell 2016; Chiaravalloti 2005; Chiaravalloti 2013; Gich 2015; Hanssen 2015; Stuifbergen 2012), and all studies were concerned with significance testing. Contrary to the previous update of this review, the majority of the newly included studies included P values in their reporting of outcomes as opposed to the seven that included them previously (Carr 2014; das Nair 2012; Gich 2015; Hancock 2015; Lincoln 2002; Pusswald 2014; Solari 2004), with many trials providing all P values in tables that were readily accessible in the papers and online as supplementary information (Campbell 2016; Chiaravalloti 2013). Most studies also mentioned confidence intervals and often reported the post‐hoc tests or statistical corrections or adjustments performed on their data. Eight studies used intention‐to‐treat analysis (Carr 2014; das Nair 2012; Goodwin 2020; Hildebrandt 2007; Lincoln 2002; Lincoln 2020; Solari 2004; Stuifbergen 2012).

During risk of bias assessment, we observed that some studies stated that they were “double‐blind” studies without justifying how they were in fact double blind. This resulted in these studies being rated as high risk of bias. Such double‐blind studies were typically those where computerised memory rehabilitation was the intervention being tested. Even in these situations where participants could potentially be blinded, it was not clear how different the computerised rehabilitation was from the computerised control group. Therefore, it was difficult to determine whether the participants were truly blinded. In some instances, the study authors reported that either participants or therapists delivering the intervention were blinded to group allocation, but from the study description, it was not always clear how this could have been the case. In future, we would strongly encourage authors to be more explicit in describing the blinding procedures used.

One limitation of this review was that we could only obtain information on whether the studies used intention‐to‐treat or per‐protocol analyses for eight studies (Carr 2014; das Nair 2012; Hildebrandt 2007; Lincoln 2002; Lincoln 2020; Solari 2004; Stuifbergen 2012; Stuifbergen 2018); therefore, we could not complete a sensitivity analysis of intention‐to‐treat in comparison with per‐protocol analysis. We were able to conduct a sensitivity analysis comparing studies judged to be at low risk of bias to all included studies; however, we were unable to run this analysis for four outcomes due to a lack of studies with low risk of bias in every area (see Table 2). This suggests that there could be a correlation between trials that measure their outcomes immediately post‐intervention and high risk of bias within the methodology. Our interpretation of the sensitivity analysis is that, while the quality of the trials did not affect most outcomes, some differences were observed at immediate follow‐up: studies with higher risk of bias inflated the overall effect size estimates for these outcomes, and the test of overall effect changed from statistically significant to not significant when studies at high risk of bias were excluded. This suggests that lower‐quality studies may have positively influenced the outcomes. However, this could also be because only a few studies that measured immediate outcomes had low risk of bias in every area; these results should therefore be interpreted with caution. Furthermore, removing the studies with high risk of bias during this analysis often led to a reduction in heterogeneity. This could suggest an association between studies that have high risk of bias and increased heterogeneity. However, it is more likely that the heterogeneity was caused by wide variation in both type and frequency of intervention. There also appeared to be an association between studies that measure longer‐term outcomes and low risk of bias.

We conducted a separate sensitivity analysis for the studies where standard deviations were imputed and found no clinical differences between the sensitivity analysis and the primary analysis, suggesting that imputing the standard deviations had no significant effect. Only one study had a large enough sample size and sufficient data available to complete a subgroup analysis (Lincoln 2002). A subgroup meta‐analysis on the basis of type of MS will therefore need to be completed in a future review update when more studies become available.
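When a trial does not report standard deviations, they are often derived from other reported statistics. The sketch below shows two standard conversions for a single group mean (from a standard error, and from a 95% confidence interval using the large‐sample normal approximation). This section does not state which method the review authors used, so the functions and the example numbers are illustrative assumptions only.

```python
import math


def sd_from_standard_error(se, n):
    """Standard deviation of a group from the standard error of its mean."""
    return se * math.sqrt(n)


def sd_from_ci95(lower, upper, n):
    """Standard deviation from a 95% CI around a group mean.

    Uses the normal approximation (CI width = 2 * 1.96 * SE), which is
    reasonable for large samples; small samples call for a t-distribution.
    """
    return math.sqrt(n) * (upper - lower) / 3.92


# Hypothetical example: a group of 25 with a mean whose 95% CI is 10 to 14.
example_sd = sd_from_ci95(10.0, 14.0, 25)
print(f"Derived SD: {example_sd:.2f}")
```

A sensitivity analysis then repeats the meta‐analysis without the studies whose standard deviations were derived in this way and compares the pooled estimates.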

Potential biases in the review process

Two of the review authors were lead investigators for three of the included studies (das Nair 2012, Lincoln 2002, Lincoln 2020), and named authors on another included study (Carr 2014), but to mitigate bias, we had multiple review authors who independently appraised the methodological quality of these studies. We only searched for papers in English, and we could only include mixed‐diagnosis studies where separate data for those participants with MS were provided. Therefore, there may be more data available that we did not have access to. There were also potential overlaps between attention and memory retraining, where an intervention could be described as attention when it actually addressed memory, so we may have missed some trials. To mitigate this, we checked papers at full‐text review to ensure that they were not excluded if a memory component was presented as part of the treatment. Finally, we searched GreyNet and the EThOS databases; however, we are not sure of the comprehensiveness of these, thus creating the possibility of further relevant grey literature that was not obtained via the searches.

Agreements and disagreements with other studies or reviews

This review complements the 'Psychological interventions for multiple sclerosis' intervention review (Thomas 2006). In one of their mini‐reviews, Thomas 2006 found "some evidence of effectiveness of cognitive rehabilitation on cognitive outcomes, although this was difficult to interpret because of the large number of outcome measures used". Their interpretations have therefore been based on a narrative review of results from individual studies. The Thomas 2006 review covered interventions that were not specific to 'memory rehabilitation'; however, their findings on the effectiveness of interventions to help people with cognitive impairments were inconclusive.

Similarly, the Rosti‐Otajärvi 2014 review found evidence that memory span, working memory, and delayed memory were significantly improved for the intervention compared with the control group. However, their review found no significant differences between intervention and control for emotional functions, whereas this review has found some significant differences, notably improved mood on depression scales and quality of life. Any discrepancies are likely due to the differences in inclusion criteria, as this review was specific to memory rehabilitation, or a cognitive rehabilitation with a memory component, whereas the Rosti‐Otajärvi 2014 review evaluated a much larger breadth of neuropsychological interventions and outcomes.

The Goverover 2018b review found promising results to support cognitive rehabilitation for improving memory function and stated that there had been substantial progress in increasing the number of cognitive rehabilitation trials to allow for practice recommendations. However, they also suggest, as we do, that there is still much work to be done to optimise the potential of cognitive rehabilitation by applying the most rigorous methodology, so that the quality of evidence is as high as possible.

Figure 1. Flow diagram showing article screening process
Figure 2. Methodological quality graph: review authors' judgements about each methodological quality item presented as percentages across all included studies.
Figure 3. Methodological quality summary: review authors' judgements about each methodological quality item for each included study.
Analysis 1.1. Comparison 1: Subjective memory measures, Outcome 1: Immediate
Analysis 1.2. Comparison 1: Subjective memory measures, Outcome 2: Intermediate
Analysis 1.3. Comparison 1: Subjective memory measures, Outcome 3: Longer‐term
Analysis 2.1. Comparison 2: Objective verbal memory, Outcome 1: Immediate
Analysis 2.2. Comparison 2: Objective verbal memory, Outcome 2: Intermediate
Analysis 2.3. Comparison 2: Objective verbal memory, Outcome 3: Longer‐term
Analysis 3.1. Comparison 3: Objective visual memory, Outcome 1: Immediate
Analysis 3.2. Comparison 3: Objective visual memory, Outcome 2: Intermediate
Analysis 3.3. Comparison 3: Objective visual memory, Outcome 3: Longer‐term
Analysis 4.1. Comparison 4: Objective working memory, Outcome 1: Immediate
Analysis 4.2. Comparison 4: Objective working memory, Outcome 2: Intermediate
Analysis 4.3. Comparison 4: Objective working memory, Outcome 3: Longer‐term
Analysis 5.1. Comparison 5: Information processing, Outcome 1: Immediate
Analysis 5.2. Comparison 5: Information processing, Outcome 2: Intermediate
Analysis 5.3. Comparison 5: Information processing, Outcome 3: Longer‐term
Analysis 6.1. Comparison 6: Mood ‐ Depression Scale, Outcome 1: Immediate
Analysis 6.2. Comparison 6: Mood ‐ Depression Scale, Outcome 2: Intermediate
Analysis 6.3. Comparison 6: Mood ‐ Depression Scale, Outcome 3: Longer‐term
Analysis 7.1. Comparison 7: Mood ‐ Anxiety Scale, Outcome 1: Immediate
Analysis 7.2. Comparison 7: Mood ‐ Anxiety Scale, Outcome 2: Intermediate
Analysis 7.3. Comparison 7: Mood ‐ Anxiety Scale, Outcome 3: Longer‐term
Analysis 8.1. Comparison 8: Quality of life, Outcome 1: Immediate
Analysis 8.2. Comparison 8: Quality of life, Outcome 2: Intermediate
Analysis 8.3. Comparison 8: Quality of life, Outcome 3: Longer‐term
Analysis 9.1. Comparison 9: Activities of Daily Living, Outcome 1: Immediate
Analysis 9.2. Comparison 9: Activities of Daily Living, Outcome 2: Intermediate
Analysis 9.3. Comparison 9: Activities of Daily Living, Outcome 3: Longer‐term

Summary of findings 1. Memory rehabilitation for people with multiple sclerosis

Patient or population: people with multiple sclerosis
Settings: clinic and home‐based
Intervention: memory rehabilitation
Comparison: active control or no treatment

Illustrative comparative risks* (95% CI) are given in the assumed risk and corresponding risk columns. Letters in parentheses refer to the footnotes below the table.

| Outcomes | Assumed risk (control) | Corresponding risk (memory rehabilitation) | Relative effect (95% CI) | No of participants (studies) | Quality of the evidence (GRADE) | Comments |
| --- | --- | --- | --- | --- | --- | --- |
| Subjective memory measures ‐ intermediate. EMQ, MSNQ, CFQ, MFQ (a). Follow‐up: 1 to 6 months | | The mean subjective memory measures ‐ intermediate in the intervention groups was 0.23 standard deviations higher (0.11 to 0.35 higher) | | 1045 (11 studies) | ⊕⊕⊕⊕ high | Immediate follow‐up: SMD 0.32 (0.05 to 0.58). Longer‐term follow‐up: SMD 0.16 (0.02 to 0.30) |
| Objective verbal memory measures ‐ intermediate. CVLT, AVLT, HVLT, VLT, SRT, MUSIC (a). Follow‐up: 1 to 6 months | | The mean objective verbal memory measures ‐ intermediate in the intervention groups was 0.25 standard deviations higher (0.11 to 0.4 higher) | | 753 (6 studies) | ⊕⊕⊝⊝ low (b, c) | Immediate follow‐up: SMD 0.40 (0.22 to 0.58). Longer‐term follow‐up: SMD 0.13 (‐0.03 to 0.29) |
| Objective visual memory measures ‐ intermediate. BVMT‐R, SPART, CMT, ROCF. Follow‐up: 1 to 6 months | | The mean objective visual memory measures ‐ intermediate in the intervention groups was 0.2 standard deviations higher (0.11 lower to 0.5 higher) | | 751 (6 studies) | ⊕⊕⊕⊝ moderate (e) | Immediate follow‐up: SMD 0.42 (0.25 to 0.60). Longer‐term follow‐up: SMD 0.12 (‐0.13 to 0.37) |
| Objective working memory measures ‐ intermediate. PASAT, WAIS. Follow‐up: 1 to 6 months | | The mean objective working memory measures ‐ intermediate in the intervention groups was 0.16 standard deviations higher (0.09 lower to 0.40 higher) | | 821 (8 studies) | ⊕⊕⊕⊝ moderate (f) | Immediate follow‐up: SMD 0.45 (0.18 to 0.72). Longer‐term follow‐up: SMD 0.04 (‐0.11 to 0.2) |
| Information processing ‐ intermediate. SDMT. Follow‐up: 1 to 6 months | | The mean information processing measures ‐ intermediate in the intervention groups was 0.27 standard deviations higher (0.00 to 0.54 higher) | | 933 (8 studies) | ⊕⊕⊝⊝ low (g, h) | Immediate follow‐up: SMD 0.51 (0.19 to 0.82). Longer‐term follow‐up: SMD 0.21 (‐0.03 to 0.45) |
| Quality of life ‐ intermediate. MSIS, MSQOL, SF‐36, SF‐12, SWLS, EQ‐5D‐5L (a). Follow‐up: 1 to 6 months | | The mean quality of life measures ‐ intermediate in the intervention groups was 0.30 standard deviations higher (0.02 to 0.58 higher) | | 683 (6 studies) | ⊕⊕⊕⊕ high | Immediate follow‐up: SMD 0.42 (0.15 to 0.68). Longer‐term follow‐up: SMD 0.17 (0.02 to 0.32) |
| Activities of daily living ‐ intermediate. EADL (a). Follow‐up: 1 to 6 months | | The mean activities of daily living measures ‐ intermediate in the intervention groups was 0.06 standard deviations lower (0.36 lower to 0.24 higher) | | 400 (4 studies) | ⊕⊕⊕⊕ high | Immediate follow‐up: SMD 0.02 (‐0.26 to 0.29). Longer‐term follow‐up: SMD ‐0.11 (‐0.49 to 0.27) |

*The basis for the assumed risk (e.g. the median control group risk across studies) is provided in footnotes. The corresponding risk (and its 95% confidence interval) is based on the assumed risk in the comparison group and the relative effect of the intervention (and its 95% CI).
CI: confidence interval; SMD: standardised mean difference

GRADE Working Group grades of evidence
High quality ⊕⊕⊕⊕: further research is very unlikely to change our confidence in the estimate of effect.
Moderate quality ⊕⊕⊕⊝: further research is likely to have an important impact on our confidence in the estimate of effect and may change the estimate.
Low quality ⊕⊕⊝⊝: further research is very likely to have an important impact on our confidence in the estimate of effect and is likely to change the estimate.
Very low quality ⊕⊝⊝⊝: we are very uncertain about the estimate.

Please note: As per Cochrane guidelines, we only report seven outcomes here. Details of our other outcomes can be found in Table 1

a CMT: Contextual Memory Test; EAQ: Emotional Awareness Questionnaire; EMQ: Everyday Memory Questionnaire; HADS: Hospital Anxiety and Depression Scale; STAI: State Trait Anxiety Inventory; MSNQ: Multiple Sclerosis Neuropsychological Screening Questionnaire; MFQ: Memory Functioning Questionnaire; RBMT: Rivermead Behavioural Memory Test; CVLT: California Verbal Learning Test; AVLT: Auditory Verbal Learning Test; HVLT: Hopkins Verbal Learning Test; VLT: Verbal Learning Test; LNNB: Luria‐Nebraska Neuropsychological Battery; BRBNT: Brief Repeatable Battery of Neuropsychological Tests; GHQ: General Health Questionnaire; BDI: Beck Depression Inventory; BDI‐FS: Beck Depression Inventory‐Fast Screen; EADL: Extended Activities of Daily Living; MSIS: Multiple Sclerosis Impact Scale; FAMS: Functional Assessment of Multiple Sclerosis; MSQOL: Multiple Sclerosis Quality of Life; PASAT: Paced Auditory Serial Addition Test; SF‐36: 36‐Item Short Form Health Survey; SF‐12: 12‐Item Short Form Health Survey.
b 1 of 10 studies had possible risk of bias related to random sequence generation, and in 2 of the 10 it was unclear. Allocation concealment was a possible source of bias in 1 study, and unclear in 3 of the 10 studies. Blinding was a potential source of bias in 2 studies, and unclear in 1 of the 10 studies. Incomplete outcome data may have been biased in 1 study, and unclear in 3 of the 10 studies. Selective reporting may have been biased in 1 study.

Downgraded by 1 due to 95% confidence intervals including no effect, and the upper or lower confidence interval limit crossing an effect size of 0.5 in either direction.

b All or nearly all of the studies used a list‐learning task as an objective measure of verbal memory, which has poor ecological validity.
c 2 of the 6 studies showed unclear risk of bias relating to random sequence generation. 1 study had unclear potential risk of allocation concealment bias. 4 studies had potential risk of bias related to blinding. 3 studies had unclear risk of bias due to incomplete outcome data. 1 study had unclear risk of other bias.

e 2 of 6 studies showed unclear potential risk of bias related to random sequence generation. 1 study showed unclear potential risk of bias related to allocation concealment. 4 of 6 studies showed potential risk of bias related to blinding. 3 of 6 studies showed unclear risk of bias related to incomplete outcome data.

f 5 of 12 studies showed unclear potential risk of bias related to random sequence generation. 6 of 12 studies showed unclear risk of bias related to allocation concealment. 7 of 12 studies showed possible risk of bias related to blinding procedures. 1 study showed potential risk of bias related to incomplete data, and 3 of 12 studies were at unclear risk of bias. 1 study had potential risk of bias related to selective reporting.

g 3 of 8 studies showed unclear risk of bias related to random sequence generation. 1 study showed potential risk of bias related to allocation concealment, and 2 of 8 studies showed unclear risk of bias. 4 of 8 studies showed potential risk of bias related to blinding procedures, and 1 study showed unclear risk of bias. 3 of 8 studies showed unclear risk of bias related to incomplete data.

h Inconsistency of results: statistical heterogeneity > 50%.

Table 1. Summary of findings continued

Illustrative comparative risks* (95% CI) are given in the assumed risk and corresponding risk columns. Letters in parentheses refer to the footnotes below the table.

| Outcomes | Assumed risk (control) | Corresponding risk (memory rehabilitation) | Relative effect (95% CI) | No of participants (studies) | Quality of evidence (GRADE) | Comments |
| --- | --- | --- | --- | --- | --- | --- |
| Subjective memory ‐ immediate. EMQ, MSNQ, CFQ, MFQ (a). Follow‐up: within one month | | The mean subjective memory measures ‐ immediate in the intervention groups was 0.32 standard deviations higher (0.05 to 0.58 higher) | | 568 (10 studies) | ⊕⊕⊕⊝ moderate (b) | SMD 0.32 (0.05 to 0.58) |
| Subjective memory ‐ longer‐term. EMQ, MSNQ, CFQ, MFQ (a). Follow‐up: 6 months+ | | The mean subjective memory measures ‐ longer‐term in the intervention groups was 0.16 standard deviations higher (0.02 to 0.30 higher) | | 775 (5 studies) | ⊕⊕⊕⊕ high | SMD 0.16 (0.02 to 0.30) |
| Verbal memory ‐ immediate. CVLT, AVLT, HVLT, VLT, SRT, MUSIC (a). Follow‐up: within one month | | The mean objective verbal memory measures ‐ immediate in the intervention groups was 0.4 standard deviations higher (0.22 to 0.58 higher) | | 922 (19 studies) | ⊕⊕⊝⊝ low (c, d) | SMD 0.40 (0.22 to 0.58) |
| Verbal memory ‐ longer‐term. CVLT, AVLT, HVLT, VLT, SRT, MUSIC (a). Follow‐up: 6 months+ | | The mean objective verbal memory measures ‐ longer‐term in the intervention groups was 0.13 standard deviations higher (0.03 lower to 0.29 higher) | | 619 (4 studies) | ⊕⊕⊕⊝ moderate (d) | SMD 0.13 (‐0.03 to 0.29) |
| Visual memory ‐ immediate. BVMT‐R, SPART, CMT, ROCF. Follow‐up: within one month | | The mean objective visual memory measures ‐ immediate in the intervention groups was 0.42 standard deviations higher (0.25 to 0.6 higher) | | 799 (16 studies) | ⊕⊕⊕⊝ moderate (f) | SMD 0.42 (0.25 to 0.60) |
| Visual memory ‐ longer‐term. BVMT‐R, SPART, CMT, ROCF. Follow‐up: 6 months+ | | The mean objective visual memory measures ‐ longer‐term in the intervention groups was 0.12 standard deviations higher (0.13 lower to 0.37 higher) | | 619 (4 studies) | ⊕⊕⊕⊕ high | SMD 0.12 (‐0.13 to 0.37) |
| Working memory ‐ immediate. PASAT, WAIS. Follow‐up: within one month | | The mean objective working memory measures ‐ immediate in the intervention groups was 0.45 standard deviations higher (0.18 to 0.72 higher) | | 655 (12 studies) | ⊕⊕⊝⊝ low (h, p) | SMD 0.45 (0.18 to 0.72) |
| Working memory ‐ longer‐term. PASAT, WAIS. Follow‐up: 6 months+ | | The mean objective working memory measures ‐ longer‐term in the intervention groups was 0.04 standard deviations higher (0.11 lower to 0.2 higher) | | 665 (5 studies) | ⊕⊕⊕⊕ high | SMD 0.04 (‐0.11 to 0.2) |
| Information processing ‐ immediate. SDMT. Follow‐up: within one month | | The mean information processing measures ‐ immediate in the intervention groups was 0.51 standard deviations higher (0.19 to 0.82 higher) | | 808 (15 studies) | ⊕⊕⊝⊝ low (j, p) | SMD 0.51 (0.19 to 0.82) |
| Information processing ‐ longer‐term. SDMT. Follow‐up: 6 months+ | | The mean information processing measures ‐ longer‐term in the intervention groups was 0.21 standard deviations higher (0.03 lower to 0.45 higher) | | 723 (5 studies) | ⊕⊕⊕⊝ moderate (l) | SMD 0.21 (‐0.03 to 0.45) |
| Depression (mood) ‐ immediate. GHQ, BDI, BDI‐FS, Chicago Multiscale Depression Inventory, HADS, EAQ, CES‐D, MADRS (a). Follow‐up: within one month | | The mean depression measures (mood) ‐ immediate in the intervention groups was 0.34 standard deviations higher (0.15 to 0.53 higher) | | 853 (16 studies) | ⊕⊕⊕⊝ moderate (m) | SMD 0.34 (0.15 to 0.53) |
| Depression (mood) ‐ intermediate. GHQ, BDI, BDI‐FS, Chicago Multiscale Depression Inventory, HADS, EAQ, CES‐D, MADRS (a). Follow‐up: 1 to 6 months | | The mean depression measures (mood) ‐ intermediate in the intervention groups was 0.20 standard deviations higher (0.06 lower to 0.45 higher) | | 1003 (10 studies) | ⊕⊕⊕⊝ moderate (m) | SMD 0.20 (‐0.06 to 0.45) |
| Depression (mood) ‐ longer‐term. GHQ, BDI, BDI‐FS, Chicago Multiscale Depression Inventory, HADS, EAQ, CES‐D, MADRS (a). Follow‐up: 6 months+ | | The mean depression measures (mood) ‐ longer‐term in the intervention groups was 0.15 standard deviations higher (0.04 lower to 0.34 higher) | | 891 (7 studies) | ⊕⊕⊕⊕ high | SMD 0.15 (‐0.04 to 0.34) |
| Anxiety (mood) ‐ immediate. GHQ, EAQ, STAI, HADS. Follow‐up: within one month | | The mean anxiety measures (mood) ‐ immediate in the intervention groups was 0.29 standard deviations higher (0.01 lower to 0.59 higher) | | 178 (4 studies) | ⊕⊕⊕⊕ high | SMD 0.29 (‐0.01 to 0.59) |
| Anxiety (mood) ‐ intermediate. GHQ, EAQ, STAI, HADS. Follow‐up: 1 to 6 months | | The mean anxiety measures (mood) ‐ intermediate in the intervention groups was 0.16 standard deviations higher (0.15 lower to 0.46 higher) | | 502 (4 studies) | ⊕⊕⊕⊕ high | SMD 0.16 (‐0.15 to 0.46) |
| Anxiety (mood) ‐ longer‐term. GHQ, EAQ, STAI, HADS. Follow‐up: 6 months+ | | The mean anxiety measures (mood) ‐ longer‐term in the intervention groups was 0.27 standard deviations higher (0.12 lower to 0.65 higher) | | 502 (4 studies) | ⊕⊕⊕⊕ high | SMD 0.27 (‐0.12 to 0.65) |
| Quality of life ‐ immediate. MSIS, MSQOL, SF‐36, SF‐12, SWLS, EQ‐5D‐5L (a). Follow‐up: within one month | | The mean quality of life measures ‐ immediate in the intervention groups was 0.42 standard deviations higher (0.15 to 0.68 higher) | | 371 (8 studies) | ⊕⊕⊕⊕ high | SMD 0.42 (0.15 to 0.68) |
| Quality of life ‐ longer‐term. MSIS, MSQOL, SF‐36, SF‐12, SWLS, EQ‐5D‐5L (a). Follow‐up: 6 months+ | | The mean quality of life measures ‐ longer‐term in the intervention groups was 0.17 standard deviations higher (0.02 to 0.32 higher) | | 687 (5 studies) | ⊕⊕⊕⊝ moderate (o) | SMD 0.17 (0.02 to 0.32) |
| Activities of daily living ‐ immediate. EADL (a). Follow‐up: within one month | | The mean activities of daily living measures ‐ immediate in the intervention groups was 0.02 standard deviations higher (0.26 lower to 0.29 higher) | | 265 (4 studies) | ⊕⊕⊕⊕ high | SMD 0.02 (‐0.26 to 0.29) |
| Activities of daily living ‐ longer‐term. EADL (a). Follow‐up: 6 months+ | | The mean activities of daily living measures ‐ longer‐term in the intervention groups was 0.11 standard deviations lower (0.49 lower to 0.27 higher) | | 369 (3 studies) | ⊕⊕⊕⊕ high | SMD ‐0.11 (‐0.49 to 0.27) |

*The basis for the assumed risk (e.g. the median control group risk across studies) is provided in footnotes. The corresponding risk (and its 95% confidence interval) is based on the assumed risk in the comparison group and the relative effect of the intervention (and its 95% CI).
CI: confidence interval; SMD: standardised mean difference

GRADE Working Group grades of evidence
High quality ⊕⊕⊕⊕: further research is very unlikely to change our confidence in the estimate of effect.
Moderate quality ⊕⊕⊕⊝: further research is likely to have an important impact on our confidence in the estimate of effect and may change the estimate.
Low quality ⊕⊕⊝⊝: further research is very likely to have an important impact on our confidence in the estimate of effect and is likely to change the estimate.
Very low quality ⊕⊝⊝⊝: we are very uncertain about the estimate.

a CMT: Contextual Memory Test; EAQ: Emotional Awareness Questionnaire; EMQ: Everyday Memory Questionnaire; HADS: Hospital Anxiety and Depression Scale; STAI: State Trait Anxiety Inventory; MSNQ: Multiple Sclerosis Neuropsychological Screening Questionnaire; MFQ: Memory Functioning Questionnaire; RBMT: Rivermead Behavioural Memory Test; CVLT: California Verbal Learning Test; AVLT: Auditory Verbal Learning Test; HVLT: Hopkins Verbal Learning Test; VLT: Verbal Learning Test; LNNB: Luria‐Nebraska Neuropsychological Battery; BRBNT: Brief Repeatable Battery of Neuropsychological Tests; GHQ: General Health Questionnaire; BDI: Beck Depression Inventory; BDI‐FS: Beck Depression Inventory‐Fast Screen; EADL: Extended Activities of Daily Living; MSIS: Multiple Sclerosis Impact Scale; FAMS: Functional Assessment of Multiple Sclerosis; MSQOL: Multiple Sclerosis Quality of Life; PASAT: Paced Auditory Serial Addition Test; SF‐36: 36‐Item Short Form Health Survey; SF‐12: 12‐Item Short Form Health Survey.
b 1 of 10 studies had possible risk of bias related to random sequence generation, and in 2 of the 10 it was unclear. Allocation concealment was a possible source of bias in 1 study, and unclear in 3 of the 10 studies. Blinding was a potential source of bias in 2 studies, and unclear in 1 of the 10 studies. Incomplete outcome data may have been biased in 1 study, and unclear in 3 of the 10 studies. Selective reporting may have been biased in 1 study.

Downgraded by 1 due to 95% confidence intervals including no effect, and the upper or lower confidence interval limit crossing an effect size of 0.5 in either direction.

c 1 study had possible risk of bias related to random sequence generation, and in 5 of 19 studies this was unclear. Allocation concealment was potentially biased in 1 study, and unclear in 6 of 19 studies. Blinding was a potential source of bias in 7 studies. Incomplete outcome data may have biased 2 of 19 studies and was unclear in 6 of 19 studies. Selective reporting may have been biased in 1 study. There may have been other sources of bias in 1 study, and this was unclear in 1 study.
d All or nearly all of the studies used a list‐learning task as an objective measure of verbal memory, which has poor ecological validity.
f 5 of 16 studies showed unclear potential risk of bias related to random sequence generation. 6 of 16 studies showed unclear potential risk of bias related to allocation concealment. 7 of 16 studies showed potential risk of bias related to blinding. 1 study showed potential risk of bias related to incomplete outcome data, and 4 of the 16 studies showed unclear risk of bias. There may have been another source of bias in 1 study.

h 5 of 12 studies showed unclear potential risk of bias related to random sequence generation. 6 of 12 studies showed unclear risk of bias related to allocation concealment. 7 of 12 studies showed possible risk of bias related to blinding procedures. 1 study showed potential risk of bias related to incomplete data, and 3 of 12 studies were at unclear risk of bias. 1 study had potential risk of bias related to selective reporting.

j 5 of 15 studies showed unclear risk of bias related to random sequence generation. 6 of 15 studies showed unclear risk of bias related to allocation concealment. 8 of 15 studies showed potential risk of bias related to blinding procedures. 2 of 15 studies showed potential risk of bias related to blinding procedures, and 3 of 15 were at unclear risk of bias. 1 study showed potential risk of bias related to incomplete data. 1 study showed potential risk of bias related to other bias.

l 2 of 5 studies showed unclear risk of bias related to random sequence generation. 1 study showed potential risk of bias related to allocation concealment, and 1 study showed unclear risk of bias. 3 of 5 studies showed potential risk of bias related to blinding procedures, and 1 study showed unclear risk of bias. 1 study showed potential risk of bias related to incomplete data, and 1 study showed unclear risk of bias. 1 study showed potential risk of bias related to selective reporting.

m 2 of 16 studies showed potential risk of bias related to random sequence generation, and 3 of 16 studies showed unclear risk. 1 study showed potential risk of bias relating to allocation concealment, and 6 of 16 studies showed unclear risk of bias. 5 of 16 studies showed potential risk of bias relating to blinding procedures. 3 of 16 studies showed potential risk of bias relating to incomplete data, and 3 of 13 studies showed unclear risk of bias. 1 study showed potential risk of bias relating to selective reporting. 1 study showed potential risk of bias relating to other bias.

o 1 study showed unclear risk of bias related to random sequence generation, blinding procedures and incomplete outcome data, as well as high risk of bias relating to allocation concealment. 1 study showed high risk of bias relating to blinding procedures, incomplete data and selective reporting.

p Inconsistency of results: statistical heterogeneity > 50%.

Table 2. Sensitivity analysis

| Outcome | No. of studies | No. of participants | Effect size, SMD (95% CI) | Heterogeneity (I²) | Test for overall effect |
| --- | --- | --- | --- | --- | --- |
| Subjective memory ‐ immediate | 2 | E = 127; C = 117 | 0.03 [‐0.24, 0.31] | 10% | Z = 0.22 (P = 0.82) |
| Subjective memory ‐ intermediate | 6 | E = 396; C = 343 | 0.25 [0.11, 0.40] | 0% | Z = 3.39 (P = 0.0007) |
| Subjective memory ‐ longer‐term | 4 | E = 325; C = 294 | 0.19 [0.03, 0.36] | 0% | Z = 2.33 (P = 0.03) |
| Verbal memory ‐ immediate | 5 | E = 100; C = 96 | 0.72 [0.24, 1.19] | 59% | Z = 2.96 (P = 0.003) |
| Verbal memory ‐ intermediate | 2 | E = 254; C = 209 | 0.22 [0.03, 0.40] | 0% | Z = 2.32 (P = 0.02) |
| Verbal memory ‐ longer‐term | N/A | | | | |
| Visual memory ‐ immediate | 5 | E = 100; C = 94 | 0.27 [‐0.01, 0.56] | 0% | Z = 1.86 (P = 0.06) |
| Visual memory ‐ intermediate | 2 | E = 251; C = 209 | ‐0.11 [‐0.29, 0.08] | 0% | Z = 1.14 (P = 0.25) |
| Visual memory ‐ longer‐term | N/A | | | | |
| Working memory ‐ immediate | 2 | E = 49; C = 42 | 0.46 [‐0.68, 1.59] | 84% | Z = 0.79 (P = 0.43) |
| Working memory ‐ intermediate | 4 | E = 284; C = 236 | ‐0.06 [‐0.28, 0.15] | 11% | Z = 0.59 (P = 0.56) |
| Working memory ‐ longer‐term | 2 | E = 229; C = 193 | ‐0.02 [‐0.21, 0.17] | 0% | Z = 0.18 (P = 0.86) |
| Information processing ‐ immediate | 4 | E = 131; C = 120 | 0.29 [‐0.04, 0.62] | 40% | Z = 1.72 (P = 0.05) |
| Information processing ‐ intermediate | 4 | E = 294; C = 245 | 0.02 [‐0.14, 0.19] | 0% | Z = 0.28 (P = 0.78) |
| Information processing ‐ longer‐term | N/A | | | | |
| Depression ‐ immediate | 4 | E = 93; C = 87 | 0.55 [0.03, 1.07] | 65% | Z = 2.07 (P = 0.04) |
| Depression ‐ intermediate | 6 | E = 392; C = 350 | 0.29 [‐0.10, 0.67] | 79% | Z = 1.45 (P = 0.15) |
| Depression ‐ longer‐term | 4 | E = 328; C = 270 | 0.14 [‐0.20, 0.48] | 63% | Z = 0.80 (P = 0.42) |
| Anxiety ‐ immediate | N/A | | | | |
| Anxiety ‐ intermediate | 3 | E = 257; C = 214 | 0.29 [0.11, 0.48] | 0% | Z = 3.11 (P = 0.002) |
| Anxiety ‐ longer‐term | 3 | E = 255; C = 193 | 0.27 [‐0.12, 0.65] | 43% | Z = 1.37 (P = 0.17) |
| Quality of life ‐ immediate | 4 | E = 101; C = 96 | 0.49 [0.06, 0.91] | 54% | Z = 2.25 (P = 0.02) |
| Quality of life ‐ intermediate | 5 | E = 340; C = 317 | 0.31 [‐0.01, 0.62] | 64% | Z = 1.90 (P = 0.06) |
| Quality of life ‐ longer‐term | 3 | E = 295; C = 259 | 0.12 [‐0.05, 0.30] | 5% | Z = 1.37 (P = 0.17) |
| Activities of daily living ‐ immediate | N/A | | | | |
| Activities of daily living ‐ intermediate | 2 | E = 100; C = 86 | ‐0.13 [‐0.60, 0.33] | 37% | Z = 0.56 (P = 0.57) |
| Activities of daily living ‐ longer‐term | 2 | E = 100; C = 86 | ‐0.33 [‐0.63, ‐0.03] | 0% | Z = 2.18 (P = 0.03) |

E: Experimental; C: Control; SMD: Standardised mean difference.
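The "Test for overall effect" column in Table 2 can be approximately reproduced from the SMD and its 95% confidence interval. The sketch below assumes a symmetric, normal‐based interval, so that the standard error is the interval width divided by 2 × 1.96; the function name is illustrative. Because the table values are rounded to two decimals, recomputed Z values match the reported ones only approximately.

```python
import math


def z_and_p_from_ci(smd, ci_low, ci_high):
    """Recover the Z statistic and two-sided P value from an SMD and its 95% CI.

    Assumes the interval is smd +/- 1.96 * SE (normal approximation).
    """
    se = (ci_high - ci_low) / (2 * 1.96)
    z = abs(smd) / se
    # Two-sided P value under the standard normal distribution.
    p = math.erfc(z / math.sqrt(2))
    return z, p


# Values taken from Table 2.
z_subj, p_subj = z_and_p_from_ci(0.25, 0.11, 0.40)    # subjective memory, intermediate
z_adl, p_adl = z_and_p_from_ci(-0.33, -0.63, -0.03)   # activities of daily living, longer-term

print(f"Subjective memory (intermediate): Z = {z_subj:.2f}, P = {p_subj:.4f}")
print(f"Activities of daily living (longer-term): Z = {z_adl:.2f}, P = {p_adl:.2f}")
```

Both recomputed values fall close to the reported Z = 3.39 and Z = 2.18. Note that the second estimate is negative: a statistically significant test for overall effect says nothing about direction, so the sign of the SMD has to be read alongside the Z value.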

Comparison 1. Subjective memory measures

| Outcome or subgroup title | No. of studies | No. of participants | Statistical method | Effect size |
| --- | --- | --- | --- | --- |
| 1.1 Immediate | 10 | 568 | Std. Mean Difference (IV, Random, 95% CI) | 0.32 [0.05, 0.58] |
| 1.2 Intermediate | 11 | 1045 | Std. Mean Difference (IV, Random, 95% CI) | 0.23 [0.11, 0.35] |
| 1.3 Longer‐term | 5 | 775 | Std. Mean Difference (IV, Random, 95% CI) | 0.16 [0.02, 0.30] |

Comparison 2. Objective verbal memory

| Outcome or subgroup title | No. of studies | No. of participants | Statistical method | Effect size |
| --- | --- | --- | --- | --- |
| 2.1 Immediate | 19 | 922 | Std. Mean Difference (IV, Random, 95% CI) | 0.40 [0.22, 0.58] |
| 2.2 Intermediate | 6 | 753 | Std. Mean Difference (IV, Random, 95% CI) | 0.25 [0.11, 0.40] |
| 2.3 Longer‐term | 4 | 619 | Std. Mean Difference (IV, Random, 95% CI) | 0.13 [‐0.03, 0.29] |

Comparison 3. Objective visual memory

Outcome or subgroup title | No. of studies | No. of participants | Statistical method | Effect size
3.1 Immediate | 16 | 799 | Std. Mean Difference (IV, Random, 95% CI) | 0.42 [0.25, 0.60]
3.2 Intermediate | 6 | 751 | Std. Mean Difference (IV, Random, 95% CI) | 0.20 [‐0.11, 0.50]
3.3 Longer‐term | 4 | 619 | Std. Mean Difference (IV, Random, 95% CI) | 0.12 [‐0.13, 0.37]

Comparison 4. Objective working memory

Outcome or subgroup title | No. of studies | No. of participants | Statistical method | Effect size
4.1 Immediate | 12 | 655 | Std. Mean Difference (IV, Random, 95% CI) | 0.45 [0.18, 0.72]
4.2 Intermediate | 8 | 821 | Std. Mean Difference (IV, Random, 95% CI) | 0.16 [‐0.09, 0.40]
4.3 Longer‐term | 5 | 665 | Std. Mean Difference (IV, Random, 95% CI) | 0.04 [‐0.11, 0.20]

Comparison 5. Information processing

Outcome or subgroup title | No. of studies | No. of participants | Statistical method | Effect size
5.1 Immediate | 15 | 808 | Std. Mean Difference (IV, Random, 95% CI) | 0.51 [0.19, 0.82]
5.2 Intermediate | 8 | 933 | Std. Mean Difference (IV, Random, 95% CI) | 0.27 [0.00, 0.54]
5.3 Longer‐term | 5 | 723 | Std. Mean Difference (IV, Random, 95% CI) | 0.21 [‐0.03, 0.45]

Comparison 6. Mood ‐ Depression Scale

Outcome or subgroup title | No. of studies | No. of participants | Statistical method | Effect size
6.1 Immediate | 16 | 853 | Std. Mean Difference (IV, Random, 95% CI) | 0.34 [0.15, 0.53]
6.2 Intermediate | 10 | 1003 | Std. Mean Difference (IV, Random, 95% CI) | 0.20 [‐0.06, 0.45]
6.3 Longer‐term | 7 | 891 | Std. Mean Difference (IV, Random, 95% CI) | 0.15 [‐0.04, 0.34]

Comparison 7. Mood ‐ Anxiety Scale

Outcome or subgroup title | No. of studies | No. of participants | Statistical method | Effect size
7.1 Immediate | 4 | 178 | Std. Mean Difference (IV, Random, 95% CI) | 0.29 [‐0.01, 0.59]
7.2 Intermediate | 4 | 502 | Std. Mean Difference (IV, Random, 95% CI) | 0.16 [‐0.15, 0.46]
7.3 Longer‐term | 3 | 448 | Std. Mean Difference (IV, Random, 95% CI) | 0.27 [‐0.12, 0.65]

Comparison 8. Quality of life

Outcome or subgroup title | No. of studies | No. of participants | Statistical method | Effect size
8.1 Immediate | 8 | 371 | Std. Mean Difference (IV, Random, 95% CI) | 0.42 [0.15, 0.68]
8.2 Intermediate | 6 | 683 | Std. Mean Difference (IV, Random, 95% CI) | 0.30 [0.02, 0.58]
8.3 Longer‐term | 5 | 687 | Std. Mean Difference (IV, Random, 95% CI) | 0.17 [0.02, 0.32]

Comparison 9. Activities of Daily Living

Outcome or subgroup title | No. of studies | No. of participants | Statistical method | Effect size
9.1 Immediate | 4 | 265 | Std. Mean Difference (IV, Random, 95% CI) | 0.02 [‐0.26, 0.29]
9.2 Intermediate | 4 | 400 | Std. Mean Difference (IV, Random, 95% CI) | ‐0.06 [‐0.36, 0.24]
9.3 Longer‐term | 3 | 369 | Std. Mean Difference (IV, Random, 95% CI) | ‐0.11 [‐0.49, 0.27]
