Exercise programmes for ankylosing spondylitis

Background

Exercise programmes are often recommended for managing ankylosing spondylitis (AS), to reduce pain and improve or maintain functional capacity.

Objectives

To assess the benefits and harms of exercise programmes for people with AS.

Search methods

We searched CENTRAL, the Cochrane Library, MEDLINE Ovid, EMBASE Ovid, CINAHL EBSCO, PEDro, Scopus, and two trials registers up to December 2018. We searched the reference lists of identified systematic reviews and included studies, handsearched recent relevant conference proceedings, and contacted experts in the field.

Selection criteria

We included reports of randomised controlled trials (RCTs) of adults with AS that compared exercise therapy programmes with an inactive control (no intervention, waiting list) or usual care.

Data collection and analysis

We used standard Cochrane methodology.

Main results

We included 14 RCTs with 1579 participants with AS. Most participants were male (70%), the median age was 45 years (range 39 to 47), and the mean symptom duration was nine years. The most frequently used exercises were those designed to help improve strength, flexibility, stretching, and breathing. Most exercise programmes were delivered along with drug therapy or a biological agent. We judged most of the studies to be at unclear or high risk of bias for several domains. All 14 studies provided data obtained immediately upon completion of the exercise programme. The median exercise programme duration was 12 weeks (interquartile range (IQR) 8 to 16). Three studies (146 participants) provided data for medium‐term follow‐up (< 24 weeks after completion of the exercise programmes), and one (63 participants) for long‐term follow‐up (> 24 weeks after completion of the exercise programmes). Nine studies compared exercise programmes to no intervention; five studies compared them to usual care (including physiotherapy, medication, or self‐management).

Exercise programmes versus no intervention

All data were obtained immediately upon completion of the exercise programme.

For physical function, measured by a self‐report questionnaire (the Bath Ankylosing Spondylitis Functional Index (BASFI) scale, 0 to 10; lower is better), moderate‐quality evidence showed no important clinically meaningful improvement with exercise programmes (mean difference (MD) ‐1.3, 95% confidence interval (CI) ‐1.7 to ‐0.9; 7 studies, 312 participants; absolute reduction 13%, 95% CI 17% to 9%).

For pain, measured on a visual analogue scale (VAS, 0 to 10; lower is better), low‐quality evidence showed an important clinically meaningful reduction of pain with exercise (MD ‐2.1, 95% CI ‐3.6 to ‐0.6; 6 studies, 288 participants; absolute reduction 21%, 95% CI 36% to 6%).

For patient global assessment of disease activity, measured by a self‐report questionnaire (the Bath Ankylosing Spondylitis Disease Activity Index (BASDAI) scale, 0 to 10; lower is better), moderate‐quality evidence showed no important clinically meaningful reduction with exercise (MD ‐0.9, 95% CI ‐1.3 to ‐0.5; 6 studies, 262 participants; absolute reduction 9%, 95% CI 13% to 5%).

For spinal mobility, measured by a self‐report questionnaire (the Bath Ankylosing Spondylitis Metrology Index (BASMI) scale, 0 to 10; lower is better), very low‐quality evidence showed an improvement with exercise (MD ‐0.7, 95% CI ‐1.3 to ‐0.1; 5 studies, 232 participants), with no important clinically meaningful benefit (absolute reduction 7%, 95% CI 13% to 1%).

For fatigue, measured on a VAS (0 to 10; lower is better), very low‐quality evidence showed no important clinically meaningful reduction with exercise (MD ‐1.4, 95% CI ‐2.7 to ‐0.1; 2 studies, 72 participants; absolute reduction 14%, 95% CI 27% to 1%).
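
The absolute reductions quoted for these outcomes are consistent with dividing each mean difference by the width of the 10‐point scale. The sketch below illustrates that arithmetic for the BASFI result; the function name and the default scale limits are illustrative assumptions, not taken from the review's methods.

```python
def absolute_change_percent(mean_difference: float,
                            scale_min: float = 0.0,
                            scale_max: float = 10.0) -> float:
    """Express a mean difference as a percentage of the full scale range.

    Assumption: the review's 'absolute reduction' is MD / (scale range).
    """
    scale_range = scale_max - scale_min
    if scale_range <= 0:
        raise ValueError("scale_max must be greater than scale_min")
    return 100.0 * mean_difference / scale_range


# BASFI result quoted in the text: MD -1.3 (95% CI -1.7 to -0.9) on a 0-10 scale.
for label, md in [("estimate", -1.3), ("CI lower", -1.7), ("CI upper", -0.9)]:
    print(f"{label}: {absolute_change_percent(md):.0f}% of the scale")
```

A negative value indicates a reduction in score, which is an improvement on every scale used here because lower scores are better.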

Exercise programmes versus usual care

All data were obtained immediately upon completion of the exercise programme.

For physical function, measured by the BASFI scale, moderate‐quality evidence showed an improvement with exercise (MD ‐0.4, 95% CI ‐0.6 to ‐0.2; 5 studies, 1068 participants). There was no important clinically meaningful benefit (absolute reduction 4%, 95% CI 6% to 2%).

For pain, measured on a VAS (0 to 10; lower is better), moderate‐quality evidence showed a reduction of pain with exercise (MD ‐0.5, 95% CI ‐0.9 to ‐0.1; 2 studies, 911 participants; absolute reduction 5%, 95% CI 9% to 1%). We found no important clinically meaningful benefit.

For patient global assessment of disease activity, measured by the BASDAI scale, low‐quality evidence showed a reduction with exercise (MD ‐0.7, 95% CI ‐1.3 to ‐0.1; 5 studies, 1068 participants), but it was not clinically important (absolute reduction 7%, 95% CI 13% to 1%).

For spinal mobility, measured by the BASMI scale, very low‐quality evidence found no clinically meaningful improvement with exercise (MD ‐1.2, 95% CI ‐2.8 to 0.5; 2 studies, 85 participants; absolute reduction 12%, 95% CI 5% less to 28% more). There was no important clinically meaningful benefit.

None of the studies measured fatigue.

Adverse effects

We found very low‐quality evidence of the effect of exercise versus either no intervention or usual care. We are uncertain of the potential for harm of exercises, due to low event rates and the limited number of studies reporting events.

Authors' conclusions

We found moderate‐ to low‐quality evidence that exercise programmes probably slightly improve function, may reduce pain, and probably slightly reduce patient global assessment of disease activity, when compared with no intervention and measured upon completion of the programme. We found moderate‐ to low‐quality evidence that exercise programmes probably have little or no effect on improving function or reducing pain when compared with usual care, and may have little or no effect on reducing patient‐assessed disease activity, when measured upon completion of the programmes. We are uncertain whether exercise programmes improve spinal mobility, reduce fatigue, or induce adverse effects.

Plain language summary

Benefits and harms of exercise programmes for people with ankylosing spondylitis

Review question

We reviewed the evidence for the benefits and harms of exercise programmes for people with ankylosing spondylitis (AS).

Background

Exercise programmes are often recommended for people with AS, to reduce pain and improve joint mobility or function.

Study characteristics

We searched for randomised controlled trials (RCTs) up to December 2018. We found 14 reports (1579 participants). Studies were carried out in nine different countries. Most participants were men, 39 to 47 years old, who had had symptoms for 9 to 18 years. Most programmes included exercises developed to improve strength, flexibility, stretching, and breathing, and were added to drug therapy or a biological agent.

Key results

All data were obtained immediately upon completion of the exercise programme.

Exercise programmes versus no intervention

Exercise probably slightly improves function (moderate‐quality evidence), probably slightly reduces patient‐reported disease activity (moderate‐quality evidence), and may reduce pain (low‐quality evidence). We are uncertain of the effect on spinal mobility and fatigue (very low‐quality evidence).

Physical function was measured on a self‐report questionnaire, the Bath Ankylosing Spondylitis Functional Index (BASFI) scale (0 to 10; lower score means better function). People who did no exercise rated their function at 4.1 points; those who exercised rated it 1.3 points lower (13% absolute improvement).

Pain was measured on a visual analogue scale (VAS, 0 to 10; lower score means less pain). People who did no exercise rated their pain at 6.2 points; those who exercised rated it 2.1 points lower (21% absolute improvement).

Patient global assessment of disease activity was measured on a self‐report questionnaire, the Bath Ankylosing Spondylitis Disease Activity Index (BASDAI, 0 to 10; lower score means less disease activity). People who did no exercise rated their disease activity at 3.7 points; those who exercised rated it 0.9 points lower (9% absolute improvement).

Spinal mobility was measured on a self‐report questionnaire, the Bath Ankylosing Spondylitis Metrology Index (BASMI, 0 to 10; lower score means better mobility). People who did no exercise rated their spinal mobility at 3.8 points; those who exercised rated it 0.7 points lower (7% absolute improvement).

Fatigue was measured on a VAS (0 to 10; lower score means less fatigue). People who did no exercise rated their fatigue at 3 points; those who exercised rated it 1.4 points lower (14% absolute improvement).
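
As a rough illustration of how the control‐group scores and point differences in this comparison combine, the sketch below derives the score implied for the exercise group and a relative change. Defining relative change as the mean difference divided by the control‐group score is an assumption made here for illustration; because the inputs are rounded, the output will not necessarily reproduce the relative changes reported in the review's summary of findings.

```python
def exercise_group_mean(control_mean: float, mean_difference: float) -> float:
    """Score implied for the exercise group: control mean plus the (negative) MD."""
    return control_mean + mean_difference


def relative_change_percent(control_mean: float, mean_difference: float) -> float:
    """Mean difference as a percentage of the control-group mean (assumed definition)."""
    if control_mean == 0:
        raise ValueError("control_mean must be non-zero")
    return 100.0 * mean_difference / control_mean


# (outcome, control-group score, mean difference), copied from the text above.
results = [
    ("function (BASFI)", 4.1, -1.3),
    ("pain (VAS)", 6.2, -2.1),
]
for outcome, control, md in results:
    implied = exercise_group_mean(control, md)
    relative = relative_change_percent(control, md)
    print(f"{outcome}: exercise group about {implied:.1f}, "
          f"relative change {relative:.1f}%")
```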

Exercise programmes versus usual care

Exercise probably results in little or no improvement in function or pain (moderate‐quality evidence), and may have little or no effect on reducing patient‐reported disease activity (low‐quality evidence). We are uncertain of the effect on spinal mobility (very low‐quality evidence).

Physical function. People who had usual care rated their function at 3.7 points on the BASFI; those who exercised rated it 0.4 points lower (4% absolute improvement).

Pain. People who had usual care rated their pain at 3.7 points on a 10‐point VAS; those who exercised rated it 0.5 points lower (5% absolute improvement).

Patient global assessment of disease activity. People who had usual care rated their disease activity at 3.7 points on the BASDAI; those who exercised rated it 0.7 points lower (7% absolute improvement).

Spinal mobility. People who had usual care rated their spinal mobility at 8.9 points on the BASMI; those who exercised rated it 1.2 points lower (12% absolute improvement).

None of the studies measured fatigue.

Adverse effects (AEs)

One of 67 participants in the exercise groups, and none of 43 participants in the control groups, experienced an AE.
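
The review's summary of findings expresses this comparison as a Peto odds ratio (6.25, 95% CI 0.10 to 320.40). The sketch below shows the standard one‐step Peto calculation. It treats the pooled counts as a single 2×2 table, which is a simplifying assumption: the review pooled two separate studies whose individual counts are not given here, so the result is not expected to match the published figure. The function name and input layout are hypothetical.

```python
import math


def peto_odds_ratio(tables):
    """One-step Peto odds ratio with an approximate 95% confidence interval.

    tables: iterable of (events_exercise, n_exercise, events_control, n_control),
    one tuple per study.
    """
    sum_o_minus_e = 0.0
    sum_variance = 0.0
    for events_ex, n_ex, events_ctl, n_ctl in tables:
        n_total = n_ex + n_ctl
        events_total = events_ex + events_ctl
        if events_total == 0 or events_total == n_total:
            continue  # a table with no events (or all events) carries no information
        expected = n_ex * events_total / n_total
        variance = (n_ex * n_ctl * events_total * (n_total - events_total)
                    / (n_total ** 2 * (n_total - 1)))
        sum_o_minus_e += events_ex - expected
        sum_variance += variance
    if sum_variance == 0:
        raise ValueError("no informative tables")
    log_or = sum_o_minus_e / sum_variance
    half_width = 1.96 / math.sqrt(sum_variance)
    return (math.exp(log_or),
            math.exp(log_or - half_width),
            math.exp(log_or + half_width))


# Pooled counts from the text, treated as ONE table (an assumption, see above).
odds_ratio, lower, upper = peto_odds_ratio([(1, 67, 0, 43)])
print(f"Peto OR {odds_ratio:.2f} (95% CI {lower:.2f} to {upper:.1f})")
```

With a single event, the interval spans several orders of magnitude, which illustrates why the review rates this evidence as very low quality.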

Quality of the evidence

We downgraded the evidence due to issues with study design, variability among interventions, and insufficient data, resulting in moderate‐ to very low‐quality evidence across outcomes.

Authors' conclusions

Implications for practice

We found moderate‐ to low‐quality evidence indicating that exercise programmes compared to no intervention probably slightly improve function, may reduce pain (important clinical benefit), and probably slightly reduce patient‐assessed disease activity, measured after the completion of the exercise programmes. Whether there was an effect on spinal mobility and fatigue is uncertain. We found moderate‐ to low‐quality evidence that compared with usual care (including physiotherapy, medication, or self‐management), exercise programmes probably have little or no effect on improving function and reducing pain, and may have little or no effect on patient‐assessed disease activity. We are uncertain whether exercise programmes improve spinal mobility.

Readers should understand that we are uncertain of the potential for harm from exercise programmes, because of the limited number of studies reporting AEs, and the low rate of events.

We are unable to distinguish the best type of exercise, its components, or its mode of delivery.

Implications for research

The evidence for some of the major outcomes was low or very low quality, so new studies could change the effect estimates.

This review has raised new questions to answer:

  • The long‐term effects of exercise programmes for people with AS, and whether they are clinically relevant, are unclear.

  • New trials should provide an accurate description of the content, dose, application, and adherence to the exercise interventions. The most effective components (e.g. supervised or home delivery) are unknown, as is the most effective dose, including frequency, intensity, and duration.

  • AEs were rarely measured and reported in RCT reports. Whether exercise programmes produce harmful effects is difficult to determine. AEs may be worthwhile to investigate in people with more advanced or severe stages of the disease. Studies should systematically investigate and report AEs.

  • Further studies should investigate the effect of exercise therapy in the early stages of the disease (even in the pre‐radiographic stages). Exercise programmes should be evaluated at different stages of the disease. This evaluation would be useful to ascertain whether the use of biologic agents and rehabilitation programmes in people with newly‐diagnosed, or early AS, are effective to prevent deformity and disability.

  • Studies of the effects of TNF blockers combined with exercise programmes, with cost‐effectiveness, are needed.

  • Cost‐effectiveness of interventions should be evaluated.

Future studies should use the Consensus on Exercise Reporting Template, or the CONSORT Template for Intervention Description and Replication, to improve the description of exercise programmes and facilitate their application in clinical practice.

Summary of findings

Summary of findings for the main comparison. Exercise programmes compared to no intervention for ankylosing spondylitis

Exercise programmes compared to no intervention

Patient or population: adults with ankylosing spondylitis
Setting: international hospitals, outpatient clinics, or home
Intervention: exercise programmes
Comparison: no intervention

For each outcome, the table reports: anticipated absolute effects* (95% CI), given as the risk with no intervention and the risk with exercise programmes; relative effect (95% CI); № of participants (studies); quality of the evidence (GRADE); and comments.

Outcome: Physical function, assessed with self‐report questionnaire BASFI scale (0 (easy) to 10 (impossible)), at the end of intervention
Exercise programme duration: range 3 to 24 weeks
Risk with no intervention: The mean physical function in the control groups was 4.1 a
Risk with exercise programmes: The mean physical function in the exercise groups was 1.3 lower (1.7 lower to 0.9 lower)
Relative effect (95% CI): -
№ of participants (studies): 312 (7 RCTs)
Quality of the evidence (GRADE): ⊕⊕⊕⊝ MODERATE b
Comments: 13% absolute reduction (95% CI 17% to 9%); 32% relative change (95% CI 23% to 42%); NNTB 3 (2 to 4)

Outcome: Pain, assessed with VAS scale (0 (no pain) to 10 (impossible)), at the end of intervention
Exercise programme duration: range 3 to 24 weeks
Risk with no intervention: The mean pain in the control groups was 6.2 a
Risk with exercise programmes: The mean pain in the exercise groups was 2.1 lower (3.6 lower to 0.6 lower) c
Relative effect (95% CI): -
№ of participants (studies): 288 (6 RCTs)
Quality of the evidence (GRADE): ⊕⊕⊝⊝ LOW b,d
Comments: MD ‐2.1 (95% CI ‐3.6 to ‐0.6); 21% absolute reduction (95% CI 36% to 6%); 34% relative change (95% CI 10% to 59%); NNTB 3 (2 to 8)

Outcome: Patient global assessment of disease activity, assessed with self‐report questionnaire BASDAI scale (0 (absent) to 10 (extreme)), at the end of intervention
Exercise programme duration: range 3 to 24 weeks
Risk with no intervention: The mean patient global assessment of disease activity in the control groups was 3.7 e
Risk with exercise programmes: The mean patient global assessment of disease activity in the exercise groups was 0.9 lower (1.3 lower to 0.5 lower)
Relative effect (95% CI): -
№ of participants (studies): 262 (6 RCTs)
Quality of the evidence (GRADE): ⊕⊕⊕⊝ MODERATE b
Comments: 9% absolute reduction (95% CI 13% to 5%); 27% relative change (95% CI 15% to 39%); NNTB 4 (3 to 8)

Outcome: Spinal mobility, assessed with self‐report questionnaire BASMI scale (0 (better) to 10 (very severe limitation)), at the end of intervention
Exercise programme duration: range 3 to 24 weeks
Risk with no intervention: The mean spinal mobility in the control groups was 3.8 e
Risk with exercise programmes: The mean spinal mobility in the exercise groups was 0.7 lower (1.3 lower to 0.1 lower)
Relative effect (95% CI): -
№ of participants (studies): 232 (5 RCTs)
Quality of the evidence (GRADE): ⊕⊝⊝⊝ VERY LOW b,d,f
Comments: 7% absolute reduction (95% CI 13% to 1%); 18% relative reduction (95% CI 34% to 3%); NNTB 5 (3 to 14)

Outcome: Fatigue, assessed with VAS scale (0 (absent) to 10 (extreme)), at the end of intervention
Exercise programme duration: range 3 to 24 weeks
Risk with no intervention: The mean fatigue in the control groups was 3 e
Risk with exercise programmes: The mean fatigue in the exercise groups was 1.4 lower (2.7 lower to 0.1 lower)
Relative effect (95% CI): -
№ of participants (studies): 72 (2 RCTs)
Quality of the evidence (GRADE): ⊕⊝⊝⊝ VERY LOW b,f,g
Comments: 14% absolute reduction (95% CI 27% to 1%); 48% relative change (95% CI 5% to 91%); NNTB 3 (1 to 9)

Outcome: Adverse effects associated with exercises
Exercise programme duration: range 3 to 24 weeks
Risk with no intervention: No adverse effects were reported in 43 control group participants
Risk with exercise programmes: 1 adverse effect was reported in 67 exercise group participants
Relative effect (95% CI): Peto OR 6.25 (0.10 to 320.40)
№ of participants (studies): 110 (2 RCTs) j
Quality of the evidence (GRADE): ⊕⊝⊝⊝ VERY LOW g,h
Comments: 2% absolute increase (95% CI 5% less to 8% more); 152% relative change (95% CI 90% less to 5818% more); it was not possible to calculate NNTB as too few events were reported

Outcome: Withdrawals because of adverse events
Exercise programme duration: range 3 to 24 weeks
Risk with no intervention: 90 per 1000
Risk with exercise programmes: 96 per 1000 (68 to 134)
Relative effect (95% CI): Peto OR 1.08 (0.74 to 1.57)
№ of participants (studies): 1343 (8 RCTs) j
Quality of the evidence (GRADE): ⊕⊕⊝⊝ LOW b,i
Comments: 1% absolute increase (95% CI 2% less to 4% more); 7% relative change (95% CI 23% less to 48% more); NNTB was not applicable as results were not statistically significant

*The risk in the intervention groups (and its 95% confidence interval) is based on the assumed risk in the comparison group and the relative effect of the intervention (and its 95% CI).

CI: confidence interval; OR: odds ratio; NNTB: number needed to treat (benefit); MD: mean difference; SMD: standardized mean difference; SD: standard deviation; BASFI: Bath Ankylosing Spondylitis Functional Index; VAS: visual analogue scale

GRADE Working Group grades of evidence
High quality: We are very confident that the true effect lies close to that of the estimate of the effect
Moderate quality: We are moderately confident in the effect estimate: The true effect is likely to be close to the estimate of the effect, but there is a possibility that it is substantially different
Low quality: Our confidence in the effect estimate is limited: The true effect may be substantially different from the estimate of the effect
Very low quality: We have very little confidence in the effect estimate: The true effect is likely to be substantially different from the estimate of effect

a Souza 2017 is the source document for the control group baseline data

b Downgraded one level due to risk of detection bias for subjective outcomes (lack of blinding of participants)

c We calculated a pooled SMD and re‐expressed it in MD, as the SMD multiplied by the control group baseline SD (SF‐36 pain = 2.5 from Souza 2017)

d Downgraded one level for inconsistency; important heterogeneity

e Masiero 2011 is the source document for the control group baseline data

f Downgraded one level for imprecision; total number of participants less than 400 and large confidence intervals

g Downgraded one level for imprecision; low rate of events

h Downgraded two levels for risk of bias; no blinding, incomplete outcome reporting

i Downgraded one level for indirectness. Since only two studies explicitly monitored adverse events, we used dropouts or withdrawals for any reason as a major outcome measure to estimate adverse events

j Studies were included regardless of the comparator intervention
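
Footnote c describes re‐expressing a pooled standardised mean difference (SMD) as a mean difference by multiplying it by the control group baseline SD. A minimal sketch of that conversion follows. The pooled SMD itself is not reported in this table, so the value used below is back‐calculated from the table's MD of ‐2.1 and the footnote's SD of 2.5, purely for illustration; the function names are hypothetical.

```python
def smd_to_md(smd: float, control_baseline_sd: float) -> float:
    """Re-express a standardised mean difference on the original scale."""
    return smd * control_baseline_sd


def md_to_smd(md: float, control_baseline_sd: float) -> float:
    """Inverse conversion, used here only to recover an illustrative SMD."""
    if control_baseline_sd <= 0:
        raise ValueError("control_baseline_sd must be positive")
    return md / control_baseline_sd


baseline_sd = 2.5   # control group baseline SD quoted in footnote c
reported_md = -2.1  # mean difference for pain reported in the table

implied_smd = md_to_smd(reported_md, baseline_sd)  # illustrative, not a reported value
print(f"implied SMD: {implied_smd:.2f}")
print(f"re-expressed MD: {smd_to_md(implied_smd, baseline_sd):.1f}")
```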

Summary of findings 2. Exercise programmes compared to usual care for ankylosing spondylitis

Exercise programmes compared to usual care

Patient or population: adults with ankylosing spondylitis
Setting: international hospitals, outpatient clinics, or home
Intervention: exercise programmes
Comparison: usual care (current practices included medication, self management, physiotherapy)

For each outcome, the table reports: anticipated absolute effects* (95% CI), given as the risk with usual care and the risk with exercise programmes; relative effect (95% CI); № of participants (studies); quality of the evidence (GRADE); and comments.

Outcome: Physical function, assessed with self‐report questionnaire BASFI scale (0 (easy) to 10 (impossible)), at the end of intervention
Exercise programme duration: range 3 to 24 weeks
Risk with usual care: The mean physical function in the control groups was 3.7 a
Risk with exercise programmes: The mean physical function in the exercise groups was 0.4 lower (0.6 lower to 0.2 lower)
Relative effect (95% CI): -
№ of participants (studies): 1068 (5 RCTs)
Quality of the evidence (GRADE): ⊕⊕⊕⊝ MODERATE b
Comments: 4% absolute reduction (95% CI 6% to 2%); 11% relative change (95% CI 5% to 16%); NNTB 10 (6 to 21)

Outcome: Pain, assessed with VAS scale (0 (no pain) to 10 (impossible)), at the end of intervention
Exercise programme duration: range 3 to 24 weeks
Risk with usual care: The mean pain in the control groups was 3.7 a
Risk with exercise programmes: The mean pain in the exercise groups was 0.5 lower (0.9 lower to 0.1 lower) c
Relative effect (95% CI): -
№ of participants (studies): 911 (2 RCTs)
Quality of the evidence (GRADE): ⊕⊕⊕⊝ MODERATE b
Comments: MD ‐0.5 (95% CI ‐0.9 to ‐0.1); 5% absolute reduction (95% CI 9% to 1%); 15% relative change (95% CI 2% to 22%); NNTB 10 (7 to 68)

Outcome: Patient global assessment of disease activity, assessed with self‐report questionnaire BASDAI scale (0 (absent) to 10 (extreme)), at the end of intervention
Exercise programme duration: range 3 to 24 weeks
Risk with usual care: The mean patient global assessment of disease activity in the control groups was 3.7 a
Risk with exercise programmes: The mean patient global assessment of disease activity in the exercise groups was 0.7 lower (1.3 lower to 0.1 lower)
Relative effect (95% CI): -
№ of participants (studies): 1068 (5 RCTs)
Quality of the evidence (GRADE): ⊕⊕⊝⊝ LOW a,d
Comments: 7% absolute reduction (95% CI 13% to 1%); 19% relative change (95% CI 3% to 35%); NNTB 6 (3 to 52)

Outcome: Spinal mobility, assessed with self‐report questionnaire BASMI scale (0 (better) to 10 (very severe limitation)), at the end of intervention
Exercise programme duration: range 3 to 24 weeks
Risk with usual care: The mean spinal mobility in the control groups was 8.9 e
Risk with exercise programmes: The mean spinal mobility in the exercise groups was 1.2 lower (2.8 lower to 0.5 higher)
Relative effect (95% CI): -
№ of participants (studies): 85 (2 RCTs)
Quality of the evidence (GRADE): ⊕⊝⊝⊝ VERY LOW a,d,f
Comments: 12% absolute change (95% CI 5% less to 28% more); 13% relative change (95% CI 6% less to 32% more); NNTB = NA

Outcome: Fatigue
Anticipated absolute effects: see comment
Relative effect (95% CI): -
№ of participants (studies): (0 RCTs)
Quality of the evidence (GRADE): -
Comments: No included studies measured this outcome

Outcome: Adverse effects associated with exercises
Exercise programme duration: range 3 to 24 weeks
Risk with usual care: No adverse effects were reported in 43 control group participants
Risk with exercise programmes: 1 adverse effect was reported in 67 exercise group participants
Relative effect (95% CI): Peto OR 6.25 (0.10 to 320.40)
№ of participants (studies): 110 (2 RCTs) i
Quality of the evidence (GRADE): ⊕⊝⊝⊝ VERY LOW g,h
Comments: 2% absolute increase (95% CI 5% less to 8% more); 152% relative change (95% CI 90% less to 5818% more); it was not possible to calculate NNTB as too few events were reported

Outcome: Adverse events
Exercise programme duration: range 3 to 24 weeks
Anticipated absolute effects: see comment
Relative effect (95% CI): cannot be estimated
Quality of the evidence (GRADE): ⊕⊝⊝⊝ VERY LOW g,h
Comments: Adverse events could not be calculated as events were not monitored or reported

*The risk in the intervention groups (and its 95% confidence interval) is based on the assumed risk in the comparison group and the relative effect of the intervention (and its 95% CI).

CI: confidence interval; OR: odds ratio; NNTB: number needed to treat (benefit); MD: mean difference; SMD: standardized mean difference; SD: standard deviation; BASFI: Bath Ankylosing Spondylitis Functional Index; VAS: visual analogue scale

GRADE Working Group grades of evidence
High quality: We are very confident that the true effect lies close to that of the estimate of the effect
Moderate quality: We are moderately confident in the effect estimate: The true effect is likely to be close to the estimate of the effect, but there is a possibility that it is substantially different
Low quality: Our confidence in the effect estimate is limited: The true effect may be substantially different from the estimate of the effect
Very low quality: We have very little confidence in the effect estimate: The true effect is likely to be substantially different from the estimate of effect

a Rodriguez‐Lozano 2013 is the source document for the control group baseline data

b Downgraded one level due to risk of detection bias for subjective outcomes (lack of blinding of participants)

c We calculated a pooled SMD and re‐expressed it as a MD; we multiplied the SMD by the control group baseline SD (VAS pain = 3.0 from Rodriguez‐Lozano 2013)

d Downgraded one level for inconsistency; important heterogeneity

e Altan 2012 is the source document for the control group baseline data

f Downgraded one level for imprecision; total number of participants less than 400, and large confidence intervals

g Downgraded one level for imprecision; low rate of events

h Downgraded two levels for risk of bias; no blinding, incomplete outcome reporting

i Studies were included regardless of the comparator intervention

Background

Description of the condition

Ankylosing spondylitis (AS) is a chronic, inflammatory rheumatic disease that mainly affects the axial skeleton and sacroiliac joints, causing characteristic inflammatory back pain (Braun 2003; Braun 2010; van der Heijde 2008). The inflammatory back pain is due to sacroiliitis and spondylitis, and to the formation of syndesmophytes, leading to ankylosis of the spine (Baraliakos 2005). AS can start early, and often affects young adults. Men are more affected than women, with a ratio of 2:1 (Braun 2007). The disease affects about 0.1% to 1.4% of the population, depending on the geographical region (Boonen 2006), and is closely associated with positivity for human leukocyte antigen B27 (HLA‐B27) (Dougados 2011). In a recent systematic review, the estimated AS prevalence was reported to be 18.6/10,000 in Europe, 18.0/10,000 in Asia, 12.2/10,000 in Latin America, 39.9/10,000 in North America, and 7.4/10,000 in Africa (Dean 2014). The number of AS cases is estimated to range from 1.30 million to 1.56 million in Europe and 4.63 million to 4.98 million in Asia. The incidence ranges from 0.5 to 14 per 100,000 people per year, depending on the country (Braun 2007).

The main clinical features of AS are back pain and reduced mobility, caused by inflammation in the axial skeleton. Approximately one‐third of individuals report peripheral joint involvement, most often of the hip, shoulder, and knee joints. AS may also be associated with extra‐articular manifestations, including enthesitis, anterior uveitis, inflammatory bowel disease, and inflammatory skin conditions (Braun 2007). Enthesitis (inflammation of the entheses, the sites at which tendons or ligaments insert into the bone) is typical of, and a key problem in, AS, and occurs at peripheral joints, generally the hip, shoulder, knee, or heel. AS may result in varying degrees of structural and functional impairments and reduced general health (Dagfinrud 2005). The severity of symptoms and radiographic progression of the disease vary considerably: longer disease duration, increasing age, and smoking are associated with decreased functioning (Boonen 2006). A cohort study found individuals with a high C‐reactive protein (CRP) level and syndesmophytes to be at risk for radiographic progression of the spine (Poddubnyy 2012), an indicator of disease severity (Pradeep 2008). However, the major sequela of AS is decreased quality of life. Like many chronic diseases, AS is associated with high medical and socioeconomic costs: in a systematic review, Palla 2012 estimated that AS represents a total cost of USD 31,766 per year for individuals with increased functional disability and severe disease. About 20% of individuals with AS experience disability at work (Reveille 2012). Boonen 2006 found that AS had considerable impact on healthcare costs and workforce participation.

In AS, treatments are expected to reduce the pain and stiffness of the back and sacroiliac joints, and improve spine and peripheral joint mobility (Boonen 2004). Current recommendations for the global management of AS combine appropriate medication and exercises as the two cornerstones of treatment (Braun 2010). Pharmacological therapies have greatly improved disease management (Vliet 2009). Biologic therapies have been efficacious and have changed the management landscape of AS and axial spondyloarthritis (SpA; Baraliakos 2012), particularly with the introduction of anti‐tumour necrosis factor (TNF) agents (Vliet 2009). However, some individuals with AS (20% to 40%) do not respond well to pharmacological treatments (Dougados 2011). Whether these treatments can prevent structural change is unclear. Non‐steroidal anti‐inflammatory drugs (NSAIDs) seem to affect new bone formation, and some data suggest that they can positively affect the radiographic progression of axial SpA (Poddubnyy 2012). The benefits of biologic treatment on the structural progression of the disease are still debated (van der Heijde 2008). Recent data indicate that biologic therapy can slow the structural progression of AS (Haroon 2013).

Description of the intervention

Exercise programmes have been used to treat AS and remain a part of its management (Braun 2010). Up to 10% to 20% of individuals with AS receive physical therapy in the United States (Reveille 2012). According to the typical clinical features of AS, exercise programmes have focused on improving or maintaining spinal and thoracic mobility. Recently, studies have been designed to target other aspects of physical fitness, and to develop muscular strength and aerobic capacity (Giannotti 2014). A growing body of evidence suggests a dose–response relationship between exercise and health effects, as for drugs, so the effect of exercise depends on the individual's adherence to the prescribed programme (Arem 2015; Vidoni 2015).

How the intervention might work

Exercise programmes are associated with different hypothesised mechanisms of effect (Kujala 2009; Hagen 2012), and may benefit people with AS (Altan 2012). They may help avoid stiffness, and improve or maintain functional capacity by moving joints, especially through back stretching, posture control, muscle strengthening, and training of pulmonary function and cardiovascular fitness (Fernández‐de‐Las‐Peñas 2005; Niedermann 2013). Other benefits include improving quality of life and reducing pain (Singh 2013). Different exercise programmes are available (Van den Berg 2012). Some clinical trials have reported that the use of tai chi, global posture re‐education, exercises combined with spa treatments, or multimodal exercise programmes may be effective, but the effect of different types of exercise programmes remains unclear (Wang 2009). The exercises are extremely heterogeneous: they can vary in dosage, type of exercise, components, modes, and settings (Slade 2016). The optimal mode of delivery, the optimal frequency and duration of treatment, and in particular whether specific components of exercise modalities can improve the clinical outcome, need to be explored. A Cochrane Review of 11 RCTs of individuals with AS concluded that exercises have a small but significant positive effect on pain, spinal mobility, physical function, and patient global assessment (Dagfinrud 2008).

Why it is important to do this review

Given the publication of new RCTs on exercise programmes in AS, a comprehensive systematic review is important to examine the evidence for exercise for people with AS.

Objectives

To assess the benefits and harms of exercise programmes on physical function, pain, fatigue, and global assessment of disease activity in people with ankylosing spondylitis.

Methods

Criteria for considering studies for this review

Types of studies

We included RCTs.

Types of participants

We included studies involving adults (18 years or older, with no upper age limit) with a diagnosis of ankylosing spondylitis (AS) according to the modified New York criteria (Van der Linden 1984), with critical features of visible structural damage on the sacroiliac joint on X‐rays.

We excluded individuals with non‐radiographic axial spondyloarthritis (SpA (Slobodin 2015)), as defined by the European Spondyloarthropathy Study Group and the Amor criteria, or the Assessment of Spondyloarthritis International Society (ASAS) criteria for axial SpA (Rudwaleit 2009; Van den Berg 2013).

We included studies with other populations only if we were able to extract data for the AS group separately.

Types of interventions

We defined exercise as 'a form of physical activity that is planned, structured and repeated over a period of time' (Bouchard 2012), with the intention of 'reducing pain and disability and improving overall health' (Abenhaim 2000; Hayden 2012).

We included interventions that delivered any type of exercise. The exercises could aim to improve any combination of stretching, flexibility, mobilising, balance, aerobic, strengthening, or functional training. We considered multimodal physical therapy interventions if one group of participants received exercise as part of a multimodal intervention and the comparison group received a non‐exercise intervention (attentional, control intervention), or no intervention.

We considered trials that included co‐interventions. We included trials that compared an exercise programme plus a co‐intervention versus the co‐intervention alone (e.g. exercise training plus a non‐steroidal anti‐inflammatory drug (NSAID) versus the NSAID alone). The only difference between groups was the exercise intervention.

We included exercise programmes carried out in any setting or location (home, inpatient clinic, hospital, or elsewhere), with any type of delivery (individual, group, or mixed); they could be land‐based or water‐based.

We included specific programmes, such as tai chi or yoga.

We considered any trial comparing exercise programmes with:

  • No exercise (attention, no treatment, waiting list control). Participants were asked not to practice exercises during the study period.

  • Usual care (participants could practice exercises as usual).

We excluded trials with general activities (e.g. swimming or walking) that required only movements, and did not meet our definition of exercise.

Types of outcome measures

We assessed a core set of outcome measures recommended by the ASAS (www.asas‐group.org; Sieper 2009; Van der Heijde 1997), and the 1999 conference on Outcome Measures for Rheumatoid Arthritis Clinical Trials (Van der Heijde 1999). We extracted all outcomes for analysis according to the following preferred hierarchy:

Major outcomes
Physical function

If data on more than one physical function scale were provided for a trial, we extracted data on the physical function scale that was highest on the following list:

  • Physical function (Bath Ankylosing Spondylitis Functional Index (BASFI))

  • Dougados Functional Index (DFI)

  • Health Assessment Questionnaire for AS (HAQ‐AS)
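The selection rule above can be sketched in a few lines. This is an illustrative sketch only: the function name and the short scale labels are assumptions introduced here, not part of the review's extraction form.

```python
from typing import Iterable, Optional, Sequence

# Preference order copied from the list above (highest priority first).
PHYSICAL_FUNCTION_HIERARCHY = ["BASFI", "DFI", "HAQ-AS"]


def pick_preferred_scale(reported: Iterable[str],
                         hierarchy: Sequence[str]) -> Optional[str]:
    """Return the highest-ranked scale that a trial reported, or None."""
    reported_set = set(reported)
    for scale in hierarchy:
        if scale in reported_set:
            return scale
    return None
```

The same function could be reused for the pain, patient global assessment, and spinal mobility hierarchies by passing a different ordered list.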

Pain

If data on more than one type of pain scale were provided for a trial, we extracted data on the type of pain scale that was highest on the following list, according to a previously described hierarchy of pain‐related outcomes (Sieper 2009).

In a visual analogue scale (VAS) or numerical rating scale (NRS):

  • Total back or spine pain (Bath Ankylosing Spondylitis Disease Activity Index (BASDAI))

  • Overall pain

  • Back or spine pain at night

  • Overall pain at night

Patient global assessment of disease activity

If data on more than one patient global assessment of disease activity scale were provided for a trial, we extracted data on the patient global assessment of disease activity scale that was highest on the following list:

  • BASDAI

  • Patient global VAS or NRS (global disease activity in the previous week)

  • Stiffness VAS or NRS (duration of morning stiffness, spine, last week)

Spinal mobility

If data on more than one spinal mobility scale were provided for a trial, we extracted data on the spinal scale that was highest on the following list:

  • Schober test score

  • Lateral spinal flexion

  • Cervical rotation

  • Occiput to wall movement

  • Chest expansion

  • Bath Ankylosing Spondylitis Metrology Index (BASMI)

We considered including BASMI and other spinal scales as separate outcomes.

Fatigue

  • BASDAI fatigue question.

Safety

  • Withdrawals due to adverse events (AEs).

  • Severe AE outcomes: inpatient hospitalisation, life‐threatening events, or death

  • Adverse effects associated with the exercise intervention: we extracted the proportion of participants who experienced adverse effects related to exercise programmes (including joint or muscle contractures, fatigue, pain, falls, functional limitations)

Minor outcomes
Quality of life

  • Medical Outcomes Survey Short Form‐36 (SF‐36)

  • Ankylosing Spondylitis Quality of Life Instrument (ASQoL)

  • EuroQol (EQ‐5D)

Acute‐phase reactant

  • C‐reactive protein (CRP) level (mg/L) or erythrocyte sedimentation rate (ESR)

Physician global assessment
Peripheral joints, entheses (pain, swelling, and tenderness)

  • Number of swollen joints (44‐joint count (Braun 2007))

  • Validated enthesitis score, such as the Maastricht Ankylosing Spondylitis Enthesis Score (MASES), the University of California, San Francisco Index, and the Berlin Index

Timing of outcome assessment

We extracted outcome measures at the following three time points:

  • end of intervention – measured immediately after completion of the exercise programme

  • medium‐term follow‐up – < 24 weeks after completion of the exercise programme

  • long‐term follow‐up – ≥ 24 weeks after completion of exercise
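As a minimal sketch, the three time points can be expressed as a classification of the number of weeks elapsed since the end of the exercise programme. Treating "immediately after completion" as exactly zero weeks is an assumption made here for illustration; the function name is hypothetical.

```python
def classify_time_point(weeks_after_programme_end: float) -> str:
    """Map weeks since the end of the exercise programme to a time point."""
    if weeks_after_programme_end < 0:
        raise ValueError("assessment cannot precede the end of the programme")
    if weeks_after_programme_end == 0:
        # Assumption: 'immediately after completion' is coded as 0 weeks.
        return "end of intervention"
    if weeks_after_programme_end < 24:
        return "medium-term follow-up"
    # 24 weeks or more after completion.
    return "long-term follow-up"
```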

Search methods for identification of studies

Electronic searches

We searched the following electronic databases for primary studies, from database inception up to the search date. The last search was on 14 December 2018:

  • the Cochrane Central Register of Controlled Trials (CENTRAL; 2018, Issue 12) in the Cochrane Library (searched 14 December 2018);

  • MEDLINE Ovid (1946 to 14 December 2018);

  • Embase Ovid (1974 to 14 December 2018);

  • CINAHL EBSCO (Cumulative Index to Nursing and Allied Health Literature; 1982 to 14 December 2018);

  • PEDro (www.pedro.org.au/; searched 14 December 2018);

  • Scopus (searched 14 December 2018).

We searched the Cochrane Database of Systematic Reviews (14 December 2018) and the Database of Abstracts of Reviews of Effect (up to 14 December 2018) to identify relevant systematic reviews.

The queries combined free text words and controlled vocabulary. The search strategy was based on synonyms of (“exercise”) AND “spondyloarthritis”. The Cochrane Musculoskeletal Review Group's Information Specialist helped to develop each search strategy.

The electronic search strategy for MEDLINE is outlined in Appendix 1. We adapted this search strategy for use with other databases. We used the 'optimal sensitive search strategies' designed to identify clinical trials, described by Lefebvre 2011.

We did not restrict the search by language of publication or publication status.

Searching other resources

We hand‐searched the reference lists of selected trials and systematic reviews identified from electronic searches, and also searched in Google and Google Scholar.

We searched the proceedings of the conferences of the American College of Rheumatology (July 2013; November 2014), European League Against Rheumatism (EULAR) (October 2013; November 2014), and Osteoarthritis Research Society International (April 2013; April 2014) available online, and contacted authors and field experts for any additional published or unpublished data.

We searched the US National Institutes of Health Ongoing Trials Register ClinicalTrials.gov (www.clinicaltrials.gov; searched December 2018) and the World Health Organization International Clinical Trials Registry Platform (apps.who.int/trialsearch/; searched December 2018) to identify any studies in progress.

We present a flow diagram of search results and selection of studies in Figure 1.


Study flow diagram. Search results from original June 2015 literature search, and May 2016 and January 2017 updates


Data collection and analysis

Selection of studies

We removed duplicate records from the references identified. Two review authors (JPR, TD) independently reviewed the titles and abstracts of citations identified from the search strategy to select potentially relevant studies. Then, we obtained the full text of all potentially eligible studies and screened them for inclusion, according to the eligibility criteria. We resolved disagreements by reaching a consensus, or by consulting a third review author (MMLC) if necessary. We linked multiple reports relating to the same trial, or to trials with potentially overlapping populations. If the possibility of overlapping populations could not be excluded, we selected the most recent trial.

Data extraction and management

Two review authors (TD, JPR, or MMLC) independently extracted the results of individual trials by using a standardised, piloted extraction form, accompanied by a codebook. Disagreements were resolved by reaching consensus, or by consulting a third review author if necessary. The extraction form, based on other forms used by the Cochrane Musculoskeletal Review Group, was pilot tested with five reports of RCTs.

We extracted the following information:

  1. Trial characteristics (funding, settings and number of centres, country, study design);

  2. Participant characteristics (age, sex, measure of functional status, level of pain, description of radiographic damage, biologic medications, NSAIDs, corticosteroids or other drugs, coexisting diseases, other);

  3. Intervention characteristics:

    1. number of intervention groups;

    2. content and type of each intervention (details);

    3. qualitative data: a detailed description of the interventions, including the different components of the programme received by each group, mode of delivery (individual, group, over internet), with or without supervision (face‐to‐face or at home), clinical expertise and background of the healthcare professionals who provided the exercise programmes (physiotherapist, fitness instructor, registered nurse, other), and adherence. We followed the reporting of Saunders 2016 to evaluate adherence by including: (1) attendance at the exercise programme sessions, and (2) compliance with the protocol or exercise instructions during the training sessions.

    4. quantitative data: the number of sessions, timing and duration of each session, duration of each component, overall duration, and intensity. We collected these data as more frequent interventions, conducted over a long time, may influence outcomes.

  4. Outcomes reported, including individual effect measures used (continuous or dichotomous data) and timing of outcome measurement.

  5. AEs: we reported any AEs and/or adverse effects related to the interventions in each group.

  6. Economic data: we summarised economic evaluations in additional tables when available.

When necessary, we used PlotDigitizer to approximate data from graphs (arohatgi.info/WebPlotDigitizer/index.html). We entered the data into Review Manager 5 (RevMan 2014), and checked for accuracy.

Assessment of risk of bias in included studies

We evaluated the risk of bias in each included study according to Cochrane's 'Risk of bias' tool (Higgins 2011a). Two review authors (TD, JPR, or MMLC) independently examined seven specific domains: sequence generation, allocation concealment, blinding of participants, blinding of personnel who delivered exercise programmes, blinding of outcome assessors, incomplete outcome data, and selective outcome reporting, plus other potential sources of bias (i.e. imbalanced baseline characteristics, small number of study participants, lack of power calculation, no assessment of attendance).

We separately assessed the blinding of self‐reported subjective outcomes (e.g. pain, function, health‐related quality of life) and the blinding of independent outcome assessors to objective outcomes (such as AEs).

Studies were classified at low risk of bias if all domains were assessed at low risk for potential bias; high risk of bias if one or more categories was assessed at high risk of bias; and unclear risk of bias if one or more key domains was assessed at unclear risk of bias. We resolved disagreements by discussion, or by consulting a third review author if necessary.
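The study-level classification rule can be sketched as below. The text does not say which judgement prevails when a study has both high-risk and unclear-risk domains; the sketch assumes that 'high' takes precedence, and the function and domain names are illustrative.

```python
from typing import Mapping

VALID_JUDGEMENTS = {"low", "high", "unclear"}


def overall_risk_of_bias(domains: Mapping[str, str]) -> str:
    """Summarise per-domain judgements into one study-level judgement."""
    judgements = [value.lower() for value in domains.values()]
    unknown = set(judgements) - VALID_JUDGEMENTS
    if unknown:
        raise ValueError(f"unexpected judgement(s): {sorted(unknown)}")
    if "high" in judgements:
        # Assumption: one high-risk domain outweighs any unclear domain.
        return "high"
    if "unclear" in judgements:
        return "unclear"
    return "low"
```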

Measures of treatment effect

We calculated point estimates and 95% confidence intervals (CIs) for outcomes of individual RCTs whenever possible.

We summarised the intervention effect estimates in a meta‐analysis only when estimates displayed sufficient clinical and statistical homogeneity. The estimate of the common treatment effect was the weighted average of the individual estimates for each study.

If the meta‐analysis resulted in statistically significant overall estimates, we transformed these treatment effect measures (pooled estimate of the relative risk or SMD) into measures that are clinically useful in daily practice, such as number needed to treat for an additional beneficial outcome (NNTB), or number needed to treat for an additional harmful outcome (NNTH), and the absolute or relative improvement in the original units. We calculated the absolute risk difference and relative percentage change by using the recommendations provided by the Musculoskeletal Review Group (musculoskeletal.cochrane.org).

We assumed a minimal clinically important difference (MCID) of 1.5 on a 10‐cm scale for pain, patient global assessment of disease activity, physical function, or physician global assessment. We defined an important clinical benefit as an outcome improvement that was more than 15% for an absolute change, and more than 20% for a relative change (Tubach 2012). We did not consider outcome changes that were below these values to be clinically important.

For dichotomous data

We analysed AEs by using Peto odds ratios (Peto ORs).
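For readers unfamiliar with the Peto method, the sketch below computes a single-study Peto odds ratio from a 2 × 2 table using the standard observed-minus-expected formulation. It is an illustration under that assumption, not the code used in Review Manager; the function name is hypothetical.

```python
import math
from typing import Tuple


def peto_odds_ratio(events_trt: int, n_trt: int,
                    events_ctl: int, n_ctl: int) -> Tuple[float, float, float]:
    """Return the Peto odds ratio with its approximate 95% CI."""
    n_total = n_trt + n_ctl
    events_total = events_trt + events_ctl
    # Expected events in the treated group under the null hypothesis.
    expected = n_trt * events_total / n_total
    # Hypergeometric variance of the observed number of events.
    variance = (n_trt * n_ctl * events_total * (n_total - events_total)
                / (n_total ** 2 * (n_total - 1)))
    if variance == 0:
        raise ValueError("no events (or only events) in both groups")
    log_or = (events_trt - expected) / variance
    se = 1.0 / math.sqrt(variance)
    return (math.exp(log_or),
            math.exp(log_or - 1.96 * se),
            math.exp(log_or + 1.96 * se))
```

The Peto method is generally regarded as suited to rare events, which is consistent with its use here for AEs.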

For dichotomous outcomes, we calculated the NNTB or NNTH from the control group event rate (unless the population event rate was known) and the relative risk, by using the Visual Rx NNT calculator (Cates 2008). We used the baseline values observed in the comparator group in the trials.
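The arithmetic behind an NNT derived from a control group event rate and a relative risk can be sketched as follows. This is a generic illustration, not the Visual Rx implementation; rounding up to the next whole person is a common convention assumed here.

```python
import math


def number_needed_to_treat(control_event_rate: float,
                           relative_risk: float) -> int:
    """NNT = 1 / |CER - CER * RR|, rounded up to a whole number.

    Whether the result is an NNTB or an NNTH depends on the direction of
    the effect and on whether the event is desirable.
    """
    if not 0 < control_event_rate <= 1:
        raise ValueError("control event rate must be in (0, 1]")
    risk_difference = control_event_rate * (1 - relative_risk)
    if risk_difference == 0:
        raise ValueError("no difference in risk: NNT is undefined")
    return math.ceil(1 / abs(risk_difference))
```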

For continuous data

We summarised results as mean differences (MD) if the same tool was used to measure the same outcome across studies. We calculated the standardised MD (SMD) when the same outcome was measured with different units and methods of assessment across studies (e.g. pain scales). SMDs are calculated by dividing the MD by the standard deviation (SD); we calculated 95% CIs.
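A minimal sketch of an SMD calculation is given below. It assumes the pooled SD as the denominator and the small-sample (Hedges) adjustment, with the standard-error approximation we understand Review Manager to use; both are assumptions about the software rather than statements from this review.

```python
import math
from typing import Tuple


def standardised_mean_difference(m1: float, sd1: float, n1: int,
                                 m2: float, sd2: float, n2: int
                                 ) -> Tuple[float, float, float, float]:
    """Return (SMD, SE, lower 95% limit, upper 95% limit)."""
    n_total = n1 + n2
    pooled_sd = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2)
                          / (n_total - 2))
    # Small-sample correction factor (Hedges' adjustment).
    correction = 1 - 3 / (4 * n_total - 9)
    smd = correction * (m1 - m2) / pooled_sd
    # Assumed standard-error approximation for the adjusted SMD.
    se = math.sqrt(n_total / (n1 * n2) + smd ** 2 / (2 * (n_total - 3.94)))
    return smd, se, smd - 1.96 * se, smd + 1.96 * se
```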

To enhance interpretability of continuous outcomes, we back‐transformed pooled SMDs for overall pain and disability to an original 0 to 10 VAS for pain. When the direction of a scale (i.e. SF‐36, 100 representing more favourable state of health) differed from the VAS for pain (10 defining high pain), we subtracted the mean from the maximum possible value for the scale, following the procedure recommended by Cochrane, and described in Chapter 9 of the Cochrane Handbook for Systematic Reviews of Interventions (Deeks 2011).

For continuous outcomes, we calculated the absolute risk difference as the mean difference between intervention and control groups in the original measurement units (divided by the scale), expressed as a percentage; the relative difference was calculated as the absolute change (or MD) divided by the baseline mean of the control group obtained from a representative trial, or the pooled baseline mean calculated in RevMan 5 by using the generic inverse variance method (Buchbinder 2015). We re‐expressed outcomes pooled using SMDs as changes by multiplying by a representative control group baseline SD. We calculated the NNTB by using the Wells calculator software available at the Cochrane Musculoskeletal Review Group editorial office.
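The re-expression steps described above amount to simple arithmetic, sketched here. The numbers in the usage note are invented for illustration, and the function names are hypothetical.

```python
def back_transform_smd(smd: float, control_baseline_sd: float) -> float:
    """Re-express a pooled SMD in the units of a representative scale."""
    return smd * control_baseline_sd


def absolute_change_percent(mean_difference: float,
                            scale_min: float, scale_max: float) -> float:
    """Mean difference as a percentage of the scale range."""
    return 100.0 * mean_difference / (scale_max - scale_min)


def relative_change_percent(mean_difference: float,
                            control_baseline_mean: float) -> float:
    """Mean difference as a percentage of the control baseline mean."""
    return 100.0 * mean_difference / control_baseline_mean
```

For example, a hypothetical pooled SMD of −0.5 with a representative baseline SD of 2.0 on a 0 to 10 scale corresponds to a mean difference of −1.0, an absolute change of −10%, and, with a control baseline mean of 5.0, a relative change of −20%. Such values could then be compared against the 15% absolute and 20% relative thresholds defined under 'Measures of treatment effect'.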

If we could not summarise results as described above, we reported them as 'other data' in narrative form, but did not include them in the meta‐analysis (Deeks 2011).

Unit of analysis issues

For studies containing more than two intervention groups, we combined groups to create a single pair‐wise comparison following the procedure recommended by Cochrane (Higgins 2011b).
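Combining two intervention groups into one can be sketched with the standard formulae for pooling sample sizes, means, and SDs. The sketch assumes these are the formulae intended by the cited procedure; the function name is illustrative.

```python
import math
from typing import Tuple


def combine_groups(n1: int, m1: float, sd1: float,
                   n2: int, m2: float, sd2: float
                   ) -> Tuple[int, float, float]:
    """Merge two groups and return (n, mean, SD) of the combined group."""
    n = n1 + n2
    mean = (n1 * m1 + n2 * m2) / n
    # Within-group variation plus the variation between the two group means.
    variance = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2
                + (n1 * n2 / n) * (m1 - m2) ** 2) / (n - 1)
    return n, mean, math.sqrt(variance)
```

With three or more groups, the function can be applied repeatedly, combining the first two groups and then adding each remaining group in turn.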

Dealing with missing data

We contacted the original investigators to request any missing outcome data. If we did not receive a response, we sent two e‐mail reminders, with two‐week intervals.

For continuous outcomes with no SD reported, we calculated SDs from standard errors (SEs), 95% CIs, or P values (Higgins 2011c).
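Two of these conversions are sketched below for a single group mean. The 95% CI conversion assumes a normal approximation (divisor 3.92), which is reasonable only for larger samples; for small samples a t-distribution multiplier would be needed. The P value route is omitted from this sketch.

```python
import math


def sd_from_standard_error(se: float, n: int) -> float:
    """SD of a single group from the standard error of its mean."""
    return se * math.sqrt(n)


def sd_from_ci95(lower: float, upper: float, n: int) -> float:
    """SD of a single group from the 95% CI of its mean.

    Assumes a normal approximation: CI width = 2 * 1.96 * SE.
    """
    se = (upper - lower) / 3.92
    return sd_from_standard_error(se, n)
```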

Assessment of heterogeneity

As recommended in the Cochrane Handbook for Systematic Reviews of Interventions (Deeks 2019, ch10.10), we assessed the presence of heterogeneity. We used the I² statistic: the percentage of the variability in effect estimates that is due to heterogeneity rather than sampling error (Higgins 2011a). We interpreted the value of the I² statistic according to the following thresholds:

  • 0% to 40%: might not be important

  • 30% to 60%: may represent moderate heterogeneity

  • 50% to 90%: substantial heterogeneity

  • 75% to 100%: considerable heterogeneity (Deeks 2011).
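The I² statistic and the overlapping bands listed above can be sketched as follows. The calculation uses Cochran's Q with inverse-variance weights, which is the usual definition; because the bands overlap, the helper returns every label that applies. The function names and short labels are illustrative.

```python
from typing import List, Sequence

# (lower %, upper %, label) copied from the thresholds above.
I2_BANDS = [
    (0, 40, "might not be important"),
    (30, 60, "moderate"),
    (50, 90, "substantial"),
    (75, 100, "considerable"),
]


def i_squared(effects: Sequence[float], variances: Sequence[float]) -> float:
    """I² (%) = 100 * (Q - df) / Q, truncated at zero."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    if q <= df:
        return 0.0
    return 100.0 * (q - df) / q


def interpret_i_squared(i2: float) -> List[str]:
    """Return all interpretation labels whose band contains i2."""
    return [label for low, high, label in I2_BANDS if low <= i2 <= high]
```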

We also computed the 95% CI for the I² statistic (Ioannidis 2007a), and the between‐study variance Tau², estimated from the random‐effects model (Rucker 2008).

When we found substantial to considerable heterogeneity (severe heterogeneity), we checked the extracted data and ensured that the numbers were correctly entered in the analysis software. When the number of trials was sufficient, we discussed the potential sources of heterogeneity by identifying a study that could be responsible for the presence of heterogeneity. As recommended, we did not exclude any study from the meta‐analyses unless it could be considered an outlier for an obvious reason (i.e. conflicting data). We also used the random‐effects model with the DerSimonian and Laird approach to take into account the clinical differences between the studies included (Deeks 2011; Deeks 2019).
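A minimal sketch of the DerSimonian and Laird random-effects calculation is shown below, using the method-of-moments estimate of Tau². It illustrates the general technique under standard assumptions and is not the Review Manager implementation; the function name is hypothetical.

```python
import math
from typing import Sequence, Tuple


def dersimonian_laird(effects: Sequence[float],
                      variances: Sequence[float]
                      ) -> Tuple[float, float, float]:
    """Return (pooled effect, its standard error, Tau²)."""
    weights = [1.0 / v for v in variances]
    sum_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, effects)) / sum_w
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    c = sum_w - sum(w ** 2 for w in weights) / sum_w
    # Method-of-moments estimate, truncated at zero.
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    # Random-effects weights add Tau² to each within-study variance.
    re_weights = [1.0 / (v + tau2) for v in variances]
    sum_re = sum(re_weights)
    pooled = sum(w * y for w, y in zip(re_weights, effects)) / sum_re
    return pooled, math.sqrt(1.0 / sum_re), tau2
```

When Tau² is estimated as zero, the random-effects weights reduce to the inverse-variance weights and the result coincides with a fixed-effect analysis.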

Assessment of reporting biases

To assess the presence of small study effects, we had planned to visually inspect funnel plots for each meta‐analysis when the required statistical conditions were met (≥ 10 studies, no significant heterogeneity, and a ratio of the maximal to minimal variance across studies > 4).
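The three conditions can be checked mechanically, as in the sketch below. How 'significant heterogeneity' is determined is not specified in the text, so it is passed in as a flag; the function name and defaults are illustrative.

```python
from typing import Sequence


def funnel_plot_conditions_met(variances: Sequence[float],
                               heterogeneity_significant: bool,
                               min_studies: int = 10,
                               min_variance_ratio: float = 4.0) -> bool:
    """Check the three stated conditions for inspecting a funnel plot."""
    if len(variances) < min_studies:
        return False
    if heterogeneity_significant:
        return False
    if min(variances) <= 0:
        raise ValueError("variances must be positive")
    # Ratio of the largest to the smallest study variance must exceed 4.
    return max(variances) / min(variances) > min_variance_ratio
```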

Data synthesis

We performed a meta‐analysis if the data from the studies were sufficiently clinically and statistically homogeneous. Because of large clinical heterogeneity between exercise programmes, participants, and characteristics, we used the random‐effects model for all meta‐analyses. We analysed and presented data separately by common control group intervention: exercise programmes versus no intervention, and exercise programmes versus usual care.

We analysed data at study completion, medium‐term follow‐up (< 24 weeks after study completion), and long‐term follow‐up (> 24 weeks after study completion).

In this review, we included studies that had different characteristics, used different types of interventions, and reported effects on different outcome measures. For a better description and standardisation, we presented a synthesis of these different characteristics in additional tables. We systematically described the key exercise programme components, according to the items recently proposed by Slade 2016, in the 'Characteristics of included studies' tables.

Subgroup analysis and investigation of heterogeneity

We planned to separate the data analysis on the basis of the control group intervention. We did not perform the other planned subgroup analyses (see the "Differences between protocol and review" section) because of the small number of studies in each group.

Sensitivity analysis

We did not perform any sensitivity analyses (see the "Differences between protocol and review" section).

'Summary of findings' tables

We included 'Summary of findings' tables to provide key information concerning the quality of evidence, the magnitude of effect of the interventions examined, and the sum of available data on the main outcomes, as recommended in the Cochrane Handbook for Systematic Reviews of Interventions (Schünemann 2011a). We assessed the overall quality of the evidence for each main outcome by using the GRADE approach (Schünemann 2011b).

We developed 'Summary of findings' tables using GRADEpro GDT (GRADEpro GDT).

For the 'Summary of findings' tables, we included the following outcomes for each main comparison:

  • Physical function

  • Pain

  • Patient global assessment of disease activity

  • Spinal mobility

  • Fatigue

  • Adverse effects associated with exercise

  • Adverse events

Results

Description of studies

Results of the search

January 2015: we identified 806 citations after removing duplicates, and excluded 745 studies after screening titles and abstracts. In total, we selected 64 full‐text reports for evaluation. After assessing all records, we included 11 unique studies. Among the 64 full‐text reports, we contacted 18 authors (see Table 1): nine responded, and we obtained data for one study (Dönmez 2014). We identified two congress reports, but had insufficient information to include or extract the data, so we listed one study in the Characteristics of studies awaiting classification section. In addition, we identified four ongoing trials (Gallinaro 2016; Souza 2017; ChiCTR‐TRC‐14004650; NCT02098694). See Characteristics of ongoing studies.

Table 1. Authors contacted for missing or additional data

Authors

First contact

Second contact

Response

Altan 2012

25/05/2015

28/05/2015

Colina 2009

04/05/2015

13/05/2015

no e‐mail response

Durmus 2009

15/04/2015

04/05/2015

no e‐mail response

Dönmez 2014

15/04/2015

18/04/2015

Gunay 2012

15/04/2015

04/05/2015

no e‐mail response

Ince 2006

02/06/2015

05/06/2015

Kjeken 2013

02/06/2015

03/06/2015

Kraag 1990

02/06/2015

03/06/2015

Lim 2005

no available contact

no e‐mail response

Masiero 2011

20/04/2015

04/05/2015

04/05/2015

Masiero 2015

16/06/2016

20/06/2016

Mesquita 2014

15/04/2015

04/05/2015

no e‐mail response

Rodriguez‐Lozano 2013

02/06/2015

03/06/2015

Sveaas 2014

02/06/2015

04/06/2015

Sveaas 2018

24/01/2018

no e‐mail response

Sweeney 2002

19/05/2015

02/06/2015

no e‐mail response

Widberg 2009

02/06/2015

06/06/2015

Updated search in May 2016: we searched the listed electronic databases for reports of randomised controlled trials (RCT) published from January 2015 to May 2016. The search resulted in 75 records to screen. We assessed two full‐text records to determine their eligibility. We included one new study (Garcia 2015).

Updated search in January 2017: we searched the listed electronic databases for RCT reports published from May 2016 to 31 January 2017. The search identified 51 records. We included two new studies identified in a previous search as ongoing trials (Gallinaro 2016; Souza 2017). Souza 2017 had published their data in a scientific journal, and Gallinaro 2016 had limited data published on ClinicalTrials.gov, and additional data on a thesis online website (www.teses.usp.br/teses/disponiveis/5/5169/tde‐04112016‐150051/fr.php).

A flow chart shows the overall search process in Figure 1.

We performed a further search in December 2018. We added those results to Characteristics of studies awaiting classification, and will incorporate them into the review at the next update.

Included studies

We provided a full description of each included study in the 'Characteristics of included studies' table. We also provided a descriptive summary of the information on trials, participants, and exercise programmes in additional tables (Table 2; Table 3; Table 4).

Table 2. Summary of characteristics of included studies (N = 14)

Values are N (%) or median (IQR).

Location

  • Brazil: 2 (14%)

  • Canada: 1 (7%)

  • Italy: 1 (7%)

  • Korea: 1 (7%)

  • Norway: 2 (14%)

  • Spain: 2 (14%)

  • Sweden: 1 (7%)

  • Turkey: 3 (21%)

  • UK: 1 (7%)

Study design

  • RCT: 14 (100%)

Number of study arms

  • 2: 11 (79%)

  • 3: 3 (21%)

Type of comparator

  • Usual care: 5 (36%)

  • No treatment: 9 (64%)

Total number of participants per study: 55 (35 to 73)

Trial size

  • > 100 subjects/arm: 3 (21%)

  • ≤ 100 subjects/arm: 11 (79%)

Number of subjects per arm: 26 (15 to 29)

Study duration (weeks): 14 (range 12 to 24)

N (%) is the number of studies that reported the characteristic of interest

Table 3. Summary of characteristics of participants in included studies (N = 14)

Values are N (%) or median (IQR).

Age (years): 45 (39 to 47)

Gender

  • Male: 70 (56 to 77)

  • Female: 33 (25 to 45)

Diagnostic criteria*

  • Modified New York: 10 (71%)

  • The Ankylosing Spondylitis Disease Activity Score: 2 (14%)

  • European spondyloarthropathy: 1 (7%)

  • Not reported: 2 (14%)

Disease severity*

  • Bath Ankylosing Spondylitis Disease Activity Index ≥ 3.5: 5 (36%)

  • Bath Ankylosing Spondylitis Disease Activity Index < 3.5: 2 (14%)

  • Ankylosing Spondylitis stage 1 or 2: 4 (29%)

  • No information: 3 (21%)

Disease duration (years): 9 (9 to 18)

Coexisting medical treatments

  • Analgesics (in 2 studies): 21% (16% to 26%)

  • Anti‐Tumour Necrosis Factor (in 7 studies): 29% (14% to 38%)

  • Disease Modifying Anti‐Rheumatic Drug (in 5 studies): 17% (11% to 19%)

  • Nonsteroidal anti‐inflammatory drugs (in 9 studies): 75% (32% to 76%)

  • Sulfasalazine (in 4 studies): 22% (11% to 49%)

  • No treatment (in 2 studies): 17% (10% to 15%)

  • No information reported (in 4 studies): NA

* N (%) is the number of studies that reported the characteristic of interest

Table 4. Summary of exercise programme characteristics in the included studies (N = 14)

Values are N (%) or median (IQR).

Modalities

  • Monomodal: 9 (64%)

  • Multidisciplinary: 5 (36%)

Exercise components

  • Pain relief: 1 (7%)

  • Breathing: 7 (50%)

  • Cardio fitness: 2 (14%)

  • Flexibility, stretching: 8 (57%)

  • Endurance: 1 (7%)

  • Motion (active or passive): 5 (36%)

  • Proprioception, posture: 4 (29%)

  • Relaxation: 2 (14%)

  • Strength: 9 (64%)

  • No information: 1 (7%)

Provider

  • Physiotherapist: 7 (50%)

  • Other trainer: 3 (21%)

  • Self delivery: 2 (14%)

  • Unclear: 2 (14%)

Supervision

  • With supervision: 8 (50%)

  • No supervision: 3 (21%)

  • Unclear: 3 (21%)

Dose

  • Session duration (minutes): 60 (50 to 60)

  • Frequency (sessions/week): 3 (2 to 3)

  • Programme duration (weeks): 12 (8 to 16)

N (%) is the number of studies that reported the characteristics of interest

We included a total of 14 reports of RCTs. Reports were published between 1990 and 2017. Three trials were conducted in Turkey (Altan 2012; Dönmez 2014; Ince 2006), two in Spain (Garcia 2015; Rodriguez‐Lozano 2013), two in Norway (Kjeken 2013; Sveaas 2014), two in Brazil (Gallinaro 2016; Souza 2017), and one in Canada (Kraag 1990), South Korea (Lim 2005), Italy (Masiero 2011), United Kingdom (Sweeney 2002), and Sweden (Widberg 2009).

Design

All included studies were RCTs, with a parallel‐group design. There were no cross‐over trials. Eleven studies included two groups, and three included three groups (Dönmez 2014; Gallinaro 2016; Masiero 2011). Most studies (N = 11, 79%) included fewer than 100 participants per group. The median number of participants per group was 26 (interquartile range (IQR): 15 to 29). All studies reported final values or pre–post differences for the exercise and control groups. We calculated individual study effects from means and standard deviations (SD). In one study, Masiero 2011 reported medians and IQRs. We used the formulas described by Hozo 2005 to estimate the mean and SD.

Participants

Participants were recruited from hospital departments (Gallinaro 2016; Ince 2006; Kjeken 2013; Lim 2005; Masiero 2011; Rodriguez‐Lozano 2013; Souza 2017; Sveaas 2014; Sweeney 2002; Widberg 2009), clinics (Altan 2012), and arthritis patient associations (Garcia 2015; Kraag 1990; Sweeney 2002); the source was unclear in Dönmez 2014.

The 14 studies included a total of 1579 participants. The median sample size was 55 (IQR 35 to 73). The median age was 45 years (IQR 39 years to 47 years). Most participants were male (median 70% men). The modified New York criteria for ankylosing spondylitis (AS) diagnosis were most frequently used (71%). The median disease duration was nine years from diagnosis (IQR 9 years to 18 years). Many participants received non‐steroidal anti‐inflammatory drugs (NSAIDs; 75%); others received tumour necrosis factor (TNF) blockers (29%), or sulphasalazine (22%).

Interventions and comparators

Descriptions are provided in Table 4 and the 'Characteristics of included studies' tables.

The median exercise programme duration was 12 weeks (IQR 8 weeks to 16 weeks), with a median of three sessions (range two to seven) per week, and a median duration of 60 minutes per session (IQR 50 minutes to 60 minutes). The description of dose components of exercise programmes was limited in three studies, in which exercise programmes were practiced at home (Kraag 1990; Lim 2005; Sweeney 2002). Intensity was variable and incompletely reported across studies.

Exercise programmes

Of the 14 included studies, nine (64%) investigated exercise programmes alone in the experimental group (monomodal), and five (36%) combined exercise programmes with other interventions (education, self‐management). The exercise programmes included a variety of components. The most commonly used components were strengthening exercises (64%), flexibility or stretching exercises (57%), and breathing exercises (50%). Most of the studies were land‐based (11 studies). Two studies included an aquatic component in their exercise programmes (Garcia 2015; Kjeken 2013). One study was conducted only in water (Garcia 2015).

Exercise programmes were performed under the supervision of a therapist in nine studies (Altan 2012; Dönmez 2014; Gallinaro 2016; Garcia 2015; Ince 2006; Kraag 1990; Souza 2017; Sveaas 2014; Widberg 2009). Two studies instructed participants to undertake unsupervised exercise at home (Lim 2005; Sweeney 2002); three did not clearly report exercise supervision (Kjeken 2013; Masiero 2011; Rodriguez‐Lozano 2013).

Nine studies reported the setting of the intervention. Six studies delivered exercise programmes in facilities (Garcia 2015; Ince 2006; Sveaas 2014; Widberg 2009), or combined them with home delivery (Masiero 2011; Rodriguez‐Lozano 2013). Three studies were performed at participants' homes (Kraag 1990; Lim 2005; Sweeney 2002); five did not clearly mention where the exercise programmes were performed (Altan 2012; Dönmez 2014; Gallinaro 2016; Kjeken 2013; Souza 2017).

Control group interventions

Five included studies (36%) compared an exercise programme to usual care (Altan 2012; Kjeken 2013; Rodriguez‐Lozano 2013; Sweeney 2002; Widberg 2009). Nine studies (64%) compared an exercise programme to no intervention (Dönmez 2014; Gallinaro 2016; Garcia 2015; Ince 2006; Kraag 1990; Lim 2005; Masiero 2011; Souza 2017; Sveaas 2014). For two of the nine studies, the description of the control intervention was unclear, and we had to contact the trial authors for additional information (Dönmez 2014; Ince 2006; Table 1). Based on the response from the two trial authors, we classified the control intervention as 'no intervention'.

Adherence to exercise programmes

We were unable to analyse adherence, since attendance or compliance was not clearly reported in most of the included studies.

Outcomes

The outcomes measured in each trial are summarised in Table 5, Table 6 and Table 7. For all 14 studies, the end of the intervention was considered the final data collection point (range 3 to 24 weeks).

Table 5. Major outcomes reported in the 14 included studies (part 1)

Study

Physical function (BASFI)

Patient global assessment (BASDAI)

Mobility (BASMI)

Mobility (chest expansion)

Mobility (occiput to wall distance)

Mobility (Schober test)

Mobility (fingertip to floor)

Mobility (cervical rotation)

Altan 2012

Yes

Yes

Yes

Yes

Dönmez 2014

Yes

Yes

Yes

Garcia 2015

Yes†

Yes†

Gallinaro 2016

Yes ††

Yes ††

Yes ††

Yes ††

Yes ††

Yes ††

Yes

Ince 2006

Yes

Yes

Yes (modified)

Yes

Kjeken 2013

Yes

Yes

Yes*

Kraag 1990

Yes

Yes

Yes

Lim 2005

Yes

Yes

Masiero 2011

Yes

Yes

Yes

Yes

Yes

Rodriguez‐Lozano 2013

Yes

Yes

Souza 2017

Yes

Yes

Yes

Yes

Sveaas 2014

Yes

Yes

Yes

Sweeney 2002

Yes

Yes

Widberg 2009

Yes

Yes

Yes

Yes

* Data are missing; could not be included in the analysis

† median and 25th to 75th percentile reported

†† multiple exercise groups combined

BASFI: Bath Ankylosing Spondylitis Functional Index

BASDAI: Bath Ankylosing Spondylitis Disease Activity Index

BASMI: Bath Ankylosing Spondylitis Metrology Index

Table 6. Major outcomes reported in the 14 included studies (part 2)

Study

Pain (VAS)

Pain (SF‐36)

Pain (BASDAI)

Pain (nocturnal pain)

Pain (self‐efficacy scale)

Fatigue (BASDAI)

Adverse effects associated with exercise

Altan 2012

Yes

Dönmez 2014

Yes†

Garcia 2015

Yes

Yes

Gallinaro 2016

Yes ††

Yes ††

Ince 2006

Kjeken 2013

Kraag 1990

Yes

Lim 2005

Yes

Masiero 2011

Yes**

Yes

Rodriguez‐Lozano 2013

Yes

Yes

Souza 2017

Yes

Sveaas 2014

Sweeney 2002

Yes

Widberg 2009

** mean score calculated from lumbar and cervical pain

† median and 25th to 75th percentile reported

†† multiple exercise groups combined

BASDAI: Bath Ankylosing Spondylitis Disease Activity Index

VAS: visual analogue scale

SF‐36: 36‐Item Short‐Form Health Survey

Table 7. Minor outcomes reported in the 14 included studies

Study

Quality of life (ASQoL)

Quality of life (SF‐36)

Quality of life (SF‐12 physical component)

CRP level (mg/dL)

ESR (mm/h)

MASES

Altan 2012

Yes

Dönmez 2014

Yes*

Garcia 2015

Yes†

Gallinaro 2016

Yes ††

Yes ††

Ince 2006

Kjeken 2013

Yes*

Kraag 1990

Lim 2005

Masiero 2011

not reported

not reported

Rodriguez‐Lozano 2013

Yes

Souza 2017

Yes*

Yes

Yes

Sveaas 2014

Yes

Yes

Sweeney 2002

Widberg 2009

* global score was not reported; could not be included in the analysis

† median and 25th to 75th percentile reported

†† multiple exercise groups combined

ASQoL: the Ankylosing Spondylitis Quality of Life

SF‐36: 36‐Item Short‐Form Health Survey

SF‐12: 12‐Item Short Form Health Survey

CRP: C‐reactive protein

ESR: erythrocyte sedimentation rate

MASES: Maastricht Ankylosing Spondylitis Enthesitis Score

Major Outcomes

Among the main outcomes (Table 5, Table 6), most trials included a measure of physical function (Bath Ankylosing Spondylitis Functional Index (BASFI), N = 12), and global patient assessment of disease activity (Bath Ankylosing Spondylitis Disease Activity Index (BASDAI), N = 11); fewer included measures of overall pain (N = 9), fatigue (N = 2), or adverse effects (N = 2). For spinal mobility, the Bath Ankylosing Spondylitis Metrology Index (BASMI) was the most commonly reported measure (N = 8), but other measures were also reported (chest expansion N = 6; occiput to wall distance N = 2; fingertip to floor distance N = 4; Schober test N = 3). No study explicitly reported adverse events. Only two studies monitored and reported adverse effects associated with the exercise intervention.

Minor outcomes

Quality of life was reported for five studies (Table 7): two studies used the Ankylosing Spondylitis Quality of Life (ASQoL) scale (Altan 2012; Rodriguez‐Lozano 2013), three used the SF‐36 (Dönmez 2014; Kjeken 2013; Souza 2017), and one used the SF‐12 (Garcia 2015). Only Sveaas 2014 and Souza 2017 reported C‐reactive protein (CRP) levels and erythrocyte sedimentation rates (ESR). No study reported peripheral joint modification scales.

Follow‐up

Three studies reported data at medium‐term follow‐up, from 12 to 24 weeks (Altan 2012; Dönmez 2014; Masiero 2011). The mean duration of the follow‐up period was 18 weeks. One study reported a 48‐week long‐term follow‐up (Kjeken 2013). We contacted 10 trial authors requesting missing data for unreported or partially reported outcomes (Table 1).

Excluded studies

We excluded 54 studies at full‐paper review, as described in Figure 1. We excluded eight studies (see Characteristics of excluded studies; Ciprian 2013; Colina 2009; Durmus 2009; Gunay 2012; Karahan 2016; Kraag 1990; Lee 2008; Masiero 2015): five were controlled but not randomised trials (Colina 2009; Durmus 2009; Gunay 2012; Lee 2008; Masiero 2015), one was a duplicate of an included study (Kraag 1990), and the intervention was irrelevant in two studies (Ciprian 2013; Karahan 2016).

Ongoing studies

See Characteristics of ongoing studies.

We identified two ongoing studies registered in the WHO ICTRP as potentially eligible, but results were not available. The two studies compared exercise programmes in Norway (NCT02098694), and China (ChiCTR‐TRC‐14004650).

Studies awaiting classification

See Characteristics of studies awaiting classification.

We identified one study as potentially eligible after we read the abstract, but we could not access the full‐text article (Mesquita 2014). We tried to contact the trial authors for additional information, but received no response (Table 1).

We added one study report from our updated January 2018 search (Sveeas 2018), as we were unable to determine if the results of this study were new, or if it was a secondary analysis from the previous study (Sveaas 2014). We attempted to contact the authors, but received no response.

Risk of bias in included studies

The overall risk of bias assessment of the included studies is presented in Figure 2.


'Risk of bias' summary: review authors' judgements about each risk of bias item for each included study


Allocation

Random sequence

We judged nine studies (64%) at low risk of bias, because they used and reported an appropriate method of randomisation (Altan 2012; Dönmez 2014; Garcia 2015; Ince 2006; Kjeken 2013; Rodriguez‐Lozano 2013; Souza 2017; Sveaas 2014; Widberg 2009).

We assessed five trials (36%) at unclear risk of bias because the methods used to generate allocation sequence were not described, or were unclear (Gallinaro 2016; Kraag 1990; Lim 2005; Masiero 2011; Sweeney 2002).

Allocation concealment

We judged three studies (21%) at low risk of bias, since they provided adequate information on the method of allocation concealment (Kjeken 2013; Masiero 2011; Rodriguez‐Lozano 2013).

For 11 studies (79%), the method used to conceal allocation sequence was unclear, or not described (Altan 2012; Dönmez 2014; Gallinaro 2016; Garcia 2015; Ince 2006; Kraag 1990; Lim 2005; Souza 2017; Sveaas 2014; Sweeney 2002; Widberg 2009).

Blinding

Participant and care provider blinding

We judged all studies at high risk of bias.

Blinding participants and care providers is difficult because of the nature of the intervention. Most of the included studies did not report information on blinding, or a masking procedure for treatment allocation or delivery. No studies reported using a blinding procedure (sham or attentional comparator, or blinding of study hypothesis (Boutron 2007)).

Outcome assessor

We judged all studies at high risk of bias. Most included studies used subjective outcomes (self‐reporting, self‐performance). Because participants were not blinded to treatment allocation, we considered the outcome assessors to be unblinded.

We considered the studies that reported a spinal mobility outcome to be at unclear risk of bias for that outcome, because it was impossible to evaluate whether assessors were blinded to treatment allocation (Gallinaro 2016; Ince 2006; Kraag 1990).

Incomplete outcome data

Eight studies (57%) reported either no withdrawals or drop‐out rates of less than 20% at study completion (Altan 2012; Dönmez 2014; Gallinaro 2016; Garcia 2015; Ince 2006; Masiero 2011; Souza 2017; Widberg 2009). We judged these studies at low risk of bias.

Five studies (36%) reported higher rates (Kjeken 2013; Lim 2005; Rodriguez‐Lozano 2013; Sveaas 2014; Sweeney 2002), and one trial reported an unbalanced rate between groups (Kraag 1990). Consequently, we judged these studies at high risk of bias. Only one study used an intention‐to‐treat approach for analysis (Souza 2017).

Selective reporting

Three studies (21%) had a registered protocol (Gallinaro 2016; Souza 2017; Sveaas 2014). We assessed two of them (14%) at low risk of reporting bias, because all outcomes reported were pre‐specified in the protocol (Gallinaro 2016; Souza 2017).

We judged two studies (14%) at high risk of bias, because we found outcomes that were listed but not reported in the results section of the published report (Kjeken 2013; Sveaas 2014).

We judged the 10 remaining studies (71%) at unclear risk of reporting bias, because we could not compare the pre‐specified outcomes with the reported ones.

Other potential sources of bias

We judged one study at low risk of bias because we identified no other potential source of bias (Rodriguez‐Lozano 2013). Three studies (21%) reported a sample size (power) calculation (Rodriguez‐Lozano 2013; Souza 2017; Sveaas 2014).

Effects of interventions

See: Summary of findings for the main comparison Exercise programmes compared to no intervention for ankylosing spondylitis; Summary of findings 2 Exercise programmes compared to usual care for ankylosing spondylitis

Exercise programmes versus no intervention

Major outcomes

Data were obtained at the end of the intervention; see Summary of findings for the main comparison.

Physical function (BASFI, 0 to 10 scale; lower score indicates higher function)

Seven studies (312 participants) found a reduction in physical function score with exercise versus no intervention at the end of the intervention (mean difference (MD) ‐1.3, 95% confidence interval (CI) ‐1.7 to ‐0.9; absolute risk difference 13% (95% CI 9% to 17%); relative change 32% (95% CI 23% to 42%); Analysis 1.1; Dönmez 2014; Gallinaro 2016; Garcia 2015; Lim 2005; Masiero 2011; Souza 2017; Sveaas 2014). The statistical heterogeneity was not important (I² = 23%). There was no clinically important benefit. Because of study limitations, we downgraded the evidence by one level for high risk of bias; we rated the quality of the evidence as moderate.

Two studies (93 participants) found a reduction in physical function score with exercise at medium‐term follow‐up (overall 14 weeks; MD ‐1.5, 95% CI ‐1.8 to ‐1.2; Analysis 1.1), which was clinically important (absolute risk difference 15% (95% CI 12% to 18%); relative change 57% (95% CI 44% to 67%)) (Dönmez 2014; Masiero 2011). The statistical heterogeneity was not important (I² = 0%).
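The absolute differences quoted for BASFI are consistent with expressing the mean difference as a percentage of the 0 to 10 scale range. A minimal sketch of that arithmetic follows; the function name is hypothetical, and the sketch assumes a confidence interval that does not cross zero.

```python
def absolute_difference_pct(md, ci_low, ci_high, scale_min=0.0, scale_max=10.0):
    """Express a mean difference and its CI as a percentage of the scale range.

    Assumes the CI does not cross zero, so magnitudes can be reported
    from smallest to largest, as in the text.
    """
    span = scale_max - scale_min
    point = abs(md) / span * 100
    bounds = sorted(abs(b) / span * 100 for b in (ci_low, ci_high))
    return round(point), round(bounds[0]), round(bounds[1])


# BASFI, exercise versus no intervention (values taken from the text).
end_of_intervention = absolute_difference_pct(-1.3, -1.7, -0.9)
medium_term = absolute_difference_pct(-1.5, -1.8, -1.2)

print(end_of_intervention)  # (13, 9, 17)
print(medium_term)          # (15, 12, 18)
```

The relative change additionally depends on the baseline mean of the control group, which is not given in this section, so it is not reproduced here.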

Pain (VAS, 0 to 10; lower score indicates less pain)

The pooled analysis of six studies (288 participants) showed a decrease in pain with exercise at the end of the intervention (standardised mean difference (SMD) ‐0.82, 95% CI ‐1.4 to ‐0.25; Analysis 1.2; back‐translated MD ‐2.1, 95% CI ‐3.6 to ‐0.6; absolute reduction 21% (95% CI 6% to 36% better); relative reduction 34% (95% CI 10% to 59% better); Dönmez 2014; Gallinaro 2016; Garcia 2015; Lim 2005; Masiero 2011; Souza 2017). There was a clinically important benefit. The statistical heterogeneity was considerable (I² = 81%), and we found no rationale to explain it. Because of study limitations, we downgraded the evidence by one level each for high risk of bias and imprecision; we rated the quality of the evidence as low. One study of 52 participants reported conflicting data (Kraag 1990). As the reported effect size (MD 0.4, 95% CI ‐0.2 to 0.9) was inconsistent with the findings of the other six studies, we decided not to include this study in the pooled analysis.

At medium‐term follow‐up (12 to 16 weeks), two studies (93 participants) assessed pain (Dönmez 2014; Masiero 2011). The pooled estimate favoured exercise, but the confidence interval included no effect, so the result was inconclusive (SMD ‐2.46, 95% CI ‐5.19 to 0.28). The statistical heterogeneity was considerable (I² = 95%), and we found no rationale to explain it.
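Back‐translating an SMD to the units of a familiar scale is usually done by multiplying it by a representative standard deviation. The sketch below illustrates this for the end‐of‐intervention pain result. The standard deviation is not reported in this section; the value used here is an assumption, derived from the reported pair of SMD ‐0.82 and MD ‐2.1, and the function name is hypothetical.

```python
def back_translate_smd(smd, sd_reference):
    """Convert a standardised mean difference to the units of a reference scale."""
    return smd * sd_reference


# Assumption: SD implied by the reported SMD (-0.82) and MD (-2.1),
# roughly 2.56 points on the 0 to 10 VAS. It is not stated in the text.
ASSUMED_SD = 2.1 / 0.82

md = back_translate_smd(-0.82, ASSUMED_SD)
ci_low = back_translate_smd(-1.4, ASSUMED_SD)
ci_high = back_translate_smd(-0.25, ASSUMED_SD)

print(round(md, 1), round(ci_low, 1), round(ci_high, 1))  # -2.1 -3.6 -0.6
```

With this assumed standard deviation, the back‐translated confidence limits agree with the reported MD interval of ‐3.6 to ‐0.6.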

Patient global assessment of disease activity (BASDAI, 0 to 10 scale; lower score indicates lower disease activity)

Six studies (262 participants) found that participants who exercised reported statistically significantly lower disease activity at the end of the intervention (MD ‐0.9, 95% CI ‐1.3 to ‐0.5; Analysis 1.3; absolute risk difference 9% (95% CI 5% to 13%); relative change 27% (95% CI 15% to 39%); Dönmez 2014; Gallinaro 2016; Garcia 2015; Masiero 2011; Souza 2017; Sveaas 2014). The statistical heterogeneity was not important (I² = 18%). There was no clinically important benefit. Because of study limitations, we downgraded the evidence by one level for high risk of bias; we rated the quality of the evidence as moderate.

Two studies (93 participants) found a statistically significant reduction in patient global assessment of disease activity with exercise at medium‐term follow‐up (MD ‐1.1, 95% CI ‐1.6 to ‐0.7; Analysis 1.3; Dönmez 2014; Masiero 2011). The statistical heterogeneity was not important (I² = 0%).

Spinal mobility

Schober test (tape distance in cm; longer distance indicates greater spinal mobility)

Three studies used the Schober test to assess spinal mobility. One study (51 participants) reported change from baseline, and found no evidence of difference between groups in spinal mobility (Kraag 1990). Two studies (85 participants) reported final values from a Schober test (Gallinaro 2016), and a modified Schober test (Ince 2006). Pooled results found no evidence of difference between groups (SMD ‐0.4, 95% CI ‐1.0 to 0.25) at the end of the intervention. The statistical heterogeneity was moderate (I² = 45%). There was no clinically important benefit. Because of study limitations, we downgraded the evidence by one level for high risk of bias, and by two levels for imprecision; we rated the quality of the evidence as very low.

BASMI (0 to 10 scale; lower score indicates greater spinal mobility)

Five studies (232 participants) found more spinal mobility with exercise at the end of the intervention (MD ‐0.7, 95% CI ‐1.3 to ‐0.1; Analysis 1.4; absolute risk difference 7% (95% CI 1% to 13%); relative change 18% (95% CI 3% to 34%); Dönmez 2014; Gallinaro 2016; Masiero 2011; Souza 2017; Sveaas 2014). The statistical heterogeneity was substantial (I² = 51%). There was no clinically important benefit. Because of study limitations, we downgraded the evidence by one level each for high risk of bias, inconsistency, and imprecision; we rated the quality of the evidence as very low.

Two studies (93 participants) found more spinal mobility at medium‐term follow‐up (overall 14 weeks) with exercise (MD ‐1.4, 95% CI ‐2.0 to ‐0.8; Analysis 1.4; Dönmez 2014; Masiero 2011). The statistical heterogeneity was moderate (I² = 45%).
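The I² values reported throughout this section can be derived from Cochran's Q and the number of pooled studies. The sketch below shows that formula, together with a labelling function. The cut points are assumptions, chosen so that the labels match the wording used in this section (for example, 23% "not important", 45% "moderate", 51% "substantial", 81% "considerable"); the Q value in the example is hypothetical, because Q is not reported here.

```python
def i_squared(q, n_studies):
    """Higgins I-squared (%) from Cochran's Q and the number of pooled studies."""
    df = n_studies - 1
    if q <= 0:
        return 0.0
    # Negative values are truncated to zero by convention.
    return max(0.0, (q - df) / q) * 100


def describe_heterogeneity(i2):
    """Label an I-squared value; the cut points are illustrative assumptions."""
    if i2 < 30:
        return "not important"
    if i2 <= 50:
        return "moderate"
    if i2 < 75:
        return "substantial"
    return "considerable"


# Hypothetical Q for a five-study pooled analysis.
example_i2 = i_squared(8.16, 5)
print(round(example_i2), describe_heterogeneity(example_i2))  # 51 substantial
```

Published guidance gives overlapping bands rather than sharp thresholds, so any single set of cut points is a simplification.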

Fatigue (VAS, 0 to 10; lower score indicates less fatigue)

Two studies (72 participants) found a statistically significant reduction in fatigue with exercise versus no intervention at the end of the intervention (MD ‐1.4, 95% CI ‐2.7 to ‐0.1; Analysis 1.5; absolute risk difference 14% (95% CI 1% to 27%); relative change 48% (95% CI 5% to 91%); Garcia 2015; Masiero 2011). The statistical heterogeneity was substantial (I² = 70%). There was no clinically important benefit. Because of study limitations, we downgraded the evidence by one level each for high risk of bias, imprecision, and inconsistency; we rated the quality of the evidence as very low.

At medium‐term follow‐up (24 weeks), one study (42 participants) found a reduction of fatigue with exercise (Masiero 2011). The mean fatigue score on a 10‐point VAS was 2.1 with exercise and 3.7 with no exercise (MD ‐1.6, 95% CI ‐2 to ‐1.2).

Minor outcomes

Quality of life (lower number is better)

We meta‐analysed two of the five studies that assessed quality of life as an outcome. Two studies (85 participants) found inconclusive effects of exercise (MD 1.74, 95% CI ‐0.44 to 3.91; Analysis 1.6; Gallinaro 2016; Garcia 2015). The statistical heterogeneity was not important (I² = 0%). For three studies, data were either not available (Dönmez 2014), or could not be extracted (Kjeken 2013; Souza 2017).

C‐Reactive Protein (CRP) and Erythrocyte Sedimentation Rate (ESR)

Two studies (84 participants) reported data for CRP and ESR (Souza 2017; Sveaas 2014). For CRP, we found inconclusive results at the end of the intervention (MD 1.38, 95% CI ‐4.34 to 7.10; Analysis 1.7). The statistical heterogeneity was substantial (I² = 71%), and we found no rationale to explain it. For ESR, exercise reduced the level of ESR (MD ‐5.36, 95% CI ‐10.31 to ‐0.41; Analysis 1.8). The statistical heterogeneity was not important (I² = 0%).

Maastricht ankylosing spondylitis enthesitis score (MASES; 0 to 13 scale, lower is better)

One study (55 participants) reported final values at 16 weeks for the exercise and control groups on the 13‐point MASES. No statistical difference was reported between groups (Gallinaro 2016).

Exercise programmes versus usual care

Major outcomes

Data were obtained at the end of the intervention; see Summary of findings 2.

Physical function (BASFI, 0 to 10 scale, lower score indicates higher function)

Five studies (1068 participants) found a reduction in physical function score (indicating improved function) with exercise (MD ‐0.4, 95% CI ‐0.6 to ‐0.2; absolute risk difference 4%, 95% CI 2% to 6%; relative change 11%, 95% CI 5% to 16%; Analysis 2.1; Altan 2012; Kjeken 2013; Rodriguez‐Lozano 2013; Sweeney 2002; Widberg 2009). The statistical heterogeneity was not important (I² = 0%). There was no clinically important benefit. Because of study limitations, we downgraded the evidence by one level for high risk of bias; we rated the quality as moderate.

One study (53 participants) reported data at medium‐term follow‐up (Altan 2012). The results were inconclusive (MD ‐0.60, 95% CI ‐1.6 to 0.4; Analysis 2.1). One study (63 participants) reported data at long‐term follow‐up (48 weeks). The results were inconclusive (MD ‐0.10, 95% CI ‐0.84 to 0.64; Kjeken 2013).

Pain (VAS, 0 to 10; lower score indicates less pain)

Two studies (911 participants) reported pain, but used different scales (Rodriguez‐Lozano 2013; Sweeney 2002). Rodriguez‐Lozano 2013 used a VAS to measure pain; Sweeney 2002 used the Stanford Self‐Efficacy pain Scale (SES). Pooled analysis found a reduction in pain with exercise (SMD ‐0.2, 95% CI ‐0.3 to ‐0.03; Analysis 2.2; absolute reduction 6%, 95% CI 1% to 8% better; relative reduction 15%, 95% CI 2% to 22% better). The statistical heterogeneity was not important (I² = 0%). There was no clinically important benefit. Because of study limitations, we downgraded the evidence by one level for high risk of bias; we rated the quality as moderate.

Patient global assessment of disease activity (BASDAI, 0 to 10 scale; lower score indicates lower disease activity)

Five studies (1068 participants) found a statistically significant reduction in patient global assessment of disease activity with exercise (MD ‐0.7, 95% CI ‐1.3 to ‐0.1; Analysis 2.3; absolute risk difference 7%, 95% CI 1% to 13%; relative change 19%, 95% CI 3% to 35%; Altan 2012; Kjeken 2013; Rodriguez‐Lozano 2013; Sweeney 2002; Widberg 2009). The statistical heterogeneity was substantial (I² = 70%), and we found no rationale to explain it. There was no clinically important benefit. Because of study limitations, we downgraded the evidence by one level each for high risk of bias and inconsistency; we rated the quality as low.

One study (53 participants) reported data at medium‐term follow‐up (24 weeks). The results were inconclusive for patient global assessment of disease activity with exercise, because the confidence interval included no effect (MD ‐0.70, 95% CI ‐1.7 to 0.3; Altan 2012).

One study (63 participants) reported data at long‐term follow‐up (48 weeks); the results were inconclusive (MD ‐0.5, 95% CI ‐1.4 to 0.4; Kjeken 2013).
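Whether a pooled estimate is conclusive can be checked directly from its 95% confidence interval. The sketch below recovers an approximate standard error, z statistic, and two‐sided p value from a mean difference and its interval, assuming a normal approximation and a symmetric interval; the function name is hypothetical. The two examples use the BASDAI results reported above for the end of the intervention and for long‐term follow‐up.

```python
import math


def summarise_interval(md, ci_low, ci_high, z_crit=1.96):
    """Recover SE, z and a two-sided p value from an estimate and its 95% CI.

    Assumes a normal approximation and a symmetric interval.
    """
    se = (ci_high - ci_low) / (2 * z_crit)
    z = md / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p value
    excludes_zero = ci_low > 0 or ci_high < 0
    return {"se": se, "z": z, "p": p, "excludes_zero": excludes_zero}


end_of_intervention = summarise_interval(-0.7, -1.3, -0.1)
long_term = summarise_interval(-0.5, -1.4, 0.4)

print(end_of_intervention["excludes_zero"], long_term["excludes_zero"])  # True False
```

An interval that includes zero corresponds to a p value above 0.05, which is why such results are described as inconclusive rather than statistically significant.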

Spinal mobility

Schober test (tape distance in cm; longer distance indicates greater spinal mobility)

No study used the Schober test.

BASMI (0 to 10 scale; lower score indicates greater spinal mobility)

Three studies used the BASMI (Altan 2012; Kjeken 2013; Widberg 2009). We meta‐analysed two of them, because Kjeken 2013 did not report data. The two studies (85 participants) found inconclusive results for spinal mobility (MD ‐1.2, 95% CI ‐2.8 to 0.5; Analysis 2.4; absolute risk difference 12%, 95% CI 5% to 28%; relative change 163%, 95% CI 6% to 32%). The statistical heterogeneity was considerable (I² = 82%), and we found no rationale to explain it. There was no clinically important benefit. Because of study limitations, we downgraded the evidence by one level each for high risk of bias, inconsistency, and imprecision; we rated the quality as very low.

One study (53 participants) reported data at medium‐term follow‐up (24 weeks); the results were inconclusive for spinal mobility (MD ‐0.7, 95% CI ‐1.6 to 0.2; Analysis 2.4; Altan 2012).

One study (63 participants) reported data at long‐term follow‐up (48 weeks); the results were inconclusive (MD 0.00, 95% CI ‐0.6 to 0.6; Kjeken 2013).

Fatigue (VAS, 0 to 10; lower score indicates less fatigue)

We found no study measuring fatigue.

Minor outcomes

Quality of life (18‐point ASQoL scale; lower number is better)

Data from two studies (809 participants) found inconclusive evidence for quality of life (MD ‐0.36, 95% CI ‐1.68 to 0.95; Analysis 2.5; Altan 2012; Rodriguez‐Lozano 2013). The statistical heterogeneity was moderate (I² = 46%).

C‐Reactive Protein (CRP) and Erythrocyte Sedimentation Rate (ESR)

No study measured CRP or ESR.

MASES (0 to 13 scale, lower is better)

No study measured the enthesitis index.

Safety

Adverse effects (AE) associated with exercises

Two studies (110 participants) reported adverse effects related to exercise programmes (Altan 2012; Gallinaro 2016). Because of very low‐quality evidence, we are uncertain of the effect of exercise programmes on AEs (Peto odds ratio (OR) 6.25, 95% CI 0.1 to 320; absolute risk difference 2%, 95% CI 5% fewer to 8% more; relative change 152%, 90% decrease to 5818% increase; Analysis 3.1). The absolute numbers were very low: 1/67 in the exercise group versus 0/43 in the control group. Altan 2012 reported that one participant had an increase of back pain related to exercise; the trial authors did not mention whether this resulted in hospitalisation. No adverse effects were considered serious. Because of study limitations, we downgraded the evidence by one level for high risk of bias and by two levels for imprecision (large CI, small number of studies).
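The very wide confidence interval around the Peto odds ratio follows from having a single event. The sketch below implements the Peto one‐step method for one 2×2 table. A trial with no events in either arm contributes no information to a Peto estimate, so only the trial with an event matters. The arm sizes of that trial are not given in this section; the split of 30 exercise and 25 control participants used below is a hypothetical assumption, chosen for illustration, and the function name is also hypothetical.

```python
import math


def peto_odds_ratio(events_trt, n_trt, events_ctl, n_ctl, z_crit=1.96):
    """Peto one-step odds ratio with a 95% CI for a single 2x2 table.

    Requires at least one event and at least one non-event overall.
    """
    n = n_trt + n_ctl
    events = events_trt + events_ctl
    expected = n_trt * events / n  # expected events in the treatment arm
    # Hypergeometric variance of the observed event count.
    variance = (n_trt * n_ctl * events * (n - events)) / (n ** 2 * (n - 1))
    log_or = (events_trt - expected) / variance
    half_width = z_crit / math.sqrt(variance)
    return (
        math.exp(log_or),
        math.exp(log_or - half_width),
        math.exp(log_or + half_width),
    )


# Hypothetical arm sizes (not stated in the text): 1/30 versus 0/25.
odds_ratio, ci_low, ci_high = peto_odds_ratio(1, 30, 0, 25)
print(round(odds_ratio, 2), round(ci_low, 2), round(ci_high))
```

Under these assumed arm sizes the result is close to the reported OR of 6.25 (95% CI 0.1 to 320), which illustrates how little information one event carries.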

Adverse events

We found no study reporting adverse events.

Subgroup and sensitivity analyses

Given the small number of studies, we did not conduct subgroup analyses to explore the possible effect of type of delivery (supervised versus unsupervised, monomodal versus multimodal intervention) on the estimated effect size. Nor did we conduct a sensitivity analysis, because we judged all studies at unclear or high risk of bias for most items.

Assessment of publication bias

We had planned to assess publication bias by visual inspection of funnel plots, but we did not generate funnel plots because of the limited number of studies (< 10), and the risk of an underpowered test. We were unable to determine the existence of publication bias.

Discussion

Summary of main results

The main purpose of this review was to evaluate the beneficial and harmful effects of exercise programmes for participants in trials of ankylosing spondylitis (AS). Overall, 14 randomised controlled trials (RCT; total of 1579 participants) met the inclusion criteria. Exercise programmes were examined alone in nine trials, and were combined with other interventions (education or self‐management) in five trials. Exercise programmes were compared to usual care in five trials, and to no intervention (waiting list, advice, no exercise) in nine trials. Exercise programmes included different components, and were delivered in a variety of ways.

We found moderate‐ to low‐quality evidence suggesting that exercise programmes, compared to no intervention, probably slightly improve function, may reduce pain (with an important clinical benefit), and probably slightly reduce patient assessment of disease activity at the end of the intervention. Whether there was an effect on spinal mobility and fatigue is unclear.

There is moderate‐ to low‐quality evidence that, compared with usual care (including physiotherapy, medication, or self‐management), exercise programmes probably make little or no difference to function or pain, and may make little or no difference to patient assessment of disease activity. We are uncertain whether exercise programmes improve spinal mobility.

All studies reported effects at the completion of the intervention. Only two studies assessed the medium‐term follow‐up effect of exercise programmes on physical function, patient global assessment of disease activity, and spinal mobility. One study reported long‐term follow‐up effects.

We have no clear evidence that exercise programmes can induce more adverse effects. Two studies reported adverse effects as an outcome. Only a small number of events were observed (one versus none in comparator groups). We were unable to draw any conclusions.

Overall completeness and applicability of evidence

The evidence provided by this review is limited to the 14 included RCTs that assessed the effects of exercise programmes versus no intervention, or usual care. We did not include three RCTs that were potentially eligible for this review, because their results have not yet been reported in full. Two were ongoing trials (ChiCTR‐TRC‐14004650; NCT02098694). According to the abstract of one trial, home exercises may improve function and spinal mobility, and ameliorate patient‐assessment of disease activity after 10 weeks. However, the control intervention is unknown.

Whether the trial participants reflect individuals with AS undergoing treatment is difficult to determine. Most of the studies in this review included more than 70% men. Since AS seems to affect men and women differently, our results may have limited applicability to women (Dagfinrud 2005; Ramiro 2014). The median age of participants across the included studies was 45 years (interquartile range (IQR) 39 to 47), which is representative of the overall population of patients with AS (Ramiro 2014; Webers 2016). The mean duration of disease of the participants was nine years in the included studies. The effects of exercise programmes found in this review should be extrapolated with caution to people with a shorter disease duration. None of the studies investigated the impact of exercise programmes in people with early or newly diagnosed AS. Only one study included participants with a short disease duration (median 2.5 to 3.5 years), and showed beneficial effects of exercise on mobility (Widberg 2009). In the last decade, improved imaging techniques and criteria for early diagnosis of AS have facilitated earlier and more effective medical treatment (Liang 2015; Lubrano 2015). However, we lack studies of the efficacy of physical exercises on individuals with early forms of AS. Five studies included participants with a BASDAI score ≥ 3.5, corresponding to patients with a low patient‐assessed disease activity (i.e. BASDAI scores were < 4/10 units). However, a cutoff of 3.9 to 4 is frequently used to define active disease, discriminating between people with well or poorly controlled disease, but this cutoff does not have a firm justification (Cohen 2006).

Most exercise programmes were delivered in conjunction with drug therapy (standard NSAIDs, disease‐modifying anti‐rheumatic drugs (DMARDs), or biological agents). The benefits of exercise programmes, depending on the type of drug therapy received, cannot be determined. Participants received different types of drug therapy. Nine of the included studies reported that 75% of the participants were taking NSAIDs. For seven studies, 29% of participants received anti‐TNF agents. Four studies did not report or provide any information. No study specifically evaluated the efficacy of physical exercises with biologic versus standard NSAID or DMARD therapy. Since the introduction of TNF blockers, the role of physical exercise in individuals receiving TNF blockers has rarely been studied. We found only one RCT of participants receiving TNF blockers (Masiero 2011). In this study, for participants with clinically stabilised AS who had started TNF blocker therapy at least nine months previously, an educational‐behavioural intervention and exercise training further improved spinal mobility, and reduced pain, stiffness, and disability. One hypothesis is that TNF blockers that reduce inflammation, pain, and fatigue may improve the efficacy of, and compliance with, regular physical exercises and activities, thereby resulting in better function and less disability (Maxwell 2015). Moreover, more motivated individuals are likely to spend longer periods of time on exercise, because they will have greater perceived benefits of exercise regimes (Dubey 2008). Further research should aim to determine the efficacy of exercise interventions in patients with AS receiving TNF blockers.

The included studies investigated a number of different types and combinations of exercise components. Breathing, strengthening, and stretching exercises were the most frequent exercise components. However, the components were incompletely described in most trials. For example, details of the material used, who provided the intervention, how it was supervised, and where the exercise was delivered were often missing. However, the duration (mean 60 minutes) and frequency (three sessions/week) were similar among studies. The minimal effective dose and optimal level were unclear. The exercise dosage could not be explored with indirect statistical techniques, such as meta‐regression. We also did not investigate heterogeneity by the type of exercise, because we were unable to isolate individual types of exercise from the programmes reported by the authors.

Information was also lacking on adherence to exercise, which is important to assess when exercise is regular and disease duration is long. The optimal exercise programme for individuals with AS, and its efficacy, are still unknown. The poor reporting of non‐pharmacological interventions is well known, despite the existence of reporting guidelines (TIDieR), and it limits the implementation of research results in clinical practice (Hoffmann 2014). Slade 2016 recently developed a specific template, the Consensus on Exercise Reporting Template (CERT), for reporting exercise programmes in clinical trials. We hope that this template will help authors improve the reporting of exercise in clinical trials.

Another issue relates to outcomes. We assessed important outcomes for participants based on the ASAS/EULAR recommendations, to show a short‐term clinical benefit of exercise programmes versus no intervention (Sieper 2009; Van der Heijde 1997; www.asas‐group.org). The most common outcomes measured in the included studies were physical function (N = 12, 86%), patient global assessment of disease activity (N = 11, 79%), spinal mobility and pain (N = 7, 50%). Quality of life and fatigue were not frequently reported. The RCTs ranged from 8 to 12 weeks’ duration, so all data for benefit are based on only short‐term studies. Follow‐up effects were measured in only 14% of studies (N = 2). Whether the effects persist after the completion of exercise programmes is unknown.

Lastly, we were unable to assess the safety of exercise programmes, because adverse effects were not systematically monitored and reported in publications. Severe adverse events, such as falls, are rare with exercise, but they can occur. Exercise programmes are generally associated with minor adverse effects (muscle or joint pain, soreness) related to interventions (Kunutsor 2018). In our review, adverse effects were reported in one study, and only one event was associated with exercise programmes. Direct evidence for safety was not found, particularly for populations at risk, such as older people, or those with more severe AS. Data provided by Jacques 2014 suggested that mechanical strain can trigger inflammation, and cause bone degradation in mice. This result supports the need to systematically monitor adverse effects of exercise programmes, to determine whether exercises are safe, or whether some adverse effects might be expected, for example, with a modification of exercise dosing or intensity (McGonagle 2014).

Quality of the evidence

We had concerns about risk of bias for all studies included: 79% (N = 11) had unclear allocation concealment; all failed to blind participants, staff, or outcome assessors; 43% had incomplete outcome data (N = 6), and 86% had unclear selective reporting, or a high risk of other bias (N = 12). Given the number of studies included in the review, we cannot rule out a small‐study effect, which could explain the magnitude of the positive results we found.

We found statistically significant group differences between exercise programmes and no intervention or usual care. Larger effects were found when exercise was compared to no intervention. For each comparison, the small number of studies (< 10) and the small sample sizes (many studies had < 100 participants) might have contributed to a low‐power analysis. Low power is associated with bias (Button 2013). Most studies we included were assessed at high or unclear risk of bias, which suggests that the effects might be over‐estimated, and reduces the likelihood that they reflect a true effect. We cannot provide conclusions with a high level of confidence. The magnitude of the estimated effects may change with larger studies.

We only presented the findings of trials that reported the major outcomes of interest in summary of findings Table for the main comparison and summary of findings Table 2; and used the GRADE approach to assess the quality of the evidence examined for each outcome (Schünemann 2011b). Most of the evidence was downgraded to low or very low quality, based on three factors: risk of bias, inconsistency generated by heterogeneity, and imprecision with small trials and large confidence intervals.

Potential biases in the review process

We made all attempts to reduce the bias involved in the review process by including the best available evidence. All studies included were randomised trials. We conducted an extensive search of the literature in all relevant databases, but because two studies have not yet been incorporated, this may be a source of potential bias. Two review authors independently selected studies, extracted data, and assessed the risk of bias. For missing data, we attempted to extract data that were graphically displayed by using software tools (arohatgi.info/WebPlotDigitizer/index.html), or to systematically seek information from authors of the included studies.

The review itself has some limitations. We could not determine whether participants who received usual care also had exercises, because the included studies poorly described the content of usual care interventions. Participants in the usual care group could have practised exercises, which could explain why a smaller effect size was always found when comparing exercise programmes to usual care. Another possible explanation is performance bias, due to lack of blinding.

We found wide variations among the trials, likely related to different exercise components. Despite pre‐specifying them in the protocol, we could not perform subgroup analyses to explore heterogeneity for factors such as supervision, modalities of exercises, or participant characteristics. Lastly, the number of included studies was too small.

Agreements and disagreements with other studies or reviews

Different systematic reviews have examined the effects of exercise programmes in people with AS. None included all of the RCTs we identified, all of which compared the effects of exercise programmes to no intervention or usual care.

Dagfinrud 2008 included 11 RCTs and quasi‐randomised trials (763 participants). Only four studies compared exercise programmes with no intervention (Ince 2006; Kraag 1990; Lim 2005; Sweeney 2002). The other included studies compared different modalities of exercise programmes. The systematic review did not meta‐analyse the results of the comparator groups. Only the effects of individual studies were reported. The authors reported low‐quality evidence for effects on spinal mobility and physical function. An update was performed by Dagfinrud 2011, which included one additional study. The authors included the same four previously included studies that compared different types of exercise programmes to no intervention. They did not meta‐analyse the results.

Van den Berg 2012 performed a systematic review that included randomised and uncontrolled trials (cohort studies, case–control studies, and cross‐sectional studies), which evaluated any type of non‐pharmacological intervention. They concluded that exercise programmes were better than no intervention. This review included only one of the 14 trials included in our review (Widberg 2009).

O'Dwyer 2014 included randomised and quasi‐randomised trials. This systematic review included only five of the 14 RCTs included in our review, and concluded that therapeutic exercise improved physical function, joint mobility, and cardiorespiratory function, and ameliorated patient‐assessed disease activity, pain, and stiffness compared with controls. They assessed the evidence to be of moderate quality.

A recent systematic review compared specific pulmonary exercise programmes to conventional exercise or no intervention (Saracoglu 2017). The authors included eight RCTs or controlled trials. Two of the trials were included in our review (Altan 2012; Ince 2006), but the other six were excluded, because they did not meet our inclusion criteria. Evidence showed that exercise improved functional capacities and pulmonary functions, but the authors did not provide a critical appraisal of available studies. They did not conduct a meta‐analysis.

Three systematic reviews examined the effect of exercises combined with stabilised TNF blocker therapy versus stabilised TNF blocker therapy alone in patients with AS (Giannotti 2014; Liang 2015; Lubrano 2015). These systematic reviews included non‐randomised controlled trials, observational studies, and abstracts, which evaluated spa exercise therapy combined with stabilised TNF blocker therapy. The authors concluded that exercises combined with stabilised TNF blocker therapy might reduce patient‐assessed disease activity, and improve function and quality of life, compared with biologic therapy alone. Our findings are consistent with previous reviews and guidelines that found short‐term effects of exercise programmes (Regel 2017; Ward 2016).

Figure 1. Study flow diagram. Search results from original June 2015 literature search, and May 2016 and January 2017 updates

Figure 2. 'Risk of bias' summary: review authors' judgements about each risk of bias item for each included study

Analysis 1.1. Comparison 1 Exercise vs no intervention, Outcome 1 Physical function.

Analysis 1.2. Comparison 1 Exercise vs no intervention, Outcome 2 Pain.

Analysis 1.3. Comparison 1 Exercise vs no intervention, Outcome 3 Patient global assessment of disease activity.

Analysis 1.4. Comparison 1 Exercise vs no intervention, Outcome 4 Spinal mobility.

Analysis 1.5. Comparison 1 Exercise vs no intervention, Outcome 5 Fatigue.

Analysis 1.6. Comparison 1 Exercise vs no intervention, Outcome 6 Quality of life.

Analysis 1.7. Comparison 1 Exercise vs no intervention, Outcome 7 C‐Reactive Protein (CRP).

Analysis 1.8. Comparison 1 Exercise vs no intervention, Outcome 8 Erythrocyte Sedimentation Rate (ESR).

Analysis 2.1. Comparison 2 Exercise vs usual care, Outcome 1 Physical function.

Analysis 2.2. Comparison 2 Exercise vs usual care, Outcome 2 Pain.

Analysis 2.3. Comparison 2 Exercise vs usual care, Outcome 3 Patient global assessment of disease activity.

Analysis 2.4. Comparison 2 Exercise vs usual care, Outcome 4 Spinal mobility.

Analysis 2.5. Comparison 2 Exercise vs usual care, Outcome 5 Quality of life.

Analysis 3.1. Comparison 3 Safety, Outcome 1 Adverse effects associated with the exercise intervention.

Summary of findings for the main comparison. Exercise programmes compared to no intervention for ankylosing spondylitis

Exercise programmes compared to no intervention

Patient or population: adults with ankylosing spondylitis
Setting: international hospitals, outpatient clinics, or home
Intervention: exercise programmes
Comparison: no intervention

| Outcomes | Anticipated absolute effects* (95% CI): risk with no intervention | Anticipated absolute effects* (95% CI): risk with exercise programmes | Relative effect (95% CI) | № of participants (studies) | Quality of the evidence (GRADE) | Comments |
| --- | --- | --- | --- | --- | --- | --- |
| Physical function, assessed with self‐report questionnaire BASFI scale (0 (easy) to 10 (impossible)), at the end of intervention. Exercise programme duration: range 3 to 24 weeks | The mean physical function in the control groups was 4.1 a | The mean physical function in the exercise groups was 1.3 lower (1.7 lower to 0.9 lower) | - | 312 (7 RCTs) | ⊕⊕⊕⊝ MODERATE b | 13% absolute reduction (95% CI 17% to 9%); 32% relative change (95% CI 23% to 42%); NNTB 3 (2 to 4) |
| Pain, assessed with VAS scale (0 (no pain) to 10 (impossible)), at the end of intervention. Exercise programme duration: range 3 to 24 weeks | The mean pain in the control groups was 6.2 a | The mean pain in the exercise groups was 2.1 lower (3.6 lower to 0.6 lower) c | - | 288 (6 RCTs) | ⊕⊕⊝⊝ LOW b, d | MD ‐2.1 (95% CI ‐3.6 to ‐0.6); 21% absolute reduction (95% CI 36% to 6%); 34% relative change (95% CI 10% to 59%); NNTB 3 (2 to 8) |
| Patient global assessment of disease activity, assessed with self‐report questionnaire BASDAI scale (0 (absent) to 10 (extreme)), at the end of intervention. Exercise programme duration: range 3 to 24 weeks | The mean patient global assessment of disease activity in the control groups was 3.7 e | The mean patient global assessment of disease activity in the exercise groups was 0.9 lower (1.3 lower to 0.5 lower) | - | 262 (6 RCTs) | ⊕⊕⊕⊝ MODERATE b | 9% absolute reduction (95% CI 13% to 5%); 27% relative change (95% CI 15% to 39%); NNTB 4 (3 to 8) |
| Spinal mobility, assessed with self‐report questionnaire BASMI scale (0 (better) to 10 (very severe limitation)), at the end of intervention. Exercise programme duration: range 3 to 24 weeks | The mean spinal mobility in the control groups was 3.8 e | The mean spinal mobility in the exercise groups was 0.7 lower (1.3 lower to 0.1 lower) | - | 232 (5 RCTs) | ⊕⊝⊝⊝ VERY LOW b, d, f | 7% absolute reduction (95% CI 13% to 1%); 18% relative reduction (95% CI 34% to 3%); NNTB 5 (3 to 14) |
| Fatigue, assessed with VAS scale (0 (absent) to 10 (extreme)), at the end of intervention. Exercise programme duration: range 3 to 24 weeks | The mean fatigue in the control groups was 3 e | The mean fatigue in the exercise groups was 1.4 lower (2.7 lower to 0.1 lower) | - | 72 (2 RCTs) | ⊕⊝⊝⊝ VERY LOW b, f, g | 14% absolute reduction (95% CI 27% to 1%); 48% relative change (95% CI 5% to 91%); NNTB 3 (1 to 9) |
| Adverse effects associated with exercises. Exercise programme duration: range 3 to 24 weeks | No adverse effects were reported in 43 control group participants | 1 adverse effect was reported in 67 exercise group participants | Peto OR 6.25 (0.12 to 320.40) | 110 (2 RCTs) j | ⊕⊝⊝⊝ VERY LOW g, h | 2% absolute increase (95% CI 5% less to 8% more); 152% relative change (95% CI 90% less to 5818% more); it was not possible to calculate NNTB as too few events were reported |
| Withdrawals because of adverse events. Exercise programme duration: range 3 to 24 weeks | 90 per 1000 | 96 per 1000 (68 to 134) | Peto OR 1.08 (0.74 to 1.57) | 1343 (8 RCTs) j | ⊕⊕⊝⊝ LOW b, i | 1% absolute increase (95% CI 2% less to 4% more); 7% relative change (95% CI 23% less to 48% more); NNTB was not applicable as results were not statistically significant |

*The risk in the intervention groups (and its 95% confidence interval) is based on the assumed risk in the comparison group and the relative effect of the intervention (and its 95% CI).

CI: confidence interval; OR: odds ratio; NNTB: number needed to treat (benefit); MD: mean difference; SMD: standardized mean difference; SD: standard deviation; BASFI: Bath Ankylosing Spondylitis Functional Index; VAS: visual analogue scale

GRADE Working Group grades of evidence
High quality: We are very confident that the true effect lies close to that of the estimate of the effect
Moderate quality: We are moderately confident in the effect estimate: The true effect is likely to be close to the estimate of the effect, but there is a possibility that it is substantially different
Low quality: Our confidence in the effect estimate is limited: The true effect may be substantially different from the estimate of the effect
Very low quality: We have very little confidence in the effect estimate: The true effect is likely to be substantially different from the estimate of effect

a Souza 2017 is the source document for the control group baseline data

b Downgraded one level due to risk of detection bias for subjective outcomes (lack of blinding of participants)

c We calculated a pooled SMD and re‐expressed it in MD, as the SMD multiplied by the control group baseline SD (SF‐36 pain = 2.5 from Souza 2017)

d Downgraded one level for inconsistency; important heterogeneity

e Masiero 2011 is the source document for the control group baseline data

f Downgraded one level for imprecision; total number of participants less than 400 and large confidence intervals

g Downgraded one level for imprecision; low rate of events

h Downgraded two levels for risk of bias; no blinding, incomplete outcome reporting

i Downgraded one level for indirectness. Since only two studies explicitly monitored adverse events, we used dropouts or withdrawals for any reason as a major outcome measure to estimate adverse events

j Studies were included regardless of the comparator intervention

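Footnote c describes how the pooled SMD for pain was re‐expressed as a mean difference. The following is a minimal sketch of that arithmetic in Python. The SMD × SD step comes from the footnote; the two percentage helpers are an assumption (MD divided by the 10‐point scale range for the absolute change, and by the control group mean for the relative change) that appears consistent with most of the values in the Comments column, but the review does not state these formulas. The function names are hypothetical.

```python
def smd_to_md(smd, control_sd):
    """Re-express a standardised mean difference on the original scale
    by multiplying it by the control group baseline SD (footnote c)."""
    return smd * control_sd


def absolute_change_pct(md, scale_range=10.0):
    """Assumed formula: mean difference as a percentage of the scale range."""
    return 100.0 * md / scale_range


def relative_change_pct(md, control_mean):
    """Assumed formula: mean difference as a percentage of the control mean."""
    return 100.0 * md / control_mean


# Pain, exercise vs no intervention: pooled SMD -0.82, control SD 2.5,
# control group mean 6.2 (values taken from the tables in this review).
pain_md = smd_to_md(-0.82, 2.5)
pain_abs = absolute_change_pct(pain_md)
pain_rel = relative_change_pct(pain_md, 6.2)

print(f"MD {pain_md:.2f}, absolute {pain_abs:.1f}%, relative {pain_rel:.1f}%")
```

Because the table reports rounded values (for example MD ‐2.1 rather than ‐2.05), results from this sketch can differ from the published percentages by about one percentage point.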
Summary of findings 2. Exercise programmes compared to usual care for ankylosing spondylitis

Exercise programmes compared to usual care

Patient or population: adults with ankylosing spondylitis
Setting: international hospitals, outpatient clinics, or home
Intervention: exercise programmes
Comparison: usual care (current practices included medication, self management, physiotherapy)

| Outcomes | Anticipated absolute effects* (95% CI): risk with usual care | Anticipated absolute effects* (95% CI): risk with exercise programmes | Relative effect (95% CI) | № of participants (studies) | Quality of the evidence (GRADE) | Comments |
| --- | --- | --- | --- | --- | --- | --- |
| Physical function, assessed with self‐report questionnaire BASFI scale (0 (easy) to 10 (impossible)), at the end of intervention. Exercise programme duration: range 3 to 24 weeks | The mean physical function in the control groups was 3.7 a | The mean physical function in the exercise groups was 0.4 lower (0.6 lower to 0.2 lower) | - | 1068 (5 RCTs) | ⊕⊕⊕⊝ MODERATE b | 4% absolute reduction (95% CI 6% to 2%); 11% relative change (95% CI 5% to 16%); NNTB 10 (6 to 21) |
| Pain, assessed with VAS scale (0 (no pain) to 10 (impossible)), at the end of intervention. Exercise programme duration: range 3 to 24 weeks | The mean pain in the control groups was 3.7 a | The mean pain in the exercise groups was 0.5 lower (0.9 lower to 0.1 lower) c | - | 911 (2 RCTs) | ⊕⊕⊕⊝ MODERATE b | MD ‐0.5 (95% CI ‐0.9 to ‐0.1); 5% absolute reduction (95% CI 9% to 1%); 15% relative change (95% CI 2% to 22%); NNTB 10 (7 to 68) |
| Patient global assessment of disease activity, assessed with self‐report questionnaire BASDAI scale (0 (absent) to 10 (extreme)), at the end of intervention. Exercise programme duration: range 3 to 24 weeks | The mean patient global assessment of disease activity in the control groups was 3.7 a | The mean patient global assessment of disease activity in the exercise groups was 0.7 lower (1.3 lower to 0.1 lower) | - | 1068 (5 RCTs) | ⊕⊕⊝⊝ LOW a, d | 7% absolute reduction (95% CI 13% to 1%); 19% relative change (95% CI 3% to 35%); NNTB 6 (3 to 52) |
| Spinal mobility, assessed with self‐report questionnaire BASMI scale (0 (better) to 10 (very severe limitation)), at the end of intervention. Exercise programme duration: range 3 to 24 weeks | The mean spinal mobility in the control groups was 8.9 e | The mean spinal mobility in the exercise groups was 1.2 lower (2.8 lower to 0.5 higher) | - | 85 (2 RCTs) | ⊕⊝⊝⊝ VERY LOW a, d, f | 12% absolute change (95% CI 5% less to 28% more); 13% relative change (95% CI 6% less to 32% more); NNTB = NA |
| Fatigue | see comment | - | - | (0 RCTs) | - | No included studies measured this outcome |
| Adverse effects associated with exercises. Exercise programme duration: range 3 to 24 weeks | No adverse effects were reported in 43 control group participants | 1 adverse effect was reported in 67 exercise group participants | Peto OR 6.25 (0.12 to 320.40) | 110 (2 RCTs) i | ⊕⊝⊝⊝ VERY LOW g, h | 2% absolute increase (95% CI 5% less to 8% more); 152% relative change (95% CI 90% less to 5818% more); it was not possible to calculate NNTB as too few events were reported |
| Adverse events. Exercise programme duration: range 3 to 24 weeks | see comment | - | cannot be estimated | - | ⊕⊝⊝⊝ VERY LOW g, h | Adverse events could not be calculated as events were not monitored or reported |

*The risk in the intervention groups (and its 95% confidence interval) is based on the assumed risk in the comparison group and the relative effect of the intervention (and its 95% CI).

CI: confidence interval; OR: odds ratio; NNTB: number needed to treat (benefit); MD: mean difference; SMD: standardized mean difference; SD: standard deviation; BASFI: Bath Ankylosing Spondylitis Functional Index; VAS: visual analogue scale.

GRADE Working Group grades of evidence
High quality: We are very confident that the true effect lies close to that of the estimate of the effect
Moderate quality: We are moderately confident in the effect estimate: The true effect is likely to be close to the estimate of the effect, but there is a possibility that it is substantially different
Low quality: Our confidence in the effect estimate is limited: The true effect may be substantially different from the estimate of the effect
Very low quality: We have very little confidence in the effect estimate: The true effect is likely to be substantially different from the estimate of effect

a Rodriguez‐Lozano 2013 is the source document for the control group baseline data.

b Downgraded one level due to risk of detection bias for subjective outcomes (lack of blinding of participants)

c We calculated a pooled SMD and re‐expressed it as a MD; we multiplied the SMD by the control group baseline SD (VAS pain = 3.0 from Rodriguez‐Lozano 2013)

d Downgraded one level for inconsistency; important heterogeneity

e Altan 2012 is the source document for the control group baseline data

f Downgraded one level for imprecision; total number of participants less than 400, and large confidence intervals

g Downgraded one level for imprecision; low rate of events

h Downgraded two levels for risk of bias; no blinding, incomplete outcome reporting

i Studies were included regardless of the comparator intervention

Table 1. Authors contacted for missing or additional data

Authors

First contact

Second contact

Response

Altan 2012

25/05/2015

28/05/2015

Colina 2009

04/05/2015

13/05/2015

no e‐mail response

Durmus 2009

15/04/2015

04/05/2015

no e‐mail response

Dönmez 2014

15/04/2015

18/04/2015

Gunay 2012

15/04/2015

04/05/2015

no e‐mail response

Ince 2006

02/06/2015

05/06/2015

Kjeken 2013

02/06/2015

03/06/2015

Kraag 1990

02/06/2015

03/06/2015

Lim 2005

no available contact

no e‐mail response

Masiero 2011

20/04/2015

04/05/2015

04/05/2015

Masiero 2015

16/06/2016

20/06/2016

Mesquita 2014

15/04/2015

04/05/2015

no e‐mail response

Rodriguez‐Lozano 2013

02/06/2015

03/06/2015

Sveaas 2014

02/06/2015

04/06/2015

Sveeas 2018

24/01/2018

no e‐mail response

Sweeney 2002

19/05/2015

02/06/2015

no e‐mail response

Widberg 2009

02/06/2015

06/06/2015

Table 2. Summary of characteristics of included studies (N = 14)

| Characteristics | N (%) or median (IQR) |
| --- | --- |
| Location: Brazil | 2 (14%) |
| Location: Canada | 1 (7%) |
| Location: Italy | 1 (7%) |
| Location: Korea | 1 (7%) |
| Location: Norway | 2 (14%) |
| Location: Spain | 2 (14%) |
| Location: Sweden | 1 (7%) |
| Location: Turkey | 3 (21%) |
| Location: UK | 1 (7%) |
| Study design: RCT | 14 (100%) |
| Number of study arms: 2 | 11 (79%) |
| Number of study arms: 3 | 3 (21%) |
| Type of comparator: usual care | 5 (36%) |
| Type of comparator: no treatment | 9 (64%) |
| Total number of participants per study | 55 (35 to 73) |
| Trial size: > 100 subjects/arm | 3 (21%) |
| Trial size: ≤ 100 subjects/arm | 11 (79%) |
| Number of subjects per arm | 26 (15 to 29) |
| Study duration (weeks) | 14 (range 12 to 24) |

N (%) is the number of studies that reported the characteristic of interest

Table 3. Summary of characteristics of participants in included studies (N = 14)

| Characteristics | N (%) or median (IQR) |
| --- | --- |
| Age (years) | 45 (39 to 47) |
| Gender: male | 70 (56 to 77) |
| Gender: female | 33 (25 to 45) |
| Diagnostic criteria*: Modified New York | 10 (71%) |
| Diagnostic criteria*: The Ankylosing Spondylitis Disease Activity Score | 2 (14%) |
| Diagnostic criteria*: European spondyloarthropathy | 1 (7%) |
| Diagnostic criteria*: not reported | 2 (14%) |
| Disease severity*: Bath Ankylosing Spondylitis Disease Activity Index ≥ 3.5 | 5 (36%) |
| Disease severity*: Bath Ankylosing Spondylitis Disease Activity Index < 3.5 | 2 (14%) |
| Disease severity*: Ankylosing Spondylitis stage 1 or 2 | 4 (29%) |
| Disease severity*: no information | 3 (21%) |
| Disease duration (years) | 9 (9 to 18) |
| Coexisting medical treatments: analgesics (in 2 studies) | 21% (16% to 26%) |
| Coexisting medical treatments: anti‐tumour necrosis factor (in 7 studies) | 29% (14% to 38%) |
| Coexisting medical treatments: disease‐modifying anti‐rheumatic drug (in 5 studies) | 17% (11% to 19%) |
| Coexisting medical treatments: nonsteroidal anti‐inflammatory drugs (in 9 studies) | 75% (32% to 76%) |
| Coexisting medical treatments: sulfasalazine (in 4 studies) | 22% (11% to 49%) |
| Coexisting medical treatments: no treatment (in 2 studies) | 17% (10% to 15%) |
| Coexisting medical treatments: no information reported (in 4 studies) | NA |

* N (%) is the number of studies that reported the characteristic of interest

Table 4. Summary of exercise programme characteristics in the included studies (N = 14)

| Characteristics | N (%) or median (IQR) |
| --- | --- |
| Modalities: monomodal | 9 (64%) |
| Modalities: multidisciplinary | 5 (36%) |
| Exercise components: pain relief | 1 (7%) |
| Exercise components: breathing | 7 (50%) |
| Exercise components: cardio fitness | 2 (14%) |
| Exercise components: flexibility, stretching | 8 (57%) |
| Exercise components: endurance | 1 (7%) |
| Exercise components: motion (active or passive) | 5 (36%) |
| Exercise components: proprioception, posture | 4 (29%) |
| Exercise components: relaxation | 2 (14%) |
| Exercise components: strength | 9 (64%) |
| Exercise components: no information | 1 (7%) |
| Provider: physiotherapist | 7 (50%) |
| Provider: other trainer | 3 (21%) |
| Provider: self delivery | 2 (14%) |
| Provider: unclear | 2 (14%) |
| Supervision: with supervision | 8 (50%) |
| Supervision: no supervision | 3 (21%) |
| Supervision: unclear | 3 (21%) |
| Dose: session duration (minutes) | 60 (50 to 60) |
| Dose: frequency (sessions/week) | 3 (2 to 3) |
| Dose: programme duration (weeks) | 12 (8 to 16) |

N (%) is the number of studies that reported the characteristics of interest

Table 5. Major outcomes reported in the 14 included studies (part 1)

Study

Physical function (BASFI)

Patient global assessment (BASDAI)

Mobility (BASMI)

Mobility (chest expansion)

Mobility (occiput to wall distance)

Mobility (Schober test)

Mobility (Fingertip to floor)

Mobility (Cervical Rotation)

Altan 2012

Yes

Yes

Yes

Yes

Dönmez 2014

Yes

Yes

Yes

Garcia 2015

Yes†

Yes†

Gallinaro 2016

Yes ††

Yes ††

Yes ††

Yes ††

Yes ††

Yes ††

Yes

Ince 2006

Yes

Yes

Yes (modified)

Yes

Kjeken 2013

Yes

Yes

Yes*

Kraag 1990

Yes

Yes

Yes

Lim 2005

Yes

Yes

Masiero 2011

Yes

Yes

Yes

Yes

Yes

Rodriguez‐Lozano 2013

Yes

Yes

Souza 2017

Yes

Yes

Yes

Yes

Sveaas 2014

Yes

Yes

Yes

Sweeney 2002

Yes

Yes

Widberg 2009

Yes

Yes

Yes

Yes

* Data are missing; could not be included in the analysis

† median and 25th to 75th percentile reported

†† multiple exercise groups combined

BASFI: Bath Ankylosing Spondylitis Functional Index

BASDAI: Bath Ankylosing Spondylitis Disease Activity Index

BASMI: Bath Ankylosing Spondylitis Metrology Index

Table 6. Major outcomes reported in the 14 included studies (part 2)

Study

Pain (VAS)

Pain (SF‐36)

Pain (BASDAI)

Pain

(Nocturnal pain)

Pain

(Self efficacy scale Pain)

Fatigue

(BASDAI)

Adverse Effects

associated with exercise

Altan 2012

Yes

Dönmez 2014

Yes†

Garcia 2015

Yes

Yes

Gallinaro 2016

Yes ††

Yes ††

Ince 2006

Kjeken 2013

Kraag 1990

Yes

Lim 2005

Yes

Masiero 2011

Yes**

Yes

Rodriguez‐Lozano 2013

Yes

Yes

Souza 2017

Yes

Sveaas 2014

Sweeney 2002

Yes

Widberg 2009

** mean score calculated from lumbar and cervical pain

† median and 25th to 75th percentile reported

†† multiple exercise groups combined

BASDAI: Bath Ankylosing Spondylitis Disease Activity Index

VAS: visual analogue scale

SF‐36: 36‐Item Short‐Form Health Survey

Table 7. Minor outcomes reported in the 14 included studies

Study

Quality of life (ASQoL)

Quality of life (SF‐36)

Quality of life (SF‐12)

physical component

CRP level

(mg/dL)

ESR

(mm/h)

MASES

Altan 2012

Yes

Dönmez 2014

Yes*

Garcia 2015

Yes†

Gallinaro 2016

Yes ††

Yes ††

Ince 2006

Kjeken 2013

Yes*

Kraag 1990

Lim 2005

Masiero 2011

not reported

not reported

Rodriguez‐Lozano 2013

Yes

Souza 2017

Yes*

Yes

Yes

Sveaas 2014

Yes

Yes

Sweeney 2002

Widberg 2009

* global score was not reported; could not be included in the analysis

† median and 25th to 75th percentile reported

†† multiple exercise groups combined

ASQoL: the Ankylosing Spondylitis Quality of Life

SF‐36: 36‐Item Short‐Form Health Survey

SF‐12: 12‐Item Short Form Health Survey

CRP: C‐reactive protein

ESR: erythrocyte sedimentation rate

MASES: Maastricht Ankylosing Spondylitis Enthesitis Score

Comparison 1. Exercise vs no intervention

| Outcome or subgroup title | No. of studies | No. of participants | Statistical method | Effect size |
| --- | --- | --- | --- | --- |
| 1 Physical function | 7 | | Mean Difference (IV, Random, 95% CI) | Subtotals only |
| 1.1 BASFI at end of intervention | 7 | 312 | Mean Difference (IV, Random, 95% CI) | ‐1.32 [‐1.71, ‐0.93] |
| 1.2 BASFI at medium‐term follow‐up | 2 | 93 | Mean Difference (IV, Random, 95% CI) | ‐1.51 [‐1.84, ‐1.17] |
| 2 Pain | 6 | | Std. Mean Difference (IV, Random, 95% CI) | Subtotals only |
| 2.1 End of intervention | 6 | 288 | Std. Mean Difference (IV, Random, 95% CI) | ‐0.82 [‐1.40, ‐0.25] |
| 2.2 Pain at medium‐term follow‐up | 2 | 93 | Std. Mean Difference (IV, Random, 95% CI) | ‐2.50 [‐5.32, 0.32] |
| 3 Patient global assessment of disease activity | 6 | | Mean Difference (IV, Random, 95% CI) | Subtotals only |
| 3.1 BASDAI at end of intervention | 6 | 262 | Mean Difference (IV, Random, 95% CI) | ‐0.91 [‐1.32, ‐0.49] |
| 3.2 BASDAI at medium‐term follow‐up | 2 | 93 | Mean Difference (IV, Random, 95% CI) | ‐1.12 [‐1.57, ‐0.68] |
| 4 Spinal mobility | 5 | | Mean Difference (IV, Random, 95% CI) | Subtotals only |
| 4.1 BASMI at end of intervention | 5 | 232 | Mean Difference (IV, Random, 95% CI) | ‐0.70 [‐1.28, ‐0.13] |
| 4.2 BASMI at medium‐term follow‐up | 2 | 93 | Mean Difference (IV, Random, 95% CI) | ‐1.42 [‐2.05, ‐0.78] |
| 5 Fatigue | 2 | 72 | Mean Difference (IV, Random, 95% CI) | ‐1.43 [‐2.73, ‐0.14] |
| 5.1 BASDAI at end of intervention | 2 | 72 | Mean Difference (IV, Random, 95% CI) | ‐1.43 [‐2.73, ‐0.14] |
| 6 Quality of life | 2 | 85 | Mean Difference (IV, Random, 95% CI) | 1.74 [‐0.44, 3.91] |
| 6.1 QoL at end of intervention | 2 | 85 | Mean Difference (IV, Random, 95% CI) | 1.74 [‐0.44, 3.91] |
| 7 C‐Reactive Protein (CRP) | 2 | 84 | Mean Difference (IV, Random, 95% CI) | 1.38 [‐4.34, 7.10] |
| 8 Erythrocyte Sedimentation Rate (ESR) | 2 | 84 | Mean Difference (IV, Random, 95% CI) | ‐5.36 [‐10.31, ‐0.41] |

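The "IV, Random" label in the table above denotes inverse‐variance pooling under a random‐effects model. As a minimal sketch of how such a pooled mean difference can be obtained, the Python below assumes the DerSimonian‐Laird estimator of between‐study variance; the review states only that standard Cochrane methodology was used, so the choice of estimator is an assumption. The study‐level effects and standard errors are hypothetical and are not data from the included trials.

```python
import math


def dersimonian_laird(effects, std_errors, z=1.96):
    """Pool study effects with inverse-variance weights under a
    random-effects model (DerSimonian-Laird tau^2 estimate).

    Returns (pooled effect, CI lower, CI upper, tau^2).
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    sum_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, effects)) / sum_w

    # Cochran's Q measures how far study effects scatter around the
    # fixed-effect estimate.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    c = sum_w - sum(w ** 2 for w in weights) / sum_w
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0

    # Random-effects weights add tau^2 to each study's variance.
    re_weights = [1.0 / (se ** 2 + tau2) for se in std_errors]
    sum_re = sum(re_weights)
    pooled = sum(w * y for w, y in zip(re_weights, effects)) / sum_re
    se_pooled = math.sqrt(1.0 / sum_re)
    return pooled, pooled - z * se_pooled, pooled + z * se_pooled, tau2


# Hypothetical mean differences on a 0 to 10 scale, with standard errors.
example_effects = [-1.0, -1.5, -1.2]
example_ses = [0.4, 0.5, 0.3]
pooled, lower, upper, tau2 = dersimonian_laird(example_effects, example_ses)
print(f"MD {pooled:.2f} [{lower:.2f}, {upper:.2f}], tau^2 = {tau2:.3f}")
```

When the studies agree closely, the estimated between‐study variance is zero and the result coincides with a fixed‐effect analysis, which is the method shown as "IV, Fixed" for physical function in Comparison 2.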
Comparison 2. Exercise vs usual care

| Outcome or subgroup title | No. of studies | No. of participants | Statistical method | Effect size |
| --- | --- | --- | --- | --- |
| 1 Physical function | 5 | | Mean Difference (IV, Fixed, 95% CI) | Subtotals only |
| 1.1 BASFI at end of intervention | 5 | 1068 | Mean Difference (IV, Fixed, 95% CI) | ‐0.36 [‐0.55, ‐0.16] |
| 1.2 BASFI at medium‐term follow‐up | 1 | 53 | Mean Difference (IV, Fixed, 95% CI) | ‐0.60 [‐1.62, 0.42] |
| 2 Pain | 2 | | Std. Mean Difference (IV, Random, 95% CI) | Subtotals only |
| 2.1 End of intervention | 2 | 911 | Std. Mean Difference (IV, Random, 95% CI) | ‐0.16 [‐0.29, ‐0.03] |
| 3 Patient global assessment of disease activity | 5 | | Mean Difference (IV, Random, 95% CI) | Subtotals only |
| 3.1 BASDAI at end of intervention | 5 | 1068 | Mean Difference (IV, Random, 95% CI) | ‐0.68 [‐1.27, ‐0.09] |
| 3.2 BASDAI at medium‐term follow‐up | 1 | 53 | Mean Difference (IV, Random, 95% CI) | ‐0.70 [‐1.71, 0.31] |
| 4 Spinal mobility | 2 | | Mean Difference (IV, Random, 95% CI) | Subtotals only |
| 4.1 BASMI at end of intervention | 2 | 85 | Mean Difference (IV, Random, 95% CI) | ‐1.15 [‐2.81, 0.52] |
| 4.2 BASMI at medium‐term follow‐up | 1 | 53 | Mean Difference (IV, Random, 95% CI) | ‐0.70 [‐1.64, 0.24] |
| 5 Quality of life | 2 | | Mean Difference (IV, Random, 95% CI) | Subtotals only |
| 5.1 QoL at end of intervention | 2 | 809 | Mean Difference (IV, Random, 95% CI) | ‐0.36 [‐1.68, 0.95] |

Comparison 3. Safety

| Outcome or subgroup title | No. of studies | No. of participants | Statistical method | Effect size |
| --- | --- | --- | --- | --- |
| 1 Adverse effects associated with the exercise intervention | 2 | 110 | Peto Odds Ratio (Peto, Fixed, 95% CI) | 6.25 [0.12, 320.40] |
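
The very wide interval around the Peto odds ratio follows directly from having a single event. As a minimal sketch, the Python below applies the standard Peto one‐step method (observed minus expected events, divided by the hypergeometric variance, summed across studies). The review reports only the totals (1 event among 67 exercise participants, 0 among 43 controls, in 2 RCTs); the per‐study split used here is hypothetical, chosen only so that the arm sizes add up to those totals.

```python
import math


def peto_odds_ratio(studies, z=1.96):
    """Pooled Peto odds ratio with a 95% CI.

    `studies` is a list of (events_trt, n_trt, events_ctl, n_ctl) tuples.
    Studies with no events in either arm contribute nothing to the sums.
    """
    sum_o_minus_e = 0.0
    sum_v = 0.0
    for events_trt, n_trt, events_ctl, n_ctl in studies:
        n = n_trt + n_ctl
        total_events = events_trt + events_ctl
        total_non_events = n - total_events
        expected = n_trt * total_events / n
        variance = (n_trt * n_ctl * total_events * total_non_events
                    / (n * n * (n - 1)))
        sum_o_minus_e += events_trt - expected
        sum_v += variance
    if sum_v == 0:
        raise ValueError("no events in any study; Peto OR is undefined")
    log_or = sum_o_minus_e / sum_v
    se = 1.0 / math.sqrt(sum_v)
    return (math.exp(log_or),
            math.exp(log_or - z * se),
            math.exp(log_or + z * se))


# Hypothetical split of the reported totals (67 exercise, 43 control):
# one study with the single event, one study with no events.
hypothetical_studies = [(1, 30, 0, 25), (0, 37, 0, 18)]
odds_ratio, ci_low, ci_high = peto_odds_ratio(hypothetical_studies)
print(f"Peto OR {odds_ratio:.2f} [{ci_low:.2f}, {ci_high:.2f}]")
```

With so few events, the interval spans more than three orders of magnitude, which is why the evidence on adverse effects was graded as very low quality for imprecision.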