Mirtazapine for fibromyalgia in adults

Patrick Welsch; Kathrin Bernardy; Sheena Derry; R Andrew Moore; Winfried Häuser

doi:10.1002/14651858.CD012708

Version published: 29 June 2017

Abstract

This is a protocol for a Cochrane Review (Intervention). The objectives are as follows:

To assess the efficacy (relief of fibromyalgia‐associated symptoms), tolerability (drop out due to adverse events), and safety (serious adverse events) of mirtazapine for fibromyalgia in adults.

Background

This protocol is based on a template for reviews of drugs used to relieve fibromyalgia. The aim is for all reviews to use the same methods, based on new criteria for what constitutes reliable evidence in chronic pain (Moore 2010a; Appendix 1).

Description of the condition

Fibromyalgia has been defined as widespread pain that lasts for longer than three months, with pain on palpation at 11 or more of 18 specified tender points (Wolfe 1990). It is frequently associated with other symptoms such as poor sleep, fatigue, and depression (Häuser 2015a; Wolfe 2014). Patients often report high disability levels and poor health‐related quality of life along with extensive use of medical care (Häuser 2015a). Fibromyalgia symptoms can be assessed by self‐report of the patient via the fibromyalgia criteria and severity scales for clinical and epidemiological studies: a modification of the ACR Preliminary Diagnostic Criteria for Fibromyalgia (the so‐called Fibromyalgia Symptom Questionnaire) (Wolfe 2011a). For a clinical diagnosis, the ACR 1990 classification criteria (Wolfe 1990), the ACR 2010 preliminary diagnostic criteria (Wolfe 2010), and the 2016 criteria (Wolfe 2016) can be used. Lacking a specific laboratory test, diagnosis is established by a history of the key symptoms and the exclusion of somatic diseases sufficiently explaining the key symptoms (Häuser 2015a).

Fibromyalgia is a heterogeneous condition. The definite aetiology (cause) of this syndrome remains unknown. A model of interacting biological and psychosocial variables in the predisposition, triggering, and development of the chronicity of fibromyalgia symptoms has been suggested (Üceyler 2017). Genetics (Arnold 2012; Lee 2012), depression (Chang 2014; Forseth 1999), physical and sexual abuse in childhood (Häuser 2011), obesity combined with physical inactivity (Mork 2010), sleep problems (Mork 2012), and smoking (Choi 2011) might predispose a person to the development of fibromyalgia. Inflammatory rheumatic diseases (Lee 2013; Wolfe 2011b), psychosocial stress (e.g. workplace and family conflicts) and physical stress (e.g. infections, surgery, accidents) might trigger the onset of chronic widespread pain and fatigue (Clauw 2014). Depression and post‐traumatic stress disorder worsen fibromyalgia symptoms (Häuser 2013b; Lange 2010).

Several factors are associated with the pathophysiology (functional changes associated with or resulting from disease) of fibromyalgia, but their precise relationship to symptoms of the disorder is unclear (Üceyler 2017). The best established pathophysiological features are those of central sensitisation; i.e. augmented pain and sensory processing in the brain, with increased functional connectivity to pro‐nociceptive brain regions and decreased connectivity to antinociceptive regions, and accompanying changes in central nervous system (CNS) neurotransmitters as well as the size and shape of brain regions (Clauw 2014). Other findings include sympathetic nervous system dysfunction (Martínez‐Martínez 2014), increased pro‐inflammatory and reduced anti‐inflammatory cytokine profiles (produced by cells involved in inflammation) (Üceyler 2011), and small fibre pathology (Üceyler 2017).

Fibromyalgia is common. Numerous studies have investigated prevalence in different settings and countries. The Queiroz 2013 review gives a global mean prevalence of 2.7% (range 0.4% to 9.3%), and a mean in the Americas of 3.1%, in Europe of 2.5%, and in Asia of 1.7%. Fibromyalgia is more common in women, with a female to male ratio of 3:1 (4.2%:1.4%). The change in diagnostic criteria does not appear to have significantly affected estimates of prevalence (Wolfe 2013b). Estimates of prevalence in specific populations vary greatly, but have been reported to be as high as 9% in female textile workers in Turkey and 10% in metalworkers in Brazil (59% in those with repetitive strain injury; Queiroz 2013).

People with fibromyalgia often report high disability levels and poor quality of life along with extensive use of medical care (Häuser 2015a). Many people with fibromyalgia are significantly disabled, and experience moderate or severe pain for many years (Bennett 2007). Chronic painful conditions comprised 5 of the 11 top‐ranking conditions for years lived with disability in 2010 (Vos 2012), and are responsible for considerable loss of quality of life and employment, and increased health costs (Moore 2014a).

Fibromyalgia pain is known to be difficult to treat effectively, with only a minority of individuals experiencing a clinically‐relevant benefit from any one intervention. A multidisciplinary approach is now advocated, combining pharmacological interventions with physical or cognitive interventions, or both. Interventions aim to reduce the key symptoms of fibromyalgia (pain, sleep problems, fatigue) and the associated symptoms (e.g. depression, disability) and to improve daily functioning (Fitzcharles 2012; Macfarlane 2017; Petzke 2017). Conventional analgesics are usually not effective. Treatment is often by so‐called pain modulators, such as antidepressants like duloxetine and amitriptyline (Häuser 2013a; Lunn 2014; Moore 2012a), or antiepileptics like gabapentin or pregabalin (Cooper 2017; Moore 2009; Üceyler 2013; Wiffen 2013). The proportion of people who achieve worthwhile pain relief (typically at least a 50% reduction in pain intensity; Moore 2013a) is small, generally only 10% to 15% more than with placebo, with number needed to treat for an additional beneficial outcome (NNTB) usually between 6 and 14 (Wiffen 2013). Those who do experience good levels of pain relief with pregabalin also benefit from substantial improvements in other symptoms, such as fatigue, function, sleep, depression, anxiety, and ability to work, with significant improvement in quality of life (Moore 2010b; Moore 2014a). Fibromyalgia is not particularly different from other chronic pain in that only a small proportion of trial participants have a good response to treatment (Moore 2013b).

Description of the intervention

Mirtazapine is an atypical antidepressant with noradrenergic and specific serotonergic activity. It is licensed for use in major depressive disorders, but not in fibromyalgia. It is also used off‐label for a variety of other disorders, including anxiety‐related disorders and insomnia. Mirtazapine is administered orally, preferably once a day at bedtime. The recommended dosages for the treatment of depression range between 15 mg/d and 45 mg/d.

Mirtazapine is regarded as a rather safe antidepressant. Studies comparing mirtazapine with placebo, amitriptyline (Häuser 2012; Moore 2012a) ‐ a drug which is frequently used in fibromyalgia ‐ or other active comparators demonstrated a significantly lower percentage of patients (65%) who complained of any adverse clinical experiences compared with the placebo‐ (76%) or amitriptyline‐treated group (87%). Moreover, drop‐out rates due to adverse clinical experiences were significantly lower than in the amitriptyline‐treatment group. Data indicate that mirtazapine has few cardiotoxic properties when used in patients with heart failure (Montgomery 1995).

How the intervention might work

Mirtazapine blocks the alpha 2 adrenergic auto‐ and hetero‐receptors (enhancing norepinephrine release), and selectively antagonises the 5‐HT2 serotonin receptors in the central and peripheral nervous system. It also enhances serotonin neurotransmission at the 5‐HT1 receptor and blocks the histaminergic and muscarinic receptors. Mirtazapine is not a serotonin norepinephrine reuptake inhibitor but increases serotonin and norepinephrine by other mechanisms of action (Kent 2000). Based on these pharmacological mechanisms, mirtazapine is classified as a noradrenergic and specific serotonergic antidepressant. In structure, mirtazapine can also be classified as a tetracyclic antidepressant (Antilla 2000). Based on its pharmacologic profile, mirtazapine has the potential to be beneficial in the treatment of fibromyalgia, especially in patients who suffer from sleep disturbances (Dolder 2012).

Why it is important to do this review

The serotonin‐norepinephrine reuptake inhibitor antidepressants duloxetine and milnacipran have been approved by the Food and Drug Administration (FDA), but not by the European Medicines Agency (EMA) for fibromyalgia (Häuser 2013a; Üceyler 2013). Both drugs increase the availability of serotonin (5‐hydroxytryptamine [5‐HT]) and norepinephrine at CNS synaptic clefts. They have the potential to reduce pain by correcting the functional deficit of 5‐HT and norepinephrine neurotransmission in the descending inhibitory pain pathway. These antidepressants are effective in relieving one key symptom of fibromyalgia, namely pain, but do not reduce sleep problems to a clinically‐relevant degree (Häuser 2013a). There is a need for additional pharmacological therapeutic options for the treatment of the key fibromyalgia symptoms pain, sleep problems and fatigue.

A patient survey in 2012 demonstrated that mirtazapine was rarely used by patients with fibromyalgia (Häuser 2012). However, uncontrolled trials suggested that mirtazapine might be effective in relieving fibromyalgia symptoms (Samborski 2004). The use of mirtazapine to reduce sleep problems is discussed in fibromyalgia internet chats (WebMD Fibromyalgia Community 2009). Mirtazapine increases the pain threshold in healthy adults (Arnold 2008). Moreover, a large (594 participant), open, post marketing survey of the use of mirtazapine in people with chronic pain and concomitant depression demonstrated that after six weeks almost 70% reported light or no pain on a faces scale, compared with 90% having severe or worst pain at baseline (Freynhagen 2006). There is, therefore, a need to evaluate the efficacy, tolerability and safety of mirtazapine in fibromyalgia in order to assist fibromyalgia patients and doctors in shared decision making on additional pharmacological treatment options.

The standards used to assess evidence in chronic pain trials have changed substantially, with particular attention being paid to trial duration, withdrawals, and statistical imputation following withdrawal, all of which can substantially alter estimates of efficacy. The most important change is the move from using average pain scores, or average change in pain scores, to the number of people who have a large decrease in pain (by at least 50%) and who continue in treatment, ideally in trials of 8 to 12 weeks or longer. Pain intensity reduction of 50% or more has been shown to correlate with improvements in comorbid symptoms, function, and quality of life for people with chronic pain (Conaghan 2015; Moore 2013a; Peloso 2016), and specifically fibromyalgia (Moore 2010b; Straube 2011). These standards are set out in the reference guide for pain reviews (PaPaS 2012).

In this Cochrane review we will assess evidence using methods that make both statistical and clinical sense, and will use developing criteria for what constitutes reliable evidence in chronic pain (Moore 2010a). The trials included and analysed will need to meet a minimum of reporting quality (blinding, randomisation), validity (duration, dose and timing, diagnosis, outcomes, etc), and size (ideally at least 500 participants in a comparison in which the number needed to treat for an additional beneficial outcome (NNTB) is 4 or above; Moore 1998). This approach sets high standards and marks a departure from how reviews were conducted previously.

Objectives

To assess the efficacy (relief of fibromyalgia‐associated symptoms), tolerability (drop out due to adverse events), and safety (serious adverse events) of mirtazapine for fibromyalgia in adults.

Methods

Criteria for considering studies for this review

Types of studies

We will include randomised controlled trials (RCTs) with double‐blind assessment of participant outcomes following four weeks of treatment or longer. We will include trials with a parallel, cross‐over, and enriched enrolment randomised withdrawal design. We will not include N‐of‐1 studies. We require full journal publication, with the exception of online clinical trial results, summaries of otherwise unpublished clinical trials, and abstracts with sufficient data for analysis. We will not include short abstracts (usually meeting reports). We will exclude studies that are non‐randomised, studies of experimental pain, case reports, and clinical observations.

Types of participants

Studies should include adult participants aged 18 years and above, diagnosed with fibromyalgia using the ACR 1990 classification criteria (Wolfe 1990), the ACR 2010 preliminary diagnostic criteria (Wolfe 2010), or the modified ACR 2010 preliminary diagnostic criteria (research criteria) (Wolfe 2011a).

Types of interventions

Mirtazapine at any dose, by any route, administered for the relief of fibromyalgia symptoms, and compared to placebo or any active comparator.

Types of outcome measures

We anticipate that studies will use a variety of outcome measures, with the majority using standard subjective scales (numerical rating scale or visual analogue scale) for pain intensity or pain relief, or both. We are particularly interested in the Initiative on Methods, Measurement, and Pain Assessment in Clinical Trials (IMMPACT) definitions for moderate and substantial benefit in chronic pain studies (Dworkin 2008).

These are defined as at least 30% pain relief over baseline (moderate), at least 50% pain relief over baseline (substantial), much or very much improved on Patient Global Impression of Change (PGIC) (moderate), and very much improved on PGIC (substantial). These dichotomous outcomes should be used where pain responses do not follow a normal (Gaussian) distribution. People with chronic pain desire high levels of pain relief, ideally more than 50%, and with pain not worse than mild (Moore 2013a; O'Brien 2010).
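As an illustration of how the pain-relief thresholds above translate into a dichotomous responder classification, the following minimal Python sketch computes percentage pain relief from baseline for one participant. The function names, the label "below moderate", and the example scores are assumptions made for illustration; they are not part of the protocol.

```python
def pain_relief_percent(baseline: float, follow_up: float) -> float:
    """Percentage reduction in pain intensity relative to baseline."""
    if baseline <= 0:
        raise ValueError("baseline pain intensity must be positive")
    return 100.0 * (baseline - follow_up) / baseline


def immpact_pain_category(baseline: float, follow_up: float) -> str:
    """Classify pain relief using the 30% and 50% thresholds quoted in the text.

    The label 'below moderate' is a hypothetical name for relief under 30%.
    """
    relief = pain_relief_percent(baseline, follow_up)
    if relief >= 50.0:
        return "substantial"
    if relief >= 30.0:
        return "moderate"
    return "below moderate"


# Hypothetical 0-10 numerical rating scale scores.
print(immpact_pain_category(8, 4))  # 50% relief
print(immpact_pain_category(8, 5))  # 37.5% relief
```

Note that a participant who reaches the 50% threshold also satisfies the 30% threshold, so counts for the "at least 30%" outcome include the substantial responders.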

Primary outcomes

Participant‐reported pain relief of 50% or greater (substantial improvement).
PGIC very much improved (substantial improvement).
Safety: participants experiencing any serious adverse event. Serious adverse events typically include any untoward medical occurrence or effect that at any dose results in death, is life‐threatening, requires hospitalisation or prolongation of existing hospitalisation, results in persistent or significant disability or incapacity, is a congenital anomaly or birth defect, is an 'important medical event' that may jeopardise the person, or may require an intervention to prevent one of the above characteristics or consequences.
Tolerability: withdrawals due to adverse events.

Secondary outcomes

Participant‐reported pain relief of 30% or greater (moderate improvement).
PGIC much improved (moderate improvement).
Participant‐reported sleep problems (continuous outcome: we will prefer composite measures over single item scales).
Participant‐reported fatigue (continuous outcome: we will prefer composite measures over single item scales).
Participant‐reported mean pain intensity (continuous outcome: we will prefer change from baseline scores over intensity at the end of the study).
Participant‐reported health‐related quality of life (we will prefer disease‐specific instruments such as the Fibromyalgia Impact Questionnaire (FIQ) over generic instruments). If FIQ scores are reported, we will calculate the number of participants with a clinically‐relevant improvement of 20% or greater.
Participant‐reported negative mood (continuous outcome: we will prefer composite measures such as the Beck Depression Inventory (BDI) or the Hospital Anxiety and Depression (HAD) scale over single item scales).
Withdrawals due to lack of efficacy.
Participants with any adverse event.
Participants with specific adverse events (for example, somnolence, substantial weight gain, and elevated liver enzymes).

Search methods for identification of studies

Electronic searches

We will search the following databases from inception and without language restrictions.

Cochrane Central Register of Controlled Trials (CENTRAL via Cochrane Register of Studies Online).
MEDLINE (via Ovid).
Embase (via Ovid).
Scopus (via Ovid).

The search strategy for MEDLINE is shown in Appendix 2. We will adapt the MEDLINE search strategy for CENTRAL, Embase and Scopus.

Searching other resources

We will review the bibliographies of any RCTs and review articles identified. We will search the following clinical trial databases to identify additional published or unpublished data: ClinicalTrials.gov and World Health Organization (WHO) International Clinical Trials Registry Platform (ICTRP) (apps.who.int/trialsearch/). We will not contact investigators or study sponsors.

Data collection and analysis

Selection of studies

Two review authors (WH, PW) will determine eligibility by reading the abstract of each study identified by the search. We will eliminate studies that clearly do not satisfy the inclusion criteria, and obtain full copies of the remaining studies. The same two review authors will read these studies independently and reach agreement by discussion. We will not anonymise the studies in any way before assessment. We will create a PRISMA flow chart.

Data extraction and management

Two review authors (WH, RAM) will extract data independently using a standard form and check for agreement before entry into Review Manager 5 (RevMan 2014), or any other analysis tool. We will include information about the study setting, demographic and clinical variables of the participants, number of participants treated, drug and dosing regimen, co‐medication, study design (placebo or active control; parallel, cross‐over, or enriched enrolment randomised withdrawal), study duration and follow‐up, outcome measures and results, withdrawals and adverse events (participants experiencing any adverse event, or serious adverse event).

Assessment of risk of bias in included studies

We will use the Oxford Quality Score as the basis for inclusion (Jadad 1996), limiting inclusion to studies that are randomised and double‐blind as a minimum.

Two review authors (WH, RAM) will independently assess risk of bias for each study, using the criteria outlined in the Cochrane Handbook for Systematic Reviews of Interventions (Higgins 2011), and adapted from those used by Cochrane Pregnancy and Childbirth, with any disagreements resolved by discussion. We will assess the following for each study.

Random sequence generation (checking for possible selection bias). We will assess the method used to generate the allocation sequence as: low risk of bias (i.e. any truly random process, for example random number table; computer random number generator); unclear risk of bias (when the method used to generate the sequence is not clearly stated). We will exclude studies at a high risk of bias that use a non‐random process (for example, odd or even date of birth; hospital or clinic record number).
Allocation concealment (checking for possible selection bias). The method used to conceal allocation to interventions prior to assignment determines whether intervention allocation could have been foreseen in advance of, or during, recruitment, or changed after assignment. We will assess the methods as: low risk of bias (for example, telephone or central randomisation; consecutively numbered, sealed, opaque envelopes); unclear risk of bias (when method not clearly stated). We will exclude studies that do not conceal allocation and are therefore at a high risk of bias (for example, open list).
Blinding of participants and personnel (checking for possible performance bias). We will assess the methods used to blind study participants and personnel from knowledge of which intervention a participant received. We will assess the methods as: low risk of bias (e.g. study states that it was blinded and describes the method used to achieve blinding, for example, identical tablets, matched in appearance and smell); unclear risk of bias (study states that it was blinded but does not provide an adequate description of how it was achieved). We will exclude studies at a high risk of bias where participants and study personnel were not blinded.
Blinding of outcome assessment (checking for possible detection bias). We will assess the methods used to blind study outcome assessors from knowledge of which intervention a participant received. We will assess the methods as: low risk of bias (study states that outcome assessor was not involved in treatment); unclear risk of bias (study states that the assessor was blinded but does not provide an adequate description of how it was achieved); high risk of bias (outcome assessors not blinded to treatment).
Incomplete outcome data (checking for possible attrition bias due to the amount, nature, and handling of incomplete outcome data). We will assess the methods used to deal with incomplete data as: low risk of bias (i.e. less than 10% of participants did not complete the study or used 'baseline observation carried forward' analysis, or both); unclear risk of bias (used 'last observation carried forward' analysis); or high risk of bias (used 'completer' analysis).
Selective outcome reporting (checking for possible reporting bias). We will check if an a priori study protocol was available and if all outcomes of the study protocol were reported in the publications of the study. We will assess selective reporting as: low risk of reporting bias if the study protocol was available and all of the study’s prespecified (primary and secondary) outcomes that were of interest in the review were reported in the prespecified way, or if the study protocol was not available but it was clear that the published reports included all expected outcomes, including those that were prespecified (convincing text of this nature may be uncommon); high risk of reporting bias if not all of the study’s prespecified primary outcomes were reported; one or more primary outcomes was reported using measurements, analysis methods or subsets of the data (e.g. subscales) that were not prespecified; one or more reported primary outcomes were not prespecified (unless clear justification for their reporting was provided, such as an unexpected adverse effect); one or more outcomes of interest in the review were reported incompletely so that they could not be entered in a meta‐analysis; or the study report did not include results for a key outcome that would be expected to have been reported for such a study. We will assign an unclear risk of bias if insufficient information is available to permit judgement of ‘low risk’ or ‘high risk’.
Size of study (checking for possible biases confounded by small size). We will assess studies as being at low risk of bias (when there are 200 participants or more per treatment arm); unclear risk of bias (50 to 199 participants per treatment arm); or high risk of bias (fewer than 50 participants per treatment arm).
Group similarity at baseline (selection bias). We will assess similarity of the study groups at baseline for the most important prognostic clinical and demographic indicators. We will assign a low risk of bias if groups were similar at baseline for demographic factors, value of main outcome measure(s), and important prognostic factors. We will assign an unclear risk of bias if important prognostic clinical and demographic indicators were not reported. We will assign a high risk of bias if groups were not similar at baseline for demographic factors, value of main outcome measure(s), and important prognostic factors.

Two review authors (WH, PW) will make quality ratings separately for each of the eight methodology quality indicators as defined by the ’Risk of bias’ tool. We will define a study to be of high quality if six to eight of the domains are of low risk of bias, to be of moderate quality if three to five of eight domains are of low risk of bias, and to be of low quality if zero to two of eight domains are of low risk of bias (Schaefert 2015).
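The overall study quality rule described above (six to eight low-risk domains for high quality, three to five for moderate, zero to two for low) can be sketched as a small Python function. The function name and the string labels for domain ratings are assumptions chosen for illustration.

```python
from typing import Sequence


def study_quality(domain_ratings: Sequence[str]) -> str:
    """Map eight risk-of-bias domain ratings to an overall quality label.

    Each rating is expected to be 'low', 'unclear', or 'high' (hypothetical
    labels); only the count of 'low' ratings determines the result.
    """
    if len(domain_ratings) != 8:
        raise ValueError("expected ratings for exactly eight domains")
    n_low = sum(1 for rating in domain_ratings if rating == "low")
    if n_low >= 6:
        return "high"
    if n_low >= 3:
        return "moderate"
    return "low"


# Hypothetical study: six low-risk domains and two unclear domains.
print(study_quality(["low"] * 6 + ["unclear"] * 2))
```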

Measures of treatment effect

We will calculate NNTBs as the reciprocal of the absolute risk reduction (ARR; McQuay 1998). For unwanted effects, the NNTB becomes the number needed to treat for an additional harmful outcome (NNTH) and is calculated in the same manner.
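A minimal Python sketch of this calculation follows, taking the number needed to treat as the reciprocal of the absolute difference in event rates between the treatment and control arms. The function name and the event counts are hypothetical and serve only to show the arithmetic.

```python
import math


def nnt_from_counts(events_treat: int, n_treat: int,
                    events_ctrl: int, n_ctrl: int) -> float:
    """Number needed to treat: reciprocal of the absolute risk difference.

    Returns infinity when the event rates in both arms are identical.
    """
    risk_diff = events_treat / n_treat - events_ctrl / n_ctrl
    if risk_diff == 0:
        return math.inf
    return 1.0 / abs(risk_diff)


# Hypothetical trial: 40/100 responders on treatment, 25/100 on placebo.
print(round(nnt_from_counts(40, 100, 25, 100), 1))
```

The same function applies to harms (NNTH) when the counts are adverse events rather than responders; the direction of the difference determines which term is used.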

We will use the following terms to describe adverse outcomes in terms of harm or prevention of harm.

When significantly fewer adverse outcomes occur with treatment than with control (placebo or active), we will use the term: 'number needed to treat to prevent one event (NNTp)'.
When significantly more adverse outcomes occur with treatment compared with control (placebo or active), we will use the term: 'number needed to treat for an additional harmful outcome (NNTH)'.

For dichotomous data we will calculate risk differences (RDs) (inverse variance method) with 95% confidence intervals (CIs) using a fixed‐effect model unless significant statistical heterogeneity is found (see below). We will set the threshold for a clinically‐relevant benefit for categorical variables as a NNTB of less than 10 (Moore 2008).
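The sketch below illustrates a fixed-effect, inverse variance pooled risk difference with a 95% confidence interval in plain Python. It is a simplified illustration, not a description of how RevMan performs the calculation: it uses the standard large-sample variance of a risk difference, applies no continuity correction, and the study data are invented.

```python
import math
from typing import List, Tuple

Study = Tuple[int, int, int, int]  # events_treat, n_treat, events_ctrl, n_ctrl


def risk_difference(e1: int, n1: int, e2: int, n2: int) -> Tuple[float, float]:
    """Return the risk difference and its large-sample variance."""
    p1, p2 = e1 / n1, e2 / n2
    variance = p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2
    return p1 - p2, variance


def pooled_rd_fixed(studies: List[Study]) -> Tuple[float, float, float]:
    """Fixed-effect inverse variance pooled RD with a 95% CI."""
    sum_w = 0.0
    sum_w_rd = 0.0
    for study in studies:
        rd, variance = risk_difference(*study)
        if variance == 0:
            raise ValueError("zero variance; this sketch has no continuity correction")
        weight = 1.0 / variance  # inverse variance weight
        sum_w += weight
        sum_w_rd += weight * rd
    pooled = sum_w_rd / sum_w
    se = math.sqrt(1.0 / sum_w)
    return pooled, pooled - 1.96 * se, pooled + 1.96 * se


# Two hypothetical studies with identical results.
print(pooled_rd_fixed([(40, 100, 25, 100), (40, 100, 25, 100)]))
```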

For continuous data we will calculate standardised mean differences (SMDs) with 95% CIs using a fixed‐effect model unless significant statistical heterogeneity is found. We will use Cohen's categories to evaluate the magnitude of the effect size, calculated by SMD, with Hedges' g of 0.2 = small, 0.5 = medium and 0.8 = large (Cohen 1988). We will label g < 0.2 to be a 'not substantial' effect size. We will assume a minimally important difference if Hedges' g is ≥ 0.2 (Fayers 2014). We will calculate the number needed to treat for an additional beneficial outcome (NNTB) for continuous variables (sleep problems, depression) using the Wells calculator software available at Cochrane Musculoskeletal editorial office, which estimates from SMDs the proportion of patients who will benefit from treatment. We will use a minimal clinically important difference of 20% for the calculation of NNTB from SMDs for all continuous outcomes.
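To make the effect size categories concrete, the following Python sketch computes Hedges' g from group means and SDs using the usual small-sample correction, and maps its magnitude to a label. Treating 0.2, 0.5 and 0.8 as lower bounds of the small, medium and large bands is an assumption about how the cut-offs are applied; the function names and example values are also hypothetical.

```python
import math


def hedges_g(mean1: float, sd1: float, n1: int,
             mean2: float, sd2: float, n2: int) -> float:
    """Standardised mean difference with Hedges' small-sample correction."""
    df = n1 + n2 - 2
    pooled_sd = math.sqrt(((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df)
    cohens_d = (mean1 - mean2) / pooled_sd
    correction = 1 - 3 / (4 * df - 1)
    return cohens_d * correction


def effect_size_label(g: float) -> str:
    """Label the magnitude of g; band boundaries are an assumed reading."""
    size = abs(g)
    if size < 0.2:
        return "not substantial"
    if size < 0.5:
        return "small"
    if size < 0.8:
        return "medium"
    return "large"


# Hypothetical groups: means 5.0 and 4.0, SD 2.0, 50 participants each.
g = hedges_g(5.0, 2.0, 50, 4.0, 2.0, 50)
print(round(g, 3), effect_size_label(g))
```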

Unit of analysis issues

We will split the control treatment arm between active treatment arms in a single study if the active treatment arms are not combined for analysis.

We will include studies with a cross‐over design where separate data from the two periods are reported, where data are presented that exclude a statistically significant carry‐over effect, or where statistical adjustments are carried out in the event of a significant carry‐over effect.

Dealing with missing data

We will use intention‐to‐treat (ITT) analysis where the ITT population consists of participants who were randomised, took at least one dose of the assigned study medication, and provided at least one post‐baseline assessment. Missing participants will be assigned zero improvement wherever possible. Where standard deviations (SDs) are not reported, we will calculate them from t‐values, CIs or standard errors, where reported in articles (Higgins 2011). Where 30% and 50% pain relief rates and 20% FIQ improvement rates are not reported and not provided on request, we will calculate them from means and SDs by a validated imputation method (Furukawa 2005).
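The conversions from standard errors, CIs, and t‐values to SDs mentioned above can be sketched in Python as follows. The formulas are the standard large-sample conversions (using 1.96 for a 95% CI); for small samples a t-distribution multiplier would be more appropriate. Function names and example values are hypothetical.

```python
import math


def sd_from_se(se: float, n: int) -> float:
    """SD of a single group mean from its standard error."""
    return se * math.sqrt(n)


def sd_from_ci(lower: float, upper: float, n: int, z: float = 1.96) -> float:
    """SD of a single group mean from its confidence interval.

    Assumes a large sample, so the CI width is 2 * z * SE.
    """
    return sd_from_se((upper - lower) / (2 * z), n)


def sd_from_t(mean_difference: float, t_value: float, n1: int, n2: int) -> float:
    """Common within-group SD from the t-value of a two-group comparison."""
    se = abs(mean_difference / t_value)
    return se / math.sqrt(1 / n1 + 1 / n2)


# Hypothetical inputs.
print(sd_from_se(0.5, 100))
print(round(sd_from_ci(4.02, 5.98, 100), 2))
print(round(sd_from_t(1.0, 2.5, 50, 50), 2))
```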

Assessment of heterogeneity

We will deal with clinical heterogeneity by combining studies that examine similar conditions. We will assess statistical heterogeneity visually (L'Abbé 1987), and with the use of the I² statistic. When the I² value is greater than 50%, we will consider possible reasons for this.
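For reference, the I² statistic can be derived from Cochran's Q as the percentage of variability in effect estimates beyond what chance alone would produce. The Python sketch below uses inverse variance weights; the function name and the example effects and variances are invented for illustration.

```python
from typing import Sequence


def i_squared(effects: Sequence[float], variances: Sequence[float]) -> float:
    """I² (in percent) from study effect estimates and their variances."""
    weights = [1.0 / v for v in variances]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    # Cochran's Q: weighted squared deviations from the pooled estimate.
    q = sum(w * (e - pooled) ** 2 for w, e in zip(weights, effects))
    df = len(effects) - 1
    if df <= 0 or q <= 0:
        return 0.0
    return max(0.0, 100.0 * (q - df) / q)


# Two hypothetical studies with clearly different effects.
print(round(i_squared([0.1, 0.5], [0.01, 0.01]), 1))
```

A value above 50% in this sketch would correspond to the threshold at which the text says possible reasons for heterogeneity will be considered.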

Assessment of reporting biases

The aim of this review is to use dichotomous outcomes of known utility and of value to patients (Hoffman 2010; Moore 2010b; Moore 2010c; Moore 2010d; Moore 2013a). The review will not depend on what the authors of the original studies chose to report or not, though clearly difficulties will arise in studies failing to report any dichotomous results.

We will assess publication bias using a method designed to detect the amount of unpublished data with a null effect required to make any result clinically irrelevant (usually taken to mean a NNTB of 10 or higher; Moore 2008).
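One simple way to express this idea is as a dilution calculation: if additional participants came from trials with no treatment effect, the pooled risk difference would shrink in proportion, and the NNTB would rise. The Python sketch below computes how many such participants would be needed to push the NNTB to the threshold of 10. It assumes simple proportional dilution; whether this matches the exact procedure in Moore 2008 should be checked against that source. The function name and example numbers are hypothetical.

```python
def null_participants_needed(n_participants: int, observed_nntb: float,
                             threshold_nntb: float = 10.0) -> float:
    """Participants in zero-effect trials needed to raise the NNTB to the threshold.

    Assumes that adding N null-effect participants scales the risk
    difference by n / (n + N), so the NNTB scales by (n + N) / n.
    """
    if observed_nntb >= threshold_nntb:
        return 0.0  # result is already at or beyond the threshold
    return n_participants * (threshold_nntb / observed_nntb - 1)


# Hypothetical pooled result: 1000 participants with an NNTB of 5.
print(null_participants_needed(1000, 5))
```

A result that would be overturned only by a large amount of unpublished null data, relative to the data actually available, would be considered less susceptible to publication bias.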

Data synthesis

We plan to use a fixed‐effect model for meta‐analysis. We will use a random‐effects model for meta‐analysis if there is significant clinical heterogeneity and it is considered appropriate to combine studies.

If data are sufficient, we will undertake a quantitative synthesis and present data in forest plots. In the event of substantial clinical heterogeneity, we will switch off the totals in the forest plots.

We will undertake a meta‐analysis only if participants, interventions, comparisons, and outcomes are judged to be sufficiently similar to ensure an answer that is clinically meaningful, and only where there are data from at least 2 studies and 200 participants for analysis.

We will use RevMan for meta‐analysis (RevMan 2014) and Excel for NNTB and NNTH.

Quality of the evidence

We will use the GRADE approach to assess the quality of evidence related to each of the key outcomes listed in Types of outcome measures (Chapter 12, Higgins 2011), and to interpret findings (Guyatt 2011; Langendam 2013). The GRADE approach defines the quality of the evidence as the extent of confidence in the estimates of treatment benefits and their safety. Two review authors (KB, WH) will independently make quality ratings separately for each of the 14 outcomes. We will consider the following potential reasons to downgrade the quality of evidence (Guyatt 2011; Häuser 2015b).

Limitations of study design: where more than 50% of participants are from low quality studies as defined by the 'Risk of bias' tool.
Inconsistency of results: where point estimates vary widely across studies or CIs of studies show minimal or no overlap (Guyatt 2011).
Imprecision: where there is only one trial or, where there is more than one trial, the total number of participants is fewer than 400.
Small size: where there are so few data that the results are highly susceptible to the random play of chance (McQuay 1998; Thorlund 2011).
Indirectness: if exclusion of participants with inflammatory rheumatic disease or anxiety and depressive disorders, or both, in the included studies resulted in ≥ 50% of the total patient collective of the systematic review coming from studies in which patients with inflammatory rheumatic or anxiety and depressive disorders, or both, were excluded. This takes into account whether the question being addressed by the systematic review diverged from the available evidence, in terms of the population in routine clinical care.
Imputation: if studies use last observation carried forward imputation in circumstances where there were substantial differences in adverse event withdrawals (Moore 2012b).
Publication bias: where there is potential for publication bias, based on the amount of unpublished data required to make the result clinically irrelevant (Moore 2008), or where there is any concern over selective reporting influencing efficacy or harm estimates.

We will pay particular attention to inconsistency, indirectness and imprecision. In addition, there may be circumstances where the overall rating for a particular outcome needs to be adjusted as recommended by GRADE guidelines (Guyatt 2013a); for example, where one would have no confidence in the result, and would need to downgrade the quality of the evidence by three levels, to very low quality. In circumstances where there were no data reported for an outcome, we will report the level of evidence as very low quality (Guyatt 2013b).

'Summary of findings' tables

We will create 'Summary of findings' tables as appropriate. These tables provide outcome‐specific information concerning the overall quality of evidence from studies included in the comparison, the magnitude of effect of the interventions examined and the sum of available data on the outcomes we considered.

The 'Summary of findings' table(s) will include outcomes of participant‐reported pain relief of 50% or greater, PGIC (moderate and/or substantial), participant‐reported fatigue and sleep problems, withdrawals due to adverse events, weight gain and serious adverse events.

For the 'Summary of findings' tables we will use the following descriptors for levels of evidence (EPOC 2015).

High: This research provides a very good indication of the likely effect. The likelihood that the effect will be substantially different^† is low.
Moderate: This research provides a good indication of the likely effect. The likelihood that the effect will be substantially different^† is moderate.
Low: This research provides some indication of the likely effect. However, the likelihood that it will be substantially different^† is high.
Very low: This research does not provide a reliable indication of the likely effect. The likelihood that the effect will be substantially different^† is very high.

^† Substantially different: a large enough difference that it might affect a decision.

Subgroup analysis and investigation of heterogeneity

We do not plan subgroup analyses since experience of previous reviews indicates that there will be too few data for any meaningful subgroup analysis.

Sensitivity analysis

We plan no sensitivity analysis because the evidence base is known to be too small to allow reliable analysis.
