Prognostic models for chronic lymphocytic leukaemia: an exemplar systematic review and meta-analysis

  • Protocol
  • Prognosis



This is the protocol for a review and there is no abstract. The objectives are as follows:

Primary objective

The objective of this systematic review is to identify, describe and appraise all prognostic models developed to predict overall or progression-free survival in patients with chronic lymphocytic leukaemia (CLL). If enough models for the same type of outcome are identified, we will meta-analyse their performance.


Description of the condition

Chronic lymphocytic leukaemia (CLL) is the most common malignant neoplasm of the lymphatic system in Western countries. It accounts for 25% of all leukaemias and occurs mainly in elderly people (Chiorazzi 2005). In the United States, the reported age-adjusted incidence rate of CLL between 2006 and 2010 was 4.3 per 100,000 persons per year, and the age-adjusted death rate was 1.4 per 100,000 persons per year (Howlader 2013). In the European Union, it is estimated that in 2006 around 46,000 individuals were living with CLL five years post-diagnosis (Watson 2008).

The disease is characterised by a highly variable clinical course and prognosis. Some patients have no or minimal symptoms over many years, do not require treatment, and have a normal life expectancy. Other patients, however, have symptoms at diagnosis or shortly thereafter and die within a few years despite chemotherapy. This heterogeneity in the clinical presentation makes it difficult for the physician to predict accurately which patients may benefit from an early and aggressive treatment strategy and to provide patients with relevant prognostic information.

The diagnostic criteria for CLL have been modified. The former requirement of a chronic absolute lymphocytosis greater than 5.0 × 10⁹/L has been replaced by a requirement for an absolute count greater than 5.0 × 10⁹/L of monoclonal B cells with a CLL immunophenotype in the peripheral blood, if there is an absence of disease-related symptoms or cytopenias, or of tissue involvement other than bone marrow (Hallek 2008). Small lymphocytic lymphoma (SLL) is diagnosed when lymphadenopathy or splenomegaly caused by infiltrating CLL cells is found, with fewer than 5 × 10⁹/L CLL-type cells in the blood (Hallek 2008).

Description of prognostic models

Prognostic factors, measured at diagnosis, indicate the likely outcome of the disease (e.g. likelihood of survival or of relapse) for an individual patient. They enable clinicians to choose appropriate treatment strategies for defined patient groups.

The increasing array of biological markers in CLL provides both a challenge and an opportunity to develop more precise algorithms that integrate a combination of markers and tools to guide counselling and treatment decisions for individual patients. Unlike some other non-Hodgkin lymphomas, for which standard prognostic indices have been established (Solal-Celigny 2007), CLL still lacks a standard prognostic index that can be used to group patients.

The two most common staging systems, those of Binet 1981 and Rai 1975, distinguish between early (Rai 0; Binet A), intermediate (Rai I and II; Binet B) and advanced stages (Rai III and IV; Binet C). The stage of disease is determined by the number of lymphocytes in the peripheral blood, the presence of enlarged lymph nodes, the presence of anaemia or thrombocytopenia, and the presence of an enlarged liver or spleen. The prognostic value of these two staging systems is limited, because survival times still vary markedly within each stage. According to guidelines of the European Society for Medical Oncology, people with early-stage disease according to Binet or Rai who have B-symptoms at diagnosis (fever, night sweats or weight loss) are considered to have aggressive disease requiring treatment (Eichhorst 2015).
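The staging rules summarised above can be sketched as simple decision functions. This is a minimal illustration that assumes the commonly cited thresholds (Binet: three or more of five lymphoid areas involved for stage B, haemoglobin below 10 g/dL or platelets below 100 × 10⁹/L for stage C; Rai: haemoglobin below 11 g/dL for stage III, platelets below 100 × 10⁹/L for stage IV). These thresholds should be checked against Binet 1981 and Rai 1975 before any real use, and the `Presentation` record and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Presentation:
    """Hypothetical findings at diagnosis; CLL-defining lymphocytosis is assumed present."""
    lymphadenopathy: bool            # enlarged lymph nodes
    enlarged_liver_or_spleen: bool   # hepatomegaly and/or splenomegaly
    haemoglobin_g_dl: float
    platelets_1e9_per_l: float
    binet_areas: int                 # involved areas out of 5: cervical, axillary, inguinal nodes, spleen, liver


def rai_stage(p: Presentation) -> int:
    """Rai stage 0-IV as an integer, checking the most advanced criterion first (assumed thresholds)."""
    if p.platelets_1e9_per_l < 100:
        return 4  # thrombocytopenia
    if p.haemoglobin_g_dl < 11:
        return 3  # anaemia
    if p.enlarged_liver_or_spleen:
        return 2
    if p.lymphadenopathy:
        return 1
    return 0      # lymphocytosis only


def binet_stage(p: Presentation) -> str:
    """Binet stage A-C, checking the cytopenia criteria first (assumed thresholds)."""
    if p.haemoglobin_g_dl < 10 or p.platelets_1e9_per_l < 100:
        return "C"
    if p.binet_areas >= 3:
        return "B"
    return "A"


# Grouping used in the text: early, intermediate and advanced stages.
RAI_GROUP = {0: "early", 1: "intermediate", 2: "intermediate", 3: "advanced", 4: "advanced"}
BINET_GROUP = {"A": "early", "B": "intermediate", "C": "advanced"}

if __name__ == "__main__":
    patient = Presentation(lymphadenopathy=True, enlarged_liver_or_spleen=False,
                           haemoglobin_g_dl=12.5, platelets_1e9_per_l=180, binet_areas=2)
    print(RAI_GROUP[rai_stage(patient)], BINET_GROUP[binet_stage(patient)])
```

For this hypothetical patient the sketch prints `intermediate early` (Rai stage I, Binet stage A), a reminder that the two systems do not always assign a patient to the same group.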

Over the last few years, a number of clinical and biological factors of prognostic relevance have been identified, such as genomic aberrations, gene abnormalities (p53, ATM), the mutation status of the variable segments of the immunoglobulin heavy chain genes (IGVH), and surrogate markers for these factors, such as CD38 and ZAP-70 expression (Crespo 2003; Dohner 2000; el Rouby 1993; Kay 2007). Incorporating recent molecular markers into the diagnostic work-up is expected to provide more reliable information on the best time to initiate therapy, the choice of therapy and the individual prognosis (Claus 2014; Haferlach 2010; Kay 2007; Shanafelt 2004; Wierda 2011; Zenz 2011). Thus, progressive and smouldering forms of the disease can now be separated more accurately than with the Rai or Binet staging systems alone. Moreover, this early recognition of aggressive stage A disease and of indolent stage B and C disease would allow the rational application of risk-adapted treatment strategies. Factors influencing the choice of treatment include age; fitness to tolerate chemotherapy, immunotherapy or both; TP53 status; previous or current immune cytopenias; and evidence of lymphomatous transformation (Goede 2012).

Many trials have evaluated the prognostic significance of only one or two prognostic indicators. Before an improved staging classification can be introduced, a comprehensive evaluation of all currently available prognostic models is urgently needed. The resulting collection, the largest and most comprehensive set of prognostic models for patients with CLL, could then be used to develop an improved prognostic classification of CLL.

Why it is important to do this review

In the last decade, several prognostic markers for chronic lymphocytic leukaemia (CLL) have been published that strongly influence treatment decisions (Pflug 2014; Stilgenbauer 2014; Zaja 2013). The relative importance of the individual prognostic markers is still partly unclear, and the therapeutic consequences of models that include these markers are controversial and the subject of ongoing research (Haferlach 2010; Pflug 2014). It remains unclear which models have the greatest validity and should be preferred to guide clinical decision making. To date, no systematic review has been conducted to evaluate and assess these prognostic models in CLL. To address this important research question, we will conduct a systematic review and, if possible, a meta-analysis of existing prognostic models and validation studies in CLL.



This protocol for a systematic review of prognostic models is an exemplar protocol of a new review type within The Cochrane Library. Methods have not been standardised by Cochrane.

Criteria for considering studies for this review

Types of studies

Following the checklist for critical appraisal and data extraction for systematic reviews of prediction modelling studies (CHARMS) (Moons 2014), we will include the following types of studies:

  • Prognostic model development studies without external validation in independent data

  • Prognostic model development studies with external validation in independent data

  • External model validation studies


Types of participants

We will include studies on patients with previously untreated, confirmed B-cell CLL. Both male and female adult patients (aged 18 years or older) will be included.

Types of prognostic models

We will assess all prognostic models developed for CLL and all studies validating them. A prognostic model is defined as any mathematical function that combines at least two independent variables (predictors) to predict overall survival or progression-free survival. We will include only models whose predictors are known before the start of treatment.
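To make this definition concrete: many survival-based prognostic models take the form of a Cox proportional hazards model, in which a linear predictor (a weighted sum of predictor values) is combined with a baseline survival probability as S(t | x) = S0(t)^exp(LP). The sketch below assumes such a model. The predictor names, coefficients and baseline survival are all hypothetical and are not taken from any published CLL model.

```python
import math

# Hypothetical log hazard ratios; NOT taken from any published CLL model.
COEFFICIENTS = {
    "age_decades_over_65": 0.40,   # (age - 65) / 10
    "binet_b_or_c": 0.70,          # 1 if Binet stage B or C, else 0
    "unmutated_igvh": 0.90,        # 1 if IGVH unmutated, else 0
}
# Hypothetical 5-year survival of a reference patient (all predictors equal to 0).
BASELINE_SURVIVAL_5Y = 0.90


def linear_predictor(x: dict) -> float:
    """Weighted sum of predictor values; every model predictor must be supplied."""
    return sum(beta * x[name] for name, beta in COEFFICIENTS.items())


def predicted_survival(x: dict, baseline: float = BASELINE_SURVIVAL_5Y) -> float:
    """Cox-model prediction S(t | x) = S0(t) ** exp(LP) at the baseline's time point."""
    return baseline ** math.exp(linear_predictor(x))


if __name__ == "__main__":
    patient = {"age_decades_over_65": 0.5, "binet_b_or_c": 1, "unmutated_igvh": 1}
    print(round(predicted_survival(patient), 3))
```

Because the model combines three predictors into one function, it would meet the inclusion definition above; a single-marker hazard ratio would not.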

We will exclude studies which were published only as abstracts.

Types of outcome measures

Primary outcome

We will include models that predict overall survival. We chose this outcome because it has the greatest clinical relevance and is the most important for patients. Furthermore, death from any cause is an objective endpoint that is not susceptible to bias by the outcome assessor.

Secondary outcomes

We will include models that predict progression-free survival, as patients with similar survival may nevertheless have differing lengths of time without symptoms or need for treatment, depending on both initial treatment and disease characteristics. Response is defined as the level of disease regression obtained with front-line treatment (Cheson 2012). Identifying which patients are less likely to obtain a good response will help to decide which patients might be treated with new, more aggressive treatment strategies.
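The two outcomes differ only in which events count. Under the usual definitions, assumed here, overall survival counts death from any cause, whereas progression-free survival counts progression or death, whichever occurs first; patients without an event are censored at their last contact. The record layout and the time origin (for example diagnosis) are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FollowUp:
    """Hypothetical per-patient record; times are days from the time origin (e.g. diagnosis)."""
    last_contact_days: float
    death_days: Optional[float] = None
    progression_days: Optional[float] = None


def overall_survival(r: FollowUp) -> tuple[float, int]:
    """(time, event): event = death from any cause, otherwise censored at last contact."""
    if r.death_days is not None:
        return r.death_days, 1
    return r.last_contact_days, 0


def progression_free_survival(r: FollowUp) -> tuple[float, int]:
    """(time, event): event = progression or death, whichever comes first."""
    event_times = [t for t in (r.progression_days, r.death_days) if t is not None]
    if event_times:
        return min(event_times), 1
    return r.last_contact_days, 0
```

For a patient who progressed at day 900 and died at day 1500, the sketch gives an overall survival of (1500, 1) but a progression-free survival of (900, 1), illustrating why the two outcomes can rank patients differently.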

Search methods for identification of studies

Electronic searches

Reporting of prognostic model studies, and therefore their retrieval, is very poor, as reporting guidelines for prediction models (TRIPOD) have only just been published (Collins 2015). Moreover, no specific search filter exists for this new methodological approach, so published filters have to be combined to create a sensitive search strategy (Geersing 2012). Because this search strategy will not be very specific, we expect a large number of records, which two review authors will screen in detail. We will search MEDLINE without any language restriction, to reduce language bias.

To be able to obtain information on laboratory and cytogenetic parameters, we will include models published after 1990. Before then, such information was not widely available and these markers were rarely documented, partly because fluorescence in situ hybridisation (FISH) was a new test. With this inclusion criterion we aim to ensure that all relevant parameters are documented and that follow-up is sufficiently long.

We will search the following databases and sources.

  • Databases of medical literature:

    • MEDLINE (Ovid) (1990 to present) (Appendix 1).

    • PubMed (2015) (Appendix 2).

    • Database of prognostic studies maintained by the Cochrane Prognosis Methods Group (PMG).

  • Databases of ongoing trials:

  • Conference proceedings of annual meetings of the following societies for abstracts (2010 to present):

    • American Society of Hematology;

    • American Society of Clinical Oncology;

    • European Hematology Association.

Searching other resources

  • Handsearching of references

    • References of all identified trials, relevant review articles and current treatment guidelines will be checked for further literature.

  • Personal contacts

    • Authors of relevant studies, study groups, experts and investigators from transplantation centres worldwide who are known to be active in the field will be contacted for unpublished material or further information on ongoing studies.

Data collection and analysis

Selection of studies

Two review authors will independently screen the results of the search strategies for eligibility for this review by reading the abstracts. In the case of disagreement the full text publication will be obtained. If no consensus can be reached, we will ask a third review author (Higgins 2011).

We will document the selection process in a flow chart, as recommended in the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) statement (Moher 2009), showing the total number of retrieved references and the numbers of included and excluded studies.

Data extraction and data management

Two review authors will independently extract the data according to the CHARMS checklist (Moons 2014), to investigate both the reporting and the use of methods known to influence the quality of prognostic models. We will contact the authors of individual studies for additional information if required. We will use a standardised data extraction form containing the following items.

  • General information:

    • author, title, source, publication date, country, language, duplicate publications.

  • Source of data

    • (e.g., cohort, case-control, randomised trial participants, or registry data)

  • Participants

    • Participant eligibility and recruitment method (e.g., consecutive participants, location, number of centres, setting, inclusion and exclusion criteria)

    • Participant description

    • Details of treatments received

    • Study dates

  • Outcomes to be predicted

    • Definition and method for measurement of outcome

    • Was the same outcome definition (and method for measurement) used in all patients?

    • Was the outcome assessed without knowledge of the candidate predictors (i.e., blinded)?

    • Time of outcome occurrence or summary of duration of follow-up

  • Candidate predictors

    • Number and type of predictors (e.g., demographics, patient history, physical examination, additional testing, disease characteristics, tumour markers)

    • Definition and method for measurement of candidate predictors

    • Timing of predictor measurement (e.g., at patient presentation, at diagnosis, at treatment initiation)

    • Were predictors assessed blinded for outcome, and for each other (if relevant)?

    • Handling of predictors in the modelling (e.g., continuous, linear, non-linear transformations or categorised)

  • Sample size

    • Number of participants and number of outcomes/events

    • Number of outcomes/events in relation to the number of candidate predictors (Events Per Variable)

  • Missing data

    • Number of participants with any missing value (including predictors and outcomes)

    • Number of participants with missing data for each predictor

    • Handling of missing data (e.g., complete-case analysis, imputation, or other methods)

  • Model development

    • Modelling method (e.g., logistic, survival, neural networks, or machine learning techniques)

    • Modelling assumptions satisfied

    • Method for selection of predictors for inclusion in multivariable modelling (e.g., all candidate predictors, pre-selection based on unadjusted association with the outcome)

    • Method for selection of predictors during multivariable modelling (e.g., full model approach, backward or forward selection) and criteria used (e.g., p-value, Akaike Information Criterion)

    • Shrinkage of predictor weights or regression coefficients (e.g., no shrinkage, uniform shrinkage, penalized estimation)

  • Model performance

    • Calibration (calibration plot, calibration slope, Hosmer-Lemeshow test) and discrimination (C-statistic, D-statistic, log-rank) measures with confidence intervals

    • Classification measures (e.g., sensitivity, specificity, predictive values, net reclassification improvement) and whether a priori cut points were used

  • Model evaluation

    • Method used for testing model performance: development dataset only (random split of data, resampling methods, e.g., bootstrap or cross-validation, none) or separate external validation (e.g., temporal, geographical, different setting, different investigators)

    • In case of poor validation, whether model was adjusted or updated (e.g., intercept recalibrated, predictor effects adjusted, or new predictors added)

  • Results

    • Final and other multivariable models (e.g., basic, extended, simplified) presented, including predictor weights or regression coefficients, intercept, baseline survival, model performance measures (with standard errors or confidence intervals)

    • Any alternative presentation of the final prediction models, e.g., sum score, nomogram, score chart, predictions for specific risk subgroups with performance

    • Comparison of the distribution of predictors (including missing data) for development and validation datasets

  • Interpretation and Discussion

    • Interpretation of presented models (confirmatory, i.e., model useful for practice versus exploratory, i.e., more research needed)

    • Comparison with other studies, discussion of generalisability, strengths and limitations
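Two of the sample size and shrinkage items above reduce to simple arithmetic. The sketch below computes events per variable (EPV) and, as one example of uniform shrinkage, the heuristic shrinkage factor of van Houwelingen and le Cessie, s = (model χ² − df) / model χ², where model χ² is the likelihood-ratio statistic of the fitted model and df the number of candidate predictor parameters. Counting degrees of freedom rather than variables (a categorical predictor with k levels contributes k − 1) is a common convention assumed here; the function names and the example study are hypothetical.

```python
def events_per_variable(n_events: int, n_candidate_df: int) -> float:
    """Outcome events divided by candidate predictor degrees of freedom."""
    if n_candidate_df <= 0:
        raise ValueError("need at least one candidate predictor parameter")
    return n_events / n_candidate_df


def heuristic_shrinkage(model_chi2: float, n_candidate_df: int) -> float:
    """Van Houwelingen-le Cessie heuristic uniform shrinkage factor (chi2 - df) / chi2."""
    if model_chi2 <= 0:
        raise ValueError("model likelihood-ratio chi-square must be positive")
    return (model_chi2 - n_candidate_df) / model_chi2


if __name__ == "__main__":
    # Hypothetical study: 120 deaths, 15 candidate predictor parameters, LR chi-square of 60.
    print(events_per_variable(120, 15))   # 8.0
    print(heuristic_shrinkage(60.0, 15))  # 0.75
```

In this hypothetical example, an EPV of 8 falls below the frequently cited rule of thumb of 10 events per candidate predictor, and a shrinkage factor of 0.75 would mean multiplying the estimated regression coefficients by 0.75 to counter overfitting.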

Assessment of methodological quality

We will adapt the CHARMS checklist for critical appraisal of prognostic modelling studies to perform a risk of bias assessment (Moons 2014). Two review authors will independently assess the risk of bias of each study in the following domains, using the criteria listed above:

  • source of data;

  • measurement of outcomes to be predicted;

  • measurement of candidate predictors;

  • sample size;

  • missing data;

  • model development;

  • model performance;

  • model evaluation;

  • other risk of bias.

We will make a judgement for every criterion, using one of the following three categories.

  1. 'Low risk': if the criterion is adequately fulfilled in the study, i.e. the study is at a low risk of bias for the given criterion.

  2. 'High risk': if the criterion is not fulfilled in the study, i.e. the study is at high risk of bias for the given criterion.

  3. 'Unclear': if the study report does not provide sufficient information to allow for a clear judgement or if the risk of bias is unknown for one of the criteria listed above.
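Because every study receives one of three judgements in each of the nine domains, the assessments fit a simple nested mapping that can be tallied per domain, for example to draw a risk of bias summary. The sketch below is a hypothetical data structure, not an existing appraisal tool; it only checks that every domain has been judged and counts the judgements.

```python
from collections import Counter
from enum import Enum


class Judgement(Enum):
    LOW = "Low risk"
    HIGH = "High risk"
    UNCLEAR = "Unclear"


# The nine domains listed above.
DOMAINS = (
    "source of data",
    "measurement of outcomes to be predicted",
    "measurement of candidate predictors",
    "sample size",
    "missing data",
    "model development",
    "model performance",
    "model evaluation",
    "other risk of bias",
)


def summarise(assessments: dict[str, dict[str, Judgement]]) -> dict[str, Counter]:
    """Count judgements per domain across studies, rejecting incomplete or unknown domains."""
    summary = {domain: Counter() for domain in DOMAINS}
    for study, judgements in assessments.items():
        missing = set(DOMAINS) - set(judgements)
        unknown = set(judgements) - set(DOMAINS)
        if missing or unknown:
            raise ValueError(f"{study}: missing {sorted(missing)}, unknown {sorted(unknown)}")
        for domain, judgement in judgements.items():
            summary[domain][judgement] += 1
    return summary
```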

Investigation/description of heterogeneity

We will investigate and discuss clinical and statistical heterogeneity and the design aspects of included studies listed under Data extraction and data management. Special attention will be given to the year of diagnosis, because the old and new diagnostic criteria (see Background, Description of the condition) may capture different groups of patients. The definition changed from a chronic absolute lymphocytosis greater than 5.0 × 10⁹/L to an absolute count greater than 5.0 × 10⁹/L of monoclonal B cells with a CLL immunophenotype in the peripheral blood, if there is an absence of disease-related symptoms or cytopenias, or of tissue involvement other than bone marrow (Hallek 2008).

Discussing reporting deficiencies

It is widely recommended that all prognostic models be assessed for both discrimination and calibration (Collins 2015). However, numerous systematic reviews of the methodological conduct and reporting of prognostic models have shown that calibration is rarely reported and, when it is reported, it is often done poorly. We will evaluate potential reporting deficiencies.

Methods and reporting in prognostic research often do not follow current methodological recommendations, which limits the retrieval, reliability and applicability of these publications (Bouwmeester 2012; Peat 2014). There are indications that prognosis research contains many false-positive studies, which might not have been published had their results been negative. Moreover, studies developing or evaluating prognostic models are generally not prospectively registered, and usually no protocol is published (Peat 2014), which makes publication bias difficult to assess. We will use sensitive search filters to increase retrieval (Geersing 2012) and will discuss potential publication bias.

Data synthesis

In contrast to “classical” meta-analysis, in which treatment efficacy is the parameter of interest, there is no single recommended methodology for meta-analysing the predictive performance of prognostic models at the aggregate level (Debray 2012a; Debray 2014). We will therefore extract from the development and validation studies the reported predictive performance measures of each studied model, notably the model's discrimination and calibration, together with the method of evaluation (e.g. apparent performance, split sample, bootstrapping).
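One calibration measure that can be extracted or derived from aggregate data is the ratio of observed to expected events (O/E): values above 1 indicate that the model underpredicts risk, values below 1 that it overpredicts. The sketch below assumes a binary outcome at a fixed time horizon without censoring (for survival data, the observed count would instead be derived from a Kaplan-Meier estimate) and approximates the standard error of ln(O/E) by sqrt(1/O), treating O as Poisson and E as fixed. The function name and example numbers are hypothetical.

```python
import math


def observed_expected_ratio(observed_events: int, predicted_risks: list[float],
                            z: float = 1.96) -> tuple[float, float, float]:
    """Return (O/E, lower, upper) with an approximate confidence interval on the ratio scale."""
    expected = sum(predicted_risks)  # expected events = sum of individual predicted risks
    if observed_events <= 0 or expected <= 0:
        raise ValueError("need at least one observed and one expected event")
    log_oe = math.log(observed_events / expected)
    se_log_oe = math.sqrt(1.0 / observed_events)  # Poisson approximation, E treated as fixed
    return (math.exp(log_oe),
            math.exp(log_oe - z * se_log_oe),
            math.exp(log_oe + z * se_log_oe))


if __name__ == "__main__":
    # Hypothetical validation cohort: 100 patients, each with a predicted risk of 0.2, 30 events.
    ratio, lower, upper = observed_expected_ratio(30, [0.2] * 100)
    print(f"O/E = {ratio:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

Working on the log scale keeps the interval asymmetric around the ratio and gives an estimate and standard error that can be pooled across validation studies.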

Discrimination refers to the ability of a prediction model to differentiate between those who do and those who do not experience the outcome event. A model has perfect discrimination if the predicted risks for all individuals who develop the outcome are higher than those for all individuals who do not. Discrimination is commonly estimated by the so-called concordance index (c-index). The c-index reflects the probability that, for any randomly selected pair of individuals, one with and one without the outcome, the model assigns a higher probability to the individual with the outcome. The c-index is identical to the area under the receiver operating characteristic curve for models with binary endpoints, and can be generalised to time-to-event (survival) models to account for censoring.
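A minimal pure-Python sketch of Harrell's c-index, one common generalisation of the c-index to censored survival data, follows. A pair is usable when the individual with the shorter follow-up had the event; it is concordant when that individual also has the higher predicted risk, and ties in predicted risk count one half. For simplicity, pairs with tied follow-up times are skipped; this is an assumption, and established implementations treat ties in more detail. The function name is hypothetical.

```python
def harrell_c(times, events, risk_scores) -> float:
    """Harrell's c-index; a higher risk score should mean an earlier event.

    Pairs with tied follow-up times are skipped (a simplifying assumption).
    """
    if not (len(times) == len(events) == len(risk_scores)):
        raise ValueError("inputs must have equal length")
    concordant = 0.0
    usable = 0
    for i in range(len(times)):
        if not events[i]:
            continue  # a censored individual cannot be the earlier member of a usable pair
        for j in range(len(times)):
            if times[j] > times[i]:  # j was still event-free when i had the event
                usable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable
```

With times `[1, 2, 3, 4]`, events `[1, 1, 0, 1]` and risk scores `[4, 3, 2, 1]`, every usable pair is concordant and the function returns 1.0; reversing the scores returns 0.0, and a model assigning everyone the same score returns 0.5, the value expected from chance.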

If a particular model has been validated on numerous occasions, we will pool the results using meta-analysis and meta-regression with the meta-analysis packages available for the R statistical language (Debray 2014).
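The protocol plans to pool validation results in R (Debray 2014). Purely to illustrate the arithmetic, the Python sketch below pools c-statistics with a DerSimonian-Laird random-effects model on the logit scale, a transformation often recommended for c-statistics, converting standard errors by the delta method, SE(logit c) ≈ SE(c) / (c(1 − c)). The choice of estimator (DerSimonian-Laird rather than, for example, restricted maximum likelihood), the function names and the example values are assumptions, not the review's specified method. I² is reported as a descriptive measure of statistical heterogeneity.

```python
import math


def logit_c(c: float, se_c: float) -> tuple[float, float]:
    """Logit-transform a c-statistic and its standard error (delta method)."""
    return math.log(c / (1.0 - c)), se_c / (c * (1.0 - c))


def inv_logit(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def dersimonian_laird(y: list[float], se: list[float]):
    """Random-effects pooling; returns (pooled, se_pooled, tau2, i2)."""
    if len(y) < 2:
        raise ValueError("need at least two validation studies")
    w = [1.0 / s ** 2 for s in se]                            # inverse-variance weights
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))   # Cochran's Q
    df = len(y) - 1
    c_dl = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c_dl)                          # between-study variance
    w_re = [1.0 / (s ** 2 + tau2) for s in se]                # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se_pooled = math.sqrt(1.0 / sum(w_re))
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return pooled, se_pooled, tau2, i2


if __name__ == "__main__":
    # Hypothetical c-statistics and standard errors from three validation studies.
    reported = [(0.68, 0.03), (0.74, 0.02), (0.71, 0.04)]
    y, s = zip(*(logit_c(c, se_c) for c, se_c in reported))
    pooled, se_pooled, tau2, i2 = dersimonian_laird(list(y), list(s))
    print(f"pooled c = {inv_logit(pooled):.3f} "
          f"(95% CI {inv_logit(pooled - 1.96 * se_pooled):.3f} to "
          f"{inv_logit(pooled + 1.96 * se_pooled):.3f}), "
          f"tau^2 = {tau2:.4f}, I^2 = {100 * i2:.0f}%")
```

Pooling on the logit scale and back-transforming keeps the summary c-statistic and its confidence interval within 0 and 1.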


We would like to thank Andrea Will of the Cochrane Haematological Malignancies Editorial Base as well as the editors Benjamin Djulbegovic and Céline Fournier for commenting on this review. We also thank Elizabeth Royle and the Copy Edit Support team for their assistance.


Appendix 1. MEDLINE search strategy

# Searches
2. ((leuk?em$ or leu?em$ or lymph$) adj (lymphocyt$ or lymphoblast$ or linfoid$ or b-cell$)).tw,kf,ot.
3. (chronic$ or cronic$ or chroniq$ or well-differentia$).tw,kf,ot.
4. 2 and 3
5. (lymphom$ and (small cell$ or small-cell$)).tw,kf,ot.
6. (lymphom$ adj2 lymphocyt$).tw,kf,ot.
11. 1 or 4 or 10
12. (predict$ or clinical$ or outcome$ or risk$).mp.
13. validat$.mp. or predict$.ti. or rule$.mp.
14. (predict$ and (outcome$ or risk$ or models$)).mp.
15. ((history or variable$ or criteria or scor$ or characteristic$ or finding$ or factor$) and (predict$ or model$ or decision$ or identif$ or prognos$)).mp.
16. decision$.mp. and ((model$ or clinical$).mp. or LOGISTIC MODELS/)
17. (prognostic and (history or variable$ or criteria$ or scor$ or characteristic$ or finding$ or factor$ or model$)).mp.
19. validat$.mp. or predict$.ti. or DECISION SUPPORT TECHNIQUES/ or rule$.mp. or PREDICTIVE VALUE OF TESTS/
20. (predict$ and (clinical$ or identif$)).mp.
21. 19 or 20
23. (risk$ adj scores$).tw,kf,ot.
25. (risk$ adj (score$ and factor$)).tw,kf,ot.
27. (decision$ adj2 (techniqu$ or model$)).tw,kf,ot.
28. (decision$ and support$ and technique$).tw,kf,ot.
29. (prediction$ and rule$ and clinical$).tw,kf,ot.
30. (decision$ adj2 (modeling$ or aid$ or analys$ or technique$)).tw,kf,ot.
32. 11 and (18 or 21 or 31)
34. 32 not 33
35. 11 and (12 or 19 or 22 or 33)
37. 35 not 36

Appendix 2. PubMed search strategy

((((leukem*[Title/Abstract] OR leukaem*[Title/Abstract] OR leucem*[Title/Abstract] OR lymph*[Title/Abstract]) AND (lymphocyt*[Text Word] OR lymphoblast*[Text Word] OR linfoid*[Text Word] OR b-cell*[Text Word]) AND (chronic*[Text Word] OR cronic*[Text Word] OR chroniq*[Text Word] OR well-differentia*[Text Word])) OR ((lymphom*[Text Word] AND (small cell*[Text Word] OR small-cell*[Text Word])))) AND ((((predict*[Text Word] OR clinical*[Text Word] OR outcome*[Text Word] OR risk*[Text Word] OR validat*[Text Word] OR rule*) AND Text Word) OR ((predict*[Text Word] AND (outcome*[Text Word] OR risk*[Text Word] OR models*)) AND Text Word) OR (((history[Text Word] OR variable*[Text Word] OR criteria[Text Word] OR scor*[Text Word] OR characteristic*[Text Word] OR finding*[Text Word] OR factor*) AND Text Word AND (predict*[Text Word] OR model*[Text Word] OR decision*[Text Word] OR identif*[Text Word] OR prognos*)) AND Text Word) OR ((decision*[Text Word] AND (model*[Text Word] OR clinical*)) AND Text Word) OR ((prognostic[Text Word] AND (history[Text Word] OR variable*[Text Word] OR criteria*[Text Word] OR scor*[Text Word] OR characteristic*[Text Word] OR finding*[Text Word] OR factor*[Text Word] OR model*)) AND Text Word)) OR ((validat*[Text Word] OR predict*[Text Word]) OR ((predict*[Text Word] AND (clinical*[Text Word] OR identif*)) AND Text Word)) OR (((risk* AND scores*) AND Text Word) OR ((risk* AND (score* AND factor*)) AND Text Word) OR ((decision* AND (techniqu* OR model*)) AND Text Word) OR ((decision* AND support* AND technique*) AND Text Word) OR ((prediction* AND rule* AND clinical*) AND Text Word) OR ((decision* AND (modeling* OR aid* OR analys* OR technique*)) AND Text Word)))) AND ((pubstatusaheadofprint OR publisher[sb] OR pubmednotmedline[sb]))

Contributions of authors

Nicole Skoetz: Protocol development

Lise J Estcourt: Medical and content input

Karl-Anton Kreuzer: Medical and content input

Nicola Köhler: Content input

Robert Wolff: Methodological input

Gary Collins: Statistical input

Marialena Trivella: Statistical input

Karel Moons: Methodological input

Declarations of interest

Nicole Skoetz: none known

Lise J Estcourt: none known

Karl-Anton Kreuzer: Board member and consultant for AbbVie, Alexion, Amgen, Ariad, Baxter, Bayer Health Care, Biotest, Boehringer Ingelheim, Bristol Myers Squibb, Celgene, Chugai, Gilead, Glaxo-SmithKline, Grifols, Hexal, Janssen, Jazz Pharmaceuticals, Leo, Mundipharma, MSD, Novartis, Pfizer, Roche, Shire, Teva. Grants, fees, honoraria and travel grants from AbbVie, Alexion, Amgen, Ariad, Baxter, Bayer Health Care, Biotest, Boehringer Ingelheim, Bristol Myers Squibb, Celgene, Chugai, Gilead, Glaxo-SmithKline, Grifols, Hexal, Janssen, Jazz Pharmaceuticals, Leo, Mundipharma, MSD, Novartis, Pfizer, Roche, Shire, Teva. None of the mentioned relationships has a direct influence on his activities within the Cochrane group.

Robert Wolff: As an employee of Kleijnen Systematic Reviews, he was the lead author of a systematic review on prostate cancer, commissioned by Elekta, Nucletron.

Nicola Köhler: none known

Gary Collins: none known

Marialena Trivella: Her position at the UK Cochrane Centre is independent of her involvement in this review. She declares that her involvement here as an author has no related financial relationships.

Karel Moons: none known

Sources of support

Internal sources

  • University Hospital of Cologne, Department I of Internal Medicine, Germany.

External sources

  • No sources of support supplied


Parts of this review, especially the methods, are from the Cochrane Haematological Malignancies standard template.