Model for end stage liver disease for prediction of mortality in people with cirrhosis

Abstract

Objectives

This is a protocol for a Cochrane Review (prognosis). The objectives are as follows:

Primary aim

The primary aim of this systematic review is to assess the performance of the MELD, in terms of discrimination and calibration, to predict all‐cause mortality of people with cirrhosis.

PICOTS (Population, Intervention, Comparator, Outcome, Timing, Setting) format for the primary aim

The scope of the study is to provide physicians working in general practice or in hospital (outside of ICUs, which represent a highly specialist setting with a large number of complicating factors), with a reliable measure of the model's performance for prognostic information about an individual and for therapeutic decision making.

  • Target population: unselected people with cirrhosis

  • Intervention: the MELD prognostic tool

  • Comparator: none

  • Outcome: all‐cause mortality

  • Timing: 3, 6, and 12‐month (or longer) mortality predicted at any time along the disease course

  • Setting: in‐hospital or ambulatory assessment of people with cirrhosis for the risk of death

Secondary aims

The secondary aim is to assess the performance of the MELD to predict all‐cause mortality of people with cirrhosis under specific clinical conditions: TIPS, cirrhosis with alcoholic hepatitis, or OLT.

Investigation of sources of heterogeneity between studies

We will perform meta‐regression analysis to explore potential reasons for heterogeneity, according to study level and participant level characteristics.

Background

The Model for End‐stage Liver Disease (MELD) is a prognostic score used widely to prioritise people for liver transplant. Based on three objective variables, bilirubin, creatinine, and the international normalised ratio (INR, a measure of coagulation in blood), the MELD does not require any subjective assessment and seems to allow for more reproducible patient prediction compared to predictive tools that use subjective criteria (Pugh 1973). Originally, the MELD was developed to predict three‐month survival after elective transjugular intrahepatic portosystemic shunt (TIPS) for refractory variceal bleeding or for refractory ascites (Malinchoc 2000). Subsequently, its use was extended to survival prediction for cirrhosis from any aetiology (Kamath 2001; Wiesner 2003). Several studies showed a good discriminant quality of the MELD in stratifying people according to survival probability at fixed time points, with a C‐statistic (Hanley 1982) mostly in the order of 0.8 (Botta 2003; Bransdaeter 2003; Said 2004; Kamath 2007).

Given its characteristics of objective assessment and reliable patient stratification according to risk of death, the MELD was adopted for organ allocation priority for orthotopic liver transplantation (OLT) in 2002 in the USA, and in many other countries thereafter (Kamath 2007; EUROTRANSPLANT 2020; OPTN 2020). By use of a dynamic reassessment protocol, the MELD also allows a prioritisation policy of 'the sickest patient first', instead of the 'oldest patient in list first' policy previously used (EUROTRANSPLANT 2020; OPTN 2020). This has resulted in reductions in both the number of people included on waiting lists, and the number of people dropping from the waiting list because of disease progression (being too sick for treatment) or death (Freeman 2004; Wiesner 2004; Kanwal 2005; Kamath 2007; Berg 2009).

However, some studies have also pointed out that the MELD has limitations in survival prediction for several population subgroups. Typical examples of such limitations are people with hepatocellular carcinoma (Wiesner 2004), refractory ascites (Heuman 2004), acute variceal bleeding (Reverter 2014), and several others. In the setting of liver transplantation, some of these limitations have prompted exceptions to the MELD, with additional points assigned to the most severe conditions for a more proper organ allocation priority (Freeman 2006). In general, such limitations have encouraged studies to recalibrate the MELD to achieve more accurate survival predictions (Biggins 2006; Luca 2007). However, no validated recalibration of the MELD has yet entered into clinical practice to overcome MELD's limitations.

Description of the health condition and context

This review will explore the performance of the MELD when used for prediction of death in people with compensated or decompensated cirrhosis who are not in the intensive care unit (ICU) and are free of acute on chronic liver failure (ACLF), at any time in the clinical course of the disease.

Cirrhosis is the final stage of any chronic inflammatory disease of the liver following parenchymal necrosis, activated fibrogenesis, angiogenesis, and profound vascular changes. Typically, cirrhosis is classified as compensated or decompensated, based on the absence or presence (or previous history) of variceal bleeding, ascites, jaundice, or encephalopathy (Saunders 1981; Gines 1987; D'Amico 2006; Tsochatzis 2014). The significantly longer survival of people with compensated cirrhosis, usually free of symptoms and with a better quality of life than people with decompensated cirrhosis, has brought about the concept that compensated and decompensated cirrhosis are distinct clinical states of the disease (Bruno 2009; Garcia‐Tsao 2010). Further disease states have been recognised according to the presence of oesophageal varices and to the presence of only one or more disease complications (D'Amico 2006; Jepsen 2010; Zipprich 2012; Gomez 2013; D'Amico 2014).

Median survival time for compensated cirrhosis has been reported to be in the order of 12 years, compared with around two years for people with decompensated cirrhosis (D'Amico 2006). In clinical practice, prognosis of cirrhosis is established by the Child‐Pugh classification (Pugh 1973), or by the MELD (Kamath 2001; Wiesner 2003).

Description of the prognostic model

This systematic review will assess the performance of the MELD in the prediction of death for people with cirrhosis. The model is based on the values of three variables: bilirubin, creatinine, and INR. The formula to calculate the score is as follows (Malinchoc 2000; Kamath 2001; Wiesner 2003):

MELD = 9.57 * ln(creatinine) + 3.78 * ln(bilirubin) + 11.2 * ln(INR) + 6.43

The MELD allows clinicians to rank people according to increasing risk of death, and to calculate an individual's risk of death at a given time (Malinchoc 2000; Kamath 2001; Wiesner 2003).
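As an illustration only, the formula above can be evaluated with a short Python sketch. The function name `meld_score` is ours, and the units (creatinine and bilirubin in mg/dL) are an assumption based on common usage, since this protocol does not state them. No bounding or rounding of the inputs is applied, because none is described here.

```python
import math

def meld_score(creatinine: float, bilirubin: float, inr: float) -> float:
    """MELD as written in the formula above (no bounds or rounding applied).

    Assumption: creatinine and bilirubin are in mg/dL; the protocol does not state units.
    """
    if min(creatinine, bilirubin, inr) <= 0:
        raise ValueError("all inputs must be positive for the logarithm")
    return (9.57 * math.log(creatinine)
            + 3.78 * math.log(bilirubin)
            + 11.2 * math.log(inr)
            + 6.43)

# With all three values equal to 1, every log term vanishes and only the constant remains.
print(round(meld_score(1.0, 1.0, 1.0), 2))  # 6.43
```

Because each coefficient is positive, the score rises whenever any one of the three laboratory values rises while the others are held fixed.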

Health outcomes

The health outcome of interest is all‐cause mortality.

Why it is important to do this review

The MELD is currently the most widely used prognostic score for predicting all‐cause mortality in people with cirrhosis. A number of studies have reported its performance in terms of discrimination using the C‐statistic, with a 95% confidence interval (CI) yielding a range from 0.50 to 0.87. With such a sizeable variability in performance, it is important to have a summary measure of the score's discrimination ability and an indication of whether the variability can be explained. This would facilitate more appropriate clinical use, and possibly reduce the potential harms that arise from incorrect prediction of the risk of death.

Besides discrimination, it is also important to provide a summary measure of calibration for a complete assessment of the model's performance.

Objectives

Primary aim

The primary aim of this systematic review is to assess the performance of the MELD, in terms of discrimination and calibration, to predict all‐cause mortality of people with cirrhosis.

PICOTS (Population, Intervention, Comparator, Outcome, Timing, Setting) format for the primary aim

The scope of the study is to provide physicians working in general practice or in hospital (outside of ICUs, which represent a highly specialist setting with a large number of complicating factors), with a reliable measure of the model's performance for prognostic information about an individual and for therapeutic decision making.

  • Target population: unselected people with cirrhosis

  • Intervention: the MELD prognostic tool

  • Comparator: none

  • Outcome: all‐cause mortality

  • Timing: 3, 6, and 12‐month (or longer) mortality predicted at any time along the disease course

  • Setting: in‐hospital or ambulatory assessment of people with cirrhosis for the risk of death

Secondary aims

The secondary aim is to assess the performance of the MELD to predict all‐cause mortality of people with cirrhosis under specific clinical conditions: TIPS, cirrhosis with alcoholic hepatitis, or OLT.

Investigation of sources of heterogeneity between studies

We will perform meta‐regression analysis to explore potential reasons for heterogeneity, according to study level and participant level characteristics.

Methods

Criteria for considering studies for this review

We will consider studies to be eligible for this review if they meet the following criteria: publication in full with no language restrictions; inclusion of people with cirrhosis independently of disease aetiology and clinical stage; assessment of mortality; assessment of the performance of the MELD for the prediction of mortality in terms of discrimination or calibration, or both. We will not include studies published in abstract form only, because reported data are usually insufficient for appropriate description of the target population and setting, assessment of study quality, and full assessment of predictive model performance.

For a study to be included in this review, it should report at least the following information: clinical and demographic characteristics of the included participants; time at which the risk of an individual's death is predicted; clinical condition in which, or disease stage at which, the MELD was used to make the mortality prediction. Lack of this information will be the only exclusion criterion.

Types of studies

We will include prospective and retrospective cohort studies as well as randomised controlled trials. Studies based on data derived from registries will be eligible if they included consecutive participants. We will exclude case‐control studies because of the inherent risk of selection bias and because these studies do not allow for assessment of outcome incidence.

We will not include studies that assess MELD performance in critically ill people admitted to ICU or with ACLF because of the high specificity of such conditions. For the same reason, we will not include studies that assess performance of the MELD in the prediction of mortality at less than three months, since this is usually an outcome prediction time of interest for critically ill people. We will consider such studies including critically ill participants for a separate review.

Targeted population

The target population for mortality prediction is people with cirrhosis, independent of aetiology or disease stage (compensated or decompensated), observed either as in‐ or out‐patients. We will also include specific subsets of participants, such as people undergoing TIPS or OLT, or those with alcoholic hepatitis, because no modifications of the MELD are currently recommended in clinical practice for these population groups.

People admitted to ICUs or admitted with ACLF will not be a target population in this review (see above).

Predictive model

The predictive model assessed is the MELD, as described in the section 'Description of the prognostic model' above. In this systematic review, we will include all studies assessing the performance of the model, independently of whether their primary objective was to validate the MELD or to use the MELD as a comparator for other prognostic tools.

Types of outcome measures

The outcome measure of interest is discrimination and calibration of the MELD to predict all‐cause mortality.

Search methods for identification of studies

Electronic searches

We will identify studies of interest through electronic searches in the Cochrane Central Register of Controlled Trials (CENTRAL) in the Cochrane Library, PubMed, and Embase Ovid. Appendix 1 gives the search strategies with the expected time spans of the searches. The time span for the searches will be from 2000, which is the year of publication of the article reporting the development of the model (Malinchoc 2000).

Searching other resources

Two authors (DF, GP) will identify additional references by manually searching the reference lists of the retrieved studies.

Data collection and analysis

Selection of studies

Two review authors (GD, GP) will assess eligibility of studies independently; they will solve potential discordances in eligibility assessment of individual studies by discussion and, when needed, will consult a third review author. In the case of important missing data to assess study eligibility, we will ask the authors of the included studies for this information, where possible.

Data extraction and management

Review authors (GD, GP, DF, FL, AA, GP, GEMR, MM, IP, MGB) will carry out data extraction using the pilot data extraction form reported in Appendix 2, after testing the appropriateness and applicability of the form in the first 10 included studies. The pilot data extraction form will be set in accordance with the Checklist for Critical Appraisal and Data Extraction for Systematic Reviews of Prediction Modelling Studies (CHARMS) guidance (Moons 2014) and will also include any other information specifically needed for the aim of this review. Two review authors will perform data extraction independently, with the aim to assess whether inclusion and exclusion criteria were fulfilled, evaluate the risk of bias, and extract the required descriptive and quantitative data for meta‐analysis (if appropriate). The two data extractors will resolve any discrepancy in data extraction by discussion, involving a third review author if necessary. We will contact study authors whenever appropriate to retrieve relevant information missing in the available study publication.

Review authors will also extract mean values and standard deviations of the MELD, and of the major participant characteristics (demographic and clinical) to derive information of participant‐mix across studies.

Assessment of risk of bias in included studies

Review authors (GD, GP, DF, FL, AA, GP, GEMR, MM, IP, MGB) will assess methodological quality or risk of bias in the retrieved studies using the Prediction model Risk Of Bias ASsessment Tool (PROBAST) (Moons 2019). We will include relevant items in the data extraction form. For the domain 'outcome', only the question 'Time from predictors assessment to time of outcome assessment' of the tool applies to the present review, because the outcome of interest is death. Items and scoring for quality assessment according to the PROBAST tool, including applicability, are included in the last section of the data extraction form (Appendix 2). The two data extractors will resolve any discrepancy in 'Risk of bias' assessment by discussion between themselves and, if needed, with a third author.

In accordance with the PROBAST tool, we will classify studies as being at low, high, or unclear risk of bias (Moons 2019). We will then explore the impact of the quality of included studies on the summary measures of MELD performance using separate sensitivity analyses, including studies according to their risk of bias.

Predictive performance measures

Authors (GD, GP, DF, FL, AA, GP, GEMR, MM, IP, MGB) will extract discrimination and calibration measures as follows.

We will assess discrimination using the C‐statistic. We will extract relevant information from each study together with uncertainty measures (CI or standard errors).

Authors will assess calibration using the ratio of the number of observed events to the number of expected events (O:E). We will extract these figures from the articles when available, or derive them from the calibration plots. To account for non‐observed events (dropouts), we will derive the number of observed events from the corresponding Kaplan‐Meier (KM) curves when available, but we will use the number of actually observed events if the number of dropouts is small and the study does not report the KM curve.

To explore the effect of participant mix on MELD performance, we will extract the mean values and standard deviations of the MELD, its components (bilirubin, creatinine, and INR) and age.

Dealing with missing data

In the case of important missing data relevant to the MELD score, its components, major clinical and demographic characteristics, and survival status, we will ask the authors of the included studies for this information, where possible.

Two review authors (GD, AM) will approximate missing mean MELD scores and means of other relevant variables in individual studies using the median value and range, if available (Hozo 2005; Debray 2017a).

When relevant data are available, we will derive missing values for the standard error of the C‐statistic (for discrimination) or the O:E ratio (for calibration), according to Debray 2017a (with online appendix at Debray 2017b).

In validation studies not reporting the expected probability of death, we will calculate its mean value in the individual study population using the formula:

S(t) = S0(t)^exp(R ‐ R0)

where R0 is the risk score of the average participant in the series of participants included in the derivation study (Malinchoc 2000) with a value of 1.127, and R is the mean value of the MELD in the validation study. S0(t), or the estimated survival probability for an average participant in the derivation study, was given in the reporting article (Table 5, Malinchoc 2000) as follows: S0 (3 months) = 0.707; S0 (6 months) = 0.621; S0 (1 year) = 0.551; S0 (2 years) = 0.428.

This will allow us to calculate the O:E ratio in those studies not reporting it, if the number of observed deaths or the corresponding KM curve is available.
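The calculation described above can be sketched in Python as follows. This is a minimal illustration, not the review's analysis code: the function names and horizon labels are ours, and the baseline survival is read as being raised to the power exp(R ‐ R0). The sketch only requires that R is expressed on the same scale as R0; whether a reported mean MELD needs rescaling to that scale is left as an explicit assumption for the user to check.

```python
import math

# Baseline survival S0(t) of the average derivation-study participant
# (values quoted in the text from Table 5, Malinchoc 2000).
S0 = {"3m": 0.707, "6m": 0.621, "1y": 0.551, "2y": 0.428}
R0 = 1.127  # risk score of the average derivation-study participant

def expected_survival(r: float, horizon: str) -> float:
    """S(t) = S0(t) ** exp(R - R0); r must be on the same scale as R0 (assumption)."""
    return S0[horizon] ** math.exp(r - R0)

def oe_ratio(observed_deaths: float, n: int, r: float, horizon: str) -> float:
    """Observed deaths divided by expected deaths, with E = n * (1 - S(t))."""
    expected_deaths = n * (1.0 - expected_survival(r, horizon))
    return observed_deaths / expected_deaths

# A study whose average participant matches the derivation cohort
# reproduces the baseline survival at each horizon.
print(expected_survival(R0, "3m"))
```

A higher mean risk score than R0 lowers the expected survival and therefore raises the expected number of deaths, which in turn lowers the O:E ratio for a fixed number of observed deaths.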

Assessment of heterogeneity

To account for heterogeneity, review authors (GD, AM) will perform meta‐analyses, where appropriate, using a random‐effects model. We will calculate the amount of between‐study heterogeneity using Tau² (as a measure of the between‐study variance) and I² (as a measure of the variation in model performance attributable to heterogeneity).

If we find important heterogeneity, we will calculate a prediction interval either for the C‐statistic (discrimination) or the O:E ratio (calibration) to give a range for expected MELD performance in new patient samples (Debray 2017a).
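To make these quantities concrete, the sketch below computes Tau², I², a random‐effects pooled estimate, and an approximate prediction interval from study‐level estimates on the transformed scale. It uses the DerSimonian‐Laird moment estimator for Tau² purely because it has a closed form; this is a swapped‐in simplification and not necessarily the estimator the review will use. The function name and the `t_crit` parameter (the t critical value with k ‐ 2 degrees of freedom, supplied by the caller) are hypothetical.

```python
import math

def random_effects_summary(estimates, std_errors, t_crit):
    """Pool study estimates (e.g. logit(C) or ln(O:E)) with a random-effects model.

    t_crit: two-sided critical value of the t distribution with k - 2 degrees
    of freedom, supplied by the caller (hypothetical parameter for this sketch).
    """
    k = len(estimates)
    if k < 3 or k != len(std_errors):
        raise ValueError("need at least three studies with matching standard errors")

    # Inverse-variance (fixed-effect) weights and Cochran's Q.
    w = [1.0 / se ** 2 for se in std_errors]
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    df = k - 1

    # DerSimonian-Laird moment estimate of the between-study variance (Tau2).
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)

    # I2 expressed as a proportion between 0 and 1.
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0

    # Random-effects pooled estimate and its standard error.
    w_re = [1.0 / (se ** 2 + tau2) for se in std_errors]
    pooled = sum(wi * yi for wi, yi in zip(w_re, estimates)) / sum(w_re)
    se_pooled = math.sqrt(1.0 / sum(w_re))

    # Approximate prediction interval for performance in a new sample.
    half_width = t_crit * math.sqrt(tau2 + se_pooled ** 2)
    return {
        "tau2": tau2,
        "i2": i2,
        "pooled": pooled,
        "se_pooled": se_pooled,
        "prediction_interval": (pooled - half_width, pooled + half_width),
    }
```

All outputs are on the scale of the inputs, so a pooled logit(C) or ln(O:E) and its interval limits would still need back‐transforming before being reported as a C‐statistic or an O:E ratio.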

Assessment of reporting deficiencies

If we include more than 10 studies, we plan to assess reporting bias using contour‐enhanced funnel plots of the log performance (respectively the C‐statistic for discrimination and the O:E ratio for calibration) against its standard error, to disclose any potential effect of small studies on performance assessment (Riley 2019). Two authors (GD, AM) will conduct this analysis.

Data synthesis

Two review authors (GD, AM) will carry out the data synthesis work described in the following sections.

Data synthesis and meta‐analysis approaches

If we consider the identified studies to be sufficiently comparable, we will carry out meta‐analysis of the C‐statistic for discrimination and the O:E ratio for calibration. We plan to use a restricted maximum‐likelihood (REML) random‐effects model to allow for the expected heterogeneity across studies (Debray 2017a; Debray 2019; Riley 2019).

For model discrimination, we will use the logit transformation of the C‐statistic, logit(C) = ln(C/(1 ‐ C)), from each study to improve the validity of the normality assumption (Snell 2018). We will calculate the standard error (SE) of logit(C) by the delta method when the upper and lower CI boundaries are not available (Debray 2017a; Debray 2019). Otherwise, when the CI is available, we will calculate it as (logit(Cub) ‐ logit(Clb))/(2 x 1.96), where Cub is the upper boundary and Clb is the lower boundary of the 95% CI of the C‐statistic (Debray 2019).
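A minimal Python sketch of this transformation is given below. The function names are ours. The CI branch divides the difference between the logits of the upper and lower 95% boundaries by 2 x 1.96. The delta‐method branch assumes that the study reports a standard error for C on its original scale, which is our assumption about the available input and is not stated in the text.

```python
import math

def logit(p: float) -> float:
    """ln(p / (1 - p)) for 0 < p < 1."""
    return math.log(p / (1.0 - p))

def logit_c_with_se(c, ci_lower=None, ci_upper=None, se_c=None):
    """Return (logit(C), SE of logit(C)).

    Uses the 95% CI boundaries when both are given; otherwise falls back to
    the delta method, which needs SE(C) on the original scale (assumption).
    """
    if ci_lower is not None and ci_upper is not None:
        se = (logit(ci_upper) - logit(ci_lower)) / (2 * 1.96)
    elif se_c is not None:
        # Delta method: the derivative of logit(C) is 1 / (C * (1 - C)).
        se = se_c / (c * (1.0 - c))
    else:
        raise ValueError("need either both CI boundaries or SE(C)")
    return logit(c), se

# Example: C = 0.80 with 95% CI 0.70 to 0.90.
value, se = logit_c_with_se(0.80, ci_lower=0.70, ci_upper=0.90)
print(round(value, 3), round(se, 3))
```

Working on the logit scale keeps back‐transformed interval limits inside the range 0 to 1, which is one practical reason for preferring it to pooling C directly.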

For calibration, we will perform meta‐analysis of the O:E ratio at the time of interest using the log scale:

ln(O:E) = ln(O) ‐ ln(E), with SE(ln(O:E)) = sqrt((1 ‐ PO)/O)

where PO is the observed probability of death and O is the number of observed deaths at the time of interest (Debray 2017a; Debray 2019).
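As a worked illustration, the sketch below computes ln(O:E) and its standard error for one study. The standard error is taken as the square root of (1 ‐ PO)/O, which is how we read the expression above. The function name is ours, and PO is assumed here to be simply O divided by the number of participants, which ignores censoring.

```python
import math

def log_oe_with_se(observed: float, expected: float, n: int):
    """Return (ln(O:E), SE of ln(O:E)) for one study.

    Assumption: the observed probability of death PO is observed / n.
    """
    if observed <= 0 or expected <= 0 or n <= 0:
        raise ValueError("observed, expected and n must all be positive")
    p_o = observed / n
    log_oe = math.log(observed) - math.log(expected)
    se = math.sqrt((1.0 - p_o) / observed)
    return log_oe, se

# Example: 20 observed deaths against 25 expected among 100 participants.
log_oe, se = log_oe_with_se(20, 25, 100)
print(round(log_oe, 3), round(se, 3))  # -0.223 0.2
```

A negative ln(O:E), as in this example, means fewer deaths were observed than the model predicted, so the model overestimated risk in that study.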

We plan to perform separate meta‐analyses in terms of discrimination and calibration for MELD performance in the prediction of mortality at three, six, and 12 months. We will include studies that report MELD performance at intermediate time intervals in the nearest group in terms of time of prediction. We will include studies with time of prediction longer than two years in a separate group.

Subgroup analysis and investigation of heterogeneity

We plan to explore differences in participant characteristics, predictor measurements, and methodological aspects between studies as potential reasons for heterogeneity.

To identify potential sources of heterogeneity, we plan to compare the MELD performance in the whole study population with that in subgroups of studies that included participants with the following characteristics:

  • people with only compensated cirrhosis;

  • people with only decompensated cirrhosis;

  • people undergoing TIPS;

  • people on a waiting list for OLT;

  • people with acute alcoholic hepatitis;

  • people with viral aetiology of cirrhosis;

  • people with alcoholic aetiology.

We will use meta‐regression analyses if we include at least 10 studies in the meta‐analysis, in order to identify reasons for heterogeneity. In meta‐regression, the performance measure, respectively logit(C) for discrimination and ln(O:E) for calibration, will be the dependent variable, and the following candidate causes of heterogeneity will be the covariates (independent variables): mean MELD, age, and their standard deviations; Child‐Pugh score (Pugh 1973); mean bilirubin; and percentage of people with ascites.

Sensitivity analysis

We plan to perform two sensitivity analyses: by excluding studies at high risk of bias according to the PROBAST tool, and by excluding studies with fewer than 30 deaths.

Conclusions and summary of findings

We will show the study flow across the review process in a graph, summarising the study search results as the numbers of retrieved studies, eligible studies, excluded studies, included studies, and studies included in the meta‐analysis. We will show study characteristics in tables reporting adequate information on study level and participant level characteristics. We will summarise the risk of bias according to the PROBAST tool and present relevant graphs. GRADE tools are available for overall prognosis studies (Iorio 2015), and for single prognostic factors (Foroutan 2020), but not for studies of performance of prognostic models. We are therefore planning to assess the certainty of the evidence of the included studies through their risk of bias, as assessed by the PROBAST tool. However, we will use GRADE as soon as a specific tool for prognostic score performance studies becomes available. We will summarise meta‐analysis results in tables and forest plots, with separate tables and graphs for subgroup analyses.

We will also show meta‐regression analyses in summary tables and regression graphs. We will base our study conclusions on the summary estimates of discrimination and calibration of the MELD shown by the relevant meta‐analyses. Analysis of heterogeneity and its potential explanation will address suggestions about different model performance expectancies according to participants' characteristics or disease stages, possibly supported by appropriate prediction intervals.