
Histologic scoring indices for evaluation of disease activity in ulcerative colitis


Abstract

Background

In patients with ulcerative colitis (UC), disease activity can be determined using clinical, endoscopic or histologic criteria. Persistent disease activity is associated with poor outcomes. Histologic disease activity has been shown to be associated with relapse, colectomy and colorectal cancer. The ability to objectively assess microscopic disease activity by histology is important for both clinical practice and clinical trials. However, the operating properties of currently available histologic indices remain unclear.

Objectives

We conducted a systematic review to identify and evaluate the development and operating characteristics of histologic disease activity indices used to assess disease activity in patients with UC.

Search methods

We searched MEDLINE, EMBASE, PubMed, CENTRAL and the Specialized Trials Register of the Cochrane Inflammatory Bowel Disease and Functional Bowel Disorders Group (IBD/FBD Group) from inception to 2 December 2016 for relevant studies. No language or document type restrictions were applied.

Selection criteria

Any study design (e.g. randomised controlled trials, cohort studies, case series) that evaluated a histologic index in patients with UC was considered for inclusion. Eligible patients were adults (> 18 years) diagnosed with UC using conventional clinical, radiographic, endoscopic and histologic criteria.

Data collection and analysis

Two review authors (MHM and CEP) independently screened the titles and abstracts of studies identified by the literature search. A standardized form was used to assess trial eligibility for inclusion and to extract data.

Two authors (MHM and CEP) independently extracted and recorded data, including the number of patients enrolled, the number of patients per treatment arm, patient characteristics (including age and sex distribution) and the name of the histologic index. The outcomes recorded for each study were intra‐rater reliability, inter‐rater reliability, internal consistency, content validity, criterion validity, construct validity, responsiveness and feasibility.

Main results

Through the review process, we identified 126 reports describing 30 scoring indices. Eleven of the 30 scoring indices have undergone some form of index validation. Intra‐rater reliability was assessed for eight scoring indices. Inter‐rater reliability was assessed for 11 scoring indices. Three of the indices underwent content validation. Two of the included scoring indices were assessed for criterion validity. Six of the included scoring indices were explored for construct validity. Two of the included scoring indices were assessed for responsiveness.

Authors' conclusions

The Nancy Index and the Robarts Histopathology Index have been the most extensively validated for four operating properties: reliability, content validity, construct validity (hypothesis testing) and criterion validity. However, none of the currently available histologic scoring indices has been fully validated. Further research is required to determine the optimal endpoint for histologic healing in UC. The optimal index should be fully validated.


Plain language summary

Histologic measurement tools for assessing disease in patients with ulcerative colitis

What is ulcerative colitis?

Ulcerative colitis is a lifelong (chronic) inflammatory bowel disease that causes inflammation and ulceration (sores) of the large intestine (colon). People with ulcerative colitis often have diarrhoea, bloody stools, weight loss and abdominal pain. When patients have symptoms, the disease is considered "active"; when symptoms are absent, the disease is considered to be "in remission".

What is a histologic scoring index?

Colonoscopy is a non‐surgical procedure used to view the large intestine. During colonoscopy, a tissue sample (biopsy) can be taken from the patient's colon. The sample is then placed (mounted) on a glass slide and examined under a microscope. A histologic scoring index is a system that uses this tissue sample to grade the severity of the patient's disease.

What did the researchers investigate?

Histologic scoring indices need to be valid, meaning that they accurately measure what they are supposed to measure. The researchers identified the histologic scoring indices that have been validated.

What did the researchers find?

The researchers found that 11 of the 30 existing histologic scoring indices have been partially validated. The Nancy Index and the Robarts Histopathology Index have been more extensively validated than the other nine indices. However, none of the currently available histologic scoring indices has been fully validated. More research is needed to determine the ideal index for measuring histologic healing in UC. The ideal index should be fully validated.

Authors' conclusions

Implications for methodological research

Although several of the histologic indices identified have undergone some aspects of validation testing, none of the measures identified by this systematic review has been fully validated. More research is required to determine the optimal endpoint for histologic healing in UC.

Background

Ulcerative colitis (UC) is a chronic inflammatory bowel disease (IBD) of unknown etiology that occurs in genetically predisposed individuals. The spectrum of disease in UC ranges from mild diarrhoea to fulminant colitis requiring hospitalisation and possibly emergent colectomy. Patients with UC typically present with bloody diarrhoea and abdominal cramps. Patients with chronically uncontrolled disease can develop long‐term complications such as colorectal cancer or the need for colectomy (Abraham 2009). To avoid such complications, pharmacologic treatments are initiated with the goal of inducing and maintaining clinical and endoscopic remission. These treatments include oral or rectal aminosalicylates (or both), corticosteroids, immunosuppressives, tumour necrosis factor‐alpha antagonists and anti‐integrin therapy (Cohen 2002; Baumgart 2007). In clinical trials, investigators rely on composite scores to assess disease activity. These scores frequently involve determining the presence or absence of colonic inflammation by standard colonoscopy (Truelove 1955). Microscopic inflammation, or histologic activity, has been linked to poor overall outcomes. Even so, investigators have not adopted it as a routine outcome in clinical trials. The main reasons are the lack of a gold standard, the incremental costs and uncertainty about its practical value (Pineton de Chambrun 2010; Ardizzone 2011).

Histology plays an important role in diagnosing UC and can serve as a tool to determine response to therapy. Multiple histologic scoring systems have been developed to quantify colonic micro‐inflammation, either categorically or numerically, in tissue samples obtained during colonoscopy. These tools grade three main features: the degree of acute or chronic inflammatory cell infiltration, the presence or absence of architectural distortion of colonic crypts, and the integrity of the colonic epithelium.

The first histologic index used in clinical trials for UC was the Truelove and Richards index (Truelove 1956). It was followed by the Matts Score, described in 1961. The Matts Score was first applied to 126 serial biopsies and showed a direct relationship between endoscopic and histologic activity (Matts 1961). Similarly, the Watts Score was used to demonstrate that a preserved mucosal vascular pattern almost always indicates microscopically inactive disease (Watts 1966).

The Initial Riley Score was described in 1988 as an evaluative instrument for determining mucosal healing in a randomized controlled trial (RCT). This trial compared delayed‐release mesalamine with enteric‐coated sulfasalazine and placebo as maintenance therapy for clinically quiescent UC. In this study, two blinded pathologists independently graded inflammation on a five‐level scale, which subjectively categorizes tissue samples by the degree of chronic inflammation and tissue destruction (Riley 1988). Riley 1991 subsequently described the widely used Riley Score in a clinical trial designed to predict relapse in clinically and endoscopically quiescent UC. This score incorporates six histologic features commonly used to determine disease activity:

- acute inflammatory cell infiltrate (neutrophils in the lamina propria);
- crypt abscesses;
- mucin depletion;
- surface epithelial integrity;
- chronic inflammatory cell infiltrate (round cells in the lamina propria);
- crypt architectural irregularities.

Each feature is graded on a four‐point scale as none, mild, moderate or severe (Riley 1991). Feagan 2005 modified this score and used it as an outcome in a large multicenter randomised placebo‐controlled trial of vedolizumab for the treatment of active UC. The Modified Riley Score removed features of chronicity that were thought to be unresponsive to change (Feagan 2005).

The Geboes Score, a commonly used histologic index for UC, was developed in 2000 using a multivariate regression model which resulted in an index composed of seven categories (Geboes 2000).

The Chicago Index, also known as the Rubin Index, has not been widely used in clinical practice or clinical trials because it has only been reported in abstract form. This six‐point histologic index was used in a case‐control study in which two pathologists, blinded to clinical information, graded UC inflammation. In that study, histologic disease activity was found to be an independent risk factor for colonic neoplasia. A subsequent analysis confirmed an association between histologic activity and an increased risk of future colectomy and hospitalization (Rubin 2007). The Riley Index has also been shown to correlate directly with the risk of neoplasia. The same was seen with the Harpaz Index, also known as the Mount Sinai Index or Fiel Index (Fiel 2003).

Why it is important to do this review

There are few data available on the operating properties of existing histologic scoring indices despite widespread use in clinical trials. This review will evaluate the relative merits of histologic indices that have undergone validation testing in order to underscore where further research is needed.

Objectives

The primary objective is to systematically review the current literature describing the development and operating characteristics of histologic disease activity indices in UC.

Methods

Criteria for considering studies for this review

Types of studies

Any study design (e.g. randomised controlled trials, cohort studies, case series) that evaluates a histologic index in patients with UC was considered for inclusion. Study subjects included adult patients (> 18 years), diagnosed with UC using conventional clinical, radiographic, histologic and endoscopic criteria.

Types of data

Histologic scoring data obtained from eligible studies were considered for inclusion.

Types of methods

The methods used to construct and validate each histologic index (e.g. reliability, validity, responsiveness and feasibility) were examined in detail and described for each eligible study. For each study, we also reported the number of pathologists who scored the histologic index. We also recorded whether these pathologists were blinded to, or aware of, other raters' scores.

Types of outcome measures

Reliability: Reliability was assessed by recording reports of intra‐rater and inter‐rater reliability, test‐retest reliability, or internal consistency. Intra‐rater and inter‐rater reliability are assessed by calculating the intraclass correlation coefficient (ICC) or kappa statistic. Intra‐rater reliability uses repeat assessments made by the same rater; inter‐rater reliability uses assessments made by different raters. The Landis and Koch criteria were used to interpret the ICC and kappa values (Landis 1977):

- ≤ 0.20: 'slight';
- 0.21 to 0.40: 'fair';
- 0.41 to 0.60: 'moderate';
- 0.61 to 0.80: 'substantial';
- 0.81 to 1.00: 'almost perfect'.
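The sketch below is a minimal illustration of how a kappa estimate and the Landis and Koch labels fit together. It computes unweighted Cohen's kappa for two raters and maps an estimate to its descriptive category. The function names and the biopsy grades in the usage example are hypothetical. Weighted kappa is not covered here. The 'poor' label for negative values comes from Landis and Koch, not from this review.

```python
from collections import Counter


def landis_koch(estimate: float) -> str:
    """Return the Landis and Koch (1977) label for a kappa or ICC estimate."""
    if estimate < 0:
        return "poor"  # Landis and Koch label for agreement below chance
    if estimate <= 0.20:
        return "slight"
    if estimate <= 0.40:
        return "fair"
    if estimate <= 0.60:
        return "moderate"
    if estimate <= 0.80:
        return "substantial"
    return "almost perfect"


def cohen_kappa(grades_a, grades_b):
    """Unweighted Cohen's kappa for two raters grading the same biopsies."""
    if len(grades_a) != len(grades_b) or not grades_a:
        raise ValueError("both raters must grade the same non-empty set of biopsies")
    n = len(grades_a)
    # Observed agreement: share of biopsies that received the same grade.
    p_observed = sum(a == b for a, b in zip(grades_a, grades_b)) / n
    # Chance agreement expected from each rater's marginal grade frequencies.
    freq_a, freq_b = Counter(grades_a), Counter(grades_b)
    p_chance = sum(freq_a[g] * freq_b[g] for g in freq_a) / n**2
    if p_chance == 1:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical grades (0 to 3) given by two pathologists to eight biopsies.
rater_1 = [0, 1, 2, 2, 3, 1, 0, 2]
rater_2 = [0, 1, 2, 3, 3, 1, 1, 2]
kappa = cohen_kappa(rater_1, rater_2)
print(round(kappa, 2), landis_koch(kappa))  # → 0.67 substantial
```

In this example the raters agree on 6 of 8 biopsies (0.75). Chance agreement from the marginal frequencies is 0.25, which gives a kappa of 2/3.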

Validity: Each study was assessed to determine whether validity was measured, broadly defined as evidence that variations in UC activity causally produce variations in the index measurement outcomes. Studies were reviewed for whether content validity, criterion validity, and construct validity for histologic index scores in specific clinical situations were reported.

Studies attempting to demonstrate content validity are successful if the components of the histologic index are sufficient to measure disease activity in UC. Generally, content validation is qualitatively assessed. For example, an expert panel may be asked to give an opinion on face validity, or a systematic review of the literature may be conducted to support the development of an index.

Criterion validity is considered established if the index adequately reflects true UC disease activity when assessed against a gold standard measurement. Unfortunately, there is no single gold standard for assessing histologic activity in UC. This limited, but did not prevent, this kind of assessment. We assessed the statistical parameters reported for agreement between the histologic index and objective biomarkers. These included sensitivity, specificity, receiver operating characteristic (ROC) curves, area under the curve (AUC), mean difference, weighted kappa, Spearman's r squared and the ICC. We also recorded data from studies of predictive criterion validity. These studies assess whether the score predicts future true UC activity or sequelae, such as surgery or disability.
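The review notes that there is no single gold standard, so any criterion‐validity calculation must assume a reference classification. The sketch below is hypothetical in that sense. It treats a binary 'active disease' label as the reference. It then dichotomizes index scores at an assumed threshold to obtain sensitivity and specificity. Finally, it estimates the AUC as the probability that an active biopsy scores higher than an inactive one (the Mann‐Whitney formulation, with ties counted as one half). The scores, labels and threshold are illustrative only.

```python
def sensitivity_specificity(scores, active, threshold):
    """Sensitivity and specificity when scores >= threshold are called 'active'."""
    tp = sum(1 for s, a in zip(scores, active) if a and s >= threshold)
    fn = sum(1 for s, a in zip(scores, active) if a and s < threshold)
    tn = sum(1 for s, a in zip(scores, active) if not a and s < threshold)
    fp = sum(1 for s, a in zip(scores, active) if not a and s >= threshold)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need both active and inactive reference cases")
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(scores, active):
    """AUC as P(active score > inactive score), counting ties as one half."""
    pos = [s for s, a in zip(scores, active) if a]
    neg = [s for s, a in zip(scores, active) if not a]
    if not pos or not neg:
        raise ValueError("need both active and inactive reference cases")
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical index scores and an assumed reference label (True = active disease).
scores = [0, 2, 5, 7, 9, 6, 1, 3]
active = [False, False, True, True, True, False, False, True]
print(sensitivity_specificity(scores, active, threshold=4))  # → (0.75, 0.75)
print(roc_auc(scores, active))  # → 0.875
```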

Studies that reported on the construct validity of the histologic indices were included in the current review. Construct validity takes into account the lack of a gold standard for disease activity. It assesses whether histologic index scores are consistent with other hypotheses about true disease activity. For example, correlations between the histologic index and clinical and endoscopic indices were recorded.
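Construct validity is often examined through rank correlation with clinical or endoscopic indices. The sketch below is a minimal Spearman correlation written without external libraries, using average ranks for ties. The paired scores are hypothetical and are not drawn from any included study.

```python
def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1  # convert 0-based positions to ranks
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need at least two paired observations")
    return pearson(average_ranks(x), average_ranks(y))


histologic = [0, 1, 1, 3, 4, 2]       # hypothetical histologic index scores
mayo_endoscopic = [0, 0, 1, 2, 3, 2]  # hypothetical Mayo endoscopic subscores
print(round(spearman(histologic, mayo_endoscopic), 2))  # → 0.94
```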

Responsiveness: The ability of the index to detect change following a period of known histologic change (e.g. after a treatment of known efficacy is administered) serves as an assessment of responsiveness. Responsiveness can be quantified by examining the correlation between mean change scores between indices and indicators of effect size or its functions (Zou 2005), or the use of ROC curves to describe how well various score changes can distinguish between improved and unimproved patients (Deyo 1991).
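Two commonly used responsiveness statistics are the effect size and the standardized response mean. The effect size is the mean change divided by the standard deviation of baseline scores. The standardized response mean is the mean change divided by the standard deviation of the change scores. The sketch below computes both for paired baseline and post‐treatment scores. It is not presented as the specific method of Zou 2005 or Deyo 1991. It assumes that a higher index score means more active disease, and the scores are hypothetical.

```python
from statistics import mean, stdev


def responsiveness_stats(baseline, follow_up):
    """Return (effect size, standardized response mean) for paired scores.

    Improvement is computed as baseline minus follow-up, assuming that a higher
    histologic score means more active disease.
    """
    if len(baseline) != len(follow_up) or len(baseline) < 2:
        raise ValueError("need at least two paired baseline/follow-up scores")
    improvement = [b - f for b, f in zip(baseline, follow_up)]
    effect_size = mean(improvement) / stdev(baseline)  # change relative to baseline spread
    srm = mean(improvement) / stdev(improvement)       # change relative to its own spread
    return effect_size, srm


# Hypothetical index scores before and after a treatment of known efficacy.
before = [10, 12, 8, 15, 9]
after = [4, 6, 5, 7, 6]
es, srm = responsiveness_stats(before, after)
print(f"effect size {es:.2f}, SRM {srm:.2f}")  # → effect size 1.87, SRM 2.40
```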

Feasibility: Feasibility was assessed as rater evaluation of the ease of administration and time required for scoring.

Search methods for identification of studies

Electronic searches

We searched MEDLINE (Ovid), EMBASE (Ovid), PubMed, the Cochrane Library (CENTRAL), and the Cochrane IBD Group Specialized Trials Register from inception to December 2, 2016 for applicable studies. No language or document type restrictions were applied. The search strategies are listed in Appendix 1.

Searching other resources

We performed a manual review of bibliographies and abstracts submitted to major gastroenterology meetings (2000 to present) including:

1. Digestive Disease Week (DDW);
2. United European Gastroenterology Week (UEGW); and
3. European Crohn's and Colitis Organization (ECCO) annual conference.

Reference lists from retrieved articles were scanned to identify additional citations that may have been overlooked by the database search.

Data collection and analysis

Selection of studies

Two authors (MHM and CEP) independently reviewed the titles and abstracts of the studies identified by the literature search. The full texts of potentially relevant citations were reviewed for inclusion, and study investigators were contacted to clarify any unclear data. Any disagreements were resolved by discussion and consensus with a third author (BGL).

A standardized form was used to assess eligibility of trials for inclusion in the study based on the inclusion criteria outlined above.

Data extraction and management

A standardized form was used to extract information from selected studies. Two authors (MHM, CEP) independently extracted and recorded data. The following data were recorded from each eligible study:
a) Number of patients enrolled, number of patients per treatment arm;
b) Patient characteristics: age and gender distribution;
c) Histologic Index used; and,
d) Outcomes: measures of intra‐rater reliability; inter‐rater reliability; responsiveness; validity; feasibility; construct validity; criterion validity.

Assessment of risk of bias in included studies

We used the following criteria to appraise the risk of bias of included studies:

  • Blinding to clinical information

  • Independent observation by pathologists

Blinding to clinical information, such as symptoms, physical examination or laboratory results, is important for the objective assessment of histologic data. Independent observation is essential for confidence in the inter‐rater reliability coefficients.

We also assessed the methodological quality of the included studies using the COSMIN (COnsensus‐based Standards for the selection of health Measurement INstruments) checklist. The checklist covers 10 measurement properties: internal consistency, reliability, measurement error, content validity, structural validity (factor analysis), hypothesis testing, cross‐cultural validity, criterion validity, responsiveness to change and interpretability. Each property is rated on a four‐point scale (1 = poor, 2 = fair, 3 = good, 4 = excellent). The overall score for an individual measurement property is the lowest score given to any item in that property's box. For example, if any item in the box is rated 'poor', the overall score for that property is 'poor'.
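The 'worst score counts' rule described above amounts to taking the minimum item rating within a property box. The sketch below is minimal; the function name and the item ratings in the example are hypothetical.

```python
RATING_LABELS = {1: "poor", 2: "fair", 3: "good", 4: "excellent"}


def cosmin_property_rating(item_ratings):
    """Overall rating for one measurement property: the lowest item rating counts."""
    if not item_ratings:
        raise ValueError("a property box needs at least one rated item")
    if any(r not in RATING_LABELS for r in item_ratings):
        raise ValueError("item ratings must be 1 (poor) to 4 (excellent)")
    return RATING_LABELS[min(item_ratings)]


# Hypothetical item ratings for two property boxes of one study.
print(cosmin_property_rating([4, 3, 4, 4]))  # → good
print(cosmin_property_rating([4, 4, 1, 3]))  # → poor
```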

Measures of the effect of the methods

Descriptive statistics were used to report the validation outcome data. Frequencies and percentages were shown for categorical variables.

Dealing with missing data

Where possible, authors were contacted to provide any missing information.

Results

Description of studies

Results of the search

In total, the search strategies retrieved 6036 results. After 2615 duplicates were excluded, 3421 citations were assessed for eligibility. Of these, 3295 were deemed non‐applicable, leaving 126 reports of 30 histologic scoring indices. Nineteen of the histologic scoring indices (described in 44 reports) were excluded because they lacked validation testing. Eleven partially validated histologic scoring indices (described in 82 reports) were included in the systematic review (see Figure 1).
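The counts above can be checked for internal consistency with a few lines of arithmetic. All figures are copied from this paragraph; the function name is illustrative only.

```python
def search_flow(retrieved, duplicates, not_applicable,
                excluded_indices, excluded_reports,
                included_indices, included_reports):
    """Recompute the study-flow totals and check that the groups add up."""
    screened = retrieved - duplicates    # citations assessed for eligibility
    reports = screened - not_applicable  # reports describing histologic indices
    if excluded_reports + included_reports != reports:
        raise ValueError("excluded + included reports do not match the total")
    return {
        "screened": screened,
        "reports": reports,
        "indices": excluded_indices + included_indices,
    }


print(search_flow(retrieved=6036, duplicates=2615, not_applicable=3295,
                  excluded_indices=19, excluded_reports=44,
                  included_indices=11, included_reports=82))
# → {'screened': 3421, 'reports': 126, 'indices': 30}
```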


Figure 1. Study flow diagram.

Included studies

The eleven partially validated histologic scoring indices (Feagan 2005; Fiel 2003; Geboes 2000; Gomes 1986; Jauregui‐Amezaga 2016; Marchal‐Bressenot 2017; Mosli 2017; Riley 1991; Rubin 2007; Theede 2015; Truelove 1956) are described in the Characteristics of included studies tables and Table 1.

Table 1. Indices that have been fully or partially validated

No. | Reference | Index
1 | Feagan 2005 | Modified Riley Score
2 | Fiel 2003 | Harpaz/Mount Sinai Index
3 | Geboes 2000 | Geboes Score
4 | Gomes 1986 | Gomes Index
5 | Jauregui‐Amezaga 2016 | Simplified Geboes Score
6 | Marchal‐Bressenot 2017 | Nancy Index
7 | Mosli 2017 | Robarts Histopathology Score
8 | Riley 1991 | Riley Score
9 | Rubin 2007 | Chicago/Rubin/Histologic Inflammation Activity Scale
10 | Theede 2015 | Modified Harpaz Index
11 | Truelove 1956 | Truelove and Richards Index

Setting

Validation testing of four of the scoring indices (the Geboes Score, Modified Riley Score, Nancy Index and Robarts Histopathology Index) was based on retrospectively collected data. In Mosli 2017, five pathologists assessed 154 biopsy slides three times using the Geboes and Modified Riley Scores. These slides had previously been collected during an RCT of MLN02 (Feagan 2005). Subsequently, four pathologists read 50 biopsy slides from the same RCT three times using the Geboes and Modified Riley Scores, this time with standardized scoring conventions applied. In the responsiveness phase of Mosli 2017, a single central reader assessed 154 pairs of slides using the Geboes Score, the Modified Riley Score and the newly created Robarts Histopathology Index. Each pair consisted of a baseline biopsy and a week 4 or week 6 post‐treatment biopsy.

The validation of the Nancy Index employed retrospectively collected cohort data. During the reliability testing phase, three pathologists scored 100 biopsies using the Geboes Score and the newly created Nancy Index (Marchal‐Bressenot 2017). To test for responsiveness, 30 pairs of biopsy slides (that showed a histologic change) were assessed by a single expert pathologist according to the Geboes Score and Nancy Index.

Validation testing of seven of the scoring indices (Geboes Score, Gomes Index, Simplified Geboes Score, Riley Score, Chicago Index, Modified Harpaz Index and Truelove and Richards Index) was done using prospectively collected data. The development and initial reliability testing of the Geboes Score was conducted with prospective RCT data. In this study, 99 biopsies (68 obtained from inflamed mucosa and 31 from endoscopically non‐inflamed mucosa) were assessed on two occasions by three pathologists (Geboes 2000).

The remaining six scoring indices were partially validated using prospectively collected observational data.

- Gomes Index: a single pathologist performed criterion validation using biopsies from 28 UC patients undergoing routine colonoscopy.
- Geboes Score and Simplified Geboes Score: in Jauregui‐Amezaga 2016, two trained pathologists tested both scores for reliability. They used biopsy specimens from 92 patients requiring colonoscopy at a tertiary referral center.
- Riley Score: two pathologists evaluated reliability using biopsies from 82 patients with a confirmed diagnosis of UC who were asymptomatic and in endoscopic remission (Riley 1991). To explore criterion validity, a single pathologist calculated the Riley Score and the Mayo Endoscopic Subscore using 263 biopsies from 131 UC patients.
- Chicago Index: Rubin 2007 performed construct validation by having pathologists (number not reported) assess biopsies from 86 UC patients requiring standard colonoscopy or sigmoidoscopy. Clinical disease activity was assessed using the Simple Clinical Colitis Activity Index, and endoscopic disease activity using the Mayo Endoscopic Subscore.
- Modified Harpaz Index: in Theede 2015, biopsies were collected from 120 UC patients with inactive or active disease requiring sigmoidoscopy. Two pathologists scored the specimens. Construct validity was assessed using the Mayo Endoscopic Subscore and the Ulcerative Colitis Endoscopic Index of Severity.
- Truelove and Richards Index: two pathologists used the index to assess 91 biopsies from UC patients who presented at a tertiary referral center requiring sigmoidoscopy (Truelove 1956). Clinical disease activity (Simple Clinical Colitis Activity Index) and endoscopic disease activity (Baron Score) were also evaluated.

It is unclear whether the Harpaz Index was partially validated using prospectively or retrospectively collected data, because this study is available in abstract form only (Fiel 2003).

Excluded studies

Nineteen histologic scoring indices (Baars 2012; D'Argenio 2001; Floren 1987; Friedman 1985; Gramlich 2007; Hanauer 1993; Iacucci 2015; Keren 1984; Korelitz 1976; Matts 1961; Nishiyama 2014; Odze 1993; Powell‐Tuck 1982; Riley 1988; Rutter 2004; Sandborn 1993; Saverymuttu 1986; Watts 1966; Wiernicka 2015) have not undergone any validation procedures, and were therefore excluded (Characteristics of excluded studies and Table 2).

Table 2. Indices that have not been fully or partially validated

No. | Index | Reference
1 | Baars Index | Baars 2012
2 | British Society of Gastroenterology Protocol | Wiernicka 2015
3 | D'Argenio/Scheppach Index | D'Argenio 2001
4 | ECAP (Extent, Chronicity, Activity Plus additional findings) System | Iacucci 2015
5 | Endocytoscopy System (ECS) | Nishiyama 2014
6 | Floren Index | Floren 1987
7 | Friedman Index | Friedman 1985
8 | Gramlich Score | Gramlich 2007
9 | Hanauer Index | Hanauer 1993
10 | Initial Riley Score | Riley 1988
11 | Keren Score | Keren 1984
12 | Korelitz Index | Korelitz 1976
13 | Matts Score | Matts 1961
14 | Odze Index | Odze 1993
15 | Powell‐Tuck Score | Powell‐Tuck 1982
16 | Rutter Score | Rutter 2004
17 | Sandborn Index | Sandborn 1993
18 | Saverymuttu Index | Saverymuttu 1986
19 | Watts Score | Watts 1966

Risk of bias in included studies

Blinding

In seven of the eleven validated scoring indices (Feagan 2005; Geboes 2000; Gomes 1986; Marchal‐Bressenot 2017; Mosli 2017; Theede 2015; Truelove 1956), the individual(s) who performed the histologic assessment were blinded to other relevant patient data (e.g. clinical and endoscopic data). The other four indices that underwent validation testing were reported in abstract form only, and did not include sufficient information on blinding to allow a judgment (Figure 2).


Figure 2. Risk of bias graph: review authors' judgements about each risk of bias item presented as percentages across all included studies.

Independent Observation

For six of the eleven validated scoring indices, the studies reported that the histologists performed their assessments independently (Feagan 2005; Geboes 2000; Marchal‐Bressenot 2017; Mosli 2017; Riley 1991; Truelove 1956). For three of the included scoring indices, the studies did not adequately describe whether independent assessment took place (Jauregui‐Amezaga 2016; Rubin 2007; Theede 2015). This risk of bias item was not relevant for Gomes 1986. Only criterion validity, which does not require more than one reader, was assessed for this scoring index (Figure 2).

Effect of methods

Reliability

Eight of the included scoring indices (the Modified Riley Score, Harpaz Index, Geboes Score, Simplified Geboes Score, Nancy Index, Robarts Histopathology Index, Riley Score and Modified Harpaz Index) have undergone reliability testing (Feagan 2005; Fiel 2003; Geboes 2000; Jauregui‐Amezaga 2016; Marchal‐Bressenot 2017; Mosli 2017; Riley 1991; Theede 2015).

Intra‐rater reliability (i.e. agreement with self over multiple assessments of the same biopsy slide) was calculated for four of these indices: the Modified Riley Score, the Geboes Score, the Nancy Index and the Robarts Histopathology Index (Feagan 2005; Geboes 2000; Marchal‐Bressenot 2017; Mosli 2017). According to the Landis and Koch criteria, correlation estimates ranged from 'substantial' to 'almost perfect'. Mosli 2017 evaluated intra‐rater reliability of the Modified Riley Score and Geboes Score on two occasions: once before standardized scoring conventions were applied, and once after. Initially, the ICCs were 0.71 (95% CI 0.63 to 0.80) for the Modified Riley Score and 0.82 (95% CI 0.73 to 0.88) for the Geboes Score. After the conventions were applied, the ICC for the Modified Riley Score increased to 0.85 (95% CI 0.77 to 0.91) and 0.88 (95% CI 0.79 to 0.93) for the Geboes Score. Mosli 2017 also evaluated the intra‐rater reliability of the Robarts Histopathology Index, and determined the intra‐rater ICC to be 0.82 (95% CI 0.74 to 0.86). The intra‐rater reliability of the Nancy Index was examined in Marchal‐Bressenot 2017, and the ICC was found to be 0.88 (95% CI 0.82 to 0.92) (Table 3).
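The review does not state which form of ICC each study used. Purely as an illustration, the sketch below computes the two‐way random‐effects, absolute‐agreement, single‐rater ICC (often written ICC(2,1), following Shrout and Fleiss). It takes a complete matrix of scores in which rows are biopsies. Columns are either different raters (inter‐rater) or repeat readings by the same rater (intra‐rater). The choice of ICC form, the function name and the example matrix are assumptions.

```python
def icc_2_1(ratings):
    """Two-way random-effects, absolute-agreement, single-measure ICC (ICC(2,1)).

    `ratings` is a list of rows, one per biopsy; each row holds the scores from
    k raters (inter-rater) or k repeat readings by one rater (intra-rater).
    """
    n = len(ratings)
    k = len(ratings[0])
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need a complete n x k matrix with n >= 2 and k >= 2")
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    # Partition the total sum of squares into biopsies, raters and residual.
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Illustrative scores: six biopsies (rows) read by four raters (columns).
sample = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
print(round(icc_2_1(sample), 2))  # → 0.29
```

Absolute agreement penalizes systematic differences between raters, such as one pathologist consistently scoring higher. This is why the example ICC is low even though the raters rank the biopsies similarly.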

Table 3. Reliability

Study ID | Index | Inter‐rater agreement (between raters) | Intra‐rater ICC (within rater)
Feagan 2005 | Modified Riley Score | See Mosli 2017 | See Mosli 2017
Fiel 2003 | Harpaz/Mount Sinai Index | κ 0.90 | NR
Geboes 2000 | Geboes Score | κ, three readers: 0.62, 0.70, 0.59; also see Jauregui‐Amezaga 2016 and Mosli 2017 | See Mosli 2017
Jauregui‐Amezaga 2016 | Simplified Geboes Score | Reader A vs C: 0.7; Reader B vs C: 0.7 | NR
Jauregui‐Amezaga 2016 | Geboes Score | Reader A vs C: 0.6; Reader B vs C: 0.5 | NR
Marchal‐Bressenot 2017 | Nancy Index | ICC 0.86 (95% CI 0.81 to 0.99) | 0.88 (95% CI 0.82 to 0.92)
Mosli 2017 | Robarts Histopathology Index | ICC 0.92 (95% CI 0.88 to 0.94) | 0.82 (95% CI 0.74 to 0.86)
Mosli 2017 | Geboes Score | ICC 0.56 (95% CI 0.39 to 0.67) | 0.82 (95% CI 0.73 to 0.88)
Mosli 2017 | Modified Riley Score | ICC 0.48 (95% CI 0.35 to 0.66) | 0.71 (95% CI 0.63 to 0.80)
Mosli 2017 | Geboes Score** | ICC 0.79 (95% CI 0.63 to 0.87) | 0.88 (95% CI 0.79 to 0.93)
Mosli 2017 | Modified Riley Score** | ICC 0.80 (95% CI 0.69 to 0.87) | 0.85 (95% CI 0.77 to 0.91)
Riley 1991 | Riley Score | κ 0.94 (95% CI 0.90 to 0.98) | NR
Theede 2015 | Modified Harpaz Index | 4.35%* | NR

Abbreviations: CI, confidence interval; ICC, intraclass correlation coefficient; κ, kappa statistic; NR, not reported

*Inter‐rater variation of 4.35% between the two pathologists evaluating the biopsies

**After scoring conventions were applied

All eight scoring indices that underwent reliability testing were assessed for inter‐rater reliability (i.e. agreement between different individuals). Estimates ranged from 'moderate' to 'almost perfect'.

- Modified Riley Score: the inter‐rater ICC was initially 0.48 (95% CI 0.35 to 0.66). After scoring conventions were applied in Mosli 2017, agreement improved to 0.80 (95% CI 0.69 to 0.87).
- Harpaz Index: Fiel 2003 reported a kappa statistic (κ) of 0.90 (95% CI not reported).
- Geboes Score: Geboes 2000 evaluated inter‐rater agreement using three raters and reported κ values of 0.59, 0.62 and 0.70 (95% CIs not reported). A reliability study by Jauregui‐Amezaga 2016 found ICCs of 0.60 and 0.50 (95% CIs not reported). Before scoring conventions were introduced, Mosli 2017 found an inter‐rater ICC of 0.56 (95% CI 0.39 to 0.67); after the conventions were applied, agreement improved to 0.79 (95% CI 0.63 to 0.87).
- Nancy Index and Robarts Histopathology Index: inter‐rater ICCs were 0.86 (95% CI 0.81 to 0.99) and 0.92 (95% CI 0.88 to 0.94), respectively (Marchal‐Bressenot 2017; Mosli 2017).
- Riley Score: inter‐rater agreement was κ = 0.94 (95% CI 0.90 to 0.98) (Riley 1991).
- Modified Harpaz Index: inter‐rater reliability was reported only as a 4.35% variation between the two pathologists who scored the biopsies (Theede 2015).

Validity

Content validity

Three of the scoring indices (the Geboes Index, Nancy Index and Robarts Histopathology Index) have undergone content validation. In Geboes 2000, item selection and index development were based on a review of the existing literature. For the Nancy Index, the inter‐rater and intra‐rater reliability of individual items derived from pre‐existing scores (the Geboes Score, Riley Score, Gramlich Index, Gupta Index and Global Visual Evaluation) was calculated (Marchal‐Bressenot 2017). Items with high reliability were incorporated into the new index. Novel items identified through literature searches and expert opinion panels were also included. For the development of the Robarts Histopathology Index, items from pre‐existing scores (the Geboes Score, Modified Riley Score and Visual Analogue Scale) were included if their intra‐rater and inter‐rater reliability was rated as high (Mosli 2017). After item selection, a consensus process was used to standardize definitions and scoring rules for items with relatively high rates of disagreement (Table 4).
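Item selection for both the Nancy Index and the Robarts Histopathology Index relied on item‐level ICCs. The sketch below computes one common form, the two‐way random‐effects, absolute‐agreement, single‐rater ICC (ICC(2,1) in Shrout and Fleiss notation), from a biopsies‐by‐raters matrix. It then keeps the items that reach a threshold. Which ICC form the studies used and the 0.6 cutoff are assumptions for illustration, and the item names are hypothetical. The second matrix reuses the classic Shrout and Fleiss (1979) worked example, for which ICC(2,1) is about 0.29.

```python
def icc_2_1(ratings):
    """Two-way random-effects, absolute-agreement, single-rater ICC.

    `ratings` is a list of rows, one per biopsy, each holding one score per rater.
    """
    n = len(ratings)      # number of biopsies (targets)
    k = len(ratings[0])   # number of raters
    if n < 2 or k < 2 or any(len(row) != k for row in ratings):
        raise ValueError("need at least 2 biopsies, 2 raters and a complete matrix")
    grand_mean = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    # Two-way ANOVA sums of squares: biopsies (rows), raters (columns), residual.
    ss_rows = k * sum((m - grand_mean) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand_mean) ** 2 for m in col_means)
    ss_total = sum((x - grand_mean) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    denominator = ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    if denominator == 0:
        raise ValueError("ICC is undefined when all scores are identical")
    return (ms_rows - ms_error) / denominator


def select_reliable_items(item_ratings, threshold=0.6):
    """Return the names of items whose ICC meets a (hypothetical) threshold."""
    return [name for name, matrix in item_ratings.items()
            if icc_2_1(matrix) >= threshold]


# Hypothetical item-level scores: rows are biopsies, columns are raters.
items = {
    "hypothetical_item_a": [[0, 0], [1, 1], [2, 2], [3, 3]],
    "hypothetical_item_b": [[9, 2, 5, 8], [6, 1, 3, 2], [8, 4, 6, 8],
                            [7, 1, 2, 6], [10, 5, 6, 9], [6, 2, 4, 7]],
}
for name, matrix in items.items():
    print(name, round(icc_2_1(matrix), 2))
print(select_reliable_items(items))  # ['hypothetical_item_a']
```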

Table 4. Content Validity

| Study ID | Index | Methods |
|---|---|---|
| Geboes 2000 | Geboes Score | Items were included in the new index based on a literature review |
| Marchal‐Bressenot 2017 | Nancy Index | Intra‐rater and inter‐rater reliability for index items of pre‐existing histologic scores (the Geboes Score, the Riley Score, the Gramlich Index, the Gupta Index and the Global Visual Evaluation) were measured using ICCs. Items with high reliability were used in the Nancy Index. Other items were included in the new index based on expert opinion and literature review |
| Mosli 2017 | Robarts Histopathology Index | Intra‐rater and inter‐rater reliability for index items of pre‐existing histologic scores (the Geboes Score, Modified Riley Score and a Visual Analogue Scale) were measured using intraclass correlation coefficients. Items with high reliability were used in the Robarts Histopathology Index. A consensus process was conducted after the item selection phase to standardize definitions and scoring rules for high‐disagreement items |

Criterion validity

Three of the included scoring indices (the Gomes Index, the Modified Harpaz Index and the Riley Score) have undergone criterion validation (Gomes 1986; Riley 1991; Theede 2015) (Table 5). The correlation estimates for this operating property ranged from 'slight' to 'moderate'.

Table 5. Criterion Validity

| Study ID | Index | Outcome | Correlation coefficient (r) | Positive predictive value | Negative predictive value |
|---|---|---|---|---|---|
| Gomes 1986 | Gomes Index | Albumin concentration | 0.028 | | |
| | | C‐reactive protein | 0.01 | | |
| | | Erythrocyte sedimentation rate | 0.13 | | |
| | | Platelets | 0.13 | | |
| | | White blood cell count | 0.131 | | |
| Riley 1991 | Riley Score | C‐reactive protein | 0.307 (P < 0.0001) | | |
| | | Fecal lactoferrin | 0.408 (P < 0.0001) | | |
| | | Fecal calprotectin | 0.458 (P < 0.0001) | | |
| | | PMN‐elastase | 0.447 (P < 0.0001) | | |
| | | White blood cell count | 0.203 (P < 0.02) | | |
| Theede 2015 | Modified Harpaz Index | Fecal calprotectin (cutoff of 171 mg/kg for predicting histological healing) | | 0.75 | 0.90 |

Albumin concentration

The correlation coefficient (r) for the relationship between the Gomes Index and albumin concentration was 0.03 (no P value reported).

C‐reactive protein (CRP)

The estimate of correlation for the Gomes Index and CRP was r = 0.01 (no P value reported), while the correlation between the Riley Score and CRP was r = 0.31 (P < 0.0001).

Erythrocyte sedimentation rate (ESR)

For the Gomes Index, the estimate of correlation with ESR was r = 0.13 (no P value reported).

Fecal calprotectin

The correlation between the Riley Score and fecal calprotectin was r = 0.46 (P < 0.0001).

In Theede 2015, fecal calprotectin concentrations were higher in patients with mild histological activity than in those in histological remission (236.5 mg/kg versus 56 mg/kg, P = 0.02). A cutoff level of 171 mg/kg predicted histological healing (assessed with the Modified Harpaz Index). The positive predictive value was 0.75 and the negative predictive value was 0.90.
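The predictive values for the 171 mg/kg cutoff come from a 2×2 cross‐classification of the test result against histological healing. The sketch below shows that calculation. It assumes that a calprotectin value at or below the cutoff counts as "predicting healing", because the boundary handling in Theede 2015 is not described in this section. The patient values are invented for illustration and do not reproduce the study's data.

```python
def predictive_values(calprotectin_mg_per_kg, histologically_healed, cutoff=171):
    """PPV and NPV of 'calprotectin <= cutoff' as a predictor of histological healing."""
    tp = fp = tn = fn = 0
    for value, healed in zip(calprotectin_mg_per_kg, histologically_healed):
        predicted_healed = value <= cutoff  # assumed boundary handling
        if predicted_healed and healed:
            tp += 1          # predicted healed, truly healed
        elif predicted_healed:
            fp += 1          # predicted healed, but still active
        elif healed:
            fn += 1          # predicted active, but actually healed
        else:
            tn += 1          # predicted active, truly active
    ppv = tp / (tp + fp) if tp + fp else float("nan")
    npv = tn / (tn + fn) if tn + fn else float("nan")
    return ppv, npv


# Hypothetical patients (not data from Theede 2015).
calprotectin = [40, 90, 150, 160, 200, 250, 300, 120, 500, 180]
healed = [True, True, True, False, False, False, False, True, False, True]
ppv, npv = predictive_values(calprotectin, healed)
print(f"PPV = {ppv:.2f}, NPV = {npv:.2f}")  # PPV = 0.80, NPV = 0.80
```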

Fecal lactoferrin

The estimate of correlation between the Riley Score and fecal lactoferrin was r = 0.41 (P < 0.0001).

Platelets

The estimate of correlation between platelets and the Gomes Index was r = 0.13 (no P value reported).

PMN‐elastase

PMN‐elastase was estimated to have a correlation of r = 0.45 (P < 0.0001) with the Riley Score.

White blood cell (WBC) count

The correlation of WBC count with the Gomes Index was r = 0.13 (no P value reported). Its correlation with the Riley Score was r = 0.20 (P < 0.02).

Construct validity

A total of six scoring indices have undergone construct validation (Geboes 2000; Marchal‐Bressenot 2017; Mosli 2017; Rubin 2007; Theede 2015; Truelove 1956). Correlation estimates between the histologic scoring indices and other measures of disease activity (histologic, clinical and endoscopic indices) ranged from 'moderate' to 'almost perfect'.

Geboes 2000 reported a Kendall's τ of 0.48 (P < 0.0001) between the Geboes Score and the Mayo Clinic Endoscopic Subscore.

Marchal‐Bressenot 2017 compared the Nancy Index with two indices, the Global Visual Evaluation and the Geboes Score, separately for each of three readers. Correlations with the Global Visual Evaluation were r = 0.88 (95% CI 0.82 to 0.91), 0.82 (95% CI 0.74 to 0.87) and 0.87 (95% CI 0.82 to 0.91). Correlations with the Geboes Score were r = 0.90 (95% CI 0.85 to 0.93), 0.84 (95% CI 0.77 to 0.89) and 0.94 (95% CI 0.91 to 0.96).

Mosli 2017 compared the Robarts Histopathology Index with the Visual Analogue Scale, Modified Baron Score, Ulcerative Colitis Clinical Score and Inflammatory Bowel Disease Questionnaire. The correlations were r = 0.91, 0.61, 0.62 and ‐0.48, respectively.

Rubin 2007 compared the Chicago Index with a clinical index (the Simple Colitis Clinical Activity Index) and an endoscopic index (the Mayo Clinic Endoscopic Subscore). The correlations were r = 0.51 (P < 0.0001) and r = 0.60 (P < 0.0001), respectively.

Theede 2015 compared the Modified Harpaz Index with two endoscopic scores, the Ulcerative Colitis Endoscopic Index of Severity and the Mayo Clinic Endoscopic Subscore. Kendall's τ was 0.63 (P < 0.0001) and 0.67 (P < 0.0001), respectively.

Finally, Truelove 1956 reported agreement of κ = 0.47 between the Truelove and Richards Index and the Simple Colitis Clinical Activity Index, and κ = 0.58 between the Truelove and Richards Index and the Baron Score (Table 6).
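Two of the construct‐validity results (Geboes 2000; Theede 2015) are Kendall's tau. This rank correlation suits ordinal scores such as the Mayo Clinic Endoscopic Subscore. The tau‐b variant sketched below adjusts for tied ranks, which are common when scores take only a few values. Whether the studies used tau‐b or another variant is an assumption, and the example grades are hypothetical.

```python
import math
from itertools import combinations


def kendall_tau_b(x, y):
    """Kendall's tau-b rank correlation between two equally long score lists."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long lists with at least 2 items")
    concordant = discordant = tied_x = tied_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0:
            tied_x += 1        # pair tied on the first score
        if dy == 0:
            tied_y += 1        # pair tied on the second score
        if dx * dy > 0:
            concordant += 1    # both scores move in the same direction
        elif dx * dy < 0:
            discordant += 1    # scores move in opposite directions
    n_pairs = len(x) * (len(x) - 1) // 2
    denominator = math.sqrt((n_pairs - tied_x) * (n_pairs - tied_y))
    if denominator == 0:
        raise ValueError("tau-b is undefined when one list is constant")
    return (concordant - discordant) / denominator


# Hypothetical histologic grades and Mayo Clinic Endoscopic Subscores (0-3).
histologic_grade = [0, 1, 1, 2, 3, 3, 4, 5]
mayo_subscore = [0, 0, 1, 1, 2, 3, 3, 3]
print(round(kendall_tau_b(histologic_grade, mayo_subscore), 2))
```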

Table 6. Construct Validity

| Study ID | Index | Comparison | Correlation coefficient (r) | Kappa |
|---|---|---|---|---|
| Geboes 2000 | Geboes Score | Mayo Clinic Endoscopic Subscore | 0.482* (P < 0.0001) | |
| Marchal‐Bressenot 2017 | Nancy Index | Global Visual Evaluation | Reader 1: 0.876 (95% CI 0.819 to 0.914); Reader 2: 0.819 (95% CI 0.739 to 0.873); Reader 3: 0.874 (95% CI 0.816 to 0.913) | |
| | | Geboes Score | Reader 1: 0.899 (95% CI 0.853 to 0.931); Reader 2: 0.843 (95% CI 0.773 to 0.891); Reader 3: 0.939 (95% CI 0.909 to 0.958) | |
| Mosli 2017 | Robarts Histopathology Index | Visual Analogue Scale | 0.91 (predicted 0.90) | |
| | | Modified Baron Score | 0.61 (predicted 0.60) | |
| | | Ulcerative Colitis Clinical Score | 0.62 (predicted 0.40) | |
| | | Inflammatory Bowel Disease Questionnaire | ‐0.48 (predicted ‐0.30) | |
| Rubin 2007 | Rubin/Chicago/Histological Activity Index | Simple Colitis Clinical Activity Index | 0.508 (P < 0.0001) | |
| | | Mayo Clinic Endoscopic Subscore | 0.597 (P < 0.0001) | |
| Theede 2015 | Modified Harpaz Index | Ulcerative Colitis Endoscopic Index of Severity | 0.63* (P < 0.0001) | |
| | | Mayo Clinic Endoscopic Subscore | 0.67* (P < 0.0001) | |
| Truelove 1956 | Truelove and Richards Index | Simple Colitis Clinical Activity Index | | 0.47 |
| | | Baron Score | | 0.58 |

*As measured by Kendall's tau

Responsiveness

The Nancy Index and the Robarts Histopathology Index are the only two histologic indices that have undergone responsiveness testing (Marchal‐Bressenot 2017; Mosli 2017).

In Marchal‐Bressenot 2017, responsiveness of the Nancy Index was assessed by retrospectively evaluating paired biopsies from 30 patients. All had undergone consecutive endoscopies, and their histologic disease activity had changed (defined as a change of > 1 point on the Geboes conversion 9 scale). The median time between the two biopsies was 451 days, and one central reader scored all 60 biopsies. For each of the Nancy Index, the Geboes Score and the Global Visual Evaluation, the mean (standard deviation (SD)) change between the two biopsy scores was calculated. Pearson's correlation coefficient with 95% CIs was then used to relate the change in the Nancy Index to the change in the Geboes Score and to the change in the Global Visual Evaluation.

The mean change was ‐2.53 (SD 1.10) for the Nancy Index, ‐15.86 (SD 7.26) for the Geboes Score and ‐4.83 (SD 2.13) for the Global Visual Evaluation. The correlation between change in the Nancy Index and change in the Geboes Score was r = 0.91 (95% CI 0.81 to 0.96). The correlation between change in the Nancy Index and change in the Global Visual Evaluation was r = 0.89 (95% CI 0.77 to 0.94).

In Mosli 2017, responsiveness of the Robarts Histopathology Index was evaluated using data from an RCT of a treatment of known efficacy (MLN02). One central reader evaluated 154 pairs of biopsies taken before treatment and after treatment (week 4 or 6). Change‐score correlations, the standardized effect size and Guyatt's responsiveness statistic were calculated for the Robarts Histopathology Index, the Geboes Score and the Modified Riley Score.

The correlation between change in the Robarts Histopathology Index and change in the Geboes Score was 0.75 (95% CI 0.67 to 0.82). The correlation between change in the Robarts Histopathology Index and change in the Modified Riley Score was 0.84 (95% CI 0.79 to 0.88).

The standardized effect size and Guyatt's responsiveness statistic were each calculated under three definitions of change:

(1) treatment allocation, with patients assigned to MLN02 considered 'changed' and those assigned to placebo considered 'unchanged';
(2) an absolute change in endoscopic disease activity of > 1 point on the Modified Baron Score;
(3) an absolute change of > 2 points in the Mayo Clinic Score rectal bleeding subscore combined with the stool frequency subscore.

For the Robarts Histopathology Index under definitions (1), (2) and (3), the standardized effect size was 1.05 (95% CI 0.79 to 1.3), 0.81 (95% CI 0.58 to 1.05) and 1.05 (95% CI 0.78 to 1.31), respectively. Guyatt's responsiveness statistic was 0.88 (95% CI 0.64 to 1.12), 0.73 (95% CI 0.50 to 0.96) and 0.84 (95% CI 0.59 to 1.09), respectively (Table 7).
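The two responsiveness statistics above are commonly defined as follows. The standardized effect size divides the mean change by the SD of baseline scores. Guyatt's responsiveness statistic divides the mean change in patients deemed to have changed by the SD of change in patients deemed unchanged (Guyatt 1987). The sketch applies those common definitions together with the Cohen 1988 thresholds given under Table 7. The exact formulas used in Mosli 2017 are not reported in this section, so they are an assumption, and all scores are hypothetical. Absolute values are taken because histologic scores fall as disease improves.

```python
from statistics import mean, stdev


def standardized_effect_size(baseline, follow_up):
    """|mean change| divided by the SD of baseline scores."""
    changes = [after - before for before, after in zip(baseline, follow_up)]
    return abs(mean(changes)) / stdev(baseline)


def guyatt_responsiveness(changed_deltas, unchanged_deltas):
    """|mean change in 'changed' patients| divided by the SD of change in 'unchanged' patients."""
    return abs(mean(changed_deltas)) / stdev(unchanged_deltas)


def cohen_label(effect_size):
    """Thresholds of 0.2, 0.5 and 0.8 (Cohen 1988), as cited under Table 7."""
    size = abs(effect_size)
    if size >= 0.8:
        return "large"
    if size >= 0.5:
        return "moderate"
    if size >= 0.2:
        return "low"
    return "below low"  # not a category named in the review


# Hypothetical index scores before and after treatment.
baseline = [10, 12, 8, 14]
follow_up = [4, 6, 5, 7]
ses = standardized_effect_size(baseline, follow_up)

# Hypothetical change scores for treated ('changed') and placebo ('unchanged') patients.
grs = guyatt_responsiveness([-6, -5, -7], [0, 1, -1, 2])
print(f"SES = {ses:.2f} ({cohen_label(ses)}), GRS = {grs:.2f} ({cohen_label(grs)})")
# -> SES = 2.13 (large), GRS = 4.65 (large)
```

Under these thresholds, the reported effect sizes of 1.05 and 0.81 would count as large, and a statistic of 0.73 as moderate.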

Table 7. Responsiveness

Study ID

Index

Treatment

Responsiveness Measure

Correlation

Marchal‐Bressenot 2017

Nancy Index

No treatment of known efficacy used

Biopsy specimen pairs from 30 UC patients were retrospectively reviewed

Median time between the two biopsies was 451 days (range: 41‐1169 days)

a) Nancy Index

Mean change (standard deviation (SD)): −2.53 (1.10)

b) Global Visual Evaluation

Mean change (SD): −4.83 (2.13)

c) Geboes Score

Mean change (SD): −15.86 (7.26)

a) Nancy Index and Global Visual Evaluation: 0.886 (95% CI 0.766 to 0.943)

b) Nancy Index and Geboes Score: 0.910 (95% CI 0.813 to 0.955)

Mosli 2017

Robarts Histopathology Index

MLN02

(baseline and week 6 biopsies for 154/181 patients)

Biopsy specimen pairs from 154 UC patients (baseline and 4‐6 weeks post‐treatment) were retrospectively reviewed

Correlation estimates for change in Robarts Histopathology Index, Geboes Score and Modified Riley Score

a) Change in Robarts Histopathology Index and Geboes Score: 0.75 (95% CI 0.67 to 0.82)

b) Change in Robarts Histopathology Index and Modified Riley Score: 0.84 (95% CI 0.79 to 0.88)

Standardised effect size (SES) calculated using:

a) Treatment allocation with patients assigned to MLN02 considered 'changed' and those assigned to placebo considered 'unchanged'

b) Absolute change in Modified Baron Score of greater than 1 point

c) Absolute change in sum of Mayo Clinic Score rectal bleeding and stool frequency subscores of at least 2 points

a) 1.05 (95% CI 0.79 to 1.3)*

b) 0.81 (95% CI 0.58 to 1.05)*

c) 1.05 (95% CI 0.78 to 1.31)*

Guyatt’s responsiveness statistic (GRS; Guyatt 1987) calculated using:

a) Treatment allocation with patients assigned to MLN02 considered 'changed' and those assigned to placebo considered 'unchanged'

b) Absolute change in Modified Baron Score of greater than 1 point

c) Absolute change in sum of Mayo Clinic Score rectal bleeding and stool frequency subscores of at least 2 points

a) 0.88 (95% CI 0.64 to 1.12)*

b) 0.73 (95% CI 0.50 to 0.96)*

c) 0.84 (95% CI 0.59 to 1.09)*

*Effect sizes of 0.2, 0.5 and 0.8 were considered to represent low, moderate and large degrees of responsiveness, respectively (Cohen 1988).

Feasibility

None of the included studies assessed feasibility.

Methodological quality

The methodological quality of the included studies was assessed using the COSMIN tool (Table 8; Table 9). Eight studies evaluated reliability. Five of these (Feagan 2005; Geboes 2000; Marchal‐Bressenot 2017; Mosli 2017; Theede 2015) were rated as 'excellent', one (Riley 1991) as 'good' and two (Fiel 2003; Jauregui‐Amezaga 2016) as 'fair'.

Table 8. Summary of Operating Properties of Histologic Scoring Indices for Ulcerative Colitis

| Study ID | Scoring index | Content validity | Criterion validity | Construct validity | Intra‐rater reliability | Inter‐rater reliability | Test‐retest reliability | Internal consistency | Responsiveness | Feasibility |
|---|---|---|---|---|---|---|---|---|---|---|
| Feagan 2005 | Modified Riley Score | ? | ? | ? | ? | + | ? | ? | ? | ? |
| Fiel 2003 | Harpaz/Mount Sinai Index | ? | ? | ? | ? | + | ? | ? | ? | ? |
| Geboes 2000 | Geboes Score | + | ? | + | ? | + | ? | ? | ? | ? |
| Gomes 1986 | Gomes Index | ? | + | ? | ? | ? | ? | ? | ? | ? |
| Jauregui‐Amezaga 2016 | Simplified Geboes Score | ? | ? | ? | ? | + | ? | ? | ? | ? |
| Marchal‐Bressenot 2017 | Nancy Index | + | ? | + | + | + | ? | ? | + | ? |
| Mosli 2017 | Robarts Histopathology Score | + | ? | + | + | + | ? | ? | + | ? |
| Riley 1991 | Riley Score | ? | + | ? | ? | + | ? | ? | ? | ? |
| Rubin 2007 | Chicago/Rubin/Histologic Inflammation Activity Scale | ? | ? | + | ? | ? | ? | ? | ? | ? |
| Theede 2015 | Modified Harpaz Index | ? | ? | + | ? | + | ? | ? | ? | ? |
| Truelove 1956 | Truelove and Richards Index | ? | ? | + | ? | ? | ? | ? | ? | ? |

+ positive rating

? no information or indeterminate rating

‐ negative rating

Table 9. The Methodological Quality of Histologic Index Measurement Properties (COSMIN Checklist)

| No. | Study ID | Scoring index | Internal consistency | Reliability | Measurement error | Content validity | Structural validity (CSV) | Hypothesis testing (CSV) | Cross‐cultural validity (CSV) | Criterion validity | Responsiveness | Interpretability |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | Feagan 2005 | Modified Riley Score | ? | excellent | ? | ? | ? | ? | ? | ? | ? | ? |
| 2 | Fiel 2003 | Harpaz/Mount Sinai Index | ? | fair | ? | ? | ? | ? | ? | ? | ? | ? |
| 3 | Geboes 2000 | Geboes Score | ? | excellent | ? | excellent | ? | excellent | ? | ? | ? | ? |
| 4 | Gomes 1986 | Gomes Index | ? | ? | ? | ? | ? | ? | ? | excellent | ? | ? |
| 5 | Jauregui‐Amezaga 2016 | Simplified Geboes Score | ? | fair | ? | ? | ? | ? | ? | ? | ? | ? |
| 6 | Marchal‐Bressenot 2017 | Nancy Index | ? | excellent | ? | excellent | ? | excellent | ? | ? | excellent | ? |
| 7 | Mosli 2017 | Robarts Histopathology Score | ? | excellent | ? | excellent | ? | excellent | ? | ? | excellent | ? |
| 8 | Riley 1991 | Riley Score | ? | good | ? | ? | ? | ? | ? | good | ? | ? |
| 9 | Rubin 2007 | Chicago/Rubin/Histologic Inflammation Activity Scale | ? | ? | ? | ? | ? | good | ? | ? | ? | ? |
| 10 | Theede 2015 | Modified Harpaz Index | ? | excellent | ? | ? | ? | excellent | ? | ? | ? | ? |
| 11 | Truelove 1956 | Truelove and Richards Index | ? | ? | ? | ? | ? | excellent | ? | ? | ? | ? |

CSV = construct validity

? no information or indeterminate rating

With regard to content validity, all three studies that assessed this property (Geboes 2000; Marchal‐Bressenot 2017; Mosli 2017) were rated as 'excellent'. The six studies that assessed construct validity focused on hypothesis testing. Five were rated as 'excellent' (Geboes 2000; Marchal‐Bressenot 2017; Mosli 2017; Theede 2015; Truelove 1956) and one as 'good' (Rubin 2007). Of the two studies rated for criterion validity, one was rated as 'excellent' (Gomes 1986) and one as 'good' (Riley 1991). Two studies explored responsiveness, Marchal‐Bressenot 2017 (the Nancy Index) and Mosli 2017 (the Robarts Histopathology Index), and both were rated as 'excellent'. None of the eleven studies assessed internal consistency, measurement error, structural validity, cross‐cultural validity or interpretability. These measurement properties therefore could not be rated using COSMIN.

Discussion

Summary of main results

A total of 126 reports describing 30 scoring indices were identified by the search strategy and screening process. Eleven of these scoring indices have been partially validated, and 19 have not undergone any form of validation testing.

Intra‐rater reliability, reported for four of the scoring indices, ranged from 'substantial' to 'almost perfect'. Inter‐rater reliability has been assessed for eight of the 11 partially validated indices, with estimates ranging from 'moderate' to 'almost perfect'.

Three of the included scoring indices (the Geboes Score, Nancy Index and Robarts Histopathology Index) drew on literature review and expert opinion during index development, and therefore incorporated content validation. Two of the included indices, the Gomes Index and the Riley Score, were assessed for criterion validity. Correlations were calculated between each index and various biomarkers (CRP, ESR, WBC count, platelets, albumin concentration, fecal lactoferrin, fecal calprotectin and PMN‐elastase). Six of the included scoring indices were assessed for construct validity by comparison with other measures of disease activity (histologic, clinical or endoscopic).

Two of the included studies (Marchal‐Bressenot 2017; Mosli 2017) measured responsiveness, of the Nancy Index and the Robarts Histopathology Index respectively. For the Nancy Index, an expert pathologist scored 30 pairs of biopsy specimens that had undergone histologic change (as defined by the Geboes Score). Change scores and correlation estimates were then calculated for the Nancy Index, Global Visual Evaluation and Geboes Score.

For the Robarts Histopathology Index, an expert pathologist scored 154 pairs of biopsy specimens (baseline and post‐treatment) from patients who had received MLN02 or placebo as part of an RCT. The pathologist also scored the specimens using the Geboes Score and Modified Riley Score. Correlations between the change scores were calculated. The standardized effect size and Guyatt's responsiveness statistic were also calculated under three definitions of change:

(1) treatment allocation, with patients assigned to MLN02 considered 'changed' and those assigned to placebo 'unchanged';
(2) an absolute change in endoscopic disease activity of > 1 point on the Modified Baron Score;
(3) an absolute change of > 2 points in the Mayo Clinic Score rectal bleeding subscore combined with the stool frequency subscore.

Overall completeness and applicability of evidence

The Nancy Index and the Robarts Histopathology Index have undergone the most validation. Four operating properties have been tested for each: reliability, content validity, construct validity (hypothesis testing) and responsiveness. However, none of the currently available histologic scoring indices has been fully validated.

Quality of the evidence

The COSMIN tool was used to assess the methodological quality of the included studies (Table 9). Across the 10 measurement properties assessed by this tool, the eleven included studies received ratings ranging from 'fair' to 'excellent'.

Potential biases in the review process

Sample quality is an important potential confounder that is infrequently mentioned or assessed in validation studies of histologic scoring indices. In Geboes 2000, the researchers found that out of the 99 biopsies studied, 31, 36 and 22 of the samples were rated as good, substandard and poor quality, respectively. As a result, 13 of the 22 poor quality samples were omitted from this study. While it is possible to remove poor quality specimens from a validation study, it was not always clear whether this quality check and exclusion of poor quality specimens was performed in the studies included in this systematic review.

Further research is needed to determine the best way to collect and process biopsy samples in order to facilitate both validation studies and the use of histologic disease activity indices in clinical practice and clinical trials.

Agreements and disagreements with other studies or reviews

A systematic review by Bryant et al. identified four UC histologic scoring indices that have been partially validated: the Truelove and Richards Index, the Riley Score, the Geboes Score and the Harpaz Index (Bryant 2014). The current systematic review identified an additional seven partially validated histologic indices: the Modified Riley Score, the Gomes Index, the Simplified Geboes Score, the Nancy Index, the Robarts Histopathology Index, the Chicago Index and the Modified Harpaz Index.

Figure 1. Study flow diagram.

Figure 2. Risk of bias graph: review authors' judgements about each risk of bias item presented as percentages across all included studies.

Table 1. Indices that have been fully or partially validated

| No. | Reference | Index |
|---|---|---|
| 1 | Feagan 2005 | Modified Riley Score |
| 2 | Fiel 2003 | Harpaz/Mount Sinai Index |
| 3 | Geboes 2000 | Geboes Score |
| 4 | Gomes 1986 | Gomes Index |
| 5 | Jauregui‐Amezaga 2016 | Simplified Geboes Score |
| 6 | Marchal‐Bressenot 2017 | Nancy Index |
| 7 | Mosli 2017 | Robarts Histopathology Score |
| 8 | Riley 1991 | Riley Score |
| 9 | Rubin 2007 | Chicago/Rubin/Histologic Inflammation Activity Scale |
| 10 | Theede 2015 | Modified Harpaz Index |
| 11 | Truelove 1956 | Truelove and Richards Index |
Table 2. Indices that have not been fully or partially validated

| No. | Index | Reference |
|---|---|---|
| 1 | Baars Index | Baars 2012 |
| 2 | British Society of Gastroenterology Protocol | Wiernicka 2015 |
| 3 | D'Argenio/Scheppach Index | D'Argenio 2001 |
| 4 | ECAP (Extent, Chronicity, Activity Plus additional findings) System | Iacucci 2015 |
| 5 | Endocytoscopy System (ECS) | Nishiyama 2014 |
| 6 | Floren Index | Floren 1987 |
| 7 | Friedman Index | Friedman 1985 |
| 8 | Gramlich Score | Gramlich 2007 |
| 9 | Hanauer Index | Hanauer 1993 |
| 10 | Initial Riley Score | Riley 1988 |
| 11 | Keren Score | Keren 1984 |
| 12 | Korelitz Index | Korelitz 1976 |
| 13 | Matts Score | Matts 1961 |
| 14 | Odze Index | Odze 1993 |
| 15 | Powell‐Tuck Score | Powell‐Tuck 1982 |
| 16 | Rutter Score | Rutter 2004 |
| 17 | Sandborn Index | Sandborn 1993 |
| 18 | Saverymuttu Index | Saverymuttu 1986 |
| 19 | Watts Score | Watts 1966 |