
Abdominal ultrasound and alpha‐foetoprotein for the diagnosis of hepatocellular carcinoma in adults with chronic liver disease


Background

Hepatocellular carcinoma (HCC) occurs mostly in people with chronic liver disease. It ranks sixth in terms of global cancer cases and fourth in terms of cancer deaths in men. Although abdominal ultrasound (US) is used as an initial test to exclude the presence of focal liver lesions, and measurement of serum alpha‐foetoprotein (AFP) may raise suspicion of HCC, further testing is needed to confirm the diagnosis and to stage the HCC. Current guidelines recommend a surveillance programme using US, with or without AFP, to detect HCC in high‐risk populations, despite the lack of clear benefits on overall survival. Assessing the diagnostic accuracy of US and AFP may clarify whether the lack of benefit of surveillance programmes could be related to under‐diagnosis. Therefore, the accuracy of these two tests for diagnosing HCC in people with chronic liver disease who are not included in surveillance programmes needs to be assessed.

Objectives

Primary: to assess the diagnostic accuracy of US and AFP, alone or in combination, for the diagnosis of HCC of any size and at any stage in adults with chronic liver disease, either in a surveillance programme or in a clinical setting.

Secondary: to assess the diagnostic accuracy of abdominal US and AFP, alone or in combination, for the diagnosis of resectable HCC; to compare the diagnostic accuracy of the individual tests versus the combination of both tests; to investigate sources of heterogeneity in the results.

Search methods

We searched the Cochrane Hepato‐Biliary Group Controlled Trials Register, the Cochrane Hepato‐Biliary Group Diagnostic‐Test‐Accuracy Studies Register, the Cochrane Library, MEDLINE, Embase, LILACS, and Science Citation Index Expanded, up to 5 June 2020. We applied no language or document‐type restrictions.

Selection criteria

Studies assessing the diagnostic accuracy of US and AFP, independently or in combination, for the diagnosis of HCC in adults with chronic liver disease, with cross‐sectional and case‐control designs, using one of the acceptable reference standards, such as pathology of the explanted liver, histology of a resected or biopsied focal liver lesion, or typical characteristics on computed tomography or magnetic resonance imaging, all with a six‐month follow‐up.

Data collection and analysis

We independently screened studies, extracted data, and assessed the risk of bias and applicability concerns using the QUADAS‐2 checklist. We presented the results of sensitivity and specificity using paired forest plots, and we tabulated the results. We used a hierarchical meta‐analysis model where appropriate. We presented the uncertainty of the accuracy estimates using 95% confidence intervals (CIs). We double‐checked all data extractions and analyses.

Main results

We included 373 studies. The index test was AFP (326 studies, 144,570 participants), US (39 studies, 18,792 participants), and a combination of AFP and US (eight studies, 5454 participants).

We judged all but one study to be at high risk of bias. Most studies used different reference standards, often inappropriate to exclude the presence of the target condition, and the time interval between the index test and the reference standard was rarely defined. Most studies using AFP had a case‐control design. We also had major concerns about applicability because of the characteristics of the participants.

As the primary studies with AFP used different cut‐offs, we performed a meta‐analysis using the hierarchical summary receiver operating characteristic (HSROC) model, and then carried out two meta‐analyses including only the studies that reported the most commonly used cut‐offs: around 20 ng/mL or 200 ng/mL.

AFP cut‐off 20 ng/mL: for HCC (147 studies), sensitivity 60% (95% CI 58% to 62%), specificity 84% (95% CI 82% to 86%); for resectable HCC (six studies), sensitivity 65% (95% CI 62% to 68%), specificity 80% (95% CI 59% to 91%).

AFP cut‐off 200 ng/mL: for HCC (56 studies), sensitivity 36% (95% CI 31% to 41%), specificity 99% (95% CI 98% to 99%); for resectable HCC (two studies), one with sensitivity 4% (95% CI 0% to 19%), specificity 100% (95% CI 96% to 100%), and the other with sensitivity 8% (95% CI 3% to 18%), specificity 100% (95% CI 97% to 100%).

US: for HCC (39 studies), sensitivity 72% (95% CI 63% to 79%), specificity 94% (95% CI 91% to 96%); for resectable HCC (seven studies), sensitivity 53% (95% CI 38% to 67%), specificity 96% (95% CI 94% to 97%).

Combination of AFP (cut‐off 20 ng/mL) and US: for HCC (six studies), sensitivity 96% (95% CI 88% to 98%), specificity 85% (95% CI 73% to 93%); for resectable HCC (two studies), one with sensitivity 89% (95% CI 73% to 97%), specificity 83% (95% CI 76% to 88%), and the other with sensitivity 79% (95% CI 54% to 94%), specificity 87% (95% CI 79% to 94%).

The observed heterogeneity in the results remains mostly unexplained, and is only in part attributable to different cut‐offs or settings (surveillance programme compared with clinical series). The sensitivity analyses, excluding studies published as abstracts or with a case‐control design, showed no variation in the results.

We compared the accuracy obtained from studies with AFP (cut‐off around 20 ng/mL) and US: a direct comparison in 11 studies (6674 participants) showed a higher sensitivity of US (81%, 95% CI 66% to 90%) versus AFP (64%, 95% CI 56% to 71%), with similar specificity: US 92% (95% CI 83% to 97%) versus AFP 89% (95% CI 79% to 94%). A direct comparison in six studies (5044 participants) showed a higher sensitivity (96%, 95% CI 88% to 98%) of the combination of AFP and US versus US (76%, 95% CI 56% to 89%), with similar specificity: AFP and US 85% (95% CI 73% to 92%) versus US 93% (95% CI 80% to 98%).

Authors' conclusions

In the clinical pathway for the diagnosis of HCC in adults, AFP and US, alone or in combination, have the role of triage tests. We found that using AFP with 20 ng/mL as the cut‐off, about 40% of HCC occurrences would be missed, and with US alone, more than a quarter. The combination of the two tests showed the highest sensitivity, with less than 5% of HCC occurrences missed and about 15% false‐positive results. The uncertainty resulting from the poor quality and heterogeneity of the included studies limits our ability to draw firm conclusions from these results.

Abdominal ultrasound and alpha‐foetoprotein for the diagnosis of hepatocellular carcinoma

Why is it important to improve the diagnosis of hepatocellular carcinoma?

Hepatocellular carcinoma (HCC), that is, cancer that originates in the liver, ranks sixth in terms of global cancer incidence and fourth in terms of cancer deaths in men. This cancer occurs mostly in people with chronic liver disease, regardless of the cause. Ultrasound (US), which uses ultrasound waves to show abnormalities in the liver, can detect liver lesions suspected of being HCC. Alpha‐foetoprotein (AFP), a glycoprotein produced by the liver and measurable in the blood, is considered a tumour marker because elevated levels can be associated with the presence of HCC. These two tests (US and AFP) are used, alone or in combination, to exclude the presence of HCC in people at high risk of developing it. People at high risk are those with chronic liver disease. Current guidelines recommend surveillance programmes that repeat abdominal US, with or without an AFP test, every six months to detect early HCC that is amenable to surgical resection or other treatment.

What was the aim of this review?

To determine the accuracy of AFP, US, and a combination of AFP and US for diagnosing HCC in people with chronic liver disease.

What was studied in this review?

AFP (a tumour marker), which can be measured easily in the blood using a commercial kit. The AFP studies used several threshold values to define the test as positive or negative.

US, which is performed with an ultrasound machine and is available worldwide. It produces images of the liver and other abdominal organs, and can detect liver lesions suspected of being HCC.

A combination of AFP and US, which can detect or rule out liver lesions suspected of being HCC.

What are the main results of this review?

We found a total of 373 studies in adults: AFP was analysed in 326 studies with 144,570 participants; US in 39 studies with 18,792 participants; and the combination of AFP and US in eight studies with 5454 participants.

‐ AFP with a threshold of 20 ng/mL (147 studies): the test was positive in 60 out of 100 participants with HCC and in 16 out of 100 participants without HCC. AFP with a threshold of 200 ng/mL (56 studies): the test was positive in 36 out of 100 participants with HCC and in only one out of 100 without HCC.
‐ US (39 studies): the test was positive in 72 out of 100 participants with HCC and in six out of 100 participants without HCC.
‐ Combination of AFP with a threshold of 20 ng/mL and US (six studies): one or both tests were positive in 96 out of 100 participants with HCC and in 15 out of 100 participants without HCC.

Therefore, the combination of the two tests is better at detecting participants with HCC. Assuming that five out of every 100 people with chronic liver disease have HCC, 50 out of every 1000 people with chronic liver disease will have HCC. With AFP and abdominal US combined, 48 of these people will be detected, and two will be missed and will not receive appropriate treatment. The remaining 950 out of 1000 will not have HCC, and 143 of them will be wrongly diagnosed with HCC and will undergo unnecessary further tests such as computed tomography, magnetic resonance imaging, or biopsy.
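The arithmetic behind these natural frequencies can be sketched in a few lines of Python. This is an illustrative sketch, not the review authors' code: the function name `natural_frequencies` and the round‐half‐up convention are assumptions, chosen because they reproduce the counts quoted above from the pooled sensitivity (96%), the pooled specificity (85%), and an assumed prevalence of 5%.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_half_up(value: Decimal) -> int:
    """Round to the nearest whole person, with .5 rounding up (assumed convention)."""
    return int(value.quantize(Decimal("1"), rounding=ROUND_HALF_UP))


def natural_frequencies(sensitivity, specificity, prevalence, cohort=1000):
    """Expected test outcomes in a hypothetical cohort (hypothetical helper).

    Inputs are proportions between 0 and 1. Decimal arithmetic avoids
    binary floating-point surprises at exact .5 boundaries.
    """
    sens, spec, prev = (Decimal(str(v)) for v in (sensitivity, specificity, prevalence))
    with_hcc = round_half_up(prev * cohort)
    without_hcc = cohort - with_hcc
    true_pos = round_half_up(sens * with_hcc)
    false_pos = round_half_up((1 - spec) * without_hcc)
    return {
        "TP": true_pos,
        "FN": with_hcc - true_pos,      # people with HCC who are missed
        "FP": false_pos,
        "TN": without_hcc - false_pos,  # people without HCC correctly cleared
    }


# Combination of AFP (20 ng/mL) and US at an assumed prevalence of 5%.
print(natural_frequencies(0.96, 0.85, 0.05))
```

Calling the same function with a prevalence of 0.30 gives the higher‐risk scenario; the counts depend only on the three proportions and the cohort size.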

How reliable are the results of the studies in this review?

All but one of the studies had risk of bias problems, especially in the selection of participants and in the correct definition of the presence of HCC. These problems could impair the correct estimation of the diagnostic ability of the three tests.

Who do the results of this review apply to?

People with chronic liver disease

What are the implications of this review?

Using AFP with 20 ng/mL as the threshold, about 40% of HCC cases would be missed, and with US alone, more than a quarter. Sensitivity was highest when the two tests were used in combination, with less than 5% of HCC cases missed and 15% false positives.

How up to date is this review?

5 June 2020

Authors' conclusions

Implications for practice

Hepatocellular carcinoma (HCC) is a frequent complication of chronic liver disease. Detecting a tumour amenable to surgical resection, thermal ablation, or liver transplantation could improve the prognosis, which is severe when radical treatment is not indicated. As HCC is the fourth leading cause of death from cancer worldwide, accurate tests are needed to diagnose it, either in a surveillance programme or in a clinical setting. In the clinical pathway for the diagnosis of HCC in people with chronic liver disease, AFP and US are the first‐step investigations. Both tests, separately or in combination, can be considered triage tests. Ideally, they should ensure a low proportion of false‐negative results because people with undetected HCC cannot receive proper treatment. False‐positive results would have less severe consequences, as misclassified people would undergo unnecessary further testing with CT, MRI, or, rarely, biopsy.

In surveillance programmes for HCC in high‐risk patients, the pooled sensitivity of alpha‐foetoprotein (AFP) measurement with a cut‐off value of 20 ng/mL suggests that, using this test alone, a relevant number of HCC occurrences would be missed. The estimated sensitivity of ultrasound (US) is higher, but more than a quarter of HCC occurrences would still be missed. The combination of the two tests, considered positive when at least one is positive, reduces the proportion of false‐negative results to around 5%, sparing further testing in case of negative results. The cost of the improved sensitivity is an increase in false‐positive results from 6% to 15%. Moreover, our findings suggest that US sensitivity decreases for the diagnosis of potentially resectable HCC.

In a clinical setting, where the pre‐test probability of HCC is expected to be higher than in surveillance programmes, both US and AFP with a cut‐off value of 20 ng/mL have an estimated specificity higher than 80%. AFP with a cut‐off value of 200 ng/mL allows confirmation of the diagnosis with a specificity of around 99%. In any case, further testing is required for staging the disease and planning appropriate treatment. The role of these two tests is mainly as triage tests, but individually they do not ensure adequate sensitivity. In particular, AFP is higher than 200 ng/mL in only 36% of patients with HCC. Therefore, clinicians cannot avoid further testing in case of negative results. In this context, the role of the combination of AFP and US cannot be assessed, as we found only one study with pathology of the explanted liver as the reference standard.
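How far a positive or negative result shifts the probability of HCC depends on the pre‐test probability. The following minimal sketch applies Bayes' theorem to the pooled estimates for AFP at the 200 ng/mL cut‐off (sensitivity 36%, specificity 99%) at the two prevalences used in the summary of findings tables (5% and 30%). The function name `predictive_values` is hypothetical, and the calculation assumes that the pooled estimates apply unchanged in both settings, which the heterogeneity noted in this review makes uncertain.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) from sensitivity, specificity, and pre-test probability.

    All arguments are proportions between 0 and 1 (hypothetical helper).
    """
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)   # P(HCC | positive test)
    npv = true_neg / (true_neg + false_neg)   # P(no HCC | negative test)
    return ppv, npv


# AFP at the 200 ng/mL cut-off, pooled estimates from this review.
for prevalence in (0.05, 0.30):
    ppv, npv = predictive_values(0.36, 0.99, prevalence)
    print(f"prevalence {prevalence:.0%}: PPV {ppv:.1%}, NPV {npv:.1%}")
```

Under these assumptions, a positive AFP result at this cut‐off corresponds to a post‐test probability of roughly 65% at 5% prevalence and roughly 94% at 30% prevalence. At 30% prevalence, about one in five people with a negative result would still have HCC, which is consistent with the statement above that negative results do not avoid further testing.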

Overall, caution is needed in interpreting our review results: we found large heterogeneity that is not due to a few outliers and that remains unexplained despite the investigation of multiple potential factors. Furthermore, all studies were at high risk of bias, and most raised high concern regarding applicability, mainly in the participant selection domain.

Implications for research

As the evidence on the accuracy of AFP, US, and especially the combination of AFP and US is not conclusive, further studies are needed. To obtain more consistent and applicable results, these studies should assess the sensitivity and specificity of AFP and US in people with chronic liver disease at a definite risk for HCC, with a cross‐sectional design, evaluating participants with either positive or negative index test results with computed tomography (CT) or magnetic resonance imaging (MRI) as the reference standard. This reference standard, even if not absolutely accurate, should be chosen because, in the clinical pathway, both AFP and US play the role of a triage test just before CT and MRI. The time interval between the index test and the reference standard should be clearly reported and should not exceed three months. The number of uninterpretable results should be reported, at least for US, because they are not infrequent. Moreover, no further study with a case‐control design can be expected to be informative.

To explore the possible role of these tests on patient‐relevant outcomes, beyond their accuracy, studies with different designs are needed (Colli 2014). Only randomised clinical trials assessing overall mortality in different surveillance programmes that include these tests, separately or in combination, could properly answer this question.

Summary of findings

Summary of findings 1. 'Summary of findings' table: diagnostic accuracy of AFP, US, and combination of AFP and US for the diagnosis of HCC

Review question: what is the diagnostic accuracy of alpha‐foetoprotein (AFP), abdominal ultrasound (US), or of the combination of AFP and abdominal US for the diagnosis of hepatocellular carcinoma (HCC) in adults with chronic liver disease?

Population: adults with chronic liver disease

Setting: clinical setting (secondary or tertiary care setting) or surveillance programs

Study design: prospective and retrospective cross‐sectional and case‐control studies

Index tests

Serum alpha‐foetoprotein (AFP) measurement with a cut‐off value of 20 ng/mL

Serum alpha‐foetoprotein (AFP) measurement with a cut‐off value of 200 ng/mL

Abdominal ultrasound (US)

Combination of serum alpha‐foetoprotein (AFP) measurement with a cut‐off value of 20 ng/mL and abdominal ultrasound (US)

Target condition: HCC of any size, any stage

Reference standards:

the pathology of the explanted liver in case of transplantation; the histology of resected focal liver lesion(s), or the histology of resected or biopsied focal liver lesion(s) with a follow‐up period of at least six months to exclude the presence of focal lesions not detected by the index test and synchronous lesions from the parenchyma surrounding the resected or biopsied area;

typical characteristics on cross‐sectional multiphasic contrast computed tomography (CT) or magnetic resonance imaging (MRI), with a follow‐up period of at least six months in order to allow the confirmation of an initial negative result on CT or on MRI.

Limitations in the evidence ‐ Risk of bias/Applicability

Index test: serum alpha‐foetoprotein (AFP) measurement, cut‐off value 20 ng/mL

‐ Participant selection: high/unclear risk of bias in 141 studies (96%); high concern in 115 studies (78%)

‐ Index tests: high/unclear risk of bias in 73 studies (50%); high concern in no study

‐ Reference standard: high/unclear risk of bias in 105 studies (71%); high concern in 33 studies (22%)

‐ Flow and timing: high risk of bias in 143 studies (97%)

Index test: serum alpha‐foetoprotein (AFP) measurement, cut‐off value 200 ng/mL

‐ Participant selection: high/unclear risk of bias in 48 studies (86%); high concern in 47 studies (84%)

‐ Index tests: high/unclear risk of bias in 54 studies (96%); high concern in no study

‐ Reference standard: high/unclear risk of bias in 39 studies (70%); high concern in 13 studies (23%)

‐ Flow and timing: high risk of bias in 55 studies (98%)

Index test: abdominal ultrasound

‐ Participant selection: high/unclear risk of bias in 23 studies (59%); high concern in 22 studies (56%)

‐ Index tests: high/unclear risk of bias in 15 studies (38%); high concern in no study

‐ Reference standard: high/unclear risk of bias in 27 studies (69%); high concern in 13 studies (33%)

‐ Flow and timing: high risk of bias in 27 studies (69%)

Index test: combination of serum alpha‐foetoprotein (AFP) measurement with a cut‐off value of 20 ng/mL and abdominal ultrasound

‐ Participant selection: high/unclear risk of bias in 2 studies (33%); high concern in 2 studies (33%)

‐ Index tests: high/unclear risk of bias in 2 studies (33%); high concern in no study

‐ Reference standard: high/unclear risk of bias in 4 studies (67%); high concern in 1 study (17%)

‐ Flow and timing: high risk of bias in 6 studies (100%)

Findings

Implications in a hypothetical cohort of 1000 people:

‐ True positives will appropriately receive further necessary testing with CT, MRI, or contrast‐enhanced ultrasound (CEUS), and possibly treatment.
‐ False negatives will be misdiagnosed and will not receive appropriate treatment.
‐ True negatives will appropriately not undergo unnecessary further testing with CT, MRI, CEUS, or biopsy.
‐ False positives will inappropriately undergo unnecessary further testing with CT, MRI, CEUS, or biopsy.

| Index test | Number of studies (participants) | Sensitivity (95% CI) | Specificity (95% CI) | Assumed prevalence of hepatocellular carcinoma (HCC) a | True positives | False negatives | True negatives | False positives | Certainty of the evidence |
|---|---|---|---|---|---|---|---|---|---|
| AFP (cut‐off 20 ng/mL) | 147 (52144) | 59.8% (57.9% to 61.7%) | 84.4% (82.3% to 86.3%) | 5% | 30 | 20 | 802 | 148 | very low b ⨁◯◯◯ |
| | | | | 30% | 179 | 121 | 591 | 109 | |
| AFP (cut‐off 200 ng/mL) | 56 (20452) | 36% (31% to 41%) | 99% (98% to 100%) | 5% | 18 | 32 | 940 | 10 | very low c ⨁◯◯◯ |
| | | | | 30% | 108 | 192 | 693 | 7 | |
| US | 39 (18792) | 72% (63% to 79%) | 94% (91% to 96%) | 5% | 36 | 14 | 893 | 57 | very low d ⨁◯◯◯ |
| | | | | 30% | 216 | 84 | 658 | 42 | |
| Combination of AFP (cut‐off 20 ng/mL) and US | 6 (5044) | 96% (88% to 98%) | 85% (73% to 93%) | 5% | 48 | 2 | 807 | 143 | low e ⨁⨁◯◯ |
| | | | | 30% | 288 | 12 | 595 | 105 | |

a For illustration, we chose two values of HCC prevalence: 5% for a population at low risk (compensated advanced chronic liver disease and chronic viral hepatitis; Lok 2009), and 30% for a population at high risk, the median prevalence in the included cross‐sectional studies conducted in clinical cohorts.

b Downgraded by three levels: risk of bias, indirectness, and inconsistency. Risk of bias downgraded one level because all studies were judged at high risk of bias; indirectness downgraded one level as we considered most studies to have concern regarding applicability, mainly in relation to the population (including disease spectrum); inconsistency downgraded one level as the estimates for individual studies ranged from 24% to 90% and we could not explain the heterogeneity by study quality or other factors.

c Downgraded by three levels: risk of bias, indirectness, and inconsistency. Risk of bias downgraded one level because all studies were judged at high risk of bias; indirectness downgraded one level as we considered most studies to have concern regarding applicability, mainly in relation to the population (including disease spectrum); inconsistency downgraded one level as the estimates for individual studies ranged from 4% to 83% and we could not explain the heterogeneity by study quality or other factors.

d Downgraded by three levels: risk of bias, indirectness, and inconsistency. Risk of bias downgraded one level because all studies were judged at high risk of bias; indirectness downgraded one level as we considered most studies to have concern regarding applicability, mainly in relation to the population (including disease spectrum); inconsistency downgraded one level as the estimates for individual studies ranged from 28% to 100% and we could not explain the heterogeneity by study quality or other factors.

e Downgraded by two levels: risk of bias and indirectness. Risk of bias downgraded one level because all studies were judged at high risk of bias; indirectness downgraded one level as we considered most studies to have concern regarding applicability, mainly in relation to the population (including disease spectrum).

GRADE certainty of the evidence

High: we are very confident that the true effect lies close to that of the estimate of the effect.
Moderate: we are moderately confident in the effect estimate: the true effect is likely to be close to the estimate of the effect, but there is a possibility that it is substantially different.
Low: our confidence in the effect estimate is limited: the true effect may be substantially different from the estimate of the effect.
Very low: we have very little confidence in the effect estimate: the true effect is likely to be substantially different from the estimate of effect.

The results presented in this table should not be interpreted in isolation from results of the individual included studies contributing to each summary test accuracy measure.

Summary of findings 2. 'Summary of findings' table: direct comparison of US, and combination of AFP and US

Review question: what is the diagnostic accuracy of the combination of alpha‐foetoprotein (AFP) and abdominal ultrasound (US) compared to US for the diagnosis of hepatocellular carcinoma (HCC) in adults with chronic liver disease?

Population: adults with chronic liver disease

Setting: clinical setting (secondary or tertiary care setting) or surveillance programs

Study design: prospective and retrospective cross‐sectional studies

Index tests: abdominal ultrasound; combination of serum alpha‐foetoprotein (AFP) measurement with a cut‐off value of 20 ng/mL and abdominal ultrasound

Target condition: HCC of any size, any stage

Reference standards: the pathology of the explanted liver in case of transplantation; the histology of resected focal liver lesion(s), or the histology of resected or biopsied focal liver lesion(s) with a follow‐up period of at least six months to exclude the presence of focal lesions not detected by the index test and synchronous lesions from the parenchyma surrounding the resected or biopsied area; typical characteristics on cross‐sectional multiphasic contrast computed tomography (CT) or magnetic resonance imaging (MRI), with a follow‐up period of at least six months in order to allow the confirmation of an initial negative result on CT or on MRI.

Limitations in the evidence

Risk of bias/ Applicability

‐ Participant selection: high/unclear risk of bias in 2 studies (33%); high concern in 2 studies (33%)

‐ Index tests: high/unclear risk of bias in 2 studies (33%); high concern in no study

‐ Reference standard: high/unclear risk of bias in 4 studies (67%); high concern in 1 study (17%)

‐ Flow and timing: high risk of bias in 6 studies (100%)

Findings

Implications in a hypothetical cohort of 1000 people:

‐ True positives will appropriately receive further necessary testing with CT, MRI, or contrast‐enhanced ultrasound (CEUS), and possibly treatment.
‐ False negatives will be misdiagnosed and will not receive appropriate treatment.
‐ True negatives will appropriately not undergo unnecessary further testing with CT, MRI, CEUS, or biopsy.
‐ False positives will inappropriately undergo unnecessary further testing with CT, MRI, CEUS, or biopsy.

| Index test | Number of studies (participants) | Sensitivity (95% CI) | Relative sensitivity (95% CI) | P value | Specificity (95% CI) | Relative specificity (95% CI) | P value | Assumed prevalence of hepatocellular carcinoma (HCC) a | True positives | False negatives | True negatives | False positives | Certainty of the evidence |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| US | 6 (5044) | 76% (56% to 89%) | 1.28 (1.03 to 1.539) | P = 0.014 | 93% (80% to 96%) | 0.94 (0.87 to 1.01) | P = 0.102 | 5% | 38 | 12 | 883 | 67 | low b ⨁⨁◯◯ |
| | | | | | | | | 30% | 228 | 72 | 651 | 49 | |
| Combination of AFP (cut‐off 20 ng/mL) and US | | 96% (88% to 98%) | | | 85% (73% to 92%) | | | 5% | 48 | 2 | 807 | 143 | |
| | | | | | | | | 30% | 288 | 12 | 595 | 105 | |

a For illustration, we chose two values of HCC prevalence: 5% for a population at low risk (compensated advanced chronic liver disease and chronic viral hepatitis; Lok 2009), and 30% for a population at high risk, the median prevalence in the included cross‐sectional studies conducted in clinical cohorts.

b Downgraded by two levels: risk of bias and indirectness. Risk of bias downgraded one level because all studies were judged at high risk of bias; indirectness downgraded one level as we considered most studies to have concern regarding applicability, mainly in relation to the population (including disease spectrum).

GRADE certainty of the evidence

High: we are very confident that the true effect lies close to that of the estimate of the effect.
Moderate: we are moderately confident in the effect estimate: the true effect is likely to be close to the estimate of the effect, but there is a possibility that it is substantially different.
Low: our confidence in the effect estimate is limited: the true effect may be substantially different from the estimate of the effect.
Very low: we have very little confidence in the effect estimate: the true effect is likely to be substantially different from the estimate of effect.

The results presented in this table should not be interpreted in isolation from results of the individual included studies contributing to each summary test accuracy measure.
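The trade‐off in this direct comparison can be made explicit by subtracting the cohort counts reported in the table above. The sketch below is illustrative only: the dictionary layout and the function name `trade_off` are assumptions, and the counts are copied from the table rows for assumed prevalences of 5% and 30%.

```python
from fractions import Fraction

# Counts per 1000 people, copied from Summary of findings 2.
US_ALONE = {
    "5%": {"TP": 38, "FN": 12, "TN": 883, "FP": 67},
    "30%": {"TP": 228, "FN": 72, "TN": 651, "FP": 49},
}
COMBINATION = {
    "5%": {"TP": 48, "FN": 2, "TN": 807, "FP": 143},
    "30%": {"TP": 288, "FN": 12, "TN": 595, "FP": 105},
}


def trade_off(baseline, alternative):
    """Compare two strategies applied to the same hypothetical cohort.

    Returns the extra HCC cases detected, the extra false positives, and the
    number of extra false positives per extra case detected (None if the
    alternative detects no additional cases).
    """
    extra_detected = alternative["TP"] - baseline["TP"]
    extra_false_positives = alternative["FP"] - baseline["FP"]
    per_extra_case = (
        Fraction(extra_false_positives, extra_detected) if extra_detected > 0 else None
    )
    return extra_detected, extra_false_positives, per_extra_case


for prevalence in ("5%", "30%"):
    detected, false_pos, ratio = trade_off(US_ALONE[prevalence], COMBINATION[prevalence])
    print(f"{prevalence}: +{detected} detected, +{false_pos} false positives, "
          f"{float(ratio):.1f} extra false positives per extra case")
```

Read this way, adding AFP to US costs far more unnecessary work‐ups per additional cancer found when HCC is uncommon than when it is common, although the low certainty of the underlying estimates applies equally to these derived figures.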

Background

Hepatocellular carcinoma (HCC) is the most common primary liver neoplasm, usually developing in the setting of chronic liver disease. It is the sixth most commonly diagnosed cancer and the fourth leading cause of death from cancer worldwide; there were 782,000 deaths due to HCC in 2018 (Bray 2018). In men, HCC ranks fifth in terms of global cases of cancer and second in terms of cancer deaths (Bray 2018). In Western countries, the incidence and mortality rates of HCC increased substantially between 1990 and 2015 (Ryerson 2016; GBD 2017). Most common risk factors include liver cirrhosis, severe liver fibrosis, hepatitis B, hepatitis C, alcohol intake, and non‐alcoholic fatty liver disease (Yang 2011), although some people may develop HCC without the presence of known risk factors (Bralet 2000; Young 2012).

Clinically, HCC is frequently diagnosed in the late stages because of the absence of specific symptoms of the malignancy, other than those related to chronic liver disease. Only 20% of patients with HCC are eligible for curative treatments (such as liver resection, transplantation, or ablation) due to advanced tumour stage, liver dysfunction, or shortage of liver donors (Davila 2012). According to current guidelines, HCC can be considered resectable only if the cancer presents as either a single lesion with a maximum diameter of less than 5 cm, or up to three lesions, each with a maximum diameter of 3 cm (Mazzaferro 1996; EASL‐EORTC 2012; Omata 2017; EASL 2018; Heimbach 2018). Furthermore, curative treatment options are not feasible for most patients because of severe clinical deterioration at the time of diagnosis, or because of the inaccuracy of the preoperative clinical evaluation and staging procedure.

Despite the poor initial prognosis (the overall mortality‐to‐incidence ratio has been reported as 0.93; Bray 2018), a five‐year survival rate of more than 50% can be achieved if HCC is detected at an early stage (Forner 2012). According to the Barcelona Clinic Liver Cancer staging system, only patients with early‐stage HCC are eligible for curative treatment (Llovet 1999). Therefore, it is very important to make an accurate diagnosis of HCC as early as possible.

Abdominal ultrasound (US) has become an acceptable imaging modality in detecting HCC because it is non‐invasive, acceptable to patients, has moderate costs, and no associated risks. A recent meta‐analysis showed a pooled sensitivity of 84% of US surveillance in detecting HCC in people without any symptoms (Tzartzeva 2018). However, the same publication showed a poor result for US in the detection of early‐stage HCC in people who are eligible for curative therapies, with a pooled sensitivity of only 47% (Tzartzeva 2018). Accordingly, detection of HCC poses a challenge. The sonographic liver tissue characteristics in people with fibrosis make it particularly difficult to detect and differentiate small neoplastic nodules from the surrounding parenchyma and from regenerative nodules. Furthermore, the performance of US can be influenced by the expertise of the operator and the quality of the equipment.

Alpha‐foetoprotein (AFP) is a tumour marker which has been used as a diagnostic test for HCC since the 1970s, when most patients were diagnosed in the late stage and with clinical symptoms (Kew 1975). Although the test for AFP is widely available, inexpensive, and easy to perform, it has poor accuracy as a serological test for the early detection of HCC (Tateishi 2008). Levels of AFP increase not only in people with HCC, but also in people with active hepatitis, cirrhosis without HCC, or exacerbation of the underlying liver disease, due to pathophysiological changes of inflammation and regeneration; this means the test can have low specificity in the population at risk (Di Bisceglie 2005; Gopal 2014).

Surveillance programmes for early detection of HCC in high‐risk patients have been implemented in current medical practice in most Western and Asian‐Pacific countries, despite the very low‐certainty evidence regarding the effects on mortality (Kansagara 2014; Singal 2014). The American Association for the Study of Liver Diseases (AASLD), European Association for the Study of the Liver with European Organization for Research and Treatment of Cancer (EASL‐EORTC), and Asian Pacific Association for the Study of the Liver (APASL) recommend abdominal US as an imaging modality for surveillance of HCC every six months in people at risk. However, the guidelines disagree on whether serum AFP should be used as an additional test (EASL‐EORTC 2012; Omata 2017; EASL 2018; Heimbach 2018).

There are several published systematic reviews which examine the accuracy of ultrasonography and AFP in detecting HCC (Colli 2006; Tateishi 2008; Singal 2009; Kansagara 2014; Singal 2014; Chou 2015; Tzartzeva 2018), but to our knowledge, there is no recent systematic review which compares AFP alone, US alone, and the combination of AFP and US in detecting HCC. Therefore, the aim of our review is to use Cochrane methodology to assess the diagnostic accuracy of these three modalities for the diagnosis of HCC, as well as the early stage of HCC (when the cancer may still be resectable), in people with chronic liver disease.

Target condition being diagnosed

Hepatocellular carcinoma is the most common primary liver cancer which occurs mostly in people with chronic liver disease. The incidence of HCC increases in individuals with hepatitis B and C, alcohol use, and non‐alcoholic fatty liver disease, and in those with liver cirrhosis of various aetiologies (Bruix 2011). There is no definite threshold in the definition of lesion size, although the literature tends to classify lesions with a diameter equal to or less than 2 cm as 'small' (Hussain 2002; Choi 2014; Park 2017).

In clinical practice, and according to pertinent guidelines, multiphasic computed tomography (CT) or magnetic resonance imaging (MRI) with intravascular contrast allow for a highly accurate diagnosis of HCC, without an invasive biopsy (EASL 2018; Heimbach 2018). The diagnosis of HCC is usually obtained on the basis of cross‐sectional CT or MRI features: focal liver lesions which show non‐rim‐like hyper enhancement in the arterial phase, subsequent non‐peripheral washout appearance, and capsule appearance (LI‐RADS 2018). Liver histology is required only for undefined lesions during CT and MRI (EASL‐EORTC 2012; Omata 2017; Heimbach 2018).

A number of staging systems for HCC have been proposed and developed; however, there is no globally applicable staging system (Kinoshita 2015). Among different staging protocols, the Barcelona Clinic Liver Cancer (BCLC) classification system has a notable feature of treatment recommendations for each stage, based on the best treatment options currently available (Llovet 1999; Llovet 2003; Llovet 2008). The staging is based on four elements: tumour extension, liver functional reserve, physical status, and cancer‐related symptoms. According to the BCLC classification system, only patients with early‐stage HCC are eligible for curative treatment, such as surgical resection or percutaneous treatment. Orthotopic liver transplantation is reserved for patients with decompensated cirrhosis.

Orthotopic liver transplantation is considered a definite curative treatment for HCC. When orthotopic liver transplantation for HCC was initially introduced in the 1980s, it was associated with poor five‐year survival rates and high recurrence rates, which led to the treatment being contraindicated for HCC (Yokoyama 1990). In 1996, specific criteria, known as Milan criteria (Mazzaferro 1996), were developed for the selection of patients for liver transplantation. With the implementation of these criteria, the overall five‐year survival rates for post‐orthotopic liver transplantation patients exceeded 70% (Mazzaferro 2011). The criteria for patients eligible for orthotopic liver transplantation include: a single HCC lesion with a diameter equal to or less than 5 cm, or up to three HCC lesions, each with a diameter equal to or less than 3 cm; no vascular invasion; and no extrahepatic involvement (no metastasis). The same criteria are recommended for the selection of patients eligible for surgical resection.
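As an illustration only, the selection criteria listed above can be written as a short check. The function name and its inputs are hypothetical and are not part of any guideline or software; the sketch simply encodes the lesion count, diameter, vascular invasion, and extrahepatic involvement conditions as stated in the text.

```python
from typing import Sequence


def meets_milan_criteria(
    lesion_diameters_cm: Sequence[float],
    vascular_invasion: bool,
    extrahepatic_involvement: bool,
) -> bool:
    """Check the Milan criteria as described in the text (illustrative sketch)."""
    # Vascular invasion or extrahepatic involvement excludes the patient.
    if vascular_invasion or extrahepatic_involvement:
        return False
    n_lesions = len(lesion_diameters_cm)
    if n_lesions == 0:
        # No HCC lesion to assess; returning False is an illustrative choice.
        return False
    if n_lesions == 1:
        # A single lesion with a diameter equal to or less than 5 cm.
        return lesion_diameters_cm[0] <= 5.0
    if n_lesions <= 3:
        # Up to three lesions, each with a diameter equal to or less than 3 cm.
        return all(d <= 3.0 for d in lesion_diameters_cm)
    return False
```

For example, a single 4.5 cm lesion without vascular invasion or metastasis satisfies the check, whereas two lesions of 3.5 cm and 1 cm do not.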

Along with interferon‐based treatment, a new direct‐acting antiviral (DAA) therapy was developed for people with chronic hepatitis C; these therapies therefore acted against one of the major risk factors for developing HCC (Bourliere 2015; Charlton 2015; Leroy 2016). DAA therapy allowed the achievement of sustained virologic response (SVR) in more than 70% of patients, compared to less than 40% with interferon therapy (Jakobsen 2017; Calvaruso 2018). However, a consensus exists that even after achieving SVR, people with chronic hepatitis C should be surveyed closely, especially those with advanced fibrosis and those who received a recent treatment for HCC in order to detect HCC at an early stage (Butt 2018).

Index test(s)

Abdominal US is a safe, inexpensive, non‐invasive, real‐time diagnostic technique. A transducer transforms electrical energy into sound waves (two megahertz (MHz) to eight MHz) and transmits them into the body. Simultaneously, the transducer detects the sound waves reflected by the underlying tissue. The intensity of these reflected (echo) waves is based on several properties of the tissue, such as density, depth, and properties of adjacent tissues. The echo waves are converted into electrical energy and displayed as a cross‐sectional tomography image.

According to the Liver Reporting and Data System (LI‐RADS) for detection of HCC, there are three US categories for diagnosing suspected liver lesions: US‐1 (negative), US‐2 (subthreshold), and US‐3 (positive). Since US is an operator‐dependent imaging modality and limitations due to patient characteristics can occur, an US visualisation score is added: A (no or minimal limitations); B (moderate limitations); and C (severe limitations). A negative observation is reported when no liver lesions have been detected or the detected lesions are definitely benign. Subthreshold lesions of less than 10 mm are noted only when no definitely benign features have been observed. A positive observation is reported when a lesion of more than 10 mm with no definitely benign features is observed, or a new venous thrombus has been detected (LI‐RADS 2018; Rodgers 2019).
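The category rules described above can be sketched as a small decision function. This is a simplified illustration, not an implementation of LI‐RADS: the function name and inputs are hypothetical, and because the text speaks of lesions of "less than 10 mm" and "more than 10 mm", assigning a lesion of exactly 10 mm to the positive category is an assumption made here.

```python
from typing import Optional


def us_category(
    largest_not_benign_lesion_mm: Optional[float],
    new_venous_thrombus: bool,
) -> str:
    """Return 'US-1', 'US-2' or 'US-3' following the rules described in the text.

    largest_not_benign_lesion_mm is the diameter of the largest lesion without
    definitely benign features, or None when no lesion was detected or all
    detected lesions are definitely benign.
    """
    if new_venous_thrombus:
        return "US-3"  # a new venous thrombus is a positive observation
    if largest_not_benign_lesion_mm is None:
        return "US-1"  # negative: no lesion, or only definitely benign lesions
    if largest_not_benign_lesion_mm >= 10.0:
        # Assumption: a lesion of exactly 10 mm is treated as positive.
        return "US-3"
    return "US-2"  # subthreshold lesion of less than 10 mm
```

The visualisation score (A, B, or C) is reported separately and is not modelled in this sketch.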

Alpha‐foetoprotein (AFP) is a glycoprotein of 591 amino acids and a carbohydrate moiety which is assessed in serum by enzyme immunoassays (Pucci 1991). In presence of HCC, high serum values of AFP are reported with variable accuracy (Colli 2006; Tateishi 2008; Singal 2009; Kansagara 2014; Singal 2014; Tzartzeva 2018).

Clinical pathway

For people with chronic liver disease, a surveillance programme is usually recommended. There are minimal variations among the surveillance programmes of the different scientific societies (Table 1).

Table 1. Guideline recommendations for surveillance for hepatocellular carcinoma

| Guideline | Indication for surveillance | Test | Interval |
| --- | --- | --- | --- |
| American Association for the Study of Liver Diseases (AASLD; (Heimbach 2018)) | Cirrhosis | Abdominal ultrasound alone or plus AFP | 6 months |
| European Association for the Study of the Liver with European Organization for Research and Treatment of Cancer (EASL‐EORTC; (EASL‐EORTC 2012; EASL 2018)) | Cirrhosis in Child Pugh stages A and B; cirrhosis in Child C stage awaiting liver transplantation; non‐cirrhotic hepatitis B virus (HBV) carriers with active hepatitis or family history of HCC; non‐cirrhotic chronic hepatitis C with advanced liver fibrosis stage 3 (F3) | Abdominal ultrasound | 6 months; 3 to 4 months in people with a nodule less than 1 cm or after resection or loco‐regional therapies |
| Asian Pacific Association for the Study of the Liver (APASL; (Omata 2017)) | Cirrhosis and chronic HBV infection at risk of HCC | Abdominal ultrasound with serum AFP | 6 months |

AFP: alpha‐foetoprotein; HCC: hepatocellular carcinoma

American Association for the Study of Liver Diseases (AASLD) guidelines

According to the AASLD guidelines, to increase overall survival, only adults with cirrhosis who are considered at risk of developing HCC need surveillance. It is suggested that surveillance be performed using abdominal US, with or without AFP, every six months. However, it is not possible to determine which type of surveillance test (ultrasound alone or ultrasound plus AFP) would lead to a greater improvement in survival. Surveillance is not suggested for those with Child‐Pugh class C cirrhosis, unless they are on the liver transplant waiting list, because of low anticipated survival (Heimbach 2018).

European Association for the Study of the Liver with European Organization for Research and Treatment of Cancer (EASL‐EORTC) guidelines

According to the EASL‐EORTC guidelines, people at risk of developing HCC for which surveillance should be performed include: people with Child‐Pugh stage A or stage B cirrhosis, people with Child‐Pugh stage C cirrhosis awaiting liver transplantation, non‐cirrhotic hepatitis B virus carriers with active hepatitis or family history of HCC, and people with chronic hepatitis C in the absence of cirrhosis but with advanced liver fibrosis stage 3 (F3). People on liver transplant waiting lists should be screened for HCC in order to detect and manage tumour progression. Surveillance should be performed using abdominal US every six months. A three‐ to four‐month interval is recommended in people where a nodule of less than 1 cm has been detected, and in the follow‐up strategy, after resection or loco‐regional therapies. Serum biomarkers such as AFP, AFP‐L3 (third electrophoretic form of lentil lectin‐reactive AFP), and des‐gamma‐carboxy prothrombin are suboptimal for routine clinical practice, and therefore, not recommended for screening (EASL‐EORTC 2012; EASL 2018).

Asian Pacific Association for the Study of the Liver (APASL) guidelines

According to the APASL guidelines, the following people are at risk of HCC development and therefore are eligible for HCC screening: those with cirrhosis, those who have chronic hepatitis B virus infection with cirrhosis, and those who have chronic hepatitis B virus infection in the absence of cirrhosis. The optimal surveillance strategy includes abdominal US with serum AFP measurement every six months. Measurement of AFP alone is not recommended for routine surveillance for HCC (Omata 2017).

Outside surveillance programmes

Ultrasound and AFP are usually performed in people with clinically suspected HCC, or liver cirrhosis, or both, or at the moment of decompensation of chronic liver disease, or all these factors together.

Prior test(s)

The diagnosis of liver cirrhosis is usually based on clinical judgement derived from history, laboratory testing, physical examination, imaging, liver stiffness measurement, liver histology, or a combination of these. Due to the accuracy of non‐invasive tests, liver histology is reserved to only a minority of patients with unclear diagnosis, and a non‐invasive diagnosis of advanced chronic liver disease is considered equivalent to a histological diagnosis of cirrhosis (de Franchis 2015). No test is recommended by the above guidelines, prior to a surveillance programme for HCC detection.

Role of index test(s)

Abdominal US and AFP (independently, or in combination, or in sequence) are used as triage tests to exclude the presence of focal liver lesions suspected of being HCC. Further alternative testing is required to confirm the diagnosis as well as staging.

Alternative test(s)

Contrast‐enhanced ultrasound (CEUS) is an advanced form of US examination in which images are acquired using intravenously injected microbubble contrast agent with optimised technology required for contrast visualisation. The CEUS exam consists of a 'bolus' administration of contrast media through a superficial peripheral vein. The sequence of blood entering the liver is first arterial (10 seconds to 40 seconds), then portal (40 seconds to 120 seconds after injection), and then late venous (more than 120 seconds). This vascular discrimination, similar to that obtained by contrast CT or MRI, allows for the collection of information regarding the circulatory system of a tumour (e.g. types of feeding vessels, tumour circulatory volume). Positivity criteria for HCC are based on arterial hyper enhancement and subsequent washout appearance. The advantages of US agent over CT and MRI agents include no adverse reactions, possible multiple injections of contrast in the same examination, safety, practicality, no risk of nephrotoxicity, and no ionising radiation (Chung 2015).

Contrast‐enhanced multiphasic multidetector CT and contrast‐enhanced MRI have been established as relevant non‐invasive modalities for detection and evaluation of liver lesions (Lee 2012; O'Neill 2015). The ability to detect HCC rests on characterising the enhancement patterns in arterial, portal venous and subsequent phases relative to the surrounding liver tissue. The differences in blood flow and extracellular volume between HCC and normal liver tissue lead to main radiological hallmarks such as non‐rim‐like arterial phase hyper enhancement and subsequent non‐peripheral washout with enhancing capsule in later phases (Hennedige 2012; Choi 2014; Shah 2014; LI‐RADS 2018). CT is a commonly used modality for diagnosing HCC due to its short acquisition time and high spatial resolution. However, MRI offers several beneficial features such as absence of X‐ray radiation and combination of various sequences (multiphasic T1‐ and T2‐weighted sequences, diffusion‐weighted imaging, and apparent diffusion coefficient) in combination with the use of extracellular or hepatocellular gadolinium‐based contrast agent, or both (Arif‐Tiwari 2014; Roberts 2018).

Apart from AFP, there are other potential serological tumour biomarkers for the detection of HCC. Des‐gamma‐carboxyprothrombin, also known as prothrombin induced by vitamin K absence‐II (PIVKA‐II), is an abnormal prothrombin protein that is increased in the serum of people with HCC. It is recognised as a specific marker for the detection and prognosis of HCC (Imamura 1999; Koike 2001), although contrary data exist on the benefit of using PIVKA‐II over AFP (Nakamura 2006; Li 2014). AFP‐L3 can differentiate an increase in AFP due to HCC from that in people with benign liver disease, and is a potential biomarker for early HCC detection (Kumada 2014). Glypican‐3 (GPC3) is considered to be a promising biomarker for early detection of HCC and a potential epitope for HCC‐targeted therapies (Zhou 2018). Other biomarkers include Golgi protein 73, osteopontin, circulating free DNA, and microRNAs. However, none of these have been introduced in daily practice (Omata 2017).

Rationale

Hepatocellular carcinoma is currently detected by liver ultrasound in people with chronic liver disease with normal or high AFP levels during surveillance programmes. Following ultrasound, the diagnosis is usually confirmed by high levels of AFP or by using contrast‐enhanced ultrasound (CEUS) (or both), CT, or MRI. The diagnosis in people who are not in a surveillance programme is usually obtained at decompensation of chronic liver disease (i.e. detection of oesophageal varices, gastrointestinal haemorrhage, or ascites), or during the diagnosis of previously unrecognised chronic liver disease. In such patients, liver ultrasound or AFP (or both) are also the first test(s) of choice and, if positive, further testing is required with CEUS, CT, or MRI.

There is no clear evidence on the benefit of surveillance programmes in terms of overall survival: the conflicting results could be a consequence of inaccurate detection, ineffective treatment, or both. Assessing the diagnostic accuracy of abdominal US and AFP serum concentration may clarify whether the absence of benefit in surveillance programmes might be related to under‐diagnosis. Furthermore, an assessment of the accuracy of these two tests for diagnosing HCC is needed for either ruling out, diagnosing, or supporting further testing in people with chronic liver disease who are not included in surveillance programmes.

People with previous diagnoses of, and who had previous treatments for, HCC make up a distinct group. The diagnostic accuracy for the recurrence of HCC after surgical or any other type of treatment is not the focus of this review.

This review represents the first part of a planned overall evaluation of diagnostic performances of the most commonly used modalities for diagnosing HCC in people with chronic liver disease. The present systematic review will assess the diagnostic accuracy of ultrasound and AFP serum concentration for the diagnosis of HCC. Another systematic review will focus on the diagnostic accuracy of CEUS in characterising suspected lesions as HCC as a second‐line diagnostic modality (Fraquelli 2019), and a third systematic review will focus on the assessment of CT as another second‐ or third‐line imaging modality (if CEUS was used as second‐line test) in assessing focal liver lesions detected on ultrasound (Nadarevic 2019). A review assessing the accuracy of MRI for diagnosing HCC is also in progress (Nadarevic 2020). We are planning to produce an overview of the systematic reviews that assess abdominal US and AFP, CEUS, CT, and MRI for the diagnosis of HCC.

Objectives

To assess the diagnostic accuracy of abdominal ultrasound (US) and alpha‐foetoprotein (AFP), alone or in combination, for the diagnosis of hepatocellular carcinoma (HCC) of any size and at any stage in adults with chronic liver disease, either in a surveillance programme or in a clinical setting.

Secondary objectives

  • To assess the diagnostic accuracy of abdominal US and AFP, alone or in combination, for the diagnosis of resectable HCC in people with chronic liver disease, either in a surveillance programme or in a clinical setting. The definition of resectable HCC is a neoplasm amenable to surgical radical resection according to the current guidelines (EASL‐EORTC 2012; Omata 2017; EASL 2018; Heimbach 2018), that is, a single lesion with a maximum diameter of less than 5 cm, or fewer than three lesions with a maximum diameter of 3 cm.

  • To compare the diagnostic accuracy of individual tests versus the combination of both tests.

  • To investigate the following predefined sources of heterogeneity:

    • study design (prospective compared to retrospective; case‐control studies compared to cross‐sectional cohort studies);

    • study date (studies published before the year 2000 compared to studies published after the year 2000, due to advancements in technology and changes in diagnostic criteria);

    • inclusion of participants without cirrhosis (studies including more than 10% participants without cirrhosis compared to studies including less than 10% participants without cirrhosis);

    • study location (population differences): studies conducted in North and South America compared to Europe compared to Asia and Africa;

    • prevalence of the target condition (studies with HCC prevalence more than 10% compared to studies with HCC prevalence less than 10%);

    • participant selection (participants recruited from planned surveillance programmes compared to clinical cohorts);

    • different HCC stage (studies with more than 20% of participants with resectable HCC compared to studies with less than 20% of participants with resectable HCC);

    • different reference standard (histology of the explanted liver compared to liver biopsy compared to another reference standard);

    • different liver cirrhosis aetiology: studies with more than 80% participants with viral (hepatitis C or hepatitis B) chronic liver disease compared to studies with less than 80% of participants with viral chronic hepatitis;

    • different severity of the underlying chronic liver disease: studies with more than 50% of participants with MELD (model for end‐stage liver disease) score less than 15 or with Child Pugh score A compared to studies with less than 50% of participants with MELD less than 15 or Child Pugh score A.

Methods

Criteria for considering studies for this review

Types of studies

We aimed to include studies, irrespective of publication status and language, that have evaluated the diagnostic accuracy of abdominal ultrasound (US) and alpha‐foetoprotein (AFP), independently or in combination, for the diagnosis of hepatocellular carcinoma (HCC) in people with chronic liver disease. These studies should have used one of the acceptable reference standards (see Reference standards below).

We considered for inclusion studies of cross‐sectional design including participants with clinical suspicion of HCC or cohort studies including high‐risk participants in a surveillance programme, as well as studies with a case‐control design that compared people with known HCC to a matched control (participants with chronic liver disease without evidence of HCC). We excluded studies that analysed data only per lesion, that is, those that considered the number of lesions rather than participants, unless participant data were made available by study authors.

Participants

Eligibility criteria

We included study participants aged 18 years and older, of any sex, who are diagnosed with a chronic liver disease, irrespective of the severity and duration of the disease. Study participants should have been treatment‐naive for HCC when enrolled in the respective study.

Exclusion criteria

We excluded studies which had included participants treated for HCC unless they represented less than 5% of all the included participants, or if data were presented in such a way as to allow this group of participants to be isolated from the remaining included participants.

Index tests

We included abdominal US alone, AFP alone, and a combination of abdominal US and AFP for the detection of HCC in adults with chronic liver disease. For AFP, different cut‐off values were used, ranging from 7 ng/mL to 400 ng/mL. For ultrasound (US), positive criteria include the minimum diameter of a detectable lesion and exclusion of benign criteria.

Target conditions

  • Hepatocellular carcinoma of any size and at any stage.

  • Resectable hepatocellular carcinoma (see Secondary objectives).

Reference standards

We accepted as a reference standard for the diagnosis of HCC one of the following.

  • The pathology of the explanted liver in case of transplantation.

  • The histology of resected focal liver lesion(s), or the histology of biopsied focal liver lesion(s) with a follow‐up period of at least six months to exclude the presence of focal lesions not detected by the index test and synchronous lesions from the parenchyma surrounding the resected or biopsied area.

  • Typical characteristics on cross‐sectional multiphasic contrast CT or MRI, with a follow‐up period of at least six months in order to allow the confirmation of an initial negative result on CT or on MRI.

We acknowledge that all these reference standards, even if commonly used in clinical practice, are not perfect. The pathology of the explanted liver is possible only in the case when all the included patients undergo liver transplantation; therefore, the setting does not correspond to the clinical question as only people with advanced and decompensated liver disease are candidates for orthotopic liver transplantation. In the case of histology of resected focal lesion, histology of biopsied liver lesions, CT or MRI examination, the negative result can be confirmed only with an adequate follow‐up period. This would introduce an unavoidable differential verification bias. In addition, CT and MRI cannot be considered completely accurate.

Search methods for identification of studies

Electronic searches

We searched the Cochrane Hepato‐Biliary Group (CHBG) Controlled Trials Register and the Cochrane Hepato‐Biliary Group Diagnostic‐Test‐Accuracy Studies Register (both maintained and searched internally by the CHBG Information Specialist via the Cochrane Register of Studies Web; June 2020), the Cochrane Library (2020, Issue 6), MEDLINE Ovid (1946 to June 2020), Embase Ovid (1974 to June 2020), LILACS (Bireme; 1982 to June 2020), Science Citation Index Expanded (Web of Science; 1900 to June 2020), and Conference Proceedings Citation Index – Science (Web of Science; 1990 to June 2020; (Royle 2003)). Appendix 1 gives the search strategies with the time spans of the searches.

We applied no language or document type restrictions.

Searching other resources

We attempted to identify additional references by manually searching articles retrieved from digital databases and relevant review articles. We sought information on unpublished studies by contacting experts in the field. In addition, we handsearched abstract books from meetings of the American Association for the Study of Liver Diseases (AASLD), the European Association for the Study of the Liver (EASL), and the Asian Pacific Association for the Study of the Liver (APASL), held over the past 10 years. We also searched for other kinds of grey literature in the System for Information on Grey Literature in Europe “OpenGrey” (www.opengrey.eu/).

Data collection and analysis

We followed available guidance as provided in the Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy (DTA Handbook 2013).

Selection of studies

Two review authors (AC and MF) independently scrutinised half of the titles and abstracts identified by electronic literature searching to identify potentially eligible studies, and two other review authors (TN and VG) independently scrutinised the other half. We recorded any citation, identified by one of the four review authors, as potentially eligible for full‐text review. Then, two review authors (AC and TN) independently reviewed publications for eligibility. To determine eligibility, we assessed each publication to determine whether participants met the inclusion criteria detailed above. We included abstracts only if they provided sufficient data for analysis. We resolved disagreements by consensus.

Data extraction and management

We developed a standardised data extraction form and piloted the form on nine of the included studies. Based on the pilot, we finalised the form.
Then, two review authors (AC and TN) completed the data extraction form for each included study. Each review author independently retrieved study data. In cases of disagreement, we reached consensus through discussion with a third review author (GC).

We retrieved the following data.

  • General information: title, journal, year, publication status, study design (prospective versus retrospective), and setting (surveillance programme or clinical cohort).

  • Sample size: number of participants meeting the criteria and total number of participants screened.

  • Baseline characteristics: baseline diagnosis, age, sex, and presence of cirrhosis and mean diameter of HCC.

  • Index tests with predefined positivity criteria and when appropriate all cut‐off values.

  • Target condition.

  • Order of tests.

  • Time between tests.

  • Reference standard tests.

  • Numbers of true‐positive, true‐negative, false‐positive, false‐negative, and uninterpretable index test results. We extracted these data for each presented cut‐off value, both for HCC of any size and stage and for resectable HCC.

We summarised the data from each study in 2 × 2 tables (true positive, false positive, false negative, true negative), according to the index tests considered, and we entered the data into Review Manager 5.4 software (Review Manager 2020).
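The accuracy measures derived from each 2 × 2 table can be sketched as follows. This is an illustration and not the Review Manager implementation: the function names are hypothetical, and the Wilson score interval is an assumed choice of method for the 95% confidence interval, which may differ from the method used by the software.

```python
from math import sqrt
from typing import Tuple


def sensitivity_specificity(tp: int, fp: int, fn: int, tn: int) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)


def wilson_interval(successes: int, total: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a proportion (assumed CI method, z = 1.96 for 95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denominator = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denominator
    half_width = z * sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denominator
    return centre - half_width, centre + half_width
```

With hypothetical counts of 80 true positives, 10 false positives, 20 false negatives, and 90 true negatives, `sensitivity_specificity(80, 10, 20, 90)` gives a sensitivity of 0.8 and a specificity of 0.9, and `wilson_interval(80, 100)` gives the interval around the sensitivity.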

Missing data

We contacted primary authors by email to request missing data: the number of AFP false‐positive results (Baig 2009; Chen 2010; Abdelgawad 2013; El‐Emshaty 2014; Dengler 2017), and the results of per‐patient analyses, as only per‐lesion results were reported in Lim 2006. We received no reply and sent a second email after two weeks. Again, no reply was received; therefore, we excluded the above‐mentioned studies.

Assessment of methodological quality

Two review authors (AC and TN) independently assessed the risk of bias of included studies and applicability of their results using QUADAS‐2 (revised tool for quality assessment of diagnostic accuracy studies; (Whiting 2011)). In cases of disagreement, we reached consensus through discussion. We addressed aspects of study quality involving the participant spectrum, index tests, target conditions, reference standards, and flow and timing. For studies that assessed ultrasound as the index test, the visualisation of the liver can often be suboptimal due to patient characteristics; therefore, lack of reporting or exclusion of uninterpretable results from analyses could overestimate the accuracy of ultrasound. We considered the study to be at high risk of bias if uninterpretable results were excluded from the analysis. We classified a study as being at high risk of bias if at least one of the domains of QUADAS‐2 was judged as being at high or unclear risk of bias (Appendix 2).
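The overall classification rule stated above can be sketched in a few lines. The function name and the dictionary layout are hypothetical; the four domain labels follow the QUADAS‐2 risk‐of‐bias domains, and the sketch covers only the overall rule, not the signalling questions behind each judgement.

```python
from typing import Dict

# The four QUADAS-2 risk-of-bias domains.
QUADAS2_DOMAINS = ("patient selection", "index test", "reference standard", "flow and timing")


def overall_risk_of_bias(judgements: Dict[str, str]) -> str:
    """Return 'high' if any domain is judged 'high' or 'unclear', otherwise 'low'."""
    missing = [domain for domain in QUADAS2_DOMAINS if domain not in judgements]
    if missing:
        raise ValueError(f"missing judgement for: {', '.join(missing)}")
    if any(judgements[domain] in ("high", "unclear") for domain in QUADAS2_DOMAINS):
        return "high"
    return "low"
```

Under this rule, a study with three domains at low risk and one at unclear risk is classified as being at high risk of bias overall.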

Statistical analysis and data synthesis

We provided a description of the included studies by calculating median values and interquartile ranges (IQR) across studies for some characteristics of our interest, defined at study level. In particular, we considered HCC mean diameter and the prevalence of participants with the following characteristics: HCC, Child‐Pugh class A, liver cirrhosis, viral aetiology of cirrhosis, and resectable HCC.
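A minimal sketch of this kind of study‐level summary is given below, using the Python standard library. The function name is hypothetical, the quartile convention (the "inclusive" method) is an assumption because the review does not state which one was used, and the values in the usage note are invented for illustration.

```python
from statistics import median, quantiles
from typing import Sequence, Tuple


def median_and_iqr(values: Sequence[float]) -> Tuple[float, float, float]:
    """Return (median, first quartile, third quartile) across studies."""
    # Assumption: quartiles computed with the 'inclusive' method.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return median(values), q1, q3
```

For hypothetical study‐level HCC prevalences of 10%, 20%, 30%, 40%, and 50%, `median_and_iqr([10, 20, 30, 40, 50])` returns a median of 30 with an IQR of 20 to 40.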

We carried out statistical analyses according to recommendations provided in the Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy (DTA Handbook 2013). We designed 2 × 2 tables (see Data extraction and management) for each primary study for the two index tests and for their combination. We planned the following strategy of analyses.

Alpha‐foetoprotein

Alpha‐foetoprotein (AFP) was considered positive when a value higher than a defined cut‐off (threshold) was noted (Colli 2006; Marrero 2009; Lok 2010). Firstly, we performed a graphical descriptive analysis of the included studies. We presented forest plots (sensitivity and specificity separately, with their 95% confidence intervals (CIs)), and we provided a graphical representation of the studies in the receiver operating characteristic (ROC) space (sensitivity against 1 ‐ specificity). Secondly, we performed a meta‐analysis. In the case that primary studies reported accuracy estimates of AFP using different cut‐off values, we used the hierarchical summary ROC model (HSROC) in order to pool data (sensitivities and specificities) and to estimate a summary ROC (SROC) curve (Rutter 2001). When considering studies with a common cut‐off value, we used the bivariate model, and we provided estimates of summary sensitivity and specificity. We used the pooled estimates obtained from the fitted models to calculate summary estimates of positive and negative likelihood ratios (LR+ and LR‐, respectively). For primary studies reporting accuracy results for more than one cut‐off value, we reported sensitivities and specificities for all cut‐off values, but we used a single cut‐off value for each study in HSROC or bivariate analysis. The most common cut‐off values were expected to be 10, 20, 200, or 400 nanograms per millilitre (ng/mL).
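The relationship between summary sensitivity and specificity and the likelihood ratios can be sketched as follows. The function name is hypothetical, and the sketch shows only the standard point‐estimate formulas, LR+ = sensitivity / (1 ‐ specificity) and LR‐ = (1 ‐ sensitivity) / specificity; it does not reproduce the model fitting or the confidence intervals.

```python
from typing import Tuple


def likelihood_ratios(sensitivity: float, specificity: float) -> Tuple[float, float]:
    """Return (LR+, LR-) from sensitivity and specificity given as proportions."""
    if not (0.0 <= sensitivity <= 1.0 and 0.0 <= specificity <= 1.0):
        raise ValueError("sensitivity and specificity must lie between 0 and 1")
    false_positive_rate = 1.0 - specificity
    # LR+ is unbounded when specificity is exactly 1.
    lr_positive = sensitivity / false_positive_rate if false_positive_rate > 0 else float("inf")
    # LR- is unbounded when specificity is exactly 0.
    lr_negative = (1.0 - sensitivity) / specificity if specificity > 0 else float("inf")
    return lr_positive, lr_negative
```

With a hypothetical sensitivity of 0.8 and specificity of 0.9, LR+ is about 8 and LR‐ is about 0.22.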

Abdominal ultrasound

Abdominal ultrasound (US) was considered positive when a lesion of more than 10 mm with no definitely benign features was observed, or a new venous thrombus was detected according to defined criteria (LI‐RADS 2018). Subthreshold lesions of less than 10 mm were noted only when no definitely benign features were observed (LI‐RADS 2018). Firstly, we performed a graphical descriptive analysis of the included studies. We presented forest plots (sensitivity and specificity separately, with their 95% CIs), and we provided a graphical representation of studies in the receiver operating characteristic (ROC) space (sensitivity against 1 ‐ specificity). Secondly, we performed a meta‐analysis using the bivariate model, and we provided estimates of summary sensitivity and specificity. We used the pooled estimates obtained from the fitted models to calculate summary estimates of positive and negative likelihood ratios (LR+ and LR‐, respectively).

Uninterpretable index test results

In case of uninterpretable index test results (especially relevant for US), we performed a further analysis according to the intention‐to‐diagnose (ITD) principle (Schuetz 2012). We classified participants with uninterpretable results as false‐positive if they had a negative reference standard result, or as false‐negative if they had a positive reference standard result.
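
A minimal sketch of this reclassification is shown below. The function name, argument names, and counts are hypothetical and serve only to illustrate the ITD rule described above.

```python
def apply_itd(tp, fp, fn, tn, unint_ref_pos, unint_ref_neg):
    """Fold uninterpretable index test results into the 2x2 table.

    unint_ref_pos: uninterpretable results with a positive reference standard,
                   counted as false negatives.
    unint_ref_neg: uninterpretable results with a negative reference standard,
                   counted as false positives.
    """
    return tp, fp + unint_ref_neg, fn + unint_ref_pos, tn


# Invented counts for one hypothetical study
tp, fp, fn, tn = apply_itd(tp=45, fp=10, fn=15, tn=120,
                           unint_ref_pos=5, unint_ref_neg=8)
sens_itd = tp / (tp + fn)  # 45 / 65
spec_itd = tn / (tn + fp)  # 120 / 138
print(tp, fp, fn, tn, round(sens_itd, 3), round(spec_itd, 3))
```

Under this rule both sensitivity and specificity can only decrease or stay the same, because every uninterpretable result is counted against the test.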

Combination of abdominal ultrasound and alpha‐foetoprotein

The index test obtained by the combination of US and AFP tests was considered positive when at least one of the two tests was positive. Firstly, we performed a graphical descriptive analysis of the included studies. We presented forest plots (sensitivity and specificity separately, with their 95% CIs), and we provided a graphical representation of studies in the receiver operating characteristic (ROC) space (sensitivity against 1 ‐ specificity). Secondly, we performed a meta‐analysis. When primary studies reported accuracy estimates of the combination of tests using different cut‐off values for AFP, we used the hierarchical summary ROC model (HSROC) to pool data (sensitivities and specificities) and to estimate a summary ROC (SROC) curve (Rutter 2001). When considering studies with a common cut‐off value, we used the bivariate model and provided estimates of summary sensitivity and specificity. We used the pooled estimates obtained from the fitted models to calculate summary estimates of positive and negative likelihood ratios (LR+ and LR‐, respectively). For primary studies reporting accuracy results for more than one cut‐off value, we reported sensitivities and specificities for all cut‐off values, but we used a single cut‐off value for each study in HSROC or bivariate analysis.
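
The positivity rule for the combined test can be sketched as a simple logical OR. The function name, the default cut‐off, and the participant values below are hypothetical and chosen only for illustration.

```python
def combined_positive(us_positive, afp_ng_ml, afp_cutoff=20.0):
    """Combined index test: positive if US is positive OR AFP exceeds the cut-off."""
    return us_positive or afp_ng_ml > afp_cutoff


# Invented participants: (US result, AFP in ng/mL)
participants = [(True, 5.0), (False, 35.0), (False, 8.0), (True, 410.0)]
results = [combined_positive(us, afp) for us, afp in participants]
print(results)
```

Because a positive result on either test suffices, the combination can be expected to gain sensitivity at the cost of specificity relative to each single test.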

Comparisons

The combination of the two tests, US and AFP, was considered positive when at least one of the two tests was positive. We made pair‐wise comparisons between individual tests, and between individual tests and the index test obtained by the combination of the two tests when both tests were used, by adding a covariate for the index test to the bivariate model. We assessed the significance of differences in test accuracy by using the log‐likelihood ratio test for comparison of models with and without the index test covariate term. We included separate variance terms for sensitivity and specificity in the bivariate model for the two tests in comparison. We performed both indirect and direct comparisons when sufficient data were available. We calculated relative sensitivity (i.e. ratio between the sensitivities of the two index tests) and relative specificity (i.e. ratio between the two specificities).
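
The relative measures defined above reduce to two ratios. The sketch below uses a hypothetical function name and invented summary estimates; it does not reproduce the model‐based estimation or the confidence intervals of these ratios.

```python
def relative_accuracy(sens_a, spec_a, sens_b, spec_b):
    """Relative sensitivity and specificity of test A versus test B."""
    return sens_a / sens_b, spec_a / spec_b


# Invented summary estimates for two hypothetical index tests
rel_sens, rel_spec = relative_accuracy(sens_a=0.90, spec_a=0.80,
                                       sens_b=0.60, spec_b=0.90)
print(round(rel_sens, 2), round(rel_spec, 2))
```

A relative sensitivity above 1 indicates that test A detects proportionally more cases than test B; a relative specificity below 1 indicates that it also produces more false positives.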

We considered two‐sided P values of less than 0.05 to be statistically significant. We performed all statistical analyses using SAS statistical software, release 9.4 (SAS Institute Inc., Cary, NC, USA) and the macro METADAS (DTA Handbook 2013).

Investigations of heterogeneity

We investigated the effects of the following predefined sources of heterogeneity.

  • Study design (case‐control compared to cross‐sectional studies, prospective compared to retrospective).

  • Study date (studies before compared to after the year 2000 due to advancements in technology and change in diagnostic criteria).

  • Inclusion of participants without cirrhosis (studies including more than 10% participants without cirrhosis compared to studies including less than 10% participants without cirrhosis).

  • Study location (population differences): studies conducted in the USA compared to Europe compared to Asia and Africa.

  • Prevalence of the target condition (studies with HCC prevalence of more than 10% compared to studies with HCC prevalence of less than 10%).

  • Participant selection (participants recruited from planned surveillance programs compared to clinical cohorts).

  • Different HCC stage (studies with more than 20% of participants with resectable HCC compared to studies with less than 20% of participants with resectable HCC).

  • Different reference standard (histology of the explanted liver compared to liver biopsy compared to another reference standard).

  • Different liver cirrhosis aetiology: studies with more than 80% participants with viral (hepatitis C or hepatitis B) chronic liver disease compared to studies with less than 80% of participants with viral chronic hepatitis.

  • Different severity of the underlying chronic liver disease: studies with more than 50% of participants with MELD (model for end‐stage liver disease) score less than 15 or with Child Pugh score A compared to studies with less than 50% of participants with MELD less than 15 or Child Pugh score A.

We estimated the above effects by adding covariates to the bivariate models. We assessed the statistical significance of the covariate effect by using the log‐likelihood ratio test for comparison of models with and without the covariate term.
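
The likelihood ratio test can be sketched as follows. This sketch assumes that the covariate adds two parameters to the model (one for sensitivity and one for specificity), giving 2 degrees of freedom; the text does not state the degrees of freedom used. The function name and the log‐likelihood values are hypothetical.

```python
import math


def lr_test_p_value_df2(loglik_without, loglik_with):
    """Likelihood ratio test for nested models differing by 2 parameters.

    For a chi-square distribution with 2 degrees of freedom, the upper tail
    probability has the closed form exp(-x / 2).
    """
    statistic = 2.0 * (loglik_with - loglik_without)
    if statistic < 0:
        raise ValueError("the larger model cannot have a lower log-likelihood")
    return statistic, math.exp(-statistic / 2.0)


# Invented log-likelihoods of models without and with the covariate
stat, p = lr_test_p_value_df2(loglik_without=-512.4, loglik_with=-507.1)
print(round(stat, 2), round(p, 4))
```
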

Sensitivity analyses

We assessed the effects of risk of bias of the included studies on diagnostic accuracy by performing a sensitivity analysis in which we excluded studies classified as having high or unclear risk of bias in at least one of the domains of QUADAS‐2 (Appendix 2). In addition, we defined the following signalling questions as most relevant, and planned to conduct sensitivity analyses in which we excluded studies with answers of 'no' or 'unclear'.

  • “Was a case‐control design avoided?” (i.e. was the study design clearly cross‐sectional including a series of participants at risk of, or with a clinical suspicion of, HCC?)

  • For studies using AFP as index test: “if a threshold was used, was it pre‐specified?”; or for ultrasound as index test: “were the positivity criteria defined?”.

  • “Were all participants included in the analysis and analysed according to ITD principle (non‐evaluable results considered as false)?”

We did not perform the planned analysis excluding studies using AFP without a pre‐specified threshold as we chose to analyse the results of studies using the two most common cut‐off values of 20 ng/mL and 200 ng/mL. We did not perform the planned analysis excluding studies not reporting results obtained with ITD principle for uninterpretable results due to lack of data because only two studies reported the number of uninterpretable results.

We also conducted, as planned, a sensitivity analysis in which studies published only in abstract or letter form were excluded.

Assessment of reporting bias

In order to reduce reporting bias, we did not plan to use a filter search strategy nor to implement any language or sample limitations. We did not plan to test for publication bias due to the lack of validated methods for diagnostic test accuracy reviews.

'Summary of findings' table

We prepared 'Summary of findings' tables to present the main results and key information regarding the certainty of evidence. We assessed the certainty of evidence as recommended using the GRADE approach (Schünemann 2008; Balshem 2011; Schünemann 2016; GRADEpro GDT). We rated the certainty of evidence as either high (when not downgraded), moderate (when downgraded by one level), low (when downgraded by two levels), or very low (when downgraded by more than two levels) based on five domains: risk of bias, indirectness, inconsistency, imprecision, and publication bias. For each outcome, the certainty of evidence started as high when there were high‐quality observational studies (cross‐sectional or cohort studies) that enrolled participants with diagnostic uncertainty. If we found a reason for downgrading, we used our judgement to classify the reason as either serious (downgraded by one level) or very serious (downgraded by two levels) (Schünemann 2020a; Schünemann 2020b).

Five authors (AC, TN, MF, VG, and GC) discussed judgments and applied GRADE in the following way.
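
The downgrading rule described above can be sketched as a small lookup. The names and the example judgements are hypothetical; the actual ratings rested on the authors' judgement rather than on an automated rule.

```python
LEVELS = ["high", "moderate", "low", "very low"]
DOWNGRADE = {"none": 0, "serious": 1, "very serious": 2}


def grade_certainty(judgements):
    """Map per-domain judgements to an overall certainty of evidence.

    Certainty starts as high and drops one level per downgrade;
    more than two levels of downgrading gives 'very low'.
    """
    total = sum(DOWNGRADE[j] for j in judgements.values())
    return LEVELS[min(total, len(LEVELS) - 1)]


# Invented judgements for one hypothetical outcome
example = grade_certainty({
    "risk of bias": "very serious",
    "indirectness": "serious",
    "inconsistency": "none",
    "imprecision": "none",
    "publication bias": "none",
})
print(example)
```
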

  • Risk of bias: we used QUADAS‐2 to assess risk of bias

  • Indirectness: we assessed indirectness in relation to the population (including disease spectrum), setting, interventions, and outcomes (accuracy measures). We also used prevalence as a guide to whether there was indirectness in the population.

  • Inconsistency: we carried out prespecified analyses to investigate potential sources of heterogeneity and downgraded when we could not explain inconsistency in the accuracy estimates

  • Imprecision: we looked at the confidence intervals of sensitivity and specificity estimates and at the unexplained heterogeneity of the results

  • Publication bias: we did not evaluate publication bias due to the lack of validated methods for diagnostic test accuracy reviews

Results

Results of the search

We ran the search on 5 June 2020. We identified 45,837 records by searching the Cochrane Hepato‐Biliary Group Controlled Trials Register (n = 31), the Cochrane Hepato‐Biliary Group Diagnostic Test Accuracy Register (n = 3), the Cochrane Library (n = 958), MEDLINE Ovid (n = 12,856), Embase Ovid (n = 22,264), LILACS (n = 351), and Science Citation Index Expanded and Conference Proceedings Citation Index – Science (both Web of Science) (n = 9374). We retrieved seven additional records through handsearching. After exclusion of 11,347 duplicates, 34,497 records remained for possible eligibility. After reading the title and the abstract of these records, we excluded 33,932 of them, as they did not meet the inclusion criteria. We retrieved full texts of the remaining 565 records, and after reading the full texts, we excluded 219 studies for various reasons (Figure 1; Characteristics of excluded studies). In particular, we excluded 109 studies not reporting data or reporting only incomplete data on the accuracy of the index tests, 54 studies comparing participants with hepatocellular carcinoma (HCC) with healthy participants or including healthy participants in the control arm, and not reporting the results of the comparison of participants with HCC and participants with chronic liver disease, 31 reporting no original data on the index tests, 10 studies including participants with treated HCC and suspected recurrences, seven studies reporting only per lesion analyses, seven studies not conducted in people with chronic liver disease, and one study (Heyward 1985) reporting preliminary data fully reported in an included study (McMahon 2000). Fourteen full‐text articles were translated from non‐English languages, but then excluded (Del Vecchio‐Blanco 1977; Aburano 1979; Mebazaa 1985; Salmi 1988; Luning 1991; Sakai 1991; Biwole Sida 1992; Bago 1993; Carriere 1993; Ding 1995; Beaugrand 2000; Baumgarten 2001; Ben Hassine 2007; Gao 2012).
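
The record counts in this paragraph can be checked with a few lines of arithmetic. The variable names are hypothetical; all numbers are taken from the text above.

```python
# Records retrieved per source, as reported in the text
database_hits = {
    "CHBG Controlled Trials Register": 31,
    "CHBG Diagnostic Test Accuracy Register": 3,
    "Cochrane Library": 958,
    "MEDLINE Ovid": 12856,
    "Embase Ovid": 22264,
    "LILACS": 351,
    "Web of Science (SCI Expanded and CPCI-S)": 9374,
}
identified = sum(database_hits.values())
handsearched = 7
duplicates = 11347
screened = identified + handsearched - duplicates

excluded_on_title_abstract = 33932
full_texts = screened - excluded_on_title_abstract

# Reasons for exclusion after full-text reading, in the order given in the text
exclusion_reasons = [109, 54, 31, 10, 7, 7, 1]
excluded_full_text = sum(exclusion_reasons)
remaining_records = full_texts - excluded_full_text

print(identified, screened, full_texts, excluded_full_text, remaining_records)
```
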


Study flow diagram 
Date of search: 5 June 2020

Finally, we included in our review 346 records reporting data on 373 studies (Figure 1), including as a whole 168,816 participants, with a percentage of males ranging from 40% to 100% and age ranging from 14 to 97 years. Thirteen papers reported multiple studies in different populations that we quoted and analysed separately as 22 studies (Wang 2013a; Wang 2013b; Wang 2014a; Wang 2014b; Wong 2014a; Wong 2014b; da Costa 2015a; da Costa 2015b; da Costa 2015c; da Costa 2015d; Li 2016b; Li 2016c; Tayob 2016a; Tayob 2016b; Wang 2016a; Wang 2016b; Wang 2016c; Wang 2016d; Wang 2016e; Luo 2018a; Luo 2018b; Luo 2018c). We translated six studies from non‐English languages in order to include them in this review (Mauduit Astolfi 1987; Buffet 1988; Garretti 1988; Lee 2004; Kim 2006c; Kim 2006b). Concerning the direction of data collection, 77% (288/373) of the studies were retrospective.

We included 326 studies that assessed alpha‐foetoprotein (AFP) as the index test in 144,570 participants; 39 studies that assessed abdominal ultrasound (US) in 18,792 participants; eight studies that assessed both AFP and abdominal US as the index tests in 5454 participants. The studies were conducted since 1971 for AFP, 1983 for abdominal US, and 1988 for the combination of AFP and US.

We reported in the Characteristics of included studies tables the main characteristics of the 373 studies. Investigators reported 19 studies only in abstract form, of which 17 with AFP as the index test (Song 2011; Cheng 2012; Kim 2012; Chan 2013; Unic 2013; Min 2014; Raff 2014; Khairy 2015; El‐Serag 2017; Omar 2017; Park 2017b; Tsai 2017; Zheng 2017; Aboelfotoh 2018; Iyer 2018; Loglio 2018; Talkahn 2018), one with abdominal US as index test (Raff 2014), and one with both AFP and US as index tests (Raff 2014).

Of the 373 included studies, 190 were conducted in Asia, 66 in Europe, 57 in Africa, 55 in North and South America, and six were collaborative studies in two or three continents. Seventy‐seven studies were conducted in the context of a surveillance program, and 297 studies in participants with clinical suspicion of having an HCC. Two hundred and eighty‐eight studies were conducted retrospectively and 86 prospectively. Three hundred and eight studies used a mix of radiological imaging with or without histology as reference standard, 49 used only histology, and 17 used pathology of the explanted liver.

Methodological quality of included studies

We have reported in detail results of the quality assessment of included studies in the Characteristics of included studies tables, and we have summarised this information in Figure 2 and Appendix 3.


Risk of bias and applicability concerns graph: review authors' judgements about each domain presented as percentages across included studies

Patient selection domain

Two hundred fifty‐nine studies had a case‐control design, 108 a cross‐sectional design, and six a nested case‐control design (Wong 2009; Lok 2010; Wang 2016d; Yu 2016; Choi 2019; Tayob 2019).

Alpha‐foetoprotein (AFP)

Risk of bias: we judged 291 of 326 studies assessing the accuracy of AFP, with any cut‐off, to be at high risk of bias. The most common reason was the case‐control design (256 studies). Among the 70 cross‐sectional studies, 40 were judged to be at high risk of bias for inappropriate exclusion or for non‐consecutive enrolment of participants. Seventeen studies were at low risk of bias in this domain (Arrigoni 1988; Cottone 1988; Sherman 1995; Chalasani 1999; Gambarin‐Gelwan 2000; Ishii 2000; Tong 2001; Matievskaya 2003; Lee 2004; Sterling 2009; Song 2011; Singal 2012; Sterling 2012; Tayob 2016a; Tayob 2016b; Wang 2016b; Choi 2019). Among the 147 studies using 20 ng/mL as a cut‐off value, we judged 129 studies to be at high risk of bias and 12 at unclear risk of bias; among the 56 studies using AFP with a cut‐off value of 200 ng/mL, we judged 48 studies to be at high risk of bias.

Applicability: we judged 273 studies to be at high concern because study participants were highly selected on the basis of aetiology or severity of the liver disease and HCC characteristics. Among the 147 studies using 20 ng/mL as a cut‐off value, we judged 115 studies to be at high concern; among the 56 studies using AFP with a cut‐off value of 200 ng/mL, we judged 47 studies to be at high concern.

Abdominal ultrasound (US)

Risk of bias: 21 of the 39 studies assessing the accuracy of abdominal US were judged to be at high risk of bias: three studies were case‐control studies (Powell‐Jackson 1987; Jalli 2015; Yang 2019), and the remaining 18 were cross‐sectional studies. The risk of bias was judged as high because of inappropriate exclusion or for non‐consecutive enrolment of participants. Two studies were judged to be at unclear risk of bias for the latter domain as they did not report any exclusion criteria (Pateron 1994; Atiq 2017).

Applicability: we judged 22 studies to be at high concern as participants were highly selected on the basis of aetiology or severity of the liver disease and HCC characteristics.

Combination of AFP and abdominal US

Risk of bias: of the eight studies assessing the accuracy of the combination of AFP and abdominal US, three studies were judged at high risk of bias for inappropriate exclusion or for non‐consecutive enrolment of participants (Buffet 1988; Chang 2015; Ungtrakul 2016). Chang 2015 and Ungtrakul 2016 used AFP with a cut‐off of 20 ng/mL. All the eight studies were cross‐sectional.

Applicability: we judged two studies to be at high concern, both of which used an AFP cut‐off value of 20 ng/mL, as only participants with severe liver disease on the waiting list for orthotopic liver transplantation were included (Ungtrakul 2016; Gambarin‐Gelwan 2000).

Index tests domain

Alpha‐foetoprotein (AFP)

Risk of bias: we judged a total of 196 studies to be at high risk of bias. In 128 studies, no pre‐definition of a cut‐off value was reported. In 122 studies, the result of AFP measurement was interpreted knowing the result of the reference standard, and in 47 studies, it was unclear. Among the 147 studies using 20 ng/mL as a cut‐off value, we judged 73 studies to be at high risk of bias; among the 56 studies using AFP with a cut‐off value of 200 ng/mL, we judged 54 studies to be at high risk of bias.

Applicability: we judged 10 studies to be at high concern due to variations in test technology, execution or interpretation (Alpert 1971; Giannelli 2005; Tan 2014; Wang 2014b; Wang 2016b; Wang 2016c; Wang 2016d; Wang 2016e; Wang 2019a; Sun 2020). All the studies using AFP with a cut‐off value of 20 ng/mL or 200 ng/mL were at low concern.

Abdominal ultrasound (US)

Risk of bias: we judged 16 studies to be at high risk of bias as no definition of positivity criteria was reported (Okazaki 1984; Tanaka 1986; Cottone 1988; Garretti 1988; Tremolada 1989; Saada 1997; Yu 2011; Raff 2014; Chang 2015; Jalli 2015; Pinero 2015; Atiq 2017; Choi 2019; Kim 2019b; Kudo 2019; Yang 2019).

Applicability: we judged all the 39 studies to be at low concern.

Combination of AFP and abdominal US

Risk of bias: we judged three studies, two with a cut‐off value of 20 ng/mL (Tremolada 1989; Kim 2019b), and one with a cut‐off value of 5 ng/mL (Choi 2019) to be at high risk of bias as no definition of US positivity criteria was reported.

Applicability: we judged all eight studies to be at low concern.

Reference standard domain

Alpha‐foetoprotein (AFP)

Risk of bias: we judged 174 studies to be at high risk of bias. In 105 studies with a case‐control design, the reference standard was not adequate to exclude the presence of HCC, and in 24 studies, authors reported only how they assessed the presence of a chronic liver disease without any information concerning the target disease. In 100 studies, the reference standard was interpreted knowing the results of the index test, and in 43 studies we judged the available information to be insufficient.

Applicability: we judged 55 studies to be at high concern as pathological examination of explanted liver, or of surgical specimen, or necroscopy, or technologies that were no longer in use, were required to confirm the presence of HCC.

Abdominal ultrasound (US)

Risk of bias: we judged 23 studies to be at high risk of bias. In 20 studies, the reference standard was interpreted knowing the results of the index test, and in 11 studies the reference standard was judged to be inadequate to exclude the presence of HCC.

Applicability: we judged 13 studies to be at high concern as pathological examination of explanted liver, or of surgical specimen, or necroscopy, or technologies no longer in use, were required to confirm the presence of HCC.

Combination of AFP and abdominal US

Risk of bias: we judged five studies to be at high risk of bias, four using AFP with a cut‐off of 20 ng/mL (Tremolada 1989; Singal 2012; Ungtrakul 2016; Kim 2019b) and one with a cut‐off of 250 ng/mL (Buffet 1988). In these studies, the reference standard was interpreted knowing the results of the index test and was judged inadequate to exclude the presence of HCC.

Applicability: we judged two studies to be at high concern as the reference standard was the pathological examination of explanted liver (Gambarin‐Gelwan 2000) or histology and arteriography (Buffet 1988). Of these two studies, Gambarin‐Gelwan 2000 used AFP with a cut‐off value of 20 ng/mL.

Flow and timing domain

Alpha‐foetoprotein (AFP)

Risk of bias: we judged 263 studies to be at high risk of bias. In 259 studies, participants did not receive the same reference standard. In six studies, the time interval between the index test and the reference standard was judged to be too long, whereas in another 305 studies, this information was not reported.

Abdominal ultrasound (US)

Risk of bias: we judged 27 studies to be at high risk of bias. In 22 studies, participants did not receive the same reference standard. In six studies, the time interval between the index test and the reference standard was judged to be too long, whereas in another 25 studies, this information was not reported. Two studies reported the proportion of uninterpretable results (Atiq 2017, 56/523 and Maringhini 1988, 28/363), allowing an analysis according to the intention‐to‐diagnose principle, and another study included uninterpretable results in the analyses (Chang 2015).

Combination of AFP and abdominal US

Risk of bias: we judged six studies to be at high risk of bias (Buffet 1988; Tremolada 1989; Singal 2012; Chang 2015; Ungtrakul 2016; Kim 2019b). In five studies, participants did not receive the same reference standard, and in five studies, there was no information on the time interval between the index test and the reference standard. We judged one study to be at unclear risk of bias (Gambarin‐Gelwan 2000), and one study to be at low risk of bias (Choi 2019). Of the six studies using AFP with a cut‐off of 20 ng/mL, five were at high risk of bias and one at unclear risk of bias.

Overall assessment

As shown in Figure 2, we judged 304 studies at high risk of bias and 13 studies at unclear risk for the patient selection domain. For the index test domain, 196 studies with AFP were judged at high risk of bias and 23 at unclear risk; 16 studies with US were judged at high risk, and three studies with combination of AFP and US were judged at high risk. For the reference standard domain, 184 studies were judged at high risk of bias and 47 at unclear risk. For the flow and timing domain, 276 studies were judged at high risk of bias and 53 at unclear risk. We classified a study as having a high risk of bias if at least one of the domains of QUADAS‐2 was judged as being at high or unclear risk of bias (Methods). We judged only one study to be at low risk of bias (Bennett 2002): this study was retrospectively conducted in a series of consecutive participants who underwent liver transplantation. The index test was abdominal US performed according to predefined positivity criteria and performed less than 90 days earlier, and the reference standard was the pathological examination of the explanted liver.
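
The study‐level classification rule stated above can be sketched as follows. The function name and the example judgements are hypothetical; the four domain names are those of QUADAS‐2.

```python
def overall_risk_of_bias(domains):
    """Overall rating: 'low' only if every QUADAS-2 domain is at low risk.

    A single domain rated 'high' or 'unclear' makes the study 'high' overall.
    """
    return "low" if all(j == "low" for j in domains.values()) else "high"


# Invented domain judgements for two hypothetical studies
study_a = overall_risk_of_bias({
    "patient selection": "low",
    "index test": "low",
    "reference standard": "low",
    "flow and timing": "low",
})
study_b = overall_risk_of_bias({
    "patient selection": "low",
    "index test": "unclear",
    "reference standard": "low",
    "flow and timing": "low",
})
print(study_a, study_b)
```

This rule is strict by design: one unclear domain is enough to exclude a study from the low risk of bias group, which helps explain why only one study met it.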

Concerning applicability, we judged 289 studies to be at high concern for the patient selection domain; for the index test domain, we judged 10 studies using AFP to be at high concern, and none using US or the combination of AFP and US; for the reference standard domain, we judged 60 studies to be at high concern and 10 at unclear concern.

Findings

Alpha‐foetoprotein (AFP)

Description of the included studies

Three hundred and twenty‐six studies with 144,570 participants provided data assessing serum alpha‐foetoprotein (AFP) measurement for the diagnosis of HCC. The median prevalence of the target disease was 50% (interquartile range (IQR) 33% to 59%). When considering only the 70 cross‐sectional studies, the median prevalence was 16% (IQR 9% to 33%). The cut‐off values ranged from 5 ng/mL to 1000 ng/mL. The median prevalence of cirrhosis was 100% (IQR 73% to 100%). The median of the proportion of participants in Child‐Pugh class A was 61% (IQR 38% to 82%) while the median proportion of participants with viral aetiology was 100% (IQR 76% to 100%). The median proportion of resectable HCC was 57% (IQR 34% to 91%) and the median of the mean HCC diameter across studies was 29.5 mm (IQR 20.5 mm to 46 mm). The studies were conducted from 1971 to 2020. Considering study location, 174 studies were conducted in Asia, 57 in Africa, 52 in Europe, 39 in North and South America, and four in more than one continent. Fifty studies were conducted in the context of a surveillance programme for HCC and 276 in a clinical setting.

Pooled results

Appendix 4 shows a forest plot of sensitivity and specificity with their 95% confidence intervals (CIs), and Figure 3 shows a graphical representation of studies in the receiver operating characteristic (ROC) space (sensitivity against 1 ‐ specificity). We performed a meta‐analysis using the hierarchical summary ROC model (HSROC) as the primary studies reported accuracy estimates of AFP using different cut‐off values (Figure 3).


Summary receiver operating characteristic (ROC) comparing in 326 studies alpha‐foetoprotein serum measurement with any cut‐off value and different reference standards. Reference standards were: the pathology of the explanted liver in case of transplantation; the histology of resected focal liver lesions, or the histology of biopsied focal liver lesion(s) with a follow‐up period of at least six months, typical characteristics on cross‐sectional multiphasic contrast CT or MRI, with a follow‐up period of at least six months.

We then carried out two meta‐analyses that included only studies that reported a cut‐off value of 20 ng/mL or 200 ng/mL (the most used values).

AFP cut‐off value around 20 ng/mL

Description of the included studies

One hundred and forty‐seven studies with 52,144 participants provided data using a cut‐off value of around 20 ng/mL (from 19 to 21 ng/mL). Five studies were published only in abstract form; 111 were case‐control studies. The median prevalence of HCC across studies was 50% (IQR 33% to 63%). When considering only the 32 cross‐sectional studies, the median prevalence was 11% (IQR 7% to 20%). The median proportion of participants with liver cirrhosis was 100% (data reported by 96 studies, IQR 75% to 100%), and the median proportion of participants in Child‐Pugh class A was 67% (51 studies, IQR 43% to 82%). The median proportion of participants with viral aetiology of cirrhosis was 97% (119 studies, IQR 78% to 100%) and the median of the mean HCC diameter across studies was 27 mm (20 studies, IQR 22.5 mm to 46.5 mm). Finally, the median proportion of participants with resectable HCC was 59% (29 studies, IQR 42% to 87%). The studies were conducted from 1982 to 2020. Considering study location, 98 were conducted in Asia, 22 in Europe, seven in Africa, 19 in North and South America, and one in three continents. Thirty studies were conducted in the context of a surveillance programme for HCC and 117 in a clinical setting. The sensitivity varied from 25% to 90% (IQR from 53% to 67%) and the specificity from 35% to 100% (IQR from 76% to 90%; Figure 4).


Forest plots of sensitivity and specificity of alpha‐foetoprotein with a cut‐off value around 20 ng/mL against different reference standards in 147 studies ordered by study design, setting and increasing HCC prevalence. Reference standards were: the pathology of the explanted liver in case of transplantation, the histology of resected focal liver lesions, or the histology of biopsied focal liver lesions with a follow‐up period of at least six months, typical characteristics on cross‐sectional multiphasic contrast CT or MRI, with a follow‐up period of at least six months.

TP = true positive; FP = false positive; FN = false negative; TN = true negative. Values between brackets are the 95% confidence intervals (CIs) of sensitivity and specificity. The figure shows the estimated sensitivity and specificity of the study (blue square) and its 95% CI (black horizontal line).

Pooled results

By using the bivariate model, we obtained the following pooled estimates: sensitivity 60% (95% CI 58% to 62%), specificity 84% (95% CI 82% to 86%), LR+ 3.84 (95% CI 3.39 to 4.33), LR‐ 0.48 (95% CI 0.45 to 0.50; Figure 5).
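
The relationship between the pooled estimates and the likelihood ratios can be sketched as follows. The function name is hypothetical. The inputs are the rounded pooled values from the text, so the result for LR+ (about 3.75) differs slightly from the reported 3.84, presumably because the review derived it from the unrounded model estimates.

```python
def likelihood_ratios(sensitivity, specificity):
    """Positive and negative likelihood ratios from sensitivity and specificity."""
    lr_pos = sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity
    return lr_pos, lr_neg


# Rounded pooled estimates for AFP with a cut-off around 20 ng/mL
lr_pos, lr_neg = likelihood_ratios(sensitivity=0.60, specificity=0.84)
print(round(lr_pos, 2), round(lr_neg, 2))
```
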


Summary receiver operating characteristic (ROC) comparing alpha‐foetoprotein with a cut‐off value around 20 ng/mL (black circles) and alpha‐foetoprotein with a cut‐off value around 200 ng/mL (red diamonds) against the same reference standards.
Reference standards were: the pathology of the explanted liver in case of transplantation; the histology of resected focal liver lesion(s), or the histology of biopsied focal liver lesions with a follow‐up period of at least six months, typical characteristics on cross‐sectional multiphasic contrast CT or MRI, with a follow‐up period of at least six months.

The solid circles represent the summary estimates of sensitivity and specificity for AFP cut‐off around 20 ng/mL (black circle) and AFP cut‐off around 200 ng/mL (red circle).

The dotted lines represent the 95% confidence regions. The dashed lines represent the 95% prediction regions.

In the 30 studies conducted in a surveillance programme, the pooled sensitivity was 54% (95% CI 59% to 63%) and specificity 83% (95% CI 84% to 85%); in the 117 studies conducted in a clinical setting, the pooled sensitivity was 61% (95% CI 59% to 63%) and the specificity 83% (95% CI 84% to 85%; Figure 6).


Summary receiver operating characteristic (ROC) comparing the results of studies conducted in different settings, surveillance programs (black circles) and clinical setting (red diamonds) against the same reference standards. Reference standards were: the pathology of the explanted liver in case of transplantation; the histology of resected focal liver lesions, or the histology of biopsied focal liver lesion(s) with a follow‐up period of at least six months, typical characteristics on cross‐sectional multiphasic contrast CT or MRI, with a follow‐up period of at least six months.

The solid circles represent the summary estimates of sensitivity and specificity for surveillance setting (black circle) and clinical suspect setting (red circle).

The dotted lines represent the 95% confidence regions. The dashed lines represent the 95% prediction regions.

We assessed the diagnostic accuracy for resectable HCC as a secondary objective. We found six studies with 1722 participants with more than 90% of participants with resectable HCC (Nomura 1996; Nomura 1999; Gambarin‐Gelwan 2000; Shen 2012b; Tan 2012; Song 2014). By using the bivariate model, the sensitivity was 65% (95% CI 62% to 68%), the specificity 80% (95% CI 59% to 91%), LR+ 3.2 (95% CI 1.4 to 7.2) and LR‐ 0.44 (95% CI 0.34 to 0.56).

Heterogeneity analysis

We investigated heterogeneity while considering studies with AFP cut‐off values around 20 ng/mL. Table 2 shows the comparisons of different predefined subgroups. The estimates of sensitivity and specificity were different only for the comparison of studies including participants recruited from planned surveillance programs compared to clinical cohorts (P = 0.005).

Table 2. Heterogeneity and sensitivity analyses for alpha‐foetoprotein (AFP) cut‐off value around 20 ng/mL

| Subgroup | N of studies | Sensitivity (95% CI) | Specificity (95% CI) | P value |
|---|---|---|---|---|
| All | 147 | 60% (58% to 62%) | 84% (82% to 86%) | |
| case‐control | 111 | 60% (58% to 62%) | 83% (81% to 85%) | 0.133 |
| cross‐sectional | 36 | 57% (52% to 62%) | 88% (84% to 91%) | |
| prospective | 29 | 59% (54% to 63%) | 86% (81% to 90%) | 0.828 |
| retrospective | 118 | 60% (58% to 62%) | 84% (82% to 86%) | |
| before 2000 | 22 | 65% (59% to 71%) | 85% (81% to 88%) | 0.264 |
| after 2000 | 125 | 59% (57% to 61%) | 84% (82% to 86%) | |
| cirrhosis > 10% | 94 | 59% (56% to 61%) | 85% (82% to 87%) | § |
| cirrhosis < 10% | 2 | 61% (51% to 70%)*; 57% (50% to 63%)** | 87% (84% to 90%)*; 83% (74% to 90%)** | |
| Europe | 22 | 60% (54% to 65%) | 87% (83% to 90%) | 0.447 |
| America | 19 | 56% (50% to 61%) | 89% (85% to 92%) | |
| Asia | 98 | 60% (58% to 62%) | 83% (80% to 86%) | |
| Africa | 7 | 68% (54% to 80%) | 81% (71% to 89%) | |
| HCC prevalence < 10% | 16 | 54% (47% to 62%) | 89% (84% to 93%) | 0.147 |
| HCC prevalence > 10% | 131 | 60% (58% to 62%) | 84% (81% to 86%) | |
| clinical suspect | 117 | 61% (59% to 63%) | 83% (80% to 85%) | 0.005 |
| surveillance | 30 | 54% (49% to 60%) | 89% (86% to 92%) | |
| HCC resectable < 20% | 4 | 61% (48% to 72%) | 82% (64% to 92%) | 0.909 |
| HCC resectable > 20% | 25 | 56% (51% to 61%) | 87% (81% to 91%) | |
| biopsy | 22 | 63% (58% to 68%) | 82% (77% to 87%) | 0.832 |
| other reference standard | 124 | 59% (57% to 61%) | 85% (82% to 87%) | |
| viral < 80% | 35 | 59% (55% to 63%) | 87% (83% to 90%) | 0.694 |
| viral > 80% | 84 | 59% (57% to 62%) | 84% (81% to 86%) | |
| Child A < 50% | 17 | 59% (52% to 67%) | 86% (82% to 89%) | 0.746 |
| Child A > 50% | 34 | 59% (55% to 62%) | 83% (77% to 87%) | |
| Full text | 142 | 60% (58% to 62%) | 84% (82% to 86%) | |

* Hallager 2018; ** Liu 2017

§ Model failed to converge

HCC: hepatocellular carcinoma

Sensitivity analysis

When considering only the 36 studies with a cross‐sectional design, we obtained an AFP sensitivity of 57% (95% CI 52% to 62%) and specificity of 88% (95% CI 84% to 91%; Table 2). When considering the 142 studies published in full text, we obtained an AFP sensitivity of 60% (95% CI 58% to 62%) and specificity of 84% (95% CI 82% to 86%; Table 2). We did not perform the remaining sensitivity analyses as all studies were judged to be at high risk of bias, and no study reported uninterpretable results.

AFP cut‐off value 200 ng/mL

Description of the included studies

Fifty‐six studies with 20,452 participants provided data using a cut‐off value of 200 ng/mL.

Two studies were published only in abstract form; 42 were case‐control studies. The median prevalence of HCC was 51% (IQR 34% to 63%). When considering only the 14 cross‐sectional studies, the median prevalence was 21% (IQR 9% to 34%). The median proportion of participants with liver cirrhosis was 100% (data reported by 41 studies, IQR 92% to 100%) and the median prevalence of Child‐Pugh class A participants was 47% (24 studies, IQR 32% to 77%); the median proportion of participants with viral aetiology of cirrhosis was 100% (41 studies, IQR 79% to 100%); the median of the mean HCC diameter across studies was 31 mm (10 studies, IQR 20 mm to 42 mm). The median prevalence of resectable HCC was 51% (10 studies, IQR 36% to 73%). The studies were conducted from 1988 to 2018. Considering study location, 31 studies were conducted in Asia, eight in Africa, nine in North and South America, and eight in Europe. Seven studies were conducted in the context of a surveillance programme for HCC and 49 in a clinical setting. Sensitivity varied from 4% to 83% (IQR 23% to 50%) and specificity from 87% to 100% (IQR 97% to 100%; Appendix 5).

Pooled results

By using the bivariate model, we obtained the following estimates: sensitivity 36% (95% CI 31% to 41%), specificity 99% (95% CI 98% to 100%), LR+ 35.9 (95% CI 22.2 to 57.9), and LR‐ 0.64 (95% CI 0.60 to 0.695; Figure 5).
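
As an illustration of how these likelihood ratios can be read, the sketch below converts a pre‐test probability into post‐test probabilities using the odds form of Bayes' theorem. It assumes, purely for illustration, a pre‐test probability of 21% (the median HCC prevalence in the 14 cross‐sectional studies) and uses the pooled LR+ and LR‐ point estimates reported above. The function name is hypothetical, and the calculation ignores the uncertainty expressed by the confidence intervals.

```python
def post_test_probability(pre_test_probability: float, likelihood_ratio: float) -> float:
    """Convert a pre-test probability to a post-test probability via odds."""
    if not 0.0 < pre_test_probability < 1.0:
        raise ValueError("pre_test_probability must lie strictly between 0 and 1")
    pre_odds = pre_test_probability / (1.0 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


# Assumption for illustration: pre-test probability equal to the median
# prevalence in the cross-sectional studies (21%).
prevalence = 0.21
after_positive = post_test_probability(prevalence, 35.9)  # pooled LR+
after_negative = post_test_probability(prevalence, 0.64)  # pooled LR-

print(f"Probability of HCC after a positive AFP: {after_positive:.2f}")
print(f"Probability of HCC after a negative AFP: {after_negative:.2f}")
```

Under these assumptions, a result above the 200 ng/mL cut‐off raises the probability of HCC to about 91%, while a result below it lowers the probability only to about 15%.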

We assessed the diagnostic accuracy for resectable HCC as a secondary objective. We found only two studies with more than 90% of participants with resectable HCC, preventing a meta‐analysis of their results: Nomura 1996, with 128 participants, reported a sensitivity of 4% (95% CI 0% to 19%) and a specificity of 100% (95% CI 96% to 100%) and Sassa 1999, with 195 participants, reported a sensitivity of 8% (95% CI 3% to 18%) and a specificity of 100% (95% CI 97% to 100%).

Heterogeneity analysis

We investigated heterogeneity while considering studies with an AFP cut‐off value of 200 ng/mL. Table 3 shows the comparisons of different predefined subgroups. The estimates of sensitivity and specificity were different for the comparison of studies conducted in different continents, and also for studies including more than 50% of participants in Child‐Pugh class A compared to studies including less than 50% in Child‐Pugh class A.

Table 3. Heterogeneity and sensitivity analyses for alpha‐foetoprotein (AFP) cut‐off value around 200 ng/mL

| Subgroup | N of studies | Sensitivity (95% CI) | Specificity (95% CI) | P value |
|---|---|---|---|---|
| All | 56 | 36% (31% to 41%) | 99% (98% to 100%) | |
| case‐control | 42 | 35% (30% to 40%) | 99% (98% to 100%) | 0.874 |
| cross‐sectional | 14 | 39% (28% to 51%) | 99% (98% to 100%) | |
| prospective | 9 | 42% (27% to 58%) | 99% (97% to 100%) | 0.713 |
| retrospective | 47 | 35% (30% to 40%) | 99% (98% to 100%) | |
| before 2000 | 9 | 28% (15% to 47%) | 100% (98% to 100%) | 0.336 |
| after 2000 | 47 | 37% (33% to 42%) | 99% (98% to 100%) | |
| cirrhosis > 10% | 41 | 40% (28% to 40%) | 99% (99% to 100%) | |
| cirrhosis < 10% | 0 | | | |
| Europe | 8 | 40% (28% to 54%) | 99% (98% to 100%) | 0.020 |
| America | 9 | 27% (21% to 35%) | 100% (98% to 100%) | |
| Asia | 31 | 34% (29% to 40%) | 98% (97% to 99%) | |
| Africa | 8 | 53% (39% to 66%) | 99% (97% to 100%) | |
| HCC prevalence < 10% | 5 | 30% (16% to 48%) | 100% (95% to 100%) | 0.805 |
| HCC prevalence > 10% | 51 | 36% (32% to 41%) | 99% (98% to 99%) | |
| clinical suspect | 49 | 36% (32% to 41%) | 99% (98% to 100%) | 0.995 |
| surveillance | 7 | 34% (18% to 54%) | 99% (96% to 100%) | |
| HCC resectable < 20% | 2 | 42% (8% to 85%) | 99% (82% to 100%) | 0.931 |
| HCC resectable > 20% | 8 | 27% (12% to 50%) | 99% (97% to 100%) | |
| biopsy | 9 | 31% (24% to 39%) | 100% (97% to 100%) | 0.140 |
| other reference standard | 46 | 37% (32% to 43%) | 99% (98% to 100%) | |
| viral < 80% | 11 | 37% (29% to 46%) | 99% (97% to 100%) | 0.705 |
| viral > 80% | 30 | 32% (26% to 39%) | 98% (98% to 100%) | |
| Child A < 50% | 13 | 42% (31% to 54%) | 99% (99% to 100%) | 0.008 |
| Child A > 50% | 11 | 24% (19% to 29%) | 99% (97% to 100%) | |
| Full text | 54 | 36% (31% to 41%) | 99% (98% to 100%) | |

HCC: hepatocellular carcinoma

Sensitivity analysis

When considering only the 14 studies with a cross‐sectional design, we obtained an AFP sensitivity of 39% (95% CI 28% to 51%) and a specificity of 99% (95% CI 98% to 99%; Table 3).

When considering the 54 studies published in full text and excluding the two published in abstract form, we obtained an AFP sensitivity of 36% (95% CI 31% to 41%) and a specificity of 99% (95% CI 98% to 100%; Table 3).

We did not perform the remaining sensitivity analyses as all studies were judged to be at high risk of bias, and no study reported uninterpretable results.

Abdominal ultrasound (US)
Description of the included studies

Thirty‐nine studies with 18,792 participants provided data assessing abdominal ultrasound (US) for the diagnosis of HCC.

The median prevalence of the target disease was 15% (interquartile range 8% to 31%). When considering the 36 cross‐sectional studies, the median prevalence of HCC was 15% (IQR 9% to 25%). All included participants had hepatic cirrhosis. The median prevalence of Child‐Pugh class A participants was 69% (14 studies, IQR 30% to 81%), and the median proportion of participants with viral aetiology was 60% (26 studies, IQR 40% to 84%). The median proportion of participants with resectable HCC was 76% (20 studies, IQR 40% to 95%) and the median of the mean HCC diameter across studies was 24 mm (17 studies, IQR 20.5 mm to 31 mm). The studies were conducted from 1983 to 2020. Considering study location, 13 studies were conducted in North and South America, 13 in Asia, 12 in Europe, and one in three continents. Twenty studies were conducted in the context of a surveillance programme for HCC and 19 in participants with clinically suspected HCC.

Pooled results

Figure 7 shows the forest plot of sensitivity and specificity with their 95% CIs, and Figure 8 shows a graphical representation of studies in the receiver operating characteristic (ROC) space (sensitivity against 1 ‐ specificity). Sensitivity ranged from 28% to 100% (IQR 44% to 89%) and specificity from 43% to 100% (IQR 86% to 96%). We performed a meta‐analysis using the bivariate model, as the index test results are dichotomous (i.e. positive or negative) without a threshold. We obtained the following estimates: sensitivity 72% (95% CI 63% to 79%), specificity 94% (95% CI 91% to 96%), LR+ 12.5 (95% CI 8.6 to 18.25), LR‐ 0.29 (95% CI 0.22 to 0.39).
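
The likelihood ratios follow from sensitivity and specificity through the standard definitions LR+ = sensitivity / (1 ‐ specificity) and LR‐ = (1 ‐ sensitivity) / specificity. The sketch below applies them to the rounded pooled point estimates for ultrasound. Because the review's ratios come from the bivariate model rather than from rounded values, the output only approximates the reported figures. The function name is hypothetical.

```python
def likelihood_ratios(sensitivity: float, specificity: float) -> tuple[float, float]:
    """Return (LR+, LR-) for a dichotomous test."""
    if not (0.0 < sensitivity <= 1.0 and 0.0 < specificity < 1.0):
        raise ValueError("sensitivity must be in (0, 1] and specificity in (0, 1)")
    lr_positive = sensitivity / (1.0 - specificity)
    lr_negative = (1.0 - sensitivity) / specificity
    return lr_positive, lr_negative


# Rounded pooled estimates for abdominal ultrasound from the paragraph above.
lr_pos, lr_neg = likelihood_ratios(sensitivity=0.72, specificity=0.94)
print(f"LR+ = {lr_pos:.1f}, LR- = {lr_neg:.2f}")
```

With the rounded inputs this gives roughly 12 and 0.30, close to the reported 12.5 and 0.29.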


Forest plots of sensitivity and specificity of ultrasound against different reference standards in 39 studies. Reference standards were: the pathology of the explanted liver in case of transplantation; the histology of resected focal liver lesions, or the histology of biopsied focal liver lesion(s) with a follow‐up period of at least six months, typical characteristics on cross‐sectional multiphasic contrast CT or MRI, with a follow‐up period of at least six months. TP = true positive; FP = false positive; FN = false negative; TN = true negative. Values between brackets are the 95% confidence intervals (CIs) of sensitivity and specificity. The figure shows the estimated sensitivity and specificity of the study (blue square) and its 95% CI (black horizontal line). The individual studies are ordered by study design (cross‐sectional or case‐control), study setting (clinical setting or surveillance program) and increasing sensitivity.


Summary receiver operating characteristic (ROC) comparing, in 39 studies, ultrasound and different reference standards. Reference standards were: the pathology of the explanted liver in case of transplantation; the histology of resected focal liver lesions, or the histology of biopsied focal liver lesion(s) with a follow‐up period of at least six months, typical characteristics on cross‐sectional multiphasic contrast CT or MRI, with a follow‐up period of at least six months.

The solid circle represents the summary estimate of sensitivity and specificity. The dotted lines represent the 95% confidence regions. The dashed lines represent the 95% prediction regions.

We assessed the diagnostic accuracy for resectable HCC as a secondary objective. We found seven studies (2163 participants) with more than 90% of participants with resectable HCC (Dodd 1992; Gambarin‐Gelwan 2000; Kim 2001; Villacastin Ruiz 2016; Choi 2019; Kudo 2019; Park 2020). By using the bivariate model, the pooled sensitivity was 53% (95% CI 38% to 67%), specificity 96% (95% CI 94% to 97%), LR+ 12.3 (95% CI 7.7 to 19.5), and LR‐ 0.5 (95% CI 0.36 to 0.68).

Heterogeneity analysis

We investigated heterogeneity while considering studies using US as the index test and found no difference between the prespecified subgroups (Table 4).

Table 4. Heterogeneity and sensitivity analyses for ultrasonography (US)

| Subgroup | N of studies | Sensitivity (95% CI) | Specificity (95% CI) | P value |
|---|---|---|---|---|
| All | 39 | 72% (63% to 79%) | 94% (91% to 96%) | |
| case‐control | 3 | 82% (64% to 92%) | 87% (77% to 93%) | 0.737 |
| cross‐sectional | 36 | 71% (62% to 79%) | 95% (92% to 97%) | |
| prospective | 18 | 72% (60% to 81%) | 94% (90% to 96%) | 1.000 |
| retrospective | 21 | 72% (58% to 82%) | 94% (89% to 97%) | |
| before 2000 | 16 | 79% (70% to 86%) | 96% (92% to 98%) | 0.091 |
| after 2000 | 23 | 67% (54% to 78%) | 93% (88% to 96%) | |
| cirrhosis > 10% | 33 | 70% (60% to 78%) | 94% (91% to 96%) | |
| cirrhosis < 10% | 0 | | | |
| Europe | 12 | 82% (73% to 89%) | 94% (90% to 97%) | 0.186 |
| America | 13 | 57% (45% to 68%) | 94% (89% to 96%) | |
| Asia | 13 | 76% (58% to 88%) | 94% (85% to 98%) | |
| Africa | 0 | | | |
| HCC prevalence < 10% | 15 | 69% (54% to 81%) | 96% (92% to 98%) | 0.660 |
| HCC prevalence > 10% | 24 | 74% (62% to 82%) | 93% (88% to 96%) | |
| clinical suspect | 19 | 74% (61% to 84%) | 93% (89% to 96%) | 0.898 |
| surveillance | 20 | 69% (57% to 79%) | 95% (91% to 98%) | |
| HCC resectable < 20% | 4 | 90% (75% to 97%) | 82% (60% to 94%) | 0.088 |
| HCC resectable > 20% | 16 | 66% (52% to 77%) | 95% (91% to 97%) | |
| biopsy | 7 | 81% (64% to 91%) | 90% (84% to 94%) | 0.379 |
| OLT | 10 | 55% (41% to 69%) | 97% (93% to 96%) | |
| other reference standard | 22 | 76% (64% to 84%) | 94% (89% to 97%) | |
| viral < 80% | 17 | 70% (57% to 80%) | 94% (90% to 96%) | 0.777 |
| viral > 80% | 9 | 79% (58% to 91%) | 91% (79% to 97%) | |
| Child A < 50% | 5 | 50% (33% to 68%) | 91% (83% to 95%) | 0.346 |
| Child A > 50% | 9 | 74% (52% to 88%) | 93% (82% to 98%) | |
| US positivity criteria predefined | 25 | 74% (63% to 83%) | 93% (89% to 96%) | |
| Uninterpretable test results reported | 3 | 80% (71% to 81%) | 76% (71% to 81%) | |
| Full text | 38 | 72% (64% to 80%) | 94% (91% to 96%) | |

OLT: orthotopic liver transplantation; HCC: hepatocellular carcinoma

Sensitivity analysis

When considering only the 36 studies with a cross‐sectional design, we obtained a pooled sensitivity of 71% (95% CI 62% to 79%) and a specificity of 95% (95% CI 92% to 97%; Table 4).

When considering only the 25 studies that prespecified the positivity criteria, we obtained a pooled sensitivity of 74% (95% CI 63% to 83%) and a specificity of 93% (95% CI 89% to 96%; Table 4).

When considering only the three studies reporting uninterpretable results with intention‐to‐diagnose analysis, we obtained a sensitivity of 80% (95% CI 71% to 81%) and a specificity of 76% (95% CI 71% to 81%).

When considering the 38 studies published in full text and excluding the one study published in abstract form, we obtained a sensitivity of 72% (95% CI 64% to 80%) and a specificity of 94% (95% CI 91% to 96%).

Combination of AFP and US
Description of the included studies

Eight studies with 5454 participants provided data assessing the combination of measurement of serum AFP and abdominal US for the diagnosis of HCC.

All studies considered the combination of the two tests positive when at least one was positive. The median prevalence of the target disease was 16% (IQR 9% to 17%). The median proportion of participants with liver cirrhosis was 100% (data reported by eight studies: in six studies it was 100%, in one study it was 93%, and in another study it was 53%). The median prevalence of participants with Child‐Pugh class A was 86% (data reported by four studies, IQR 60% to 96%) and the median prevalence of participants with viral aetiology was 84% (six studies, IQR 44% to 88%). The median proportion of resectable HCC was 76% (six studies, IQR 59% to 91%), and the median of the mean HCC diameter across studies was 24 mm (four studies, IQR 18.5 mm to 31.5 mm). The studies were conducted from 1988 to 2019. Considering study location, three studies were conducted in North and South America, three in Asia, one in Europe, and one in three continents. Seven studies were conducted in the context of a surveillance programme for HCC and two studies in participants with clinically suspected HCC.
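
The "at least one positive" rule can be sketched as follows. The participant records are invented for illustration, the class and function names are hypothetical, and treating an AFP value at or above the cut‐off as positive is an assumption, since the included studies may define positivity differently.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    afp_ng_ml: float   # serum AFP concentration (ng/mL)
    us_positive: bool  # focal liver lesion reported on ultrasound
    has_hcc: bool      # reference standard result


def combined_positive(p: Participant, afp_cutoff: float = 20.0) -> bool:
    """Combination is positive when at least one index test is positive.

    Assumption: AFP >= cut-off counts as a positive AFP result.
    """
    return p.us_positive or p.afp_ng_ml >= afp_cutoff


def two_by_two(cohort: list[Participant]) -> dict[str, int]:
    """Cross-classify the combined test against the reference standard."""
    counts = {"TP": 0, "FP": 0, "FN": 0, "TN": 0}
    for p in cohort:
        positive = combined_positive(p)
        if positive and p.has_hcc:
            counts["TP"] += 1
        elif positive:
            counts["FP"] += 1
        elif p.has_hcc:
            counts["FN"] += 1
        else:
            counts["TN"] += 1
    return counts


# Invented records, not data from any included study.
cohort = [
    Participant(350.0, False, True),   # AFP alone detects the HCC
    Participant(8.0, True, True),      # ultrasound alone detects the HCC
    Participant(5.0, False, True),     # both tests miss the HCC
    Participant(25.0, False, False),   # false positive caused by AFP
    Participant(3.0, True, False),     # false positive caused by ultrasound
    Participant(4.0, False, False),
    Participant(6.0, False, False),
]
counts = two_by_two(cohort)
sensitivity = counts["TP"] / (counts["TP"] + counts["FN"])
specificity = counts["TN"] / (counts["TN"] + counts["FP"])
print(counts, round(sensitivity, 2), round(specificity, 2))
```

The invented records show why this rule tends to raise sensitivity at the cost of specificity: a case missed by one test can still be caught by the other, but a false positive on either test makes the combination falsely positive.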

Figure 9 shows the forest plot of sensitivity and specificity with their 95% CIs, and Appendix 6 shows a graphical representation of studies in the receiver operating characteristic (ROC) space (sensitivity against 1 ‐ specificity). Considering only the six studies (5044 participants) that used an AFP cut‐off value of 20 ng/mL (Tremolada 1989; Gambarin‐Gelwan 2000; Singal 2012; Chang 2015; Ungtrakul 2016; Kim 2019b), we performed a meta‐analysis using the bivariate model and obtained the following pooled estimates: sensitivity 96% (95% CI 88% to 98%), specificity 85% (95% CI 73% to 93%), LR+ 6.5 (95% CI 3.5 to 12.0) and LR‐ 0.05 (95% CI 0.02 to 0.14).


Forest plots of sensitivity and specificity of the combination of alpha‐foetoprotein and ultrasound against different reference standards in 8 studies ordered by increasing sensitivity. Reference standards were: the pathology of the explanted liver in case of transplantation; the histology of resected focal liver lesion(s), or the histology of biopsied focal liver lesions with a follow‐up period of at least six months, typical characteristics on cross‐sectional multiphasic contrast CT or MRI, with a follow‐up period of at least six months. TP = true positive; FP = false positive; FN = false negative; TN = true negative. Values between brackets are the 95% confidence intervals (CIs) of sensitivity and specificity. The figure shows the estimated sensitivity and specificity of the study (blue square) and its 95% CI (black horizontal line).

We assessed the diagnostic accuracy for resectable HCC as a secondary objective. We found only two studies with more than 90% of participants with resectable HCC, preventing a meta‐analysis of their results: Choi 2019 with 203 participants, reported a sensitivity of 89% (95% CI 73% to 97%) and specificity of 83% (95% CI 76% to 88%) and Gambarin‐Gelwan 2000 with 106 participants, reported a sensitivity of 79% (95% CI 54% to 94%) and a specificity of 87% (95% CI 79% to 94%).

Heterogeneity analysis

We investigated heterogeneity while considering studies using the combination of AFP 20 ng/mL and US as the index test and found no difference between some prespecified subgroups: prospective compared to retrospective studies, studies conducted before 2000 compared to those conducted after 2000, studies with HCC prevalence lower than 10% compared to studies with HCC prevalence higher than 10%, studies conducted in surveillance programmes compared to studies conducted in people with suspected HCC. We could not assess the remaining comparisons because of the small number of included studies (Table 5).

Table 5. Heterogeneity and sensitivity analyses for the combination of alpha‐foetoprotein (AFP) (cut‐off 20 ng/mL) and ultrasonography (US)

| Subgroup | N of studies | Sensitivity (95% CI) | Specificity (95% CI) | P value |
|---|---|---|---|---|
| All | 6 | 96% (88% to 98%) | 85% (73% to 93%) | |
| case‐control | 0 | | | |
| cross‐sectional | 6 | 96% (88% to 98%) | 85% (73% to 93%) | |
| prospective | 3 | 91% (84% to 95%) | 91% (75% to 97%) | 0.578 |
| retrospective | 3 | 97% (83% to 99%) | 77% (66% to 85%) | |
| before 2000 | 2 | 95% (44% to 100%) | 81% (69% to 89%) | 0.703 |
| after 2000 | 4 | 96% (89% to 99%) | 87% (69% to 95%) | |
| cirrhosis > 10% | 6 | 96% (88% to 98%) | 85% (73% to 93%) | |
| cirrhosis < 10% | 0 | | | |
| Europe | 1 | 100% (83% to 100%) | 74% (67% to 80%) | § |
| America | 2 | 79% (54% to 94%); 90% (77% to 97%) | 87% (79% to 94%); 83% (79% to 87%) | |
| Asia | 3 | 99% (98% to 100%); 91% (81% to 96%); 94% (71% to 100%) | 68% (66% to 71%); 98% (96% to 99%); 82% (81% to 84%) | |
| Africa | 0 | | | |
| HCC prevalence < 10% | 3 | 96% (78% to 99%) | 80% (76% to 84%) | 0.100 |
| HCC prevalence > 10% | 3 | 95% (79% to 99%) | 90% (68% to 97%) | |
| clinical suspect | 1 | 79% (54% to 94%) | 87% (78% to 93%) | 0.289 |
| surveillance | 5 | 97% (92% to 99%) | 85% (70% to 93%) | |
| HCC resectable < 20% | 0 | | | |
| HCC resectable > 20% | 4 | 95% (84% to 99%) | 88% (72% to 96%) | |
| biopsy | 1 | 99% (98% to 100%) | 68% (66% to 71%) | § |
| OLT | 1 | 79% (54% to 94%) | 87% (78% to 93%) | |
| other reference standard | 4 | 93% (86% to 97%) | 88% (72% to 95%) | |
| viral < 80% | 1 | 79% (54% to 94%) | 87% (79% to 94%) | * |
| viral > 80% | 3 | 99% (98% to 100%); 91% (81% to 96%); 94% (71% to 100%) | 68% (66% to 71%); 98% (96% to 99%); 82% (81% to 84%) | |
| Child A < 50% | 1 | 100% (83% to 100%) | 74% (67% to 80%) | |
| Child A > 50% | 2 | 99% (98% to 100%); 91% (81% to 96%) | 68% (66% to 71%); 98% (96% to 99%) | |
| US positivity criteria predefined | 2 | 90% (77% to 97%); 94% (71% to 100%) | 83% (79% to 87%); 82% (81% to 84%) | § |
| Full text | 6 | 96% (88% to 98%) | 85% (73% to 93%) | |

* Sparse and missing data. Meta‐analysis not conducted

§ Model failed to converge

OLT: orthotopic liver transplantation; HCC: hepatocellular carcinoma; US: ultrasonography

Sensitivity analysis

We did not perform the sensitivity analyses as all the studies were judged to be at high risk of bias; all the studies were cross‐sectional; no study reported uninterpretable results; all the studies were published as full text, and only two studies reported predefined US positivity criteria (Singal 2012; Ungtrakul 2016; Table 5).

Comparative analyses

The indirect comparison between the 147 studies with AFP at a cut‐off value of around 20 ng/mL showed an AFP sensitivity of 60% (95% CI 58% to 62%) and specificity of 84% (95% CI 82% to 86%) compared to the 39 studies with US showing a sensitivity of 72% (95% CI 63% to 79%) and specificity of 94% (95% CI 91% to 96%). Both US sensitivity (P = 0.0011) and specificity (P < 0.0001) were higher than those of AFP (Figure 10).


Summary receiver operating characteristic (ROC) showing the indirect comparison (between study) of the results of two different index tests, ultrasound (black circles) and alpha‐foetoprotein with a cut‐off value around 20 ng/mL (red diamonds) against the same reference standards (the pathology of the explanted liver in case of transplantation; the histology of resected focal liver lesion(s), or the histology of biopsied focal liver lesion(s) with a follow‐up period of at least six months, typical characteristics on cross‐sectional multiphasic contrast CT or MRI, with a follow‐up period of at least six months).

The solid circles represent the summary estimates of sensitivity and specificity for ultrasound (black circle) and AFP cut‐off 20 ng/mL (red circle).

The dotted lines represent the 95% confidence regions.

For the direct comparison between the two tests, 15 studies provided data assessing AFP measurement with a cut‐off value of 20 ng/mL and abdominal US (Okazaki 1984; Cottone 1988; Maringhini 1988; Tremolada 1989; Sherman 1995; Chalasani 1999; Gambarin‐Gelwan 2000; Wong 2008; Singal 2012; Raff 2014; Chang 2015; Ungtrakul 2016; Atiq 2017; Kim 2019b; Yang 2019). We found that four studies (Sherman 1995; Raff 2014; Atiq 2017; Yang 2019) reported data obtained in different participants for the two index tests. For this reason, we excluded them from the direct comparison analysis. Thus, we included 11 studies with 6674 participants allowing a direct comparison (Figure 11). By using the bivariate model, we obtained the following pooled estimates: for AFP (cut‐off value 20 ng/mL), sensitivity 64% (95% CI 56% to 71%) and specificity 89% (95% CI 79% to 94%); for US, sensitivity 81% (95% CI 66% to 90%) and specificity 92% (95% CI 83% to 97%). The sensitivity of US was higher (P = 0.0044; relative sensitivity 1.27, 95% CI 1.06 to 1.49) while the specificities did not differ (P = 0.3861; relative specificity 1.04, 95% CI 0.95 to 1.12).


Summary receiver operating characteristic (ROC) showing the direct comparison (within study) of the results of two different index tests, alpha‐foetoprotein with a cut‐off value around 20 ng/mL (black circles) and ultrasound (red diamonds) in the same participants against the same reference standards (the pathology of the explanted liver in case of transplantation; the histology of resected focal liver lesion(s), or the histology of biopsied focal liver lesion(s) with a follow‐up period of at least six months, typical characteristics on cross‐sectional multiphasic contrast CT or MRI, with a follow‐up period of at least six months).

The solid circles represent the summary estimates of sensitivity and specificity for AFP, with cut‐off around 20 ng/mL (black circle) and for US, for direct comparison (red circle).

The dotted lines represent the 95% confidence regions.

Seven studies provided data assessing either US or the combination of US and AFP with a cut‐off 20 ng/mL. After excluding the Raff 2014 study which reported data obtained in different participants, six studies with 5044 participants allowed a direct comparison (Figure 12). By using the bivariate model, we obtained for US a sensitivity of 76% (95% CI 56% to 89%) and a specificity of 93% (95% CI 80% to 98%); for the combination of US and AFP, a sensitivity of 96% (95% CI 88% to 98%) and a specificity of 85% (95% CI 73% to 92%). The sensitivity of the combination of US and AFP was higher (P = 0.0141; relative sensitivity 1.28, 95% CI 1.03 to 1.53) while the specificity did not differ (P = 0.1024; relative specificity 0.94, 95% CI 0.87 to 1.01) compared with US alone.


Summary receiver operating characteristic (ROC) showing the direct comparison (within study) of the results of two different index tests, ultrasound (black circles) and the combination of alpha‐foetoprotein with a cut‐off value around 20 ng/mL and ultrasound (red diamonds) in the same participants against the same reference standards. Reference standards were: the pathology of the explanted liver in case of transplantation; the histology of resected focal liver lesions, or the histology of biopsied focal liver lesion(s) with a follow‐up period of at least six months, typical characteristics on cross‐sectional multiphasic contrast CT or MRI, with a follow‐up period of at least six months.

The solid circles represent the summary estimates of sensitivity and specificity for ultrasound (black circle) and US + AFP cut‐off 20 ng/ml (red circle).

The dotted lines represent the 95% confidence regions.

Summary of findings tables

The main results are shown in the summary of findings Table 1 and summary of findings Table 2.

Discussion

Summary of main results

This review aimed to assess the diagnostic accuracy of abdominal ultrasound (US) and alpha‐foetoprotein (AFP), alone or in combination, for the diagnosis of hepatocellular carcinoma (HCC) of any size and at any stage in people with chronic liver disease, either in a surveillance programme or in a clinical setting. The main results are shown in the summary of findings Table 1 and summary of findings Table 2.

We included 373 studies: 326 studies assessed AFP as the index test in 144,570 participants; 39 studies assessed abdominal US in 18,792 participants; and eight studies assessed both AFP and abdominal US as the index tests in 5454 participants.

We judged only one study (US as the index test) to be at low risk of bias for all four QUADAS‐2 domains (Bennett 2002); all the remaining studies were considered to be at high or unclear risk of bias in at least one domain. We also judged most studies (323/373) to be at high concern for the applicability of the results. This was mainly because of the patient selection domain, as only people with viral aetiology or decompensated liver disease were included, or participants were selected according to volume or other characteristics of the target disease. It was also because of the reference standard domain, as confirmation of the presence of HCC required pathological examination of the explanted liver or of a surgical specimen, necropsy, or technologies no longer in use.

We summarised the main results of analyses in the summary of findings Table 1 and summary of findings Table 2. We considered the following consequences of test results: people with true‐positive results, i.e. with HCC and positive test results, will receive appropriate further testing and possibly treatment; people with true‐negative results, i.e. without HCC and negative test results, will appropriately avoid further testing; people with false‐negative results, i.e. with HCC and negative test results, are misdiagnosed and will not receive the appropriate treatment; people with false‐positive results, i.e. without HCC and positive test results, will undergo unnecessary further testing with computed tomography (CT), contrast‐enhanced ultrasound (CEUS), magnetic resonance imaging (MRI), or biopsy.

The prevalence of HCC varied widely, from 1% to 82%, according to the study design and the different settings. For illustration, we considered two different populations in the 'Summary of findings' tables: a population at low risk of HCC, with an HCC prevalence of 5%, a value close to that reported by most epidemiological studies (Lok 2009; EASL 2018; Forner 2018); and a population at high risk of HCC, with a prevalence of 30%, which is the median prevalence in the included cross‐sectional studies conducted in clinical cohorts.

Alpha‐foetoprotein (AFP)

The cut‐off values used in the studies with AFP as the index test varied widely and, therefore, we performed a meta‐analysis with the hierarchical summary ROC model (HSROC) (Figure 3). There was considerable heterogeneity in the accuracy estimates, which could to some degree be attributable to the different cut‐off values. In order to obtain a pooled estimate of the sensitivity and the specificity, we chose the two most used cut‐off values: around 20 ng/mL, reported in 147 of 326 studies, and 200 ng/mL, reported in 56 studies.

AFP cut‐off around 20 ng/mL

For AFP with a cut‐off of around 20 ng/mL, performing the meta‐analysis with the bivariate model, we obtained the following pooled estimates: sensitivity of 60% (95% CI 58% to 62%) and specificity of 84% (95% CI 82% to 86%). Considering a hypothetical cohort of 1000 people with an HCC prevalence of 5%, we can expect 20 false‐negative and 148 false‐positive results; with a prevalence of 30%, we can expect 121 false‐negative and 109 false‐positive results (summary of findings Table 1).
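The expected numbers of false‐negative and false‐positive results in a hypothetical cohort follow from sensitivity, specificity, and prevalence. The sketch below illustrates the arithmetic; the function name `expected_counts` and its parameters are hypothetical, and it uses the rounded pooled estimates quoted in the text.

```python
def expected_counts(sensitivity, specificity, prevalence, cohort=1000):
    """Expected 2x2 counts for a hypothetical cohort (all inputs as proportions)."""
    with_hcc = cohort * prevalence
    without_hcc = cohort - with_hcc
    true_pos = with_hcc * sensitivity        # people with HCC who test positive
    true_neg = without_hcc * specificity     # people without HCC who test negative
    return {
        "TP": round(true_pos),
        "FN": round(with_hcc - true_pos),    # HCC missed by the test
        "FP": round(without_hcc - true_neg), # unnecessary further testing
        "TN": round(true_neg),
    }


# Rounded pooled estimates for AFP at a cut-off of around 20 ng/mL
for prevalence in (0.05, 0.30):
    print(prevalence, expected_counts(0.60, 0.84, prevalence))
```

With the rounded estimates this gives 20 false negatives and 152 false positives at 5% prevalence, and 120 and 112 at 30%. These are close to, but not identical with, the figures reported above, presumably because the summary of findings tables were computed from unrounded estimates.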

These results were also consistent with those obtained in a sensitivity analysis considering the studies with a cross‐sectional design alone. We found the setting of the studies to be a possible source of heterogeneity: results differed between studies with enrolment from surveillance programmes and studies with enrolment from clinical series. We observed some heterogeneity of accuracy estimates between studies (sensitivity, IQR from 53% to 67%; specificity, IQR from 76% to 90%). Altogether, the heterogeneity of the results remained unexplained despite the exploration of many other possible sources. We did not find any difference between studies with cross‐sectional and case‐control design. Moreover, the results seem consistent across geographical areas, over time, according to HCC prevalence and volume, and according to viral or non‐viral aetiology and severity of the underlying chronic liver disease. The pooled estimates are quite precise with narrow 95% CIs, but all the studies were at high risk of bias and at high concern for applicability, and with a wide inconsistency that could not be explained by the investigation of potential sources. We judged the certainty of evidence as very low.

AFP cut‐off value of 200 ng/mL

For the 56 studies on AFP with a cut‐off value of 200 ng/mL, performing the meta‐analysis with bivariate model, we obtained sensitivity of 36% (95% CI 31% to 41%) and specificity of 99% (95% CI 98% to 99%). Considering a hypothetical cohort of 1000 people with a HCC prevalence of 5%, we can expect 32 false‐negative and 10 false‐positive results; with a prevalence of 30%, we can expect 195 false‐negative and 7 false‐positive results (summary of findings Table 1). These results were consistent also in a sensitivity analysis of the studies with cross‐sectional design alone. We observed some heterogeneity of accuracy estimates between studies (sensitivity, IQR, 23% to 50%; specificity, IQR 97% to 100%). As possible sources of heterogeneity, we found geographical location (studies conducted in different continents) and severity of the underlying liver disease, according to Child‐Pugh classification (Table 3). The pooled estimates are quite precise with narrow 95% CIs, but all studies were at high risk of bias and at high concern for applicability, and with a wide inconsistency that could not be explained by the investigation of potential sources. We judged the certainty of evidence as very low.

Abdominal ultrasound

For the 39 studies using US as the index test, performing the meta‐analysis with the bivariate model, we obtained the following pooled estimates: sensitivity of 72% (95% CI 63% to 79%) and specificity of 94% (95% CI 91% to 96%). Considering a hypothetical cohort of 1000 people with an HCC prevalence of 5%, we can expect 14 false‐negative and 57 false‐positive results; with a prevalence of 30%, we can expect 84 false‐negative and 42 false‐positive results (summary of findings Table 1). We observed some heterogeneity of accuracy estimates between studies (sensitivity, IQR 44% to 89%; specificity, IQR 86% to 96%). Our investigation of the potential sources cannot explain this inconsistency of the results. Most studies are at high risk of bias and many at high concern for applicability. The pooled estimates of accuracy have narrow 95% CIs. We judged the certainty of evidence as very low.

Combination of AFP and abdominal ultrasound

For the six studies using a combination of AFP with a cut‐off value of 20 ng/mL and US as the index test, the meta‐analysis with the bivariate model produced the following pooled estimates: sensitivity of 96% (95% CI 88% to 98%) and specificity of 85% (95% CI 73% to 93%). Considering a hypothetical cohort of 1000 people with an HCC prevalence of 5%, we can expect 2 false‐negative and 143 false‐positive results; with a prevalence of 30%, we can expect 12 false‐negative and 105 false‐positive results (summary of findings Table 1). All studies are at high risk of bias and many at high concern for applicability. We did not find considerable inconsistency in the results, or imprecision in the estimates in the form of wide 95% CIs. We judged the certainty of evidence as low.

Comparisons

We compared the results of the two index tests: AFP and US. We performed a direct (within‐study) comparison in 11 studies using US and AFP with a cut‐off value of around 20 ng/mL, which showed a higher sensitivity of US with similar specificities (Figure 11). An indirect comparison between 147 AFP studies, with a cut‐off value of around 20 ng/mL, and 39 US studies showed a higher sensitivity and specificity of US (Figure 10). The direct comparison considering only the six studies that reported both US and the combination of AFP (cut‐off 20 ng/mL) and US as index tests, performed in the same participants, showed a higher sensitivity of the combination of AFP and US (relative sensitivity 1.28, 95% CI 1.03 to 1.53, P = 0.0141), while the specificities did not differ (relative specificity 0.94, 95% CI 0.87 to 1.01; Figure 12). All studies were at high risk of bias and many at high concern for applicability. We judged the certainty of evidence as low (summary of findings Table 2).

Strengths and weaknesses of the review

Strengths and weaknesses of included studies

Overall, the included studies cover a vast time span and a wide geographical distribution including areas with high and low prevalence of chronic liver disease and HCC.

We found more studies using AFP (n = 326) than using US (n = 39), or the combination of AFP and US (n = 8) as the index test. As we anticipated, many studies with biomarkers were conducted with a case‐control design, and in order to improve the completeness of our review, we included studies that compared people with known HCC to matched control. The large number of studies allowed us to obtain precise summary estimates of sensitivity and specificity with narrow confidence intervals. On the other hand, we found only 11 studies providing data for a direct (within study) comparison of AFP and US.

An overall quality assessment of the studies showed some common methodological weaknesses. We considered only one study to be at low risk of bias (Bennett 2002). In most studies with AFP as the index test, the design was case‐control and the risk of bias was high for patient selection. Furthermore, different cut‐off values were used, ranging from 5 ng/mL to 1000 ng/mL, and these were rarely predefined. The choice of the reference standard was also a major concern for all studies, either with AFP or US, or the combination of AFP and US as index test. The most used reference standard was CT or MRI, or their combination (as also recommended by most clinical guidelines; (Omata 2017; EASL 2018; Heimbach 2018)), but these tests cannot be regarded as absolutely accurate. Another choice of a reference standard was the histology of focal lesion, which is highly specific, but not sensitive, especially for small lesions, and cannot be obtained in the participants with a negative index test. Lastly, another reference standard is the pathology of the explanted liver which is possible only in studies conducted on participants with advanced and decompensated liver disease on a waiting list for transplantation which does not match the review question. In some studies, an AFP value, higher than 200 ng/mL, 400 ng/mL, or 500 ng/mL was one of the criteria for the reference standard. Moreover, in case‐control studies, it was often unclear how the target disease was excluded in control participants. Reporting the time interval between the index test and the reference standard was very rare, and often participants underwent different reference standards according to the results of the index test. 
Furthermore, US is also considered to be associated with frequent technical failure and with uninterpretable results: interference due to extrinsic factors such as interposed bowel, ribs, lung, or ascites, as well as patient factors such as obesity or inability to comply with breathing instructions, severe steatosis, or severe parenchymal heterogeneity from advanced cirrhosis may impair visualisation of the liver (Rodgers 2019). Up to 14% of US examinations were retrospectively judged as inadequate and only 66.5% as definitely adequate in a study of US quality in an HCC surveillance programme in people with liver cirrhosis (Simmons 2017). We found only three studies that addressed this problem by reporting the number of uninterpretable results. Not reporting these technical failures of US examination and excluding them from analyses could have produced an overestimation of test accuracy.

Using QUADAS‐2, we judged more than 85% of the included studies to be at high concern for applicability. The case‐control design, adopted in most AFP studies, results in an artefactual mixing of affected and non‐affected participants, which impairs applicability. However, even in cross‐sectional studies, as were most US studies and most studies of the combination of AFP and US, the inclusion and exclusion criteria and the different settings make the included participants different from those targeted by the review question. In contrast, we judged most studies to be at low concern for the other two domains, i.e. index test and reference standard.

Finally, many studies did not report all the covariates we planned to assess as possible source of heterogeneity, and this might have impaired both their and our analyses.

Strengths and weaknesses of the review process

Limitations of the search strategy

Our search strategy allowed us to obtain a large number of studies that were conducted in various countries, showing a widespread implementation globally of the index tests, and confirming the clinical relevance of the review question. In order to improve the completeness of our review, we planned to include even studies with case‐control design that are considered to be at high risk of bias due to inflated accuracy estimates and could have been excluded. Most studies on biomarkers, such as AFP, are conducted with case‐control design and indeed, almost 80% of the included AFP studies were case‐control studies. Interestingly, their results were not different from those obtained by cross‐sectional studies. Furthermore, we included many studies in which AFP was not used as the index test but as the comparator to some other biomarker, and this choice might arguably make publication bias less probable. We identified seven studies through manual searching of the references of the included studies or of previous reviews, and we are confident that we have included most, if not all, of the includable published studies. We applied no language restrictions in the inclusion criteria, and we retrieved 20 full‐text studies published in non‐English languages, of which we included six studies.

Quality assessment and data extraction

We considered our attempts to reduce subjectivity in our judgments and to minimise errors and miscalculations in data extraction as a strength of this review. According to the protocol plan, two review authors independently assessed the risk of bias of included studies and applicability of their results, using QUADAS‐2, and completed the data extraction for each included study using a proper form. In case of disagreement, we reached consensus through discussion. Disagreement was more frequent for the assessment of two QUADAS‐2 domains: patient selection (19 studies) and reference standard (15 studies). For data extraction, most of the discordances were due to simple miscalculations or typos and easily solved. For 27 studies a discussion was needed. The agreement obtained through discussion by two review authors was further discussed and approved by a third review author. Then the same authors assessed the certainty of evidence using the GRADE approach and the level of agreement was very high.

Limitations in the review analyses

Despite the large number of included studies and participants, and the consequent precision of accuracy estimates, the results of included studies were not consistent. The use of different cut‐off values and different setting (surveillance programme compared to clinical series) could explain heterogeneity only in part. Considering only studies with the same AFP cut‐off values, the most frequent cut‐off values of 20 ng/mL and 200 ng/mL allowed obtaining more consistent estimates.

In studies with AFP with a cut‐off of 20 ng/mL only, we found that study setting was another source of heterogeneity: studies conducted in a surveillance programme compared to those conducted in a clinical setting showed different pooled estimates, with a lower sensitivity and higher specificity in the former. We expected that studies conducted in a surveillance programme would obtain more consistent results: inclusion and exclusion criteria were clear and standardised, as were the index test, reference standard, and timing, whereas in a clinical setting more variability was expected, as participants may have different concurrent diseases, different severity of the underlying chronic liver disease, and different stages of the detected HCC. Arguably, in a surveillance programme the underlying liver disease is less severe, and HCCs are smaller. Despite these considerations, we did not plan a separate analysis for the two settings as they are not so clearly distinct in actual clinical practice (Poustchi 2011; Forner 2018). The two index tests, particularly US, are part of the routine evaluation of people with liver disease; HCC, the target disease, is usually asymptomatic, thus the clinical suspicion of HCC is based only on the presence of advanced chronic liver disease. On the other hand, we found no difference according to the study settings in studies with an AFP cut‐off value of 200 ng/mL, or with US.

As 80% of hepatocellular carcinoma cases occur in sub‐Saharan Africa and eastern Asia, we expected that the geographical location of a study could be a source of heterogeneity (Bray 2018). In studies with an AFP cut‐off of 200 ng/mL, the sensitivity differed between studies conducted on different continents. The severity of the underlying liver disease, as expressed by the percentage of participants with Child‐Pugh class A, could also help to explain the heterogeneity of results. The sensitivity was lower in studies with an AFP cut‐off of 200 ng/mL that included more than 50% of patients with Child‐Pugh class A, i.e. participants with less severe liver disease.

Despite the availability of an adequate number of studies, we were unable to demonstrate any role of the aetiology of the underlying chronic liver disease or of the HCC characteristics (volume, resectability). Most studies, conducted either in a surveillance programme or in a clinical setting, included an inconsistent mixture of participants at different risk of HCC, as shown by the large variability of the prevalence, and we were unable to show the role of the individual characteristics of participants. We could investigate only characteristics that could be assessed at a study level, whereas patients' factors or HCC characteristics can be assessed only by aggregate statistics, with the inherent risk of ecological bias. Thus, some important relationships, such as that with the HCC volume, could have been missed. In addition, many of the included studies did not report data on the covariates of interest. Also, we could not evaluate variability associated with test interpretation, particularly for US, which is considered dependent on subjective judgement. We checked for the presence of a definition of US positivity criteria in single studies but not their stringency, apart from their subjective interpretation. We were also unable to assess the effect of uninterpretable results, which should be relevant for US due to frequent technical failures. We found only two studies reporting the number of uninterpretable results and could not conduct the planned analysis according to the intention‐to‐diagnose principle. Moreover, we cannot exclude that most of the studies did not report uninterpretable results and excluded them from analyses, thus inflating the accuracy estimates.

In any case, the sensitivity analyses show that the obtained results are arguably robust, with no variation, after excluding studies published in abstract form or studies with case‐control design. As we conducted the analyses of AFP studies using the two most frequent cut‐off values of 20 ng/mL and 200 ng/mL, we considered unnecessary to conduct the planned analysis excluding studies without a predefinition of a cut‐off value.

Within‐ and between‐study comparisons

In order to assess any difference in the accuracy of the three index tests (AFP, US, and the combination of AFP and US), we planned and performed a direct (or within‐study) comparison. After the exclusion of the four studies that reported data for two or three index tests obtained in different numbers of participants (Sherman 1995; Raff 2014; Atiq 2017; Yang 2019), we could do a direct comparison with 11 primary studies with AFP and US, and with six studies with US and a combination of AFP and US. The US sensitivity was higher than that of AFP at a cut‐off of 20 ng/mL, with comparable specificity. Also, with the combination of AFP (cut‐off 20 ng/mL) and US, the sensitivity increased, in comparison to US alone, from 76% to 96% with comparable specificity. These results were confirmed by the indirect (between‐study) comparisons, which were possible in a greater number of studies (147 with AFP and 39 with US). This between‐study comparison, which includes a greater number of studies and hence has more power to detect any difference and gives more precise results, has a high risk of confounding due to differences in population characteristics, reference standards, and study design.

Comparison with previous research

We found seven reviews on the same topic (Colli 2006; Tateishi 2008; Singal 2009; Kansagara 2014; Singal 2014; Chou 2015; Tzartzeva 2018). Two of these compared imaging techniques for the diagnosis of HCC (Colli 2006; Chou 2015), one assessed only AFP (Tateishi 2008), and four reviews focused mainly on the effectiveness of surveillance programmes with US and AFP (Singal 2009; Kansagara 2014; Singal 2014; Tzartzeva 2018). With our search, we could include many more studies for each index test and, unlike the other reviews, we explored the accuracy of AFP, US, and the combination of AFP and US in the clinical pathway as the first diagnostic step, either in a clinical setting or in a surveillance programme. Due to differences in the methodological approach, in the inclusion and exclusion criteria, and in the statistical analyses, the results are not comparable to each other or to our results. Colli 2006 reports the results from 14 studies published before January 2005, and the summary estimate of US sensitivity was 60% and specificity 97%. Both Chou 2015 and Tzartzeva 2018, pooling the results of 15 more recent studies, found a US sensitivity higher than 75% and specificity higher than 90%, more similar to our findings. According to Tzartzeva 2018, combining US and AFP improves the diagnostic accuracy, with a sensitivity of 97%, but for the detection of early HCC the sensitivity remains close to 60%.

Applicability of findings to the review question

The review question has broad inclusion criteria, and the consequent large heterogeneity of the results allows exploration of variation in accuracy across various settings, different patient groups, or variations in the application of the index test and reference standard. Using the QUADAS‐2 tool, we judged many studies to be at high concern for applicability in the participant selection domain. In fact, most AFP studies (77%) were case‐control studies with an artefactual mixing of affected and non‐affected participants. However, even in cross‐sectional studies, the prevalence of the target disease ranged from 1% to 82%, as a consequence of different settings and variable inclusion criteria that often did not match the review question. On the other hand, we judged all studies to be at low concern for applicability in the index test domain. For the reference standard domain, we judged the studies using the pathology of the explanted liver as reference standard to be at high concern. This reference standard, even if perfectly accurate, cannot match the review question as it is applicable only to participants on a waiting list for liver transplantation.

Figure 1

Study flow diagram
Date of search: 5 June 2020

Figure 2

Risk of bias and applicability concerns graph: review authors' judgements about each domain presented as percentages across included studies

Figure 3

Summary receiver operating characteristic (ROC) comparing in 326 studies alpha‐foetoprotein serum measurement with any cut‐off value and different reference standards. Reference standards were: the pathology of the explanted liver in case of transplantation; the histology of resected focal liver lesions, or the histology of biopsied focal liver lesion(s) with a follow‐up period of at least six months, typical characteristics on cross‐sectional multiphasic contrast CT or MRI, with a follow‐up period of at least six months.

Figure 4

Forest plots of sensitivity and specificity of alpha‐foetoprotein with a cut‐off value around 20 ng/mL against different reference standards in 147 studies ordered by study design, setting and increasing HCC prevalence. Reference standards were: the pathology of the explanted liver in case of transplantation, the histology of resected focal liver lesions, or the histology of biopsied focal liver lesions with a follow‐up period of at least six months, typical characteristics on cross‐sectional multiphasic contrast CT or MRI, with a follow‐up period of at least six months.

TP = true positive; FP = false positive; FN = false negative; TN = true negative. Values between brackets are the 95% confidence intervals (CIs) of sensitivity and specificity. The figure shows the estimated sensitivity and specificity of the study (blue square) and its 95% CI (black horizontal line).

Figure 5

Summary receiver operating characteristic (ROC) comparing alpha‐foetoprotein with a cut‐off value around 20 ng/mL (black circles) and alpha‐foetoprotein with a cut‐off value around 200 ng/mL (red diamonds) against the same reference standards.
Reference standards were: the pathology of the explanted liver in case of transplantation; the histology of resected focal liver lesion(s), or the histology of biopsied focal liver lesions with a follow‐up period of at least six months, typical characteristics on cross‐sectional multiphasic contrast CT or MRI, with a follow‐up period of at least six months.

The solid circles represent the summary estimates of sensitivity and specificity for AFP cut‐off around 20 ng/mL (black circle) and AFP cut‐off 200 ng/mL (red circle).

The dotted lines represent the 95% confidence regions. The dashed lines represent the 95% prediction regions.

Figure 6

Summary receiver operating characteristic (ROC) comparing the results of studies conducted in different settings, surveillance programs (black circles) and clinical setting (red diamonds) against the same reference standards. Reference standards were: the pathology of the explanted liver in case of transplantation; the histology of resected focal liver lesions, or the histology of biopsied focal liver lesion(s) with a follow‐up period of at least six months, typical characteristics on cross‐sectional multiphasic contrast CT or MRI, with a follow‐up period of at least six months.

The solid circles represent the summary estimates of sensitivity and specificity for surveillance setting (black circle) and clinical suspect setting (red circle).

The dotted lines represent the 95% confidence regions. The dashed lines represent the 95% prediction regions.

Forest plots of sensitivity and specificity of ultrasound against different reference standards.in 39 studies. Reference standards were: the pathology of the explanted liver in case of transplantation.;the histology of resected focal liver lesions, or the histology of biopsied focal liver lesion(s) with a follow‐up period of at least six months, typical characteristics on cross‐sectional multiphasic contrast CT or MRI, with a follow‐up period of at least six months. TP = true positive; FP = false positive; FN = false negative; TN = true negative. Values between brackets are the 95% confidence intervals (CIs) of sensitivity and specificity. The figure shows the estimated sensitivity and specificity of the study (blue square) and its 95% CI (black horizontal line).The individual studies are ordered by study design (cross‐sectional or case‐control), study setting (clinical setting or surveillance program) and increasing sensitivity.

Figure 7

Summary receiver operating characteristic (ROC) plot comparing, in 39 studies, ultrasound and different reference standards. Reference standards were: the pathology of the explanted liver in case of transplantation; the histology of resected focal liver lesions, or the histology of biopsied focal liver lesion(s) with a follow‐up period of at least six months; or typical characteristics on cross‐sectional multiphasic contrast CT or MRI, with a follow‐up period of at least six months. The solid circle represents the summary estimate of sensitivity and specificity. The dotted lines represent the 95% confidence regions. The dashed lines represent the 95% prediction regions.

Figure 8

Forest plots of sensitivity and specificity of the combination of alpha‐foetoprotein and ultrasound against different reference standards in 8 studies ordered by increasing sensitivity. Reference standards were: the pathology of the explanted liver in case of transplantation; the histology of resected focal liver lesion(s), or the histology of biopsied focal liver lesions with a follow‐up period of at least six months; or typical characteristics on cross‐sectional multiphasic contrast CT or MRI, with a follow‐up period of at least six months. TP = true positive; FP = false positive; FN = false negative; TN = true negative. Values between brackets are the 95% confidence intervals (CIs) of sensitivity and specificity. The figure shows the estimated sensitivity and specificity of each study (blue square) and its 95% CI (black horizontal line).

Figure 9

Summary receiver operating characteristic (ROC) plot showing the indirect comparison (between study) of the results of two different index tests, ultrasound (black circles) and alpha‐foetoprotein with a cut‐off value around 20 ng/mL (red diamonds), against the same reference standards (the pathology of the explanted liver in case of transplantation; the histology of resected focal liver lesion(s), or the histology of biopsied focal liver lesion(s) with a follow‐up period of at least six months; or typical characteristics on cross‐sectional multiphasic contrast CT or MRI, with a follow‐up period of at least six months). The solid circles represent the summary estimates of sensitivity and specificity for ultrasound (black circle) and AFP cut‐off 20 ng/mL (red circle). The dotted lines represent the 95% confidence regions.

Figure 10

Summary receiver operating characteristic (ROC) plot showing the direct comparison (within study) of the results of two different index tests, alpha‐foetoprotein with a cut‐off value around 20 ng/mL (black circles) and ultrasound (red diamonds), in the same participants against the same reference standards (the pathology of the explanted liver in case of transplantation; the histology of resected focal liver lesion(s), or the histology of biopsied focal liver lesion(s) with a follow‐up period of at least six months; or typical characteristics on cross‐sectional multiphasic contrast CT or MRI, with a follow‐up period of at least six months). The solid circles represent the summary estimates of sensitivity and specificity for AFP with a cut‐off around 20 ng/mL (black circle) and for US in the direct comparison (red circle). The dotted lines represent the 95% confidence regions.

Figure 11

Summary receiver operating characteristic (ROC) plot showing the direct comparison (within study) of the results of two different index tests, ultrasound (black circles) and the combination of alpha‐foetoprotein with a cut‐off value around 20 ng/mL and ultrasound (red diamonds), in the same participants against the same reference standards. Reference standards were: the pathology of the explanted liver in case of transplantation; the histology of resected focal liver lesions, or the histology of biopsied focal liver lesion(s) with a follow‐up period of at least six months; or typical characteristics on cross‐sectional multiphasic contrast CT or MRI, with a follow‐up period of at least six months. The solid circles represent the summary estimates of sensitivity and specificity for ultrasound (black circle) and US + AFP cut‐off 20 ng/mL (red circle). The dotted lines represent the 95% confidence regions.

Figure 12

Risk of bias and applicability concerns summary: review authors' judgements about each domain for each included study

Figure 13

Forest plots of sensitivity and specificity of alpha‐foetoprotein with any cut‐off value against different reference standards in 326 studies ordered by increasing sensitivity. Reference standards were: the pathology of the explanted liver in case of transplantation; the histology of resected focal liver lesions, or the histology of biopsied focal liver lesions with a follow‐up period of at least six months; or typical characteristics on cross‐sectional multiphasic contrast CT or MRI, with a follow‐up period of at least six months. TP = true positive; FP = false positive; FN = false negative; TN = true negative. Values between brackets are the 95% confidence intervals (CIs) of sensitivity and specificity. The figure shows the estimated sensitivity and specificity of each study (blue square) and its 95% CI (black horizontal line).

Figure 14

Forest plots of sensitivity and specificity of alpha‐foetoprotein with a cut‐off value around 200 ng/mL against different reference standards in 56 studies ordered by increasing sensitivity. Reference standards were: the pathology of the explanted liver in case of transplantation; the histology of resected focal liver lesions, or the histology of biopsied focal liver lesions with a follow‐up period of at least six months; or typical characteristics on cross‐sectional multiphasic contrast CT or MRI, with a follow‐up period of at least six months. TP = true positive; FP = false positive; FN = false negative; TN = true negative. Values between brackets are the 95% confidence intervals (CIs) of sensitivity and specificity. The figure shows the estimated sensitivity and specificity of each study (blue square) and its 95% CI (black horizontal line).

Figure 15

Summary receiver operating characteristic (ROC) plot comparing, in 8 studies, the combination of alpha‐foetoprotein and ultrasound against different reference standards. Reference standards were: the pathology of the explanted liver in case of transplantation; the histology of resected focal liver lesions, or the histology of biopsied focal liver lesions with a follow‐up period of at least six months; or typical characteristics on cross‐sectional multiphasic contrast CT or MRI, with a follow‐up period of at least six months.

Figure 16

Alpha‐foetoprotein

Test 1

Ultrasound

Test 2

US + AFP

Test 3

AFP cut‐off around 20 ng/mL

Test 4

AFP cut‐off around 200 ng/mL

Test 5

US + AFP cut‐off 20 ng/mL

Test 6

US for direct comparison AFP 20 ng/mL

Test 7

Summary of findings 1. 'Summary of findings' table: diagnostic accuracy of AFP, US, and combination of AFP and US for the diagnosis of HCC

Review question: what is the diagnostic accuracy of alpha‐foetoprotein (AFP), abdominal ultrasound (US), or of the combination of AFP and abdominal US for the diagnosis of hepatocellular carcinoma (HCC) in adults with chronic liver disease?

Population: adults with chronic liver disease

Setting: clinical setting (secondary or tertiary care setting) or surveillance programs

Study design: prospective and retrospective cross‐sectional and case‐control studies

Index tests

Serum alpha‐foetoprotein (AFP) measurement with a cut‐off value of 20 ng/mL

Serum alpha‐foetoprotein (AFP) measurement with a cut‐off value of 200 ng/mL

Abdominal ultrasound (US)

Combination of serum alpha‐foetoprotein (AFP) measurement with a cut‐off value of 20 ng/mL and abdominal ultrasound (US)

Target condition: HCC of any size, any stage

Reference standards:

the pathology of the explanted liver in case of transplantation; the histology of resected focal liver lesion(s), or the histology of resected or biopsied focal liver lesion(s) with a follow‐up period of at least six months to exclude the presence of focal lesions not detected by the index test and synchronous lesions from the parenchyma surrounding the resected or biopsied area;

typical characteristics on cross‐sectional multiphasic contrast computed tomography (CT) or magnetic resonance imaging (MRI), with a follow‐up period of at least six months in order to allow the confirmation of an initial negative result on CT or on MRI.

Limitations in the evidence ‐ Risk of bias/Applicability

Index test: serum alpha‐foetoprotein (AFP) measurement cut‐off value 20 ng/mL

‐ Participant selection: high/unclear risk of bias in 141 studies (96%); high concern in 115 studies (78%)

‐ Index tests: high/unclear risk of bias in 73 studies (50%); high concern in no study

‐ Reference standard: high/unclear risk of bias in 105 studies (71%); high concern in 33 studies (22%)

‐ Flow and timing: high risk of bias in 143 studies (97%)

Index test: serum alpha‐foetoprotein (AFP) measurement cut‐off value 200 ng/mL

‐ Participant selection: high/unclear risk of bias in 48 studies (86%); high concern in 47 studies (84%)

‐ Index tests: high/unclear risk of bias in 54 studies (96%); high concern in no study

‐ Reference standard: high/unclear risk of bias in 39 studies (70%); high concern in 13 studies (23%)

‐ Flow and timing: high risk of bias in 55 studies (98%)

Index test: abdominal ultrasound

‐ Participant selection: high/unclear risk of bias in 23 studies (59%); high concern in 22 studies (56%)

‐ Index tests: high/unclear risk of bias in 15 studies (38%); high concern in no study

‐ Reference standard: high/unclear risk of bias in 27 studies (69%); high concern in 13 studies (33%)

‐ Flow and timing: high risk of bias in 27 studies (69%)

Index test: combination of serum alpha‐foetoprotein (AFP) measurement with a cut‐off value of 20 ng/mL and abdominal ultrasound

‐ Participant selection: high/unclear risk of bias in 2 studies (33%); high concern in 2 studies (33%)

‐ Index tests: high/unclear risk of bias in 2 studies (33%); high concern in no study

‐ Reference standard: high/unclear risk of bias in 4 studies (67%); high concern in 1 study (17%)

‐ Flow and timing: high risk of bias in 6 studies (100%)

Findings

Implications in a hypothetical cohort of 1000 people:

True positives will appropriately receive further necessary testing with CT, MRI, or contrast‐enhanced ultrasound (CEUS), and possibly treatment.
False negatives will be misdiagnosed and will not receive appropriate treatment.
True negatives will appropriately not undergo unnecessary further testing with CT, MRI, CEUS, or biopsy.
False positives will inappropriately undergo unnecessary further testing with CT, MRI, CEUS, or biopsy.

| Index test | Number of studies (participants) | Sensitivity (95% CI) | Specificity (95% CI) | Assumed prevalence of HCC a | True positives | False negatives | True negatives | False positives | Certainty of the evidence |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| AFP (cut‐off 20 ng/mL) | 147 (52144) | 59.8% (57.9% to 61.7%) | 84.4% (82.3% to 86.3%) | 5% | 30 | 20 | 802 | 148 | very low b ⨁◯◯◯ |
| | | | | 30% | 179 | 121 | 591 | 109 | |
| AFP (cut‐off 200 ng/mL) | 56 (20452) | 36% (31% to 41%) | 99% (98% to 100%) | 5% | 18 | 32 | 940 | 10 | very low c ⨁◯◯◯ |
| | | | | 30% | 108 | 192 | 693 | 7 | |
| US | 39 (18792) | 72% (63% to 79%) | 94% (91% to 96%) | 5% | 36 | 14 | 893 | 57 | very low d ⨁◯◯◯ |
| | | | | 30% | 216 | 84 | 658 | 42 | |
| Combination of AFP (cut‐off 20 ng/mL) and US | 6 (5044) | 96% (88% to 98%) | 85% (73% to 93%) | 5% | 48 | 2 | 807 | 143 | low e ⨁⨁◯◯ |
| | | | | 30% | 288 | 12 | 595 | 105 | |

a For illustration, we chose two values of HCC prevalence: 5% for a population at low risk (compensated advanced chronic liver disease and chronic viral hepatitis; Lok 2009) and 30% for a population at high risk, the median prevalence in the included cross‐sectional studies conducted in clinical cohorts.

b Downgraded by three levels: risk of bias, indirectness, and inconsistency. Risk of bias downgraded one level because all studies were judged at high risk of bias; indirectness downgraded one level as we considered most studies to have concern regarding applicability, mainly in relation to the population (including disease spectrum); inconsistency downgraded one level as the estimates for individual studies ranged from 24% to 90% and we could not explain the heterogeneity by study quality or other factors.

c Downgraded by three levels: risk of bias, indirectness, and inconsistency. Risk of bias downgraded one level because all studies were judged at high risk of bias; indirectness downgraded one level as we considered most studies to have concern regarding applicability, mainly in relation to the population (including disease spectrum); inconsistency downgraded one level as the estimates for individual studies ranged from 4% to 83% and we could not explain the heterogeneity by study quality or other factors.

d Downgraded by three levels: risk of bias, indirectness, and inconsistency. Risk of bias downgraded one level because all studies were judged at high risk of bias; indirectness downgraded one level as we considered most studies to have concern regarding applicability, mainly in relation to the population (including disease spectrum); inconsistency downgraded one level as the estimates for individual studies ranged from 28% to 100% and we could not explain the heterogeneity by study quality or other factors.

e Downgraded by two levels: risk of bias and indirectness. Risk of bias downgraded one level because all studies were judged at high risk of bias; indirectness downgraded one level as we considered most studies to have concern regarding applicability, mainly in relation to the population (including disease spectrum).

GRADE certainty of the evidence

High: we are very confident that the true effect lies close to that of the estimate of the effect.
Moderate: we are moderately confident in the effect estimate: the true effect is likely to be close to the estimate of the effect, but there is a possibility that it is substantially different.
Low: our confidence in the effect estimate is limited: the true effect may be substantially different from the estimate of the effect.
Very low: we have very little confidence in the effect estimate: the true effect is likely to be substantially different from the estimate of effect.

The results presented in this table should not be interpreted in isolation from results of the individual included studies contributing to each summary test accuracy measure.
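
The per‐1000 figures in this table follow from sensitivity, specificity, and the assumed prevalence. The sketch below is one way to reproduce them; the function names are hypothetical, and the rounding convention (false positives rounded half up, true negatives taken as the remainder) is an assumption chosen because it matches the reported counts, not a method stated in the review.

```python
from decimal import Decimal, ROUND_HALF_UP


def round_half_up(value: Decimal) -> int:
    """Round to the nearest integer, with .5 always rounded up."""
    return int(value.quantize(Decimal("1"), rounding=ROUND_HALF_UP))


def cohort_outcomes(sens_pct, spec_pct, prev_pct, cohort_size=1000):
    """Expected test outcomes in a hypothetical cohort.

    All three inputs are percentages (e.g. 59.8 for 59.8%).
    Decimal arithmetic avoids binary floating-point ties such as 807.5.
    """
    sens, spec, prev = (Decimal(str(v)) / 100 for v in (sens_pct, spec_pct, prev_pct))
    with_hcc = round_half_up(cohort_size * prev)
    without_hcc = cohort_size - with_hcc
    tp = round_half_up(with_hcc * sens)            # people with HCC who test positive
    fp = round_half_up(without_hcc * (1 - spec))   # people without HCC who test positive
    # Assumption: FN and TN are taken as remainders so each group sums exactly.
    return {"TP": tp, "FN": with_hcc - tp, "TN": without_hcc - fp, "FP": fp}


# Summary estimates (sensitivity %, specificity %) as reported in the table.
tests = {
    "AFP (cut-off 20 ng/mL)": (59.8, 84.4),
    "AFP (cut-off 200 ng/mL)": (36, 99),
    "US": (72, 94),
    "AFP (cut-off 20 ng/mL) + US": (96, 85),
}

for name, (sens, spec) in tests.items():
    for prevalence in (5, 30):
        print(name, f"{prevalence}%", cohort_outcomes(sens, spec, prevalence))
```

At 5% prevalence, for example, the combination of AFP and US yields 48 true positives and 2 false negatives among the 50 people with HCC, and 143 false positives among the 950 people without it.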

Summary of findings 2. 'Summary of findings' table: direct comparison of US, and combination of AFP and US

Review question: what is the diagnostic accuracy of the combination of alpha‐foetoprotein (AFP) and abdominal ultrasound (US) compared to US for the diagnosis of hepatocellular carcinoma (HCC) in adults with chronic liver disease?

Population: adults with chronic liver disease

Setting: clinical setting (secondary or tertiary care setting) or surveillance programs

Study design: prospective and retrospective cross‐sectional studies

Index tests: abdominal ultrasound; combination of serum alpha‐foetoprotein (AFP) measurement with a cut‐off value of 20 ng/mL and abdominal ultrasound

Target condition: HCC of any size, any stage

Reference standards: the pathology of the explanted liver in case of transplantation; the histology of resected focal liver lesion(s), or the histology of resected or biopsied focal liver lesion(s) with a follow‐up period of at least six months to exclude the presence of focal lesions not detected by the index test and synchronous lesions from the parenchyma surrounding the resected or biopsied area; typical characteristics on cross‐sectional multiphasic contrast computed tomography (CT) or magnetic resonance imaging (MRI), with a follow‐up period of at least six months in order to allow the confirmation of an initial negative result on CT or on MRI.

Limitations in the evidence

Risk of bias/ Applicability

‐ Participant selection: high/unclear risk of bias in 2 studies (33%)/ high concern 2 studies (33%)

‐ Index tests: high/unclear risk of bias in 2 studies (33%)/ high concern no study

‐ Reference standard: high/unclear risk of bias in 4 studies (67%)/ high concern 1 study (17%)

‐ Flow and timing: high risk of bias in 6 studies (100%)

Findings

Implications in a hypothetical cohort of 1000 people:

True positives will appropriately receive further necessary testing with CT, MRI, or contrast‐enhanced ultrasound (CEUS), and possibly treatment.
False negatives will be misdiagnosed and will not receive appropriate treatment.
True negatives will appropriately not undergo unnecessary further testing with CT, MRI, CEUS, or biopsy.
False positives will inappropriately undergo unnecessary further testing with CT, MRI, CEUS, or biopsy.

| Index test | Number of studies (participants) | Sensitivity (95% CI) | Relative sensitivity (95% CI) | P value | Specificity (95% CI) | Relative specificity (95% CI) | P value | Assumed prevalence of HCC a | True positives | False negatives | True negatives | False positives | Certainty of the evidence |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| US | 6 (5044) | 76% (56% to 89%) | 1.28 (1.03 to 1.539) | P = 0.014 | 93% (80% to 96%) | 0.94 (0.87 to 1.01) | P = 0.102 | 5% | 38 | 12 | 883 | 67 | low b ⨁⨁◯◯ |
| | | | | | | | | 30% | 228 | 72 | 651 | 49 | |
| Combination of AFP (cut‐off 20 ng/mL) and US | | 96% (88% to 98%) | | | 85% (73% to 93%) | | | 5% | 48 | 2 | 807 | 143 | |
| | | | | | | | | 30% | 288 | 12 | 595 | 105 | |

a For illustration, we chose two values of HCC prevalence: 5% for a population at low risk (compensated advanced chronic liver disease and chronic viral hepatitis; Lok 2009) and 30% for a population at high risk, the median prevalence in the included cross‐sectional studies conducted in clinical cohorts.

b Downgraded by two levels: risk of bias and indirectness. Risk of bias downgraded one level because all studies were judged at high risk of bias; indirectness downgraded one level as we considered most studies to have concern regarding applicability, mainly in relation to the population (including disease spectrum).

GRADE certainty of the evidence

High: we are very confident that the true effect lies close to that of the estimate of the effect.
Moderate: we are moderately confident in the effect estimate: the true effect is likely to be close to the estimate of the effect, but there is a possibility that it is substantially different.
Low: our confidence in the effect estimate is limited: the true effect may be substantially different from the estimate of the effect.
Very low: we have very little confidence in the effect estimate: the true effect is likely to be substantially different from the estimate of effect.

The results presented in this table should not be interpreted in isolation from results of the individual included studies contributing to each summary test accuracy measure.
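
The trade‐off between the two strategies can be read from the per‐1000 counts in this table by subtracting one row from the other. The following is a minimal sketch of that subtraction; the helper name and dictionary layout are hypothetical, and the counts are copied from the table rather than recomputed.

```python
def per_1000_difference(reference, comparator):
    """Comparator minus reference for each outcome count per 1000 people."""
    return {key: comparator[key] - reference[key] for key in reference}


# Counts per 1000 people, keyed by assumed HCC prevalence (%), as reported in the table.
us_alone = {
    5: {"TP": 38, "FN": 12, "TN": 883, "FP": 67},
    30: {"TP": 228, "FN": 72, "TN": 651, "FP": 49},
}
us_plus_afp = {
    5: {"TP": 48, "FN": 2, "TN": 807, "FP": 143},
    30: {"TP": 288, "FN": 12, "TN": 595, "FP": 105},
}

for prevalence in (5, 30):
    diff = per_1000_difference(us_alone[prevalence], us_plus_afp[prevalence])
    # Extra false positives incurred for each additional true positive.
    ratio = diff["FP"] / diff["TP"]
    print(f"prevalence {prevalence}%: {diff}, extra FP per extra TP = {ratio:.2f}")
```

Under these figures, adding AFP to US at 5% prevalence finds 10 more cases per 1000 people at the cost of 76 more false positives; at 30% prevalence it finds 60 more cases with 56 more false positives.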

Table 1. Guideline recommendations for surveillance for hepatocellular carcinoma

| GUIDELINE | INDICATION FOR SURVEILLANCE | TEST | INTERVAL |
| --- | --- | --- | --- |
| American Association for the Study of Liver Disease (AASLD; Heimbach 2018) | Cirrhosis | Abdominal ultrasound alone or plus AFP | 6 months |
| European Association for the Study of the Liver with European Organization for Research and Treatment of Cancer (EASL‐EORTC; EASL‐EORTC 2012; EASL 2018) | Cirrhosis in Child Pugh stages A and B; cirrhosis in Child C stage awaiting liver transplantation; non‐cirrhotic hepatitis B virus (HBV) carriers with active hepatitis or family history of HCC; non‐cirrhotic chronic hepatitis C with advanced liver fibrosis stage 3 (F3) | Abdominal ultrasound | 6 months; 3 to 4 months for people with a nodule less than 1 cm or after resection or loco‐regional therapies |
| Asian Pacific Association for the Study of the Liver (APASL; Omata 2017) | Cirrhosis and chronic HBV infection at risk of HCC | Abdominal ultrasound with serum AFP | 6 months |

AFP: alpha‐foetoprotein; HCC: hepatocellular carcinoma

Table 2. Heterogeneity and sensitivity analyses for alpha‐foetoprotein (AFP) cut‐off value around 20 ng/mL

| Subgroup | N of studies | Sensitivity (95% CI) | Specificity (95% CI) | P value |
| --- | --- | --- | --- | --- |
| All | 147 | 60% (58% to 62%) | 84% (82% to 86%) | |
| case‐control | 111 | 60% (58% to 62%) | 83% (81% to 85%) | 0.133 |
| cross‐sectional | 36 | 57% (52% to 62%) | 88% (84% to 91%) | |
| prospective | 29 | 59% (54% to 63%) | 86% (81% to 90%) | 0.828 |
| retrospective | 118 | 60% (58% to 62%) | 84% (82% to 86%) | |
| before 2000 | 22 | 65% (59% to 71%) | 85% (81% to 88%) | 0.264 |
| after 2000 | 125 | 59% (57% to 61%) | 84% (82% to 86%) | |
| cirrhosis > 10% | 94 | 59% (56% to 61%) | 85% (82% to 87%) | § |
| cirrhosis < 10% | 2 | 61% (51% to 70%)*; 57% (50% to 63%)** | 87% (84% to 90%)*; 83% (74% to 90%)** | |
| Europe | 22 | 60% (54% to 65%) | 87% (83% to 90%) | 0.447 |
| America | 19 | 56% (50% to 61%) | 89% (85% to 92%) | |
| Asia | 98 | 60% (58% to 62%) | 83% (80% to 86%) | |
| Africa | 7 | 68% (54% to 80%) | 81% (71% to 89%) | |
| HCC prevalence < 10% | 16 | 54% (47% to 62%) | 89% (84% to 93%) | 0.147 |
| HCC prevalence > 10% | 131 | 60% (58% to 62%) | 84% (81% to 86%) | |
| clinical suspect | 117 | 61% (59% to 63%) | 83% (80% to 85%) | 0.005 |
| surveillance | 30 | 54% (49% to 60%) | 89% (86% to 92%) | |
| HCC resectable < 20% | 4 | 61% (48% to 72%) | 82% (64% to 92%) | 0.909 |
| HCC resectable > 20% | 25 | 56% (51% to 61%) | 87% (81% to 91%) | |
| biopsy | 22 | 63% (58% to 68%) | 82% (77% to 87%) | 0.832 |
| other reference standard | 124 | 59% (57% to 61%) | 85% (82% to 87%) | |
| viral < 80% | 35 | 59% (55% to 63%) | 87% (83% to 90%) | 0.694 |
| viral > 80% | 84 | 59% (57% to 62%) | 84% (81% to 86%) | |
| Child A < 50% | 17 | 59% (52% to 67%) | 86% (82% to 89%) | 0.746 |
| Child A > 50% | 34 | 59% (55% to 62%) | 83% (77% to 87%) | |
| Full text | 142 | 60% (58% to 62%) | 84% (82% to 86%) | |

* Hallager 2018; ** Liu 2017

§ Model failed to converge

HCC: hepatocellular carcinoma

Table 3. Heterogeneity and sensitivity analyses for alpha‐foetoprotein (AFP) cut‐off value around 200 ng/mL

| Subgroup | N of studies | Sensitivity (95% CI) | Specificity (95% CI) | P value |
| --- | --- | --- | --- | --- |
| All | 56 | 36% (31% to 41%) | 99% (98% to 100%) | |
| case‐control | 42 | 35% (30% to 40%) | 99% (98% to 100%) | 0.874 |
| cross‐sectional | 14 | 39% (28% to 51%) | 99% (98% to 100%) | |
| prospective | 9 | 42% (27% to 58%) | 99% (97% to 100%) | 0.713 |
| retrospective | 47 | 35% (30% to 40%) | 99% (98% to 100%) | |
| before 2000 | 9 | 28% (15% to 47%) | 100% (98% to 100%) | 0.336 |
| after 2000 | 47 | 37% (33% to 42%) | 99% (98% to 100%) | |
| cirrhosis > 10% | 41 | 40% (28% to 40%) | 99% (99% to 100%) | |
| cirrhosis < 10% | 0 | | | |
| Europe | 8 | 40% (28% to 54%) | 99% (98% to 100%) | 0.020 |
| America | 9 | 27% (21% to 35%) | 100% (98% to 100%) | |
| Asia | 31 | 34% (29% to 40%) | 98% (97% to 99%) | |
| Africa | 8 | 53% (39% to 66%) | 99% (97% to 100%) | |
| HCC prevalence < 10% | 5 | 30% (16% to 48%) | 100% (95% to 100%) | 0.805 |
| HCC prevalence > 10% | 51 | 36% (32% to 41%) | 99% (98% to 99%) | |
| clinical suspect | 49 | 36% (32% to 41%) | 99% (98% to 100%) | 0.995 |
| surveillance | 7 | 34% (18% to 54%) | 99% (96% to 100%) | |
| HCC resectable < 20% | 2 | 42% (8% to 85%) | 99% (82% to 100%) | 0.931 |
| HCC resectable > 20% | 8 | 27% (12% to 50%) | 99% (97% to 100%) | |
| biopsy | 9 | 31% (24% to 39%) | 100% (97% to 100%) | 0.140 |
| other reference standard | 46 | 37% (32% to 43%) | 99% (98% to 100%) | |
| viral < 80% | 11 | 37% (29% to 46%) | 99% (97% to 100%) | 0.705 |
| viral > 80% | 30 | 32% (26% to 39%) | 98% (98% to 100%) | |
| Child A < 50% | 13 | 42% (31% to 54%) | 99% (99% to 100%) | 0.008 |
| Child A > 50% | 11 | 24% (19% to 29%) | 99% (97% to 100%) | |
| Full text | 54 | 36% (31% to 41%) | 99% (98% to 100%) | |

HCC: hepatocellular carcinoma

Table 4. Heterogeneity and sensitivity analyses for ultrasonography (US)

| Subgroup | N of studies | Sensitivity (95% CI) | Specificity (95% CI) | P value |
| --- | --- | --- | --- | --- |
| All | 39 | 72% (63% to 79%) | 94% (91% to 96%) | |
| case‐control | 3 | 82% (64% to 92%) | 87% (77% to 93%) | 0.737 |
| cross‐sectional | 36 | 71% (62% to 79%) | 95% (92% to 97%) | |
| prospective | 18 | 72% (60% to 81%) | 94% (90% to 96%) | 1.000 |
| retrospective | 21 | 72% (58% to 82%) | 94% (89% to 97%) | |
| before 2000 | 16 | 79% (70% to 86%) | 96% (92% to 98%) | 0.091 |
| after 2000 | 23 | 67% (54% to 78%) | 93% (88% to 96%) | |
| cirrhosis > 10% | 33 | 70% (60% to 78%) | 94% (91% to 96%) | |
| cirrhosis < 10% | 0 | | | |
| Europe | 12 | 82% (73% to 89%) | 94% (90% to 97%) | 0.186 |
| America | 13 | 57% (45% to 68%) | 94% (89% to 96%) | |
| Asia | 13 | 76% (58% to 88%) | 94% (85% to 98%) | |
| Africa | 0 | | | |
| HCC prevalence < 10% | 15 | 69% (54% to 81%) | 96% (92% to 98%) | 0.660 |
| HCC prevalence > 10% | 24 | 74% (62% to 82%) | 93% (88% to 96%) | |
| clinical suspect | 19 | 74% (61% to 84%) | 93% (89% to 96%) | 0.898 |
| surveillance | 20 | 69% (57% to 79%) | 95% (91% to 98%) | |
| HCC resectable < 20% | 4 | 90% (75% to 97%) | 82% (60% to 94%) | 0.088 |
| HCC resectable > 20% | 16 | 66% (52% to 77%) | 95% (91% to 97%) | |
| biopsy | 7 | 81% (64% to 91%) | 90% (84% to 94%) | 0.379 |
| OLT | 10 | 55% (41% to 69%) | 97% (93% to 96%) | |
| other reference standard | 22 | 76% (64% to 84%) | 94% (89% to 97%) | |
| viral < 80% | 17 | 70% (57% to 80%) | 94% (90% to 96%) | 0.777 |
| viral > 80% | 9 | 79% (58% to 91%) | 91% (79% to 97%) | |
| Child A < 50% | 5 | 50% (33% to 68%) | 91% (83% to 95%) | 0.346 |
| Child A > 50% | 9 | 74% (52% to 88%) | 93% (82% to 98%) | |
| US positivity criteria predefined | 25 | 74% (63% to 83%) | 93% (89% to 96%) | |
| Uninterpretable test results reported | 3 | 80% (71% to 81%) | 76% (71% to 81%) | |
| Full text | 38 | 72% (64% to 80%) | 94% (91% to 96%) | |

OLT: orthotopic liver transplantation; HCC: hepatocellular carcinoma

Table 5. Heterogeneity and sensitivity analyses for the combination of alpha‐foetoprotein (AFP) (cut‐off 20 ng/mL) and ultrasonography (US)

| Subgroup | N of studies | Sensitivity (95% CI) | Specificity (95% CI) | P value |
|---|---|---|---|---|
| All | 6 | 96% (88% to 98%) | 85% (73% to 93%) | |
| case‐control | 0 | | | |
| cross‐sectional | 6 | 96% (88% to 98%) | 85% (73% to 93%) | |
| prospective | 3 | 91% (84% to 95%) | 91% (75% to 97%) | 0.578 |
| retrospective | 3 | 97% (83% to 99%) | 77% (66% to 85%) | |
| before 2000 | 2 | 95% (44% to 100%) | 81% (69% to 89%) | 0.703 |
| after 2000 | 4 | 96% (89% to 99%) | 87% (69% to 95%) | |
| cirrhosis > 10% | 6 | 96% (88% to 98%) | 85% (73% to 93%) | |
| cirrhosis < 10% | 0 | | | |
| Europe | 1 | 100% (83% to 100%) | 74% (67% to 80%) | § |
| America | 2 | 79% (54% to 94%); 90% (77% to 97%) | 87% (79% to 94%); 83% (79% to 87%) | |
| Asia | 3 | 99% (98% to 100%); 91% (81% to 96%); 94% (71% to 100%) | 68% (66% to 71%); 98% (96% to 99%); 82% (81% to 84%) | |
| Africa | 0 | | | |
| HCC prevalence < 10% | 3 | 96% (78% to 99%) | 80% (76% to 84%) | 0.100 |
| HCC prevalence > 10% | 3 | 95% (79% to 99%) | 90% (68% to 97%) | |
| clinical suspect | 1 | 79% (54% to 94%) | 87% (78% to 93%) | 0.289 |
| surveillance | 5 | 97% (92% to 99%) | 85% (70% to 93%) | |
| HCC resectable < 20% | 0 | | | |
| HCC resectable > 20% | 4 | 95% (84% to 99%) | 88% (72% to 96%) | |
| biopsy | 1 | 99% (98% to 100%) | 68% (66% to 71%) | § |
| OLT | 1 | 79% (54% to 94%) | 87% (78% to 93%) | |
| other reference standard | 4 | 93% (86% to 97%) | 88% (72% to 95%) | |
| viral < 80% | 1 | 79% (54% to 94%) | 87% (79% to 94%) | * |
| viral > 80% | 3 | 99% (98% to 100%); 91% (81% to 96%); 94% (71% to 100%) | 68% (66% to 71%); 98% (96% to 99%); 82% (81% to 84%) | |
| Child A < 50% | 1 | 100% (83% to 100%) | 74% (67% to 80%) | |
| Child A > 50% | 2 | 99% (98% to 100%); 91% (81% to 96%) | 68% (66% to 71%); 98% (96% to 99%) | |
| US positivity criteria predefined | 2 | 90% (77% to 97%); 94% (71% to 100%) | 83% (79% to 87%); 82% (81% to 84%) | § |
| Full text | 6 | 96% (88% to 98%) | 85% (73% to 93%) | |

* Sparse and missing data. Meta‐analysis not conducted

§ Model failed to converge

OLT: orthotopic liver transplantation; HCC: hepatocellular carcinoma; US: ultrasonography
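
To make the pooled "All" estimates in Tables 3 to 5 easier to compare, the following minimal sketch converts a sensitivity and specificity pair into expected counts per 1000 people tested. The 10% HCC prevalence and the cohort size are illustrative assumptions, not figures from the review, and the function name `expected_counts` is hypothetical. The sketch uses point estimates only and ignores the confidence intervals.

```python
def expected_counts(sensitivity, specificity, prevalence, cohort=1000):
    """Expected 2x2 counts for a test applied to a hypothetical cohort.

    prevalence and cohort are assumptions chosen for illustration only.
    """
    diseased = cohort * prevalence      # people with HCC
    healthy = cohort - diseased         # people without HCC
    tp = sensitivity * diseased         # HCC correctly detected
    fn = diseased - tp                  # HCC missed
    tn = specificity * healthy          # correctly ruled out
    fp = healthy - tn                   # false alarms
    return {"TP": round(tp), "FN": round(fn), "FP": round(fp), "TN": round(tn)}


# Pooled "All" rows from Tables 3, 4 and 5 (point estimates only).
pooled = {
    "AFP, cut-off around 200 ng/mL": (0.36, 0.99),
    "US": (0.72, 0.94),
    "US + AFP (cut-off 20 ng/mL)": (0.96, 0.85),
}

ASSUMED_PREVALENCE = 0.10  # hypothetical; not taken from the review

for test, (sens, spec) in pooled.items():
    print(test, expected_counts(sens, spec, ASSUMED_PREVALENCE))
```

Under these assumptions, AFP at the higher cut‐off would miss 64 of 100 HCC cases with 9 false positives, US alone would miss 28 with 54 false positives, and the combination would miss 4 with 135 false positives. A different assumed prevalence changes these counts.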

Table Tests. Data tables by test

| Test | No. of studies | No. of participants |
|---|---|---|
| 1 Alpha‐foetoprotein | 326 | 144570 |
| 2 Ultrasound | 39 | 18792 |
| 3 US + AFP | 8 | 5454 |
| 4 AFP cut‐off around 20 ng/mL | 147 | 52144 |
| 5 AFP cut‐off around 200 ng/mL | 56 | 20452 |
| 6 US + AFP cut‐off 20 ng/mL | 6 | 5044 |
| 7 US for direct comparison AFP 20 ng/mL | 11 | 6674 |
