
Computed tomography for diagnosis of acute appendicitis in adults


Background

Diagnosing acute appendicitis (appendicitis) based on clinical evaluation, blood testing, and urinalysis can be difficult. Therefore, in persons with suspected appendicitis, abdominopelvic computed tomography (CT) is often used as an add‐on test after the initial evaluation to reduce the remaining diagnostic uncertainty. The aim of using CT is to help the clinician discriminate between persons who need surgery with appendicectomy and persons who do not.

Objectives

Primary objective

The primary objective was to evaluate the accuracy of CT for diagnosing appendicitis in adults with suspected appendicitis.

Secondary objectives

The secondary objectives were to compare the accuracy of contrast‐enhanced versus non‐contrast‐enhanced CT, to compare the accuracy of low‐dose versus standard‐dose CT, and to explore the influence of CT‐scanner generation, radiologist experience, degree of clinical suspicion of appendicitis, and aspects of methodological quality on diagnostic accuracy.

Search methods

We searched MEDLINE, Embase, and Science Citation Index up to 16 June 2017. We also searched reference lists. We did not exclude studies on the basis of publication status or language.

Selection criteria

We included prospective studies that compared the results of CT against the results of a reference standard in adults (> 14 years of age) with suspected appendicitis. We excluded studies that recruited only pregnant women; studies in persons with abdominal pain at any location and no particular suspicion of appendicitis; studies in which all participants had undergone ultrasonography (US) before CT and the decision to perform CT depended on the US result; studies that used a case‐control design; studies with fewer than 10 participants; and studies that did not report the numbers of true positives, false positives, false negatives, and true negatives. Two review authors independently screened and selected studies for inclusion.

Data collection and analysis

Two review authors independently collected the data from each study and evaluated methodological quality according to the Quality Assessment of Studies of Diagnostic Accuracy ‐ Revised (QUADAS‐2) tool. We used the bivariate random‐effects model to obtain summary estimates of sensitivity and specificity.
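As an illustration of the per‐study inputs to such a meta‐analysis, the following minimal sketch computes sensitivity and specificity from the four cells of one study's 2x2 table. The counts are hypothetical and are not taken from any included study. The logit transform is shown only because bivariate random‐effects models of this kind are usually fitted on the logit scale; the review text does not describe that detail, and the sketch does not fit the model itself.

```python
import math


def sensitivity_specificity(tp, fp, fn, tn):
    """Return (sensitivity, specificity) from the cells of a 2x2 table."""
    sensitivity = tp / (tp + fn)  # share of persons with appendicitis who test positive
    specificity = tn / (tn + fp)  # share of persons without appendicitis who test negative
    return sensitivity, specificity


def logit(p):
    """Log-odds of a proportion p, with 0 < p < 1."""
    return math.log(p / (1 - p))


# Hypothetical counts for a single study (illustration only).
tp, fp, fn, tn = 90, 8, 10, 92

sens, spec = sensitivity_specificity(tp, fp, fn, tn)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
print(f"logit(sensitivity)={logit(sens):.2f}, logit(specificity)={logit(spec):.2f}")
```

A study reporting a sensitivity or specificity of exactly 1.0 has an undefined logit, which is one reason dedicated meta‐analysis software is normally used rather than a hand calculation like this one.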

Main results

We identified 64 studies including 71 separate study populations with a total of 10,280 participants (4583 with and 5697 without acute appendicitis). Estimates of sensitivity ranged from 0.72 to 1.0 and estimates of specificity ranged from 0.5 to 1.0 across the 71 study populations. Summary sensitivity was 0.95 (95% confidence interval (CI) 0.93 to 0.96), and summary specificity was 0.94 (95% CI 0.92 to 0.95). At the median prevalence of appendicitis (0.43), the probability of having appendicitis following a positive CT result was 0.92 (95% CI 0.90 to 0.94), and the probability of having appendicitis following a negative CT result was 0.04 (95% CI 0.03 to 0.05). In subgroup analyses according to contrast enhancement, summary sensitivity was higher for CT with intravenous contrast (0.96, 95% CI 0.92 to 0.98), CT with rectal contrast (0.97, 95% CI 0.93 to 0.99), and CT with intravenous and oral contrast enhancement (0.96, 95% CI 0.93 to 0.98) than for unenhanced CT (0.91, 95% CI 0.87 to 0.93). Summary sensitivity of CT with oral contrast enhancement (0.89, 95% CI 0.81 to 0.94) and of unenhanced CT was similar. Results show practically no differences in summary specificity, which varied from 0.93 (95% CI 0.90 to 0.95) to 0.95 (95% CI 0.90 to 0.98) between subgroups. Summary sensitivity for low‐dose CT (0.94, 95% CI 0.90 to 0.97) was similar to summary sensitivity for standard‐dose or unspecified‐dose CT (0.95, 95% CI 0.93 to 0.96); summary specificity did not differ between low‐dose and standard‐dose or unspecified‐dose CT. No studies had high methodological quality as evaluated by the QUADAS‐2 tool. Major methodological problems were poor reference standards and partial verification, primarily due to inadequate and incomplete follow‐up in persons who did not have surgery.
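The post‐test probabilities quoted above follow from the summary sensitivity, summary specificity, and median prevalence by Bayes' theorem. The sketch below reproduces the point estimates only; the function name is illustrative, and the confidence intervals reported in the review come from the fitted model and cannot be obtained this way.

```python
def post_test_probabilities(sensitivity, specificity, prevalence):
    """Probability of disease after a positive and after a negative test."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    after_positive = true_pos / (true_pos + false_pos)
    after_negative = false_neg / (false_neg + true_neg)
    return after_positive, after_negative


# Summary estimates and median prevalence as reported in the main results.
after_positive, after_negative = post_test_probabilities(0.95, 0.94, 0.43)
print(round(after_positive, 2), round(after_negative, 2))  # 0.92 0.04
```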

Authors' conclusions

The sensitivity and specificity of CT for diagnosing appendicitis in adults are high. Unenhanced standard‐dose CT appears to have lower sensitivity than standard‐dose CT with intravenous, rectal, or oral and intravenous contrast enhancement. Use of different types of contrast enhancement or no enhancement does not appear to affect specificity. Differences in sensitivity and specificity between low‐dose and standard‐dose CT appear to be negligible. The results of this review should be interpreted with caution for two reasons. First, these results are based on studies of low methodological quality. Second, the comparisons between types of contrast enhancement and radiation dose may be unreliable because they are based on indirect comparisons that may be confounded by other factors.

How accurate is computed tomography for the diagnosis of acute appendicitis in adults?

Why is improving the diagnosis of appendicitis important?
The purpose of using computed tomography (CT) in persons with suspected appendicitis is to help the clinician differentiate between persons who need surgery with removal of the appendix (appendicectomy) and persons who do not need this procedure.

What was the aim of this review?
The aim of this Cochrane review was to find out how accurate CT of the abdomen and pelvis is for diagnosing appendicitis in adults. Cochrane researchers included 64 studies in the review to answer this question.

What was studied in the review?
A CT scan can be performed in several ways. Image quality can be enhanced by using intravenous contrast material, and visualisation of the appendix may be better when oral or rectal contrast material is used. CT can also be performed with low‐dose radiation. Exposure to radiation related to CT may increase the lifetime risk of cancer. This Cochrane review studied the accuracy of the following types of CT: any type of CT, CT according to type of contrast material, and low‐dose CT.

What are the main results of this review?
This review included 64 relevant studies that reported results for 71 separate study populations with a total of 10,280 participants. The overall results of these studies indicate that, in theory, if CT of any type were used in an emergency department in a group of 1000 people, of whom 43% have appendicitis, then:
‐ an estimated 443 people would have a CT result indicating appendicitis, and of these, 8% would not have acute appendicitis; and
‐ of the 557 people with a CT result indicating that appendicitis is not present, 4% would actually have acute appendicitis.
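The figures in the list above can be reproduced as natural frequencies. The sketch below assumes the review's summary sensitivity (0.95) and summary specificity (0.94) applied to a cohort of 1000 people with 43% prevalence; the function name and the dictionary keys are illustrative only.

```python
def natural_frequencies(sensitivity, specificity, prevalence, cohort=1000):
    """Expected test outcomes in a cohort of the given size."""
    with_disease = prevalence * cohort
    without_disease = cohort - with_disease
    true_pos = sensitivity * with_disease
    false_neg = with_disease - true_pos
    true_neg = specificity * without_disease
    false_pos = without_disease - true_neg
    positives = true_pos + false_pos
    negatives = false_neg + true_neg
    return {
        "positives": positives,
        "negatives": negatives,
        "false_positive_share": false_pos / positives,
        "false_negative_share": false_neg / negatives,
    }


result = natural_frequencies(0.95, 0.94, 0.43)
print(round(result["positives"]), round(result["negatives"]))  # 443 557
print(f"{result['false_positive_share']:.0%}")  # 8%
print(f"{result['false_negative_share']:.0%}")  # 4%
```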

Low‐dose CT appeared to be as accurate as standard‐dose CT for diagnosing appendicitis. CT with intravenous, rectal, or oral and intravenous contrast material appeared to be equally accurate, and more accurate than CT without contrast material.

How reliable are the results of the studies in this review?
Among the included studies, the final diagnosis of appendicitis was based on surgical findings or on microscopic examination of the removed appendix. Among participants who did not have surgery, appendicitis was ruled out by follow‐up to see whether symptoms resolved without appendicectomy. This is likely to have been a reliable method for deciding whether persons really had appendicitis when follow‐up was careful and complete. Unfortunately, this was not the case in a considerable proportion of the included studies. Overall, some problems with how the studies were conducted were evident. This may have made CT appear more accurate than it really is, thereby increasing the number of correct CT results (green rectangles) in the diagram.

To whom do the results of this review apply?
Studies included in the review were carried out mainly in emergency departments. Appendicitis was suspected in all participants after clinical examination and blood tests. The included studies evaluated a wide variety of CT types. The average age of participants ranged from 25 to 46 years across studies, and the percentage of women ranged from 26% to 100%. The percentage of study participants with a final diagnosis of appendicitis ranged from 13% to 92% across studies (average, 43%).

What are the implications of this review?
Computed tomography is an accurate test that is likely to help clinicians treat persons with possible appendicitis. Results of this review indicate that the chance of a clinician wrongly diagnosing acute appendicitis appears to be low (8% among persons whose CT results suggest that they have appendicitis). The chance of missing a diagnosis of appendicitis is also low (4% among persons whose CT results suggest that they do not have appendicitis).

How up‐to‐date is this review?

The review authors searched for and included studies published up to 16 June 2017.

Authors' conclusions

Implications for practice

Sensitivity and specificity of CT for diagnosing acute appendicitis in adults are high, hence the use of CT is likely to assist clinicians in treating persons with possible appendicitis. Unenhanced standard‐dose CT appears to have lower sensitivity than standard‐dose CT with IV, rectal, or oral and IV contrast enhancement. Use of different types of contrast enhancement or no enhancement does not appear to affect specificity. Differences in sensitivity and specificity between low‐dose and standard‐dose CT appear to be negligible. In adult persons, it seems that low‐dose CT should be preferred over standard‐dose CT as a first‐line imaging test, with standard‐dose CT reserved for persons with inconclusive findings on low‐dose CT. To minimise radiation exposure, clinicians should critically assess whether additional information from CT imaging is needed for decision‐making about surgery, watchful waiting, or discharge. Results of this review should be interpreted with caution for two reasons. First, the results are based on studies of low methodological quality. Second, the comparisons between types of contrast enhancement and radiation dose may be unreliable because they are based on indirect comparisons that may be confounded by other factors.

Implications for research

Future research should focus on low‐dose CT and should corroborate the finding of equal accuracy between low‐dose and standard‐dose CT. Most existing studies have been performed in Asian populations (Chang 2016; Kim 2011; Kim 2012; Seo 2009; The Locat Group 2017; Yun 2016), three studies have been performed in European populations (Keyzer 2004; Keyzer 2009; Platon 2009), and two studies in paediatric populations have been performed in the USA (Callahan 2015; Didier 2015). Such studies should be designed as paired or randomised studies to minimise confounding from other factors that may influence accuracy. This research should also explore the influence of body mass index and whether contrast enhancement improves accuracy compared to unenhanced low‐dose CT. Results from the recent LOCAT study indicate that intravenous contrast enhancement is not needed when low‐dose CT is used (The Locat Group 2017).

The issue of contrast enhancement is also unsettled for standard‐dose CT; we included five randomised trials and one paired study that compared the accuracy of different types of contrast enhancement. More such studies are needed to weigh reliably estimated gains in sensitivity and specificity against the risks and inconveniences related to intravenous, oral, and rectal contrast enhancement.

To minimise radiation exposure and costs, future research should continue to explore the performance of existing clinical decision rules in identifying persons with suspected appendicitis who can be managed without the use of CT. Meta‐analyses of the performance of the Alvarado Score have suggested that appendicitis can be ruled out in persons with low scores and ruled in among persons with high scores, but results were heterogeneous, and assessment of methodological quality demonstrated risk of verification bias (Ebell 2014; Ohle 2011). Several observational studies have explored consequences in terms of missed diagnoses and negative appendicectomies of limiting CT to persons with intermediate outcomes on the Alvarado Score (Coleman 2018; McKay 2007; Scott 2015), as well as the Adult Appendicitis Score (Sammalkorpi 2017). Results from a recent trial indicated that the need for imaging tests can be reduced even further. In this trial, persons with intermediate outcomes on the Appendicitis Inflammatory Response Score were randomly allocated to have mandatory or selective imaging (CT or ultrasonography (US)). There was no difference between groups in the negative appendicectomy rate or the missed appendicitis rate at 30 days (Andersson 2017). This selective use of CT is supported by our finding that summary sensitivity and specificity for CT did not differ between study populations with intermediate suspicion due to an equivocal presentation and any suspicion of appendicitis.

In future systematic reviews in this area, study selection criteria require careful attention. Excluding all studies that use retrospectively collected data in order to reduce potential bias from partial verification may exclude relevant information. Instead, review authors should define minimum requirements for adequate follow‐up, or, alternatively, should include all studies and explore whether the quality of follow‐up affects summary estimates of sensitivity and specificity. A special caveat concerns studies in cohorts of persons selected following an appendicectomy and a CT‐scan because clinically applicable estimates of specificity are unlikely to result from such studies.

Future studies of the accuracy of CT for acute appendicitis should adhere to the updated STARD statement to improve the quality of reporting (Bossuyt 2015). Moreover, rigorous follow‐up of participants who do not have surgery should receive special attention in the planning and conduct of such studies because differential verification appears to be inevitable in this area. In general, follow‐up should be complete and careful, and should be of the right duration. In particular, follow‐up should consist of obtaining a reliable alternative diagnosis as well as contacting participants to check that symptoms have resolved, and that surgery or antibiotic therapy has not taken place elsewhere. Authors of future studies should consider strategies used to reduce loss to follow‐up in other types of research such as clinical trials, surveys, and longitudinal studies. These methods include minimising inconvenience, providing monetary incentives, and collecting all available contact information from participants, family members, or other locators (Bower 2014; Brueton 2013; Woolard 2004). Despite all efforts, some participants will be lost to follow‐up. It is important that the number of these participants is reported. Moreover, sensitivity analyses should be performed to assess the potential consequences of loss to follow‐up for sensitivity and specificity.

Finally, the use of antibiotic therapy among participants in upcoming studies will add to the complexity of disease verification. A definitive reference standard would be available only for those who did not improve on antibiotics and underwent subsequent surgery.

Figure 14 presents a flow diagram for the plain language summary.


Plain language summary flowchart.

Summary of findings

Summary of findings table

Population

Adults (> 14 years of age) with suspected acute appendicitis based on history, physical examination, and/or blood tests

Settings

Emergency and Radiology Departments in secondary and tertiary care settings

Index test

Computed tomography of the abdomen

Reference standard

Histological examination of the resected appendix or intraoperative findings in persons who had surgery. Clinical follow‐up for persons who did not have surgery

Target condition

Acute appendicitis

Number of studies

64 studies including 71 separate study populations with a total of 10,280 participants ‐ 4583 with and 5697 without acute appendicitis

Methodological concerns

The methodological quality was generally poor, particularly with respect to the reference test and the flow and timing domains. For these domains, few studies were at low risk of bias. Differential verification was used in most studies because some of the participants with suspected acute appendicitis did not have surgery. Clinical follow‐up for these participants was inadequate, incomplete, or poorly described in most studies

Results

| CT type | Number of studies (study populations)a | Summary sensitivity (95% CI) | Summary specificity (95% CI) | Prevalence of appendicitis (25th, 50th, and 75th percentiles)b | Post‐test probability following a positive CT outcome (95% CI) | Post‐test probability following a negative CT outcome (95% CI) |
| --- | --- | --- | --- | --- | --- | --- |
| CT overall | 64 (71) | 0.95 (0.93‐0.96) | 0.94 (0.92‐0.95) | 0.32 | 0.88 (0.85‐0.90) | 0.02 (0.02‐0.03) |
| | | | | 0.43 | 0.92 (0.90‐0.94) | 0.04 (0.03‐0.05) |
| | | | | 0.58 | 0.96 (0.94‐0.96) | 0.07 (0.05‐0.09) |
| Unenhanced CT | 19 (19) | 0.91 (0.87‐0.93) | 0.94 (0.90‐0.96) | 0.32 | 0.87 (0.82‐0.92) | 0.04 (0.03‐0.06) |
| | | | | 0.43 | 0.92 (0.88‐0.95) | 0.07 (0.05‐0.09) |
| | | | | 0.58 | 0.95 (0.93‐0.97) | 0.12 (0.09‐0.16) |
| CT with intravenous contrast enhancement | 17 (18) | 0.96 (0.92‐0.98) | 0.93 (0.90‐0.95) | 0.32 | 0.87 (0.82‐0.90) | 0.02 (0.01‐0.04) |
| | | | | 0.43 | 0.91 (0.88‐0.94) | 0.03 (0.02‐0.06) |
| | | | | 0.58 | 0.95 (0.93‐0.96) | 0.06 (0.03‐0.11) |
| CT with rectal contrast enhancement | 9 (9) | 0.97 (0.93‐0.99) | 0.95 (0.90‐0.98) | 0.32 | 0.91 (0.81‐0.96) | 0.02 (0.01‐0.04) |
| | | | | 0.43 | 0.94 (0.87‐0.97) | 0.03 (0.01‐0.06) |
| | | | | 0.58 | 0.97 (0.93‐0.99) | 0.05 (0.02‐0.10) |
| CT with oral contrast enhancement | 7 (7) | 0.89 (0.81‐0.94) | 0.94 (0.90‐0.97) | 0.32 | 0.88 (0.81‐0.93) | 0.05 (0.03‐0.09) |
| | | | | 0.43 | 0.92 (0.87‐0.96) | 0.08 (0.04‐0.14) |
| | | | | 0.58 | 0.96 (0.92‐0.98) | 0.14 (0.08‐0.22) |
| CT with oral and intravenous contrast enhancement | 15 (15) | 0.96 (0.93‐0.98) | 0.94 (0.92‐0.96) | 0.32 | 0.89 (0.85‐0.92) | 0.02 (0.01‐0.03) |
| | | | | 0.43 | 0.93 (0.90‐0.95) | 0.03 (0.02‐0.05) |
| | | | | 0.58 | 0.96 (0.94‐0.97) | 0.05 (0.03‐0.09) |
| Low‐dose CT | 7 (8) | 0.94 (0.90‐0.97) | 0.94 (0.91‐0.96) | 0.32 | 0.88 (0.82‐0.92) | 0.03 (0.02‐0.05) |
| | | | | 0.43 | 0.92 (0.88‐0.95) | 0.04 (0.02‐0.08) |
| | | | | 0.58 | 0.96 (0.93‐0.97) | 0.08 (0.04‐0.13) |

Conclusion

Sensitivity and specificity of CT for diagnosing acute appendicitis in adults are high. Unenhanced standard‐dose CT appears to have lower sensitivity than standard‐dose CT with intravenous, rectal, or oral+intravenous contrast enhancement. Use of different types of contrast enhancement or no enhancement does not appear to affect specificity. Differences in sensitivity and specificity between low‐dose and standard‐dose CT appear to be negligible. The results of this review should be interpreted with caution for 2 reasons. First, the results are based on studies of low methodological quality. Second, the comparisons between types of contrast enhancement and radiation dose may be unreliable because they are based on indirect comparisons that may be confounded by other factors

CI: confidence interval.
CT: computed tomography.
a In five studies, participants were randomly allocated to two CT‐protocols, and in another study to three CT‐protocols. These protocols differed with respect to contrast enhancement and radiation dose. This generated seven additional study populations, which were included as separate studies in the meta‐analyses.

b The distribution of the prevalence of appendicitis was roughly similar in the included studies across subgroups. Therefore, to facilitate comparison of post‐test probabilities between subgroups, these probabilities were calculated for the 25th, 50th, and 75th percentiles of prevalence for all 71 study populations.
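To show how post‐test probabilities shift across the three prevalence percentiles, the sketch below uses the odds form of Bayes' theorem with likelihood ratios derived from the "CT overall" summary estimates (sensitivity 0.95, specificity 0.94). This is a back‐of‐the‐envelope check, not the review's method: the published values were derived from the fitted model and may differ from a hand calculation in the last decimal place, and the function name is illustrative.

```python
def post_test_probability(prevalence, likelihood_ratio):
    """Convert a pre-test probability to a post-test probability via odds."""
    pre_test_odds = prevalence / (1 - prevalence)
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (1 + post_test_odds)


sensitivity, specificity = 0.95, 0.94  # "CT overall" summary estimates
lr_positive = sensitivity / (1 - specificity)  # positive likelihood ratio
lr_negative = (1 - sensitivity) / specificity  # negative likelihood ratio

rows = []
for prevalence in (0.32, 0.43, 0.58):  # 25th, 50th, and 75th percentiles
    after_positive = post_test_probability(prevalence, lr_positive)
    after_negative = post_test_probability(prevalence, lr_negative)
    rows.append((prevalence, round(after_positive, 2), round(after_negative, 2)))

for row in rows:
    print(row)
# (0.32, 0.88, 0.02)
# (0.43, 0.92, 0.04)
# (0.58, 0.96, 0.07)
```

Because the likelihood ratios do not depend on prevalence, the same two numbers can be reused for any setting once the local pre‐test probability is known.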

Background

Target condition being diagnosed

Acute appendicitis (appendicitis) is a common cause of abdominal pain, with an incidence of around 1 per 1000 per year (Hall 2010), and with a lifetime risk of 7% to 9% in developed countries (Anderson 2012). Appendicitis is an inflammation of the vermiform appendix, but the etiology of the inflammation and its progression remains poorly understood. Obstruction of the appendix lumen by a fecalith, stool, or caecum tumour may elicit appendicitis, but it appears that genetic and environmental factors are also important for the development of appendicitis (Sadr 2009). The characteristic medical history is one of central abdominal pain followed by nausea, vomiting, anorexia, and migration of pain to the right iliac fossa. Clinical and laboratory findings include mild pyrexia, exacerbation of pain on coughing, maximum tenderness in the right lower fossa, and elevated white blood cell count and C‐reactive protein concentration (Bhangu 2015; Humes 2006; Paulson 2003; Wagner 2009). Migration of pain and signs of peritoneal irritation (guarding, percussion, and rebound tenderness) appear to be the most reliable clinical features (Andersson 2004), but these features may be absent in up to 70% of patients with suspected appendicitis (Lameris 2009). Hence, the diagnosis based on history, clinical findings, and laboratory results is often difficult, particularly in women of childbearing age, because persons with a wide range of intra‐abdominal and pelvic pathology may have a similar clinical presentation. The treatment of choice for most persons is appropriate supportive therapy followed by expedient surgical excision of the appendix (appendicectomy). Based on intraoperative findings, appendicitis is classified as simple or complex (gangrenous or perforated appendix with or without abscess formation). 
Accordingly, the clinical spectrum of appendicitis is wide‐ranging ‐ from uncomplicated disease that may be self‐limiting to severe complicated disease with generalised peritonitis, sepsis, abscess formation, bowel obstruction, and rarely death (Blomqvist 2001). Over the past decade, several randomised controlled trials (RCTs) have shown that antibiotic therapy can be successful in 70% to 75% of persons with uncomplicated appendicitis on computed tomography (CT); remaining persons will need subsequent appendicectomy within the following year (Salminen 2015; Vons 2011). Laparoscopic appendicectomy is generally recommended over open appendicectomy due to less postoperative pain, lower incidence of surgical site infection, and reduced length of hospital stay (Di Saverio 2016). Conservative therapy with antibiotics and percutaneous drainage is recommended for persons presenting with an appendiceal abscess (Andersson 2007).

Index test(s)

Computed tomography (CT) is an imaging method that uses a series of X‐ray measurements from different angles and computer software to generate cross‐sectional images of the body. CT of the abdomen and pelvis has been used since the late 1980s to assess persons with suspected appendicitis (Balthazar 1986). With modern multi‐slice CT or multi‐detector row CT (MDCT), an abdominopelvic CT‐scan is acquired in a few seconds once the patient is positioned. The most common approach is to visualise the entire abdomen and pelvis via thin‐section images (≤ 5 mm), but protocols focusing on the lower abdomen and pelvis are also used to reduce radiation exposure at the expense of missing disease processes in the upper abdomen (Brown 2008). Enhancement by intravenous (IV), oral, or rectal contrast material is often used to optimise image quality and aid visualisation of the appendix; however, use of oral contrast is time‐consuming, rectal contrast is uncomfortable for the patient, and IV contrast may cause allergic reactions. Moreover, it is controversial whether contrast enhancement is needed for the radiological diagnosis of appendicitis (Neville 2009); hence, no consensus has been reached about the most appropriate CT‐protocol for persons with suspected appendicitis (Drake 2014; Tan 2017). The introduction of 16‐MDCT in 2002 enabled high‐quality multi‐planar re‐formations with coronal and sagittal cross‐sectional images that facilitate identification of the appendix (Paulson 2005). CT criteria used in most studies to detect an inflamed appendix have included an appendiceal diameter exceeding 6 mm and the finding of periappendiceal inflammation, an appendicolith, or thickening of the caecal wall (Terasawa 2004). Radiation exposure (effective dose) related to contrast‐enhanced abdominopelvic CT varies between 8 and 16 mSv (Smith‐Bindman 2009; Yun 2017), which roughly corresponds to three to six years of background radiation in most parts of the world. 
The estimated increased lifetime risk of cancer following an abdominopelvic CT‐scan is 0.02% to 0.14%; the lower the age at the time of CT‐scan, the higher the estimated risk (Brenner 2007). Many studies have evaluated the diagnostic accuracy of different types of CT (CT‐protocols) for appendicitis; accuracy has been high in previous meta‐analyses with summary estimates of sensitivity and specificity above 0.9 (Al‐Khayal 2007; Anderson 2005; Dahabreh 2015; Hlibczuk 2010; Terasawa 2004; Weston 2005; Xiong 2015). Several recent studies have demonstrated that low‐dose CT (effective dose around 2 mSv) is as accurate as standard‐dose CT for diagnosing appendicitis (Yun 2017). By contrast, the accuracy of CT in separating simple from complex appendicitis is more heterogeneous, with estimates of sensitivity and specificity ranging from 0.28 to 0.95, and from 0.88 to 1.0, respectively (Foley 2005; Horrow 2003; Oliak 1999; Suh 2011).

Clinical pathway

Adult persons admitted with acute pain in the right lower abdomen or possible appendicitis are routinely assessed by a general surgeon or an emergency physician via history‐taking, physical examination, urinalysis, and blood testing, including a differential white blood cell count and C‐reactive protein (CRP) concentration. In women of childbearing age, a gynaecological examination is performed and blood tests or urinalysis includes a pregnancy test (human chorionic gonadotropin analysis) (Humes 2006). Based on weighting and integration of collected information, the clinician must decide the appropriate course of action. If the risk of appendicitis is considered low, the clinician may decide on discharge; conversely, if the risk is high, the clinician will plan to perform surgery. If the risk is intermediate due to an equivocal clinical presentation, the clinician is likely to perform imaging tests or diagnostic laparoscopy, or to admit for observation. The proportion of persons with suspected appendicitis who have imaging tests varies considerably between settings. Assessment of risk of appendicitis may be subjective, or it may be based on one of several clinical decision rules developed to assist the clinician in decision‐making. Such decision rules include the Alvarado Score (Alvarado 1986), the Appendicitis Inflammatory Response Score (Andersson 2008), the Adult Appendicitis Score (Sammalkorpi 2014), and the Raja Isteri Pengiran Anak Saleha Appendicitis (RIPASA) Score (Chong 2010). Imaging tests often used include ultrasonography (US), CT, or sequential US and CT (i.e. CT following inconclusive findings on US). Magnetic resonance imaging (MRI) is typically reserved for children and pregnant women (Di Saverio 2016). The use of CT is common in the USA, where more than 90% of persons have CT before appendicectomy in some regions (Coursey 2010; Drake 2014). In England, the corresponding proportion was 13% in 2012 (National Surgical Research Collaborative 2013). 
In the Netherlands, almost all persons who undergo appendicectomy have preoperative sequential US and CT (van Rossem 2016). If the diagnosis of appendicitis is confirmed by imaging tests, most persons proceed to surgery. If the diagnosis is not confirmed, persons may be discharged or admitted for observation. Among the elderly with suspected appendicitis, CT is often performed to rule out conditions such as right‐sided colon cancer and diverticulitis.

Role of index test(s)

CT serves as an add‐on test to reduce diagnostic uncertainty following clinical evaluation, blood testing, and urinalysis in persons with suspected appendicitis. If accurate, CT can play an important role in reducing both unnecessary surgery and delay of surgery. When appendicitis is not confirmed by CT, CT images are often helpful for diagnosing other causes of abdominal pain, such as cholecystitis, diverticulitis, renal calculi, epiploic appendagitis, bowel obstruction, and gynaecological conditions. Historically, the negative appendicectomy rate (NAR) for persons operated on for acute appendicitis has exceeded 20% due to the low accuracy of clinical assessment and a low threshold to perform surgery to avoid potential disease progression through perforation and abscess formation (Lewis 1975; Velanovich 1992). The NAR is the proportion of resected appendices without histological evidence of inflammation out of all resected appendices. Along with the perforation rate, the NAR is an often‐used indicator of the accuracy of the preoperative evaluation of persons with suspected appendicitis. A systematic review with meta‐analysis of results from 20 studies found a significantly lower NAR in persons who had clinical evaluation and preoperative CT compared to those who had clinical evaluation only (9% vs 17%, respectively; P = 0.001; Krajewski 2011). The time from emergency department to operating room was examined in 10 studies, and the mean waiting time was longer for those who had preoperative CT than for those who did not (800 vs 468 minutes; no statistical analysis due to lack of standard deviations), but no statistical difference in summary estimates of perforation rates was evident. Additionally, two studies from the USA have demonstrated a drop in NAR from between 23% and 24% in the 1990s to between 2% and 3% in 2007, coinciding with an increase in the use of preoperative CT from between 10% and 20% to more than 85% (Raja 2010; Raman 2008).
Results from other studies indicate that the effects of preoperative CT on NAR are limited to women younger than 45 years, whereas there is little or no effect on men (Coursey 2010; Wagner 2008). The accuracy of clinical assessment alone versus clinical assessment and CT has been compared in three RCTs with a total of 400 participants. The sensitivity of the former was 1.0 for all studies compared to 0.90 to 0.94 for the latter. Conversely, specificity was generally lower for clinical assessment alone (0.73 to 0.88) compared to clinical assessment and CT (0.93 to 1.0) (Hong 2003; Lopez 2007; Walker 2000). Two of the studies concluded that the accuracy of clinical assessment and CT was not superior to the accuracy of clinical assessment alone; the third study reached the opposite conclusion (Walker 2000).

Alternative test(s)

Alternative add‐on tests used to reduce diagnostic uncertainty following clinical evaluation are ultrasonography (US), magnetic resonance imaging (MRI), and diagnostic laparoscopy (DL). US has been used since the 1980s in persons with suspected appendicitis (Rybkin 2007); the main advantages are that US is free from radiation exposure, widely available, quick to perform, and cheap. Refinements in US technology and use of Doppler sonography and the graded compression technique have improved both visualisation of the appendix and accuracy (Birnbaum 2000). However, the utility of US is hampered because the appendix can be difficult to visualise even for experienced radiologists due to obesity and overlying bowel gas, resulting in inconclusive examinations in up to 30% to 50% of cases (D'Souza 2015; Leeuwenburgh 2013; Poletti 2011; Poortman 2009). Several meta‐analyses have compared the accuracy of US and CT (Doria 2006; Terasawa 2004; van Randen 2008), revealing lower sensitivity and specificity for US compared to CT. In the most recent meta‐analysis, summary sensitivity and specificity were 0.85 and 0.90, respectively, for US, and 0.96 and 0.96, respectively, for CT (Dahabreh 2015). Nevertheless, in some settings, US is used as the primary imaging test in most persons with suspected appendicitis, and CT is primarily reserved for persons with inconclusive US findings (van Rossem 2016).

Over the past 10 years, MRI has been increasingly used for assessment of persons with possible appendicitis. Advances in MRI hardware and software as well as in radiologists' expertise have led to increasing accuracy and quicker scan times (Leeuwenburgh 2012). Although MRI has disadvantages such as high costs, long acquisition times, and limited availability, its high accuracy and lack of ionising radiation make MRI particularly attractive for pregnant women and children with an inconclusive US examination (Basaran 2009). Summary estimates of sensitivity and specificity in the currently most comprehensive meta‐analysis of results from 30 studies were 0.96 (95% confidence interval (CI) 0.95 to 0.97) and 0.96 (95% CI 0.95 to 0.97), respectively (Duke 2016). Summary estimates were similar in subgroups of children and pregnant women. A recent study used a paired design to compare MRI and CT in participants older than 11 years (Repplinger 2018). Sensitivity and specificity were 0.97 and 0.81 for unenhanced MRI, and 0.98 and 0.90 for IV contrast‐enhanced CT, respectively. The difference in specificity was statistically significant. Another paired study compared the accuracy of IV contrast‐enhanced CT and unenhanced MRI in persons with suspected appendicitis following a negative or inconclusive US examination (Leeuwenburgh 2012). Sensitivity and specificity were 0.98 and 0.88 for MRI, and 0.97 and 0.91 for CT, respectively. The difference in specificity was not statistically significant.

Diagnostic laparoscopy (DL) is a surgical procedure performed under general anaesthesia by which two or three cannulas are inserted through the abdominal wall after pneumoperitoneum with carbon dioxide has been established. A laparoscope and a grasper are inserted through the cannulas, loops of small bowel are swept away from the right lower quadrant, and the appendix is visualised. If the appendix appears inflamed, it is resected; if it appears normal, other causes of abdominal pain are sought. It remains controversial whether a macroscopically normal looking appendix should be resected or left in situ (Bijnen 2003; Grunewald 1993; Strong 2015; Teh 2000; van den Broek 2001). DL is used more often in European countries than in the USA, where CT is the most commonly used add‐on test following clinical evaluation (Di Saverio 2016; Jaunoo 2012; National Surgical Research Collaborative 2013). A recent review included 54 studies evaluating the accuracy of diagnostic laparoscopy; median sensitivity and specificity were 1.00 and 0.89, respectively (Dahabreh 2015). However, estimates showed wide variability, with sensitivity ranging from 0.37 to 1.0 (interquartile range 0.95 to 1.0), and specificity ranging from 0 to 1.0 (interquartile range 0.73 to 1.0). Complications of DL appear to be infrequent (< 2% in most studies); however, in many studies, it was difficult to distinguish complications related to the diagnostic phase of laparoscopy from complications related to the therapeutic phase (appendicectomy). The most common complications were wound infection, postoperative ileus, deep venous thrombosis, haematoma, and intra‐abdominal infection (Dahabreh 2015).

Rationale

Assessment of persons with suspected appendicitis is a common and often difficult task for emergency physicians and general surgeons. Imaging tests are frequently used when the diagnosis is uncertain following clinical examination, blood testing, and urinalysis. The magnitude and importance of this assessment task are reflected by the fact that appendicectomy is the most frequently performed abdominal emergency procedure, with approximately 50,000 and 300,000 appendicectomies performed annually in the UK and the USA, respectively (Hospital Episode Statistics 2015; Weiss 2014). As part of the ongoing effort to develop an evidence‐based algorithm for the treatment of persons with suspected appendicitis, it is important to systematically review the accuracy of these imaging tests. Ideally, such a review should summarise and compare the accuracy of US, CT, and MRI, and the sequential use of these tests; however, the resources needed to perform such a review are extensive. Because CT appears to be the imaging test used most often (Jaunoo 2012), we limited our task to reviewing the accuracy of CT as a first‐line imaging test in adults and exploring differences in accuracy between CT‐protocols defined by the use of contrast enhancement and radiation dose. We excluded studies in children because US is usually the first‐line imaging test used in children, and CT is reserved for those with negative or inconclusive US findings to reduce radiation exposure (Frush 2009; Hernanz‐Schulman 2010; Strouse 2010). In our view, the methodological issues related to sequential use of imaging tests in children with suspected appendicitis require special attention in a separate review. Other Cochrane Review author teams are currently engaged in reviews of the accuracy of MRI and US for appendicitis.

Objectives

Primary objective

Our primary objective was to evaluate the accuracy of CT for diagnosing appendicitis in adults with suspected appendicitis.

Secondary objectives

Our secondary objectives were to compare the accuracy of contrast‐enhanced versus non‐contrast‐enhanced CT, to compare the accuracy of low‐dose versus standard‐dose CT, and to explore the influence of CT‐scanner generation, radiologist experience, degree of clinical suspicion of appendicitis, and aspects of methodological quality on diagnostic accuracy.

Methods

Criteria for considering studies for this review

Types of studies

We included prospective studies comparing the results of CT to the results of a reference standard test for appendicitis. We excluded studies with a case‐control design and studies with fewer than 10 participants. We considered studies in which all participants had histologically verified appendicitis as irrelevant because such studies cannot estimate specificity. In cases of duplicate publications, we considered the study report with the largest number of participants or the most information as the primary study report. We applied no language restrictions. We excluded studies using retrospectively collected data to reduce potential bias from partial verification.

Participants

We included studies in adults (> 14 years of age) with suspected appendicitis based on history, physical examination, and/or blood testing. We accepted authors' definitions of suspected appendicitis and applied no restrictions regarding the degree of suspicion of appendicitis. We excluded studies recruiting only pregnant women, as well as studies in persons with abdominal pain at any location and no particular suspicion of appendicitis. We also excluded studies in which all participants had US before CT, and the decision to perform CT depended on the outcome of US. In the protocol, we accepted studies with a mixed adult‐paediatric population if the paediatric fraction accounted for 10% or less of the group. We planned to contact study authors with a request for results for the adult subgroup when more than 10% of participants were younger than 15 years, but this turned out to be not feasible. Therefore, we decided to include studies with mixed adult‐paediatric populations, and we planned sensitivity analyses to explore whether summary sensitivity and specificity differed in such studies compared to studies including only adults (see Differences between protocol and review).

Index tests

Index tests included a sequential or helical abdominopelvic CT‐scan whereby the interpreter was assessing the appendix and its surroundings for signs of appendicitis. We applied no restrictions related to image acquisition, CT‐scanner generation, the part of the abdomen included in the scan (lower vs entire abdomen), radiation dose, or the use of enhancement by IV, oral, or rectal contrast material. We included no comparator tests.

Target conditions

The target condition was acute appendicitis. We did not distinguish between simple and complex appendicitis. We excluded studies evaluating the accuracy of CT for differentiating between simple and complex appendicitis.

Reference standards

We included studies that used one of the following two reference standards.

  • Histological examination of the removed appendix as well as clinical follow‐up of participants who did not have surgery.

  • Laparoscopic assessment of the appendix by the surgeon as inflamed or normal, as well as clinical follow‐up of participants who did not have surgery.

We included studies in which all participants had surgery if intraoperative assessment or histological examination was used as the reference standard. We also included studies that combined the two reference standards because only macroscopically inflamed appendices were resected and examined histologically. We considered intraoperative assessment by laparotomy and laparoscopy as equal. As stated above, we found wide variation in estimates of sensitivity and specificity for the laparoscopic appendix assessment when histological assessment was used as the reference standard, and whether a normal looking appendix should be resected or left in situ in persons undergoing laparoscopy for suspected appendicitis remains controversial. For this reason, we decided to consider laparoscopic assessment as a legitimate reference standard for appendicitis. We performed a sensitivity analysis to explore the potential consequences thereof. This analysis was not planned in the protocol (see Differences between protocol and review).

Search methods for identification of studies

Electronic searches

We searched MEDLINE and Embase via OVID by using an electronic search strategy that combines indexing terms and text words to capture the index test and the target disease. We developed our search strategy in collaboration with the medical information specialist of the Colorectal Cancer Group. We applied no filters in our electronic searches to target diagnostic test accuracy studies. We have presented our search strategies for MEDLINE in Appendix 1, and for Embase in Appendix 2. We performed the latest update of these searches on 16 June 2017. We also searched the Science Citation Index for study reports that had cited the included studies. We did not restrict studies on the basis of language or publication status.

Searching other resources

We screened the reference lists of included studies and existing systematic reviews for relevant studies.

Data collection and analysis

Selection of studies

Two review authors independently applied the selection criteria to the titles and abstracts of study reports identified by the search strategy. If the decision to exclude a study could not be made on the basis of the title and the abstract, we retrieved the entire study report for assessment. We based the final decision on inclusion on the entire study report. We resolved disagreements between review authors by discussion, or if necessary, by consultation with a third review author. We contacted study authors when information was insufficient to indicate whether a study could be included.

Data extraction and management

Two review authors independently extracted information from included studies using a data collection form. We collected the following information: country, publication language, selection criteria, recruitment procedure, study design, clinical setting, and age and gender distribution. For each study, we noted if participants were recruited regardless of the suspicion of appendicitis, or if recruitment was limited to those with intermediate suspicion due to an equivocal presentation. If all participants had surgery, we classified the degree of suspicion as high. For the index test, we collected information on CT manufacturer, model name, CT‐scanner generation (sequential/helical, single slice/ multi‐slice), slice thickness, slice interval, voltage, mAs level, use of multi‐planar reformations, use of contrast enhancement, use of a low‐dose protocol, radiologist experience, criteria for CT diagnosis of appendicitis, and whether CT was compared to other tests. We also extracted counts of true‐positive (TP), false‐positive (FP), false‐negative (FN), and true‐negative (TN) CT assessments. Finally, we collected information to support the assessment of methodological quality, particularly features related to the reference standard and patient flow. We piloted the data collection form on five studies assessing the accuracy of CT for appendicitis in children. We contacted study authors if information needed for quantitative analyses was unclear or was not reported.
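The extracted TP, FP, FN, and TN counts are the raw material for each study's accuracy estimates. The following is a minimal sketch of how a single study's 2 × 2 table maps to sensitivity, specificity, and prevalence; the class name and the counts in the usage example are hypothetical and do not come from any included study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Cross-classification of CT result against the reference standard."""
    tp: int  # CT positive, appendicitis present
    fp: int  # CT positive, appendicitis absent
    fn: int  # CT negative, appendicitis present
    tn: int  # CT negative, appendicitis absent

    @property
    def n(self) -> int:
        return self.tp + self.fp + self.fn + self.tn

    def sensitivity(self) -> float:
        # Proportion of participants with appendicitis who had a positive CT.
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Proportion of participants without appendicitis who had a negative CT.
        return self.tn / (self.tn + self.fp)

    def prevalence(self) -> float:
        # Proportion of all participants who had appendicitis.
        return (self.tp + self.fn) / self.n


if __name__ == "__main__":
    study = TwoByTwo(tp=45, fp=5, fn=5, tn=95)  # hypothetical counts
    print(study.sensitivity(), study.specificity(), study.prevalence())
```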

Assessment of methodological quality

We used the Quality Assessment of Studies of Diagnostic Accuracy ‐ Revised (QUADAS‐2) tool to assess methodological quality. To promote consistent assessments, we developed a rating guideline with operational criteria for answering signalling questions and assessing risk of bias and concern regarding applicability (Appendix 3). Two review authors independently applied the QUADAS‐2 tool and resolved disagreements by discussion. We piloted our adaptation of the QUADAS‐2 tool on five studies assessing the accuracy of CT for appendicitis in children. We have presented the outcome of the methodological quality assessment graphically in standard figures. We explored the influence of bias risk on summary estimates of sensitivity and specificity in sensitivity analyses when feasible.

Statistical analysis and data synthesis

We used the bivariate random‐effects model to summarise sensitivity and specificity because we anticipated little variation between studies in the CT features that were used to diagnose appendicitis (Reitsma 2005). We performed an overall meta‐analysis with results from all studies regardless of contrast enhancement and radiation dose. If studies reported results for two or more independent study populations (i.e. randomised studies), we included the results for each study population in the analyses. When accuracy analyses were reported for several CT criteria (i.e. thresholds), we focused on the criterion that conferred the highest degree of homogeneity with other studies. If results were reported for several observers without overall estimates of sensitivity and specificity, we calculated average values across observers for TP, FP, FN, and TN and rounded them to integers. To present and visually explore the variation between studies in sensitivity and specificity, we plotted study results in forest plots and in receiver operating characteristic (ROC) plots. For each analysis, we calculated a 95% prediction region around the summary estimate from the parameters of the bivariate model and added it to the plot. This region covers the range of sensitivity and specificity that would be expected in 95% of future large studies if it is assumed that the statistical model is adequate. We calculated summary likelihood ratios from summary estimates of sensitivity and specificity. We also calculated post‐test probabilities for appendicitis following positive and negative CT results for the 25th, 50th, and 75th percentiles of prevalence in the included studies.
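Two of the calculations described above are simple enough to sketch directly: likelihood ratios with post‐test probabilities, and the averaging of counts across observers. The sketch assumes the standard odds form of Bayes' theorem. The sensitivity and specificity values in the usage example are placeholders rather than the review's summary estimates, whereas the three prevalence values are the quartiles reported for the included study populations. The source does not say how ties were handled when averaged counts were rounded, so the half‐up rule used here is an assumption.

```python
import math
from typing import Sequence, Tuple

Counts = Tuple[int, int, int, int]  # (TP, FP, FN, TN)


def likelihood_ratios(sens: float, spec: float) -> Tuple[float, float]:
    """Return (LR+, LR-) from sensitivity and specificity."""
    if not (0.0 < sens < 1.0 and 0.0 < spec < 1.0):
        raise ValueError("sens and spec must lie strictly between 0 and 1")
    return sens / (1.0 - spec), (1.0 - sens) / spec


def post_test_probability(prevalence: float, lr: float) -> float:
    """Convert a pre-test probability to a post-test probability via odds."""
    pre_odds = prevalence / (1.0 - prevalence)
    post_odds = pre_odds * lr
    return post_odds / (1.0 + post_odds)


def average_counts(observers: Sequence[Counts]) -> Counts:
    """Average each cell across observers and round to an integer.

    Rounding ties half-up is an assumption; note that the rounded cells
    need not sum to the original number of participants.
    """
    n = len(observers)
    cells = [math.floor(sum(obs[i] for obs in observers) / n + 0.5) for i in range(4)]
    return (cells[0], cells[1], cells[2], cells[3])


if __name__ == "__main__":
    sens, spec = 0.90, 0.80  # placeholder values, not the review's estimates
    lr_pos, lr_neg = likelihood_ratios(sens, spec)
    for prev in (0.32, 0.43, 0.58):  # quartiles of prevalence in the included studies
        print(prev,
              round(post_test_probability(prev, lr_pos), 3),
              round(post_test_probability(prev, lr_neg), 3))
```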

In subgroup analyses, we explored and compared the accuracy of CT according to types of contrast enhancement (IV, oral, rectal, IV and oral) using unenhanced CT as the reference. We also compared the accuracy of low‐dose and standard‐dose CT (this subgroup analysis was not planned in the protocol). In the subgroup analyses, we applied the following rules if several CT‐protocols were used in the same study.

  • If the CT‐protocol differed in 20% of participants or less, we analysed the study according to the CT‐protocol used in the majority of persons.

  • If the CT‐protocol differed in more than 20% of participants, we contacted study authors to request subgroup data. If we received no reply from study authors, we excluded the study from the subgroup analysis.
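The two rules above amount to a threshold decision on the share of participants who did not receive the majority CT‐protocol. A minimal sketch follows; the function name, the returned labels, and the protocol names in the usage example are hypothetical.

```python
from typing import Dict, Tuple


def subgroup_decision(participants_by_protocol: Dict[str, int]) -> Tuple[str, str]:
    """Apply the 20% rule to one study.

    Returns ("analyse_as", protocol) when 20% or fewer participants had a
    protocol other than the majority one, and ("request_subgroup_data",
    protocol) otherwise. In the second case the study is excluded from the
    subgroup analysis if the authors do not reply.
    """
    total = sum(participants_by_protocol.values())
    if total <= 0:
        raise ValueError("study has no participants")
    majority = max(participants_by_protocol, key=participants_by_protocol.get)
    minority = total - participants_by_protocol[majority]
    # Integer comparison: minority / total <= 0.20 is equivalent to 5 * minority <= total.
    if 5 * minority <= total:
        return ("analyse_as", majority)
    return ("request_subgroup_data", majority)


if __name__ == "__main__":
    print(subgroup_decision({"iv_contrast": 90, "unenhanced": 10}))
    print(subgroup_decision({"iv_contrast": 70, "unenhanced": 30}))
```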

We performed meta‐regression analyses to explore potential sources of heterogeneity (see below). We performed these analyses by adding one covariate at a time to the bivariate model. We used a likelihood ratio test to compare nested models with and without covariates and to test whether summary sensitivity and specificity differed between groups. If the number of studies made it meaningful to add parameters to the models, we tested whether the assumption of equal variances for the random‐effects model across groups was reasonable. Fitting models with separate variances for the random‐effects model for each group did not improve the fit of any of the models (P > 0.12; likelihood ratio test), hence we used equal variances for the random‐effects model in all analyses. Using parameter estimates from the bivariate model, we calculated absolute differences in summary sensitivity and specificity between different types of contrast enhancement and unenhanced CT. We also calculated these differences between low‐dose and standard‐dose CT. We calculated a 95% confidence interval for these differences by using the delta method. We used the metandi, xtmelogit, and nlcom commands in Stata version 13 (StataCorp, College Station, Texas, USA) to perform the analyses.
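The likelihood ratio test and the delta‐method interval can be illustrated outside Stata. The sketch below is not a reimplementation of metandi, xtmelogit, or nlcom. It assumes that the fitted model supplies log‐likelihoods, a logit‐scale summary sensitivity `a` for the reference group, a covariate coefficient `b`, and their variances and covariance. It further assumes that a covariate adds either one or two parameters to the model, a detail the text does not state. All numeric inputs in the usage example are hypothetical.

```python
import math
from typing import Tuple


def lr_test(loglik_reduced: float, loglik_full: float, df: int) -> Tuple[float, float]:
    """Likelihood ratio statistic and chi-square P value for nested models.

    Closed-form chi-square tail probabilities are used, so only df = 1 or
    df = 2 is supported in this sketch.
    """
    stat = 2.0 * (loglik_full - loglik_reduced)
    if stat < 0:
        raise ValueError("the full model cannot fit worse than the reduced model")
    if df == 1:
        p = math.erfc(math.sqrt(stat / 2.0))
    elif df == 2:
        p = math.exp(-stat / 2.0)
    else:
        raise ValueError("only df = 1 or df = 2 is supported here")
    return stat, p


def expit(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def delta_ci_difference(a: float, b: float, var_a: float, var_b: float,
                        cov_ab: float, z: float = 1.959963984540054
                        ) -> Tuple[float, float, float]:
    """Difference expit(a + b) - expit(a) with a delta-method 95% CI."""
    p0, p1 = expit(a), expit(a + b)
    diff = p1 - p0
    # Partial derivatives of the difference with respect to a and b.
    g_a = p1 * (1.0 - p1) - p0 * (1.0 - p0)
    g_b = p1 * (1.0 - p1)
    var = g_a ** 2 * var_a + g_b ** 2 * var_b + 2.0 * g_a * g_b * cov_ab
    se = math.sqrt(max(var, 0.0))
    return diff, diff - z * se, diff + z * se


if __name__ == "__main__":
    print(lr_test(loglik_reduced=-100.0, loglik_full=-97.0, df=2))  # hypothetical fits
    print(delta_ci_difference(a=2.9, b=0.3, var_a=0.02, var_b=0.05, cov_ab=-0.015))
```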

Investigations of heterogeneity

We explored the following study characteristics as sources of heterogeneity.

  • CT‐scanner generation: number of detector rows fewer than 16 versus equal to or greater than 16.

  • Assessment by senior radiologist versus another individual.

  • Participants with intermediate suspicion of appendicitis due to an equivocal presentation versus participants with any suspicion of appendicitis (in the protocol, this analysis was planned as a sensitivity analysis of studies in participants with intermediate suspicion).

Sensitivity analyses

We performed sensitivity analyses to explore the effects of methodological quality on summary estimates of sensitivity and specificity. We implemented these analyses as a subgroup analysis in studies with low risk of bias across the four domains in QUADAS‐2 (in the protocol, it was planned to investigate the impact of each of the four domains in meta‐regression analyses).

We also performed a sensitivity analysis to explore whether inclusion of studies with a mix of paediatric and adult participants affected the summary estimates. Moreover, we explored whether summary estimates were affected by the inclusion of studies that used laparoscopic assessment of the appendix as a reference standard. Finally, we explored the impact of selecting different analyses from paired studies that reported two or more analyses in the same study population. These analyses were not planned in the protocol.

Assessment of reporting bias

We performed no assessment of reporting bias.

Results

Results of the search

Through our electronic search of MEDLINE and Embase, we identified 9841 references; 2762 of these were duplicates. Science Citation Index provided one additional reference. We excluded 6606 irrelevant references after reading titles and abstracts, and we collected the full text of 474 articles for further assessment. Of these, 236 did not report a diagnostic accuracy study of CT in persons with suspected appendicitis, and we excluded 174 for the reasons stated in Figure 1. Sixty‐four studies complied with the selection criteria, and these studies provided data for the review. We contacted the corresponding authors of 26 studies; ten replied, and nine provided supplementary information (Holloway 2003; Jo 2010; Keyzer 2004; Ozturk 2014; Repplinger 2015; Scott 2015; Sim 2013; Tan 2015; Uzunosmanoglu 2017).
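The counts reported for the search and selection process can be checked against each other. The short sketch below simply redoes the arithmetic using the numbers given in the paragraph above; the variable names are illustrative.

```python
# Counts as reported in the results of the search.
identified = 9841               # references from MEDLINE and Embase
duplicates = 2762
from_citation_index = 1         # additional reference from Science Citation Index
excluded_title_abstract = 6606
not_accuracy_study = 236        # full texts not reporting a diagnostic accuracy study of CT
excluded_other_reasons = 174    # full texts excluded for the reasons stated in Figure 1

# References left for full-text assessment after screening titles and abstracts.
full_text_assessed = identified - duplicates + from_citation_index - excluded_title_abstract

# Studies remaining after full-text exclusions.
included = full_text_assessed - not_accuracy_study - excluded_other_reasons

print(full_text_assessed, included)
```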


Study flow diagram. CT: computed tomography. US: ultrasonography.

Characteristics of included studies

In 55 of the 64 included studies, the outcome of a single CT‐protocol was compared to the result of the reference standard. Six studies randomly allocated participants to have one of two CT‐protocols (Hekimoglu 2011; Kepner 2012; Keyzer 2009; Kim 2012; Mittal 2004), or to have one of three CT‐protocols (Hershko 2007). Four studies compared different CT‐protocols in the same participants: three studies compared two protocols (Jacobs 2001; Keyzer 2004; Platon 2009), and one study compared four CT‐protocols in each of two randomised groups (Keyzer 2009). Hence, the review includes 80 analyses of accuracy from 71 separate study populations with a total of 10,280 participants (4583 with and 5697 without acute appendicitis). The median number of participants in the 71 separate study populations was 100, with interquartile range 65 to 157, and range 26 to 738.

All studies were reported in full‐text publications except two. One was published as a letter to the editor (Cougard 2002), and the other was published as a conference abstract (Repplinger 2015). The authors of the latter provided an unpublished full‐text manuscript (Repplinger 2018). The publication language was English in 58 studies, French in two studies, and Spanish, Turkish, Russian, and German in one study each. The studies were performed in 22 countries; 30 studies were performed in the USA. Three studies were multi‐centre studies conducted at two (in't Hof 2004), two (Kim 2008), and six participating centres (Atema 2015).

The accuracy of CT was compared to the accuracy of US in 13 studies, to clinical decision rules or clinical assessments in nine studies, to MRI in one study, and to CT conditional on US results in one study. These were randomised trials or paired diagnostic accuracy studies.

Settings and features of the study populations

The clinical settings were emergency departments, general surgery departments, and radiology departments in 34, one, and 15 studies, respectively. In 14 studies, the setting was unclear. All studies were performed in secondary or tertiary care hospitals. Among the 71 separate study populations, the median prevalence of appendicitis was 0.43, with interquartile range 0.32 to 0.58, and range 0.13 to 0.92. The gender distribution was reported for 67 study populations, and the median percentage of women was 55%, with interquartile range 49% to 61%, and range 26% to 100%. The median or mean age of study participants was available for 59 study populations, and the median of these was 33 years, with interquartile range 30 to 38 years, and range 25 to 46 years. Participants younger than 15 years of age were included in 30 study populations. The percentage of paediatric participants was available for five of these populations; it ranged from 3% to 15%. The authors of one study provided subgroup results for participants aged 15 years or older (Sim 2013). All participants were 15 years of age or older in 39 study populations, and two studies provided no information about the age distribution (Holloway 2003; Megibow 2002). Based on available information, we considered it most likely that the latter two studies included adults or a mix of adults and children.

No study reports mentioned that a course of antibiotic therapy was used as an alternative to surgery, or that antibiotic therapy in participants with a negative CT result was a reason for exclusion.

CT‐scanners and CT‐protocols

A single CT‐scanner was used in 50 studies, two were used in 12 studies, three were used in one study, and six were used in a multi‐centre study at six centres (Atema 2015). Hence, overall 83 CT‐scanners were used in the included studies. Of these, 68 were helical, seven were non‐helical, and eight were not described as helical or non‐helical. Of the 68 helical CT‐scanners, 22 were single detector row devices, 35 were multi‐detector row devices, and it was unclear for 11 CT‐scanners if they were single or multi‐detector row devices. For the multi‐detector row CT‐scanners, the number of detector rows was 2, 4, 16, 64, 128, 256, and unclear for 1, 7, 10, 6, 3, 2, and 6 scanners, respectively. The entire abdomen and pelvis was included in the CT‐scan in 34 study populations, whereas the scan included only the lower abdomen and pelvis in 29 study populations. The field of view was not reported for eight study populations. Additional details about CT‐protocols are presented in Table 1. We have described the use of contrast enhancement and low‐dose protocols below under subgroup analyses.

Table 1. Components of CT‐protocols in the 64 included studies

| CT‐protocol component | Category | Number of studies |
|---|---|---|
| Slice thickness (mm) | 0.6‐2.9 | 6 |
| | 3.0‐4.9 | 9 |
| | 5.0‐6.9 | 36 |
| | 7.0‐10.0 | 4 |
| | not stated | 9 |
| Slice interval (mm) | 0.6‐2.9 | 6 |
| | 3.0‐4.9 | 5 |
| | 5.0 | 16 |
| | 10.0 | 1 |
| | not stated | 36 |
| Voltage (kV) | 120 | 21 |
| | 140 | 4 |
| | 200 | 1 |
| | not stated | 38 |
| mAs product (mAs) | 30‐100 | 4 |
| | 100‐199 | 5 |
| | 200‐299 | 5 |
| | ≥ 300 | 4 |
| | not stated | 46 |

CT: computed tomography.
Atema 2015 was a multi‐centre study including six centres. The most commonly used CT‐protocol specified the following: slice thickness 3 mm; voltage 120 kV; and mAs product 165 mAs. These values are used in the table.

Methodological quality of included studies

The outcome of our assessment of methodological quality is described below and is summarised in Figure 2 and Figure 3. None of the included studies were high‐quality studies, defined as studies with low risk of bias for all four domains. Three studies had low risk of bias for three domains (in't Hof 2004; Keyzer 2009; Pakaneh 2008). Fifteen studies had high or unclear risk of bias for all four domains. Insufficient reporting, defined as one or more domains with unclear risk of bias, was noted in 52 studies. Our assessments of the signalling questions for each study are presented under Characteristics of included studies.


Risk of bias and applicability concerns graph: review authors' judgements about each domain presented as percentages across included studies.


Risk of bias and applicability concerns summary: review authors' judgements about each domain for each included study.

Domain 1: patient selection

A consecutive or a random sample of persons was enrolled in 24 studies, and inappropriate exclusions were avoided in 32 studies. Fifteen studies complied with both of these signalling questions and were considered to have low risk of bias for the patient selection domain. Both signalling questions were scored as unclear for 17 studies. As regards applicability, we considered the study population to represent an unselected sample of persons with suspected appendicitis in seven studies, whereas this was not so for 10 studies. In 47 studies, it was unclear if the study population was representative.

Domain 2: index test

In 58 studies, the CT‐scan was evaluated without knowledge of the reference standard. This information was unclear in six studies. The criteria for the CT diagnosis of appendicitis were prespecified in 48 studies. This was not done in 13 studies and was unclear in three studies. We assessed the risk of bias introduced by execution and interpretation of CT‐scans as low, high, and unclear in 44, 12, and 8 studies, respectively.

The description of the CT‐scanner (manufacturer, model name, helical vs non‐helical, number of detector rows) and the CT‐protocol (use of contrast enhancement, low dose vs standard dose, slice thickness, slice interval, voltage and mAs product, use of multi‐planar reconstruction) was adequate in 19 studies and was inadequate in 44 studies, whereas it was unclear for one study. In 11 studies, it was explicitly stated that coronal and/or sagittal reformations were used in the assessments.

The features included in the CT analyses were reported in 52 studies. The six most common features were appendix diameter (41 studies, diameter > 6 mm in 34 studies), periappendicular inflammation (40 studies), appendicolith (29 studies), abscess or phlegmon (19 studies), thickened or layered appendix wall (13 studies), and periappendiceal free fluid (13 studies).

The incorporation of equivocal CT assessments in the analyses was reported by 19 studies. Equivocal CT assessments were counted as positive for appendicitis in six studies, negative in eight, and excluded from analyses in two. Other incorporations were used in three studies. Results were based on initial assessment of the CT‐scan in 31 studies; this was not so in 18 studies and was unclear in 15 studies. Overall, our concern regarding applicability of the execution and interpretation of CT‐scans was high, low, and unclear for 46, 10, and 8 studies, respectively.

Domain 3: reference standard

A single reference standard was used in six studies in which all participants had surgery. Among these studies, histological examination of the resected appendix was performed in three (Pakaneh 2008; Uzunosmanoglu 2017; Wong 2002), intraoperative findings were used in two (Gamanagatti 2007; in't Hof 2004), and it is unclear if the reference standard was based on intraoperative findings or on histological assessments of resected appendices in one (Nemsadze 2009). In another study, all participants had surgery, but macroscopically normal looking appendices were left in situ if participants had a laparoscopy, hence the reference standard was macroscopic findings during laparoscopy combined with histological examination of removed appendices (Poortman 2003). In the remaining 57 studies, only a subset of participants had surgery with or without appendicectomy. Various follow‐up regimens were used as a reference standard in those who did not have surgery. These regimens were highly heterogeneous and ranged from checking hospital records for readmission to using systematic and standardised regimens including one or more telephone interviews, mailed questionnaires, or outpatient consultations within a predefined time frame. Telephone interviews, mailed questionnaires, outpatient visits, and review of medical records were conducted in 27, 3, 7, and 14 studies, respectively. Some studies used more than one of these methods for follow‐up. The follow‐up interval after CT or discharge was reported in 40 studies: it was up to one month, one to three months, and four or more months in 11, 15, and 8 studies, respectively. In six studies, the upper limit of the follow‐up interval was not reported.

In our assessment, the reference standard was likely to correctly classify participants as having or not having acute appendicitis in 22 studies; this was not the case in 29 studies, and it was unclear in 13 studies. Inadequate or insufficiently described follow‐up was the reason that 42 studies did not comply with our criteria for correct classification.

In 24 of these 42 studies, follow‐up methods as well as follow‐up intervals were inadequate or were not reported. In three studies, the follow‐up interval was within 31 days, which was the longest duration we accepted, but the follow‐up method was inadequate (checking for readmissions, reviewing hospital records, or method not stated). In the remaining 15 studies, the method of follow‐up was adequate, but the follow‐up interval was not; length of follow‐up after CT was within three months in five studies, was longer than three months in six, and was not stated in four.

Histological evaluations, intraoperative findings, and results of follow‐up were assessed without knowledge of the CT outcome in two studies (in't Hof 2004; Keyzer 2009). In 59 studies, this information was unclear, and in three studies, the reference standard included intraoperative assessment of the appendix by an unblinded surgeon (Gamanagatti 2007; Jacobs 2001; Platon 2009).

Overall, there was low risk of bias in the reference standard domain for two studies (in't Hof 2004; Keyzer 2009), high risk for 30 studies, and unclear risk for 32 studies. Our concern regarding applicability of the reference standard was low for the 58 studies with differential verification because the reference standard in these studies reflects clinical practice wherein only some persons with suspected appendicitis have surgery.

Domain 4: flow and timing

More than 95% of participants received a reference standard in 44 studies. This assessment was liberal, as it was often difficult to determine if participants scheduled for follow‐up had received follow‐up as intended. The choice of reference standard was considered independent of the CT result in eight studies; in seven of these, all participants had surgery. In five studies, it was unclear if the reference standard was independent of CT outcome. As stated above, all participants received the same reference test in six studies.

All participants with a CT diagnosis of appendicitis had surgery in 21 studies. In 18 studies, a few participants with a CT diagnosis of appendicitis were followed up. Likewise, all participants without CT signs of appendicitis were followed up in three studies, whereas a few participants without CT signs of appendicitis had surgery in 46 studies.

All participants were included in the analyses in 50 studies; in 13 studies they were not, and in one study this was unclear. Participants were excluded from the analyses because they did not have surgery (three studies), because they were lost to follow‐up (four studies), because CT findings were inconclusive (three studies), or for other reasons (three studies).

In our assessment, there was low risk of bias in the flow and timing domain for three studies (in't Hof 2004; Pakaneh 2008; Wong 2002), risk was high for 60 studies, and risk was unclear for one study.

Findings

Overall, the diagnostic accuracy of CT was reported for 71 separate study populations in the 64 included studies. Estimates of sensitivity ranged from 0.72 to 1.0, and estimates of specificity from 0.5 to 1.0. Sensitivity and specificity were higher than 0.90 in 40 study populations. The forest plot is presented in Figure 4, and the summary ROC plot in Figure 5. In the overall meta‐analysis of results from the 71 study populations, summary sensitivity was 0.95 (95% confidence interval (CI) 0.93 to 0.96), and summary specificity was 0.94 (95% CI 0.92 to 0.95). The summary positive likelihood ratio was 15 (95% CI 12 to 19), and the summary negative likelihood ratio was 0.05 (95% CI 0.04 to 0.07). At the median appendicitis prevalence of 0.43, the probability of appendicitis following a positive and a negative CT result was 0.92 (95% CI 0.90 to 0.94) and 0.04 (95% CI 0.03 to 0.05), respectively. At the 25th percentile prevalence of 0.32, the probability following a positive and a negative CT result was 0.70 (95% CI 0.65 to 0.74) and 0.01 (95% CI 0.01 to 0.01), respectively. At the 75th percentile prevalence of 0.58, the probability following a positive and a negative CT result was 0.96 (95% CI 0.94 to 0.96) and 0.07 (95% CI 0.05 to 0.09), respectively.
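The post‐test probabilities at the median prevalence can be approximated with Bayes' theorem in odds form: the pre‐test odds multiplied by the likelihood ratio give the post‐test odds. The following is a minimal sketch that assumes the rounded summary likelihood ratios (15 and 0.05) and the median prevalence (0.43) reported above. The function name `post_test_probability` is illustrative and not taken from the review, and the review's own estimates and confidence intervals come from its meta‐analytic model rather than from this shortcut.

```python
def post_test_probability(prevalence: float, likelihood_ratio: float) -> float:
    """Convert a pre-test probability to a post-test probability via odds."""
    pre_test_odds = prevalence / (1.0 - prevalence)
    post_test_odds = pre_test_odds * likelihood_ratio
    return post_test_odds / (1.0 + post_test_odds)


# Rounded summary values quoted in the text (assumed inputs for this sketch).
MEDIAN_PREVALENCE = 0.43
LR_POSITIVE = 15.0
LR_NEGATIVE = 0.05

# Probability of appendicitis after a positive and after a negative CT result.
p_after_positive = post_test_probability(MEDIAN_PREVALENCE, LR_POSITIVE)
p_after_negative = post_test_probability(MEDIAN_PREVALENCE, LR_NEGATIVE)

print(f"After positive CT: {p_after_positive:.2f}")
print(f"After negative CT: {p_after_negative:.2f}")
```

Because the inputs are rounded, results from this calculation may differ slightly from the values reported in the review.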


Figure 4. Forest plot: CT regardless of contrast enhancement and radiation dose.


Figure 5. Summary ROC plot of CT for diagnosis of acute appendicitis (any contrast enhancement and radiation dose). The hollow symbols represent the pairs of sensitivity and specificity from the included studies; the symbols are scaled according to sample sizes of the studies. The solid circle represents the summary sensitivity and specificity. This summary point is surrounded by a 95% prediction region (interrupted line).

Comparative subgroup analyses according to contrast enhancement and radiation dose

Unenhanced CT was evaluated in 19 study populations, and CT with IV, rectal, oral, and IV+oral contrast enhancement was evaluated in 18, 9, 7, and 15 study populations, respectively. Summary sensitivity varied between 0.89 (95% CI 0.81 to 0.94) and 0.97 (95% CI 0.93 to 0.99) across subgroups defined by the use of contrast enhancement, and summary specificity varied from 0.93 (95% CI 0.90 to 0.95) to 0.95 (95% CI 0.90 to 0.98). Summary sensitivity was lowest for CT with oral contrast (0.89, 95% CI 0.81 to 0.94) and unenhanced CT (0.91, 95% CI 0.87 to 0.93), whereas the variation was marginal between CT with IV contrast, rectal contrast, and IV+oral contrast. These results correspond with the finding of lower sensitivity but similar specificity in three studies comparing CT with oral contrast enhancement to CT with IV+oral contrast enhancement using a paired or a randomised design (Jacobs 2001; Kepner 2012; Keyzer 2009) (Table 2). Likewise, sensitivity of unenhanced CT was lower than sensitivity of CT with any type of contrast enhancement in two studies with a paired or a randomised design (Hershko 2007; Keyzer 2009).

Low‐dose protocols were evaluated in eight study populations. Summary sensitivity and specificity for low‐dose CT were 0.94 (95% CI 0.90 to 0.97) and 0.94 (95% CI 0.91 to 0.96), respectively. These estimates were similar to summary estimates in the overall meta‐analysis. This finding corresponds closely with the findings in four studies with direct comparisons of low‐dose and standard‐dose CT (Keyzer 2004; Keyzer 2009; Kim 2012; Platon 2009) (Table 3).

Results of the subgroup analyses are summarised in Table 4, presented graphically in Figure 6, Figure 7, Figure 8, Figure 9, and Figure 10, and described below. In addition to the types of contrast enhancement covered by the subgroup analyses, CT with oral+rectal contrast was evaluated in three study populations (Funaki 1998; Kan 2001; Rao 1997), and CT with IV+oral+rectal contrast was evaluated in one study (Mittal 2004). Several types of contrast enhancement were used in three study populations, and results from these populations were excluded from the subgroup analyses (Nemsadze 2009; Pickuth 2001; Weltman 2000). In the protocol, some of the subgroup analyses were planned as sensitivity analyses (see Differences between protocol and review).


Figure 6. Summary ROC plot of CT with intravenous contrast enhancement versus unenhanced CT. See the caption for Figure 5 for a description of symbols and lines.


Figure 7. Summary ROC plot of CT with rectal contrast enhancement versus unenhanced CT. See the caption for Figure 5 for a description of symbols and lines.


Figure 8. Summary ROC plot of CT with oral contrast enhancement versus unenhanced CT. See the caption for Figure 5 for a description of symbols and lines.


Figure 9. Summary ROC plot of CT with intravenous and oral contrast enhancement versus unenhanced CT. See the caption for Figure 5 for a description of symbols and lines.


Figure 10. Summary ROC plot of low‐dose versus standard‐dose CT. See the caption for Figure 5 for a description of symbols and lines.

Table 2. Results from studies comparing different types of contrast enhancement using a randomised or a paired design

Study

Design

Sensitivity/specificity according to type of contrast enhancement

None

IV

Oral

Rectal

IV+oral

IV+oral+rectal

Hekimoglu 2011

Randomised

0.77/0.93

0.97/0.99

Hershko 2007

Randomised

0.90/0.86

0.95/0.92

1.00/0.88

Kepner 2012

Randomised

1.00/0.99

1.00/0.95

Mittal 2004

Randomised

1.00/1.00

0.98/0.50

Keyzer 2009

Randomised & paired

0.75/0.93

0.85/0.98

0.85/0.96

1.00/0.98

Jacobs 2001

Paired

0.76/0.94

0.92/0.95

Results for the standard‐dose CT‐protocols.

Table 3. Results from studies comparing low‐dose and standard‐dose CT‐protocols using a randomised or a paired design

| Study | Design | Contrast enhancement | Sensitivity/specificity, low‐dose protocol | Sensitivity/specificity, standard‐dose protocol |
|---|---|---|---|---|
| Kim 2012 | RCT | Intravenous | 0.95/0.93 | 0.95/0.94 |
| Keyzer 2004ᵃ | Paired | Unenhanced | 0.97–1.00/0.80–0.94 | 0.97–1.00/0.82–0.94 |
| Keyzer 2009ᵇ | Paired | Unenhanced | 0.80–0.85/0.91–0.93 | 0.75–0.75/0.93–0.93 |
| | | Intravenous | 0.70–0.80/1.0–1.0 | 0.85–0.85/0.98–0.98 |
| | | Oral | 0.85–1.0/0.88–0.96 | 0.85–0.92/0.96–0.96 |
| | | Oral and intravenous | 0.85–1.0/0.96–0.98 | 1.0–1.0/0.96–1.0 |
| Platon 2009 | Paired | Oral (low dose); oral and intravenous (standard dose) | 0.95/0.96 | 1.0/0.96 |

CT: computed tomography.
RCT: randomised controlled trial.
ᵃ Results are given as the range of sensitivity and specificity for the four participating radiologists.
ᵇ Results are given as the range of sensitivity and specificity for the two participating radiologists.

Table 4. Subgroup analyses according to type of contrast enhancement and radiation dose

| Subgroup by enhancement and dose | Number of analyses (studies)ᵃ | Sensitivity (95% CI) | Specificity (95% CI) | Positive likelihood ratio (95% CI) | Negative likelihood ratio (95% CI) | Absolute difference in sensitivity (95% CI) | Absolute difference in specificity (95% CI) |
|---|---|---|---|---|---|---|---|
| Unenhanced | 19 (19) | 0.91 (0.87 to 0.93) | 0.94 (0.90 to 0.96) | 15 (9 to 24) | 0.10 (0.07 to 0.14) | | |
| IV contrast | 18 (17) | 0.96 (0.92 to 0.98) | 0.93 (0.90 to 0.95) | 14 (9 to 20) | 0.04 (0.02 to 0.09) | 0.04ᵇ (0.00 to 0.09) | −0.01ᵇ (−0.04 to 0.03) |
| IV and oral contrast | 15 (15) | 0.96 (0.93 to 0.98) | 0.94 (0.92 to 0.96) | 17 (12 to 26) | 0.04 (0.02 to 0.07) | 0.05ᵇ (0.01 to 0.09) | 0.01ᵇ (−0.03 to 0.04) |
| Rectal contrast | 9 (9) | 0.97 (0.93 to 0.99) | 0.95 (0.90 to 0.98) | 21 (9 to 51) | 0.04 (0.02 to 0.08) | 0.05ᵇ (0.01 to 0.09) | 0.01ᵇ (−0.03 to 0.06) |
| Oral contrast | 7 (7) | 0.89 (0.81 to 0.94) | 0.94 (0.90 to 0.97) | 16 (9 to 29) | 0.11 (0.06 to 0.21) | −0.01ᵇ (−0.08 to 0.6) | 0.01ᵇ (−0.03 to 0.05) |
| Standard dose | 67 (64) | 0.95 (0.93 to 0.96) | 0.94 (0.92 to 0.95) | 15.6 (12.3 to 19.7) | 0.05 (0.04 to 0.07) | | |
| Low dose | 8 (7) | 0.94 (0.90 to 0.97) | 0.94 (0.91 to 0.96) | 16 (10 to 24) | 0.06 (0.03 to 0.11) | 0.00ᶜ (−0.04 to 0.05) | 0.00ᶜ (−0.04 to 0.03) |
| Overall | 71 | 0.95 (0.93 to 0.96) | 0.94 (0.92 to 0.95) | 15 (12 to 19) | 0.05 (0.04 to 0.07) | | |

CI: confidence interval.
IV: intravenous.
ᵃ Randomised and paired studies provided two or more analyses.
ᵇ Absolute difference compared to unenhanced CT.
ᶜ Absolute difference compared to standard‐dose CT.

Unenhanced CT

Estimates of sensitivity and specificity for unenhanced CT were available for 19 study populations reported in 19 studies. Two studies reported results for unenhanced standard‐dose CT and unenhanced low‐dose CT in the same participants (Keyzer 2004; Keyzer 2009). Results for standard‐dose CT were selected for this analysis. The median prevalence of appendicitis in these populations was 0.39, with interquartile range 0.36 to 0.72, and range 0.22 to 0.92. Estimates of sensitivity ranged from 0.75 to 0.97, and estimates of specificity ranged from 0.75 to 1.0. The summary sensitivity was 0.91 (95% CI 0.87 to 0.93), and the summary specificity was 0.94 (95% CI 0.90 to 0.96).

CT with intravenous contrast enhancement

Estimates of sensitivity and specificity for CT with IV contrast enhancement were available for 18 study populations reported in 17 studies. One study provided results for standard‐dose CT and low‐dose CT in the same study population (Keyzer 2009). Results for standard‐dose CT were selected for this analysis. The median prevalence of appendicitis in these populations was 0.44, with interquartile range 0.36 to 0.57, and range 0.18 to 0.77. Estimates of sensitivity ranged from 0.72 to 1.0, and estimates of specificity from 0.64 to 1.0. The summary sensitivity was 0.96 (95% CI 0.92 to 0.98), and the summary specificity was 0.93 (95% CI 0.90 to 0.95).

Meta‐regression analyses showed a trend for higher summary sensitivity for CT with IV contrast enhancement compared to unenhanced CT (0.96, 95% CI 0.92 to 0.98 vs 0.90, 95% CI 0.87 to 0.93) (likelihood ratio test, Chi² = 3.35, 1 df, P = 0.07). There was no statistically significant difference for summary specificity (0.93, 95% CI 0.90 to 0.95 vs 0.94, 95% CI 0.90 to 0.96) (likelihood ratio test, Chi² = 0.20, 1 df, P = 0.66) (Figure 6).
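The P values quoted for the likelihood ratio tests can be checked approximately from the reported Chi² statistics and degrees of freedom. The sketch below assumes only that the test statistic follows a chi‐squared distribution under the null hypothesis; it uses closed‐form survival functions for 1 and 2 degrees of freedom so that no external library is needed. The function name `lr_test_p_value` is hypothetical, and because the published Chi² values are rounded, recomputed P values can differ from the reported ones in the second decimal.

```python
import math


def lr_test_p_value(chi2: float, df: int) -> float:
    """Upper-tail probability of a chi-squared statistic for df = 1 or 2."""
    if chi2 < 0:
        raise ValueError("chi2 must be non-negative")
    if df == 1:
        # For 1 df, chi2 is the square of a standard normal variable.
        return math.erfc(math.sqrt(chi2 / 2.0))
    if df == 2:
        # For 2 df, the chi-squared distribution is exponential with mean 2.
        return math.exp(-chi2 / 2.0)
    raise ValueError("this sketch only covers df = 1 or df = 2")


# Chi² values for IV contrast versus unenhanced CT, as quoted in the text.
p_sensitivity = lr_test_p_value(3.35, 1)
p_specificity = lr_test_p_value(0.20, 1)

print(f"Sensitivity comparison: P = {p_sensitivity:.3f}")
print(f"Specificity comparison: P = {p_specificity:.3f}")
```

The same function applies to the joint tests of sensitivity and specificity reported with 2 df elsewhere in this section.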

CT with rectal contrast enhancement

Estimates of sensitivity and specificity for CT with rectal contrast enhancement were available for nine independent study populations reported in nine studies. The median prevalence of appendicitis in these populations was 0.51, with interquartile range 0.45 to 0.56, and range 0.32 to 0.92. Estimates of sensitivity ranged from 0.82 to 1.0, and estimates of specificity from 0.67 to 1.0. The summary sensitivity was 0.97 (95% CI 0.93 to 0.99), and the summary specificity was 0.95 (95% CI 0.90 to 0.98).

In meta‐regression analyses, summary sensitivity for CT with rectal contrast enhancement was statistically significantly higher than summary sensitivity for unenhanced CT (0.97, 95% CI 0.93 to 0.99 vs 0.90, 95% CI 0.87 to 0.93) (likelihood ratio test, Chi² = 5.78, 1 df, P = 0.02). There was no statistically significant difference for summary specificity (0.95, 95% CI 0.90 to 0.98 vs 0.94, 95% CI 0.90 to 0.96) (likelihood ratio test, Chi² = 0.27, 1 df, P = 0.61) (Figure 7).

CT with oral contrast enhancement

Estimates of sensitivity and specificity for CT with oral contrast enhancement were available for seven independent study populations reported in seven studies. One study provided results for standard‐dose CT and low‐dose CT in the same study population (Keyzer 2009), and we used the results for standard‐dose CT for this analysis. The median prevalence of appendicitis in these populations was 0.24, with interquartile range 0.20 to 0.40, and range 0.15 to 0.43. Estimates of sensitivity ranged from 0.76 to 1.0, and estimates of specificity from 0.86 to 1.0. The summary sensitivity was 0.89 (95% CI 0.81 to 0.94), and the summary specificity was 0.94 (95% CI 0.90 to 0.97).

Meta‐regression analyses showed no statistically significant difference between summary sensitivity or specificity for CT with oral contrast enhancement versus unenhanced CT (likelihood ratio test, Chi² = 0.46, 2 df, P = 0.80) (Figure 8).

CT with intravenous and oral contrast enhancement

Estimates of sensitivity and specificity for CT with IV and oral contrast enhancement were available for 15 independent study populations reported in 15 studies. Again, one study provided results for standard‐dose CT and low‐dose CT in the same study population (Keyzer 2009), and we used the results for standard‐dose CT for this analysis. The median prevalence of appendicitis in these populations was 0.36, with interquartile range 0.30 to 0.51, and range 0.18 to 0.64. Estimates of sensitivity ranged from 0.80 to 1.0, and estimates of specificity from 0.83 to 0.99. The summary sensitivity was 0.96 (95% CI 0.93 to 0.98), and the summary specificity was 0.94 (95% CI 0.92 to 0.96).

In meta‐regression analyses, summary sensitivity for CT with intravenous and oral contrast enhancement was statistically significantly higher than summary sensitivity for unenhanced CT (0.96, 95% CI 0.93 to 0.98 vs 0.90, 95% CI 0.87 to 0.93) (likelihood ratio test, Chi² = 6.85, 1 df, P = 0.01). There was no statistically significant difference for summary specificity (0.94, 95% CI 0.92 to 0.96 vs 0.94, 95% CI 0.90 to 0.96) (likelihood ratio test, Chi² = 0.23, 1 df, P = 0.63) (Figure 9).

Low‐dose CT regardless of contrast enhancement

Estimates of sensitivity and specificity for low‐dose CT were available for eight independent study populations reported in seven studies. The study that contributed two study populations was a randomised study that reported results for low‐dose CT with no contrast and IV contrast enhancement in one group, and for oral contrast and oral+IV contrast enhancement in the other group. For this analysis, we selected intravenous contrast enhancement from the first group and oral contrast enhancement from the other. In the remaining six study populations, IV, oral, and no contrast enhancement were used in three, one, and two studies, respectively. The median prevalence of appendicitis in the eight populations was 0.38, with interquartile range 0.30 to 0.41, and range 0.20 to 0.53. Estimates of sensitivity ranged from 0.75 to 0.98, and estimates of specificity from 0.85 to 1.0. The summary sensitivity was 0.94 (95% CI 0.90 to 0.97), and the summary specificity was 0.94 (95% CI 0.91 to 0.96).

Meta‐regression analyses showed no statistically significant difference between summary sensitivity or specificity for low‐dose versus standard‐ or unspecified‐dose CT (likelihood ratio test, Chi² = 0.21, 2 df, P = 0.90) (Figure 10).

Post‐test probabilities, summary likelihood ratios, and absolute differences in summary sensitivity and specificity for the subgroup analyses described above are presented in the summary of findings table and in Table 4.
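Likelihood ratios are functions of sensitivity and specificity: the positive likelihood ratio is sensitivity / (1 − specificity), and the negative likelihood ratio is (1 − sensitivity) / specificity. As a rough illustration, the sketch below applies these definitions to the rounded overall summary estimates (0.95 and 0.94). The function name `likelihood_ratios` is illustrative; the review's summary likelihood ratios are derived from its meta‐analytic model, so values computed from rounded point estimates are expected to agree only approximately.

```python
from typing import Tuple


def likelihood_ratios(sensitivity: float, specificity: float) -> Tuple[float, float]:
    """Return (positive LR, negative LR) for a given sensitivity and specificity."""
    if not (0.0 < specificity <= 1.0) or specificity == 1.0:
        raise ValueError("specificity must be strictly between 0 and 1")
    lr_positive = sensitivity / (1.0 - specificity)
    lr_negative = (1.0 - sensitivity) / specificity
    return lr_positive, lr_negative


# Rounded overall summary estimates quoted in the text (assumed inputs).
lr_pos, lr_neg = likelihood_ratios(sensitivity=0.95, specificity=0.94)

print(f"Positive likelihood ratio: {lr_pos:.1f}")
print(f"Negative likelihood ratio: {lr_neg:.3f}")
```

The computed values are close to the reported summary likelihood ratios of 15 and 0.05.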

Investigation of heterogeneity

Influence of CT‐scanner generation

A non‐helical CT‐scanner or a helical CT‐scanner with less than 16‐detector row technology was used in 32 studies (36 study populations), and summary sensitivity and specificity were 0.94 (95% CI 0.91 to 0.95) and 0.93 (95% CI 0.91 to 0.94), respectively. A helical CT‐scanner with 16‐detector row or higher technology was used in 15 studies (18 study populations), and summary sensitivity and specificity were 0.97 (95% CI 0.95 to 0.98) and 0.94 (95% CI 0.91 to 0.96), respectively. In meta‐regression analyses, summary sensitivity was statistically significantly higher for the latter group than for the former (likelihood ratio test, 1 df, Chi² = 5.23, P = 0.02). There was no statistically significant difference for summary specificity between groups (likelihood ratio test, 1 df, Chi² = 0.24, P = 0.63) (Figure 11). The number of detector rows was not stated in 17 studies (17 study populations).


Figure 11. Exploration of heterogeneity: influence of CT‐scanner generation (CT with 16 detector rows or higher vs CT with fewer than 16 detector rows). See the caption for Figure 5 for a description of symbols and lines.

Influence of radiologists' experience

Senior radiologists evaluated CT‐scans in 27 studies (31 study populations), in‐training radiologists evaluated CT‐scans in three studies (five study populations), and CT‐scans were evaluated by senior or in‐training radiologists in 15 studies (16 study populations). The radiologists' experience was not reported in 19 studies (19 study populations). Summary estimates of sensitivity and specificity were as follows for the three groups.

  • Senior radiologists: 0.97 (95% CI 0.95 to 0.98) and 0.95 (95% CI 0.93 to 0.97), respectively.

  • In‐training radiologists: 0.92 (95% CI 0.80 to 0.97) and 0.91 (95% CI 0.86 to 0.94), respectively.

  • Senior or in‐training radiologists: 0.93 (95% CI 0.89 to 0.95) and 0.93 (95% CI 0.90 to 0.96), respectively.

In meta‐regression analyses, we pooled in‐training radiologists with senior or in‐training radiologists. In these analyses, summary sensitivity was statistically significantly higher in study populations with senior radiologists' evaluations (likelihood ratio test, Chi² = 8.01, 1 df, P = 0.01). Summary specificity was also higher in study populations with senior radiologists' evaluations but was not significantly higher (likelihood ratio test, Chi² = 2.21, 1 df, P = 0.14) (Figure 12).


Figure 12. Exploration of heterogeneity: influence of radiologists' experience. See the caption for Figure 5 for a description of symbols and lines.

Influence of pretest degree of suspicion of appendicitis

Participants with intermediate suspicion of appendicitis were recruited in 24 studies (25 study populations), participants with any suspicion were recruited in 18 studies (20 study populations), and participants with a high degree of suspicion were included in four studies (four study populations). The degree of suspicion was unclear in 18 studies (22 study populations). Summary estimates of sensitivity and specificity for the first two mentioned groups were as follows.

  • Intermediate suspicion: 0.96 (95% CI 0.93 to 0.97) and 0.94 (95% CI 0.91 to 0.96), respectively.

  • Any suspicion: 0.94 (95% CI 0.91 to 0.96) and 0.94 (95% CI 0.90 to 0.96), respectively.

There was no difference in the prevalence of appendicitis between studies recruiting participants with intermediate and any suspicion of appendicitis. Median and interquartile ranges were 0.47 (0.35 to 0.58) and 0.44 (0.34 to 0.64), respectively.

In meta‐regression analyses, we found no statistical evidence of a difference in summary sensitivity or specificity between study populations including participants with intermediate and any suspicion of appendicitis (likelihood ratio test, Chi² = 1.78, 2 df, P = 0.41). This did not change when we included data from all study populations in the analysis and grouped studies with any, high, and unclear degree of suspicion (likelihood ratio test, Chi² = 1.08, 2 df, P = 0.58) (Figure 13).


Figure 13. Exploration of heterogeneity: influence of pre‐test suspicion of appendicitis. See the caption for Figure 5 for a description of symbols and lines.

Sensitivity analyses

The analyses in this section differ from those planned in the protocol (see Differences between protocol and review).

Influence of methodological quality

Domains 1 and 2 (patient selection and index test)

Summary sensitivity and specificity for 18 study populations with low risk of bias for domain 1 were 0.94 (95% CI 0.91 to 0.96) and 0.94 (95% CI 0.91 to 0.96), respectively. Likewise, summary sensitivity and specificity for 50 study populations with low risk of bias for domain 2 were 0.94 (95% CI 0.92 to 0.96) and 0.95 (95% CI 0.93 to 0.96), respectively. These estimates were hardly different compared to the overall summary estimates of sensitivity (0.95) and specificity (0.94).

Domains 3 and 4 (reference standard and flow and timing)

Risk of bias was scored as low in two studies (three study populations) for domain 3 and in three studies (three study populations) for domain 4. This was insufficient for meta‐analysis.

Other sensitivity analyses

In the overall meta‐analysis, it was necessary to select one of two or more analyses from four paired studies including five study populations (Jacobs 2001; Keyzer 2004; Keyzer 2009; Platon 2009). These studies compared the accuracy of different doses or enhancement protocols in the same participants. We performed a sensitivity analysis to assess the influence of selecting other analyses from these studies and found that summary sensitivity and specificity did not change (Table 5). From the study that presented more than two analyses, we selected results from the standard‐dose protocols. Likewise, two studies including three study populations compared the accuracy of two or more CT‐protocols at low and standard doses (Keyzer 2004; Keyzer 2009). Results for the standard‐dose protocols were selected in the subgroup meta‐analyses. In sensitivity analyses, we used the low‐dose protocol results from these studies instead and found no effects on summary estimates of sensitivity and specificity (Table 5).

Table 5. Sensitivity analysis ‐ effects of selecting results for other CT‐protocols in paired studies

| Subgroup by enhancement and dose | Number of analyses (studies) | Original analysis: sensitivity (95% CI) | Original analysis: specificity (95% CI) | Sensitivity analysis: sensitivity (95% CI) | Sensitivity analysis: specificity (95% CI) |
|---|---|---|---|---|---|
| Unenhanced | 19 (19) | 0.91 (0.87 to 0.93) | 0.94 (0.90 to 0.96) | 0.91 (0.88 to 0.94) | 0.94 (0.90 to 0.96) |
| Intravenous contrast | 18 (17) | 0.96 (0.92 to 0.98) | 0.93 (0.90 to 0.95) | 0.96 (0.91 to 0.98) | 0.93 (0.90 to 0.95) |
| Intravenous and oral contrast | 15 (15) | 0.96 (0.93 to 0.98) | 0.94 (0.92 to 0.96) | 0.96 (0.93 to 0.98) | 0.94 (0.92 to 0.96) |
| Oral contrast | 7 (7) | 0.89 (0.81 to 0.94) | 0.94 (0.90 to 0.97) | 0.90 (0.82 to 0.95) | 0.94 (0.90 to 0.96) |
| Low dose | 8 (7) | 0.94 (0.90 to 0.97) | 0.94 (0.91 to 0.96) | 0.95 (0.91 to 0.97) | 0.94 (0.91 to 0.96) |
| Overall | 71 (64) | 0.95 (0.93 to 0.96) | 0.94 (0.92 to 0.95) | 0.95 (0.93 to 0.96) | 0.94 (0.92 to 0.95) |

CI: confidence interval.
CT: computed tomography.

We also explored the potential effects of including studies with a mix of paediatric and adult participants. Participants younger than 15 years of age were included in 26 studies with 28 study populations, and it was unclear if two other studies with two study populations included paediatric participants; summary sensitivity and specificity for these 30 study populations were 0.95 (95% CI 0.93 to 0.97) and 0.94 (95% CI 0.91 to 0.95), respectively. In contrast, all participants were adults in 36 studies with 41 study populations; summary sensitivity and specificity for this subgroup were 0.95 (95% CI 0.92 to 0.96) and 0.94 (95% CI 0.92 to 0.95), respectively. Hence, the inclusion of studies with a mix of adult and paediatric participants appears to have no effect on the summary estimates.

Finally, we explored whether inclusion of five studies that used laparoscopic findings as the reference standard influenced summary estimates of sensitivity and specificity (Gamanagatti 2007; in't Hof 2004; Jacobs 2001; Platon 2009; Poortman 2003). These estimates did not change when we repeated the overall meta‐analysis and excluded results from the five studies.

Discussion

Summary of main results

The main results of this review are presented in the summary of findings table. We included 64 studies with results from 71 separate study populations. Summary sensitivity and specificity of computed tomography (CT) regardless of protocol were 0.95 (95% confidence interval (CI) 0.93 to 0.96) and 0.94 (95% CI 0.92 to 0.95), respectively. In subgroup analyses according to contrast enhancement, summary sensitivity was higher for CT with intravenous contrast (0.96, 95% CI 0.92 to 0.98), CT with rectal contrast (0.97, 95% CI 0.93 to 0.99), and CT with intravenous+oral contrast enhancement (0.96, 95% CI 0.93 to 0.98) as compared to unenhanced CT (0.91, 95% CI 0.87 to 0.93). Summary sensitivity of CT with oral contrast enhancement (0.89, 95% CI 0.81 to 0.94) was similar to summary sensitivity of unenhanced CT. Results showed no differences in summary specificity, which varied from 0.93 (95% CI 0.90 to 0.95) to 0.95 (95% CI 0.90 to 0.98) between subgroups. Summary sensitivity for low‐dose CT (0.94, 95% CI 0.90 to 0.97) was similar to summary sensitivity for standard‐ or unspecified‐dose CT (0.95, 95% CI 0.93 to 0.96). Summary specificity did not differ between low‐dose and standard‐ or unspecified‐dose CT.

In meta‐regression analyses, summary sensitivity was statistically significantly higher in studies using CT‐scanners with 16 or more detector rows, and in studies where CT‐scans were evaluated by senior radiologists. Summary specificity did not differ significantly between groups in these analyses. Results showed no statistically significant differences in summary sensitivity or specificity between studies that recruited participants with an intermediate suspicion of acute appendicitis due to an equivocal presentation and studies that recruited participants with any suspicion of appendicitis. The methodological quality of the included studies was generally poor, particularly for the reference test and the flow and timing domains.

Strengths and weaknesses of the review

The major strengths of this review are that we adhered to recommended review methods and performed an extensive search of the literature without language restrictions and filters to target diagnostic test accuracy studies. We included data from 64 studies and produced a comprehensive review of the accuracy of CT for appendicitis in adults. Because of challenges related to differential and partial verification in this area, we focused on prospective studies to limit potential bias from retrospective studies with missing reference standard outcomes in participants who did not have surgery. In subgroup analyses, we explored the accuracy of different CT‐protocols characterised by type of contrast enhancement and radiation dose. We also assessed the influence of CT‐scanner generation, radiologists' experience, disease spectrum, and methodological quality on summary estimates of sensitivity and specificity.

We noted several limitations in the review process. In some study reports, the reporting quality made it difficult to assess whether data collection was conducted prospectively or retrospectively. In most of these situations, we contacted the corresponding author and excluded the study if we received no reply. However, for some studies, our judgements may have been too liberal. In general, we accepted studies as having prospective data collection if study authors used the term 'prospective' or 'consecutive' to characterise the data collection, and if we found no clear‐cut evidence to suggest the contrary (i.e. statements that participants were selected from databases or registries). As in previous systematic reviews in this and related areas, we decided to exclude studies using retrospective data collection from registers and hospital records to reduce potential bias from partial verification (Al‐Khayal 2007; Ebell 2014; Terasawa 2004; van Randen 2008; Xiong 2015). Hospital records may not contain the necessary information, participants may be treated in other hospitals, and telephone follow‐up after, say, 12 months is unlikely to be successful for all participants. However, the basis for this decision could be questioned due to the low standards of follow‐up in the prospective studies included. Also, follow‐up in the included studies was often based on reviews of hospital records for alternative diagnoses and a check that appendicectomy was not performed during the follow‐up interval.

Among the 71 studies that we excluded due to retrospective data collection, participants were selected following an appendicectomy and preoperative CT in 28 studies. The prevalence of appendicitis is high and the proportion with a negative CT outcome is correspondingly low in such studies; it follows that resulting estimates of specificity are unlikely to be applicable to CT‐negatives in general. In another 38 of the retrospective studies, participants were selected from registries or databases. In most of these studies, follow‐up of participants who did not have surgery was based on review of hospital records for alternative diagnoses and readmission; however, in a few studies, telephone interviews were also performed, but the response rate generally was not reported. In addition, our adaptation of Quality Assessment of Studies of Diagnostic Accuracy ‐ Revised (QUADAS‐2) included a definition for an adequate follow‐up period, which lasted seven to 31 days. We admit this is arbitrary, but we maintain that length of follow‐up is important for assessing the quality of follow‐up. We believe that a follow‐up period of seven to 31 days is sufficiently long to capture missed cases and is sufficiently short that new events are not captured.

Another limitation was that we did not distinguish between uncomplicated and complicated acute appendicitis as separate target conditions. This distinction is becoming increasingly relevant with emerging evidence of antibiotic therapy as an alternative to surgery in persons with uncomplicated acute appendicitis, because selection of persons for antibiotic therapy depends on the finding of uncomplicated acute appendicitis on CT (Salminen 2015; Vons 2011). Misclassification of complicated appendicitis as uncomplicated is a likely explanation for failure of antibiotic therapy.

Finally, it was not feasible to contact the authors of 28 studies including paediatric participants with a request for subgroup results for participants older than 14 years of age. Instead we decided to include these studies and perform a sensitivity analysis that revealed no difference in summary sensitivity and specificity between studies with and without paediatric participants.

The major limitation of the included studies was poor methodological quality. However, the impact of low methodological quality appears to be negligible for the patient selection domain and the index test domain as there was practically no difference in summary estimates between the overall meta‐analysis and sensitivity analyses in studies with low risk of bias for these domains. Poor scorings in the reference standard domain and in the flow and timing domain were due to low quality of follow‐up and partial verification. Differential verification appears to be inevitable in accuracy studies of CT for acute appendicitis, and this increases the demand for rigorous follow‐up. In most studies, the majority of CT‐positive participants had surgery and CT‐negative participants generally had follow‐up because it was considered unethical to expose CT‐negative patients to surgery that was likely to be unnecessary. An important finding was the multitude of methods applied to perform follow‐up, which ranged from checking hospital records for readmissions to using standardised regimens including telephone interviews or outpatient consultations within a predefined time frame. Accordingly, we considered follow‐up as inadequate or insufficiently described in 42 studies. Another important piece of information that was often missing was the proportion of participants who had received follow‐up as planned. We assumed that follow‐up was complete when all participants were included in the 2×2 table, but this may be optimistic.

It could be argued that follow‐up is irrelevant when an alternative diagnosis (e.g. diverticulitis, pelvic inflammatory disease, ureteral stone) was made that explained participants' abdominal pain. The frequency of alternative diagnoses besides non‐specific abdominal pain in participants without appendicitis was reported in 27 studies for 29 study populations. The median frequency was 0.56, with interquartile range 0.34 to 0.62 and range 0.13 to 0.94. It could be countered that although an alternative diagnosis rules out appendicitis in some cases, an alternative diagnosis may be less reliable in others; therefore it may not necessarily rule out appendicitis in all participants who do not have surgery.

In our view, the major problem incurred by low‐quality follow‐up and loss to follow‐up is the partial verification that results. Partial verification has been associated with higher estimates of sensitivity in diagnostic accuracy studies in general (Whiting 2013), and we suspect that a similar association could exist in the studies that we reviewed. Unfortunately, it was not feasible to investigate if and to what extent low methodological quality in the reference standard domain and in the flow and timing domain impacted summary estimates due to the small number of studies with adequate and complete follow‐up.
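
The direction of this bias can be illustrated with a small worked example. The sketch below is hypothetical: the 2×2 counts, the `missed_fraction` parameter, and the function names are assumptions made for illustration and are not data from the included studies. It models only one possible mechanism, in which incomplete follow‐up fails to detect a share of the false negatives and those participants are counted as true negatives.

```python
def sensitivity_specificity(tp, fn, fp, tn):
    """Return (sensitivity, specificity) for a 2x2 table."""
    return tp / (tp + fn), tn / (tn + fp)


def apply_incomplete_follow_up(tp, fn, fp, tn, missed_fraction):
    """Reclassify a share of false negatives as true negatives.

    Hypothetical mechanism: follow-up misses `missed_fraction` of the
    CT-negative participants who truly had appendicitis.
    """
    missed = fn * missed_fraction
    return tp, fn - missed, fp, tn + missed


# Hypothetical, fully verified cohort of 1000 participants (not review data).
true_counts = (380, 20, 36, 564)
true_sens, true_spec = sensitivity_specificity(*true_counts)

# Assumption: follow-up misses half of the false negatives.
observed_counts = apply_incomplete_follow_up(*true_counts, missed_fraction=0.5)
observed_sens, observed_spec = sensitivity_specificity(*observed_counts)

print(f"true:     sensitivity={true_sens:.3f}, specificity={true_spec:.3f}")
print(f"observed: sensitivity={observed_sens:.3f}, specificity={observed_spec:.3f}")
```

With these assumed counts, the observed sensitivity rises from 0.95 to about 0.97 while specificity is almost unchanged. Other verification patterns could shift the estimates differently.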

Another limitation of the included studies relates to the paucity of studies with direct comparisons of different CT‐protocols using a paired or randomised design. We included nine such studies, but the number of primary analyses in these studies was too low for comparative meta‐analyses to be performed to assess the influence of types of contrast enhancement and radiation dose. All comparisons that we made are indirect, and it is important to be aware that such comparisons may be confounded by factors such as differences in population characteristics, properties of the CT‐scanner, radiologists' experience, and study methods. Nevertheless, our finding of similar accuracy for low‐dose and standard‐dose CT corresponds with results from a recent multi‐centre study in which persons with suspected appendicitis were randomly allocated to low‐dose and standard‐dose CT (The Locat Group 2017). In addition, findings of lower sensitivity for unenhanced CT and no gain in accuracy from supplementing IV contrast with oral contrast enhancement are in line with the results from a retrospective study in 9047 adult persons who underwent appendicectomy in 56 hospitals in the USA (Drake 2014).

Applicability of findings to the review question

Participants in the included studies were predominantly adult or adolescent persons above 14 years of age with suspected appendicitis who were recruited in urban university hospitals. The suspicion of acute appendicitis was based on history, physical examination findings, and results of routine laboratory tests and urinalysis. Studies in persons who underwent ultrasonography before CT were excluded. We found no statistical evidence to show that summary estimates of accuracy differed between subgroups of studies that included persons with an intermediate suspicion of appendicitis due to an equivocal presentation and studies in persons recruited with any suspicion of appendicitis. Results from the primary studies cover a wide range of CT‐scanners, CT‐protocols, types of contrast enhancement, and radiation doses. Based on this, we believe that the findings presented in this review are applicable to most persons above 14 years of age with suspected appendicitis following initial evaluation. Our meta‐regression analyses indicate that overall summary estimates of sensitivity may not be representative in two settings. In settings using newer CT‐scanners (16 or more channels), sensitivity is likely to be higher. Conversely, in settings with in‐training radiologists, sensitivity is likely to be lower. Again, these findings should be interpreted cautiously due to possible confounding by other factors.

Previous research

The results of our meta‐analyses are consistent with the results from previous meta‐analyses that are presented in Table 6.

Table 6. Results from previously published meta‐analyses

Author and publication year | Number of included studies | Focus of review | Summary sensitivity (95% CI) | Summary specificity (95% CI)
Terasawa 2004 | 12 | Adults, any CT modality, prospective studies | 0.94 (0.91–0.95) | 0.95 (0.93–0.96)
Anderson 2005 | 23 | Adults, comparison of enhancement with oral contrast vs any enhancement excluding oral contrast | 0.92 vs 0.95 | 0.94 vs 0.97
Weston 2005 | 12 | Adults, any CT modality | 0.97 (0.94–0.98) | 0.94 (0.92–0.96)
Doria 2006 | 21a | Any CT modality, separate results for adults and children | 0.94 (0.92–0.95)a | 0.94 (0.94–0.96)a
Al‐Khayal 2007 | 25 | Adults and children, any CT modality, prospective studies | 0.93 (0.92–0.95) | 0.93 (0.92–0.95)
van Randen 2008 | 6 | Mainly adults or adolescents, any CT modality, prospective studies with direct comparisons of CT and US | 0.91 (0.84–0.95) | 0.90 (0.85–0.94)
Hlibczuk 2010 | 7 | Unenhanced, helical CT | 0.93 (0.90–0.95) | 0.96 (0.94–0.98)
Dahabreh 2015 | 72a | Any CT modality; separate results for adults, children, women of reproductive age, and pregnant women | 0.96 (0.95–0.97)a | 0.96 (0.93–0.97)a
Xiong 2015 | 7 | Unenhanced CT, prospective studies | 0.90 (0.86–0.92) | 0.94 (0.92–0.97)
Aly 2016 | 5 | Comparison of low‐dose CT vs standard‐dose CT | 0.93 (0.89–0.96) vs 0.94 (0.91–0.96) | 0.93 (0.90–0.96) vs 0.94 (0.92–0.96)
Yun 2017 | 9 | Comparison of low‐dose CT and standard‐dose CT in adults and children | 0.96 (0.92–0.98) vs 0.96 (0.94–0.98) | 0.93 (0.89–0.96) vs 0.92 (0.88–0.95)

CI: confidence interval.
CT: computed tomography.
US: ultrasonography.
a Studies and results in adults.

Figure 14. Plain language summary flowchart.

Figure 1. Study flow diagram. CT: computed tomography. US: ultrasonography.

Figure 2. Risk of bias and applicability concerns graph: review authors' judgements about each domain presented as percentages across included studies.

Figure 3. Risk of bias and applicability concerns summary: review authors' judgements about each domain for each included study.

Figure 4. Forest plot: CT regardless of contrast enhancement and radiation dose.

Figure 5. Summary ROC plot of CT for diagnosis of acute appendicitis (any contrast enhancement and radiation dose). The hollow symbols represent the pairs of sensitivity and specificity from the included studies; the symbols are scaled according to sample sizes of the studies. The solid circle represents the summary sensitivity and specificity. This summary point is surrounded by a 95% prediction region (interrupted line).

Figure 6. Summary ROC plot of CT with intravenous contrast enhancement versus unenhanced CT. See the caption for Figure 5 for a description of symbols and lines.

Figure 7. Summary ROC plot of CT with rectal contrast enhancement versus unenhanced CT. See the caption for Figure 5 for a description of symbols and lines.

Figure 8. Summary ROC plot of CT with oral contrast enhancement versus unenhanced CT. See the caption for Figure 5 for a description of symbols and lines.

Figure 9. Summary ROC plot of CT with intravenous and oral contrast enhancement versus unenhanced CT. See the caption for Figure 5 for a description of symbols and lines.

Figure 10. Summary ROC plot of low‐dose versus standard‐dose CT. See the caption for Figure 5 for a description of symbols and lines.

Figure 11. Exploration of heterogeneity: influence of CT‐scanner generation (CT with 16 detector rows or higher vs CT with fewer than 16 detector rows). See the caption for Figure 5 for a description of symbols and lines.

Figure 12. Exploration of heterogeneity: influence of radiologists' experience. See the caption for Figure 5 for a description of symbols and lines.

Figure 13. Exploration of heterogeneity: influence of pre‐test suspicion of appendicitis. See the caption for Figure 5 for a description of symbols and lines.

Test 1. CT (unenhanced).

Test 2. CT (IV contrast).

Test 3. CT (oral contrast).

Test 4. CT (rectal contrast).

Test 5. CT (IV+oral contrast).

Test 6. CT (oral+rectal contrast).

Test 7. CT (IV+oral+rectal contrast).

Test 8. Low‐dose CT.

Test 9. CT (overall).

Test 10. Standard‐dose CT.

Summary of findings table

Population | Adults (> 14 years of age) with suspected acute appendicitis based on history, physical examination, and/or blood tests
Settings | Emergency and Radiology Departments in secondary and tertiary care settings
Index test | Computed tomography of the abdomen
Reference standard | Histological examination of the resected appendix or intraoperative findings in persons who had surgery. Clinical follow‐up for persons who did not have surgery
Target condition | Acute appendicitis
Number of studies | 64 studies including 71 separate study populations with a total of 10,280 participants ‐ 4583 with and 5697 without acute appendicitis
Methodological concerns | The methodological quality was generally poor, particularly with respect to the reference test and the flow and timing domains. For these domains, few studies were at low risk of bias. Differential verification was used in most studies because some of the participants with suspected acute appendicitis did not have surgery. Clinical follow‐up for these participants was inadequate, incomplete, or poorly described in most studies

Results

Test | Number of studies (study populations)a | Summary sensitivity (95% CI) | Summary specificity (95% CI) | Prevalence of appendicitis (25th, 50th, and 75th percentile)b | Post‐test probability following a positive CT outcome (95% CI) | Post‐test probability following a negative CT outcome (95% CI)
CT overall | 64 (71) | 0.95 (0.93–0.96) | 0.94 (0.92–0.95) | 0.32 | 0.88 (0.85–0.90) | 0.02 (0.02–0.03)
 |  |  |  | 0.43 | 0.92 (0.90–0.94) | 0.04 (0.03–0.05)
 |  |  |  | 0.58 | 0.96 (0.94–0.96) | 0.07 (0.05–0.09)
Unenhanced CT | 19 (19) | 0.91 (0.87–0.93) | 0.94 (0.90–0.96) | 0.32 | 0.87 (0.82–0.92) | 0.04 (0.03–0.06)
 |  |  |  | 0.43 | 0.92 (0.88–0.95) | 0.07 (0.05–0.09)
 |  |  |  | 0.58 | 0.95 (0.93–0.97) | 0.12 (0.09–0.16)
CT with intravenous contrast enhancement | 17 (18) | 0.96 (0.92–0.98) | 0.93 (0.90–0.95) | 0.32 | 0.87 (0.82–0.90) | 0.02 (0.01–0.04)
 |  |  |  | 0.43 | 0.91 (0.88–0.94) | 0.03 (0.02–0.06)
 |  |  |  | 0.58 | 0.95 (0.93–0.96) | 0.06 (0.03–0.11)
CT with rectal contrast enhancement | 9 (9) | 0.97 (0.93–0.99) | 0.95 (0.90–0.98) | 0.32 | 0.91 (0.81–0.96) | 0.02 (0.01–0.04)
 |  |  |  | 0.43 | 0.94 (0.87–0.97) | 0.03 (0.01–0.06)
 |  |  |  | 0.58 | 0.97 (0.93–0.99) | 0.05 (0.02–0.10)
CT with oral contrast enhancement | 7 (7) | 0.89 (0.81–0.94) | 0.94 (0.90–0.97) | 0.32 | 0.88 (0.81–0.93) | 0.05 (0.03–0.09)
 |  |  |  | 0.43 | 0.92 (0.87–0.96) | 0.08 (0.04–0.14)
 |  |  |  | 0.58 | 0.96 (0.92–0.98) | 0.14 (0.08–0.22)
CT with oral and intravenous contrast enhancement | 15 (15) | 0.96 (0.93–0.98) | 0.94 (0.92–0.96) | 0.32 | 0.89 (0.85–0.92) | 0.02 (0.01–0.03)
 |  |  |  | 0.43 | 0.93 (0.90–0.95) | 0.03 (0.02–0.05)
 |  |  |  | 0.58 | 0.96 (0.94–0.97) | 0.05 (0.03–0.09)
Low‐dose CT | 7 (8) | 0.94 (0.90–0.97) | 0.94 (0.91–0.96) | 0.32 | 0.88 (0.82–0.92) | 0.03 (0.02–0.05)
 |  |  |  | 0.43 | 0.92 (0.88–0.95) | 0.04 (0.02–0.08)
 |  |  |  | 0.58 | 0.96 (0.93–0.97) | 0.08 (0.04–0.13)

Conclusion

Sensitivity and specificity of CT for diagnosing acute appendicitis in adults are high. Unenhanced standard‐dose CT appears to have lower sensitivity than standard‐dose CT with intravenous, rectal, or oral+intravenous contrast enhancement. Use of different types of contrast enhancement or no enhancement does not appear to affect specificity. Differences in sensitivity and specificity between low‐dose and standard‐dose CT appear to be negligible. The results of this review should be interpreted with caution for two reasons. First, the results are based on studies of low methodological quality. Second, the comparisons between types of contrast enhancement and radiation dose may be unreliable because they are based on indirect comparisons that may be confounded by other factors.

CI: confidence interval.
CT: computed tomography.
a In five studies, participants were randomly allocated to two CT‐protocols, and in another study to three CT‐protocols. These protocols differed with respect to contrast enhancement and radiation dose. This generated seven additional study populations, which were included as separate studies in the meta‐analyses.

b The distribution of the prevalence of appendicitis was roughly similar in the included studies across subgroups. Therefore, to facilitate comparison of post‐test probabilities between subgroups, these probabilities were calculated for the 25th, 50th, and 75th percentiles of prevalence for all 71 study populations.
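
The post‐test probability point estimates in this table can be reproduced by applying Bayes' theorem to the summary sensitivity, the summary specificity, and a chosen prevalence. The sketch below does this for the CT overall row. The function name is an assumption for illustration, and the confidence intervals reported in the table are not reproduced here.

```python
def post_test_probabilities(sensitivity, specificity, prevalence):
    """Return the probability of appendicitis after a positive and a negative CT."""
    # Probability of a positive CT: true positives plus false positives.
    p_positive = sensitivity * prevalence + (1 - specificity) * (1 - prevalence)
    after_positive = sensitivity * prevalence / p_positive
    # Probability of a negative CT: false negatives plus true negatives.
    p_negative = (1 - sensitivity) * prevalence + specificity * (1 - prevalence)
    after_negative = (1 - sensitivity) * prevalence / p_negative
    return after_positive, after_negative


# Summary estimates for CT overall and the three prevalence percentiles.
results = {
    prevalence: post_test_probabilities(0.95, 0.94, prevalence)
    for prevalence in (0.32, 0.43, 0.58)
}

for prevalence, (after_positive, after_negative) in results.items():
    print(f"prevalence {prevalence:.2f}: "
          f"positive CT {after_positive:.2f}, negative CT {after_negative:.2f}")
# prevalence 0.32: positive CT 0.88, negative CT 0.02
# prevalence 0.43: positive CT 0.92, negative CT 0.04
# prevalence 0.58: positive CT 0.96, negative CT 0.07
```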

Table 1. Components of CT‐protocols in the 64 included studies

CT‐protocol component | Category | Number of studies
Slice thickness (mm) | 0.6–2.9 | 6
 | 3.0–4.9 | 9
 | 5.0–6.9 | 36
 | 7.0–10.0 | 4
 | not stated | 9
Slice interval (mm) | 0.6–2.9 | 6
 | 3.0–4.9 | 5
 | 5.0 | 16
 | 10.0 | 1
 | not stated | 36
Voltage (kV) | 120 | 21
 | 140 | 4
 | 200 | 1
 | not stated | 38
mAs product (mAs) | 30–100 | 4
 | 100–199 | 5
 | 200–299 | 5
 | ≥ 300 | 4
 | not stated | 46

CT: computed tomography.
Atema 2015 was a multi‐centre study including six centres. The most commonly used CT‐protocol specified the following: slice thickness 3 mm; voltage 120 kV; and mAs product 165 mAs. These values are used in the table.

Table 2. Results from studies comparing different types of contrast enhancement using a randomised or a paired design

Sensitivity/specificity according to type of contrast enhancement. Types of contrast enhancement compared: none; IV; oral; rectal; IV+oral; IV+oral+rectal.

Study | Design | Sensitivity/specificity
Hekimoglu 2011 | Randomised | 0.77/0.93; 0.97/0.99
Hershko 2007 | Randomised | 0.90/0.86; 0.95/0.92; 1.00/0.88
Kepner 2012 | Randomised | 1.00/0.99; 1.00/0.95
Mittal 2004 | Randomised | 1.00/1.00; 0.98/0.50
Keyzer 2009 | Randomised & paired | 0.75/0.93; 0.85/0.98; 0.85/0.96; 1.00/0.98
Jacobs 2001 | Paired | 0.76/0.94; 0.92/0.95

Results for the standard‐dose CT‐protocols.

Table 3. Results from studies comparing low‐dose and standard‐dose CT‐protocols using a randomised or a paired design

Study | Design | Contrast enhancement | Sensitivity/specificity, low‐dose protocol | Sensitivity/specificity, standard‐dose protocol
Kim 2012 | RCT | Intravenous | 0.95/0.93 | 0.95/0.94
Keyzer 2004a | Paired | Unenhanced | 0.97–1.00/0.80–0.94 | 0.97–1.00/0.82–0.94
Keyzer 2009b | Paired | Unenhanced | 0.80–0.85/0.91–0.93 | 0.75–0.75/0.93–0.93
 |  | Intravenous | 0.70–0.80/1.0–1.0 | 0.85–0.85/0.98–0.98
 |  | Oral | 0.85–1.0/0.88–0.96 | 0.85–0.92/0.96–0.96
 |  | Oral and intravenous | 0.85–1.0/0.96–0.98 | 1.0–1.0/0.96–1.0
Platon 2009 | Paired | Oral (low dose); oral and intravenous (standard dose) | 0.95/0.96 | 1.0/0.96

CT: computed tomography.
RCT: randomised controlled trial.
a Results are given as the range of sensitivity and specificity for the four participating radiologists.
b Results are given as the range of sensitivity and specificity for the two participating radiologists.

Table 4. Subgroup analyses according to type of contrast enhancement and radiation dose

Summary estimates and absolute differences in summary estimates are given with 95% CI.

Subgroup by enhancement and dose | Number of analyses (studies)a | Sensitivity | Specificity | Positive likelihood ratio | Negative likelihood ratio | Absolute difference in sensitivity | Absolute difference in specificity
Unenhanced | 19 (19) | 0.91 (0.87–0.93) | 0.94 (0.90–0.96) | 15 (9–24) | 0.10 (0.07–0.14) |  | 
IV contrast | 18 (17) | 0.96 (0.92–0.98) | 0.93 (0.90–0.95) | 14 (9–20) | 0.04 (0.02–0.09) | 0.04b (0.00 to 0.09) | -0.01b (-0.04 to 0.03)
IV and oral contrast | 15 (15) | 0.96 (0.93–0.98) | 0.94 (0.92–0.96) | 17 (12–26) | 0.04 (0.02–0.07) | 0.05b (0.01 to 0.09) | 0.01b (-0.03 to 0.04)
Rectal contrast | 9 (9) | 0.97 (0.93–0.99) | 0.95 (0.90–0.98) | 21 (9–51) | 0.04 (0.02–0.08) | 0.05b (0.01 to 0.09) | 0.01b (-0.03 to 0.06)
Oral contrast | 7 (7) | 0.89 (0.81–0.94) | 0.94 (0.90–0.97) | 16 (9–29) | 0.11 (0.06–0.21) | -0.01b (-0.08 to 0.6) | 0.01b (-0.03 to 0.05)
Standard dose | 67 (64) | 0.95 (0.93–0.96) | 0.94 (0.92–0.95) | 15.6 (12.3–19.7) | 0.05 (0.04–0.07) |  | 
Low dose | 8 (7) | 0.94 (0.90–0.97) | 0.94 (0.91–0.96) | 16 (10–24) | 0.06 (0.03–0.11) | 0.00c (-0.04 to 0.05) | 0.00c (-0.04 to 0.03)
Overall | 71 | 0.95 (0.93–0.96) | 0.94 (0.92–0.95) | 15 (12–19) | 0.05 (0.04–0.07) |  | 

CI: confidence interval.
IV: intravenous.
a Randomised and paired studies provided two or more analyses.
b Absolute difference compared to unenhanced CT.
c Absolute difference compared to standard‐dose CT.
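
The likelihood ratios in Table 4 are related to sensitivity and specificity by the standard definitions LR+ = sensitivity / (1 − specificity) and LR− = (1 − sensitivity) / specificity. The following is a minimal sketch using the rounded summary estimates for unenhanced CT. The function name is illustrative, and values computed from rounded inputs may differ slightly from the table, which presumably used unrounded model estimates.

```python
def likelihood_ratios(sensitivity, specificity):
    """Return (positive LR, negative LR) from sensitivity and specificity."""
    positive_lr = sensitivity / (1 - specificity)
    negative_lr = (1 - sensitivity) / specificity
    return positive_lr, negative_lr


# Rounded summary estimates for unenhanced CT from Table 4.
positive_lr, negative_lr = likelihood_ratios(sensitivity=0.91, specificity=0.94)
print(f"LR+ = {positive_lr:.1f}, LR- = {negative_lr:.2f}")
```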

Table 5. Sensitivity analysis ‐ effects of selecting results for other CT‐protocols in paired studies

Summary estimates are given with 95% CI.

Subgroup by enhancement and dose | Number of analyses (studies) | Original analysis: sensitivity | Original analysis: specificity | Sensitivity analysis: sensitivity | Sensitivity analysis: specificity
Unenhanced | 19 (19) | 0.91 (0.87–0.93) | 0.94 (0.90–0.96) | 0.91 (0.88–0.94) | 0.94 (0.90–0.96)
Intravenous contrast | 18 (17) | 0.96 (0.92–0.98) | 0.93 (0.90–0.95) | 0.96 (0.91–0.98) | 0.93 (0.90–0.95)
Intravenous and oral contrast | 15 (15) | 0.96 (0.93–0.98) | 0.94 (0.92–0.96) | 0.96 (0.93–0.98) | 0.94 (0.92–0.96)
Oral contrast | 7 (7) | 0.89 (0.81–0.94) | 0.94 (0.90–0.97) | 0.90 (0.82–0.95) | 0.94 (0.90–0.96)
Low dose | 8 (7) | 0.94 (0.90–0.97) | 0.94 (0.91–0.96) | 0.95 (0.91–0.97) | 0.94 (0.91–0.96)
Overall | 71 (64) | 0.95 (0.93–0.96) | 0.94 (0.92–0.95) | 0.95 (0.93–0.96) | 0.94 (0.92–0.95)

CI: confidence interval.
CT: computed tomography.

Table Tests. Data tables by test

Test | No. of studies | No. of participants
1 CT (unenhanced) | 19 | 2140
2 CT (IV contrast) | 17 | 4265
3 CT (oral contrast) | 7 | 673
4 CT (rectal contrast) | 9 | 1098
5 CT (IV+oral contrast) | 15 | 2074
6 CT (oral+rectal contrast) | 3 | 230
7 CT (IV+oral+rectal contrast) | 2 | 152
8 Low‐dose CT | 7 | 1445
9 CT (overall) | 64 | 10380
10 Standard‐dose CT | 61 | 9292
