
Positron emission tomography (PET) and magnetic resonance imaging (MRI) for assessing tumour resectability in advanced epithelial ovarian/fallopian tube/primary peritoneal cancer


Background

Ovarian cancer is the leading cause of death from gynaecological cancer in developed countries. Surgery and chemotherapy are considered the mainstay of treatment, and the completeness of surgery is a main prognostic factor for survival in these women. Currently, computed tomography (CT) is used to assess tumour resectability preoperatively. If resection is considered feasible, women are scheduled for primary debulking surgery (i.e. surgery that is as extensive as possible to remove the tumour bulk, with the aim of leaving no visible [macroscopic] tumour). If primary debulking is not considered feasible (i.e. the tumour load is too extensive), women receive neoadjuvant chemotherapy to reduce the tumour load and subsequently undergo (interval) surgery. However, CT is imperfect in assessing tumour resectability, so additional imaging modalities can be considered to optimise treatment selection.

Objectives

To assess the diagnostic accuracy of fluorodeoxyglucose‐18 (FDG) PET/CT, conventional and diffusion‐weighted (DW) MRI as replacement or add‐on to abdominal CT, for assessing tumour resectability at primary debulking surgery in women with stage III to IV epithelial ovarian/fallopian tube/primary peritoneal cancer.

Search methods

We searched MEDLINE and Embase (Ovid) for potentially eligible studies (1946 to 23 February 2017). Additionally, we searched ClinicalTrials.gov, WHO‐ICTRP, and the reference lists of all relevant studies.

Selection criteria

Diagnostic accuracy studies addressing the accuracy of preoperative FDG‐PET/CT, conventional MRI, or DW‐MRI for assessing tumour resectability in women with advanced stage (III to IV) epithelial ovarian/fallopian tube/primary peritoneal cancer who are scheduled to undergo primary debulking surgery.

Data collection and analysis

Two review authors independently screened titles and abstracts for relevance and inclusion, extracted data, and assessed methodological quality using QUADAS‐2. The limited number of studies did not permit meta‐analyses.

Main results

Five studies (544 participants) were included in the analysis. All studies performed the index test as a replacement for abdominal CT. Two studies (366 participants) addressed the accuracy of FDG‐PET/CT for assessing incomplete debulking with residual disease of any size (> 0 cm), with sensitivities of 1.0 (95% CI 0.54 to 1.0) and 0.66 (95% CI 0.60 to 0.73) and specificities of 1.0 (95% CI 0.80 to 1.0) and 0.88 (95% CI 0.80 to 0.93), respectively (low‐ and moderate‐certainty evidence). Three studies (178 participants) investigated MRI for different target conditions, of which two investigated DW‐MRI and one conventional MRI. The first study showed that DW‐MRI determines incomplete debulking with residual disease of any size with a sensitivity of 0.94 (95% CI 0.83 to 0.99) and a specificity of 0.98 (95% CI 0.88 to 1.00) (low‐ and moderate‐certainty evidence). For abdominal CT, the sensitivity for assessing incomplete debulking was 0.66 (95% CI 0.52 to 0.78) and the specificity 0.77 (95% CI 0.63 to 0.87) (low‐ and low‐certainty evidence). The second study reported a sensitivity of DW‐MRI of 0.75 (95% CI 0.35 to 0.97) and a specificity of 0.96 (95% CI 0.80 to 1.00) (very low‐certainty evidence) for assessing incomplete debulking with residual disease > 1 cm. In the last study, the sensitivity of conventional MRI for assessing incomplete debulking with residual disease > 2 cm was 0.91 (95% CI 0.59 to 1.00) and the specificity 0.97 (95% CI 0.87 to 1.00) (very low‐certainty evidence). Overall, the certainty of evidence was very low to moderate (according to GRADE), mainly due to small sample sizes and imprecision.

Authors' conclusions

Studies suggested a high specificity and moderate sensitivity for FDG‐PET/CT and MRI to assess macroscopic incomplete debulking. However, the certainty of the evidence was insufficient to advise routine addition of FDG‐PET/CT or MRI to clinical practice.

In a research setting, adding an alternative imaging method could be considered for women identified as suitable for primary debulking by abdominal CT, in an attempt to filter out false negatives (i.e. debulking judged feasible on abdominal CT but not feasible at actual surgery).

How accurate are PET and MRI imaging techniques for determining the feasibility of primary debulking surgery for ovarian cancer?

Why is determining the feasibility of ovarian tumour resection important?
Ovarian cancer is a disease with high mortality that affects 239,000 women each year worldwide. By the time this cancer becomes symptomatic and is detected, cancer cells have spread throughout the abdomen in most women. Treatment consists of surgery to remove as much visible tumour as possible (also called debulking surgery) and chemotherapy. Randomised controlled trials have indicated that, in women in whom it is not possible to remove all visible cancer by surgery, giving chemotherapy first to shrink the tumour is an alternative treatment strategy. This alternative can increase the number of women in whom all visible tumour is successfully removed, which is known as macroscopic debulking. It is therefore important to determine beforehand whether all visible tumour deposits can be removed by surgery followed by chemotherapy, or whether chemotherapy needs to be given first to reduce tumour size before surgery.

Imaging with abdominal computed tomography (abdominal CT) is currently used to determine whether primary debulking surgery is feasible. However, it cannot correctly determine the outcome in all women. Other imaging techniques that can be used are positron emission tomography (PET) and magnetic resonance imaging (MRI). PET visualises glucose uptake by cells, allows the detection of distant metastases, and is often performed in parallel with abdominal CT (FDG‐PET/CT). MRI provides good soft tissue contrast for detecting small lesions. These additional imaging techniques may improve treatment selection.

What is the aim of this review?

To investigate the accuracy of PET and MRI in women with advanced stage ovarian cancer for determining the feasibility of primary debulking surgery.

What are the main results of the review?

We identified two studies (with 366 participants) that addressed the accuracy of FDG‐PET/CT and three studies (with 178 participants) that investigated the accuracy of MRI.

In a hypothetical group of 1000 women, of whom 620 would have residual tumour after surgery (62% prevalence), 211 women would be incorrectly considered suitable for surgery according to FDG‐PET/CT and 37 women according to MRI. However, the quality and quantity of these studies were insufficient for these imaging techniques to be used routinely in clinical practice. The authors therefore concluded that more research is needed before such a recommendation can be made.

Authors' conclusions

Implications for practice

In women with advanced stage ovarian cancer, no firm conclusions can be drawn regarding the accuracy of FDG‐PET/CT, conventional MRI, or (DW‐)MRI to assess incomplete debulking surgery. FDG‐PET/CT and MRI are commonly available in hospitals, and the included studies suggested a high specificity and moderate sensitivity for assessing incomplete debulking. Potential advantages included the ability of FDG‐PET/CT to detect extra‐abdominal (distant) disease and the soft tissue contrast of MRI for (small) lesion detection.

Importantly, the level of evidence is insufficient to advise routine addition of FDG‐PET/CT or MRI to clinical practice.

Implications for research

When a patient is suspected of ovarian cancer with extensive tumour load, it is difficult to judge tumour resectability based on abdominal CT alone. However, the size of tumour tissue remaining after surgery is one of the main prognostic factors in women with ovarian cancer and necessitates careful patient selection for either primary debulking or neoadjuvant chemotherapy treatment. Therefore, additional tools are needed, ideally more accurate than abdominal CT and less invasive than laparoscopy.

Future research should focus on the additional value of FDG‐PET/CT and MRI compared to abdominal CT in order to reduce the number of women with incomplete debulking. A cohort of women with advanced stage ovarian cancer for whom debulking surgery is considered feasible on abdominal CT could receive either FDG‐PET/CT or MRI before primary debulking is performed. A radiologist who is blinded to the results of the other test could systematically score both imaging modalities, ideally by using a universally accepted and validated scoring system that has yet to be determined.

As previously described, it remains challenging to determine the feasibility of tumour resection since validated prediction models are lacking. Therefore, future research should focus on the construction and verification of predictive algorithms based on radiological findings and other predictors including biochemical parameters, tumour biopsies, and patient characteristics. Ideally, centre‐specific features (e.g. the level of specialisation and annual caseload) should be incorporated as covariates.

Summary of findings

Summary of findings Diagnostic accuracy of FDG‐PET/CT and MRI for assessing tumour resectability in advanced epithelial ovarian/fallopian tube/primary peritoneal cancer

What is the diagnostic accuracy of FDG‐PET/CT or MRI for assessing tumour resectability in advanced epithelial ovarian/fallopian tube/primary peritoneal cancer?

Patients: Women suspected of ovarian cancer scheduled for surgery

Prior testing: Conventional diagnostic work‐up (e.g. physical examination, ultrasound)

Setting: University hospitals or specialised cancer institutes

Index test: FDG‐PET/CT or MRI. In all studies, the index test was evaluated as a replacement of abdominal CT. No studies were identified that followed an add‐on design.

Target condition: Residual disease assessed after debulking surgery

| Test | Target condition | No. of women (studies) | Prevalence in study | Sensitivity (95% CI) | Specificity (95% CI) | No. of false negatives* per 1000 tested | No. of false positives** per 1000 tested | Test accuracy certainty (quality) of evidence (sensitivity/specificity) [a] |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| FDG‐PET/CT | Residual disease > 0 cm | 23/343 (2) | 26%/65% | 1.0 (0.54 to 1.0) and 0.66 (0.60 to 0.73) | 1.0 (0.80 to 1.0) and 0.88 (0.80 to 0.93) | 211 (167 to 248) [b] | 46 (27 to 76) [b] | Low [c]/moderate [d] |
| DW‐MRI | Residual disease > 0 cm | 94 (1) | 53% | 0.94 (0.83 to 0.99) | 0.98 (0.88 to 1.00) | 37 (6 to 105) [b] | 8 (0 to 46) [b] | Low [c]/moderate [d] |
| DW‐MRI | Residual disease > 1 cm | 34 (1) | 23.5% | 0.75 (0.35 to 0.97) | 0.96 (0.80 to 1.00) | 59 (7 to 153) | 31 (0 to 153) | Very low/very low [e, f] |
| Conventional MRI | Residual disease > 2 cm | 50 (1) | 22% | 0.91 (0.59 to 1.00) | 0.97 (0.87 to 1.00) | 20 (0 to 90) | 23 (0 to 101) | Very low/very low [e, g] |
| CT [h] | Residual disease > 0 cm | 94 (1) | 53% | 0.66 (0.52 to 0.78) | 0.77 (0.63 to 0.87) | 211 (136 to 298) [b] | 87 (49 to 141) [b] | Low/low [c] |

CI: confidence interval
CT: computed tomography
DW‐MRI: diffusion‐weighted Magnetic Resonance Imaging
FDG: fluorodeoxyglucose‐18
PET: positron emission tomography
* False negatives (FNs): judged as feasible for surgery based on imaging, with an incomplete debulking at surgery.
** False positives (FPs): judged as not feasible for surgery based on imaging, with a complete debulking at surgery.

a. According to GRADE for sensitivity (false negatives (FNs)) and specificity (false positives (FPs)), respectively
b. Numbers are calculated based on the results of the largest study (Shim 2015) at the mean prevalence of incomplete debulking (62%) of the two largest studies that addressed debulking with residual disease of any size (Michielsen 2017; Shim 2015). The prevalence of incomplete debulking was calculated as (TP + FN)/total study subjects (273/437 = 62%).
c. Downgraded two levels for very wide confidence interval for number of FNs (sensitivity)
d. Downgraded one level for wide confidence interval for number of FPs (specificity)
e. Downgraded two levels as very small sample size; very wide confidence intervals for number of FNs (sensitivity) and number of FPs (specificity).
f. Downgraded one level due to applicability concerns for the Index test since the radiologists were blinded for (presurgical) clinical data.
g. Downgraded one level due to high risk of bias for patient selection and flow and timing.
h. To compare the findings of the included studies (performing PET/CT or MRI to assess tumour resectability) with CT (the current gold standard), we provided the diagnostic accuracy of CT from the study with the best quality of evidence and with the target condition that is currently used in practice (Michielsen 2017).
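
The per‐1000 columns of the table follow from sensitivity, specificity, and the assumed prevalence of incomplete debulking. The following is a minimal sketch in Python of that arithmetic, using the point estimates from the table and the prevalence from footnote b. The function name `errors_per_1000` is ours, chosen for illustration, and is not part of the review's methods; the confidence limits reported in the table are not recomputed here.

```python
def errors_per_1000(sensitivity, specificity, prevalence, cohort=1000):
    """Expected false negatives and false positives in a hypothetical cohort.

    Illustrative helper (not from the review). A false negative is a woman
    judged resectable on imaging who has residual disease at surgery; a false
    positive is a woman judged not resectable who has a complete debulking.
    """
    with_condition = prevalence * cohort          # incomplete debulking
    without_condition = cohort - with_condition   # complete debulking
    false_negatives = with_condition * (1 - sensitivity)
    false_positives = without_condition * (1 - specificity)
    return round(false_negatives), round(false_positives)


# Prevalence from footnote b: (TP + FN) / total = 273 / 437
prevalence = round(273 / 437, 2)  # 0.62

# Point estimates taken from the table rows (residual disease > 0 cm)
print(errors_per_1000(0.66, 0.88, prevalence))  # FDG-PET/CT   -> (211, 46)
print(errors_per_1000(0.94, 0.98, prevalence))  # DW-MRI       -> (37, 8)
print(errors_per_1000(0.66, 0.77, prevalence))  # abdominal CT -> (211, 87)
```

The rows for residual disease > 1 cm and > 2 cm carry no footnote b; their per‐1000 values are consistent with applying the same formula at each study's own prevalence (23.5% and 22%).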

Background

Epithelial ovarian, fallopian tube, and primary peritoneal cancers are malignancies of the internal female genital tract. Clinically, these tumours are often regarded as a single entity, due to their similarity and overlap in pathophysiology, symptomatology, diagnostic approach, staging, treatment, and prognosis (Prat 2014). Globally, ovarian cancer affects 239,000 women each year (Ferlay 2012). It is most commonly identified at an advanced stage due to the absence of symptoms in early stage‐disease. When symptoms do occur, they are often nonspecific and include abdominal pain or discomfort, bloating, and fatigue (Olson 2001). The extent of ovarian cancer is categorised using the International Federation of Gynaecology and Obstetrics (FIGO) staging criteria (Prat 2014). In advanced stage‐disease, the tumour is not confined to the ovaries (stage I) or true pelvis (stage II), but has spread outside the pelvis through the peritoneal (abdominal) cavity or towards regional lymph nodes (stage III), or to extra‐abdominal lymph nodes and/or with haematogenous spread resulting in distant metastasis (e.g. lungs or liver parenchyma, stage IV) (Mutch 2014; Prat 2014). This late presentation makes ovarian cancer the leading cause of death from gynaecological cancer in developed countries worldwide, with an absolute global mortality of 152,000 women each year (Ferlay 2012).

In women with advanced stage epithelial ovarian, fallopian tube, and primary peritoneal cancer, a combination of chemotherapy and debulking surgery is considered the mainstay of treatment. Debulking surgery (i.e. surgical efforts to remove the bulk of tumour) usually encompasses removal of the uterus (hysterectomy) and adnexa, resection of the omentum (an apron of fatty tissue attached to the greater curvature of the stomach, containing veins, arteries, lymphatics), and the attempted resection of all visible tumour deposits (NCI 2015). The actual feasibility of the latter, in reality, is limited by the location of lesions (e.g. around blood vessels) and the potential morbidity that each resection induces. At the end of each surgical procedure, a conclusion can be drawn on the completeness of debulking (cytoreductive) surgery, categorised into: no visible tumour deposits left (i.e. macroscopic ('complete') debulking); debulking with residual disease ≤ 1 cm (in the past often called 'optimal debulking'); or debulking with residual disease > 1 cm (i.e. incomplete debulking). This distinction is important since, along with tumour response to chemotherapy, the completeness of debulking surgery is the most important prognostic factor for survival in women with advanced stage epithelial ovarian cancer (Bristow 2002; Elattar 2011; NCI 2015; Vergote 2010). Unfortunately, despite chemotherapy and macroscopic debulking surgery, the majority of women still develop recurrent disease (Du Bois 2009). As 'macroscopic complete debulking' is determined by the naked eye of the surgeon, this does not imply that the resections are 'complete' in the sense of cancer‐free surgical margins determined by histopathological examination of the specimen. Therefore, recurrences can be partly due to microscopic disease (i.e. occult disease) remaining after treatment.

Preoperative diagnostic imaging is used to estimate tumour extension and thus the feasibility of surgical debulking. If macroscopic debulking (removal of all visible tumour) seems feasible, based on imaging, primary debulking surgery is attempted. If imaging indicates that the chance of macroscopic debulking is small, women receive neoadjuvant chemotherapy (in order to reduce tumour load) and subsequently debulking surgery (i.e. interval debulking). Currently, diagnostic imaging is predominantly based on abdominal computed tomography (CT). Unfortunately, this preoperative assessment is imperfect since small tumour deposits can be missed and distinguishing malignant from benign tissue can be challenging. This can lead to cases where primary surgery is attempted in which not all visible tumour can be removed. This causes unnecessary morbidity and negatively influences prognosis (Vergote 2010). In contrast, macroscopic debulking is the strongest independent predictor of patient outcome and should be attempted whenever deemed possible (Vergote 2010). Recent randomised controlled trials have demonstrated equivalence in survival between primary surgery and the alternative approach with neoadjuvant chemotherapy and interval debulking surgery, with reduced morbidity in the latter (Kehoe 2015; Morrison 2012; Vergote 2010).

Bristow 2002 demonstrated extensive heterogeneity between centres in the percentage of women achieving macroscopic debulking or debulking with residual disease limited to 1 cm in diameter (or 2 cm in the earlier studies; Baker 1994), which ranged from 0% to 100% with a weighted mean of only 41.9%. Even with careful patient selection using laparoscopy, the percentage of women with residual tumour after primary debulking surgery is still as high as 31% to 43% (Rutten 2014; Rutten 2017).

In conclusion, it is important to conscientiously select women for either primary debulking surgery with adjuvant chemotherapy or neoadjuvant chemotherapy followed by interval debulking. The aim should be to macroscopically debulk those women upfront who can be surgically resected and reduce surgical morbidity in those who cannot, who would benefit from chemotherapy first.

Target condition being diagnosed

The target condition is the outcome of primary debulking surgery for advanced stage epithelial ovarian, fallopian tube, and/or primary peritoneal cancer. The outcome is defined by the diameter of the largest tumour deposit remaining after surgery and is determined by the surgeon performing the procedure. The term 'primary' specifies those women in whom no treatment, surgical or chemotherapy, has been given prior to this surgery. Three target condition categories were considered.

  • Macroscopic debulking, which was defined as no macroscopically visible tumour deposits at the end of surgery. Debulking of all deposits is the objective, though not always clinically feasible (NICE 2011). This can be due to their location (e.g. situated on the mesentery or liver hilum) or when the number of (small) metastases is innumerable (i.e. miliary pattern of spread). In general, deposit resection needs to be abandoned when continuing would induce unacceptable morbidity (e.g. compromising the blood supply to the entire small bowel in case of mesenterial resections). Consequently, this leads to an incomplete debulking with residual deposits of ovarian cancer.

  • Incomplete debulking with visible residual disease, divided into two subcategories, depending on whether there were macroscopically visible tumour deposits:

    • ≤ 1 cm in diameter remaining at the end of surgery; or

    • > 1 cm in diameter remaining at the end of surgery.

Index test(s)

In this systematic review, we considered the following three noninvasive and commonly available index tests.

  • Whole body fluorodeoxyglucose‐18 (FDG) positron emission tomography (PET), with or without a parallel conventional abdominal CT for anatomical reference (PET‐CT).

  • Conventional T1w/T2w (i.e. anatomical) magnetic resonance imaging (MRI), with or without intravenously administered gadolinium contrast.

  • Diffusion‐weighted MRI (DW‐MRI), in addition to conventional MRI, an imaging method that uses the diffusion of water molecules to generate contrast.

Clinical pathway

With (subtle) symptoms, or based on accidental discovery of an abdominal mass, women suspected of ovarian cancer preferably present to a gynaecological oncologist. Here, a standard diagnostic work‐up is performed starting with obtaining information about medical history, symptoms, family history, known allergies, use of medication, and social background. This is followed by a general physical and pelvic examination (Roett 2009). In most centres, ultrasound (transvaginal and/or abdominal) is routinely added to assess the size and composition of the adnexal mass as well as the presence of free fluid in the rectouterine excavation (i.e. pouch of Douglas) (NICE 2011).

Blood tests are performed to assess both general health as well as specific tumour marker levels and a CT scan of the pelvis, abdomen and, optionally, the chest is performed (NICE 2011). The presence, location, and extent of the adnexal mass, ascites, peritoneal tumour deposits, omental caking (abnormally thickened greater omentum which indicates infiltration of tumour tissue), lymph node enlargement, pleural effusion and haematogenous metastases are specifically assessed. In some centres, chest CT is substituted by two‐directional plain film chest radiography.

A multidisciplinary tumour board of experts discusses all findings and determines the diagnosis, stage, and treatment plan and, in particular, the feasibility of ('complete') tumour debulking. When considered feasible, primary debulking surgery followed by adjuvant chemotherapy is preferred. The tumour stage is macroscopically estimated at surgery and definitively after histopathological examination. When the feasibility of debulking surgery is questionable, women are commonly treated with three or six cycles of neoadjuvant chemotherapy (usually a combination of carboplatin and paclitaxel) and subsequently, in the case of no disease progression, with interval debulking surgery.

Alternative test(s)

Laparoscopy, performed either as ambulatory surgery or directly before the laparotomy, was considered as an alternative test. A Cochrane systematic review on laparoscopy for the assessment of tumour resectability in ovarian cancer remained inconclusive (Rutten 2014). However, a recent randomised controlled trial found that the number of incomplete debulking surgeries with residual disease > 1 cm in diameter can be reduced from 39% to 10% by performing diagnostic laparoscopy prior to debulking surgery (Rutten 2017).

Rationale

Abdominal CT is imperfect in assessing the (non‐)resectability of advanced stage ovarian cancer in primary debulking surgery (Borley 2015; Suidan 2014; Vergote 2008). Alternative imaging options, such as PET(‐CT), conventional and diffusion‐weighted MRI, are currently widely available in the developed world and may yield a superior diagnostic test accuracy (DTA) to assess preoperatively if macroscopic debulking can be achieved. First, PET(‐CT) provides information on tumour extension, based on the enhanced glucose metabolism of cancer cells, and is particularly useful for identification of distant metastases. Second, MRI has good soft tissue image contrast and gives a detailed view of structures and their position relative to surrounding tissue. These imaging tests can be added to the preoperative work‐up (if the healthcare system permits with respect to costs), either as an alternative to abdominal CT (i.e. replacement test) or in combination with abdominal CT (i.e. as an add‐on test). Adding an alternative imaging method can be considered in women with a tumour load determined resectable by abdominal CT, in an attempt to filter out false‐negatives (i.e. resectable based on abdominal CT, not resectable according to the alternative method). In these women with non‐resectable tumours, additional imaging studies such as MRI or PET(‐CT) may reduce the percentage of women with residual disease after primary debulking surgery. If PET(‐CT) and/or MRI show superior accuracy, more adequate selection of women for either primary debulking or neoadjuvant chemotherapy can be performed.

Unfortunately, there is currently no systematic review which addresses the DTA of these imaging modalities (see Index test(s)) in this context.

Objectives

To assess the diagnostic accuracy of fluorodeoxyglucose‐18 (FDG) PET/CT, conventional and diffusion‐weighted (DW) MRI as replacement or add‐on to abdominal CT, for assessing tumour resectability at primary debulking surgery in women with stage III to IV epithelial ovarian/fallopian tube/primary peritoneal cancer.

Secondary objectives

To investigate the year of study initiation, the annual surgical caseload, and whether surgery is performed by a gynaecological oncologist as possible sources of heterogeneity. For further details, please see Investigations of heterogeneity.

Methods

Criteria for considering studies for this review

Types of studies

We included randomised comparisons of diagnostic tests and cross‐sectional, retrospective, and prospective cohort studies that address the DTA of preoperative PET(‐CT), conventional or (additional) diffusion‐weighted MRI for assessing tumour resectability in women who are scheduled to undergo primary debulking surgery. Studies in which the index test(s) were added on to abdominal CT, or in which the index test replaced abdominal CT, were included. To evaluate the add‐on effect, the alternative imaging test had to be performed within four weeks before or after the abdominal CT. Studies following a case‐control design, which carry an inherent high risk of bias in a DTA research objective, were excluded.

Participants

Studies had to include adult (18 years of age or more) women diagnosed with advanced stage (stage III to IV) epithelial ovarian/fallopian tube/primary peritoneal cancer, considered eligible for primary debulking surgery (i.e. no adjuvant chemotherapy treatment or prior surgery to assess tumour extension was performed). Also, studies with participants in stage I to IV disease were included if data from women with stage III to IV disease could be extracted.

Index tests

The index tests of interest were preoperatively performed fluorodeoxyglucose‐18 PET(‐CT), conventional and diffusion‐weighted MRI (see Index test(s)). All these imaging modalities were used as a replacement or as an add‐on to abdominal CT in women with advanced epithelial ovarian/fallopian tube/primary peritoneal cancer.

A positive index test was defined as an assessment of tumour spread in which resection at primary debulking surgery was judged to be unfeasible (i.e. index test indicates ‘tumour is not resectable’) by the radiologist or multidisciplinary tumour board. Conversely, a negative index test was defined as a tumour for which resection by primary debulking surgery was considered feasible.

Target conditions

The target condition was defined as the resectability of all deposits from epithelial ovarian/fallopian tube/primary peritoneal cancer at primary debulking surgery. This target condition had three categories (see Target condition being diagnosed), which makes two commonly studied and clinically relevant dichotomisations possible (see Statistical analysis and data synthesis).

Reference standards

The reference standard was the process of debulking surgery. This is most commonly performed via a laparotomy, although in recent years laparoscopy has also been performed in cases of limited disease volume. During such a procedure, the abdomen is systematically explored to assess the tumour spread and its resectability. The outcome category (size of residual tumour after surgery) was determined by the surgeon at the end of this surgery.

Search methods for identification of studies

Our search for relevant literature involved both electronic databases (see Electronic searches) and additional sources (see Searching other resources).

Electronic searches

We searched MEDLINE (Ovid) and Embase (Ovid) systematically for potentially eligible studies. We did not use search filters (collections of terms aimed at reducing the number needed to screen) as an overall limiter because those published have not proved sensitive enough (Beynon 2013), and we applied no language restriction. The MEDLINE search strategy was developed in conjunction with Cochrane Gynaecological, Neuro‐oncology and Orphan Cancers; both this and the Embase strategy were executed by co‐author René Spijker, who has extensive experience in systematic reviews.

  • MEDLINE Ovid (January 1946 to 23 February 2017) (Appendix 1).

  • Embase (January 1946 to 23 February 2017) (Appendix 2).

Searching other resources

We searched both ClinicalTrials.gov (Appendix 3) and WHO‐ICTRP (Appendix 4) to identify prospectively registered trials. Furthermore, the reference lists of all relevant studies were searched for additional relevant studies using Web of Science.

Data collection and analysis

The data collection and analysis adhered to the guidelines provided in the Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy (Deeks 2013).

Selection of studies

All titles and abstracts retrieved by electronic searching were downloaded into a reference management database and duplicates were removed. The remaining references were independently examined by two review authors (JFR and JPH) using the pre‐set inclusion and exclusion criteria, as stated above. Afterwards, discrepancies in judgement between both review authors were discussed until consensus was reached. When the possible inclusion or exclusion of an individual study remained unclear, full‐text assessment was independently performed by the same two review authors for a final decision. Articles considered directly eligible based on title and abstract screening were also read in full text to definitively confirm adherence to the inclusion and exclusion criteria. Excluded studies were documented and the reasons for exclusion were stated according to the guidance provided in the Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy.

Data extraction and management

Two review authors (JFR and JPH) independently performed data extraction from the selected studies. Data were checked and entered into RevMan 5 by one review author and checked by another review author.

For the included studies, general information (title, aim of study, setting, study design, inclusion period), data on characteristics of women (inclusion criteria, exclusion criteria, age, FIGO stage, number of enrolled and eligible women) and index test (type, criteria to consider primary debulking unfeasible), outcomes and deviations from the protocol were abstracted onto a data abstraction form specially designed for the review (see Characteristics of included studies). We contacted the authors of the original studies in case of missing data.

Assessment of methodological quality

The QUADAS‐2 assessment tool for diagnostic accuracy studies in the context of systematic reviews was completed for all included studies (Whiting 2011). This assessment was performed independently by two review authors (JFR and JPH) and final results were based on consensus discussion. Operational definitions of QUADAS‐2 items were derived from Rutten 2014 and are described in Appendix 5.

Statistical analysis and data synthesis

We performed separate analyses for different target conditions based on the size of residual disease after debulking surgery (see: Target condition being diagnosed):

  • Incomplete debulking with residual disease of any size (> 0 cm in diameter) versus macroscopic debulking.

  • Debulking with residual disease > 1 cm versus residual disease ≤ 1 cm in diameter.

From each study, we extracted the numbers of true and false negatives and positives to calculate sensitivity and specificity. Figure 1 and Figure 2 outline the definitions of the two by two table for these analyses. Figure 3 shows a visual representation of the 2 x 2 tables.
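As a minimal sketch of this calculation (not the review authors' own code), the snippet below derives sensitivity and specificity from a single 2 x 2 table. A positive index test is taken to mean that imaging predicts incomplete debulking, in line with Figure 1 and Figure 2. The class name `TwoByTwo` and the example counts are hypothetical and are not taken from any included study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Index test result tabulated against the reference standard."""
    tp: int  # imaging positive, incomplete debulking at surgery
    fp: int  # imaging positive, debulking complete at surgery
    fn: int  # imaging negative, incomplete debulking at surgery
    tn: int  # imaging negative, debulking complete at surgery

    def sensitivity(self) -> float:
        # Proportion of women with the target condition whom imaging flags.
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Proportion of women without the target condition whom imaging clears.
        return self.tn / (self.tn + self.fp)


# Hypothetical counts, for illustration only.
example = TwoByTwo(tp=18, fp=2, fn=6, tn=24)
print(f"sensitivity = {example.sensitivity():.2f}")
print(f"specificity = {example.specificity():.2f}")
```

Keeping the four cells together in one object makes the direction of the table explicit, which matters here because a "positive" result means predicted non‐resectability.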


Figure 1. Definitions of the two by two table, wherein the index tests are tabulated against the reference standard outcome, on the analysis: macroscopic debulking versus incomplete debulking with residual disease of any size (i.e. consisting of deposits ≤ 1 cm and > 1 cm in diameter). TP = true positive, FP = false positive, FN = false negative, TN = true negative.

Figure 2. Definitions of the two by two table, wherein the index tests are tabulated against the reference standard outcome, on the analysis: macroscopic debulking or incomplete debulking with residual disease ≤ 1 cm in diameter versus incomplete resection with residual disease > 1 cm in diameter. TP = true positive, FP = false positive, FN = false negative, TN = true negative.

Figure 3. Visual representation of 2 x 2 table. TP = true positive, FP = false positive, FN = false negative, TN = true negative.

We intended to perform analyses for the index tests as add‐on tests in women who were considered resectable based on abdominal CT (CT ‘negatives’) to filter out women who were erroneously considered resectable by abdominal CT (false negatives). We planned to perform meta‐analyses according to the guidelines described in Chapter 10 of the Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy (Macaskill 2010). Unfortunately, we could not perform a meta‐analysis for the three MRI studies, since the target condition differed between these studies. For the PET studies, we could not perform a meta‐analysis because of the limited number of studies. Review Manager 2014 was used to prepare forest plots of sensitivity and specificity of the included studies.

We assigned levels of evidence to the various outcome categories (true positive (TP), false positive (FP), false negative (FN) and true negative (TN), see Figure 1 and Figure 2) according to GRADE and prepared 'Summary of findings' tables (Hsu 2011; Schünemann 2008). Labelling the tumour status erroneously as resectable ('false negatives') was considered worse than labelling the tumour status erroneously as non‐resectable ('false positives'). For GRADE, therefore, the DTA outcome ‘false negative’ was deemed 'critical' (9) and the DTA outcome 'false positive' as less critical (8). The other outcomes (TP and TN) were considered 'important'. To create the GRADE profiles and 'Summary of findings' tables, we used GRADEpro GDT.

Investigations of heterogeneity

We had planned to explore heterogeneity by adding covariates to the statistical model but the limited number of studies prevented this.

Sensitivity analyses

We had intended to perform sensitivity analyses by excluding studies at high risk of bias for each of the QUADAS‐2 domains, but we were unable to do so due to too few studies.

Assessment of reporting bias

No assessment of reporting bias was performed. Currently, no uniformly accepted and validated method for assessing this type of bias, in the context of a review based on DTA studies, exists (Van Enst 2014).

Results

Results of the search

Our search identified 7,101 citations in MEDLINE and 11,653 in Embase. After removing duplicates, 14,789 articles remained for title and abstract screening. A total of 11 articles were deemed potentially eligible and were reviewed in full text. Of these, seven did not meet the inclusion criteria and are listed in the Characteristics of excluded studies table, along with their reasons for exclusion. We included the four remaining articles in this review. Searching ClinicalTrials.gov and WHO‐ICTRP revealed 119 and 64 additional trials, respectively, and, out of these, one additional study eligible for inclusion was identified. Reference checking with Web of Science revealed 160 citations, but no additional studies were found. Five studies were therefore finally included in this analysis. An overview of the search results is presented in Figure 4.
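The counts in this paragraph can be tallied as a quick consistency check. The sketch below uses only the numbers reported above; the number of duplicates is derived here by subtraction and is not stated in the review, and the variable names are illustrative.

```python
# Records retrieved from the two bibliographic databases.
medline, embase = 7101, 11653
after_deduplication = 14789

# Derived, not reported: duplicates removed before screening.
duplicates_removed = medline + embase - after_deduplication

# Full-text assessment and final inclusion.
full_text, excluded = 11, 7
included_from_databases = full_text - excluded
included_from_registers = 1   # via ClinicalTrials.gov / WHO-ICTRP
included_from_references = 0  # Web of Science reference checking
total_included = (included_from_databases
                  + included_from_registers
                  + included_from_references)

print(duplicates_removed, included_from_databases, total_included)
```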


Figure 4. Study flow diagram.

Results are presented separately for FDG‐PET/CT and MRI in this review. We could not identify studies that addressed the accuracy of FDG‐PET/CT and MRI as add‐on tests to abdominal CT. Characteristics and quality assessments of the individual studies can be found in the Characteristics of included studies table.

FDG‐PET/CT
Two studies investigated the accuracy of FDG‐PET/CT for assessing tumour resectability (Alessi 2016; Shim 2015).

The first study prospectively investigated the accuracy of FDG‐PET/CT to assess the outcome of debulking surgery (Alessi 2016). The target condition was, after clarification was provided by the study authors, macroscopic debulking with no visible tumour remaining after surgery. In 29 consecutive women with an ovarian mass, total body FDG‐PET/CT was performed within 20 days of debulking surgery. All women underwent explorative laparotomy. Where debulking was considered feasible, women received primary debulking surgery and the remaining women received neoadjuvant chemotherapy. Criteria to consider primary debulking unfeasible are summarised in Table 1. Out of 29 women, 23 were diagnosed with ovarian cancer (of whom four had early‐stage disease) and are included in our analysis.

Table 1. Criteria to consider primary debulking unfeasible

Criteria to consider primary debulking unfeasible according to study methods

| Site of tumour involvement | Alessi | Shim | Espada | Forstner | Michielsen |
|---|---|---|---|---|---|
| Liver/porta hepatis | Yes | No | No | Yes | Yes |
| Mesentery | Yes | Yes | Yes | Yes | No |
| Colon | Yes, when necessitating > 4 bowel resections | No | No | No | Yes, when necessitating multiple bowel resections |
| Stomach | Yes | No | Yes | No | Yes |
| Pancreas | Yes | No | No | No | Yes |
| Duodenum | Yes | No | No | No | Yes |
| Diaphragm | No | Yes | No | Yes | No |
| Ascites | No | Yes | No | No | No |
| Peritoneal carcinomatosis | Yes | Yes | No | No | No |
| Lesser sac/bursa omentalis | No | No | Yes | Yes | No |
| Spleen/splenic hilum | No | No | Yes | No | No |
| Lymph nodes above level of renal vessels/at coeliac axis | No | No | Yes | Yes | Yes |
| Gastrosplenic ligament | No | No | No | Yes | No |
| Presacral extraperitoneal disease | No | No | No | Yes | No |
| Extra‐abdominal distant metastasis | No | No | No | No | Yes |
| Vessels of coeliac trunk | No | No | No | No | Yes |
| Hepatoduodenal ligament | No | No | No | No | Yes |
| Superior mesenteric artery | No | No | No | No | Yes |

Yes: site of tumour involvement is selected as one of the criteria to consider primary debulking unfeasible
No: site of tumour involvement is not selected as a criterion to consider primary debulking unfeasible

The second study developed and validated a model to determine incomplete debulking with residual disease of any size in women with advanced stage ovarian cancer (Shim 2015). A total of 343 women were included and allocated to a development (n = 240) or validation (n = 103) cohort. All received primary debulking surgery. Women undergoing neoadjuvant chemotherapy, due to insufficient physical condition for surgery or presence of extra‐abdominal disease, were excluded. The prediction model consisted of five FDG‐PET/CT features (four anatomical structures, see Table 1, and the tumour FDG uptake ratio) and one non‐imaging related feature (an unvalidated surgical aggressiveness index). FDG‐PET/CT was performed within four weeks of surgery.

MRI
Three studies addressed the accuracy of MRI for assessing tumour resectability (Espada 2013; Forstner 1995; Michielsen 2017). One study addressed conventional MRI (Forstner 1995), and two studies addressed DW‐MRI (Espada 2013; Michielsen 2017). Two of the studies also addressed the accuracy of abdominal CT (Forstner 1995; Michielsen 2017).

The first study assessed the diagnostic accuracy of MRI in combination with diffusion‐weighted imaging (DW‐MRI) compared to explorative laparotomy for assessing incomplete debulking surgery in women with advanced stage ovarian cancer (Espada 2013). Surgery was performed by a gynaecological oncologist and incomplete debulking was defined as residual tumour > 1 cm in diameter. Within 15 days of surgery, 3‐Tesla (DW‐)MRI of the abdomen and pelvis was performed. Criteria to consider primary debulking surgery unfeasible are summarised in Table 1. From the 36 recruited women, 34 were diagnosed with ovarian cancer and included in the analysis.

The second study prospectively evaluated ovarian cancer staging and tumour resectability with abdominal CT or conventional T1w/T2w MRI, or both (Forstner 1995). A total of 128 women were enrolled, of whom 82 received imaging by abdominal CT, MRI, or both. After inclusion, women with neoadjuvant chemotherapy, benign disease, other intra‐abdominal malignancies or those who had undergone surgery more than one month after MRI were excluded from the statistical analysis (n = 46). In our analysis, data from the subgroup of 50 women with MRI were included, of whom 30 had FIGO stage III/IV ovarian cancer. The target condition was defined as debulking with residual disease < 2 cm in diameter. Criteria to consider whether primary debulking was unfeasible are summarised in Table 1. MRI was performed within four weeks of surgery and all women received debulking surgery.

The third study compared the accuracy of abdominal CT and whole body DW‐MRI to assess incomplete debulking with residual disease of any size in women with ovarian cancer (Michielsen 2017). This prospective study enrolled 126 women, of whom 94 were diagnosed with ovarian cancer and were eligible for analysis. All women received (primary or interval) debulking surgery, except for four women, who were physically unfit to undergo surgery. If surgery was considered unfeasible, a diagnostic laparoscopy was performed as a reference standard to confirm non‐resectability. Criteria to consider whether primary debulking was unfeasible are summarised in Table 1. Out of the 94 women with ovarian cancer, 73 had advanced stage (III or IV) disease. No details were provided on the time period between the index test and reference standard.

Methodological quality of included studies

The results of the QUADAS‐2 assessments are presented in Figure 5.


Figure 5. Risk of bias and applicability concerns summary: review authors' judgements about each domain for each included study

FDG‐PET/CT
The two FDG‐PET/CT studies were considered at low risk of bias in all domains, except for the reference standard domain (Alessi 2016; Shim 2015). For both studies, it was unclear whether the surgery was performed by a gynaecologist or a specialised gynaecological oncologist and, for one study, it was not reported whether the outcome of debulking surgery was interpreted blind to the FDG‐PET/CT results (Alessi 2016). There were no applicability concerns for either study.

MRI
One study addressing DW‐MRI was judged to be at low risk of bias in all domains, except for participant selection (Espada 2013). Enrolment procedures and exclusion criteria were not described, resulting in an unclear risk of bias for this domain. There were concerns about applicability for the index test domain, since the radiologists were blinded to (presurgical) clinical data, contrary to standard practice. Furthermore, the applicability of participant selection remained unclear because no details were provided on the diagnostic assessment leading to their selection.

A second study addressing conventional T1w/T2w MRI was judged to have a high risk of bias in two domains (Forstner 1995). There were concerns about participant selection, since allocation to the imaging modality was decided on a variety of factors, including the preference of the referring physician. Additionally, the study design and methodology were changed during the study. The initial goal was to perform both abdominal CT and MRI in all women for an intra‐patient comparison. However, due to difficulties in participant recruitment, the study design was changed into a non‐randomised inter‐participant comparison that required either abdominal CT or MRI imaging. From the initial 128 women recruited, 82 underwent both surgery and imaging and formed the study population. Women treated with neoadjuvant chemotherapy were excluded after enrolment. Consequently, the participant flow could have introduced bias. Applicability concerns for this study remained unclear for two domains. First, it was unclear whether all women were scheduled for debulking surgery after diagnostic assessment. Secondly, the study did not provide a clear definition of what was considered a positive result for the reference standard. However, in the discussion, the study authors specified that debulking was considered optimal when, after surgery, no tumour of > 2 cm remained, which was the standard for debulking surgery during the study period (1990 to 1994). Moreover, because attitudes towards the goal of debulking surgery have changed over the past two decades, from < 2 cm in the 1990s to no visible residual tumour nowadays, applicability concerns were present for this study.

In a third study investigating DW‐MRI, the overall risk of bias was judged as low and there were no applicability concerns (Michielsen 2017). It remained unclear if the study flow and timing could have introduced bias because no information was provided on the time period between the index test and reference standard.

Findings

FDG‐PET/CT
Two studies evaluated the accuracy of FDG‐PET/CT to assess tumour resectability. The target condition was incomplete debulking with residual disease of any size (> 0 cm in diameter) versus macroscopic debulking. Definitions of true positives, false positives, false negatives, and true negatives are shown in Figure 1.

In the first study, the prevalence of incomplete debulking was 6/23 (26%) (Alessi 2016). The sensitivity for assessing incomplete debulking (with residual disease of any size) of FDG‐PET/CT was 1.00 (95% CI 0.54 to 1.00) and the specificity 1.00 (95% CI 0.80 to 1.00), as displayed in Figure 6.
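The confidence intervals quoted here are consistent with exact (Clopper‐Pearson) binomial intervals, although the review does not name the method, so that is an assumption. The sketch below computes such an interval by bisection using only the Python standard library. The counts 6 of 6 (sensitivity) and 17 of 17 (specificity) follow from the reported prevalence of 6/23 with both estimates equal to 1.00; the function names are illustrative.

```python
from math import comb


def binom_cdf(k: int, n: int, p: float) -> float:
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def bisect_root(f, iterations: int = 100) -> float:
    """Root of a function that increases from negative to positive on [0, 1]."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def exact_ci(x: int, n: int, alpha: float = 0.05):
    """Two-sided exact binomial confidence interval for x successes out of n."""
    # Lower limit: smallest p for which P(X >= x) reaches alpha / 2.
    lower = 0.0 if x == 0 else bisect_root(
        lambda p: (1 - binom_cdf(x - 1, n, p)) - alpha / 2)
    # Upper limit: largest p for which P(X <= x) is still alpha / 2.
    upper = 1.0 if x == n else bisect_root(
        lambda p: alpha / 2 - binom_cdf(x, n, p))
    return lower, upper


for label, x, n in [("sensitivity", 6, 6), ("specificity", 17, 17)]:
    low, high = exact_ci(x, n)
    print(f"{label}: {x}/{n}, 95% CI {low:.2f} to {high:.2f}")
```

With so few women in each cell, the lower limits stay far below 1.00 even when every case is classified correctly, which is why the perfect point estimates in this study carry little weight.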


Figure 6. Forest plot of tests: 1 PET/CT for assessing incomplete debulking with residual disease of any size, 4 MRI for assessing incomplete debulking with residual disease of any size, 2 MRI for assessing incomplete debulking with residual disease > 1 cm, 3 MRI for assessing incomplete debulking with residual disease > 2 cm.

The second FDG‐PET/CT study used a prediction model including five FDG‐PET/CT features and a surgical aggressiveness index to assess incomplete debulking with residual disease of any size in 343 women with ovarian cancer (Shim 2015). The study authors defined the high‐risk group as having a predicted probability of incomplete debulking of greater than 80%. With this prediction model, the prevalence of incomplete debulking was 65% and 163 women would be classified as being unsuitable for debulking (positive index test), of whom 148 would have incomplete debulking with residual disease (positive reference standard). The sensitivity of FDG‐PET/CT for incomplete debulking (with residual disease of any size) was 0.66 (95% CI 0.60 to 0.73) and the specificity 0.88 (95% CI 0.80 to 0.93).
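The figures reported for this study are enough to reconstruct an approximate 2 x 2 table as a plausibility check. In the sketch below, the number of women with incomplete debulking is inferred by rounding 65% of 343; it is not stated in the text above, so the resulting cells should be read as a reconstruction rather than as data reported by the study.

```python
n_total = 343        # women in Shim 2015
prevalence = 0.65    # reported prevalence of incomplete debulking
test_positive = 163  # classified as unsuitable for debulking
tp = 148             # test positives with incomplete debulking at surgery

# Inferred, not reported: women with incomplete debulking (reference positive).
reference_positive = round(prevalence * n_total)

fp = test_positive - tp
fn = reference_positive - tp
tn = n_total - tp - fp - fn

sensitivity = tp / (tp + fn)
specificity = tn / (tn + fp)

print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
print(f"sensitivity = {sensitivity:.3f}, specificity = {specificity:.3f}")
```

The reconstructed values agree with the reported sensitivity of 0.66 and specificity of 0.88 after rounding.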

If the very small first study (Alessi 2016) was ignored, the following results would apply to a hypothetical group of 1,000 women with an incomplete debulking prevalence of 62% (mean prevalence of Michielsen 2017 and Shim 2015): 211 women (95% CI 167 to 248) would be incorrectly classified as having no residual tumour (FNs) after surgery and 46 women (95% CI 27 to 76) would be incorrectly classified as having residual disease (FPs) after surgery (summary of findings Table).
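The per‐1,000 figures follow directly from sensitivity, specificity and prevalence. The sketch below reproduces them under the assumption that the review applied the rounded prevalence of 62% and rounded the results to whole women; the function name `errors_per_1000` is illustrative. Note that the upper limit for false negatives comes from the lower limit of sensitivity, and likewise for false positives and specificity.

```python
def errors_per_1000(sens: float, spec: float, prevalence: float,
                    cohort: int = 1000) -> tuple:
    """Expected false negatives and false positives in a hypothetical cohort."""
    with_condition = cohort * prevalence          # incomplete debulking
    without_condition = cohort - with_condition   # macroscopic debulking
    false_negatives = with_condition * (1 - sens)
    false_positives = without_condition * (1 - spec)
    return round(false_negatives), round(false_positives)


prevalence = 0.62  # mean prevalence used in the review

# FDG-PET/CT (Shim 2015): point estimate, then the two ends of the 95% CIs.
point = errors_per_1000(0.66, 0.88, prevalence)
best = errors_per_1000(0.73, 0.93, prevalence)   # upper CI limits of sens/spec
worst = errors_per_1000(0.60, 0.80, prevalence)  # lower CI limits of sens/spec

print(f"FN per 1000: {point[0]} ({best[0]} to {worst[0]})")
print(f"FP per 1000: {point[1]} ({best[1]} to {worst[1]})")
```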

MRI
All three MRI studies assessed the diagnostic accuracy of MRI for a different target condition. Figure 6 displays the paired forest plots of sensitivity and specificity for assessing incomplete debulking for the different target conditions.

The first study (Espada 2013) used a self‐developed predictive score based on abdominal sites and tumour extension on DW‐MRI to assess incomplete debulking with residual disease > 1 cm. Debulking was incomplete in 8 of 34 women (23.5%). A score ≥ 6 had the highest overall accuracy at 91%. The sensitivity for assessing incomplete debulking of DW‐MRI was 0.75 (95% CI 0.35 to 0.97) and the specificity 0.96 (95% CI 0.80 to 1.00).

In the second study (Forstner 1995), 11 out of 50 women had incomplete debulking surgery with residual disease > 2 cm (22%). The sensitivity for assessing incomplete debulking on conventional MRI was 0.91 (95% CI 0.59 to 1.0) and the specificity 0.97 (95% CI 0.87 to 1.0). For abdominal CT, the sensitivity for assessing incomplete debulking was 0.50 (95% CI 0.12 to 0.88) and the specificity 1.0 (95% CI 0.91 to 1.0).

The third MRI study (Michielsen 2017) compared the diagnostic accuracy of DW‐MRI and abdominal CT. From the 94 included women, 44 underwent primary debulking surgery. Macroscopic debulking was performed in 39 women (89%), two women had residual tumour < 1 cm, one woman had residual disease > 1 cm and two women were unfit for surgery. In the 50 remaining women, treated with neoadjuvant chemotherapy and interval debulking, non‐resectability was confirmed with laparoscopy or biopsy from a distant metastasis. In this study, the prevalence of incomplete debulking with residual disease of any size was 53% and the sensitivity for assessing incomplete debulking (with residual disease of any size) of DW‐MRI was 0.94 (95% CI 0.83 to 0.99) and the specificity 0.98 (95% CI 0.88 to 1.00). For abdominal CT, the sensitivity for assessing incomplete debulking was 0.66 (95% CI 0.52 to 0.78) and the specificity 0.77 (95% CI 0.63 to 0.87).

An overview of the results is provided in summary of findings Table.

Discussion

Summary of main results

The aim of this systematic review was to determine the diagnostic accuracy of FDG‐PET/CT and MRI for assessing incomplete debulking surgery in women with advanced stage epithelial ovarian cancer. We included five studies: two addressing FDG‐PET/CT (Alessi 2016; Shim 2015); one conventional MRI (Forstner 1995) and two DW‐MRI (Espada 2013; Michielsen 2017). Both FDG‐PET/CT and MRI showed high specificity and moderate sensitivity (see summary of findings Table). In a hypothetical group of 1000 women, of whom 620 would have incomplete debulking of any size (prevalence 62%), in 211 women (95% CI 167 to 248), surgery would incorrectly be considered feasible according to FDG‐PET/CT and in 37 women (95% CI 6 to 105) according to MRI. However, the quality of evidence was very low to moderate according to GRADE, mainly due to the small sample sizes of the included studies.

In all studies, FDG‐PET/CT or MRI were used as an initial test and the sensitivity and specificity were determined irrespective of abdominal CT results. Therefore, this review does not provide information on the accuracy of FDG‐PET/CT or MRI as add‐on tests to abdominal CT, only as its replacement. The two studies that addressed the accuracy of abdominal CT (Forstner 1995; Michielsen 2017) found low sensitivity and moderate specificity for assessing incomplete debulking of residual disease > 2 cm and of any size, respectively. A review comparing 10 studies that used abdominal CT‐based models to assess residual disease showed a sensitivity ranging from 19.2% to 100% and specificity from 56.7% to 100% (Rutten 2015). This broad range can be explained by the different definitions used for the size of residual disease (e.g. the sensitivity and specificity can be different for assessing residual disease > 2 cm and > 0 cm).

In our included studies, the prevalence of incomplete debulking varied from 22% to 63%. This wide range may in part be due to changes in the goal of debulking surgery over the past decades, previously < 2 cm residual disease to the current standard of no macroscopically visible residual disease. Therefore, the prevalence of incomplete debulking is likely to increase when the accepted size of remaining tumour after surgery decreases.

Two studies (Forstner 1995; Shim 2015) excluded women who had received neoadjuvant chemotherapy instead of primary debulking surgery. This exclusion has affected the numbers in the two by two tables, possibly leading to an underestimation of the sensitivity, since most of the women would have been considered unsuitable for surgery.

All studies used laparotomy as a reference standard and one study (Michielsen 2017) also used laparoscopy or biopsy from a suspect distant lesion as reference standards to confirm tumour non‐resectability in women in whom primary debulking was considered unfeasible. However, there might be ethical concerns with respect to operating on women where debulking surgery was considered not feasible.

While a number of studies have tried to identify specific radiologic predictors of incomplete debulking, no accepted universally validated scoring instrument exists. For clinical and study purposes, a standardised image‐based instrument that can assess the feasibility of debulking surgery is desired. As a result, methodological heterogeneity exists between the included studies due to their different criteria on the feasibility of primary debulking (Table 1). The management of recurrent disease has also been widely investigated over the past years. A tool including performance status, the completeness of primary debulking, and the presence of ascites has been developed and used to assess the feasibility of secondary debulking in women with recurrent disease (Harter 2011). As the outcome of primary surgery is one of the criteria for this review, this tool cannot be used for assessing the feasibility of primary debulking.

Determining tumour resectability remains a complex and heterogeneous decision, since the feasibility of surgery depends not only on imaging results (which capture the dissemination pattern), but also on the experience and degree of specialisation of the surgeon, the institutional policies, the patient’s physical condition, and her personal preferences (e.g. willingness to risk a colostomy).

Strengths and weaknesses of the review

Our extensive search with comprehensive inclusion criteria yielded a large number of screened publications. However, only a small number of publications addressed the review question. Several studies performed analyses based on specific tumour sites but lacked an overall judgement on tumour (non)resectability (Hynninen 2013; Pfannenberg 2009; Risum 2008). Therefore, we could not perform meta‐analyses or correct for possible sources of heterogeneity such as year of study initiation. We successfully contacted study authors for clarification on their study methods and results when details were missing. Unfortunately, the sample size of included studies limits the ability to draw robust conclusions and no studies addressing the accuracy of FDG‐PET/CT or MRI as additional tests to abdominal CT were found. Also, as the accepted size of remaining tumours after surgery was different for the three MRI studies, it was impossible to estimate summary sensitivity and specificity. Another limitation of this review is that it is uncertain whether, in some studies, the index test was used to exclude participants. This could have introduced bias in estimating the positive predictive value (PPV; the proportion of women for whom surgery was considered unfeasible by the index test who had residual disease after debulking) and, therefore, the focus should lie with the negative predictive value (NPV).
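For readers who want to see how PPV and NPV relate to the accuracy estimates, the sketch below applies Bayes' rule to a given sensitivity, specificity and prevalence. The inputs are the FDG‐PET/CT estimates from Shim 2015 at the 62% prevalence used elsewhere in this review; the resulting predictive values are illustrative only and are not reported by the review.

```python
def predictive_values(sens: float, spec: float, prevalence: float):
    """Return (PPV, NPV) for a test applied at the given prevalence."""
    tp = prevalence * sens               # flagged, incomplete debulking
    fn = prevalence * (1 - sens)         # cleared, incomplete debulking
    fp = (1 - prevalence) * (1 - spec)   # flagged, complete debulking
    tn = (1 - prevalence) * spec         # cleared, complete debulking
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    return ppv, npv


# Illustrative inputs: sensitivity 0.66, specificity 0.88, prevalence 62%.
ppv, npv = predictive_values(0.66, 0.88, 0.62)
print(f"PPV = {ppv:.2f}, NPV = {npv:.2f}")
```

Both values shift with prevalence, so they would differ in a centre with a different proportion of incomplete debulking.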

Applicability of findings to the review question

All studies addressed the diagnostic test accuracy for FDG‐PET/CT or MRI as an initial test and showed sensitivity and specificity independent of abdominal CT results. Therefore, we were unable to provide information on the accuracy of the index tests as add‐on tests to abdominal CT. The proposed study population of this review had ovarian cancer in an advanced stage. Nevertheless, from the included studies, four out of 366 women (1%) undergoing FDG‐PET/CT and 26 out of 178 women (15%) undergoing MRI had early‐stage disease at surgery. We decided to include these women in our analysis as this reflects clinical practice.

Figure 1. Definitions of the two by two table, wherein the index tests are tabulated against the reference standard outcome, on the analysis: macroscopic debulking versus incomplete debulking with residual disease of any size (i.e. consisting of deposits ≤ 1 cm and > 1 cm in diameter). TP = true positive, FP = false positive, FN = false negative, TN = true negative.

Figure 2. Definitions of the two by two table, wherein the index tests are tabulated against the reference standard outcome, on the analysis: macroscopic debulking or incomplete debulking with residual disease ≤ 1 cm in diameter versus incomplete resection with residual disease > 1 cm in diameter. TP = true positive, FP = false positive, FN = false negative, TN = true negative.

Figure 3. Visual representation of 2 x 2 table. TP = true positive, FP = false positive, FN = false negative, TN = true negative.

Figure 4. Study flow diagram.

Figure 5. Risk of bias and applicability concerns summary: review authors' judgements about each domain for each included study

Figure 6. Forest plot of tests: 1 PET/CT for assessing incomplete debulking with residual disease of any size, 4 MRI for assessing incomplete debulking with residual disease of any size, 2 MRI for assessing incomplete debulking with residual disease > 1 cm, 3 MRI for assessing incomplete debulking with residual disease > 2 cm.

Test 1. PET/CT for assessing incomplete debulking with residual disease of any size.

Test 2. MRI for assessing incomplete debulking with residual disease > 1 cm.

Test 3. MRI for assessing incomplete debulking with residual disease > 2 cm.

Test 4. MRI for assessing incomplete debulking with residual disease of any size.

Summary of findings. Diagnostic accuracy of FDG‐PET/CT and MRI for assessing tumour resectability in advanced epithelial ovarian/fallopian tube/primary peritoneal cancer

Review question: What is the diagnostic accuracy of FDG‐PET/CT or MRI for assessing tumour resectability in advanced epithelial ovarian/fallopian tube/primary peritoneal cancer?

Patients: Women suspected of ovarian cancer scheduled for surgery

Prior testing: Conventional diagnostic work‐up (e.g. physical examination, ultrasound)

Setting: University hospitals or specialised cancer institutes

Index test: FDG‐PET/CT or MRI. In all studies, the index test was evaluated as a replacement of abdominal CT. No studies were identified that followed an add‐on design.

Target condition: Residual disease assessed after debulking surgery

| Test | Target condition | No. of women (studies) | Prevalence in study | Sensitivity (95% CI) | Specificity (95% CI) | No. of false negatives* per 1000 tested | No. of false positives** per 1000 tested | Test accuracy certainty (quality) of evidence (sensitivity/specificity) [a] |
|---|---|---|---|---|---|---|---|---|
| FDG‐PET/CT | Residual disease > 0 cm | 23/343 (2) | 26%/65% | 1.0 (0.54 to 1.0) and 0.66 (0.60 to 0.73) | 1.0 (0.80 to 1.0) and 0.88 (0.80 to 0.93) | 211 (167 to 248) [b] | 46 (27 to 76) [b] | Low [c]/moderate [d] |
| DW‐MRI | Residual disease > 0 cm | 94 (1) | 53% | 0.94 (0.83 to 0.99) | 0.98 (0.88 to 1.00) | 37 (6 to 105) [b] | 8 (0 to 46) [b] | Low [c]/moderate [d] |
| DW‐MRI | Residual disease > 1 cm | 34 (1) | 23.5% | 0.75 (0.35 to 0.97) | 0.96 (0.80 to 1.00) | 59 (7 to 153) | 31 (0 to 153) | Very low/very low [e, f] |
| Conventional MRI | Residual disease > 2 cm | 50 (1) | 22% | 0.91 (0.59 to 1.00) | 0.97 (0.87 to 1.00) | 20 (0 to 90) | 23 (0 to 101) | Very low/very low [e, g] |
| CT [h] | Residual disease > 0 cm | 94 (1) | 53% | 0.66 (0.52 to 0.78) | 0.77 (0.63 to 0.87) | 211 (136 to 298) [b] | 87 (49 to 141) [b] | Low/low [c] |

CI: confidence interval
CT: computed tomography
DW‐MRI: diffusion‐weighted magnetic resonance imaging
FDG: fluorodeoxyglucose‐18
PET: positron emission tomography
* False negatives (FNs): judged as feasible for surgery based on imaging, with an incomplete debulking at surgery.
** False positives (FPs): judged as not feasible for surgery based on imaging, with a complete debulking at surgery.

a. According to GRADE for sensitivity (false negatives (FNs)) and specificity (false positives (FPs)), respectively
b. Numbers are calculated based on the results of the largest study (Shim 2015) at the mean prevalence of incomplete debulking (62%) of the two largest studies that addressed debulking with residual disease of any size (Michielsen 2017; Shim 2015). The prevalence of incomplete debulking was calculated as (TP + FN)/total study subjects (273/437 = 62%).
c. Downgraded two levels for very wide confidence interval for number of FNs (sensitivity)
d. Downgraded one level for wide confidence interval for number of FPs (specificity)
e. Downgraded two levels for very small sample size and very wide confidence intervals for number of FNs (sensitivity) and number of FPs (specificity).
f. Downgraded one level due to applicability concerns for the index test, since the radiologists were blinded to (presurgical) clinical data.
g. Downgraded one level due to high risk of bias for patient selection and flow and timing.
h. To compare the findings of the included studies (performing PET/CT or MRI to assess tumour resectability) with CT (the current gold standard), we provided the diagnostic accuracy of CT from the study with the best quality of evidence and with the target condition that is currently used in practice (Michielsen 2017).
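
The per‐1000 figures in the table follow from sensitivity, specificity, and prevalence, where a "positive" is incomplete debulking. The following is a minimal sketch of that arithmetic, not the review authors' own code; the function name `errors_per_1000` and its parameters are hypothetical, and it assumes the counts are obtained by simple rounding of the point estimates.

```python
def errors_per_1000(sensitivity, specificity, prevalence, n=1000):
    """Expected false negatives and false positives among n women tested.

    Assumption (not stated as code in the source): counts are the point
    estimates rounded to the nearest whole number.
    """
    with_condition = n * prevalence            # women with incomplete debulking
    without_condition = n * (1 - prevalence)   # women with complete debulking
    false_negatives = with_condition * (1 - sensitivity)
    false_positives = without_condition * (1 - specificity)
    return round(false_negatives), round(false_positives)


# Footnote b: mean prevalence of incomplete debulking is 273/437, about 62%.
prevalence_b = round(273 / 437, 2)

# FDG-PET/CT, largest study (Shim 2015): sensitivity 0.66, specificity 0.88.
pet_ct = errors_per_1000(0.66, 0.88, prevalence_b)

# DW-MRI, residual disease > 0 cm: sensitivity 0.94, specificity 0.98.
dw_mri = errors_per_1000(0.94, 0.98, prevalence_b)

# CT comparator (Michielsen 2017): sensitivity 0.66, specificity 0.77.
ct = errors_per_1000(0.66, 0.77, prevalence_b)

print(pet_ct, dw_mri, ct)
```

Under these assumptions the sketch reproduces the tabulated point estimates for the rows marked with footnote b. The rows without that footnote appear to use each study's own prevalence instead (23.5% and 22%), which gives the tabulated values when passed as `prevalence`.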

Table 1. Criteria to consider primary debulking unfeasible

Criteria to consider primary debulking unfeasible according to study methods

| Site of tumour involvement | Alessi | Shim | Espada | Forstner | Michielsen |
| --- | --- | --- | --- | --- | --- |
| Liver/porta hepatis | Yes | No | No | Yes | Yes |
| Mesentery | Yes | Yes | Yes | Yes | No |
| Colon | Yes, when necessitating > 4 bowel resections | No | No | No | Yes, when necessitating multiple bowel resections |
| Stomach | Yes | No | Yes | No | Yes |
| Pancreas | Yes | No | No | No | Yes |
| Duodenum | Yes | No | No | No | Yes |
| Diaphragm | No | Yes | No | Yes | No |
| Ascites | No | Yes | No | No | No |
| Peritoneal carcinomatosis | Yes | Yes | No | No | No |
| Lesser sac/bursa omentalis | No | No | Yes | Yes | No |
| Spleen/splenic hilum | No | No | Yes | No | No |
| Lymph nodes above level of renal vessels/at coeliac axis | No | No | Yes | Yes | Yes |
| Gastrosplenic ligament | No | No | No | Yes | No |
| Presacral extraperitoneal disease | No | No | No | Yes | No |
| Extra‐abdominal distant metastasis | No | No | No | No | Yes |
| Vessels of coeliac trunk | No | No | No | No | Yes |
| Hepatoduodenal ligament | No | No | No | No | Yes |
| Superior mesenteric artery | No | No | No | No | Yes |

Yes: site of tumour involvement is selected as one of the criteria to consider primary debulking unfeasible
No: site of tumour involvement is not selected as a criterion to consider primary debulking unfeasible

Table Tests. Data tables by test

| Test | No. of studies | No. of participants |
| --- | --- | --- |
| 1 PET/CT for assessing incomplete debulking with residual disease of any size | 2 | 366 |
| 2 MRI for assessing incomplete debulking with residual disease > 1 cm | 1 | 34 |
| 3 MRI for assessing incomplete debulking with residual disease > 2 cm | 1 | 50 |
| 4 MRI for assessing incomplete debulking with residual disease of any size | 1 | 94 |
