Symptoms, ultrasound imaging and biochemical markers alone or in combination for the diagnosis of ovarian cancer in women with symptoms suspicious of ovarian cancer

This is not the most recent version

Abstract

This is a protocol for a Cochrane Review (Diagnostic test accuracy). The objectives are as follows:

To establish the accuracy of symptoms, ultrasound and biomarkers alone or in combination for the diagnosis of ovarian cancer in pre‐ and postmenopausal women.

To compare the accuracy of different tests or test combinations.

Background

Ovarian cancer is the deadliest and the most common cause of mortality among all gynaecological cancers. In 2012, 239,000 women were diagnosed with ovarian cancer and 152,000 women died worldwide (CRUK 2014). The high case fatality rate is largely attributed to the advanced stage at diagnosis in the majority of ovarian cancers. Approximately 75% of ovarian cancers are diagnosed in an advanced stage. Five‐year survival rates are less than 30% in advanced‐stage disease in comparison to five‐year survival of more than 90% in stage 1 disease (CRUK 2014). Lack of awareness and recognition of symptoms by patients and physicians is considered one of the main factors in delayed diagnosis and poor outcomes. Diagnosis of ovarian cancer is challenging because of variable presentation, the non‐specific nature of symptoms (Fitch 2002), and the low prevalence (0.23%) (Myers 2006). Ten per cent of women will undergo surgery in their lifetime for ovarian pathology, but only a small minority will have ovarian cancer (RCOG 2011). The prevalence of cancer in women undergoing surgery for ovarian pathology is 20% (Koonings 1989), but ranges from 5.7% to 57.5% (Myers 2006).

Diagnosis of ovarian cancer in premenopausal women poses additional challenges. The majority of tumours detected in premenopausal women tend to be benign; only 1 in 1000 symptomatic ovarian cysts are malignant, increasing to 3 in 1000 at age 50 (RCOG 2011).

A considerable amount of research has been dedicated to early diagnosis of ovarian cancer in an effort to improve outcomes. The use of symptoms, biomarkers and imaging (in particular ultrasound parameters) has been explored in an effort to make an earlier and more accurate diagnosis (Bankhead 2005; Sarojini 2012; Van Calster 2012). The accuracy of these tests, alone or in combination, and in different healthcare settings, has been investigated in various studies (Ferraro 2013; Kaijser 2014), but the most accurate combination of tests has yet to be determined.

This is a generic protocol for a series of four linked diagnostic test accuracy reviews to estimate the accuracy of symptoms, ultrasound imaging and biochemical markers alone or in combination for the diagnosis of ovarian cancer:

  • Clinical symptoms for the diagnosis of ovarian cancer

  • Ultrasound imaging for the diagnosis of ovarian cancer

  • Biochemical markers for the diagnosis of ovarian cancer

  • Symptoms, ultrasound imaging and biochemical markers for the diagnosis of ovarian cancer

Target condition being diagnosed

The diagnosis of ovarian cancer is difficult, largely because many physiological and benign conditions in premenopausal women, including the menstrual cycle, endometriosis and fibroids, may present in a similar way: with symptoms, an abnormal ultrasound scan and/or associated with raised biomarkers. The result is that the specificity of these tests is reduced and the probability of false positive results is increased.

Ovarian cancer is a heterogeneous disease and includes epithelial cell tumours, germ cell tumours, stromal cell tumours, metastatic cancers and tumours of low malignant potential (LMP, also known as borderline tumours). More than 90% of ovarian cancers in postmenopausal women are epithelial cell tumours, whereas in premenopausal women 15% to 20% of tumours are germ cell in origin. Epithelial cell tumours are the most common and within this group of tumours high‐grade serous carcinoma is the commonest and most deadly. Other common epithelial histological types are mucinous, clear cell and endometrioid types (Shepherd 2000). Current understanding of the pathogenesis of ovarian cancers suggests they are different diseases, sharing the same anatomical location. Recent morphological and genetic studies have helped to improve our understanding of ovarian carcinogenesis and tumour behaviour based on different histology types. The distal fallopian tube has been identified as the origin of serous ovarian carcinomas and ovarian clear cell cancers, the origin of endometrioid cancer has been linked to endometriosis (Wiegand 2010), and the origin of the majority of mucinous tumours is considered to be the appendix (Seidman 2003). A dualistic model has also been proposed based on the behaviour of tumours (Shih 2004). Type I tumours are indolent, present at an early stage and progress from benign to intermediate to carcinoma in a stepwise pattern; low‐grade serous, endometrioid, clear cell and mucinous carcinomas are examples of type I tumours. Type II tumours are aggressive, high‐grade carcinomas, most often diagnosed at an advanced stage and include high‐grade serous, endometrioid and undifferentiated carcinomas. Type I and type II tumours display markedly different and distinct genetic patterns (Cho 2009). This advancement in understanding has major research implications, especially regarding the role of biomarkers in the management of ovarian cancer.
However, there is a lag between recent advances in knowledge and the current evidence base. This understanding is yet to be reflected in the majority of the primary studies. Many biomarkers are being investigated and preliminary evidence on a few promising biomarkers has been reported, but needs to be substantiated in larger studies.

Existing reviews are mostly restricted to the performance of tests in epithelial ovarian cancer in postmenopausal women and they neglect the heterogeneous nature of the disease and the different prevalence of the tumour based on menopausal status (Myers 2006). Most studies have investigated the accuracy of tests for the diagnosis of ovarian cancer in adnexal masses occurring in postmenopausal women or have not made a distinction between pre‐ and postmenopausal women (Myers 2006).

Our review will include primary ovarian cancer of all histological types and stages, including borderline tumours. We will not consider the diagnosis of metastatic disease (cancer found in the ovary, but originating in another organ) in this review.

Index test(s)

Symptoms

In the last decade, the perception of ovarian cancer as a 'silent killer' has changed. Several published studies have concluded that the diagnosis is preceded by persistent gastrointestinal and urinary symptoms, and menstrual disturbances (Bankhead 2005). Symptoms frequently reported in studies include abdominal pain, pelvic pain, abdominal bloating, distension, altered bowel habit, such as constipation and diarrhoea, and urinary symptoms. However, the duration, severity and nature of these symptoms are non‐specific and mimic benign conditions, such as irritable bowel syndrome and perimenopausal changes, making early recognition and diagnosis challenging. The majority of studies investigating the accuracy of symptoms are case‐control studies using non‐validated questionnaires and are prone to recording or recall bias. The Goff Symptom Index, one of the most commonly studied and validated questionnaires, has been shown to have a sensitivity of 66.7% and specificity of 90% in women older than 50 years, with a corresponding sensitivity and specificity of 86.7% in women younger than 50 years (Goff 2007). However, the potential for the Goff Symptom Index and other symptom scores to significantly improve outcomes, and their potential value as diagnostic tools, have also been questioned, since the interval from recognition to diagnosis is variable and may only be a few months (Lim 2012).

Biochemical markers

Biochemical markers are substances secreted or shed by tumours into surrounding blood and body fluids and expressed in abnormal tissues. Biomarkers may be uniquely specific for some tumour subtypes or non‐specific.

The most commonly used biomarker for ovarian cancer is CA125, which is raised in many benign and physiological conditions (Moss 2005; Posadas 2004). CA125 operating at a threshold of 30 units/ml has been shown to have a sensitivity of 81% and specificity of 75% for distinguishing benign from malignant tumours in mixed pre‐ and postmenopausal populations with adnexal masses (Jacobs 1990). However, CA125 has a low sensitivity (50%) for early‐stage ovarian cancer (Jacobs 1989), and reduced specificity in premenopausal women.
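
As a rough illustration of why reduced specificity matters, the sketch below converts a sensitivity and specificity into positive and negative predictive values using Bayes' theorem. The function name is hypothetical. Pairing the CA125 figures above (sensitivity 81%, specificity 75%) with the 20% prevalence quoted earlier for women undergoing surgery is an assumption made purely for illustration, since the figures come from different study populations.

```python
from typing import Tuple


def predictive_values(sensitivity: float, specificity: float,
                      prevalence: float) -> Tuple[float, float]:
    """Return (PPV, NPV) implied by sensitivity, specificity and prevalence."""
    for name, value in (("sensitivity", sensitivity),
                        ("specificity", specificity),
                        ("prevalence", prevalence)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie between 0 and 1")

    # Expected proportions of the tested population in each 2 x 2 cell.
    true_pos = sensitivity * prevalence
    false_neg = (1.0 - sensitivity) * prevalence
    true_neg = specificity * (1.0 - prevalence)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)

    if true_pos + false_pos == 0.0 or true_neg + false_neg == 0.0:
        raise ValueError("predictive value undefined: no test positives or no test negatives")

    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Illustrative inputs only: CA125 accuracy combined with an assumed 20% prevalence.
ppv, npv = predictive_values(sensitivity=0.81, specificity=0.75, prevalence=0.20)
print(f"PPV = {ppv:.3f}, NPV = {npv:.3f}")
```

With these inputs fewer than half of positive results would be true positives, which is the practical consequence of the false positive problem described above.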

More recently, other promising tumour markers have received Food and Drug Administration (FDA) approval. These include OVA1™, which is a multivariate index assay using a combination of five bioassays including CA125 II, transthyretin (TTR), apolipoprotein A1 (Apo‐A1), transferrin and beta 2 microglobulin. Human epididymis protein 4 (HE4) has been demonstrated to have similar sensitivity, but improved specificity compared to CA125 and OVA1™ for ovarian cancer, particularly in premenopausal women (Ferraro 2013; Holcomb 2011). Human chorionic gonadotropin (HCG), lactate dehydrogenase (LDH) and alpha fetoprotein (AFP) are germ cell tumour markers and are recommended for use in women under 40 years (ACOG 2007; RCOG 2011).

Revised understanding of the pathophysiology of ovarian cancer suggests that the majority of high‐grade serous ovarian cancers and primary peritoneal cancers arise from the fimbrial end of the fallopian tube and are therefore likely to disseminate intraperitoneally early (Vaughan 2011). This implies that early detection with symptoms and ultrasound imaging may never be achievable and sensitive biomarkers will be required to detect early disease. It has been noted that levels of some tumour markers may begin to rise as early as three years prior to diagnosis (Anderson 2009).

Ultrasound

Ultrasound imaging enables visualisation of morphological details of ovarian cysts. The diagnostic potential of ultrasound has improved with advancing technology and the availability of transvaginal ultrasound (TVS), 3D ultrasound and Doppler techniques to characterise blood flow. However, the use of ultrasound to characterise lesions is influenced by interference from surrounding tissue, variability of the macroscopic features and the subjective, operator‐dependent nature of interpretation. Various scores have been developed to make ultrasound interpretation more objective (Geomini 2009). Morphological features, such as size, presence of bilateral lesions, presence and thickness of septum, presence of solid areas, excrescences and papillary structures within tumours, metastases, presence of ascites and Doppler measurements of blood flow, have been combined in various ways.

  • The 'U' score (presence of bilateral lesions, multilocularity, solid areas, metastases or ascites, where U = 0 indicates the absence of any of these features; U = 1 indicates the presence of any one of these features and U = 3 indicates the presence of two or more of these features) (RCOG 2011).

  • The 'B' rules (unilocular cysts; presence of solid components where the largest solid component is less than 7 mm; presence of acoustic shadowing; smooth multilocular tumour with a largest diameter of less than 100 mm; no blood flow) (RCOG 2011).

  • The 'M' rules (irregular solid tumour; presence of ascites; at least four papillary structures; irregular multilocular solid tumour with a largest diameter of 100 mm or more; very strong blood flow) (RCOG 2011).
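
The 'U' score described in the list above can be sketched as a small function. This is a minimal illustration based only on the description given here; the function and parameter names are hypothetical and not taken from any published implementation.

```python
def ultrasound_u_score(bilateral_lesions: bool, multilocular: bool,
                       solid_areas: bool, metastases: bool,
                       ascites: bool) -> int:
    """Return the 'U' score (0, 1 or 3) from five ultrasound features."""
    # Booleans sum as 0/1, so this counts how many features are present.
    features_present = sum([bilateral_lesions, multilocular, solid_areas,
                            metastases, ascites])
    if features_present == 0:
        return 0
    if features_present == 1:
        return 1
    # Two or more features score 3; the score has no value of 2.
    return 3


# Example: a multilocular cyst with solid areas and no other features.
print(ultrasound_u_score(False, True, True, False, False))
```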

The U score, B and M rules have been evaluated in many primary studies on their own or in combination with other features (i.e. non‐ultrasound features) (Kaijser 2014). New ultrasound‐based models (simple rules (SR) and logistic regression model 2 (LR2)) have been proposed by the International Ovarian Tumour Analysis (IOTA) as having better diagnostic accuracy in the preoperative evaluation of ovarian tumours but external validation of these scores to date is limited (Kaijser 2014).

Combinations of tests

Ovarian cancer is a heterogeneous tumour and it is likely that a combination of tests has the potential to improve diagnostic accuracy over any single test alone. The Risk of Malignancy Index (RMI), calculated by multiplying the ultrasound score (U), menopausal status (M) and CA125 (RMI = U × M × CA125), is the most widely used combination of tests. The sensitivity and specificity of RMI have been demonstrated to be 70% and 90% respectively in postmenopausal women when a cut‐off of RMI 250 is used (RCOG 2010). The Risk of Ovarian Malignancy Algorithm (ROMA) uses menopausal status and the biomarkers CA125 and HE4 to assess the probability of ovarian cancer in adnexal masses. A meta‐analysis concluded that ROMA aided differentiation of epithelial ovarian cancer (EOC) from benign masses with higher sensitivity, but lower specificity, compared to HE4 and CA125 alone; however, considerable heterogeneity was present, resulting from different thresholds and variations in study design and patient characteristics (Li 2012). Other test combinations have been proposed, including more recently the ADNEX (Assessment of Different NEoplasias in the adneXa) model, which combines clinical and ultrasound variables and the biomarker CA125 and shows promise in the preoperative discrimination of benign, borderline, early and advanced malignancies in ovarian masses (Van Calster 2014).
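
The RMI product can be written out as a short sketch. The text above does not state the numeric coding of menopausal status; the values used here (M = 1 for premenopausal, M = 3 for postmenopausal) follow the commonly described RMI I convention and should be read as an assumption, as should the function name.

```python
def risk_of_malignancy_index(u_score: int, postmenopausal: bool,
                             ca125_units_per_ml: float) -> float:
    """Return RMI = U x M x CA125."""
    if u_score not in (0, 1, 3):
        raise ValueError("U score must be 0, 1 or 3")
    if ca125_units_per_ml < 0:
        raise ValueError("CA125 cannot be negative")
    # Assumption (RMI I convention, not stated in the text):
    # M = 3 if postmenopausal, otherwise 1.
    m = 3 if postmenopausal else 1
    return u_score * m * ca125_units_per_ml


# Example: postmenopausal woman, U = 3, CA125 = 40 units/ml.
rmi = risk_of_malignancy_index(3, True, 40.0)
print(rmi, rmi > 250)  # compared against the cut-off of 250 quoted above
```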

For the purpose of this review we will consider combination tests as tests combining variables from more than one category of the index tests, for example: symptoms, ultrasound imaging and biochemical markers. We will also consider a test that includes risk factors for ovarian cancer, such as age or family history, combined with one or more of the index tests included in this review, i.e. symptoms, ultrasound scan and biomarkers, as combination tests. Our review will also include any combination of tests used in any order.

Clinical pathway

Symptomatic women present in both generalist and specialist settings and may undergo further investigations including biomarker tests, ultrasound scan or both to guide referral to general gynaecologists or gynaecological oncologists. Existing guidelines vary in their recommendations. The National Institute for Health and Care Excellence (NICE) and the Royal College of Obstetricians and Gynaecologists (RCOG) in the UK have suggested a clinical pathway where symptoms in primary care trigger further testing in primary care with ultrasound scan and biomarkers prior to referral to specialist care. The NICE guidance recommends ultrasound imaging in symptomatic women with CA125 of 35 IU/ml or greater. The RMI is used in secondary care to triage for surgical management (NICE 2011): postmenopausal women are referred to gynaecological oncologists if the RMI is more than 250 and premenopausal women are referred if CA125 is more than 200 units/ml (RCOG 2010; RCOG 2011). The American College of Obstetricians and Gynecologists (ACOG) recommends using a combination of symptoms, risk factors, biomarkers and imaging tests (including computed tomography (CT) and positron emission tomography (PET‐CT)) to triage women for surgical management (ACOG 2007). Use of germ cell tumour markers such as alpha fetoprotein (AFP), human chorionic gonadotropin (HCG) and lactate dehydrogenase (LDH) is recommended in women under 40 years (ACOG 2007; RCOG 2011). A recent multicentre study in the UK demonstrated variable adherence to the recent NICE guidance in terms of tests used, thresholds used and interpretation (Rai 2015).
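
The secondary care referral thresholds quoted in this paragraph can be expressed as a simple decision rule. The sketch below is illustrative only: the function and constant names are hypothetical, and it encodes just the two thresholds stated above, not the full guidance.

```python
RMI_REFERRAL_THRESHOLD = 250.0     # postmenopausal women: refer if RMI is more than 250
CA125_REFERRAL_THRESHOLD = 200.0   # premenopausal women: refer if CA125 is more than 200 units/ml


def refer_to_gynaecological_oncologist(postmenopausal: bool, rmi: float,
                                       ca125_units_per_ml: float) -> bool:
    """Apply the referral thresholds described in the text."""
    if postmenopausal:
        return rmi > RMI_REFERRAL_THRESHOLD
    return ca125_units_per_ml > CA125_REFERRAL_THRESHOLD


# Example: a postmenopausal woman with RMI 360 meets the referral threshold.
print(refer_to_gynaecological_oncologist(True, rmi=360.0, ca125_units_per_ml=40.0))
```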

Prior test(s)

As a minimum women will present with self assessed symptoms. In addition, women may have had clinical assessment (history and examination), imaging and biomarker tests prior to testing with the index test depending on at what point in the clinical pathway the index tests are being evaluated.

Role of index test(s)

The index tests are used for triage of patients with symptoms and/or signs suspicious of ovarian cancer presenting in primary or secondary care for further testing or treatment.

Alternative test(s)

This review is concerned with initial investigations to diagnose ovarian cancer that would be applicable both in primary and secondary care. Computed tomography (CT), magnetic resonance imaging (MRI), positron emission tomography (PET) and other complex imaging techniques are beyond the scope of this review.

The multivariate biomarker OvaSure™ (CA125, prolactin, leptin, insulin‐like growth factor II, macrophage inhibitory factor and osteopontin) has been reported to have a sensitivity and specificity of 95.3% and 99.4% respectively (Visintin 2008), but has been withdrawn from the market as these performance characteristics were based on an inaccurate prevalence rate. Advances in technology, especially in mass spectrometry, have led to high throughput assays and rapid turnover in identifying promising new biomarkers. However, deficiencies in verification, validation and reproducibility have militated against the translation of promising biomarkers into practice. In addition, lack of a consistent, standardised process for obtaining regulatory approval for tests may be contributing to the gap between development and implementation. For this reason we will only include currently FDA‐approved markers used in the diagnosis of ovarian cancer in this review. We will be inclusive in our search strategy and map emerging biomarkers for the diagnosis of primary ovarian cancer as a resource for updating the review.

Rationale

Advances in surgical practice and chemotherapy have slightly improved survival, but ovarian cancer continues to have high mortality, which is largely attributed to advanced stage at diagnosis. The non‐specific nature of symptoms associated with ovarian cancer and the high prevalence of ovarian cysts of uncertain significance (30% of females with regular menstruation, 50% of females with irregular menstruation and 6% of postmenopausal females (Duklewski 2009)) continue to pose problems for early and accurate diagnosis.

With advances in knowledge revealing the extent of the heterogeneous nature of ovarian cancer disease there is a need to re‐examine the performance of tests alone and in combination and in sub‐populations of ovarian cancer risk and for different types of disease. In addition, a review is needed that encompasses the most recent test developments.

An internal scoping exercise of systematic reviews carried out by the author team in preparation for this Cochrane review demonstrated limitations in quality and the degree to which current understanding is reflected. Deficiencies included inadequate ascertainment of the literature, limited spectrum, lack of consideration of prior testing, inclusion of studies with inadequate reference standards (minimal or no follow‐up data on patients who did not undergo surgery) and non‐ascertainment of the disease status of index test negatives.

Accurate diagnosis of ovarian cancer is important to ensure appropriate referral and further management, including surgery. Outcomes in ovarian cancer are better when patients are referred to specialists in gynaecological oncology and inappropriate referral to general gynaecologists may result in the need for additional remedial surgery. However, referral of benign masses to specialists in gynaecological oncology may cause unnecessary anxiety in women, result in unnecessarily invasive surgery, compromise fertility and overwhelm services.

Objectives

To establish the accuracy of symptoms, ultrasound and biomarkers alone or in combination for the diagnosis of ovarian cancer in pre‐ and postmenopausal women.

To compare the accuracy of different tests or test combinations.

Secondary objectives

We will investigate the following sources of heterogeneity:

Population

  • Clinical setting (generalist/primary care/community/family practice) versus specialist setting (cancer unit/cancer centre/gynaecological oncology)

  • Menopausal status

Index tests

  • Test positivity threshold

  • Experience of the ultrasound test operator (general sonographers versus specialist interest)

Target condition

  • Histological subtype

Study quality

  • Case‐control versus other study designs

  • Study quality: for study participants not receiving surgery initially following a negative index test result: 12 months follow‐up versus less than 12 months follow‐up

Methods

Criteria for considering studies for this review

Types of studies

We will include diagnostic case‐control, cross‐sectional and comparative diagnostic test accuracy studies. We will also include studies developing and validating multivariable models for the diagnosis of ovarian cancer. We anticipate that, in view of the low prevalence of ovarian cancer, the majority of cross‐sectional studies will recruit women with reference standard results and index test results will be ascertained retrospectively. Studies have to contain sufficient data to extract 2 x 2 tables on the diagnostic test performance. We will include studies not providing verification of index test negatives and construct 2 x 2 tables by imputation using setting‐specific prevalence estimates.
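
For readers less familiar with the 2 x 2 table, the sketch below shows how sensitivity and specificity follow from its four cells. The class name and the example counts are hypothetical and are not taken from any included study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwoTable:
    """Cross-classification of index test result against the reference standard."""
    tp: int  # index test positive, disease present
    fp: int  # index test positive, disease absent
    fn: int  # index test negative, disease present
    tn: int  # index test negative, disease absent

    def sensitivity(self) -> float:
        diseased = self.tp + self.fn
        if diseased == 0:
            raise ValueError("no diseased participants; sensitivity undefined")
        return self.tp / diseased

    def specificity(self) -> float:
        non_diseased = self.tn + self.fp
        if non_diseased == 0:
            raise ValueError("no non-diseased participants; specificity undefined")
        return self.tn / non_diseased


# Hypothetical counts for illustration only.
table = TwoByTwoTable(tp=81, fp=25, fn=19, tn=75)
print(f"sensitivity = {table.sensitivity():.2f}, specificity = {table.specificity():.2f}")
```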

Participants

Adult women aged 18 years or older, irrespective of menopausal status. We will exclude studies restricted exclusively to populations under 18. We will exclude women with a previous history of ovarian cancer and pregnant women.

Prior tests

The review will include women who have symptoms or signs suggestive of ovarian cancer. As a minimum women will have undertaken self assessment. As the review covers index tests used in both generalist and specialist settings, women may also have had one or more of ultrasound scan and biomarker testing prior to the index test being evaluated. We will exclude cross‐sectional studies explicitly describing the population as asymptomatic or screening or where the asymptomatic participants cannot be disaggregated. We will downgrade studies not clarifying symptomatology in the included population in the quality assessment (QUADAS‐2) by noting applicability in the patient domain as unclear.

Index tests

Symptoms

We will include combinations of symptoms alone or combinations of symptoms, signs or risk factors for ovarian cancer (such as family history) at any threshold and in any order. We will exclude studies restricted to single symptoms, signs alone or risk factors alone.

Biomarkers

We will include the following FDA approved biomarkers:

  • CA125

  • CEA

  • HE4

  • OVA1

  • LDH

  • HCG

  • AFP

Although HCG and AFP are FDA approved markers they are not approved by the FDA for use as tests in ovarian cancer. However, they are used clinically and recommended by the RCOG and ACOG for women under 40 as additional markers for germ cell tumours and we will include them.

We will tabulate other, non‐FDA approved ovarian cancer biomarkers as in development to inform review updates.

Ultrasound

Any ultrasound characteristic or combination of characteristics at any threshold, conducted and interpreted by either generalist or specialist sonographers (we will investigate operator experience as a potential source of heterogeneity). We will review only studies post 1991 to restrict the review to current technology and include:

  • 3D ultrasound

  • Grey scale morphology (TVS)

  • Doppler studies involving ovarian pathology

Combinations of tests

Any combination of the index tests listed above (symptoms, ultrasound scan, biomarkers) at any threshold and used in any order.

Target conditions

Ovarian cancer, all stages and types. We will exclude studies restricted to specific ovarian pathologies with the exception of epithelial ovarian cancer (EOC) as this is the most common (> 90% in postmenopausal women) of the ovarian cancers and is associated with the highest mortality. We will exclude metastatic or recurrent ovarian cancer and studies where it is not possible to disaggregate data concerning metastatic disease.

Reference standards

Histology in women who have undergone surgery and clinical follow‐up in women who have not undergone surgery. For studies using clinical follow‐up, we will consider studies with at least one year of follow‐up to be of higher quality and we will analyse these separately from studies where clinical follow‐up is less than one year.

Search methods for identification of studies

Electronic searches

We will use sensitive search strategies combining terms for the target condition (ovarian cancer) and the index tests (biochemical markers, symptom indices and ultrasound tests or testing strategies) as well as terms to describe test combinations. We will adapt the strategies to run across a range of databases: the Cochrane Central Register of Controlled Trials (CENTRAL), MEDLINE and MEDLINE In Process (Ovid), EMBASE (Ovid), CINAHL (Ebsco), the Cochrane Database of Systematic Reviews (CDSR), Database of Abstracts of Reviews of Effects (DARE), Health Technology Assessment Database (HTA) and SCI Science Citation Index (ISI Web of Knowledge). We will draw on existing systematic reviews and guidelines as a source of primary studies. We will apply no language restrictions.

We will update the electronic searches for symptoms from the search strategies (completed 2009) used to inform recent UK guidance (NICE 2011). In addition, for ultrasound and biomarkers, we will add terms and backdate the searches to 1991 in order to capture emerging evidence on the probable use of IOTA (International Ovarian Tumor Analysis) variables and biomarkers that were not covered in the NICE guidance. The symptom score search strategy for MEDLINE (Ovid) is shown in Appendix 1.

Searching other resources

To identify ongoing and unpublished studies we will search the following trials registers and conference abstracts and proceedings without a date limit: ClinicalTrials.gov, UK Clinical Research Network Study Portfolio Database (UKCRN) and WHO International Clinical Trials Registry Platform (ICTRP). We will individually search conference proceedings from the European Society of Gynaecological Oncology (ESGO), International Gynecologic Cancer Society (IGCS), American Society of Clinical Oncology (ASCO) and Society of Gynecologic Oncology (SGO), supplemented by searches of the ZETOC and Conference Proceedings Citation Index (Web of Knowledge).

We will handsearch the citation lists of reviews and included studies.

We will supplement the searches for biomarkers by searching the FDA (http://www.fda.org.uk) and European Medicines Agency websites (http://www.ema.europa.eu/ema/) using strategies used by the UK National Horizon Scanning Centre. Inclusion will be limited to FDA approved biomarkers but we will map non‐FDA approved biomarkers as a resource for review updates.

Data collection and analysis

We will combine the results of searches in an EndNote database and remove the duplicates. We will carry out study selection and quality assessment in duplicate and independently (NR and RC) with disagreements resolved by discussion or arbitration by a third review author (CD or SS).

Selection of studies

Study authorship will not be concealed (Cochrane DTA Handbook 2013). We will review unique titles and abstracts against the predefined selection criteria to select potentially relevant studies for full‐text review. We will carry out study selection independently and in duplicate (NR, RC). We will resolve differences in opinion by discussion and resolve any persisting disagreements using a third arbiter (SS or CD). We will summarise the results of the selection process using a PRISMA flow diagram and document reasons for exclusion. The group of experts forming the management group of the ROCkeTS (Refining Ovarian Cancer Test Accuracy Scores) study will check the final list of included studies.

Data extraction and management

We will use a pre‐defined data collection form (Appendix 2). We will carry out data extraction independently and in duplicate (NR and RC). We will resolve any difference of opinion by discussion and resolve any persisting disagreements using a third arbiter (SS or CD). We will seek the following data: study design, setting, method of recruitment, number of participants, age, menopausal status (directly or using age over 50 years and history of previous hysterectomy as a proxy for postmenopausal status), prior tests, index tests and index test threshold, index test operator (for symptoms and ultrasound scan), reference standard (including duration of follow‐up) and stage of cancer. We will extract data to derive a 2 x 2 table for each study. Where index test negatives are not verified in a study we will impute prevalence estimates applicable to the study setting.
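
The protocol does not specify how the imputation for unverified index test negatives will be carried out. The sketch below shows one possible reading, offered purely as an assumption: the setting‐specific prevalence is applied to the whole study population to obtain the expected number of diseased participants, and any diseased participants not already counted as true positives are allocated to the unverified index test negatives. All names and example counts are hypothetical.

```python
from typing import Dict


def impute_two_by_two(tp: int, fp: int, n_index_negative: int,
                      assumed_prevalence: float) -> Dict[str, float]:
    """Fill in FN and TN when index test negatives were not verified.

    Assumption (not stated in the protocol): expected diseased count equals
    assumed_prevalence x total participants; the excess over TP is treated as FN.
    """
    if min(tp, fp, n_index_negative) < 0:
        raise ValueError("counts cannot be negative")
    if not 0.0 <= assumed_prevalence <= 1.0:
        raise ValueError("prevalence must lie between 0 and 1")

    n_total = tp + fp + n_index_negative
    expected_diseased = assumed_prevalence * n_total
    # Keep the imputed FN count within [0, number of index test negatives].
    fn = min(max(expected_diseased - tp, 0.0), float(n_index_negative))
    tn = n_index_negative - fn
    return {"tp": float(tp), "fp": float(fp), "fn": fn, "tn": tn}


# Hypothetical study: 100 verified test positives, 900 unverified test negatives.
print(impute_two_by_two(tp=40, fp=60, n_index_negative=900, assumed_prevalence=0.05))
```

Imputed cells are expected counts rather than observed data, so any analysis using them would need to be treated as a sensitivity analysis of the assumed prevalence.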

Assessment of methodological quality

We will undertake quality assessment independently and in duplicate (NR and RC). We will resolve any difference of opinion by discussion and resolve any persisting disagreements using a third arbiter (SS or CD). We will tailor the QUADAS‐2 checklist according to the topic area being addressed by the addition of a comparative domain and a separate domain for modelling studies drawing on the forthcoming PROBAST (prediction model risk of bias assessment) tool for diagnostic and prediction models (Wolff 2014). The tailoring is detailed in Appendix 3 and summarised below.

Patient selection

Inappropriate exclusions include specific age groups, histological sub‐types or grades, specific ovarian cancer pathologies, co‐morbidities such as endometriosis and infertility.

Applicability judgements will depend on symptom status (minimum of self assessed in generalist settings and self assessed or elicited by a healthcare professional in specialist settings).

Index test

Applicability judgements will depend on the experience of the healthcare professional eliciting symptoms and the operator of the ultrasound scan, and whether the interpretation of the ultrasound scan was informed by the presence of symptoms.

Reference standard

Follow‐up of less than 12 months for index test negatives is considered unlikely to correctly classify the target condition.

Applicability judgements will depend on whether disease positives can be disaggregated into borderline, ovarian cancer and metastatic disease.

Flow and timing domain

The interval between the application of the index test and the reference standard should be three months or less.

Disease negatives should all receive the same reference standard.

Addition of a comparative domain

For studies comparing two or more index tests, selection of participants should be the same for each test.

For studies comparing two or more index tests, the interval between index tests should be less than three months.

We will present the results of the quality assessment graphically and narratively, highlighting the most important threats to validity and applicability.

Statistical analysis and data synthesis

We will conduct preliminary exploration of diagnostic accuracy study results for each index test separately (see 1 to 4 below), by plotting estimates of sensitivity and specificity in (i) forest plots and (ii) ROC plots.
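
Forest plots of this kind display each study's point estimate with a confidence interval, and ROC plots place each study at (1 − specificity, sensitivity). The sketch below computes both from a study's 2 x 2 counts. The choice of the Wilson score interval is an assumption, as the protocol does not state which interval method will be used, and the function names are hypothetical.

```python
import math
from typing import Tuple


def wilson_interval(successes: int, total: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score confidence interval for a proportion (z = 1.96 for about 95%)."""
    if total <= 0 or not 0 <= successes <= total:
        raise ValueError("need 0 <= successes <= total and total > 0")
    p = successes / total
    denom = 1.0 + z * z / total
    centre = (p + z * z / (2.0 * total)) / denom
    half_width = z * math.sqrt(p * (1.0 - p) / total
                               + z * z / (4.0 * total * total)) / denom
    return max(0.0, centre - half_width), min(1.0, centre + half_width)


def roc_point(tp: int, fp: int, fn: int, tn: int) -> Tuple[float, float]:
    """Return (false positive rate, sensitivity) for plotting in ROC space."""
    if tp + fn == 0 or fp + tn == 0:
        raise ValueError("both diseased and non-diseased participants are required")
    return fp / (fp + tn), tp / (tp + fn)


# Hypothetical study counts for illustration only.
tp, fp, fn, tn = 81, 25, 19, 75
print("sensitivity CI:", wilson_interval(tp, tp + fn))
print("specificity CI:", wilson_interval(tn, tn + fp))
print("ROC point:", roc_point(tp, fp, fn, tn))
```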

Index test groups

We will analyse index tests separately in the following groups:

  1. Symptoms suspicious of ovarian cancer, alone or in combination

  2. Biomarkers alone or in combination (CA125, CEA, HE4, OVA1, LDH, HCG, AFP)

  3. Ultrasound characteristics, alone or in combination

  4. Combinations of tests across categories 1 to 3 (either as rules or multivariable models)

Within each index test group, we will consider a meta‐analysis where studies use the same test or same combination of tests, studies have compatible study designs and where heterogeneity (as assessed by visual inspection and clinical expertise) is considered reasonable.

Index test subgroups

For each index test or test combination, if there are sufficient studies, we will consider the following subgroups for separate meta‐analyses.

Patient characteristics

  • Generalist setting (primary care, community care, family practice) versus specialist setting (secondary care, cancer unit, cancer centre).

  • Pre‐ versus postmenopausal status, as explicitly stated or, if menopausal status is not reported, using age (< 50 years versus > 50 years) or hysterectomy (yes/no) as surrogates.

Index test characteristics

  • Test threshold where studies report a common threshold.

  • For ultrasound studies, if the same variables, the same rules or the same combination of variables are used in more than four studies, we will consider them for meta‐analysis. In addition, we will do an overall meta‐analysis across the 'U', 'B' and 'M' scores, including studies of other scores based on similar ultrasound parameters.

Target condition

  • Histological subtype and stage: epithelial versus non‐epithelial; high‐grade serous (type II) versus other epithelial (type I); early‐stage (stage I/II) versus late‐stage disease (stage III/IV).

Study quality

  • Case‐control versus other study designs.

Methods for meta‐analysis

We will explore diagnostic accuracy by creating forest plots of study‐specific estimates of sensitivity and specificity, and by plotting these estimates in ROC space. Where adequate data are available and it is considered reasonable to pool results, we will perform meta‐analyses using hierarchical models. Since the characteristics measured by index tests may yield binary, ordinal or quantitative test results, the choice of model, bivariate (Chu 2006; Reitsma 2005) or HSROC (Rutter 2001), will depend on whether studies report common thresholds or thresholds vary across studies. To estimate average sensitivity and specificity, we will perform the analysis of each test by first restricting to studies that report a common threshold. To estimate a summary ROC curve without restricting to a common threshold, we will randomly select data at one threshold from each study. We will perform all analyses using the NLMIXED procedure in Statistical Analysis System (SAS 2009) and the xtmelogit command in Stata version 14 (StataCorp LP 2015).
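As an illustration only, the study‐specific estimates that feed the forest plots and ROC plots can be derived from each 2 x 2 table as in the Python sketch below. The function names and the example counts are hypothetical, and the Wilson score interval is an assumption made for the sketch; the protocol does not specify a confidence interval method.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (assumed CI method)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return centre - half, centre + half


def accuracy_from_2x2(tp, fp, fn, tn):
    """Sensitivity, specificity and the ROC-space point for one study."""
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("study needs at least one diseased and one non-diseased participant")
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return {
        "sensitivity": sens,
        "sens_ci": wilson_interval(tp, tp + fn),
        "specificity": spec,
        "spec_ci": wilson_interval(tn, tn + fp),
        # ROC space plots sensitivity against 1 - specificity
        "roc_point": (1 - spec, sens),
    }


# Hypothetical counts, for illustration only
print(accuracy_from_2x2(tp=45, fp=30, fn=5, tn=120))
```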

For studies testing multivariable models, we will include both validation and development models with a separate subgroup meta‐analysis for validation models only (higher level of evidence). We will consider meta‐analysis of multivariable models where exactly the same model is used in terms of both variables and variable coefficients, the model estimates relate to similar patient populations and there are sufficient studies for meta‐analysis.

We will consider random‐effects univariate analyses (which ignore any correlation between sensitivity and specificity) where pooling is considered an appropriate approach but where hierarchical models fail to converge.
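A minimal sketch of one such univariate random‐effects analysis is given below. It pools a single proportion (for example, sensitivity) on the logit scale. The DerSimonian‐Laird estimator of between‐study variance and the 0.5 continuity correction are assumptions made for illustration; the protocol does not name an estimator, and the function name and counts are hypothetical.

```python
import math


def pool_logit_random_effects(events, totals):
    """Pool proportions on the logit scale with a DerSimonian-Laird tau^2."""
    y, v = [], []
    for e, n in zip(events, totals):
        non = n - e
        if e == 0 or non == 0:
            # continuity correction for studies with a zero cell (assumption)
            e, non = e + 0.5, non + 0.5
        y.append(math.log(e / non))   # logit of the study proportion
        v.append(1 / e + 1 / non)     # approximate within-study variance

    w = [1 / vi for vi in v]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, y)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, y))
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (len(y) - 1)) / c) if c > 0 else 0.0

    w_re = [1 / (vi + tau2) for vi in v]
    mu = sum(wi * yi for wi, yi in zip(w_re, y)) / sum(w_re)
    se = math.sqrt(1 / sum(w_re))

    def expit(x):
        return 1 / (1 + math.exp(-x))

    return {
        "pooled": expit(mu),
        "ci": (expit(mu - 1.96 * se), expit(mu + 1.96 * se)),
        "tau2": tau2,
    }


# Hypothetical true-positive counts and numbers of diseased women per study
print(pool_logit_random_effects(events=[45, 30, 20], totals=[50, 50, 50]))
```

Sensitivity and specificity would each be pooled in a separate call, which is what ignoring their correlation means in practice.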

Where meta‐analysis is not considered appropriate due to clinical or methodological heterogeneity we will use a narrative synthesis.

We will translate summary estimates of sensitivity and specificity into the summary estimates of the probability of disease for test‐positive patients (PPV) and test‐negative patients (1‐NPV) using a prevalence of ovarian cancer of 0.23% in women presenting to primary care with symptoms (Myers 2006). In sensitivity analyses we will explore other values of prevalence reflecting secondary care.
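The translation follows from Bayes' theorem. Writing Se and Sp for the summary sensitivity and specificity and p for the prevalence, the two probabilities can be expressed as below. The values of Se and Sp in the worked line are illustrative assumptions, not results; only the prevalence of 0.23% comes from the protocol.

```latex
\mathrm{PPV} = \frac{Se \cdot p}{Se \cdot p + (1 - Sp)(1 - p)},
\qquad
1 - \mathrm{NPV} = \frac{(1 - Se) \cdot p}{(1 - Se) \cdot p + Sp\,(1 - p)}

% Illustrative values: Se = 0.80, Sp = 0.90, p = 0.0023
\mathrm{PPV} = \frac{0.80 \times 0.0023}{0.80 \times 0.0023 + 0.10 \times 0.9977}
             = \frac{0.00184}{0.10161} \approx 0.018,
\qquad
1 - \mathrm{NPV} = \frac{0.20 \times 0.0023}{0.20 \times 0.0023 + 0.90 \times 0.9977}
                 = \frac{0.00046}{0.89839} \approx 0.0005
```

At a prevalence this low, even a reasonably accurate test gives a small post‐test probability of disease after a positive result, which is why prevalence values reflecting secondary care are explored separately.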

Methods for test thresholds

For studies where the 2 x 2 table is generated by using test thresholds, the choice of model, bivariate (Chu 2006; Reitsma 2005) or HSROC (Rutter 2001), will depend on whether studies report common thresholds or thresholds vary across studies.

For an index test with multiple possible thresholds, we will normally extract results at up to three thresholds within an individual study. We will prioritise extraction of results in the following order: (i) pre‐specified thresholds, (ii) thresholds commonly used in clinical guidelines, (iii) thresholds commonly used in the published literature and (iv) thresholds reported as main outcomes in the studies. In exceptional instances we may extract data at up to five thresholds for an individual index test in a single study.
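The prioritisation rule can be sketched as follows, assuming each reported threshold has already been labelled by a reviewer with the basis on which it was chosen. The labels, the data structure and the example values are hypothetical and are not part of the protocol.

```python
# Lower number = higher extraction priority, following the order (i) to (iv)
PRIORITY = {
    "prespecified": 0,   # (i) pre-specified in the study
    "guideline": 1,      # (ii) commonly used in clinical guidelines
    "literature": 2,     # (iii) commonly used in the published literature
    "main_outcome": 3,   # (iv) reported as a main outcome
}


def select_thresholds(reported, limit=3):
    """Return up to `limit` thresholds, highest priority first.

    `sorted` is stable, so thresholds with the same basis keep the order
    in which they were reported.
    """
    ranked = sorted(reported, key=lambda t: PRIORITY[t["basis"]])
    return ranked[:limit]


# Hypothetical thresholds for one index test in one study
reported = [
    {"value": 65, "basis": "main_outcome"},
    {"value": 35, "basis": "guideline"},
    {"value": 100, "basis": "literature"},
    {"value": 50, "basis": "prespecified"},
]
print([t["value"] for t in select_thresholds(reported)])
```

Passing `limit=5` would correspond to the exceptional case described above.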

We will exclude studies where it is not possible to identify an appropriate threshold at which a 2 x 2 table of study results can be reported according to our categories of index test (1 to 4 above).

Exploratory analyses will include forest plots of study estimates of sensitivity and specificity grouped by test threshold, and plotting sensitivity and specificity in ROC space with test thresholds indicated.

To estimate average sensitivity and specificity, we will perform the analysis of each test by first restricting to studies that report common threshold(s) or thresholds viewed as clinically important in the published literature.

To estimate a summary ROC curve without restricting to a common threshold, we will use methods of meta‐analysis using multiple thresholds across studies (Riley 2015), or if we are unable to use these methods we will randomly select a single threshold for each study.
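For the fallback option, random selection of a single threshold per study can be made reproducible by fixing the seed of the random number generator, as in the sketch below. The seed value, study labels and thresholds are hypothetical; the multiple‐threshold methods of Riley 2015 are not implemented here.

```python
import random


def pick_one_threshold_per_study(thresholds_by_study, seed=2015):
    """Randomly choose one reported threshold for each study.

    A dedicated, seeded generator is used so that re-running the selection
    gives the same choice; studies and thresholds are sorted first so the
    result does not depend on input order.
    """
    rng = random.Random(seed)
    return {
        study: rng.choice(sorted(values))
        for study, values in sorted(thresholds_by_study.items())
    }


# Hypothetical studies and thresholds
reported = {
    "Study A": [35, 65, 100],
    "Study B": [35],
    "Study C": [30, 35],
}
print(pick_one_threshold_per_study(reported))
```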

Comparison of test accuracy

We will compare test accuracy between different tests first by restricting to studies that make head‐to‐head (direct) comparisons between tests within the same population, as this provides the most reliable evidence (Takwoingi 2013). Secondly, we will also compare tests by including all relevant studies (indirect comparison), particularly where there are few studies comparing tests within the same population. We will compare test accuracy by adding a covariate for test type to the bivariate or HSROC model, and we will use likelihood ratio tests to assess the statistical significance of differences between tests.
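A likelihood ratio test compares the maximised log‐likelihood of the model with the test‐type covariate against that of the model without it. The sketch below assumes the two log‐likelihoods have already been obtained from the fitted models; the values shown are hypothetical, and the degrees of freedom equal the number of parameters added (for example, 2 if the covariate acts on both logit sensitivity and logit specificity). Only the closed‐form cases of the chi‐squared tail probability are implemented.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-squared variable (df = 1 or even df)."""
    if x < 0:
        raise ValueError("x must be non-negative")
    if df == 1:
        return math.erfc(math.sqrt(x / 2))
    if df > 0 and df % 2 == 0:
        half = x / 2
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    raise NotImplementedError("only df = 1 or even df are supported in this sketch")


def likelihood_ratio_test(loglik_reduced, loglik_full, df):
    """Return the LR statistic and its p-value for nested models."""
    # Guard against tiny negative values caused by numerical optimisation
    stat = max(0.0, 2 * (loglik_full - loglik_reduced))
    return stat, chi2_sf(stat, df)


# Hypothetical log-likelihoods from models without and with the covariate
stat, p = likelihood_ratio_test(loglik_reduced=-250.0, loglik_full=-246.0, df=2)
print(f"LR statistic = {stat:.2f}, p = {p:.4f}")
```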

Investigations of heterogeneity

For each index test group, we will explore the effect of the relevant factors specified in the secondary objectives by visual inspection of forest plots and summary ROC plots. For further investigations of heterogeneity, if there are adequate data, we will add factors as single covariates to the bivariate or HSROC model. We will separately add the following covariates to the bivariate model to assess their association with test performance:

Index test

  • Experience of test operator (ultrasound and symptom elicitation by health professionals) applicable to setting, as a categorical covariate (yes versus no)

  • Threshold (if a continuous covariate)

Reference standard

  • Low risk of bias in the reference standard domain as a categorical variable: high or unclear risk versus low risk

Sensitivity analyses

We will consider sensitivity analyses if there are sufficient studies to investigate the impact on the summary estimates of (i) including only studies with low concern about applicability in the patient selection domain of QUADAS‐2, (ii) leaving out potentially highly influential studies and (iii) classification of borderline tumours as malignant or benign.

We will calculate PPV and NPV for a range of values for the prevalence of ovarian cancer reflecting both primary care and secondary care using the best available estimates from the published literature and hospital audits.
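The calculation across prevalence values can be sketched as below. The sensitivity and specificity are placeholder values, not results. The prevalence of 0.23% is the primary care figure cited above (Myers 2006); 5.7%, 20% and 57.5% are the figures quoted in the Background for women undergoing surgery for ovarian pathology and are used here only as illustrative higher‐prevalence settings.

```python
def predictive_values(sens, spec, prev):
    """Return (PPV, NPV) for a given sensitivity, specificity and prevalence."""
    for name, value in (("sens", sens), ("spec", spec), ("prev", prev)):
        if not 0 <= value <= 1:
            raise ValueError(f"{name} must lie between 0 and 1")
    tp = sens * prev                # true positives per woman tested
    fp = (1 - spec) * (1 - prev)    # false positives
    fn = (1 - sens) * prev          # false negatives
    tn = spec * (1 - prev)          # true negatives
    ppv = tp / (tp + fp) if tp + fp > 0 else float("nan")
    npv = tn / (tn + fn) if tn + fn > 0 else float("nan")
    return ppv, npv


# Placeholder accuracy values, for illustration only
SENS, SPEC = 0.80, 0.90

for prev in (0.0023, 0.057, 0.20, 0.575):
    ppv, npv = predictive_values(SENS, SPEC, prev)
    print(f"prevalence {prev:.2%}: PPV {ppv:.3f}, NPV {npv:.4f}")
```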

Assessment of reporting bias

We will not undertake any formal assessment of reporting bias in our review due to current uncertainty about how to assess reporting bias in diagnostic test accuracy reviews, especially in the presence of heterogeneity (Deeks 2005).