Tests to assist in the staging of cutaneous melanoma: a generic protocol

Abstract

This is a protocol for a Cochrane Review (Diagnostic test accuracy). The objectives are as follows:

To determine the diagnostic accuracy of SLNB for the detection of nodal metastases (in the investigated nodal basin) for the staging of cutaneous invasive melanoma.

To determine the diagnostic accuracy of imaging tests for the detection of any metastasis in the primary staging of cutaneous invasive melanoma (i.e. staging at presentation).

To determine the diagnostic accuracy of imaging tests for the detection of any metastasis in the staging of recurrence in cutaneous invasive melanoma (i.e. re-staging prompted by findings on routine follow-up).

To determine the diagnostic accuracy of imaging tests for the detection of nodal metastases in the staging of cutaneous invasive melanoma.

To determine the diagnostic accuracy of imaging tests for the detection of distant metastases in the staging of cutaneous invasive melanoma.

These will be estimated separately for those undergoing primary staging and those who have experienced a disease recurrence.

Investigation of sources of heterogeneity

We will consider a range of potential sources of heterogeneity for investigation in each individual test review. These may vary between reviews but may include the following.

i. Population characteristics

  • AJCC stage of disease

  • Sentinel lymph node status (for imaging studies only)

  • Clinical nodal status (for imaging studies only)

  • Primary tumour site (head and neck, trunk, limb, and other)

ii. Index test characteristics

  • Differences in test positivity thresholds (e.g. for SLNB, the tracer threshold for a 'hot' versus 'cold' node)

  • Other relevant test characteristics as appropriate to the test under consideration

iii. Reference standard characteristics

  • Reference standard used (histology, clinical or imaging-based follow-up; concurrent imaging-based reference standard)

iv. Study quality

  • Consecutive or random sample of participants recruited

  • Index test interpreted, blinded to the reference standard result

  • Index test interpreted, blinded to the result of any other index test

  • Presence of partial or differential verification bias (whereby only a sample of those subject to the index test are verified by the reference test or by the same reference test, with selection dependent on the index test result)

  • Use of an adequate reference standard

  • Overall risk of bias

We will examine the quality and quantity of research evidence available on the effectiveness of each index test for the primary target condition and make recommendations regarding where further research might be required.

Background

Cochrane Skin (CSG, Nottingham) in collaboration with the Test Evaluation Research Group in the Institute of Applied Health Research (TERG, Birmingham) are undertaking a series of Cochrane Diagnostic Test Accuracy (DTA) Reviews on the diagnosis and staging of melanoma and keratinocyte skin cancers, as part of the National Institute for Health Research (NIHR) Cochrane Systematic Reviews Programme. Appendix 1 shows the current content and structure of the programme.

As several reviews for each topic area will follow similar methodology, we have prepared generic protocols in order to avoid duplication of effort.

This protocol concerns the evaluation of tests for the staging of cutaneous melanoma (i.e. to determine the extent of disease in those with an already confirmed diagnosis of melanoma), including sentinel lymph node biopsy (SLNB) for detection of nodal metastases and imaging tests for the detection of any metastatic disease. Separate Cochrane protocols are available for the staging of cutaneous squamous cell carcinoma (cSCC) (Dinnes 2017), for the diagnosis of melanoma (Dinnes 2015a), and for the diagnosis of keratinocyte skin cancer (Dinnes 2015b). The Background and Methods sections of this protocol use some text that was originally published in the protocol concerning the evaluation of tests for the diagnosis of melanoma (Dinnes 2015a).

Table 1 provides a glossary of terms used.

Table 1. Glossary of terms
  1. Some of the definitions above have been obtained from the NICE Guideline for the management of melanoma (NICE 2015a).

Term | Definition
Adjuvant therapy or treatment | A treatment given after the main treatment for cancer to reduce the risk of recurrence.
Adverse event | Detrimental change in health occurring in a person receiving the treatment whether or not it has been caused by the treatment.
Axillary | In the armpit.
Biopsy | Removal of a sample of tissue from the body to assist in diagnosis or inform the choice of treatment of a disease.
BRAF V600 mutation | BRAF is a human gene that makes a protein called B-Raf which is involved in the control of cell growth. BRAF mutations (damaged DNA) occur in around 40% of melanomas, which can then be treated with particular drugs.
BRAF inhibitors | Therapeutic agents which inhibit the serine-threonine protein kinase BRAF in BRAF-mutated metastatic melanoma.
Breslow thickness | A scale for measuring the thickness of melanomas by the pathologist using a microscope, measured in mm from the top layer of skin to the bottom of the tumour.
Cervical (lymph nodes) | Lymph nodes found in the neck area of the body.
Computed tomography (CT) | Imaging technique in which the person lies on a table within an x-ray gantry. The images are acquired using a spiral (helical) path and banks of detectors, allowing presentation of the internal organs and blood vessels in different projections including 3-D views.
Coronal | Frontal plane dividing the body into front and back.
False negative | An individual who is truly positive for a disease, but whom a diagnostic test classifies as disease-free.
False positive | An individual who is truly disease-free, but whom a diagnostic test classifies as having the disease.
Histopathology | The study of tissue, usually obtained by biopsy or excision, for example under a microscope.
Incidence | The number of new cases of a disease in a given time period.
Inguinal | Lymph nodes in or just above or just below the groin.
Isolated limb perfusion | A medical procedure that directly delivers a drug through the bloodstream in a limb to the site affected by melanoma.
Local recurrence | Regrowth of a tumour in the area from which it was originally removed.
Locoregional recurrence | Regrowth of a tumour in the area from which it was originally removed or in the regional lymph nodes (usually nearest to the original tumour site).
Lymph node | Lymph nodes filter the lymphatic fluid (clear fluid containing white blood cells) that travels around the body to help fight disease; they are located throughout the body, often in clusters (nodal basins).
Lymph node dissection | Surgical removal of one or more lymph nodes in the absence of proven involvement with melanoma.
Lymphadenectomy | Lymphadenectomy or lymph node dissection is a surgical operation to remove one or more groups of lymph nodes.
Lymphoscintigraphy | An imaging technique used to identify the lymph drainage basin, determine the number of sentinel nodes, differentiate sentinel nodes from subsequent nodes, locate the sentinel node in an unexpected location, and mark the sentinel node over the skin for biopsy. It requires the injection of a radioisotope into the skin around the biopsy scar and a scan some hours later to determine to which lymph nodes the tracer has travelled.
Lymphovascular invasion | Tumour cells which have spread to involve the blood vessels and lymphatic vessels within the skin.
Magnetic resonance imaging (MRI) | A type of scan which uses a magnetic field and radio waves to produce images of sections of the body.
Mediastinal and hilar adenopathy | Enlargement of the pulmonary lymph nodes.
MEK inhibitors | Drugs that inhibit the mitogen-activated protein kinase enzymes which are often upregulated in melanoma.
Meta-analysis | A form of statistical analysis used to synthesise results from a collection of individual studies.
Metastases/metastatic disease | Spread of cancer away from the primary site to somewhere else through the bloodstream or the lymphatic system.
Micrometastases | Micrometastases are metastases so small that they can only be seen under a microscope.
Mitotic rate | Microscopic evaluation of number of cells actively dividing in a tumour.
Morbidity | Detrimental effects on health.
Mortality | Either (1) the condition of being subject to death; or (2) the death rate, which reflects the number of deaths per unit of population in relation to any specific region, age group, disease, treatment or other classification, usually expressed as deaths per 100, 1000, 10,000 or 100,000 people.
Multidisciplinary team | A team with members from different healthcare professions and specialties (e.g. urology, oncology, pathology, radiology, and nursing). Cancer care in the National Health Service (NHS) uses this system to ensure that all relevant health professionals are engaged to discuss the best possible care for that patient.
Nodal basin | Cluster of lymph nodes which filter lymphatic fluid as it travels around the body; clusters are located under the arm (axilla), in the groin, neck, chest and abdomen.
Oncology | The study of cancers. This term also refers to the medical specialty of cancer care, with particular reference to the use of radiotherapy or drugs to treat cancer. The medical specialty is often split into clinical oncology (doctors who use radiotherapy and drug treatment) and medical oncology (doctors who use drug treatment).
Palpation | Feeling with the fingers or hands as part of a clinical examination of the body.
Positron emission tomography (PET) | A nuclear medicine imaging technique whereby a radioactive glucose analogue (usually 18FDG) is administered intravenously before a scan is conducted to create an image using colours to show where the FDG (or other radioactive tracer) has been taken up in the body.
Prevalence | The proportion of a population found to have a condition.
Prognostic factors/indicators | Specific characteristics of a cancer or the person who has it which might affect the patient’s prognosis.
Radiotherapy | The use of radiation, usually high-energy x-rays, to control the growth of cancer cells.
RAS-RAF-MEK-ERK signalling pathway | A chain of proteins which allow signals from a receptor on the surface of a cell to be sent to the DNA in the cell nucleus; a mutation in one of the proteins in the pathway is associated with the development of many cancers.
Recurrence | Recurrence is when new cancer cells are detected following treatment. This can occur either at the site of the original tumour or at other sites in the body.
Relapse | Where cancer starts to grow again after treatment.
Sagittal | Median plane dividing the body into left and right.
Sensitivity | In this context the term is used to mean the proportion of individuals with a disease who have that disease correctly identified by the study test.
Sentinel lymph node biopsy (SLNB) | A radioactive tracer and blue dye are injected into the skin surrounding the primary lesion and the 'sentinel' lymph nodes to which the tracer drains are located by imaging (usually lymphoscintigraphy) and then removed and examined for nodal metastatic spread that cannot be detected clinically or on imaging.
Signal transduction | Occurs when extracellular signalling molecules activate a specific receptor which then triggers cellular pathways.
Staging | Clinical description of the size and spread of a patient’s tumour, fitting into internationally agreed categories.
Stereotactic radiotherapy | A technique for delivering high dose radiotherapy very accurately to small areas inside the body which reduces the damage done by the radiotherapy to adjacent healthy tissues.
Subclinical (disease) | Disease that is usually asymptomatic and not easily observable, e.g. by clinical or physical examination.
Systemic treatment | Treatment, usually given by mouth or by injection, that reaches and affects cancer cells throughout the body rather than targeting one specific area.
Ultrasound | A type of scan in which high-frequency sound waves are used to outline a part of the body.
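
The accuracy terms in the glossary (false negative, false positive, sensitivity, prevalence) all derive from the same 2x2 cross-classification of index test result against reference standard. The following is a minimal sketch of that arithmetic, not part of the protocol's analysis plan; the class name `TwoByTwo` and the counts are hypothetical and chosen only for illustration. Specificity is included because it is reported alongside sensitivity later in this protocol.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Counts from cross-classifying an index test against a reference standard."""

    tp: int  # true positives: diseased, test positive
    fp: int  # false positives: disease-free, test positive
    fn: int  # false negatives: diseased, test negative
    tn: int  # true negatives: disease-free, test negative

    def sensitivity(self) -> float:
        # Proportion of diseased individuals correctly identified by the test.
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        # Proportion of disease-free individuals correctly identified by the test.
        return self.tn / (self.tn + self.fp)

    def prevalence(self) -> float:
        # Proportion of the whole study sample that has the condition.
        return (self.tp + self.fn) / (self.tp + self.fp + self.fn + self.tn)


# Hypothetical counts, for illustration only (not data from any study).
example = TwoByTwo(tp=30, fp=5, fn=20, tn=145)
print(f"sensitivity = {example.sensitivity():.2f}")
print(f"specificity = {example.specificity():.2f}")
print(f"prevalence  = {example.prevalence():.2f}")
```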

Target condition being diagnosed

Melanoma skin cancer arises from uncontrolled proliferation of melanocytes, the epidermal cells that produce pigment or melanin. It can occur in any organ that contains melanocytes, including the eye and internal organs, but most commonly arises in the skin (McLaughlin 2005). Cutaneous melanoma refers to any lesion with malignant melanocytes present in the deeper layers of the skin, the dermis, and includes superficial spreading, nodular, acral lentiginous, and lentigo maligna melanoma variants (SIGN 2003). Melanoma in situ refers to malignant melanocytes that are contained within the epidermis and have not yet spread into the dermis, but are at risk of progression to invasive melanoma if left untreated (Lens 2004). Melanoma is one of the most dangerous forms of skin cancer, with the potential to metastasise (or spread) to other parts of the body; it accounts for only a small percentage of skin cancer cases, but is responsible for up to 75% of skin cancer deaths (Boring 1994).

The overall worldwide incidence of skin cancer is difficult to estimate, as there is often no requirement for the more common forms of skin cancers to be reported to cancer registries (Lomas 2012). In 2003, the World Health Organization estimated that 132,000 melanoma skin cancers occur globally each year compared to between two and three million non-melanoma skin cancers (primarily basal cell carcinoma (BCC) and cutaneous squamous cell carcinoma (cSCC)) (WHO 2003). Estimates of the incidence of melanoma have since increased to over 200,000 newly diagnosed cases worldwide (2012 data) (Erdmann 2013; Ferlay 2015), with an estimated 55,000 deaths (Ferlay 2015). In the UK, melanoma has one of the fastest rising incidence rates of any cancer, and has the biggest projected increase in incidence between 2007 and 2030 (Mistry 2011). In the decade leading up to 2013, age-standardised incidence increased by 46%, with 14,500 new cases in 2013 and 2459 deaths in 2014 (Cancer Research UK 2017).

The rising incidence in melanoma is thought to be primarily related to rising recreational sun exposure and tanning bed use, and an increasingly ageing population with higher lifetime recreational ultraviolet (UV) exposure (Boniol 2012; Gandini 2005), in conjunction with possible earlier detection (Linos 2009). Risk factors can be broadly divided into host or environmental factors. Host factors include pale skin and light hair or eye colour (Evans 1988); older age (Geller 2002); male sex (Geller 2002); previous skin cancer (Tucker 1985); predisposing skin lesions, e.g. high melanocytic naevus counts (Gandini 2005), clinically atypical naevi (Gandini 2005), or large congenital naevi (Swerdlow 1995); genetically inherited skin disorders e.g. xeroderma pigmentosum (Lehmann 2011); and a family history of melanoma (Gandini 2005). Environmental factors include recreational, occupational, and work-related exposure to sunlight (both cumulative and episodic burning) (Gandini 2005); artificial tanning (Boniol 2012); and immunosuppression, e.g. as seen in organ transplant recipients or human immunodeficiency virus (HIV)-positive individuals (DePry 2011). Lower socioeconomic class may be associated with delayed presentation and thus more advanced disease at diagnosis (Reyes-Ortiz 2006).

The main prognostic indicators following diagnosis of cutaneous melanoma can be divided into histological and clinical factors. Histologically, Breslow thickness is the single most important predictor of survival, as it is a quantitative measure of tumour invasion or volume, and thus propensity to metastasise (Balch 2001). Other factors associated with poorer prognosis histologically include microscopic ulceration, mitotic rate, microscopic satellites, regression, lymphovascular invasion, and nodular (rapidly growing) or amelanotic (lacking in melanin pigment) subtypes (Moreau 2013; Shaikh 2012). Independent of tumour thickness, prognosis is worse in older people, males, and those with locally recurrent lesions, regional lymph node involvement, or primary lesion location on the scalp or neck (Zemelman 2014).

Following histological confirmation of diagnosis, the lesion is staged according to the American Joint Committee on Cancer (AJCC) Staging System to inform treatment strategy (Table 2 (a)) (Balch 2009). Stage 0 refers to melanoma in situ; stages I to II localised melanoma; stage III regional metastasis (spread to the lymph nodes, usually but not always, those nearest to the primary tumour); and stage IV distant metastasis. A preliminary stage is assigned based on histological evaluation (thickness of primary lesion, presence of ulceration or mitoses) and clinical (and sometimes radiological) assessment of the regional lymph nodes (Balch 2009) (Table 2 (b)). A pathological stage is then confirmed based on histology of the primary lesion and of the regional lymph nodes (if the patient has either sentinel lymph node biopsy (SLNB) or complete lymph node dissection), and with imaging to confirm the presence or absence of disseminated disease, where indicated.

Table 2. American Joint Committee on Cancer (AJCC) staging for cutaneous melanoma
  1. LDH - lactate dehydrogenase; M - metastasis; N - nodes; NA - not applicable; T - tumour; Tis - melanoma in situ.

    *Micrometastases are diagnosed after sentinel lymph node biopsy.

    Macrometastases are defined as clinically detectable nodal metastases confirmed pathologically.

    ‡Clinical staging is based on histology of the primary lesion and clinical (or radiological) examination.

    δPathological staging is assigned based on histology of the primary lesion and of the regional lymph nodes (either sentinel lymph node biopsy (SLNB) or complete lymph node dissection (CLND)), where indicated.

a. TNM staging categories for cutaneous melanoma

T classification | Thickness (mm) | Ulceration status/mitoses
Tis | NA | NA
T1 | ≤ 1.00 | a: Without ulceration and mitosis < 1/mm²
 | | b: With ulceration or mitoses ≥ 1/mm²
T2 | 1.01 to 2.00 | a: Without ulceration
 | | b: With ulceration
T3 | 2.01 to 4.00 | a: Without ulceration
 | | b: With ulceration
T4 | > 4.00 | a: Without ulceration
 | | b: With ulceration

N classification | Number of metastatic nodes | Nodal metastatic burden
N0 | 0 | NA
N1 | 1 | a: Micrometastasis*
 | | b: Macrometastasis
N2 | 2 to 3 | a: Micrometastasis*
 | | b: Macrometastasis
 | | c: In transit metastases/satellites without metastatic nodes
N3 | ≥ 4 metastatic nodes, or matted nodes, or in transit metastases/satellites with metastatic nodes |

M classification | Site | Serum LDH
M0 | No distant metastases | NA
M1a | Distant skin, subcutaneous, or nodal metastases | Normal
M1b | Lung metastases | Normal
M1c | All other visceral metastases | Normal
 | Any distant metastasis | Elevated

b. Anatomical stage groupings

Clinical stage‡ | T | N | M | Pathological stage δ | T | N | M
0 | Tis | N0 | M0 | 0 | Tis | N0 | M0
IA | T1a | N0 | M0 | IA | T1a | N0 | M0
IB | T1b | N0 | M0 | IB | T1b | N0 | M0
 | T2a | N0 | M0 | | T2a | N0 | M0
IIA | T2b | N0 | M0 | IIA | T2b | N0 | M0
 | T3a | N0 | M0 | | T3a | N0 | M0
IIB | T3b | N0 | M0 | IIB | T3b | N0 | M0
 | T4a | N0 | M0 | | T4a | N0 | M0
IIC | T4b | N0 | M0 | IIC | T4b | N0 | M0
III | Any T | N > N0 | M0 | IIIA | T1-T4a | N1a | M0
 | | | | | T1-T4a | N2a | M0
 | | | | IIIB | T1-T4b | N1a | M0
 | | | | | T1-T4b | N2a | M0
 | | | | | T1-T4a | N1b | M0
 | | | | | T1-T4a | N2b | M0
 | | | | | T1-T4a | N2c | M0
 | | | | IIIC | T1-T4b | N1b | M0
 | | | | | T1-T4b | N2b | M0
 | | | | | T1-T4b | N2c | M0
 | | | | | Any T | N3 | M0
IV | Any T | Any N | M1 | IV | Any T | Any N | M1
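
The groupings in Table 2 amount to a lookup from T, N, and M categories to an anatomical stage. The sketch below encodes the pathological stage column as a small Python function, purely to make the logic of the table explicit; it is not a clinical tool. The function and constant names are hypothetical. Two assumptions are made: the T1a/T1b split uses a mitotic rate threshold of at least 1/mm², and thickness is recorded to two decimal places so that no value falls between the bands listed in Table 2 (a).

```python
# Pathological stage for node-negative, non-metastatic disease (Table 2 (b)).
NODE_NEGATIVE_STAGE = {
    "Tis": "0",
    "T1a": "IA",
    "T1b": "IB", "T2a": "IB",
    "T2b": "IIA", "T3a": "IIA",
    "T3b": "IIB", "T4a": "IIB",
    "T4b": "IIC",
}
MICROMETASTATIC = {"N1a", "N2a"}
MACRO_OR_IN_TRANSIT = {"N1b", "N2b", "N2c"}


def t_category(thickness_mm: float, ulcerated: bool, mitoses_per_mm2: float = 0.0) -> str:
    """T category for an invasive primary, following Table 2 (a)."""
    if thickness_mm <= 0:
        raise ValueError("thickness must be positive for invasive melanoma")
    if thickness_mm <= 1.00:
        # Assumption: T1b requires ulceration or a mitotic rate of at least 1/mm2.
        return "T1b" if ulcerated or mitoses_per_mm2 >= 1 else "T1a"
    if thickness_mm <= 2.00:
        base = "T2"
    elif thickness_mm <= 4.00:
        base = "T3"
    else:
        base = "T4"
    return base + ("b" if ulcerated else "a")


def pathological_stage(t: str, n: str, m: str) -> str:
    """Pathological stage group for a T/N/M combination listed in Table 2 (b)."""
    if t not in NODE_NEGATIVE_STAGE:
        raise ValueError(f"unknown T category: {t}")
    if m.startswith("M1"):
        return "IV"  # any T, any N, M1
    if m != "M0":
        raise ValueError(f"unknown M category: {m}")
    if n == "N0":
        return NODE_NEGATIVE_STAGE[t]
    if n == "N3":
        return "IIIC"  # any T, N3
    if t == "Tis" or n not in MICROMETASTATIC | MACRO_OR_IN_TRANSIT:
        raise ValueError(f"combination not listed in Table 2 (b): {t} {n} {m}")
    ulcerated = t.endswith("b")
    if n in MICROMETASTATIC:
        return "IIIB" if ulcerated else "IIIA"
    return "IIIC" if ulcerated else "IIIB"


# Hypothetical case: 3.0 mm ulcerated primary with one micrometastatic node.
print(pathological_stage(t_category(3.0, ulcerated=True), "N1a", "M0"))
```

Raising an error for combinations the table does not list (for example Tis with N1a) is a design choice made here to avoid guessing a stage the source does not define.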

Survival following diagnosis of melanoma is closely linked to stage of disease. Data from the USA published in 2009 indicated a five-year survival of 91% to 95% for stage I melanoma (Breslow thickness ≤ 1 mm), dropping to 27% to 69% in stage III disease (Balch 2009). Melanoma that has disseminated to distant sites and visceral organs has been incurable; prior to the advent of targeted and immunotherapies, median survival was six to nine months, with a one-year survival rate of 25%, and three-year survival of 15% (Balch 2009; Korn 2008). Between 1975 and 2010, five-year relative survival for melanoma in the USA increased from 80% to 94%, with survival for localised, regional, and distant disease estimated at 99%, 70%, and 18%, respectively, in 2010 (Cho 2014). Overall mortality rates, however, showed little change, at 2.1 deaths per 100,000 in 1975 and 2.7 per 100,000 in 2010 (Cho 2014). Increasing incidence in localised disease over the same period (from 5.7 to 21 per 100,000) suggests that much of the observed improvement in survival may be due to earlier detection and heightened vigilance (Cho 2014). Targeted therapies for stage IV melanoma (e.g. BRAF inhibitors) have improved survival expectation, and immunotherapies are evolving, such that very long-term survival is being documented (see 'Treatment of melanoma').

Treatment of melanoma

The treatment of melanoma varies to some extent according to the stage of disease at diagnosis. For primary melanoma, the mainstay of treatment is complete lesion excision, with a safety margin some distance from the borders of the primary tumour in order to remove both the tumour and any malignant cells that might have spread into the surrounding skin (Garbe 2016; Marsden 2010; NICE 2015a; Sladden 2009). Recommended surgical margins vary according to tumour thickness and stage of disease at presentation (Garbe 2016; NICE 2015a). There is mixed evidence for further local or regional interventions. A Cochrane Review of five randomised controlled trials (RCTs) found no evidence of an overall survival benefit from wider surgical margins (Sladden 2009); however, a more recent meta-analysis of six RCTs suggested a non-statistically significant benefit in terms of overall survival and recurrence-free survival from wider margins (hazard ratio for overall survival 1.09, 95% confidence interval (CI) 0.98 to 1.22) and a significant benefit in terms of melanoma-specific survival (hazard ratio 1.17, 95% CI 1.03 to 1.34; P = 0.02) (Wheatley 2016). A non-systematic review found no evidence that elective lymph node dissection (5 RCTs), prophylactic isolated limb perfusion (1 RCT), or sentinel node biopsy followed by complete lymph node dissection (1 RCT) had any impact on survival (Eggermont 2007). The latter conclusion has been supported by a large RCT showing no disease-free survival benefit for sentinel node biopsy when combined with selective removal of positive nodes (selective lymphadenectomy), making its therapeutic use controversial in the absence so far of effective adjuvant therapies (Kyrgidis 2015; Morton 2014). Guidelines recommend SLNB only for use as a staging, rather than a therapeutic, procedure (Garbe 2016; NICE 2015a).
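
The hazard ratios quoted above can be related to their confidence intervals and P values with a standard normal approximation on the log scale. The sketch below is an illustrative check only, not the method used by the cited meta-analysis; the function name `p_value_from_hr_ci` is hypothetical, and it assumes a symmetric 95% interval on the log hazard ratio scale (critical value 1.96).

```python
import math


def p_value_from_hr_ci(hr: float, lower: float, upper: float,
                       z_crit: float = 1.959964) -> tuple[float, float]:
    """Approximate z statistic and two-sided P value from a hazard ratio and its CI.

    Assumes the interval is symmetric on the log scale, so the standard error
    of log(HR) is the log-interval width divided by 2 * z_crit.
    """
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(hr) / se
    # Two-sided P value under the standard normal: 2 * (1 - Phi(|z|)).
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


# Values as quoted in the text for wider versus narrower margins.
z_os, p_os = p_value_from_hr_ci(1.09, 0.98, 1.22)    # overall survival
z_mss, p_mss = p_value_from_hr_ci(1.17, 1.03, 1.34)  # melanoma-specific survival
print(f"overall survival:           z = {z_os:.2f}, P = {p_os:.3f}")
print(f"melanoma-specific survival: z = {z_mss:.2f}, P = {p_mss:.3f}")
```

Under these assumptions the interval that includes 1 gives a P value above 0.05, and the melanoma-specific survival interval reproduces the reported P value to two decimal places.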

For stage III melanoma, completion lymphadenectomy (or complete lymph node dissection; the removal of all regional lymph nodes when at least one diseased node is identified on SLNB) should be considered only after the potential benefits and disadvantages of the procedure have been "critically discussed" (NICE 2015a), especially with patients with small tumour deposits (≤ 1 mm diameter), as no survival benefit has yet been shown (Garbe 2016). No adjuvant radiotherapy or adjuvant systemic treatment is recommended for routine use in stage I, II or III disease in the UK (NICE 2015a), and in many parts of Europe (Garbe 2016). Interferon-alpha has been licensed as an adjuvant treatment for stage IIB and stage III melanoma in the USA and Europe (licensed by Food and Drug Administration (FDA) and European Medicines Agency (EMA)) (Garbe 2016); however, a systematic review found evidence for its effectiveness to be stronger in terms of disease-free survival than overall survival (Mocellin 2013). The general lack of effective adjuvant therapies had been linked to the lack of effective drugs that have historically been available to treat advanced disease (Eggermont 2007), but very recently, Eggermont et al reported prolonged survival with adjuvant ipilimumab (Eggermont 2016).

For many years, dacarbazine was the only drug approved worldwide for stage IV melanoma, with fotemustine used in some European countries (Avril 2004), and interleukin (IL)-2 given in the USA (Atkins 1999). Temozolomide has also been used, especially for people with brain metastases, because of its greater ability to pass the blood-brain barrier (Lukas 2014; Zhu 2014). This landscape has changed dramatically with two distinct therapeutic approaches suggesting survival benefits in metastatic melanoma: targeting mutations in tumour cells and immunomodulation (Chapman 2011; Hamid 2013; Hodi 2010).

The discovery of driver mutations in melanoma has allowed the development of pharmacological inhibitors that target mutated signal transduction in the RAS-RAF-MEK-ERK signalling pathway, e.g. BRAF inhibitors (Chapman 2012; Villanueva 2010), and MEK inhibitors (Dummer 2014; Larkin 2014). Three such therapies are currently recommended in the UK for those with unresectable or metastatic BRAF V600 mutation-positive melanoma (around 45% of patients (Garbe 2016)): dabrafenib (NICE 2014a), vemurafenib (NICE 2012b), or trametinib in combination with dabrafenib (NICE 2016b). The limitation of these treatments is the short median length of response due to the development of multiple resistance mechanisms (Chapman 2012; Villanueva 2010); however, combinations of BRAF and MEK inhibitors show greater promise in patients with BRAF-mutant melanoma (Robert 2015), with European guidelines recommending this approach as standard treatment, where indicated (Garbe 2016).

Immunotherapy-based approaches include ipilimumab, nivolumab, and pembrolizumab, which have been approved in the USA and Europe (Hodi 2010), and by the National Institute for Health and Care Excellence (NICE) in the UK, both alone (NICE 2012a; NICE 2014a; NICE 2014b; NICE 2015b; NICE 2015c), and in combination (NICE 2016a; NICE 2016b). These have shown high response rates and, most importantly, demonstrate for the first time in the treatment of melanoma the potential for a durable clinical response (Chapman 2011; Hamid 2013; Hodi 2010; Hodi 2016; Larkin 2015; Maio 2015; Sznol 2013).

A Cochrane Review comparing the efficacy of available systemic therapies for stage IIIC and stage IV melanoma is currently underway (Pasquali 2014), as are a number of further NICE appraisals of new therapeutic agents, including binimetinib, talimogene laherparepvec, and temozolomide (NICE 2017).

Psychosocial interventions to improve quality of life and general psychological distress in cancer patients following a diagnosis are also available; however, a Cochrane Review found considerable variation in the evidence to support such interventions (Galway 2012).

Index test(s)

Accurate staging of melanoma is more important than ever, partly to avoid unnecessary treatment and associated morbidity in those with early stage disease, and also to ensure that potentially effective therapies are initiated in a timely manner in those with metastatic disease who are well enough to tolerate them. The response to treatment in these patients may be better when the volume of disease is small, and will be better tolerated when the patient is systemically well (i.e. with good performance status).

These reviews will evaluate a range of available tests used for staging cutaneous melanoma. The first step towards determining melanoma stage is taking a detailed clinical history to determine if there are any symptoms suggesting spread of disease. This is followed by a thorough clinical examination, including whole body skin examination and palpation of the regional lymph nodes. In current practice, SLNB may be considered in stage IB and stage II melanoma for those without palpable lymph nodes. The majority of staging in terms of imaging is undertaken in clinical stage III and IV disease (see Clinical pathway).

Sentinel lymph node biopsy (SLNB)

SLNB allows the detection of metastatic spread to the regional lymph node basins. These are clusters of lymph nodes, most commonly axillary (under arm), inguinal (groin), and cervical (neck), although they can also be found on the upper arm (epitrochlear) or behind the knee (popliteal). SLNB is usually performed by a plastic surgeon at the same time as wide local excision of the primary melanoma (NICE 2015a). A radioactive tracer and patent blue dye are injected into the dermis at and surrounding the primary lesion or excision biopsy scar. The 'sentinel' lymph nodes to which the tracer drains are located by imaging (usually lymphoscintigraphy) and then removed and examined microscopically for micrometastases (cancer spread that cannot be seen by the naked eye) (NICE 2015a). The presence of micrometastases directly informs pathological staging as outlined in Table 2.

By definition, SLNB is useful only for the detection of locoregional disease via lymphatic spread, so although imaging is used to detect the sentinel nodes, it is not considered to be an 'imaging' test. Purely imaging-based tests, which can visualise larger areas of the body, can also detect distant metastatic disease, which occurs via lymphatic or haematogenous spread. While SLNB is restricted to primary staging on initial confirmation of a melanoma diagnosis, imaging can be undertaken on initial presentation of disease, on development of recurrence, and during follow-up (Melanoma Focus 2014). The latter use of imaging, as a monitoring test with the aim of early detection of recurrence, is not the focus of our reviews.

Imaging tests are undertaken and interpreted by radiologists, with decisions about patient management following imaging or SLNB made at multidisciplinary team meetings as discussed in the Clinical pathway section below.

Ultrasound

Ultrasound can be used to assist the detection of diseased lymph nodes; those who have a positive imaging result proceed to fine needle aspiration cytology (FNAC), core biopsy, or SLNB. A 2011 systematic review identified 21 studies of ultrasound for either primary lymph node staging or surveillance; for primary staging, sensitivity was 60% with a specificity of 97% (the number of studies that considered staging versus surveillance is unclear) (Xing 2011).
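
Sensitivity and specificity alone do not say how often a positive or negative ultrasound result is correct in a given population; that also depends on the prevalence of nodal disease. As a rough illustration, the sketch below applies Bayes' theorem to the pooled estimates quoted above (60% and 97%). The prevalence of 20% is an assumption chosen purely for illustration and is not taken from the cited review; the function name is hypothetical.

```python
def predictive_values(sensitivity: float, specificity: float,
                      prevalence: float) -> tuple[float, float]:
    """Positive and negative predictive values from accuracy and prevalence."""
    # Expected proportions of the population in each cell of the 2x2 table.
    tp = sensitivity * prevalence
    fn = (1 - sensitivity) * prevalence
    fp = (1 - specificity) * (1 - prevalence)
    tn = specificity * (1 - prevalence)
    ppv = tp / (tp + fp)  # probability of disease given a positive test
    npv = tn / (tn + fn)  # probability of no disease given a negative test
    return ppv, npv


# Sensitivity and specificity as quoted in the text; prevalence is an assumed value.
ppv, npv = predictive_values(sensitivity=0.60, specificity=0.97, prevalence=0.20)
print(f"PPV = {ppv:.2f}, NPV = {npv:.2f}")
```

Repeating the calculation with a lower assumed prevalence shows the positive predictive value falling even though sensitivity and specificity are unchanged, which is one reason accuracy is estimated separately for different staging populations.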

A 2013 systematic review of 10 studies, using both image-guided and non-image-guided FNAC, demonstrated an overall sensitivity of 97% and a specificity of 99%, and included two studies using only ultrasound-guided FNAC (Hall 2013). No systematic reviews of ultrasound-guided core needle biopsy in melanoma have been identified; however, a recently published case series showed sensitivity of 98% and specificity of 100% (Bohelay 2015).

Computed tomography (CT) (non-contrast or contrast-enhanced)

CT scans use x-rays to take cross-sectional images of the body, often using coronal and sagittal reformats (Bluemm 1983; van Waes 1983). The procedure involves varying amounts of radiation according to the area of the body to be scanned (Mahesh 2017), and can be conducted using an intravenous contrast agent (contrast-enhanced) to increase the sensitivity of metastasis detection in solid organs.

The Mohr 2009 study describes contrast-enhanced CT as the best method of identifying intrathoracic metastases, as superior to x-ray for detection of mediastinal and hilar adenopathy associated with lymphatic spread, and for assessment of lesions in the bone. CT can also be used for assessment of metastatic spread to the brain (Goulart 2011). As melanoma is one of the top three cancers responsible for cerebral metastases (Cagney 2017), the accuracy of CT in comparison with other imaging tests needs to be established. Overall specificity is reportedly high for detection of regional and distant disease, but sensitivity varies from 23% to 85% for detection of lymph node metastases, and 25% to 74% for assessment of distant spread (Xing 2011).

Magnetic resonance imaging (MRI) (non-contrast or contrast-enhanced)

MRI scans use magnetic fields and radio waves rather than ionising radiation to generate images, which are then computer processed to produce cross-sectional 'slices' of the body (Ai 2012). MRI scans are more expensive and take longer to carry out than CT scans (Whaley 2016b).

We did not identify any systematic reviews of MRI for melanoma staging from our scoping searches; however, a number of studies have considered whole body MRI (Jouvet 2014; Mosavi 2013), and MRI for detection of brain (Aukema 2010), or hepatic metastases (Sofue 2012).

Positron emission tomography (PET) or PET-CT (positron emission tomography-computed tomography)

Positron emission tomography (PET) is a nuclear medicine technique whereby a radioactive glucose analogue (usually fludeoxyglucose 18F (18FDG)) is administered intravenously, which is then metabolised as part of the body's normal function (Lammertsma 2017). The PET scanner detects the FDG and an image is created showing where FDG uptake is high. Tumours take up more FDG than normal tissue due to a higher rate of metabolism, with malignant masses generally being more 'active' than benign ones (Oncolink 2016). PET can also be combined with CT to provide both functional and structural information. The use of PET in combination with CT will necessarily increase the radiation exposure of the patient (IAEA 2016).

PET-CT is generally considered to be a more sensitive test than CT alone (Xing 2011); however, it has not been determined whether any increase in sensitivity confers patient benefit in terms of changes in management, and ultimately patient outcomes (Schroer-Gunther 2012). It may be that PET-CT has the most added value for patients with smaller metastatic deposits that are easier to control with stereotactic surgery (Youland 2017), for metastases in areas that are difficult to image with CT or other imaging modalities (Tan 2012), or those with indeterminate metastases in areas such as the lung. Whether these assumptions are supported by current evidence has yet to be established. The evidence report for the NICE guideline found no evidence "to suggest that earlier treatment of metastatic disease improves survival and therefore increased sensitivity was viewed currently as not an important issue" (NICE 2015d).

Clinical pathway

The diagnosis of melanoma can take place in primary, secondary, and tertiary care settings, by both generalist and specialist healthcare providers; however, the staging of confirmed disease takes place in secondary and tertiary care settings only (NICE 2015a). The recommendations on the management of melanoma following diagnosis, published in the recent NICE Guideline (NICE 2015a), and other UK guideline documents (Burkill 2014; Marsden 2010; Melanoma Taskforce 2011), are summarised in Figure 1, and outlined below; however, it should be noted that practice varies across the UK.

Figure 1.

Summary of NICE guideline recommendations for the management of cutaneous melanoma following primary diagnosis (NICE guidance 2015)

Following complete excision of the primary lesion, all patients should undergo a full clinical examination of both the skin and regional lymph nodes (NICE 2015a). Preliminary staging can then be assigned based on the outcome of this and histopathology results for the primary lesion(s). Those with palpable lymph nodes are automatically assigned to stage III while those with no palpable lymph nodes are assigned a stage between 0 and IIC, according to the thickness of the tumour (Breslow) and the presence of ulceration or mitoses (Balch 2009).

The results of all investigations carried out in the process of diagnosis will be discussed at a multidisciplinary team meeting (Melanoma Taskforce 2011), where decisions regarding further staging (i.e. the index tests of interest for this protocol) and other investigations are made. This could be a local skin multidisciplinary team, or for those with stage IIB disease and above, a specialist skin multidisciplinary team (Marsden 2010). These teams should include dermatologists, surgeons (including plastic surgeons), medical and clinical oncologists, radiologists, histopathologists, skin cancer nurse specialists, physiotherapists, psychologists, lymphoedema services, occupational therapists, and cosmetic camouflage advisers (Melanoma Taskforce 2011).

No further staging investigations are currently recommended for people with thin melanomas (<= 1 mm) without ulceration or mitoses (up to stage IA) (NICE 2015a).

Sentinel lymph node biopsy (SLNB) may be considered for those with stage IB melanoma (<= 1 mm but with surface ulceration or mitoses) and stage II disease (i.e. clinically node negative and with Breslow thickness > 1 mm) (NICE 2015a), to detect any micrometastases in the nearest draining lymph node basin. If micrometastasis is detected, the stage of disease becomes stage III and patients should be counselled about the potential advantages and disadvantages of complete lymph node dissection (Marsden 2010). Those with clinically palpable lymph nodes or nodes found to be diseased on imaging may also undergo complete lymph node dissection. Stage III A, B or C is assigned according to lesion characteristics, the presence of micro- or macrometastases (i.e. clinically detectable nodal metastases), and the number of diseased nodes identified (Table 2).

Available recommendations on appropriate use of imaging vary to some extent, even within the UK (Burkill 2014; Melanoma Focus 2014; NICE 2015a). CT is generally the imaging test of choice; however, some centres offer high resolution ultrasound, MRI, or PET scans in addition to, or in some cases instead of, SLNB. NICE recommends that CT staging to identify those who may benefit from systemic therapy should be reserved for those with stage IIC (if no SLNB has been undertaken) and stage III or suspected stage IV disease (NICE 2015a). Imaging of the brain should be considered only if metastatic disease outside the central nervous system is suspected, with CT used for adults and MRI for children and young adults aged under 25 to reduce long-term risks of radiation (NICE 2015a).

The Melanoma Focus position paper on follow-up of high risk melanoma in the UK recommends that at baseline (and during follow-up) high risk patients (as defined by the local specialist skin multidisciplinary team) should undergo CT of the chest, abdomen, and pelvis (or total body PET-CT), plus MRI of the head, as standard (Melanoma Focus 2014).

The Royal College of Radiologists guideline recommends that scans should be tailored according to the site of the primary lesion and most likely regional lymph node basin. In general, CT imaging of the head, chest, abdomen, and pelvis should be employed for lower limb and lower body wall lesions, with the addition of CT of the neck for upper limb, scalp, neck, and upper torso primary tumours (Burkill 2014). MRI may be more appropriate for imaging the central nervous system (Burkill 2014). Although PET-CT has been suggested to have a role in imaging the lower limbs, further evidence is required (Burkill 2014).

Patients with stage IIIB melanoma and above will also be offered genotyping, for example, to identify BRAF mutations to allow further planning of systemic treatment (Melanoma Taskforce 2011); however, systemic treatment is generally only currently recommended for those with stage IV disease or extensive locoregional disease that is not amenable to surgery, according to available NICE guidance (see 'Treatment of Melanoma' section).

The performance of SLNB, and of particular imaging tests, may also be a requirement for enrolment in ongoing clinical trials of treatment.

Role of index test(s)

SLNB currently provides one method of identifying nodal disease in those without clinically palpable lymph nodes, to improve prognosis estimation. Although it carries less of a burden in terms of morbidity compared to lymph node dissection, it is still an invasive procedure with a certain risk of adverse events (estimated at 5% morbidity) (Marsden 2010), and has no established survival advantage (Kyrgidis 2015; Morton 2014). Ultrasound with FNAC is not sufficiently sensitive to replace SLNB but has been suggested to have a role in fast-tracking those with positive cytology results (micrometastases identified) to complete lymph node dissection, while those with negative cytology may proceed to SLNB, as required (Voit 2014).

There is currently no recommended role for imaging tests in early stage disease, but CT has been recommended as the imaging approach of choice for detection of nodal and distant spread in those with stage III or IV disease (and for those with stage IIC if no SLNB has been performed) (NICE 2015a). PET-CT is increasingly being used; however, practice varies across the country, primarily according to availability. The advantages to patient management with PET-CT are not yet known. The most appropriate role for MRI in staging melanoma in adults, other than for central nervous system disease, is unclear.

Alternative test(s)

A number of other tests may be used to inform patient management following a diagnosis of melanoma.

When clinically palpable lymph nodes are identified, core needle biopsy or FNAC of the lymph node may be undertaken to confirm the presence of macrometastases, i.e. metastases that are palpable (Marsden 2010). Fine needle aspiration is a fairly simple procedure which allows a sample of cells to be taken from the lymph node with a fine needle (Hall 2013), while core needle biopsy uses a slightly larger needle with a hollow centre, allowing the removal of a core of tissue with the cell structure intact (Whaley 2016a). Both procedures can be guided by simple palpation or, for more deep-seated lesions, with image-based guidance to identify micrometastases (requiring use of a microscope to be visualised) (Bohelay 2015). Although the accuracy of core needle biopsy in comparison to fine needle aspiration has been identified as a key clinical question to be investigated, it is beyond the scope of these reviews which focus on the detection of nonpalpable metastatic disease.

Genetic testing of primary melanoma specimens, for BRAF mutations for example, is increasingly used, particularly with the emergence of systemic treatments for BRAF V600 mutation-positive melanoma (Chapman 2011; Chapman 2012; Larkin 2014; Larkin 2015). A survey for the NICE guideline found 54% of local skin multidisciplinary teams and 83% of specialist skin multidisciplinary teams questioned had arranged testing of tumour blocks (from either primary or secondary melanoma tissue) for BRAF mutations within the preceding 2 years (NICE 2015a). Although genetic testing (or genotyping) may be carried out at the same time as staging investigations, its purpose is to inform systemic treatment decisions rather than being an integral part of the staging procedure itself.

Biomarkers, such as S100, are used in countries such as Germany as a marker of prognosis (Gray 2014) or of early disease relapse (Peric 2011) rather than for staging purposes per se (Egberts 2010; Pirpiris 2010), whilst lactate dehydrogenase (LDH) is part of AJCC staging for stage IV (Pirpiris 2010); however, these are beyond the scope of our reviews.

Rationale

Appropriate staging of melanoma is crucial to ensuring that patients are directed to the most appropriate and effective treatment. A number of tests are available to assist in the staging of melanoma; however, their comparative accuracy for detection of nodal, or distant metastases, or both, according to histological stage at presentation is unclear.

The NICE guideline recommendations for staging (see Clinical pathway) were based on available systematic reviews of both SLNB and imaging tests (Hall 2013; Jimenez-Requena 2010; Krug 2008; Rodriguez 2014; Valsecchi 2011; Xing 2011), with some supplementary data from primary studies (NICE 2015d). The majority of reviews are limited in terms of currency (de Rosa 2011; Jimenez-Requena 2010; Krug 2008; Valsecchi 2011; Warycha 2009; Xing 2011), with literature searches in most cases extending only as recently as 2009 (Jimenez-Requena 2010; Krug 2008; Valsecchi 2011; Xing 2011). Furthermore, the only one of these to compare accuracy across imaging tests did not consider histological stage (Xing 2011). Two reviews provide a more recent evaluation of PET and PET-CT (search dates up to 2012 and 2011 respectively) (Rodriguez 2014; Schroer-Gunther 2012); however, the Schroer-Gunther 2012 review also relied on previously published reviews (Jimenez-Requena 2010; Krug 2008), with supplementary searching for more recently published studies, and the Rodriguez 2014 review included only stage III melanoma. The Schroer-Gunther 2012 review relied on quality assessment that was carried out for the original systematic reviews, and only a small number of studies were eventually included; the authors themselves recommend that future reviews should include a broader range of study designs (Schroer-Gunther 2012).

The comparative accuracy of imaging tests according to stage of disease therefore remains to be determined. Furthermore, any evidence for or against the routine use of brain scanning in stage III melanoma, either with CT or MRI, is yet to be determined. PET-CT is increasingly used but any additional role it has compared with CT or MRI needs to be examined according to particular patient groups.

Our approach will allow the accuracy of each individual test to be established (where possible according to stage of disease) and comparative accuracy to be summarised in a Cochrane DTA overview. This approach will allow a complex topic to be approached in stages, making the data more accessible and allowing the reader to focus on individual tests. The overview review will compare those tests for which there is sufficient evidence, exploiting within-study comparisons of tests and allowing the analysis and comparison of different diagnostic strategies.

This generic protocol provides the methodology that we will use for a suite of reviews of tests to assist in the staging of melanoma. We will tailor accordingly the background sections for each individual test review and the overview review.

Objectives

To determine the diagnostic accuracy of SLNB for the detection of nodal metastases (in the investigated nodal basin) for the staging of cutaneous invasive melanoma.

To determine the diagnostic accuracy of imaging tests for the detection of any metastasis in the primary staging of cutaneous invasive melanoma (i.e. staging at presentation).

To determine the diagnostic accuracy of imaging tests for the detection of any metastasis in the staging of recurrence in cutaneous invasive melanoma (i.e. re-staging prompted by findings on routine follow-up).

Secondary objectives

To determine the diagnostic accuracy of imaging tests for the detection of nodal metastases in the staging of cutaneous invasive melanoma.

To determine the diagnostic accuracy of imaging tests for the detection of distant metastases in the staging of cutaneous invasive melanoma.

These will be estimated separately for those undergoing primary staging and those who have experienced a disease recurrence.

Investigation of sources of heterogeneity

We will consider a range of potential sources of heterogeneity for investigation in each individual test review. These may vary between reviews but may include the following.

i. Population characteristics
  • AJCC stage of disease

  • Sentinel lymph node status (for imaging studies only)

  • Clinical nodal status (for imaging studies only)

  • Primary tumour site (head and neck, trunk, limb, and other)

ii. Index test characteristics
  • Differences in test positivity thresholds (e.g. for SLNB, the tracer threshold for a 'hot' versus 'cold' node)

  • Other relevant test characteristics as appropriate to the test under consideration

iii. Reference standard characteristics
  • Reference standard used (histology, clinical or imaging-based follow-up; concurrent imaging-based reference standard)

iv. Study quality
  • Consecutive or random sample of participants recruited

  • Index test interpreted, blinded to the reference standard result

  • Index test interpreted, blinded to the result of any other index test

  • Presence of partial or differential verification bias (whereby only a sample of those subject to the index test are verified by the reference test or by the same reference test, with selection dependent on the index test result)

  • Use of an adequate reference standard

  • Overall risk of bias

We will examine the quality and quantity of research evidence available on the effectiveness of each index test for the primary target condition and make recommendations regarding where further research might be required.

Methods

Criteria for considering studies for this review

Types of studies

We will include test accuracy studies that allow comparison of results of the index test with those of a reference standard, including:

  • prospective and retrospective studies;

  • studies where all participants receive a single index test and a reference standard;

  • studies where all participants receive more than one index test (concurrently) and a reference standard;

  • studies where participants are allocated (by any method) to receive different index tests or combinations of index tests and all receive a reference standard (between-person comparative studies);

  • studies that recruit a series of participants unselected by true disease status; and

  • diagnostic case-control studies that separately recruit diseased and nondiseased groups (Rutjes 2005).

We will exclude follow-up or surveillance studies using repeated imaging tests to detect disease recurrence, as defining the most appropriate follow-up schedule for melanoma patients is not the primary objective of these reviews. We will exclude studies if it is not possible to derive the number of true positives, false positives, false negatives and true negatives from data provided in the paper, and small studies with fewer than five disease-positive or fewer than five disease-negative participants. Although the size threshold of five is arbitrary, such small studies are likely to give unreliable estimates of sensitivity or specificity, and may be biased in a similar way to small randomised controlled trials (RCTs) of treatment effects.
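The data-based exclusion criteria above can be expressed as a simple check on a study's 2x2 counts. The following is a minimal sketch only; the function name `is_eligible` and the constant `MIN_GROUP_SIZE` are hypothetical labels introduced for illustration and are not part of the review's data extraction tooling.

```python
from typing import Optional

# Arbitrary size threshold stated in the protocol (constant name is illustrative).
MIN_GROUP_SIZE = 5


def is_eligible(tp: Optional[int], fp: Optional[int],
                fn: Optional[int], tn: Optional[int]) -> bool:
    """Return True if a study's 2x2 data meet the data-based inclusion criteria."""
    counts = (tp, fp, fn, tn)
    # Exclude the study if any cell of the 2x2 table cannot be derived.
    if any(c is None for c in counts):
        return False
    disease_positive = tp + fn  # participants with the target condition
    disease_negative = fp + tn  # participants without the target condition
    return (disease_positive >= MIN_GROUP_SIZE
            and disease_negative >= MIN_GROUP_SIZE)
```

A study with three true positives and one false negative, for example, has only four disease-positive participants and would not pass this check, however many disease-negative participants it recruited.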

We will include studies reporting either lesion-based or patient-based analyses; however, we will only include those reporting data on a per patient basis in the primary analysis. This is particularly pertinent for the reviews of imaging tests where multiple metastatic sites may be detected in an individual patient.

Participants

We will include studies in adults with cutaneous melanoma at any primary site who are undergoing staging, either following primary presentation of disease or following recurrence of disease. We will include studies that include mixed populations of patients or where the clinical pathway cannot be determined and examine any effect on test accuracy in subgroup analysis. We will exclude studies in which test results for participants with melanoma cannot be differentiated from those of participants with other diagnoses.

For studies of sentinel lymph node biopsy (SLNB), outcomes must be presented for both sentinel lymph node positive and sentinel lymph node negative participants. For studies of imaging tests, we will include studies focusing on either sentinel lymph node positive or sentinel lymph node negative participants.

Index tests

We will undertake individual reviews for SLNB and for the following imaging tests, either alone or in combination.

  • Ultrasound (with or without subsequent fine needle aspiration cytology (FNAC) or core biopsy)

  • Computed tomography (CT) (non-contrast or contrast-enhanced)

  • Positron emission tomography (PET) or PET-CT (¹⁸FDG only)

  • Magnetic resonance imaging (MRI) (non-contrast or contrast-enhanced)

SLNB studies may assess the effectiveness of methods of detection of sentinel lymph nodes, for example using different tracers or dyes or alternative imaging approaches. These will often compare approaches in terms of number of diseased nodes identified and we will exclude these unless an eligible reference standard, as described below, has been used.

We will produce a comparative overview review to compare the accuracy of tests, either alone or in combination.

Target conditions

The target condition for the SLNB review will necessarily be defined differently according to the result of the index test as:

  • for sentinel lymph node positive participants, the presence of micrometastasis in the nodal basin investigated by the SLNB procedure; or

  • for sentinel lymph node negative participants, the emergence of clinically detectable nodal disease or macrometastases in the nodal basin investigated by the SLNB procedure in the absence of evidence of distant metastases (i.e. a false negative SLNB result will be considered to have occurred if disease is proven to emerge only in the regional lymph nodes with no spread to distant sites).

In the event of inadequate data, we will drop the requirement to demonstrate the absence of distant metastases in sentinel lymph node negative participants with regional nodal recurrence, and we will consider the emergence of any nodal disease in the nodal basin investigated by the SLNB procedure a sufficient definition of a false negative result.
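The primary and relaxed definitions of a false negative SLNB result described above can be sketched as a small classification rule. This is an illustration under stated assumptions: the function and parameter names are hypothetical, and the inputs are assumed to have already been established from follow-up data for a sentinel lymph node negative participant.

```python
def is_false_negative_slnb(nodal_recurrence_in_basin: bool,
                           distant_metastases: bool,
                           require_no_distant: bool = True) -> bool:
    """Classify a sentinel lymph node negative participant at follow-up.

    require_no_distant=True applies the primary definition; setting it to
    False applies the relaxed definition used in the event of inadequate data.
    """
    if not nodal_recurrence_in_basin:
        # No disease emerged in the investigated nodal basin.
        return False
    if require_no_distant:
        # Primary definition: disease emerges only in the regional nodes.
        return not distant_metastases
    # Relaxed definition: any nodal disease in the investigated basin counts.
    return True
```

Keeping the relaxation as an explicit flag makes it straightforward to run the same data under both definitions and compare the resulting estimates.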

The target conditions for the imaging test reviews are the detection of:

  • any metastases,

  • any nodal metastases, or

  • any distant metastases.

The use of the same tests for the staging of cutaneous squamous cell carcinoma (cSCC) is the subject of a separate protocol (Dinnes 2017).

Reference standards

Acceptable reference standards include:

  • histology of lymph node or distant specimens, with samples obtained by core biopsy, SLNB or lymph node dissection, for index test positive participants;

  • cytology of lymph node specimens, with samples obtained by core biopsy, or fine needle aspiration, for index test positive participants;

  • clinical or radiological follow-up of at least six months to identify nodal or distant recurrence, for index test negative participants; or

  • any combination of the above.

Studies using cross-sectional imaging-based reference standards, i.e. a direct comparison of the index test with an alternative reference standard imaging test, will not be eligible.

Search methods for identification of studies

The Information Specialist (Susan Bayliss) has carried out a comprehensive search for published and unpublished studies. As previously mentioned, a series of Cochrane Diagnostic Test Accuracy (DTA) reviews on the diagnosis and staging of melanoma and keratinocyte skin cancers is being carried out as part of a National Institute for Health Research (NIHR) Programme Grant.

Electronic searches

We have conducted a single large literature search for the programme grant, covering all conditions and tests. This allowed for the screening of search results for potentially relevant papers for all reviews at the same time. We formulated a MEDLINE scoping search combining disease-related terms with terms related to the test names, using both text words and subject headings. As most records were related to the searches for tests for the staging of disease, we applied a filter using terms related to cancer staging and to accuracy indices to the staging test search, to try to eliminate irrelevant studies, e.g. those using imaging tests to assess treatment effectiveness.

We screened a sample of 300 records that would be missed by applying this filter and adjusted the filter to make sure that we would not miss any potentially relevant studies. The final search filter (Appendix 2) reduces the overall numbers retrieved from MEDLINE by around 6000. We cross-checked the final search result against the list of studies included in five systematic reviews; our search identified all but one of the studies, and this study is not indexed on MEDLINE. The Information Specialist, Susan Bayliss, has devised the search strategy, with input from the Information Specialist from Cochrane Skin, Elizabeth Doney. We used no additional limits.

We undertook further scoping searches to identify any relevant systematic reviews or health technology assessments. In addition to general bibliographic databases, we also accessed specialist databases with a focus on reviews of diagnostic test accuracy, such as ARIF.

We have now searched the following bibliographic databases, retrieving a total of 33,994 unique records:

Published studies

The Cochrane Central Register of Controlled Trials (CENTRAL) in the Cochrane Library; the Cochrane Database of Systematic Reviews (CDSR) in the Cochrane Library; CRD Database of Abstracts of Reviews of Effects (DARE); CRD HTA (Health Technology Assessment) database; MEDLINE via OVID (from 1946); MEDLINE In-Process & Other Non-Indexed Citations via OVID; Embase via OVID (from 1980); and CINAHL (Cumulative Index to Nursing and Allied Health Literature) via EBSCO (from 1960 to the present).

Unpublished studies

Conference Proceedings Citation Index (CPCI) via Web of Science™ (from 1990); Zetoc (from 1993); and SCI Science Citation Index Expanded via Web of Science™ (from 1900), using the "Proceedings and Meetings Abstracts" Limit function.

Trials registers

The US National Institutes of Health Ongoing Trials Register ClinicalTrials.gov (www.clinicaltrials.gov); NIHR Clinical Research Network Portfolio Database (www.nihr.ac.uk/research-and-impact/nihr-clinical-research-network-portfolio/); and the World Health Organization International Clinical Trials Registry Platform (apps.who.int/trialsearch).

We aimed to identify all relevant studies, regardless of language or publication status (published, unpublished, in press, or in progress). We applied no date limits. Update searches will be time- and resource-dependent.

Searching other resources

Due to time restrictions and the volume of evidence retrieved from the electronic searches, we will not conduct any handsearching of conference proceedings. By searching CENTRAL, we will retrieve relevant records identified by regular handsearching by Cochrane Skin. The handsearched conferences and journals are listed here: www.skin.cochrane.org/resources-handsearchers.

We will include information about potentially relevant ongoing studies in the 'Characteristics of ongoing studies' tables. We will screen any relevant systematic reviews identified by the searches for their included primary studies, and we will include any that our searches have missed in the review. We will check the reference lists of all included papers, and subject experts within the author team will review the final list of included studies. We may use citation-searching for key references when we consider it appropriate.

Data collection and analysis

Selection of studies

Due to the volume of records retrieved, at least one review author (JDi or NC) has undertaken screening of the titles and abstracts, with any queries on the part of either reviewer discussed and resolved by consensus. A pilot exercise independently screening 539 references from MEDLINE showed a good level of agreement (89% with a kappa of 0.77). To date, 822 records have been selected for full text review for the staging reviews. At least two review authors, including methodologists (JDi or NC) and clinical reviewers, using a study eligibility screening proforma based on prespecified inclusion criteria, will independently undertake subsequent assessment of potentially relevant full text articles for the staging reviews (Appendix 3). Where differences in opinion exist, a third party drawing on the clinical and methodological expertise in the team, as appropriate to the content of the query (JDe, CD, HW, and RM), will help with resolution. We will compile a list of otherwise eligible studies for which insufficient data were presented to allow for the construction of a 2x2 contingency table, and we will contact study authors, asking them to provide the relevant data. We will describe the study selection process in an adapted PRISMA flowchart (Liberati 2009). At the full text inclusion stage, we will tag studies according to their target condition (melanoma or cSCC) and index test.
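The agreement statistics quoted for the pilot screening exercise (percentage agreement and kappa) can be computed from a cross-tabulation of two reviewers' include/exclude decisions. The sketch below shows the standard Cohen's kappa calculation for two raters and two categories; the function name and argument names are hypothetical, and any counts passed to it are for illustration rather than the pilot data, which are not reported in this protocol.

```python
def cohens_kappa(both_include: int, only_a: int,
                 only_b: int, both_exclude: int) -> float:
    """Cohen's kappa for two reviewers making include/exclude decisions."""
    n = both_include + only_a + only_b + both_exclude
    # Observed proportion of records on which the reviewers agree.
    observed = (both_include + both_exclude) / n
    # Marginal proportion of records each reviewer included.
    a_include = (both_include + only_a) / n
    b_include = (both_include + only_b) / n
    # Agreement expected by chance alone, given those marginals.
    expected = a_include * b_include + (1 - a_include) * (1 - b_include)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 100%")
    return (observed - expected) / (1 - expected)
```

Kappa corrects raw agreement for chance, which matters in title and abstract screening because most records are excluded by both reviewers and raw agreement is therefore high even for poorly aligned screeners.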

Data extraction and management

We will carry out data extraction using a predesigned and piloted data extraction form in Excel to ensure that we collect relevant data. At least two review authors will independently extract data concerning details of the study design, participants, index test(s) or test combinations and criteria for index test positivity, reference standards, and data required to populate a 2x2 diagnostic contingency table for each index test. We will record where data are available at several index test thresholds. A third party drawing on clinical and methodological expertise in the team as appropriate to the content of the query will resolve discrepancies.
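To illustrate how extracted 2x2 data translate into accuracy estimates at more than one threshold, the sketch below stores one table per threshold and derives sensitivity and specificity from each. The class name `TwoByTwo`, the threshold labels, and all counts are hypothetical and serve only as an example; they are not drawn from any included study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """One 2x2 diagnostic contingency table for an index test."""
    tp: int  # index test positive, reference standard positive
    fp: int  # index test positive, reference standard negative
    fn: int  # index test negative, reference standard positive
    tn: int  # index test negative, reference standard negative

    @property
    def sensitivity(self) -> float:
        # Proportion of disease-positive participants detected by the test.
        return self.tp / (self.tp + self.fn)

    @property
    def specificity(self) -> float:
        # Proportion of disease-negative participants correctly ruled out.
        return self.tn / (self.tn + self.fp)


# Hypothetical counts for one index test at two positivity thresholds.
tables = {
    "threshold_a": TwoByTwo(tp=30, fp=5, fn=20, tn=95),
    "threshold_b": TwoByTwo(tp=40, fp=15, fn=10, tn=85),
}

for label, table in tables.items():
    print(f"{label}: sensitivity={table.sensitivity:.2f}, "
          f"specificity={table.specificity:.2f}")
```

In this made-up example the more lenient threshold detects more disease-positive participants at the cost of more false positives, which is the trade-off that recording data at several thresholds is intended to capture.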

We will attempt to contact authors of included studies where information that is considered key to one or more of the assessments of the quality of an included study, investigation of heterogeneity, or completion of a 2x2 diagnostic contingency table is missing. We will follow up studies published only as conference abstracts to identify whether a subsequent full paper has been published. Where possible, we will contact the authors of conference abstracts published from 2015 to 2016 and ask whether full data are available. If we can identify no full paper, we will mark conference abstracts as 'pending' and revisit them. Experience of contacting authors for information about missing data in DTA reviews is limited. Therefore, where we seek missing data, we will document the outcome of contact with the authors.

Dealing with multiple publications and companion papers

In the event of multiple reports of a primary study, we will examine all available data to determine the potential for overlapping populations and to identify a primary data source. Where we suspect overlapping study populations and are unable to identify a primary data source, we will contact study authors for clarification in the first instance. If contact with authors is unsuccessful, we will use the most complete and up-to-date data source available, thus avoiding the risk of double-counting. We will examine the impact of inconsistencies in reporting of 2x2 data that remain unresolved in a sensitivity analysis.

Assessment of methodological quality

We will assess applicability and risk of bias of included studies using the QUADAS-2 checklist (Whiting 2011), which has been tailored to the review topic (see Appendix 4).

Patient selection domain (1)

Selective recruitment of study participants can be a key influence on test accuracy. In general terms, all participants eligible to undergo a test should be included in a study, allowing for the intended use of that test within the context of the study.

Inappropriate participant exclusions affecting the internal validity of a study of staging might include exclusion of people with primary tumours at sites such as head and neck or exclusion of those with unsuccessfully mapped sentinel lymph nodes.

For sentinel lymph node biopsy (SLNB) studies, the applicability of a study's results will be affected by the patient spectrum according to the clinical stage of disease (AJCC stage) and site of the primary tumour.

Imaging tests may be undertaken following diagnosis of the primary melanoma lesion or following disease recurrence, such that studies may include mixed populations of participants. Given the potential for variation in test accuracy according to patient spectrum and disease prevalence (Brenner 1997; Leeflang 2013; Mulherin 2002), the applicability of results will be affected by the proportion of participants undergoing primary staging versus staging for disease recurrence, as well as by the clinical stage of disease (AJCC stage or clinical nodal status) and site of the primary tumour.

Index test domain (2)

Given the subjectivity of test interpretation, particularly for imaging tests, the interpretation of the index test blinded to the result of the reference standard is a key means of reducing bias. For prospective studies, the index tests will by nature be interpreted before the result of the reference standard is known; however, retrospective studies will be susceptible to information bias either if the person abstracting data from medical records is aware of individual patients' final diagnoses, or if any reinterpretation of images is undertaken for the purposes of the study.

For imaging tests, studies reporting the accuracy of multiple diagnostic thresholds (different tumour characteristics or parameters) for the same index test will also be subject to information bias unless each characteristic was interpreted by a different reader. This would be an impractical and unlikely approach for most studies, but a quality item has been included in order to highlight any studies where this occurs in order to allow discussion.

In terms of applicability, despite the often subjective nature of test interpretation, it is important that study authors outline the particular characteristics that were considered to be indicative of the presence of disease so that appropriate comparisons can be made between test evaluations and the test can be replicated in practice. For SLNB, we will require a description of the tracer threshold for a 'hot' versus 'cold' node, as well as a description of the histology interpretation requirements (such as those of the Royal College of Pathologists).

The experience of the observer will also affect the applicability of study results. Detailed information on the experience and training of care providers is often lacking, such that a detailed analysis of the impact of examiner experience may not be possible. However, to be considered 'low concern':

  • surgical members of the specialist skin multidisciplinary team should meet guideline recommendations, i.e. carrying out at least 15 inguinal or axillary lymph node dissections per year (NHS England 2014);

  • imaging tests should be interpreted by consultant radiologists.

Reference standard domain (3)

In an ideal study, consecutively recruited participants should all undergo the same reference standard. In reality, both partial and differential verification bias are likely.

Partial verification bias will occur where histology (e.g. complete lymph node dissection) is the only reference standard used, and only those participants with a certain degree of suspicion of malignancy based on the result of the index test undergo verification, the others either being excluded from the study or being defined as disease-negative without further assessment or follow-up. Cytology cannot be used as the only reference standard due to the potential for relatively high false negative rates; however, a positive cytology result is considered as valid as a histology result in the majority of cases (with the exception of poorly differentiated tumours).

Differential verification bias will be present where other reference standards are used in addition to histological or cytological verification. Differential verification is inevitable in these reviews because of the invasive nature of obtaining tissue samples for histological confirmation of presence or absence of malignancy. This is particularly true where complete lymph node dissection is the reference standard for detection of nodal metastases, as this will not be undertaken in those who have a negative SLNB. With imaging tests, histological confirmation would be impossible following a negative imaging result; those with borderline or indeterminate results are also unlikely to have subsequent histology. Any indeterminate results will be reviewed by the multidisciplinary team and a decision made whether, for example, to repeat the imaging in three months or to image with a different modality to clarify. With borderline imaging, the finding is usually too small to call a metastasis, making biopsy very unlikely for practical reasons.

Absence of disease in index test negative participants and in those negative on cytology will require confirmation by clinical or radiological follow-up. Ideally, a follow-up based reference standard should be long enough to allow all present but 'hidden' cases of disease to become detectable (Naaktgeboren 2013); however, differentiating disease that was originally present but missed from newly emergent disease is problematic, particularly given the slow-growing nature of disease. No upper time limit has therefore been applied to define an 'adequate' follow-up reference standard.

For the SLNB review, we will require studies to report the emergence of clinically detectable or macroscopic nodal disease in an investigated nodal basin for sentinel lymph node negative participants in order to be included; for the reference standard to be judged adequate however, we will require studies to report disease occurrence in a mapped nodal basin. For the imaging reviews, we will define an adequate reference standard for imaging test negative participants as clinical or radiological follow-up to detect any metastatic disease. We will consider studies that use a concurrently applied imaging test to determine final diagnosis of index test negative participants at high risk of bias.

A further challenge is the potential for incorporation bias, i.e. where the result of the index test is used to help determine the reference standard diagnosis. For both SLNB and imaging tests, only those with positive test results will undergo any procedure to allow histological confirmation (whether core biopsy, SLNB or complete lymph node dissection). In each case, the histopathologist will most likely be aware that the index test was positive and this knowledge will inform the pathology procedure.

There is also considerable potential for the clinicians or radiologists concerned with the clinical follow-up, radiological follow-up, or both, of study participants to identify any subsequent emergence of nodal or distant disease to be aware of the original index test result and to use that to inform diagnostic decisions at the time of follow-up.

Reference standard blinding is therefore extremely unlikely and its enforcement would significantly limit the generalisability of the study results. We will therefore assess the presence of blinded reference test interpretation (as it is a standard QUADAS-2 item) but will not include it in our overall assessment of bias.

Flow and timing domain (4)

A period of one month has been defined as an appropriate interval (low risk of bias) between application of the index test and a histological reference standard (complete lymph node dissection or biopsy of possible distant metastases). Where the reference standard is follow-up based, we have not applied any restrictions on follow-up timing.

Comparative domain

In the event that we identify comparative primary test accuracy studies and include them in the overview, we will add a comparative domain to the QUADAS-2 checklist (Appendix 4). Questions reflect the possibility of selection bias (into the study and allocation to index test or testing strategies) and assessment of blinding of interpretation of each individual index test for within-person comparisons. In addition, for within-person test comparisons, we have specified a maximum of one month between application of individual index tests, as intervals greater than these may be accompanied by changes in tumour characteristics. This is an arbitrary threshold, and in the event that a large proportion of included studies exceed this time period, we will undertake a sensitivity analysis to investigate the impact of this quality item on estimates of accuracy.

We will initially pilot the amended checklist tool on a small number of included full text articles. Independently, two review authors will rate each study on the four quality domains (patient selection, index test(s), reference standard, flow and timing). They will resolve any disagreements by consensus or by referral to a third review author.
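The domain-level judgements in Appendix 4 follow a common pattern: all signalling questions answered 'Yes' gives a low rating, any 'No' gives a high rating, and any 'Unclear' gives an unclear rating. A minimal sketch of that rule is given below. The function name is hypothetical, and two points are assumptions rather than statements from the checklist: that 'No' takes precedence when a domain contains both 'No' and 'Unclear' answers, and that 'N/A' answers are ignored.

```python
VALID_ANSWERS = {"Yes", "No", "Unclear", "N/A"}


def domain_risk(answers):
    """Map signalling-question answers for one domain to a judgement.

    Assumptions (not stated in the checklist): 'No' outranks 'Unclear',
    and 'N/A' answers are dropped before the rule is applied.
    """
    unknown = set(answers) - VALID_ANSWERS
    if unknown:
        raise ValueError(f"unexpected answers: {sorted(unknown)}")
    applicable = [a for a in answers if a != "N/A"]
    if not applicable:
        raise ValueError("at least one applicable answer is required")
    if "No" in applicable:
        return "High"
    if "Unclear" in applicable:
        return "Unclear"
    return "Low"


# Patient selection domain of a non-comparative study: questions 1) to 3).
print(domain_risk(["Yes", "Yes", "Unclear"]))
```

The same function could be applied to the applicability items, reading the returned label as the level of concern rather than the risk of bias.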

We will narratively summarise the results of quality assessment for all included studies at domain level, highlighting those domains that pose the greatest potential for risk of bias and concern about applicability for the body of evidence. We will supplement the narrative summary with summary graphics and tables as appropriate to assist with the presentation of the results of quality assessment across included studies for important participant subgroups and by index test.

Statistical analysis and data synthesis

For the SLNB reviews, our primary analysis will focus on the detection of metastases in the investigated nodal basin. For the imaging test reviews, we will conduct separate analyses firstly according to whether study participants are recruited on primary presentation of melanoma or with a disease recurrence, and secondly according to our primary and secondary objectives, i.e. detection of any metastasis (which must include both nodal and distant recurrence), and detection of nodal metastasis alone or detection of any distant metastasis (as defined under 'Target condition'). SLNB is not employed for staging of recurrence in skin cancer.

Studies may report test accuracy per-lesion or per-patient. Our unit of analysis for the primary analyses will be the patient, as study participants may have multiple metastatic sites at any one time, such that a per-lesion analysis may over-estimate test accuracy. We will include data from studies that reported per-lesion level data in secondary analyses, such that per-lesion and per-patient data from different studies would be combined, using per-patient data in preference where both are reported within a study. The estimation of the accuracy metrics to be used in our reviews is detailed in Appendix 5.

For the SLNB review, both index test and reference standard positivity are defined histologically. In the absence of an additional suitable reference standard for SLNB test positivity, it will not be possible to estimate false positive cases and specificity will always be 100%. We will therefore perform meta-analysis of only sensitivities by using a univariate random effects logistic regression model. We will also estimate the pooled negative predictive value in a secondary analysis (the positive predictive value not being possible to calculate due to false positives not being estimable). The definitions for each cell of the 2x2 contingency tables for the SLNB review are as follows.

  • TP = sentinel lymph node positive (i.e. all patients with a positive sentinel lymph node regardless of any subsequent recurrence).

  • FP = not possible to estimate.

  • FN = sentinel lymph node negative patients who experience clinical emergence of disease in the same nodal basin, in the absence of disseminated disease.

  • TN = sentinel lymph node negative patients who do not experience clinical emergence of disease in the same nodal basin.
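As an illustration of the cell definitions above, the sketch below computes the two quantities that can be estimated for a single SLNB study: sensitivity, TP / (TP + FN), and negative predictive value, TN / (TN + FN). The function name and the example counts are hypothetical and are not taken from any included study.

```python
def slnb_accuracy(tp, fn, tn):
    """Sensitivity and NPV from the SLNB 2x2 cells (FP is not estimable)."""
    if min(tp, fn, tn) < 0:
        raise ValueError("cell counts must be non-negative")
    diseased = tp + fn        # SLN positive, or nodal recurrence after negative SLN
    test_negative = fn + tn   # all SLN negative participants
    return {
        "sensitivity": tp / diseased if diseased else float("nan"),
        "npv": tn / test_negative if test_negative else float("nan"),
    }


# Hypothetical study: 40 SLN positive; of 160 SLN negative, 5 recur in the basin.
example = slnb_accuracy(tp=40, fn=5, tn=155)
print(example)
```

Specificity and positive predictive value are deliberately omitted because they depend on the FP cell.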

For the imaging test reviews, we will estimate sensitivity and specificity in the usual way. We will initially explore the data by plotting estimates of sensitivity and specificity on coupled forest plots and in receiver operating characteristic (ROC) space for each index test under consideration. We will use hierarchical models to perform meta-analyses (Macaskill 2010). Where commonly used thresholds are reported, we will produce summary operating points (summary sensitivities and specificities) with 95% confidence and prediction regions using the method in Reitsma 2005. Where different thresholds are used, we will fit a summary curve using the hierarchical summary ROC model (Rutter 2001). When few studies are available for meta-analysis, we will simplify hierarchical models as appropriate, depending on whether the focus of inference is a summary point or summary curve (Takwoingi 2015). It is anticipated that results from multiple thresholds within a single study may be reported in many instances. Where multiple thresholds are to be selected for the review, data from the same participants may be used more than once in each analysis. For the analysis of summary curves, however, we will select standard or most commonly used thresholds from each study; failing that, we will select one threshold at random.
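The study-level estimates shown on a coupled forest plot can be sketched as follows. This covers only the exploratory step, not the hierarchical meta-analysis. The use of the Wilson score interval is an assumption made for illustration, as the protocol text here does not specify an interval method, and the function names and counts are hypothetical.

```python
import math


def wilson_interval(successes, total, z=1.959964):
    """Wilson score interval for a proportion (z defaults to a 95% level)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def study_accuracy(tp, fp, fn, tn):
    """Sensitivity and specificity with intervals for one study's 2x2 table."""
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one diseased and one non-diseased participant")
    return {
        "sensitivity": tp / (tp + fn),
        "sens_ci": wilson_interval(tp, tp + fn),
        "specificity": tn / (tn + fp),
        "spec_ci": wilson_interval(tn, tn + fp),
    }


# Hypothetical per-patient 2x2 table for one imaging study.
res = study_accuracy(tp=30, fp=10, fn=10, tn=90)
print(res)
```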

For the overview, we will perform both direct and indirect test comparisons, the latter being required because it is anticipated that comparative studies may be scarce (Takwoingi 2013). To formally compare index tests, we will add a covariate for test type to a hierarchical model. We will use likelihood ratio tests to assess the statistical significance of differences in test accuracy (sensitivity and specificity) for analyses of summary points and shape and accuracy for analyses of summary curves, by comparing models without the covariate terms with models containing the covariate terms.
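The likelihood ratio test described above compares the maximised log-likelihoods of two nested models. A minimal sketch is given below, assuming the log-likelihoods have already been obtained from the fitted hierarchical models. The function names and the example values are hypothetical, and only 1 or 2 degrees of freedom are supported because those cases have simple closed forms.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square distribution (df 1 or 2 only)."""
    if x < 0:
        raise ValueError("statistic must be non-negative")
    if df == 1:
        return math.erfc(math.sqrt(x / 2))
    if df == 2:
        return math.exp(-x / 2)
    raise ValueError("this sketch only supports df = 1 or df = 2")


def likelihood_ratio_test(loglik_without, loglik_with, df):
    """Return (statistic, p-value) for nested models differing by df parameters."""
    statistic = max(0.0, 2 * (loglik_with - loglik_without))
    return statistic, chi2_sf(statistic, df)


# Hypothetical fits: adding test type as a covariate on both sensitivity and
# specificity adds two parameters, so df = 2.
stat, p_value = likelihood_ratio_test(loglik_without=-105.2, loglik_with=-101.7, df=2)
print(round(stat, 3), round(p_value, 4))
```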

We will conduct analyses using Review Manager 5 (Review Manager 2014), the NLMIXED procedure in the statistical software SAS version 9.4 (SAS 2016) and the meqrlogit command in the statistical software STATA 15 (STATA 2017).

Investigations of heterogeneity

We will initially examine heterogeneity between studies by visually inspecting the forest plots of sensitivity and specificity and summary ROC plots. Where a sufficient number of studies has assessed the same index test and the characteristics of interest (see Secondary objectives) were adequately reported to enable analyses, we will perform meta-regression by adding the potential source of heterogeneity as a covariate to a hierarchical model. We will apply a minimum requirement of at least five studies in each subgroup; we will only report heterogeneity analyses with fewer than five studies per group when we can be convinced that models have achieved adequate convergence and that the distribution of studies across groups is adequate to provide valid estimates. Where factors to be investigated (e.g. AJCC stage of disease) could vary between participants within a study, we will rely on the inclusion criteria set out by study authors (such as restriction to stage II or to stage III or IV melanoma), or use the results of any subgroup analyses within a study to examine the effect of that covariate. We will assess each of the factors listed under the secondary objectives where possible.
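The default five-studies-per-subgroup requirement can be checked mechanically before any meta-regression is attempted. The sketch below is illustrative only: the function names, study labels and subgroup labels are hypothetical, and it does not encode the exception for smaller subgroups, which depends on a judgement about model convergence.

```python
from collections import Counter

MIN_STUDIES_PER_SUBGROUP = 5


def subgroup_counts(study_subgroups):
    """Count studies per subgroup from a {study_id: subgroup} mapping."""
    return Counter(study_subgroups.values())


def meets_default_minimum(study_subgroups, minimum=MIN_STUDIES_PER_SUBGROUP):
    """True if there are at least two subgroups and each has enough studies."""
    counts = subgroup_counts(study_subgroups)
    return len(counts) >= 2 and all(n >= minimum for n in counts.values())


# Hypothetical allocation of eight studies by staging context.
studies = {f"study_{i}": "primary staging" for i in range(5)}
studies.update({f"study_{i}": "staging of recurrence" for i in range(5, 8)})
print(subgroup_counts(studies), meets_default_minimum(studies))
```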

Sensitivity analyses

If a sufficient number of studies assess the same index test, we will perform sensitivity analyses restricting analyses according to:

  • those with direct test comparisons (where the period of application between the index tests was within one month);

  • where concerns around applicability for participant selection are low;

  • where there was low risk of bias for the index test; and

  • where there was low risk of bias for the reference standard.

As for the Investigations of heterogeneity above, we will require a minimum of at least five studies before we conduct sensitivity analyses.

Assessment of reporting bias

Because of uncertainty about the determinants of publication bias for diagnostic accuracy studies and the inadequacy of tests for detecting funnel plot asymmetry (Deeks 2005), we will not perform tests to detect publication bias.

Acknowledgements

The Cochrane Skin editorial base wishes to thank Luigi Naldi, the key Editor for this protocol; Brian Stafford, the consumer referee; and the clinical referees, An-Wen Chan and Chante Karimkhani. We also thank Clare Dooley, the copy-editor of this protocol.

We also wish to thank the Cochrane DTA editorial base and colleagues.

Appendices

Appendix 1. Current content and structure of the Programme Grant

No. | List of reviews | Estimated number of studies
Diagnosis of melanoma
1 | Visual inspection | 50
2 | Dermoscopy | 88
3 | Teledermatology | 15
4 | Mobile phone applications | 2
5a | Computer-aided diagnosis – dermoscopy-based techniques | 37
5b | Computer-aided diagnosis – spectroscopy-based techniques | This review will be amalgamated into 5a
6 | Reflectance confocal microscopy | 19
7 | High frequency ultrasound | 5
8 | Overview: Comparing the accuracy of tests for which sufficient evidence is identified either alone or in combination | Number not estimable
Diagnosis of keratinocyte skin cancer (BCC and cSCC)
9 | Visual inspection +/- Dermoscopy | 22
10a | Computer-aided diagnosis – dermoscopy-based techniques | 3
10b | Computer-aided diagnosis – spectroscopy-based techniques | This review will be amalgamated into 10a
11 | Optical coherence tomography | 5
12 | Reflectance confocal microscopy | 9
13 | Exfoliative cytology | 9
14 | Overview: Comparing the accuracy of tests for which sufficient evidence is identified either alone or in combination | Number not estimable
Staging of melanoma
15 | Ultrasound | 25 - 30
16 | CT | 5 - 10
17 | PET or PET-CT | 20 - 25
18 | MRI | 5
19 | Sentinel lymph node biopsy +/- high frequency ultrasound | 70
20 | Overview: Comparing the accuracy of tests for which sufficient evidence is identified either alone or in combination | Number not estimable
Staging of cSCC
21 | Imaging tests review | 10 - 15
22 | Sentinel lymph node biopsy +/- high frequency ultrasound | 15 - 20

Appendix 2. MEDLINE (OVID) search strategy

Database: Ovid MEDLINE(R) <1946 to August 2016 (as run on 28 August 2016) FINAL

Amended Search Strategy:

1 exp melanoma/

2 exp skin cancer/

3 exp basal cell carcinoma/

4 exp Neoplasms, basal cell/

5 basalioma$1.ti,ab.

6 ((basal cell or skin) adj2 (cancer$ or carcinoma$1 or mass or masses or tumour$ or tumor$ or neoplasm$1 or adenoma$1 or epithelioma$1 or lesion$1 or malignan$ or nodule$1)).ti,ab.

7 (pigmented adj2 (lesion$1 or mole$ or nevus or nevi or naevus or naevi or skin)).ti,ab.

8 (melanom$ or nonmelanoma$ or non-melanoma$ or melanocyt$ or non-melanocyt$ or nonmelanocyt$ or keratinocyt$).ti,ab.

9 nmsc.ti,ab.

10 rodent ulcer$.ti,ab.

11 (squamous cell adj2 (cancer$ or carcinoma$1 or mass or masses or tumor$1 or tumour$1 or neoplasm$1 or adenoma$1 or epithelioma$1 or epithelial or lesion$1 or malignan$ or nodule$1) adj2 (skin or epiderm$ or cutaneous)).ti,ab.

12 (BCC or CSCC or NMSC).ti,ab.

13 keratinocy$.ti,ab.

14 Keratinocytes/

15 or/1-14 (253324)

16 dermoscop$.ti,ab.

17 dermatoscop$.ti,ab.

18 photomicrograph$.ti,ab.

19 exp epiluminescence microscopy/

20 Microscopy, Confocal/

21 (epiluminescence adj2 microscop$).ti,ab.

22 (confocal adj2 microscop$).ti,ab.

23 Tomography, Optical Coherence/

24 Dielectric Spectroscopy/

25 Cytodiagnosis/

26 (incident light adj2 microscop$).ti,ab.

27 (surface adj2 microscop$).ti,ab.

28 (visual adj (inspect$ or examin$)).ti,ab.

29 ((clinical or physical) adj examin$).ti,ab.

30 3 point.ti,ab.

31 three point.ti,ab.

32 pattern analys$.ti,ab.

33 ABCD$.ti,ab.

34 menzies.ti,ab.

35 7 point.ti,ab.

36 seven point.ti,ab.

37 (digital adj2 (dermoscop$ or dermatoscop$)).ti,ab.

38 artificial intelligence.ti,ab.

39 AI.ti,ab.

40 computer assisted.ti,ab.

41 computer aided.ti,ab.

42 neural network$.ti,ab.

43 exp diagnosis, computer-assisted/

44 MoleMax.ti,ab.

45 image process$.ti,ab.

46 automatic classif$.ti,ab.

47 image analysis.ti,ab.

48 SIAscop$.ti,ab.

49 Aura.ti,ab.

50 (optical adj2 scan$).ti,ab.

51 MelaFind.ti,ab.

52 SIMSYS.ti,ab.

53 MoleMate.ti,ab.

54 SolarScan.ti,ab.

55 VivaScope.ti,ab.

56 (high adj3 ultraso$).ti,ab.

57 (canine adj2 detect$).ti,ab.

58 ((mobile or cell or cellular or smart) adj ((phone$1 adj2 app$1) or application$1)).ti,ab.

59 smartphone$.ti,ab.

60 (DermoScan or SkinVision or DermLink or SpotCheck).ti,ab.

61 Mole Detective.ti,ab.

62 Spot Check.ti,ab.

63 (mole$1 adj2 map$).ti,ab.

64 (total adj2 body).ti,ab.

65 exfoliative cytolog$.ti,ab.

66 digital analys$.ti,ab.

67 (imag$ adj3 software).ti,ab.

68 (teledermatolog$ or tele-dermatolog$ or telederm or tele-derm or teledermoscop$ or tele-dermoscop$ or teledermatoscop$ or tele-dermatoscop$).ti,ab.

69 (optical coherence adj (technolog$ or tomog$)).ti,ab.

70 OCT.ti,ab.

71 (computer adj2 diagnos$).ti,ab.

72 exp sentinel lymph node biopsy/

73 (sentinel adj2 node).ti,ab.

74 nevisense.mp. or HFUS.ti,ab.

75 electrical impedance spectroscopy.ti,ab.

76 history taking.ti,ab.

77 patient history.ti,ab.

78 (naked eye adj (exam$ or assess$)).ti,ab.

79 (skin adj exam$).ti,ab.

80 physical examination/

81 ugly duckling.mp. or UD.ti,ab.

82 ((physician$ or clinical or physical) adj (exam$ or triage or recog$)).ti,ab.

83 ABCDE.mp. or VOC.ti,ab.

84 clinical accuracy.ti,ab.

85 Family Practice/ or Physicians, Family/ or clinical competence/

86 (confocal adj2 microscop$).ti,ab.

87 diagnostic algorithm$1.ti,ab.

88 checklist$.ti,ab.

89 virtual imag$.ti,ab.

90 volatile organic compound$1.ti,ab.

91 dog$1.ti,ab.

92 gene expression analy$.ti,ab.

93 reflex transmission imag$.ti,ab.

94 thermal imaging.ti,ab.

95 elastography.ti,ab.

96 or/16-95 (849678)

97 (CT or PET).ti,ab.

98 PET-CT.ti,ab.

99 (FDG or F18 or Fluorodeoxyglucose or radiopharmaceutical$).ti,ab.

100 exp Deoxyglucose/

101 deoxy-glucose.ti,ab.

102 deoxyglucose.ti,ab.

103 CATSCAN.ti,ab.

104 exp Tomography, Emission-Computed/

105 exp Tomography, X-ray computed/

106 positron emission tomograph$.ti,ab.

107 exp magnetic resonance imaging/

108 (MRI or fMRI or NMRI or scintigraph$).ti,ab.

109 exp echography/

110 Doppler echography.ti,ab.

111 sonograph$.ti,ab.

112 ultraso$.ti,ab.

113 doppler.ti,ab.

114 magnetic resonance imag$.ti,ab.

115 or/97-114 (1337432)

116 (stage$ or staging or metasta$ or recurr$ or advanced or sensitivity or specificity or false negative$ or thickness$).ti,ab.

117 "Sensitivity and Specificity"/

118 exp cancer staging/

119 or/116-118 (2164365)

120 115 and 119

121 96 or 120

122 15 and 121 (18542)
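The way the final lines of the strategy combine the four blocks of terms can be sketched with set operations: line 120 intersects the imaging terms (line 115) with the staging and accuracy terms (line 119), line 121 unions that result with the other index test terms (line 96), and line 122 intersects the union with the target condition terms (line 15). The function name and record identifiers below are hypothetical.

```python
def combine_search_sets(condition, other_tests, imaging_tests, staging_terms):
    """Mirror lines 120 to 122 of the strategy using Python sets of record IDs."""
    line_120 = imaging_tests & staging_terms   # 115 and 119
    line_121 = other_tests | line_120          # 96 or 120
    return condition & line_121                # 15 and 121


# Hypothetical record IDs retrieved by each block of the strategy.
final_set = combine_search_sets(
    condition={1, 2, 3, 4, 5},       # line 15
    other_tests={2, 9},              # line 96
    imaging_tests={3, 4, 7},         # line 115
    staging_terms={4, 5, 7},         # line 119
)
print(sorted(final_set))
```

In this arrangement, imaging records are only retained when they also match a staging or accuracy term, whereas records matching the other index test terms are retained without that restriction.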

Appendix 3. Full-text exclusion criteria

Response: enter X if any of the exclusion criteria are met.

The study:

  • is not a primary study

  • is a conference abstract only

  • is a systematic review

  • does not allow accuracy to be estimated separately for either melanoma or cSCC participants

  • (for SLNB) does not report outcomes for both SLN+ and SLN- patients

  • (for SLNB) does not report recurrence in the investigated nodal basin

  • (for imaging) does not report detection of nodal or distant recurrence (or any recurrence)

  • (for melanoma only) includes < 5 diseased or < 5 nondiseased participants

  • (for cSCC) no sample size limit

  • evaluates an ineligible index test (eligible tests are SLNB, US, CT, PET or PET-CT, MRI)

  • is a surveillance (follow-up) study using repeat or serial imaging

  • does not use an eligible reference standard

  • does not assess test accuracy (i.e. 2 x 2 cannot be derived)

cSCC: cutaneous squamous cell carcinoma.
SLNB: sentinel lymph node biopsy.
US: ultrasound.
CT: computed tomography.
PET: positron emission tomography.
PET-CT: positron emission tomography-computed tomography.
MRI: magnetic resonance imaging.

Appendix 4. QUADAS interpretation

Item Response (delete as required)
PARTICIPANT SELECTION (1) - RISK OF BIAS
1) Was a consecutive or random sample of participants or images enrolled?

Yes - if paper states consecutive or random

No – if paper describes other method of sampling

Unclear – if participant sampling not described

2) Was a case-control design avoided?

Yes - if consecutive or random or case-control design clearly not used

No – if study described as case-control or describes sampling specific numbers of participants with particular diagnoses

Unclear – if not described

3) Did the study avoid inappropriate exclusions (for both melanoma and cutaneous squamous cell carcinoma (cSCC) staging), e.g. exclusion of people with primary tumours at sites such as the head and neck, or of those with unsuccessfully mapped sentinel lymph nodes?

Yes - if inappropriate exclusions were avoided

No – if lesions were excluded that might affect test accuracy, e.g. indeterminate results or where disagreement between evaluators was observed

Unclear – if not clearly reported

4) For between-person comparative (BPC) studies only (i.e. allocating different tests to different study participants such as randomised controlled trials (RCTs)): 
  • a) were the same participant selection criteria used for those allocated to each test?

Yes - if same selection criteria were used for each index test

No – if different selection criteria were used for each index test

Unclear – if selection criteria per test were not described

N/A – if only one index test was evaluated or all participants received all tests

  • b) was the potential for biased allocation between tests avoided through adequate generation of a randomised sequence?

Yes - if adequate randomisation procedures are described

No – if inadequate randomisation procedures are described

Unclear – if the method of allocation to groups is not described (a description of ‘random’ or ‘randomised’ is insufficient)

N/A – if only one index test was evaluated or all participants received all tests

  • c) was the potential for biased allocation between tests avoided through concealment of allocation prior to assignment?

Yes - if appropriate methods of allocation concealment are described

No – if appropriate methods of allocation concealment are not described

Unclear – if the method of allocation concealment is not described (sufficient detail to allow a definite judgement is required)

N/A – if only one index test was evaluated

Could the selection of participants have introduced bias?
  • FOR NON-COMPARATIVE (NC) STUDIES
If answers to all of questions 1) and 2) and 3) were ‘Yes’: Risk is Low
If the answer to any one of questions 1) or 2) or 3) was ‘No’: Risk is High
If the answer to any one of questions 1) or 2) or 3) was ‘Unclear’: Risk is Unclear
  • FOR BETWEEN-PERSON COMPARATIVE STUDIES
If answers to all of questions 1) and 2) and 3) and 4) were ‘Yes’: Risk is Low
If the answer to any one of questions 1) or 2) or 3) or 4) was ‘No’: Risk is High
If the answer to any one of questions 1) or 2) or 3) or 4) was ‘Unclear’: Risk is Unclear
PARTICIPANT SELECTION (1) - CONCERNS REGARDING APPLICABILITY
For sentinel lymph node biopsy and imaging tests:
1) Does the study report results for participants unselected by stage of disease or site of primary lesion, i.e. the study does not focus solely on those with a particular stage of disease such as AJCC I or melanoma <=1 mm in thickness?

Yes - if an unrestricted group of participants have been included

No - if a selected group of study participants have been included, e.g. those with clinical stage I disease or only those with thin melanoma

Unclear – if insufficient details are provided to determine the spectrum of included participants

2) Did the study report data on a per-patient rather than per-lesion basis?

Yes – if a per-patient analysis was reported

No – if a per-lesion analysis only was reported

Unclear – if it is not possible to assess whether data are presented on a per-patient or per-lesion basis

For imaging tests only:
3) Does the study focus primarily on participants undergoing primary staging or those undergoing staging for disease recurrence?

Yes - if at least 80% of study participants are undergoing primary staging following diagnosis of a primary cutaneous melanoma or staging of recurrence

No - if less than 80% of study participants are undergoing primary staging following diagnosis of a cutaneous melanoma or staging of recurrence

Unclear – if insufficient details are provided to determine the proportion of patients undergoing primary staging versus those undergoing staging of recurrence

Is there concern that the included participants do not match the review question?
If the answers to questions 1) and 2) (and 3) for imaging tests) were ‘Yes’: Concern is Low
If the answer to any one of questions 1) or 2) (or 3) for imaging tests) was ‘No’: Concern is High
If the answer to any one of questions 1) or 2) (or 3) for imaging tests) was ‘Unclear’: Concern is Unclear
INDEX TEST (2) - RISK OF BIAS (to be completed per test evaluated)
1) Was the index test or testing strategy result interpreted without knowledge of the results of the reference standard?

Yes - if index test described as interpreted without knowledge of reference standard result, or for prospective studies, if index test is always conducted and interpreted prior to the reference standard

No – if index test described as interpreted in knowledge of reference standard result

Unclear – if index test blinding is not described

2) Was the diagnostic threshold at which the test was considered positive prespecified?

Yes - if threshold was prespecified (i.e. prior to analysing study results)

No - if threshold was not prespecified

Unclear - if not possible to tell whether or not diagnostic threshold was prespecified

For imaging tests only:
3) For studies reporting the accuracy of multiple diagnostic thresholds (tumour characteristic or parameter) for the same index test, was each threshold interpreted without knowledge of the results of the others?

Yes - if thresholds were selected prospectively and each was interpreted by a different reader, or if study implements a retrospective (or no) cutoff

No - if study uses prospective threshold and report states reported by same reader

Unclear - if no mention of number of readers for each threshold or if pre-specification of threshold not reported

N/A - multiple diagnostic thresholds not reported for the same index test

4) For within-person comparisons (WPC) of index tests or testing strategies (i.e. > 1 index test applied per participant), was each index test result interpreted without knowledge of the results of other index tests or testing strategies?

Yes - if all index tests were described as interpreted without knowledge of the results of the others

No - if the index tests were described as interpreted in the knowledge of the results of the others

Unclear – if it is not possible to tell whether knowledge of other index tests could have influenced test interpretation

N/A – if only one index test was evaluated

Could the conduct or interpretation of the index test have introduced bias? 
  • FOR NC and BPC STUDIES item 3) / 4) to be added
If answers to questions 1) and 2) were ‘Yes’: Risk is Low
If the answer to either question 1) or 2) was ‘No’: Risk is High
If the answer to either question 1) or 2) was ‘Unclear’: Risk is Unclear
  • FOR WPC STUDIES
If answers to all questions 1), 2) for any index test and 3) were ‘Yes’: Risk is Low
If the answer to any one of questions 1) or 2) for any index test or 3) was ‘No’: Risk is High
If the answer to any one of questions 1) or 2) for any index test or 3) was ‘Unclear’: Risk is Unclear
INDEX TEST (2) - CONCERN ABOUT APPLICABILITY

1) Were thresholds or criteria for diagnosis reported in sufficient detail to allow replication?

This item applies equally to studies using objective and more subjective approaches to test interpretation. For SLNB studies, this requires description of the tracer threshold for identification of the SLN and the histological assessment.

Yes – if the criteria for diagnosis of the target disorder were reported in sufficient detail to allow replication

No – if the criteria for diagnosis of the target disorder were not reported in sufficient detail to allow replication

Unclear – if some but not sufficient information on criteria for diagnosis to allow replication were provided

2) Was the test interpretation carried out by an experienced examiner?

Yes – if the test was interpreted by an experienced examiner as defined in the review protocol

No – if the test was not interpreted by an experienced examiner (see above)

Unclear – if the experience of the examiner(s) was not reported in sufficient detail to judge or if examiners described as 'Expert' with no further detail given

Is there concern that the index test, its conduct, or interpretation differ from the review question?
If answers to questions 1) and 2) were ‘Yes’: Concern is Low
If the answer to question 1) or 2) was ‘No’: Concern is High
If the answer to question 1) or 2) was ‘Unclear’: Concern is Unclear
REFERENCE STANDARD (3) - RISK OF BIAS
1) Is the reference standard likely to correctly classify the target condition? 

a) DISEASE POSITIVE - One or more of:

- Histological confirmation of metastases following lymph node dissection (or SLNB or core biopsy for imaging studies)

- Clinical/radiological follow up to identify clinically detectable disease in a mapped nodal basin (SLNB studies)

- Clinical/radiological follow up to identify any metastases (imaging studies) subsequently confirmed on histology

Yes – if all disease positive participants underwent one of the listed reference standards

No – if a final diagnosis for any disease positive participant was reached without histopathology

Unclear – if the method of final diagnosis was not reported for any disease positive participant

b) DISEASE NEGATIVE - One or more of:

- Histological confirmation of absence of disease in a mapped nodal basin following lymph node dissection (or following SLNB for imaging studies)

- Clinical/radiological follow up of test negative participants

Yes – if at least 90% of disease negative participants underwent one of the listed reference standards

No – if more than 10% of benign diagnoses were reached by concurrent imaging test

Unclear – if the method of final diagnosis was not reported for any participant with benign or disease negative diagnosis

2) Were the histology-based reference standard results interpreted without knowledge of the results of the index test?

Yes – if the histopathologist was described as blinded to the index test result

No – if the histopathologist was described as having knowledge of the index test result

Unclear – if blinded histology interpretation was not clearly reported

3) Were the reference standard results based on patient follow-up interpreted without knowledge of the results of the index test?

Yes – if the clinician or radiologist was described as blinded to the index test result

No – if the clinician or radiologist was described as having knowledge of the index test result

Unclear – if blinded interpretation was not clearly reported

Could the reference standard, its conduct, or its interpretation have introduced bias? 
If the answers to questions 1), 2), and 3) were all ‘Yes’: Risk is Low
If the answer to question 1), 2), or 3) was ‘No’: Risk is High
If the answer to question 1), 2), or 3) was ‘Unclear’: Risk is Unclear
REFERENCE STANDARD (3) - CONCERN ABOUT APPLICABILITY
1) Does the study use the same definition of disease positive as the primary review question or is it possible to fully disaggregate data such that data matching the review question can be extracted?

Yes – same definition of disease positive used, or patients can be disaggregated and regrouped according to review definition

No – some patients cannot be disaggregated

For SLNB review – disease positive includes participants with any nodal recurrence (not restricted to clinical recurrence in same nodal basin)

For imaging reviews – participants with nodal versus distant recurrences cannot be disaggregated

Unclear – definition of disease positive not clearly reported

For studies of imaging tests:
2) The result of another imaging test (without patient follow-up to determine later emergence of disease) was not used as a reference standard

Yes – if imaging-based diagnosis was not used as a reference standard for any participant

No – if imaging-based diagnosis was used as a reference standard for any participant

Unclear – if not clearly reported

3) Item on observer experience could be included?

Is there concern that the target condition as defined by the reference standard does not match the review question?

 
If the answers to questions 1), 2), and 3) were all ‘Yes’: Concern is Low
If the answer to any one of questions 1), 2), or 3) was ‘No’: Concern is High
If the answer to any one of questions 1), 2), or 3) was ‘Unclear’: Concern is Unclear
***For teledermatology studies only:
If the answers to questions 1) and 3) were both ‘Yes’: Concern is Low
If the answer to question 1) or 3) was ‘No’: Concern is High
If the answer to question 1) or 3) was ‘Unclear’: Concern is Unclear
FLOW AND TIMING (4): RISK OF BIAS
1) Was there an appropriate interval between index test and reference standard? 
  • a) For index test positive participants, was the interval between index test and histological reference standard <= 1 month?

Yes – if study reports <= 1 month between index and histological reference standard

No – if study reports > 1 month between index and histological reference standard

Unclear – if study does not report interval between index and histological reference standard

  • b) If reference standard is clinical or imaging-based follow up of index test negative participants, was there less than 6 months between application of index test(s) and first follow-up visit?

Yes – if study reports a follow-up visit within 6 months of application of the index test

No – if study reports the first follow-up visit beyond 6 months of the index test

Unclear – if study does not report timing of follow-up visits

2) Did all participants receive the same reference standard?

Yes – if all participants underwent the same reference standard

No – if more than one reference standard was used

Unclear – if not clearly reported

3) Were all participants included in the analysis?

Yes – if all participants were included in the analysis

No – if some participants were excluded from the analysis

Unclear – if not clearly reported

4) For WITHIN-PERSON COMPARISONS (WPC) of index tests:

Was the interval between application of index tests <= 1 month?

Yes – if study reports <= 1 month between index tests

No – if study reports > 1 month between index tests

Unclear – if study does not report interval between index tests

Could the participant flow have introduced bias?
• FOR NON-COMPARATIVE and BPC STUDIES
If the answers to questions 1), 2), and 3) were all ‘Yes’: Risk is Low
If the answer to any one of questions 1), 2), or 3) was ‘No’: Risk is High
If the answer to any one of questions 1), 2), or 3) was ‘Unclear’: Risk is Unclear
• FOR WITHIN-PERSON COMPARATIVE STUDIES (WPC)
If the answers to questions 1), 2), 3), and 4) were all ‘Yes’: Risk is Low
If the answer to any one of questions 1), 2), 3), or 4) was ‘No’: Risk is High
If the answer to any one of questions 1), 2), 3), or 4) was ‘Unclear’: Risk is Unclear
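
The judgement rules used throughout this checklist (all ‘Yes’ gives Low; any ‘No’ gives High; any ‘Unclear’ gives Unclear) could be applied programmatically along the following lines. This is only an illustrative sketch: the function name domain_judgement is hypothetical, and the checklist does not state which rule takes precedence when a domain has both a ‘No’ and an ‘Unclear’ answer, so the sketch assumes that ‘No’ wins.

```python
from typing import Iterable

ALLOWED_ANSWERS = {"yes", "no", "unclear"}


def domain_judgement(answers: Iterable[str]) -> str:
    """Combine signalling-question answers into Low / High / Unclear.

    Assumption (not stated in the checklist): when both 'No' and
    'Unclear' are present, 'No' takes precedence and the result is High.
    """
    normalised = [answer.strip().lower() for answer in answers]
    if not normalised:
        raise ValueError("at least one signalling question answer is required")
    unexpected = set(normalised) - ALLOWED_ANSWERS
    if unexpected:
        raise ValueError(f"unexpected answers: {sorted(unexpected)}")

    if "no" in normalised:
        return "High"
    if "unclear" in normalised:
        return "Unclear"
    return "Low"  # every answer was 'Yes'


# Non-comparative study, flow and timing domain: questions 1) to 3)
print(domain_judgement(["Yes", "Yes", "Unclear"]))

# Within-person comparative study: questions 1) to 4)
print(domain_judgement(["Yes", "Yes", "Yes", "Yes"]))
```

The same function serves both risk of bias and applicability domains; only the list of answers passed in changes with the study design.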

Appendix 5. Calculation of diagnostic accuracy statistics

i) Contingency table (2x2 table)

                              Reference standard
                              +ve (Diseased)         -ve (Nondiseased)
Index test result    +ve      True positives (a)     False positives (b)     Total test positive
                     -ve      False negatives (c)    True negatives (d)      Total test negative
                              Total diseased         Total nondiseased

ii) Diagnostic accuracy indices

Sensitivity: proportion of diseased who have positive test results

True positives / total diseased

a / (a + c)

Specificity: proportion of nondiseased who have negative test results

True negatives / total nondiseased

d / (b + d)

Positive predictive value (PPV): proportion with positive test result who actually have the disease

True positives / total test positive

a / (a + b)

Negative predictive value (NPV): proportion with negative test result who really do not have the disease

True negatives / total test negative

d / (c + d)
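
As a worked illustration of the four indices above, the following sketch computes them from the cells a, b, c and d of a 2x2 table. The class and function names are hypothetical, the counts are invented purely for illustration, and returning None for an empty denominator is an assumption of this sketch rather than something specified in the protocol.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class TwoByTwo:
    a: int  # true positives
    b: int  # false positives
    c: int  # false negatives
    d: int  # true negatives


def _proportion(numerator: int, denominator: int) -> Optional[float]:
    # Undefined when the denominator is zero (e.g. no diseased participants)
    return numerator / denominator if denominator > 0 else None


def accuracy_indices(t: TwoByTwo) -> Dict[str, Optional[float]]:
    return {
        "sensitivity": _proportion(t.a, t.a + t.c),  # a / (a + c)
        "specificity": _proportion(t.d, t.b + t.d),  # d / (b + d)
        "ppv": _proportion(t.a, t.a + t.b),          # a / (a + b)
        "npv": _proportion(t.d, t.c + t.d),          # d / (c + d)
    }


# Invented counts, for illustration only
example = TwoByTwo(a=45, b=10, c=5, d=140)
for name, value in accuracy_indices(example).items():
    print(f"{name}: {value:.3f}")
```

With these invented counts, sensitivity is 45 / 50 and specificity is 140 / 150; note that PPV and NPV, unlike sensitivity and specificity, depend on the proportion of diseased participants in the study sample.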

Contributions of authors

JDi was the contact person with the editorial base.
JDi co-ordinated the contributions from the co-authors and wrote the final draft of the protocol.
JDi, CD, NC, JDe, YT, SB worked on the Methods sections.
JDi, DS, STC, RM, HW drafted the clinical sections of the Background and responded to the clinical comments of the referees.
JDi, JJD, YT, CD responded to the methodology and statistics comments of the referees.
JDi, HW, RM, CD, NC, JDe, YT, SB, DS, STC contributed to writing the protocol.
KG was the consumer co-author and checked the protocol for readability and clarity. She also ensured that the outcomes are relevant to consumers.
JDi is the guarantor of the final review.

Disclaimer

This project is supported by the National Institute for Health Research (NIHR), via Cochrane Infrastructure funding to the Cochrane Skin Group and Cochrane Programme Grant funding. The views and opinions expressed therein are those of the authors and do not necessarily reflect those of the Systematic Reviews Programme, NIHR, National Health Service (NHS) or the Department of Health.

Declarations of interest

Jacqueline Dinnes: nothing to declare.
Daniel Saleh: nothing to declare.
Julia Newton-Bishop: she has undertaken legal work, offering advice on patient complaint issues, paid for by patients. Her research group, the Leeds Institute of Cancer & Pathology (LICP) Melanoma Research Group (MRG), is in receipt of a number of research grants from Cancer Research UK, the Medical Research Council, the National Institutes of Health, and the Melanoma Research Alliance. She received a single honorarium from conference organisers for a talk given at an academic meeting in Bergen, which she paid into the MRG research account. None of the entities listed are commercial sponsors.
Seau Tak Cheung: nothing to declare.
Paul Nathan: he has received consultancy fees from Bristol Myers Squibb (BMS), Pfizer, Merck Sharp Dohme (MSD), Merck, and Immunocore to sit on advisory boards. He has received payment from BMS and Novartis for lectures at satellite symposia; payment from BMS for webcasts; payment of travel, accommodations, and meeting expenses from BMS and MSD for attending conferences of the American Society of Clinical Oncology, Society for Melanoma Research, and European Society of Medical Oncology.
Rubeta N Matin: nothing to declare.
Naomi Chuchu: nothing to declare.
Susan E Bayliss: nothing to declare.
Yemisi Takwoingi: nothing to declare.
Clare Davenport: nothing to declare.
Kathie Godfrey: nothing to declare.
Colette O'Sullivan: nothing to declare.
Jonathan J Deeks: nothing to declare.
Hywel C Williams: nothing to declare.

Sources of support

Internal sources

  • No sources of support supplied

External sources

  • NIHR Systematic Review Programme, UK.

  • National Institute for Health Research (NIHR), UK.

    The NIHR, UK, is the largest single funder of the Cochrane Skin Group
