Diagnostic tests for oral cancer and potentially malignant disorders in patients presenting with clinically evident lesions

Joseph LY Liu; Tanya Walsh; Alexander R Kerr; Mark Lingen; Paul Brocklehurst; Graham Ogden; Saman Warnakulasuriya; Crispian Scully

doi:10.1002/14651858.CD010276

Version published: 12 December 2012

Abstract

This is a protocol for a Cochrane Review (Diagnostic test accuracy). The objectives are as follows:

To estimate the accuracy of index tests for the detection of oral cancer and potentially malignant disorders of the lip and oral cavity in patients presenting with clinically evident lesions.

Background

Target condition being diagnosed

The target conditions of interest are oral squamous cell carcinoma (OSCC), the most common form of oral cavity cancer (Scully 2000a), and potentially malignant disorders (PMD), of the lip and oral cavity in patients presenting with clinically evident lesions. A variety of terms have been used internationally to describe clinical presentations that have the potential to become cancer. At a meeting of international oral cancer and precancer experts held in 2005, the concept of precancer, along with issues surrounding classification and definition, aetiology, diagnosis and management was extensively discussed. Through consensus, the term 'potentially malignant disorders' was selected to convey the fact that not all precancerous lesions and conditions will transform to cancer, but there is the potential for malignant transformation (van der Waal 2009; Warnakulasuriya 2007).

The natural history of oral cancer is not fully understood, given variations in disease processes and dysplastic changes in PMD (Napier 2008; Scully 2009). Most oral carcinomas are preceded by PMD, of which erythroplakia, non‐homogeneous leukoplakia, erosive lichen planus, oral submucous fibrosis and actinic keratosis are perhaps the most important (Warnakulasuriya 2007). The concept of a two‐step process of cancer development of the oral mucosa is established (i.e. precursor to established lesion). Oral leukoplakia is the best‐known precursor lesion, and between less than 1% and 18% of lesions develop into oral cancer. The original 1978 World Health Organization (WHO) definition of oral leukoplakia has been revised to read "The term leukoplakia should be used to recognize white plaques of questionable risk having excluded (other) known diseases or disorders that carry no increased risk for cancer" (Warnakulasuriya 2007). The presence of epithelial dysplasia can help predict malignant development but the diagnosis is essentially subjective, with not all lesions exhibiting dysplasia, some becoming malignant and some regressing. Carcinoma can also develop from lesions in which epithelial dysplasia was not previously diagnosed. Numerous attempts have been made to relate biological characteristics to the malignant potential of leukoplakias, but a definitive characteristic remains elusive (Reibul 2003). Estimates of malignant transformation rates (MTR) vary enormously, from site to site within the mouth, from population to population and from study to study (Napier 2008). MTRs from hospital‐based surveys are consistently higher than those from community‐based studies because of sampling bias. Petti calculated a global MTR of oral leukoplakia of 1.36% per year (95% confidence interval 0.69 to 2.03%) based on the prevalence of leukoplakia (Petti 2003), but this far exceeds the number of actual cases of malignancy.
Virtually all studies emphasize the chronicity of oral PMD, with an increasing tendency to malignant change in the first 5 years. For example, the incidence of OSCC arising from leukoplakia in Californians was greatest in the second year of follow‐up (11 out of 45; 24%) (Silverman 2004). The proportion of PMD that will develop OSCC is uncertain but low; best estimates suggest a rate of less than 2% per year (Napier 2008).

The early detection and excision of some PMD can prevent malignancy, or if malignancy is detected, there is some evidence that appropriate treatment can reduce disease severity and improve survival rates (Brocklehurst 2010; van der Waal 2009; Warnakulasuriya 2007). Leukoplakias can be treated by a number of methods. According to Lodi et al's systematic review (Lodi 2008), the effectiveness of surgical interventions, including laser therapy and cryotherapy, has not been studied by means of a randomised controlled trial (RCT) with a no treatment/placebo arm. Vitamin A and retinoids have been tested by five RCTs, two studies investigated beta carotene or carotenoids, the other drugs tested were bleomycin (one study), mixed tea (one study) and ketorolac (one study). None of the treatments tested showed a benefit when compared with the placebo. Lodi et al concluded that there was no evidence of effective treatment in preventing the malignant transformation of leukoplakia. Where resolution of a lesion is observed, relapses and adverse effects are common.

Technologies to treat and manage oral cancer have progressed substantially, as shown by Cochrane systematic reviews of RCTs (Bessell 2011; Furness 2011; Glenny 2010). Patients presenting with oral lesions persisting for more than 2 to 3 weeks are generally referred to an oral medicine specialist for further investigation and to rule out malignancy (Scully 2000a; Scully 2000b; Scully 2000c). Once progressed to frank malignancy, the traditional treatment is surgery and radiotherapy. More recently, systemic chemotherapy has been included as part of the treatment regimen before or during radiotherapy. Surgery for the treatment of oral cancer is followed by exacting reconstructive surgery to restore form and function. Debilitating side effects can occur as a result of radiotherapy and chemotherapy, adversely affecting an individual's quality of life. The 5‐year survival rate following diagnosis has remained at around 50% for the past 30 years in most countries (Parkin 2001; Warnakulasuriya 2009). This is in marked contrast to the improved survival rates in many other cancers, such as those of the breast and the colon (Cancer Research UK), but may be explained at least in some part by the fact that oral cancer is more often diagnosed at a late stage of the disease, when prognosis is poorer and the risks of significant morbidity and mortality are substantially higher (Rogers 2009; Rusthoven 2010).

Index test(s)

There is no standard practice for patients presenting with clinically evident lesions that may carry a risk of cancer. Factors contributing to this variation include geographical location and access to clinical personnel. A conventional oral examination (COE), a standard visual and tactile examination of the oral mucosa under normal (incandescent) light by a frontline clinician such as a general dentist, is a common starting point. This can be considered as an opportunistic 'screen'. The assumption is that an examination is performed to provide an opportunity for 'case‐finding' where necessary. Upon discovering a lesion, the clinician will make a subjective judgement, based upon the clinical presentation, their clinical experience and training and the resources available to them, to decide on the next step. A fungating ulcerative mass that is obviously an advanced malignancy needs little clinical acumen and initiates an immediate referral. As we move earlier in the disease spectrum, the clinical features become progressively less obvious, and the judgement as to whether a lesion is, or has the potential to become, dysplastic or even malignant, and hence the choice of the next step on the pathway, becomes more difficult.

A number of index tests have been proposed for use by frontline clinicians, specialists and 'super‐specialists' (i.e. those who see surveillance populations) as adjuncts to a conventional oral cancer examination for the purpose of improving diagnostic test accuracy (Fedele 2009; Leston 2010; Lingen 2008; Patton 2008; Rethman 2010). These include:

vital staining (toluidine blue, tolonium chloride);
oral cytology (e.g. OralCDx brush biopsy);
light‐based detection (e.g. ViziLite and ViziLite Plus, Microlux/DL, VELscope, Orascoptic DK, Identafi 3000) and oral spectroscopy; and
blood and saliva analysis.

Vital staining and oral cytology are long‐established diagnostic adjuncts to a conventional oral examination (Leston 2010; Lingen 2008). In this review, we will restrict vital staining index tests to those applied to a lesion that has been visualized. A companion Cochrane review 'Clinical assessment to screen for the detection of oral cavity cancer and potentially malignant disorders in apparently healthy adults' will include vital staining index tests in a rinse form, used as a screening adjunct in a general population (Walsh 2012). Other tests such as light‐based detection systems have become commercially available only more recently. Blood analysis and saliva analysis are more novel tests at an early stage of evaluation.

It is worth noting that there are regional differences in regulations on the use of some of the above tests. For example, toluidine blue has been consistently rejected as a stand‐alone screening technique in the United States and is not cleared for that use, although it is included in the ViziLite Plus system. However, the toluidine blue‐only component is approved by the FDA as a marking device.

There are a number of different uses for such diagnostic adjuncts dependent on the pathway taken by the patient. Of the index tests listed above, all have the potential to be used as diagnostic or case‐finding adjuncts to the COE by frontline clinicians (Additional Table 1), specialists and 'super‐specialists', to aid in the more accurate diagnosis of oral cancer and PMD. By including these tests, the diagnostic process would be identification of clinically evident lesions on the basis of clinical appearance and criteria and/or findings from the index test(s), followed by biopsy where appropriate. The tests could have a triage role in assisting the general dentist or oral specialist to more accurately identify or assess persistent oral lesions of uncertain significance. For instance, traumatic keratoses are common, and referring each patient with a white patch to a specialist to undergo a scalpel biopsy is excessive, and incurs increased financial cost and patient worry. A non‐invasive index test or combination of tests adjunctive to the COE that provided a frontline clinician with a high degree of accuracy would not only reduce the number of patients with benign disease being referred, but could avoid the need for invasive biopsy in patients testing negative.

Table 1. Index tests for oral cancer and PMDs

Test	Characteristics	Classification of response	Other information
Conventional oral examination (COE)	A standard visual and tactile examination of the oral mucosa under normal (incandescent) light.	The presence of an oral mucosal abnormality with a suspicion of malignancy or potential malignancy is classified as a positive test result; the presence of oral mucosal abnormality without a suspicion of malignancy or potential malignancy is classified as a negative test result.	Traditionally used as an oral cancer screen rather than for diagnosis, but its utility is debated (Lingen 2008). Advantages: quick and easy once trained, minimally invasive. Disadvantages: oral mucosal abnormalities are not necessarily clinically or biologically malignant; only a small percentage of leukoplakias are progressive or become malignant, and COE cannot distinguish between those that are and those that are not; some precancerous lesions may exist within oral mucosa that appears clinically normal by COE alone (Lingen 2008).
Vital staining (e.g. toluidine blue, tolonium chloride)	Vital staining refers to the use of dyes such as toluidine blue or tolonium chloride to stain oral mucosa tissues for PMD or malignancy (Leston 2010; Lingen 2008; Patton 2008). The procedure is as follows: (1) pre‐rinse with acetic acid; (2) rinse with water; (3) apply toluidine blue; (4) post‐rinse with acetic acid; (5) rinse with water; (6) observe mucosa to check for staining.	The result of the test is classified as positive if tissue is stained, negative if no tissue is stained, or equivocal if no definitive result can be obtained.	Advantages: ability to define areas that could be malignant or abnormal but cannot be seen; assess the extent of the PMD for excision. Disadvantages: benign inflammatory lesions may also stain; possibility of failure of some cancerous lesions to stain; possibility of failure of some dysplastic lesions (particularly those with a lower grade or with a thick keratotic surface) to stain; variation in test performance depending on how thoroughly the test procedures are followed; contraindicated in those who are known to be allergic to iodine.
Brush cytology (e.g. OralCDx brush biopsy)	Brush cytology refers to the microscopic assessment and interpretation of cell samples from PMD that are removed from the oral mucosa by brushing, smearing, scraping or lavage and then sealed on glass slides. They are then analysed using an imaging system that assesses the sampled cells (Leston 2010; Lingen 2008; Patton 2008).	Following analysis, cytopathologists classify test results as positive, atypical or negative.	Advantages: the ability to collect information from, and detect, large or multiple lesions and to access "the basement membrane collecting cells from all three epithelial layers of the oral mucosa. The liquid‐based cytology reduces the problems relating to sampling and fixation and presents a better cytological morphology" (Divani 2009). Disadvantages: smaller or less obvious lesions may be overlooked; difficulties in detecting lesions when there is necrosis or coagulated blood; inadequate training of operators (Divani 2009); cells are potentially seen out of context.
Light‐based detection (chemiluminescence e.g. ViziLite, ViziLite Plus, Microlux DL; tissue fluorescence imaging e.g. VELscope, Identafi 3000; tissue fluorescence spectroscopy)	Light‐based systems to identify malignant and potentially malignant lesions and to highlight their presence through tissue reflectance (Leston 2010; Lingen 2008; Patton 2008). For example, using Microlux DL, the procedure is as follows (Lingen 2008): (1) pre‐rinse with acetic acid; (2) use the blue‐light source to visually assess the oral cavity. ViziLite Plus also provides a tolonium chloride solution (toluidine blue) to aid in the marking of the lesion for biopsy once the light source is removed.	The result of the test is classed as negative if the appearance of the epithelium is lightly bluish white and positive if the appearance of the epithelium is distinctly white (acetowhite).	Advantages: simple to use; non‐invasive; do not require consumable reagents; provide real time results; can be performed by a wide range of operators after a short training period. Disadvantages: the necessity of a dark environment; high initial set‐up costs (for VELscope) or recurrent costs (for ViziLite in low income countries); lack of permanent record unless photographed; inability to objectively measure visualisation results.
Blood and saliva analysis	These novel technologies are at an early stage of development and evaluation. Blood or saliva samples are analysed for the presence of biomarkers of PMD and oral cancer (Brinkmann 2011; Lee 2009; Li 2006).	Cutoff probabilities vary widely and are dependent on the individual biomarker or combination of biomarkers examined.	Advantages: non‐invasive (saliva tests) or minimally invasive (blood tests). Disadvantages: there is a tendency for the estimated diagnostic accuracy of new health technologies to decline over time as evidence from independent evaluations accumulates (Wyatt 1995). This bias, which can be substantial, has been demonstrated in other domains, e.g. acute abdominal pain (Liu 2006) and clinical decision support systems (Garg 2005). Promising biomarker tests in several clinical areas were eventually shown to be disappointing (Buchen 2011). It remains to be seen whether this is the case with oral cancer and PMDs.

PMD = potentially malignant disorders

The index tests also have the potential to improve patient diagnosis at a secondary care level. Following referral to a specialist clinic, the most important clinical step is to biopsy the area or areas representing the worst disease. This is simple with a single homogeneous lesion but becomes more complicated when the lesion or lesions become larger and more heterogeneous. Sample site selection may be facilitated by the diagnostic adjuncts, so performance in a secondary care setting becomes important.

Finally, the tests could be useful in a surveillance setting such as a cancer clinic where patients with a history of oral cancer or PMDs are followed at specified times. This population is likely to have had multiple biopsies, surgical procedures to treat cancer or dysplastic changes, or other treatments such as radiation therapy. Monitoring these patients for new disease (they often have field changes) is challenging. The diagnostic adjuncts could be of value in this setting.

Alternative test(s)

Medical imaging techniques such as computed tomography (CT), other forms of tomography and magnetic resonance imaging (MRI) have been used in addition to clinical evaluation. The diagnostic test accuracy of such techniques will not be considered in this review.

Rationale

Oral cancer is a significant global health problem with increasing incidence and mortality rates (Ferlay 2010; Warnakulasuriya 2009). Cancer of the lip or oral cavity is a relatively common cancer worldwide, with an estimated 263,000 new cases and 127,000 deaths in 2008, and an increasing incidence in recent years (Ferlay 2010). There is wide geographic variation in disease incidence and mortality, with almost double the incidence in developing countries as in developed countries, and a threefold increase in mortality. Tobacco use, alcohol consumption, betel quid chewing and low socioeconomic status are the most important risk factors of oral cancer (Conway 2008; Faggiano 1997; La Vecchia 1997; Macfarlane 1995). Men have a higher incidence of oral cancer than women, but this disparity can be explained by men having a higher exposure to the above risk factors (Freedman 2007). The gender difference has narrowed in recent decades from a ratio of 5 males to 1 female diagnosed with oral cancers in the 1960s to less than 2 to 1 in 2008 (Ferlay 2010). Although traditionally the risk of oral cancer increases with age, the incidence among younger adults has increased in the European Union and the United States (Warnakulasuriya 2009). Technologies to treat and manage oral cancer have progressed substantially, as shown by Cochrane systematic reviews of RCTs (Furness 2011; Glenny 2010). Nevertheless, the 5‐year survival following diagnosis has remained at around 50% for the past 30 years in most countries (Parkin 2001; Warnakulasuriya 2009).

The 5‐year survival rate depends on the site of the cancer, ranging from more than 90% for the lip to 40% for the oropharynx (Cancer Research UK). Oral cancer is often diagnosed at a late stage, when the prognosis is poor and the risks of significant morbidity and mortality are substantially higher (Rusthoven 2010). Oral cancer mortality can be reduced using three approaches: (i) primary prevention, (ii) secondary prevention, screening and early detection, and (iii) improved treatment (Scully 2000a).

Currently, no national population‐based screening programmes for oral cancer have been implemented in developed countries, although opportunistic screening has been advocated (Brocklehurst 2010). Consequently, individuals will often present for examination at a later stage of the disease, when the risks of significant morbidity and mortality are substantially higher. A province‐wide programme is being evaluated in British Columbia, Canada but the evaluation is ongoing and no final results have been reported to date (Rosin 2006). Brocklehurst et al's Cochrane systematic review identified one RCT in India. They concluded that the evidence is insufficient to recommend population‐based screening and suggested that opportunistic screening of high risk groups may potentially improve outcomes (Brocklehurst 2010). Accurate case detection and early treatment of oral cancers can substantially improve an individual's morbidity, mortality and quality of life (Scully 2000a; Stell 1982).

There is some uncertainty on the diagnostic accuracy of the index tests listed above. Review studies have identified a number of these tests for oral cancer in individuals with an identified lesion (Leston 2010; Lingen 2008; Patton 2008). The focus of these reviews, however, has been on a description of the sensitivities and specificities of diagnostic tests rather than a comprehensive quality assessment of studies and meta‐analysis of all available data. The index tests have the potential to improve the accuracy of oral cancer diagnosis and to detect the disease at an earlier stage. This could result in improved diagnostic decisions, leading to appropriate treatment pathways and ultimately improved patient outcomes.

In this review we aim to identify diagnostic tests for oral cancer and PMD and to evaluate the diagnostic accuracy of these tests (Additional Table 1) when used as adjuncts to a COE by frontline clinicians, specialists and 'super‐specialists'. The proposed index tests cannot confirm whether a PMD is cancerous before deciding on referral to secondary care; biopsy with histopathology is currently the only confirmatory method of oral cancer diagnosis.

The Cochrane Oral Health Group has undertaken a number of intervention reviews in the field of treatment of oral and oropharyngeal cancers, and in screening programmes for the early detection and prevention of oral cancer (Bessell 2011; Brocklehurst 2010; Furness 2011; Glenny 2010). This diagnostic test accuracy review will complement the intervention reviews.

Objectives

To estimate the accuracy of index tests for the detection of oral cancer and potentially malignant disorders of the lip and oral cavity in patients presenting with clinically evident lesions.

Secondary objectives

To estimate the relative accuracy of the different tests.

Investigation of sources of heterogeneity

We will use meta‐regression to explore possible sources of heterogeneity. Covariates in these analyses will include:

characteristics of the study sample: prevalence of the disease in the study, the type of specialist (e.g. frontline clinicians, specialists and 'super‐specialists', i.e. those who see surveillance populations), setting (country, type of facility), proportion of human papillomavirus positive adults, tobacco users/high alcohol consumption; and
target conditions: the nature of target conditions included.
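
As a rough illustration of how a study‐level covariate can be related to accuracy, the sketch below fits an inverse‐variance weighted regression of logit sensitivity on a single binary covariate. It is a simplified, univariate, fixed‐effect stand‐in: the protocol text above does not specify the meta‐regression model, and a full diagnostic test accuracy analysis would normally model sensitivity and specificity jointly. The study counts, the covariate coding, the 0.5 continuity correction and all function names are assumptions made for illustration only.

```python
import math

def logit_sensitivity(tp, fn):
    """Return logit(sensitivity) and its approximate variance.

    A 0.5 continuity correction is added to both cells (an assumption of
    this sketch) so that studies with a zero cell do not fail.
    """
    tp_c, fn_c = tp + 0.5, fn + 0.5
    return math.log(tp_c / fn_c), 1.0 / tp_c + 1.0 / fn_c

def weighted_line(x, y, w):
    """Weighted least squares fit of y = intercept + slope * x."""
    total = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / total
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / total
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def expit(z):
    """Inverse of the logit transform."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical studies: (TP, FN, covariate); 1 = specialist setting, 0 = frontline
studies = [(40, 10, 0), (30, 12, 0), (55, 5, 1), (48, 6, 1)]

logits, weights, covariate = [], [], []
for tp, fn, setting in studies:
    y, var = logit_sensitivity(tp, fn)
    logits.append(y)
    weights.append(1.0 / var)  # inverse-variance weight
    covariate.append(setting)

intercept, slope = weighted_line(covariate, logits, weights)
print(f"estimated sensitivity, frontline: {expit(intercept):.2f}")
print(f"estimated sensitivity, specialist: {expit(intercept + slope):.2f}")
```

With a binary covariate the slope is simply the difference between the weighted mean logit sensitivities of the two groups, which makes the output easy to check by hand.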

Methods

Criteria for considering studies for this review

Types of studies

Studies of clinical cohorts of patients presenting with clinically evident lesions which report the diagnostic accuracies of any individual index test listed in 'Additional Table 1', or a combination of these for oral cancer and potentially malignant disorders (PMD) with respect to the reference standard. These will include cross‐sectional diagnostic test accuracy studies (or consecutive series) and randomised controlled trials. We will exclude studies reported in abstract form alone, case‐control studies, uncontrolled reports and randomised controlled trials of the effectiveness of screening programmes (intervention studies). Where randomised or paired comparative designs are available these will be included in the review. Studies analysing only lesions, rather than patients, will be excluded. We will contact authors of studies that report only results at the lesion level for data at the patient level; if authors are able to provide such data, their studies will be included.

Participants

Adult patients (aged 16 years or over) presenting with clinically evident oral lesions.

Index tests

Index tests used alone or in combination that can be used as an adjunct to the conventional oral examination (Additional Table 1). The COE based on clinical appearance and criteria is the initial point of diagnosis, which all adults will receive. The remaining index test(s) will be used as an adjunct following the conventional oral examination (COE) irrespective of whether oral cancer or PMD is suspected by the COE alone (i.e. a positive test result is a positive result from either the COE or the index test or both).
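
The positivity rule in parentheses above (positive on the COE, on the adjunctive index test, or on both) can be written as a simple "either positive" combination. The sketch below is illustrative only; the function name and the patient data are hypothetical.

```python
def combined_result(coe_positive: bool, index_positive: bool) -> bool:
    """Combined test is positive if the COE, the adjunct, or both are positive."""
    return coe_positive or index_positive

# Hypothetical patients: (COE result, adjunctive index test result)
patients = [(True, False), (False, True), (False, False), (True, True)]
combined = [combined_result(coe, adjunct) for coe, adjunct in patients]
print(combined)  # [True, True, False, True]
```

Under this rule the combined test can never be less sensitive than the COE alone, but every additional false positive from the adjunct lowers specificity.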

Target conditions

Following the consensus views of the expert working group of the WHO collaborating centre for oral cancer and precancer (Warnakulasuriya 2007), the target conditions of the lip or oral cavity of interest are noted as:

Carcinoma

Squamous cell carcinoma

Potentially malignant disorders (PMDs)

Leukoplakia
Erythroplakia
Lichen planus
Lupus erythematosus
Submucous fibrosis
Actinic keratosis
Hereditary disorders such as dyskeratosis congenita or epidermolysis bullosa

Reference standards

Scalpel, punch or fine needle aspiration biopsy with histological confirmation of lesion. We will exclude studies that did not specify any reference standard.

Search methods for identification of studies

Electronic searches

The following databases will be searched using a highly sensitive search strategy:

Cochrane Oral Health Group's Trials Register (to present)
Cochrane Register of Diagnostic Test Accuracy Studies (to present)
MEDLINE (1948 to present)
EMBASE (1980 to present)
MEDION (2003 to present).

The MEDLINE search strategy outlined in Appendix 1 will be modified for the listed databases. The search will not be limited by language or publication status. Non‐English articles will be translated, unless a translator cannot be found through The Cochrane Collaboration.

The search strategy above has been constructed in accordance with this protocol and that of a companion Cochrane diagnostic test accuracy review 'Clinical assessment to screen for the detection of oral cavity cancer and potentially malignant disorders in apparently healthy adults' by the same review team (Walsh 2012).

Searching other resources

We will also search relevant conference proceedings. We will locate further studies through citation searches and reference lists of key articles, and by contacting authors of identified articles to request information on any unpublished or ongoing studies.

Data collection and analysis

Selection of studies

Titles and abstracts of all articles identified from the electronic searches will be independently assessed by two review authors. For articles that appear to meet the inclusion criteria, or where a clear decision cannot be made from scanning the title and abstract alone, full reports will be obtained. Full reports will also be obtained from searching other resources. Two review authors will independently assess each report. Where disagreements occur, the review authors will attempt to resolve these by discussion. If needed, a third review author will be asked to help resolve any discrepancies in consultation with the other two review authors.

Data extraction and management

Two review authors will independently extract data using a piloted data collection form. Discrepancies will be resolved through discussion. If an agreement cannot be reached, a third review author will be consulted. Study authors will be contacted to obtain relevant missing data if this is not available in the printed report.

The following data will be recorded from each study.

Sample characteristics (age, sex, socioeconomic status, risk factors where stated (e.g. human papillomavirus status positive/negative, prevalence of tobacco use and alcohol consumption), number of patients/lesions, lesion site)
Setting (country, disease prevalence, type of facility)
The type of index test(s) used (category, name, positivity threshold)
Study information (design, reference standard, case definition, training and calibration of personnel)
Study results (true positive, true negative, false positive, false negative, any equivocal results, withdrawal).
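
The study results listed above form the 2 x 2 table from which accuracy estimates are derived. A minimal sketch of that arithmetic follows; the counts are hypothetical, the class and method names are illustrative, and equivocal results and withdrawals are left out because their handling is not specified at this point in the protocol.

```python
from dataclasses import dataclass

@dataclass
class TwoByTwo:
    """Counts of index test results cross-classified against the reference standard."""
    tp: int  # index test positive, reference standard positive
    fp: int  # index test positive, reference standard negative
    fn: int  # index test negative, reference standard positive
    tn: int  # index test negative, reference standard negative

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def prevalence(self) -> float:
        return (self.tp + self.fn) / (self.tp + self.fp + self.fn + self.tn)

# Hypothetical study of 200 patients
study = TwoByTwo(tp=45, fp=20, fn=5, tn=130)
print(f"sensitivity = {study.sensitivity():.3f}")  # 0.900
print(f"specificity = {study.specificity():.3f}")  # 0.867
print(f"prevalence  = {study.prevalence():.3f}")   # 0.250
```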

Data will be extracted by subgroups (tobacco and alcohol consumption) where available.

Assessment of methodological quality

Two review authors will each independently assess the quality of all studies selected for inclusion in the review. Where disagreements continue after discussion between the two review authors, a third review author will be asked to help resolve the discrepancies. The revised QUADAS tool, QUADAS‐2 (Whiting 2011), will be used to assess the quality of the primary diagnostic studies over four key domains: patient selection, index test, reference standard and flow and timing of participants through the study. In the first phase of the tool, the review question will be stated in terms of patient sample, index test, reference standard and target condition. This information is detailed in the 'Criteria for considering studies for this review' section of this protocol. In phase two, the QUADAS‐2 tool will be tailored for use with this review (Additional Table 2). Review‐specific guidance will be used to facilitate documentation of the pertinent descriptive information contained in the primary studies. Customised instructions to aid judgement of the signalling questions will be given (following Patton 2008). Two core signalling questions have been removed: 'Was a case‐control design avoided?' (this study design is excluded from the review); and 'Did all patients receive a reference standard?' (this is a criterion for inclusion). Three additional signalling items, relating to commercial funding, training and calibration, and multiple index tests, have been added to the core signalling questions. In phase three, a flow diagram will be drawn. In the final phase, an overall judgement of risk of bias and applicability will be made. A risk of bias judgement ('high', 'low' or 'unclear') will be reached for each domain. If the answers to all signalling questions within a domain are judged as 'yes', indicating low risk of bias, then the domain will be judged to be at low risk of bias. If any signalling question within a domain is judged as 'no', indicating high risk of bias, then potential for bias exists in that domain. This will be followed by a judgement of concerns regarding applicability for the patient selection, index test and reference standard domains.
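
The domain‐level rule described above can be sketched as a small function. Two points are assumptions of this sketch rather than statements from the protocol: that any 'no' answer maps to a 'high' judgement (the text says only that potential bias exists), and that a domain with no 'no' answers but at least one 'unclear' answer is rated 'unclear'. The function name is illustrative.

```python
def domain_risk_of_bias(answers):
    """Map signalling-question answers for one domain to a risk of bias judgement.

    answers: iterable of 'yes' / 'no' / 'unclear' strings (case-insensitive).
    """
    normalised = [a.strip().lower() for a in answers]
    if any(a == "no" for a in normalised):
        return "high"      # assumption: any 'no' flags the domain as high risk
    if normalised and all(a == "yes" for a in normalised):
        return "low"       # all signalling questions answered 'yes'
    return "unclear"       # assumption: remaining cases are rated 'unclear'

print(domain_risk_of_bias(["Yes", "Yes"]))      # low
print(domain_risk_of_bias(["Yes", "No"]))       # high
print(domain_risk_of_bias(["Yes", "Unclear"]))  # unclear
```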

Table 2. Indicators for the assessment of quality (QUADAS‐2)

Domain	Patient selection	Index test	Reference standard	Flow and timing
Description	Describe methods of patient selection. Describe included patients (characteristics, prior testing, presentation and severity of the target condition (class), intended use of index test and setting).	Describe the index test(s) and how it was conducted and interpreted. Describe the sequence of tests, any training or calibration of clinicians (levels of agreement should be reported; where this is measured by the kappa statistic, acceptable values range from 0.61 (substantial agreement) to 1.00 (almost perfect agreement) (Landis 1977)), any procedures taken to ensure blinding of examiners, post‐hoc or a priori threshold specification, any conflict of interest or commercial funding. Methods of site selection should be clearly documented.	Describe the reference standard and how it was conducted and interpreted. Ideally, the biopsied tissue should be examined by more than one pathologist. If there is a lack of agreement any methods for reaching consensus should be clearly documented. Any measures taken to ensure pathologists were blinded to the results of the index tests should be documented, along with the sequence of reference and index tests. Methods of site selection should be clearly documented.	Describe the characteristics and proportion of patients who did not receive the index test(s) and/or reference standard, who received a reference standard other than the scalpel biopsy, or who were excluded from the 2 x 2 table (refer to flow diagram). Describe the time interval and any interventions between index test(s) and reference standard. The length of time between the index test and reference standard should be short in the majority of cases. If the period elapsed between index test and reference standard is greater than 2 weeks then this will be considered an unacceptable delay.
Signalling questions (Yes/No/Unclear)	Was a consecutive or random sample of patients enrolled? Classify as 'Yes' if consecutive patients or a random sample of individuals were recruited. Classify as 'No' if non‐consecutive patients or a non‐random sample of individuals were recruited. Classify as 'Unclear' if patient selection was not clearly described.	Was calibration of examiners undertaken and results reported? Classify as 'Yes' if the examiners participated in dedicated training and calibration was reported to an acceptable standard. Classify as 'No' if the examiners did not participate in dedicated training, if calibration was not assessed, or if training was undertaken but calibration was not to an acceptable standard. Classify as 'Unclear' if the information on training and calibration was not stated.	Is the reference standard likely to correctly classify the target condition? Classify as 'Yes' if the biopsy was independently confirmed by at least two qualified pathologists. Classify as 'No' if the biopsy was not independently confirmed by at least two qualified pathologists, or there was lack of agreement between pathologists. Classify as 'Unclear' if the study does not state who confirmed the biopsy.	Was there an appropriate time interval between the index test(s) and reference standard? Classify as 'Yes' if the delay between the index test(s) and reference standard is considered acceptable for the majority of participants. Classify as 'No' if the delay between the index test(s) and reference standard is considered unacceptable for the majority of participants. Classify as 'Unclear' if the delay between the index test(s) and reference standard is not explicitly stated.
	Did the study avoid inappropriate exclusions? Classify as 'Yes' if patients with either class I or class II lesions were recruited. Classify as 'No' if only patients with class I lesions were recruited. Classify as 'Unclear' if class of lesions was not clearly described.	Were the index test results interpreted without knowledge of the results of the reference standard? Classify as 'Yes' if interpreters of index test results clearly do not know results of biopsy/histopathology. Classify as 'No' if interpreters of index test results clearly know results of biopsy/histopathology. Classify as 'Unclear' if the study did not provide any information on whether interpreters of index tests were blinded to biopsy/histopathology.	Were the reference standard results interpreted without knowledge of the results of the index test? Classify as 'Yes' if pathologists clearly do not know the index test results when interpreting biopsied tissues. Classify as 'No' if pathologists know the index test results when interpreting biopsied tissues. Classify as 'Unclear' if the study did not provide any information on whether the pathologists were blinded to the index test results.	Did all patients receive the same reference standard? Classify as 'Yes' if the same reference standard was used in all participants. Classify as 'No' if the same reference standard was not used in all participants. Classify as 'Unclear' if it is unclear whether different reference standards were used.
		Where multiple index tests were used, were the results of the second index test interpreted without knowledge of the results of the first index test? Classify as 'Yes' if index test results were interpreted without knowledge. Classify as 'No' if the index test results were interpreted with knowledge. Classify as 'Unclear' if it is unclear whether the results of the second index test were interpreted without knowledge of the results of the first index test.		Were all patients included in the analysis? Classify as 'Yes' if all patients were included in the analysis. Classify as 'No' if only some patients were included in the analysis. Classify as 'Unclear' if it is unclear whether all patients were included in the analysis.
		If a threshold was used, was it prespecified? Classify as 'Yes' if the threshold was prespecified. Classify as 'No' if the threshold was not prespecified. Classify as 'Unclear' if it is unclear whether the threshold was prespecified.
		Were any conflicts of interest stated? Classify as 'Yes' if the study declared no conflict of interest. Classify as 'No' if the study declared a conflict of interest. Classify as 'Unclear' if there was no information on conflict of interest.
Risk of bias: High/Low/Unclear	Could the selection of patients have introduced bias?	Could the conduct or interpretation of the index test have introduced bias?	Could the reference standard, its conduct, or its interpretation have introduced bias?	Could the patient flow have introduced bias?
Concerns regarding applicability: High/Low/Unclear	Are there concerns that the included patients do not match the review question?	Are there concerns that the index test, its conduct, or interpretation differ from the review question?	Are there concerns that the target condition as defined by the reference standard does not match the review question?

We will pilot the use of the QUADAS‐2 tool independently on five study reports. Where disagreements occur between the two review authors, the review‐specific descriptions will be clarified until consistency is obtained.

Results of the quality assessment for all included studies will be summarised in a narrative report. A summary table of the results for each domain will also be provided, separately for risk of bias and for concerns regarding applicability, along with a graphical display summarising this information.

Statistical analysis and data synthesis

The unit of analysis is the patient. Data for the true positive, true negative, false positive and false negative values for each study will be tabulated. For each index test, estimates of diagnostic accuracy as sensitivity and specificity along with their 95% confidence intervals will be displayed as coupled forest plots, and plotted in receiver operating characteristic (ROC) space. We will take the number of diseased and non‐diseased individuals as the sample size, not the total number of lesions.
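As an illustration of the per-study estimates described above, the sketch below computes sensitivity and specificity with 95% confidence intervals from one 2 x 2 table. The protocol does not say which interval method will be used; the Wilson score interval here is an assumption, as are the function names and the example counts.

```python
import math
from typing import Dict, Tuple


def wilson_interval(successes: int, total: int, z: float = 1.959964) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (assumed method, 95% by default)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1.0 + z * z / total
    centre = (p + z * z / (2.0 * total)) / denom
    half = z * math.sqrt(p * (1.0 - p) / total + z * z / (4.0 * total * total)) / denom
    return max(0.0, centre - half), min(1.0, centre + half)


def accuracy_from_2x2(tp: int, fp: int, fn: int, tn: int) -> Dict[str, Tuple[float, float, float]]:
    """Return (estimate, lower, upper) for sensitivity and specificity.

    Counts are patients, not lesions, so tp + fn is the number of diseased
    individuals and tn + fp the number of non-diseased individuals.
    """
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return {
        "sensitivity": (sens, *wilson_interval(tp, tp + fn)),
        "specificity": (spec, *wilson_interval(tn, tn + fp)),
    }


if __name__ == "__main__":
    # Hypothetical study: 50 diseased and 100 non-diseased patients.
    for name, (est, lo, hi) in accuracy_from_2x2(tp=45, fp=20, fn=5, tn=80).items():
        print(f"{name}: {est:.3f} (95% CI {lo:.3f} to {hi:.3f})")
```

Each study contributes one such pair of estimates, which is what a coupled forest plot or a point in ROC space displays.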

Meta‐analysis will be used to combine the results of studies for each index test. Random‐effects models will be used. If the number of studies is small and the model parameters cannot be estimated, we will follow the approaches suggested in Chapter 10 of the Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy (Macaskill 2010), e.g. moving from a random‐effects model to a fixed‐effect model, or modelling sensitivity and specificity separately. The statistical software SAS 9.2 will be used throughout (SAS Institute Inc, Cary, USA).

For the vital staining, brush cytology and light‐based detection methods, consistency in thresholds is anticipated because each test is deemed positive if any sign of malignancy or potential malignancy is detected. It is acknowledged that variation in test calibration and individual performance may contribute to heterogeneity. The analysis will estimate the expected values of sensitivity and specificity (bivariate approach; Reitsma 2005). For the analysis of blood and salivary index tests, primary studies may evaluate these tests at many different thresholds within the same study, or between studies. The expected summary ROC (SROC) curve for the tests across different thresholds will be estimated (hierarchical SROC; Macaskill 2010; Rutter 2001). Hierarchical SROC curves will be fitted using the NLMIXED procedure (PROC NLMIXED) in SAS.
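For orientation, the two model families named above are commonly written as follows. The notation is an assumption made for illustration, not taken from the protocol. In study i, y_{Ai} of n_{Ai} diseased patients test positive and y_{Bi} of n_{Bi} non-diseased patients test negative. The binomial within-study likelihood shown is a frequently used formulation of the bivariate approach. In the hierarchical SROC model, \pi_{ij} is the probability of a positive test in disease group j, with X_{ij} = -1/2 for non-diseased and +1/2 for diseased patients.

```latex
% Bivariate random-effects model (expected sensitivity and specificity)
\begin{aligned}
y_{Ai} &\sim \mathrm{Binomial}(n_{Ai}, \mathrm{Se}_i), \qquad
y_{Bi} \sim \mathrm{Binomial}(n_{Bi}, \mathrm{Sp}_i), \\
\begin{pmatrix} \operatorname{logit}(\mathrm{Se}_i) \\ \operatorname{logit}(\mathrm{Sp}_i) \end{pmatrix}
&\sim N\!\left(
\begin{pmatrix} \mu_A \\ \mu_B \end{pmatrix},
\begin{pmatrix} \sigma_A^2 & \sigma_{AB} \\ \sigma_{AB} & \sigma_B^2 \end{pmatrix}
\right).
\end{aligned}

% Hierarchical SROC model (summary curve across thresholds)
\begin{aligned}
\operatorname{logit}(\pi_{ij}) &= (\theta_i + \alpha_i X_{ij}) \exp(-\beta X_{ij}), \\
\theta_i &\sim N(\Theta, \sigma_\theta^2), \qquad
\alpha_i \sim N(\Lambda, \sigma_\alpha^2).
\end{aligned}
```

Here \theta_i acts as a study-specific positivity threshold, \alpha_i as a study-specific accuracy (log diagnostic odds ratio) parameter, and \beta as a shape parameter that allows the summary curve to be asymmetric.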

The proposed analysis is subject to change based on information reported in the primary studies. For example, if there is little variation in the positive thresholds of the blood and salivary index tests, it will not be appropriate to attempt to fit an SROC curve (Macaskill 2010).

The analysis plan can be specified as follows:

Primary analyses: The primary analyses will compare each index test with the reference standard. This will either estimate the average sensitivity and specificity of a test or describe the variation in sensitivity and specificity at different thresholds by estimating a hierarchical SROC curve depending on the nature of the index tests. Parameter estimates will include sensitivity, specificity and their correlation or hierarchical SROC curve.

Secondary analyses: The comparative accuracy of the index tests against the reference standard will be the focus of the secondary analyses. A preliminary analysis will graphically display the sensitivities and specificities of the index tests. This will be followed by a series of indirect pairwise analyses, structured as follows:

Vital staining versus brush cytology
Light detection versus brush cytology
Blood/salivary analysis versus brush cytology
Vital staining versus light detection
Vital staining versus blood/salivary analysis
Light detection versus blood/salivary analysis.

All studies will be included in each pairwise comparison. Where studies of direct comparisons exist (i.e. paired data from all patients or randomising individuals to different tests) the results of these studies will be analysed and reported separately.

The methodology used is akin to the investigation of heterogeneity (see below), i.e. adding a covariate for test type to the bivariate or hierarchical SROC analysis.
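One way to write such a test-type covariate for the bivariate case is sketched here. The symbols are illustrative assumptions: Z_i is 1 if study i evaluated the second test of a pair and 0 otherwise, and (u_{Ai}, u_{Bi}) are study-level random effects.

```latex
\begin{aligned}
\operatorname{logit}(\mathrm{Se}_i) &= \mu_A + \nu_A Z_i + u_{Ai}, \\
\operatorname{logit}(\mathrm{Sp}_i) &= \mu_B + \nu_B Z_i + u_{Bi}, \\
(u_{Ai}, u_{Bi})^{\top} &\sim N(\mathbf{0}, \Sigma).
\end{aligned}
```

Under this parameterisation, \nu_A and \nu_B are the differences in logit sensitivity and logit specificity between the two tests, and \nu_A = \nu_B = 0 corresponds to no difference in accuracy.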

Investigations of heterogeneity

Meta‐regression analyses will be carried out to explore possible sources of heterogeneity, i.e. ways in which the observed diagnostic test accuracy varies according to particular characteristics. Covariates in these analyses will include:

characteristics of the study sample: prevalence of the disease in the study, the type of specialist (e.g. frontline clinicians, specialists and 'super‐specialists', i.e. those who see surveillance populations), setting (country, type of facility), proportion of human papillomavirus positive adults, tobacco users/high alcohol consumption; and
target conditions: the nature of target conditions included.

The log likelihood of models including the covariate will be compared to those models without the covariate. Formal model comparisons will be undertaken using the likelihood ratio statistic to statistically compare the effects of adding or removing covariates. If statistical evidence of heterogeneity is found, further investigations will be undertaken.
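The model comparison described above can be sketched as a likelihood ratio test on two fitted log likelihoods. This is a minimal illustration with hypothetical function names and numbers. It handles only 1 or 2 degrees of freedom, for which the chi-square tail probability has a closed form; a covariate acting on both sensitivity and specificity in a bivariate model would add two parameters.

```python
import math
from typing import Tuple


def likelihood_ratio_test(loglik_without: float, loglik_with: float, df: int) -> Tuple[float, float]:
    """Return (LR statistic, p-value) for nested models differing by `df` parameters."""
    if loglik_with < loglik_without - 1e-8:
        raise ValueError("the model with the covariate cannot have a lower log likelihood")
    stat = max(0.0, 2.0 * (loglik_with - loglik_without))
    if df == 1:
        # Chi-square(1) survival function: P(Z^2 > x) = erfc(sqrt(x / 2)).
        p_value = math.erfc(math.sqrt(stat / 2.0))
    elif df == 2:
        # Chi-square(2) survival function: exp(-x / 2).
        p_value = math.exp(-stat / 2.0)
    else:
        raise ValueError("this sketch only supports df = 1 or df = 2")
    return stat, p_value


if __name__ == "__main__":
    # Hypothetical log likelihoods from models fitted without and with a covariate.
    stat, p = likelihood_ratio_test(loglik_without=-100.0, loglik_with=-96.0, df=2)
    print(f"LR statistic = {stat:.2f}, p = {p:.4f}")
```

The log likelihoods themselves would come from the fitted meta-analysis models; a statistics library would be needed for other degrees of freedom.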

Sensitivity analyses

Sensitivity analyses will be conducted. This will entail restricting the analysis to studies where the reference standard is scalpel biopsy followed by histopathology. Binary categorisations which relate to decision making in clinical practice will be utilised for multiple disease categories (i.e. including equivocal results as positive screen, negative screen or omitting from reported results). If no consensus is found, consideration of alternative categorisations will be explored through sensitivity analysis.
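The alternative treatments of equivocal results mentioned above can be made concrete with a small sketch. The function name, rule labels and counts are hypothetical; the code only shows how the 2 x 2 table changes when equivocal results are counted as positive, counted as negative, or omitted.

```python
from typing import Dict, Tuple

# Keys are (index test result, diseased according to the reference standard).
Counts = Dict[Tuple[str, bool], int]


def collapse_equivocal(counts: Counts, rule: str) -> Tuple[int, int, int, int]:
    """Collapse positive/equivocal/negative results into (tp, fp, fn, tn)."""
    if rule not in {"as_positive", "as_negative", "omit"}:
        raise ValueError(f"unknown rule: {rule}")
    tp = fp = fn = tn = 0
    for (result, diseased), n in counts.items():
        if result == "equivocal":
            if rule == "omit":
                continue  # drop equivocal results from the table
            result = "positive" if rule == "as_positive" else "negative"
        if result == "positive":
            if diseased:
                tp += n
            else:
                fp += n
        elif result == "negative":
            if diseased:
                fn += n
            else:
                tn += n
        else:
            raise ValueError(f"unknown test result: {result}")
    return tp, fp, fn, tn


if __name__ == "__main__":
    # Hypothetical study with 50 diseased and 100 non-diseased patients.
    example: Counts = {
        ("positive", True): 40, ("equivocal", True): 6, ("negative", True): 4,
        ("positive", False): 10, ("equivocal", False): 8, ("negative", False): 82,
    }
    for rule in ("as_positive", "as_negative", "omit"):
        print(rule, collapse_equivocal(example, rule))
```

Running the three rules side by side shows how sensitive the accuracy estimates are to the chosen categorisation.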

Assessment of reporting bias

Tests for reporting bias will not be conducted because current tests are misleading when applied to systematic reviews of diagnostic test accuracy (Leeflang 2008; Tang 2000).

Table 1. Index tests for oral cancer and PMDs

Test	Characteristics	Classification of response	Other information
Conventional oral examination (COE)	A standard visual and tactile examination of the oral mucosa under normal (incandescent) light.	The presence of an oral mucosal abnormality with a suspicion of malignancy or potential malignancy is classified as a positive test result; the presence of oral mucosal abnormality without a suspicion of malignancy or potential malignancy is classified as a negative test result.	Traditionally used as an oral cancer screen rather than for diagnosis, but its utility is debated (Lingen 2008). Advantages: quick and easy once trained, minimally invasive. Disadvantages: oral mucosal abnormalities are not necessarily clinically or biologically malignant; only a small percentage of leukoplakias are progressive or become malignant, and COE cannot distinguish those that will from those that will not; some precancerous lesions may exist within oral mucosa that appears clinically normal by COE alone (Lingen 2008).
Vital staining (e.g. toluidine blue, tolonium chloride)	Vital staining refers to the use of dyes such as toluidine blue or tolonium chloride to stain oral mucosa tissues for PMD or malignancy (Leston 2010; Lingen 2008; Patton 2008). The procedure is as follows: (1) pre‐rinse with acetic acid; (2) rinse with water; (3) apply toluidine blue; (4) post‐rinse with acetic acid; (5) rinse with water; (6) observe mucosa to check for staining.	The result of the test is classified as positive if tissue is stained, negative if no tissue is stained, or equivocal if no definitive result can be obtained.	Advantages: ability to define areas that could be malignant or abnormal but cannot be seen; assess the extent of the PMD for excision. Disadvantages: benign inflammatory lesions are subject to stain; possibility of failure of some cancerous lesions to stain; possibility of failure of some dysplastic lesions (particularly those with a lower grade or with a thick keratotic surface) to stain; variation in test performance depending on how thoroughly the test procedures are followed; contraindicated in those who are known to be allergic to iodine.
Brush cytology (e.g. OralCDx brush biopsy)	Brush cytology refers to the microscopic assessment and interpretation of cell samples from PMD that are flaked off from the oral mucosa by brushing, smearing, scraping or lavage, and then sealed on glass slides. They are then analysed using an imaging system that assesses the sampled cells (Leston 2010; Lingen 2008; Patton 2008).	Following analysis, cytopathologists classify test results as positive, atypical or negative.	Advantages: the ability to collect information from, and detect, large or multiple lesions and to access "the basement membrane collecting cells from all three epithelial layers of the oral mucosa. The liquid‐based cytology reduces the problems relating to sampling and fixation and presents a better cytological morphology" (Divani 2009). Disadvantages: smaller or less obvious lesions may be overlooked; difficulties in detecting lesions when there is necrosis or coagulated blood; inadequate training of operators (Divani 2009); cells are potentially seen out of context.
Light‐based detection (chemiluminescence e.g. ViziLite, ViziLite Plus, Microlux DL; tissue fluorescence imaging e.g. VELscope, Identafi 3000; tissue fluorescence spectroscopy)	Light‐based systems to identify malignant and potentially malignant lesions and to highlight their presence through tissue reflectance (Leston 2010; Lingen 2008; Patton 2008), e.g. using Microlux DL, the procedure is as follows (Lingen 2008): (1) pre‐rinse with acetic acid; (2) use the blue‐light light source to visually assess the oral cavity. ViziLite Plus also provides a tolonium chloride solution (toluidine blue) to aid in the marking of the lesion for biopsy once the light source is removed.	The result of the test is classed as negative if the appearance of the epithelium is lightly bluish white and positive if the appearance of the epithelium is distinctly white (acetowhite).	Advantages: simple to use; non‐invasive; do not require consumable reagents; provide real time results; can be performed by a wide range of operators after a short training period. Disadvantages: the necessity of a dark environment; high initial set up (for VELscope) or recurrent costs (for ViziLite in low income countries); lack of permanent record unless photographed; inability to objectively measure visualisation results.
Blood and saliva analysis	These novel technologies are at an early stage of development and evaluation. Blood or saliva samples are analysed for the presence of biomarkers of PMD and oral cancer (Brinkmann 2011; Lee 2009; Li 2006).	Cutoff probabilities vary widely and are dependent on the individual biomarker or combination of biomarkers examined.	Advantages: non‐invasive (saliva tests) or minimally invasive (blood tests). Disadvantages: there is a tendency for the estimated diagnostic accuracy of new health technologies to decline over time as evidence from independent evaluations accumulates (Wyatt 1995). This bias, which can be substantial, has been demonstrated in other domains, e.g. acute abdominal pain (Liu 2006) and clinical decision support systems (Garg 2005). Promising biomarker tests in several clinical areas were eventually shown to be disappointing (Buchen 2011). It remains to be seen whether this is the case with oral cancer and PMDs.
PMD = potentially malignant disorders
