Point‐of‐care ultrasonography for diagnosing thoracoabdominal injuries in patients with blunt trauma

Dirk Stengel; Alexander Hoenning; Johannes Leisterer

doi:10.1002/14651858.CD012669

Version published: 19 May 2017

https://doi.org/10.1002/14651858.CD012669

Abstract

This is a protocol for a Cochrane Review (Diagnostic test accuracy). The objectives are as follows:

To determine the diagnostic accuracy of point‐of‐care (POC) ultrasonography (US) in the detection and exclusion of:

free fluid in the thoracic or abdominal cavities;
organ injuries with or without bleeding in the thoracic or abdominal cavities;
vascular lesions of the thoracic or abdominal aorta, or other major vessels; and
other injuries (e.g. pneumothorax);

compared to any objective diagnostic reference standard (i.e. computed tomography (CT; 'pan‐scan'), magnetic resonance imaging (MRI), thoracotomy, laparotomy, laparoscopy, thoracoscopy, autopsy, or any combination of these).

Background

Target condition being diagnosed

Severe and multiple injuries (i.e. polytrauma, as defined by an Injury Severity Score (ISS) ≥ 16) remain a leading cause of death and disability worldwide. Road traffic injuries and falls from a height are the principal causes of multiple trauma. According to data for 2010 from the World Health Organization (WHO) Global Burden of Disease Project, road traffic injuries rank eighth in the global death toll (Lozano 2012), and tenth in all sources of disability‐adjusted life years (Murray 2012). The WHO/United Nations Decade of Action for Road Safety 2011 to 2020 was launched to raise awareness of this considerable healthcare problem, and to establish simple and effective measures for prevention.

Injuries to the chest, abdomen, or both, may occur by either penetrating or blunt mechanisms. Stabbing (by sharp tools or weapons such as knives) and shooting are associated with a high chance of internal organ or vessel injury. The distinct location of wounds may further increase the likelihood of significant trauma to the lungs, heart, mediastinum, liver, spleen, thoracic and/or abdominal aorta or other major vessels. The quality and quantity of injuries sustained in civilian settings and armed conflicts differ in many ways (e.g. type of weapon, gun, or bullet, wound ballistics, protective armour, or austere environment (i.e. where medical care is provided under less than optimal sanitary and hospital‐like conditions)). Most people with penetrating trauma will undergo immediate surgical exploration (specifically in the case of haemodynamic instability), and any imaging procedure (if performed) will only influence the pretest probability of injury in this scenario marginally.

In blunt trauma, typically caused by road traffic crashes (involving vehicle drivers and occupants, bicyclists, pedestrians etc.) or falls from a height, imaging has a different and indispensable role. Clinical examination may reveal indirect signs of internal injury (like contusion marks or haematoma in certain skin areas), but these signs are inconsistent and neither sensitive nor specific. Computed tomography (CT) must be regarded as both the standard of care in a clinical environment, and the currently indisputable diagnostic reference test. Point‐of‐care (POC) ultrasonography (US), however, can be performed during resuscitation, repeated wherever and whenever needed, and does not involve any exposure to radiation.

Recent data from the German Trauma Registry suggest an overall mortality of 10% of severely injured patients managed within an organised trauma network and with unrestricted access to any potentially life‐saving intervention at high‐volume trauma centres in an industrialised country (German Trauma Society 2014). Therefore, there may be some sort of biological threshold in trauma survivability that cannot be overcome by current treatment modalities, and demands extra translational research efforts.

A 'treat first what kills first' strategy has been established at most trauma centres across the world in the last 10 years. Diagnostic algorithms such as Advanced Trauma Life Support (ATLS) are intended to (Chapleau 2013):

guarantee free airways and sufficient oxygenation (i.e. by intubation and tube thoracostomy in case of pneumo‐ or haematothorax, or both); and
stop traumatic bleeding (e.g. by pelvic binders, compression, surgical or interventional control of haemorrhage, and substitution of blood products, mainly coagulation factors).

With the exception of brain injury, the primary causes of death in multiple trauma are dominated by traumatic abdominal or retroperitoneal bleeding and chest injuries (Pfeifer 2016). The presence of free fluid surrounding the liver or spleen, capsular tears, detectable tissue haematoma, or vascular lesions influences therapeutic decisions in major trauma, predominantly in haemodynamically unstable patients.

Ultrasonography emerged as an integral part of trauma algorithms, and remains the POC imaging tool of choice for screening for thoracoabdominal bleeding in most regions of the world. Like any other imaging procedure or diagnostic test used for screening purposes, it is important to verify that:

a negative index test result is conclusive enough to exclude the condition of interest (guaranteeing that episodes of haemodynamic instability during decompressive brain surgery or fixation of spine, pelvic or femoral fractures are not caused by sudden major abdominal, thoracic, or retroperitoneal bleeding); and
a positive index test result is conclusive enough to prove the condition of interest (thus minimising the number of negative or unnecessary thoracotomies and laparotomies, or their endoscopic equivalents).

Both false‐negative and false‐positive findings of POC US may misguide trauma teams and affect care priorities in an adverse fashion.

Diagnostic accuracy (or efficacy) is the first level of the Fryback‐Thornbury hierarchy of evaluating the utility of a diagnostic test procedure (Fryback 1991). While the value and utility of a certain test cannot be derived from its accuracy alone, it would be absurd to ask for the effectiveness or efficiency of an inaccurate diagnostic test.

Thus, determining accuracy is the first and indispensable step in health technology assessment of POC US. This review aims to generate the best evidence available about the diagnostic accuracy of clinical ultrasound imaging protocols in the setting of thoracoabdominal and multiple trauma compared to appropriate reference standards. It will guide clinicians regarding the likelihood of chest and abdominal injuries given certain prior probabilities and ultrasound findings, and may facilitate the decision to perform a CT scan or to refer patients for immediate emergency laparoscopy or laparotomy.

Given the (higher) potential advantage and value of POC US in blunt ‐ compared to penetrating ‐ trauma, this review will consider only original studies that include patients with blunt (or closed) trauma, or sufficient details to explore the accuracy of POC US only amongst patients with blunt injuries.

Another aspect that needs further investigation is the use of POC US in paediatric trauma algorithms. Children are particularly vulnerable to radiation from diagnostic imaging, and their lifetime‐attributable risk (LAR) of cancer due to medical imaging must be lowered to the necessary minimum. Still, there may be situations in which acute and potentially life‐threatening conditions require radiation‐emitting (i.e. multi‐detector row computed tomography (MDCT)) rather than radiation‐free imaging techniques (e.g. US or magnetic resonance imaging (MRI)).

Index test(s)

Ultrasound has emerged as the 'work horse' of POC imaging in emergency departments worldwide. Technological progress has led to increasingly light and mobile (i.e. handheld) equipment (also available in the preclinical setting, e.g. on helicopters or rescue vehicles). Further advancements include colour‐duplex imaging, contrast‐enhanced imaging, and 3D reconstruction.

Typically, POC US in the trauma setting is performed as focused abdominal sonography for trauma (FAST) (Scalea 1999). In its basic form, FAST includes oblique views of the left upper, right upper, left lower and right lower abdominal quadrants, as well as a sagittal scan of the mid abdomen and a transverse view of the pelvic region. The key target of the original FAST scan is free fluid as a surrogate for blood or active bleeding.

Recently, however, the genuine FAST protocol was modified and supplemented in many ways. The most useful and technically simple extensions were to screen for haematothorax (using oblique or intercostal chest planes, or both) and, by a xiphoid view, for pericardial effusion. US also proved reliable in detecting pneumothorax (Blaivas 2005). Skilled examiners may even be able to show and grade abdominal organ injury, although this is likely to exceed the diagnostic limits of POC US in the early resuscitation phase.

In this review, we will use the term POC US rather than FAST because of the varying definitions and targets used in different centres and countries. Altogether, technological evolution of hardware, increasing skills of operators, and significant advancements in picture acquisition and processing have significantly changed the view of healthcare providers about the role of US in the critical care setting. Ultrasound has evolved from a rough screening tool into a conclusive imaging modality.

Therefore, the index test for this review will be any clinical POC US performed in the setting of blunt trauma, intended to detect direct or indirect signs of injuries of the thoracic, abdominal, or retroperitoneal cavity or space and/or its organs.

Clinical pathway

ATLS and other structured trauma algorithms focus on the mechanism of injury and clinical findings. Clinical examination alone has little if any role in excluding injuries to the chest or abdomen. The presence of external injuries, such as seatbelt marks, may increase the likelihood of visceral tears, but their absence does not exclude significant trauma. Currently, all major trauma algorithms incorporate POC thoracoabdominal ultrasound as a screening imaging tool. However, the interpretation of ultrasound images depends on the experience and clinical background of the individual operator. This subjective component influences decision making, and hampers comparisons between initial and follow‐up scans taken by different examiners.

In 2013, Van Vugt and colleagues published an evidence‐based protocol for blunt trauma that included a chart that illustrates the integral role of POC US (FAST) in combination with an ATLS course (Van Vugt 2013).

Alternative test(s)

Currently, POC US is challenged by the liberal use of MDCT, either as abdominal, thoracic, thoracoabdominal, or whole‐body MDCT. The latter emerged as the diagnostic modality of choice in most European trauma centres and is used in the USA and other developed nations as well. The so‐called 'pan‐scan' usually comprises a native cranial CT, followed by a contrast‐enhanced CT from the skull base to the pelvis and/or trochanteric region. Whole‐body MDCT is highly specific, thereby minimising false‐positive findings (Stengel 2012), and may thus influence care priorities of the severely injured patient according to the 'treat first what kills first' rule. Data from the German Trauma Registry suggest the pan‐scan improves survival in both unselected trauma cohorts and haemodynamically unstable patients (Huber‐Wagner 2013). There are concerns regarding excess exposure to radiation caused by the uncritical use of the pan‐scan at both the individual and population level (Asha 2012). While dose‐reducing reconstruction and processing algorithms are available, it is debatable whether they produce images that are similar in quality and diagnostic certainty to those produced by conventional protocols.

The pan‐scan is not only a competing imaging tool; it is also regarded as the diagnostic reference standard to which POC US findings must be compared. This leads to an interesting methodological conflict, as it will be almost impossible to compare both imaging modalities in a head‐to‐head fashion in the trauma scenario, since POC sonograms must be confirmed by CT.

Rationale

In high‐income countries, it is doubtful whether POC US findings have any influence on treatment decisions. There are four likely scenarios.

POC US is positive for free abdominal or thoracic fluid, or both, in a haemodynamically stable patient. This will prompt a CT scan (usually a pan‐scan) to identify bleeding sources. In most cases, haemostatic transfusion (plus transarterial embolisation (TAE)) and intensive care unit (ICU) monitoring will be the treatment of choice in this setting.
POC US is negative for free abdominal or thoracic fluid, or both, in a haemodynamically stable patient. This will prompt a CT scan (usually a pan‐scan), to verify there are no active bleeding sources that were missed by US.
POC US is negative for free abdominal or thoracic fluid, or both, in a haemodynamically unstable patient. This will almost always prompt a later CT scan (usually a pan‐scan) to identify bleeding sources and to decide about TAE or emergency surgery, or both.
POC US is positive for free abdominal or thoracic fluid, or both, in a haemodynamically unstable patient. At present, haemostatic resuscitation and other critical care efforts will usually achieve enough stability to make patients pan‐scan ready. Positive POC US findings may, however, prompt laparoscopy or laparotomy.

Scenario 4 is highly relevant, but rare, in the Western world. There are very few occasions in which all resuscitation efforts fail and patients are scheduled for emergency thoracotomy or laparotomy, or both, based on POC US findings alone. Still, these situations occur, and clinical practice guidelines must include recommendations for trauma care teams to act within an evidence‐based framework.

In middle‐ and low‐income countries, however, FAST (in addition to conventional radiographs) may represent the most sophisticated or only non‐invasive diagnostic tool available to detect significant traumatic haemorrhage and guide triage. A classic example of this was provided by the Sichuan earthquake in 2008, which killed 69,197 people and left 18,222 missing. FAST ultrasound proved effective, efficient, and possibly life‐saving under these exceptional circumstances (Zhou 2012). Similar observations were made after the earthquake in Haiti in 2010. The recent earthquake in Nepal in April 2015 (which killed more than 6000 and left 2.8 million people homeless) demonstrated how FAST can play a role in triaging patients effectively during unexpected circumstances and outside the context of clinical research.

Objectives

To determine the diagnostic accuracy of point‐of‐care (POC) ultrasonography (US) in the detection and exclusion of:

free fluid in the thoracic or abdominal cavities;
organ injuries with or without bleeding in the thoracic or abdominal cavities;
vascular lesions of the thoracic or abdominal aorta, or other major vessels; and
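other injuries (e.g. pneumothorax);

compared to any objective diagnostic reference standard (i.e. computed tomography (CT; 'pan‐scan'), magnetic resonance imaging (MRI), thoracotomy, laparotomy, laparoscopy, thoracoscopy, autopsy, or any combination of these).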

Secondary objectives

The secondary objectives of this review are to investigate the influence of individual study and cohort characteristics such as the:

reference standard;
target condition;
patient age;
patient disease status: type of trauma, type of injury, haemodynamic stability, injury severity or probability of survival;
environment;
operator's expertise and background;
hardware; and
test thresholds;

on both positive and negative POC sonograms.

We provide more details about the above enumerated characteristics in Investigations of heterogeneity.

Methods

Criteria for considering studies for this review

Types of studies

We will include:

prospective or retrospective diagnostic cohort studies that enrolled patients with blunt trauma who:
1. underwent any type of POC US as primary imaging modality to screen for thoracoabdominal injuries; and
2. also underwent any type of objective imaging or invasive reference test to verify POC US results; and
studies that provide 2 x 2 tables (or sufficient information to tabulate results) to allow for calculating sensitivity, specificity, and other indices of diagnostic test accuracy either in the full manuscript or from authors after personal contact.

We will exclude:

diagnostic case‐control studies with known case status, case series and case reports: case‐control studies create an artificial population (usually healthy controls with known negative test results) and tend to overestimate sensitivity of the index test;
studies with unclear index or reference test; and
studies that do not allow for the creation of 2 x 2 tables.

Participants

The target population of this review is patients of any age and gender who sustained any type of blunt injury (including vehicle crashes, falls, mass casualties, and others) in a civilian scenario who were transferred to a hospital of any care level. Also, to be eligible, patients had to undergo POC US as the primary imaging tool, and be followed‐up either as inpatients or outpatients with different diagnostic modalities used to determine whether the condition of interest is present or absent.

Because of distinct differences in management, we will exclude patients with penetrating injuries, as well as members of the armed forces wounded on the battlefield.

Index tests

Any type of clinical POC US performed in a trauma setting (e.g. FAST ultrasonography of the abdomen or thorax, or both, or any extended US) intended to detect:

free fluid (as a surrogate of bleeding) or air in the abdominal or thoracic cavity, or both;
injuries to solid organs such as the liver or spleen (including attempts to grade their severity);
vascular lesions of major vessels; and
other injuries (e.g. pneumothorax, as indicated by air in the pleural space).

We will also consider sequential ultrasonographic imaging, accounting for the dynamics of haemorrhage after resuscitation and restoration of peripheral blood flow, the subsequent demarcation of contused tissue, and delayed rupture of solid organs.

Target conditions

This review will focus on blunt thoracoabdominal and multiple trauma, meaning any blunt, non‐penetrating force to the abdomen and chest, and its solid and hollow viscera, as well as its major vessels. Only a combination of surrogates and direct evidence enables an examiner to detect or exclude a blunt thoracoabdominal trauma, which is why the target conditions will not be regarded separately in the primary analysis. Target conditions considered by this review include:

free fluid in the:
1. thoracic cavity (uni‐ or bilateral, where specified);
2. abdominal cavity (by abdominal quadrant, where specified);
3. retroperitoneal space;
4. pericardium; or
5. mediastinum;
organ injuries, defined as:
1. liver injuries (e.g. capsular tears, haematoma, tissue lacerations);
2. splenic injuries (e.g. capsular tears, haematoma, tissue lacerations);
3. injuries to other solid organs (e.g. pancreas, kidneys);
4. injuries to hollow viscera; or
5. any other organ laceration detected by ultrasonography;
vascular lesions, defined as:
1. dissection or rupture of the thoracic or abdominal aorta, or both;
2. rupture of other vessels such as the iliac arteries;
other injuries (e.g. pneumothorax, as indicated by air in the pleural space).

Reference standards

To be accepted as an objective diagnostic reference standard, the deliberate use (and the reasoning for its use) of the particular method must be specified in the individual paper. To avoid verification bias, all patients must independently undergo an objective imaging or invasive test, regardless of the initial POC sonogram.

The following tests will be classified as objective reference standards, which will be used to confirm the target conditions.

Any type of CT scan of the major body cavities (i.e. chest, abdomen, pelvis), either selective or performed as a whole‐body (pan) scan. Depending on published information, or extra information provided by authors, we will stratify results for the use of intravenous or oral contrast agents, or both, and the time interval between POC US and CT scan.
Any type of MRI of the major body cavities. This may be a rare option that is still possible, specifically in children and pregnant women.
Laparotomy (by a median or transverse approach) or laparoscopy, either diagnostic or therapeutic.
Thoracotomy (by median sternotomy or a clamshell approach) or thoracoscopy, either diagnostic or therapeutic.
Autopsy, either done by pathologists or forensic examiners.

Search methods for identification of studies

We will develop a reproducible search strategy in major online databases based on recommendations of the Cochrane Diagnostic Test Accuracy (DTA) Group and a systematic review that we performed previously (Stengel 2005). We will seek the assistance and advice of the Cochrane Injuries Group and its Information Specialist to guarantee a search algorithm with high sensitivity. We will also request access to the Cochrane Injuries Group Specialised Register. We will also use a 'snowball' procedure to identify related articles and articles cited in the reference lists of individual publications.

Electronic searches

We will search the following electronic sources.

Ovid MEDLINE (1946 to present).
PubMed (not MEDLINE) (1947 to present).
Ovid Embase (1974 to present).

The MEDLINE strategy is shown in Appendix 1 and will be modified for use in the other databases.

Searching other resources

A systematic review by Scherer and colleagues showed that results from studies that have not been published in a full text format are systematically different from fully published results (Scherer 2007). Therefore, we will search the BIOSIS database for conference abstracts to identify potentially relevant studies that have not yet been published formally.

We will contact authors of individual studies by email, letter, or phone if we consider their results to be important but in need of further explanation or data. We will guarantee that any data exchange will comply with the International Conference on Harmonisation‐Good Clinical Practice (ICH‐GCP) and individual rules and regulations of data safety and security.

Data collection and analysis

At our institution, we employ standard operating procedures (SOP) for the selection of studies, data extraction and recording in the context of a systematic review and meta‐analysis. This includes the following functions.

Two authors working independently will screen titles, abstracts and full texts of study reports identified by the search strategy.
Use of a data extraction form (including individual study characteristics, individual patient profiles, definition of procedures etc.).
Dual assessment and data entry by independent authors.
Dual assessment of methodological quality of individual studies.
Resolution of conflicts by a third author.

This should guarantee transparency and adherence to Cochrane standards and other recommendations (e.g. those issued by the EQUATOR (Enhancing the QUAlity and Transparency Of health Research) group).

Selection of studies

Two authors will independently screen titles and abstracts of the study reports identified. Details of the selected studies will be documented in a predefined electronic spreadsheet. The studies will be assessed for eligibility in terms of the defined inclusion and exclusion criteria. If it is not possible to make a decision based on title and abstract alone, the full texts of the potentially relevant studies will be assessed according to the inclusion criteria. Any disagreement between the two authors regarding the selection of studies will be resolved by a third author. The whole process of study selection will be documented in a detailed flow chart.

Data extraction and management

As stated above, we established SOP for data extraction in systematic reviews, meta‐analyses, and health technology assessment (HTA) reports. While working on this review we will adhere to ICH‐GCP, Good Epidemiological Practice (GEP), and all other relevant rules and recommendations. We have trained personnel on site to record, manage and audit data, and our data storage complies with national legislation on data safety for research purposes. Data from original papers will be extracted in duplicate by two independent authors and discrepancies will be resolved by discussion, as moderated by a third author. The following information will be extracted.

Study characteristics (e.g. author, year of study, year of publication, journal reference, study design, inclusion/exclusion criteria, operator characteristics, hardware specifications, index test used, reference test used, general setting (urban/rural), mass casualty (yes/no)).
Patient characteristics (e.g. age, gender, type of trauma, type of injury, injury severity, haemodynamic stability, probability of survival).
The outcome of the index test as assessed in the individual studies by diagnosing the target condition and, if available, the number of participants with inconclusive results or who had no test result.
Values of diagnostic 2 x 2 tables cross‐classifying the disease status on the basis of the reference test (number of true positive, false positive, false negative and true negative).

Diagnostic accuracy will be expressed by individual and pooled indicators such as sensitivity and specificity with 95% confidence intervals (CI) or regions (CR), positive and negative likelihood ratios (LR), and the summary receiver operating characteristic curve (SROC).
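
As a minimal sketch of how the per‐study indicators named above follow from a single 2 x 2 table, the Python snippet below computes sensitivity, specificity, and positive and negative LRs, and converts a prior probability into a post‐test probability. The function names, the example counts, and the choice of the Wilson score method for the 95% CI are illustrative assumptions; the protocol does not specify a CI method, and pooled estimates require the bivariate model described later rather than this arithmetic.

```python
import math


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a proportion (CI method is an assumption)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return centre - half, centre + half


def accuracy_from_2x2(tp, fp, fn, tn):
    """Sensitivity, specificity and likelihood ratios from one 2 x 2 table."""
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return {
        "sensitivity": sens,
        "sensitivity_ci": wilson_interval(tp, tp + fn),
        "specificity": spec,
        "specificity_ci": wilson_interval(tn, tn + fp),
        # LR+ = sens / (1 - spec); treated as infinite without false positives
        "lr_pos": sens / (1 - spec) if fp > 0 else math.inf,
        # LR- = (1 - sens) / spec; treated as infinite without true negatives
        "lr_neg": (1 - sens) / spec if tn > 0 else math.inf,
    }


def post_test_probability(prior, lr):
    """Convert a prior probability to a post-test probability via odds x LR."""
    if not 0 < prior < 1:
        raise ValueError("prior must lie strictly between 0 and 1")
    if math.isinf(lr):
        return 1.0
    odds = prior / (1 - prior) * lr
    return odds / (1 + odds)


# Hypothetical counts, for illustration only (not data from any study).
example = accuracy_from_2x2(tp=80, fp=5, fn=20, tn=95)
after_positive_scan = post_test_probability(0.20, example["lr_pos"])
after_negative_scan = post_test_probability(0.20, example["lr_neg"])
```

With these hypothetical counts, a positive scan raises a prior probability of 20% substantially, whereas a negative scan lowers it only moderately, which is the asymmetry that the likelihood ratios are meant to expose.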

Assessment of methodological quality

We will use the QUADAS‐2 tool to assess the methodological quality of individual studies (Whiting 2011). This will be accomplished by two independent authors. Discrepancies will be resolved by discussion, as moderated by a third author. QUADAS‐2 is the revised and updated version of the Quality Assessment of Diagnostic Accuracy Studies list (QUADAS). It includes four domains, namely: patient selection, index test, reference standard, and flow and timing. Each domain is assessed in terms of risk of bias and rated as 'low', 'high' or 'unclear'. Concerns regarding applicability are assessed only for the first three domains and categorised into 'low', 'high', or 'unclear'. Signalling questions can be answered with 'yes', 'no', or 'unclear'.

In order to improve the reliability of the methodological assessment using the QUADAS‐2 tool, review‐specific guidance is available to the authors on how to assess each signalling question in terms of risk of bias and regarding concerns of applicability (Appendix 2).

Statistical analysis and data synthesis

If at least one of the target conditions is detected, we will consider the patient to be diseased, but otherwise we will consider the patient to be non‐diseased. Therefore we will not evaluate the accuracy of single target conditions separately in the primary analysis. For individual studies, we will calculate sensitivity and specificity with their 95% CIs, tabulate the pairs of sensitivity and specificity with CIs and depict these estimates by coupled forest plots using Review Manager 5.3 (RevMan 5) (Review Manager 2014).
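
The composite rule above (a patient counts as diseased if at least one target condition is detected) can be sketched as follows. The record layout, the condition labels, and the four example patients are hypothetical and serve only to show how per‐patient findings would collapse into one 2 x 2 table per study.

```python
# Labels for the four groups of target conditions (names are illustrative).
TARGET_CONDITIONS = ("free_fluid", "organ_injury", "vascular_lesion", "other_injury")


def any_target_condition(findings):
    """True if at least one target condition is flagged in the findings dict."""
    return any(findings.get(condition, False) for condition in TARGET_CONDITIONS)


def cross_classify(patients):
    """Collapse per-patient findings into a single 2 x 2 table."""
    table = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for patient in patients:
        index_positive = any_target_condition(patient["index"])
        reference_positive = any_target_condition(patient["reference"])
        if index_positive and reference_positive:
            table["tp"] += 1
        elif index_positive:
            table["fp"] += 1
        elif reference_positive:
            table["fn"] += 1
        else:
            table["tn"] += 1
    return table


# Hypothetical patients: POC US findings versus reference standard findings.
patients = [
    {"index": {"free_fluid": True}, "reference": {"organ_injury": True}},
    {"index": {}, "reference": {"free_fluid": True}},
    {"index": {"other_injury": True}, "reference": {}},
    {"index": {}, "reference": {}},
]
table = cross_classify(patients)
```

Note that under this rule the first patient is a true positive even though the index test and the reference standard flagged different conditions, which is a direct consequence of not evaluating single target conditions separately.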

We will assume no changing thresholds for test positivity in diagnosing thoracoabdominal injuries between studies. Nevertheless, we will assess a possible threshold effect visually by plotting the ‘true positive rate’ (sensitivity) from each study against the ‘false positive rate’ (1 − specificity) and estimating the degree of closeness of study results to the SROC. If indicated, we will evaluate the presence of thresholds further in the heterogeneity analysis.

Given that implicit variation may occur, since individual observers may interpret the criteria slightly differently, we will use the bivariate random‐effects model according to Reitsma 2005 to pool summary estimates of sensitivity and specificity. The random‐effects approach allows the calculation of average accuracy estimates while dealing with the heterogeneity between the included studies that is presumed to exist. Average operating points will be estimated via the bivariate model using the user‐written program ‘metandi’ in Stata and visualised in RevMan 5. Summary points for sensitivity and specificity, including 95% CRs and positive/negative LRs, will be reported and interpreted in relation to each other.
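
For orientation, one common way of writing the bivariate random‐effects model is sketched below in its binomial‐likelihood form. The notation is an assumption introduced here and is not taken from the protocol: for study i, n_{1i} and n_{0i} denote the numbers of patients with and without the target condition according to the reference standard, and Se_i and Sp_i are the study‐specific sensitivity and specificity.

```latex
\begin{aligned}
TP_i \mid Se_i &\sim \mathrm{Binomial}(n_{1i},\, Se_i), \\
TN_i \mid Sp_i &\sim \mathrm{Binomial}(n_{0i},\, Sp_i), \\
\begin{pmatrix} \operatorname{logit}(Se_i) \\ \operatorname{logit}(Sp_i) \end{pmatrix}
&\sim \mathcal{N}\!\left(
\begin{pmatrix} \mu_A \\ \mu_B \end{pmatrix},
\begin{pmatrix} \sigma_A^2 & \sigma_{AB} \\ \sigma_{AB} & \sigma_B^2 \end{pmatrix}
\right)
\end{aligned}
```

In this notation the summary sensitivity and specificity are the inverse logits of \mu_A and \mu_B, and the covariance \sigma_{AB} captures the correlation between sensitivity and specificity across studies.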

Investigations of heterogeneity

As a first step, we will assess heterogeneity visually by means of the coupled forest plots and plots of study results in ROC space that are described in Statistical analysis and data synthesis. Secondly, we will investigate potential sources of heterogeneity by adding covariates to the basic bivariate random‐effects model, provided that an adequate number of studies that report the characteristics to be examined is available. Fitting of the bivariate model will be conducted via the built‐in command ‘xtmelogit’ in Stata (Takwoingi 2016).

We will inspect the following potential sources of heterogeneity.

Reference standard (CT versus MRI versus intraoperative findings versus autopsy).
Target condition (free fluid/free air versus organ injuries/vascular lesions).
Patient age (< 18 versus ≥ 18 years of age).
Patient disease status: type of trauma (multiple trauma (ISS ≥ 16) versus isolated thoracoabdominal trauma), type of injury (e.g. chest trauma versus abdominal trauma), haemodynamic stability (stable versus unstable), injury severity (as defined by the ISS, Abbreviated Injury Scale (AIS), New Injury Severity Score (NISS), or other schemes) or probability of survival (predicted by the Trauma Score ‐ Injury Severity Score (TRISS), Revised Injury Severity Classification (RISC) I or II, or other schemes) as specified in individual studies.
Environment (general setting (urban versus rural), mass casualty (yes or no)).
Operator's expertise (e.g. expressed as the number of sonographies performed) and background (e.g. medical sonographer, radiologist, surgeon).
Hardware (e.g. manufacturer, machine generation, transducer shape and frequency).
Test thresholds (e.g. due to different evaluation of inconclusive results).

The effect of adding each of these covariates will be investigated by a likelihood ratio test, which compares the ‐2Log Likelihood of the basic bivariate model with that of the model including the covariate. If a significant reduction in the ‐2Log Likelihood is detected, as indicated by a P value of less than 0.05 for the Chi² statistic, we will conclude that test performance is associated with the covariate. A significant result calls for an in‐depth analysis to determine whether the covariate is associated with expected sensitivity, specificity, or both (Macaskill 2011). Therefore, we will fit further models by removing the covariate terms for either sensitivity or specificity, and compare the fit of each alternative model using likelihood ratio tests.
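
The likelihood ratio test described above reduces to simple arithmetic once the two ‐2Log Likelihood values are available from the fitted models. The sketch below is illustrative only: the function names and the example values are hypothetical, and the degrees of freedom are an assumption (2 when a single binary covariate is added to both sensitivity and specificity, 1 when the term for only one of them is removed). It uses the closed‐form Chi² tail probabilities for 1 and 2 degrees of freedom to avoid external dependencies.

```python
import math


def chi2_upper_tail(statistic, df):
    """Upper-tail Chi-squared probability; closed forms for df = 1 or 2 only."""
    if statistic < 0:
        raise ValueError("statistic must be non-negative")
    if df == 1:
        return math.erfc(math.sqrt(statistic / 2))
    if df == 2:
        return math.exp(-statistic / 2)
    raise ValueError("this sketch supports df = 1 or df = 2 only")


def likelihood_ratio_test(neg2ll_basic, neg2ll_with_covariate, df, alpha=0.05):
    """Compare nested models by the reduction in -2 log likelihood."""
    # Numerical noise can yield a tiny negative difference; clamp it at zero.
    statistic = max(neg2ll_basic - neg2ll_with_covariate, 0.0)
    p_value = chi2_upper_tail(statistic, df)
    return statistic, p_value, p_value < alpha


# Hypothetical -2 log likelihoods, not results from any fitted model.
statistic, p_value, significant = likelihood_ratio_test(250.0, 242.0, df=2)
```

With these made‐up values the reduction of 8.0 on 2 degrees of freedom corresponds to a P value of about 0.018, so the covariate would be carried forward to the follow‐up models for sensitivity and specificity.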

Sensitivity analyses

We will perform sensitivity analyses to investigate the effect of study quality on the diagnostic accuracy of the initial ultrasound examination, separately for each of the four QUADAS‐2 key domains. From each domain‐specific analysis, we will exclude every study considered to be at high or unclear risk of bias, or judged to raise high or unclear concerns regarding applicability, in the respective domain. By comparing the accuracy estimates from the remaining studies with those of the primary analysis, which includes the lower‐quality studies, we will test the robustness of the primary analysis results. Additional sensitivity analyses may be identified during the review process as study‐specific issues emerge.
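
The domain‐specific exclusion rule can be sketched as a simple filter over QUADAS‐2 judgements. The data layout, the study identifiers, and the ratings below are hypothetical; the only elements taken from the protocol are the four domains, the 'low'/'high'/'unclear' ratings, and the fact that applicability is judged for the first three domains only.

```python
DOMAINS = ("patient_selection", "index_test", "reference_standard", "flow_and_timing")
# Applicability concerns are rated only for the first three domains.
APPLICABILITY_DOMAINS = DOMAINS[:3]


def retained_for_domain(studies, domain):
    """Return ids of studies rated 'low' for bias (and applicability) in a domain."""
    if domain not in DOMAINS:
        raise ValueError(f"unknown QUADAS-2 domain: {domain}")
    kept = []
    for study in studies:
        low_bias = study["risk_of_bias"][domain] == "low"
        if domain in APPLICABILITY_DOMAINS:
            low_concern = study["applicability"][domain] == "low"
        else:
            low_concern = True  # no applicability rating for flow and timing
        if low_bias and low_concern:
            kept.append(study["id"])
    return kept


# Hypothetical QUADAS-2 ratings for two made-up studies.
studies = [
    {
        "id": "Study A",
        "risk_of_bias": {d: "low" for d in DOMAINS},
        "applicability": {d: "low" for d in APPLICABILITY_DOMAINS},
    },
    {
        "id": "Study B",
        "risk_of_bias": {**{d: "low" for d in DOMAINS}, "index_test": "unclear"},
        "applicability": {**{d: "low" for d in APPLICABILITY_DOMAINS},
                          "patient_selection": "high"},
    },
]
retained = {domain: retained_for_domain(studies, domain) for domain in DOMAINS}
```

Each list in `retained` identifies the studies that would enter the corresponding domain‐specific meta‐analysis, whose estimates are then compared with the primary analysis.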

Assessment of reporting bias

We will not assess reporting bias, as there are no commonly accepted tests available for diagnostic test studies. Existing tests are likely to result in publication bias being incorrectly indicated or have low power due to the heterogeneity in diagnostic test accuracy (Deeks 2005).
