
Ultrasound for diagnosis of birth weight discordance in twin pregnancies

This is not the most recent version


Abstract

This is a protocol for a Cochrane Review (Diagnostic test accuracy). The objectives are as follows:

Our primary objective is to determine the diagnostic accuracy (sensitivity and specificity) of different estimated biometric ultrasound measures using 30% thresholds with the actual birth weight discordance (BWD) as the reference standard (e.g. abdominal circumference (AC) compared with actual BWD) and to provide readers with a summary of the diagnostic accuracy of ultrasound measures for the diagnosis of BWD in twin gestations.

Background

Target condition being diagnosed

Birth weight discordance (BWD), defined as a difference in the birth weights of twins, is a well‐documented phenomenon in twin pregnancies (Bagchi 2006; Mahony 2006). Growth discordance may occur as a physiological BWD, when both twins are appropriate for gestational age, or a pathological BWD when at least one twin is small for gestational age (Appleton 2007).

BWD is one of the major risk factors for adverse fetal, neonatal and maternal outcomes (Demissie 2002; Kilic 2006; Wen 2006). Demissie et al. found that the odds ratio of fetal death varied from 1.26 to 12.75 and the odds ratio of neonatal death varied from 1.02 to 3.43, depending on the degree of birth weight difference. Kilic and colleagues reported higher frequency of mortality, sepsis, polycythaemia, hypoglycaemia, anaemia and respiratory distress syndrome among discordant twins. Wen et al. reported higher odds of maternal hypertension, eclampsia, and other medical complications associated with intratwin BWD. Depending on the chorionicity, degree of discordance and the threshold used, BWD occurs in about 15% to 30% of twin gestations (Lewi 2008; Lopriore 2012; Miller 2012). Fetal growth abnormalities are associated with multiple conditions that affect fetal or neonatal well‐being, such as twin‐to‐twin transfusion (TTT), chromosomal aberrations or structural defects. Excluding these conditions, discordant growth by itself is recognised as an independent factor associated with adverse perinatal outcomes (Mazhar 2010; Morin 2011; Suzuki 2009). BWD of 20% and 30% are the most common thresholds used in the literature to identify adverse perinatal outcomes (Jahanfar 2016a). A recent study suggested that BWD of greater than or equal to 30% is significantly associated with adverse perinatal outcomes irrespective of chorionicity (Jahanfar 2016b).

Several mechanisms have been proposed for growth discordance of fetuses exposed to the same intrauterine environment (Victoria 2001). For monochorionic (MC) twin pregnancies, the mechanism is explained through conditions such as TTTs and intrauterine growth retardation (IUGR). These two conditions result in greater risk of fetal and neonatal mortality compared to dichorionic (DC) twins. Technology is available to manage TTT including amnio‐reduction, and septostomy (Behrendt 2016), hence monitoring is usually started as early as 20 weeks.

In the case of DC twin pregnancies, IUGR and placenta pathology can cause growth discordance. Abnormal cord insertion and insufficient placenta implantation in the uterus wall are other proposed mechanisms for BWD. These conditions are non‐treatable; hence the main management is to determine fetal well‐being and decide the optimal time for delivery to save the growth discordant twins.

There is a higher rate of morbidity for discordant twins, compared with concordant twin pairs (odds ratio 5.69; 95% confidence interval (CI) 3.24 to 10.00) (Appleton 2007). If such discordance increases to more than a 30% weight difference, and/or if TTT is suspected, the clinician has to make a decision as to whether or not to deliver the twins to avoid fetal morbidities or even death. The clinical relevance of other BWD thresholds (e.g. 25%, 20%) and of ultrasound measures used to estimate fetal weight (e.g. biparietal diameter) in relation to adverse outcomes has not been reviewed systematically to date. Therefore we aim to identify the diagnostic accuracy of the available ultrasound measures used to determine estimated BWD of various thresholds in twin pregnancies.

Index test(s)

Ultrasound has been an important tool in the diagnosis of BWD since 1972 (Hoopmann 2011; Woo 1939). Diagnostic ultrasound is a sophisticated electronic technology, which utilises pulses of high frequency sound to produce an image. In the past, the only technology available was abdominal palpation, which is extremely poor for detecting growth discordance. Radiological examination is not recommended since it is not safe for the fetus.

Diagnosis of BWD can be done via three modalities of ultrasound:

  1. a series of diagnostic two‐dimensional (2D) ultrasound examinations to assess fetal growth of both twins, identify chorionicity, and diagnose problems with cord, amnion layers, congenital abnormalities and TTT;

  2. Doppler ultrasound to detect abnormal blood flow patterns in fetal/placenta circulation which may indicate poor fetal prognosis (Alfirevic 2003);

  3. three‐dimensional (3D) ultrasound to facilitate the assessment of the placenta, such as surface‐rendered imaging and volume measurement, quantitative and qualitative assessments of the vascularisation and blood flow of the placenta (Hata 2011).

The standard diagnostic test for growth discordance is ultrasound. Growth discordance can be estimated by measuring crown rump length (CRL: the distance measured from the top of the head to the bottom of the buttocks) at or before 12 weeks of gestation (Tai 2007). Later, during the second and third trimesters, other ultrasound measures such as biparietal diameter (BPD), abdominal circumference (AC) and femur length (FL) are used to estimate fetal weight and calculate the degree of BWD (Simoes 2011).

It is still unclear whether first or second trimester findings can accurately predict BWD, what sonographic parameters assessed at each trimester are more reliable to assess for discordance and whether any particular ultrasound modality (e.g. 2D or 3D) is superior to the others.

Apart from BWD, information on chorionicity is essential for the management of a twin pregnancy for the following reasons.

  1. MC twins are at greater risk of fetal morbidity and mortality due to shared vascularisation.

  2. If monochorionicity is suspected at early ultrasound screening, subsequent screening should be serially scheduled during the second and third trimester.

Early ultrasound detection narrows the differential diagnosis for the underlying causes of BWD, such as placental sharing and subsequent vascular anastomosis, that may cause fetal growth to diverge. A better prognosis is expected for DC twin pregnancies (Miller 2012) as they do not share placentas.

Second trimester ultrasound screening, from 16 to 24 weeks’ gestation, provides other measures such as AC, amniotic fluid maximal and vertical pocket and the identification of dividing membrane, umbilical artery and potential for Doppler studies (Giles 1998; Rizzo 1993).

Third trimester ultrasound, after 24 weeks' gestation, aims at identifying discordance or insufficient fetal growth. Identification of a discordant twin is crucial if one of the twin pairs is small‐for‐gestational age rather than both being appropriate for gestational age.

It should be noted that there is overlap in the use of these diagnostic tests from one trimester to another.

Clinical pathway

In the management of a twin pregnancy, evaluation of fetal growth is particularly important because growth restriction and prematurity are major causes of the higher morbidity and mortality rates reported in twins compared with singleton pregnancy (Adegbite 2004). Most clinicians start monitoring MC twins to look for growth delay and discordant growth from 18 weeks' gestation every two to three weeks. Each twin's growth, amniotic volume and fetal bladder volume are monitored for signs of oligohydramnios or polyhydramnios. On the other hand, in DC twins, fetal growth monitoring by ultrasound usually begins after 20 weeks of gestation every four to six weeks as fetal growth deceleration leading to discordance is optimally detected between 20 and 28 weeks of gestation (Gonzalez 2003).

There is no consensus regarding the optimum threshold for defining discordance in twins, but many studies have indicated that discordance ranging from 15% to 40% has been considered predictive of an adverse outcome (Hartley 2002).

When discordant growth is identified in the absence of TTT and congenital abnormalities, intensive fetal monitoring of fetal well‐being is employed until delivery becomes indicated. In mildly discordant twins, expectant management by frequent fetal assessment is preferable to preterm delivery, given the limitation of ultrasound diagnosis of growth discordance. In one recent large non‐randomised study of 2399 twin pregnancies (D'Antonio 2014), the sonographic diagnosis of discordance greater than 25% was a poor predictor of BWD, fetal loss after 22 weeks' and 28 weeks' gestation, perinatal death and preterm birth before 34 weeks of gestation. In complicated twin pregnancies, in particular those with fetal growth restriction and discordant growth, fetal well‐being assessment by a non‐stress test (NST), biophysical profile (BPP), amniotic fluid index (AFI) measurement and Doppler velocimetry may be useful to identify which fetuses would benefit from early delivery (Devoe 1995).

Alternative test(s)

None.

Rationale

Guidelines provide conflicting advice about the time, frequency and type of measures reliable to diagnose growth discordance. According to the American College of Obstetricians and Gynaecologists (ACOG) (Wenstrom 2004), growth discordance is defined as a difference of abdominal circumference (AC) of 20 mm or estimated fetal weight (EFW) difference of 20%. The Society of Obstetricians and Gynaecologists of Canada (Okun 2000) recommends that the EFW be derived from bi‐parietal diameter with AC or a combination of AC and femur length (FL). ACOG does not provide recommendations on the frequency of ultrasound examinations for twins while the green guideline from the Royal College of Obstetricians and Gynaecologists (Packham 2011) advocates screening ultrasound for monochorionic (MC) twins every two to three weeks from 16 weeks’ gestation onwards.

For dichorionic (DC) twins, the issues related to feto‐placental perfusion leading to growth discordance can be clarified by screening ultrasound. However, there are no guidelines on the optimal gestation at which to start screening, or the frequency and gestations for subsequent screenings. Moreover, gender is found to be an important factor that interacts with birth weight discordance (BWD) and perinatal outcomes (Melamed 2009; Miller 2012). Identifying the gender mix by ultrasound improves predictability of prenatal outcomes (Di 2007). Thus the accuracy of ultrasound to identify gender is also crucial. As a part of this review we will investigate the impact of gender on BWD as a confounding factor.

The variation in recommendations made in the guidelines arises from the fact that estimating abnormal growth is challenging. It is not clear if the prenatal prediction of a critical BWD by fetal weight estimation formulae ((birth weight of heavier twin ‐ birth weight of lighter twin)/birth weight of heavier twin) two to three weeks prior to delivery (Caravello 1997) provides an accurate estimation of BWD (Kalish 2003; Klam 2003; Van Mieghem 2009). The pooled detection rate of BWD of ≥ 25%, in three studies, was 63% for a false positive rate of 2% (Caravello 1997; Diaz‐Garcia 2010; Gernt 2001).

The existing uncertainties are further complicated by an inconclusive predictive value of early (first‐second trimester) versus late detection (two to three weeks prior to delivery) of growth discordance, use of different sonographic estimates (AC versus EFW), and reliance on the retrospective nature of study designs and small sample sizes of existing literature. Literature suggests that biometric measurements of BWD at early gestation or during the second trimester have significantly different precisions (Banks 2008; Tai 2007). Moreover, the efficacy of a single biometric measurement, such as CRL or AC with or without other measurement(s), in predicting BWD, is controversial (Banks 2008; Bhide 2009; Chamberlain 1991; Chitkara 1985). The most popular current methods for predicting discordant growth in twin gestations have limited accuracy when held to a standard for discordance that requires a birth weight difference of at least 20% (Caravello 1997) as sonography tends to underestimate the degree of discordance (Chang 2006).

Ideally, a diagnostic test is expected to correctly identify all patients with the assessed condition and to exclude all patients without it; that is, to have a sensitivity and specificity of 100%. In practice, however, it is extremely rare to find a test with equally high sensitivity and specificity, which is similar or close to the current gold standard (actual BWD measure). For most tests there is usually a trade‐off between the measures. An approach to the test might vary depending on the clinical context, which relates to the consequences of missed diagnosis and magnitude of subsequent interventions if the test is positive. If clinical priority would be to avoid missed diagnosis, an adequate test in that case would be expected to have a high sensitivity (low false negative results), with lower specificity (higher number of false positive results). Thus, no woman with a growth discordant twin pregnancy is missed by a test, but some will still get unnecessary interventions (intensive surveillance, early delivery, consequential issues pertaining to prematurity such as admission to the neonatal intensive care unit, costs associated with care for the premature baby and increased distress for parents). An alternative approach gives priority to a test that avoids unnecessary invasive interventions. In this scenario, the emphasis should be on a high specificity (close to 100%) with lower sensitivity (preferably above 50%), which will definitely rule out growth discordance of the twins, but will not detect some women with this condition. Ruling out growth discordance will eliminate the need for further clinical intervention. We believe that a combination of both types of test can be useful in a clinical setting as it helps with efficient patient counselling and subsequent management. Improving the selection of patients who might benefit from early delivery would be of considerable benefit to neonates, mothers, families and to the community as a whole.

Objectives

Our primary objective is to determine the diagnostic accuracy (sensitivity and specificity) of different estimated biometric ultrasound measures using 30% thresholds with the actual birth weight discordance (BWD) as the reference standard (e.g. abdominal circumference (AC) compared with actual BWD) and to provide readers with a summary of the diagnostic accuracy of ultrasound measures for the diagnosis of BWD in twin gestations.

Secondary objectives

  1. To assess the sensitivity and specificity of available diagnostic ultrasound tests in subgroups of twin pregnancies at various gestational ages by week (less than 28 weeks, 28 to 32 weeks, 32 to 36 weeks and more than 36 weeks).

  2. To assess the sensitivity and specificity of each diagnostic ultrasound test in twin pregnancies with same‐sex versus opposite‐sex twins.

  3. To assess the sensitivity and specificity of each diagnostic ultrasound test in dichorionic diamniotic versus monochorionic diamniotic twins.

We anticipate the following potential sources of heterogeneity.

  1. Clinical factors: characteristics of study population (gestational age, chorionicity, inclusion of high risk pregnancy complicated by underlying maternal‐fetal conditions, breast feeding).

  2. Methodological factors: study design (patient selection, prospective versus retrospective studies, time of test performance (time between index test and reference standard), clinical settings (tertiary centre versus community health care), multiple testing versus single testing for diagnosis of high risk pregnancy complicated by maternal‐fetal conditions).

  3. Other factors: geographic area (high‐, middle‐ and low‐income countries), year of publication.

We will try to address these issues based on the number of studies available. Furthermore, observer variability bias or bias related to interpretation of results cannot be formally assessed in the context of this review.

Methods

Criteria for considering studies for this review

Types of studies

We will review published peer‐reviewed studies evaluating the accuracy of biometric measurements at ultrasound scanning of twin pregnancies that have been proposed for the diagnosis of estimated BWD, compared to BWD measurements after birth as a reference standard. We will include studies published in any language and performed in any healthcare setting, and will not limit the number of participants. We will include cross‐sectional study designs.

Participants

Twin pregnancies with ultrasound measurements at any stage of pregnancy. The twin pregnancies will include any type of chorionicity, any type of conception, and any maternal age and body mass index. Pregnancies that include inappropriate comparisons that are likely to distort an assessment of the diagnostic value of antenatal ultrasound will be excluded (e.g. pregnancies that include twins with one stillbirth, or other multiple pregnancies such as triplets or quadruplets). Only participants that had both the index test and reference standard reported will be included.

Index tests

Estimated fetal weight discordance (EFWD) will be estimated from ultrasound measures by using the formula: ((larger estimated weight ‐ smaller estimated weight)/larger estimated weight)*100. EFWD will be considered at the threshold of equal to or greater than 30% as this threshold has shown significant impact on adverse fetal and perinatal outcomes (Jahanfar 2016a). Any type of biometric measurements assessing EFWD will be considered, when either used as a single measurement, combinations or defined formulas, inclusive of CRL, BPD, AC and FL. The measurements are considered eligible if performed by using transabdominal or transvaginal ultrasound machines of any brand based on 2D or 3D ultrasound methods. The data on experience of the operators and inter‐/intra‐observer variability will be collected and incorporated in the quality assessment tool. Studies that do not report data in sufficient detail to construct 2x2 tables, and where this information is not available from the primary investigators, will be excluded.
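As a minimal sketch of the discordance calculation described above, the following Python helpers compute the percentage difference relative to the larger twin and apply the 30% threshold. The function names and the example weights are illustrative assumptions and are not part of the protocol.

```python
def discordance_percent(weight_a: float, weight_b: float) -> float:
    """Percentage weight discordance relative to the larger twin."""
    larger, smaller = max(weight_a, weight_b), min(weight_a, weight_b)
    if larger <= 0:
        raise ValueError("weights must be positive")
    # ((larger - smaller) / larger) * 100, multiplying first so that
    # round numbers such as exactly 30% are not lost to rounding.
    return 100.0 * (larger - smaller) / larger


def is_discordant(weight_a: float, weight_b: float, threshold: float = 30.0) -> bool:
    """True when discordance is equal to or greater than the threshold (in %)."""
    return discordance_percent(weight_a, weight_b) >= threshold


# Hypothetical estimated fetal weights in grams.
print(discordance_percent(2000, 1300))  # 35.0
print(is_discordant(2000, 1500))        # False (25% is below the 30% threshold)
```

The same calculation applies to any pair of weights, whichever twin is listed first, because the larger weight is always used as the denominator.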

Target conditions

The target condition is BWD in twin pregnancies. It occurs when there is a disparity in birth weight between the larger and smaller infants of a twin set. For this review, studies that have investigated growth discordance of equal to or more than 30% will be taken into consideration.

Reference standards

The reference standard is 10% BWD measured at birth. BWD is calculated by using the formula: ((larger birth weight ‐ smaller birth weight)/larger birth weight)*100 and will be categorised as those below or above a 10% threshold. Birth weight will be accepted as estimated by using either electronic or mechanical scales of any type (bench‐top, portable, hanging, compact or not identified) from any manufacturer. Measurements will be considered if performed in the hospital (labour ward, nursery or newborn intensive care unit) by trained medical personnel (doctor, nurse, midwife, paramedic). We will include only the measurements performed within seven days of birth. Where available, the data on scale calibration, and whether baby was wet or dried before weighing will be recorded and incorporated in the QUADAS‐2 quality assessment tool. The measurements including baby length and head circumference will not be used in this review.

Search methods for identification of studies

A comprehensive search of multiple sources for eligible studies will be adopted, which will minimise the risk of reporting bias. However, publication bias generally arises when studies have a greater chance of being published if their results are positive. Therefore, we are planning to search unpublished and published study databases and conference proceedings, and evaluate identified sources.

The search strategy will be developed by a librarian following the recommendations in the Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy (de Vet 2008). The searches will not be limited to particular types of study design or have language or publication date restrictions. The search strategy will incorporate words in the title, abstract, text words across the record and the subject headings. The preliminary search strategies for each database are presented in Appendix 1.

Electronic searches

We will search the following electronic databases from inception to present.

  • Cochrane Central Register of Controlled Trials (CENTRAL) via Cochrane Library

  • MEDLINE via Ovid (from 1946 to current)

  • Embase via Ovid (from 1980 to current)

  • CINAHL (from 1982 to current)

  • ISI Web of Science Core Collection (from 1900 to current)

  • Trip Database (from 1997 to current)

  • PubMed Systematic Reviews subset (from 1946 to current)

  • DARE and NHS EED via the University of York (1994)

  • HTA (2003) and Prospero via the University of York (2011)

Searching other resources

Additional searches will include:

  1. a handsearch of the Australasian Journal of Ultrasound in Medicine (2009) and the Canadian Journal of Medical Sonography (2013), and of the reference lists of all the included studies and the seminal reviews from the field; and

  2. communication with at least 5 experts in the field asking them to review our reference list and identify any studies that may be missing.

Data collection and analysis

Data extraction and handling, assessment of methodological quality and statistical analyses will be performed based on the recommendations of the Diagnostic Test Accuracy (DTA) group and their internet‐based tutorials (http://methods.cochrane.org/sdt/dta‐author‐training‐online‐learning).

Selection of studies

One review author will scan the titles of studies identified by our search to remove any clearly irrelevant articles and further, will scan the titles and abstracts of the remaining studies to select potentially‐relevant articles. Two review authors will independently review full‐text versions of the articles selected by title and abstract and assess their eligibility for inclusion. Any disagreements will be resolved by discussion and, if necessary, with a third review author, who is an expert in the field and in methodological aspects of Cochrane systematic reviews.

When we identify the reports that update previous publications of the same study population at different recruitment points, the earlier records will be classified as excluded. The most complete data set that superseded previous publications will be used, in order to avoid double counting participants or studies.

Missing data will be retrieved by contacting the authors of identified studies directly, in order to clarify the study eligibility. When potentially‐relevant studies are found in languages other than English, a translation will be arranged where possible.

For excluded studies, we will document the reasons for exclusion with details of which criteria were not met. The characteristics of included and excluded studies and studies awaiting classification will be presented under 'Characteristics of included studies', 'Characteristics of excluded studies' and 'Characteristics of studies awaiting classification', respectively.

A single failed eligibility criterion is sufficient for a study to be excluded from the review.

Data extraction and management

We will use a structured, piloted data‐extraction form to extract data from included studies. Two review authors will independently extract study characteristics. Disagreements will be solved by discussion. If disagreement persists, a third review author will resolve the issue. We will extract information on: author, year of publication, journal; study design; timing of data collection (prospective, retrospective); setting (inpatients, outpatients); study population (age, parity, obesity); type of index test and reference standard and data on index and reference test operators. The reported number of true positives (TP), false negatives (FN), true negatives (TN) and false positives (FP) will be used to construct a two‐by‐two (2 x 2) table for each index test. If these values are not reported, we will attempt to reconstruct the 2 x 2 tables from the summary estimates presented in the article. Data will be extracted into Review Manager (RevMan 2014) software, which is used to graphically display the quality assessment, the diagnostic estimates data and the descriptive analyses.

Assessment of methodological quality

We will use the Quality Assessment of Diagnostic Test Accuracy Studies‐2 (QUADAS‐2) to assess the methodological quality of included studies. The QUADAS‐2 tool will be applied in four phases: summarise the review question, tailor the tool and produce review‐specific guidance, construct a flow diagram for the primary study, and judge bias and applicability (Whiting 2011). Each paper will be judged as having a 'low', 'high' or 'unclear' risk for each of four domains, and concerns about applicability will be assessed in three domains. The review‐specific QUADAS‐2 tool and explanatory document are presented in Appendix 2. Two review authors will independently pilot the review‐specific tool to rate three of the included studies. The tool will be utilised if a high level of agreement is achieved at the pilot stage. Two review authors will independently apply the QUADAS‐2 tool to the full text of each study. Disagreements will be resolved by discussion, or if needed, by a third review author. RevMan software will also be used to construct methodological quality summary graphs.

We will consider studies as having low methodological quality when classified at high or unclear risk of bias or at high concern regarding applicability in at least one domain. The assessment of methodological quality will be undertaken for each domain but a summary score to estimate the overall quality of studies will not be calculated (Whiting 2011).
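The classification rule above can be sketched in Python as follows. The domain names follow the published QUADAS‐2 tool (Whiting 2011); the dictionary layout, the function name and the example ratings are assumptions made for illustration only, and the review‐specific tool in Appendix 2 takes precedence.

```python
# QUADAS-2 domains: four for risk of bias, three for applicability concerns.
BIAS_DOMAINS = ("patient_selection", "index_test", "reference_standard", "flow_and_timing")
APPLICABILITY_DOMAINS = ("patient_selection", "index_test", "reference_standard")


def is_low_quality(risk_of_bias: dict, applicability: dict) -> bool:
    """Low quality: high/unclear risk of bias, or high applicability concern,
    in at least one domain. No overall summary score is computed."""
    bias_flag = any(risk_of_bias[d] in ("high", "unclear") for d in BIAS_DOMAINS)
    applicability_flag = any(applicability[d] == "high" for d in APPLICABILITY_DOMAINS)
    return bias_flag or applicability_flag


# Hypothetical ratings for a single study.
bias = {"patient_selection": "low", "index_test": "unclear",
        "reference_standard": "low", "flow_and_timing": "low"}
concerns = {"patient_selection": "low", "index_test": "low",
            "reference_standard": "low"}
print(is_low_quality(bias, concerns))  # True, because one bias domain is unclear
```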

Statistical analysis and data synthesis

We will carry out the analyses following recommendations presented in Chapter 10 of the Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy (Macaskill 2010).

The unit of analysis in studies will be twin pregnancy, rather than twins themselves, as BWD is a single calculated measure estimated using biometrical measurements of two fetuses. We will extract the absolute counts of true positive (TP), false positive (FP), false negative (FN) and true negative (TN) from each study. TP is defined as ultrasound positive for BWD greater than or equal to 20% by biometric ultrasound measurements (CRL, AC, FL) confirmed by actual birth weight after birth. FP is defined as ultrasound positive for BWD, as defined above, without the diagnosis of BWD after birth. FN is a negative ultrasound with the diagnosis of BWD confirmed with actual BWD. Finally, TN is a negative ultrasound without diagnosis of BWD after birth.
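The four definitions above can be illustrated with a short Python sketch that tallies a 2x2 table with one entry per twin pregnancy. The function names and the example data are hypothetical; each pregnancy is represented only by whether the ultrasound (index test) and the birth weights (reference standard) indicated discordance.

```python
from collections import Counter


def classify(index_positive: bool, reference_positive: bool) -> str:
    """Return the 2x2 cell for one twin pregnancy."""
    if index_positive and reference_positive:
        return "TP"  # ultrasound positive, BWD confirmed at birth
    if index_positive:
        return "FP"  # ultrasound positive, no BWD at birth
    if reference_positive:
        return "FN"  # ultrasound negative, BWD confirmed at birth
    return "TN"      # ultrasound negative, no BWD at birth


def two_by_two(pregnancies) -> dict:
    """Count cells over an iterable of (index_positive, reference_positive) pairs."""
    counts = Counter(classify(i, r) for i, r in pregnancies)
    return {cell: counts.get(cell, 0) for cell in ("TP", "FP", "FN", "TN")}


# Hypothetical results for five twin pregnancies.
data = [(True, True), (True, False), (False, True), (False, False), (False, False)]
print(two_by_two(data))  # {'TP': 1, 'FP': 1, 'FN': 1, 'TN': 2}
```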

Two authors will extract the counts separately and discuss disagreements before engaging a third author to resolve any disagreements. We will then transfer the data into RevMan 2014 to produce plots and estimates.

We will use the counts (TP, FP, FN and TN) to construct 2x2 tables to estimate sensitivity and specificity with 95% confidence intervals (CIs) for each study. We will plot estimates of the sensitivities and specificities, with their 95% CIs, in forest plots. In addition, in order to assess the relationship between sensitivity and specificity, we will plot accuracy estimates of all included studies in the receiver‐operating characteristic (ROC) space. We will do informal comparisons by visualising the ROC plots. A bivariate model will be used to pool estimates of sensitivity and specificity (summary operating point) with their 95% CIs (Reitsma 2005). If a meta‐analysis is possible, test‐level covariates in the bivariate logit‐normal model will be used to identify statistically significant differences. Otherwise the available comparative data will be reported in a narrative way and illustrated using forest and ROC plots. Moreover, we will calculate positive (LR+) and negative (LR‐) likelihood ratios by using summary sensitivity and specificity.
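For a single study, the per‐study estimates described above can be sketched in Python as follows. The protocol does not state which confidence interval method will be used, so the Wilson score interval here is an assumption, as are the function names and the example counts. The likelihood ratios in the sketch are computed from one 2x2 table purely for illustration, whereas the review will derive them from the summary sensitivity and specificity.

```python
import math


def wilson_ci(successes: int, total: int, z: float = 1.96):
    """Wilson score interval for a proportion (assumed method, approx. 95%)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return centre - half, centre + half


def diagnostic_accuracy(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Sensitivity, specificity, their intervals and likelihood ratios."""
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one diseased and one non-diseased case")
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    return {
        "sensitivity": sens,
        "sensitivity_ci": wilson_ci(tp, tp + fn),
        "specificity": spec,
        "specificity_ci": wilson_ci(tn, tn + fp),
        # LR+ = sens / (1 - spec); LR- = (1 - sens) / spec
        "lr_positive": sens / (1 - spec) if spec < 1 else math.inf,
        "lr_negative": (1 - sens) / spec if spec > 0 else math.inf,
    }


# Hypothetical counts from one study.
result = diagnostic_accuracy(tp=40, fp=10, fn=10, tn=90)
print(round(result["sensitivity"], 2), round(result["specificity"], 2))  # 0.8 0.9
```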

Ultrasound BWD of 30% +/‐ 2% will be used to diagnose a prespecified BWD of 30% +/‐ 2%.

We will carry out all of the analyses using STATA 13 software and, where necessary, SAS statistical software (SAS 2008).

We will stratify our analysis using categories of gestational age (below or equal to 24 weeks of gestation, below or equal to 32 weeks of gestation, below or equal to 37 weeks of gestation).

Investigations of heterogeneity

Investigation of heterogeneity will be performed for the diagnostic tests where there are sufficient data (more than 10 included studies). The steps to be taken are as follows.

Firstly, the investigation of heterogeneity will be performed through visual examination of the ROC plot and the forest plots by grouping the generated estimates according to all the items listed as potential sources of heterogeneity (e.g. type of ultrasound, year of publication, geographic areas (high income versus low‐ and middle‐income countries), consecutive enrolment, blinding of the operators to clinical data, modifications applied to the widely accepted method of imaging techniques (such as 2D and 3D acquisition), number of index test operators, missing data).

Secondly, our main analysis will be a bivariate model with covariates indicating the type of ultrasound (2D, 3D), gender determination (yes/no), gestational age (continuous variable), type of study (prospective versus retrospective), chorionicity determination (yes/no), estimated BWD threshold (≥ 10% versus ≥ 30%), and time frame between index test and delivery. Bivariate models will be performed with generalised mixed logistic module (Reitsma 2005). P values below 0.05 will be considered to indicate statistical significance.
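For orientation, a bivariate model with a single study‐level covariate is commonly written as below. This is a sketch of the standard formulation associated with Reitsma 2005 in its generalised linear mixed model form, not a specification taken from this protocol; the symbols (study index i, covariate Z_i, means μ_A and μ_B, covariate effects ν_A and ν_B, and the between‐study variances and covariance) are notation assumed here for illustration.

```latex
\begin{aligned}
TP_i &\sim \mathrm{Binomial}(TP_i + FN_i,\ Se_i), \qquad
TN_i \sim \mathrm{Binomial}(TN_i + FP_i,\ Sp_i), \\[4pt]
\begin{pmatrix} \operatorname{logit}(Se_i) \\ \operatorname{logit}(Sp_i) \end{pmatrix}
&\sim N\!\left(
\begin{pmatrix} \mu_A + \nu_A Z_i \\ \mu_B + \nu_B Z_i \end{pmatrix},
\begin{pmatrix} \sigma_A^2 & \sigma_{AB} \\ \sigma_{AB} & \sigma_B^2 \end{pmatrix}
\right)
\end{aligned}
```

Under this notation, the summary sensitivity and specificity at Z_i = 0 are the inverse logits of μ_A and μ_B, and a covariate such as 2D versus 3D ultrasound is judged by whether ν_A or ν_B differs from zero.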

Thirdly, we will perform meta‐regression analysis to investigate potential sources of heterogeneity conditional to access to adequate data for analysis. Meta‐regression will take into account the specific sources of clinical heterogeneity such as gestational age (less than 37 weeks versus equal to or greater than 37 weeks), chorionicity (mono‐chorion versus di‐chorion), fetal sex‐pairing (sex concordant versus sex discordant or male‐male versus female‐female versus male‐female) and conception type (spontaneous conception versus assisted conception). Methodological sources of heterogeneity will also be investigated using meta‐regression analysis. These will be listed as verification bias, incorporation bias, diagnostic review bias, and clinical review bias. Prespecified variables are inclusive of clinical settings (tertiary centre versus community health care), high risk pregnancy complicated by underlying maternal‐fetal conditions (yes/no). Maternal complications will include hypertensive disorders, gestational diabetes, and antepartum haemorrhage. Fetal complications will include TTT, major structural congenital anomalies, stillbirth of one twin, and intrauterine growth retardation.

Sensitivity analyses

We plan to conduct sensitivity analyses based on 'type of study design' (prospective versus retrospective study designs), chorionicity (dichorionic versus monochorionic pregnancies), threshold of estimated BWD, and the individual quality items of QUADAS‐2 tool.

We anticipate identifying the other most relevant factors to be subjected to sensitivity analyses in the process of reviewing the identified studies.

Assessment of reporting bias

We are not planning to use funnel plots to evaluate the impact of publication bias or other biases associated with small studies, because according to Leeflang et al. the tests commonly used in interventional systematic reviews for publication bias are not useful for diagnostic testing reviews (Leeflang 2008). Deeks et al. verified that the use of an asymmetric effective sample size plot to detect publication bias lacks power in situations where sample variability is present (Deeks 2005). We will attempt to use unpublished data to minimise reporting bias.