Sentinel node biopsy for diagnosis of lymph node involvement in endometrial cancer

Summary of findings 1. Summary of findings table (prevalence of positive LN rate 20%*)

Index test	Number of patients (number of studies)	Mean SLN detection rate (95% CI)¹	Pooled sensitivity results per woman (95% CI)²	Consequences in a cohort of 1000 women undergoing SLNB, assuming the prevalence of LN metastases to be 20%			Certainty of evidence
Review question: what is the diagnostic accuracy of different tracers for the detection of sentinel lymph nodes? Patients/population: female adults with presumed early‐stage endometrial (womb) cancer Role: prognostic information for guiding adjuvant therapy after surgery Index tests: sentinel lymph node biopsy (SLNB) after injection of tracer substance/s into the cervix or uterine muscle Threshold for index tests: detection of micrometastases following ultrastaging (ultrasection of sentinel LN and immunohistochemistry). Presence of individual tumour cells (ITCs) excluded as positive result where possible. Reference standards: full pelvic +/‐ para‐aortic lymphadenectomy and standard histological examination Studies: cross‐sectional and cohort studies Setting: secondary/tertiary inpatient care at time of surgery for endometrial cancer
				Women with no SLN detected (i.e., failed test; 95% CI)³	Women with metastatic nodes diagnosed by index test (TP; 95% CI)⁴	Women with metastatic nodes missed by index test (FN; 95% CI)⁵
1. All tracers	2237 women (33 studies)	86.9% (82.9% to 90.8%)	91.8% (86.5% to 95.1%)	131 (92 to 171)	160 (150 to 162)	14 (9 to 23)	⊕⊕⊕⊖ moderate ⁶
2. Blue dye alone	559 women (11 studies)	77.8% (70.0% to 85.6%)	95.2% (77.2% to 99.2%)	222 (144 to 300)	148 (120 to 154)	7 (1 to 35)	⊕⊕⊖⊖ low ⁷
3. Technetium‐99m alone	257 women (4 studies)	80.9% (63.9% to 97.9%)	90.5% (67.7% to 97.7%)	191 (21 to 361)	146 (109 to 158)	15 (4 to 52)	⊕⊕⊖⊖ low ⁷
4. Technetium‐99m and blue dye	548 women (12 studies)	86.3% (80.7% to 91.9%)	91.9% (74.4% to 97.8%)	137 (81 to 193)	159 (128 to 169)	14 (4 to 44)	⊕⊕⊖⊖ low ⁷
5. ICG alone	953 women (9 studies)	92.4% (88.7% to 96.2%)	92.5% (81.8% to 97.1%)	76 (38 to 113)	171 (151 to 179)	14 (6 to 34)	⊕⊕⊕⊖ moderate ⁶
6. ICG and blue dye	215 women (2 studies)	96.7% (92.7% to 100%)	90.5% (63.2% to 98.1%)	33 (0 to 73)	175 (122 to 190)	18 (4 to 71)	⊕⊕⊖⊖ low ⁷
7. ICG and Technetium‐99m	32 women (1 study)	100%	100% (63% to 100%)	0	200 (126 to 200)	0 (0 to 74)	⊕⊖⊖⊖ very low ⁸
* Prevalence of positive LN rate 20% chosen to represent those with higher risk of LN metastasis (as per Creasman 1987). A false‐positive result cannot occur, as the histological examination of the SLN is unchanged by the results from any additional nodes removed at systematic lymphadenectomy. Abbreviations: SLN: sentinel lymph node; SLNB: sentinel lymph node biopsy; LN: lymph node; ICG: indocyanine green dye (visualised with near infra‐red fluorescence); TP: true positive; FN: false negative
GRADE certainty of the evidence High: we are very confident that the true effect lies close to that of the estimate of the effect. Moderate: we are moderately confident in the effect estimate: the true effect is likely to be close to the estimate of the effect, but there is a possibility that it is substantially different. Low: our confidence in the effect estimate is limited: the true effect may be substantially different from the estimate of the effect. Very low: we have very little confidence in the effect estimate: the true effect is likely to be substantially different from the estimate of effect.
¹ Calculated as the arithmetic mean of the total number of SLNs detected out of the total number of women in the included studies, with the woman as the unit of analysis.
² Calculated as the pooled estimate of sensitivity, using a univariate random‐effects logistic regression model (Takwoingi 2017) with the woman as the unit of analysis.
³ Calculated by subtracting the number of women per 1000 with an SLN detected (the mean SLN detection rate estimate multiplied by 1000) from 1000.
⁴ Calculated by subtracting the number of women with no SLN detected (i.e., with a failed test) from 1000, and then multiplying the result by the prevalence and the pooled sensitivity estimate.
⁵ Calculated by subtracting the number of women with no SLN detected (i.e., with a failed test) from 1000, and then multiplying the result by the prevalence and the false negative rate estimate. The false negative rate estimates were calculated by subtracting the sensitivity estimates from 100%.
⁶ Downgraded by 1 level for risk of bias (for a combination of unclear patient selection and unclear risk of publication bias).
⁷ Downgraded by 2 levels: 1 level for risk of bias (a combination of unclear patient selection and unclear risk of publication bias) and 1 level for imprecision (wide confidence intervals).
⁸ Downgraded by 3 levels: 1 level for risk of bias (unclear patient selection and unclear risk of publication bias) and 2 levels for imprecision (1 small study with few true positive nodes with wide confidence intervals).
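The arithmetic described in footnotes 3 to 5 can be sketched in a few lines of Python. This is an illustrative sketch only, not the review authors' code: the function name and its parameters are hypothetical, and rounding to whole women is an assumption made here to match how the table reports its point estimates. It reproduces the point estimates only, not the confidence intervals.

```python
def consequences_per_1000(detection_rate, sensitivity, prevalence, cohort=1000):
    """Return (failed tests, true positives, false negatives) for a cohort.

    detection_rate, sensitivity and prevalence are proportions (0 to 1).
    Hypothetical helper; rounding to whole women is an assumption.
    """
    # Footnote 3: women in whom no SLN was detected (failed test)
    failed = round(cohort * (1 - detection_rate))
    # Women in whom at least one SLN was detected
    mapped = cohort - failed
    # Footnote 4: detected women x prevalence x pooled sensitivity
    true_pos = mapped * prevalence * sensitivity
    # Footnote 5: detected women x prevalence x false negative rate
    false_neg = mapped * prevalence * (1 - sensitivity)
    return failed, round(true_pos), round(false_neg)


# "All tracers" row: detection rate 86.9%, sensitivity 91.8%, prevalence 20%
print(consequences_per_1000(0.869, 0.918, 0.20))  # (131, 160, 14)
# "Blue dye alone" row: detection rate 77.8%, sensitivity 95.2%, prevalence 20%
print(consequences_per_1000(0.778, 0.952, 0.20))  # (222, 148, 7)
```

Passing a prevalence of 0.05 instead of 0.20 gives the corresponding figures for a lower‐risk population.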

Summary of findings 2. Summary of findings table (prevalence of positive LN rate 5%*)

Index test	Number of patients (number of studies)	Mean SLN detection rate (95% CI)¹	Pooled sensitivity results per woman (95% CI)²	Consequences in a cohort of 1000 women undergoing SLNB, assuming the prevalence of LN metastases to be 5%			Certainty of evidence
Review question: what is the diagnostic accuracy of different tracers for the detection of sentinel lymph nodes? Patients/population: female adults with presumed early‐stage endometrial (womb) cancer Role: prognostic information for guiding adjuvant therapy after surgery Index tests: sentinel lymph node biopsy (SLNB) after injection of tracer substance/s into the cervix or uterine muscle Threshold for index tests: detection of micrometastases following ultrastaging (ultrasection of sentinel LN and immunohistochemistry). Presence of individual tumour cells (ITCs) excluded as positive result where possible. Reference standards: full pelvic +/‐ para‐aortic lymphadenectomy and standard histological examination Studies: cross‐sectional and cohort studies Setting: secondary/tertiary inpatient care at time of surgery for endometrial cancer
				Women with no SLN detected (i.e., failed test; 95% CI)³	Women with metastatic nodes diagnosed by index test (TP; 95% CI)⁴	Women with metastatic nodes missed by index test (FN; 95% CI)⁵
1. All tracers	2237 women (33 studies)	86.9% (82.9% to 90.8%)	91.8% (86.5% to 95.1%)	131 (92 to 171)	40 (38 to 41)	4 (2 to 6)	⊕⊕⊕⊖ moderate ⁶
2. Blue dye alone	559 women (11 studies)	77.8% (70.0% to 85.6%)	95.2% (77.2% to 99.2%)	222 (144 to 300)	37 (30 to 39)	2 (0 to 9)	⊕⊕⊖⊖ low ⁷
3. Technetium‐99m alone	257 women (4 studies)	80.9% (63.9% to 97.9%)	90.5% (67.7% to 97.7%)	191 (21 to 361)	37 (27 to 40)	4 (1 to 13)	⊕⊕⊖⊖ low ⁷
4. Technetium‐99m and blue dye	548 women (12 studies)	86.3% (80.7% to 91.9%)	91.9% (74.4% to 97.8%)	137 (81 to 193)	40 (32 to 42)	3 (1 to 11)	⊕⊕⊖⊖ low ⁷
5. ICG alone	953 women (9 studies)	92.4% (88.7% to 96.2%)	92.5% (81.8% to 97.1%)	76 (38 to 113)	43 (38 to 45)	3 (1 to 8)	⊕⊕⊕⊖ moderate ⁶
6. ICG and blue dye	215 women (2 studies)	96.7% (92.7% to 100%)	90.5% (63.2% to 98.1%)	33 (0 to 73)	44 (31 to 47)	5 (1 to 18)	⊕⊕⊖⊖ low ⁷
7. ICG and Technetium‐99m	32 women (1 study)	100%	100% (63% to 100%)	0	50 (32 to 50)	0 (0 to 19)	⊕⊖⊖⊖ very low ⁸
* Prevalence of positive LN rate 5% chosen to represent those with lower risk of LN metastasis (as per Creasman 1987). A false‐positive result cannot occur, as the histological examination of the SLN is unchanged by the results from any additional nodes removed at systematic lymphadenectomy. Abbreviations: SLN: sentinel lymph node; SLNB: sentinel lymph node biopsy; LN: lymph node; ICG: indocyanine green dye (visualised with near infra‐red fluorescence); TP: true positive; FN: false negative
GRADE certainty of the evidence High: we are very confident that the true effect lies close to that of the estimate of the effect. Moderate: we are moderately confident in the effect estimate: the true effect is likely to be close to the estimate of the effect, but there is a possibility that it is substantially different. Low: our confidence in the effect estimate is limited: the true effect may be substantially different from the estimate of the effect. Very low: we have very little confidence in the effect estimate: the true effect is likely to be substantially different from the estimate of effect.
¹ Calculated as the arithmetic mean of the total number of SLNs detected out of the total number of women in the included studies, with the woman as the unit of analysis.
² Calculated as the pooled estimate of sensitivity, using a univariate random‐effects logistic regression model (Takwoingi 2017) with the woman as the unit of analysis.
³ Calculated by subtracting the number of women per 1000 with an SLN detected (the mean SLN detection rate estimate multiplied by 1000) from 1000.
⁴ Calculated by subtracting the number of women with no SLN detected (i.e., with a failed test) from 1000, and then multiplying the result by the prevalence and the pooled sensitivity estimate.
⁵ Calculated by subtracting the number of women with no SLN detected (i.e., with a failed test) from 1000, and then multiplying the result by the prevalence and the false negative rate estimate. The false negative rate estimates were calculated by subtracting the sensitivity estimates from 100%.
⁶ Downgraded by 1 level for risk of bias (for a combination of unclear patient selection and unclear risk of publication bias).
⁷ Downgraded by 2 levels: 1 level for risk of bias (a combination of unclear patient selection and unclear risk of publication bias) and 1 level for imprecision (wide confidence intervals).
⁸ Downgraded by 3 levels: 1 level for risk of bias (unclear patient selection and unclear risk of publication bias) and 2 levels for imprecision (1 small study with few true positive nodes with wide confidence intervals).

Background

Target condition being diagnosed

Endometrial cancer (cancer of the lining of the womb (uterus)) is the sixth most common cancer in women, with an estimated 417,367 new cases globally in 2020 (Bray 2018; Sung 2021). A woman’s risk of developing endometrial cancer by the age of 75 is estimated to range from 0.6% in developing countries to 1.6% in developed countries (Jemal 2008).

In the UK, the incidence of endometrial cancer has risen by 55% since the 1990s (Cancer Research UK 2021; Evans 2011), largely due to increasing obesity (World Cancer Research Fund 2018). Risk factors for endometrial cancer include advancing age, obesity, diabetes, nulliparity, late menopause, unopposed oestrogen replacement therapy, and tamoxifen use (Berek 2014).

Endometrial cancer is staged based on histological examination of tissues removed at the time of hysterectomy using the 2009 International Federation of Gynecology and Obstetrics (FIGO) system (Table 1). The cornerstone of treatment in most women with endometrial cancer is surgery, normally involving a total hysterectomy and bilateral salpingo‐oophorectomy, with or without a lymph node dissection. For endometrial cancers thought to be confined to the uterus at surgery, previous randomised controlled trials (RCTs) have not demonstrated a survival advantage to routine systematic pelvic lymphadenectomy compared to removal of bulky nodes (Frost 2017). Systematic reviews of RCTs have concluded that minimal access surgery is the preferred approach, with reduced pain, reduced hospitalisation, and earlier resumption of daily activities, with no adverse effect on survival, when compared with open surgery (Galaal 2012; Janda 2006; Walker 2009).

Table 1. FIGO staging for endometrial cancer

Stage I:	Cancer confined to the uterus (womb), which has not spread to other parts of the body
Stage IA:	Cancer confined to the endometrium, or less than one‐half of the myometrium
Stage IB:	Cancer spread to the outer half of the myometrium
Stage II:	Cancer spread from the uterus to the cervical stroma, but not to other parts of the body
Stage III:	Cancer spread beyond the uterus, but it is still only in the pelvic area
Stage IIIA:	Cancer spread to serosa of the uterus, fallopian tubes and ovaries, or a combination, but not to other parts of the body
Stage IIIB:	Cancer spread to the vagina or parametria
Stage IIIC1:	Cancer spread to the regional pelvic lymph nodes
Stage IIIC2:	Cancer spread to the para‐aortic lymph nodes with or without spread to the regional pelvic lymph nodes
Stage IV:	Direct invasion into adjacent organs or distant spread
Stage IVA:	Cancer spread to the mucosa of the rectum or bladder
Stage IVB:	Cancer spread to lymph nodes in the groin area, or it has spread to distant organs, such as the bones or lungs

Pecorelli 2009

At diagnosis, the majority of women will have disease confined to the womb, leading to a five‐year overall survival of over 90% (Creasman 2006; Cancer Research UK 2021). About 10% of women will have advanced disease at diagnosis (FIGO stage III or higher), leading to a poorer overall survival (Jemal 2008; NCRAS 2020).

After surgery, women may be offered adjuvant treatment, such as radiotherapy, chemotherapy, or a combination, based on risk factors for recurrence, such as stage, age, grade, lymphovascular space involvement, myometrial invasion, and lymph node status.

In a 2012 meta‐analysis of eight trials that evaluated external beam radiotherapy (EBRT), the authors concluded that “EBRT reduces the risk of locoregional recurrence, but has no significant impact on cancer‐related deaths or overall survival. It is associated with significant morbidity and a reduction in quality of life” (Kong 2012). Avoiding over‐treatment with radiotherapy, and its potential for long‐term side effects, is important for women with endometrial cancer, most of whom will not ultimately die from their disease.

Metastatic spread to the surrounding lymph nodes is a separate, but at least partially related, prognostic factor to other clinical and histopathological risk factors. Its presence is associated with an increased recurrence rate and decreased overall survival.

The benefit of adjuvant chemotherapy for women with positive lymph nodes is supported by a 2014 meta‐analysis. When compared with postoperative radiotherapy, giving combination chemotherapy resulted in significant improvement in overall survival, and significant improvement in progression‐free survival (Galaal 2014). This review investigated not only metastasis in the lymph nodes, but also distant metastasis (12% to 17% with FIGO stage IV disease).

Index test(s)

Sentinel lymph node biopsy (SLNB) involves identifying and removing the sentinel lymph nodes (SLN). These are the first node(s) involved in the drainage of lymph from the primary cancer to the lymph nodes. If these are negative, it is surmised that the other nodes are not involved. This is a well‐established technique in other cancers, such as breast or vulval cancer (Lawrie 2014; Lyman 2014). It involves injecting a detectable tracer into the area surrounding the tumour, which then travels along lymphatic channels to the SLN, enabling the sentinel node(s) to be identified at surgery, removed, and sent for histological examination to see if cancer cells have spread to the lymph node(s).

Ideally, SLNB is a more targeted method of assessing the spread of apparent early‐stage endometrial cancer, compared with a systematic lymph node dissection, enabling appropriate selection of adjuvant therapy, such as radiotherapy or chemotherapy, and avoiding over‐treatment. SLNB is normally combined with histopathological ultrastaging to increase the chances of detecting micrometastases. This involves taking multiple fine sections of the SLN, often combined with other histological techniques, such as immunohistochemistry (IHC). This is a much more detailed examination compared to conventional lymph node histological examination, and increases detection of lymph node metastases.

Standard histopathological assessment of lymph nodes will miss micrometastases. Hafner 2007 reported that using routine haematoxylin and eosin (H&E) histology results in only a 1% chance of identifying a cluster of less than 3 cell diameters. Ultrastaging is time consuming and expensive, making it unsuitable when there are larger numbers of nodes. The contribution of IHC is particularly relevant, as between 18% and 20% of patients were upstaged after detection of micrometastases by IHC, compared to H&E staining alone (Niikura 2004; Pelosi 2003). Therefore, SLNB should involve detection of the SLN and histopathological ultrastaging, including IHC, to maximise the detection of micrometastases.

The most commonly reported tracers used for SLNB are radioactive technetium‐99m (Tc‐99m), with or without visible blue dyes, such as methylene blue or patent blue, and near‐infrared fluorescence tracers, such as indocyanine green (ICG; Papadia 2017a).

There are a variety of methods described for injecting radioactive tracer or coloured dye. These include cervical injection (injection into the cervix (neck of womb), normally using a speculum to visualise the cervix), hysteroscopic injection (injection into the inside wall of the womb using a telescope inserted through the cervix into the cavity of the womb), and subserosal myometrial injection (injection into the outside wall of the womb). Cervical injection is most convenient because of easy access to the cervix. Some studies have reported cervical injection as a single site injection (Buda 2016b), and others have used cervical injection in conjunction with subserosal myometrial injection (Niikura 2013). The main concern with cervical injection alone is that there is the potential to miss metastatic spread through the ovarian drainage route to the para‐aortic region, leading to false‐negative results. However, data from several studies looking at endometrial cancer completely staged with both pelvic and para‐aortic node dissection suggest that isolated metastases to the high para‐aortic region are rare; a review of 18 studies suggested that only 1.5% of women will have positive para‐aortic nodes when the pelvic nodes are negative (Chiang 2011). It is likely that para‐aortic nodal dissection results in an unnecessary surgical intervention in the vast majority of women with endometrial cancer.

In SLN studies, the main endpoints are detection rates, sensitivity, and the false‐negative rate. The detection rate is the percentage of patients in whom a SLN was identified by the technique, regardless of whether the lymph node was positive or negative. Ideally, a pelvic SLN should be identified on both sides of the pelvis. However, it is not clear whether unilateral pelvic detection of a SLN should be viewed as a failure of the technique, because lymph drainage may not flow to both sides of the pelvis in every patient. False negatives are important results, as they represent a true failure of the technique. In this case, metastatic disease is undetected, which could lead to under‐treatment. Although there is no standard for an acceptable level of false‐negative results of a test in endometrial carcinoma, for SLNB in breast cancer, it is agreed that false negatives should not exceed 5%.
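As a minimal illustration of how these three endpoints relate to each other, the following Python sketch computes them from per‐woman counts. The function name and the example counts are hypothetical and are not taken from any included study. The sketch assumes that sensitivity is calculated only among women in whom a SLN was detected and who had metastatic nodes on the reference standard.

```python
def slnb_endpoints(n_women, n_detected, true_pos, false_neg):
    """Return (detection rate, sensitivity, false-negative rate) as proportions.

    n_women:    all women who underwent the index test
    n_detected: women in whom at least one SLN was identified
    true_pos:   node-positive women with a positive SLN
    false_neg:  node-positive women whose SLN was negative
    """
    detection_rate = n_detected / n_women
    node_positive = true_pos + false_neg
    sensitivity = true_pos / node_positive
    false_negative_rate = false_neg / node_positive  # equals 1 - sensitivity
    return detection_rate, sensitivity, false_negative_rate


# Hypothetical counts, for illustration only
rate, sens, fnr = slnb_endpoints(n_women=100, n_detected=90, true_pos=18, false_neg=2)
print(f"detection {rate:.0%}, sensitivity {sens:.0%}, false-negative rate {fnr:.0%}")
```

With these made‐up counts, the false‐negative rate of 10% would exceed the 5% ceiling quoted for breast cancer.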

Clinical pathway

A patient with endometrial cancer will typically present with abnormal menstrual bleeding, or more commonly, post‐menopausal bleeding. An endometrial biopsy is taken to make a diagnosis, either by blind pipelle biopsy or hysteroscopically guided.

Following diagnosis, further imaging (with computed tomography (CT), magnetic resonance imaging (MRI), or both) may be indicated, although patients with grade 1 endometrial cancer on biopsy may only have a chest x‐ray performed, rather than cross‐sectional imaging.

Patients with endometrial cancer who are fit for surgery would then have a hysterectomy and bilateral salpingo‐oophorectomy performed, with or without sampling or excision of pelvic lymph nodes. Presence or absence of pelvic lymph node disease may affect recommendations for adjuvant pelvic radiotherapy, chemotherapy, or both. Others may recommend adjuvant treatment, based on histological and clinical risk factors. If SLNB is planned, injection of the tracer into the cervix, body of the womb, or both can be performed either shortly before or at the time of the hysterectomy. In those who have Tc‐99m tracers used, specialised imaging techniques, such as planar scintigraphy or single photon emission computed tomography (SPECT/CT; which overlays radioactive 'hot spots' onto conventional CT images) may be used to aid anatomical localisation of highlighted lymph nodes.

A SLNB could replace a full removal of lymph nodes at time of hysterectomy, if CT or MRI do not demonstrate markedly abnormal lymph nodes. Alternatively, SLNB could be performed instead of no pelvic lymph node sampling, since previous studies have demonstrated no therapeutic benefit of pelvic lymph node excision in endometrial cancer (Frost 2017). Presence or absence of a positive SLN would then guide subsequent treatment strategies.

Alternative test(s)

The gold standard for detecting whether lymph nodes have been invaded by cancer is systematic pelvic lymphadenectomy. In this case, all of the lymph nodes are removed from around the areas draining the womb in the pelvis, plus or minus para‐aortic areas. However, as explained above, this may miss micrometastases, as ultrastaging is not normally used in this technique.

A variety of imaging techniques have been evaluated to determine lymph node involvement prior to surgery. Computed tomography (CT) and magnetic resonance imaging (MRI) are the most commonly used. Both of these methods use size and morphological criteria to assess possible nodal metastases, suggesting that small or microscopic positive nodes will be missed. A systematic review of 18 studies reported a mean sensitivity and specificity of 72% and 97% for MRI, and 45% and 88% for CT (Selman 2008). More recently, positron emission tomography–computed tomography (PET/CT) has been reported as showing a sensitivity of 73.7% and specificity of 98.7%, with an accuracy of 93.6%, for detecting involved lymph nodes (Signorelli 2015). PET/CT uses a radioactive sugar (fluorodeoxyglucose (FDG)), which has a very short half‐life, to find metabolically‐active tissues. Cancer cells tend to be metabolically highly active and show up as 'hot spots' on PET/CT, although not all cancers are metabolically active, and in order to be detected, there needs to be a reasonable number of cells to generate a positive signal. Therefore, PET/CT detects the presence of cancer cells directly, whereas SLNB with Tc‐99m (plus or minus scintigraphy or SPECT/CT) finds the first draining lymph node. Whether or not the SLN has cancer cells in it requires subsequent histological examination.

Rationale

A recently updated Cochrane Review found that there was no therapeutic benefit to systematic pelvic lymphadenectomy in endometrial cancer (Frost 2017). In addition, there was a four‐fold increase in surgical‐related morbidity and an eight‐fold increase in lymphoedema and lymphocyst formation (a retroperitoneal collection of lymph) in the group who underwent the surgery. This can cause serious long‐term morbidity in women, many of whom would have an otherwise excellent prognosis. Despite these data, lymphadenectomy is still widely practiced world‐wide. Many women continue to be subjected to the short‐ and long‐term risks of over‐treatment.

A pelvic lymph node dissection becomes difficult with increased obesity, and carries a risk of vascular or nerve injury (Agar 2015). The risk of leg lymphoedema following a node dissection is commonly under‐reported (or not reported at all in results of surgical studies), with rates up to 38% (Biglia 2015), and lymphocyst formation in around 20% (Zikan 2015). The debilitating effects of lower limb lymphoedema should not, however, be underestimated, as they have a marked effect on the quality of life of long‐term survivors (van de Poll 2015). Substituting a SLNB for a lymph node dissection reduces both acute and chronic morbidity in other cancers (Ashikaga 2010). Therefore, it is highly likely that this is also the case in endometrial cancer, although there are no data available to support this. There are no published RCTs evaluating the accuracy of SLNB in endometrial cancer.

There is considerable disagreement among the UK cancer centres regarding the value of lymph node dissection in endometrial cancer, because a systematic review of RCTs of full pelvic lymph node dissection found no overall survival advantage (Frost 2017). Some centres do not perform any form of node dissection, while other centres will perform a node dissection in aggressive endometrial cancers, such as grade 3 endometrial, or serous cancer of the uterus (so‐called type II endometrial cancers). While it would seem unlikely that a lymph node dissection removing micrometastases may offer any therapeutic benefit, over and above treatment based on other prognostic factors (Frost 2017), it could identify more aggressive cancers that require loco‐regional, systemic adjuvant therapy, or both. However, whether there is any advantage to treating, based on lymph node status, rather than other known risk factors, such as age, histological type, myometrial invasion, and lymphovascular space invasion, remains controversial (and will not be answered by a study on diagnostic test accuracy).

The clinical implication of a false‐negative test, if full lymphadenectomy was not performed after SLNB, would be an increased risk of recurrence or residual disease following surgery, compared with a full lymphadenectomy. This could potentially lead to under‐treatment, depending on what risk factors are used to determine the administration of adjuvant treatment. However, if full lymphadenectomy was not otherwise planned, decisions about adjuvant treatment would normally be based on other histological and patient‐specific risk factors, rather than knowledge of lymph node status.

Objectives

To assess the diagnostic accuracy of sentinel lymph node biopsy (SLNB) in the identification of pelvic lymph node involvement in women with endometrial cancer, presumed to be at an early stage prior to surgery, including consideration of the detection rate.

Secondary objectives

To compare the diagnostic accuracy of different detection methods (Tc‐99m with or without methylene or patent blue dye versus near infrared fluorescence tracers, such as indocyanine green (ICG)) and injection sites (subserosal versus cervical versus combined subserosal and cervical).
To compare the accuracy of sentinel lymph nodes (SLN) detection by lymph node basin studied, i.e. pelvic versus para‐aortic region.

Methods

Criteria for considering studies for this review

Types of studies

Diagnostic accuracy studies looking at sentinel lymph node biopsy (SLNB) in endometrial cancer using ultrastaging of the SLN. Studies included patients with endometrial cancer who underwent SLNB detection and ultrastaging of the SLN (index test), and who then went on to have systematic pelvic, plus or minus para‐aortic lymphadenectomy (reference standard). Study designs included prospective or retrospective studies. A separate control group was not required, since each individual patient has the index test and reference standard.

We excluded case‐control studies and studies with fewer than 10 patients with endometrial cancer. This was chosen as a cut‐off because there is a learning curve for SLNB detection, which is thought to be at least 10 cases. For studies reporting insufficient data for accurate identification of the target population, or for the construction of a 2 x 2 table, we contacted the trial authors to clarify and/or request missing information.

Participants

Adult women (women aged 18 years and over) with adenocarcinoma of the endometrium of endometrioid, serous, clear cell, or mixed types with apparent cancer confined to the body of the uterus (International Federation of Gynecology and Obstetrics (FIGO) stages IA and IB).

We excluded women with uterine stromal tumours (leiomyosarcoma or sarcoma).

Index tests

A sentinel lymph node biopsy (SLNB).

Studies had to specify:

type of tracer(s) used, such as a blue dye, Tc‐99m, and near infra red fluorescence;
the technique of injection(s), including timing of injection, amount of tracer used, and location or site of injection;
the detection technique used, and whether a preoperative scintigram or single‐photon emission CT (SPECT) was performed; and
whether the surgery was performed by open or laparoscopic (including robotic) routes.

The sentinel node should be removed and subjected to both standard histopathological assessment and ultrastaging. If either test was positive for metastases, the sentinel node was considered to be positive. If both were negative for metastases, then the sentinel node was considered to be negative. Studies must have reported whether the pelvic sentinel node was detected unilaterally or bilaterally. In the case of a false negative sentinel node, the study should have reported whether the positive reference node was from the same side of the pelvis as the negative sentinel node, or the opposite side of the pelvis.

Target conditions

Detection of metastases to the pelvic or para‐aortic lymph node, or both, in apparent early‐stage endometrial cancer.

Reference standards

A bilateral pelvic or para‐aortic lymph node dissection, or both (to renal vessels), is the reference standard. The procedure may be performed by open or laparoscopic (including robotic) routes. The pelvic node dissection should include removal of all internal, external, and common iliac nodes. The surgical specimen should undergo standard histopathological assessment, and was considered positive if any cancer metastases were detected. Studies should describe if one or both sides of the hemipelvis had positive nodes. If a para‐aortic node dissection was included, it should detail if the dissection was to the level of the renal vessels, or inferior mesenteric artery. Results of both the standard and ultrasection histopathological assessment of the lymph nodes removed during the sentinel node biopsy stage were included in the results of the reference standard, since the reference standard should be all removed lymph nodes. If the sentinel node was only positive on ultrastaging, and not deemed to be positive on standard histopathological examination, we still counted this as a positive reference standard result (since it would be a false negative of the reference standard, not the index test). Where reported, we presented separately the data for cases in which the sentinel node was negative on standard histopathological testing, but positive on ultra‐sectioning.
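The way the index test and reference standard results combine for each woman can be sketched as follows. This is an illustrative reading of the rules described in this section, not the review authors' code; the function and argument names are hypothetical. Because every positive sentinel node also counts as a positive reference standard result, the sketch has no false‐positive branch.

```python
def classify(sln_detected, sln_standard_pos, sln_ultrastaging_pos, other_nodes_pos):
    """Classify one woman as 'failed test', 'TP', 'FN' or 'TN'.

    sln_standard_pos:     SLN positive on standard histopathology
    sln_ultrastaging_pos: SLN positive on ultrastaging
    other_nodes_pos:      any non-sentinel node positive at lymphadenectomy
    """
    if not sln_detected:
        return "failed test"
    # Index test: positive if either assessment of the SLN is positive
    index_pos = sln_standard_pos or sln_ultrastaging_pos
    # Reference standard: all removed nodes, including the SLN results
    reference_pos = index_pos or other_nodes_pos
    if index_pos:
        return "TP"
    if reference_pos:
        return "FN"
    return "TN"


# SLN positive on ultrastaging only still counts as a true positive
print(classify(True, False, True, False))   # TP
# Negative SLN with a positive non-sentinel node is a false negative
print(classify(True, False, False, True))   # FN
```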

Search methods for identification of studies

Electronic searches

We searched the following electronic databases on 23 July 2019.

MEDLINE Ovid (1946 to 2019 July week 2)
Embase Ovid (1980 to 2019 week 29)

We have presented the MEDLINE search strategy in Appendix 1, which reflects the key concepts of the review: index test (SLNB) AND target condition (lymph node metastases in endometrial cancer). We adapted the MEDLINE search strategy, as indicated, for other databases (Appendix 2).

We did not apply language restrictions to the electronic searches, and we attempted to obtain translations, as needed. Where relevant studies were only reported in abstract form, we contacted the trial authors for additional information. If we were able to include them, we conducted sensitivity analyses to test for their influence on the results.

Searching other resources

We searched the following databases for related systematic reviews and ongoing studies, and checked the reference lists of those that are relevant, for additional studies.

DARE (Database of Abstracts of Reviews of Effects): www.crd.york.ac.uk/CRDWeb
ClinicalTrials.gov: http://clinicaltrials.gov
WHO International Clinical Trials Registry Platform (ICTRP): www.who.int/ictrp/en/
HTA Database (Health Technology Assessments Database): www.york.ac.uk/crd/#HTA
ARIF (Aggressive Research Intelligence Facility): www.birmingham.ac.uk/research/activity/mds/projects/HaPS/PHEB/ARIF/index.aspx

We used all studies we identified as relevant, as seeds in PubMed, to search for additional studies using the related‐articles feature. We also used the relevant studies as seeds in the Science Citation Index (ISI Web of Knowledge), ResearchGate and Google Scholar, to determine whether articles citing these studies were also relevant.

We handsearched abstract books of meetings of the International Gynaecological Cancer Society, the European Society of Gynaecological Oncology and the Society of Gynecologic Oncology from 2010 to July 2019, to identify ongoing and unpublished studies. Where necessary, we contacted the main investigators of relevant ongoing studies for further information. We also contacted trial authors of relevant studies to ask if they knew of further data, whether published or unpublished.

Data collection and analysis

Selection of studies

We downloaded titles and abstracts retrieved by electronic searching to the reference management database Endnote, and uploaded them to the systematic review management tool, Covidence (Covidence). Two review authors (a combination of HN, JM, RG, WH and NW) independently screened references. We excluded those studies that clearly did not meet the inclusion criteria, and obtained copies of the full text of potentially relevant references. Two review authors (a combination of HN, JM, RG, WH and NW) independently assessed the eligibility of retrieved papers. We resolved disagreements between review authors by discussion, if possible, and if not, by involving a third review author (JM or HN). We documented reasons for exclusion.

Data extraction and management

Two review authors (a combination of HN, RG, WH and NW) independently extracted data, and resolved disagreements by discussion, if possible, and if not, by involving a third review author (JM), who also performed a final check of data entry into the analysis. We extracted data on the following items: author, year of publication and journal (including language), country, settings, inclusion and exclusion criteria, study design, study population (number of patients, age, endometrial cancer details, including histopathological cell type, FIGO stage, grade on hysterectomy specimen, and lymphovascular space involvement), index test (type of endometrial sampling suction device, curettage or direct biopsy, experience of operator, tracer used and amount, method and timing of application, method of detection (histopathological assessment, including ultrastaging)), reference standard (open or laparoscopic, lymph node number and site, histopathological assessment), and study results (sentinel node detection rate for unilateral and bilateral pelvic nodes, sentinel node detection for para‐aortic nodes (for false‐negative cases, are reference nodes on same side or is there a failure to detect the sentinel node)), adverse reaction from index or reference test (including bleeding, infection (urine, chest, wound), lymphocyst formation, lymphoedema, venous thromboembolism), operating time, other intraoperative complication, other postoperative complication.

We extracted the numbers of true positives, false positives, true negatives, and false negatives for each study with the participant as the unit of analysis, not for example, lymph node area, according to the following definitions.

True‐positive (TP) test

A true‐positive result is where a lymph node identified by SLNB has cancer identified within it on histopathology sampling (positive node), regardless of the results of the other lymph nodes following systematic lymphadenectomy. If a positive lymph node is detected, the patient would be treated by postoperative adjuvant chemoradiotherapy. Lymph nodes identified through the SLNB technique require ultrasection formalin‐fixed, paraffin‐embedded histopathology, and so results would not be available at the time of the operation.

True‐negative (TN) test

A true‐negative is defined where lymph node(s), identified by SLNB, did not have cancer identified within them (negative node), and there were no further cancer‐containing (positive) lymph nodes identified following systematic lymphadenectomy and standard histopathological examination. If the tests were found to be sufficiently accurate to remove the need for full pelvic or pelvic and para‐aortic lymphadenectomy, and if the SLN was negative, this may reduce the need for full lymphadenectomy, thereby reducing the risks of lymphoedema and lymphocyst formation.

False‐positive (FP) test

By definition, this cannot occur in this setting, since the histological examination of the SLN is unchanged by the results from any additional nodes removed at systematic lymphadenectomy, since there may only be one positive lymph node and this may be the sentinel node. If the node was only positive on ultrastaging and not on standard histopathological testing, this would be a false negative of the reference test, not the index test. This was not the focus of this review. However, we reported these data where available.

False‐negative (FN) test

A false‐negative result is where a lymph node, identified by SLNB, did not have cancer identified within it on histopathology sampling (negative node), but other lymph nodes were found to contain cancer (positive nodes) following systematic lymphadenectomy and standard histopathological examination.
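
With false positives excluded by definition, these counts reduce per‐study accuracy to sensitivity, TP / (TP + FN), with the woman as the unit of analysis. The following Python sketch illustrates the calculation. The Wilson score interval and the example counts are assumptions chosen for illustration; the interval method applied to individual studies is not specified in this section.

```python
import math


def sensitivity_with_wilson_ci(tp: int, fn: int, z: float = 1.96):
    """Return (sensitivity, lower, upper) from per-woman TP and FN counts.

    The Wilson score interval is an illustrative choice, not necessarily
    the method used in the review.
    """
    n = tp + fn  # women with metastatic nodes on the reference standard
    if n == 0:
        raise ValueError("no women with metastatic nodes")
    p = tp / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return p, max(0.0, centre - half), min(1.0, centre + half)


# Hypothetical study: 18 node-positive women, SLN positive in 17 of them.
sens, lower, upper = sensitivity_with_wilson_ci(tp=17, fn=1)
print(f"sensitivity {sens:.1%} (95% CI {lower:.1%} to {upper:.1%})")
```

Small studies give wide intervals under any method, which is one reason the forest plots are read alongside a pooled estimate.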

Assessment of methodological quality

Two review authors (a combination of RG, NW and WH) independently performed a quality assessment of the studies, and resolved any disagreements by discussion. In case of persisting disagreement, they called upon a third review author (HN or JM). We assessed the quality of the studies using the QUADAS‐2 tool, according to the review‐specific criteria outlined in Table 2 (Whiting 2011).

Table 2. QUADAS‐2 assessment criteria

Item	Description
Domain 1: Patient selection
A. Risk of bias
Was a consecutive or random sample of participants enrolled?	Yes, if a consecutive or random sample of participants were enrolled No, if a consecutive or random sample of participants were not enrolled Unclear, if the study did not describe the method of participants enrolment
Was a case‐control design avoided?	The answer to this item will be 'yes' for all the included studies, as one of the exclusion criteria is case‐control studies
Did the study avoid inappropriate exclusions?	Yes, if the characteristics of the participants were well described and probably typical of a secondary healthcare setting No, if the sample was unrepresentative of people with apparent early‐stage endometrial cancer (i.e. an unexpectedly high proportion with rarer high grade histopathological types of endometrial cancer) Unclear, if the source or characteristics of participants was not adequately described
Could the selection of participants have introduced bias?	A judgement of low, high, or unclear risk of bias will be made, based on a balanced assessment of the responses to the above signalling questions.
B. Concerns about applicability
Are there concerns that the included participants and setting do not match the review question?	A judgement of low, high, or unclear concerns about applicability will be made based on a balanced assessment of the response to the third signalling question above, and on how closely the sample matched the target population of interest
Domain 2: Index test
A. Risk of bias
Were the index test results interpreted without knowledge of the results of the reference standard?	Yes, if the report stated that the person undertaking the index test did not know the results of the reference tests No, if the report stated that the same person performed both tests, or that the results of the index tests were known to the person undertaking the reference tests Unclear, if insufficient information was provided
Did the study provide a clear definition of what was considered to be a positive result?	Yes, if the definition of a diagnosis of lymph node metastasis was clearly stated, including ultrastaging of the sentinel node No, if no definition of what was considered a positive result of lymph node metastasis was stated, or the definition of a positive result varied between the participants Unclear, if not enough information was given to permit judgement
If a threshold was used, was it prespecified?	This item is not applicable, as the test is not subject to a threshold.
Could the conduct or interpretation of the index test have introduced bias?	A judgement of low, high, or unclear risk of bias will be made based on a balanced assessment of the responses to the above signalling questions.
B. Concerns about applicability
Are there concerns that the index test, its conduct, or interpretation differ from the review question?	A judgement of low, high, or unclear concerns about applicability will be made, based on a balanced assessment of the information detailed under ’Index test’ in the 'Characteristics of included studies' tables.
Domain 3: Reference standard
A. Risk of bias
Is the reference standard likely to correctly classify the target condition?	Yes, if a full pelvic or aortic node dissection (or both) was carried out adequately to correctly classify the target condition. A pelvic node dissection includes bilateral removal of nodal tissue from the distal one‐half of each common iliac artery, the anterior and medial aspect of the proximal half of the external iliac artery and vein, and the distal half of the obturator fat pad anterior to the obturator nerve. A para‐aortic node dissection includes resection of nodal tissue over the distal vena cava, from the level of the inferior mesenteric artery (or left renal vein) to the mid‐right common iliac artery, and between the aorta and the left ureter, from the inferior mesenteric artery to the left mid‐common iliac artery. No, if a full pelvic or aortic node dissection (or both) has not been carried out adequately to correctly classify the target condition. Unclear, if insufficient information was provided to assess whether the reference standard had been carried out adequately (e.g. no mention of site or number of lymph node yield).
Were the reference standard results interpreted without knowledge of the results of the index tests?	Yes, if the report stated that for SLNB, the histological examination from the lymphadenectomy was done without knowledge of the sentinel lymph node status. No, if the report stated that histological examination of the pelvic lymphadenectomy was carried out with the knowledge of the sentinel lymph node status. Unclear, if insufficient details given as to whether the histological examination was carried out with or without knowledge of sentinel lymph node status.
Could the reference standard, its conduct, or its interpretation have introduced bias?	A judgement of low, high, or unclear risk of bias will be made, based on a balanced assessment of the responses to the above signalling questions.
B. Concerns about applicability
Are there concerns that the target condition, as defined by the reference standard, does not match the question?	The answer to this question will always be low, because the target condition that the reference standard defines will always be the target condition of the review, i.e. adenocarcinoma of the endometrium of endometrioid, serous, clear cell, or mixed types, with apparent cancer confined to the uterus. Otherwise, the study will not be included.
Domain 4: Flow and timing
A. Risk of bias
Was there an appropriate interval between index test and reference standard?	Yes, if the time period between the index test and the reference standard was no more than one month No, if the time period between the index test and the reference standard was longer than one month Unclear, if insufficient information was provided
Did all participants receive the same reference standard?	Yes, if the same reference test was used, regardless of the index test results No, if different reference tests were used, depending on the results of the index test Unclear, if insufficient information was provided Report if any participants received a different reference test, what the reasons stated for this were, and how many participants were involved.
Were all participants included in the analysis?	Yes, if there were no participants excluded from the analysis, or if exclusions were adequately described No, if there were participants excluded from the analysis and there was no explanation given Unclear, if not enough information was given to assess whether any participants were excluded from the analysis Report how many participants were excluded from the analysis for reasons other than uninterpretable results Report how many results were uninterpretable (of the total)
Could the patient flow have introduced bias?	A judgement of low, high, or unclear risk of bias will be made, based on a balanced assessment of the responses to the above signalling questions.

Statistical analysis and data synthesis

As already explained in Data extraction and management, false positives are not possible and by definition specificity is 100%. Therefore, we only calculated sensitivity with 95% confidence intervals (CI) for each study. We plotted the estimates of the observed sensitivities and their 95% CI in forest plots, in order to visually assess the between‐study variability, and we meta‐analysed the data to derive an overall estimate of sensitivity, using a univariate random‐effects logistic regression model (Takwoingi 2017). We used xtmelogit in Stata Version 15 for Windows (StataCorp LLC, College Station, Texas, USA) to fit the univariate random‐effects logistic regression model (Appendix 3). For completeness, we also included the specificities of the studies in the forest plots, although they were 100% in all the studies. We calculated the detection rate as the arithmetic mean of the total number of SLNs detected out of the total number of women in the included studies, with the woman as the unit of analysis. For the summary of findings tables, we calculated the number of women out of 1000 who would have a failed test (i.e. no SLN detected), a true‐positive test and a false‐negative test, and their associated 95% CIs, assuming prevalences of lymph node metastatic disease of 5% and 20%, corresponding to the rates of lymph node involvement for disease limited to the inner and outer half of the myometrium in historical studies (Creasman 1987).
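
The per‐1000 projections can be illustrated with a short Python sketch. It assumes that a sentinel node is detected at the same rate in women with and without metastatic nodes, so that true positives and false negatives are counted only among women in whom a node was mapped. This is one reading that reproduces the point estimates reported for all tracers in the summary of findings table at 20% prevalence. The function and variable names are hypothetical, and the confidence limits are not derived here.

```python
def per_1000(detection_rate: float, sensitivity: float,
             prevalence: float, cohort: int = 1000):
    """Project failed tests, TP and FN in a cohort undergoing SLNB.

    Assumes detection is independent of nodal status (an illustrative
    assumption, not a statement of the review's exact method).
    """
    failed = cohort * (1 - detection_rate)             # no SLN detected
    mapped_with_mets = cohort * detection_rate * prevalence
    tp = mapped_with_mets * sensitivity                # metastases found
    fn = mapped_with_mets * (1 - sensitivity)          # metastases missed
    return round(failed), round(tp), round(fn)


# All tracers: detection rate 86.9%, pooled sensitivity 91.8%.
print(per_1000(0.869, 0.918, prevalence=0.20))  # (131, 160, 14)
```

Under the same assumption, the blue dye estimates (detection rate 77.8%, sensitivity 95.2%) give 222 failed tests, 148 true positives and 7 false negatives per 1000 women.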

Investigations of heterogeneity

We performed meta‐regression analyses for the following factors: FIGO stage (1a versus 1b or above), type of SLN identification methods used (blue dye alone; technetium‐99m alone; blue dye and technetium‐99m; infra red fluorescence alone; infra red fluorescence and blue dye; and infra red fluorescence and technetium‐99m), injection site (subserosal versus cervical versus combined/mixed subserosal and cervical) and lymph node basin studied (pelvic versus pelvic and para‐aortic region). We examined each factor anticipated to be a source of heterogeneity by including the factor as a covariate in the regression model (Appendix 3).
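
For orientation, a univariate random‐effects logistic regression model of this kind can be written as below. The notation is introduced here for illustration and is not taken from Appendix 3, which gives the specification actually fitted: $TP_i$ and $FN_i$ are the true‐positive and false‐negative counts in study $i$, $x_i$ is a study‐level covariate (for example, an indicator for tracer type), and $\tau^2$ is the between‐study variance on the logit scale.

```latex
\begin{aligned}
n_i &= TP_i + FN_i, \\
TP_i &\sim \mathrm{Binomial}(n_i,\ p_i), \\
\operatorname{logit}(p_i) &= \mu + \beta x_i + u_i, \qquad u_i \sim \mathcal{N}(0,\ \tau^2).
\end{aligned}
```

Setting $\beta = 0$ gives the model without covariates, in which the summary sensitivity is $\operatorname{logit}^{-1}(\mu)$. In a meta‐regression, $\beta$ is the difference in logit sensitivity associated with the covariate.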

Comparisons of different tracers or injection sites have been made indirectly. A sentinel lymph node (SLN) is either identified or not, which requires surgical dissection, thereby disrupting further tracer flow to lymph nodes. Furthermore, the test of SLN positivity occurs after surgical removal and histological examination. Therefore, direct intra‐patient comparisons, of differing tracers or injection sites, are not robust, even with a second surgical team, blinded to the first tracer, since the surgical disruption would invalidate the SLN detection technique. Some studies did report a mixture of techniques, but a randomised controlled trial (RCT) of detection methods would be required to directly compare methods to control for confounders, such as patient characteristics (e.g. FIGO stage and body mass index (BMI)).

Sensitivity analyses

We undertook sensitivity analyses for studies without verification bias (i.e. studies where all the included patients received the reference standard defined in Reference standards), and for studies without missing data.

Assessment of reporting bias

Due to a lack of sensitive and appropriate statistical methods to assess reporting bias in diagnostic test accuracy reviews, we did not undertake any assessment of reporting bias.

Summary of findings and assessment of the certainty of the evidence

We assessed the certainty of evidence using the GRADE approach for diagnostic studies (Balshem 2011; Schünemann 2008; Schünemann 2016). As recommended, we rated the certainty of the evidence as either high (not downgraded), moderate (downgraded by one level), low (downgraded by two levels), or very low (downgraded by more than two levels) based on five domains: risk of bias, indirectness, inconsistency, imprecision, and publication bias. For each outcome, the certainty of evidence started as high when there were high‐quality studies (cross‐sectional or cohort studies) that enrolled participants with diagnostic uncertainty. If there was a reason for downgrading, we used our judgement to classify the reason as either serious (downgraded by one level) or very serious (downgraded by two levels). Two review authors discussed the judgements of the certainty of the evidence and applied GRADE in the following way (GRADEpro GDT; Schünemann 2020a; Schünemann 2020b).

Risk of bias: we used QUADAS‐2 to assess risk of bias.
Indirectness: we assessed indirectness in relation to the population (including disease spectrum), setting, interventions, and outcomes (accuracy measures). We also used prevalence as a guide to whether there was indirectness in the population.
Inconsistency: GRADE recommends downgrading for unexplained inconsistency in sensitivity and specificity estimates. We carried out prespecified analyses to investigate potential sources of heterogeneity and did not downgrade when we believed we could explain inconsistency in the accuracy estimates.
Imprecision: we considered a precise estimate to be one that would allow a clinically meaningful decision. We considered the width of the confidence intervals (CI) and asked ourselves, would we make a different decision if the lower or upper boundary of the CI represented the truth? In addition, we worked out projected ranges for TP, FN, TN, and FP for a given prevalence of endometrial cancer lymph node metastasis and made judgements on imprecision from these calculations.
Publication bias: we rated publication bias as unclear.

Results

Results of the search

The search resulted in:

MEDLINE: 4615 records and Embase: 4665 records.

Following preliminary de‐duplication across the databases, we found 6381 records (see Figure 1). We found another 18 references through handsearching (included articles in systematic reviews identified by the search and a specialist issue of the International Journal and Gynecologic Oncology in February 2020). We uploaded records to Covidence, identifying a further 140 duplicate records, leaving 6259 studies for screening. We excluded 6027 records by screening titles and abstracts. We found 232 studies in full‐text publications and excluded 159 articles after full‐text review, leaving 73 records. We were unable to classify nine records because we were either unable to track down a full‐text article (five records) or obtain a full‐text translation (four records) despite requests on Cochrane task exchange and contacting trial authors. These studies are therefore classified as awaiting classification (see Characteristics of studies awaiting classification for further details). We allocated an additional 18 records to studies awaiting classification, as we were unable to extract data for inclusion in 2x2 tables, despite contacts with the trial authors for further information (see Characteristics of studies awaiting classification for further details: Basta 2005; Buda 2012; Buda 2017; Dzvincuk 2006; Ehrisman 2016; El‐Agwany 2018; Elisei 2017; Feranec 2010; Holub 2002; How 2015; Khoury‐Collado 2009; Laios 2015; Liang 2017; London 2015; Niikura 2013; Pitynski 2003; Plante 2015; Rossi 2012; Shimada 2018; Stephens 2020; Tanaka 2018; Tanner 2017; Touhami 2015; Touhami 2017; Yamagami 2017; Yan 2007; Yordanov 2014). If further data are made available, it may be possible to include or exclude some of these studies in future updates.
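
The record counts in this paragraph can be tallied step by step. The Python sketch below uses only the numbers stated above; the variable names are hypothetical.

```python
after_dedup = 6381          # records after preliminary de-duplication
handsearched = 18           # additional references from handsearching
covidence_duplicates = 140  # further duplicates identified in Covidence

screened = after_dedup + handsearched - covidence_duplicates
full_text = screened - 6027         # excluded on title and abstract
remaining = full_text - 159         # excluded after full-text review
awaiting_classification = 9 + 18    # unobtainable plus non-extractable
included_references = remaining - awaiting_classification

print(screened, full_text, remaining, included_references)
# 6259 232 73 46
```

The final figure is the number of references, not the number of studies, because several studies were reported in more than one publication.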

Figure 1

Study flow diagram.

We included 33 studies (46 references) with a total of 2237 women in the review, which represent unique datasets, as far as we are able to ascertain (Abu‐Rustum 2009; Allameh 2015; Baiocchi 2017; Ballester 2011; Barranger 2009; Bats 2008; Bese 2016; Body 2018; Buda 2016b; Delaloye 2007; Gezer 2020; Holloway 2012; Holloway 2017; Kataoka 2016; Kuru 2011; Lopes 2007; Maccauro 2005; Mais 2010; Mücke 2014; Nejkovic 2017; Papadia 2018; Pelosi 2003; Perrone 2008; Persson 2017; Rossi 2017; Signorelli 2015; Solima 2012; Soliman 2017; Taskin 2017; Torne 2013; Valha 2015; Vidal 2013; Ye 2019). This has been challenging, as patient data sets were frequently represented across a number of publications in different, often overlapping, formats.

We excluded 159 studies for the following reasons (see Characteristics of excluded studies for further details):

Review articles (Abu‐Rustum 2014; Altgassen 2003; Ansari 2013; Bacalbasa 2016; Bonneau 2011; Cibula 2015; Collarino 2016; Delpech 2010; El‐Ghobashy 2009; Huchon 2010; Levinson 2013; Marchiole 2004; Niikura 2004a; Rocha 2016; Touboul 2013)
Systematic reviews (Abdullah 2013; Bodurtha Smith 2017; Cormier 2015; Crivellaro 2018; Delpech 2008; How 2018; Kang 2011; Lin 2017; Ruscito 2016; Scelzo 2015; Xiong 2014).
Letters/commentaries (Abu‐Rustum 2013; Altgassen 2005; Amant 2017; Bogani 2016; Bogani 2017; Di Martino 2018; Ditto 2015; Feranec 2007; Frumovitz 2008; Frumovitz 2014; Gorostidi 2017; Mahajan 2007; Martinelli 2019; Oonk 2013; Papadia 2017; Perrone 2009; Schneider 2011; Siesto 2016).
Wrong study design (e.g. identification of sentinel lymph node (SLN) and no requirement for systematic LN dissection in all cases; histological comparison study; description of SLN technique; description of use in clinical practice; learning curves of surgeons to perform technique) (Abbeloos 1965; Altgassen 2009; Baranov 1984; Barlin 2012; Barra 2019; Blakely 2019; Bollino 2020; Bournaud 2013; Cabrera 2020; Curcio 2018; Darai 2015; Desai 2014; De Villa 2018; Dittmann 2010; Echt 1999; Eitan 2015; Eriksson 2016; Eriksson 2017; Farghali 2015; Frati 2014; Gelissen 2019; Geppert 2017; Geppert 2017a; Hagen 2016; Hasanzadeh 2019; How 2017; Jewell 2014; Khoury‐Collado 2011; Kim 2013; Kim 2013a; Kim 2018; Kraft 2013; Leitao 2011; Markus 2016; Martinelli 2017; Martinelli 2017a; McNally 2018; Miao 2018; Naaman 2016; Nagai 2012; Paley 2016; Pandit‐Taskar 2010; Papadia 2016; Papadia 2016a; Papadia 2017a; Perissinotti 2013; Plante 2017; Ruiz 2018; Sahbai 2017; Sawicki 2013; Sawicki 2015; Sawicki 2015a; Shimada 2018a; Sinno 2014; St Clair 2016; Stewart 2019; Stewart 2020; Tanner 2015; Tucker 2020; Urh 2015; Vidal‐Sicart 2009; Volodarsky 2018; Zahl Eriksson 2016).
No evidence of ultrastaging (ultra‐section plus immunohistochemistry (IHC)) of the SLN (Altgassen 2007; Behnamfar 2017; Behnamfar 2018; Berlev 2017; Biliatis 2017; Burke 1996; Canadas Salazar 2018; Clinton 2017; Cordero Garcia 2012; Eoh 2018; Favero 2015; Fersis 2004; Frumovitz 2007; Gargiulo 2003; Gien 2005; Holub 2004; How 2012; Jordanov 2014; Kadkhodayan 2014; Kantathavorn 2018; Lazar 2015; Lelievre 2004; Lelièvre 2004a; Lopez‐De 2014; Mendivil 2018; Mosgaard 2013; Niikura 2004; Niikura 2007; Papadia 2018a; Park 2018; Pelosi 2002; Qu 2010; Rajanbabu 2018; Raspagliesi 2004; Robova 2009; Rossi 2013; Sahbai 2016; Sinilkin 2018; Togami 2018; Yildiz 2013; Zenzola 2009; Zuo 2019).
Conference abstract list (Anonymous 2015).
Fewer than 10 cases (Buda 2016; Clement 2008; Fernandez‐Prada 2015; Holub 2001; Mangeshikar 2017).
Wrong patient population: SLN detection for cervical cancer (Silva 2005); or atypical hyperplasia (Touhami 2018).
Meeting abstracts and had inadequate detail to extract 2x2 table or confirm that they met full inclusion criteria (Toki 2018; Valieva 2018).

Methodological quality of included studies

Overall, the methodological reporting in most of the studies was poor, which resulted in a very large proportion of 'unclear risk of bias' ratings and greatly limited our ability to assess them (see Figure 2).
Only eight of the included studies included their criteria for exclusion of participants and avoided inappropriate exclusions (Allameh 2015; Ballester 2011; Gezer 2020; Rossi 2017; Solima 2012; Soliman 2017; Taskin 2017; Ye 2019).
A number of studies included exclusion criteria that were possibly inappropriate in the setting of this review, for example exclusion of low risk endometrial cancers (endometrioid grades 1 or 2, less than 50% myometrial invasion, absence of lymphovascular space invasion (LVSI)) (Baiocchi 2017, Mücke 2014), positron emission tomography–computed tomography (PET/CT) findings from another clinic (Bese 2016), an inability to undergo robotically‐assisted hysterectomy for any reason (Holloway 2017), other malignancies within the previous five years (Kataoka 2016), non‐specific co‐morbidities (Kuru 2011), preoperative diagnoses of clear cell carcinoma or serous papillary adenocarcinoma (Maccauro 2005), a history of congenital uterine anomalies, and deep vein thrombosis in the lower limbs (Nejkovic 2017), patients in whom only a pelvic lymphadenectomy was performed or a para‐aortic lymphadenectomy that did not reach the renal vessels (Papadia 2018), prior chemotherapy or pelvic radiotherapy (Perrone 2008), inability to perform a transvaginal ultrasound exam (Torne 2013), and subtypes other than endometrioid endometrial adenocarcinoma (Valha 2015). The remaining studies did not report their exclusion criteria (Abu‐Rustum 2009; Barranger 2009; Bats 2008; Body 2018; Buda 2016b; Delaloye 2007; Holloway 2012; Lopes 2007; Mais 2010; Pelosi 2003; Persson 2017; Signorelli 2015; Vidal 2013).
In 29 studies, a consecutive or random sample of patients was enrolled (although inclusion/exclusion criteria differed between the studies); enrolment was unclear in the remaining four studies (Mais 2010; Pelosi 2003; Perrone 2008; Solima 2012). All studies avoided a case‐control design.
For the majority of studies, there was a low concern that the included patients and setting did not match the review question. For some studies the applicability was unclear (Baiocchi 2017; Signorelli 2015), as not all patients in the study underwent SLN biopsy. Baiocchi 2017 compared patients who underwent SLN procedures with those who did not, while Signorelli 2015 aimed to evaluate the role of PET/CT and SLN biopsy in staging high‐risk endometrial cancer patients in early clinical stage, but only introduced the SLN procedure towards the end of their recruitment period, therefore only a subset of patients underwent both PET/CT and SLN mapping.
No study reported blinded interpretation of the index test or the reference standard. We did not find evidence that any of the studies were designed so that the index test results were interpreted without knowledge of the reference standard, or the reference standard results interpreted without knowledge of the results of the index tests, leading to an unclear risk of bias for all studies about the conduct and interpretation of the index test and reference standard.
In many studies, the criteria for defining metastases or micrometastases were clearly pre‐specified (Baiocchi 2017; Ballester 2011; Barranger 2009; Bese 2016; Buda 2016b; Gezer 2020; Holloway 2012; Holloway 2017; Kataoka 2016; Kuru 2011; Mücke 2014; Nejkovic 2017; Rossi 2017; Soliman 2017; Taskin 2017; Ye 2019), while others failed to include these criteria in their manuscripts (Abu‐Rustum 2009; Allameh 2015; Bats 2008; Body 2018; Delaloye 2007; Lopes 2007; Maccauro 2005; Mais 2010; Papadia 2018; Pelosi 2003; Perrone 2008; Persson 2017; Signorelli 2015; Solima 2012; Torne 2013; Valha 2015; Vidal 2013).
There were low concerns in all studies that the index test, its conduct, or interpretation differed from the review question, apart from Signorelli 2015 for the above mentioned reasons. The concern that the target condition as defined by the reference standard does not match the review question was deemed low for all studies.
In all studies, the patients received the index and reference standard during the same surgical procedure.
The reference standard that patients received was not always identical, both within and between studies. In only 10 studies (Abu‐Rustum 2009; Delaloye 2007; Gezer 2020; Lopes 2007; Nejkovic 2017; Papadia 2018; Pelosi 2003; Persson 2017; Torne 2013; Valha 2015) did all patients receive the same reference standard. Eight of these studies (Abu‐Rustum 2009; Delaloye 2007; Gezer 2020; Lopes 2007; Papadia 2018; Persson 2017; Torne 2013; Valha 2015) defined the reference standard as a full pelvic and para‐aortic lymphadenectomy, while two studies (Nejkovic 2017; Pelosi 2003) performed only a pelvic lymphadenectomy on all participants recruited.
In 21 studies, the reference standard was defined as either pelvic or pelvic and para‐aortic lymphadenectomy, and the analyses were combined (Baiocchi 2017; Barranger 2009; Bats 2008; Bese 2016; Body 2018; Buda 2016b; Holloway 2012; Holloway 2017; Kataoka 2016; Kuru 2011; Maccauro 2005; Mais 2010; Mücke 2014; Perrone 2008; Rossi 2017; Signorelli 2015; Solima 2012; Soliman 2017; Taskin 2017; Vidal 2013; Ye 2019). It is unclear what level of bias this introduces to the accuracy of sentinel node biopsy for diagnosis of lymph node involvement in endometrial cancer. In addition, the mode of surgery in these studies often differed between patients, including laparoscopic surgery, robotic surgery, and laparotomies. It is unclear how this may affect the outcome in terms of SLN detection (the index test) and ability to perform full lymphadenectomy (the reference standard).
For the remaining two studies it was unclear whether the same reference standard was used in all participants, as the level of LN dissection is not reported (Allameh 2015), or the ratios of patients receiving pelvic lymphadenectomy alone or pelvic and para‐aortic lymphadenectomy are not given (Ballester 2011).
All studies apart from Ballester 2011 and Persson 2017 included all patients in their subsequent analyses. Ballester 2011 included only those patients in whom SLNs were detected, rather than all patients who were recruited to the study, which may introduce considerable bias. Persson 2017 excluded a single patient from further analysis, whose surgery was converted to open surgery due to extensive intra‐abdominal adhesions, and who therefore was not injected with ICG (the index test).

Figure 2

Risk of bias and applicability concerns summary: review authors' judgements about each domain for each included study

Findings

We included 33 studies (46 references), with the sample sizes ranging from 15 to 340 (median = 60 participants). Six studies were retrospective (Baiocchi 2017; Ballester 2011; Bese 2016; Buda 2016b; Holloway 2012; Papadia 2018) and 27 were prospective (Abu‐Rustum 2009; Allameh 2015; Barranger 2009; Bats 2008; Body 2018; Delaloye 2007; Gezer 2020; Holloway 2017; Kataoka 2016; Kuru 2011; Lopes 2007; Maccauro 2005; Mais 2010; Mücke 2014; Nejkovic 2017; Pelosi 2003; Perrone 2008; Persson 2017; Rossi 2017; Signorelli 2015; Solima 2012; Soliman 2017; Taskin 2017; Torne 2013; Valha 2015; Vidal 2013; Ye 2019). The studies were conducted in inpatient hospital settings in 15 countries (Brazil, Canada, China, Czech Republic, France, Germany, Iran, Italy, Japan, Serbia, Spain, Sweden, Switzerland, Turkey, USA). Where recorded, the mean/median age of the women ranged from 54 to 69.5 years. Where recorded, 45% of the women in the included studies had FIGO Stage IA disease (range 0% to 100%), 31.9% had G3 histology (range 2% to 100%) and 30.6% had LVSI present (range 17% to 54.2%). Thirty studies included both Type 1 and 2 endometrial cancer histological subtypes, whereas two studies included only Type 1 tumours, and one study included only Type 2 tumours. The mean/median body mass index (BMI) of patients ranged from 24.8 to 35.3 kg/m² and the maximum BMI in studies ranged from 33.4 to 68 kg/m².

In terms of index test, 11 studies reported data for blue dye alone, four studies for technetium‐99m alone, 12 studies for a combination of blue dye and technetium‐99m, nine studies for indocyanine green (ICG) alone, two studies for ICG and blue dye and one study for the ICG and technetium‐99m combination. Nineteen studies used cervical injection of tracer, 10 studies used subserosal uterine injection of tracer (either hysteroscopic or trans‐fundal) and four studies used a combination of techniques. Few studies reported adverse reactions from the index or reference test (including bleeding, infection (urine, chest, wound), lymphocyst formation, lymphoedema, venous thromboembolism), operating time, or other intraoperative and postoperative complications. Further details can be found in the Characteristics of included studies table.

Sentinel lymph node (SLN) detection (ability of test to detect the SLN)

Overall, the SLN detection rate ranged from 61.8% (Mais 2010) to 100% (Holloway 2012; Kataoka 2016; Kuru 2011; Maccauro 2005; Papadia 2018; Persson 2017; Signorelli 2015), with a mean SLN detection rate of 86.9% (95% confidence interval (CI) 82.9% to 90.8%; 2237 women, 33 studies; moderate‐certainty evidence). When considered according to which tracer was used, the SLN detection rate ranged from 77.8% (95% CI 70.0% to 85.6%) for blue dye alone (559 women; 11 studies; low‐certainty evidence) to 100% for ICG and technetium‐99m (32 women; 1 study; very low‐certainty evidence; Table 3; summary of findings Table 1). In the studies that reported the number of women with bilateral lymph node detection (Baiocchi 2017; Ballester 2011; Barranger 2009; Bats 2008; Bese 2016; Body 2018; Buda 2016b; Delaloye 2007; Gezer 2020; Holloway 2012; Holloway 2017; Mücke 2014; Nejkovic 2017; Papadia 2018; Pelosi 2003; Perrone 2008; Persson 2017; Rossi 2017; Signorelli 2015; Soliman 2017; Taskin 2017; Torne 2013; Vidal 2013; Ye 2019), the bilateral detection rate ranged from 25.5% (Torne 2013) to 97.1% (Holloway 2012), with a mean bilateral detection rate of 65.4% (95% CI 57.8% to 73.0%).

Table 3. Sentinel lymph node detection rate by tracer

Tracer (studies)	Number detected/total number (%; 95% CI)
Blue dye alone (11)	435/559 (77.8; 95% CI 70.0 to 85.6)
Technetium‐99m alone (4)	208/257 (80.9; 95% CI 63.9 to 97.9)
Technetium‐99m and blue dye (12)	473/548 (86.3; 95% CI 80.7 to 91.9)
ICG alone (9)	881/953 (92.4; 95% CI 88.7 to 96.2)
ICG and blue dye (2)	208/215 (96.7; 95% CI 92.7 to 100)
ICG and technetium‐99m (1)	32/32 (100%)

CI: confidence interval
ICG: Indocyanine green
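
As a rough cross-check, the point estimates in Table 3 can be recomputed from the detected/total counts. The following is a minimal sketch that does only that. The names `table3_counts` and `detection_rate_percent` are illustrative and not part of the review. The 95% CIs in the table are not reproduced here, because they do not appear to be simple binomial intervals on the pooled counts (an assumption on our part).

```python
# Detected/total counts per tracer, copied from Table 3.
table3_counts = {
    "Blue dye alone": (435, 559),
    "Technetium-99m alone": (208, 257),
    "Technetium-99m and blue dye": (473, 548),
    "ICG alone": (881, 953),
    "ICG and blue dye": (208, 215),
    "ICG and technetium-99m": (32, 32),
}


def detection_rate_percent(detected, total):
    """Return the SLN detection rate as a percentage, rounded to 1 decimal place."""
    if total <= 0:
        raise ValueError("total must be positive")
    return round(100 * detected / total, 1)


# Point estimates only; the confidence intervals are not recomputed.
detection_rates = {
    tracer: detection_rate_percent(detected, total)
    for tracer, (detected, total) in table3_counts.items()
}

for tracer, rate in detection_rates.items():
    print(f"{tracer}: {rate}%")
```

The recomputed percentages agree with the point estimates listed in Table 3.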

Sensitivity (ability of test to detect endometrial cancer lymph node metastases)

The rates of women with lymph node involvement in endometrial cancer ranged from 5.2% (Bese 2016) to 34.4% (Baiocchi 2017), with a mean rate of 20.1% (95% CI 17.7% to 22.3%). The sensitivities of the included studies are illustrated in Figure 3 and ranged from 50.0% (Bese 2016; Mais 2010; Signorelli 2015; Ye 2019) to 100.0% (Abu‐Rustum 2009; Allameh 2015; Barranger 2009; Bats 2008; Buda 2016b; Kuru 2011; Maccauro 2005; Nejkovic 2017; Pelosi 2003; Perrone 2008; Persson 2017; Valha 2015) with a pooled sensitivity of 91.8% (95% CI 86.5% to 95.1%; total n = 2237, of whom 409 had lymph node involvement; moderate‐certainty evidence).

Figure 3

Figure 3 illustrates the (per patient) sensitivity of the included studies ordered by sensitivity.

Heterogeneity analyses

FIGO stage: all of the included studies that reported FIGO stage included women with a mixture of stage IA and IB or above, with the exception of Pelosi 2003, which only included women with stage IA. To analyse the effect of stage on sensitivity we categorised the studies into stage IA or stage IB+ based on whether the majority of the included women had stage IA (Allameh 2015; Ballester 2011; Bese 2016; Body 2018; Buda 2016b; Gezer 2020; Kataoka 2016; Kuru 2011; Maccauro 2005; Mais 2010; Pelosi 2003; Persson 2017; Rossi 2017; Solima 2012; Taskin 2017; Ye 2019), or stage IB or above (Mücke 2014; Nejkovic 2017; Papadia 2018) endometrial cancer. A number of the studies that did report FIGO stage could not be assigned to these categories because they only reported stage for all the women included and not broken down by whether the SLN was detected or not. These studies, depending on whether the women whose SLN was not detected had stage IA or IB or above, could be classified into either category (Abu‐Rustum 2009; Barranger 2009; Bats 2008; Delaloye 2007; Soliman 2017; Torne 2013; Vidal 2013). Along with the studies that did not report stage at all (Baiocchi 2017; Holloway 2012; Holloway 2017; Lopes 2007; Perrone 2008; Signorelli 2015; Valha 2015), these studies were therefore not included in this analysis. Meta‐regression analyses found that sensitivity did not differ between studies with a majority of women with stage IA cancer (sensitivity: 91.1%, 95% CI 80.7% to 96.2%) and studies with a majority of women with stage IB or above cancer (sensitivity: 91.3%, 95% CI 71.1% to 97.8%). This was the case whether the model included FIGO stage and assumed equal variances (Chi² (1) = 0.13; P = 0.72) or separate variances (Chi² (2) = 1.21; P = 0.55).

Type of sentinel lymph node (SLN) identification method used

The included studies used either blue dye alone (Allameh 2015; Baiocchi 2017; Buda 2016b; Holloway 2017; Kuru 2011; Lopes 2007; Mais 2010; Nejkovic 2017; Soliman 2017; Valha 2015; Vidal 2013), ICG alone (Body 2018; Buda 2016b; Kataoka 2016; Papadia 2018; Persson 2017; Rossi 2017; Soliman 2017; Taskin 2017; Ye 2019), technetium‐99m alone (Gezer 2020; Perrone 2008; Solima 2012; Torne 2013), ICG and blue dye (Holloway 2012; Holloway 2017), technetium‐99m and blue dye (Abu‐Rustum 2009; Ballester 2011; Barranger 2009; Bats 2008; Bese 2016; Buda 2016b; Delaloye 2007; Maccauro 2005; Mücke 2014; Pelosi 2003; Signorelli 2015; Soliman 2017) or technetium‐99m and ICG (Kataoka 2016) in the women included in their studies. Due to restricted variance (few false negatives in a small number of studies) in the studies using technetium‐99m alone (sensitivity: 90.5%, 95% CI 67.7% to 97.7%; low‐certainty evidence), ICG and blue dye (sensitivity: 90.5%, 95% CI 63.2% to 98.1%; low‐certainty evidence) or technetium‐99m and ICG (sensitivity: 100%, 95% CI 63% to 100%; very low‐certainty evidence) (see also Figure 4), we could not examine the effect of this covariate in analyses that used separate variances. However, none of the meta‐regression analyses we performed yielded a significant effect of tracer(s) used, whether the covariate was entered with six levels, including all the tracers and tracer combinations in the included studies and examined only with equal variances (Chi² (5) = 2.29; P = 0.81), or with three levels, including only the studies that used either blue dye alone (sensitivity: 95.2%, 95% CI 77.2% to 99.2%; low‐certainty evidence), ICG alone (sensitivity: 92.5%, 95% CI 81.8% to 97.1%; moderate‐certainty evidence) or technetium‐99m and blue dye (sensitivity: 91.9%, 95% CI 74.4% to 97.8%; low‐certainty evidence) and assuming equal variances (Chi² (2) = 0.22; P = 0.90) or separate variances (Chi² (4) = ‐0.55; P = 1).
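
To illustrate why a study with no false negatives still carries a wide interval, the sketch below computes an exact (Clopper‐Pearson) lower confidence limit for sensitivity when every node‐positive woman is detected. This is one common choice of interval and not necessarily the method used in this review. The function name and the count of 8 node‐positive women are hypothetical, chosen for illustration only; the actual count for the single technetium‐99m and ICG study is not given in this section.

```python
def exact_lower_bound_all_detected(n_positive, alpha=0.05):
    """Exact lower confidence limit for sensitivity when TP == n_positive and FN == 0.

    With zero false negatives the Clopper-Pearson lower limit has the closed
    form (alpha / 2) ** (1 / n_positive); the upper limit is 1.
    """
    if n_positive < 1:
        raise ValueError("need at least one node-positive woman")
    return (alpha / 2) ** (1 / n_positive)


# Hypothetical count of node-positive women, all detected (illustration only).
n_positive = 8
lower = exact_lower_bound_all_detected(n_positive)
print(f"sensitivity 100%, 95% CI {100 * lower:.0f}% to 100%")
```

Under this assumed count the lower limit is about 63%, which shows how a point estimate of 100% from a handful of positive cases remains compatible with a substantially lower true sensitivity.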

Figure 4

Figure 4 illustrates the (per patient) sensitivity of the included studies ordered by detection method: blue dye alone, technetium‐99m alone, technetium‐99m and blue dye, ICG alone, ICG and blue dye, and ICG and technetium‐99m.

Injection site

The injection sites used in the included studies were either subserosal (Allameh 2015; Baiocchi 2017; Delaloye 2007; Gezer 2020; Kataoka 2016; Lopes 2007; Maccauro 2005; Perrone 2008; Solima 2012; Torne 2013; Valha 2015), cervical (Ballester 2011; Barranger 2009; Bats 2008; Bese 2016; Body 2018; Buda 2016b; Gezer 2020; Holloway 2012; Holloway 2017; Kuru 2011; Mais 2010; Mücke 2014; Nejkovic 2017; Papadia 2018; Pelosi 2003; Perrone 2008; Persson 2017; Rossi 2017; Signorelli 2015; Soliman 2017; Taskin 2017; Vidal 2013; Ye 2019), combined cervical and subserosal (Kuru 2011), or a mixture with some patients receiving cervical injections only and others receiving both cervical and subserosal injections (Abu‐Rustum 2009). In both studies that used combined or mixed injection sites, sensitivity was 100%. We could therefore not examine the effect of this covariate in analyses that used separate variances if we included these two studies with a variance of 0. However, none of the meta‐regression analyses we performed yielded a significant effect of injection site, whether the covariate was entered with three levels (subserosal, cervical and combined/mixed) and examined only with equal variances (Chi² (2) = 1.85; P = 0.4), or with two levels, including only the studies with either subserosal (sensitivity: 90.4%, 95% CI 82.6% to 94.9%) or cervical (sensitivity: 91.7%, 95% CI 84.0% to 95.9%) injection sites and assuming equal variances (Chi² (1) = 0.07; P = 0.79) or separate variances (Chi² (2) = 4.39; P = 0.11).

Lymph node basin studied

The majority of the studies studied the pelvic lymph node basin alone (Abu‐Rustum 2009; Allameh 2015; Ballester 2011; Barranger 2009; Bats 2008; Bese 2016; Body 2018; Buda 2016b; Delaloye 2007; Gezer 2020; Holloway 2012; Holloway 2017; Kataoka 2016; Kuru 2011; Maccauro 2005; Mais 2010; Nejkovic 2017; Pelosi 2003; Perrone 2008; Signorelli 2015; Taskin 2017; Vidal 2013; Ye 2019), with the remainder studying both the pelvic and para‐aortic lymph node basins (at least in the majority of the included women; Baiocchi 2017; Lopes 2007; Mücke 2014; Papadia 2018; Persson 2017; Rossi 2017; Solima 2012; Soliman 2017; Torne 2013; Valha 2015). Meta‐regression analyses found that although sensitivity was numerically higher when both the pelvic and the para‐aortic lymph node basins were studied (sensitivity: 94.0%, 95% CI 88.8% to 96.8%) compared to only the pelvic lymph node basin (sensitivity: 90.5%, 95% CI 81.9% to 95.3%), this difference was not statistically significant. This was the case whether the model included the lymph node basin studied and assumed equal variances (Chi² (1) = 1.48; P = 0.22) or separate variances (Chi² (2) = 4.4; P = 0.11).

The heterogeneity analyses are summarised in Table 4.

Table 4. Heterogeneity analyses

Covariate	Studies (women)	Sensitivity % (95% CI)	Statistical significance
FIGO stage
Majority of women with stage IA	16 (1252)	91.1 (80.7 to 96.2)	Equal variances assumed: Chi² (1) = 0.13; P = 0.72 Separate variances assumed: Chi² (2) = 1.21; P = 0.55
Majority of women with stage IB or above	3 (95)	91.3 (71.1 to 97.8)
Sentinel lymph node identification method
Blue dye alone	11 (435)	95.2 (77.2 to 99.2)	6 levels (all tracers and tracer combinations): Equal variances assumed: Chi² (5) = 2.29; P = 0.81 3 levels (blue dye alone, ICG alone, technetium‐99m and blue dye): Equal variances assumed: Chi² (2) = 0.22; P = 0.90 Separate variances assumed: Chi² (4) = ‐0.55; P = 1
ICG alone	9 (881)	92.5 (81.8 to 97.1)
Technetium‐99m alone	4 (208)	90.5 (67.7 to 97.7)
ICG and blue dye	2 (208)	90.5 (63.2 to 98.1)
Technetium‐99m and blue dye	12 (473)	91.9 (74.4 to 97.8)
Technetium‐99m and ICG	1 (32)	100
Injection site
Subserosal	11 (445)	90.4 (82.6 to 94.9)	3 levels (subserosal, cervical and combined/mixed): Equal variances assumed: Chi² (2) = 1.85; P = 0.4 2 levels (subserosal, cervical): Equal variances assumed: Chi² (1) = 0.07; P = 0.79 Separate variances assumed: Chi² (2) = 4.39; P = 0.11
Cervical	23 (1746)	91.7 (84.0 to 95.9)
Combined subserosal and cervical	1 (10)	100
Cervical with/without subserosal	1 (36)	100
Lymph node basin
Pelvic	23 (1405)	90.5 (81.9 to 95.3)	Equal variances assumed: Chi² (1) = 1.48; P = 0.22 Separate variances assumed: Chi² (2) = 4.4; P = 0.11
Pelvic and para‐aortic	10 (779)	94.0 (88.8 to 96.8)

CI: confidence interval
ICG: Indocyanine green
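
The P values in Table 4 follow from the reported Chi² statistics and their degrees of freedom. The sketch below, using only the Python standard library, recomputes them from the upper tail of the chi‐squared distribution. The function name `chi2_p_value` is illustrative, and treating a statistic of zero or below as P = 1 is our assumption about how the negative statistic in the table maps to its reported P value.

```python
import math


def chi2_p_value(statistic, df):
    """Upper-tail P value of a chi-squared statistic with integer df >= 1.

    Uses the closed-form series available for integer degrees of freedom.
    A statistic <= 0 returns 1.0 (assumed handling of a non-positive value).
    """
    if df < 1:
        raise ValueError("df must be a positive integer")
    if statistic <= 0:
        return 1.0
    half = statistic / 2
    if df % 2 == 0:
        # Even df = 2k: exp(-x/2) * sum over i < k of (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df = 2k + 1:
    # erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2) * sum_{i=1..k} x^(i-1) / (1*3*...*(2i-1))
    term, total = 1.0, 0.0
    for i in range(1, (df - 1) // 2 + 1):
        if i > 1:
            term *= statistic / (2 * i - 1)
        total += term
    tail = math.sqrt(2 * statistic / math.pi) * math.exp(-half) * total
    return math.erfc(math.sqrt(half)) + tail


# (Chi² statistic, df) pairs as reported in Table 4.
reported = [(0.13, 1), (1.21, 2), (2.29, 5), (0.22, 2), (-0.55, 4),
            (1.85, 2), (0.07, 1), (4.39, 2), (1.48, 1), (4.4, 2)]
for stat, df in reported:
    print(f"Chi² ({df}) = {stat}; P = {chi2_p_value(stat, df):.2f}")
```

None of the recomputed P values falls below 0.05, consistent with the text's statement that no covariate had a significant effect on sensitivity.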

Sensitivity analyses

We performed two sensitivity analyses examining the robustness of the pooled sensitivity estimate to the effect of verification bias (including only the studies with a "yes" response to the signalling question "Did all patients receive the same reference standard?"; Abu‐Rustum 2009; Delaloye 2007; Gezer 2020; Lopes 2007; Nejkovic 2017; Papadia 2018; Pelosi 2003; Persson 2017; Torne 2013; Valha 2015), and to the effect of missing data (including only studies with a "yes" response to the signalling question "Were all patients included in the analysis?", which was all the studies with the exception of Ballester 2011 and Persson 2017). These analyses showed that including only studies where all the women received the same reference standard raised the sensitivity marginally (sensitivity: 93.0%, 95% CI 83.9% to 97.1%) whereas the effect of missing data on the overall sensitivity estimate was negligible (sensitivity: 91.4%, 95% CI 85.8% to 94.9%).

Discussion

Summary of main results

We identified 33 studies from which we were able to extract 2x2 data for inclusion in the review. Overall, the methodological reporting in most of the studies was poor, which resulted in a very large proportion of 'unclear risk of bias' ratings and greatly limited our ability to assess them.

Overall, the mean sentinel lymph node (SLN) detection rate was 86.9% (95% CI 82.9% to 90.8%). In those studies that reported bilateral detection, the mean bilateral detection rate was 65.4% (95% CI 57.8% to 73.0%). The rates of positive lymph nodes ranged from 5.2% to 34.4% with a mean of 20.1% (95% CI 17.7% to 22.3%). When considered according to which tracer was used, the SLN detection rate ranged from 77.8% (95% CI 70.0% to 85.6%) for blue dye alone (559 women; 11 studies) to 100% for indocyanine green (ICG) and technetium‐99m (32 women; 1 study). The detection rates of those studies that used ICG and near infra‐red fluorescence may be better than those of other techniques, and a combination of a dye and technetium‐99m may also improve detection rates over a single tracer alone. However, there were few studies using some of the tracers, so we are not able to reliably show whether there is an effect of tracer used or not.

The pooled sensitivity of sentinel lymph node biopsy (SLNB) was 91.8% (95% CI 86.5% to 95.1%; n = 2237, of whom 409 had lymph node involvement; moderate‐certainty evidence). The sensitivities of SLNB for the different tracers were: blue dye alone 95.2% (95% CI 77.2% to 99.2%; 559 women; 11 studies; low‐certainty evidence); technetium‐99m alone 90.5% (95% CI 67.7% to 97.7%; 257 women; 4 studies; low‐certainty evidence); technetium‐99m and blue dye 91.9% (95% CI 74.4% to 97.8%; 548 women; 12 studies; low‐certainty evidence); ICG alone 92.5% (95% CI 81.8% to 97.1%; 953 women; 9 studies; moderate‐certainty evidence); ICG and blue dye 90.5% (95% CI 63.2% to 98.1%; 215 women; 2 studies; low‐certainty evidence); and ICG and technetium‐99m 100.0% (95% CI 63.0% to 100.0%; 32 women; 1 study; very low‐certainty evidence). It is noted that the sensitivity of SLNB for many of the individual tracers is lower than 95%. However, the percentage of women with positive nodes missed by the test (where a SLN can be detected) varies according to the underlying prevalence of lymph node metastases in the population and is lower than 5% in either a high‐ or low‐risk population: 1.4% (95% CI 0.9% to 2.3%) in a population with high risk (20%) of lymph node metastases and 0.4% (95% CI 0.2% to 0.6%) in a population with low risk (5%) of lymph node metastases (see summary of findings Table 1; summary of findings Table 2).

Strengths and weaknesses of the review

Strengths

This is a review of the diagnostic test accuracy of sentinel node biopsy in endometrial cancer. The Cochrane process involves applying predefined inclusion criteria to all potential studies in a standardised way, allowing a reduction in heterogeneity, assessment of methodological quality and assessment of the risk of bias. We excluded studies with 10 or fewer women and contacted the lead trial author by email if we were unable to construct a 2x2 table from the results data reported for a study. We discovered that a number of studies contained patients from previously published studies on the same subject. Where we were unable to separate out cohorts, so that individual patients were only counted once, these studies were placed in the awaiting classification section pending trial author clarification. We recorded diagnostic accuracy per hemipelvis and per woman (including both hemipelves +/‐ para‐aortic nodes) where possible from the data. We included only studies where there was evidence of ultrastaging of sentinel nodes and those where 2x2 data could be extracted and excluded studies with fewer than 10 patients. We included fewer studies than a previous systematic review of SLNB in endometrial cancer, which included 55 studies (4915 participants), of which 47 studies contributed to the pooled sensitivity analysis (Bodurtha Smith 2017). That review did not require ultrastaging of SLN, calculated an overall sensitivity for SLNB of 96% (95% CI 91% to 98%) and did not find that ultrastaging improved sensitivity.

Weaknesses

By definition, if the sentinel nodes are positive for malignancy (index test) the pelvic nodes will also be positive (reference test), automatically producing a specificity of 100% for all studies. None of the studies appeared to assess samples from the SLN blindly, as part of the examination of the total lymphadenectomy specimen. Ultrastaging with ultra‐section and immunohistochemistry (IHC) is more sensitive than routine lymph node examination, used in the reference test, and is able to identify almost 50% more positive nodes (Burg 2020). However, many of these nodes contained only individual tumour cells (ITC) or very small micro‐metastases, which may not be clinically significant.

Not all studies detailed how nodes containing ITC were treated, i.e. whether they were considered positive or negative sentinel nodes. More recent data suggest that sentinel nodes containing only ITC should be considered negative (Goebel 2020). Of note, a variety of different ultrastaging pathological methods were used by the included studies. A recent meta‐analysis concluded that, due to the large heterogeneity of the studies, it was not possible to assess which ultrastaging method has the highest detection rate of SLN metastases (Burg 2020).

Where we were unable to construct a 2x2 table or to separate previously published data, we emailed the trial authors. However, this led to a significant number of studies being placed in the awaiting classification section and not included in the final analysis.

A limited number of studies provided enough data for us to construct a 2x2 table for diagnostic accuracy per hemipelvis. This was somewhat surprising, as all included studies required a full pelvic node dissection to be performed irrespective of finding a sentinel node. Overall, data extraction was challenging, largely due to variable reporting of data. In many studies, much effort was put into a description of the site of sentinel nodes detected or where positive nodes were found, whereas reporting of the key outcomes of the study, with which the diagnostic test accuracy could be estimated, was missing or difficult to tease out. This is demonstrated by the large number of papers still within the Characteristics of studies awaiting classification section. It is disappointing how few of the trial authors have responded to requests for further information where information required clarification. A key outcome of this review is a need for standardised reporting of diagnostic test accuracy studies to ensure that important data are presented, which would reduce the risk of reporting bias in individual studies, as recommended in the STARD statement (Cohen 2016). The majority of the studies reported negative predictive values, so most appeared to be, at least in part, performed with the aim of assessing diagnostic test accuracy. We could have opted to exclude studies if we were unable to extract 2x2 data, but opted to leave them as "awaiting classification" where further clarity may allow future inclusion.

Another major weakness of the review is that a number of studies reported patient cohorts that had been added to over time and re‐published, including patients from previously published datasets. Where possible, we have separated out these data, to avoid 'double‐counting' individual patient results. Where this was not possible we contacted trial authors for clarification. Few trial authors responded to our requests and to those that did we are very grateful. Other data were not able to be included and we have allocated these studies to the Characteristics of studies awaiting classification section. Without a formal individual patient data review we cannot completely exclude the possibility that there may be some overlap in the data sets, which may have skewed the results, or other data sets have been incorrectly excluded that could have been included.

Finally, we were unable to assess the risk of most of the main biases in the included studies because the poor reporting outlined above that characterised many of the included studies also extended to these methodological aspects of the studies. This in turn means that we are not able to estimate how reliable our results are and is reflected in the low to very low level of certainty we have in many of the results.

Applicability of findings to the review question

Included studies enrolled women with presumed early‐stage endometrial cancer. It was not clear from studies whether consecutively‐diagnosed women were included and whether there were any exclusion criteria. It would appear that the average body mass index (BMI) in many of the studies may be somewhat lower than encountered in clinical practice (range of median/mean BMI 24.8 to 35.25 kg/m² and under 30 kg/m² in the majority of studies, with the exception of Body 2018; Gezer 2020; Holloway 2012; Holloway 2017; Kuru 2011; Mücke 2014; Rossi 2017; Soliman 2017; Valha 2015). This may be due to differences in obesity levels in the populations where the studies were performed, or because women with higher BMIs, in whom SLN detection is more challenging, may not have been included in the studies. As obesity is a major risk factor for developing endometrial cancer and is driving the increase in endometrial cancer incidence, this may limit the applicability of the findings to everyday clinical practice, especially in populations with high rates of obesity. Five of the nine studies where the mean BMI was >30 kg/m² were ICG studies, which may reflect that ICG may be easier to use in those with higher BMIs, although the proportion of studies using ICG where the average BMI was over 30 kg/m² was not different to other studies (Chi² = 0.098), and a previous meta‐analysis did not find a difference in SLN detection rates in those studies with a mean BMI ≤ 30 kg/m² compared to those with a BMI >30 kg/m² (Bodurtha Smith 2017). However, the SLN detection failure rates in studies that used ICG may be lower than in those that did not (all > 90% successful detection rates).

Figure 1

Study flow diagram.

Test 1

Blue dye alone per patient (all injection sites)

Test 2

Blue dye alone (cervical injection)

Test 3

Blue dye alone (subserosal injection)

Test 4

Technetium‐99m alone per patient (all injection sites)

Test 5

Technetium‐99m alone per patient (cervical injections)

Test 6

Technetium‐99m alone per patient (subserosal injection)

Test 7

Technetium‐99m & blue dye per patient

Test 8

ICG alone per patient

Test 9

ICG and blue dye per patient

Test 10

ICG and Technetium‐99m per patient

Test 11

Nanoparticles per patient

Test 18

SLNB ‐ all tracers per patient

Summary of findings 1. Summary of findings table (prevalence of positive LN rate 20%*)

Index test	Number of patients (number of studies)	Mean SLN detection rate (95% CI)¹	Pooled sensitivity results per woman (95% CI)²	Consequences in a cohort of 1000 women undergoing SLNB, assuming the prevalence of LN metastases to be 20%			Certainty of evidence
Review question: what is the diagnostic accuracy of different tracers for the detection of sentinel lymph nodes? Patients/population: female adults with presumed early‐stage endometrial (womb) cancer Role: prognostic information for guiding adjuvant therapy after surgery Index tests: sentinel lymph node biopsy (SLNB) after injection of tracer substance/s into the cervix or uterine muscle Threshold for index tests: detection of micrometastases following ultrastaging (ultrasection of sentinel LN and immunohistochemistry). Presence of individual tumour cells (ITCs) excluded as positive result where possible. Reference standards: full pelvic +/‐ para‐aortic lymphadenectomy and standard histological examination Studies: cross‐sectional and cohort studies Setting: secondary/tertiary inpatient care at time of surgery for endometrial cancer
				Women with no SLN detected (i.e., failed test; 95% CI)³	Women with metastatic nodes diagnosed by index test (TP; 95% CI)⁴	Women with metastatic nodes missed by index test (FN; 95% CI)⁵
1. All tracers	2237 (33 studies)	86.9% (82.9% to 90.8%)	91.8% (86.5% to 95.1%)	131 (92 to 171)	160 (150 to 162)	14 (9 to 23)	⊕⊕⊕⊖ moderate ⁶
2. Blue dye alone	559 women (11 studies)	77.8% (70.0% to 85.6%)	95.2% (77.2% to 99.2%)	222 (144 to 300)	148 (120 to 154)	7 (1 to 35)	⊕⊕⊖⊖ low ⁷
3. Technetium‐99m alone	257 women (4 studies)	80.9% (63.9 to 97.9)	90.5% (67.7% to 97.7%)	191 (21 to 361)	146 (109 to 158)	15 (4 to 52)	⊕⊕⊖⊖ low ⁷
4. Technetium‐99m and blue dye	548 women (12 studies)	86.3% (80.7 to 91.9)	91.9% (74.4% to 97.8%)	137 (81 to 193)	159 (128 to 169)	14 (4 to 44)	⊕⊕⊖⊖ low ⁷
5. ICG alone	953 women (9 studies)	92.4% (88.7 to 96.2)	92.5% (81.8% to 97.1%)	76 (38 to 113)	171 (151 to 179)	14 (6 to 34)	⊕⊕⊕⊖ moderate ⁶
6. ICG and blue dye	215 women (2 studies)	96.7% (92.7 to 100)	90.5% (63.2% to 98.1%)	33 (0 to 73)	175 (122 to 190)	18 (4 to 71)	⊕⊕⊖⊖ low ⁷
7. ICG and Technetium‐99m	32 women (1 study)	100%	100% (63% to 100%)	0	200 (126 to 200)	0 (0 to 74)	⊕⊖⊖⊖ very low ⁸
* Prevalence of positive LN rate 20% chosen to represent those with higher risk of LN metastasis (as per Creasman 1987). A false‐positive result cannot occur, as the histological examination of the SLN is unchanged by the results from any additional nodes removed at systematic lymphadenectomy. Abbreviations SLN: sentinel lymph node SLNB: sentinel lymph node biopsy LN: lymph node ICG: indocyanine green dye (visualised with near infra‐red fluorescence) TP: true positive FN: false negative
GRADE certainty of the evidence High: we are very confident that the true effect lies close to that of the estimate of the effect. Moderate: we are moderately confident in the effect estimate: the true effect is likely to be close to the estimate of the effect, but there is a possibility that it is substantially different. Low: our confidence in the effect estimate is limited: the true effect may be substantially different from the estimate of the effect. Very low: we have very little confidence in the effect estimate: the true effect is likely to be substantially different from the estimate of effect.
¹ Calculated as the arithmetic mean of the total number of SLNs detected out of the total number of women included in the included studies with the woman as the unit of analysis. ² Calculated as the pooled estimate of sensitivity, using a univariate random‐effects logistic regression model (Takwoingi 2017) with the woman as the unit of analysis. ³ Calculated by subtracting the number of women per 1000 with an SLN detected (the mean SLN detection rate estimate multiplied by 1000) from 1000. ⁴ Calculated by subtracting the number of women with no SLN detected (i.e., with a failed test) from 1000 and then multiplying that with the prevalence and the pooled sensitivity estimate. ⁵ Calculated by subtracting the number of women with no SLN detected (i.e., with a failed test) from 1000 and then multiplying that with the prevalence and the false negative rate estimates. The false negative rate estimates were calculated by subtracting the sensitivity estimates from 100. ⁶ Downgraded by 1 level for risk of bias (for a combination of unclear patient selection and unclear risk of publication bias). ⁷ Downgraded by 2 levels: 1 level for risk of bias (a combination of unclear patient selection and unclear risk of publication bias) and 1 level for imprecision (wide confidence intervals). ⁸ Downgraded by 3 levels: 1 level for risk of bias (unclear patient selection and unclear risk of publication bias) and 2 levels for imprecision (1 small study with few true positive nodes with wide confidence intervals).
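
The arithmetic described in footnotes ³ to ⁵ can be sketched as follows. This is a minimal illustration of the point estimates only; the function name `consequences_per_1000` and its parameter names are ours, and the confidence limits in the table are not recomputed.

```python
def consequences_per_1000(detection_rate, sensitivity, prevalence, cohort=1000):
    """Return (failed tests, true positives, false negatives), rounded to whole women.

    detection_rate, sensitivity and prevalence are proportions between 0 and 1.
    """
    failed = cohort * (1 - detection_rate)                # footnote 3: no SLN detected
    tested = cohort - failed                              # women with an SLN detected
    true_pos = tested * prevalence * sensitivity          # footnote 4
    false_neg = tested * prevalence * (1 - sensitivity)   # footnote 5
    return round(failed), round(true_pos), round(false_neg)


# Row 1 (all tracers): mean detection rate 86.9%, pooled sensitivity 91.8%.
high_risk = consequences_per_1000(0.869, 0.918, prevalence=0.20)
low_risk = consequences_per_1000(0.869, 0.918, prevalence=0.05)
print(high_risk)  # (131, 160, 14)
print(low_risk)   # (131, 40, 4)
```

The same function applied with a prevalence of 5% gives the corresponding point estimates in summary of findings Table 2, and the false negative counts of 14 and 4 per 1000 correspond to the 1.4% and 0.4% of women with missed metastatic nodes quoted in the Discussion.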

Summary of findings 2. Summary of findings table (prevalence of positive LN rate 5%*)

Index test	Number of patients (number of studies)	Mean SLN detection rate (95% CI)¹	Pooled sensitivity results per woman (95% CI)²	Consequences in a cohort of 1000 women undergoing SLNB, assuming the prevalence of LN metastases to be 5%			Certainty of evidence
Review question: what is the diagnostic accuracy of different tracers for the detection of sentinel lymph nodes? Patients/population: female adults with presumed early‐stage endometrial (womb) cancer Role: prognostic information for guiding adjuvant therapy after surgery Index tests: sentinel lymph node biopsy (SLNB) after injection of tracer substance/s into the cervix or uterine muscle Threshold for index tests: detection of micrometastases following ultrastaging (ultrasection of sentinel LN and immunohistochemistry). Presence of individual tumour cells (ITCs) excluded as positive result where possible. Reference standards: full pelvic +/‐ para‐aortic lymphadenectomy and standard histological examination Studies: cross‐sectional and cohort studies Setting: secondary/tertiary inpatient care at time of surgery for endometrial cancer
				Women with no SLN detected (i.e., failed test; 95% CI)³	Women with metastatic nodes diagnosed by index test (TP; 95% CI)⁴	Women with metastatic nodes missed by index test (FN; 95% CI)⁵
1. All tracers	2237 (33 studies)	86.9% (82.9% to 90.8%)	91.8% (86.5% to 95.1%)	131 (92 to 171)	40 (38 to 41)	4 (2 to 6)	⊕⊕⊕⊖ moderate ⁶
2. Blue dye alone	559 women (11 studies)	77.8% (70.0% to 85.6%)	95.2% (77.2% to 99.2%)	222 (144 to 300)	37 (30 to 39)	2 (0 to 9)	⊕⊕⊖⊖ low ⁷
3. Technetium‐99m alone	257 women (4 studies)	80.9% (63.9 to 97.9)	90.5% (67.7% to 97.7%)	191 (21 to 361)	37 (27 to 40)	4 (1 to 13)	⊕⊕⊖⊖ low ⁷
4. Technetium‐99m and blue dye	548 women (12 studies)	86.3% (80.7 to 91.9)	91.9% (74.4% to 97.8%)	137 (81 to 193)	40 (32 to 42)	3 (1 to 11)	⊕⊕⊖⊖ low ⁷
5. ICG alone	953 women (9 studies)	92.4% (88.7 to 96.2)	92.5% (81.8% to 97.1%)	76 (38 to 113)	43 (38 to 45)	3 (1 to 8)	⊕⊕⊕⊖ moderate ⁶
6. ICG and blue dye	215 women (2 studies)	96.7% (92.7 to 100)	90.5% (63.2% to 98.1%)	33 (0 to 73)	44 (31 to 47)	5 (1 to 18)	⊕⊕⊖⊖ low ⁷
7. ICG and Technetium‐99m	32 women (1 study)	100%	100% (63% to 100%)	0	50 (32 to 50)	0 (0 to 19)	⊕⊖⊖⊖ very low ⁸
* Prevalence of positive LN rate 5% chosen to represent those with lower risk of LN metastasis (as per Creasman 1987). A false‐positive result cannot occur, as the histological examination of the SLN is unchanged by the results from any additional nodes removed at systematic lymphadenectomy. Abbreviations SLN: sentinel lymph node SLNB: sentinel lymph node biopsy LN: lymph node ICG: indocyanine green dye (visualised with near infra‐red fluorescence) TP: true positive FN: false negative
GRADE certainty of the evidence High: we are very confident that the true effect lies close to that of the estimate of the effect. Moderate: we are moderately confident in the effect estimate: the true effect is likely to be close to the estimate of the effect, but there is a possibility that it is substantially different. Low: our confidence in the effect estimate is limited: the true effect may be substantially different from the estimate of the effect. Very low: we have very little confidence in the effect estimate: the true effect is likely to be substantially different from the estimate of effect.
¹ Calculated as the arithmetic mean of the total number of SLNs detected out of the total number of women in the included studies, with the woman as the unit of analysis.
² Calculated as the pooled estimate of sensitivity, using a univariate random‐effects logistic regression model (Takwoingi 2017) with the woman as the unit of analysis.
³ Calculated by subtracting from 1000 the number of women per 1000 in whom an SLN was detected (the mean SLN detection rate applied to a cohort of 1000 women).
⁴ Calculated by subtracting the number of women with no SLN detected (i.e., with a failed test) from 1000, then multiplying the result by the prevalence and by the pooled sensitivity estimate.
⁵ Calculated by subtracting the number of women with no SLN detected (i.e., with a failed test) from 1000, then multiplying the result by the prevalence and by the false negative rate estimate. The false negative rate estimates were calculated by subtracting the sensitivity estimates from 100%.
⁶ Downgraded by 1 level for risk of bias (for a combination of unclear patient selection and unclear risk of publication bias).
⁷ Downgraded by 2 levels: 1 level for risk of bias (a combination of unclear patient selection and unclear risk of publication bias) and 1 level for imprecision (wide confidence intervals).
⁸ Downgraded by 3 levels: 1 level for risk of bias (unclear patient selection and unclear risk of publication bias) and 2 levels for imprecision (1 small study with few true positive nodes with wide confidence intervals).
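The arithmetic described in footnotes ³ to ⁵ can be illustrated with a minimal sketch. The function name `per_1000` and the rounding to the nearest whole woman are assumptions made for illustration; they are not taken from the review, and the confidence limits in the table are not reproduced here.

```python
def per_1000(detection_pct, sensitivity_pct, prevalence_pct, cohort=1000):
    """Return (failed tests, true positives, false negatives) per cohort.

    Hypothetical helper: follows footnotes 3 to 5, rounding each count
    to the nearest whole woman (an assumption, not stated in the review).
    """
    # Women in whom at least one SLN was detected
    detected = cohort * detection_pct / 100
    # Footnote 3: women with no SLN detected (failed test)
    failed = cohort - detected
    # Women with metastatic nodes among those with a successful test
    with_metastases = detected * prevalence_pct / 100
    # Footnote 4: metastases diagnosed by the index test
    true_pos = with_metastases * sensitivity_pct / 100
    # Footnote 5: false negative rate is 100 minus sensitivity
    false_neg = with_metastases * (100 - sensitivity_pct) / 100
    return round(failed), round(true_pos), round(false_neg)


# Row 1 (all tracers): detection 86.9%, sensitivity 91.8%, prevalence 5%
print(per_1000(86.9, 91.8, 5))
# Row 5 (ICG alone): detection 92.4%, sensitivity 92.5%, prevalence 5%
print(per_1000(92.4, 92.5, 5))
```

Calling the same function with a prevalence of 20 instead of 5 applies the identical steps to the higher-prevalence scenario.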

Summary of findings 2. Summary of findings table (prevalence of positive LN rate 5%*)

Table 1. FIGO staging for endometrial cancer

Stage I:	Cancer confined to the uterus (womb), which has not spread to other parts of the body
Stage IA:	Cancer confined to the endometrium, or less than one‐half of the myometrium
Stage IB:	Cancer spread to the outer half of the myometrium
Stage II:	Cancer spread from the uterus to the cervical stroma, but not to other parts of the body
Stage III:	Cancer spread beyond the uterus, but it is still only in the pelvic area
Stage IIIA:	Cancer spread to serosa of the uterus, fallopian tubes and ovaries, or a combination, but not to other parts of the body
Stage IIIB:	Cancer spread to the vagina or parametria
Stage IIIC1:	Cancer spread to the regional pelvic lymph nodes
Stage IIIC2:	Cancer spread to the para‐aortic lymph nodes with or without spread to the regional pelvic lymph nodes
Stage IV:	Direct invasion into adjacent organs or distant spread
Stage IVA:	Cancer spread to the mucosa of the rectum or bladder
Stage IVB:	Cancer spread to lymph nodes in the groin area, or it has spread to distant organs, such as the bones or lungs
Pecorelli 2009


Table 2. QUADAS‐2 assessment criteria

Item	Description
Domain 1: Patient selection
A. Risk of bias
Was a consecutive or random sample of participants enrolled?	Yes, if a consecutive or random sample of participants was enrolled No, if a consecutive or random sample of participants was not enrolled Unclear, if the study did not describe the method of participant enrolment
Was a case‐control design avoided?	The answer to this item will be 'yes' for all the included studies, as one of the exclusion criteria is case‐control studies
Did the study avoid inappropriate exclusions?	Yes, if the characteristics of the participants were well described and probably typical of a secondary healthcare setting No, if the sample was unrepresentative of people with apparent early‐stage endometrial cancer (i.e. an unexpectedly high proportion with rarer high grade histopathological types of endometrial cancer) Unclear, if the source or characteristics of participants were not adequately described
Could the selection of participants have introduced bias?	A judgement of low, high, or unclear risk of bias will be made, based on a balanced assessment of the responses to the above signalling questions.
B. Concerns about applicability
Are there concerns that the included participants and setting do not match the review question?	A judgement of low, high, or unclear concerns about applicability will be made based on a balanced assessment of the response to the third signalling question above, and on how closely the sample matched the target population of interest
Domain 2: Index test
A. Risk of bias
Were the index test results interpreted without knowledge of the results of the reference standard?	Yes, if the report stated that the person undertaking the index test did not know the results of the reference tests No, if the report stated that the same person performed both tests, or that the results of the index tests were known to the person undertaking the reference tests Unclear, if insufficient information was provided
Did the study provide a clear definition of what was considered to be a positive result?	Yes, if the definition of a diagnosis of lymph node metastasis was clearly stated, including ultrastaging of the sentinel node No, if no definition of what was considered a positive result of lymph node metastasis was stated, or the definition of a positive result varied between the participants Unclear, if not enough information was given to permit judgement
If a threshold was used, was it prespecified?	This item is not applicable, as the test is not subject to a threshold.
Could the conduct or interpretation of the index test have introduced bias?	A judgement of low, high, or unclear risk of bias will be made based on a balanced assessment of the responses to the above signalling questions.
B. Concerns about applicability
Are there concerns that the index test, its conduct, or interpretation differ from the review question?	A judgement of low, high, or unclear concerns about applicability will be made, based on a balanced assessment of the information detailed under 'Index test' in the 'Characteristics of included studies' tables.
Domain 3: Reference standard
A. Risk of bias
Is the reference standard likely to correctly classify the target condition?	Yes, if a full pelvic or aortic node dissection (or both) was carried out adequately to correctly classify the target condition. A pelvic node dissection includes bilateral removal of nodal tissue from the distal one‐half of each common iliac artery, the anterior and medial aspect of the proximal half of the external iliac artery and vein, and the distal half of the obturator fat pad anterior to the obturator nerve. A para‐aortic node dissection includes resection of nodal tissue over the distal vena cava, from the level of the inferior mesenteric artery (or left renal vein) to the mid‐right common iliac artery, and between the aorta and the left ureter, from the inferior mesenteric artery to the left mid‐common iliac artery. No, if a full pelvic or aortic node dissection (or both) has not been carried out adequately to correctly classify the target condition. Unclear, if insufficient information was provided to assess whether the reference standard had been carried out adequately (e.g. no mention of site or number of lymph node yield).
Were the reference standard results interpreted without knowledge of the results of the index tests?	Yes, if the report stated that for SLNB, the histological examination from the lymphadenectomy was done without knowledge of the sentinel lymph node status. No, if the report stated that histological examination of the pelvic lymphadenectomy was carried out with the knowledge of the sentinel lymph node status. Unclear, if insufficient details were given as to whether the histological examination was carried out with or without knowledge of sentinel lymph node status.
Could the reference standard, its conduct, or its interpretation have introduced bias?	A judgement of low, high, or unclear risk of bias will be made, based on a balanced assessment of the responses to the above signalling questions.
B. Concerns about applicability
Are there concerns that the target condition, as defined by the reference standard, does not match the question?	The answer to this question will always be low, because the target condition that the reference standard defines will always be the target condition of the review, i.e. adenocarcinoma of the endometrium of endometrioid, serous, clear cell, or mixed types, with apparent cancer confined to the uterus. Otherwise, the study will not be included.
Domain 4: Flow and timing
A. Risk of bias
Was there an appropriate interval between index test and reference standard?	Yes, if the time period between the index test and the reference standard was no more than one month No, if the time period between the index test and the reference standard was longer than one month Unclear, if insufficient information was provided
Did all participants receive the same reference standard?	Yes, if the same reference test was used, regardless of the index test results No, if different reference tests were used, depending on the results of the index test Unclear, if insufficient information was provided Report if any participants received a different reference test, what the reasons stated for this were, and how many participants were involved.
Were all participants included in the analysis?	Yes, if there were no participants excluded from the analysis, or if exclusions were adequately described No, if there were participants excluded from the analysis and there was no explanation given Unclear, if not enough information was given to assess whether any participants were excluded from the analysis Report how many participants were excluded from the analysis for reasons other than uninterpretable results Report how many results were uninterpretable (of the total)
Could the patient flow have introduced bias?	A judgement of low, high, or unclear risk of bias will be made, based on a balanced assessment of the responses to the above signalling questions.


Table 3. Sentinel lymph node detection rate by tracer

Tracer (studies)	Number detected/total number (%; 95% CI)
Blue dye alone (11)	435/559 (77.8; 95% CI 70.0 to 85.6)
Technetium‐99m alone (4)	208/257 (80.9; 95% CI 63.9 to 97.9)
Technetium‐99m and blue dye (12)	473/548 (86.3; 95% CI 80.7 to 91.9)
ICG alone (9)	881/953 (92.4; 95% CI 88.7 to 96.2)
ICG and blue dye (2)	208/215 (96.7; 95% CI 92.7 to 100)
ICG and technetium‐99m (1)	32/32 (100)
CI: confidence interval; ICG: indocyanine green


Table 4. Heterogeneity analyses

Covariate	Studies (women)	Sensitivity % (95% CI)	Statistical significance
FIGO stage
Majority of women with stage IA	16 (1252)	91.1 (80.7 to 96.2)	Equal variances assumed: Chi² (1) = 0.13; P = 0.72 Separate variances assumed: Chi² (2) = 1.21; P = 0.55
Majority of women with stage IB or above	3 (95)	91.3 (71.1 to 97.8)
Sentinel lymph node identification method
Blue dye alone	11 (435)	95.2 (77.2 to 99.2)	6 levels (all tracers and tracer combinations): Equal variances assumed: Chi² (5) = 2.29; P = 0.81 3 levels (blue dye alone, ICG alone, technetium‐99m and blue dye): Equal variances assumed: Chi² (2) = 0.22; P = 0.90 Separate variances assumed: Chi² (4) = ‐0.55; P = 1
ICG alone	9 (881)	92.5 (81.8 to 97.1)
Technetium‐99m alone	4 (208)	90.5 (67.7 to 97.7)
ICG and blue dye	2 (208)	90.5 (63.2 to 98.1)
Technetium‐99m and blue dye	12 (473)	91.9 (74.4 to 97.8)
Technetium‐99m and ICG	1 (32)	100
Injection site
Subserosal	11 (445)	90.4 (82.6 to 94.9)	3 levels (subserosal, cervical and combined/mixed): Equal variances assumed: Chi² (2) = 1.85; P = 0.4 2 levels (subserosal, cervical): Equal variances assumed: Chi² (1) = 0.07; P = 0.79 Separate variances assumed: Chi² (2) = 4.39; P = 0.11
Cervical	23 (1746)	91.7 (84.0 to 95.9)
Combined subserosal and cervical	1 (10)	100
Cervical with/without subserosal	1 (36)	100
Lymph node basin
Pelvic	23 (1405)	90.5 (81.9 to 95.3)	Equal variances assumed: Chi² (1) = 1.48; P = 0.22 Separate variances assumed: Chi² (2) = 4.4; P = 0.11
Pelvic and para‐aortic	10 (779)	94.0 (88.8 to 96.8)
CI: confidence interval; ICG: indocyanine green


Table Tests. Data tables by test

Test	No. of studies	No. of participants
1 Blue dye alone per patient (all injection sites)	11	435
2 Blue dye alone (cervical injection)	7	302
3 Blue dye alone (subserosal injection)	5	133
4 Technetium‐99m alone per patient (all injection sites)	4	208
5 Technetium‐99m alone per patient (cervical injection)	2	48
6 Technetium‐99m alone per patient (subserosal injection)	4	160
7 Technetium‐99m and blue dye per patient	12	473
8 ICG alone per patient	9	881
9 ICG and blue dye per patient	2	208
10 ICG and Technetium‐99m per patient	1	32
11 Nanoparticles per patient	0	0
18 SLNB ‐ all tracers per patient	33	2237
