
Individualised gonadotropin dose selection using markers of ovarian reserve for women undergoing IVF/ICSI

This version is not the most recent.


Abstract

This is a protocol for a Cochrane Review (Intervention). The objectives are as follows:

This review has two objectives, both concerned with assessing the comparative effectiveness (pregnancy and live birth) and safety (ovarian hyperstimulation syndrome), in women undergoing in vitro fertilisation/intracytoplasmic sperm injection (IVF/ICSI):

  1. of different doses of gonadotropin in women subgrouped by their expected response to stimulation, as defined by at least one ovarian reserve test (ORT) measure (do women with low, moderate or high anticipated response to ovarian stimulation, based on an ORT, benefit from a modified gonadotropin dose?); and

  2. of individualisation of gonadotropin dose using ORT, as compared to dose selection without ORT, or to an alternative individualised dosing algorithm using ORT (does using ORT to individualise gonadotropin dose improve IVF/ICSI outcomes, and is there evidence to suggest one algorithm is better than another?).

Background

Description of the condition

As many as 15% of couples experience difficulty getting pregnant and are defined as subfertile (Thoma 2013). Treatments are available to help these couples conceive, such as intrauterine insemination (IUI), ovulation induction, in vitro fertilisation (IVF) and intracytoplasmic sperm injection (ICSI). IVF with or without ICSI (together referred to as IVF/ICSI) is the leading treatment for most causes of infertility; however, the success rate remains modest, at approximately 30% per cycle started (Gunby 2010; Macaldowie 2013).

During an IVF/ICSI cycle, daily doses of the gonadotropin follicle‐stimulating hormone (FSH) are used to induce multifollicular development. Generally, the dose of gonadotropin used is associated with the number of eggs retrieved; however, the response of individual women is variable (Andersen 2006; Sunkara 2011). A poor ovarian response has been classified as the retrieval of three or fewer oocytes in gonadotropin‐releasing hormone (GnRH) agonist co‐treated cycles (Ferraretti 2011); it often results in cycle cancellation and a generally poor outcome, with consequent stress and disappointment for the couple. The prevalence of poor response increases with age: approximately 10% to 15% of women aged 35 to 40 experience a poor response (Ferraretti 2011). Conversely, a hyper‐response is often defined as the retrieval of 15 to 20 or more oocytes and is associated with an exponential increase in the risk of ovarian hyperstimulation syndrome (OHSS) (Steward 2014; Youssef 2016). The incidence of OHSS is difficult to determine, as there is no strict consensus definition (ASRM 2016). Historically, mild and moderate forms of OHSS were reasonably common, occurring in approximately 0% to 30% and 3% to 6% of cycles, respectively (Delvigne 2002). Severe OHSS is much less common, but has the potential to cause thromboembolic phenomena, multiple organ failure and death (Delvigne 2002). More recent estimates of the incidence of moderate OHSS range from 0.6% to 5% per IVF/ICSI cycle (ASRM 2016; Calhaz‐Jorge 2016; Kawwass 2015). Estimates of the rate of hospitalisation due to severe OHSS range from less than 0.01% to 0.3% of cycles (ESHRE 2014; Harris 2016). This rate increases with the number of oocytes retrieved, reaching 4% when more than 20 oocytes are retrieved (Harris 2016).
The prevalence of OHSS has fallen as clinicians have become better able to predict it, and with the availability of newer strategies such as GnRH agonist triggering, cycle cancellation and freeze‐all (Humaidan 2016; Mourad 2017).

The aim of most IVF cycles is to produce an embryo that leads to the live birth of a baby. Obtaining a number of high‐quality oocytes is thought to be an important step in this process. The number of eggs retrieved is associated with the probability of pregnancy, with the optimal number reported to be approximately 15 (Sunkara 2011). A yield of fewer or more than 15 eggs is associated with a reduced probability of pregnancy; the reduction at higher yields is thought to reflect a deleterious effect of increased estradiol levels on the probability of implantation. The number of retrieved oocytes is believed to depend on many patient and treatment factors; however, two of the most important variables are the dose of gonadotropin administered and the size of the pool of recruitable follicles. Up to a certain limit, increasing the gonadotropin dose may increase the number of growing follicles and the resulting oocyte yield. Consequently, a very low dose of gonadotropin may increase the risk of a poor response; similarly, a very high dose may lead to a high risk of hyper‐response (in women with adequate ovarian reserve).

Description of the intervention

In order to achieve an optimal ovarian response, and avoid either a poor or a hyper‐response, the dose of gonadotropin may be tailored to subgroups of women with similar characteristics (Antoniou 2016). Although there is a clear relationship between declining fertility and female age, this relationship is highly variable between women. Therefore, a number of demographic and clinical factors can also be considered, as well as measures of functional ovarian reserve, for which tests have been developed. The oldest ovarian reserve test (ORT) is basal FSH (bFSH), measured in serum in the early follicular phase of the menstrual cycle. This was later supplemented by the antral follicle count (AFC), and more recently by anti‐Müllerian hormone (AMH). AFC is measured by ultrasound and is a count of the antral follicles in both ovaries using standardised criteria, normally follicles of 2 to 10 mm in diameter (Broekmans 2010). It provides an indication of the number of follicles available for stimulation in an IVF cycle. AMH is a protein expressed and secreted by the granulosa cells of the ovary and reflects the size of the antral and pre‐antral follicle pool (Visser 2006). AMH can be measured in serum and is a more direct and independent measure of the growing pre‐antral and antral follicular pool.

How the intervention might work

Increased doses of gonadotropin are observed to lead to an increased number of follicles. This raises the possibility of giving each woman a dose of gonadotropin tailored to her ovarian reserve. Individualisation of the gonadotropin dose requires two components. First, there must be a tool that can predict a woman's response to a particular dose of gonadotropin. Second, there must be a dose‐response relationship between gonadotropin and ovarian response, so that the response can be manipulated by adjusting the dose administered.

In relation to the first point, diagnostic test studies have reported that measures of ovarian reserve can be used to predict ovarian response to stimulation, with AMH and AFC being superior to bFSH (Broekmans 2006; Broer 2013a; Broer 2013b). An individual patient data meta‐analysis found that, for predicting excessive response, the performance of AMH and AFC was high and similar (areas under the receiver operating characteristic curve (AUC) of 0.81 and 0.79, respectively) (Broer 2013a). However, bFSH had lower predictive value (AUC 0.66). Predictive performance was improved by combining AMH and AFC (AUC 0.85). A second individual patient data meta‐analysis showed that AFC and AMH as single tests both had high predictive value for poor response (AUC 0.78 and 0.76, respectively), and that combining these two tests did not substantially improve prediction (AUC 0.80, P = 0.19) (Broer 2013b).
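To make the AUC values above concrete: an AUC can be read as the probability that a randomly chosen woman who had the outcome (e.g. an excessive response) has a more extreme marker value than a randomly chosen woman who did not, with ties counted as one half. The sketch below computes this empirical AUC; the marker values are hypothetical and are not taken from Broer 2013a or Broer 2013b. For a marker where low values predict the outcome (e.g. AMH for poor response), the values would be negated before calling the function.

```python
from typing import Sequence


def empirical_auc(with_outcome: Sequence[float], without_outcome: Sequence[float]) -> float:
    """Empirical AUC (Mann-Whitney form): the proportion of all
    (with-outcome, without-outcome) pairs in which the woman with the outcome
    has the higher marker value, counting ties as one half."""
    if not with_outcome or not without_outcome:
        raise ValueError("need at least one woman in each group")
    concordant = 0.0
    for x in with_outcome:
        for y in without_outcome:
            if x > y:
                concordant += 1.0
            elif x == y:
                concordant += 0.5
    return concordant / (len(with_outcome) * len(without_outcome))


# Hypothetical AMH values in women with and without an excessive response
print(empirical_auc([30.0, 25.0, 12.0], [10.0, 15.0, 8.0, 20.0]))  # 10 of 12 pairs concordant, about 0.83
```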

In relation to the second point, a recent study indicated that increasing the dose of gonadotropin increases oocyte yield (Arce 2014): on average, women administered higher gonadotropin doses produce more follicles than those administered lower doses. However, it has also been suggested that the capacity to manipulate a woman's ovarian response may largely depend on her ovarian reserve. In particular, if a woman has relatively few antral follicles (and is consequently predicted to have a low ovarian response), it may not be possible to compensate for this by increasing the gonadotropin dose (Klinkert 2005; Lekamge 2008). There is likely to be more scope for tailoring the response in women with a larger pool of antral follicles. It is important to remember that the relationship between the stimulation response and the probability of pregnancy is poorly understood; surrogate outcomes such as the number of eggs retrieved may therefore be insufficient measures compared with pregnancy or live birth (Vail 2003). Indeed, the individual patient data analysis mentioned above (Broer 2013b) found that ORTs did not improve predictions of ongoing pregnancy following IVF compared with using age alone.

Why it is important to do this review

IVF/ICSI is expensive, invasive and requires extensive clinical monitoring. The process often requires a substantial financial investment from the couple, including time away from work, and is associated with a high emotional burden. If the most beneficial dose of gonadotropin can be predicted for an individual woman, making a moderate response more likely, individualised gonadotropin dosing has the potential to increase the probability of pregnancy and live birth while reducing the chances of a cycle being cancelled for poor or hyper‐response, of OHSS, of initiating cycles with a low probability of success, and of women and couples withdrawing from treatment because of emotional stress.

Objectives

This review has two objectives, both concerned with assessing the comparative effectiveness (pregnancy and live birth) and safety (ovarian hyperstimulation syndrome), in women undergoing in vitro fertilisation/intracytoplasmic sperm injection (IVF/ICSI):

  1. of different doses of gonadotropin in women subgrouped by their expected response to stimulation, as defined by at least one ovarian reserve test (ORT) measure (do women with low, moderate or high anticipated response to ovarian stimulation, based on an ORT, benefit from a modified gonadotropin dose?); and

  2. of individualisation of gonadotropin dose using ORT, as compared to dose selection without ORT, or to an alternative individualised dosing algorithm using ORT (does using ORT to individualise gonadotropin dose improve IVF/ICSI outcomes, and is there evidence to suggest one algorithm is better than another?).

Methods

Criteria for considering studies for this review

Types of studies

Published and unpublished randomised controlled trials (RCTs) will be eligible for inclusion. We will exclude non‐randomised and quasi‐randomised studies (e.g. studies with evidence of inadequate sequence generation, such as allocation by alternate days or patient numbers), as they are associated with a high risk of bias. Data from ongoing studies will not be used; such studies will be referenced as ongoing and incorporated (once complete) in future updates of the review. Cross‐over trials are eligible, but only data from the first phase will be used in meta‐analyses, as the cross‐over design is not valid in the context of fertility trials.

Several trial designs may be employed in relation to the broad goal of investigating aspects of individualised ovarian stimulation dose, each with its own advantages and disadvantages (Tajik 2013). Broadly, the two types of design included in this review are as follows.

Type 1. Randomise women within a given ORT range to one of several doses of gonadotropin (Objective 1).

Type 2. Randomise women to have a dose selected either according to their ORT value using an algorithm, or without ORT, or using an alternative algorithm (Objective 2).

The first type of design allocates women of a given profile to one of two (or more) doses of gonadotropin, so that the responses of similar women under each of the doses can be observed and compared. An example is a trial of women with low AMH who are randomised to two different doses of gonadotropin. This type of design is useful for purposes such as establishing whether there is a dose‐response relationship between gonadotropin and outcome in subgroups of women, or identifying the optimal gonadotropin dose for women with a given set of predictive characteristics (for example Arce 2014). This design can show whether certain groups of women would benefit from a modified gonadotropin dose (Tajik 2013).

The second type of design randomises women either to an individualised dose of gonadotropin based on an algorithm using their individual ORT, or to a gonadotropin dose selected without using ORT. In this design, all women in the control arm receive the same dose of gonadotropin, and women in the intervention arm receive different doses of gonadotropin according to their individual characteristics, such as AMH level. A variant of this design randomises women to one of two (or more) individualised dose‐selection algorithms or policies (for example Popovic‐Todorovic 2003). The purpose of this type of design is to establish the effectiveness of an individualised dose‐selection algorithm, which includes both allocation to a particular dose according to predictive characteristics and the administration of that dose of gonadotropin, compared with either a fixed dose or an alternative dose‐selection policy.

Studies of both design types will be included in this review in separate comparisons. Some trials are not explicitly presented as falling into one of the above types of design but can nonetheless be interpreted and analysed in such a way that they are equivalent. Where trials can be analysed in such a way, they will be eligible for this review.

Types of participants

Type 1 studies

For these studies to be eligible, the study population must be women undergoing IVF/ICSI, categorised as either predicted poor, moderate or hyper‐responders based on at least one ORT (AMH, bFSH or AFC) (or providing data that enable categorisation by review authors). Studies including unselected populations will not be eligible, unless data from eligible subgroups within the studies can be obtained.

Type 2 studies

Studies of this type may include an unselected population.

Studies of women undergoing IVF/ICSI will be eligible for inclusion.

Studies in women who do not plan to undergo embryo transfer, for example women planning oocyte donation or fertility preservation, or who are receiving donated oocytes, will be excluded. Studies solely including women with polycystic ovary syndrome (PCOS) will be excluded, as these women present a distinct clinical entity and are likely to warrant unique individualised dosing algorithms. Studies including some women with PCOS will be included; however, we will attempt to obtain data excluding women with PCOS. There are no exclusion criteria related to age, cause of infertility or previous IVF/ICSI exposure.

Types of interventions

Included interventions

Studies comparing individualised ovarian stimulation doses with each other (Type 1 studies) or comparing an individualised dose of gonadotropin with a non‐individualised dose of gonadotropin (Type 2 studies) will be eligible for inclusion. Individualised protocols include those where the decision of which gonadotropin dose to use is based, at least in part, on the individual's ORT measure (e.g. AMH, AFC, bFSH). Protocols where the dose is selected on the basis of combinations of characteristics will also be included. Studies comparing doses of human menopausal gonadotropin (HMG), which contains both FSH and luteinising hormone, will also be included.

For studies in which women are randomised either to an individualised dosing regimen or to dose selection without ORT or with a competing algorithm (Type 2 in Types of studies), studies comparing different preparations and brands will be eligible, provided that the same dose‐selection algorithm is not used in all study arms. This reflects the more pragmatic nature of the questions being answered by these designs. Studies that allow dose adjustment after a certain number of days of administration of the randomised dose will be included, provided that adjustment is permitted in both study arms; the effect of including these studies will be examined in a sensitivity analysis.

Excluded interventions

In relation to studies where women of the same profile may be randomised to one of several doses of gonadotropin (Type 1 in Types of studies), we will exclude studies comparing different preparations, brands, or routes of administration of otherwise identical doses.

We will exclude:

  • studies comparing HMG to pure FSH preparations;

  • studies using medications other than gonadotropins, such as clomiphene citrate or letrozole;

  • studies comparing doses of corifollitropin alfa;

  • studies comparing step‐up/step‐down protocols, or protocols where the dose of gonadotropin is amended in one arm only after stimulation has commenced, for example coasting or withholding gonadotropin for a number of days; and

  • studies comparing different stimulation regimens (for example, GnRH agonist versus GnRH antagonist).

Types of outcome measures

Primary outcomes

1. Live birth or ongoing pregnancy per woman randomised. Ongoing pregnancy is defined as evidence of a gestational sac with fetal heart motion at or after 12 weeks' gestation, confirmed with ultrasound (Harbin Consensus Workshop Group 2014). Ongoing pregnancy data will only be used when live birth data are not available. In the event that studies include multiple cycles for an individual woman, cumulative live birth will also be reported. If studies report the live birth outcome of the fresh transfer and the first frozen transfer for women with freeze‐all cycles, this outcome will also be reported separately.

2. Severe ovarian hyperstimulation syndrome (OHSS) (as defined by authors) per woman randomised.

Secondary outcomes

3. Clinical pregnancy per woman randomised, defined as evidence of an intrauterine gestational sac on ultrasound or other definitive signs of pregnancy, and including ectopic pregnancy.

4. Time to clinical pregnancy per woman randomised.

5. Multiple pregnancy per woman randomised and per clinical pregnancy.

6. Average number of oocytes retrieved per woman randomised.

7. Poor response to stimulation per woman randomised (as defined and prespecified by trial authors).

8. Moderate response to stimulation per woman randomised (as defined and prespecified by trial authors).

9. High response to stimulation per woman randomised (as defined and prespecified by trial authors).

10. Cycle cancellations for hyper‐response (including freeze‐all cycles) per woman randomised.

11. Cycle cancellations for poor response per woman randomised.

12. Cycle cancellations for poor or hyper‐response per woman randomised.

13. Women with at least one transferable embryo per woman randomised.

14. Mean total dose of gonadotropin per woman randomised.

15. Mean number of days of gonadotropin per woman randomised.

16. Cost per woman randomised.

17. Moderate or severe OHSS (as defined by study authors) per woman randomised.

Search methods for identification of studies

We will search for all published and unpublished RCTs that meet our inclusion criteria, without language or date restriction and in consultation with the Cochrane Gynaecology and Fertility Group (CGF) Information Specialist.

Electronic searches

We will search the following electronic databases, trial registers and websites.

  • The Cochrane Gynaecology and Fertility Group (CGF) Specialised Register of Controlled Trials (from inception onwards).

  • The Cochrane Central Register of Studies Online (CRSO) (from inception onwards).

  • MEDLINE (from 1946 onwards).

  • Embase (from 1980 onwards).

  • PsycINFO (from 1806 onwards).

  • CINAHL (from 1961 onwards).

The MEDLINE search will be combined with the Cochrane highly sensitive search strategy for identifying randomised trials, which appears in the Cochrane Handbook for Systematic Reviews of Interventions, section 6.4.11 (Higgins 2011). The Embase, PsycINFO and CINAHL searches will be combined with trial filters developed by the Scottish Intercollegiate Guidelines Network (SIGN) (www.sign.ac.uk/methodology/filters.html#random).

Other electronic sources of trials will be searched from their inception onwards and will include:

  • trial registers for ongoing and registered trials: www.clinicaltrials.gov (a service of the US National Institutes of Health) and www.who.int/trialsearch/Default.aspx (the World Health Organization International Clinical Trials Registry Platform search portal);

  • DARE (Database of Abstracts of Reviews of Effects) on the Cochrane Library: onlinelibrary.wiley.com/o/cochrane/cochrane_cldare_articles_fs.html (for reference lists from relevant non‐Cochrane reviews);

  • the Web of Knowledge: wokinfo.com/ (another source of trials and conference abstracts);

  • OpenGrey: www.opengrey.eu/ for unpublished literature from Europe;

  • LILACS database: regional.bvsalud.org/php/index.php?lang=en; and

  • PubMed and Google Scholar (for recent trials not yet indexed in the major databases).

The search strategies used are detailed in the Appendices.

Searching other resources

We will handsearch reference lists of articles retrieved by the search and contact experts in the field to obtain additional data. We will also handsearch relevant journals and conference abstracts that are not covered in the CGF register, in liaison with the Information Specialist.

Data collection and analysis

Selection of studies

After an initial screen of titles and abstracts retrieved by the search, we will retrieve the full texts of all potentially eligible studies. Two review authors will independently examine these full text articles for compliance with the inclusion criteria and select studies eligible for inclusion in the review. We will correspond with study investigators as required, to clarify study eligibility. Disagreements as to study eligibility will be resolved by discussion or by a third review author. We will document the selection process with a PRISMA flow chart.

Data extraction and management

Two review authors (one a methodologist and one a topic area specialist) will independently extract data from eligible studies using a data extraction form designed and pilot‐tested by the authors. Any disagreements will be resolved by discussion or by a third review author. Data extracted will include study characteristics and outcome data. Where studies have multiple publications the authors will collate multiple reports of the same study, so that each study rather than each report is the unit of interest in the review, and such studies will have a single study ID with multiple references.

We will correspond with study investigators for further data on methods, results or both, as required.

Assessment of risk of bias in included studies

Two review authors will independently assess the included studies for risk of bias using the Cochrane 'Risk of bias' assessment tool (Higgins 2011), which considers: selection (random sequence generation and allocation concealment); performance (blinding of participants and personnel); detection (blinding of outcome assessors); attrition (incomplete outcome data); reporting (selective reporting); and other bias. Disagreements will be resolved by discussion or by a third review author. We will describe all judgements fully and present the conclusions in the 'Risk of bias' table, which will be incorporated into the interpretation of the review findings by means of sensitivity analysis. Where identified studies fail to report the primary outcome of live birth but do report interim outcomes such as pregnancy, we will undertake an informal assessment of whether the interim values (e.g. pregnancy rates) are similar to those reported in studies that also report live birth.

We consider the following methods of random sequence generation adequate.

  • Referring to a random number table.

  • Using a computer random number generator.

  • Coin tossing.

  • Shuffling cards or envelopes.

  • Throwing dice.

  • Drawing of lots.

We consider the following methods of allocation concealment adequate.

  • Central allocation (including telephone, internet‐based and pharmacy‐controlled randomisation).

  • Sequentially numbered, opaque, sealed envelopes.

Blinding of participants and personnel will be considered at low risk of bias if blinding is described, for example if the doses administered are identical in appearance. There is potential for performance bias because some methods and outcomes are not strictly objective, such as the number of eggs collected and the selection of embryos for transfer. Additionally, in trials that allow dose adjustment during stimulation in participants with an observed poor or hyper‐response, there is potential for performance bias.

Blinding of outcome assessors will be considered at low risk of bias if blinding is described, for example if OHSS is diagnosed by a clinician not involved in the trial. This is relevant because outcomes such as OHSS may be subjective.

Studies with a loss to follow‐up rate of 15% or more may be considered as having a high risk of attrition bias.

Studies that have collected more outcome measures than are reported in the paper will be considered as having a high risk of reporting bias. It is often difficult to determine what outcomes were measured unless a study protocol is available; and in the absence of a protocol the study may be rated as unclear. However, if all expected outcomes are reported then the study may be rated as low risk.
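The definition of a study at low risk of bias used later in the sensitivity analyses (low risk for both sequence generation and allocation concealment, and no domain at high risk) is a simple rule, sketched below. The domain keys and the 'low'/'unclear'/'high' labels are hypothetical names chosen for illustration; they are not part of the Cochrane tool or any software.

```python
from typing import Mapping

LOW, UNCLEAR, HIGH = "low", "unclear", "high"


def low_risk_for_sensitivity(judgements: Mapping[str, str]) -> bool:
    """True if the study is at low risk for both selection-bias domains
    and is not at high risk in any domain that was assessed."""
    selection_low = (
        judgements.get("sequence_generation") == LOW
        and judgements.get("allocation_concealment") == LOW
    )
    no_high_domain = all(judgement != HIGH for judgement in judgements.values())
    return selection_low and no_high_domain


# Hypothetical judgements for one study (keys are illustrative only)
study = {
    "sequence_generation": LOW,
    "allocation_concealment": LOW,
    "blinding_participants_personnel": UNCLEAR,
    "blinding_outcome_assessment": LOW,
    "incomplete_outcome_data": LOW,
    "selective_reporting": UNCLEAR,
}
print(low_risk_for_sensitivity(study))  # True: 'unclear' judgements do not exclude the study
```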

Measures of treatment effect

For dichotomous data (e.g. live birth rates), we will use the numbers of events in the control and intervention groups of each study to calculate Mantel‐Haenszel odds ratios (ORs). For continuous data (e.g. total dose of gonadotropin), if all studies report exactly the same outcomes, we will calculate mean differences (MDs) between treatment groups. For time‐to‐event data, we will use hazard ratios (HRs) as the measure of treatment effect. We anticipate that the unit of time in this analysis will be the cycle. As described in Types of studies, we anticipate that for some trials we will have to 'create' the treatment groups of interest in this review by pooling all women who received individualised doses and comparing them with all those who received a fixed dose. We will reverse the direction of effect of individual studies, if required, to ensure consistency across trials. We will present 95% confidence intervals (CIs) for all outcomes. Where data to calculate ORs or MDs are not available, we will use the most detailed numerical data available that may facilitate similar analyses of included studies (e.g. test statistics, P values). We will compare the magnitude and direction of effect reported by studies with how they are presented in the review, taking account of legitimate differences.
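As a minimal sketch of the Mantel‐Haenszel calculation described above, the function below pools one 2×2 table per study into a single odds ratio and gives a 95% CI. The protocol does not name a variance method; this sketch assumes the Robins‐Breslow‐Greenland estimator for the variance of the log odds ratio, which is commonly used with Mantel‐Haenszel ORs. Each table is assumed to be ordered as (events, non‐events) in the intervention group followed by (events, non‐events) in the control group, and the counts in the example are hypothetical.

```python
import math
from typing import Sequence, Tuple

# One 2x2 table per study:
# (events_intervention, non_events_intervention, events_control, non_events_control)
Table = Tuple[int, int, int, int]


def mantel_haenszel_or(tables: Sequence[Table], z: float = 1.959964) -> Tuple[float, float, float]:
    """Mantel-Haenszel pooled odds ratio with a 95% CI from the
    Robins-Breslow-Greenland variance of log(OR_MH)."""
    sum_r = sum_s = sum_pr = sum_ps_qr = sum_qs = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        if n == 0:
            continue  # an empty table carries no information
        p, q = (a + d) / n, (b + c) / n
        r, s = a * d / n, b * c / n
        sum_r += r
        sum_s += s
        sum_pr += p * r
        sum_ps_qr += p * s + q * r
        sum_qs += q * s
    if sum_r == 0 or sum_s == 0:
        raise ValueError("pooled OR is 0 or undefined for these tables")
    or_mh = sum_r / sum_s
    var_log_or = (
        sum_pr / (2 * sum_r ** 2)
        + sum_ps_qr / (2 * sum_r * sum_s)
        + sum_qs / (2 * sum_s ** 2)
    )
    half_width = z * math.sqrt(var_log_or)
    return or_mh, math.exp(math.log(or_mh) - half_width), math.exp(math.log(or_mh) + half_width)


# Hypothetical live-birth counts from two trials
print(mantel_haenszel_or([(10, 20, 5, 25), (12, 18, 9, 21)]))
```

For a single table the variance reduces to the familiar 1/a + 1/b + 1/c + 1/d, which is a useful check. Analyses in which every table has a zero product (ad = 0 or bc = 0 throughout) are not handled by this sketch.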

Unit of analysis issues

The primary analysis will be per woman randomised; per pregnancy data will also be included for the outcome of multiple pregnancy, but should be interpreted with caution as this does not represent a randomised comparison. For time to pregnancy, we anticipate that the unit of time in the analysis will be the cycle. Data that do not allow valid analysis will be briefly summarised in an additional table and will not be meta‐analysed. We will count multiple live births (e.g. twins or triplets) as one live birth event. Only first‐phase data from cross‐over trials will be included. In the event that studies include multiple cycles, 'cumulative' birth events will be included in the numerator for the primary outcome 'live birth per woman randomised'.

Dealing with missing data

For all outcomes, we will carry out analyses, as far as possible, on an intention‐to‐treat basis, i.e. we will attempt to include all participants randomised to each group in the analyses, and all participants will be analysed in the group to which they were allocated, regardless of whether or not they received the allocated intervention. The denominator for each outcome in each trial will be the number randomised. In relation to the primary outcome live birth, we will assume that those who dropped out of the study did not have a successful treatment outcome. When necessary we will contact the authors of included studies to obtain missing data, if possible.
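The intention‐to‐treat rules above, together with the unit‐of‐analysis rules (multiple births counted as one live birth event, cumulative births counted per woman randomised), can be sketched as follows. The `RandomisedWoman` record and its fields are hypothetical and only illustrate how the numerator and denominator would be formed; counting a woman once even if she had live births in more than one cycle is an assumption about how 'live birth per woman randomised' would be applied.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class RandomisedWoman:
    arm: str
    # Number of babies born alive in each treatment cycle, in order;
    # None if the woman was lost to follow-up (outcome unknown).
    babies_per_cycle: Optional[List[int]] = None


def live_births_per_woman_randomised(women: List[RandomisedWoman], arm: str) -> Tuple[int, int]:
    """Return (events, denominator) for one arm on an intention-to-treat basis."""
    randomised = [w for w in women if w.arm == arm]
    events = 0
    for w in randomised:
        if w.babies_per_cycle is None:
            continue  # dropout: assumed not to have had a live birth, but stays in the denominator
        if any(babies > 0 for babies in w.babies_per_cycle):
            events += 1  # twins, triplets or births in several cycles still count once
    return events, len(randomised)


# Hypothetical data: twins in cycle 2, singletons in two cycles, one dropout, one failure
women = [
    RandomisedWoman("individualised", [0, 2]),
    RandomisedWoman("individualised", [1, 1]),
    RandomisedWoman("individualised"),
    RandomisedWoman("standard", [0]),
]
print(live_births_per_woman_randomised(women, "individualised"))  # (2, 3)
```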

Assessment of heterogeneity

We will consider whether the clinical and methodological characteristics of the included studies are sufficiently similar for meta‐analysis to provide a clinically meaningful summary. We will assess statistical heterogeneity using the I² statistic; an I² value greater than 50% will be taken to indicate substantial heterogeneity (Higgins 2003).
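I² is derived from Cochran's Q as I² = (Q − df)/Q, truncated at zero and expressed as a percentage, where df is the number of studies minus one (Higgins 2003). The sketch below computes both from study effect estimates (e.g. log ORs) and their variances using inverse‐variance weights; review software may compute Q slightly differently for Mantel‐Haenszel analyses, and the example values are hypothetical.

```python
from typing import Sequence, Tuple


def cochran_q_and_i2(estimates: Sequence[float], variances: Sequence[float]) -> Tuple[float, float]:
    """Cochran's Q and I^2 (as a percentage) for study effect estimates such
    as log odds ratios, using inverse-variance weights."""
    if len(estimates) != len(variances) or len(estimates) < 2:
        raise ValueError("need two or more studies, each with a variance")
    weights = [1.0 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, estimates)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, estimates))
    df = len(estimates) - 1
    i2 = 100.0 * max(0.0, (q - df) / q) if q > 0 else 0.0
    return q, i2


q, i2 = cochran_q_and_i2([0.0, 1.0], [0.1, 0.1])  # hypothetical log ORs and variances
print(q, i2, i2 > 50)  # the last value applies the protocol's 50% threshold
```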

Assessment of reporting biases

In view of the difficulty of detecting and correcting for publication bias and other reporting biases, we aim to minimise their potential impact by ensuring a comprehensive search for eligible studies and by being alert for duplication of data. If there are ten or more studies in an analysis, we will use a funnel plot to explore the possibility of small study effects (a tendency for estimates of the intervention effect to be more beneficial in smaller studies).

Data synthesis

We anticipate that the included studies will display considerable protocol heterogeneity. However, if the studies are sufficiently similar, we will combine the data using a fixed‐effect model in the following comparisons.

Objective 1. Randomise women with a given ORT value to one of several doses.

  • Ovarian stimulation with dose 1 versus dose 2, in predicted poor responders.

  • Ovarian stimulation with dose 1 versus dose 2, in predicted moderate responders.

  • Ovarian stimulation with dose 1 versus dose 2, in predicted high responders.

Dose 1 and dose 2 represent any or all combinations of doses in the included studies. The following cut‐offs will be used to guide the categorisation of studies where necessary.

  • AMH < 7 pmol/L, AFC < 7, bFSH > 10 IU/L: expected poor responders.

  • AMH 7 to 21 pmol/L, AFC 7 to 15: expected moderate responders.

  • AMH > 21 pmol/L, AFC > 15: expected high responders (bFSH is not considered a reliable predictor of moderate or high response).

The studies will be stratified by the ORT/s measured.
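The guiding cut‐offs above can be sketched as a small per‐marker categorisation. This assumes AMH in pmol/L and bFSH in IU/L, treats the boundary values (7 and 21 for AMH, 7 and 15 for AFC) as moderate, and leaves bFSH values of 10 or below uncategorised, since bFSH is only used to flag expected poor responders. Categories are returned per ORT rather than reconciled, because studies will be stratified by the ORT measured.

```python
from typing import Dict, Optional


def predicted_response(amh: Optional[float] = None,
                       afc: Optional[int] = None,
                       bfsh: Optional[float] = None) -> Dict[str, str]:
    """Predicted response category for each ORT supplied.
    Assumed units: AMH in pmol/L, AFC as a count, bFSH in IU/L."""
    categories: Dict[str, str] = {}
    if amh is not None:
        categories["AMH"] = "poor" if amh < 7 else "moderate" if amh <= 21 else "high"
    if afc is not None:
        categories["AFC"] = "poor" if afc < 7 else "moderate" if afc <= 15 else "high"
    # bFSH only identifies expected poor responders; values <= 10 stay uncategorised
    if bfsh is not None and bfsh > 10:
        categories["bFSH"] = "poor"
    return categories


print(predicted_response(amh=5.0, afc=10))  # {'AMH': 'poor', 'AFC': 'moderate'}: markers can disagree
```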

Objective 2. Randomise women to have a dose selected either according to their ORT value using an algorithm, or without using an ORT, or using an alternative algorithm.

  • Ovarian stimulation using a uniform dosing protocol versus ovarian stimulation using an individualised dosing protocol.

  • Ovarian stimulation using an individualised dosing protocol compared to ovarian stimulation using an alternative individualised dosing protocol.

The combinations of study arms cannot be anticipated, so each combination of protocols will be analysed separately (e.g. AMH‐based individualisation versus bFSH‐based individualisation, AMH‐based individualisation versus AFC‐based individualisation, and so on). It would not be meaningful to pool trials investigating different combinations of biomarkers in their study arms (for example, pooling a study of AMH‐based individualisation versus bFSH‐based individualisation with a study of AFC‐based individualisation versus bFSH‐based individualisation).
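Keeping each combination of protocols separate, while reversing the direction of effect where a study lists its arms the other way round, amounts to grouping studies by an ordered pair of dosing policies. The sketch below shows this with hypothetical trial names and policy labels; orienting every effect alphabetically is an arbitrary convention chosen for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

# (study_id, policy_arm_1, policy_arm_2, log OR of arm 1 versus arm 2)
Study = Tuple[str, str, str, float]


def group_by_comparison(studies: Sequence[Study]) -> Dict[Tuple[str, str], List[Tuple[str, float]]]:
    """Group studies that compare the same pair of dosing policies, orienting
    every effect as first policy versus second policy in alphabetical order."""
    groups: Dict[Tuple[str, str], List[Tuple[str, float]]] = defaultdict(list)
    for study_id, arm_1, arm_2, log_or in studies:
        first, second = sorted((arm_1, arm_2))
        # Reverse the direction of effect when the study lists the arms the other way round
        effect = log_or if arm_1 == first else -log_or
        groups[(first, second)].append((study_id, effect))
    return dict(groups)


studies = [
    ("Trial A", "AMH-based", "bFSH-based", 0.2),
    ("Trial B", "bFSH-based", "AMH-based", 0.3),
    ("Trial C", "AFC-based", "bFSH-based", 0.1),
]
for comparison, effects in group_by_comparison(studies).items():
    print(comparison, effects)  # Trial C is never pooled with Trials A and B
```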

An increase in the odds of a particular outcome, which may be beneficial (e.g. live birth) or detrimental (e.g. adverse effects), will be displayed graphically in the meta‐analyses to the right of the centre line and a decrease in the odds of an outcome to the left of the centre line.

Subgroup analysis and investigation of heterogeneity

Where data are available and substantial heterogeneity exists, we will conduct subgroup analyses to determine the separate evidence within the following subgroups for primary outcomes only.

  1. Predicted response category (e.g. high responders, moderate responders, low responders). The stratification of women into predicted response categories is already a feature of our analysis plan for Type 1 trials. However, we will consider the evidence, where available, for subgroups determined by predicted response category in Type 2 studies.

  2. Age (< 35, 35 to 40, > 40).

  3. IVF protocol type (e.g. long GnRH agonist, short GnRH agonist (or 'Flare'), antagonist).

If we detect substantial heterogeneity, we will explore possible explanations in sensitivity analyses. We will take any statistical heterogeneity into account when interpreting the results, especially if there is any variation in the direction of effect.

Sensitivity analysis

We will conduct sensitivity analyses for the primary outcomes to determine whether the conclusions are robust to arbitrary decisions made regarding the eligibility and analysis. These analyses will include consideration of whether the review conclusions would have differed if:

  1. eligibility had been restricted to studies at low risk of bias (defined as studies rated at low risk of bias for sequence generation and allocation concealment, and not rated at high risk of bias in any of the domains assessed);

  2. a random‐effects model had been adopted;

  3. ongoing pregnancy data were not combined with live birth data; or

  4. studies that allowed dose adjustment were excluded.
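For the random‐effects sensitivity analysis, the protocol does not name an estimator. The sketch below assumes the DerSimonian‐Laird method, a widely used choice, which estimates the between‐study variance τ² from Cochran's Q and then re‐weights each study by 1/(vᵢ + τ²). When τ² is zero the result equals the inverse‐variance fixed‐effect estimate. The example values are hypothetical log ORs.

```python
import math
from typing import Sequence, Tuple


def dersimonian_laird(estimates: Sequence[float], variances: Sequence[float]) -> Tuple[float, float, float]:
    """Random-effects pooled estimate (e.g. a log OR), its standard error and
    the between-study variance tau^2, using the DerSimonian-Laird estimator."""
    if len(estimates) != len(variances) or len(estimates) < 2:
        raise ValueError("need two or more studies, each with a variance")
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, estimates)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, estimates))
    c = sum_w - sum(wi ** 2 for wi in w) / sum_w
    tau2 = max(0.0, (q - (len(estimates) - 1)) / c)
    w_star = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_star, estimates)) / sum(w_star)
    return pooled, math.sqrt(1.0 / sum(w_star)), tau2


pooled, se, tau2 = dersimonian_laird([0.0, 1.0], [0.1, 0.1])  # hypothetical log ORs
print(math.exp(pooled), tau2)  # pooled OR on the natural scale, and tau^2
```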

Overall quality of the body of evidence: 'Summary of findings' table

We will prepare several 'Summary of findings' (SoF) tables using GRADEpro software and Cochrane methods (GRADEpro GDT 2014; Higgins 2011). These tables will evaluate the overall quality of the body of evidence for the main review outcomes (live birth or ongoing pregnancy, OHSS, clinical pregnancy, multiple pregnancy) in each of the main comparisons of the review, using GRADE criteria. There will be one comparison for each patient subgroup (poor responders, moderate responders, high responders). There will be a further comparison of an individualised tailoring algorithm versus no algorithm, for which we will create a SoF table, although this is likely to be based on heterogeneous interventions. We will construct another SoF table to describe the evidence for particular algorithms compared against others, although we anticipate this will have to be completed in a narrative format. GRADE criteria include study limitations (i.e. risk of bias), consistency of effect, imprecision, indirectness and publication bias. Judgements about evidence quality (high, moderate, low or very low) will be made by two review authors working independently, with disagreements resolved by discussion. Judgements will be justified, documented, and incorporated into the reporting of results for each outcome.
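Under GRADE, evidence from RCTs starts at high quality and is downgraded by one level for each serious concern (or two levels for a very serious concern) in the five criteria listed above, to a minimum of very low. The sketch below shows only this arithmetic; the domain keys are hypothetical labels, and the judgements themselves are made and justified by the review authors, not computed.

```python
from typing import Dict

LEVELS = ("very low", "low", "moderate", "high")
DOMAINS = {"risk_of_bias", "inconsistency", "indirectness", "imprecision", "publication_bias"}


def grade_certainty(downgrades: Dict[str, int]) -> str:
    """Start evidence from RCTs at 'high' and move down one level for each
    serious concern (two for a very serious concern), never below 'very low'."""
    for domain, levels in downgrades.items():
        if domain not in DOMAINS or levels not in (0, 1, 2):
            raise ValueError(f"unexpected judgement: {domain}={levels}")
    index = max(0, len(LEVELS) - 1 - sum(downgrades.values()))
    return LEVELS[index]


print(grade_certainty({"risk_of_bias": 1, "imprecision": 1}))  # low
```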

We plan to extract study data, format our comparisons in data tables and prepare a SoF table before writing the results and conclusions of our review.