Splinting for the non‐operative management of developmental dysplasia of the hip (DDH) in children under six months of age

Summary of findings 1. Dynamic splinting versus delayed or no splinting for the non‐operative management of developmental dysplasia of the hip in babies under six months of age

Outcomes	№ of babies (Studies) Follow up	Certainty of the evidence (GRADE)	Impact
Dynamic splinting versus delayed or no splinting for the non‐operative management of developmental dysplasia of the hip in babies under six months of age
Patient or population: babies under six months of age with all severities of DDH
Setting: hospital
Intervention: dynamic splinting
Comparison: delayed or no splinting
Measurement of acetabular index at 1 year Assessed with: radiographs (angle)	265 (2 RCTs)	⊕⊝⊝⊝ Very low^a,b	One study (stable hips) presented data at one year (MD 0.10, 95% CI −0.74 to 0.94), accounting for correlated observations from hips from the same baby. Another study (stable hips) reported an MD 0.20 (95% CI −1.65 to 2.05) but did not take into account hips from the same baby in the case of bilateral hip dysplasia, so the data were not combined.
Measurement of acetabular index at 2 years Assessed with: radiographs (angle)	181 (2 RCTs)	⊕⊝⊝⊝ Very low^a,b	One study (stable hips) reported an MD −1.90 (95% CI −4.76 to 0.96). Another study (stable hips) reported an MD −0.10 (95% CI −1.93 to 1.73) but did not take into account hips from the same baby in the case of bilateral hip dysplasia, so the data were not combined.
Measurement of acetabular index at 5 years Assessed with: radiographs (angle)	0 (0 RCTs)	‐	No studies reported data at this time point.
Need for operative intervention at study follow up (range 12 weeks to 1 year)	434 (4 RCTs)	⊕⊝⊝⊝ Very low^a,b	Three studies reported no surgical intervention. In a further study, two babies developed instability in the Pavlik harness group and were subsequently treated with closed reduction and spica cast. It is not explicitly stated if this was to achieve concentric reduction or address residual dysplasia.
Complications: avascular necrosis and femoral nerve palsy at study follow up (range 12 weeks to one year) Assessed with: grading systems (not stated)	390 (3 RCTs)	⊕⊝⊝⊝ Very low^a,b	One study found that "over the period of follow‐up, no complications of treatment were observed, and none of the children developed abnormal clinical findings on hip examination." One study reported no avascular necrosis in either group and another study reported no femoral nerve palsy in either group.
*The risk in the intervention group (and its 95% CI) is based on the assumed risk in the comparison group and the relative effect of the intervention (and its 95% CI).
CI: confidence interval; DDH: developmental dysplasia of the hip; MD: mean difference; RCT: randomised controlled trial
GRADE Working Group grades of evidence
High certainty: we are very confident that the true effect lies close to that of the estimate of the effect.
Moderate certainty: we are moderately confident in the effect estimate; the true effect is likely to be close to the estimate of the effect, but there is a possibility that it is substantially different.
Low certainty: our confidence in the effect estimate is limited; the true effect may be substantially different from the estimate of the effect.
Very low certainty: we have very little confidence in the effect estimate; the true effect is likely to be substantially different from the estimate of effect.
^aWe downgraded the certainty of the evidence by one level for risk of bias, as studies were at high or unclear risk of bias for selective reporting, sequence generation, allocation concealment and blinding due to limited details reported in the trial reports, and high risk of bias due to incomplete outcome data.
^bWe downgraded the certainty of the evidence by two levels for imprecision, due to the small number of included studies and babies.

Summary of findings 2. Dynamic splinting versus static splinting for the non‐operative management of developmental dysplasia of the hip in babies under six months of age

Outcomes	№ of babies (studies)	Certainty of the evidence (GRADE)	Impact
Dynamic splinting versus static splinting for the non‐operative management of developmental dysplasia of the hip in babies under six months of age
Patient or population: babies under six months of age with stable and unstable hips
Setting: hospitals
Intervention: dynamic splinting
Comparison: static splinting
Measurement of acetabular index at 1 year Assessed with: radiographs (angle)	0 (0 RCTs)	‐	No data presented and it is unclear if the outcome was measured.
Measurement of acetabular index at 2 years Assessed with: radiographs (angle)	0 (0 RCTs)	‐	No data presented and it is unclear if the outcome was measured.
Measurement of acetabular index at 5 years Assessed with: radiographs (angle)	0 (0 RCTs)	‐	No data presented and it is unclear if the outcome was measured.
Need for operative intervention	0 (0 RCTs)	‐	No data presented and it is unclear if the outcome was measured.
Complications: avascular necrosis at 4 months Assessed with: grading systems (not stated)	118 hips (1 RCT)	⊕⊝⊝⊝ Very low^a,b	One RCT reported no occurrence of avascular necrosis in either group.
*The risk in the intervention group (and its 95% CI) is based on the assumed risk in the comparison group and the relative effect of the intervention (and its 95% CI).
CI: confidence interval; DDH: developmental dysplasia of the hip; RCT: randomised controlled trial
GRADE Working Group grades of evidence
High certainty: we are very confident that the true effect lies close to that of the estimate of the effect.
Moderate certainty: we are moderately confident in the effect estimate; the true effect is likely to be close to the estimate of the effect, but there is a possibility that it is substantially different.
Low certainty: our confidence in the effect estimate is limited; the true effect may be substantially different from the estimate of the effect.
Very low certainty: we have very little confidence in the effect estimate; the true effect is likely to be substantially different from the estimate of effect.
^aWe downgraded the certainty of the evidence by one level for risk of bias, as we judged risk of bias as generally unclear in all domains except incomplete outcome data, due to limited details reported in the trial report.
^bWe downgraded the certainty of the evidence by two levels for imprecision, due to there only being one small study.

Background

Description of the condition

Developmental dysplasia of the hip (DDH) encompasses a spectrum of abnormalities of the hip in babies, which ranges from delayed physiological development of the hip (i.e. immature), through to acetabular deficiency (i.e. abnormally shallow sockets), subluxation (i.e. partial dislocations), and dislocation (i.e. complete dislocations). DDH is a common paediatric condition, with a variable incidence that appears to be based on ethnicity (Loder 2011). Within the UK, USA, and Australia, the incidence of any hip dysplasia is approximately 10 per 1000 live births, with one in 1000 hips being dislocated at birth (Storer 2006). Amongst Native Americans, however, the incidence may be more than 10 times higher, and amongst African people, it is believed to be extremely rare (Loder 2011). DDH is associated with premature osteoarthritis and is the cause of 10% of all hip replacements, and of a third of those in people under 60 years old (Furnes 2001). In the UK, abnormalities of the hip are screened for as part of the Newborn and Infant Physical Examination (NIPE) programme (UK National Screening Programme 2013). A Cochrane Review has assessed screening for DDH (Shorter 2013). DDH is more common in females, babies in the breech position in the third trimester, firstborn babies, babies of women who had oligohydramnios (not enough amniotic fluid during pregnancy), and those with a family history of the condition (Storer 2006).

The management strategy for DDH depends on the baby's age and the severity of the disease. In babies under six months of age, the usual strategy, once abnormalities are identified, is to apply an abduction splint, such as a Pavlik harness (Mubarak 2003), and monitor the disease progression with serial ultrasound scans (Cooper 2014). If this is successful, no further intervention is required. If the baby fails to respond to splinting, then they are managed with surgery to gently reduce (relocate) the hip, which may be achieved closed (i.e. without surgical incisions) or may necessitate a formal surgical approach to achieve reduction of the hip. There is no consensus on the length of time splinting should be pursued before resorting to surgical intervention, but reported treatment lengths vary from 11 weeks to 28 weeks (Tomlinson 2016).

The paediatric hip undergoes a variety of changes in normal physiological development. Indeed, evidence has suggested that some hips that are abnormal in newborns may become normal without any intervention at all (Barlow 1962; Gardiner 1990; Shipman 2006). Therefore, there is a balance between undertreating and overtreating this condition. This is especially important because therapy with splints risks femoral nerve palsy and localised damage to the blood supply, known as avascular necrosis (AVN) (Murnaghan 2010; Pollet 2010). The risk of AVN using a splint is in the region of 1% (Cashman 2002; Eidelman 2003), although some reports put it as high as 11% (Suzuki 2000). Furthermore, treating newborns in splints can cause considerable upset to new parents and can interfere with the bond between parents and their new baby (Gardner 2005). Parents are also concerned about the use of splints interfering with ‘tummy time’ (i.e. supervised time with the infant spent prone), as ‘tummy time’ is believed to improve gross motor skills (Hewitt 2020).

Decisions regarding the treatment of DDH are typically made based on the ultrasonographic appearance of the hips. The most commonly used classification system is based on a static ultrasound image (Graf 2006; Karnik 2007). Other types of ultrasound assessment are also used, such as the dynamic assessment popularised by Harcke 1984; however, these techniques are typically combined with a static ultrasound assessment.

Babies with an alpha angle above 60 degrees are considered normal, and are classified as having a Graf I hip (Graf 2006). Babies with an alpha angle from 50 to 59 degrees and under the age of three months are classified as having a Graf IIa hip (Karnik 2007); they are usually managed with ultrasound follow‐up alone to ensure resolution. Babies with a persistent alpha angle from 50 to 59 degrees and older than three months are classified as having a Graf IIb hip. In the UK, babies with Graf IIb hips who are under the age of six months are frequently managed with a splint, in conjunction with ultrasound follow‐up. Graf IIb hips constitute one of the most common reasons to use a splint in the treatment of DDH; however, debate exists as to whether treating Graf IIb hips has any bearing on the outcome, with some centres ceasing to use splints for this reason. Those with more severe dysplasia (Graf III hips) or those that are dislocated (Graf IV hips) routinely receive treatment in the form of an abduction splint, but it is unclear when this should commence, which splint is best, or the extent to which splints offer additional benefit over natural history alone (Tomlinson 2016).
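The alpha angle and age rules described above can be written as a small decision function. The following is a minimal sketch only, not a clinical tool: the function name is hypothetical, the handling of boundary values (an alpha angle of exactly 60 degrees, or an age of exactly three months) is an assumption that the text does not settle, and hips with an alpha angle below 50 degrees are deliberately left unresolved because their grading depends on more than the alpha angle.

```python
def graf_type_from_alpha(alpha_deg: float, age_months: float) -> str:
    """Map alpha angle and age to Graf I, IIa or IIb as described in the text.

    Assumptions (not stated in the review): exactly 60 degrees is treated as
    normal, and exactly three months of age is treated as IIb.
    """
    if alpha_deg >= 60:
        return "I"  # considered normal
    if alpha_deg >= 50:
        # Same angle range, split by age: under three months is IIa, otherwise IIb
        return "IIa" if age_months < 3 else "IIb"
    # More severe dysplasia cannot be graded from the alpha angle alone in this sketch
    return "more severe than II (not resolved by this sketch)"


# Hypothetical example values
print(graf_type_from_alpha(55, 2))  # IIa
print(graf_type_from_alpha(55, 4))  # IIb
```

Keeping the age split explicit makes it clear that a Graf IIa and a Graf IIb hip can share the same alpha angle and differ only in the age at which it is measured.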

Therefore, it is important to establish the best practice for the non‐surgical management of babies with DDH under six months old, and identify the extent to which the intervention with a splint alters the prognosis of disease.

Description of the intervention

A variety of splints are used to abduct and flex the hips into the desired position.

The most commonly used splint is the Pavlik harness. This splint promotes a dynamic reduction; that is, babies are free to move their legs within the range permitted by the splint. This is thought to provide a more gentle reduction than other splints that fix the legs in a predefined position, thereby potentially lowering the risk of complications. Pavlik harnesses are also readily adjustable to the size of the baby and are more convenient to store (pack flat) than fixed abduction splints.

Fixed abduction splints (e.g. Von Rosen splint) are less commonly used, owing to greater concerns about complications and their lesser convenience. These splints fix the legs of the baby in flexion and abduction using a hard plastic splint. One study reported excellent results with the Von Rosen splint but the certainty of evidence was limited (Heikkilä 1988). Other static splints include the Denis Browne bar (which splints the hips in abduction and flexion), the Rhino brace, and the Tübingen hip flexion splint (Ottobock splint).

The Frejka pillow is a further alternative, described as a non‐static splinting technique, and is widely used in Norway. It is a simple foam‐rubber pillow that is strapped to the child to flex and abduct the legs; the legs are held in abduction but are not rigidly fixed. The argument for the use of this splint is that it is easy to use and needs less specialist supervision than other splints (Hinderaker 1992), which makes it better suited to very dispersed populations (such as Norway's). However, there are concerns about high rates of complications and treatment failure.

All splints are applied by an individual with specialist knowledge of the use of these devices, typically a paediatric orthopaedic surgeon, an extended scope practitioner (physiotherapist or nurse with specialist training), or an orthotist. The splint is worn for a period of time defined by local policy, which will depend upon the appearance of the hip; typically this is between six and 16 weeks. There is considerable controversy about when to commence splinting, with evidence to suggest the majority of hip instability spontaneously resolves in the first six weeks of life (Barlow 1962; Shipman 2006). There are often planned delays in treatment, to enable spontaneous physiological resolution of abnormalities. These delays may vary based on the centre and the stage in the disease process, with some delays lasting a few weeks and some being indefinite (i.e. no further treatment required).

Throughout the period of splinting, ultrasound scans are performed at regular intervals (typically between one and three weeks, depending upon the practitioner and type of splint used) to monitor progression. At the end of treatment, some centres immediately discontinue the use of the splint, whilst other centres 'wean' the splint and often advise treatment at night‐time only for a period of time. Children are then monitored according to local policy, for a time period between three years and 16 years.

There is no national or international consensus on when to begin the use of the splint, the type of splint, duration of splinting, weaning versus complete cessation, and long‐term follow‐up.

How the intervention might work

The interventions seek to direct the femoral head (ball) into the acetabulum (socket), thereby promoting the development of the joint. In babies, both femoral head and acetabulum are malleable and will readily undergo plastic deformation. With both the acetabulum and femoral head appropriately aligned, plastic deformation will ensue, to enable both head and socket to form the appropriate shape. For hips that have not sufficiently developed in utero, splints position the hips in flexion and abduction to achieve the optimal position for hip development. Splints can be either dynamic (e.g. Pavlik harness), whereby the baby is free to move his or her legs within the range permitted by the splint, or fixed (e.g. Von Rosen splint), whereby the baby’s legs are fixed in position to achieve the optimal position.

The goal of interventions in DDH is to improve long‐term hip "health", yet proxy outcomes are used earlier in childhood to determine the outcome of interventions. The most widely used proxy outcome is the acetabular index, which has been shown to be a predictor of osteoarthritis in the long‐term (Albinana 2004). Acetabular index is therefore the primary outcome used in this review. Broadly, an acetabular index angle below 30 degrees is considered normal in babies aged over six months, and below 25 degrees is considered normal at 24 months.

Why it is important to do this review

There is considerable variation in the non‐operative management of DDH (Tomlinson 2016). Treatment varies by country, institution, and even surgeon.

Optimising the treatment of hip dysplasia is paramount in order to ensure the best health outcomes, including maximising mobility and quality of life and minimising the long‐term risk of osteoarthritis and arthroplasty. Whilst non‐operative treatment is the simplest form of treatment, with huge potential benefits to babies, it is not without complication. Therefore, it is important to determine an optimal strategy that achieves the greatest successes (i.e. avoids subsequent operative interventions), whilst minimising complications related to splinting (which includes AVN and femoral nerve palsy). It is also important to identify whether there are particular subgroups for whom the optimal management strategy may differ.

Objectives

To determine the effectiveness of splinting and the optimal treatment strategy for the non‐operative management of DDH in babies under six months of age.

Methods

Criteria for considering studies for this review

Types of studies

Randomised controlled trials (RCTs), quasi‐RCTs, and cluster‐RCTs.
Prospective and retrospective non‐randomised controlled studies and cohort studies. We considered non‐randomised studies for inclusion, as we expected that the number of randomised trials in this population would be limited.

These studies must have been conducted after the introduction of ultrasound in 1980.

Types of participants

Babies with all severities of DDH who were under six months of age and who were diagnosed using ultrasound.

If studies included babies over six months of age, we contacted the study authors to obtain data on babies under six months of age.

We excluded babies with neurodevelopmental problems or neuromuscular syndromes.

Types of interventions

Dynamic splinting (i.e. Pavlik harness, Frejka pillow)
Static splinting (e.g. Von Rosen, Denis Browne bar, Rhino brace, Tübingen hip flexion splint (Ottobock splint))
Double nappies (diapers)
No treatment or delayed treatment

We considered the following comparisons:

dynamic splinting versus delayed or no splinting;
static splinting versus delayed or no splinting;
double nappies (diapers) versus delayed or no splinting;
dynamic versus static splinting; and
staged weaning versus immediate removal (post hoc comparison).

Types of outcome measures

The primary and secondary outcomes are listed below.

Primary outcomes

Measurement of acetabular index at years one, two, and five, as determined by radiographs (angle).
Need for operative intervention (dichotomous):
1. to achieve reduction; and
2. to address dysplasia.
Complications (dichotomous):
1. avascular necrosis (AVN; there are several grading systems, most commonly "total" AVN (Salter 1969), and "partial" AVN (Gage 1972));
2. femoral nerve palsy;
3. other nerve palsies; and
4. pressure areas on skin.

We used the primary outcomes to populate the summary of findings tables.

Secondary outcomes

Health economic assessment (including financial impact on the family), as reported in the included studies.
Bonding between parents and baby (including obstacles to breastfeeding, problems with winding and bathing baby), as reported in the included studies.
Motor skill development, as reported in the included studies. Motor skill development is an outcome that concerns parents, as ‘tummy time’ affects both fine and gross motor skills, and the use of splints interferes with ‘tummy time’:
1. fine motor skill development; and
2. gross motor skill development.

Search methods for identification of studies

We ran the searches in July 2017 without limiting by date, publication status, study type, or language. We updated the searches in September 2020 and November 2021, apart from those for the Database of Abstracts of Reviews of Effects (DARE) and the Networked Digital Library of Theses and Dissertations (see Differences between protocol and review). We sought translations when necessary.

Electronic searches

We searched the following databases up to 30 November 2021 using the search strategies in Appendix 1.

Cochrane Central Register of Controlled Trials (CENTRAL; 2021, Issue 11) in the Cochrane Library, which includes the Cochrane Developmental, Psychosocial and Learning Problems Group's Specialised Register. Searched 30 November 2021.
MEDLINE Ovid (1946 to November Week 3 2021).
MEDLINE In‐Process and Other Non‐Indexed Citations Ovid (1946 to November 29, 2021).
MEDLINE Epub Ahead of Print (1946 to November 29, 2021).
Embase Ovid (1974 to 2021 November 29).
CINAHL Plus EBSCOhost (1937 to 30 November 2021).
PEDro (pedro.org.au/; searched 30 November 2021).
Science Citation Index Web of Science, Clarivate (1970 to 30 November 2021).
Conference Proceedings Citation Index ‐ Science Web of Science, Clarivate (1990 to 30 November 2021).
Cochrane Database of Systematic Reviews (CDSR; 2021, Issue 11) in the Cochrane Library. Searched 30 November 2021.
Database of Abstracts of Reviews of Effects (DARE; 2015, Issue 2) in the Cochrane Library. Searched 4 July 2017.
Networked Digital Library of Theses and Dissertations (NDLTD; search.ndltd.org/index.php). Searched 5 July 2017.
ProQuest Dissertations & Theses Global (all available years). Searched 30 November 2021.
ClinicalTrials.gov (clinicaltrials.gov/). Searched 30 November 2021.
World Health Organization (WHO) International Clinical Trials Registry Platform (WHO ICTRP, trialsearch.who.int/). Searched 30 November 2021.

Searching other resources

We searched the reference lists of included studies and relevant reviews identified by the electronic searches (see Electronic searches). We also contacted study authors to ask if they knew of any other studies, including those that are ongoing and unpublished, and handsearched Orthopaedic Proceedings to November 2021 supplement 14, which is a source of abstracts from major international orthopaedic meetings (bjjprocs.boneandjoint.org.uk).

Data collection and analysis

In the following sections, we report only the methods that we used. Please see the protocol, Dwan 2017, and Appendix 2 for unused methods to be used in future updates of the review.

Selection of studies

Two review authors (one clinical expert and one methodologist: KD and AN or DP and JK) independently screened the titles and abstracts of studies identified by the search strategy for eligibility (see Criteria for considering studies for this review). We then independently assessed the full texts of potentially eligible studies. We resolved any differences by discussion and by consulting a third review author (DP). We listed all studies excluded after full‐text assessment and their reasons for exclusion in the Characteristics of excluded studies table. We illustrated the study selection process in a PRISMA flow diagram (Moher 2009).

Data extraction and management

Two review authors (one clinical expert and one methodologist: KD or JK and AN or DP) independently extracted data onto a pre‐piloted data extraction form (Appendix 3), which we managed in Microsoft Excel and refined accordingly. We resolved any disagreements through discussion or by consulting a third review author.

Assessment of risk of bias in included studies

Two review authors (one clinical expert and one methodologist: KD or JK and AN or DP) independently assessed RCTs and quasi‐RCTs for risk of bias, using Cochrane's risk of bias tool, which is described in further detail in Chapter 8 of the Cochrane Handbook for Systematic Reviews of Interventions (Higgins 2011). We resolved disagreements through discussion or by consulting a third review author. We assessed six domains: sequence generation, allocation sequence concealment, blinding of participants and personnel, blinding of outcome assessment, incomplete outcome data, and selective outcome reporting. For each domain, we assigned a judgement of unclear, low or high risk of bias, along with a justification for this decision in the risk of bias tables.

If we identify any cluster‐RCTs in future updates, we will also consider (i) recruitment bias; (ii) baseline imbalance; (iii) loss of clusters; (iv) incorrect analysis; and (v) comparability with individually randomised trials.

As we expected that most studies would be observational in nature, we assessed the risk of bias for non‐randomised studies using the ROBINS‐I (Risk Of Bias In Non‐randomised Studies ‐ of Interventions) tool (Sterne 2016). We performed a separate risk of bias assessment for each study, based on two review outcomes of interest (need for surgical open reduction and acetabular index at one year) in each study. The ROBINS‐I tool considers seven domains of bias: two domains of bias pre‐intervention (bias due to confounding and bias in selection of participants into the study), one domain of bias at intervention (bias in the classification of interventions), and four domains of bias post‐intervention (bias due to departures from intended interventions, bias due to missing data, bias in measurement of outcomes, and bias in selection of the reported result). Central to implementing ROBINS‐I is the consideration of confounding factors and co‐interventions that have the potential to lead to bias.

Important confounders of interest in this Cochrane Review include the following:

age of baby at intervention (i.e. harness commencement);
proportion of females;
ethnicity of the babies (or if not stated, the country in which the study was conducted);
clinical assessment of the hip: dislocated hip (reducible or not reducible), clinically unstable hip (i.e. dislocatable), or clinically stable hip;
ultrasound assessment of the hip: acetabular dysplasia assessed using the alpha angle according to the Graf classification of the hip, that is, I (normal), IIa or IIb (centred hip, 50 to 60 degrees of dysplasia), IIc (centred hip, 43 to 50 degrees of dysplasia), III (de‐centred hip), and IV (dislocated hip); and
indication for ultrasound screening (i.e. breech presentation in third trimester, family history of DDH, lower than normal levels of amniotic fluid, ‘click’ on clinical screening (abnormal clinical examination producing a ‘click’ sound on hip movements), unequal skin creases).

Any further confounders identified following assessment of included studies were considered post hoc. We did not anticipate that there would be any important co‐interventions to consider. Each of the seven domains of bias contains signalling questions to facilitate judgements of risk of bias. The full signalling question and response framework for each outcome is provided in Sterne 2016. Following completion of the signalling questions, we sought a risk of bias judgement for each domain and obtained an overall risk of bias judgement for each outcome and result being assessed. Overall risk of bias has four categories ranging from low risk of bias (the study is at low risk of bias across all domains) to critical risk of bias (the study is at critical risk of bias in at least one domain). If there was insufficient information to assess the risk of bias in one or more key domains, but there was no indication that there was any critical or serious risk of bias in any of the other domains, then we designated the overall classification as 'no information'.
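The way domain‐level judgements roll up into an overall ROBINS‐I judgement can be sketched as follows. This is an illustrative sketch under stated assumptions, not the review authors' procedure: the function and constant names are hypothetical, the intermediate category labels ("moderate" and "serious") are taken from the published ROBINS‐I tool rather than from this text, and every domain is treated as a "key" domain.

```python
from typing import Dict

# Ordered from least to most severe; "moderate" and "serious" are the
# intermediate ROBINS-I categories (an assumption, not named in the text above)
SEVERITY = {"low": 0, "moderate": 1, "serious": 2, "critical": 3}
NO_INFO = "no information"


def overall_robins_i(domains: Dict[str, str]) -> str:
    """Return an overall judgement from per-domain judgements."""
    if not domains:
        raise ValueError("at least one domain judgement is required")
    known = [j for j in domains.values() if j != NO_INFO]
    worst = max(known, key=lambda j: SEVERITY[j]) if known else None
    # A serious or critical domain drives the overall judgement
    if worst is not None and SEVERITY[worst] >= SEVERITY["serious"]:
        return worst
    # Otherwise, a domain lacking information makes the overall judgement
    # 'no information' (assumption: every domain counts as a key domain)
    if len(known) < len(domains):
        return NO_INFO
    return worst


# Hypothetical judgements for one outcome
print(overall_robins_i({"confounding": "moderate", "missing data": "critical"}))  # critical
```

The ordering of the checks mirrors the rule in the text: missing information only leads to 'no information' when no other domain is already serious or critical.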

We created risk of bias assessment figures using the web app robvis (McGuinness 2021), as both RCTs and non‐randomised studies were included.

Dichotomous outcome data

We summarised data from dichotomous outcomes (e.g. need for operative intervention, AVN, femoral nerve palsy) using the risk ratio (RR) and 95% confidence intervals (CIs).
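As an illustration of the effect measure named above, the sketch below computes a risk ratio with a 95% CI from the number of events and babies in each arm, using the standard large‐sample formula on the log scale. The function name and the example counts are hypothetical, and returning `None` when either arm has no events is an assumption made here to keep the sketch free of continuity corrections.

```python
import math
from typing import Optional, Tuple


def risk_ratio(events_a: int, total_a: int, events_b: int, total_b: int,
               z: float = 1.96) -> Optional[Tuple[float, float, float]]:
    """Return (RR, lower, upper) for arm A versus arm B, or None if not computable."""
    if events_a == 0 or events_b == 0:
        # With zero events the log risk ratio is undefined without a correction
        return None
    rr = (events_a / total_a) / (events_b / total_b)
    # Standard error of ln(RR)
    se_log = math.sqrt(1 / events_a - 1 / total_a + 1 / events_b - 1 / total_b)
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper


# Hypothetical counts: 10/100 events with splinting versus 20/100 without
print(risk_ratio(10, 100, 20, 100))
```

An RR below 1 indicates fewer events in arm A; a CI that spans 1 is compatible with no difference between arms.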

AVN is often measured using a grading system and is categorical. There are many different rating systems for AVN, which are difficult to amalgamate. In all rating systems, stage or type 1 AVN is mild AVN that is clinically unimportant, as it completely heals without long‐term consequence. If a trial reported a categorical assessment of AVN, we used a clinical rating of 'two' and above to define AVN, thereby dichotomising the data. If we were unable to compute an effect size, we provided a narrative description of the results.
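The dichotomisation rule described above can be sketched in a few lines. The sketch assumes, hypothetically, that each hip's AVN grade is recorded as an integer with 0 meaning no AVN; the names are illustrative and are not taken from the review's data extraction form.

```python
from typing import Iterable, Tuple

AVN_THRESHOLD = 2  # grade two and above is counted as AVN, as stated in the text


def dichotomise_avn(grades: Iterable[int]) -> Tuple[int, int]:
    """Return (number with AVN, number assessed) from a list of AVN grades."""
    grades = list(grades)
    events = sum(1 for g in grades if g >= AVN_THRESHOLD)
    return events, len(grades)


# Hypothetical grades for six hips: only grades 2 and 3 count as events
print(dichotomise_avn([0, 0, 1, 2, 3, 0]))  # (2, 6)
```

The resulting event count and denominator are the inputs that a risk ratio calculation needs for each arm.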

Continuous outcome data

For continuous outcomes (e.g. measurement of acetabular index, bonding between parents and baby, fine and gross motor skills) measured on the same scale, we computed the mean difference (MD) and 95% CIs. If we were unable to compute an effect size, we provided a narrative description of the results.
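For illustration, a mean difference and its 95% CI can be computed from each arm's mean, standard deviation, and sample size as sketched below. The function name and example values are hypothetical, and the formula assumes independent observations, so it does not account for correlation between two hips from the same baby.

```python
import math
from typing import Tuple


def mean_difference(mean_a: float, sd_a: float, n_a: int,
                    mean_b: float, sd_b: float, n_b: int,
                    z: float = 1.96) -> Tuple[float, float, float]:
    """Return (MD, lower, upper) for arm A minus arm B."""
    md = mean_a - mean_b
    # Standard error of the difference between two independent means
    se = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return md, md - z * se, md + z * se


# Hypothetical acetabular index data (degrees) for two arms of 50 babies each
print(mean_difference(25.0, 4.0, 50, 24.0, 4.0, 50))
```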

For measurement of acetabular index, less than 30 degrees is considered normal in babies aged over six months, and less than 25 degrees for children aged 24 months. Under six months of age, an alpha angle of the hip on ultrasound scan above 60 degrees is considered normal.

Unit of analysis issues

Cross‐over RCTs

We excluded cross‐over trials. These are not appropriate as DDH is not a chronic condition.

Multiple groups

If a study included more than two similar intervention groups, we combined them and compared them with the control arm, creating a single pair‐wise comparison. If a study included more than two dissimilar intervention groups, we included these arms in the review separately, and halved the control group to ensure there was no double counting of babies.
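Halving a shared control group, as described above, can be sketched for a dichotomous outcome as follows. The function name is hypothetical, and the handling of odd numbers (giving the extra baby or event to the first part) is an assumption; the text does not say how uneven splits were handled.

```python
from typing import List, Tuple


def split_control(events: int, total: int, parts: int = 2) -> List[Tuple[int, int]]:
    """Split a control arm's (events, total) into near-equal parts without double counting."""
    base_n, extra_n = divmod(total, parts)
    base_e, extra_e = divmod(events, parts)
    return [
        (base_e + (1 if i < extra_e else 0), base_n + (1 if i < extra_n else 0))
        for i in range(parts)
    ]


# Hypothetical control arm of 40 babies with 10 events, shared by two comparisons
print(split_control(10, 40))  # [(5, 20), (5, 20)]
```

Because the parts always sum back to the original counts, each control baby contributes to exactly one comparison.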

Bilateral hips

Where a study presented data by hip rather than by baby, or did not account for the correlation between bilateral hips within the same baby, we noted this in the footnotes of the forest plot, as we were unable to obtain data by baby.

Dealing with missing data

We contacted the authors of the included studies for missing data. For transparency, if we did not receive a reply, we noted this in the Characteristics of included studies tables. If we could not obtain missing statistics (i.e. standard deviations), or calculate them from data reported in the trial report, then we attempted to impute them from similar studies. We did not attempt imputation of missing participant data, as most studies were non‐randomised studies.

Assessment of heterogeneity

We assessed clinical and methodological aspects of the included studies to determine whether there was clinical or methodological heterogeneity.

We did not assess statistical heterogeneity as we could not conduct a meta‐analysis.

Assessment of reporting biases

We completed an Outcome Reporting Bias in Trials (ORBIT) matrix, to help with the assessment of selective outcome reporting (Kirkham 2010).

Data synthesis

We analysed different study designs separately (RCTs, quasi‐RCTs, retrospective and prospective non‐randomised studies). Due to clinical and methodological heterogeneity, meta‐analysis was not possible. However, we have displayed results in a forest plot (using the default inverse‐variance approach for continuous data, and the Mantel‐Haenszel method for dichotomous data as data were often sparse) and discussed these narratively. When data were available only by comparison and not by arm, we used the generic inverse‐variance approach for all studies included in the forest plot.
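The generic inverse‐variance approach works from an effect estimate and its standard error rather than from arm‐level data. As a minimal sketch, assuming a symmetric, normal‐based 95% CI, the standard error can be recovered from a reported interval and turned into an inverse‐variance weight. The function names are hypothetical; the example uses the one‐year acetabular index result reported in Summary of findings 1 (MD 0.10, 95% CI −0.74 to 0.94).

```python
def se_from_ci(lower: float, upper: float, z: float = 1.96) -> float:
    """Recover a standard error from a symmetric, normal-based confidence interval."""
    return (upper - lower) / (2 * z)


def inverse_variance_weight(se: float) -> float:
    """Weight of a single estimate under the inverse-variance approach."""
    return 1.0 / se ** 2


# MD 0.10 (95% CI -0.74 to 0.94), as reported in Summary of findings 1
se = se_from_ci(-0.74, 0.94)
print(round(se, 4), round(inverse_variance_weight(se), 4))
```

No pooling step is shown, in keeping with the review's decision not to combine studies; the weight only indicates how much precision a single estimate carries.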

We assessed the following comparisons:

dynamic splinting versus delayed or no splinting;
static splinting versus delayed or no splinting;
dynamic splinting versus static splinting;
double nappies versus delayed or no splinting;
staged weaning versus immediate removal (post hoc comparison).

Subgroup analysis and investigation of heterogeneity

We did not conduct any subgroup analyses because meta‐analysis was not possible.

Sensitivity analysis

We did not conduct any sensitivity analyses as we were not able to combine any studies in a meta‐analysis.

Summary of findings and assessment of the certainty of the evidence

Two review authors (one clinical expert, DP, and one methodologist, KD) independently assessed the certainty of the evidence for each outcome using the GRADE approach, by considering the risks of bias, directness of evidence, heterogeneity, precision of effect estimates, and risk of publication bias for RCTs only. We resolved disagreements through discussion with a third review author. Using GRADEpro GDT, we created a summary of findings table for the following comparisons:

dynamic splinting versus delayed or no splinting;
dynamic splinting versus static splinting.

We included the following outcomes in both tables:

measurement of acetabular index at years one, two, and five;
need for operative intervention during study follow‐up; and
complications during study follow‐up.

Results

Description of studies

Results of the search

The electronic searches identified 3779 records. We found one additional record through contact with colleagues. After removing 1464 duplicates, we screened 2316 records by title and abstract, and excluded 2242 irrelevant records. We obtained full‐text reports for the remaining 74 records and excluded 35 with reasons (see Figure 1).
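The record counts in the paragraph above can be checked with simple arithmetic. The following is a minimal sketch using only the numbers stated in the text; the variable names are illustrative and do not come from the review.

```python
# Record counts as stated in the text (variable names are illustrative only).
identified = 3779 + 1            # electronic searches plus one record from colleagues
duplicates = 1464
screened = identified - duplicates

excluded_on_screening = 2242     # irrelevant on title and abstract
full_text = screened - excluded_on_screening

excluded_full_text = 35          # excluded with reasons
not_excluded = full_text - excluded_full_text

print(screened, full_text, not_excluded)
```

The screened and full‐text totals reproduce the 2316 and 74 records reported in the text.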

Figure 1

PRISMA flow diagram

We included 22 studies (33 reports) in the review (see Included studies). We identified four ongoing studies and one study awaiting classification.

Included studies

This review includes six RCTs or quasi‐RCTs (576 babies) (Azzoni 2011; Gardiner 1990; Lee 2022; Pollett 2020; Rosendahl 2010; Wood 2000), and 16 non‐randomised studies (8237 babies) (Bergo 2013; Bram 2021; Gou 2021; Kim 2019; Laborie 2014; Larson 2019; Lyu 2021; Munkhuu 2013; Murphy 2017; Paton 2004; Ran 2020; Reikerås 2002; Sucato 1999; Upasani 2016; Westacott 2014; Wilkinson 2002).

Study design

The 22 included studies were published over a 31‐year period between 1990 and 2021. Four studies were randomized trials (Azzoni 2011; Rosendahl 2010; Wood 2000; Pollett 2020), and two were quasi‐randomised, using alternate allocation (Gardiner 1990; Lee 2022). The remaining 16 studies utilised a range of non‐randomised observational techniques.

Study location

The majority of randomized studies were conducted in Europe, two in the United Kingdom (Gardiner 1990; Wood 2000), and one each in Italy (Azzoni 2011), Norway (Rosendahl 2010), and the Netherlands (Pollett 2020). One was conducted in Taiwan (Lee 2022). One study recruited from five centres within the Netherlands (Pollett 2020); the other five studies were single centre. Of the non‐randomised studies, two were multicentred: one covering seven centres across Australia, Europe, and North America (Upasani 2016), and the other included two centres in the USA (Bram 2021). The remainder were single centre studies: three from North America (Kim 2019; Larson 2019; Sucato 1999); three from Norway (Bergo 2013; Laborie 2014; Reikerås 2002); three from the United Kingdom (Paton 2004; Westacott 2014; Wilkinson 2002); three from China (Gou 2021; Lyu 2021; Ran 2020); and one apiece from Ireland (Murphy 2017), and Mongolia (Munkhuu 2013).

Study dates

Nineteen studies reported the dates for data collection, which ranged from 1988 to 2020. One study included data from the 1980s (Gardiner 1990), six from the 1990s (Laborie 2014; Paton 2004; Rosendahl 2010; Sucato 1999; Wilkinson 2002; Wood 2000), five from the 2000s (Azzoni 2011; Kim 2019; Laborie 2014; Larson 2019; Westacott 2014), and eleven from the 2010s (Bergo 2013; Gou 2021; Lee 2022; Lyu 2021; Kim 2019; Larson 2019; Munkhuu 2013; Murphy 2017; Westacott 2014; Pollett 2020; Ran 2020).

Study size

The randomized studies included between 44 and 128 babies. The numbers of babies in the non‐randomised studies ranged between 48 and 4818. Fourteen of these studies included between 48 and 251 babies. One study included 1839 babies (Munkhuu 2013), and the largest study was a review of a screening programme and included 4818 babies (Laborie 2014).

Funding

Fourteen studies did not state the funding source. Three studies stated there was no funding (Bram 2021; Gou 2021; Larson 2019), and five studies had non‐commercial funding (Gardiner 1990; Laborie 2014; Lyu 2021; Munkhuu 2013; Upasani 2016).

Participant age

All included studies enrolled babies aged less than 26 weeks (six months). Of the randomized studies, three allocated treatment in the first week after birth (Gardiner 1990; Lee 2022; Rosendahl 2010). The Azzoni 2011 study randomized babies between birth and 14 weeks whereas Wood 2000 randomized babies aged two to six weeks. In Pollett 2020, babies were randomized later, between three and four months of age. The non‐randomised studies included babies at a range of ages below six months.

Study comparisons

Dynamic splinting versus delayed or no splinting

Four randomized studies (Gardiner 1990; Rosendahl 2010; Wood 2000; Pollett 2020), and nine non‐randomised studies (Bergo 2013; Kim 2019; Laborie 2014; Larson 2019; Murphy 2017; Paton 2004; Reikerås 2002; Sucato 1999; Wilkinson 2002*), compared dynamic splinting versus delayed or no splinting.

Static splinting versus delayed or no splinting

Two non‐randomised studies compared static splinting with delayed or no splinting (Munkhuu 2013; Wilkinson 2002*).

Double versus single nappies

One quasi‐RCT compared double to single nappies (Lee 2022).

Dynamic splinting versus static splinting

One randomized study (Azzoni 2011), and five non‐randomised studies (Gou 2021; Lyu 2021; Upasani 2016; Ran 2020; Wilkinson 2002*), compared dynamic splinting versus static splinting.

Staged weaning versus immediate removal (post hoc comparison)

Two non‐randomised studies compared weaning with no weaning of the splint (Bram 2021; Westacott 2014).

The most common dynamic splints studied were the Pavlik harness and Frejka pillow. One study used the Coxa Flex splint (Azzoni 2011). Static splints were more varied and included the Teufel Mignon, Tubingen hip flexion (classed as static due to the fixed abduction but it does allow some dynamic flexion), Craig, Von Rosen, Denis Browne, human brace, and Plastazote splints.

*Note: Wilkinson 2002 compared four groups: one dynamic splint, two different static splints, and no splinting.

Subgroups: stable versus unstable hips

Studies included babies based on clinical and ultrasound diagnoses of dysplasia. We divided the studies into two broad categories: stable hips and unstable/dislocated hips. Hips were considered stable if they were Graf IIa to d and/or were documented to be clinically stable. Hips were considered unstable if Graf III/IV and/or documented to be clinically unstable. We sought to clearly explain the patient population investigated for all narrative syntheses of the data, particularly related to the key disease characteristics (i.e. severity of the hip affected).

Of the randomized studies, Azzoni 2011 included both stable and unstable hips on ultrasound ranging from Graf IIc to IIIb. They compared dynamic and static splints and the primary outcome measure was time to remission of dysplasia on ultrasound. Gardiner 1990 considered clinically unstable (but not dislocated) hips and compared immediate dynamic splinting with two weeks of surveillance followed by splinting if instability persisted. Two studies considered stable hips (Rosendahl 2010; Wood 2000). Both studies compared immediate dynamic splinting for six weeks versus no splinting for six weeks. The Pollett 2020 study also considered stable hips (Graf IIb or IIc) but started the intervention at an older age of three to four months and continued it for 12 weeks. One study (Lee 2022) had a quasi‐randomised design and studied newborns with stable (Graf IIa) hips, comparing double nappies with single nappies in the first month of life.

From the non‐randomised studies, five considered stable hips (Kim 2019; Munkhuu 2013; Murphy 2017; Reikerås 2002; Sucato 1999), five considered unstable hips (Gou 2021; Larson 2019; Paton 2004; Upasani 2016; Wilkinson 2002), one compared stable with unstable hips (Laborie 2014), and five included all hips (Bergo 2013; Bram 2021; Lyu 2021; Westacott 2014; Ran 2020).

Reported outcomes

The outcomes collected were determined based on both the expertise of the clinician contributors, and the lived experience of a parent who became a co‐author on this review. These were radiographic improvement (i.e. measurement of acetabular index on a pelvic radiograph in angles), the need for subsequent surgery, complications (i.e. avascular necrosis, femoral/other nerve palsies, pressure areas on the skin), health economic assessment, and parental concerns (i.e. parental bonding and motor skill development). Any other outcomes described were also noted. The reported outcomes by study are described in Table 1. Measurement of acetabular index is a standard measure and was reported in the studies at varying time points from 16 weeks to two years. However, some studies reported the number of hips that 'resolved' or were 'dysplastic', or used a cut‐off value for the angle of the hip and reported the number of hips above and below this value. The studies gave no details about how other outcomes were measured or their timings.

Table 1. ORBIT matrix

Study	Measurement of acetabular index	Need for operative intervention	Avascular necrosis	Femoral nerve palsy/other nerve palsies	Pressure areas on skin	Health economic assessment	Bonding between parents and child	Motor skill development	Other outcomes
Azzoni 2011	x	Reported	Reported	x	x	x	x	x	Time to recovery
Bergo 2013	x	x	x	x	x	x	x	x	Psychosocial outcomes, anxiety
Bram 2021	Reported	x	x	x	x	x	x	x	Time spent in harness
Gardiner 1990	x	Reported	Reported	x	x	x	x	x	Abnormal hips
Gou 2021	Reported	x	x	x	x	x	x	x	Success/ failure
Kim 2019	Reported	Reported	x	x	x	x	x	x	None
Laborie 2014	Measured	Reported	Reported	x	x	x	x	x	None
Larson 2019	x	Reported	x	x	x	x	Reported	x	Success/failure
Lee 2022	x	x	x	x	x	x	x	x	Alpha angle at 1 month, rate of improvement to Graf type I hips in 1 month, any problems or morbidities in the study period, and number of ultrasound examinations and orthopaedic clinic visits in the first year
Lyu 2021	Reported	x	Reported	Reported	x	x	x	x	Time needed to achieve Graf type IIb
Munkhuu 2013	x	x	x	x	x	x	x	x	Development of hips, complications
Murphy 2017	x	Partially reported	x	x	x	x	x	x	Resolution of dysplasia on subsequent imaging and failure of resolution or deterioration on subsequent imaging
Paton 2004	x	Reported	Reported	x	x	x	x	x	Late splintage
Pollett 2020	Reported	Reported	x	Reported	x	x	x	x	Bony roof angle, modified Tönnis classification
Ran 2020	Reported	Reported	Reported	Reported	x	x	x	x	Failure/ success, center‐edge angle
Reikerås 2002	Reported	x	x	x	x	x	x	x	Provokable instability, beta angles
Rosendahl 2010	Reported	NA	Reported	Reported	Reported	x	x	x	None
Sucato 1999	Reported	x	x	x	x	x	x	x	None
Upasani 2016	Partially reported	Reported	Reported	Reported	x	x	x	x	Osteonecrosis
Westacott 2014	Reported	Reported	Reported	x	x	x	x	x	Retreatment, other complications, successful treatment
Wilkinson 2002	x	Reported	Reported	x	x	x	x	x	Number with acetabular angle ≥ 28°; improvement on ultrasound; further treatment with an abduction plaster; deformities
Wood 2000	Reported	Reported	x	x	x	x	x	x	Acetabular cover

Ongoing studies

There are four ongoing studies, three of which are RCTs (NCT01375218; ChiCTR1900026634; NL9714), and one is a prospective cohort study (NCT02885831). The respective comparisons are Pavlik versus Tubingen (dynamic versus static, ChiCTR1900026634), Pavlik versus Plastazote (dynamic versus static, NCT01375218), Pavlik versus surveillance (dynamic versus delayed or no splinting, NL9714), and abduction splint versus surveillance (dynamic versus delayed or no splinting, NCT02885831). We provide further details in the Characteristics of ongoing studies table.

Excluded studies

We excluded 35 studies (35 reports) for the following reasons: ineligible study type (one study); ineligible population (28 studies, 26 of which did not use ultrasound); ineligible intervention (four studies); and ineligible comparator (two studies) (see Characteristics of excluded studies table).

Risk of bias in included studies

RCTs

There were six RCTs or quasi‐RCTs (Azzoni 2011; Gardiner 1990; Lee 2022; Pollett 2020; Rosendahl 2010; Wood 2000). Our judgements about the risk of bias for these studies are shown in Figure 2.

Figure 2

Risk of bias plot for RCTs

Allocation

Two studies (Azzoni 2011; Wood 2000), stated that their studies were 'randomized' but provided no further information on sequence generation or allocation concealment. Therefore, we deemed both studies to be at unclear risk of bias on these domains. We considered Rosendahl 2010 at low risk of selection bias as they used computer‐generated randomization and sealed opaque envelopes. Pollett 2020 was at low risk of bias for sequence generation as computer‐generated randomization was used, but no details were given on allocation concealment so this was deemed unclear. Two studies (Lee 2022; Gardiner 1990), were at high risk of bias for sequence generation as babies were assigned based on day of the week and alternation respectively, and this also affected allocation concealment.

Blinding

One study (Azzoni 2011), was described as 'double blind' but gave no further information; due to the nature of the intervention, we deemed the study to be at high risk of performance bias and unclear risk of detection bias. Gardiner 1990 stated that the "caring physician and patient could not be blinded" and thus we deemed it at high risk of performance bias. However, those assessing outcomes were blinded in Gardiner 1990, so we rated the study at low risk of detection bias. No information was given for Wood 2000, so we judged this study to be at unclear risk of performance and detection bias. Blinding of participants and personnel was unclear in Rosendahl 2010 and Pollett 2020, but radiologists were blinded to the intervention so we rated them at low risk of detection bias and high risk of performance bias. In Lee 2022, babies and parents could not be blinded but outcome assessors were blinded, so performance bias was high risk and detection bias was low risk.

Incomplete outcome data

In three studies (Azzoni 2011; Lee 2022; Rosendahl 2010), data were available for all babies, so we considered these studies to be at low risk of attrition bias. We rated Gardiner 1990 at high risk of attrition bias as no causal analysis was performed to account for treatment switching, which may lead to bias. The Wood 2000 study stated that not all babies were followed up to 24 months; the percentages not followed up were high and unbalanced between the groups, with no reasons given, so we also judged it to be at high risk of attrition bias. In Pollett 2020, a large proportion of participants withdrew after randomization, so this was deemed high risk of bias.

Selective reporting

In five studies (Azzoni 2011; Gardiner 1990; Lee 2022; Pollett 2020; Wood 2000), no protocol or trial registry information was available to compare pre‐specified outcomes with reported outcomes, so we rated these studies at unclear risk of reporting bias. We also considered Rosendahl 2010 to be at unclear risk of reporting bias because, although all outcomes stated in the trial registry appear to have been fully reported, the trial does not appear to have been registered a priori. See Table 1.

Other potential sources of bias

RCTs and quasi‐RCTs were not at risk of any other biases.

ROBINS‐I

Non‐randomised studies

There were 16 non‐randomised studies but not all reported the outcomes of interest. Table 2 shows the assessments for each domain in the included studies. Further detailed assessments are available from the authors on request.

See: Summary of findings 1 Dynamic splinting versus delayed or no splinting for the non‐operative management of developmental dysplasia of the hip in babies under six months of age; Summary of findings 2 Dynamic splinting versus static splinting for the non‐operative management of developmental dysplasia of the hip in babies under six months of age

Table 2. ROBINS‐I

Bias domain	Bias due to confounding	Bias in selection of participants into the study	Bias in the classification of interventions	Bias due to departures from intended interventions	Bias due to missing data	Bias in measurement of outcomes	Bias in selection of the reported result	Overall
*Acetabular index at one year*
Bram 2021	Serious	Moderate	Low	Moderate	Serious	Moderate	Moderate	Serious
Kim 2019	Moderate	No information	Low	Moderate	Moderate	Low	Moderate	Moderate
Murphy 2017	No information	Low	Low	No information	No information	Moderate	Moderate	Moderate
Paton 2004	Serious	Moderate	Low	Moderate	Moderate	Serious	Moderate	Serious
Sucato 1999	Low	Serious	Low	Moderate	Moderate	Moderate	Moderate	Serious
Upasani 2016	Low	Low	Low	Moderate	Moderate	Moderate	Moderate	Moderate
Wilkinson 2002	Serious	Moderate	Serious	No information	Moderate	Serious	Moderate	Serious
*Need for surgical open reduction*
Kim 2019	Moderate	No information	Low	Moderate	Moderate	Low	Moderate	Moderate
Laborie 2014	Critical	Low	Moderate	Low	Moderate	Moderate	Moderate	Critical
Larson 2019	Serious	Serious	Low	Moderate	Moderate	Moderate	Moderate	Serious
Murphy 2017	No information	Low	Low	No information	No information	Moderate	Moderate	Moderate
Paton 2004	Serious	Moderate	Low	Moderate	Moderate	Serious	Moderate	Serious
Ran 2020	Serious	Serious	Serious	Low	Serious	Low	Low	Serious
Upasani 2016	Low	Low	Low	Moderate	Moderate	Moderate	Moderate	Moderate
Wilkinson 2002	Serious	Moderate	Serious	No information	Moderate	Serious	Moderate	Serious

Seven studies reported acetabular index at one year (Bram 2021; Kim 2019; Murphy 2017; Paton 2004; Sucato 1999; Upasani 2016; Wilkinson 2002). Three studies were at moderate risk of bias (Kim 2019; Murphy 2017; Upasani 2016), and four studies were at serious risk of bias (Bram 2021; Paton 2004; Sucato 1999; Wilkinson 2002). See Figure 3.

Figure 3

ROBINS‐I plot: acetabular index at one year

Eight studies reported need for operative intervention to achieve reduction (Kim 2019; Laborie 2014; Larson 2019; Murphy 2017; Paton 2004; Ran 2020; Upasani 2016; Wilkinson 2002). Three studies were at moderate risk of bias (Kim 2019; Murphy 2017; Upasani 2016), four studies were at serious risk of bias (Larson 2019; Paton 2004; Ran 2020; Wilkinson 2002), and one study was at critical risk of bias (Laborie 2014). See Figure 4.

Figure 4

ROBINS‐I plot: Need for surgical open reduction

For both outcomes, we deemed overall risk of bias to be critical for one study because pre‐intervention confounders were not controlled for (Laborie 2014). We judged a further six studies as having a serious risk of bias due to lack of controlling for pre‐intervention confounding, and one study as having a serious risk of bias due to the retrospective identification of babies to include in the study (Sucato 1999). Serious risk of bias also occurred in the measurement of outcome domain, where different methods of assessment were used by different assessors, assessments were unblinded, or it was unclear who undertook the assessments at follow‐up.

Effects of interventions

Comparison 1: dynamic splinting versus delayed or no splinting

Thirteen studies compared dynamic splinting versus delayed or no treatment (Bergo 2013; Gardiner 1990; Kim 2019; Laborie 2014; Larson 2019; Murphy 2017; Paton 2004; Pollett 2020; Reikerås 2002; Rosendahl 2010; Sucato 1999; Wilkinson 2002; Wood 2000). Three RCTs (Pollett 2020; Rosendahl 2010; Wood 2000), and one quasi‐RCT (Gardiner 1990), were included in this comparison. However, due to methodological differences (different study designs) (Table 3), different outcomes reported at different time points (Table 1), and some studies not accounting for bilateral hips from the same child in the analysis, we were not able to combine any data in a meta‐analysis.

Table 3. Dynamic splinting versus delayed or none

Study	Design	Intervention	Comparator
Bergo 2013	Cross‐sectional study	Early splinting (Frejka pillow)	Late splinting
Gardiner 1990	Quasi‐RCT	Immediate splinting	Sonographic surveillance for 2 weeks (control)
Kim 2019	Prospective	Pavlik	Observed
Laborie 2014	Observational	Abduction splint (Frejka splint): persistent dislocated or dislocatable	Watchful waiting: clinically or ultrasound unstable but not dislocatable hips
Larson 2019	Retrospective	Pavlik harness	Groups were divided based on the age at which the Pavlik harness was initiated: group 1 = < 30 days; group 2 = 30 to 60 days; group 3 = > 60 days
Murphy 2017	Retrospective	Pavlik harness	Followed up without treatment
Paton 2004	Prospective	Early splinting (Pavlik)	Follow up with ultrasound
Pollett 2020	RCT	Pavlik harness	Active surveillance
Reikerås 2002	Babies 'divided' into 2 groups	Frejka pillow for 16 weeks	Untreated
Rosendahl 2010	RCT	Immediate abduction splinting for at least 6 weeks (Frejka pillow splint with sonographic follow up)	Active sonographic surveillance but no treatment for 6 weeks
Sucato 1999	Retrospective review (observational)	Pavlik (chosen at the discretion of the treating physician)	No treatment
Wilkinson 2002	Retrospective	Pavlik	Not splinted
Wood 2000	RCT	Pavlik	No splint

RCT: randomised controlled trial

Primary outcomes

Measurement of acetabular index

Randomised trials

Amongst the randomized comparisons, we identified no evidence of a difference in acetabular index related to the use of splinting. Gardiner 1990 reported results for the acetabular index in a follow‐up study of babies with unstable but not dislocated hips. Acetabular index was only reported at six months, which was not one of our included time points. The MD for acetabular index at six months was −0.65 (95% CI −2.98 to 1.68; 79 babies, Analysis 1.1). Rosendahl 2010 (stable hips) presented an MD of 0.10 (95% CI −0.74 to 0.94; 128 babies; very low‐certainty evidence; Analysis 1.1) for the acetabular index at one year, accounting for correlated observations from hips from the same baby. Pollett 2020 (stable hips) reported an MD 0.20 (95% CI −1.65 to 2.05) for the acetabular index at one year and MD −0.10 (95% CI −1.93 to 1.73) at two years (104 babies; very low‐certainty evidence; Analysis 1.1). Wood 2000 (stable hips) reported data at three months, which was not one of our included time points. At 24 months they reported an MD of −1.90 (95% CI −4.76 to 0.96; 44 babies; very low‐certainty evidence; Analysis 1.1).
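As a hedged illustration of how the intervals quoted above relate to the standard errors used by an inverse‐variance forest plot, the sketch below back‐calculates a standard error from a 95% CI. It assumes a symmetric normal‐approximation interval (z = 1.96); the function name `se_from_ci` is hypothetical and is not taken from the review or from any review software. The two one‐year estimates are shown side by side only and are not pooled, consistent with the review.

```python
def se_from_ci(lower, upper, z=1.96):
    """Back-calculate a standard error from a symmetric 95% CI (normal approximation assumed)."""
    return (upper - lower) / (2 * z)


# (MD, lower, upper) as quoted in the text for acetabular index at one year
reported = {
    "Rosendahl 2010": (0.10, -0.74, 0.94),
    "Pollett 2020": (0.20, -1.65, 2.05),
}

for study, (md, lower, upper) in reported.items():
    se = se_from_ci(lower, upper)
    # For a symmetric interval, the midpoint should reproduce the reported MD
    midpoint = (lower + upper) / 2
    print(f"{study}: MD {md:.2f}, SE {se:.3f}, CI midpoint {midpoint:.2f}")
```

The midpoint check is a quick way to confirm that a reported MD and its interval are mutually consistent before they are entered into a plot.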

Non‐randomised trials

Amongst the non‐randomised comparisons, we identified no evidence of a difference in acetabular index related to the use of splinting. Kim 2019 reported the results for the acetabular index (for number of hips) at two years, giving an MD of −1.20 (95% CI −3.09 to 0.69; 51 babies; Analysis 1.2). Murphy 2017 reported that, of the 72 hips that were harnessed after the first ultrasound, 69 resolved. Of the 61 not initially harnessed, "38 fully resolved on follow up imaging, 6 required harnessing after ultrasound at 3 months and 16 required harnessing after 6 month X‐ray, with one baby still being followed up in clinic" (133 babies). Absolute values were not reported. Reikerås 2002 reported results for the acetabular index (for number of hips) at 16 weeks (MD −0.80, 95% CI −2.55 to 0.95; 55 babies). No other time points were reported. Sucato 1999 reported that the final analysis was done at mean 15.9 months (range 3 to 50 months), and that no hips (0/43) were considered dysplastic in the Pavlik group, and that 1.3% (2/149) of hips in the non‐treated group were dysplastic (112 babies). Absolute values were not reported. Wilkinson 2002 reported no difference in the number of hips (%) with an acetabular angle ≥ 28°, between six and 12 months, which was 33% (14/43) in the Pavlik group and 38% (13/34) in the no splint group (58 babies).

Four studies (Bergo 2013; Laborie 2014; Larson 2019; Paton 2004), did not report data on this outcome.

Need for operative intervention

Randomised trials

Amongst the randomized comparisons, very few operative interventions occurred, with no obvious signal to indicate a higher frequency of this outcome in either group. Three studies (Gardiner 1990; Rosendahl 2010; Wood 2000), reported no surgical intervention (251 babies; very low‐certainty evidence). Pollett 2020 reported that two babies developed instability in the Pavlik harness group and were subsequently treated with closed reduction and spica cast. It is not stated explicitly if this was to achieve concentric reduction or to address residual dysplasia (104 babies; very low‐certainty evidence).

Non‐randomised trials

Amongst the non‐randomised comparisons, few operative interventions occurred, with no obvious signal to indicate a higher frequency of this outcome in either group. Kim 2019 reported that "none of the patients had any additional treatments or evidence of hip subluxation or dislocation at the follow‐up" (51 babies). Laborie 2014 (n = 2433 babies) reported on surgery in babies identified through screening. Of those babies screened at birth, 20 later underwent surgery; 9 had closed or open reduction soon after birth, and 11 had initial splinting and subsequently underwent surgery for dysplasia or dislocation. In babies considered low risk and not screened, 19 underwent surgery (only one baby was splinted on diagnosis) but 14 of these were aged over six months at initial diagnosis, and thus beyond the scope of this review. In Larson 2019, groups were divided based on the age at which the Pavlik harness was initiated: group one < 30 days; group two 30 to 60 days; and group three > 60 days. The proportions of failures requiring operation were: group one 19.1% (18/94); group two 22.5% (9/40); and group three 26.2% (11/42). The study authors found no significant difference in failure rates by age (P = 0.65; 176 babies). Murphy 2017 is an abstract and it is not clear if the three babies that were sent for consideration of surgery actually had surgery (133 babies; Analysis 1.3). In Paton 2004, none of the 37 babies in the early splinting group received surgery, but two of 11 babies received surgery in the delayed splinting group (unadjusted RR 0.06, 95% CI 0.00 to 1.23; 48 babies; Analysis 1.3): "one arthrogram and derotation femoral osteotomy aged 16 months for persistent dysplasia, and one open reduction aged 6 months for progression to dislocation". Wilkinson 2002 reported further treatment, with an operation in 13 of 43 hips treated with a Pavlik harness compared to 10 of 37 without splinting (unadjusted RR 1.40, 95% CI 0.25 to 7.77; 58 babies; Analysis 1.3).

Three studies (Bergo 2013; Reikerås 2002; Sucato 1999) did not report data on surgical intervention.

Complications

Randomised trials

Amongst the randomized comparisons, no complications were reported in either group. Pollett 2020 reported no femoral nerve palsy (104 babies). Gardiner 1990 reported no avascular necrosis in either group (79 babies; very low‐certainty evidence). Rosendahl 2010 found that "over the period of follow‐up, no complications of treatment were observed, and none of the babies developed abnormal clinical findings on hip examination" (128 babies; very low‐certainty evidence). Wood 2000 did not report data on complications.

Non‐randomised trials

Amongst the non‐randomised comparisons, there were very few complications, with no obvious signal to indicate a higher frequency of this outcome in either group. Laborie 2014 had an unadjusted RR of 0.39 (95% CI 0.09 to 1.74) for avascular necrosis, with four of 1882 in the early treatment group versus three of 551 in the delayed treatment group (2433 babies; Analysis 1.4). Paton 2004 and Wilkinson 2002 reported no occurrence of avascular necrosis in either group (106 babies).
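The unadjusted risk ratios quoted in this comparison can be approximately reproduced from the event counts. The sketch below is a minimal illustration, not the review's own analysis code: the function name `risk_ratio` is hypothetical, the interval is a normal approximation on the log scale (z = 1.96), and adding 0.5 to every cell when a cell is zero is an assumed continuity correction. It uses the avascular necrosis counts for Laborie 2014 given above and the surgery counts for Paton 2004 given under need for operative intervention.

```python
import math


def risk_ratio(events_a, total_a, events_b, total_b, z=1.96):
    """Unadjusted risk ratio with a 95% CI computed on the log scale.

    Assumption: if any cell of the 2x2 table is zero, 0.5 is added to every cell.
    """
    a, b = events_a, total_a - events_a
    c, d = events_b, total_b - events_b
    if 0 in (a, b, c, d):
        a, b, c, d = (cell + 0.5 for cell in (a, b, c, d))
    n1, n2 = a + b, c + d
    rr = (a / n1) / (c / n2)
    se_log_rr = math.sqrt(1 / a - 1 / n1 + 1 / c - 1 / n2)
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Laborie 2014, avascular necrosis: 4 of 1882 (early) versus 3 of 551 (delayed)
laborie = risk_ratio(4, 1882, 3, 551)
# Paton 2004, surgery: 0 of 37 (early splinting) versus 2 of 11 (delayed splinting)
paton = risk_ratio(0, 37, 2, 11)

for name, (rr, lower, upper) in {"Laborie 2014": laborie, "Paton 2004": paton}.items():
    print(f"{name}: RR {rr:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

Under these assumptions the results agree, to two decimal places, with the values quoted in the text for both studies. The very wide interval for Paton 2004 reflects the zero events in one group.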

Six studies (Bergo 2013; Kim 2019; Larson 2019; Murphy 2017; Reikerås 2002; Sucato 1999) did not report data on complications.

Secondary outcomes

Thirteen studies did not report data on a health economic assessment, bonding between parents and baby, or motor skill development (Bergo 2013; Gardiner 1990; Kim 2019; Laborie 2014; Larson 2019; Murphy 2017; Paton 2004; Pollett 2020; Reikerås 2002; Rosendahl 2010; Sucato 1999; Wilkinson 2002; Wood 2000). However, Larson 2019 concluded that "early initiation does not correlate with decreased failure rates, suggesting there is no urgency to initiate Pavlik harness treatment before 30 days of age. This waiting period can give parents time to become comfortable rearing their infant and improve the parent‐infant bond through activities such as feeding and holding the child." The Rosendahl 2010 study reached a similar conclusion: "of interest is the fact that watchful waiting resulted in later treatment as well as less treatment, potentially allowing mothers time to care for their infants and establish breastfeeding. Conversely, delaying treatment may limit an increasingly mobile child. We were unable to assess these more qualitative but important outcomes in this trial."

Comparison 2: static splinting versus delayed or no splinting

Two studies compared static splinting versus delayed or no treatment (Munkhuu 2013; Wilkinson 2002). Munkhuu 2013 was a prospective cohort where treatment was delayed until 30 days in all centred hips with minor immaturity. Wilkinson 2002 was a retrospective study of decentred hips, comparing the time of splint initiation (including no use of any splint). Given the variable study designs we were not able to combine any data in a meta‐analysis (Table 4).

Table 4. Static splinting versus delayed or none

Study	Design	Intervention	Comparator
Munkhuu 2013	Prospective cohort	Type 2c‐4: Tubingen hip flexion splint	Type 2a: ultrasound follow‐up
Wilkinson 2002	Retrospective	Craig; Von Rosen	Not splinted

Within this comparison, there was no obvious signal to indicate a greater effectiveness of either approach based on the outcomes investigated.

Primary outcomes

Measurement of acetabular index

Wilkinson 2002 reported mean improvement on ultrasound between first examination and at 12 to 20 weeks, and the number of hips (%) with acetabular angle ≥ 28° between six and 12 months. This gave an unadjusted RR of 0.51 (95% CI 0.25 to 1.03) with the Von Rosen and Craig splint groups combined versus no splint (66 babies; Analysis 2.1).

Need for operative intervention

Wilkinson 2002 reported that further treatment with an operation was needed with an unadjusted RR of 0.34 (95% CI 0.03 to 3.64; 66 babies; Analysis 2.2).

Complications

Wilkinson 2002 reported no occurrence of avascular necrosis in either group (66 babies), and Munkhuu 2013 reported that there was no evidence for severe treatment‐related complications (1236 babies). No other complications were noted. Data were not included in a forest plot due to no events in either group.

Secondary outcomes

Neither Munkhuu 2013 nor Wilkinson 2002 reported data on a health economic assessment, bonding between parents and baby, or motor skill development.

Comparison 3: double nappies versus delayed splinting or no splinting

One quasi‐RCT compared double nappies to single nappies but did not report any of the review outcomes of interest (Lee 2022).

Comparison 4: dynamic splinting versus static splinting

Six studies compared dynamic versus static splints (Azzoni 2011; Gou 2021; Lyu 2021; Ran 2020; Upasani 2016; Wilkinson 2002). As one study was an RCT (Azzoni 2011), one was a prospective cohort (Upasani 2016), and four were retrospective studies (Gou 2021; Lyu 2021; Ran 2020; Wilkinson 2002), we were not able to combine any data in a meta‐analysis (Table 5).

Table 5. Dynamic versus static splinting

Study	Design	Intervention	Comparator
Azzoni 2011	RCT	Static: Teuffel Mignon	Dynamic: Coxa‐flex
Gou 2021	Retrospective cohort	Static: Human Brace	Dynamic: Pavlik harness
Lyu 2021	Retrospective cohort	Static: Tubingen	Dynamic: Pavlik harness
Ran 2020	Retrospective cohort	Static: Tubingen	Dynamic: Pavlik harness
Upasani 2016	Prospective cohort	Static: brace treatment (Denis Browne, Von Rosen, Plastazote)	Dynamic: Pavlik harness
Wilkinson 2002	Retrospective cohort	Static: Craig; Von Rosen	Dynamic: Pavlik harness

RCT: randomised controlled trial

Primary outcomes

Measurement of acetabular index

Randomised trials

Azzoni 2011 (stable and unstable hips) did not report data on this outcome. This study reported that dynamic splinting resulted in faster acetabular development, with splints able to be discontinued seven days earlier. However, this was not supported with radiological follow‐up data.

Non‐randomised trials

Upasani 2016 did not report results by splint type but instead reported that "the average acetabular index at the time of final follow‐up was 22 ± 4 (range, 11 to 31) among the hips successfully treated with a brace and 26 ± 5 (range, 13 to 35) among the hips that required surgical treatment (P < 0.001, 159 babies)". Wilkinson 2002 reported mean improvement on ultrasound between first examination and 12 to 20 weeks, and the number of hips (%) with acetabular angle ≥ 28° between six and 12 months, with an unadjusted RR of 1.66 (95% CI 0.82 to 3.35; 68 babies; Analysis 3.1). Ran 2020 reported an unadjusted MD of 0.40 (95% CI −1.72 to 2.52; 52 babies; Analysis 3.2) at two years. Gou 2021 reported an unadjusted acetabular index at an early time point after treatment, but the exact time point is unclear (MD −0.70, 95% CI −1.98 to 0.58; 134 babies; Analysis 3.2). Lyu 2021 found "no significant difference between the groups" but reported data for left and right hips separately and only in successfully treated babies, so data are not reported here.

Need for operative intervention

Non‐randomised trials

Upasani 2016 reported that 42 hips failed brace treatment and required surgical treatment (unadjusted RR 0.27, 95% CI 0.16 to 0.44; 159 babies; Analysis 3.3). Wilkinson 2002 reported further treatment with an operation, which had an unadjusted RR of 3.77 (95% CI 0.41 to 34.95; 68 babies; Analysis 3.3). In Ran 2020, the need for operative intervention to achieve reduction was reported by hip rather than by baby, and the data are shown unadjusted in Analysis 3.3 (66 hips).
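When outcomes are reported per hip rather than per baby, hips from the same baby are correlated and an unadjusted analysis overstates the amount of independent information. One common approximate correction, sketched below, divides the number of hips by a design effect of 1 + (m − 1) × ICC, where m is the mean number of hips per baby. The function name, the number of babies and the intracluster correlation (ICC) values are hypothetical; none of the included studies reported an ICC, and this is not the method used in the review's analyses.

```python
def effective_sample_size(n_hips, n_babies, icc):
    """Approximate effective sample size for hip-level data clustered within babies.

    Uses the design effect 1 + (m - 1) * icc, where m is the mean number
    of hips contributed per baby.
    """
    if not 0 <= icc <= 1:
        raise ValueError("icc must lie between 0 and 1")
    if n_babies <= 0 or n_hips < n_babies:
        raise ValueError("expected at least one hip per baby")
    mean_hips_per_baby = n_hips / n_babies
    design_effect = 1 + (mean_hips_per_baby - 1) * icc
    return n_hips / design_effect


# 66 hips as in Analysis 3.3; the 50 babies and the ICC values are hypothetical.
for icc in (0.0, 0.5, 1.0):
    print(icc, round(effective_sample_size(66, 50, icc), 1))
```

At an ICC of 0 the hips are treated as independent, and at an ICC of 1 the effective sample size falls to the number of babies; the true value lies somewhere between these bounds.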

Neither Gou 2021 nor Lyu 2021 reported this outcome.

Complications

Randomised trials

Azzoni 2011 reported no occurrence of avascular necrosis in either group (118 hips, very low‐certainty evidence).

Non‐randomised trials

The Upasani 2016 study found that 5% (10/204) of the hips in this cohort had radiographic evidence of osteonecrosis of the femoral head and eight hips treated with the Pavlik harness had femoral nerve palsy. Wilkinson 2002 reported no sign of avascular necrosis or deformity of the femoral head (68 babies). Ran 2020 reported no events for avascular necrosis and femoral nerve palsy (64 babies). Lyu 2021 reported no events for avascular necrosis and 3 events for femoral nerve palsy for the Pavlik harness group but no events for the Tubingen group (251 babies). No other complications were noted. Data were not included in a forest plot due to no events in either group for avascular necrosis. Gou 2021 did not report this outcome.

Secondary outcomes

None of the six studies reported data on a health economic assessment, bonding between parents and baby, or motor skill development (Azzoni 2011; Gou 2021; Lyu 2021; Ran 2020; Upasani 2016; Wilkinson 2002).

For this comparison, there was no obvious signal to indicate a greater effectiveness of either approach based on the outcomes investigated.

Staged weaning versus immediate removal (post hoc comparison)

Two retrospective studies considered staged weaning of the Pavlik harness compared to removing the harness immediately (Bram 2021; Westacott 2014).

Primary outcomes

Measurement of acetabular index

Westacott 2014 reported that the mean acetabular index at 12 months in the staged weaning group (50 babies) was 26 (range 17 to 39; median 25) compared with 24.5 (range 12 to 35; median 25) in the immediate cessation group (30 babies). At two years, the mean acetabular index was 23.7 (range 16 to 42; median 23; 35 babies) and 24.8 (range 19 to 32; median 24; 11 babies), respectively. The study reported that neither difference was statistically significant. No standard deviations were reported or could be calculated so we could not include the data in a forest plot. In Bram 2021, the mean acetabular index was reported as "not significantly different between the weaned and non weaned cohorts" at one year (53 babies, Analysis 4.1). However, they included bilateral hips in the analysis (from the same babies) so this could bias the analysis.
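To show why missing standard deviations prevent inclusion in a forest plot, the sketch below computes a mean difference and an approximate 95% CI. The point estimate needs only the group means, but the interval needs each group's standard deviation. The function name and the standard deviations of 5 used in the second call are hypothetical and chosen purely for illustration; the means and group sizes are those reported by Westacott 2014 at 12 months.

```python
import math


def mean_difference(mean_a, sd_a, n_a, mean_b, sd_b, n_b, z=1.96):
    """Mean difference (A minus B) with an approximate 95% CI.

    Returns (md, None, None) when either standard deviation is missing,
    because the standard error cannot be calculated without it.
    """
    md = mean_a - mean_b
    if sd_a is None or sd_b is None:
        return md, None, None
    se = math.sqrt(sd_a ** 2 / n_a + sd_b ** 2 / n_b)
    return md, md - z * se, md + z * se


# Staged weaning (mean 26, 50 babies) versus immediate cessation (mean 24.5, 30 babies).
print(mean_difference(26, None, 50, 24.5, None, 30))  # (1.5, None, None)

# Hypothetical standard deviations of 5 in both groups, for illustration only.
print(mean_difference(26, 5, 50, 24.5, 5, 30))
```

Ranges and medians, as reported by the study, do not by themselves supply the standard error required for the second call.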

Need for operative intervention

Westacott 2014 found no difference between groups for both a) to achieve reduction, with an unadjusted RR of 0.69 (95% CI 0.27 to 1.77; 128 babies; Analysis 4.2), and b) to address dysplasia, with an unadjusted RR of 1.80 (95% CI 0.19 to 16.82; 128 babies; Analysis 4.3). The Bram 2021 study did not report data on this outcome.

Complications

Westacott 2014 used the Kalamchi and MacEwen grading system to detect avascular necrosis radiologically at least 12 months after successful harness treatment, giving an unadjusted RR of 1.96 (95% CI 0.23 to 16.73; 82 babies; Analysis 4.4). In the staged weaning group, two babies were grade I, one was grade II and one was grade IV. The baby in the immediate cessation group was grade III. The study authors did not report femoral nerve palsy or other nerve palsies. One complication was reported in the staged weaning group: skin breakdown in the groin crease of a baby with unilateral dysplasia, which was successfully treated with hydrocolloid dressings. Bram 2021 did not report data on this outcome.

Secondary outcomes

Neither Bram 2021 nor Westacott 2014 reported data on health economic assessment, bonding between parents and baby, or motor skill development.

Based on the comparison made, there was no obvious signal to indicate a greater effectiveness of either approach based on the outcomes investigated.

Subgroup analyses and sensitivity analyses

No subgroup analyses or sensitivity analyses were undertaken as we were not able to combine the data in meta‐analyses.

Assessment of reporting biases

We were unable to construct any funnel plots as data were not combined in meta‐analyses.

Discussion

This review found studies to address four main comparison groups for the treatment of babies under six months of age with DDH: dynamic splinting versus static splinting; static splinting versus delayed or no splinting; dynamic splinting versus delayed or no splinting and double nappies versus delayed or no splinting. A fifth, post hoc comparison was also made investigating staged weaning of the splint versus immediate removal.

Conclusions are drawn based on six randomized studies, which included 576 babies, and are supported by an additional 16 non‐randomised studies including 8237 babies. Conclusions are also made in the knowledge that each of the studies contributing to the review is small, especially considering the size of the population affected, and the quality of the studies overall is poor.

Summary of main results

Dynamic splinting versus delayed or no splinting

This was the most commonly reported comparison, addressed in four randomized studies and nine non‐randomised studies. Data were reported for the three primary outcomes of acetabular index, need for operative intervention and complications.

Acetabular index

All four randomized studies (355 babies) reported acetabular index at a number of different time points; 6, 10, 12, and 24 months (very low‐certainty evidence). The outcomes of 6 and 10 months were not originally specified as a time point in the protocol (Dwan 2017), but the data that were reported at an average of 10 months have been included with data at one year.

No study identified a difference in acetabular index when comparing immediate splinting to delayed splinting or no splinting. However, no study included babies with dislocated hips at the time of treatment allocation. Two studies (Rosendahl 2010, 128 babies; Wood 2000, 44 babies) compared the treatment of clinically stable dysplastic hips with immediate versus delayed splinting at six weeks gestational age, whereas Gardiner 1990 (79 babies) included unstable (but not dislocated) hips, comparing immediate versus delayed splinting at two weeks gestational age. There was no evidence to indicate a difference in acetabular index at any time point by delaying the onset of splinting until six weeks (for stable hips), or two weeks (for unstable hips). Furthermore, Pollett 2020 (104 babies) studied older babies at three to four months of age with stable Graf IIb or IIc hips. Initiating Pavlik harness treatment at 12 weeks, versus observation alone, did not improve the acetabular index when measured at 10 months (i.e. 3 months following completion of treatment) or at walking age.

Two non‐randomised studies directly reported acetabular index, and three categorised measures of acetabular index into ‘normal’ or ‘abnormal’. No study found evidence of a difference in acetabular index following planned delays in the onset of splinting.

Surgical intervention

In the randomized trials only, three studies (251 babies) reported no surgical intervention. One study, Pollett 2020 (104 babies), reported the need for surgical intervention (very low‐certainty evidence), with closed reduction and spica cast undertaken in two babies treated by Pavlik harness. Compared with the other randomized trials, this study included older babies aged three to four months at commencement of bracing. None of the RCTs reported long‐term outcomes to skeletal maturity (i.e. the need for surgery on the developing hip for residual dysplasia or its sequelae). Whilst this is important, obtaining these data would require studies to follow up babies for 12 to 14 years.

Of the non‐randomised studies, one explicitly stated that no surgical interventions were undertaken, and three did not comment on surgical intervention. Larson 2019 compared age at initiation of Pavlik harness for unstable and dislocated hips. They found no evidence of an increase in the number needing operative intervention following delays in the initiation of harness treatment beyond 30 days postnatal age. However, there may have been differences in the types of surgery undertaken, and the subsequent morbidity associated with different surgeries.

The Paton 2004 study reported that two of 11 babies (16 hips) required surgery when splinting was delayed to 6 weeks versus none of 37 babies (59 hips) in the early splinting group. Nine of the 16 hips with delayed treatment required splintage after the interval delay. All hips in this study were unstable (clinically dislocatable) and the period of splinting delay was two weeks. The authors used this to advocate for early splinting. Laborie 2014 reported numerous surgical interventions in their large screening study, but did not directly compare the effect of early versus delayed splinting. Wilkinson 2002 reported similar rates of surgery (around 30%) for unstable hips when treated with either a Pavlik harness or no splint at any time point before three months of age.

Overall, there was no evidence of a difference in rates of surgical intervention when delaying the initiation of splinting up to six weeks gestational age for unstable dislocatable hips, though there is ongoing uncertainty amongst dislocated hips.

Complications

Three of the randomized trials (311 babies) observed no complications (one reported no femoral nerve palsy, one reported no avascular necrosis and one reported no complications without referring to any specific complications), and the other did not report complications (very low‐certainty evidence). Of the non‐randomised studies, two reported no avascular necrosis in either group, one reported very few events with no evidence of a difference between the groups and six did not report complications.

Secondary outcomes

Reporting on the secondary outcomes of health economic assessment, bonding between parents and baby, and motor skill development was limited. Several authors commented that delayed splinting can improve parent‐baby bonding, though none presented data to support this. The authors of one study highlighted that delayed splinting may “limit an increasingly mobile child.”

Static splinting versus delayed or no splinting

No randomized studies and only two non‐randomised studies looked at this comparison. It is therefore difficult to draw any conclusions. One study reported no “severe treatment related complications” with static splinting, though did not comment on the outcomes of acetabular index or need for operative intervention. Wilkinson 2002 found no evidence of a difference in acetabular index with splinting compared to no splinting. Hips in this study were either decentred or dislocated at start of treatment. This study also had serious risk of bias due to confounding, classification of the interventions and measurement of the outcome.

The lack of evidence comparing static versus delayed or no splint likely reflects the decreased use of static splinting in the treatment of developmental dysplasia of the hip.

Dynamic splinting versus static splinting

One randomized (118 hips) and three non‐randomised studies looked at dynamic versus static splinting. Of the non‐randomised studies, we deemed Upasani 2016 to be at moderate risk of bias overall; however, it was not designed as a comparison of static versus dynamic splinting (see below). Only 14 babies received a static splint. The other two non‐randomised studies were at an overall serious risk of bias due to serious risk of bias in several domains (Ran 2020; Wilkinson 2002).

The randomized trial (118 hips) reported no occurrence of avascular necrosis (very low‐certainty evidence) but did not report on acetabular index or need for surgery, instead using time to ‘recovery’ (i.e. splint discontinuation) as the primary outcome (Azzoni 2011). This study suggested that dynamic splinting resulted in faster acetabular development, with splints able to be discontinued seven days earlier. It was difficult to draw conclusions from this given the lack of radiological follow‐up data to support it.

The Upasani 2016 study was a prospective multicentre cohort of dislocated hips treated with a brace. Successful brace treatment was defined as a clinically and radiologically reduced hip without the need for surgical intervention. They found success was more likely (P < 0.001) with a dynamic splint (82.6%) versus a static splint (35.7%). However, the majority of babies in the study were treated with a dynamic splint (dynamic 190, static 14). Whilst the effect size appears large, the certainty of this effect is very low given the very small comparator group and the potential for bias. Selection bias was unknown as the method of splint allocation was not discussed. The rate of femoral nerve palsy in this study was 4%, all of which occurred in babies treated with a Pavlik harness (the most common dynamic brace used). This rate is relatively high and may reflect the severity of cases (all hips dislocated at initiation of treatment). The study also reported a 5% avascular necrosis rate but did not offer a comparison identifying which babies had a static or dynamic brace.

Upasani 2016 did not report acetabular index for each group, while Wilkinson 2002 reported the percentage of hips with an acetabular angle of 28 degrees or more between 6 and 12 months. This study involved unstable Graf III or IV hips, and demonstrated evidence of a difference in the acetabular angle between groups, though the rate of hip spica surgery was higher in the Pavlik harness group. Interpretation of the findings is hampered by the wide variation in baseline characteristics.

Ran 2020 reported no evidence of a difference in acetabular index at final follow‐up (minimum two years) between static and dynamic splinting despite the heterogeneous severity in their treatment groups. Drawing conclusions from this study is difficult, as it includes babies ranging from mild ultrasound dysplasia to frank dislocation, so the numbers in each group are small.

None of the studies reported any of the planned secondary outcomes.

Staged weaning versus immediate removal

This post hoc comparison was considered by two retrospective studies, which reported no evidence of a difference in acetabular index at 12 or 24 months and no evidence of a difference in need for surgical intervention and complications.

Overall completeness and applicability of evidence

The majority of studies compared splinting in stable hips, which reflects the fact that it is an ongoing controversy in the treatment of babies with hip dysplasia. There were no randomized studies to consider the treatment of unstable hips.

Amongst stable hips, early versus late dynamic splinting was the most common comparison made. This is a comparison which is readily achievable and for which community equipoise is apparent; however, although there was evidence, the certainty of the evidence was very low. For the other interventions important within this review (i.e. early versus late static splinting, static versus dynamic splinting and weaning versus no weaning), there were no studies or high‐quality, observational research to guide treatment.

We carefully selected outcomes important to both clinicians and families. No study addressed the outcomes that were important to families (i.e. the ability to breastfeed, and the parent‐baby bond). Other studies partially assessed the outcomes, though there was no consistency as to which outcome was recorded at which time point. The inconsistencies in the timing and reporting of outcomes contributed to the difficulties in evidence synthesis.

Studies were included from a multitude of countries worldwide. However, all of the randomized controlled studies were from Europe (UK, Norway, Italy, the Netherlands) or Taiwan and only one included multiple centres. Non‐randomised studies were included from Europe, North America, Australia, China and Africa.

Babies in the studies covered the full spectrum of DDH with stable, unstable and dislocated hips included. This is positive as it encompasses all of the potential babies that we aimed to include. However, in some studies, the groups were mixed. This made drawing comparisons more difficult as the optimal treatment of the stable and unstable dysplastic hip is likely to differ.

Long‐term outcomes were not a focus within this review. Functional mobility, the development of osteoarthritis and the subsequent need for arthroplasty were not recorded. The outcomes used, such as the alpha angle on pelvic radiographs, were surrogate markers for these long‐term outcomes. However, we acknowledge the limitations in the use of surrogate measures.

Quality of the evidence

Having considered both randomized and non‐randomised studies, the overall certainty of evidence was very low. Three studies provided very low‐certainty evidence (Pollett 2020; Rosendahl 2010; Wood 2000), but had well‐defined inclusion groups that only included stable hips, and reported the acetabular index. The certainty of the evidence from Azzoni 2011 was very low, as they had a mixed population of both stable and unstable hips. They also reported time to discontinuation of splint as their main outcome and did not report acetabular index.

All included randomized studies but one were conducted in single centres, with relatively small numbers of babies (44 to 128). Due to this and there being so few RCTs, the certainty of the evidence was downgraded twice for imprecision. Generally, the reporting of measures to reduce bias was poor and many areas were determined to be at unclear or high risk of bias, so the certainty of evidence was also downgraded for risk of bias. Only two of the randomized studies properly described their randomization procedure. Azzoni 2011 stated it was “double blind” with no further explanation; two randomized studies explicitly stated that the assessors were blinded to the intervention (Gardiner 1990; Lee 2022). Two studies had complete outcome data with low risk of attrition bias (Azzoni 2011; Rosendahl 2010), but three studies were at high risk of attrition bias (Gardiner 1990; Pollett 2020; Wood 2000). In Pollett 2020, consent was withdrawn for 33 babies, since parents decided to alter the allocated treatment. None of the randomized studies had an accessible protocol or trial registration to allow for the assessment of reporting bias, though this is largely reflective of the age of the studies.
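The arithmetic behind the resulting rating can be sketched as follows: evidence from randomized trials starts at high certainty, and one level is removed for each downgrade. The function and variable names are illustrative assumptions rather than part of any GRADE software, and the sketch ignores upgrading criteria.

```python
GRADE_LEVELS = ["very low", "low", "moderate", "high"]


def grade_certainty(start="high", downgrades=None):
    """Return the GRADE certainty after subtracting the listed downgrades.

    `downgrades` maps a domain name to the number of levels removed.
    The rating cannot fall below "very low".
    """
    downgrades = downgrades or {}
    index = GRADE_LEVELS.index(start) - sum(downgrades.values())
    return GRADE_LEVELS[max(index, 0)]


# One level for risk of bias and two levels for imprecision, as described above.
print(grade_certainty("high", {"risk of bias": 1, "imprecision": 2}))  # very low
```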

Publication bias could not be assessed, as not enough studies were included to produce a funnel plot, but a thorough search was conducted and the inclusion of non‐randomised studies may have reduced the impact of publication bias.

Table 2 shows the risk of bias of the non‐randomised studies. They were all at moderate risk of bias, at least, with six having serious or critical risk. This was mainly because pre‐intervention confounders were not controlled for, or because babies were identified retrospectively for inclusion in the study. Serious risk of bias also arose in the measurement of outcome domain, where different methods of assessment were used by different assessors, assessments were unblinded, or it was unclear who undertook the assessments at follow‐up.

Potential biases in the review process

We attempted to overcome bias in the process. We minimised selection bias by having a comprehensive search strategy followed by manual screening. A full protocol was registered and published prior to commencement of the search (Dwan 2017). We included searches of conference abstracts to identify further studies. Studies were assessed by two assessors – one with a clinical and the other with a scientific background. Recognised assessment tools were utilised for assessment, as detailed in the protocol.

This review was not without challenges. An initial scoping review identified very few randomized controlled trials in this area, which led to the decision to include non‐randomised studies. However, it was difficult to discern cohort studies from large case series, as the methodological quality was generally poor. None of the cohort studies included pre‐registration details or had published protocols; therefore, a decision was made amongst the authors as to which were considered to be cohort studies. Also, data from non‐randomised studies are more prone to bias but the included studies have been assessed using an up‐to‐date risk of bias tool that compares non‐randomised studies to a target trial.

The search resulted in a small collection of disparate, poor‐quality trials and poor‐quality observational studies. We were consequently faced with decisions about how best to summarise this heterogeneous body of studies. We judged that a narrative approach was the most appropriate method of data synthesis, considering trials and observational studies separately.

We attempted to contact authors to acquire additional data or clarifications. The response rate was poor and therefore we included studies based on our best assessment of the conduct and results.

Study outcomes were rarely reported at the time points specified within our protocol. We therefore took a pragmatic approach, whereby we made decisions to broaden the window for reporting (i.e. outcomes reported at 10 months were considered in the one‐year analysis). Whilst we acknowledge that these decisions are to some extent arbitrary, the clinicians felt that a review at 10 months would be considered the one‐year review in routine clinical practice.

Ideally, reported results would be from high‐quality RCTs with consistent time points for outcome measurement. As this was not the case, it clearly impacts on the synthesis of the results when comparing across studies. However, for the randomized studies, there was a limited difference in the effect of the different time points reported and overall, it had limited impact on the messages in the results.

When drawing conclusions, we focused on data from randomized studies, as documented in the summary of findings tables, and supported this with further data from other studies in the text. We avoided drawing conclusions not supported by randomized study data due to the bias associated with the included non‐randomised studies. As further randomized studies are published, we anticipate further decreasing the focus on non‐randomised data.

Agreements and disagreements with other studies or reviews

There is an abundance of low‐certainty studies (i.e. case series) surrounding the non‐operative management of hip dysplasia. However, the absence of high‐certainty evidence alongside the paucity of comparative studies makes this difficult to interpret. The American Academy of Orthopaedic Surgeons commissioned a review of the non‐operative treatments of hip dysplasia in babies up to six months of age (Mulpuri 2015). This review found limited evidence to support observation without splinting for babies with a clinically stable hip with ultrasound abnormalities, limited evidence to support either immediate or delayed (two to nine weeks) brace treatment for hips with clinical instability and limited evidence to support the type of brace used. The results are therefore in keeping with ours.

A recent review by Ashoor 2021 used "treatment failure" as the primary outcome. The paper attempts to attribute relative success of different splint types according to their rate of treatment failure, using pooled data from the included studies. The authors do not define "treatment failure" or report how this varies across the different studies. They focused on comparing splints from different manufacturers and did not consider splinting regimen, such as delayed splinting. No meta‐analysis is reported. They did not report inclusion criteria for babies, such as the presence of neuromuscular conditions. The included studies comprised randomized trials and case series. They concluded that the Von Rosen splint was superior to other devices, but we found no robust evidence to support this. They clearly acknowledge the lack of certainty in the evidence from the included studies and call for comparative RCTs to address the question of the best splint to use in the treatment of DDH.

Figure 1. PRISMA flow diagram

Figure 2. Risk of bias plot for RCTs

Figure 3. ROBINS‐I plot: acetabular index at one year

Figure 4. ROBINS‐I plot: need for surgical open reduction

Analysis 1.1. Comparison 1: Dynamic splinting versus delayed or no splinting, Outcome 1: Acetabular index: angle (RCTs)

Analysis 1.2. Comparison 1: Dynamic splinting versus delayed or no splinting, Outcome 2: Acetabular index: angle (non‐RCTs)

Analysis 1.3. Comparison 1: Dynamic splinting versus delayed or no splinting, Outcome 3: Need for operative intervention

Analysis 1.4. Comparison 1: Dynamic splinting versus delayed or no splinting, Outcome 4: Avascular necrosis

Analysis 2.1. Comparison 2: Static splinting versus delayed or no splinting, Outcome 1: Acetabular index: angle ≥ 28° (non‐RCTs)

Analysis 2.2. Comparison 2: Static splinting versus delayed or no splinting, Outcome 2: Need for operative intervention (non‐RCTs)

Analysis 3.1. Comparison 3: Dynamic splinting versus static splinting, Outcome 1: Acetabular index: angle ≥ 28° (non‐RCTs)

Analysis 3.2. Comparison 3: Dynamic splinting versus static splinting, Outcome 2: Acetabular index: angle (non‐RCTs)

Analysis 3.3. Comparison 3: Dynamic splinting versus static splinting, Outcome 3: Need for operative intervention (non‐RCTs)

Analysis 4.1. Comparison 4: Staged weaning versus immediate removal (post hoc comparison), Outcome 1: Acetabular index: angle (non‐RCT)

Analysis 4.2. Comparison 4: Staged weaning versus immediate removal (post hoc comparison), Outcome 2: Need for operative intervention to achieve reduction (non‐RCTs)

Analysis 4.3. Comparison 4: Staged weaning versus immediate removal (post hoc comparison), Outcome 3: Need for operative intervention to address dysplasia (non‐RCTs)

Analysis 4.4. Comparison 4: Staged weaning versus immediate removal (post hoc comparison), Outcome 4: Avascular necrosis (non‐RCTs)

Summary of findings 1. Dynamic splinting versus delayed or no splinting for the non‐operative management of developmental dysplasia of the hip in babies under six months of age

Outcomes	№ of babies (Studies) Follow up	Certainty of the evidence (GRADE)	Impact
Dynamic splinting versus delayed or no splinting for the non‐operative management of developmental dysplasia of the hip in babies under six months of age
Patient or population: babies under six months of age with all severities of DDH Setting: hospital Intervention: dynamic splinting Comparison: delayed or no splinting
Measurement of acetabular index at 1 year Assessed with: radiographs (angle)	265 (2 RCTs)	⊕⊝⊝⊝ Very low^a,b	One study (stable hips) presented data at one year (MD 0.10, 95% CI −0.74 to 0.94), accounting for correlated observations from hips from the same baby. Another study (stable hips) reported an MD 0.20 (95% CI −1.65 to 2.05) but did not take into account hips from the same baby in the case of bilateral hip dysplasia, so the data were not combined.
Measurement of acetabular index at 2 years Assessed with: radiographs (angle)	181 (2 RCTs)	⊕⊝⊝⊝ Very low^a,b	One study (stable hips) reported an MD −1.90 (95% CI −4.76 to 0.96). Another study (stable hips) reported an MD −0.10 (95% CI −1.93 to 1.73) but did not take into account hips from the same baby in the case of bilateral hip dysplasia, so the data were not combined.
Measurement of acetabular index at 5 years Assessed with: radiographs (angle)	0 (0 RCTs)	‐	No studies reported data at this time point.
Need for operative intervention at study follow up (range 12 weeks to 1 year)	434 (4 RCTs)	⊕⊝⊝⊝ Very low^a,b	Three studies reported no surgical intervention. In a further study, two babies developed instability in the Pavlik harness group and were subsequently treated with closed reduction and spica cast. It is not explicitly stated if this was to achieve concentric reduction or address residual dysplasia.
Complications: avascular necrosis and femoral nerve palsy at study follow up (range 12 weeks to one year) Assessed with: grading systems (not stated)	390 (3 RCTs)	⊕⊝⊝⊝ Very low^a,b	One study found that "over the period of follow‐up, no complications of treatment were observed, and none of the children developed abnormal clinical findings on hip examination." One study reported no avascular necrosis in either group and another study reported no femoral nerve palsy in either group.
*The risk in the intervention group (and its 95% CI) is based on the assumed risk in the comparison group and the relative effect of the intervention (and its 95% CI). CI: confidence interval; DDH: developmental dysplasia of the hip; MD: mean difference; RCT: randomized controlled trial
GRADE Working Group grades of evidence High certainty: we are very confident that the true effect lies close to that of the estimate of the effect Moderate certainty: we are moderately confident in the effect estimate; the true effect is likely to be close to the estimate of the effect, but there is a possibility that it is substantially different Low certainty: our confidence in the effect estimate is limited; the true effect may be substantially different from the estimate of the effect Very low certainty: we have very little confidence in the effect estimate; the true effect is likely to be substantially different from the estimate of effect
^aWe downgraded the certainty of the evidence by one level for risk of bias, as studies were at high or unclear risk of bias for selective reporting, sequence generation, allocation concealment and blinding due to limited details reported in the trial reports, and high risk of bias due to incomplete outcome data. ^bWe downgraded the certainty of the evidence by two levels for imprecision, due to the small number of included studies and babies.
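As a reading aid for the mean differences quoted in the table above, the following minimal sketch back-calculates the standard error implied by a reported 95% confidence interval. It assumes a symmetric, normal-approximation interval; the function name `se_from_ci` and the labels "study A" and "study B" are illustrative only and do not come from the review. The two results are kept separate, in line with the decision not to combine them.

```python
from statistics import NormalDist


def se_from_ci(lower: float, upper: float, level: float = 0.95) -> float:
    """Standard error implied by a symmetric normal-approximation CI."""
    z = NormalDist().inv_cdf(0.5 + level / 2)  # roughly 1.96 for a 95% interval
    return (upper - lower) / (2 * z)


# Two-year acetabular index results as quoted in the table: (MD, lower, upper).
# The study labels are placeholders, not names used in the review.
two_year_results = {
    "study A": (-1.90, -4.76, 0.96),
    "study B": (-0.10, -1.93, 1.73),
}

for label, (md, lower, upper) in two_year_results.items():
    se = se_from_ci(lower, upper)
    print(f"{label}: MD {md:+.2f}, implied SE {se:.2f}")
```

A wider interval implies a larger standard error, which is one way to see the imprecision that contributed to the very low certainty rating.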

Summary of findings 2. Dynamic splinting versus static splinting for the non‐operative management of developmental dysplasia of the hip in babies under six months of age

Outcomes	№ of babies (studies)	Certainty of the evidence (GRADE)	Impact
Dynamic splinting versus static splinting for the non‐operative management of developmental dysplasia of the hip in babies under six months of age
Patient or population: babies under six months of age with stable and unstable hips Setting: hospitals Intervention: dynamic splinting Comparison: static splinting
Measurement of acetabular index at 1 year Assessed with: radiographs (angle)	0 (0 RCTs)	‐	No data presented and it is unclear if the outcome was measured.
Measurement of acetabular index at 2 years Assessed with: radiographs (angle)	0 (0 RCTs)	‐	No data presented and it is unclear if the outcome was measured.
Measurement of acetabular index at 5 years Assessed with: radiographs (angle)	0 (0 RCTs)	‐	No data presented and it is unclear if the outcome was measured.
Need for operative intervention	0 (0 RCTs)	‐	No data presented and it is unclear if the outcome was measured.
Complications: avascular necrosis at 4 months Assessed with: grading systems (not stated)	118 hips (1 RCT)	⊕⊝⊝⊝ Very low^a,b	One RCT reported no occurrence of avascular necrosis in either group.
*The risk in the intervention group (and its 95% CI) is based on the assumed risk in the comparison group and the relative effect of the intervention (and its 95% CI). CI: confidence interval; DDH: developmental dysplasia of the hip; RCT: randomized controlled trial
GRADE Working Group grades of evidence High certainty: we are very confident that the true effect lies close to that of the estimate of the effect Moderate certainty: we are moderately confident in the effect estimate; the true effect is likely to be close to the estimate of the effect, but there is a possibility that it is substantially different Low certainty: our confidence in the effect estimate is limited; the true effect may be substantially different from the estimate of the effect Very low certainty: we have very little confidence in the effect estimate; the true effect is likely to be substantially different from the estimate of effect
^aWe downgraded the certainty of the evidence by one level for risk of bias, as we judged risk of bias as generally unclear in all domains except incomplete outcome data, due to limited details reported in the trial report. ^bWe downgraded the certainty of the evidence by two levels for imprecision, due to there only being one small study.
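The arithmetic behind the rating in footnotes a and b can be sketched as follows. This is an illustration only, assuming the usual GRADE convention that evidence from randomized trials starts at high certainty; the function name `grade_certainty` is hypothetical and not part of any GRADE software.

```python
# GRADE certainty levels, ordered from lowest to highest.
GRADE_LEVELS = ["Very low", "Low", "Moderate", "High"]


def grade_certainty(start: str, downgrades: dict[str, int]) -> str:
    """Lower the starting certainty by the total number of levels,
    never going below 'Very low'."""
    index = GRADE_LEVELS.index(start) - sum(downgrades.values())
    return GRADE_LEVELS[max(index, 0)]


# Footnotes a and b: one level for risk of bias, two levels for imprecision.
rating = grade_certainty("High", {"risk of bias": 1, "imprecision": 2})
print(rating)
```

Starting from high certainty, three levels of downgrading in total lead to the very low rating shown in the table.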

Table 1. ORBIT matrix

Study	Measurement of acetabular index	Need for operative intervention	Avascular necrosis	Femoral nerve palsy/other nerve palsies	Pressure areas on skin	Health economic assessment	Bonding between parents and child	Motor skill development	Other outcomes
Azzoni 2011	x	Reported	Reported	x	x	x	x	x	Time to recovery
Bergo 2013	x	x	x	x	x	x	x	x	Psychosocial outcomes, anxiety
Bram 2021	Reported	x	x	x	x	x	x	x	Time spent in harness
Gardiner 1990	x	Reported	Reported	x	x	x	x	x	Abnormal hips
Gou 2021	Reported	x	x	x	x	x	x	x	Success/failure
Kim 2019	Reported	Reported	x	x	x	x	x	x	None
Laborie 2014	Measured	Reported	Reported	x	x	x	x	x	None
Larson 2019	x	Reported	x	x	x	x	Reported	x	Success/failure
Lee 2022	x	x	x	x	x	x	x	x	Alpha angle at 1 month, rate of improvement to Graf type I hips in 1 month, any problems or morbidities in the study period, and number of ultrasound examinations and orthopaedic clinic visits in the first year
Lyu 2021	Reported	x	Reported	Reported	x	x	x	x	Time needed to achieve Graf type IIb
Munkhuu 2013	x	x	x	x	x	x	x	x	Development of hips, complications
Murphy 2017	x	Partially reported	x	x	x	x	x	x	Resolution of dysplasia on subsequent imaging and failure of resolution or deterioration on subsequent imaging
Paton 2004	x	Reported	Reported	x	x	x	x	x	Late splintage
Pollett 2020	Reported	Reported	x	Reported	x	x	x	x	Bony roof angle, modified Tönnis classification
Ran 2020	Reported	Reported	Reported	Reported	x	x	x	x	Failure/success, center‐edge angle
Reikerås 2002	Reported	x	x	x	x	x	x	x	Provokable instability, beta angles
Rosendahl 2010	Reported	NA	Reported	Reported	Reported	x	x	x	None
Sucato 1999	Reported	x	x	x	x	x	x	x	None
Upasani 2016	Partially reported	Reported	Reported	Reported	x	x	x	x	Osteonecrosis
Westacott 2014	Reported	Reported	Reported	x	x	x	x	x	Retreatment, other complications, successful treatment
Wilkinson 2002	x	Reported	Reported	x	x	x	x	x	Number with acetabular angle ≥ 28°; improvement on ultrasound; further treatment with an abduction plaster; deformities
Wood 2000	Reported	Reported	x	x	x	x	x	x	Acetabular cover

Table 2. ROBINS‐I

Bias domain	Bias due to confounding	Bias in selection of participants into the study	Bias in the classification of interventions	Bias due to departures from intended interventions	Bias due to missing data	Bias in measurement of outcomes	Bias in selection of the reported result	Overall
*Acetabular index at one year*
Bram 2021	Serious	Moderate	Low	Moderate	Serious	Moderate	Moderate	Serious
Kim 2019	Moderate	No information	Low	Moderate	Moderate	Low	Moderate	Moderate
Murphy 2017	No information	Low	Low	No information	No information	Moderate	Moderate	Moderate
Paton 2004	Serious	Moderate	Low	Moderate	Moderate	Serious	Moderate	Serious
Sucato 1999	Low	Serious	Low	Moderate	Moderate	Moderate	Moderate	Serious
Upasani 2016	Low	Low	Low	Moderate	Moderate	Moderate	Moderate	Moderate
Wilkinson 2002	Serious	Moderate	Serious	No information	Moderate	Serious	Moderate	Serious
*Need for surgical open reduction*
Kim 2019	Moderate	No information	Low	Moderate	Moderate	Low	Moderate	Moderate
Laborie 2014	Critical	Low	Moderate	Low	Moderate	Moderate	Moderate	Critical
Larson 2019	Serious	Serious	Low	Moderate	Moderate	Moderate	Moderate	Serious
Murphy 2017	No information	Low	Low	No information	No information	Moderate	Moderate	Moderate
Paton 2004	Serious	Moderate	Low	Moderate	Moderate	Serious	Moderate	Serious
Ran 2020	Serious	Serious	Serious	Low	Serious	Low	Low	Serious
Upasani 2016	Low	Low	Low	Moderate	Moderate	Moderate	Moderate	Moderate
Wilkinson 2002	Serious	Moderate	Serious	No information	Moderate	Serious	Moderate	Serious

Table 3. Dynamic splinting versus delayed or none

Study	Design	Intervention	Comparator
Bergo 2013	Cross‐sectional study	Early splinting (Frejka pillow)	Late splinting
Gardiner 1990	Quasi‐RCT	Immediate splinting Sonographic surveillance for 2 weeks	Control
Kim 2019	Prospective	Pavlik	Observed
Laborie 2014	Observational	Abduction splint (Frejka splint): persistent dislocated or dislocatable	Watchful waiting: clinically or ultrasound unstable but not dislocatable hips
Larson 2019	Retrospective	Pavlik harness	Groups were divided based on the age at which the Pavlik harness was initiated: group 1 = < 30 days; group 2 = 30 to 60 days; group 3 = > 60 days
Murphy 2017	Retrospective	Pavlik harness	Followed up without treatment
Paton 2004	Prospective	Early splinting (Pavlik)	Follow up with ultrasound
Pollett 2020	RCT	Pavlik harness	Active surveillance
Reikerås 2002	Babies 'divided' into 2 groups	Frejka pillow for 16 weeks	Untreated
Rosendahl 2010	RCT	Immediate abduction splinting for at least 6 weeks (Frejka pillow splint with sonographic follow up)	Active sonographic surveillance but no treatment for 6 weeks
Sucato 1999	Retrospective review (observational)	Pavlik (chosen at the discretion of the treating physician)	No treatment
Wilkinson 2002	Retrospective	Pavlik	Not splinted
Wood 2000	RCT	Pavlik	No splint
RCT: randomised controlled trial

Table 4. Static splinting versus delayed or none

Study	Design	Intervention	Comparator
Munkhuu 2013	Prospective cohort	Type 2c‐4: Tubingen hip flexion splint	Type 2a: ultrasound follow‐up
Wilkinson 2002	Retrospective	Craig; Von Rosen	Not splinted

Table 5. Dynamic versus static splinting

Study	Design	Intervention	Comparator
Azzoni 2011	RCT	Static: Teuffel Mignon	Dynamic: Coxa‐flex
Gou 2021	Retrospective cohort	Static: Human Brace	Dynamic: Pavlik harness
Lyu 2021	Retrospective cohort	Static: Tubingen	Dynamic: Pavlik harness
Ran 2020	Retrospective cohort	Static: Tubingen	Dynamic: Pavlik harness
Upasani 2016	Prospective cohort	Static: brace treatment (Denis Browne, Von Rosen, Plastazote)	Dynamic: Pavlik harness
Wilkinson 2002	Retrospective cohort	Static: Craig; Von Rosen	Dynamic: Pavlik harness
RCT: randomised controlled trial

Comparison 1. Dynamic splinting versus delayed or no splinting

Outcome or subgroup title	No. of studies	No. of participants	Statistical method	Effect size
1.1 Acetabular index: angle (RCTs)	4		Mean Difference (IV, Fixed, 95% CI)	Totals not selected
1.1.1 Six months	1		Mean Difference (IV, Fixed, 95% CI)	Totals not selected
1.1.2 One year	2		Mean Difference (IV, Fixed, 95% CI)	Totals not selected
1.1.3 Two years	2		Mean Difference (IV, Fixed, 95% CI)	Totals not selected
1.2 Acetabular index: angle (non‐RCTs)	1		Mean Difference (IV, Fixed, 95% CI)	Totals not selected
1.2.1 Two years	1		Mean Difference (IV, Fixed, 95% CI)	Totals not selected
1.3 Need for operative intervention	6		Risk Ratio (M‐H, Fixed, 95% CI)	Totals not selected
1.3.1 RCT	3		Risk Ratio (M‐H, Fixed, 95% CI)	Totals not selected
1.3.2 Prospective study	2		Risk Ratio (M‐H, Fixed, 95% CI)	Totals not selected
1.3.3 Retrospective study	1		Risk Ratio (M‐H, Fixed, 95% CI)	Totals not selected
1.4 Avascular necrosis	4		Risk Ratio (M‐H, Fixed, 95% CI)	Totals not selected
1.4.1 Quasi RCT	1		Risk Ratio (M‐H, Fixed, 95% CI)	Totals not selected
1.4.2 Prospective study	1		Risk Ratio (M‐H, Fixed, 95% CI)	Totals not selected
1.4.3 Retrospective study	1		Risk Ratio (M‐H, Fixed, 95% CI)	Totals not selected
1.4.4 Observational study	1		Risk Ratio (M‐H, Fixed, 95% CI)	Totals not selected
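Because totals were not selected, each study in the table above contributes its own risk ratio. The sketch below shows one common way to compute a single-study risk ratio with a 95% confidence interval on the log scale. The counts are hypothetical and chosen only for illustration, the function name `risk_ratio` is an assumption, and no continuity correction is applied, so the sketch does not cover studies with zero events in a group (as several studies in this review reported).

```python
import math


def risk_ratio(events_t: int, total_t: int, events_c: int, total_c: int,
               z: float = 1.96) -> tuple[float, float, float]:
    """Risk ratio and normal-approximation CI computed on the log scale."""
    if events_t == 0 or events_c == 0:
        # Zero cells need a continuity correction or make the ratio
        # not estimable; that handling is deliberately left out here.
        raise ValueError("events are required in both groups for this sketch")
    rr = (events_t / total_t) / (events_c / total_c)
    se_log_rr = math.sqrt(1 / events_t - 1 / total_t + 1 / events_c - 1 / total_c)
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Hypothetical counts, not taken from any included study:
# 4 of 50 events with splinting versus 8 of 50 events without.
rr, lower, upper = risk_ratio(4, 50, 8, 50)
print(f"RR {rr:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```

An interval that spans 1, as in this made-up example, would be compatible with either benefit or harm.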

Comparison 2. Static splinting versus delayed or no splinting

Outcome or subgroup title	No. of studies	No. of participants	Statistical method	Effect size
2.1 Acetabular index: angle ≥ 28° (non‐RCTs)	1		Risk Ratio (M‐H, Fixed, 95% CI)	Totals not selected
2.2 Need for operative intervention (non‐RCTs)	1		Risk Ratio (M‐H, Fixed, 95% CI)	Totals not selected

Comparison 3. Dynamic splinting versus static splinting

Outcome or subgroup title	No. of studies	No. of participants	Statistical method	Effect size
3.1 Acetabular index: angle ≥ 28° (non‐RCTs)	1		Risk Ratio (M‐H, Fixed, 95% CI)	Totals not selected
3.2 Acetabular index: angle (non‐RCTs)	2		Mean Difference (IV, Fixed, 95% CI)	Totals not selected
3.2.1 Less than 6 months	1		Mean Difference (IV, Fixed, 95% CI)	Totals not selected
3.2.2 2 years	1		Mean Difference (IV, Fixed, 95% CI)	Totals not selected
3.3 Need for operative intervention (non‐RCTs)	3		Risk Ratio (M‐H, Fixed, 95% CI)	Totals not selected
3.3.1 Prospective cohort	1		Risk Ratio (M‐H, Fixed, 95% CI)	Totals not selected
3.3.2 Retrospective study	2		Risk Ratio (M‐H, Fixed, 95% CI)	Totals not selected

Comparison 4. Staged weaning versus immediate removal (post hoc comparison)

Outcome or subgroup title	No. of studies	No. of participants	Statistical method	Effect size
4.1 Acetabular index: angle (non‐RCT)	1		Mean Difference (IV, Fixed, 95% CI)	Totals not selected
4.1.1 Hips with positive Ortolani sign	1		Mean Difference (IV, Fixed, 95% CI)	Totals not selected
4.1.2 Stable hips	1		Mean Difference (IV, Fixed, 95% CI)	Totals not selected
4.2 Need for operative intervention to achieve reduction (non‐RCTs)	1		Risk Ratio (M‐H, Fixed, 95% CI)	Totals not selected
4.3 Need for operative intervention to address dysplasia (non‐RCTs)	1		Risk Ratio (M‐H, Fixed, 95% CI)	Totals not selected
4.4 Avascular necrosis (non‐RCTs)	1		Risk Ratio (M‐H, Fixed, 95% CI)	Totals not selected
