Early intensive behavioral intervention (EIBI) for young children with autism spectrum disorders (ASD)

Summary of findings for the main comparison. Early intensive behavioral intervention (EIBI) for young children with autism spectrum disorders (ASD)

Patient or population: young children (less than six years old) with autism spectrum disorders (ASD)
Settings: families' homes
Intervention: early intensive behavioral intervention (EIBI)
Comparison: treatment as usual (TAU)
Outcomes	Illustrative comparative risks* (95% CI)		Relative effect (95% CI)	Number of participants (studies)	Quality of the evidence (GRADE)	Comments
	Assumed risk	Corresponding risk
	TAU	EIBI
Adaptive behavior Measured by: Vineland Adaptive Behavior Scales (parent‐reported scale; mean = 100 (SD = 15); higher score equates to better outcomes) Follow‐up: 2 to 3 years	The mean adaptive behavior score ranged across control groups from 48.60 points to 67.10 points	The mean adaptive behavior score in the intervention groups was, on average, 9.58 points higher (5.57 points higher to 13.6 points higher)	‐	202 (5 studies)	⊕⊕⊝⊝ Low^1,2,3	‐
Autism symptom severity Measured by: parent‐reported autism symptoms on standardised autism screening and diagnostic instruments (lower scores indicate less severe autism symptoms) Follow‐up: 2 years	‐	The mean autism symptom severity score in the intervention groups was 0.34 standard deviations lower (0.79 standard deviations lower to 0.11 standard deviations higher)	‐	81 (2 studies)	⊕⊝⊝⊝ Very low^3,4	General guidelines for the magnitude of an effect suggest that effect sizes of 0.20 to 0.50 are considered to have a small effect, effect sizes of 0.50 to 0.80 are considered to have a medium effect, and effect sizes greater than 0.80 are considered to have a large effect (Cohen 1988)
Adverse effects Measured by: worsening of adaptive behavior or autism symptom severity Follow‐up: 2 to 3 years	No adverse events were reported in any study
Intelligence Measured by: standardized IQ tests (mean = 100 (SD = 15); higher scores indicate higher IQ) Follow‐up: 2 to 3 years	The mean IQ score ranged across control groups from 49.67 points to 73.20 points	The mean IQ score in the intervention groups was, on average, 15.44 points higher (9.29 points higher to 21.59 points higher)	‐	202 (5 studies)	⊕⊕⊝⊝ Low^1,2,3	‐
Communication and language skills: expressive language Measured by: standardized measures of expressive language (higher scores indicate better expressive language skills) Follow‐up: 2 to 3 years	‐	The mean expressive language score in the intervention groups was 0.51 standard deviations higher (0.12 standard deviations higher to 0.90 standard deviations higher)	‐	165 (4 studies)	⊕⊕⊝⊝ Low^1,3,5	General guidelines for the magnitude of an effect suggest that effect sizes of 0.20 to 0.50 are considered to have a small effect, effect sizes of 0.50 to 0.80 are considered to have a medium effect, and effect sizes greater than 0.80 are considered to have a large effect (Cohen 1988)
Communication and language skills: receptive language Measured by: standardized measures of receptive language (higher scores indicate better receptive language skills) Follow‐up: 2 to 3 years	‐	The mean receptive language score in the intervention groups was 0.55 standard deviations higher (0.23 standard deviations higher to 0.87 standard deviations higher)	‐	164 (4 studies)	⊕⊕⊝⊝ Low^1,3,5	General guidelines for the magnitude of an effect suggest that effect sizes of 0.20 to 0.50 are considered to have a small effect, effect sizes of 0.50 to 0.80 are considered to have a medium effect, and effect sizes greater than 0.80 are considered to have a large effect (Cohen 1988)
Problem behavior Measured by: standardized parent‐report measures and checklists (lower scores indicate lower levels or less severe problem behavior) Follow‐up: 2 to 3 years	‐	The mean problem behavior score in the intervention groups was 0.58 standard deviations lower (1.24 standard deviations lower to 0.07 standard deviations higher)	‐	67 (2 studies)	⊕⊝⊝⊝ Very low^3,6	General guidelines for the magnitude of an effect suggest that effect sizes of 0.20 to 0.50 are considered to have a small effect, effect sizes of 0.50 to 0.80 are considered to have a medium effect, and effect sizes greater than 0.80 are considered to have a large effect (Cohen 1988)
*The basis for the assumed risk (e.g. the median control group risk across studies) is provided in footnotes. The corresponding risk (and its 95% CI) is based on the assumed risk in the comparison group and the relative effect of the intervention (and its 95% CI).
ASD: autism spectrum disorders; CCT: controlled clinical trial; CI: confidence interval; EIBI: early intensive behavioral intervention; IQ: intelligence quotient; RCT: randomized controlled trial
GRADE Working Group grades of evidence High quality: Further research is very unlikely to change our confidence in the estimate of effect. Moderate quality: Further research is likely to have an important impact on our confidence in the estimate of effect and may change the estimate. Low quality: Further research is very likely to have an important impact on our confidence in the estimate of effect and is likely to change the estimate. Very low quality: We are very uncertain about the estimate.
¹One study was conducted using an RCT design (Smith 2000) and four studies were conducted using a CCT design (Cohen 2006; Howard 2014; Magiati 2007; Remington 2007). Quality of evidence rating downgraded two levels due to inclusion of non‐randomized studies and associated risks of bias.
²Outcome collected in four of five studies by assessors who were blind to treatment status of participants.
³The small number of included studies precluded examination of a funnel plot; therefore, we cannot exclude the potential for publication bias.
⁴Both studies were conducted using a CCT design (Magiati 2007; Remington 2007). Quality of evidence rating downgraded three levels due to inclusion of non‐randomized studies, associated risks of bias, and small number of included studies.
⁵Outcomes collected in three of the four studies by assessors who were blind to treatment status of participants.
⁶One study was conducted using an RCT design (Smith 2000) and one study was conducted using a CCT design (Remington 2007). Quality of evidence rating downgraded three levels due to inclusion of non‐randomized studies, associated risks of bias, and a small number of included studies.
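
As a reading aid for the table above, the following minimal sketch applies the Cohen 1988 thresholds quoted in the Comments column to the reported standardized mean differences, and re‐expresses the two point‐scale results in units of the stated normative SD of 15. The function names are illustrative and not part of the review. Labeling values below 0.20 as 'negligible' and assigning the boundary values 0.50 and 0.80 to the higher band are assumptions, since the quoted guideline leaves them unspecified. Dividing by the normative SD is only a rough rescaling; it is not the standardized mean difference that a meta‐analysis would compute from sample SDs.

```python
def cohen_label(smd: float) -> str:
    """Label the magnitude of a standardized mean difference (Cohen 1988)."""
    size = abs(smd)
    if size < 0.20:
        return "negligible"  # assumption: the table does not name this range
    if size < 0.50:
        return "small"
    if size < 0.80:
        return "medium"
    return "large"


def ci_excludes_zero(lower: float, upper: float) -> bool:
    """True when the 95% CI lies entirely on one side of zero."""
    return lower > 0 or upper < 0


def points_to_sd_units(mean_difference: float, scale_sd: float = 15.0) -> float:
    """Express a raw-score difference in units of the scale's normative SD."""
    return mean_difference / scale_sd


# (estimate, CI lower, CI upper) as reported in the table.
# Negative values mean lower scores in the EIBI groups.
smd_outcomes = {
    "autism symptom severity": (-0.34, -0.79, 0.11),
    "expressive language": (0.51, 0.12, 0.90),
    "receptive language": (0.55, 0.23, 0.87),
    "problem behavior": (-0.58, -1.24, 0.07),
}

for name, (estimate, lower, upper) in smd_outcomes.items():
    label = cohen_label(estimate)
    print(f"{name}: {label}, CI excludes zero: {ci_excludes_zero(lower, upper)}")

# Point-scale outcomes (normative mean = 100, SD = 15)
print("adaptive behavior:", round(points_to_sd_units(9.58), 2), "normative SDs")
print("intelligence:", round(points_to_sd_units(15.44), 2), "normative SDs")
```

Under these assumptions, the 9.58‐point adaptive behavior difference corresponds to about 0.64 normative SDs and the 15.44‐point IQ difference to about 1.03. The confidence intervals for autism symptom severity and problem behavior include zero, whereas those for the two language outcomes do not.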

Background

Description of the condition

Autism spectrum disorders (ASD) are life‐long, neurodevelopmental conditions interfering with social communication, interactions, and relationships with others. In recent years, epidemiological evidence has indicated that the prevalence of ASD is higher than previously thought. In 2016, the Centers for Disease Control and Prevention's Autism and Developmental Disabilities Monitoring Network reported that approximately 1 in 68 children in the USA has been identified with ASD (Christensen 2016). In a systematic review of epidemiological surveys, Elsabbagh 2012 concluded that the median global prevalence of ASD was 62 in 10,000.

The fifth edition of the Diagnostic and Statistical Manual of Mental Disorders (DSM‐5) includes a single, broad category of ASD, which includes two core symptom domains: deficits in social communication and interactions, and restricted and repetitive patterns of behavior or interests (APA 2013). The DSM‐5 also includes a three‐level severity modifier for each symptom domain. The criteria can be met currently or retroactively, although symptoms must be present in early developmental periods. In the tenth revision of the International Statistical Classification of Diseases and Related Health Problems (ICD‐10), the diagnostic criteria for childhood autism are similar to those of the DSM‐5; however, abnormal or impaired development must be evident before three years of age in three domains: communication, social interactions, and play (WHO 1993).

Individuals with ASD are diverse in their symptom presentation; for example, some individuals avoid social contact while others are overly social and intrusive. They also vary greatly in cognitive functioning level (for example, from severe intellectual disability to well‐above average intelligence) and their ability to function in real‐life situations (for example, from living in an institutional setting to full independent living with a spouse and children). International prevalence estimates of ASD suggest that it affects 1% of children in the population (Baird 2006; Kuehn 2007), making it more prevalent than childhood cancer or juvenile diabetes. Prevalence studies have consistently indicated more boys are diagnosed with ASD than girls; the reported ratio is approximately four boys for every girl (Fombonne 2005). A lifelong condition such as this often has long‐term societal and familial costs associated with it. The total costs per year for children with ASD in the USA are estimated to be between USD 11.5 billion and USD 60.9 billion (Lavelle 2014). Children and adolescents with ASD, on average, have medical expenditures that are 4.1 to 6.2 times greater than for those without ASD (Shimabukuro 2008).

There are no evidence‐based pharmacotherapies to treat the core symptoms associated with ASD, but advances in treatment continue to be made. In fact, advances in behavioral treatments have likely outpaced advances in pharmaceutical ones. Behavioral therapies have shifted both in terms of terminology and the state of the evidence. A number of interventions, particularly for young children, are now couched under the term 'naturalistic, developmental and behavioral interventions' (NDBI; Schreibman 2015). This important but subtle shift in terminology is meant to address both the move in the field from isolated teaching episodes (somewhat characteristic of early intensive behavioral intervention (EIBI)) towards teaching in the natural environment, and the growing number of interventions informed by child development theories (Wetherby 2014). The current state of the evidence on EIBI suggests that most of the empirical research is of poor quality, making it difficult to draw firm conclusions (NICE 2014; Reichow 2012). Empirical research on other comprehensive treatment models (see Learning Experiences ‐ An Alternative Program for Preschoolers and Parents (LEAP); Strain 2011), as well as on focused interventions (see Pre‐school Autism Communication Trial (PACT), Green 2010; Joint Attention, Symbolic Play and Engagement Regulation (JASPER), Kasari 2006; Kasari 2015), using more rigorous study designs, has shown more robust and longer‐term treatment effects (PACT: Pickles 2016; JASPER: Kasari 2008).

Description of the intervention

There is no standard, recommended treatment for ASD. Practice guidelines (for example, Dawson 1997; National Autism Center 2015; NICE 2014; NRC 2001; Odom 2010a; SIGN 2007; Volkmar 1999; Volkmar 2014) typically recommend the following treatment components be included in comprehensive programs:

addressing the core deficits of autism (for example, social and communication deficits, restricted interests, play skills, imitation);
delivering instruction in structured, predictable settings;
having a low student‐to‐teacher ratio;
programming for generalization and maintenance;
promoting family involvement;
implementing a functional approach to challenging behaviors; and
monitoring progress over time.

EIBI is a specific form of behavioral intervention and is one of the more commonly used treatments for ASD. The origins of EIBI are linked to the Young Autism Project model (also termed the Lovaas model) at the University of California, Los Angeles (UCLA; see Lovaas 1981; Lovaas 1987). The core elements of EIBI involve:

a specific teaching procedure referred to as discrete trial training;
use of a 1:1 adult‐to‐child ratio in the early stages of the treatment; and
implementation in either home or school settings for a range of 20 to 40 hours per week across one to four years of the child's life (see Eikeseth 2009; Smith 2010).

Typically, EIBI is implemented under the supervision of personnel trained in applied behavior analysis (ABA) procedures who systematically follow a treatment manual (for example, Lovaas 1981; Maurice 1996), which indicates the scope and sequence of tasks to be introduced and taught.

How the intervention might work

In EIBI, the core deficits of ASD are addressed by developing individualized intervention programs based on the child's current behavioral repertoires (for example, communication and social skills). These individualized plans utilize behavioral techniques to teach new skills. A function‐based approach is used to decrease challenging behaviors that might interfere with learning and teach more appropriate replacement behaviors. EIBI also typically includes a family component in that parents implement, manage, or assist in treatment planning and delivery, which is thought to enhance treatment effectiveness.

Why it is important to do this review

We undertook an update of our previous Cochrane Review of EIBI for ASD (Reichow 2012) to examine whether additional evidence could be identified on the effects of EIBI on young children with ASD. EIBI remains one of the most requested (Zirkel 2011) and, at times, controversial comprehensive treatment models for ASD. Additionally, EIBI, like other therapeutic approaches, evolves over time, so it is essential to update existing evidence periodically. Thus, updating this review is necessary to determine whether new evidence provides greater insight into this treatment method. Finally, there is still confusion between EIBI and ABA. EIBI is a manualized treatment package, which uses technologies and techniques guided by the principles of ABA. ABA defines the science of human behavior; EIBI is one type of treatment (i.e. a set of practices) that is based on this science. ABA is often, but should not be, considered synonymous with EIBI; EIBI uses ABA, but ABA is much broader than EIBI.

Objectives

Methods

Criteria for considering studies for this review

Types of studies

We included randomized controlled trials (RCTs), quasi‐RCTs (that is, trials where a quasi‐random method of allocation was used, such as alternation or date of birth), and controlled clinical trials (CCTs) comparing EIBI with no treatment, wait‐list control, or treatment as usual (see Types of interventions). We defined CCTs as studies that used a parallel‐group trial design without randomized allocation of participants. For the CCTs, the participants must have been prospectively identified and assigned to treatment and comparison groups (e.g. a two‐group comparison of treatment and control, in which parent preference of experimental condition was used for group assignment). Given the longitudinal nature of the intervention, we excluded cross‐over trials.

Types of participants

Young children with ASD, autistic disorder, Asperger's disorder, pervasive developmental disorder — not otherwise specified (PDD‐NOS), or atypical autism (APA 1994; APA 2013; WHO 1993), who were younger than six years of age at the onset of treatment (that is, all participants within a group must have been younger than six years of age).

We did not exclude participants based on intelligence quotient (IQ) or presence of comorbidities.

Types of interventions

EIBI as defined above (see Description of the intervention), compared with no treatment, wait‐list control, or treatment as usual (TAU). TAU often combines a variety of treatment components, sometimes referred to as eclectic.
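
The eligibility criteria described in the three subsections above can be read as a single screening check. The sketch below is only an illustration of that logic: the `StudyRecord` class, its field names, and the string labels are hypothetical and were not used by the review authors. IQ and comorbidities are deliberately absent, because participants were not excluded on those grounds.

```python
from dataclasses import dataclass

ELIGIBLE_DESIGNS = {"RCT", "quasi-RCT", "CCT"}
ELIGIBLE_COMPARATORS = {"no treatment", "wait-list control", "TAU"}
MAX_AGE_AT_ONSET_YEARS = 6.0  # every participant must be younger than this


@dataclass
class StudyRecord:
    """Hypothetical screening record; field names are illustrative only."""
    design: str
    comparator: str
    oldest_age_at_onset_years: float  # oldest participant in any group
    intervention_is_eibi: bool = True
    is_crossover: bool = False
    prospective_assignment: bool = True  # required for CCTs


def is_eligible(study: StudyRecord) -> bool:
    """Apply the design, participant, and intervention criteria in turn."""
    if not study.intervention_is_eibi:
        return False
    if study.is_crossover:
        return False  # cross-over trials were excluded
    if study.design not in ELIGIBLE_DESIGNS:
        return False
    if study.design == "CCT" and not study.prospective_assignment:
        return False  # CCT groups must be prospectively identified and assigned
    if study.comparator not in ELIGIBLE_COMPARATORS:
        return False
    return study.oldest_age_at_onset_years < MAX_AGE_AT_ONSET_YEARS


example = StudyRecord(design="CCT", comparator="TAU", oldest_age_at_onset_years=5.5)
print(is_eligible(example))
```

Checking the oldest participant reflects the requirement that all participants within a group be younger than six years at the onset of treatment.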

Types of outcome measures

Primary outcomes

Adaptive behavior
Autism symptom severity, rated by parents on autism screening and diagnostic instruments such as the Autism Diagnostic Interview — Revised (ADI‐R) (Lord 1994)
Adverse effects, defined as a deterioration or worsening in adaptive behavior or autism symptom severity

Secondary outcomes

Intelligence
Communication and language skills
Social competence
Daily living skills
Problem behavior
Academic placement
Parent stress
Quality of life

Outcomes were measured using standardized assessments, qualitative data (for example, social validity), parent‐ or teacher‐rated scales (or both), and behavioral observation. Due to the likely variability in quality, we considered all measures, which are shown by study in Table 1. Where both parent and teacher measures were used, we prioritised parent‐reported measures. Parent‐reported measures were consistent across studies; teacher‐reported measures were not included in all studies.

Table 1. Outcome assessments and time points measured by studies

			Treatment Groups		Comparison Groups
Study	Outcomes		Pre‐Treatment	Post‐Treatment	Pre‐Treatment	Post‐Treatment
Cohen 2006	Primary	Adaptive behavior	VABS composite	VABS composite	VABS composite	VABS composite
	Primary	Autism severity	NA	NA	NA	NA
	Secondary	IQ	BSID‐II; WPPSI‐R	BSID‐II; WPPSI‐R	BSID‐II; WPPSI‐R	BSID‐II; WPPSI‐R
		Non‐verbal IQ	MPS	MPS	MPS	MPS
		Non‐verbal social communication	NA	NA	NA	NA
		Expressive communication	RDLS	RDLS	RDLS	RDLS
		Receptive communication	RDLS	RDLS	RDLS	RDLS
		Play	NA	NA	NA	NA
		Social competence	VABS socialization domain	VABS socialization domain	VABS socialization domain	VABS socialization domain
		Daily living skills	VABS daily living skills domain	VABS daily living skills domain	VABS daily living skills domain	VABS daily living skills domain
		Academic achievement	NA	NA	NA	NA
		Problem behavior	NA	NA	NA	NA
		Parent stress	NA	NA	NA	NA
		Academic placement	NA	Class placement	NA	Class placement
		Quality of life	NA	NA	NA	NA
Howard 2014	Primary	Adaptive behavior	VABS composite; Denver; DP‐II; RIDES	VABS composite; Denver; DP‐II; RIDES	VABS composite	VABS composite
	Primary	Autism severity	# of DSM‐IV criteria (APA 1994)	NA	# of DSM‐IV criteria	NA
	Secondary	IQ	WPPSI‐R; BSID‐II; S‐B; DAYC; PEP‐R; DAS; DP‐II	WPPSI‐R; BSID‐II; S‐B; DAYC; PEP‐R; DAS	WPPSI‐R; BSID‐II; S‐B; DAS	WPPSI‐R; BSID‐II; S‐B; DAS
		Non‐verbal IQ	MPS; S‐B	MPS; S‐B; Leiter‐R	MPS; S‐B	MPS; S‐B; Leiter‐R
		Non‐verbal social communication	NA	NA	NA	NA
		Expressive communication	RDLS; ITLS; REEL‐R; PLS‐3; ITDA; EVT; DP‐II	RDLS; ITLS; REEL‐R; PLS‐3; ITDA; EVT; EOWPVT	RDLS; ITLS; REEL‐R; PLS‐3; ITDA; EVT; DP‐II	RDLS; ITLS; REEL‐R; PLS‐3; ITDA; EVT; EOWPVT
		Receptive communication	RDLS; ITLS; REEL‐R; PLS‐3; ITDA; PPVT‐III; DP‐II	RDLS; ITLS; REEL‐R; PLS‐3; PPVT‐III; ROWPVT; ITDA‐1	RDLS; ITLS; REEL‐R; PLS‐3; PPVT‐III; DP‐II; ITDA‐1	RDLS; ITLS; REEL‐R; PLS‐3; PPVT‐III; ROWPVT; ITDA‐1
		Play	NA	NA	NA	NA
		Social competence	VABS socialization domain	VABS socialization domain	VABS socialization domain	VABS socialization domain
		Daily living skills	VABS daily living skills domain	VABS daily living skills domain	VABS daily living skills domain	VABS daily living skills domain
		Academic achievement	NA	NA	NA	NA
		Problem behavior	NA	NA	NA	NA
		Parent stress	NA	NA	NA	NA
		Academic placement	NA	NA	NA	NA
		Quality of life	NA	NA	NA	NA
Magiati 2007	Primary	Adaptive behavior	VABS composite	VABS composite	VABS composite	VABS composite
	Primary	Autism severity	ADI‐R	ADI‐R	ADI‐R	ADI‐R
	Secondary	IQ	WPPSI‐R; BSID‐R; MPS	WPPSI‐R; BSID‐R; MPS	WPPSI‐R; BSID‐R; MPS	WPPSI‐R; BSID‐R; MPS
		Non‐verbal IQ	NA	NA	NA	NA
		Non‐verbal social communication	NA	NA	NA	NA
		Expressive communication	EOWPVT‐R	EOWPVT‐R	EOWPVT‐R	EOWPVT‐R
		Receptive communication	BPVS‐II	BPVS‐II	BPVS‐II	BPVS‐II
		Play	SPT‐II	SPT‐II	SPT‐II	SPT‐II
		Social competence	VABS socialization domain	VABS socialization domain	VABS socialization domain	VABS socialization domain
		Daily living skills	VABS daily living skills domain	VABS daily living skills domain	VABS daily living skills domain	VABS daily living skills domain
		Academic achievement	NA	NA	NA	NA
		Problem behavior	NA	NA	NA	NA
		Parent stress	NA	NA	NA	NA
		Academic placement	NA	NA	NA	NA
		Quality of life	NA	NA	NA	NA
Remington 2007	Primary	Adaptive behavior	VABS composite	VABS composite	VABS composite	VABS composite
	Primary	Autism severity	ASQ	ASQ	ASQ	ASQ
	Secondary	IQ	BSID‐R; S‐B	BSID‐R; S‐B	BSID‐R; S‐B	BSID‐R; S‐B
		Non‐verbal IQ	NA	NA	NA	NA
		Non‐verbal social communication	ESCS	ESCS	ESCS	ESCS
		Expressive communication	RDLS	RDLS	RDLS	RDLS
		Receptive communication	RDLS	RDLS	RDLS	RDLS
		Play	NA	NA	NA	NA
		Social competence	VABS socialization domain	VABS socialization domain	VABS socialization domain	VABS socialization domain
		Daily living skills	VABS daily living skills domain	VABS daily living skills domain	VABS daily living skills domain	VABS daily living skills domain
		Academic achievement	NA	NA	NA	NA
		Problem behavior	DBC	DBC	DBC	DBC
		Parent stress	QRS‐F parent and family problems subscale	QRS‐F parent and family problems subscale	QRS‐F parent and family problems subscale	QRS‐F parent and family problems subscale
		Academic placement	NA	NA	NA	NA
		Quality of life	NA	NA	NA	NA
Smith 2000	Primary	Adaptive behavior	VABS composite	VABS composite	VABS composite	VABS composite
	Primary	Autism severity	NA	NA	NA	NA
	Secondary	IQ	BSID‐R; S‐B	BSID‐R; S‐B	BSID‐R; S‐B	BSID‐R; S‐B
		Non‐verbal IQ	MPS	MPS	MPS	MPS
		Non‐verbal social communication	NA	NA	NA	NA
		Expressive communication	RDLS	RDLS	RDLS	RDLS
		Receptive communication	RDLS	RDLS	RDLS	RDLS
		Play	NA	NA	NA	NA
		Social competence	VABS socialization domain	VABS socialization domain	VABS socialization domain	VABS socialization domain
		Daily living skills	VABS daily living skills domain	VABS daily living skills domain	VABS daily living skills domain	VABS daily living skills domain
		Academic achievement	WIAT; ELM	WIAT	WIAT	WIAT
		Problem behavior	CBCL	CBCL	CBCL	CBCL
		Parent stress	NA	NA	NA	NA
		Academic placement	Class placement	Class placement	Class placement	Class placement
		Quality of life	NA	NA	NA	NA

ADI‐R: Autism Diagnostic Interview ‐ Revised (Lord 1994)
ASQ: Autism Screening Questionnaire (Berument 1999)
BPVS‐II: British Picture Vocabulary Scale ‐ 2nd Edition (Dunn 1997b)
BSID‐II: Bayley Scales of Infant Development ‐ 2nd Edition (Bayley 1993)
CBCL: Child Behavior Checklist (Achenbach 1991)
DAS: Differential Ability Scales (Elliot 1990)
DAYC: Developmental Assessment of Young Children (Voress 1998)
DBC: Developmental Behavior Checklist (Einfeld 1995)
Denver: Denver Developmental Screening Test (Frankenburg 1992)
DP‐II: Developmental Profile ‐ 2nd Edition (Alpern 1986)
DSM‐IV: Diagnostic and Statistical Manual of Mental Disorders ‐ 4th Edition (APA 1994)
ELM: Early Learning Measure (Smith 1995)
EOWPVT: Expressive One‐Word Picture Vocabulary Test (Brownell 2000a)
EOWPVT‐R: Expressive One‐Word Picture Vocabulary Test ‐ Revised (Gardner 1990)
ESCS: Early Social Communication Scales (Mundy 1996)
EVT: Expressive Vocabulary Test (Williams 1997)
ITDA: Infant‐Toddler Developmental Assessment (Provence 1985)
ITLS: Infant‐Toddler Language Scale (Rosetti 1990)
IQ: intelligence quotient
Leiter‐R: Leiter International Performance Scale ‐ Revised (Roid 1997)
MPS: Merrill‐Palmer Scale of Mental Tests (Stutsman 1948)
NA: not assessed
NCBRF: Nisonger Child Behavior Rating Form (Tasse 1996)
PEP‐R: Psychoeducational Profile ‐ Revised (Schopler 1990)
PLS‐3: Preschool Language Scale ‐ 3rd Edition (Zimmerman 1992)
PPVT‐III: Peabody Picture Vocabulary Test ‐ 3rd Edition (Dunn 1997a)
QRS‐F: Questionnaire on Resources and Stress‐Friedrich, Short Form (Friedrich 1983)
RDLS: Reynell Developmental Language Scales (Reynell 1990)
ROWPVT: Receptive One‐Word Picture Vocabulary Test (Brownell 2000b)
REEL‐R: Receptive Expressive Emergent Language scales ‐ Revised (Bzoch 1991)
RIDES: Rockford Infant Developmental Evaluation Scales (Project RHISE 1979)
S‐B: Stanford‐Binet Intelligence Scale ‐ 4th Edition (Thorndike 1986)
SPT‐II: Symbolic Play Test ‐ 2nd Edition (Lowe 1988)
VABS: Vineland Adaptive Behavior Scales (Sparrow 1984)
WIAT: Wechsler Individual Achievement Test (Wechsler 1992)
WPPSI‐R: Wechsler Preschool and Primary Scale of Intelligence ‐ Revised (Wechsler 1989)

We grouped outcome time points as follows: immediately post‐intervention, one to five months post‐intervention, six to 11 months post‐intervention, 12 to 23 months post‐intervention, 24 to 35 months post‐intervention, and so on.
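
The grouping rule above can be sketched as a small function. This is an illustration only, not the review authors' procedure: the function name is hypothetical, follow‐up is assumed to be recorded in whole months, zero months is taken to mean 'immediately post‐intervention', and 'and so on' is read as continuing in 12‐month bands (36 to 47 months, 48 to 59 months, and so forth).

```python
def follow_up_bin(months_post_intervention: int) -> str:
    """Map whole months since the end of intervention to a reporting band."""
    months = months_post_intervention
    if months < 0:
        raise ValueError("months_post_intervention must be non-negative")
    if months == 0:
        return "immediately post-intervention"
    if months <= 5:
        return "1 to 5 months post-intervention"
    if months <= 11:
        return "6 to 11 months post-intervention"
    # From 12 months onward, bands are assumed to be 12 months wide.
    start = (months // 12) * 12
    return f"{start} to {start + 11} months post-intervention"


for m in (0, 3, 11, 12, 30, 36):
    print(m, "->", follow_up_bin(m))
```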

We reported key outcomes in a 'Summary of findings' table (see Summary of findings for the main comparison).

Search methods for identification of studies

The search strategy emphasized sensitivity rather than specificity to avoid missing any potential studies. We did not limit the search by date or language and we did not use a study methods filter.

Electronic searches

In November 2011 we conducted the initial searches for this review (see Other published versions of this review). For this update, we conducted searches of the following databases in August 2015, April 2016 and August 2017.

Cochrane Central Register of Controlled Trials (CENTRAL; 2017, Issue 7) in the Cochrane Library, which includes the Cochrane Developmental, Psychosocial and Learning Problems Specialized Register (searched 10 August 2017)
MEDLINE Ovid (1950 to July Week 4 2017)
MEDLINE In‐Process & Other Non‐Indexed Citations Ovid (8 August 2017)
MEDLINE EPub Ahead of Print Ovid (8 August 2017)
Embase Ovid (1980 to 2017 Week 32)
CINAHL EBSCOhost (Cumulative Index to Nursing and Allied Health Literature; 1937 to 10 August 2017)
PsycINFO Ovid (1806 to July Week 5 2017)
ERIC EBSCOhost (Education Resources Information Center; 1966 to 10 August 2017)
Sociological Abstracts ProQuest (1952 to 10 August 2017)
Social Science Citation Index Web of Science (SSCI; 1970 to 9 August 2017)
Conference Proceedings Citation Index ‐ Social Science & Humanities Web of Science (CPCI‐SS&H; 1990 to 9 August 2017)
Cochrane Database of Systematic Reviews (CDSR; 2017, Issue 8) part of the Cochrane Library (searched 10 August 2017)
Database of Abstracts of Reviews of Effects (DARE; 2015, Issue 2) part of the Cochrane Library (searched 24 August 2015; DARE ceased to be updated after this issue)
Epistemonikos (www.epistemonikos.org; searched 10 August 2017)
ClinicalTrials.gov (clinicaltrials.gov; searched 11 August 2017)
WorldCat OCLC (www.oclc.org/worldcat.en.html; searched 10 August 2017)
World Health Organization International Clinical Trials Registry Platform (WHO ICTRP; apps.who.int/trialsearch; searched 11 August 2017)

The search strategies for each database are in Appendix 1. Further details of the updated searches, including the exact search dates, are reported in Appendix 2.

Searching other resources

Grey Literature

We identified unpublished and ongoing trials by searching the following sources.

Reference lists: we searched the reference lists of the studies included in this review, and any relevant papers, to identify additional studies in the published and unpublished literature.
Correspondence: we contacted the authors of the included studies to identify any unpublished or ongoing trials.

Data collection and analysis

Selection of studies

Two review authors (BB and KH) independently screened the titles and abstracts yielded by the search against the inclusion criteria listed above (Criteria for considering studies for this review). Next, they screened the full‐text reports of studies that appeared relevant. We sought additional information from the study authors, as necessary, to resolve questions about a study's relevance or methodology. We resolved disagreement about eligibility through discussion, and when disagreements could not be resolved, we sought advice from a mediator (BR or EB). We recorded the reasons for excluding studies and presented the results of our selection process in a PRISMA diagram (Moher 2009). Neither review author was blinded to journal titles or to studies' authors and institutions.

Data extraction and management

Two review authors (BR and EB) independently extracted data for each trial using a predesigned data extraction form, to collect information about the population, intervention, randomization methods, blinding, sample size, outcome measures, follow‐up duration, attrition and handling of missing data, and methods of analysis. When data were missing, one review author (BR) contacted the study authors to request additional information (see Dealing with missing data). If further information could not be obtained, we coded the variables in question as 'unsure'.

Assessment of risk of bias in included studies

Two review authors (BR and EB) independently assessed risk of bias using Cochrane's 'Risk of bias' tool (Higgins 2017). We resolved all disagreements by discussion, without needing to consult a third review author (KH).

We present the results of the 'Risk of bias' assessment in a 'Risk of bias' table (beneath the Characteristics of included studies tables), with the judgment of the review authors (low, high or unclear risk of bias) followed by a text box providing details on the available information that led to each judgment.

We assessed the following sources of bias: sequence generation, allocation concealment, blinding of participants and personnel, blinding of outcome assessment, incomplete outcome data, selective outcome reporting, protection against contamination, baseline measurements, and any other potential sources of bias.

Descriptions of the criteria for judgments of risk of bias are shown in Table 2.

Table 2. Assessment of risk of bias

'Risk of bias' item	Question	How risk of bias was assessed
Sequence generation	Was the sequence generation method used adequate?	We judged the risk of bias as follows: 'low' ‐ when participants were allocated to treatment conditions using randomization such as computer‐generated random numbers, a random numbers table, or coin‐tossing; 'unclear' ‐ when the randomization method was not clearly stated or unknown; or 'high' ‐ when randomization did not use any of the above methods.
Allocation concealment	Was allocation adequately concealed?	We judged the risk of bias as follows: 'low' ‐ when participants and researchers were unaware of participants' future allocation to treatment condition until after decisions about eligibility were made and informed consent was obtained; 'unclear' ‐ when allocation concealment was not clearly stated or unknown; or 'high' ‐ when allocation was not concealed from either participants before informed consent or from researchers before decisions about inclusion were made, or allocation concealment was not used.
Blinding of participants and personnel	Were participants and personnel blind to which participants were in the treatment group?	We judged the risk of bias as follows: 'low' ‐ when blinding of participants and key personnel was ensured; 'unclear' ‐ when blinding of participants and key personnel was not reported; or 'high' ‐ when there was no or incomplete blinding of participants and key personnel or blinding of participants and key personnel was attempted but likely to have been broken.
Blinding of outcome assessment	Were outcome assessors blind to which participants were in the treatment group?	We judged the risk of bias as follows: 'low' ‐ when blinding of outcome assessment was ensured; 'unclear' ‐ when there was not adequate information provided in the study report to determine blinding of outcome assessment, or blinding of outcome assessment was not addressed; or 'high' ‐ when blinding of outcome assessment was not ensured.
Incomplete outcome data	Did the trial authors deal adequately with missing data?	We judged the risk of bias as follows: 'low' ‐ when the number of participants randomized to groups is clear and it is clear that all participants completed the trial; 'unclear' ‐ when information about which participants completed the study could not be acquired by contacting the researchers of the study; or 'high' ‐ when there was clear evidence that there was attrition or exclusion from analysis in at least one participant group that was likely related to the true outcome.
Selective outcome reporting	Did the authors of the trial omit to report on any of their outcomes?	We judged the risk of bias as follows: 'low' ‐ when it is clear that the published report includes all expected outcomes; 'unclear' ‐ when it is not clear whether other data were collected and not reported; or 'high' ‐ when the data from one or more expected outcomes were missing.
Protection against contamination	Could the control group also have received the intervention?	We judged the risk of bias as follows: 'low' ‐ when allocation was by community, institution or school, and it is unlikely that the control group received the intervention; 'unclear' ‐ when professionals were allocated within a clinic or school and it is possible that the communication between intervention and control professionals could have occurred; or 'high' ‐ when it is likely that the control group received part of the intervention.
Baseline measurements	Were the intervention and control groups similar at baseline for chronological age, IQ, adaptive behavior skills, and communication skills?	We judged the risk of bias as follows: 'low' ‐ when participant performance on outcomes was measured prior to the intervention and no important differences were present across study groups; 'unclear' ‐ when no baseline measures of outcome were reported or it was difficult to determine if baseline measures were substantially different across study groups; or 'high' ‐ when important differences were present and were likely to undermine any post‐intervention difference.
Other potential sources of bias	Through assessment, we determined whether any other source of bias was present in the trial, such as changing methods during the trial, or other anomalies.	We judged the risk of bias as follows: 'low' ‐ when no other sources of bias were detected; 'unclear' ‐ when additional sources of bias were suspected but could not be confirmed; or 'high' ‐ when other sources of bias were clearly present and likely to contribute to post‐intervention differences.

IQ: intelligence quotient

Measures of treatment effect

Dichotomous data

We did not identify any eligible study that included dichotomous data; see Reichow 2011 and Table 3.

See: Summary of findings for the main comparison Early intensive behavioral intervention (EIBI) for young children with autism spectrum disorders (ASD)

Table 3. Additional methods that were not used

Analysis	Description of method	Reason not used
Measurement of treatment effect	Continuous data If outcomes are measured on a consistent scale across studies, we will calculate the effect of each study using the mean difference effect size.	As we needed to use the standardized mean difference (SMD) across most outcomes, we decided to report all effect sizes using the SMD effect size.
Measurement of treatment effect	Dichotomous data If we locate dichotomous data, we will calculate a risk ratio with a 95% confidence interval for each outcome in each trial (Deeks 2017).	We did not locate dichotomous data.
Unit of analysis issues	Cluster‐randomized trials If we locate cluster‐randomized trials, we will analyze them in accordance with the methods outlined in the Cochrane Handbook for Systematic Reviews of Interventions (Higgins 2011, 16.3).	We did not find cluster‐randomized trials.
Unit of analysis issues	Multiple treatment groups If we locate data from studies with multiple treatment groups, we will analyze each intervention group separately by dividing the sample size for the common comparator groups proportionately across each comparison (Higgins 2011, 16.5.5).	We did not find studies with multiple treatment groups.
Assessment of reporting bias	If we identify 10 or more studies, we will draw funnel plots (estimated differences in treatment effects against their standard error). Asymmetry could be due to publication bias, but could also be due to a real relation between trial and effect size, such as when larger trials have lower compliance and compliance is positively related to effect size (Sterne 2011). If we find such a relation, we will examine clinical variation between the studies (Sterne 2011, 10.4). As a direct test for publication bias, we will conduct sensitivity analyses to compare the results from published data with data from other sources. We will do a funnel plot in an update of the review if enough additional trials are located.	We did not locate enough studies to assess reporting bias.
Subgroup analyses	If we locate enough trials, we will examine possible clinical and methodological heterogeneity using subgroup analyses. The possible subgroups that we will examine, if present, are: intervention density (intensity) and duration; type of comparison group (for example, home‐based TAU, school‐based TAU, no treatment control), and pre‐treatment participant characteristics (for example, chronological age, symptom severity, IQ, communicative ability, and level of adaptive behavior).	We did not conduct subgroup analyses due to the small number of included trials.
Sensitivity analyses	If we locate enough trials, we will explore the impact of studies with high risk of bias on the robustness of the results of the review in sensitivity analyses by removing studies with a high risk of bias on baseline measurements and blinding of outcome assessment, and reanalyzing the remaining studies to determine whether these factors affected the results.	We did not conduct sensitivity analyses due to the small number of included trials.

CCTs: controlled clinical trials
CI: confidence interval
IQ: intelligence quotient
TAU: treatment as usual

Continuous data

We analyzed continuous data when means and standard deviations were either presented in the study reports, were made available by the authors of the trials, or were calculable from the available data. No study reported individual data, so we were unable to analyze the data to assess and correct for skewness, according to the guidelines outlined in the Cochrane Handbook for Systematic Reviews of Interventions (Deeks 2017, 9.4.5.3). For outcomes that were measured using a Likert scale, we calculated the mean difference (MD) effect size. When similar outcomes were measured using different scales, we calculated a standardized mean difference (SMD) using Hedges g, with small sample correction (Hedges 1985). We presented effect sizes with 95% confidence intervals (CIs). The meta‐analysis combined all three types of effect sizes by transforming all metrics to Hedges g.
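
The small sample correction mentioned above can be illustrated with a minimal sketch. It assumes the commonly used approximation J = 1 − 3/(4df − 1) for the correction factor and a large-sample variance for the 95% CI; the review does not state which exact formulas its software applied, and the function name and input values are hypothetical, not data from any included study.

```python
import math

def hedges_g(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Return (g, ci_low, ci_high) for a treatment vs comparison group.

    Illustrative only: uses the approximate correction factor
    J = 1 - 3 / (4 * df - 1) and a large-sample variance formula.
    """
    df = n_t + n_c - 2
    # Pooled standard deviation across the two groups.
    sd_pooled = math.sqrt(((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / df)
    # Uncorrected standardized mean difference.
    d = (mean_t - mean_c) / sd_pooled
    # Small sample correction factor.
    j = 1 - 3 / (4 * df - 1)
    g = j * d
    # Approximate variance of d; the standard error of g scales by j.
    var_d = (n_t + n_c) / (n_t * n_c) + d**2 / (2 * (n_t + n_c))
    se_g = j * math.sqrt(var_d)
    return g, g - 1.96 * se_g, g + 1.96 * se_g

# Hypothetical summary statistics (not from any included study).
g, ci_low, ci_high = hedges_g(60.0, 15.0, 20, 52.5, 15.0, 20)
print(f"g = {g:.2f} (95% CI {ci_low:.2f} to {ci_high:.2f})")
```

With groups this small the correction shrinks the uncorrected difference by about 2%, which is why it matters for trials of the size included here.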

Unit of analysis issues

We did not find any cluster‐randomized trials or studies with multiple treatment groups. Please see our protocol (Reichow 2011) and Table 3 for the methods we will use to handle these studies should we find them in future updates of this review.

Given the nature of the intervention, different groups of clinicians (therapists) would often work with different children. However, there were no sets of clinicians that worked exclusively with certain sets of families for the duration of the studies, and therefore we do not feel that such trials are cluster trials.

Dealing with missing data

We assessed missing data and dropouts in the included studies. We examined the number of missing data collections at post‐treatment and reflected this examination in our analysis of the risk of bias of incomplete outcome data. We contacted authors of all included studies to inquire about missing data. We also contacted study authors if missing data were noted in a study; two study authors (Cohen 2006 and Magiati 2007) provided the review team with data. If data were missing due to attrition, we used the data as reported in the study reports, none of which used imputation. For studies with missing data at post‐treatment assessment, we conducted analyses using only the available data; we did not impute missing data.

Assessment of heterogeneity

We examined heterogeneity among included studies through the use of the Chi² test, where we used a low P value (i.e. less than 0.10) to indicate statistical heterogeneity of treatment effects. We also used the I² statistic (Higgins 2002) to determine the percentage of variability that was due to heterogeneity rather than sampling error or chance. We examined estimates of the between‐studies variance components using Tau². We also discussed the possible reasons for heterogeneity and planned to conduct sensitivity analyses accordingly, where data permitted (see Sensitivity analysis). We also planned to use subgroup analyses to investigate methodological and clinical heterogeneity (see Subgroup analysis and investigation of heterogeneity).
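
As a rough illustration of how the I² statistic relates to the Chi² (Q) statistic, the sketch below assumes the usual definition I² = max(0, (Q − df)/Q) × 100. The function name is hypothetical; the Q values and degrees of freedom are those reported for three outcomes in the Results of this review, and small discrepancies from published I² values can arise because Q is reported to two decimal places.

```python
def i_squared(q, df):
    """Percentage of variability attributed to heterogeneity rather than chance.

    Assumes the standard definition I^2 = max(0, (Q - df) / Q) * 100.
    """
    if q <= 0:
        return 0.0
    return max(0.0, (q - df) / q) * 100

# (Q, df) pairs as reported in the Results of this review.
reported = {
    "adaptive behavior": (2.43, 4),
    "expressive language": (4.46, 3),
    "social competence": (5.25, 4),
}

for outcome, (q, df) in reported.items():
    print(f"{outcome}: I2 = {i_squared(q, df):.0f}%")
```

Whenever Q is smaller than its degrees of freedom, I² is truncated at 0%, which is the case for most outcomes in this review.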

Assessment of reporting biases

As this review includes only five studies, we did not draw funnel plots to assess reporting bias. Please see our protocol (Reichow 2011) and Table 3 for methods archived for use in future updates of this review.

Data synthesis

We combined the means of each included study by conducting a meta‐analysis. In this update, we synthesized all studies, regardless of research design; in other words, where possible, we synthesized data from the one RCT and four CCTs together. We conducted the meta‐analyses using a random‐effects model due to the possibility of variation in intervention techniques. Two studies conducted follow‐up analyses at five and two years after the cessation of the treatment (Magiati 2007; Remington 2007); for these data, we calculated effect sizes and provided a narrative description of each study's results.
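
The random‐effects pooling described above can be sketched as follows. This is a minimal illustration that assumes the DerSimonian‐Laird estimator of the between‐studies variance, which the review does not name explicitly; the function name and the study‐level inputs are hypothetical and are not data from the included studies.

```python
import math

def pool_random_effects(effects, std_errors):
    """Pool study effects with a DerSimonian-Laird random-effects model.

    Returns a dict with the pooled estimate, its 95% CI, Q, and Tau^2.
    """
    # Fixed-effect (inverse-variance) weights and estimate, needed for Q.
    w = [1 / se**2 for se in std_errors]
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    df = len(effects) - 1
    # Method-of-moments estimate of the between-studies variance.
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    # Random-effects weights add Tau^2 to each study's variance.
    w_re = [1 / (se**2 + tau2) for se in std_errors]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    se_pooled = math.sqrt(1 / sum(w_re))
    return {
        "pooled": pooled,
        "ci_low": pooled - 1.96 * se_pooled,
        "ci_high": pooled + 1.96 * se_pooled,
        "q": q,
        "tau2": tau2,
    }

# Hypothetical mean differences and standard errors for three studies.
result = pool_random_effects([8.0, 12.0, 10.0], [4.0, 5.0, 3.0])
print(result)
```

When the estimated Tau² is zero, as in this hypothetical example, the random‐effects and fixed‐effect estimates coincide.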

Summary of findings

Using the GRADEprofiler: Guideline Development Tool (GRADEpro GDT 2015), we created a 'Summary of findings' table for our main comparison: EIBI for young children with ASD. In this table we present our findings for the primary outcomes of adaptive behavior, autism symptom severity and adverse effects, and the secondary outcomes of intelligence, communication and language skills and problem behavior.

Two review authors (BR and EB) independently assessed the quality of the evidence for each outcome using the GRADE approach (Guyatt 2008); they assigned each outcome a rating of high, moderate, low or very low quality, according to the presence of the following five criteria:

limitations in study design and implementation;
indirectness of evidence;
unexplained heterogeneity or inconsistency of results;
imprecision of results; and
high probability of publication bias (Guyatt 2008).

Subgroup analysis and investigation of heterogeneity

We planned to conduct further investigation of the causes of methodological and clinical heterogeneity using subgroup analyses. However, we decided subgroup analyses were not appropriate due to the small number of included studies. For details and examples of analyses that might be conducted should we include more studies in future updates, see Table 3 and Reichow 2011.

Sensitivity analysis

Because we located only a small number of studies, we deemed sensitivity analyses inappropriate. For further details and examples of analyses that might be conducted should we include more studies in future updates, see Table 3 and Reichow 2011.

Results

Description of studies

Results of the search

The electronic searches in November 2011 yielded five included studies. The electronic searches in August 2017 returned a total of 3660 records after de‐duplication. After initial screening, we reduced the number to 25 potential reports. We evaluated the full texts of these 25 reports. Three reports were suitable for inclusion; 21 were excluded because they were not RCTs or CCTs (for example, retrospective studies); and one was excluded because some participants were age six years or older (see Excluded studies). Each of the three suitable reports presented data from a study that was already included in the previous review; hence no additional studies were located. For a flow diagram of search results, see Figure 1.

Figure 1

Study flow diagram.

We did not identify any additional studies in our searches of reference lists. We contacted the authors of the five included studies; the authors of Magiati 2007, Remington 2007, and Smith 2000 responded and indicated no knowledge of other studies that we did not locate, or of any ongoing studies. There are no ongoing studies of which we are aware.

Included studies

We included five studies examining early intensive behavioral intervention (EIBI) for young children with autism spectrum disorders (ASD) (Cohen 2006; Howard 2014; Magiati 2007; Remington 2007; Smith 2000).

Study location

Three of the five included studies were conducted in the USA (Cohen 2006; Howard 2014; Smith 2000). Two of the five studies were conducted in the UK (Magiati 2007; Remington 2007).

Study design

One study used a randomized controlled trial (RCT) design, in which participants were randomized to EIBI or treatment as usual (TAU) (Smith 2000). Four of the five studies used a controlled clinical trial (CCT) design (Cohen 2006; Howard 2014; Magiati 2007; Remington 2007). We located no quasi‐RCTs.

Participants

The five studies included a total of 219 children; 116 children in the EIBI groups and 103 children in the TAU groups. Across all five studies the mean chronological age at treatment entry ranged from 30.2 to 42.5 months.

All studies had an inclusion criterion that participants have an independent ASD diagnosis; four of the five studies specified children could have a diagnosis of autistic disorder or pervasive developmental disorder — not otherwise specified (PDD‐NOS). The ASD diagnoses were further confirmed in three of the five studies by using the Autism Diagnostic Interview — Revised (ADI‐R) (Lord 1994). All studies specified that children could not have any other major medical conditions that would interfere with participation in the treatment.

Two studies specified an IQ inclusion criterion. In Smith 2000 children with autism had to have an IQ of 35 to 75 at treatment entry; in Cohen 2006 children with autism had to have an IQ of greater than 35. Across studies the mean pre‐treatment IQs ranged from 30.9 to 83.0 for children in the treatment groups and 37.4 to 65.0 for children in the comparison groups.

Three of the five studies included a residency inclusion criterion for participants (for example, children had to live within 60 miles of treatment center) (Cohen 2006; Remington 2007; Smith 2000). Two of the five studies specified children could not have, or currently be participating in, other interventions (Howard 2014; Magiati 2007).

Interventions

Three studies provided EIBI treatment for 24 months (Magiati 2007; Remington 2007; Smith 2000) and two studies provided treatment for 36 months (Cohen 2006; Howard 2014). The intensity of treatment was greater than 24 hours per week across all five studies.

Four of the five studies reported using EIBI based on the Lovaas/UCLA Young Autism Project model (Lovaas 1993). One study, Howard 2014, reported using EIBI based on the approach described by Maurice and colleagues (Maurice 1996; Maurice 2001).

Comparisons

The comparison group in four studies consisted of TAU provided by public schools (Cohen 2006; Howard 2014; Magiati 2007; Remington 2007), and in one study it consisted of parent training (Smith 2000).

Three studies reported that public school treatment was eclectic or autism specific (Howard 2014; Magiati 2007; Remington 2007). In one study, Cohen 2006, the comparison group received eclectic general programming for children with special needs provided by the public school system.

Outcomes

Outcome assessments and the time points measured by each study are provided in Table 1.

Excluded studies

We examined 25 full‐text reports, of which we subsequently excluded 22 from this updated review. The main reason for exclusion was the use of study designs other than RCTs or CCTs (primarily retrospective studies); see Figure 1. We present select characteristics of five key excluded studies in Characteristics of excluded studies. We elected to highlight these studies because they were either: a seminal study (Lovaas 1987); a study that has led to misinterpretation of results in previous systematic reviews (Eikeseth 2007; Sallows 2005); or reviews of EIBI (Eikeseth 2009; Smith 2010).

Risk of bias in included studies

Risk of bias is shown graphically across studies in Figure 2 and for each included study in Figure 3. Further details are also provided in the 'Risk of bias' tables (beneath the Characteristics of included studies).

Figure 2

Risk of bias graph: review authors' judgements about each risk of bias item presented as percentages across all included studies.

Figure 3

Risk of bias summary: review authors' judgements about each risk of bias item for each included study.

Allocation

Random sequence generation (selection bias)

One study was conducted using a randomized design (Smith 2000), and thus has a lower risk of selection bias than the other four studies (Cohen 2006; Howard 2014; Magiati 2007; Remington 2007), which were conducted using non‐randomized assignment to groups.

Allocation concealment (selection bias)

In all four non‐randomized studies, preference of experimental condition (e.g. "assignment to groups based on parental preference" (Cohen 2006, p S145); "Parents of children in the intervention group had opted for early intensive behavioral intervention" (Remington 2007, p 421)) was used as a factor in group assignment, which might introduce a high risk of bias. For the randomized controlled trial (Smith 2000), allocation concealment was unclear.

Blinding

Blinding of participants and personnel (performance)

Due to the nature of the intervention, in which participants and study personnel interact with high frequency and regularity, we considered all five studies to be at high risk of performance bias (Cohen 2006; Howard 2014; Magiati 2007; Remington 2007; Smith 2000).

Blinding of outcome assessment (detection bias)

We considered there to be a high risk of detection bias in all five studies (Cohen 2006; Howard 2014; Magiati 2007; Remington 2007; Smith 2000). For all studies, the primary outcome was assessed using parent reports and, in one study (Magiati 2007), outcome assessors for the remaining measures were not blind to treatment status: "Assessments were conducted at home or school by the first author and a Research Assistant. They were not blind to group status." (p 805).

Incomplete outcome data

We considered the risk of bias from incomplete outcome data to be low for four studies (Cohen 2006; Magiati 2007; Remington 2007; Smith 2000). For the fifth study (Howard 2014) we rated the risk of bias as unclear since attrition was not clearly reported, with some final outcomes reporting smaller sample sizes than initial assessment sample sizes.

Selective reporting

We rated the risk of reporting bias to be low for all five included studies (Cohen 2006; Howard 2014; Magiati 2007; Remington 2007; Smith 2000).

Other potential sources of bias

Protection against contamination

We considered the risk of bias from contamination of the comparison groups receiving EIBI to be low in all five studies (Cohen 2006; Howard 2014; Magiati 2007; Remington 2007; Smith 2000).

Baseline measurements

We assessed the risk of differences between groups at baseline on four variables (chronological age, IQ, adaptive behavior, and language skills). These variables were specified post‐protocol (see Differences between protocol and review); our original protocol did not specify which variables we would assess for baseline imbalance (Reichow 2011). The risk of important differences between groups before treatment was low in two studies (Remington 2007; Smith 2000), and high in the remaining three studies (Cohen 2006; Howard 2014; Magiati 2007). In the Cohen 2006, Howard 2014, and Magiati 2007 studies, on average, the children in the EIBI group were at least three months younger than the TAU group at intake. In one study, Magiati 2007, the baseline Vineland Adaptive Behaviour Scales (VABS) composite score was higher in the EIBI group compared to the TAU group (g = 0.69, 95% CI 0.04 to 1.35). Effect sizes for the differences in baseline between groups for these four variables are shown in the Characteristics of included studies table.

Other sources of bias

We did not identify any other potential sources of bias across studies.

Effects of interventions

Primary outcomes

Adaptive behavior

All five studies reported outcome data on adaptive behavior at post‐treatment using the Vineland Adaptive Behaviour Scales (VABS) composite (Sparrow 1984), which is a standardized parent interview (normative mean = 100, normative SD = 15). We synthesized the results of studies using a random‐effects meta‐analysis of the mean difference (MD) effect size. The MD effect size was 9.58 (95% confidence interval (CI) 5.57 to 13.60, P < 0.001; 202 participants; Analysis 1.1; low‐quality evidence, summary of findings Table for the main comparison; Figure 4), favoring early intensive behavioral intervention (EIBI) over treatment as usual (TAU). We downgraded the quality of the evidence due to the inclusion of non‐randomized trials. To assess the clinical significance of this effect size, we examined the raw scores reported by Remington 2007, which showed that children receiving EIBI had, on average, up to 20 more adaptive behaviors than children receiving TAU. We assessed heterogeneity using the Q statistic (Q(4) = 2.43, P = 0.66), I² = 0%, and Tau² = 0.00.

Figure 4

Forest plot of comparison: 1 Adaptive behavior, outcome: 1.1 Vineland Adaptive Behavior Scales Composite

Syntheses of the three domains (communication, socialization, and daily living skills) of the VABS are reported in the sections below on communication and language skills, social competence, and daily living skills.

Autism symptom severity

Two studies (Magiati 2007; Remington 2007) reported autism symptom severity through parent reports using the Autism Diagnostic Interview — Revised (ADI‐R; Lord 1994) and the Autism Screening Questionnaire (ASQ; Berument 1999), respectively. We combined the results of both studies using a random‐effects meta‐analysis of the standardized mean difference (SMD) effect size with small sample correction (Hedges 1985). The SMD effect size on the post‐treatment measurement was −0.34 (95% CI −0.79 to 0.11, P = 0.14; 81 participants; Analysis 1.2; very low‐quality evidence, summary of findings Table for the main comparison). The negative value of the effect size reflects that children in the EIBI group had fewer autism symptoms after treatment than children in the TAU group, although this difference was not statistically significant. We downgraded the quality of evidence due to the inclusion of non‐randomized trials. We assessed heterogeneity using the Q statistic (Q(1) = 0.23, P = 0.63), I² = 0%, and Tau² = 0.00.
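
The relationship between a reported 95% CI and its P value can be checked with a short sketch. It assumes a normal approximation (a two‐sided z test with the standard error recovered from the CI width), which may differ slightly from the review software's calculation; the function name is hypothetical, and the inputs are the SMD and CI reported above for autism symptom severity.

```python
import math

def p_from_ci(estimate, lower, upper, z_crit=1.96):
    """Recover (standard error, z, two-sided P) from an estimate and its 95% CI.

    Assumes the CI was built as estimate +/- z_crit * SE (normal approximation).
    """
    se = (upper - lower) / (2 * z_crit)
    z = estimate / se
    # Two-sided P value from the standard normal distribution.
    p = math.erfc(abs(z) / math.sqrt(2))
    return se, z, p

# SMD and 95% CI as reported for autism symptom severity.
se, z, p = p_from_ci(-0.34, -0.79, 0.11)
print(f"SE = {se:.2f}, z = {z:.2f}, P = {p:.2f}")
```

Because the CI spans zero, the recovered P value exceeds 0.05, in line with the statement that the difference was not statistically significant.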

Adverse effects (deterioration on a primary outcome)

No adverse effects were reported as a result of treatment in any study.

Secondary outcomes

Intelligence

IQ was measured in all five studies at post‐treatment using standardized, norm‐referenced IQ tests (e.g. Bayley Scales of Infant Development — 2nd Edition (Bayley 1993), and Wechsler Preschool and Primary Scale of Intelligence — Revised (Wechsler 1989); normative mean = 100, normative SD = 15); the specific IQ tests used varied across and within studies (see Table 1). We synthesized the data for IQ across all five studies using a random‐effects meta‐analysis using the MD effect size. The mean effect size for difference in IQ between the treatment and comparison groups was 15.44 (95% CI 9.29 to 21.59, P < 0.001; 202 participants; Analysis 1.3; low‐quality evidence, summary of findings Table for the main comparison). In one study, Magiati 2007, the baseline IQ was significantly higher in the EIBI group compared to the TAU group (g = 0.64, 95% CI 0.02 to 1.25; see Characteristics of included studies table). We assessed heterogeneity using the Q statistic (Q(4) = 1.16, P = 0.88), I² = 0%, and Tau² = 0.00. For the two studies reporting follow‐up data, the SMD effect sizes were g = 0.36 (95% CI −0.26 to 0.98) for Remington 2007 and g = 0.18 (95% CI −0.49 to 0.86) for Magiati 2007 (analysis not shown).

Communication and language skills

Participants' daily communication skills were measured in all five studies at post‐treatment using the Communication domain on the VABS (normative mean = 100, normative SD = 15; Sparrow 1984). We synthesized the results of all five studies using a random‐effects meta‐analysis of the MD effect size. The mean effect size for difference in communication skills between treatment and comparison groups was 11.22 (95% CI 5.39 to 17.04, P < 0.001; 201 participants; Analysis 1.4; low‐quality evidence), favoring EIBI over TAU. In one study, Magiati 2007, the baseline scores on the communication subscale of the VABS were significantly higher in the EIBI group compared to the TAU group (g = 0.57, 95% CI −0.08 to 1.22; see Characteristics of included studies). We assessed heterogeneity using the Q statistic (Q(4) = 1.86, P = 0.76), I² = 0%, and Tau² = 0.00.

Four studies (Cohen 2006; Magiati 2007; Remington 2007; Smith 2000) measured the effects of EIBI on expressive and receptive language at post‐treatment using the Reynell Developmental Language Scales (Reynell 1990), which is a standardized, norm‐referenced assessment. The results of the four studies were synthesized in a random‐effects meta‐analysis of the SMD effect size with small sample correction (Hedges 1985). The SMD effect size for difference in expressive language between the treatment and comparison group was 0.51 (95% CI 0.12 to 0.90, P = 0.01; 165 participants; Analysis 1.5; low‐quality evidence), favoring EIBI over TAU. We assessed heterogeneity using the Q statistic (Q(3) = 4.46, P = 0.22), I² = 33%, and Tau² = 0.05. The SMD effect size for difference in receptive language between the treatment and comparison group was 0.55 (95% CI 0.23 to 0.87, P = 0.001; 164 participants; Analysis 1.5; low‐quality evidence), favoring EIBI over TAU. We assessed heterogeneity using the Q statistic (Q(3) = 1.52, P = 0.68), I² = 0%, and Tau² = 0.00. The effects of EIBI on expressive and receptive language skills are shown in summary of findings Table for the main comparison.

Social competence

Participants' daily socialization skills were measured at post‐treatment using the socialization domain on the VABS (normative mean = 100, normative SD = 15; Sparrow 1984). We synthesized the results across all five studies using a random‐effects meta‐analysis of the MD effect size. The MD effect size for difference in social competence between treatment and comparison groups was 6.56 (95% CI 1.52 to 11.61, P = 0.01; 201 participants; Analysis 1.6; low‐quality evidence), favoring EIBI over TAU. We assessed heterogeneity using the Q statistic (Q(4) = 5.25, P = 0.26), I² = 24%, and Tau² = 7.94.

Daily living skills

All five studies reported post‐treatment data on the daily living skills domain of the VABS (normative mean = 100, normative SD = 15; Sparrow 1984). We synthesized the results across studies using the MD effect size. The MD effect size for difference in daily living skills between the treatment and comparison groups was 7.77 (95% CI 3.75 to 11.79, P < 0.001; 201 participants; Analysis 1.7; low‐quality evidence), favoring EIBI over TAU. We assessed heterogeneity using the Q statistic (Q(4) = 1.73, P = 0.79), I² = 0%, and Tau² = 0.00.

Problem behavior

Two studies (Remington 2007; Smith 2000) reported parent‐reported data on children's problem behavior using the Developmental Behavior Checklist (Einfeld 1995) and the Child Behavior Checklist (Achenbach 1991), respectively. We synthesized the data from these two studies using a random‐effects meta‐analysis of SMD effect size with small sample correction (Hedges 1985). The SMD effect size for differences in problem behavior between treatment and comparison groups on the post‐treatment measurement was −0.58 (95% CI −1.24 to 0.07, P = 0.08; 67 participants; Analysis 1.8; very low‐quality evidence, summary of findings Table for the main comparison), indicating no statistical differences between EIBI and TAU. We assessed heterogeneity using the Q statistic (Q(1) = 1.71, P = 0.19), I² = 41%, and Tau² = 0.09.

Academic placement

Two studies provided data pertaining to academic placement (that is, percentage of time spent with typical peers) (Cohen 2006; Smith 2000). Cohen 2006 reported that 17/21 children receiving EIBI (6/17 full inclusion without assistance, 11/17 with paraprofessional support) and 1/21 children receiving TAU were included in general education settings. Smith 2000 reported that 6/15 children receiving EIBI (4/6 full inclusion without assistance, 2/6 partial inclusion with paraprofessional support) and 3/13 children receiving TAU were included in general education settings at post‐treatment. See Analysis 1.9.

Parent stress

One study (Remington 2007) reported data on parental stress using the Questionnaire on Resources and Stress ‐ Short Form (52‐item scale; Friedrich 1983). The results from their study indicated that parents of children receiving EIBI had similar levels of stress compared to parents of children receiving TAU; that is, there was not a statistically significant difference in the levels of stress between parents of children in the treatment and comparison groups at post‐treatment (see Analysis 1.10).

Quality of life

We did not identify any data on parents' or children's quality of life.

Discussion

Summary of main results

We identified five studies which compared the effects of early intensive behavioral intervention (EIBI) to treatment as usual (TAU) in young children with autism spectrum disorders (ASD). One study used a randomized controlled trial (RCT) design (Smith 2000); four studies used a controlled clinical trial (CCT) design (Cohen 2006; Howard 2014; Magiati 2007; Remington 2007). We conducted meta‐analyses using a random‐effects model for the outcomes: adaptive behavior composite, autism symptom severity, IQ, communication and language skills, social competence, and daily living skills. The results provide weak evidence that EIBI improves adaptive behavior and autism symptom severity. Analyses of our secondary outcomes also provide weak evidence that EIBI improves IQ, expressive and receptive language, everyday communication skills, everyday social competence, daily living skills, and problem behavior for this population. We rated the quality of the evidence as low to very low using the GRADE system (Guyatt 2008), which means that more research could very well change the effect estimate and our confidence that it is precise; therefore results should be considered with caution. In addition, four studies used a CCT design and, in three of those studies, there were large differences at baseline between groups. Thus, the results must also be interpreted with caution because of risk of bias.

Overall completeness and applicability of evidence

The number of studies meeting our inclusion criteria was few; more studies examining EIBI for children with ASD were excluded than were included. We found only two RCTs investigating the use of EIBI in young children with ASD, one of which we included in this review and one of which we excluded based on the characteristics of the comparison group (see Characteristics of excluded studies).

Several factors impact the completeness and applicability of the review findings. First, the review relies on four CCTs, three of which showed group imbalance; this limits the internal validity of those studies and makes it difficult to draw firm conclusions about the strength of EIBI. Second, our inclusion criteria relating to the age of the participants (that is, all participants had to be under six years old) limit the generalizability of the results to older children. Although the intervention is generally targeted at young children, there have been additional CCTs reporting positive effects of the intervention in older children. Third, the effects we found may not be generalizable to young children with significant intellectual impairments, as the floor effect of the IQ measures in several of the studies may have limited the accuracy of the sample characterization. Fourth, ASD are variable in their presentation and the diagnostic criteria have changed several times during the periods in which the included studies were conducted. Although the core characteristics have remained the same over each revision to the diagnostic criteria, each study included in this review identified slightly different inclusion criteria related to diagnosis and child characteristics, which impacts the applicability of the evidence. Fifth, the lack of a standardized control group also limits the generalization of results, as TAU conditions varied in intensity, duration, and intervention strategies implemented. Finally, intervention effects related to psychopathology, quality of life (caregiver mental health, classroom placement), and community functioning (participation in community events or activities) were either not included in all studies or were not measured in a standardized way that allowed for meta‐analysis, or both. Outcomes related to these domains are important and will allow for greater generalizability of findings if they can be included in future versions of this review. In order for us to draw more confident conclusions about the effect of EIBI on these outcomes, we need additional research that uses rigorous methods, standardized control groups, and measures that accurately record quality of life and functioning across environments.

Quality of the evidence

We assessed the quality of the evidence, using the GRADE approach, as low for most outcomes; we judged the quality of evidence for autism symptom severity and problem behavior as very low due to the inclusion of only two studies in the meta‐analyses. See Summary of findings table 1; Summary of findings table 2; Summary of findings table 3; Summary of findings table 4; Summary of findings table 5; Summary of findings table 6; Summary of findings table 7. Our assessments of the quality of evidence reflect the use of non‐randomized trials, concerns about risk of bias, imprecision due to small sample sizes, and the inability to rule out publication bias. Given the nature of the intervention, and the selected outcome measures, the risks of performance and detection bias are high. Intervention providers and the children's parents were aware of treatment status, and parental interview or report was the method of collecting data for the two primary outcome measures (adaptive behavior and autism symptom severity). Although the Vineland Adaptive Behavior Scales (VABS) is commonly used and is a standardized measure, parent report is not considered the most reliable method of measurement; this is further compounded because parents were aware of, and in most cases chose, the treatment status. Because of this, the results should be interpreted cautiously. The risk of publication bias is unclear since we included too few studies to enable us to assess this.

Potential biases in the review process

Our decision to include four non‐randomized studies, three of which had group imbalance, increases the risk of bias in this review, as indicated by the low‐quality rating assigned using the GRADE approach (Guyatt 2008). Because adherence and quality of treatment delivery (e.g. treatment fidelity, treatment integrity) are not provided, there is the possibility that certain therapists who delivered the intervention were more skilled than others and thus provided a higher quality of therapy, which increases the potential for performance bias.

We also decided to synthesize data from one RCT with those from four CCTs. There is not currently consensus in the field for when or whether it is appropriate to combine data from RCTs and CCTs in a single synthesis; this lack of consensus should be considered when interpreting our results.

Agreements and disagreements with other studies or reviews

The results of this review are consistent with most meta‐analyses of EIBI (Eldevik 2009; Makrygianni 2010; Reichow 2009; Virues‐Ortega 2010), which show positive effects in favor of EIBI for adaptive behavior and IQ. Our review differs from the one systematic review and meta‐analysis which showed no effect for EIBI (Spreckley 2009). Whilst we excluded the study Sallows 2005, the systematic review by Spreckley et al (Spreckley 2009) included this study, treating the parent‐mediated EIBI group as a control group for their analysis. Our review also differs from previous meta‐analyses due to our selection of adaptive behavior as the primary outcome; all previous reviews used IQ as the primary outcome. Our review extends the knowledge of the effects of EIBI through the inclusion of additional outcomes such as autism severity and language skills.

Figure 1

Study flow diagram.

Figure 2

Risk of bias graph: review authors' judgements about each risk of bias item presented as percentages across all included studies.

Figure 3

Risk of bias summary: review authors' judgements about each risk of bias item for each included study.

Figure 4

Forest plot of comparison: 1 Adaptive behavior, outcome: 1.1 Vineland Adaptive Behavior Scales Composite

Analysis 1.1

Comparison 1 Early intensive behavioral intervention (EIBI) compared to treatment as usual (TAU) for young children with autism spectrum disorders (ASD), Outcome 1 Adaptive behavior.

Analysis 1.2

Comparison 1 Early intensive behavioral intervention (EIBI) compared to treatment as usual (TAU) for young children with autism spectrum disorders (ASD), Outcome 2 Autism symptom severity.

Analysis 1.3

Comparison 1 Early intensive behavioral intervention (EIBI) compared to treatment as usual (TAU) for young children with autism spectrum disorders (ASD), Outcome 3 Intelligence.

Analysis 1.4

Comparison 1 Early intensive behavioral intervention (EIBI) compared to treatment as usual (TAU) for young children with autism spectrum disorders (ASD), Outcome 4 Communication skills.

Analysis 1.5

Comparison 1 Early intensive behavioral intervention (EIBI) compared to treatment as usual (TAU) for young children with autism spectrum disorders (ASD), Outcome 5 Language skills.

Analysis 1.6

Comparison 1 Early intensive behavioral intervention (EIBI) compared to treatment as usual (TAU) for young children with autism spectrum disorders (ASD), Outcome 6 Social competence.

Analysis 1.7

Comparison 1 Early intensive behavioral intervention (EIBI) compared to treatment as usual (TAU) for young children with autism spectrum disorders (ASD), Outcome 7 Daily living skills.

Analysis 1.8

Comparison 1 Early intensive behavioral intervention (EIBI) compared to treatment as usual (TAU) for young children with autism spectrum disorders (ASD), Outcome 8 Problem behavior.

Study	EIBI N	EIBI N for general education with no extra support	EIBI N for general education with support	TAU N	TAU N for general education with no extra support	TAU N for general education with support
Cohen 2006	21	6	11	21	0	1
Smith 2000	15	4	2	13	0	3

Analysis 1.9

Comparison 1 Early intensive behavioral intervention (EIBI) compared to treatment as usual (TAU) for young children with autism spectrum disorders (ASD), Outcome 9 Academic placement.
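
The academic placement counts reported for Analysis 1.9 were not pooled, but they can be turned into simple proportions for descriptive purposes. The following is a minimal sketch, assuming that the two placement columns (general education with and without support) are mutually exclusive; the names `PLACEMENT` and `general_education_share` are illustrative and do not come from the review.

```python
# Counts copied from the academic placement table for Analysis 1.9.
# Tuple layout: (group N, N in general education with no extra support,
#                N in general education with support).
PLACEMENT = {
    "Cohen 2006": {"EIBI": (21, 6, 11), "TAU": (21, 0, 1)},
    "Smith 2000": {"EIBI": (15, 4, 2), "TAU": (13, 0, 3)},
}


def general_education_share(n_total, n_no_support, n_with_support):
    """Proportion of a group placed in general education (with or without support).

    Assumes the two placement categories do not overlap.
    """
    return (n_no_support + n_with_support) / n_total


if __name__ == "__main__":
    for study, groups in PLACEMENT.items():
        for group, counts in groups.items():
            share = general_education_share(*counts)
            print(f"{study} {group}: {share:.0%} in general education")
```

These proportions are descriptive only; they do not adjust for baseline differences between groups and are not a substitute for a pooled analysis.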

Analysis 1.10

Comparison 1 Early intensive behavioral intervention (EIBI) compared to treatment as usual (TAU) for young children with autism spectrum disorders (ASD), Outcome 10 Parent stress.

Summary of findings for the main comparison. Early intensive behavioral intervention (EIBI) for young children with autism spectrum disorders (ASD)

Early intensive behavioral intervention (EIBI) for young children with autism spectrum disorders (ASD)
Patient or population: young children (less than six years old) with autism spectrum disorders (ASD)
Settings: family's homes
Intervention: early intensive behavioral intervention (EIBI)
Comparison: treatment as usual (TAU)
Outcomes	Illustrative comparative risks* (95% CI)		Relative effect (95% CI)	Number of participants (studies)	Quality of the evidence (GRADE)	Comments
	Assumed risk	Corresponding risk
	TAU	EIBI
Adaptive behavior Measured by: Vineland Adaptive Behavior Scales (parent‐reported scale; mean = 100 (SD = 15); higher score equates to better outcomes) Follow‐up: 2 to 3 years	The mean adaptive behavior score ranged across control groups from 48.60 points to 67.10 points	The mean adaptive behavior score in the intervention groups was, on average, 9.58 points higher (5.57 points higher to 13.6 points higher)	‐	202 (5 studies)	⊕⊕⊝⊝ Low^1,2,3	‐
Autism symptom severity Measured by: parent‐reported autism symptoms on standardised autism screening and diagnostic instruments (lower scores indicate less severe autism symptoms) Follow‐up: 2 years	‐	The mean autism symptom severity score in the intervention groups was 0.34 standard deviations lower (0.79 standard deviations lower to 0.11 standard deviations higher)	‐	81 (2 studies)	⊕⊝⊝⊝ Very low^3,4	General guidelines for the magnitude of an effect suggest that effect sizes of 0.20 to 0.50 are considered to have a small effect, effect sizes of 0.50 to 0.80 are considered to have a medium effect, and effect sizes greater than 0.80 are considered to have a large effect (Cohen 1988)
Adverse effects Measured by: worsening of adaptive behavior or autism symptom severity Follow‐up: 2 to 3 years	No adverse events were reported in any study
Intelligence Measured by: standardized IQ tests (mean = 100 (SD = 15); higher scores indicate higher IQ) Follow‐up: 2 to 3 years	The mean IQ score ranged across control groups from 49.67 points to 73.20 points	The mean IQ score in the intervention groups was, on average, 15.44 points higher (9.29 points higher to 21.59 points higher)	‐	202 (5 studies)	⊕⊕⊝⊝ Low^1,2,3	‐
Communication and language skills: expressive language Measured by: standardized measures of expressive language (higher scores indicate better expressive language skills) Follow‐up: 2 to 3 years	‐	The mean expressive language score in the intervention groups was 0.51 standard deviations higher (0.12 standard deviations higher to 0.90 standard deviations higher)	‐	165 (4 studies)	⊕⊕⊝⊝ Low^1,3,5	General guidelines for the magnitude of an effect suggest that effect sizes of 0.20 to 0.50 are considered to have a small effect, effect sizes of 0.50 to 0.80 are considered to have a medium effect, and effect sizes greater than 0.80 are considered to have a large effect (Cohen 1988)
Communication and language skills: receptive language Measured by: standardized measures of receptive language (higher scores indicate better receptive language skills) Follow‐up: 2 to 3 years	‐	The mean receptive language score in the intervention groups was 0.55 standard deviations higher (0.23 standard deviations higher to 0.87 standard deviations higher)	‐	164 (4 studies)	⊕⊕⊝⊝ Low^1,3,5	General guidelines for the magnitude of an effect suggest that effect sizes of 0.20 to 0.50 are considered to have a small effect, effect sizes of 0.50 to 0.80 are considered to have a medium effect, and effect sizes greater than 0.80 are considered to have a large effect (Cohen 1988)
Problem behavior Measured by: standardized parent‐report measures and checklists (lower scores indicate lower levels or less severe problem behavior) Follow‐up: 2 to 3 years	‐	The mean problem behavior score in the intervention groups was 0.58 standard deviations lower (1.24 standard deviations lower to 0.07 standard deviations higher)	‐	67 (2 studies)	⊕⊝⊝⊝ Very low^3,6	General guidelines for the magnitude of an effect suggest that effect sizes of 0.20 to 0.50 are considered to have a small effect, effect sizes of 0.50 to 0.80 are considered to have a medium effect, and effect sizes greater than 0.80 are considered to have a large effect (Cohen 1988)
*The basis for the assumed risk (e.g. the median control group risk across studies) is provided in footnotes. The corresponding risk (and its 95% CI) is based on the assumed risk in the comparison group and the relative effect of the intervention (and its 95% CI).
ASD: autism spectrum disorders; CCT: controlled clinical trial; CI: confidence interval; EIBI: early intensive behavioral intervention; IQ: intelligence quotient; RCT: randomized controlled trial
GRADE Working Group grades of evidence
High quality: Further research is very unlikely to change our confidence in the estimate of effect.
Moderate quality: Further research is likely to have an important impact on our confidence in the estimate of effect and may change the estimate.
Low quality: Further research is very likely to have an important impact on our confidence in the estimate of effect and is likely to change the estimate.
Very low quality: We are very uncertain about the estimate.
¹One study was conducted using an RCT design (Smith 2000) and four studies were conducted using a CCT design (Cohen 2006; Howard 2014; Magiati 2007; Remington 2007). Quality of evidence rating downgraded two levels due to inclusion of non‐randomized studies and associated risks of bias.
²Outcome collected in four of five studies by assessors who were blind to treatment status of participants.
³The small number of included studies precluded examination of a funnel plot, so we cannot exclude the potential for publication bias.
⁴Both studies were conducted using a CCT design (Magiati 2007; Remington 2007). Quality of evidence rating downgraded three levels due to inclusion of non‐randomized studies, associated risks of bias, and small number of included studies.
⁵Outcomes collected in three of the four studies by assessors who were blind to treatment status of participants.
⁶One study was conducted using an RCT design (Smith 2000) and one study was conducted using a CCT design (Remington 2007). Quality of evidence rating downgraded three levels due to inclusion of non‐randomized studies and associated risks of bias and a small number of included studies.
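
The effect size guidelines quoted in the Comments column (Cohen 1988) can be applied to the standardized mean differences in the table. The sketch below is illustrative only: the function name `classify_smd` and the "below small" label are assumptions, and because the quoted ranges share their boundary values, the code assumes that 0.50 and 0.80 both count as medium.

```python
# Standardized mean differences (point estimates) copied from the table above.
SMD_ESTIMATES = {
    "Autism symptom severity": -0.34,
    "Expressive language": 0.51,
    "Receptive language": 0.55,
    "Problem behavior": -0.58,
}


def classify_smd(smd):
    """Label the magnitude of a standardized mean difference.

    Thresholds follow the guidelines quoted in the table (Cohen 1988).
    Boundary handling (0.50 and 0.80 treated as medium) and the
    'below small' label are assumptions made for this sketch.
    """
    magnitude = abs(smd)  # the sign gives direction, not size
    if magnitude > 0.80:
        return "large"
    if magnitude >= 0.50:
        return "medium"
    if magnitude >= 0.20:
        return "small"
    return "below small"


if __name__ == "__main__":
    for outcome, smd in SMD_ESTIMATES.items():
        print(f"{outcome}: SMD {smd:+.2f} -> {classify_smd(smd)}")
```

The label describes the point estimate only. The confidence intervals for autism symptom severity and problem behavior include zero, so those labels should be read with particular caution.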

Table 1. Outcome assessments and time points measured by studies

			Treatment Groups		Comparison Groups
Study	Outcomes		Pre‐Treatment	Post‐Treatment	Pre‐Treatment	Post‐Treatment
Cohen 2006	Primary	Adaptive behavior	VABS composite	VABS composite	VABS composite	VABS composite
	Primary	Autism severity	NA	NA	NA	NA
	Secondary	IQ	BSID‐II; WPPSI‐R	BSID‐II; WPPSI‐R	BSID; WPPSI‐R	BSID‐II; WPPSI‐R
		Non‐verbal IQ	MPS	MPS	MPS	MPS
		Non‐verbal social communication	NA	NA	NA	NA
		Expressive communication	RDLS	RDLS	RDLS	RDLS
		Receptive communication	RDLS	RDLS	RDLS	RDLS
		Play	NA	NA	NA	NA
		Social competence	VABS socialization domain	VABS socialization domain	VABS socialization domain	VABS socialization domain
		Daily living skills	VABS daily living skills domain	VABS daily living skills domain	VABS daily living skills domain	VABS daily living skills domain
		Academic achievement	NA	NA	NA	NA
		Problem behavior	NA	NA	NA	NA
		Parent stress	NA	NA	NA	NA
		Academic placement	NA	Class placement	NA	Class placement
		Quality of life	NA	NA	NA	NA
Howard 2014	Primary	Adaptive behavior	VABS composite; Denver; DP‐II; RIDES	VABS composite; Denver; DP‐II; RIDES	VABS composite	VABS composite
	Primary	Autism severity	# of DSM‐IV criteria (APA 1994)	NA	# of DSM‐IV criteria	NA
	Secondary	IQ	WPPSI‐R; BSID‐II; S‐B; DAYC; PEP‐R; DAS; DP‐II	WPPSI‐R, BSID‐II, S‐B; DAYC, PEP‐R, DAS	WPPSI‐R, BSID‐II, S‐B; DAS	WPPSI‐R, BSID‐II, S‐B; DAS
		Non‐verbal IQ	MPS; S‐B	MPS; S‐B; Leiter‐R	MPS; S‐B	MPS; S‐B; Leiter‐R
		Non‐verbal social communication	NA	NA	NA	NA
		Expressive communication	RDLS; ITLS; REEL‐R; PLS‐3; ITDA; EVT; DP‐II	RDLS; ITLS; REEL‐R; PLS‐3; ITDA; EVT; EOWPVT	RDLS; ITLS; REEL‐R; PLS‐3; ITDA; EVT; DP‐II	RDLS; ITLS; REEL‐R; PLS‐3; ITDA; EVT; EOWPVT
		Receptive communication	RDLS; ITLS; REEL‐R; PLS‐3; ITDA; PPVT‐III; DP‐II	RDLS; ITLS; REEL‐R; PLS‐3; PPVT‐III; ROWPVT; ITDA‐1	RDLS; ITLS; REEL‐R; PLS‐3; PPVT‐III; DP‐II; ITDA‐1	RDLS; ITLS; REEL‐R; PLS‐3; PPVT‐III, ROWPVT; ITDA‐1
		Play	NA	NA	NA	NA
		Social competence	VABS socialization domain	VABS socialization domain	VABS socialization domain	VABS socialization domain
		Daily living skills	VABS daily living skills domain	VABS daily living skills domain	VABS daily living skills domain	VABS daily living skills domain
		Academic achievement	NA	NA	NA	NA
		Problem behavior	NA	NA	NA	NA
		Parent stress	NA	NA	NA	NA
		Academic placement	NA	NA	NA	NA
		Quality of life	NA	NA	NA	NA
Magiati 2007	Primary	Adaptive behavior	VABS composite	VABS composite	VABS composite	VABS composite
	Primary	Autism severity	ADI‐R	ADI‐R	ADI‐R	ADI‐R
	Secondary	IQ	WPPSI‐R; BSID‐R; MPS	WPPSI‐R; BSID‐R; MPS	WPPSI‐R; BSID‐R; MPS	WPPSI‐R; BSID‐R; MPS
		Non‐verbal IQ	NA	NA	NA	NA
		Non‐verbal social communication	NA	NA	NA	NA
		Expressive communication	EOWPVT‐R	EOWPVT‐R	EOWPVT‐R	EOWPVT‐R
		Receptive communication	BPVS‐II	BPVS‐II	BPVS‐II	BPVS‐II
		Play	SPT‐II	SPT‐II	SPT‐II	SPT‐II
		Social competence	VABS socialization domain	VABS socialization domain	VABS socialization domain	VABS socialization domain
		Daily living skills	VABS daily living skills domain	VABS daily living skills domain	VABS daily living skills domain	VABS daily living skills domain
		Academic achievement	NA	NA	NA	NA
		Problem behavior	NA	NA	NA	NA
		Parent stress	NA	NA	NA	NA
		Academic placement	NA	NA	NA	NA
		Quality of life	NA	NA	NA	NA
Remington 2007	Primary	Adaptive behavior	VABS composite	VABS composite	VABS composite	VABS composite
	Primary	Autism severity	ASQ	ASQ	ASQ	ASQ
	Secondary	IQ	BSID‐R; S‐B	BSID‐R; S‐B	BSID‐R; S‐B	BSID‐R; S‐B
		Non‐verbal IQ	NA	NA	NA	NA
		Non‐verbal social communication	ESCS	ESCS	ESCS	ESCS
		Expressive communication	RDLS	RDLS	RDLS	RDLS
		Receptive communication	RDLS	RDLS	RDLS	RDLS
		Play	NA	NA	NA	NA
		Social competence	VABS socialization domain	VABS socialization domain	VABS socialization domain	VABS socialization domain
		Daily living skills	VABS daily living skills domain	VABS daily living skills domain	VABS daily living skills domain	VABS daily living skills domain
		Academic achievement	NA	NA	NA	NA
		Problem behavior	DBC	DBC	DBC	DBC
		Parent stress	QRS‐F parent and family problems subscale	QRS‐F parent and family problems subscale	QRS‐F parent and family problems subscale	QRS‐F parent and family problems subscale
		Academic placement	NA	NA	NA	NA
		Quality of life	NA	NA	NA	NA
Smith 2000	Primary	Adaptive behavior	VABS composite	VABS composite	VABS composite	VABS composite
	Primary	Autism severity	NA	NA	NA	NA
	Secondary	IQ	BSID‐R; S‐B	BSID‐R; S‐B	BSID‐R; S‐B	BSID‐R; S‐B
		Non‐verbal IQ	MPS	MPS	MPS	MPS
		Non‐verbal social communication	NA	NA	NA	NA
		Expressive communication	RDLS	RDLS	RDLS	RDLS
		Receptive communication	RDLS	RDLS	RDLS	RDLS
		Play	NA	NA	NA	NA
		Social competence	VABS socialization domain	VABS socialization domain	VABS socialization domain	VABS socialization domain
		Daily living skills	VABS daily living skills domain	VABS daily living skills domain	VABS daily living skills domain	VABS daily living skills domain
		Academic achievement	WIAT; ELM	WIAT	WIAT	WIAT
		Problem behavior	CBCL	CBCL	CBCL	CBCL
		Parent stress	NA	NA	NA	NA
		Academic placement	Class placement	Class placement	Class placement	Class placement
		Quality of life	NA	NA	NA	NA
ADI‐R: Autism Diagnostic Interview ‐ Revised (Lord 1994)
ASQ: Autism Screening Questionnaire (Berument 1999)
BPVS‐II: British Picture Vocabulary Scale ‐ 2nd Edition (Dunn 1997b)
BSID‐II: Bayley Scales of Infant Development ‐ 2nd Edition (Bayley 1993)
CBCL: Child Behavior Checklist (Achenbach 1991)
DAS: Differential Ability Scales (Elliot 1990)
DAYC: Developmental Assessment of Young Children (Voress 1998)
DBC: Developmental Behavior Checklist (Einfeld 1995)
Denver: Denver Developmental Screening Test (Frankenbrug 1992)
DP‐II: Developmental Profile ‐ 2nd Edition (Alpern 1986)
DSM‐IV: Diagnostic and Statistical Manual of Mental Disorders ‐ 4th Edition (APA 1994)
ELM: Early Learning Measure (Smith 1995)
EOWPVT: Expressive One‐Word Picture Vocabulary Test (Brownell 2000a)
EOWPVT‐R: Expressive One‐Word Picture Vocabulary Test ‐ Revised (Gardner 1990)
ESCS: Early Social Communication Scales (Mundy 1996)
EVT: Expressive Vocabulary Test (Williams 1997)
ITDA: Infant‐Toddler Developmental Assessment (Provence 1985)
ITLS: Infant‐Toddler Language Scale (Rosetti 1990)
IQ: intelligence quotient
Leiter‐R: Leiter International Performance Scale ‐ Revised (Roid 1997)
MPS: Merrill‐Palmer Scale of Mental Tests (Stutsman 1948)
NA: not assessed
NCBRF: Nisonger Child Behavior Rating Form (Tasse 1996)
PEP‐R: Psychoeducational Profile ‐ Revised (Schopler 1990)
PLS‐3: Preschool Language Scale ‐ 3rd Edition (Zimmerman 1992)
PPVT‐III: Peabody Picture Vocabulary Test ‐ 3rd Edition (Dunn 1997a)
QRS‐F: Questionnaire on Resources and Stress‐Friedrich, Short Form (Friedrich 1983)
RDLS: Reynell Developmental Language Scales (Reynell 1990)
ROWPVT: Receptive One‐Word Picture Vocabulary Test (Brownell 2000b)
REEL‐R: Receptive Expressive Emergent Language scales ‐ Revised (Bzoch 1991)
RIDES: Rockford Infant Developmental Evaluation Scales (Project RHISE 1979)
S‐B: Stanford‐Binet Intelligence Scale ‐ 4th Edition (Thorndike 1986)
SPT‐II: Symbolic Play Test ‐ 2nd Edition (Lowe 1988)
VABS: Vineland Adaptive Behavior Scales (Sparrow 1984)
WIAT: Wechsler Individual Achievement Test (Weschler 1992)
WPPSI‐R: Wechsler Preschool and Primary Scale of Intelligence ‐ Revised (Wechsler 1989)

Table 2. Assessment of risk of bias

'Risk of bias' item	Question	How risk of bias was assessed
Sequence generation	Was the sequence generation method used adequate?	We judged the risk of bias as follows: 'low' ‐ when participants were allocated to treatment conditions using randomization such as computer‐generated random numbers, a random numbers table, or coin‐tossing; 'unclear' ‐ when the randomization method was not clearly stated or unknown; or 'high' ‐ when randomization did not use any of the above methods.
Allocation concealment	Was allocation adequately concealed?	We judged the risk of bias as follows: 'low' ‐ when participants and researchers were unaware of participants' future allocation to treatment condition until after decisions about eligibility were made and informed consent was obtained; 'unclear' ‐ when allocation concealment was not clearly stated or unknown; or 'high' ‐ when allocation was not concealed from either participants before informed consent or from researchers before decisions about inclusion were made, or allocation concealment was not used.
Blinding of participants and personnel	Were participants and personnel blind to which participants were in the treatment group?	We judged the risk of bias as follows: 'low' ‐ when blinding of participants and key personnel was ensured; 'unclear' ‐ when blinding of participants and key personnel was not reported; or 'high' ‐ when there was no or incomplete blinding of participants and key personnel or blinding of participants and key personnel was attempted but likely to have been broken.
Blinding of outcome assessment	Were outcome assessors blind to which participants were in the treatment group?	We judged the risk of bias as follows: 'low' ‐ when blinding of outcome assessment was ensured; 'unclear' ‐ when there was not adequate information provided in the study report to determine blinding of outcome assessment, or blinding of outcome assessment was not addressed; or 'high' ‐ when blinding of outcome assessment was not ensured.
Incomplete outcome data	Did the trial authors deal adequately with missing data?	We judged the risk of bias as follows: 'low' ‐ when the number of participants randomized to groups is clear and it is clear that all participants completed the trials; 'unclear' ‐ when information about which participants completed the study could not be acquired by contacting the researchers of the study; or 'high' ‐ when there was clear evidence that there was attrition or exclusion from analysis in at least one participant group that was likely related to the true outcome.
Selective outcome reporting	Did the authors of the trial omit to report on any of their outcomes?	We judged the risk of bias as follows: 'low' ‐ when it is clear that the published report includes all expected outcomes; 'unclear' ‐ when it is not clear whether other data were collected and not reported; or 'high' ‐ when the data from one or more expected outcomes were missing.
Protection against contamination	Could the control group also have received the intervention?	We judged the risk of bias as follows: 'low' ‐ when allocation was by community, institution or school, and it is unlikely that the control group received the intervention; 'unclear' ‐ when professionals were allocated within a clinic or school and it is possible that the communication between intervention and control professionals could have occurred; or 'high' ‐ when it is likely that the control group received part of the intervention.
Baseline measurements	Were the intervention and control groups similar at baseline for chronological age, IQ, adaptive behavior skills, and communication skills?	We judged the risk of bias as follows: 'low' ‐ when participant performance on outcomes was measured prior to the intervention and no important differences were present across study groups; 'unclear' ‐ when no baseline measures of outcome were reported or it was difficult to determine if baseline measures were substantially different across study groups; or 'high' ‐ when important differences were present and were likely to undermine any post‐intervention difference.
Other potential sources of bias	Through assessment, we determined whether any other source of bias was present in the trial, such as changing methods during the trial, or other anomalies.	We judged the risk of bias as follows: 'low' ‐ when no other sources of bias were detected; 'unclear' ‐ when additional sources of bias were suspected but could not be confirmed; or 'high' ‐ when other sources of bias were clearly present and likely to contribute to post‐intervention differences.
IQ: intelligence quotient

Table 3. Additional methods that were not used

Analysis	Description of method	Reason not used
Measurement of treatment effect	Continuous data If outcomes are measured on a consistent scale across studies, we will calculate the effect of each study using the mean difference effect size.	As we needed to use the standardized mean difference (SMD) across most outcomes, we decided to report all effect sizes using the SMD effect size.
Measurement of treatment effect	Dichotomous data If we locate dichotomous data, we will calculate a risk ratio with a 95% confidence interval for each outcome in each trial (Deeks 2017).	We did not locate dichotomous data.
Unit of analysis issues	Cluster‐randomized trials If we locate cluster‐randomized trials, we will analyze them in accordance with the methods outlined in the Cochrane Handbook for Systematic Reviews of Interventions (Higgins 2011, 16.3).	We did not find cluster‐randomized trials.
Unit of analysis issues	Multiple treatment groups If we locate data from studies with multiple treatment groups, we will analyze each intervention group separately by dividing the sample size for the common comparator groups proportionately across each comparison (Higgins 2011, 16.5.5).
Assessment of reporting bias	If we identify 10 or more studies, we will draw funnel plots (estimated differences in treatment effects against their standard error). Asymmetry could be due to publication bias, but could also be due to a real relation between trial and effect size, such as when larger trials have lower compliance and compliance is positively related to effect size (Sterne 2011). If we find such a relation, we will examine clinical variation between the studies (Sterne 2011, 10.4). As a direct test for publication bias, we will conduct sensitivity analyses to compare the results from published data with data from other sources. We will do a funnel plot in an update of the review if enough additional trials are located.	We did not locate enough studies to assess reporting bias.
Subgroup analyses	If we locate enough trials, we will examine possible clinical and methodological heterogeneity using subgroup analyses. The possible subgroups that we will examine, if present, are: intervention density (intensity) and duration; type of comparison group (for example, home‐based TAU, school‐based TAU, no treatment control), and pre‐treatment participant characteristics (for example, chronological age, symptom severity, IQ, communicative ability, and level of adaptive behavior).	We did not conduct subgroup analyses due to the small number of included trials.
Sensitivity analyses	If we locate enough trials, we will explore the impact of studies with high risk of bias on the robustness of the results of the review in sensitivity analyses by removing studies with a high risk of bias on baseline measurements and blinding of outcome assessment, and reanalyzing the remaining studies to determine whether these factors affected the results.	We did not conduct sensitivity analyses due to the small number of included trials.
CCTs: controlled clinical trials; CI: confidence interval; IQ: intelligence quotient; TAU: treatment as usual

Comparison 1. Early intensive behavioral intervention (EIBI) compared to treatment as usual (TAU) for young children with autism spectrum disorders (ASD)

Outcome or subgroup title	No. of studies	No. of participants	Statistical method	Effect size
1 Adaptive behavior	5	202	Mean Difference (IV, Random, 95% CI)	9.58 [5.57, 13.60]
2 Autism symptom severity	2	81	Std. Mean Difference (IV, Random, 95% CI)	‐0.34 [‐0.79, 0.11]
3 Intelligence	5	202	Mean Difference (IV, Random, 95% CI)	15.44 [9.29, 21.59]
4 Communication skills	5	201	Mean Difference (IV, Random, 95% CI)	11.22 [5.39, 17.04]
5 Language skills	4		Std. Mean Difference (IV, Random, 95% CI)	Subtotals only
5.1 Expressive language	4	165	Std. Mean Difference (IV, Random, 95% CI)	0.51 [0.12, 0.90]
5.2 Receptive language	4	164	Std. Mean Difference (IV, Random, 95% CI)	0.55 [0.23, 0.87]
6 Social competence	5	201	Mean Difference (IV, Random, 95% CI)	6.56 [1.52, 11.61]
7 Daily living skills	5	201	Mean Difference (IV, Random, 95% CI)	7.77 [3.75, 11.79]
8 Problem behavior	2	67	Std. Mean Difference (IV, Random, 95% CI)	‐0.58 [‐1.24, 0.07]
9 Academic placement			Other data	No numeric data
10 Parent stress	1		Std. Mean Difference (IV, Random, 95% CI)	Totals not selected
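
The pooled estimates in this table can be read more easily by checking which 95% confidence intervals exclude zero and what standard error each interval implies. The following is a rough sketch under stated assumptions: it treats each interval as a symmetric normal‐approximation interval (estimate ± 1.96 × SE), which may differ slightly from the exact method used by the review software, and the names `POOLED`, `approx_se`, and `ci_excludes_zero` are illustrative.

```python
# Pooled estimates and 95% CIs copied from the Comparison 1 table:
# (estimate, lower bound, upper bound). MD = mean difference,
# SMD = standardized mean difference.
POOLED = {
    "Adaptive behavior (MD)": (9.58, 5.57, 13.60),
    "Autism symptom severity (SMD)": (-0.34, -0.79, 0.11),
    "Intelligence (MD)": (15.44, 9.29, 21.59),
    "Communication skills (MD)": (11.22, 5.39, 17.04),
    "Expressive language (SMD)": (0.51, 0.12, 0.90),
    "Receptive language (SMD)": (0.55, 0.23, 0.87),
    "Social competence (MD)": (6.56, 1.52, 11.61),
    "Daily living skills (MD)": (7.77, 3.75, 11.79),
    "Problem behavior (SMD)": (-0.58, -1.24, 0.07),
}

Z_95 = 1.96  # assumed normal critical value for a two-sided 95% interval


def approx_se(lower, upper):
    """Standard error implied by a symmetric 95% CI (normal approximation)."""
    return (upper - lower) / (2 * Z_95)


def ci_excludes_zero(lower, upper):
    """True when the whole interval lies on one side of zero."""
    return lower > 0 or upper < 0


if __name__ == "__main__":
    for outcome, (estimate, lower, upper) in POOLED.items():
        flag = "excludes zero" if ci_excludes_zero(lower, upper) else "includes zero"
        print(f"{outcome}: {estimate:+.2f}, SE ~ {approx_se(lower, upper):.2f}, CI {flag}")
```

Under these assumptions, the intervals for autism symptom severity and problem behavior include zero, which matches the very low quality ratings given to those two outcomes.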
