
Child protection training for professionals to improve reporting of child abuse and neglect


Background

Many countries require professionals who work with children to report known or suspected cases of significant child abuse and neglect to statutory child protection or safeguarding authorities. Globally, millions of professionals perform these roles, and many more will do so in the future. Ensuring they are trained to report child abuse and neglect is a key priority for countries and organisations if efforts to address violence against children are to succeed.

Objectives

To assess the effectiveness of training aimed at improving reporting of child abuse and neglect by professionals, and to investigate possible components of effective training interventions.

Search methods

We searched CENTRAL, MEDLINE, Embase, 18 other databases, and one trials register up to 4 June 2021. We also handsearched reference lists, selected journals, and websites, and circulated a request for studies to researchers via an email discussion list.

Selection criteria

All randomised controlled trials (RCTs), quasi-RCTs, and controlled before-and-after studies examining the effects of training interventions for qualified professionals (e.g. teachers, childcare professionals, doctors, nurses, and mental health professionals) to improve reporting of child abuse and neglect, compared with no training, waitlist control, or alternative training (not related to child abuse and neglect).

Data collection and analysis

We used the methodological procedures described in the Cochrane Handbook for Systematic Reviews of Interventions. Where possible, we summarised the effects of training in a meta-analysis. We also summarised results for the primary outcomes (number of reported cases of child abuse and neglect, quality of reported cases, adverse events) and the secondary outcomes (knowledge, skills, and attitudes towards the duty to report). We used GRADE to rate the certainty of the evidence.
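The effect estimates reported below are standardised mean differences (SMDs) with 95% confidence intervals, pooled by inverse-variance weighting where meta-analysis was possible. As an illustration only (this is not the review's analysis code, and the function names are ours; Cochrane reviews typically use Hedges' g rather than the plain Cohen's d shown here), an SMD and a fixed-effect pooled estimate can be sketched as:

```python
import math

def smd_with_ci(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Standardised mean difference (Cohen's d) with an approximate 95% CI."""
    # Pooled standard deviation across the two groups
    sp = math.sqrt(((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / (n_t + n_c - 2))
    d = (mean_t - mean_c) / sp
    # Large-sample standard error of d
    se = math.sqrt((n_t + n_c) / (n_t * n_c) + d**2 / (2 * (n_t + n_c)))
    return d, d - 1.96 * se, d + 1.96 * se

def pooled_smd(effects):
    """Fixed-effect inverse-variance pooling of (smd, se) pairs."""
    weights = [1 / se**2 for _, se in effects]
    pooled = sum(w * d for (d, _), w in zip(effects, weights)) / sum(weights)
    se_pooled = math.sqrt(1 / sum(weights))
    return pooled, pooled - 1.96 * se_pooled, pooled + 1.96 * se_pooled
```

Each study contributes its SMD weighted by the inverse of its variance, so larger, more precise studies dominate the pooled estimate.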

Main results

We included 11 trials (1484 participants) and used data from nine of the 11 trials in the quantitative synthesis. The trials were conducted in high-income countries (the USA, Canada, and the Netherlands) with qualified professionals. In eight of the 11 trials, the interventions were delivered in face-to-face workshops or seminars, and in three trials they were delivered as self-paced distance-learning modules. The interventions were developed by experts and delivered by specialist facilitators, content-area experts, or interdisciplinary teams. Only three of the 11 included studies were conducted in the last 10 years.

Primary outcomes

Three studies measured the number of cases of child abuse and neglect via participants' self-reports of actual cases reported, three months after training. Results from one study (42 participants) favoured the intervention over waitlist, but the evidence is very uncertain (standardised mean difference (SMD) 0.81, 95% confidence interval (CI) 0.18 to 1.43; very low-certainty evidence).

Three studies measured the number of cases of child abuse and neglect via participants' responses to hypothetical case vignettes immediately after training. A meta-analysis of two studies (87 participants) favoured training over no training or waitlist for training, but the evidence is very uncertain (SMD 1.81, 95% CI 1.30 to 2.32; very low-certainty evidence).

We identified no studies that measured the number of cases of child abuse and neglect via official records of reports made to child protection authorities, or adverse effects of training.

Secondary outcomes

Four studies measured professionals' knowledge of reporting duty, processes, and procedures postintervention. Results from one study (744 participants) may favour the intervention over waitlist for training (SMD 1.06, 95% CI 0.90 to 1.21; low-certainty evidence).

Four studies measured professionals' knowledge of core concepts in all forms of child abuse and neglect postintervention. A meta-analysis of two studies (154 participants) favoured training over no training, but the evidence is very uncertain (SMD 0.68, 95% CI 0.35 to 1.01; very low-certainty evidence).

Three studies measured professionals' knowledge of core concepts in child sexual abuse postintervention. A meta-analysis of these three studies (238 participants) favoured training over no training or waitlist for training, but the evidence is very uncertain (SMD 1.44, 95% CI 0.43 to 2.45; very low-certainty evidence).

One study (25 participants) measured professionals' skill in distinguishing reportable and non-reportable cases postintervention. Results favoured the intervention over no training, but the evidence is very uncertain (SMD 0.94, 95% CI 0.11 to 1.77; very low-certainty evidence).

Two studies measured professionals' attitudes towards the duty to report child abuse and neglect postintervention. Results from one study (741 participants) favoured the intervention over waitlist, but the evidence is very uncertain (SMD 0.61, 95% CI 0.47 to 0.76; very low-certainty evidence).

Authors' conclusions

The studies included in this review suggest there may be improvements in training outcomes for professionals exposed to training compared with those not exposed. However, the evidence is very uncertain. We rated the certainty of the evidence as low to very low, downgrading due to limitations in study design and reporting. Our conclusions are based on a small number of mostly older studies confined to single professional groups. Whether similar effects would be seen across a wider range of professionals remains unknown. Given the many professional groups with a duty to report, we strongly recommend further research to assess the effectiveness of training interventions with a wider range of child-serving professionals. Larger trials are needed that use appropriate methods for group allocation, together with statistical methods that account for training being delivered to professionals in workplace groups.


Child protection training for professionals to improve reporting of child abuse and neglect

Key messages

‐ Due to a lack of robust evidence, it is unclear whether child protection training is better than no training or alternative training (e.g. cultural sensitivity training) for improving reporting of child abuse and neglect by professionals.

‐ Larger, well-designed studies are needed to assess the effects of training with a wider range of professional groups.

‐ Future research should compare face-to-face training interventions with distance-learning interventions.

Why improve reporting of child abuse and neglect?

Child abuse and neglect cause significant harm to children, families, and communities. The most serious consequence is the death of the child, but other consequences include physical injuries, mental health problems, alcohol and drug use, and problems at school and in employment. Many professional groups, such as teachers, nurses, doctors, and police, are required by law or by organisational policy to report known or suspected cases of child abuse and neglect to statutory child protection authorities. To prepare them to report, a variety of training interventions have been developed and used. These can vary in duration, format, and delivery methods. For example, they might aim to increase knowledge and awareness of the indicators of child abuse and neglect, the nature of the reporting duty and procedures, and attitudes towards the duty to report. Such training is usually undertaken after qualification, as a form of continuing professional development. However, little is known about whether training works, either in improving reporting of child abuse and neglect overall, across different types of professionals, or for different types of abuse.

What did we want to find out?

We wanted to find out:

‐ whether child protection training improves professionals' reporting of child abuse and neglect;

‐ which components of effective training help professionals to report child abuse and neglect; and

‐ whether training leads to any unintended effects.

What did we do?

We first searched for studies that compared:

‐ child protection training with no training or with a waitlist control (those placed on a waiting list to receive the training at a later date); and

‐ child protection training with alternative training (not related to child abuse and neglect, e.g. cultural sensitivity training).

We compared and summarised the results of the studies, and rated our confidence in the evidence based on factors such as study methods and sizes.

What did we find?

We found 11 studies involving 1484 people. Study sizes ranged from 30 to 765 participants. Nine studies took place in the USA, one in Canada, and one in the Netherlands. The studies tested different types of training interventions. Some were face-to-face workshops, ranging from a single two-hour workshop to six 90-minute seminars delivered over one month; others were self-paced distance-learning interventions. The training was developed by experts and delivered by specialist facilitators, content-area experts, or interdisciplinary teams. Nine studies received external funding: five from federal government agencies, two from a university and a philanthropic organisation, one from the philanthropic arm of an international technology company, and one from a non-governmental organisation (a developer of training interventions).

Main results

It is unclear whether child protection training has any effect on:

‐ the number of reported cases of child abuse and neglect (one study, 42 participants); or

‐ the number of cases reported based on hypothetical cases of child abuse and neglect (two studies, 87 participants).

From the available information, we could not answer whether training has an effect on the number of official reports recorded by child protection authorities, or on the quality of those reports; nor whether training has any unintended effects.

Child protection training may increase professionals' knowledge of reporting duty, processes, and procedures (one study, 744 participants). However, it is unclear whether training has an effect on:

‐ professionals' knowledge of core concepts in child abuse and neglect generally (two studies, 154 participants);

‐ professionals' knowledge of core concepts in child sexual abuse specifically (three studies, 238 participants);

‐ professionals' skill in distinguishing between reportable and non-reportable cases (one study, 25 participants); or

‐ professionals' attitudes towards the duty to report (one study, 741 participants).

What are the limitations of the evidence?

We have low to very low confidence in the evidence. This is because the results are based on a small number of studies, some of which are old and have methodological problems. For example, people in the studies knew which treatment they were receiving, and not all studies provided data on all outcomes of interest to this review. In addition, analyses sometimes included only one professional group, limiting the applicability of the findings to other professional groups.

How up to date is this evidence?

The evidence is up to date to 4 June 2021.

Authors' conclusions

Implications for practice

Training for professionals to improve reporting of child abuse and neglect is an essential part of a comprehensive public health response. All professionals having direct contact with children and families require this training to equip them with the required knowledge, attitudes, and skills to report cases of child abuse and neglect, and to avoid making unwarranted reports. 

However, the development of training programmes, and research into their efficacy, is still in its infancy. Consequently, at least when measured against rigorous GRADE criteria, it is not possible to provide firm conclusions about the extent to which professional training of the types described in this review increase knowledge, skills, attitudes, and reporting practices due to the low and very low certainty of evidence.

We know little about the effectiveness of training interventions delivered in different modes (online versus face‐to‐face), and by trainers with different expertise (e.g. specialists versus non‐specialists). Evidence of such comparative effects requires studies of a sufficiently high standard, reported in sufficient detail to enable quantitative synthesis for overall trends. In addition, the generalisability and applicability of the available evidence is limited by the scarcity of training intervention trials conducted with key professional groups, such as police, doctors, paediatric nurses, and allied health professionals. The evidence is also limited by the lack of long‐term follow‐up of outcomes relevant to training effectiveness.

Despite these evidence gaps, child protection training designers and providers should consider the evidence in this review when planning training interventions for specific professionals in relation to the reporting of different types of child abuse and neglect. Although the paucity of studies precluded development of a training typology, the Characteristics of included studies table and in‐text summary provide training developers with important information about the possible components of training interventions. 

Implications for research

Further rigorous studies are required in a wider range of countries, with diverse groups of child‐serving professionals to assess the effectiveness of training interventions for improving reporting of child abuse and neglect. Interventions with police and doctors are needed in particular. Such studies are methodologically complex, costly, and time‐consuming. Nevertheless, the need to support professionals in reporting appropriate cases of different types of abuse and neglect, and avoiding unwarranted reports, demands evidence‐based approaches to training interventions. Rigorous training of professionals about appropriate reporting of child abuse and neglect directly promotes children’s rights to protection from abuse and neglect. In doing so, it implements the United Nations Convention on the Rights of the Child (United Nations 1989, article 19) and the United Nations Sustainable Development Goals 2015 (United Nations 2015, Target 16.2).

Greater rigour in interventions and their reporting is required. In particular, all interventions must be informed by an accurate and updated understanding of the nature of the duty to report different types of child abuse and neglect in the specific location, as applied to the relevant profession. Duties in both law and occupational policy to report different types of child abuse and neglect vary substantially across locations and professions, and over time. Accordingly, existing scales cannot simply be reused uncritically, even in the same location or professional setting. Rather, every training intervention and its evaluation must be underpinned by an updated, accurate review of the current reporting duties. Training on reporting duties must be customised, as should outcome measures. Outcome measures must be designed to capture data on participants' knowledge, attitudes, and practices in relation to specific duties and different types of child abuse and neglect. Training interventions and assessment therefore need to be designed by multidisciplinary teams with the capacity to identify the contemporary applicable law and ensure its accurate integration into direct and indirect measures of training outcomes.

In future studies, baseline comparisons of intervention participants (those receiving training) and control group participants (those not receiving training or waitlisted to receive training) should be undertaken to determine group equivalence on variables likely to influence training outcomes, such as years of experience, prior training, and encounters with the child protection system. Trials should be adequately powered and use appropriate methods for group allocation. Researchers should register trials (see, for example, www.isrctn.com/), and publish study protocols (Chan 2013; see www.spirit-statement.org/). Interventions should be comprehensively reported using international guidelines such as Template for Intervention Description and Replication (TIDieR) (Hoffmann 2014; and see www.consort-statement.org/resources/tidier-2) and CONSORT (Schulz 2010; and see www.consort-statement.org/).

Decisions concerning outcome measures pose challenges for future research due to constraints of cost and time. At a minimum, studies should always conduct pre‐ and post‐test assessment of key secondary outcomes as conceptualised in this review, including knowledge and attitudes. This is required to assess mastery of training content in accordance with educational theories which recommend both direct and indirect assessment of learning outcomes (e.g. Allan 1996; Allen 2006; Calderon 2013; Suskie 2018). Ideally, research should examine long‐term outcomes of training, and test the effect of supplementary or booster training. Measurement of primary outcomes as conceptualised in this review is challenging, since actual reporting of cases of child abuse and neglect by trained individuals occurs infrequently (Mathews 2020). In particular, actual reports (primary outcome 1c) is contraindicated due to statutory agency recording conventions which de‐identify reporters. Nevertheless, measurement via subjective self‐reports of reporting behaviour (primary outcome 1a) remains possible, albeit requiring long‐term follow‐up. We recommend strongly that future research employ case vignettes for direct assessment of training outcomes (primary outcome 1b). Using vignettes enables researchers to collect data at scale about participants’ responses to hypothetical scenarios requiring the application of knowledge and demonstration of skills, and can be used in combination with other direct and indirect assessments to indicate intended future reporting behaviour. Future research may also consider the use of animations, films, and virtual reality in case vignettes.

Our final conclusions on future research were suggested by peer reviewers who drew attention to the need for research on training interventions for professionals to improve reporting of child abuse and neglect, expressly for professionals serving culturally diverse children and families (Flemington 2021). As the field matures, and studies improve in quality and scope, there is also potential for future research to assess the broader social and economic impacts of child protection training for individuals, organisations, and systems.

Summary of findings

Summary of findings 1. Child protection training for professionals to improve reporting of child abuse and neglect compared with no training, waitlist control, or alternative training not related to child abuse and neglect (primary outcomes)

Setting: professionals' workplaces or online e‐learning, mainly in the USA

Patient or population: postqualified professionals, including elementary and high school teachers, childcare professionals, medical practitioners, nurses, and mental health professionals

Intervention: face‐to‐face or online training, with a range of teaching strategies (e.g. didactic presentations, role‐plays, video, experiential exercises), ranging from 2 hours to 6 x 90‐minute sessions over a 1‐month period

Comparator: no training, waitlist for training, alternative training (not related to child abuse and neglect)

Outcomes

Anticipated absolute effects* (95% CI)

Relative effect (95% CI)

No. of participants (studies)

Certainty of the evidence

Comments

Risk with control conditions

Risk with training interventions

Number of reported cases of child abuse and neglect (professionals' self‐report, actual cases)

 

Time of outcome assessment: short term (3 months postintervention)

The mean number of cases reported in the training group was, on average, 0.81 standard deviations higher (0.18 higher to 1.43 higher).

42

(1 RCT)

⨁◯◯◯
Very lowa,b,c

SMD of 0.81 represents a large effect size (Cohen 1988).

 

Outcome measured by professionals' self‐report of cases they had reported to child protection authorities.

Number of reported cases of child abuse and neglect (professionals' self‐report, hypothetical vignette cases)

 

Time of outcome assessment: short term (postintervention)

The mean number of cases reported in the training group was, on average, 1.81 standard deviations higher (1.30 higher to 2.32 higher).

87

(2 RCTs)

⨁◯◯◯
Very lowa,b,c

SMD of 1.81 represents a large effect size (Cohen 1988).

 

Outcome measured by professionals’ responses to hypothetical case vignettes.

Number of reported cases of child abuse and neglect (official records of reports made to child protection authorities)

Unknown

0

(0 studies)

No studies were identified that measured numbers of official reports made to child protection authorities.

Quality of reported cases of child abuse and neglect (official records of reports made to child protection authorities)

Unknown

0

(0 studies)

No studies were identified that measured the quality of official reports made to child protection authorities.

Adverse events 

Unknown

0

(0 studies)

No studies were identified that measured adverse effects.

*The risk in the intervention group (and its 95% CI) is based on the assumed risk in the comparison group and the relative effect of the intervention (and its 95% CI).

CI: confidence interval; RCT: randomised controlled trial; SMD: standardised mean difference

GRADE Working Group grades of evidence

High certainty: we are very confident that the true effect lies close to that of the estimate of the effect.
Moderate certainty: we are moderately confident in the effect estimate: the true effect is likely to be close to the estimate of the effect, but there is a possibility that it is substantially different.
Low certainty: our confidence in the effect estimate is limited: the true effect may be substantially different from the estimate of the effect.
Very low certainty: we have very little confidence in the effect estimate: the true effect is likely to be substantially different from the estimate of effect.

aDowngraded by one level due to high risk of bias for multiple risk of bias domains.
bDowngraded by one level due to imprecision (CI includes small‐sized effect or small sample size, or both).
cDowngraded by one level due to indirectness (single or limited number of studies, thereby restricting the evidence in terms of intervention, population, and comparators). 

Summary of findings 2. Child protection training for professionals to improve reporting of child abuse and neglect compared with no training, waitlist control, or alternative training not related to child abuse and neglect (secondary outcomes)

Setting: professionals' workplaces or online e‐learning, mainly in the USA

Patient or population: postqualified professionals, including elementary and high school teachers, childcare professionals, medical practitioners, nurses, and mental health professionals

Intervention: face‐to‐face or online training, with a range of teaching strategies (e.g. didactic presentations, role‐plays, video, experiential exercises), ranging from 2 hours to 6 x 90‐minute sessions over a 1‐month period

Comparator: no training, waitlist for training, alternative training (not related to child abuse and neglect)

Outcomes

Anticipated absolute effects* (95% CI)

Relative effect (95% CI)

No. of participants (studies)

Certainty of the evidence

Comments

Risk with control conditions

Risk with training interventions

Knowledge of reporting duty, processes, and procedures

 

Measured by: professionals' self‐reported knowledge of jurisdictional or institutional reporting duties, or both

Time of outcome assessment: short term (postintervention)

The mean knowledge score in the training group was, on average, 1.06 standard deviations higher (0.90 higher to 1.21 higher).

744

(1 RCT)

⨁⨁◯◯
Lowa,b

SMD of 1.06 represents a large effect size (Cohen 1988).

Knowledge of core concepts in child abuse and neglect (all forms)

 

Measured by: professionals' self‐reported knowledge of all forms of child abuse and neglect (general measure)

Time of outcome assessment: short term (postintervention)

The mean knowledge score in the training group was, on average, 0.68 standard deviations higher (0.35 higher to 1.01 higher).

154

(2 RCTs)

⨁◯◯◯
Very lowa,b,c

SMD of 0.68 represents a medium effect size (Cohen 1988).

Knowledge of core concepts in child abuse and neglect (child sexual abuse only)

 

Measured by: professionals' self‐reported knowledge of child sexual abuse (specific measure)

Time of outcome assessment: short term (postintervention)

The mean knowledge score in the training group was, on average, 1.44 standard deviations higher (0.43 higher to 2.45 higher).

238

(3 RCTs)

⨁◯◯◯
Very lowa,b,c,d

SMD of 1.44 represents a large effect size (Cohen 1988).

Skill in distinguishing between reportable and non‐reportable child abuse and neglect cases

 

Measured by: professionals’ performance on simulated cases scored by trained and blinded expert panel

Time of outcome assessment: short term (postintervention)

The mean skill score in the training group was, on average, 0.94 standard deviations higher (0.11 higher to 1.77 higher).

25

(1 RCT)

⨁◯◯◯
Very lowa,b,c

SMD of 0.94 represents a large effect size (Cohen 1988).

Attitudes toward the duty to report child abuse and neglect

 

Measured by: professionals’ self‐reported attitudes towards the duty to report child abuse and neglect

Time of outcome assessment: short term (postintervention)

The mean attitude score in the training group was, on average, 0.61 standard deviations higher (0.47 higher to 0.76 higher).

741

(1 RCT)

⨁◯◯◯
Very lowa,b,c

SMD of 0.61 represents a medium effect size (Cohen 1988).

*The risk in the intervention group (and its 95% CI) is based on the assumed risk in the comparison group and the relative effect of the intervention (and its 95% CI).

CI: confidence interval; RCT: randomised controlled trial; SMD: standardised mean difference

GRADE Working Group grades of evidence

High certainty: we are very confident that the true effect lies close to that of the estimate of the effect.
Moderate certainty: we are moderately confident in the effect estimate: the true effect is likely to be close to the estimate of the effect, but there is a possibility that it is substantially different.
Low certainty: our confidence in the effect estimate is limited: the true effect may be substantially different from the estimate of the effect.
Very low certainty: we have very little confidence in the effect estimate: the true effect is likely to be substantially different from the estimate of effect.

aDowngraded by one level due to high risk of bias for multiple risk of bias domains.
bDowngraded by one level due to indirectness (one or both of the following reasons: (1) single or limited number of studies, thereby restricting the evidence in terms of intervention, population, and comparators; (2) outcome not a direct measure of reporting behaviour by professionals). 
cDowngraded by one level due to imprecision (one or both of the following reasons: (1) CI includes small‐sized effect; (2) small sample size)
dAlthough studies can only be downgraded by three levels, it is important to note that there was significant heterogeneity of the effect for this outcome (i.e. inconsistency), which also impacts the certainty of the evidence. 

Background

Description of the condition

Child abuse and neglect

Child abuse and neglect is a broad construct including physical abuse, sexual abuse, psychological or emotional abuse, and neglect. Exposure to domestic violence is increasingly considered to be a fifth domain (Kimber 2018). Most child abuse and neglect occurs in private, is inflicted or caused by parents and caregivers, and does not become known to government authorities or helping agencies. Except for sexual abuse, younger children (aged one year and under) are the most vulnerable of all children to be abused and neglected (US DHHS 2021). Whilst its true extent is unknown, child abuse and neglect is a well‐established problem worldwide (Hillis 2016; Pinheiro 2006). Numerous prevalence studies have established that the various forms of child abuse and neglect are very widespread, although some forms of abuse and neglect are more common than others (Almuneef 2018; Chiang 2016; Cuartas 2019; Finkelhor 2010; Lev‐Weisel 2018; Nguyen 2019; Nikolaidis 2018; Radford 2012; Stoltenborgh 2011; Stoltenborgh 2012; Stoltenborgh 2015; Ward 2018).

The adverse effects of child abuse and neglect are significant and can endure throughout a person's life. The most serious consequence is child fatality, with an estimated 155,000 deaths globally per annum (WHO 2006). Other effects include: physical injuries; failure to thrive; impaired social, emotional, and behavioural development; reduced reading ability and perceptual reasoning; depression; anxiety; post‐traumatic stress disorder; low self‐image; alcohol and drug use; aggression; delinquency; long‐term deficits in educational achievement; and adverse effects on employment and economic status (Bellis 2019; Egeland 2009; Gilbert 2009; Hildyard 2002; Hughes 2017; Landsford 2002; Maguire‐Jack 2015; Norman 2012; Paolucci 2001; Taillieu 2016). Coping mechanisms used to deal with the trauma, such as alcohol and drug use, can compound adverse health outcomes, and chronic stress can cause coronary artery disease and inflammation (Danese 2009; Danese 2012). There is some evidence suggesting that child abuse and neglect affects brain development and produces epigenetic neurobiological changes (Moffitt 2013; Nelson 2020; Shalev 2013; Tiecher 2016). For society, effects include lost productivity and cost to child welfare systems (Currie 2010; Fang 2012; Fang 2015), and intergenerational victimisation (Draper 2008). The annual economic cost in the USA has been estimated at USD 124 billion, based on a cost per non‐fatal case of USD 210,012 (Fang 2012).

Although there is some variance across cultures in perceptions of what may and may not constitute child abuse and neglect (Finkelhor 1988; Korbin 1979), in recent decades a consensus has emerged about its parameters, especially for child sexual abuse (Mathews 2019), physical abuse (WHO 2006), emotional abuse (Glaser 2011), and neglect (Dubowitz 2007). This is reflected in criminal prohibitions on this conduct across low‐, middle‐, and high‐income countries, and in scholarly research addressing the contribution of structural inequalities in societies to child maltreatment (Bywaters 2019; Finkelhor 1988). Global legal and policy norms recognise the main domains of child abuse and neglect and require substantial efforts to identify and respond to them. The Convention on the Rights of the Child has been almost universally ratified, and article 19 embeds children's right to be free from abuse and neglect (United Nations 1989). It requires States Parties to take all appropriate legislative, administrative, social, and educational measures to protect the child from all forms of maltreatment, and to include effective procedures for the identification and reporting of maltreatment. Similarly, the universal Sustainable Development Goals urge all nations to eradicate child maltreatment, with Target 16.2 aiming to end child abuse and requiring governments to report on their efforts (United Nations 2015).

Professionals' reporting of child abuse and neglect

To identify child abuse and neglect, and to enable early intervention to assist children and their families, many nations' governments require members of specified professional groups to report known or suspected cases of significant child abuse and neglect (Mathews 2008a). The duty to report is usually conferred on professionals who encounter children frequently in their daily work, such as teachers, nurses, doctors, and law enforcement (Mathews 2008b). In some jurisdictions and for some categories of professionals, reporting duties have been enacted in child protection legislation (called 'mandatory reporting laws'), but in others, reporting duties are ascribed solely in organisational policies. Although differences exist across jurisdictions and professions with respect to some features of reporting duties (e.g. in stating which types of abuse and neglect must be reported), there is also consistency in the essential nature of reporting duties (e.g. in always requiring reports of child sexual abuse; and in activating the reporting duty when the reporter has a reasonable suspicion the abuse has occurred, rather than requiring knowledge or evidence) (Mathews 2008a). These differences and similarities also determine key dimensions of child protection training for professionals in different contexts.

Studies have found that professionals who are required to report child abuse and neglect consider that they have not had sufficient training to fulfil their role (Abrahams 1992; Christian 2008; Hawkins 2001; Kenny 2001; Kenny 2004; Mathews 2011; Reiniger 1995; Starling 2009; Walsh 2008). Research has also found low levels of knowledge about both the nature of the reporting duty (Beck 1994; Mathews 2009) and indicators of abuse and neglect (Hinson 2000), and that professionals may hold attitudes which are not conducive to reporting (Feng 2005; Jones 2008; Kalichman 1993; Mathews 2009; Zellman 1990). Effective reporting is thought to be influenced by several factors, including higher levels of knowledge of the reporting duty (Crenshaw 1995; Kenny 2004), ability to recognise abuse (Crenshaw 1995; Goebbels 2008; Hawkins 2001), and positive attitudes towards the duty (Fraser 2010; Goebbels 2008; Hawkins 2001).

Improved reporting offers the prospect of enhanced detection of child abuse and neglect (Mathews 2016), provision of interventions and redress for victims (Kohl 2009), and engagement with parents and caregivers to establish supportive measures (Drake 1996; Drake 2007). In this way, improved reporting is an essential part of a public health response to child abuse and neglect, which requires both tertiary and secondary prevention as well as primary prevention, and the full participation of communities and organisations (McMahon 1999). Improved reporting by professionals should also diminish clearly unnecessary reports and avoid the wasting of scarce government resources and unwarranted distress to families (Ainsworth 2006; Calheiros 2016). In addition, effective child protection training for professionals should also assist in developing greater understanding of legal protections conferred on professional reporters themselves, and avoidance of potential legal liability and professional discipline for non‐compliance. At its best, child protection training could also enhance professional ethical identities and contribute to broader workforce professionalisation.

Description of the intervention

In this review, child protection training for professionals is defined as education or training undertaken postqualification, after initial professional qualifications have been awarded, as a form of continuing or ongoing professional education or development. Child protection training interventions that are the subject of this review aim to improve reporting of child abuse and neglect to statutory child protection authorities by professionals who are required by law or policy to do so. Improving reporting is conceptualised as increasing the reporting of cases where abuse or neglect exists or can reasonably be thought to exist; and decreasing the reporting of cases where there are insufficient grounds upon which to make a report and where reporting is unnecessary or unwarranted.

Different approaches may be taken in training professionals to improve reporting of child abuse and neglect. Child protection training may focus on increasing knowledge and awareness of the indicators of each type of abuse and neglect, the nature of the reporting duty, and reporting procedures. Training may also focus on enhancing reporters' attitudes towards the reporting duty or to child protection generally. Training may vary in duration (Donohue 2002; Hazzard 1983), be implemented in a range of different formats (e.g. single sessions through to extended multisession courses), and target different skill levels (e.g. basic through to advanced) (Walsh 2019). Different delivery methods may be adopted, for example online, face‐to‐face, or blended learning modes (Kenny 2001; McGrath 1987).

How the intervention might work

Viewed as an application of adult learning (Knowles 2011), child protection training for professionals is an educational intervention through which professionals develop knowledge, skills, attitudes, and behaviours. By raising awareness, providing information and resources, developing skills and strategies, and fostering dispositions, training may change professionals' ability and willingness to engage in decision‐making processes that lead to improved reporting. There is some evidence to suggest that, for some categories of professionals and for some types of abuse, exposure to training is associated with effective reporting (Fraser 2010; Walsh 2012a), self‐reported preparedness to report (Fraser 2010), confidence identifying abuse (Hawkins 2001), and awareness of reporting responsibilities (Hawkins 2001). Some studies have indicated that lack of adequate training is associated with low awareness of the reporting duty (Hawkins 2001), low preparedness to report (Kenny 2001), low self‐reported confidence identifying child abuse (Hawkins 2001; Mathews 2008b; Mathews 2011), and low knowledge of indicators of abuse (Mathews 2011). However, the literature has not been synthesised, and the specific components of training that are responsible for improving reporting are not yet known.

Why it is important to do this review

Child abuse and neglect results in significant costs for children, families, and communities. As a core public health strategy, many professional groups are required by law and policy in many jurisdictions to report suspected cases. Numerous different training initiatives appear to have been developed and implemented for professionals, but there is little evidence regarding their effectiveness in improving reporting of child abuse and neglect both generally, for specific professions, and for distinct types of child abuse and neglect. To enhance reporting practice, designers of training programmes require detailed information about what programme features will offer the greatest benefit. A systematic review that identifies the effectiveness of different training approaches will advance the evidence base and develop a clearer understanding of optimal training content and methods. In addition, it will provide policymakers with a means by which to assess whether current training interventions are congruent with what is likely to be effective.

Objectives

To assess the effectiveness of training aimed at improving reporting of child abuse and neglect by professionals and to investigate possible components of effective training interventions.

Methods

Criteria for considering studies for this review

Types of studies

Randomised controlled trials (RCTs), quasi‐RCTs (i.e. studies in which participants are assigned to intervention or comparison or control groups using a quasi‐randomised method such as allocation by date of birth, or similar methods), and controlled before‐and‐after (CBA) studies (i.e. studies where participants are allocated to intervention and control groups by means other than randomisation). We included CBA studies because studies of educational interventions are often conducted in settings where truly randomised trials may not be feasible, for example in the course of a training series where enrolment decisions are based on group availability or logistics.

When deciding on included studies, we used explicit study design features rather than study design labels. We followed the guidance on how to assess and report on non‐randomised studies in the Cochrane Handbook for Systematic Reviews of Interventions (Higgins 2022a; Reeves 2022; Sterne 2022).

Types of participants

Studies that involved qualified professionals who are typically required by law or organisational policy to report child abuse and neglect (e.g. teachers, nurses, doctors, and police/law enforcement).

Types of interventions

Included

Child protection training interventions aimed explicitly at improving reporting of child abuse and neglect by qualified professionals, irrespective of programme type, mode, content, duration, intensity, and delivery context. These interventions were compared with no training, waitlist control, or alternative training not related to child abuse and neglect (e.g. cultural sensitivity training). 

Excluded

We excluded training interventions in which improving professionals' reporting of child abuse and neglect was a minor training focus, such as brief professional induction or orientation programmes targeting a broad range of employment responsibilities in which it would not be possible to isolate the specific intervention effects for a child protection training component (e.g. training for interagency working). We excluded child protection training conducted before professional qualifications were awarded (e.g. as part of undergraduate college or university professional preparation programmes in initial teacher education, pre‐service education for nurses, or entry‐level medical education).

Types of outcome measures

We included studies assessing the primary and secondary outcomes listed below. We excluded studies that did not set out to measure any of these outcomes. 

Primary outcomes

  1. Number of reported cases of child abuse and neglect:

    1. as measured subjectively by participant self‐reports of actual cases reported;

    2. as measured subjectively by participant responses to vignettes; and

    3. as measured objectively in official records of reports made to child protection authorities.

  2. Quality of reported cases of child abuse and neglect, as measured via coding of the actual contents of reports made to child protection authorities (i.e. in government records or archives).

  3. Adverse events, such as:

    1. increase in failure to report cases of child abuse and neglect that warrant a report as measured subjectively by participant self‐reports (i.e. in questionnaires); and

    2. increase in reporting of cases that do not warrant a report as measured subjectively by participant self‐reports (i.e. in questionnaires).

We note that studies using official records (i.e. primary outcome 1c), such as the number of reports made and the number of reports substantiated after investigation as indicative of training outcomes, must be interpreted with caution. Although objective, official records cannot measure all types of reporting behaviours, for example non‐reporting behaviour in which a professional fails to report a case that should have been reported. Official records must also be interpreted within the context and purpose of training, for example if training was introduced in the context of responses to recommendations from a public inquiry, or if training was used for the purpose of encouraging or discouraging specific types of reports, or both.

Secondary outcomes

  1. Knowledge of the reporting duty, processes, and procedures.

  2. Knowledge of core concepts in child abuse and neglect such as the nature, extent, and indicators of the different types of abuse and neglect.

  3. Skill in distinguishing cases that should be reported from those that should not.

  4. Attitudes towards the duty to report child abuse and neglect.

Timing of outcome assessment

We classified primary and secondary outcomes using three time periods: short‐term outcomes (assessed immediately after the training intervention and up to 12 months after); medium‐term outcomes (assessed between one and three years after the training intervention); and long‐term outcomes (assessed more than three years after the training intervention).

Search methods for identification of studies

We used the MEDLINE strategy from our protocol and adapted it for other databases (Mathews 2015). The first round of searches for the review was conducted in December 2016, with search updates in January 2017 and December 2018. When we came to update the searches in 2020, we noticed that errors had been made in earlier searches. We corrected the errors and re‐ran all searches in all databases up to June 2021. We de‐duplicated these records by comparing them with records from previous searches and removed records that had already been screened. We did not apply any date or language restrictions, and sought translation for papers published in languages other than English.

We recorded data for each search in a Microsoft Excel spreadsheet (Microsoft Corporation 2018), including: date of the search, database and platform, exact search syntax, number of search results, and any modifications to search strategies to accommodate variations in search functionalities for specific databases. The results for each search were exported as RIS files and stored in EndNote X8.0.1 (EndNote 2018), with a folder for each searched database. Search strategies and specific search dates are shown in Appendix 1. Changes to the planned search methods in our review protocol, Mathews 2015, are detailed in the Differences between protocol and review section.

Electronic searches

We searched the following databases.

  1. Cochrane Central Register of Controlled Trials (CENTRAL; 2021, Issue 6), in the Cochrane Library (searched 11 June 2021).

  2. Cochrane Database of Systematic Reviews (CDSR; 2021, Issue 6), in the Cochrane Library (searched 11 June 2021).

  3. Ovid MEDLINE (1946 to 4 June 2021).

  4. Embase.com Elsevier (1966 to 11 June 2021).

  5. CINAHL (Cumulative Index to Nursing and Allied Health Literature) EBSCOhost (1981 to 4 June 2021).

  6. ERIC EBSCOhost (1966 to 4 June 2021).

  7. PsycINFO EBSCOhost (1966 to 4 June 2021).

  8. Social Services Abstracts via ProQuest Research Library (1966 to 18 June 2021).

  9. Science Direct Elsevier (1966 to 4 June 2021).

  10. Sociological Abstracts via ProQuest Research Library (1952 to 18 June 2021).

  11. ProQuest Psychology Journals via ProQuest Research Library (1966 to 11 June 2021).

  12. ProQuest Social Science via ProQuest Research Library (1966 to 23 July 2021).

  13. ProQuest Dissertations and Theses via ProQuest Research Library (1997 to 23 July 2021).

  14. LexisNexis Lexis.com (1980 to 19 December 2018).

  15. LegalTrac GALE (1980 to 19 December 2018).

  16. Westlaw International Thomson Reuters (1980 to 19 December 2018).

  17. Conference Proceedings Citation Index – Social Science & Humanities (Web of Science; Clarivate) (1990 to 11 June 2021).

  18. Violence and Abuse Abstracts (EBSCOhost) (1971 to 4 June 2021).

  19. EducationSource (EBSCOhost) (1880 to 4 June 2021).

  20. LILACS (Latin American and Caribbean Health Science Information database) (lilacs.bvsalud.org/en/) (2003 to 11 June 2021).

  21. World Health Organization International Clinical Trials Registry Platform (WHO ICTRP; trialsearch.who.int; searched 2000 to 11 June 2021).

  22. OpenGrey (opengrey.eu/; searched 27 May 2019).

Searching other resources

We carried out additional searches to identify studies not captured by searching the databases listed above. We handsearched the following journals.

  1. Child Maltreatment (2 July 2021).

  2. Child Abuse and Neglect (2 July 2021).

  3. Children and Youth Services Review (2 July 2021).

  4. Trauma, Violence and Abuse (2 July 2021).

  5. Child Abuse Review (2 July 2021).

We also searched the following key websites for additional studies.

  1. International Society for Prevention of Child Abuse and Neglect via ispcan.org/ (2 July 2021).

  2. US Department of Health and Human Services Children’s Bureau, Child Welfare Information Gateway via childwelfare.gov/ (2 July 2021).

  3. Promising Practices Network operated by the RAND Corporation via promisingpractices.net/ (21 March 2019).

  4. National Resource Center for Community‐Based Child Abuse Prevention (CBCAPP) via friendsnrc.org/ (2 July 2021).

  5. California Evidence‐Based Clearinghouse for Child Welfare (CEBC) via cebc4cw.org/ (2 July 2021).

  6. Coalition for Evidence‐Based Policy via coalition4evidence.org/ (21 March 2019).

  7. Institute of Education Sciences What Works Clearinghouse via ies.ed.gov/ncee/wwc/ (2 July 2021).

  8. National Institute for Health and Care Excellence (NICE) UK via nice.org.uk/ (9 July 2021).

Finally, we harvested the reference lists of included studies to identify further potential studies. Although our review protocol prescribed contacting key researchers in the field for unpublished studies, we instead circulated requests for relevant studies via email to the Child‐Maltreatment‐Research‐Listserv, a moderated electronic mailing list with over 1500 subscribers, as this offered the possibility of reaching a far larger number of researchers (Walsh 2018 [pers comm]).

Data collection and analysis

We conducted data collection and analysis following our published protocol (Mathews 2015), and in accordance with the guidance in the Cochrane Handbook for Systematic Reviews of Interventions (Higgins 2011; Higgins 2022a). In the following sections, we have reported only those methods that were used in this review. Preplanned but unused methods are reported in Appendix 2.

Selection of studies

We used SysReview review management software for title and abstract and full‐text screening (Higginson 2014). Search results were imported from EndNote into SysReview and duplicates removed prior to title and abstract screening. Each title and abstract was screened by at least two review authors working independently to determine eligibility according to the inclusion and exclusion criteria. During title and abstract screening, screeners (KW, EE, LH, BM, NA, ED, EP) assessed whether each record was: (i) an eligible document type (e.g. not a book review); (ii) a unique document (i.e. not an undetected duplicate); (iii) about child protection training; and (iv) a study conducted with professionals. A third screener (either KW or EE) resolved any conflicts in screening decisions. Titles and abstracts published in languages other than English were translated into English using Google Translate.

Three review authors (KW, EE, LH) working independently screened the full texts of potentially eligible studies against the inclusion criteria as described in Criteria for considering studies for this review. Any discrepancies were resolved by discussion with a third review author who had not previously screened the record (BM, MK, EE, or KW) until consensus was reached. BM and MK, as authors of potentially included studies, were excluded from decisions on studies of which they were authors.

We documented the primary reasons for exclusion of each excluded record. To determine eligibility for studies published in languages other than English, we translated studies into English using Google Translate. We contacted study authors to request missing information if there was insufficient information to determine eligibility.

We identified and linked together multiple reports on the same study so that each study, rather than each report, was the principal unit of interest (e.g. Hazzard 1984; Palusci 1995). We listed studies that were close to meeting the eligibility criteria but were excluded at the full‐text screening or data extraction stages, along with the primary reasons for their exclusion, in the Characteristics of excluded studies table. We recorded our study selection decisions in a PRISMA flow diagram (Moher 2009).

Data extraction and management

We used SysReview review management software for data extraction and management (Higginson 2014). We developed and pilot‐tested a data extraction template based on the checklist of items in the Cochrane Handbook for Systematic Reviews of Interventions (Higgins 2011, Table 7.3a; Li 2022, Table 5.3a), the PRISMA minimum standards (Liberati 2009), and the Template for Intervention Description and Replication (TIDieR) checklist and guide (Hoffmann 2014). We extracted data from study reports concerning details of:

  1. study general information: title identifier, full citation, study name, document type, how located, country, ethical approval, funding;

  2. study design and methods: research design, comparison condition, unit of allocation, randomisation (and details on how this was implemented), baseline assessment (including whether intervention and comparison conditions were equivalent at baseline), unit of analysis, adjustment for clustering;

  3. participant characteristics: participants, recruitment, eligibility criteria, number randomised, number consented, number began (intervention and control groups), number completed (intervention and control groups), number T1, T2, T3 (etc.), age (mean, standard deviation (SD), range) (baseline, intervention and control groups), gender (% female) (baseline, intervention and control groups), ethnicity (intervention and control groups), socio‐economic status (intervention and control groups), years of experience, previous child protection training, previous experience with child maltreatment reporting, any other information;

  4. intervention characteristics: name of intervention, setting, delivery mode, contents and topics, methods and processes, duration, intensity, trainers and qualifications, integrity monitoring, fidelity issues; and

  5. outcome measures: primary outcomes, secondary outcomes, other outcomes.

As authors of potentially included studies, BM and MK were not involved in data extraction. Data were extracted from each study and entered into SysReview by at least two review authors (EE, LH, KW) working independently. A third review author (KW) also extracted data on intervention and outcome characteristics and prepared the Characteristics of included studies tables. Any discrepancies between review authors were resolved through discussion.

Assessment of risk of bias in included studies

Our study protocol, Mathews 2015, was designed prior to the introduction of the Risk Of Bias In Non‐randomized Studies of Interventions (ROBINS‐I) tool (Sterne 2016), and predated the new guidance for assessing risk of bias in non‐randomised studies provided in Chapter 25 of the Cochrane Handbook for Systematic Reviews of Interventions (Sterne 2022). As planned in our protocol (Mathews 2015), we used the original Cochrane risk of bias tool (Higgins 2011, Table 8.5a), which has seven domains: (i) sequence generation; (ii) allocation concealment; (iii) blinding of participants and personnel; (iv) blinding of outcome assessment; (v) incomplete outcome data; (vi) selective reporting; and (vii) other sources of bias. In our protocol, we added three additional domains: (viii) reliability of outcome measures, as we anticipated that some studies might have used custom‐made instruments and scales; (ix) group comparability; and (x) contamination. This approach corresponds with the 'Suggested risk of bias criteria for EPOC reviews' from Cochrane Effective Practice and Organisation of Care (EPOC 2017).

One review author (EE) incorporated the above 10 domains into a module within SysReview. Three review authors (KW, EE, LH), working independently, assessed risk of bias of the included studies. Assessors were not blinded to the names of the authors, institutions, journals, or study results. Where possible, we extracted verbatim text from the study reports as support for risk of bias judgements, resolving any disagreements by discussion. For studies where essential information to assess risk of bias was not available, we planned to contact study authors to request the missing information, but this was not needed. We entered the information first into SysReview and then into Review Manager 5 (Review Manager 2020), and summarised findings in the risk of bias tables for each included study. We generated two summary figures: a risk of bias summary showing judgements for every included study, and a risk of bias graph showing the proportion of studies at each judgement for each risk of bias domain. We planned to conduct sensitivity analyses for each outcome to determine how results might be affected by our inclusion or exclusion of studies at high risk of bias; however, this was not possible owing to the small number of studies with data available for meta‐analyses.

For each included study, we scored the relevant risk of bias domains as 'low', 'high', or 'unclear' risk of bias. We made judgements by answering 'yes' (scored as low risk of bias), 'no' (scored as high risk of bias), or 'unclear' (scored as unclear risk of bias) to a prespecified question for each domain as detailed in Appendix 3, with reference to the Cochrane Handbook for Systematic Reviews of Interventions (Higgins 2011, Table 8.5b) and the 'Suggested risk of bias criteria for EPOC reviews' (EPOC 2017).

Measures of treatment effect

We calculated intervention effects using Cochrane's RevMan Web software (RevMan Web 2021).

Continuous data

All eligible outcomes in all included studies were measured on continuous scales, most of which differed slightly from one another. For continuous outcomes, we extracted postintervention means and SDs and summarised study effects using standardised mean differences (SMDs) with 95% confidence intervals (CIs), to account for scale differences in the meta‐analyses.
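The SMD computation from group means and SDs can be illustrated as follows. This is a generic sketch of the Hedges' (adjusted) g form of the SMD, with a normal-approximation 95% CI; it is for illustration only and is not code used in the review.

```python
import math

def smd(m1, sd1, n1, m2, sd2, n2):
    """Standardised mean difference (Hedges' adjusted g) with its 95% CI.

    The pooled SD combines the two within-group SDs; the small-sample
    correction J = 1 - 3 / (4 * df - 1) converts Cohen's d to Hedges' g.
    """
    df = n1 + n2 - 2
    sd_pooled = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df)
    d = (m1 - m2) / sd_pooled
    g = (1 - 3 / (4 * df - 1)) * d    # small-sample correction
    se = math.sqrt((n1 + n2) / (n1 * n2) + g**2 / (2 * (n1 + n2)))
    return g, (g - 1.96 * se, g + 1.96 * se)
```

For example, two groups of 50 with means 10 and 9 and SDs of 2 give a pooled SD of 2 and an unadjusted d of 0.5, shrunk slightly by the correction.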

Unit of analysis issues

Cluster‐randomised trials

Cluster‐randomised trials are widespread in the evaluation of healthcare and educational interventions (Donner 2002), but are often poorly reported (Campbell 2004). Adjusting for clustering in analyses is important to avoid overestimating the treatment effect or underestimating the variance (or both), and thereby overstating the weight of the study in meta‐analyses (Hedges 2015; Higgins 2022b).

Congruent with our protocol (Mathews 2015), we planned that for included studies with incorrectly analysed data that did not account for clustering, we would use procedures for adjusting study sample sizes outlined in Section 16.3.4 of the Cochrane Handbook for Systematic Reviews of Interventions (Higgins 2011). None of the included studies reported an intracluster correlation coefficient (ICC), nor were these available from the study authors. No published ICC for child protection training interventions for professionals could be found, so we imputed a conservative ICC of 0.20 based on reviews of ICCs for professional development interventions with teachers (ICC range 0.15 to 0.21) (Kelcey 2013), and primary care providers (ICC range 0.01 to 0.16) (Eccles 2003).

We planned to test the robustness of these assumptions in sensitivity analysis, in which we would use two extreme ICC values reported in the literature for each professional subgroup to assess the extent to which different ICC values affected the weights assigned to the included trials. We also planned to investigate whether results were similar or different for cluster and non‐cluster trials. However, due to the small number of studies included for each outcome (one to three studies), we deemed these analyses to be inappropriate. Rather, where a study with clustering was included for a given outcome, we have presented two results: one without an adjustment for clustering, and one with an adjustment for clustering (using an ICC of 0.2). 
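The sample‐size adjustment for clustering can be sketched as follows; this is an illustrative implementation of the standard design‐effect approach from Section 16.3.4 of the Cochrane Handbook (not the review's own analysis code), using the imputed ICC of 0.20.

```python
def effective_sample_size(n, avg_cluster_size, icc=0.20):
    """Deflate a cluster trial's sample size by the design effect.

    design effect = 1 + (m - 1) * ICC, where m is the average cluster size;
    the effective sample size is the raw n divided by the design effect.
    """
    design_effect = 1 + (avg_cluster_size - 1) * icc
    return n / design_effect

# For example, 200 participants in clusters averaging 25 members, with
# ICC = 0.20: design effect = 1 + 24 * 0.20 = 5.8, so effective n ≈ 34.5.
```

Larger clusters or higher ICCs inflate the design effect, which is why an imputed ICC at the conservative (high) end of the published range shrinks the trial's effective sample size, and hence its meta‐analytic weight, more aggressively.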

Dealing with missing data

Missing data can take the form of missing studies, missing outcomes, missing outcome data, missing summary data, or missing participants. We did not anticipate missing studies, as our search strategy was comprehensive, and we took all reasonable steps to locate the full texts of eligible studies. Where possible, we identified missing outcomes by cross‐referencing study reports with trial registrations. For studies with missing or incomplete outcome data, or missing summary data required for effect size calculation, we contacted first‐named study authors via email to request the missing information (e.g. intervention and control group participant totals, means, SDs, ICCs).

If the data to calculate effect sizes with Review Manager Web or the RevMan Web calculator (or both) were not available in study reports or from study authors (RevMan Web 2021), we used David B Wilson's suite of effect size calculators to calculate an effect size (Wilson 2001). This was then entered directly into RevMan Web, and meta‐analyses were conducted using the generic inverse‐variance method in RevMan Web (Deeks 2022).
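The generic inverse‐variance method mentioned above works directly from each study's effect size and standard error, which is what makes it suitable for effect sizes computed outside RevMan Web. The following is a minimal sketch of the fixed‐effect form, for illustration only.

```python
import math

def inverse_variance_pool(effects, ses):
    """Fixed-effect generic inverse-variance pooling.

    Each study's effect is weighted by 1/SE**2; the pooled effect is the
    weighted mean, and its SE is sqrt(1 / sum of weights).
    Returns the pooled effect and its 95% CI.
    """
    w = [1 / se**2 for se in ses]
    pooled = sum(wi * yi for wi, yi in zip(w, effects)) / sum(w)
    se_pooled = math.sqrt(1 / sum(w))
    return pooled, (pooled - 1.96 * se_pooled, pooled + 1.96 * se_pooled)
```

For example, pooling effects of 0.2 (SE 0.1) and 0.6 (SE 0.2) weights the more precise study four times as heavily, giving a pooled effect of 0.28.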

Assessment of heterogeneity

We used RevMan Web to conduct our analyses according to the guidance in Section 10.3 of the Cochrane Handbook (Deeks 2022). To estimate heterogeneity, this software uses the inverse‐variance method for fixed‐effect meta‐analysis, and the DerSimonian and Laird method for random‐effects meta‐analysis (Deeks 2022). We used standard default options in RevMan Web to calculate the 95% CI for the overall effect sizes.

To assess the extent of variation between studies, we initially examined the distributions of relevant participant (e.g. professional discipline), delivery (e.g. classroom), and trial (e.g. type and duration of intervention) variables. Using forest plots produced in RevMan Web (RevMan Web 2021), we visually examined CIs for the outcome results of individual studies, paying particular attention to poor overlap, which can be used as an informal indicator of statistical heterogeneity (Deeks 2022; Higgins 2011). Using output provided by RevMan Web (RevMan Web 2021), we examined three estimates that assess different aspects of heterogeneity as recommended by Borenstein 2009. Firstly, as a test of statistical significance of heterogeneity, we examined the Q statistic (Chi²) and its P value. For any observed Chi², a low P value was deemed to provide evidence of heterogeneity of intervention effects (i.e. that studies do not share a common effect size) (Deeks 2022; Higgins 2011). Secondly, we examined Tau² to provide an estimate of the magnitude of variation between studies. Thirdly, we examined the I² statistic, which describes the proportion of variability in effect estimates due to heterogeneity rather than to chance (Deeks 2022; Higgins 2011). These three quantities (Chi², Tau², and the I² statistic) together provide a comprehensive summary of the presence and the degree of heterogeneity amongst studies and are viewed as complementary rather than mutually exclusive quantities.

Rather than defaulting to interpretations of heterogeneity based on rules of thumb (i.e. that an I² statistic value of 30% to 60% represents moderate heterogeneity, 50% to 90% substantial heterogeneity, and 75% to 100% considerable heterogeneity), we used all three measures of heterogeneity (Chi², Tau², and the I² statistic) to fully assess and describe the aspects of variability in the data as detailed in Borenstein 2009. For example, we used Tau² or the I² statistic (or both) to assess the magnitude of true variation, and the P value for Chi² as an indicator of uncertainty regarding the genuineness of the heterogeneity (P < 0.05). 
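
For readers wishing to see how these three quantities relate, they can be computed from study effect sizes and standard errors using the standard formulas for Cochran's Q, the DerSimonian and Laird Tau², and I². The sketch below uses illustrative values only, not data from this review:

```python
import math

# Illustrative SMDs and standard errors for four hypothetical studies
# (not data from this review).
smd = [0.45, 0.62, 0.30, 0.85]
se = [0.15, 0.20, 0.12, 0.25]

w = [1 / s**2 for s in se]                       # inverse-variance weights
pooled = sum(wi * yi for wi, yi in zip(w, smd)) / sum(w)

# Cochran's Q (the Chi-squared heterogeneity statistic), with k - 1 df
q = sum(wi * (yi - pooled)**2 for wi, yi in zip(w, smd))
df = len(smd) - 1

# DerSimonian and Laird estimate of between-study variance (Tau-squared)
c = sum(w) - sum(wi**2 for wi in w) / sum(w)
tau2 = max(0.0, (q - df) / c)

# I-squared: percentage of variability in effect estimates due to
# heterogeneity rather than chance
i2 = max(0.0, (q - df) / q) * 100
```

With these illustrative values, Q exceeds its degrees of freedom, so both Tau² and I² are positive, which is the pattern described above as evidence of genuine between‐study variation.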

Assessment of reporting biases

We assessed reporting bias in the form of selective outcome reporting as one of the domains within the risk of bias assessment.

Data synthesis

We calculated effect sizes for single studies and quantitatively synthesised multiple studies using RevMan Web (RevMan Web 2021). We first assessed the appropriateness of combining data from studies based on sufficient similarity with respect to training interventions delivered, study population characteristics, measurement tools or scales used, and summary points (i.e. outcomes measured within comparable time frames pre‐ and postintervention). We combined data for comparable professional groups (e.g. elementary and high school teachers), similar outcome measures (e.g. knowledge measures, attitude measures), and training types (i.e. online and face‐to‐face). 

If studies reported means, SDs, and the number of participants by group, we entered those data directly into RevMan Web (RevMan Web 2021). If these data were not reported, and could not be obtained from the study authors, we consulted David B Wilson's suite of effect size calculators to ascertain whether an effect size could be calculated (e.g. Randolph 1994 for primary outcome 1a). In cases where we needed to compute an effect size outside of RevMan Web that then needed to be combined with other studies via meta‐analysis, we used the generic inverse‐variance method in RevMan Web to conduct the meta‐analysis (e.g. Dubowitz 1991 and the analysis for secondary outcome 2a) (RevMan Web 2021).
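
As an illustration of the kind of calculation performed outside RevMan Web, the SMD (Hedges' adjusted g) and its standard error can be derived from group means, SDs, and sample sizes using the formulas given in the Cochrane Handbook. The figures below are hypothetical and are not taken from any included study:

```python
import math

def hedges_g(m1, sd1, n1, m2, sd2, n2):
    """SMD (Hedges' adjusted g) and its standard error from group summary
    statistics, following the formulas in the Cochrane Handbook.
    Group 1 = intervention, group 2 = control."""
    df = n1 + n2 - 2
    pooled_sd = math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df)
    j = 1 - 3 / (4 * df - 1)              # small-sample correction factor
    g = j * (m1 - m2) / pooled_sd
    n = n1 + n2
    se = math.sqrt(n / (n1 * n2) + g**2 / (2 * (n - 3.94)))
    return g, se

# Hypothetical post-test knowledge scores: trained vs untrained teachers
g, se = hedges_g(m1=24.1, sd1=4.2, n1=40, m2=21.3, sd2=4.5, n2=38)
```

The resulting g and its standard error are exactly the two quantities required for entry under the generic inverse‐variance method.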

If there was only one study with available data to calculate an effect size for a given outcome, we reported a single SMD with its 95% CI. We acknowledge that this is not standard practice, and that normally the mean difference would be reported, but we adopted this strategy in order to maintain consistency and comparability in the presentation of the results for readers.

If there were at least two comparable studies with available data to calculate effect sizes, we performed meta‐analysis to compute pooled estimates of intervention effects for a given outcome. We reported the results of the meta‐analyses using SMDs and 95% CIs. Where we judged that studies were estimating the same underlying treatment effect, we used fixed‐effect models to combine studies. Fixed‐effect models ignore heterogeneity, but are generally interpreted as being the best estimate of the intervention effect (Deeks 2022). However, where the intervention effects are unlikely to be identical (e.g. due to slightly varying intervention models), random‐effects models can provide a more conservative estimate of effect because they do not assume that included studies estimate precisely the same intervention effect (Deeks 2022). We thus used a random‐effects meta‐analysis to combine studies where we judged that studies may not be estimating an identical treatment effect (e.g. different training curriculum). 
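
The distinction between the two models can be illustrated with the generic inverse‐variance method: fixed‐effect weights use the within‐study variance only, whilst random‐effects weights add the between‐study variance (Tau²), which widens the CI. A minimal sketch with hypothetical effect sizes (the tau2 value is illustrative, not a DerSimonian and Laird estimate from review data):

```python
import math

def pool(effects, ses, tau2=0.0):
    """Generic inverse-variance pooled estimate with a 95% CI. With
    tau2 = 0 this is the fixed-effect model; a positive between-study
    variance gives the random-effects result."""
    w = [1 / (s**2 + tau2) for s in ses]
    est = sum(wi * y for wi, y in zip(w, effects)) / sum(w)
    se = math.sqrt(1 / sum(w))
    return est, (est - 1.96 * se, est + 1.96 * se)

# Hypothetical SMDs and standard errors from three comparable trials
smds = [0.40, 0.70, 0.55]
ses = [0.18, 0.22, 0.15]

fixed_est, fixed_ci = pool(smds, ses)                # assumes one common effect
random_est, random_ci = pool(smds, ses, tau2=0.02)   # allows true effects to vary
```

Because the random‐effects weights are smaller and more nearly equal, the pooled estimate shifts slightly and its CI is wider, reflecting the more conservative assumptions described above.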

We had planned to develop a training intervention programme typology by independently coding and categorising intervention components (e.g. contents and methods) and then attempting to link specific intervention components to intervention effectiveness (Mathews 2015). However, we were unable to statistically test these proposals in subgroup analyses because there were too few studies. Instead, we provided a detailed narrative summary in the Characteristics of included studies tables. 

Subgroup analysis and investigation of heterogeneity

An insufficient number of studies precluded our planned subgroup analyses; however, these methods may be used in future review updates (Appendix 2).

Sensitivity analysis

We planned several sensitivity analyses; however, these were precluded by an insufficient number of included studies. Planned methods are provided in Appendix 2.

Summary of findings and assessment of the certainty of the evidence

To provide a balanced summary of the review findings, we have presented all review findings in two summary of findings tables: one that summarises primary outcomes and adverse effects (summary of findings Table 1), and one that summarises secondary outcomes (summary of findings Table 2). We chose this approach as both sets of outcomes have utility for practice and research. Each table summarises the evidence from RCTs and quasi‐RCTs that compared child protection training to no training, waitlist control, or alternative training (not related to child protection). None of the studies included long‐term follow‐up, and therefore the tables present findings only for outcomes that were measured in the short term, that is immediately postintervention or within three months after the intervention. Although the review includes CBA studies, we created the summary of findings tables only for RCTs and quasi‐RCTs, and rated the certainty of the evidence only for these studies.

At least two review authors (KW, EE, LH) rated the certainty of the evidence for all primary and secondary outcomes, with no disagreements to resolve. We rated the certainty of the evidence using the GRADE approach (Guyatt 2008; Guyatt 2011; Schünemann 2013; Schünemann 2022). The GRADE system classifies the certainty of evidence into one of four categories, as follows.

  • High certainty: we are very confident that the true effect lies close to that of the estimate of the effect.

  • Moderate certainty: we are moderately confident in the effect estimate: the true effect is likely to be close to the estimate of the effect, but there is a possibility that it is substantially different.

  • Low certainty: our confidence in the effect estimate is limited: the true effect may be substantially different from the estimate of the effect.

  • Very low certainty: we have very little confidence in the effect estimate: the true effect is likely to be substantially different from the estimate of effect.

We considered the following factors when grading the certainty of the evidence: study design, risk of bias, precision of effect estimates, consistency of results, directness of evidence, and magnitude of effect (Schünemann 2022). We based our decisions about whether to downgrade the certainty of the evidence on the guidance in the Cochrane Handbook (Schünemann 2022), and entered the data for each factor into the GRADEpro GDT tool to obtain the overall certainty rating (GRADEpro GDT). We recorded the process and rationale for downgrading the certainty of the evidence in footnotes to summary of findings Table 1 and summary of findings Table 2.

All studies used to estimate treatment effects were RCTs or quasi‐RCTs. Each outcome began with an overall rating of high certainty; however, all outcomes were downgraded by two or three levels. We downgraded the certainty of the evidence for all outcomes by one level due to high risk of bias and by a further level due to indirectness of the evidence. We considered all findings to have concerns related to indirectness, either because the effect was estimated by a single study, thereby restricting the evidence in terms of intervention, population, and comparators; or because the outcome was not a direct measure of reporting behaviour (i.e. the primary outcome of clinical relevance). Some outcomes were downgraded a further level due to inconsistency in the results (i.e. significant heterogeneity), imprecision (i.e. CIs that included the possibility of a small effect size, or a small sample size), or both.

Results

Description of studies

Results of the search

In total, we identified 45,743 records through database searching, and a further 1839 records from other sources. After duplicates were removed, we screened the titles and abstracts of 33,702 records, excluding 32,221 as irrelevant. We assessed 1481 full‐text reports against our inclusion criteria, as detailed in Criteria for considering studies for this review. We excluded 1454 of these reports with reasons, as shown in Figure 1, with 'near misses' detailed in the Characteristics of excluded studies tables. We identified two ongoing studies (Ongoing studies) and three studies awaiting classification (Studies awaiting classification).


Figure 1. Study flow diagram.

Included studies

We included 11 unique studies reported in 17 papers, as shown in the study flow diagram (Figure 1). Details for each of the 11 included studies are summarised in the Characteristics of included studies tables. 

Study design

Of the 11 included studies, five were RCTs (Kleemeier 1988; Mathews 2017; McGrath 1987; Randolph 1994; Smeekens 2011). Of these, two RCTs were conducted with individual participants (Mathews 2017; Smeekens 2011), and three were conducted with participants in groups (Kleemeier 1988; McGrath 1987; Randolph 1994). Four studies were quasi‐RCTs (Alvarez 2010; Dubowitz 1991; Hazzard 1984; Kim 2019). Of these, two were conducted at the individual level (Alvarez 2010; Dubowitz 1991), and two were conducted at the group level (Hazzard 1984; Kim 2019). The remaining two studies used a CBA design (Jacobsen 1993; Palusci 1995), with one apiece conducted with individuals (Palusci 1995) and groups (Jacobsen 1993).

Location

One study was conducted in Canada (McGrath 1987), one in the Netherlands (Smeekens 2011), and the remaining nine studies were conducted in the USA. 

Sample sizes

The number of participants per study ranged from 30 (Palusci 1995) to 765 (Mathews 2017). Only one study reported having used a sample size calculation (Mathews 2017).

Settings

Settings for the training interventions were aligned to workplaces. Reported settings included an urban public hospital (Palusci 1995), a university clinic (Dubowitz 1991), a rural school district (Jacobsen 1993; Randolph 1994), and a suburban school district (Kleemeier 1988). In three studies, interventions were conducted online as e‐learning modules (Kim 2019; Mathews 2017; Smeekens 2011). Specific settings for interventions were not reported in three studies (Alvarez 2010; Hazzard 1984; McGrath 1987).

Participants
Profession

The 11 studies included a total of 1484 participants, drawn from a small number of key groups having contact with children in their everyday work. In six studies, participants were elementary and high school teachers (Hazzard 1984; Jacobsen 1993; Kim 2019; Kleemeier 1988; McGrath 1987; Randolph 1994). In two studies, participants were doctors ‐ specifically paediatric residents (Dubowitz 1991) and physicians (Palusci 1995). One study apiece was conducted with mental health professionals (Alvarez 2010), childcare professionals (Mathews 2017), and nurses (Smeekens 2011).

One study included both professional and student participants, but did not separate outcome data by group (Alvarez 2010). 

Demographic data contextually relevant to child protection training were reported in a minority of studies: years of professional work experience in seven studies (Jacobsen 1993; Kim 2019; Kleemeier 1988; Mathews 2017; McGrath 1987; Randolph 1994; Smeekens 2011); previous experience with child maltreatment reporting in three studies (Alvarez 2010; Dubowitz 1991; Hazzard 1984); and previous child protection training in three studies (Dubowitz 1991; Hazzard 1984; Mathews 2017). Participants in the 11 studies were relatively experienced in their professions, with mean experience ranging from 9 years (Smeekens 2011) to 15.4 years (McGrath 1987).

Age, gender, and ethnicity

Study participants' demographic details at baseline were inconsistently reported. Four studies reported the mean age of participants at baseline separately for intervention and control groups (Alvarez 2010; Jacobsen 1993; Randolph 1994; Smeekens 2011). One study reported age bracket data for intervention, control, and total participants (Mathews 2017). Four studies reported statistical assessment of baseline differences in age (Alvarez 2010; Mathews 2017; Randolph 1994; Smeekens 2011). One study reported an age range of 18 to 55+ years for total participants (Kim 2019), and another reported a median age bracket of 31 to 35 years (Hazzard 1984). Other studies reported means but not SDs for doctors (Dubowitz 1991: 27 years) and teachers (Kleemeier 1988: 41 years). One study did not report any data on participant age (McGrath 1987).

The proportion of female participants ranged from a low of 44% amongst doctors (Dubowitz 1991) to a high of 97.7% amongst childcare professionals (Mathews 2017). Seven studies did not report gender‐specific proportions (Alvarez 2010; Dubowitz 1991; Hazzard 1984; McGrath 1987; Palusci 1995; Randolph 1994; Smeekens 2011).

Ethnicity data were reported in only four studies, with the majority of participants in these studies being identified by use of the term 'White' or  'Caucasian': 70% (Jacobsen 1993), 75% (Kleemeier 1988), 84.2% (Mathews 2017), and 97.5% (Kim 2019). A minority of participants were Hispanic, African‐American, or Asian.

Interventions
Intervention conditions

The 11 trials examined the effectiveness of 11 distinct but comparable interventions. Interventions named were: child maltreatment reporting workshop (Alvarez 2010); child maltreatment course (Dubowitz 1991); one‐day training workshop on child abuse (Hazzard 1983); teacher training workshop (Kleemeier 1988); three‐hour inservice training on child sexual abuse adapted from Kleemeier 1988 (Jacobsen 1993); teacher awareness programme (McGrath 1987); child sexual abuse prevention teacher training workshop (Randolph 1994); interdisciplinary team‐based training (Palusci 1995); the Next Page (Smeekens 2011); iLookOut for Child Abuse (Mathews 2017); and Committee for Children Second Step Child Protection Unit (Kim 2019). All were education and training interventions aimed at building the capacity of postqualifying child‐serving professionals to protect children from harm by exposing these professionals to a series of intentional learning experiences. 

In eight of the 11 trials, interventions were delivered in face‐to‐face workshops or seminars, whilst in the remaining three trials, interventions were delivered as self‐paced e‐learning modules (Kim 2019; Mathews 2017; Smeekens 2011).

Contents or topics covered

All trials reported contents or topics covered in the training interventions. The most common topics included: indicators of child abuse and neglect; definitions and types of child abuse and neglect; reporting laws, policies, and ethics; how to make a report; incidence or prevalence, or both; and concerns, fears, myths, and misconceptions. Fewer interventions addressed aetiology (Dubowitz 1991; Kleemeier 1988), effects (Jacobsen 1993; Kleemeier 1988), responding to disclosures (Jacobsen 1993; Kim 2019; Randolph 1994), or community resources and referrals (Kleemeier 1988).

Training interventions for teachers were more likely to cover primary prevention, that is strategies for preventing child abuse and neglect before it occurs or preventing its reoccurrence (Jacobsen 1993; Kim 2019; Kleemeier 1988). Training for doctors, nurses, and mental health professionals tended to emphasise evaluation and diagnosis, communicating with children, and interviewing caregivers (Alvarez 2010; Dubowitz 1991; Palusci 1995; Smeekens 2011).

In three studies, all of which evaluated e‐learning interventions, the study authors explained elements of the underlying programme theory. In iLookOut, training was conceptualised as having two key dimensions designed to enhance participants' cognitive and affective attributes for reporting child maltreatment (Mathews 2017). In the Next Page, content was built around three dimensions: recognition, responding (acting), and communicating (Smeekens 2011). In the Second Step Child Protection Unit, the staff training component was part of a broader comprehensive 'whole school' approach to prevention of child sexual abuse. This training addressed multiple features of the school ecology: school policies and procedures, staff training, student lessons, and family education (Kim 2019).

Teaching methods, strategies, or processes 

Teaching methods, strategies, or processes used in intervention delivery were reported in nine of the 11 trials, with two trials providing no information (Kim 2019; McGrath 1987). The most common methods included: the use of films/videos; modelling via observations of clinicians; experiential exercises; and role‐plays. With the advent of technology, case simulations were also used (Mathews 2017; Smeekens 2011). These methods were directed towards providing insights into real‐life situations in which child abuse and neglect would be encountered, and providing opportunities to observe experienced practitioners at work, engage in practice, and receive feedback. Some interventions also included question‐and‐answer sessions with experts (Hazzard 1984; Kleemeier 1988; Randolph 1994). Didactic presentations with group discussion were common; less common were reading tasks (Dubowitz 1991) and written activities (Randolph 1994). E‐learning modules offered opportunities for the use of interactive elements, including animations (Smeekens 2011) and filmmaking techniques designed to activate empathy for victims (Mathews 2017). For example, in iLookOut, e‐learning modules have an “interactive, video‐based storyline with films shot in point‐of‐view (i.e. the camera functioning as the learner’s eyes) ... as key events unfold through interactions involving children, parents, and co‐workers (all played by actors), the learner had to decide how to best respond” (Mathews 2017, p 19).

The duration and intensity of the interventions in the included trials ranged from a single two‐hour workshop (Alvarez 2010; McGrath 1987) to six 90‐minute seminars conducted over a one‐month period (Dubowitz 1991). A six‐hour workshop for teachers, first reported in Hazzard 1984, was also used in Kleemeier 1988, and was then adapted into a three‐hour workshop by Jacobsen 1993. Similar content was spread over three two‐hour sessions by Randolph 1994. E‐learning interventions used in three studies offered the advantage of self‐paced learning within a specified window of availability, but also presented a challenge in specifying training length (Kim 2019; Mathews 2017; Smeekens 2011).

The interventions were developed and delivered by specialist facilitators (Alvarez 2010; Jacobsen 1993; McGrath 1987), content area experts (Hazzard 1984; Kim 2019; Kleemeier 1988; Mathews 2017; Randolph 1994), and interdisciplinary teams (Dubowitz 1991; Palusci 1995; Smeekens 2011).

Control conditions

In one trial, the comparison condition was an alternative training, that is a cultural sensitivity workshop, which study authors explained was used for its appeal in recruiting participants who were looking for continuing education credits (Alvarez 2010, p 213). In four studies, the training intervention group was compared to a waitlist control group (Kim 2019; Mathews 2017; McGrath 1987; Randolph 1994), and in five studies the comparison condition was no training (Dubowitz 1991; Hazzard 1984; Kleemeier 1988; Palusci 1995; Smeekens 2011). One study did not report the comparison condition (Jacobsen 1993).

Unit of analysis issues

Allocation of individuals to intervention or control conditions in many of the included studies occurred by workplace groups (e.g. all teachers in entire schools, all paediatricians on clinic rotations), thus forming clusters. None of the studies conducted at group level were labelled as clustered studies by study authors, nor were data analysed using statistical methods to account for similarities amongst participants in the same cluster. In some studies, all participants in a cluster (e.g. a school) were allocated to a condition (e.g. Hazzard 1984; Kim 2019). In other studies, clustered data were created by allocating several participants from the same workplace to one condition (e.g. Alvarez 2010; Kleemeier 1988). We identified unit of analysis issues, which we addressed in our reporting in the Effects of interventions section.

Missing data

We identified two types of missing data in the included studies: missing outcome data required for effect size calculation, and missing participant data due to attrition (Alvarez 2010; Dubowitz 1991; Hazzard 1984; Kim 2019; Kleemeier 1988; Mathews 2017; McGrath 1987; Smeekens 2011). For details, see Characteristics of included studies tables. The approaches we used for dealing with missing data and data synthesis for each of these studies are detailed in Appendix 4.

Funding sources

All but two of the 11 included studies reported receiving external funding. Studies were funded by federal government agencies in the USA (Alvarez 2010; Dubowitz 1991; Kleemeier 1988; Randolph 1994) and Canada (McGrath 1987), and by a combination of university and philanthropic funding in the USA (Hazzard 1984; Mathews 2017). One study was funded by the philanthropic arm of an international technology company, which also hosted the online training platform used in the study (Smeekens 2011). One study was funded by a training intervention developer, a non‐government organisation in the USA (Kim 2019).

Outcomes

In this section, we have summarised the primary and secondary outcomes of interest that were investigated in the 11 included studies. For details by individual study, see Characteristics of included studies.

Primary outcomes

1. Number of reported cases of child abuse and neglect

As shown in Table 1 below, three of the 11 included studies measured changes in the number of reported cases of child abuse and neglect via participants' self‐reports of actual cases reported (i.e. primary outcome 1a) (Hazzard 1984; Kleemeier 1988; Randolph 1994). Although differently named, the instruments used were almost identical, comprising a battery of seven items (Hazzard 1984) or five items (Kleemeier 1988; Randolph 1994) assessing self‐reported actions taken in relation to child abuse and neglect (i.e. a behavioural measure). One common item in the batteries, 'reporting a case of suspected abuse' to a protective services agency, was classified as a primary outcome 1a measure. Data were collected in the three studies at six‐week (Kleemeier 1988), three‐month (Randolph 1994), and six‐month (Hazzard 1984) follow‐up periods.

Five of the 11 included studies measured changes in the number of reported cases of child abuse and neglect via participant responses to vignettes (i.e. primary outcome 1b) (Alvarez 2010; Jacobsen 1993; Kleemeier 1988; Palusci 1995; Randolph 1994). Alvarez 2010 used an inventory of eight child maltreatment vignettes, two for each of the four child maltreatment subtypes (physical abuse, emotional abuse, sexual abuse, and neglect), one of which required a report and the other of which did not. Jacobsen 1993, Kleemeier 1988, and Randolph 1994 used eight child sexual abuse vignettes in a measure known as the Teacher Vignettes Measure, to elicit participants' knowledge of behavioural indicators, ability to respond to disclosures, and ability to enact appropriate courses of action, including reporting. From the text descriptions in the study reports, we can assume that the same vignettes were used in all three studies; the vignettes were published in Jacobsen 1993 (p 43‐9). Kleemeier 1988, the original authors of the measure, reported psychometric properties including internal consistency (alpha (α) = 0.78) and scorer interrater reliability (0.99, coefficient not reported). Unlike Kleemeier 1988 and Randolph 1994, however, Jacobsen 1993 did not use the Teacher Vignettes Measure at baseline. Palusci 1995 presented four illustrated case vignettes as Part Three of a longer survey. Participants were asked to assess anatomical findings and decide on case reportability based on a short patient history and photographs. Internal consistency of the entire survey was reported (α = 0.69).

Smeekens 2011 used eight simulated scenarios based on real clinical cases with in vivo video‐recorded assessment, which was later coded independently. However, we judged this intervention, the skills/capabilities it targeted, and its measurement to be qualitatively different from the other included studies and the text‐based vignette assessments they used. We have reported on Smeekens 2011 under secondary outcome 3 below. 

None of the 11 included studies measured changes in the number of reported cases of child abuse and neglect by objective, official records of reports made to child protection authorities (i.e. primary outcome 1c). It is possible for such assessments of changes to reporting practice to be made by accessing administrative records of reports made to child protection agencies and examining these at the jurisdictional level required for the specific professional groups that are participating in interventions. However, such assessments are unlikely to be linkable to any trained individual or training cohort. This primary outcome measure would therefore be difficult, if not impossible, to assess in trials of training interventions. Research questions about the influence of training interventions on actual reporting practice may be better answered in other studies, for example time series analyses using child protection reporting data (e.g. Gilbert 2012; Mathews 2016). These themes are further unpacked in Implications for research.

2. Changes in the quality of reported cases of child abuse and neglect

As shown in Table 1, none of the 11 included studies measured changes in the quality of reported cases of child abuse and neglect by objective official records of reports made to child protection authorities (i.e. primary outcome 2), for example via coding of de‐identified reports made to child protection authorities held in government or agency records. 

3. Adverse events

As shown in Table 1, none of the 11 included studies assessed adverse events, such as increases in failure to report (i.e. primary outcome 3a ‐ known colloquially as ‘under‐reporting’), or increases in reporting of cases that do not warrant a report (i.e. primary outcome 3b ‐ known colloquially as ‘over‐reporting’).

Table 1. Primary outcomes (* indicates inclusion in meta‐analysis or single effect size calculation)

Primary outcomes from included studies | Measure named in included studies | Studies
1. Number of reported cases of child abuse and neglect | |
1a. measured subjectively by participant self‐reports of actual cases reported | Reported Involvement in Child Abuse ‐ single item 'reporting a case of suspected abuse' from a multi‐item measure | Hazzard 1984
 | Teacher Prevention Behavior Measure ‐ single item 'reporting a case of suspected abuse' from a multi‐item measure | Kleemeier 1988; Randolph 1994*
1b. measured subjectively by participant responses to vignettes | Recognition and Intention to Report Suspected Child Maltreatment | Alvarez 2010
 | Teacher Vignettes Measure | Jacobsen 1993; Kleemeier 1988*; Randolph 1994*
 | Survey (Part Three) | Palusci 1995
1c. measured objectively in official records of reports made to child protection authorities | Nil | Nil
2. Changes in the quality of reported cases of child abuse and neglect, measured via coding of the actual contents of reports made to child protection authorities (i.e. in government records or archives) | Nil | Nil
3. Adverse events | |
3a. increase in failure to report cases of child abuse and neglect that warrant a report, measured subjectively by participant self‐reports (i.e. in questionnaires) | Nil | Nil
3b. increase in reporting of cases that do not warrant a report, measured subjectively by participant self‐reports (i.e. in questionnaires) | Nil | Nil

Secondary outcomes

1. Knowledge of the reporting duty, processes, and procedures

Four of the 11 included studies measured knowledge of the reporting duty, processes, and procedures (i.e. secondary outcome 1 as shown in Table 2) (Alvarez 2010; Kim 2019; Mathews 2017; McGrath 1987). Measurement instruments were customised to align with jurisdictional or institutional (or both) reporting requirements. Alvarez 2010 used a 15‐item inventory with multiple‐choice response options to assess knowledge of child maltreatment reporting laws and reported both internal consistency (α = 0.18) and test‐retest reliability (r = 0.88, P < 0.01) (Alvarez 2008, p 56). Kim 2019 used the Educators and Child Abuse Questionnaire (Knowledge of Reporting Procedures subscale) (Kenny 2004). McGrath 1987 presented participants with five items assessing knowledge of legislative reporting requirements and five items assessing school board policy reporting requirements, but correct answers were not summed for an overall score; data were reported by item, and no reliability data were reported. Mathews 2017 used a 21‐item scale to assess knowledge of the legal duty to report child abuse and neglect, which was subjected to psychometric testing; however, these data were not reported.

2. Knowledge of core concepts in child abuse and neglect such as the nature, extent, and indicators of the different types of child abuse and neglect 

Eight of the 11 included studies measured knowledge of core concepts in child abuse and neglect (i.e. secondary outcome 2 as shown in Table 2) (Dubowitz 1991; Hazzard 1984; Jacobsen 1993; Kim 2019; Kleemeier 1988; McGrath 1987; Palusci 1995; Randolph 1994). Four of these studies assessed knowledge of core concepts in child abuse and neglect (inclusive of all forms of child abuse and neglect) (Dubowitz 1991; Hazzard 1984; Kim 2019; McGrath 1987), and four studies assessed knowledge of core concepts in child sexual abuse specifically (Jacobsen 1993; Kleemeier 1988; Palusci 1995; Randolph 1994).

Knowledge scales for core concepts in child abuse and neglect varied in length from four items (McGrath 1987) to 34 items (Hazzard 1984). Response options were presented as multiple choice or as variations on true/false/don’t know, so that correct answers could be summed for an overall score. Dubowitz 1991 developed a custom‐made test based on course content. Hazzard 1984 developed the Knowledge About Child Abuse scale, which assessed knowledge about definitions, characteristics, causes, and effects; internal consistency was reported (α = 0.80, p 290). Kim 2019 used the Educators and Child Abuse Questionnaire (Awareness of Signs and Symptoms of Child Abuse subscale), for which validity and reliability data were established in a similar population (Kenny 2004). McGrath 1987 used four items from a longer questionnaire ("First Measure subscale 1") to separately assess knowledge of indicators of physical abuse, neglect, sexual abuse, and emotional abuse.

Knowledge scales for core concepts in child sexual abuse specifically were 30 items in length (Jacobsen 1993; Kleemeier 1988; Palusci 1995; Randolph 1994). The Teacher Knowledge Scale, Kleemeier 1988, was used in three studies (Jacobsen 1993; Kleemeier 1988; Randolph 1994). Jacobsen 1993 provided a full list of scale items for the Teacher Knowledge Scale (p 36‐8), and Kleemeier 1988 reported on internal consistency (α = 0.84) and test‐retest reliability (r = 0.90). Palusci 1995 used a 30‐item survey divided into three parts (i.e. subscales); only part one, assessing knowledge of female genital anatomy (12 items), was relevant to this outcome.

3. Skill in distinguishing cases that should be reported from those that should not

In our study protocol (Mathews 2015), we did not describe this secondary outcome in sufficient detail, and during data extraction it became clear that there was potential for overlap between this secondary outcome and primary outcome 1b: changes in the 'number of reported cases of child abuse and neglect as measured subjectively by participant responses to vignettes’. To clarify, skills are individual attributes that can be assessed as an individual’s ability to perform a task to a given level (Dalziel 2017). There may be at least two skills involved in distinguishing cases that should be reported from those that should not: (i) the ability to accurately identify child abuse and neglect; and (ii) the ability to determine whether the type and extent of abuse or neglect presented to the reporter falls within a category required by law or policy to be reported. These skills can be developed for professionals via exposure to ‘real situations’, which may involve, for example, supervised clinic rotations, practicum or fieldwork placements, or internships. They can be assessed ethically using in vivo assessments and participation in simulation games, and may also be assessable via responses to vignettes (Stanley 2017).

One of the 11 included studies measured skill in distinguishing cases that should be reported from those that should not (i.e. secondary outcome 3 as shown in Table 2) via in vivo assessment (Smeekens 2011). In this study, the intervention was an e‐learning programme for Dutch emergency department nurses comprising interactive clinical case simulations and video animations. Nurses' performances in two different simulated cases, randomly generated from a pool of eight possible simulated cases, were video recorded at pre‐ and post‐test and coded by a trained and blinded expert panel. In the in vivo assessments, nurses were guided through a simulated paediatric patient examination and were scored on the quality and quantity of the questions they asked and their completion of a standardised checklist (Smeekens 2011). Interrater reliability for the expert panellists was reported as 0.70 (p 333).

4. Attitudes towards the duty to report child abuse and neglect

According to attitude theories, attitudes are phenomenologically distinct from opinions, beliefs, and feelings. Attitudes can be defined as “a psychological tendency that is expressed by evaluating a particular entity with some degree of favour or disfavour” (Eagly 1993, p 1). An attitude thus must be directed towards a specific attitude object such as a behaviour (i.e. reporting child abuse and neglect), a condition (i.e. child sexual abuse), a person or group (e.g. perpetrators or victims of child abuse), or an event (e.g. a campaign about violence against children). Attitudes towards one particular 'thing' cannot be conflated with attitudes towards another different 'thing'.

Two of the 11 included studies measured attitudes towards the duty to report child abuse and neglect (Kim 2019; Mathews 2017). Kim 2019 used the previously validated 14‐item Teacher Reporting Attitude Scale ‐ Child Sexual Abuse (Walsh 2010; Walsh 2012b). Mathews 2017 used a 13‐item scale adapted from previous research. Kleemeier 1988 and Randolph 1994 measured attitudes towards child sexual abuse rather than attitudes towards the duty to report child abuse and neglect; these are listed in Table 3 as ineligible outcomes. Dubowitz 1991 included "Attitudinal Items"; however, on closer inspection of the scale items and response scale, we classified this as a child abuse reporting self‐efficacy measure, which assessed levels of competence in managing cases of child abuse, comprising five items rated on a five‐point Likert‐type scale; this is listed in Table 3 as an ineligible outcome.

Table 2. Secondary outcomes (* indicates inclusion in meta‐analysis or single effect size calculation)

Secondary outcomes from included studies, with each measure as named in the included studies, followed by the studies that used it:

1. Knowledge of the reporting duty, processes, and procedures
Knowledge of Child Maltreatment Reporting Laws: Alvarez 2010
Educators and Child Abuse Questionnaire (Knowledge of Reporting Procedures subscale) (Kenny 2004): Kim 2019
First measure (subscales 2 and 3): McGrath 1987
iLookOut Knowledge: Mathews 2017*

2. Knowledge of core concepts in child abuse and neglect such as the nature, extent, and indicators of the different types of abuse and neglect
2a. Knowledge of core concepts in child abuse and neglect (i.e. all forms of child abuse and neglect)
Test based on course content: Dubowitz 1991*
Knowledge About Child Abuse: Hazzard 1984*
Educators and Child Abuse Questionnaire (Awareness of Signs and Symptoms of Child Abuse subscale) (Kenny 2004): Kim 2019
Second measure: McGrath 1987
2b. Knowledge of core concepts in child sexual abuse (i.e. only child sexual abuse)
Teacher Knowledge Scale (Kleemeier 1988): Jacobsen 1993; Kleemeier 1988*; Randolph 1994
First measure (subscale 1, indicators of child sexual abuse): McGrath 1987*
Survey (Part One and Part Two): Palusci 1995

3. Skill in distinguishing cases that should be reported from those that should not
Performance in Simulated Cases: Smeekens 2011*

4. Attitudes towards the duty to report child abuse and neglect
Teacher Reporting Attitude Scale ‐ Child Sexual Abuse (Walsh 2012b): Kim 2019
iLookOut Attitudes: Mathews 2017*

Ineligible outcomes

Several of the included studies also measured ineligible outcomes, as shown in Table 3 below.

Table 3. Ineligible outcomes

Ineligible outcomes from included studies, with each measure as named in the included studies, followed by the studies that used it:

1. Skills in safeguarding therapeutic relationships
Clinical Expertise in Reporting Suspected Child Maltreatment Scale: Alvarez 2010

2. Child abuse reporting self‐efficacy
Attitudinal items: Dubowitz 1991

3. Child abuse detection self‐efficacy
Self‐efficacy: Smeekens 2011

4. Feelings
Feelings About Child Abuse: Hazzard 1984

5. Attitudes towards child sexual abuse
Teacher Opinion Scale: Kleemeier 1988; Randolph 1994

6. Knowledge of female genital anatomy
Survey (Part One): Palusci 1995

7. Knowledge of ‘reportability’ of sexually transmitted infections in children
Survey (Part Two): Palusci 1995

8. Teacher‐student relations
Delaware School Climate Survey: Kim 2019

9. Acceptability
Abbreviated Acceptability Rating Profile: Kim 2019

Excluded studies

We excluded 1454 records after full‐text screening. Most were excluded because they did not report on an eligible intervention. The Characteristics of excluded studies tables list only the 20 studies that appeared to meet the eligibility criteria but were excluded on close inspection at the full‐text screening or data extraction stages, along with the primary reasons for exclusion. These are studies that were 'near misses' and that readers may view as relevant. Of note were two studies, Rheingold 2012 and Rheingold 2015, both multisite RCTs comparing (i) face‐to‐face with (ii) web‐based training using the Stewards of Children training programme for professionals. These studies addressed outcomes relevant to this review, but our study protocol did not allow for the inclusion of head‐to‐head training comparison studies. This is a limitation of the review (addressed below in the Potential biases in the review process section) that could be remedied in future review updates. Other notable excluded studies include: Hawkins 2001a and Hawkins 2001b, a frequently cited evaluation of a training intervention for mandatory reporters in Australia; Lee 2017, reporting on training interventions for nurses; and a registered Phase 2 trial, NCT03185728, of an e‐learning intervention first reported in Mathews 2017 (included in this review).

Ongoing studies and studies awaiting classification 

We identified two ongoing studies. We contacted the authors of a potentially completed registered trial identified in our searches of clinical trials registries (IRCT2015042713748N3), but received no response regarding its status. We identified another trial in progress, testing the effectiveness of the iLookOut for Child Abuse e‐learning training programme, through both our search and enquiries to the Child‐Maltreatment‐Research‐Listserv (NCT03185728). We identified three studies awaiting classification. One study written in a language other than English appeared to meet our inclusion criteria (De Faria Brino 2003), but our attempts to contact the author to verify eligibility failed. We could not source the full text for another potentially eligible study, and could not categorically exclude it based on the title and abstract (Herrera 1993). We have contacted study authors where possible, and will endeavour to finalise the classification of these studies in subsequent review updates.

Risk of bias in included studies

Risk of bias judgements for the 11 included studies are summarised in Figure 2 and Figure 3. We have reported risk of bias judgements for the nine RCTs and quasi‐RCTs separately from those for the two CBA studies.


Risk of bias summary: review authors' judgements about each risk of bias item for each included study.


Risk of bias graph: review authors' judgements about each risk of bias item presented as percentages across all included studies.

Across the nine RCT and quasi‐RCT studies, three domains were most at risk of bias: performance bias (all nine studies rated at high risk of bias); detection bias (all nine studies rated at high risk of bias); and reporting bias (six of nine studies rated at high risk of bias, although the more recent studies, Alvarez 2010, Mathews 2017, and Smeekens 2011, were rated at low risk of bias). Selection bias was problematic, with allocation concealment and group comparability assessed as being at unclear or high risk of bias for all nine studies. We assessed attrition bias, indicated by incomplete reporting of outcome data, as unclear or high risk of bias for all nine studies. The vast majority of studies reported insufficient information to judge risk of bias unequivocally. No domain was rated predominantly at low risk of bias (i.e. with > 50% of studies rated as low risk of bias).

Of the two CBA studies, Palusci 1995 was rated as high or unclear risk of bias on 9 of 10 risk of bias domains, mirroring the ratings for Dubowitz 1991, upon which it was based, on all domains except for measurement bias. Jacobsen 1993 was rated as high or unclear risk of bias on 8 of 10 risk of bias domains, mirroring the ratings for Kleemeier 1988, upon which it was based, on several domains and improving on the original study for reporting bias (perhaps owing to the additional space afforded in a thesis format). Both CBA studies were judged as at high risk of bias, and were omitted from effect size calculations and meta‐analyses (Jacobsen 1993; Palusci 1995).

We did not find published protocols for any of the included studies; however, two studies had been registered (Mathews 2017Smeekens 2011). Nonetheless, the trial register entries did not report any detail on proposed outcome measures.

Allocation

Random sequence generation

We rated three studies at high risk of bias (Dubowitz 1991; Jacobsen 1993; Palusci 1995), two of which were CBA studies (Jacobsen 1993; Palusci 1995). One quasi‐RCT used naturally occurring clinician rotations to divide participants into intervention or control groups (Dubowitz 1991). We rated three studies at low risk of bias, as they provided adequate descriptions of appropriate methods used to generate the allocation sequence, such as a computer‐generated random number list (Kim 2019; Mathews 2017; Smeekens 2011). We rated the remaining five studies at unclear risk of bias because they provided inadequate descriptions of sequence generation.

Allocation concealment

We rated no studies at low risk of bias for this domain. We rated three studies at high risk of bias due to inadequate concealment of allocations prior to assignment; two of these were CBA studies (Jacobsen 1993; Palusci 1995), whilst in the third study, a quasi‐RCT, participants were allocated based on clinical rotations, and therefore participants and investigators could reasonably have foreseen allocation to intervention or control group prior to or during the allocation process (Dubowitz 1991). The remaining eight studies did not adequately report an appropriate method of concealing allocation of participants to treatment groups and were judged at unclear risk of bias.

Blinding

Blinding of participants and personnel

In most instances with training interventions, it is not possible to blind study participants and personnel to group membership. Participants know that they are taking part in training, and this may influence subjective outcomes such as self‐report measures. For this reason, and in the absence of adequate reporting on blinding, we rated all 11 included studies at high risk of bias. The authors of one RCT reported on blinding, acknowledging that it was not possible owing to the nature of the trial (Smeekens 2011), but did not explain how the risk was (or could be) mitigated.

Blinding of outcome assessment

We rated one RCT at low risk of bias because it reported blinding outcome assessors to participants' membership of the intervention or control group (Smeekens 2011). In Smeekens 2011, an objective individual assessment of nurse participants’ performance was undertaken via in situ responses to standardised video case simulations, evaluated by experienced paediatricians blinded to group membership. We rated the remaining studies at high risk of bias owing to inadequate or no reporting on blinding of outcome assessors.

Incomplete outcome data

We rated no studies at low risk of bias for this domain. We assessed eight studies at unclear risk of attrition bias. Six of these studies, including the two CBA studies, did not provide complete data on participant attrition, exclusions, and withdrawals, or report reasons for missing data or imputation methods used (Alvarez 2010; Dubowitz 1991; Hazzard 1984; Kleemeier 1988; Palusci 1995; Randolph 1994). The other two studies provided CONSORT diagrams, but did not explicitly report reasons for attrition, so it was not possible to verify the potential impact on effect estimates (Kim 2019; Mathews 2017). We assessed three studies at high risk of bias. In one of these studies, the authors reported a “high level of attrition” from the experimental group, but did not provide further details (McGrath 1987, p 126). In another study, attrition was reported as around one‐third of participants across both intervention and control groups in an already‐small trial (19 participants in each group). The authors reported the use of imputation and comparison of imputed versus not imputed results, noting that there was no difference in results, yet the results for both analyses were not reported (Smeekens 2011). In the third study, all control group participants were lost at the four‐month follow‐up, thereby preventing between‐group comparisons at that time point (Mathews 2017).

Selective reporting

We rated four studies at low risk of selective reporting bias because prespecified outcomes were reported in sufficient detail to assess their completeness. Two of these studies had published trial registrations, though not protocols (Mathews 2017; Smeekens 2011). In the third study, there was congruence between published and unpublished reports (Alvarez 2010); and the fourth study, a CBA, was a thesis, which arguably enabled more space for detailed reporting (Jacobsen 1993). We rated the remaining seven studies, including one CBA study (Palusci 1995), at high risk of bias because study protocols were not available, and incomplete outcome data were reported, thus increasing the possibility of selective reporting (Dubowitz 1991; Hazzard 1984; Kim 2019; Kleemeier 1988; McGrath 1987; Palusci 1995; Randolph 1994).

Other potential sources of bias

We rated none of the 11 included studies at low risk of bias in all three additional domains: reliability of outcome measures (measurement bias); group comparability (selection bias); and contamination (contamination bias). We rated six studies, including one CBA study, at unclear risk of bias across the three additional domains (Jacobsen 1993; Kim 2019; Mathews 2017; McGrath 1987; Randolph 1994; Smeekens 2011). We rated five studies, including one CBA study, at high risk of bias across the three additional domains (Alvarez 2010; Dubowitz 1991; Hazzard 1984; Kleemeier 1988; Palusci 1995).

An insufficient number of included studies precluded analysis for publication bias. 

Reliability of outcome measures

We rated five studies, including one CBA study, at low risk of measurement bias due to their use of reliable measures for outcome assessment; these studies reported internal consistency coefficient alphas of > 0.60 for the scales used (Jacobsen 1993; Kim 2019; Kleemeier 1988; Palusci 1995; Randolph 1994). One study reported interrater reliability for objective outcome assessment, but did not report data on the internal consistency of the measures (Smeekens 2011); we rated this study at unclear risk of bias on this domain, along with three further studies (Hazzard 1984; Mathews 2017; McGrath 1987). One study reported internal consistency data for a scale comprising separate items for which the use of coefficient alpha would not have been appropriate (Hazzard 1984). One study reported methods used to improve internal consistency, but did not report the relevant data, so it was not possible to determine reliability (Mathews 2017). One study used a pre‐existing scale that could not be sourced, so its reliability could not be determined (McGrath 1987). We rated two studies at high risk of bias because they reported a low coefficient alpha (Alvarez 2010), or did not report any reliability data (Dubowitz 1991, a CBA study).
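
The coefficient alpha threshold used in this domain (> 0.60) refers to Cronbach's alpha for internal consistency. As a point of reference, it can be computed from item‐level data as in the following sketch, which uses made‐up scores rather than data from any included study:

```python
def cronbach_alpha(items):
    """Cronbach's alpha from a list of item-score columns.

    alpha = (k / (k - 1)) * (1 - sum(item variances) / variance of totals)
    where k is the number of items.
    """
    k = len(items)
    n = len(items[0])

    def var(xs):  # population variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    # Per-respondent total scores across all items
    totals = [sum(col[i] for col in items) for i in range(n)]
    return (k / (k - 1)) * (1 - sum(var(col) for col in items) / var(totals))

# Made-up scores: 3 items answered by 4 respondents
scores = [[1, 2, 3, 4], [1, 2, 3, 4], [2, 2, 3, 5]]
alpha = cronbach_alpha(scores)
```

A scale would pass the review's > 0.60 criterion when `alpha` exceeds that value; perfectly correlated items yield an alpha of 1.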

Group comparability

We rated no studies at low risk of bias for group comparability. No studies provided sufficient detail for each outcome measure to enable a true assessment of intervention and control group comparability at baseline. We rated seven studies, including one CBA study (Jacobsen 1993), at unclear risk of bias for this domain because incomplete reporting of information prevented an assessment of whether analysed participants were comparable at baseline (Alvarez 2010; Jacobsen 1993; Kim 2019; Mathews 2017; McGrath 1987; Randolph 1994; Smeekens 2011). We rated four studies, including one CBA study (Palusci 1995), at high risk of bias for this domain because, despite reporting group equivalence, no data were provided to support the claim (Hazzard 1984; Palusci 1995), or a claim was made about group equivalence, yet there appeared to be important differences between groups, such as in professional qualifications and previous experience, that were unaccounted for and would likely affect group equivalence and study outcomes (Dubowitz 1991; Kleemeier 1988).

Contamination

We rated three studies at low risk of contamination bias. In one study, Hazzard 1984, participants in the intervention group were drawn from one state in the USA, and participants in the control group from another state. In two studies, whole schools were randomised to intervention or control groups (Kim 2019; McGrath 1987). We rated the remaining eight studies, including the two CBA studies (Jacobsen 1993; Palusci 1995), at unclear risk of contamination bias because it was unclear whether participants in the intervention and control groups worked in the same or different settings, thereby making it possible that control group participants working in the same setting as intervention group participants would be exposed to some parts of the intervention via proximity or informal communication channels, or both.

Effects of interventions

See: Summary of findings 1 Child protection training for professionals to improve reporting of child abuse and neglect compared with no training, waitlist control, or alternative training not related to child abuse and neglect (primary outcomes); Summary of findings 2 Child protection training for professionals to improve reporting of child abuse and neglect compared with no training, waitlist control, or alternative training not related to child abuse and neglect (secondary outcomes)

The results of analyses and our GRADE ratings are presented in summary of findings Table 1 (primary outcomes) and summary of findings Table 2 (secondary outcomes) for child protection training compared with no training, waitlist control, or alternative training not related to child abuse and neglect. 

In this section, we have presented the main findings on the effects of interventions for the primary and secondary outcomes, drawing only on data from the five RCTs and four quasi‐RCTs (see Data collection and analysis). We have qualitatively synthesised the findings of the two CBA studies.

Primary outcomes

No studies evaluated changes in the number of reported cases of child abuse and neglect, as measured objectively in official records of reports made to child protection authorities; changes in the quality of reported cases of child abuse and neglect, as measured via coding of actual contents of reports made to child protection authorities; or adverse events.

Number of reported cases of child abuse and neglect
Participant self‐reports of actual cases reported

Three studies measured changes in the number of cases of child abuse and neglect via participants' self‐report of actual cases reported: two RCTs (Kleemeier 1988; Randolph 1994), and one quasi‐RCT (Hazzard 1984). In two studies, data were missing for calculation of effect sizes, and due to the age of the studies, contact with authors to obtain missing data was not possible (Hazzard 1984; Kleemeier 1988). One study, Randolph 1994, included a total of 42 participants, and the effect estimate suggested a large effect of training on self‐reported cases at three‐month follow‐up (standardised mean difference (SMD) 0.81, 95% confidence interval (CI) 0.18 to 1.43; very low‐certainty evidence). This effect size was calculated using David B Wilson's suite of effect size calculators, as RevMan Web would not calculate an effect size when the mean for a group was zero, as was the case for the control group in this study (RevMan Web 2021).
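
The standardised mean difference referred to here divides the between‐group mean difference by a pooled standard deviation, so a zero control‐group mean poses no mathematical problem. A minimal sketch with hypothetical summary statistics (not the study's actual data) illustrates the calculation:

```python
import math

def smd(mean_t, sd_t, n_t, mean_c, sd_c, n_c):
    """Standardised mean difference (Cohen's d) using a pooled SD.

    A zero control-group mean is harmless; only a zero pooled SD
    would make the statistic undefined.
    """
    pooled_sd = math.sqrt(((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2)
                          / (n_t + n_c - 2))
    return (mean_t - mean_c) / pooled_sd

def smd_se(d, n_t, n_c):
    """Approximate standard error of the SMD (for a 95% CI: d +/- 1.96 * se)."""
    return math.sqrt((n_t + n_c) / (n_t * n_c) + d**2 / (2 * (n_t + n_c)))

# Hypothetical example with a control-group mean of zero
d = smd(mean_t=0.8, sd_t=1.0, n_t=21, mean_c=0.0, sd_c=0.9, n_c=21)
se = smd_se(d, 21, 21)
ci = (d - 1.96 * se, d + 1.96 * se)
```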

We identified clustering in the Randolph 1994 study. After adjusting for clustering, the SMD reduced slightly, and the CIs widened slightly (SMD 0.80, 95% CI 0.16 to 1.45; very low‐certainty evidence). These analyses suggested that adjusting for clustering had only slight effects on the results.
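
The standard approximate adjustment for clustering divides each group's sample size by the design effect, 1 + (m − 1) × ICC, where m is the average cluster size and ICC the intracluster correlation coefficient. A minimal sketch with hypothetical values (the ICC and cluster size below are illustrative, not those used in the review):

```python
def design_effect(avg_cluster_size, icc):
    """Design effect for cluster-randomised data: 1 + (m - 1) * ICC."""
    return 1 + (avg_cluster_size - 1) * icc

def effective_n(n, avg_cluster_size, icc):
    """Reduce a raw sample size to its effective size under clustering."""
    return n / design_effect(avg_cluster_size, icc)

# Hypothetical values: 21 participants per arm, clusters of average
# size 7, assumed ICC of 0.05
n_eff = effective_n(21, 7, 0.05)
```

Shrinking the effective sample size in this way widens the confidence interval, which is the pattern reported above.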

Participant responses to vignettes

Three studies measured changes in the number of cases of child abuse and neglect via participants' responses to vignettes: two RCTs (Kleemeier 1988; Randolph 1994), and one quasi‐RCT (Alvarez 2010). One study did not separate outcome data for students and professionals (Alvarez 2010), and despite contact and co‐operation from the study authors, our attempts to locate the required data breakdown were unsuccessful. The two remaining studies, both RCTs, included 87 participants (training n = 47; comparison n = 40) (Kleemeier 1988; Randolph 1994), and the overall effect estimate suggested a large effect of training on the number of reported cases of child abuse and neglect at post‐test (fixed‐effect model: SMD 1.81, 95% CI 1.30 to 2.32; P < 0.001, I² = 8%, 2 studies, 87 participants; very low‐certainty evidence; Analysis 1.1). A forest plot of the distribution of effect sizes is provided in Figure 4. All indicators suggested minimal heterogeneity across study effects (Tau² = 0.01; Chi² = 1.08, df = 1, P = 0.30; I² = 8%).
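
The fixed‐effect estimates and heterogeneity statistics (Chi², I²) reported throughout this section follow standard inverse‐variance meta‐analysis. A generic sketch with hypothetical effect sizes and standard errors, not the review's data:

```python
import math

def fixed_effect_pool(effects, ses):
    """Inverse-variance fixed-effect meta-analysis.

    Returns the pooled effect, its 95% CI, Cochran's Q (Chi²), and I² (%).
    """
    weights = [1 / se**2 for se in ses]
    pooled = sum(w * d for w, d in zip(weights, effects)) / sum(weights)
    se_pooled = math.sqrt(1 / sum(weights))
    ci = (pooled - 1.96 * se_pooled, pooled + 1.96 * se_pooled)
    # Heterogeneity: Q compared against its degrees of freedom
    q = sum(w * (d - pooled)**2 for w, d in zip(weights, effects))
    df = len(effects) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return pooled, ci, q, i2

# Hypothetical two-study example
pooled, ci, q, i2 = fixed_effect_pool([1.6, 2.1], [0.35, 0.40])
```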


Forest plot of the comparison: training versus no training or waitlist control: number of reported cases of child abuse and neglect (assessed via vignette responses).

We identified clustering in the Randolph 1994 and Kleemeier 1988 studies. After adjusting for clustering, the SMD increased slightly, the CI widened slightly (fixed‐effect model: SMD 1.82, 95% CI 1.28 to 2.35; P < 0.001, I² = 1%, 2 studies, 80 participants (adjusted sample size); very low‐certainty evidence; Analysis 1.2), and heterogeneity was reduced (Tau² = 0.00; Chi² = 1.01, df = 1, P = 0.31; I² = 1%). These analyses suggested that adjusting for clustering had only slight effects on results.

Two CBA studies also reported responses to vignettes as an outcome measure (Jacobsen 1993, n = 40; Palusci 1995, n = 30). Jacobsen 1993 did not assess this outcome at baseline. Palusci 1995 included medical students and qualified medical professionals, and although the authors summarised baseline and postintervention data by professional status subgroups, they did so only for the experimental group, and reported only the total number of correct survey answers in a figure without any other data to permit calculation of an effect size. Part three of the survey measured professionals' responses to four case vignettes, yet the authors reported survey scores as overall totals rather than by the three distinct survey parts. We were therefore unable to discern the effect of the intervention on professionals' responses to vignettes.

Secondary outcomes

Knowledge of the reporting duty, processes, and procedures

Four studies measured professionals' knowledge of reporting duty, processes, and procedures after participation in training: two RCTs (Mathews 2017; McGrath 1987), and two quasi‐RCTs (Alvarez 2010; Kim 2019). In two studies, data were missing for calculation of effect sizes (Kim 2019; McGrath 1987), and another study did not separate data for professionals and students (Alvarez 2010). Our attempts to obtain missing data from the study authors were unsuccessful. Using the supplementary data for the remaining study (Mathews 2017), with a total of 744 participants (training n = 373; comparison n = 371), the effect estimate suggested a large effect of training on postintervention knowledge of reporting duty, processes, and procedures (SMD 1.06, 95% CI 0.90 to 1.21; low‐certainty evidence; Analysis 2.1). Due to attrition, calculation of between‐group effects was not possible at the four‐month follow‐up for this study.

Knowledge of core concepts in child abuse and neglect 

Six studies measured professionals' knowledge of core concepts in child abuse and neglect, such as the nature, extent, and indicators of different types of child abuse and neglect (Dubowitz 1991; Hazzard 1984; Kim 2019; Kleemeier 1988; McGrath 1987; Randolph 1994).

Child abuse/maltreatment (general) 

Four studies used a generalised measure of professionals' knowledge of core concepts in child abuse or maltreatment, or both: one RCT (McGrath 1987), and three quasi‐RCTs (Dubowitz 1991; Hazzard 1984; Kim 2019). In two studies, data were missing for calculation of effect sizes, and our attempts to obtain the data from the study authors were unsuccessful (Kim 2019; McGrath 1987). The two remaining studies, Dubowitz 1991 and Hazzard 1984, included 154 participants (training n = 82; comparison n = 72), and the overall effect estimate suggested a moderate effect of training on generalised knowledge of child abuse and neglect at post‐test (fixed‐effect model: SMD 0.68, 95% CI 0.35 to 1.01; P < 0.001, I² = 0%, 2 studies, 154 participants; very low‐certainty evidence; Analysis 3.1). A forest plot of the distribution of effect sizes is provided in Figure 5. All indicators suggested minimal heterogeneity across study effects (Tau² = 0.00; Chi² = 0.01, df = 1, P = 0.91; I² = 0%). Whilst follow‐up data were collected in the Dubowitz 1991 study, no data were reported to permit an assessment of the effect of training three to four months after the intervention.


Forest plot of the comparison: training versus no training: knowledge of core concepts in child abuse and neglect (all forms of child abuse and maltreatment generally).

We identified clustering in the Hazzard 1984 study. After adjusting for clustering, the SMD for the meta‐analysis reduced slightly, the estimates for heterogeneity did not change, but the CIs for Hazzard 1984 widened to include zero (fixed‐effect model: SMD 0.66, 95% CI 0.17 to 1.15; P = 0.009, I² = 0%, 2 studies, 70 participants (adjusted sample size); very low‐certainty evidence; Analysis 3.2). 

Child sexual abuse (specific)

Three studies, all RCTs, used a specific measure of professionals' knowledge of core concepts in child sexual abuse (Kleemeier 1988; McGrath 1987; Randolph 1994), and included 238 participants (training n = 104; comparison n = 134). The overall effect for training on specific knowledge of child sexual abuse at post‐test was large and positive (random‐effects model: SMD 1.44, 95% CI 0.43 to 2.45; P = 0.005, I² = 89%, 3 studies, 238 participants; very low‐certainty evidence; Analysis 3.3), but had substantial heterogeneity across effect sizes (Tau² = 0.69; Chi² = 17.44, df = 2, P < 0.001; I² = 89%; Figure 6).
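
The random‐effects model reported here incorporates between‐study variance (Tau²), conventionally estimated with the DerSimonian‐Laird method. A generic sketch with hypothetical inputs, not the review's data:

```python
import math

def dersimonian_laird(effects, ses):
    """Random-effects pooling with the DerSimonian-Laird Tau² estimator.

    Returns the pooled effect, its 95% CI, and Tau².
    """
    w = [1 / se**2 for se in ses]
    fixed = sum(wi * d for wi, d in zip(w, effects)) / sum(w)
    q = sum(wi * (d - fixed)**2 for wi, d in zip(w, effects))
    df = len(effects) - 1
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - df) / c)  # between-study variance, floored at zero
    # Re-weight each study by total (within + between) variance
    w_star = [1 / (se**2 + tau2) for se in ses]
    pooled = sum(wi * d for wi, d in zip(w_star, effects)) / sum(w_star)
    se_pooled = math.sqrt(1 / sum(w_star))
    return pooled, (pooled - 1.96 * se_pooled, pooled + 1.96 * se_pooled), tau2

# Hypothetical three-study example with visibly heterogeneous effects
pooled, ci, tau2 = dersimonian_laird([0.5, 1.4, 2.3], [0.3, 0.35, 0.4])
```

When Tau² is large, as in the analysis above, the random‐effects CI is noticeably wider than its fixed‐effect counterpart.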


Forest plot of the comparison: training versus no training or waitlist: knowledge of core concepts in child abuse and neglect (child sexual abuse specifically).

There were too few studies to conduct subgroup analyses, so we used a qualitative assessment of the three studies to identify the potential source of heterogeneity. All studies measured the outcome at the same time point, used a face‐to‐face delivery method, and had similar content and teaching methods (although McGrath 1987 did not report the latter). However, there were three discernible differences between the studies: (i) the comprehensiveness of the outcome, whereby McGrath 1987 used a single‐item scale, and Kleemeier 1988 and Randolph 1994 used the same 30‐item scale; (ii) McGrath 1987 utilised a train‐the‐trainer model; and (iii) the length of the training: a six‐hour workshop in Kleemeier 1988, three two‐hour sessions in Randolph 1994, and a two‐hour workshop in McGrath 1987.

We identified clustering in all three studies (Kleemeier 1988; McGrath 1987; Randolph 1994). After adjusting for clustering, the SMD decreased slightly; the CI widened slightly (random‐effects model: SMD 1.42, 95% CI 0.44 to 2.39; P = 0.004, I² = 85%, 3 studies, 178 participants (adjusted sample size); very low‐certainty evidence; Analysis 3.4); and heterogeneity was reduced (Tau² = 0.62; Chi² = 13.35, df = 2, P = 0.001; I² = 85%).

Two CBA studies also utilised outcomes measuring professionals' knowledge of core concepts related to child sexual abuse (Jacobsen 1993; Palusci 1995). We were unable to discern the effect of the intervention on professionals' knowledge for Palusci 1995 (as explained above in this section under 'Participant responses to vignettes'). The results for Jacobsen 1993 were consistent with the results of the meta‐analysis of RCTs (SMD 1.81, 95% CI 1.07 to 2.56, 40 participants; very low‐certainty evidence; Analysis 3.5).

Skill in distinguishing reportable and non‐reportable cases 

One RCT measured professionals' skill in distinguishing reportable and non‐reportable cases after participation in training (Smeekens 2011). Based on a total of 25 participants (training n = 13; comparison n = 12), the effect estimate suggested a large effect of training on professionals' skill in distinguishing reportable and non‐reportable cases at post‐test (SMD 0.94, 95% CI 0.11 to 1.77; very low‐certainty evidence; Analysis 4.1). 

Attitudes towards the duty to report child abuse and neglect

Two studies measured attitudes towards the duty to report child abuse and neglect: one RCT (Mathews 2017), and one quasi‐RCT (Kim 2019). In one study data were missing for calculation of effect sizes, and our attempts to obtain missing data from the study authors were unsuccessful (Kim 2019). Using the supplementary data for the remaining study (Mathews 2017), with a total of 741 participants (training n = 372; comparison n = 369), the effect estimate suggested a moderate effect of training on attitudes towards the duty to report child abuse and neglect (SMD 0.61, 95% CI 0.47 to 0.76; very low‐certainty evidence; Analysis 5.1). Due to attrition, calculation of between‐group effects was not possible at the four‐month follow‐up for this study. 

Discussion

Summary of main results

We conducted this systematic review to assess the effectiveness of child protection training to improve reporting of child abuse and neglect by professionals, and to investigate possible components of effective training interventions. We assessed the eligibility of 1481 full‐text reports, of which 11 trials (in 17 reports) met our inclusion criteria: five RCTs, four quasi‐RCTs, and two CBAs. We included data from nine of the 11 trials in the quantitative synthesis.

We found that child protection training of the type reported in this review may be more helpful than no training at all; however, overall the evidence is very uncertain. Professionals who received training scored higher on measures of knowledge, skills, and attitudes. However, these results were based on a small number of studies, some of which were dated and had methodological problems. In some cases, our analyses included only one professional group, limiting the applicability of the findings to other professional groups.

All trials were conducted with intradisciplinary groups of qualified professionals (elementary and high school teachers, childcare professionals, medical practitioners, and nurses), except for one study involving an interdisciplinary group of mental health professionals from psychology, educational psychology, counselling, and social work (Alvarez 2010). These are key professional groups that have regular contact with children and that are most often required by law or occupational policy to report child abuse and neglect to statutory child protection authorities.

Trials were mainly conducted in the USA. Interventions were developed by experts and delivered by specialist facilitators or content area experts, and three interventions were facilitated by an interdisciplinary team (Dubowitz 1991; Palusci 1995; Smeekens 2011). Training intensity ranged from two hours to six 90‐minute sessions over a one‐month period. Eight trials tested face‐to‐face training interventions (Alvarez 2010; Dubowitz 1991; Hazzard 1984; Jacobsen 1993; Kleemeier 1988; McGrath 1987; Palusci 1995; Randolph 1994). Three trials tested the effectiveness of self‐paced e‐learning interventions (Kim 2019; Mathews 2017; Smeekens 2011). Comparison conditions were no training, waitlist control, or alternative training (unrelated to child protection). 

Effectiveness of training: primary outcomes

We were able to assess training effectiveness for only one of the three primary outcomes specified in our study protocol (Mathews 2015): number of reported cases of child abuse and neglect. 

Number of reported cases of child abuse and neglect

Compared with those with no training or who were waitlisted to receive training, trained professionals reported higher numbers of actual cases to child protection authorities up to three months after receiving training, and higher numbers of hypothetical cases presented to them as case vignettes immediately after receiving training. On both counts, this represents a large training effect. However, our findings were based on very few studies including only one professional group (teachers) (Kleemeier 1988; Randolph 1994). Like many of the older studies included in this review, these studies predated standards for reporting on trials (e.g. Hoffmann 2014; Schulz 2010), and were assessed as having methodological problems that could contribute to over‐ or underestimation of training effects. The certainty of evidence for this outcome was therefore very low.
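The "large effect" language used throughout this section follows the conventional benchmarks of Cohen 1988, which can be summarised as a simple lookup. The thresholds below are rough guides from that convention, not strict cutoffs:

```python
def cohen_label(smd):
    """Label an SMD using Cohen 1988's conventional benchmarks
    (roughly 0.2 small, 0.5 medium, 0.8 large); rough guides only."""
    s = abs(smd)
    if s >= 0.8:
        return "large"
    if s >= 0.5:
        return "medium"
    if s >= 0.2:
        return "small"
    return "trivial"
```

For example, the SMD of 1.81 for vignette responses and the SMD of 0.61 for attitudes fall in the "large" and "medium" bands respectively under this convention.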

Effectiveness of training: secondary outcomes

We were able to assess training effectiveness for all four secondary outcomes specified in our study protocol (Mathews 2015): (i) knowledge of the reporting duty, processes, and procedures; (ii) knowledge of core concepts in child abuse and neglect, such as the nature, extent, and indicators of the different types of abuse and neglect; (iii) skill in distinguishing cases that should be reported from those that should not; and (iv) attitudes towards the duty to report child abuse and neglect.

Knowledge of the reporting duty, processes, and procedures

Compared with those waitlisted to receive training, trained professionals demonstrated higher levels of knowledge of the reporting duty, processes, and procedures when tested immediately after receiving training. This represented a large training effect based on data from only one study including childcare professionals (Mathews 2017). In this study, childcare centre staff were trained with an e‐learning intervention, iLookOut, and assessed using self‐report measures that were completed online. Although positive, the finding was questionable due to the low certainty of the evidence. We are aware that further studies of this training programme are currently underway (NCT03185728).

Knowledge of core concepts in child abuse and neglect

The ‘core concepts’ knowledge domain was assessed using two approaches, depending on the training intervention focus. The first approach was assessment of core concepts such as the nature, extent, and indicators of all forms of child abuse and neglect; we refer to this as a generalised measure. The second approach was assessment of core concepts relating only to child sexual abuse; we refer to this as a specific measure.

Compared with those who received no training, trained professionals showed higher levels of knowledge of core concepts in child abuse and neglect (generalised measure) when tested immediately after receiving training. This represented a medium training effect. However, this finding was based on a single study conducted with one professional group (teachers), limiting the applicability of the evidence to that professional group (Hazzard 1984). The study had methodological problems that may have contributed to over‐ or underestimation of training effects, making us very uncertain about the result.

Compared with those who received no training or were waitlisted to receive training, trained professionals showed higher levels of knowledge of core concepts in child sexual abuse (specific measure) when tested immediately after receiving training. This represents a large training effect. Our finding was based on three studies (Kleemeier 1988; McGrath 1987; Randolph 1994), all of which were conducted with teachers, limiting the applicability of the evidence to one professional group. We rated these studies at high risk of bias for multiple issues relating to how the trials were conducted. Overall, the evidence is very uncertain.

Skill in distinguishing cases that should be reported from those that should not 

Compared with those who received no training, trained professionals demonstrated higher levels of skill in distinguishing cases of child abuse and neglect that should be reported from those that should not, when tested immediately postintervention. This represents a large training effect and was based on data from one small study of nurses (Smeekens 2011). In this study, nurses were exposed to an e‐learning intervention and evaluated using an in vivo assessment in which they were scored on their actual performance in simulated cases using standardised criteria. Our analysis showed that the measurement was somewhat imprecise, meaning we are very uncertain about the training’s true effect. Nevertheless, this study is important because it was the only trial to assess participants’ demonstrated cognitive and practical skill in attending to the nature and salience of simulated case features, and their significance when deciding to report or to not report. This study therefore provided important qualitative insights into this secondary outcome, supplementing the quantitative results about numbers of reports in the studies detailed above for our primary outcomes.

Attitudes towards the duty to report child abuse and neglect

Compared with those who were waitlisted to receive training, trained professionals demonstrated more positive attitudes towards the duty to report child abuse and neglect when tested immediately after receiving training. This represents a medium training effect. Our finding was based on a single study, limiting the applicability of the evidence to that specific professional group, that is childcare professionals (Mathews 2017). Our analysis showed that the measurement of this variable was imprecise, leading us to be very uncertain about the results.

Overall completeness and applicability of evidence

We conducted extensive searches for relevant studies, in several iterations and without date or language restrictions. We are reasonably certain that our approach yielded all relevant trials. 

The included studies were conducted in high‐income countries, mainly in the USA. Given the widespread adoption of reporting duties in law and policy for professionals whose work is focused on children in numerous countries throughout the world, including in low‐ and middle‐income countries (Mathews 2008a), the available evidence on the effectiveness of training interventions is limited. The results of our quantitative synthesis were in many instances confined to single professional groups. Whether similar effects would be seen in different countries or for a wider range of professionals therefore remains unknown. In addition, considering the wide range of different professional groups who possess reporting duties, the range of professional groups with whom child protection training interventions have been evaluated is also limited. For example, we found no trials including police, who comprise a particularly important reporter group. Police consistently make a large proportion of all reports of all types of child abuse and neglect, and are an essential front‐line response to child protection; yet, police also face unique challenges especially in appropriate reporting of exposure to domestic and family violence (Cross 2012). Similarly, few studies have involved early childhood care and education professionals, who play an important role given the high vulnerability of young children to serious harm. It is important that all reporter groups, especially those with either higher exposure to children in general or exposure to particularly vulnerable children, receive effective training and that such training is evaluated for efficacy and, where necessary, further customised to the professional group’s context. 

Our searches revealed that a significant number of evaluations of training interventions have been conducted, as evidenced by the high number of full‐text reports screened (n = 1481) and the list of 'near misses' (n = 20) (see Excluded studies). This shows substantial investment in training programmes and their evaluation, yet also underscores potential wastage of scarce research resources, because too few studies used empirical methods designed to identify whether specific training interventions with particular characteristics are effective or not. Both the interventions and their evaluation come at a substantial cost.

We identified 11 trials for inclusion in this review; however, we were able to use data from only nine trials in the quantitative synthesis, mainly because information was missing from study reports, which placed the studies at risk of bias. The age of many of the included studies prevented contact with some study authors. None of the trials had published a study protocol, and only two of the more recent trials had been registered (Mathews 2017; Smeekens 2011). Several factors limited the overall completeness and applicability of evidence. Few studies appropriately assessed and transparently reported baseline equivalence of intervention and comparison groups on relevant demographic variables (e.g. age, gender, ethnicity, qualifications, years of experience) and experiential variables likely to influence the effects of training (e.g. prior training, prior experiences reporting to child protection authorities). Few studies conducted direct comparisons and reported complete data across study time points. Very few studies assessed long‐term training effects beyond the time immediately postintervention. Reporting on participant attrition was generally poor. There were many instances of missing data from analyses. None of the included studies accounted for clustering of professionals in groups in the analysis of study data. Research conduct and reporting would be improved in future by study authors' commitment to established guidelines such as CONSORT (Schulz 2010) and the Template for Intervention Description and Replication (TIDieR) (Hoffmann 2014), which set out the minimum information that should be reported for trials and interventions. 
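The clustering issue noted above matters because professionals trained in the same workplace tend to give correlated responses, and ignoring this overstates the information in the sample. A standard design‐effect adjustment is sketched below; the sample size, cluster size, and intraclass correlation (ICC) are hypothetical, and in practice the ICC must be estimated from data.

```python
def effective_sample_size(n, cluster_size, icc):
    """Deflate a nominal sample size by the design effect
    1 + (m - 1) * ICC, where m is the average cluster size and ICC is
    the intraclass correlation of outcomes within clusters."""
    return n / (1 + (cluster_size - 1) * icc)

# Hypothetical: 200 professionals in workplaces of 20, ICC = 0.05
# gives a design effect of 1 + 19 * 0.05 = 1.95
n_eff = effective_sample_size(200, 20, 0.05)
```

Even a modest ICC nearly halves the effective sample size in this illustration, which is why unadjusted analyses of clustered training data can produce overly narrow confidence intervals.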

Even where data were sufficiently reported or available, we were unable to use some of them owing to heterogeneity in outcomes and outcome measurement. For example, diverse measures were used to assess different types of knowledge. There was heavy reliance on the use of tailor‐made measures. A group of studies relied upon knowledge measures from Kleemeier 1988, which, in turn, was modelled after Hazzard 1984 (with full scales reported in Jacobsen 1993), thus perpetuating limitations in the original measure. Research would be improved in future with the use of standardised measures of training knowledge outcomes regarding core constructs ‐ even where the specific detail of that construct may change over time, hence requiring customisation in core detail ‐ rather than novel measures. In the Implications for research section, we discuss further problems arising from adoption of knowledge measures that are dated, or that are jurisdictionally specific.

We identified several ineligible secondary outcomes. In some studies, constructs such as attitudes were inaccurately or loosely conceptualised and labelled, meaning that their measurement lacked validity (e.g. Dubowitz 1991), although we acknowledge that many of the research measures in our review predate advances in research on attitudes (e.g. Ajzen 2005; Albarracin 2005).

Self‐efficacy is an outcome we had not identified in our study protocol (Mathews 2015). This will be an important secondary outcome to consider in future review updates in the light of advances in the use of self‐efficacy theory (Bandura 1993) in observational studies of professionals' child maltreatment reporting behaviour (e.g. Ayling 2020; Colgrave 2021; Lee 2012). Among the included studies, Smeekens 2011 (p 332) assessed self‐efficacy for detection (but not reporting) of child abuse, and Dubowitz 1991 (p 306) used an "attitudinal measure", which we reclassified as an ineligible self‐efficacy measure because sample items assessed confidence in managing child abuse cases, measured on a Likert‐type competence scale.

No studies considered prespecified potential adverse events. In our study protocol we defined adverse events as: (i) increase in failure to report cases of child abuse and neglect that warrant a report as measured subjectively by participant self‐reports (i.e. in questionnaires); and (ii) increase in reporting of cases that do not warrant a report as measured subjectively by participant self‐reports (i.e. in questionnaires). In retrospect, our description of adverse events may have been too narrow. Given advances in trial safety, codes of ethics for research conduct, deeper awareness of the need for trauma‐informed approaches to professional development, and requirements for researchers to report unexpected adverse events to their institutions, it would be advantageous to include in future review updates a category of adverse events that captures traumatic responses by trial participants themselves. Similarly, adverse events in future reviews may also consider emotional distress for study participants, especially since a key feature of the interventions is presentation of hypothetical cases. 

No study reported on the financial costs associated with training intervention delivery or evaluation. This would be helpful information for studies to report. Future programme design and evaluation may benefit from information about such costs, as well as comparisons of cost between online and face‐to‐face delivery. This is discussed further in Implications for practice. No study reported on training interventions for improving mandatory reporting specifically in culturally diverse contexts.

The completeness and applicability of evidence is limited by the concentration of studies from high‐income Western countries. The completeness and applicability of evidence is also limited by our inability to conduct subgroup analyses or sensitivity analysis because there were too few studies. For the same reason, we were unable to develop a training intervention typology. We address the implications in Implications for practice and Implications for research.

Quality of the evidence

We assessed the certainty of evidence in the review as low to very low. We included only RCTs and quasi‐RCTs in the estimation of intervention effects. We downgraded the certainty of evidence due to: limitations in the design and implementation of available studies, suggesting a high likelihood of bias (all outcomes); indirectness of outcome measurement, owing to the limited number of available studies, which restricts the generalisability of results (all outcomes); imprecision of results due to small sample sizes (all outcomes); and/or unexplained heterogeneity or inconsistency of results (one outcome). We were unable to assess publication bias because fewer than 10 trials were included in our meta‐analyses (Boutron 2022), hence we were unable to use publication bias as a criterion for rating the certainty of evidence. 

Overall, the included studies were at risk of bias. The most common problems were blinding of participants and personnel (performance bias; 11 studies) and blinding of outcome assessment (detection bias; 10 studies). Blinding is seldom possible in studies of training interventions, as group membership is obvious to participants, trainers, and likely also to colleagues in participants' workplaces, even if these individuals are not the training targets. Allocation concealment (selection bias) was unclear for the majority of studies (seven studies). More importantly for the underpinning science, reporting bias was evident in selective reporting and in incomplete reporting of outcome data, including for group comparability as noted above (Figure 2; Figure 3; Risk of bias in included studies).

In summary, because the GRADE certainty ratings were low or very low for all outcomes, we are uncertain about the effectiveness of training interventions compared with no training, waitlist control, or alternative training (not related to child protection). This means that the true effects for these outcomes may be substantially different from the estimated effects.

Potential biases in the review process

We followed the procedures in the Cochrane Handbook for Systematic Reviews of Interventions (Higgins 2022a) by first developing a study protocol (Mathews 2015) and then following this protocol in our conduct of the review. We used two key strategies to reduce the potential for bias in the review process. Firstly, our searches were comprehensive and included CENTRAL, MEDLINE, Embase, 18 other databases, a trials register, and handsearching of key journals and websites. We have confidence in our detailed search strategy, having corrected errors in the search strategy reported in the protocol and then ensured that all searches closely replicated the MEDLINE search across all search locations. We systematically screened and assessed all records captured by the original search and the corrected search, meaning that our search was more comprehensive than intended. We also made two separate appeals for relevant studies via email to the Child‐Maltreatment‐Research‐Listserv, a moderated electronic mailing list that distributes email messages to over 1500 subscribers (Walsh 2018 [pers comm]). Despite these efforts, it was not possible for us to capture reports on trials of training interventions that were not made public, or that were covered by commercial‐in‐confidence agreements. In addition, due to the small number of included studies, we were unable to formally assess publication bias, which is the tendency for positive/statistically significant trial findings to be published more often than negative/non‐significant findings. 

Secondly, multiple authors were involved in the selection of studies, and all were trained in using a decision guide closely based on the review inclusion and exclusion criteria. Data extraction was conducted by multiple authors, in some instances twice, during the lengthy process of conducting our review. Some issues arose during the review that we had not anticipated in our study protocol. These are detailed below in Differences between protocol and review. Furthermore, as noted below in the Declarations of interest, review authors who were also study authors, or colleagues or associates of study authors, were not involved in extracting data from or assessing risk of bias for any of the studies for which conflicts were present. This was designed to reduce the possibility of conflicts of interest. Instead, these tasks were undertaken by two independent review authors. 

One shortcoming of our review is that we did not specifically allow for head‐to‐head comparisons of training interventions in our study protocol (Mathews 2015), which meant that we excluded two well‐designed trials of a widely used training intervention known as Stewards of Children (Rheingold 2012; Rheingold 2015), which otherwise met our inclusion criteria. Head‐to‐head trials address a different research question, that is, whether one type of training is more effective than another. Rheingold and colleagues' trials compared in‐person training, web‐based training, and no training (waitlisted to receive training). The training intervention studied in these trials has been used in 76 countries to train over 1.7 million adults (Darkness to Light 2021, p 7). In future review updates, consideration should be given to addressing this limitation.

Agreements and disagreements with other studies or reviews

To our knowledge, this is the first systematic review of child protection training for professionals to improve mandatory reporting of child abuse and neglect. Our review findings are generally consistent with results reported in individual studies included in the review, which tended to favour training over no training, waitlist for training, or alternative training (not related to child protection). However, our review highlights the uncertainty of the evidence, which was based on methodological problems with the evaluation of training interventions that have characterised the field for several decades. Notwithstanding, there are three related reviews that warrant mentioning that, in combination with this review, can assist in charting a way forward. 

Carter 2006 narratively synthesised 10 years of published evidence on the effectiveness of procedural and training interventions for improving health professionals' identification and management of child abuse and neglect. Procedural interventions included structured forms, checklists, and flowcharts. Training interventions included those focused on raising awareness of child safeguarding. The 23 included studies encompassed a broad range of study designs. Congruent with our findings, critical appraisal in the review found a lack of rigorous evaluation, including confounding interventions, under‐utilisation of control groups, selection bias, and lack of follow‐up assessment of training outcomes beyond the immediate postintervention period. The confounding of concurrently administered procedural and training interventions is an important finding of this previous review, and a problem we sought to avoid in our review by defining our selection criteria for types of training interventions (see Types of interventions).

Louwers 2010 synthesised the published literature to February 2008 to identify effective interventions to increase detection (rather than reporting) rates of child abuse in hospital emergency departments. Four studies were identified, all of which investigated the effects of screening tools such as structured forms, checklists, and flowcharts. The review found increases in detection rates of suspected or confirmed cases and improvements in the quality of supporting documentation. There was no assessment of methodological quality of the included studies. In our review, we identified several trials of interventions focused only on improving detection rather than reporting of child abuse and neglect. Although this addresses the issue of potential confounding of detection and reporting interventions if these are offered concurrently (Carter 2006), it is also well‐established that the detection of child abuse or neglect (or both) is a necessary but insufficient basis for reporting, because many professionals who detect, also fail to report. 

Baker 2021 conducted a content analysis of US, state‐sponsored, online, mandated reporter training. Although not a systematic review, this study applied systematic, transparent, and replicable searches to identify 44 training curriculums and coded these against 10 evidence‐based thematic domains: “legal requirements and protections; the role of the mandated reporter; reasons why reporters should make a report; identifying maltreatment; dealing with disclosures by children; barriers to reporting; the mechanics of reporting; the impact on the reporter; how to help families; and format of training" (p 5). These coding domains may be useful for developing a programme typology in future updates of our review.

Figure 1. Study flow diagram.

Figure 2. Risk of bias summary: review authors' judgements about each risk of bias item for each included study.

Figure 3. Risk of bias graph: review authors' judgements about each risk of bias item presented as percentages across all included studies.

Figure 4. Forest plot of the comparison: training versus no training or waitlist control: number of reported cases of child abuse and neglect (assessed via vignette responses).

Figure 5. Forest plot of the comparison: training versus no training: knowledge of core concepts in child abuse and neglect (all forms of child abuse and maltreatment generally).

Figure 6. Forest plot of the comparison: training versus no training or waitlist: knowledge of core concepts in child abuse and neglect (child sexual abuse specifically).

Analysis 1.1. Comparison 1: Number of reported cases of child abuse and neglect, Outcome 1: Number of reported cases of child abuse and neglect (vignettes).

Analysis 1.2. Comparison 1: Number of reported cases of child abuse and neglect, Outcome 2: Number of reported cases of child abuse and neglect (vignettes), adjusted for clustering.

Analysis 2.1. Comparison 2: Knowledge of the reporting duty, processes, and procedures, Outcome 1: Knowledge of reporting duty, processes, and procedures.

Analysis 3.1. Comparison 3: Knowledge of core concepts in child abuse and neglect, Outcome 1: Child abuse/maltreatment (general).

Analysis 3.2. Comparison 3: Knowledge of core concepts in child abuse and neglect, Outcome 2: Child abuse/maltreatment (general), adjusted for clustering.

Analysis 3.3. Comparison 3: Knowledge of core concepts in child abuse and neglect, Outcome 3: Child sexual abuse (specific).

Analysis 3.4. Comparison 3: Knowledge of core concepts in child abuse and neglect, Outcome 4: Child sexual abuse (specific), adjusted for clustering.

Analysis 3.5. Comparison 3: Knowledge of core concepts in child abuse and neglect, Outcome 5: Child sexual abuse (specific, non‐randomised CBA).

Analysis 4.1. Comparison 4: Skills in distinguishing cases, Outcome 1: Skills in distinguishing cases.

Analysis 5.1. Comparison 5: Attitudes towards the duty to report child abuse and neglect, Outcome 1: Attitudes towards the duty to report child abuse and neglect.

Summary of findings 1. Child protection training for professionals to improve reporting of child abuse and neglect compared with no training, waitlist control, or alternative training not related to child abuse and neglect (primary outcomes)

Setting: professionals' workplaces or online e‐learning, mainly in the USA

Patient of population: postqualified professionals, including elementary and high school teachers, childcare professionals, medical practitioners, nurses, and mental health professionals

Intervention: face‐to‐face or online training, with a range of teaching strategies (e.g. didactic presentations, role‐plays, video, experiential exercises), ranging from 2 hours to 6 x 90‐minute sessions over a 1‐month period

Comparator: no training, waitlist for training, alternative training (not related to child abuse and neglect)

Outcomes

Anticipated absolute effects* (95% CI)

Relative effect (95% CI)

No. of participants (studies)

Certainty of the evidence

Comments

Risk with control conditions

Risk with training interventions

Number of reported cases of child abuse and neglect (professionals' self‐report, actual cases)

 

Time of outcome assessment: short term (3 months postintervention)

The mean number of cases reported in the training group was, on average, 0.81 standard deviations higher (0.18 higher to 1.43 higher).

42

(1 RCT)

⨁◯◯◯
Very Lowa,b,c

SMD of 0.81 represents a large effect size (Cohen 1988).

 

Outcome measured by professionals' self‐report of cases they had reported to child protection authorities.

Number of reported cases of child abuse and neglect (professionals' self‐report, hypothetical vignette cases)

 

Time of outcome assessment: short term (postintervention)

The mean number of cases reported in the training group was, on average, 1.81 standard deviations higher (1.30 higher to 2.32 higher).

87

(2 RCTs)

⨁◯◯◯
Very Lowa,b,c

SMD of 1.81 represents a large effect size (Cohen 1988).


Outcome measured by professionals’ responses to hypothetical case vignettes.

Number of reported cases of child abuse and neglect (official records of reports made to child protection authorities)

Unknown

0

(0 studies)

No studies were identified that measured numbers of official reports made to child protection authorities.

Quality of reported cases of child abuse and neglect (official records of reports made to child protection authorities)

Unknown

0

(0 studies)

No studies were identified that measured the quality of official reports made to child protection authorities.

Adverse events 

Unknown

0

(0 studies)

No studies were identified that measured adverse effects.

*The risk in the intervention group (and its 95% CI) is based on the assumed risk in the comparison group and the relative effect of the intervention (and its 95% CI).

CI: confidence interval; RCT: randomised controlled trial; SMD: standardised mean difference

GRADE Working Group grades of evidence

High certainty: we are very confident that the true effect lies close to that of the estimate of the effect.
Moderate certainty: we are moderately confident in the effect estimate: the true effect is likely to be close to the estimate of the effect, but there is a possibility that it is substantially different.
Low certainty: our confidence in the effect estimate is limited: the true effect may be substantially different from the estimate of the effect.
Very low certainty: we have very little confidence in the effect estimate: the true effect is likely to be substantially different from the estimate of effect.

a Downgraded by one level due to high risk of bias across multiple risk of bias domains.
b Downgraded by one level due to imprecision (the CI includes a small effect, or the sample size is small, or both).
c Downgraded by one level due to indirectness (a single or limited number of studies, restricting the evidence in terms of intervention, population, and comparators).
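The Comments column interprets each SMD against Cohen's (1988) conventional benchmarks (roughly 0.2 = small, 0.5 = medium, 0.8 = large). A minimal Python sketch of that rule, applied to the effect sizes reported in these tables (`cohen_label` is an illustrative helper, not part of any review software):

```python
def cohen_label(smd: float) -> str:
    """Classify an absolute standardised mean difference using
    Cohen's (1988) conventional benchmarks."""
    d = abs(smd)
    if d >= 0.8:
        return "large"
    if d >= 0.5:
        return "medium"
    if d >= 0.2:
        return "small"
    return "negligible"

# Effect sizes reported in the summary of findings tables
for smd in (0.81, 1.81, 1.06, 0.68, 1.44, 0.94, 0.61):
    print(f"SMD {smd:.2f}: {cohen_label(smd)} effect")
```

This reproduces the tables' labels: 0.68 and 0.61 fall in the medium band, the rest in the large band.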

Summary of findings 2. Child protection training for professionals to improve reporting of child abuse and neglect compared with no training, waitlist control, or alternative training not related to child abuse and neglect (secondary outcomes)

Setting: professionals' workplaces or online e‐learning, mainly in the USA

Patient or population: postqualified professionals, including elementary and high school teachers, childcare professionals, medical practitioners, nurses, and mental health professionals

Intervention: face‐to‐face or online training, with a range of teaching strategies (e.g. didactic presentations, role‐plays, video, experiential exercises), ranging from 2 hours to 6 x 90‐minute sessions over a 1‐month period

Comparator: no training, waitlist for training, alternative training (not related to child abuse and neglect)

Outcomes

Anticipated absolute effects* (95% CI)

Relative effect (95% CI)

No. of participants (studies)

Certainty of the evidence

Comments

Risk with control conditions

Risk with training interventions

Knowledge of reporting duty, processes, and procedures


Measured by: professionals' self‐reported knowledge of jurisdictional or institutional reporting duties, or both

Time of outcome assessment: short term (postintervention)

The mean knowledge score in the training group was, on average, 1.06 standard deviations higher (0.90 higher to 1.21 higher).

744

(1 RCT)

⨁⨁◯◯
Low a,b

SMD of 1.06 represents a large effect size (Cohen 1988).

Knowledge of core concepts in child abuse and neglect (all forms)


Measured by: professionals' self‐reported knowledge of all forms of child abuse and neglect (general measure)

Time of outcome assessment: short term (postintervention)

The mean knowledge score in the training group was, on average, 0.68 standard deviations higher (0.35 higher to 1.01 higher).

154

(2 RCTs)

⨁◯◯◯
Very low a,b,c

SMD of 0.68 represents a medium effect size (Cohen 1988).

Knowledge of core concepts in child abuse and neglect (child sexual abuse only)


Measured by: professionals' self‐reported knowledge of child sexual abuse (specific measure)

Time of outcome assessment: short term (postintervention)

The mean knowledge score in the training group was, on average, 1.44 standard deviations higher (0.43 higher to 2.45 higher).

238

(3 RCTs)

⨁◯◯◯
Very low a,b,c,d

SMD of 1.44 represents a large effect size (Cohen 1988).

Skill in distinguishing between reportable and non‐reportable child abuse and neglect cases


Measured by: professionals’ performance on simulated cases scored by trained and blinded expert panel

Time of outcome assessment: short term (postintervention)

The mean skill score in the training group was, on average, 0.94 standard deviations higher (0.11 higher to 1.77 higher).

25

(1 RCT)

⨁◯◯◯
Very low a,b,c

SMD of 0.94 represents a large effect size (Cohen 1988).

Attitudes toward the duty to report child abuse and neglect


Measured by: professionals’ self‐reported attitudes towards the duty to report child abuse and neglect

Time of outcome assessment: short term (postintervention)

The mean attitude score in the training group was, on average, 0.61 standard deviations higher (0.47 higher to 0.76 higher).

741

(1 RCT)

⨁◯◯◯
Very low a,b,c

SMD of 0.61 represents a medium effect size (Cohen 1988).

*The risk in the intervention group (and its 95% CI) is based on the assumed risk in the comparison group and the relative effect of the intervention (and its 95% CI).

CI: confidence interval; RCT: randomised controlled trial; SMD: standardised mean difference

GRADE Working Group grades of evidence

High certainty: we are very confident that the true effect lies close to that of the estimate of the effect.
Moderate certainty: we are moderately confident in the effect estimate: the true effect is likely to be close to the estimate of the effect, but there is a possibility that it is substantially different.
Low certainty: our confidence in the effect estimate is limited: the true effect may be substantially different from the estimate of the effect.
Very low certainty: we have very little confidence in the effect estimate: the true effect is likely to be substantially different from the estimate of effect.

a Downgraded by one level due to high risk of bias across multiple risk of bias domains.
b Downgraded by one level due to indirectness (one or both of the following: (1) a single or limited number of studies, restricting the evidence in terms of intervention, population, and comparators; (2) the outcome is not a direct measure of professionals' reporting behaviour).
c Downgraded by one level due to imprecision (one or both of the following: (1) the CI includes a small effect; (2) the sample size is small).
d Although the evidence can be downgraded by at most three levels, there was also significant heterogeneity of the effect for this outcome (i.e. inconsistency), which further reduces the certainty of the evidence.

Comparison 1. Number of reported cases of child abuse and neglect

Outcome or subgroup title | No. of studies | No. of participants | Statistical method | Effect size
1.1 Number of reported cases of child abuse and neglect (vignettes) | 2 | 87 | Std. Mean Difference (IV, Fixed, 95% CI) | 1.81 [1.30, 2.32]
1.2 Number of reported cases of child abuse and neglect (vignettes), adjusted for clustering | 2 | 80 | Std. Mean Difference (IV, Fixed, 95% CI) | 1.82 [1.28, 2.35]
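The "Std. Mean Difference (IV, Fixed, 95% CI)" method pools study-level SMDs with inverse-variance fixed-effect weights, and the "adjusted for clustering" rows first inflate each cluster-randomised study's variance by a design effect. A minimal Python sketch of both steps; the SMDs, standard errors, cluster size, and ICC below are hypothetical illustrations, not the review's data:

```python
import math

def pool_fixed(smds, ses):
    """Inverse-variance fixed-effect pooling: weight each study's
    SMD by 1/SE^2; the pooled SE is 1/sqrt(sum of weights)."""
    w = [1.0 / se ** 2 for se in ses]
    pooled = sum(wi * d for wi, d in zip(w, smds)) / sum(w)
    se = math.sqrt(1.0 / sum(w))
    return pooled, (pooled - 1.96 * se, pooled + 1.96 * se)

def design_effect(mean_cluster_size, icc):
    """Variance inflation for cluster randomisation: 1 + (m - 1) * ICC.
    Multiplying each SE^2 by this factor shrinks the effective sample size."""
    return 1.0 + (mean_cluster_size - 1.0) * icc

# Hypothetical two-study example
smds, ses = [1.6, 2.0], [0.35, 0.40]
print(pool_fixed(smds, ses))

# Same studies after a clustering adjustment (hypothetical m = 20, ICC = 0.02)
deff = design_effect(20, 0.02)
adj_ses = [math.sqrt(se ** 2 * deff) for se in ses]
print(pool_fixed(smds, adj_ses))
```

The adjusted confidence interval comes out slightly wider, mirroring how outcome 1.2's interval widens relative to outcome 1.1 above.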

Comparison 2. Knowledge of the reporting duty, processes, and procedures

Outcome or subgroup title | No. of studies | No. of participants | Statistical method | Effect size
2.1 Knowledge of reporting duty, processes, and procedures | 1 | — | Std. Mean Difference (IV, Fixed, 95% CI) | Totals not selected

Comparison 3. Knowledge of core concepts in child abuse and neglect

Outcome or subgroup title | No. of studies | No. of participants | Statistical method | Effect size
3.1 Child abuse/maltreatment (general) | 2 | 154 | Std. Mean Difference (IV, Fixed, 95% CI) | 0.68 [0.35, 1.01]
3.2 Child abuse/maltreatment (general), adjusted for clustering | 2 | 70 | Std. Mean Difference (IV, Fixed, 95% CI) | 0.66 [0.17, 1.15]
3.3 Child sexual abuse (specific) | 3 | 238 | Std. Mean Difference (IV, Random, 95% CI) | 1.44 [0.43, 2.45]
3.4 Child sexual abuse (specific), adjusted for clustering | 3 | 178 | Std. Mean Difference (IV, Random, 95% CI) | 1.42 [0.44, 2.39]
3.5 Child sexual abuse (specific, non‐randomised CBA) | 1 | — | Std. Mean Difference (IV, Random, 95% CI) | Totals not selected
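Outcomes 3.3 to 3.5 use "(IV, Random)": a random-effects model, the usual choice when between-study heterogeneity is substantial, as footnote d of summary of findings 2 notes for the child sexual abuse outcome. A minimal sketch of the widely used DerSimonian-Laird estimator, with illustrative inputs rather than the review's study data:

```python
import math

def dersimonian_laird(smds, ses):
    """Random-effects pooling: estimate the between-study variance
    tau^2 from Cochran's Q, then reweight by 1 / (SE^2 + tau^2)."""
    w = [1.0 / se ** 2 for se in ses]
    sw = sum(w)
    fixed = sum(wi * d for wi, d in zip(w, smds)) / sw
    q = sum(wi * (d - fixed) ** 2 for wi, d in zip(w, smds))  # Cochran's Q
    df = len(smds) - 1
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - df) / c) if c > 0 else 0.0
    wr = [1.0 / (se ** 2 + tau2) for se in ses]
    pooled = sum(wi * d for wi, d in zip(wr, smds)) / sum(wr)
    se = math.sqrt(1.0 / sum(wr))
    i2 = max(0.0, (q - df) / q) * 100.0 if q > 0 else 0.0  # % heterogeneity
    return pooled, (pooled - 1.96 * se, pooled + 1.96 * se), tau2, i2

# Hypothetical three-study example with heterogeneous effects
print(dersimonian_laird([0.6, 1.4, 2.3], [0.30, 0.35, 0.40]))
```

When tau² > 0, the random-effects interval is wider than the fixed-effect one, which is consistent with the wide interval for the heterogeneous outcome 3.3 (1.44 [0.43, 2.45]).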

Comparison 4. Skills in distinguishing cases

Outcome or subgroup title | No. of studies | No. of participants | Statistical method | Effect size
4.1 Skills in distinguishing cases | 1 | — | Std. Mean Difference (IV, Fixed, 95% CI) | Totals not selected

Comparison 5. Attitudes towards the duty to report child abuse and neglect

Outcome or subgroup title | No. of studies | No. of participants | Statistical method | Effect size
5.1 Attitudes towards the duty to report child abuse and neglect | 1 | — | Std. Mean Difference (IV, Fixed, 95% CI) | Totals not selected
