This study examined the psychometric properties of the SDQ as a measure of emotional and behavioural problems in a large community sample of 1-to-2-year-olds. Overall, the SDQ shows promise as a cost-effective and brief screening tool for early mental health in young children, particularly for children’s externalising problems.
The original five-factor SDQ model [39] (emotional symptoms, peer problems, conduct problems, hyperactivity, prosocial behaviour) was a poor fit to the data. However, adding a positive construal method factor significantly improved model fit, yielding a moderately well-fitting model. Whilst inconsistent with previous studies that have found an adequate fit of the original model with pre-schoolers [28, 30, 31], the findings are in line with studies of 2-year-olds and older children (aged 10–19 years) that found better model fit when allowing the positively worded, reverse-scored items to cross-load onto a positive construal factor [29, 33, 49, 50]. The findings suggest that while the items do reflect difficulties, much of their associated variance can be attributed to how respondents scored positively worded items.
Model fit was significantly improved by correlating error terms for three pairs of items of similar context, including items 2 and 10 (restless, fidgety), 1 and 9 (considerate, caring), and 1 and 20 (considerate, volunteers). Previous studies have also found that allowing items 2 and 10 (restless, fidgety) to correlate improves model fit [50,51,52,53]. Given the emerging developmental abilities of the sample, the behaviours in these pairs may be difficult for caregivers to distinguish. Adjusting these items for younger children might yield data more sensitive to their developmental stage.
In the best-fitting model, most factor loadings were ≥ 0.50, indicating that most items effectively represent their corresponding latent factors. However, factor loadings for items 4 (shares) in the prosocial subscale, 11 (good friend) in the peer problems subscale, and 21 (reflective) in the hyperactivity subscale were lower (0.48, 0.43, and 0.37, respectively), meaning they may not effectively capture the attributes they purport to measure. This corresponds with D’Souza and colleagues’ [29] research involving 2-year-olds and could be due to participants’ young age; 1-2-year-olds are unlikely to have strong peer relationships or higher-order cognitive skills like reflective thinking. This point could also apply to other behaviours, such as lies/cheats (item 18), steals (item 22), and bullied (item 19), which may not be deemed appropriate for such young children. It might be pertinent to modify the items so they better reflect developmentally appropriate abilities/difficulties of younger children [29].
As expected given the downward age extension, internal consistency of SDQ subscales was generally weaker than observed in older children (aged 2 to 7 years) [28,29,30,31]. Still, the overall results align with patterns found in some studies of preschoolers (3-to-6-year-olds) [32, 54]. The total difficulties score met the often-used α ≥ 0.70 criterion for satisfactory internal reliability. Overall, scores were higher for externalising than for internalising problems in both age groups, consistent with prior accounts of the SDQ’s higher sensitivity to externalising symptoms in younger children [32, 35], possibly because internalising symptoms are less ‘observable’ in young children than externalising difficulties. Internal consistency was moderate for the hyperactivity, prosocial behaviour, and externalising subscales (ranging from 0.66 to 0.77). The emotional, peer, conduct, and internalising problems subscales had lower or unsatisfactory reliability (ranging from 0.42 to 0.61), except for conduct problems in 2-year-olds, which had stronger reliability (0.66). The peer problems subscale had the lowest internal reliability (0.42). Gustafsson and colleagues [55] suggest the prosocial subscale may be less suitable for younger children (aged 1–3 years) given their emerging social skills and peer affiliations.
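For readers less familiar with the internal consistency coefficient discussed above, the following is a minimal sketch of the standard Cronbach's α formula, α = (k / (k − 1)) × (1 − Σ item variances / variance of total score). It is not the study's analysis code. The function name `cronbach_alpha` and the small response matrix are hypothetical and purely illustrative; the only detail taken from the SDQ itself is that items are scored 0, 1, or 2.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(item_scores[0])                      # number of items in the subscale
    columns = list(zip(*item_scores))            # one tuple of scores per item
    sum_item_variances = sum(variance(col) for col in columns)
    total_variance = variance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum_item_variances / total_variance)


# Hypothetical responses: 5 caregivers x 3 items, each item scored 0-2.
responses = [
    [0, 0, 1],
    [1, 1, 1],
    [2, 1, 2],
    [0, 1, 0],
    [2, 2, 2],
]
alpha = cronbach_alpha(responses)
print(f"alpha = {alpha:.2f}")
```

Because α depends on both the number of items and how strongly they covary, a short subscale containing one or two items that behave differently in very young children can fall below the 0.70 benchmark even when the remaining items cohere well.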
Moderate to strong correlations indicated good test–retest reliability (rs = 0.45 to 0.67), but the total difficulties score showed a slightly weaker association compared with prior research with preschool children [34, 55]. Moderate positive correlations were found between equivalent SDQ and CBCL problem subscales in 1-2-year-olds, including internalising (r = 0.39), externalising (r = 0.52), and total problems (r = 0.46), but these were weaker than those observed in older children [56]. Overall, stronger associations were found for 2-year-olds than 1-year-olds, particularly for the internalising subscales (1-year-olds r = 0.29 versus 2-year-olds r = 0.54). Correlations between externalising subscales were also stronger for 2-year-olds (r = 0.58) than 1-year-olds (r = 0.44), but still moderately strong for both. As internalising and externalising symptoms can overlap more in younger children, those with internalising symptoms may also present with some externalising symptoms [5]. Furthermore, externalising behaviours are likely easier to identify in older children, and thus, parents may be more consistent in their understanding and scoring of similar behaviours across measures. While the relations between broader SDQ subscales (internalising, externalising, total) and corresponding CBCL subscales were encouraging, associations between the CBCL and the individual subscales that were summed to provide the broader SDQ subscale scores (emotional symptoms, peer problems, conduct problems, hyperactivity) were not as consistent or strong. For example, associations between SDQ hyperactivity and CBCL externalising subscales were 0.17 (1-year-olds), 0.37 (2-year-olds), and 0.25 (all children). This is likely explained by there being fewer items in each individual subscale, meaning they are ‘noisier’ measures of behaviour. That is, each item can introduce more noise into the subscale score, meaning items that might be developmentally inappropriate (e.g. reflective within the hyperactivity subscale) will have a greater influence on the subscale score. The broader subscales, in contrast, capture behaviour across more items, therefore reducing measurement noise. Patterns of association may also be weaker in younger children as their behaviours are less well developed, and therefore less distinguishable. Their language is also less sophisticated, meaning their views are likely less clearly expressed, making it more difficult for parents to rate them. Overall, higher correlations among similar symptoms and divergent patterns amongst less related symptoms are encouraging, providing confidence that the SDQ is successfully measuring underlying constructs.
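The argument that subscales with fewer items are noisier can be illustrated with the Spearman-Brown prophecy formula, which predicts how reliability changes when a scale is lengthened with comparable items. This is a sketch under stated assumptions and was not part of the study's analysis: the starting reliability of 0.50 is a hypothetical value, and the formula assumes the added items are parallel to the existing ones. The item counts follow the SDQ's structure (5 items per individual subscale, 10 per broader internalising or externalising subscale, 20 for total difficulties).

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Predicted reliability after lengthening a scale by `length_factor`."""
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)


# Hypothetical reliability of a single 5-item subscale (illustrative only).
five_item = 0.50

# Doubling to 10 items (e.g. a broader internalising or externalising subscale).
ten_item = spearman_brown(five_item, 2)

# Quadrupling to 20 items (e.g. the total difficulties score).
twenty_item = spearman_brown(five_item, 4)

print(f"5 items: {five_item:.2f}, 10 items: {ten_item:.2f}, 20 items: {twenty_item:.2f}")
```

Under these assumptions, predicted reliability rises as items are added, which is consistent with the broader subscales showing stronger and more consistent associations with the CBCL than the individual five-item subscales.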
Limitations
Although children were recruited from the community through routine services, a slightly higher proportion of caregivers held a graduate-level qualification (52.9%) compared with ~40% of people aged 25–34 years in England (based on the 2011 Census). Thus, some caution should be applied when considering sample generalisability. While concurrent validity was assessed via comparisons to the CBCL, data from direct observation or interviews would provide a more robust standard for comparison. However, these types of assessment would be costly, complex to administer, and limited in this age group, whereas the CBCL is a widely used behavioural measure validated for use with 1-2-year-olds [42]. While the positive construal factor was included in the structural equation model to account for a potential method effect of positively worded items, an alternative explanation that the items are an extension of the prosocial behaviour construct cannot be ruled out [29]. There are limitations to the cut-off values used to determine goodness-of-fit in SEM. The applicability of conventional benchmarks, which were developed for maximum likelihood (ML) estimation with continuous data, to DWLS estimation is debated (indices are typically better when using DWLS) [57]. Therefore, caution must be exercised and fit indices should be interpreted within the context of children’s age and prior research (i.e. expected result patterns). Previous SDQ psychometric validation studies using the DWLS estimator have used conventional cut-offs [29, 50]. Finally, there are other statistical approaches for psychometric evaluation that could be employed. CFA is an established and robust method widely used to examine the psychometric properties of the SDQ in young children [28,29,30,31,32,33]. Item response theory (IRT) is another, less commonly used approach [58,59,60], which offers additional insights into item performance and scale precision across different levels of a latent trait. IRT models describe the probability of a response to a categorical indicator variable in relation to the level of the latent variable [61], indicating item difficulty and respondents’ ability or level of construct endorsement. Whilst beyond the focus and scope of the current paper, future studies using IRT could inform adjustments of SDQ items so that the measure is more suitable and accurate for use with younger children.
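To make the IRT description above concrete, the sketch below computes response-category probabilities under a graded response model, one common IRT model for ordered categorical items. The choice of model is an assumption for illustration; the paper does not specify which IRT model future studies should use. The function name `grm_category_probs` and all parameter values (discrimination `a`, thresholds, and the latent trait level `theta`) are hypothetical. The three categories correspond to the SDQ's 0, 1, 2 item scoring.

```python
import math


def grm_category_probs(theta, a, thresholds):
    """Category probabilities under a graded response model.

    theta: respondent's level on the latent trait
    a: item discrimination
    thresholds: ordered category thresholds b_1 < b_2 < ...
    """
    # Cumulative probabilities P(X >= k), bounded by 1 (k = 0) and 0 (beyond top).
    cumulative = [1.0]
    cumulative += [1 / (1 + math.exp(-a * (theta - b))) for b in thresholds]
    cumulative.append(0.0)
    # Probability of each category is the difference of adjacent cumulatives.
    return [cumulative[i] - cumulative[i + 1] for i in range(len(cumulative) - 1)]


# Hypothetical item: discrimination 1.0, thresholds at 0.0 and 1.0.
probs = grm_category_probs(theta=0.0, a=1.0, thresholds=[0.0, 1.0])
print([round(p, 3) for p in probs])
```

In a model of this kind, an item whose thresholds sit far above the trait levels typical of 1-to-2-year-olds would rarely be endorsed, which is one way IRT could flag items that are developmentally inappropriate for this age group.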
Implications
The current study provides some evidence for the construct validity (internal structure) of the SDQ as a brief and cost-effective measure of behavioural problems in 1-to-2-year-old children, with psychometric properties comparable to those of slightly older children. Overall, reliability, validity, and consistency measures are better for externalising than for internalising symptoms, suggesting the SDQ may be particularly useful for identifying externalising behavioural problems in younger children [35]. This is consistent with Hattangadi and colleagues [34], who found that the externalising, but not the internalising, subscale had sufficient reliability and accuracy to screen 2-4-year-olds (specifically those at risk of ADHD and disruptive behaviour disorders). Focusing on externalising symptoms may be more practically instructive, as they have an earlier onset and show greater stability than internalising problems [5, 62]. Low reliability across some subscales was expected given the young age of participants. For widespread use in measuring children’s behavioural and emotional development, the SDQ might require adjustments to some items for developmental appropriateness. Nevertheless, the findings suggest that the current construction of the SDQ, especially its externalising subscale, might suffice as a useful tool for early screening purposes. However, caution should be exercised if using the tool in its current form in isolation for clinical purposes. Further research is needed to determine the measure’s predictive validity and to assess how it might contribute to screening and early identification pathways. Recommendations for such pathways also include holistic assessments of caregiver-child relationships and consideration of risk and protective factors to determine the potential benefit of monitoring, onward assessment, or early preventative intervention [16]. Ultimately, these pathways should be coproduced with families and service providers.