Not as easy as A, B, C, 1, 2, 3: Emphasizing transparency in measuring holistic skills
For decades, researchers, schools, and governments alike have been trying to accurately measure children’s reading and mathematical abilities through different tests and activities. Characteristics like creativity or motivation can also be crucial components of a successful education, yet there is relatively little knowledge or consensus on how to effectively measure them. Furthermore, existing measures developed in high-income countries may not always transfer to other contexts, and there are open questions about the validity and reliability of frequently used measures of these holistic, non-traditional skills.
To address this gap, the Learning for All Initiative (LAI) recently published a literature review and guide on how researchers currently conduct randomized evaluations relating to holistic skills: what kinds of measures they use, what kinds of validity and reliability checks they perform, and how they share those learnings. Along with this resource, LAI shared an appendix of the most commonly observed measures in the 237 randomized controlled trials (RCTs) sampled in the review.
The review reveals opportunities to push toward a higher standard of citing, explaining, and conducting original validation of the measurement tools used in randomized evaluations, while recognizing the various constraints and incentives researchers face. We present potential actions that researchers, journals, and organizations can take to help guide future research on holistic skills.
What are holistic skills? How are they measured?
Students learn far more skills in school than reading and arithmetic alone. These characteristics or skills go by a variety of terms: socioemotional skills, soft skills, non-cognitive skills, life skills, and so on. We term them “holistic skills,” encompassing the social, emotional, cognitive, physical, and creative skills that children develop throughout childhood and adolescence, though there is as yet no single agreed-upon framework for categorizing them. In our review, researchers implemented a variety of measures in their studies, including self-reported measures, measures reported by others, and observed or direct assessments, like lab-in-the-field tasks.
As these varied definitions suggest, holistic skills are multidimensional, less tangible, and more difficult to measure than literacy and numeracy, yet it is equally important to measure them validly and reliably. Validity refers to how well a measure captures the outcome researchers actually intend to understand. For example, say a researcher wants to measure a child’s ability to recognize emotions in others by giving them written scenarios and asking questions about the people involved. If the survey is translated or written above the child’s reading level, it could in fact be measuring the child’s reading comprehension rather than their emotional recognition. Reliability, in turn, captures the consistency or dependability of results: a reliable measure of a holistic skill gives similar results when the skill is assessed multiple times under similar or unchanged conditions. Establishing the validity and reliability of measurement methods helps the research community accurately determine whether an intervention enhances a specific skill. Policymakers also rely on this crucial evidence to decide which programs to scale and invest in.
The motivation for developing and using reliable and valid education measures is clear, but what are the actual practices among the existing RCT literature?
Lack of a public good
According to our review of 237 RCTs in the economics literature, 69 percent of the studies with available papers do not report evidence of validity or reliability. Among the studies that at least reference a pre-developed tool, only 37 percent cite a validation paper. This demonstrates that while many researchers rely on and cite tools developed by other teams, those reference papers often have not clearly established the validity of the measure. And among these cases of relying on precedent, less than 0.5 percent of studies referenced a measure that was validated in the same geographical context as the intervention setting.
These trends indicate that researchers may not be conducting reliability and validity checks on the measures they use, or they may not be reporting such checks in their papers despite conducting them in the background. Some explanations include the high financial costs of piloting measures and the perception of potential rejection from publication if a measure shows low validity. However, when the process of selecting and testing a measure in a new context remains hidden, future researchers encounter the same questions and are doomed to repeat rather than build on the work already done.
Possible approaches to increase validity and reliability testing
Researchers may benefit from additional support in developing new measures or in testing the validity and reliability of existing ones. Based on the literature review and interviews with researchers across disciplines, LAI put together a list of six guiding questions for the design stage of an RCT, meant to spark discussion about the measurement of holistic skills and aid final tool selection for data collection. To address cost and time barriers, researchers can prioritize validity and reliability checks for the measures of the main outcomes rather than for every holistic skill the study measures. Policymakers and funders seeking evidence-backed holistic skills programs can look beyond the evaluations of individual programs and also invest in building researcher-led infrastructure for measuring holistic skills, including pilots, tool development, and public databases.
Apart from building validity and reliability testing into research projects, there is also room for improved reporting of the findings of such tests, whether in the paper, an appendix, or the analysis plan. Journals could encourage robust reporting on holistic skills measurement by committing to a special issue on these topics, ensuring that authors are not disincentivized from sharing the strength of their measures and informing the measurement strategies of future researchers. Research organizations, for their part, could consider making adherence to the guidance in LAI’s literature review a criterion for funding, or provide supportive resources such as a measurement database or a validity/reliability reporting template. For instance, at J-PAL, LAI offers grants to researchers for tool development under the pilot funding category.
More research on children’s holistic skills is as essential as ever, as school closures and the strain of the pandemic socially isolated children. Education policymakers facing both a learning crisis and a mental health crisis among their students can turn to synthesis products like the 2023 GEEAP “Smart Buys” report, which identifies “teaching socio-emotional and life skills” as an area with promising but limited evidence on academic learning outcomes. As new research mounts on the causal impact of holistic skills instruction on academic learning, that research must rest on robust measures, so that programs are not scaled up or down based on imprecise or invalid estimates of impact. Evidence-based policies are only as strong as the evidence they are based on, and the base has an opportunity to get stronger as it grows.