Measurement of Holistic Skills in RCTs: Review and Guidelines
J-PAL recently launched its Learning for All Initiative, which has a core focus on breadth of skills. There is much to learn about whether and how interventions can affect skills that extend beyond literacy and numeracy, including cognitive skills, social skills, emotional skills, creative skills, and physical skills of children. This document provides guidance on the measurement of holistic skills in randomized controlled trials, starting from a review of current practice in the RCT literature. There are several closely related measurement questions and concerns in the wider literature.
To assess accurately whether an intervention improves a given skill, an evaluation must measure that skill validly and reliably. Without established validity, a study may claim an effect on one outcome when the intervention actually affected a different one. Economists have a long track record of studying literacy and numeracy skills as primary outcomes, and even for those measures researchers are sometimes still grappling with measurement challenges. Far fewer RCTs focus on the wider set of skills, and many of them are very recent.
Many of the measurement questions from the literacy and numeracy literature carry over to the measurement of a wider set of skills. Holistic skills can be particularly prone to both random and systematic measurement error for additional reasons, for instance because skills like creativity or perseverance are less tangible and more multidimensional. The risk that invalid measurement leads to erroneous conclusions about the significance, effect size, and overall success of an intervention is therefore important to consider early, during the design phase of an RCT.
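As a rough illustration of how random measurement error alone can distort conclusions about effect size, the sketch below applies the classical measurement error model, in which the observed score equals the true score plus independent noise. This model is an assumption made for illustration and is not taken from the review; the function names are hypothetical. Under that assumption, an effect expressed in standard deviations of the observed score shrinks by the square root of the measure's reliability, and the sample needed to detect it grows accordingly. Systematic error is not covered by this sketch.

```python
import math


def observed_effect_size(true_effect_sd: float, reliability: float) -> float:
    """Standardized effect on the noisy measure under classical measurement error.

    true_effect_sd: effect in standard deviations of the true (error-free) skill.
    reliability:    share of observed-score variance due to the true skill,
                    a number in (0, 1].
    """
    if not 0 < reliability <= 1:
        raise ValueError("reliability must be in (0, 1]")
    # Noise inflates the observed standard deviation by 1 / sqrt(reliability),
    # so the standardized effect shrinks by sqrt(reliability).
    return true_effect_sd * math.sqrt(reliability)


def sample_size_inflation(reliability: float) -> float:
    """Approximate factor by which the sample must grow to keep power constant.

    Required sample size scales with 1 / (effect size squared), so the
    factor is 1 / reliability.
    """
    if not 0 < reliability <= 1:
        raise ValueError("reliability must be in (0, 1]")
    return 1.0 / reliability


if __name__ == "__main__":
    # Illustrative numbers only, not drawn from any study in the review.
    print(observed_effect_size(0.30, 0.64))
    print(sample_size_inflation(0.64))
```

With these illustrative inputs, a true effect of 0.30 standard deviations measured with reliability 0.64 appears as roughly 0.24 standard deviations, which is one reason to assess reliability at the design stage rather than after data collection.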
The review was based on a search of the AEA RCT registry, filtering for projects that mentioned “skills” in their registry outcomes. As the language in studies focusing on preschool ages can differ, a search on “preschool” was added to complete the set of studies. A project was included if it met all of the following criteria: it listed holistic skills (as defined above) as a primary or secondary outcome variable; it related to education (although the interventions are not necessarily delivered in school settings); it overlapped with our age range (3-18 years old); and it measured child outcomes (as opposed to caregiver or teacher outcomes). This yielded a total of 237 studies, on which the subsequent analysis focuses.
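The inclusion rules described above can be sketched as a simple filter. This is a minimal illustration, not the procedure the authors actually ran: the `RegistryEntry` record and all of its field names are hypothetical, and the flags for holistic outcomes and education relevance stand in for judgments that a reviewer would make by reading each registry entry.

```python
from dataclasses import dataclass

# Search terms and age range as stated in the review.
SEARCH_TERMS = ("skills", "preschool")
AGE_RANGE = (3, 18)


@dataclass
class RegistryEntry:
    """Hypothetical record for one registered trial (field names are assumptions)."""
    outcomes_text: str             # free-text registry outcomes
    has_holistic_outcome: bool     # reviewer judgment: primary or secondary outcome
    education_related: bool        # reviewer judgment
    min_age: int                   # youngest participant age, in years
    max_age: int                   # oldest participant age, in years
    measures_child_outcomes: bool  # child rather than caregiver or teacher outcomes


def matches_search(entry: RegistryEntry) -> bool:
    """True if the registry outcomes mention any of the search terms."""
    text = entry.outcomes_text.lower()
    return any(term in text for term in SEARCH_TERMS)


def overlaps_age_range(entry: RegistryEntry) -> bool:
    """True if the study's age span overlaps the 3-18 range at all."""
    low, high = AGE_RANGE
    return entry.min_age <= high and entry.max_age >= low


def is_included(entry: RegistryEntry) -> bool:
    """Apply every inclusion criterion; a study must satisfy all of them."""
    return (
        matches_search(entry)
        and entry.has_holistic_outcome
        and entry.education_related
        and overlaps_age_range(entry)
        and entry.measures_child_outcomes
    )
```

Note that the age criterion is an overlap test, so a study of 16-to-20-year-olds would qualify while a study of adults aged 19 and over would not.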
The paper first provides a set of guiding questions for researchers to consider at the RCT design stage regarding skill measures, and their validity and reliability. The next section provides background and examples on the types of measures observed in the literature, distinguishing between self-reported measures, those reported by others, and measures based on observation/direct assessments (including games and lab-in-the-field measures). We then zoom in on the validation approaches used for those measurement tools, pointing to examples of good practices. We conclude with broader research and policy considerations.
Appendices
The guide also has three appendices. Appendix 1 (within the main document) presents more detailed findings, including tables, descriptive statistics, and additional analysis. It is organized into categories that include:
- Broad characterization of studies covered by the review
- Skills measured
- Measurement tools
- Validation techniques employed
- Referencing validation and precedent papers
- Results and impact directions
Appendix 2 lists the tools identified in two or more randomized controlled trials (RCTs) in the 2023 literature review by the Learning for All Initiative of the Abdul Latif Jameel Poverty Action Lab (J-PAL). The literature review is based on a search of the AEA RCT registry, filtering for projects that mentioned “skills” or “preschool” in their registry outcomes. The list covers projects that included holistic skills (defined on p. 2 of the review) as a primary or secondary outcome variable, that related to education (although not limited to school settings), that overlapped with the age range (3-18 years old), and that measured child outcomes. Inclusion of a tool in this list indicates only that it is used repeatedly in RCTs; it does not constitute an endorsement or validation of the tool's efficacy, accuracy, or appropriateness. Readers and users of this appendix should exercise discretion and critically evaluate each tool's applicability to their specific needs. This appendix should not be viewed as a guideline or recommendation on which tools to employ.
Appendix 3 lists all the papers covered in the 2023 literature review.