Checklist for launching a randomized evaluation in the United States

Author: John Tebes

Contributor: Alex Marlowe

Summary

J-PAL North America’s Evaluation Toolkit is intended for researchers, research managers, research assistants, and students trained in economic theory and research design who are preparing to launch a prospective randomized evaluation. Research organizations have published a number of useful, practical guides for researchers and their teams on this topic. The Toolkit is intended to synthesize these resources and add perspectives on implementing randomized evaluations from our network. While most of the content in this toolkit is relevant across countries and contexts, certain elements are most applicable to research conducted in the United States or research that utilizes administrative data.

Introduction

The Evaluation Toolkit includes resources ranging in focus from how to assess feasibility and interest with an implementing partner to how to structure files and code in ongoing evaluations. As such, the resources vary in structure and include a combination of process guides, technical information, checklists, and sample code. The Toolkit focuses on the operational aspects of implementing an evaluation rather than on academic considerations of hypothesis testing or analysis, and it complements J-PAL’s 102x online course on Designing and Running Randomized Evaluations.¹ For a primer on more academic and theoretical considerations associated with implementing a randomized evaluation, please see resources such as J-PAL’s online MicroMaster’s course,² J-PAL’s course in Evaluating Social Programs,³ and Alan Gerber and Donald Green’s Field Experiments: Design, Analysis, and Interpretation.⁴

The following is an overview of the content in the evaluation toolkit, including a checklist for researchers and their teams preparing to launch a randomized evaluation. Resources are grouped by topic but are not intended to suggest a chronological order to the “steps” outlined – research teams may find that many tasks can be taken on at the outset of a study and that progress on each of these steps can inform relative prioritization throughout the life cycle of the study.

Partnerships and Assessing Feasibility

What do we need to consider when exploring a new randomized evaluation, and how can we ensure the elements are in place for a strong research partnership? How can researchers determine whether a randomized evaluation is right for a particular program, and what questions should we ask of implementing partners to determine the likelihood of a successful randomized evaluation?

Assess the intervention and the circumstances. Once a partner has expressed interest in collaborating with researchers on a randomized evaluation, focus on establishing a strong working relationship and assessing whether a randomized evaluation is feasible for their context.⁵ This should involve a thorough exploration of the intervention, organizational context related to the evaluation, goals of conducting a randomized evaluation, and an initial assessment of the feasibility of random assignment in the program context. Further, researchers should consider potential consequences of conducting the study beyond the implementing organization, including for participants and spillovers into the community.
Formalize the research partnership. Set expectations for the outcome and process of conducting the study, including a scope and timeline at the outset of the project, along with roles and responsibilities for implementing partners and the research team. Set norms between the research team and implementing partner related to ongoing communications and expectations around publishing study results. Explore whether a legal agreement will be necessary to proceed and initiate the process within participating institutions.⁶
Conduct power calculations.⁷ Consider the inputs necessary to conduct power calculations, including a reasonable minimum detectable effect size (MDE), sample size, key outcomes of interest, take-up and compliance rates, approach to random assignment, variance, covariate controls, and intra-cluster correlation. Prepare to revisit these initial calculations as more information about the intervention becomes available and as plans for the research design develop.
Consider deploying a member of the research team to the field. After determining that a randomized evaluation is feasible in the program context, researchers might consider sending a member of the research team into the field for an extended observation period. This can help researchers understand the intervention in more detail and support the creation of a study design best suited to the context of the intervention.⁸
Make sure the intervention has been piloted.⁹ If an implementing partner is considering evaluating a new program, ask about plans for piloting. Pilots can help determine whether the program as designed will be feasible to implement. Intervention pilots are an opportunity to test and iterate key components of an intervention, such as adherence to program protocols, recruitment procedures, or approaches to staff training. A pilot can help researchers and partners determine whether the program is being delivered as intended and assess potential provider capacity to adhere to the study design.
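
As an illustration of how the power calculation inputs listed above fit together, the following Python sketch computes an approximate minimum detectable effect using the standard normal-approximation formula. It is not J-PAL's template code: the function name, parameter names, and example values are hypothetical, and the sketch assumes two study arms, equal cluster sizes, and no covariate adjustment.

```python
from math import sqrt
from statistics import NormalDist


def minimum_detectable_effect(
    n_total,
    sd=1.0,
    share_treated=0.5,
    alpha=0.05,
    power=0.80,
    cluster_size=1,
    icc=0.0,
    take_up_diff=1.0,
):
    """Approximate MDE for a two-arm trial (normal approximation).

    n_total        total number of individuals in the sample
    sd             standard deviation of the outcome (1.0 gives the MDE in SD units)
    share_treated  fraction of the sample assigned to treatment
    cluster_size   individuals per cluster (1 = individual-level randomization)
    icc            intra-cluster correlation
    take_up_diff   difference in take-up between treatment and control arms
    """
    z = NormalDist()
    # Critical value for a two-sided test plus the value for the desired power.
    multiplier = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    # Clustering inflates the variance by the design effect.
    design_effect = 1 + (cluster_size - 1) * icc
    variance = sd**2 * design_effect / (share_treated * (1 - share_treated) * n_total)
    # Imperfect take-up dilutes the effect, so the MDE scales up accordingly.
    return multiplier * sqrt(variance) / take_up_diff


# Hypothetical scenarios: individual-level vs. clustered assignment, then partial take-up.
print(minimum_detectable_effect(n_total=1000))
print(minimum_detectable_effect(n_total=1000, cluster_size=20, icc=0.05))
print(minimum_detectable_effect(n_total=1000, take_up_diff=0.5))
```

Comparing scenarios like these can show how sensitive the design is to assumptions about clustering and take-up before more precise inputs are available.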

Implementation Strategy

Once we have explored the potential intervention and determined that it makes sense to continue with the design process, how can we ensure that our study design is feasible and makes sense in the context of the intervention? How can we set processes in place to make sure the evaluation is running as planned?

Design and iterate implementation strategy. Consistent, well-organized engagement with program staff will help ensure a feasible research implementation strategy.¹⁰ Work with program staff to create an approach to identifying stakeholders in the community, getting buy-in and feedback on study design decisions, and iterating study design decisions with implementing partners.¹¹
Define the intake and consent process. Far from a simple administrative step, study intake and consent processes have implications for ethics, a study’s statistical validity and power, and acceptability with partners. Researchers must determine how to incorporate consent into an existing program, whether to seek consent before or after random assignment, and how to train staff involved in recruitment.
Obtain ethical approval from an Institutional Review Board (IRB) or Privacy Board. Research involving human subjects, or individual-level data about living humans, is likely¹² to be subject to review by an Institutional Review Board (IRB), even if no direct interaction between subjects and researchers is involved. The IRB’s review may determine that a research project is 1) not human subjects research and thus not subject to any further review, 2) exempt from ongoing review, 3) eligible for expedited review by an IRB administrator, or 4) subject to review by a full IRB. Researchers should check the current procedures for their institution’s review board to determine what must be reviewed by the IRB and what discretion researchers are allowed by that IRB.
Register the trial and consider creating a pre-analysis plan. Registration on clinicaltrials.gov is required by the FDA for many medical, clinical, or health-related trials. Medical journals may require registration on clinicaltrials.gov as a condition of publication even if the federal regulation does not apply. Registration on other sites such as the American Economic Association’s RCT Registry (which is supported by J-PAL) or the Open Science Framework (OSF) may be required by project funders, including J-PAL. Additionally, economics journals may require pre-registration for submissions.
Consider the particular challenges associated with evaluating technology-based interventions. Technology-related evaluations span a range of sectors and offer a means to standardize an intervention across treatment sites. However, there are particular challenges and considerations associated with implementing tech interventions, such as integration with evaluation design, ensuring fidelity to treatment assignment, and ensuring adequate take-up rates.
Operationalize random assignment. Select a method for random assignment and develop protocols for random assignment. Methods include randomizing from an existing list of participants with or without identifiers, randomizing in blocks, and randomizing each individual "on-the-spot." Choice of strategy will depend on the evaluation design, recruitment method, data security considerations, and acceptability to the partner.
Establish a process for monitoring. Monitoring intervention and study implementation activities is crucial for researchers to understand how the program and evaluation design are working in practice and to make adjustments as necessary. Establishing a process for monitoring involves selecting key indicators that will support interpretation of results and allow researchers and partners to track progress, such as recruitment and take-up rates, adherence to research protocols, and other key implementation details.
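
One of the random assignment methods mentioned above, randomizing in blocks from an existing list, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a prescribed protocol: the function name, the field names ("id", "site"), and the seed value are hypothetical, and the sketch assumes two arms with an equal split within each block.

```python
import random
from collections import defaultdict


def assign_within_blocks(participants, block_key, seed):
    """Randomly assign half of each block to treatment.

    participants  list of dicts, each with a unique "id"
    block_key     name of the field that defines the randomization block
    seed          integer seed so the assignment can be reproduced and audited
    """
    rng = random.Random(seed)
    blocks = defaultdict(list)
    # Sort by id first so the result does not depend on the order of the input list.
    for person in sorted(participants, key=lambda p: p["id"]):
        blocks[person[block_key]].append(person["id"])

    assignment = {}
    for block in sorted(blocks):
        ids = blocks[block]
        rng.shuffle(ids)
        n_treat = len(ids) // 2
        # With an odd block size, a coin flip decides which arm gets the extra unit.
        if len(ids) % 2 == 1 and rng.random() < 0.5:
            n_treat += 1
        for position, pid in enumerate(ids):
            assignment[pid] = "treatment" if position < n_treat else "control"
    return assignment


# Hypothetical roster of 20 participants across two sites.
roster = [{"id": i, "site": "A" if i % 3 else "B"} for i in range(1, 21)]
arms = assign_within_blocks(roster, "site", seed=20210601)
print(arms)
```

Fixing and recording the seed means the same list always yields the same assignment, which makes the randomization easier to document and verify later.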

Data Collection and Access

Determining what kinds of data will be necessary for a randomized evaluation will be a key consideration for project feasibility and study design. What data do we need to answer our research questions, how will we go about collecting and accessing data, and how will we use data when we have them?

Create processes for code and file management. Randomized evaluations often span several “generations” of research assistants. This turnover demands documentation and data management. Early on, researchers should create clear organizational structures to allow consistent file management of data, documentation, and other files. Throughout the study, researchers should document code versions, randomization, task management, and other key steps.
Determine data collection method(s). Determine which types of data are necessary (e.g., survey data, administrative data, or secondary data sources such as satellite data or others’ survey data), and when and how to collect these data. Using administrative data may lower data collection costs and reduce the risk of certain biases relative to primary data; however, receiving approvals can be time consuming. Researchers may be able to measure a broader range of information through primary data and have more oversight over the collection process; however, primary data collection can be expensive and subject to low response rates.
Pursue data use agreements.¹³ A Data Use Agreement (DUA) documents the terms under which a data provider shares data with a researcher’s home institution for use by the researcher. This agreement, which typically must be approved by legal counsel at the researcher’s home institution, contains a number of provisions that can significantly impact research protocols. For example, data providers may have specific requirements for informed consent or individual authorization (a HIPAA requirement similar to informed consent).
Create plans for data security. A plan for data security may help to minimize data security threats by outlining plans for data storage and access, data transmission and sharing, and erasing data when appropriate. Be prepared to describe data security procedures in IRB protocols.
Establish data publication permissions and protocol. J-PAL and a number of grant-making institutions, including the National Science Foundation and the National Institutes of Health, have adopted data-sharing policies for research they fund. Many top academic journals require data and replication code as a condition of publication. Developing and storing data sets and code with publication in mind will decrease the burden of preparing data later in the research process.
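
As one possible way to document which version of each data file was received and used, a research team might record file checksums alongside its other documentation. The Python sketch below is an illustration only; the function names and the idea of a checksum "manifest" are assumptions introduced here, not practices specified in the Toolkit.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the SHA-256 checksum of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_dir):
    """Map each file's path (relative to data_dir) to its checksum."""
    root = Path(data_dir)
    return {
        path.relative_to(root).as_posix(): sha256_of_file(path)
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def changed_files(old_manifest, new_manifest):
    """Return paths that were added, removed, or modified between two manifests."""
    all_paths = set(old_manifest) | set(new_manifest)
    return sorted(p for p in all_paths if old_manifest.get(p) != new_manifest.get(p))
```

Saving a manifest each time data are received or transferred gives later research assistants a simple way to confirm that the files they are working with match the ones described in the documentation.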

Data Analysis and Sharing Results

Accessing and analyzing data may take significant time before results are ready to share or publish. How can we plan to use data once we have them and proactively manage expectations with stakeholders? After analysis is complete, what actions do we need to take to publish data and prepare to submit a paper to an academic journal?

Plan for data analysis. Consider using baseline data or synthetic data to write cleaning and analysis code while waiting for final data. Planning for data analysis will allow the research team to produce results quickly when outcomes are ready and potentially allow for interim results and monitoring if data are arriving in batches. Additionally, planning allows researchers to think through exactly what needs to be done with program data, which can potentially influence thinking about requested data and the evaluation design. This plan may be included in a registered pre-analysis plan.
Develop a communications plan for stakeholders. Running randomized evaluations requires collaboration and communication between multiple stakeholders – academic researchers, research staff, partners who implement the program or policy, holders of administrative data, policymakers, and community members. These stakeholders have diverse viewpoints and specialized vocabularies, which can make communication and collaboration difficult. Strong communication leading to partner engagement can increase the chances of an evaluation running smoothly and having potential for policy impact.
Prepare for journal submission. Before publishing a working paper or submitting a manuscript to a journal, it may be helpful to map out the requirements in the study’s contracts and a plan for engaging the study’s stakeholders. Maintaining compliance with grant agreements and data use agreements may require sending a draft or data for review prior to publication. Even in the absence of a legal requirement, sharing drafts or notifying partners in advance of publication can help maintain strong relationships.
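
To illustrate the suggestion above of writing analysis code against synthetic data while waiting for final data, the following Python sketch generates fake records and runs a simple difference-in-means estimate on them. The column names ("study_id", "treated", "outcome"), the function names, and the simulated effect size are all hypothetical; a real project would mirror the layout of the data it expects to receive and use the analysis specified in its own plan.

```python
import random
from statistics import mean


def make_synthetic_data(n, true_effect, seed):
    """Generate fake records with the columns the final data are expected to have."""
    rng = random.Random(seed)
    rows = []
    for i in range(n):
        treated = i % 2  # placeholder assignment; real data would carry the actual arm
        outcome = rng.gauss(0, 1) + true_effect * treated
        rows.append({"study_id": i, "treated": treated, "outcome": outcome})
    return rows


def difference_in_means(rows):
    """Estimate the treatment effect as mean(treated) minus mean(control)."""
    treated = [r["outcome"] for r in rows if r["treated"] == 1]
    control = [r["outcome"] for r in rows if r["treated"] == 0]
    return mean(treated) - mean(control)


# Because the true effect is known here, the estimate doubles as a check on the code.
synthetic = make_synthetic_data(n=20000, true_effect=0.5, seed=7)
print(difference_in_means(synthetic))
```

Since the synthetic data are built with a known effect, an estimate far from that value signals a bug in the cleaning or analysis code rather than a finding.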

Last updated June 2021.

These resources are a collaborative effort. If you notice a bug or have a suggestion for additional content, please fill out this form.

Acknowledgments

Thanks to Mary-Alice Doyle, Noreen Giga, Kim Gannon, Sarah Kopper, and Clare Sachsse for your thoughtful discussion and contributions. Alex Marlowe copy-edited this document. Thanks to John Tebes for creating the initial draft of this resource. This work was made possible by support from the Alfred P. Sloan Foundation and Arnold Ventures.

1. Part of J-PAL’s MicroMaster’s credential, Designing and Running Randomized Evaluations provides step-by-step training on how to build a well-designed, policy-relevant study, including why and when to conduct a randomized evaluation.

2. J-PAL’s MicroMaster’s credential in Data, Economics, and Development Policy equips learners with the practical skills and theoretical knowledge to tackle some of the most pressing challenges facing developing countries and the world’s poor.

3. J-PAL’s five-day, in-person training course on evaluating social programs is taught primarily by J-PAL affiliated professors. The course gives participants a thorough understanding of why and when researchers and policymakers might choose to conduct randomized evaluations and how randomized evaluations are designed and conducted in real-world settings.

4. Alan S. Gerber and Donald P. Green’s Field Experiments: Design, Analysis, and Interpretation covers many of the theoretical considerations associated with implementing a randomized evaluation.

5. For more information about whether a randomized evaluation is right for a given program or partner, see Section 1.2 of Rachel Glennerster’s The Practicalities of Running Randomized Evaluations: Partnerships, Measurement, Ethics, and Transparency.

6. J-PAL’s Using administrative data for randomized evaluations discusses topics related to processes for accessing administrative data, the ethical and legal framework surrounding the use of data for randomized evaluations, and common challenges in using administrative data. It also expands on topics related to securing Data Use Agreements.

7. J-PAL offers template code and training exercises for parametric and non-parametric power calculations as a downloadable zip file.

8. A blog post from the Stanford Graduate School of Business shares insights from researchers about what they have learned from prior field experiments, including how deploying a member of the research team could have helped identify barriers to implementation.

9. For more guidance on the importance of and different potential approaches to piloting, see the following resources. Page 20 of J-PAL’s guide Implementing Randomized Evaluations in Government discusses the importance of ensuring that interventions have been piloted. Page 12 of Glennerster and Takavarasha’s Running Randomized Evaluations discusses the purpose and outcomes of piloting an intervention. Page 63 of IPA’s Guide to Evaluating Financial Products and Services in the United States covers potential uses for piloting, including testing a new intervention, randomization protocol, and other logistics.

10. Rachel Glennerster’s blog provides insight on developing good relationships with implementing partners. This post discusses how to be a better research partner.

11. For more examples of questions to consider when making study design decisions with your partner, see page 99 of IPA’s Guide on Evaluating Financial Products and Services, which includes a partnership development questionnaire and other helpful guidance for implementing randomized evaluations.

12. There are very limited exceptions to this rule, and the exceptions may vary by institution. Most universities apply the Federal Policy for the Protection of Human Subjects (i.e., the “Common Rule”) to all “human subjects” research (which includes most research involving human subjects or individual-level data about living humans) regardless of any federal requirements. Check with your IRB for details.

13. J-PAL’s Administrative steps for launching a randomized evaluation in the United States, steps three and four, includes further information on this topic. For additional guidance, see Using administrative data for randomized evaluations and Formalizing research partnerships.

Additional Resources

The Laura and John Arnold Foundation. “Key Items to Get Right When Conducting Randomized Controlled Trials of Social Programs.” 2016. https://craftmediabucket.s3.amazonaws.com/uploads/PDFs/Key-Items-to-Get-Right-When-Conducting-Randomized-Controlled-Trials-of-Social-Programs.pdf.

The checklist shares key considerations that are critical to the success of conducting a randomized evaluation. While it is not a comprehensive guide, it contains a number of important steps that may affect the usefulness of findings if not addressed.

Brown, Julia, Lucia Goin, Nora Gregory, Katherine Hoffman, and Kim Smith. 2015. “Evaluating Financial Products and Services in the US: A Toolkit for Running Randomized Controlled Trials.” IPA. https://www.poverty-action.org/publication/evaluating-financial-products-and-services-us-toolkit-running-randomized-controlled.

This toolkit compiles Innovations for Poverty Action's (IPA's) best practices for running randomized evaluations. The guide focuses on using randomized evaluations to develop and test new financial products and product features for consumers in the United States.

Duflo, Esther, Rachel Glennerster, and Michael Kremer. 2006. “Using Randomization in Development Economics Research: A Toolkit.” NBER Technical Working Paper 333. Cambridge, MA: National Bureau of Economic Research. http://www.nber.org/papers/t0333.

This is a practical guide for researchers and their teams to use when planning a randomized evaluation. Broad topics include the strengths of using randomization as an experimental approach, how to introduce randomization into program procedures, experimental design considerations, data analysis, and drawing conclusions from experimental results.

Glennerster, Rachel and Kudzai Takavarasha. 2013. Running Randomized Evaluations: A Practical Guide. Princeton: Princeton University Press. http://runningres.com.

This book draws from the experience of researchers in the J-PAL network to share practical considerations for implementing a randomized evaluation of a social program. The companion website, runningres.com, includes additional resources and sample exercises.

Gerber, Alan S., and Donald P. Green. 2012. Field Experiments: Design, Analysis, and Interpretation. W.W. Norton.

This book covers many of the theoretical considerations associated with implementing a randomized evaluation.

Gueron, Judith M. 2008. “The Politics of Random Assignment: Implementing Studies and Impacting Policy.” Journal of Children’s Services Vol 3 (1): 14-26.

Drawing from years of implementing randomized experiments at MDRC, this paper shares common challenges, lessons learned, and strategies for overcoming those challenges, and it discusses how randomized evaluations can influence policy.

Innovation Growth Lab. “Experimentation Toolkit.” http://toolkit.innovationgrowthlab.org/.

The Innovation Growth Lab’s Experimentation Toolkit contains resources on what randomized evaluations are, as well as why, when, and how they are used. The interactive toolkit includes high-level overviews of important considerations in randomized evaluation development. The section on “Impact trials” highlights tasks associated with planning and design; implementation; and analysis and reporting on the results of a randomized evaluation.

Karlan, Dean and Jacob Appel. 2016. Failing in the Field: What We Can Learn When Field Research Goes Wrong. Princeton: Princeton University Press.

This book draws on examples from field experiments in developing countries to share common stumbling blocks, case studies, and best practices for implementing randomized evaluations.

References

Brown, Julia, Lucia Goin, Nora Gregory, Katherine Hoffman, and Kim Smith. 2015. “Evaluating Financial Products and Services in the US.” Innovations for Poverty Action. https://www.poverty-action.org/publication/evaluating-financial-products-and-services-us-toolkit-running-randomized-controlled.

Dadisman, Kimberly and Jessica Troe. 2020. “Building Capacity, Overcoming Obstacles, and Creating Evidence: An Evaluation Guide for State and Local Policymakers.” J-PAL North America. https://www.povertyactionlab.org/publication/building-capacity-overcoming-obstacles-and-creating-evidence-evaluation-guide-state-and.

Feeney, Laura, Jason Bauman, Julia Chabrier, Geeti Mehra, and Michelle Woodford. 2015. “Using administrative data for randomized evaluations.” J-PAL North America. https://www.povertyactionlab.org/resource/using-administrative-data-randomized-evaluations.

Glennerster, Rachel. 2017. “Chapter 5 - The Practicalities of Running Randomized Evaluations: Partnerships, Measurement, Ethics, and Transparency.” In Handbook of Economic Field Experiments, edited by Abhijit Vinayak Banerjee and Esther Duflo, 175–243. North-Holland. https://doi.org/10.1016/bs.hefe.2016.10.002.

Glennerster, Rachel, and Kudzai Takavarasha. 2013. Running Randomized Evaluations: A Practical Guide. Princeton: Princeton University Press.

J-PAL North America. 2017. “Administrative Steps for Launching a Randomized Evaluation in the United States.”
