AI Evidence Playbook
The AI Evidence Playbook is a practical reference for policymakers, practitioners, and donors investing in or developing AI-enabled programs.
The playbook is meant to support decision-making by explaining how to identify where AI can best help, how to design AI-enabled programs so they work in their intended settings, and how to evaluate those programs to build evidence that helps prevent wasted resources and harm.
Rigorous evidence from randomized evaluations is especially important in an AI world. AI systems often generate rich engagement, monitoring, and technical performance data, but these signals alone cannot show whether an intervention improves outcomes in the real world, such as learning, health, income, or well-being. Randomized evaluations can quantify causal impact, identify who benefits and who is left out, and surface unintended consequences, particularly when AI interacts with human behavior, incentives, and constraints that are hard to predict in advance.
The playbook maps six concrete pathways through which AI has strong potential to accelerate development outcomes:
Improving Program Targeting and Needs Prediction
Increasing Access to Personalized, Timely Support
Maximizing the Effectiveness of Frontline Service Providers
Improving Organizational and Programmatic Efficiency
Reducing Bias and Ensuring Fairness
Boosting Government Resource Mobilization
Each pathway is grounded in a growing body of evidence from randomized evaluations, where available, and highlights what must be true for AI tools to translate into impact. Throughout, we emphasize that successful AI deployment depends not only on model performance, but also on delivery design, institutional readiness, trust, user experience, and safeguards.
Despite the potential AI unlocks, promising tools often stall in practice because of barriers to access and adoption. This playbook draws on lessons from earlier waves of “next big” technologies, from laptops in classrooms to improved cookstoves, to show how innovations can fail when context, delivery, and incentives are overlooked.
Practical constraints that are especially salient for AI include device and connectivity gaps, low trust, limited digital skills, and weak state capacity. Left unaddressed, these constraints will limit who benefits from AI-enabled programs. Policymakers have multiple opportunities across the life cycle of an AI-enabled program to avoid repeating past mistakes.
Three decisions meaningfully shape risks and results:
What is the theory of change for this program?
Can the program realistically be implemented?
How can the program scale successfully?

Evidence from randomized evaluations can support each of these decisions, for example by comparing AI to non-AI alternatives, testing designs across groups, informing feasible oversight and transparency mechanisms, and guiding scale-up strategies.
AI will not transform the social sector on its own. How governments and their partners design, test, and update AI programs is what determines whether people actually see better outcomes.
The evidence in this playbook shows that the most promising results come when AI is designed with careful attention to a theory of change, the local context, users, and real-world constraints.
While emerging evidence points to promising opportunities for AI to improve lives, completed studies are still scarce, and additional research is needed to confidently assess the impact of any individual AI use case.
If AI and evaluation are built together into core government systems, decision-makers can more confidently choose what to scale, what to adapt, and what to stop, ultimately enabling them to direct scarce public resources toward AI applications that demonstrably improve people’s lives.
This is a living document that we will update periodically so that it reflects the current state of evidence and practice, including new evaluations, emerging risks, and evolving implementation lessons.