Rigorous evaluations for humanitarian interventions: In conversation with the World Food Programme

Posted on:
Kristen McCollum
Women working in a warehouse
Photo: WFP | Hussam Al Salen

This piece is also posted on the World Food Programme’s Evaluation blog space.

How can we make good use of the limited resources available for humanitarian interventions, when needs are on the rise globally? As part of J-PAL’s developing workstream on the effectiveness of humanitarian interventions, J-PAL Europe organised a virtual training on randomised evaluations for nearly 50 representatives of humanitarian organisations in December 2021 with the support of the UK Foreign, Commonwealth & Development Office (FCDO). 

The programme featured a presentation from Kristen McCollum, an impact evaluation analyst with the World Food Programme Office of Evaluation (OEV), who shared insights from the organisation’s experience implementing randomised evaluations. J-PAL invited Kristen’s reflections on the potential of randomised evaluations for improving the generation and use of evidence in the sector and the overall effectiveness of humanitarian programmes.

WFP has recently increased its investment in impact evaluations. What has motivated this push?

While impact evaluation is not new to WFP, our past experience with impact evaluations generated several lessons that prompted the launch of a new impact evaluation strategy in 2019. Previously, impact evaluations were commissioned by OEV in short-run series and by individual country offices, producing scattered, context-specific findings.

While this is valuable for in-country decision-making, it presents a challenge: how can we coordinate impact evaluations to produce more generalisable findings that help us make decisions at the organisational level and contribute to global evidence? WFP’s strategy addresses this by grouping our impact evaluations into windows: thematic portfolios of randomised evaluations that ask the same questions across several contexts. Our first three windows, developed in partnership with the World Bank’s Development Impact Evaluation (DIME) department, cover the topics of cash-based transfers and gender, climate and resilience, and school-based programming.

Randomised impact evaluations are increasingly seen as a feasible evaluation option in humanitarian settings. In what areas does WFP see the potential for experimental research in informing the design of its programmes? 

Optimising humanitarian and emergency responses requires specific evidence. WFP and other actors in the sector work in challenging contexts with unique questions, data constraints, and often shorter programme timelines.

Making impact evaluations operationally relevant requires some adaptation, and, with support from USAID’s Bureau for Humanitarian Assistance, our strategy aims to start filling this gap. With USAID’s support, we identified four initial focus areas where WFP could generate important evidence for the sector: targeting, forecast-based financing, cash-based transfers, and peace-building and social cohesion.

Improving household targeting is a good example: although targeting is an important part of every programme, we do not have much causal evidence on which approaches work best. We are preparing to test different targeting approaches to understand differences in outcomes such as accuracy, speed, cost-effectiveness, and community satisfaction. By evaluating targeting approaches across several contexts, we are more likely to have solid evidence to inform decisions when an emergency hits.

Conducting randomised evaluations often brings up organisational questions around how teams work together to implement programmes and generate learnings. As an organisation, how have you had to work differently across teams to support this, and what advice do you have for implementers looking to engage in this type of work?

Our impact evaluation team is lucky that WFP already has a good reputation for evaluation more broadly. However, our monitoring and evaluation colleagues are generally more familiar with summative evaluations, where the focus is largely on data collection and analysis when the programme is nearing completion. Randomised evaluations require additional teamwork, mostly because they are built into programme implementation from the very beginning.

In addition to our monitoring teams, we work closely with programme teams (both inside WFP and in any partner implementing organisations), IT teams, communication experts, gender and protection officers, procurement colleagues, and external partners (for example, the World Bank).

For this reason, it is crucial to have colleagues dedicated full-time to the coordination work of impact evaluation.

Our (small but growing) headquarters team works to support several evaluations, shifting priorities depending on the stage of each evaluation. Particularly with humanitarian programming, where implementation necessarily starts quickly and with little warning, we are finding it necessary to have this flexible capacity. As another top priority, we have learned that it is important to build and fund in-country capacity. Dedicated impact evaluation field coordinators in the country reduce the burden on programme teams who are already doing important emergency work, and support evaluation data collection and quality.

How do you approach the requirement for a control group when designing randomised evaluations for humanitarian settings, where the imperative of impartiality requires actors to allocate assistance based on need alone and withholding assistance may cause harm?

Our experience has highlighted two scenarios where randomised evaluations in this sector are particularly feasible.

First, unfortunately, it is rare that WFP is able to reach everyone in need at the same time. With resource and logistics constraints in mind, a phase-in design (or staggered design, in which a control group receives the intervention slightly later than the intervention group) can happen organically as part of how WFP usually operates. With limited resources, randomisation can be one ethical way of determining who receives assistance first.
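As a toy illustration of that phase-in logic (this is a hypothetical sketch, not WFP's actual tooling; the household IDs and function name are invented for the example):

```python
import random

def assign_phases(households, n_phases=2, seed=42):
    """Randomly assign eligible households to intervention phases.

    With limited resources, a random draw decides who receives
    assistance first (phase 1) and who receives it slightly later
    (phase 2), so later phases serve as a temporary comparison group.
    """
    rng = random.Random(seed)  # fixed seed so the assignment is reproducible
    shuffled = list(households)
    rng.shuffle(shuffled)
    # Alternate through phases so each phase gets a roughly equal share
    return {h: (i % n_phases) + 1 for i, h in enumerate(shuffled)}

# Hypothetical household IDs
assignments = assign_phases([f"hh_{i}" for i in range(10)])
```

Because the draw is random, households in later phases are, on average, comparable to those served first, which is what makes the later group a valid comparison.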

Second, the questions we often have in humanitarian settings are not “does this work?” but “which way works best?”. We don’t need to test whether providing food for people helps curb extreme hunger, for example. What we do not always know is how we can do that best.

To answer this question with a rigorous causal analysis, we would still use a randomised evaluation design, but we would compare several different approaches – for example, comparing various cash-based modalities, without the need for a control group. This is sometimes called A/B testing in the private sector. We call it ‘lean IE’ and we think it is a promising way forward to start answering high-priority questions in the humanitarian sector.
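A minimal sketch of that multi-arm comparison, with hypothetical participant IDs, arm names, and outcome values (none of these come from WFP data):

```python
import random
from statistics import mean

def ab_assign(participants, arms, seed=0):
    """Randomly assign each participant to one of several programme arms
    (e.g. different cash-based modalities); every arm is an active
    treatment, so no one is left without assistance."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    return {p: rng.choice(arms) for p in participants}

def compare_arms(assignment, outcomes):
    """Average an outcome measure (e.g. a food security score) by arm."""
    by_arm = {}
    for p, arm in assignment.items():
        by_arm.setdefault(arm, []).append(outcomes[p])
    return {arm: mean(vals) for arm, vals in by_arm.items()}

# Hypothetical data: two cash modalities and a made-up outcome score
assignment = ab_assign(["p1", "p2", "p3", "p4"], ["cash", "voucher"])
outcomes = {"p1": 0.6, "p2": 0.8, "p3": 0.7, "p4": 0.9}
arm_means = compare_arms(assignment, outcomes)
```

Because assignment to arms is random, differences in the arm means can be attributed to the modality rather than to who happened to receive it.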

Implementers are increasingly asked to speak to the sustainability of their programmes. How can randomised evaluations contribute to answering this question?

We conceptualise the different randomised evaluation designs in our humanitarian impact evaluation workstream as sitting along a timeline of an emergency response.

At the beginning of the timeline sit the targeting questions I mentioned earlier. At the other end sit questions around transitions from humanitarian to development assistance, such as “How can we best ensure that WFP beneficiaries become less reliant on emergency assistance?” or “What interventions are most effective at supporting peaceful and socially cohesive communities?”

For WFP programmes that aim to build resilience, such as our food assistance for assets programmes, randomised evaluations can help us answer a host of questions about the timing of assistance and the layering of other activities in order to better support households in the face of shocks and stressors. For example, our Climate and Resilience Window, supported by the German Federal Ministry for Economic Cooperation and Development (BMZ), accomplishes this using high-frequency measurements to better understand the trajectory of households’ food security and coping strategies over time. Collecting both short- and long-term data in these evaluations could help us to understand how sustainable our programmes might be. Resources permitting, WFP is also seeking to eventually capture the longer-term effects of shorter-term interventions in this way. 

Learn more about WFP’s approach to impact evaluations.

Explore J-PAL’s randomised evaluation database.
