Why Randomize?

What is impact? In our chlorine example, impact is how much healthier people are because of the program than they would have been without it. More specifically, it is how much lower the incidence of diarrhea is than it would have been otherwise.

Getting this number right is more difficult than it sounds. It is possible to measure the incidence of diarrhea in a population that received the program. But “how they would have been otherwise” (termed the counterfactual) is impossible to measure directly; it can only be inferred.
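One compact way to write this down uses potential-outcomes notation, which is our addition and not part of the original text. For each person i, let Y_i(1) be their diarrhea incidence with the program, Y_i(0) their incidence without it, and T_i = 1 if they received the program. The average impact on those who received it is then the following (a negative value means lower incidence):

```latex
\text{Impact}
  = \underbrace{\mathbb{E}[Y_i(1) \mid T_i = 1]}_{\text{measurable in participants}}
  - \underbrace{\mathbb{E}[Y_i(0) \mid T_i = 1]}_{\text{counterfactual: never observed}}
```

The first term can be measured directly. The second describes the same people in a world where they did not get the program, so it has to be inferred.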

Constructing a Comparison Group

Impact evaluations usually estimate program effectiveness by comparing the outcomes of those who participated in the program (individuals, communities, schools, etc.) against those who did not. The key challenge in impact evaluation is finding a group of people who did not participate but closely resemble the participants, and in particular, resemble the participants as they would have been had they not received the program. Measuring outcomes in this comparison group is as close as we can get to measuring “how participants would have been otherwise.” Therefore, our estimate of impact is only as good as the comparison group is equivalent to the participants.
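Why equivalence matters can be made precise with a standard identity. The notation is our illustrative addition: Y_i(1) and Y_i(0) are person i's outcomes with and without the program, T_i = 1 for participants, and the observed outcome Y_i equals Y_i(1) for participants and Y_i(0) for everyone else. The naive comparison of participants with non-participants splits into two pieces:

```latex
\begin{aligned}
\mathbb{E}[Y_i \mid T_i = 1] - \mathbb{E}[Y_i \mid T_i = 0]
  &= \underbrace{\mathbb{E}[Y_i(1) \mid T_i = 1] - \mathbb{E}[Y_i(0) \mid T_i = 1]}_{\text{impact on participants}} \\
  &\quad + \underbrace{\mathbb{E}[Y_i(0) \mid T_i = 1] - \mathbb{E}[Y_i(0) \mid T_i = 0]}_{\text{selection bias}}
\end{aligned}
```

The selection-bias term vanishes only when the comparison group would have had the same average outcome as the participants if neither had received the program. When assignment is random, T_i is independent of Y_i(0), so this term is zero in expectation. Other methods must rely on assumptions to argue that it is zero.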

There are many methods of creating a comparison group, and some do a better job than others. All else equal, randomized evaluations do the best job. Because assignment is left to chance, they generate a comparison group that is statistically identical to the participants on average, and therefore produce the most accurate (unbiased) results. Stated more strongly: other methods often produce misleading results, which can lead policymakers to make exactly the opposite decision from the one the truth would have directed them to.
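The contrast can be illustrated with a small Python simulation. It is a sketch under entirely hypothetical assumptions: chlorine lowers each household's diarrhea risk by 5 percentage points, baseline risk is spread uniformly between 10% and 50%, and lower-risk households are much more likely to opt in when participation is voluntary. The function name `simulate` and all parameters are invented for illustration and do not model any real program.

```python
import random
import statistics

# Hypothetical true effect: chlorine lowers diarrhea risk by 5 percentage points.
TRUE_EFFECT = -0.05


def simulate(assignment: str, n: int = 100_000, seed: int = 0) -> float:
    """Return mean incidence among participants minus mean incidence in the comparison group.

    assignment="self":   households choose whether to take up the program.
    assignment="random": a coin flip decides, independently of household risk.
    All numbers are made up for illustration.
    """
    rng = random.Random(seed)
    treated, comparison = [], []
    for _ in range(n):
        # Risk of diarrhea without the program differs across households.
        baseline = rng.uniform(0.1, 0.5)
        if assignment == "self":
            # Lower-risk households opt in far more often (the source of selection bias).
            takes_up = rng.random() < (0.9 if baseline < 0.3 else 0.2)
        elif assignment == "random":
            # Take-up is unrelated to baseline risk.
            takes_up = rng.random() < 0.5
        else:
            raise ValueError(f"unknown assignment: {assignment!r}")
        risk = baseline + TRUE_EFFECT if takes_up else baseline
        got_sick = 1 if rng.random() < risk else 0
        (treated if takes_up else comparison).append(got_sick)
    return statistics.mean(treated) - statistics.mean(comparison)


if __name__ == "__main__":
    print(f"true effect:        {TRUE_EFFECT:+.3f}")
    print(f"self-selected est.: {simulate('self'):+.3f}")
    print(f"randomized est.:    {simulate('random'):+.3f}")
```

With these made-up parameters, a back-of-the-envelope calculation puts the self-selected estimate near -0.19, close to four times the true effect, because participants would have been healthier even without chlorine. The randomized estimate should fall within sampling error of -0.05.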

These other methods don’t always give us the wrong answer, but they rely on more assumptions. When the assumptions hold, the answer is unbiased. But it is always difficult, and often impossible, to verify that the assumptions are true. In fact, most debates about the validity of an evaluation are likely fueled by disagreements over whether these assumptions are reasonable.

Beyond avoiding debates over assumptions, randomized evaluations produce results that are easy to explain. A table comparing common methods of evaluation can be found here.