Programs and policies are designed to achieve a certain goal (or set of goals). For example, a chlorine distribution program may be implemented specifically to combat a high incidence of waterborne illness in a region. We may want to know whether this program is succeeding in its goal. This isn’t the same thing as asking, “Does chlorine kill bacteria?” or “Is the consumption of chlorine harmful?” Those questions can be answered in a real laboratory. For our program to achieve its goal of stopping illness, money must be allocated, tablets must be purchased, distribution mechanisms must be put in place, households must receive the tablets, households must use the tablets, and households must not consume untreated water. A program evaluation helps us determine whether all of these requirements are being met and whether our goal is actually being achieved as intended.
As a normal part of operations (for example, basic bookkeeping), certain information is produced, such as how many boxes of chlorine tablets have been shipped. This type of information can be used for process evaluation, but it cannot tell us whether we’ve successfully reduced the incidence of diarrhea. To measure impact, we must use more direct indicators, such as the number of people who report suffering from diarrhea in the last two months.
Impact evaluations gauge the success of a program, where success can be broadly or narrowly defined. They help us distinguish less effective interventions from successful ones and also help us improve existing programs.
The primary purpose of impact evaluation is to determine whether a program has an impact (on a few key outcomes), and more specifically, to quantify how large that impact is. What is impact? In our chlorine example, impact is how much healthier people are because of the program than they would have been without the program. More specifically, it is how much lower the incidence of diarrhea is than it would have been otherwise.
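One way to write this definition down, as a sketch only: the symbols below are introduced here for illustration and are not part of the original text. Let Y denote the outcome of interest (here, the incidence of diarrhea among people who received the program).

```latex
\[
\text{Impact} \;=\; Y_{\text{with program}} \;-\; Y_{\text{without program}}
\]
```

Under this notation, a successful chlorine program would show a negative impact on the incidence of diarrhea. Both terms refer to the same people over the same period, which is why only the first term can be observed directly.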
Getting this number correct is more difficult than it sounds. It is possible to measure the incidence of diarrhea in a population that received the program. But “how they would have been otherwise” is impossible to measure directly, just as it is impossible to measure what the United States economy would look like today had the Nazis won World War II, or to determine what today’s most deadly disease would be had penicillin not been discovered in Alexander Fleming’s dirty laboratory in London in 1928. It is possible that Germany would have become the dominant economy in the world, or alternatively, the Nazis may have fallen just a few years later. It is possible that minor wounds would still be one of the largest killers, or alternatively, some close relative of penicillin could have been discovered in another laboratory in a different part of the world. In our chlorine example, it is possible that without chlorine, people would have remained just as sick as they were before. Or it is possible that they would have started boiling their water instead, and the only thing chlorine did was substitute one technology for another, which would suggest that people are not really any healthier because of the chlorine.
Impact evaluations usually estimate program effectiveness by comparing the outcomes of those (individuals, communities, schools, etc.) who participated in the program against the outcomes of those who did not. The key challenge in impact evaluation is finding a group of people who did not participate, but who closely resemble what the participants would have been like had they not received the program. Measuring outcomes in this comparison group is as close as we can get to measuring “how participants would have been otherwise.” There are many methods of doing this, and each method comes with its own assumptions.
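The comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not a recommended method: the function name `estimate_impact` and the survey data are hypothetical, and the result can be read as an impact only under the assumption that the comparison group really does resemble the participants in the absence of the program.

```python
from statistics import mean


def estimate_impact(participant_outcomes, comparison_outcomes):
    """Return the difference in mean outcomes: participants minus comparison group."""
    return mean(participant_outcomes) - mean(comparison_outcomes)


# Hypothetical survey data (invented for illustration):
# 1 = person reported diarrhea in the last two months, 0 = did not.
participants = [0, 0, 1, 0, 0, 1, 0, 0, 0, 0]  # 2 of 10 report illness
comparison = [1, 0, 1, 0, 1, 0, 0, 1, 0, 0]    # 4 of 10 report illness

impact = estimate_impact(participants, comparison)
print(f"Estimated change in incidence: {impact:+.2f}")
```

A negative value here means a lower reported incidence among participants than in the comparison group. If the two groups differ in other ways (for example, if comparison households already boil their water), this simple difference would not isolate the effect of the chlorine.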
A table comparing the different methodologies can be found in the Why Randomize section.