Repository of measurement and survey design resources

Authors

Jack Cavanagh

Sabhya Gupta

Contributors

Michael Gibson

Katharina Kaeppel

Sarah Kopper

Daniela Muhaj

Summary

This is a list of resources on measurement and survey design relating to various topics. Topics are organized alphabetically for ease of navigation using the sidebar. We rely on crowd-sourced material to maintain this page and update it regularly. Feedback on the page, or suggestions of resources to add or remove, can be submitted using this form.

General resources

Welcome to the J-PAL Repository of Measurement and Survey Design Resources. The purpose of this repository is to provide an introduction to the measurement and survey design resources available in a particular subject or for a specific type of question. It is a companion piece to our Introduction to measurement and indicators and Survey design pages, compiling resources that discuss and provide guidance on specific issues related to the concepts introduced in those resources. In each section, you will find a list of resources, moving from more general to more specific, that are meant to introduce readers to the measurement tools, difficulties, and solutions in that particular topic.

These lists do not aspire to comprehensiveness but rather attempt to tackle the main points in measurement within a topic. They cover a variety of media, from blogs and academic papers to books and journal issues. While heavy synthesis is outside the scope of this resource, short descriptions of the papers, as well as introductions to each section written by J-PAL staff with expertise in that area, are provided for the reader’s ease of use.

The General section includes resources that provide an overview of measurement concepts and guidance on survey design. It also includes a few repositories of sample questionnaires and resources on remote surveying in light of the shifts during the Covid-19 pandemic. Some of the following sections are divided by research area, for example, financial inclusion, corruption, and environment & energy. Others, like the forthcoming section on measuring consumption, assets, income, prices, and poverty, cut across several research topics.

The current repository includes only a portion of the total number of sections, and we will continue to release more sections in phases over the coming months, so please check back for more.

As with our other research resources, this is meant to be a living resource. If you have feedback on any of the resources in this repository, or if you have recommendations for resources and/or further subjects to add, please fill out this form.

General measurement resources

J-PAL research resource: Introduction to measurement and indicators, by Sarah Kopper and Katie Parry -- An overview of measurement concepts, measurement error, and common sources and types of data.
J-PAL training presentation: Measurement: Outcomes, impacts, and indicators -- A lecture on measurement delivered at J-PAL’s 2019 Executive Education Training.
Book: The analysis of household surveys, by Angus Deaton; The World Bank Group (1997/2018) -- A comprehensive guide to the qualitative and econometric methods for collecting and analysing survey data to understand the effect of household decisions.
Blog post: Towards a survey methodology methodology, by Andrew Dillon (2016) -- An overview article on measurement issues.
Broad overview: What can we learn from experimenting with survey methods?, by Joachim De Weerdt, John Gibson, and Kathleen Beegle (2020) -- An examination of the effect of different survey designs on the measurement of different indicators associated with consumption, household size, etc. It also highlights the learnings from recent literature on survey design. [Gated published version]
Journal issue: Symposium on measurement and survey design, edited by David McKenzie and Mark Rosenzweig; Journal of Development Economics, Volume 98, Issue 1 (2012) -- A collection of curated articles on different measurement techniques and questions. [Gated]
- See also the Preface for symposium on measurement and survey design, by McKenzie and Rosenzweig (2012) [Gated]
Survey guide: Measurement error in survey data, by John Bound, Charles Brown, and Nancy Mathiowetz; The Handbook of Econometrics (2001) -- Discusses the impact of measurement error and provides methods to minimise error. It also reviews a few validation studies and discusses their design. [Gated published version]
Book: Sample surveys: Design, methods and applications, by C. R. Rao; The Handbook of Statistics, Volume 29, Part A (2009) -- A guide to different types of sampling methods and processing survey data with examples from experiences in the field. [Gated]

Practical guides to designing and implementing surveys

J-PAL research resource: Survey design by Sarah Kopper and Katie Parry -- An introduction to the principles of survey design.
Book: Designing household survey questionnaires for developing countries, edited by Margaret Grosh and Paul Glewwe; The World Bank Group, Vol. 1 (2000-05) -- Modules with detailed instructions on designing surveys for different sectors.
- See also Volumes 2 and 3.
Book: Designing household survey samples: Practical guidelines, by the United Nations’ Department of Economic and Social Affairs Statistics Division (2008) -- A practical guide on designing household surveys, implementing them, and analyzing the data while accounting for the sampling frame and non-sampling error.
Broad overview: Articles on DIME’s Wiki.
Survey guide: The World Bank’s Living Standards Measurement Study (LSMS) site has many resources and papers on measurement and survey design, many of which are listed below.
Broad overview: Guidelines for best practice in cross-cultural surveys, Survey Research Center, Institute for Social Research, University of Michigan (2016), including:
- Questionnaire design
- Instrument technical design
Broad overview: EGAP's 10 things to know about survey design, by Gabriella Sacramone-Lutz -- Contains broad guidelines on some aspects of survey design.
Blog post: Development Impact blog posts on measurement and survey design.
Broad overview: Observational field research, by Laura Brown -- A brief overview of what observational research is and the types of data collection associated with it.

Sample questionnaires

This section contains sources for finding sample questionnaires across sectors; topic-specific sample questionnaires can also be found under the relevant topic header. Note that project data is typically stored with the questionnaire and can be found by following the links below.

Datahub for Field Experiments in Economics and Public Policy, by J-PAL and IPA; Harvard Dataverse -- Includes survey questionnaires, research results, replication code, and documentation from studies conducted by J-PAL and IPA affiliated researchers.
The World Bank’s Microdata Catalog -- Central cache of questionnaires and data used and produced by the World Bank.
Model surveys from the Demographic and Health Surveys Program -- DHS questionnaire modules on household composition, biomarkers, and behavior related to the health of women, men and children.
The International Food Policy Research Institute's Database of Household and Community Surveys -- A library of datasets resulting from household and community surveys conducted by IFPRI, many of which are open access and contain the survey questionnaires.
International Household Survey Network -- A catalogue of surveys, research data and documentation from the IHSN.

Phone surveys

General guides

J-PAL research resource: Resources for conducting remote surveys, by Michael Gibson and Sarah Kopper (2020).
- Using mobile phones to collect panel data in developing countries, by Brian Dillon (2011) discusses the experience of collecting high frequency household data using mobile phones in rural Western Tanzania in 2009-2010. [Gated]
- One example of a study using SMS polls to study health behavior is Tracking health seeking behavior during an Ebola outbreak via mobile phones and SMS, by Shuo Feng, Karen A. Grépin, and Rumi Chunara (2018).
J-PAL research resource: Best practices for conducting phone surveys, by Sarah Kopper and Anja Sautmann (2020)
Webinar: Adaptations for phone surveys: A webinar with Tavneet Suri -- Webinar on guidance on pivoting to phone surveys in light of the Covid-19 pandemic.
Survey guide: Mobile phone panel surveys in developing countries: A practical guide for microdata collection, by Andrew Dabalen, Alvin Etang, Johannes Hoogeveen, Elvis Mushi, Youdi Schipper, and Johannes von Engelhardt (2016) -- A practical guide to designing and conducting phone surveys.
Survey guide: Remote surveys on the DIME Wiki -- Brief overview of different kinds of remote surveys.

Research on response rates and mode effects

Briefs: Phone survey methods on Innovations for Poverty Action's (IPA) RECOVR hub, which includes evidence briefs.
Blog post: Reducing attrition in phone surveys, by Berk Özler and P. Facundo Cuevas (2019) -- Discusses small changes that can dramatically reduce attrition in phone surveys.
Paper: Call me maybe: Experimental evidence on frequency and medium effects in microenterprise surveys, by Robert Garlick, Kate Orkin, and Simon Quinn (2020) -- Tests the reliability of phone surveys for collecting microenterprise data. [Gated published version]
Paper: Please call again: Correcting nonresponse bias in treatment effect models, by Luc Behaghel, Bruno Crépon, Marc Gurgand, and Thomas Le Barbanchon (2015) -- Proposes and tests a method to reduce non-response bias in surveys that involve repeated sequential attempts to obtain a response.
J-PAL webinar: Webinar on how to use SurveyCTO plug-ins for phone surveys, by SurveyCTO, IPA, and J-PAL South Asia.

Practical survey guides

J-PAL South Asia’s Transitioning to CATI checklist, by Saurabh Bhajibhakare, Ambika Chopra, Putul Gupta, and Mustufa Patel (2020) -- Guide to implementing and monitoring CATI surveys.
J-PAL South Asia’s Quality assurance for CATI, by Saurabh Bhajibhakare (2020) -- Best practices on conducting and supervising phone surveys with guidance on back checks and high frequency checks.
Guide to work-from-home data collection, by DIME Analytics -- Guide to conducting remote surveys during the Covid-19 pandemic.
60 Decibels’ Remote survey toolkit (2020) -- Guidance on conducting phone surveys and the different methods available, with some example question sets.
SurveyCTO’s Computer-assisted telephone interviewing (CATI) starter kit, by Marta Costa -- Guide to CATI using SurveyCTO.
- J-PAL South Asia’s Exotel-SurveyCTO plugin -- Useful for making anonymous calls from within a SurveyCTO form.

Covid-specific resources

Survey guide: J-PAL South Asia’s Budgeting for phone surveys during the Covid-19 pandemic, by Putul Gupta (2020) -- Guide to budgeting for trainings, personnel and equipment for phone surveys during the Covid-19 pandemic.
Survey guide: Remote surveying in a pandemic: Handbook, by Steve Glazerman, Michael Rosenbaum, Rosemarie Sandino, and Lindsey Shaughnessy (2020) -- Comprehensive guide to all aspects of remote surveying including IRB and data protection, respondent tracking and logistics.
Blog post: Impact evaluations in the time of Covid-19, part 1, by Markus Goldstein and Florence Kondylis (2020) -- Early pandemic guidance on how the pandemic is likely to affect research.
Blog post: Practical tips for implementing remote surveys in the time of the Great Lockdown, by Maria Jones, Roshni Khincha, Florence Kondylis, and Lysca Uwamariya (2020) -- Tips on logistics, training, data collection and security while conducting remote surveys.
Blog post: Mobile phone surveys for understanding Covid-19 impacts: Part I sampling and mode, by Kristen Himelein, Stephanie Eckman, Charles Lau, and David McKenzie (2020) -- Discussion on planning and budgeting for sampling frames and different methods for phone surveys with examples.
Blog post: Mobile phone surveys for understanding Covid-19 Impacts: Part II response, quality, and questions, by Kristen Himelein, Stephanie Eckman, Charles Lau, and David McKenzie (2020) -- Discussion of reasonable response rates, examples of methods to improve response rates and data monitoring methods.
Webinar: Adapting to Covid-19: Overview of data collection and phone surveying with SurveyCTO, video by SurveyCTO (2020).

Agriculture

Overview

The vast majority of people living in low- and middle-income countries depend on agriculture for their livelihoods, and the substantial income gaps between high- and low-income countries have been attributed to this high reliance on agriculture. As a result, governments, aid organizations, and others spend hundreds of millions annually¹ on programs to increase agricultural productivity.

Yet conclusions about production, input productivity, and profits are hindered by a lack of accurate measures of inputs and output. Given the informality of the sector, smallholder agricultural labor is rarely recorded as it is performed; researchers must hence rely on self-reports that can vary widely depending on the respondent answering the question (e.g., Kilic et al. 2021) or the recall period (Arthi et al. 2018; Beegle et al. 2012). Though purchases of fertilizer and seeds are easier to measure accurately, they can be of unobserved and mixed quality (see e.g., Bold et al. 2017 and Michelson et al. 2021 for contradictory findings), further complicating accurate productivity estimates. Self-reported land size suffers from measurement bias that varies with plot size and is thought to be at least partially responsible for the puzzle of an inverse relationship between land size and productivity that has plagued agricultural economists for decades. Fortunately, the proliferation of GPS devices in recent years has enabled better land size measurement.
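To illustrate why mismeasured land size can produce an apparent inverse size-productivity relationship, the sketch below simulates plots whose true yield is unrelated to plot size and then adds noise to reported area. It is a simplified illustration, not a method from any of the papers cited here: it assumes classical (purely random) multiplicative error in reported area, which is a simpler case than the plot-size-dependent bias described above, and all function names, parameter values, and distributions are hypothetical.

```python
import random


def ols_slope(x, y):
    """Slope from a simple least-squares regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    return cov_xy / var_x


def simulate_slopes(n=20000, error_sd=0.5, seed=0):
    """Return (slope using true area, slope using reported area).

    All quantities are in logs. True yield is independent of plot size
    by construction, so any negative slope in the second regression is
    produced by measurement error in area alone.
    """
    rng = random.Random(seed)
    log_area = [rng.gauss(0.0, 1.0) for _ in range(n)]   # assumed spread of true plot sizes
    log_yield = [rng.gauss(0.0, 0.3) for _ in range(n)]  # true yield, unrelated to size
    log_output = [a + y for a, y in zip(log_area, log_yield)]

    # Assumption: reported log area = true log area + random noise.
    log_area_reported = [a + rng.gauss(0.0, error_sd) for a in log_area]
    # Yield computed by the analyst: output divided by *reported* area.
    log_yield_reported = [q - a for q, a in zip(log_output, log_area_reported)]

    slope_true = ols_slope(log_area, log_yield)
    slope_reported = ols_slope(log_area_reported, log_yield_reported)
    return slope_true, slope_reported


if __name__ == "__main__":
    slope_true, slope_reported = simulate_slopes()
    print(f"slope with true area:     {slope_true:.3f}")
    print(f"slope with reported area: {slope_reported:.3f}")
```

Under these assumptions the error enters reported area and computed yield with opposite signs, so the expected slope with reported area is -Var(error) / (Var(log area) + Var(error)), which is -0.2 for the default values, even though the true slope is zero.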

Self-reported output is subject to similar concerns of recall bias and measurement error that vary with the plot size (e.g., Abay et al. 2019), particularly when crop production is consumed by the household (rather than only sold), as is the case for smallholder farmers. Crop cuts provide a more accurate measure but are costly and must be taken correctly to avoid miscalculations based on the “edge effect”² (Bevis & Barrett 2020). Satellite imagery offers a promising alternative that is both low cost and objective, though it may require ground-truthing (Lobell 2013; Lobell et al. 2019). Yet even when inputs and outputs are measured accurately, credible inference requires careful econometric analysis, because input use is endogenous and many other factors affect the production function, from highly localized and costly-to-measure soil and growing conditions to farmer skill.

These measurement challenges and others have broad policy implications. Mismeasurement of farm-level production scales up to mismeasurement of aggregate production (Gollin & Udry 2021). This in turn affects calculations of the potential gains from large-scale policies to increase production (such as through higher chemical input use), to commercialize smallholders (such as through better connections to input and output markets), or to move people out of the agricultural sector altogether.

Carletto et al. (2021) survey the latest research on measurement error in agriculture and offer actionable suggestions for data collection and analysis; their paper is a good starting point for readers aiming to design a study or analyze data.

- Sarah Kopper, Associate Director of Research

General resources

Book section: Designing household survey questionnaires for developing countries, edited by Margaret Grosh and Paul Glewwe; The World Bank (2000)
- Volume 2, Chapter 19 by Thomas Reardon and Paul Glewwe (pages 139 to 182)
- Volume 3, Chapter 19 by Thomas Reardon and Paul Glewwe (pages 407 to 452)
Survey guide: Agricultural survey design: Lessons from the LSMS-ISA and beyond, by Andrew S. Dillon, Calogero Carletto, Sydney Gourlay, Philip Randolph Wollburg, and Alberto Zezza (2021) -- A guidebook reviewing major innovations and best practices in agriculture survey design developed since the publication of Grosh and Glewwe (2000; above), as well as lessons learned from the LSMS surveys.
- Associated blog post: Agricultural survey design: Lessons from the LSMS-ISA and beyond
Paper: Agricultural data collection to minimize measurement error and maximize coverage, by Calogero Carletto, Andrew Dillon, and Alberto Zezza (2021) -- Surveys recent developments in agricultural data collection and measurement, and argues that increased focus on data collection structures can reduce measurement error and maximize data coverage. [Gated published version]
Paper: From tragedy to renaissance: Improving agricultural data for better policies, by Gero Carletto, Dean Jolliffe, and Raka Banerjee (2015) -- An overview of challenges in available agricultural statistics. [Gated published version]
Questionnaires from 50x2030
- See also 50x2030, 2021 for an overview of the overall initiative

Input use, productivity, and production

Labor

Paper: Not your average job: Measuring farm labor in Tanzania, by Vellore Arthi, Kathleen Beegle, Joachim De Weerdt, and Amparo Palacios-López (2018) -- A randomized test of recall bias in reporting agricultural labor. [Gated published version]
Paper: Measuring farm labor: Survey experimental evidence from Ghana, by Isis Gaddis, Gbemisola Oseni, Amparo Palacios-López, and Janneke Pieters (2020) -- A randomized test of recall bias in agricultural labor.
Paper: Employment data in household surveys: Taking stock, looking ahead, by Sam Desiere and Valentina Costa -- Reviews four different survey programs from 14 developing countries by identifying the labour market indicators and the key concepts behind them, and discusses related data quality concerns. Has a particular emphasis on agricultural labor.
Paper: Productivity and health: Physical activity as a measure of effort, by Oladele Akogun, Andrew Dillon, Jed Friedman, Ashesh Prasann, and Pieter Serneels (2020) -- Uses measures of physical activity captured by wearable accelerometers to measure labor productivity. [Gated published version]
Paper: Improving the measurement of rural women's employment: Global momentum and survey research priorities, by Gayatri Koolwal (2021) -- Reviews the challenges in measuring rural women’s employment with a particular focus on survey programs in sub-Saharan African countries, and enumerates best practices to consider when examining rural women’s employment. [Gated published version]
Questionnaire: Women’s Empowerment in Agriculture Index (WEAI), by the International Food Policy Research Institute -- An instrument for measuring women’s empowerment, agency, and inclusion in agricultural settings.
Paper: Measurement properties of the project-level Women's Empowerment in Agriculture Index, by Kathryn M. Yount, Yuk Fai Cheong, Lauren Maxwell, Jessica Heckert, Elena Martinez, and Gregory Seymour (2019) -- A test of the Women’s Empowerment in Agriculture Index (WEAI) in two settings using Item Response Theory (IRT). Contains suggestions for refining certain parts of the module to improve its measurement equivalency.

Land: Size, ownership, and fertility

Paper: Land measurement bias and its empirical implications: Evidence from a validation exercise, by Andrew Dillon, Sydney Gourlay, Kevin McGee, and Gbemisola Oseni (2019) -- A comparison of methods for measuring land size. [Gated published version]
Survey guide: Land area measurement in household surveys, by Gero Carletto, Sydney Gourlay, Siobhan Murray, and Alberto Zezza (2016) -- An LSMS guidebook on measuring land area in household surveys, including guidance on and a comparison of the three most prevalent methods: compass and rope, self-reports, and GPS.
Paper: Land measurement bias: Comparisons from Global Positioning System, self-reports, and satellite data, by Andrew Dillon and Lakshman Nagraj Rao (2021) -- Compares land size measurements from GPS, self-reports, and satellite imagery.
Paper: Close to the edge: Do behavioral explanations account for the inverse productivity relationship?, by Leah Bevis and Christopher Barrett (2020) -- An experiment measuring edge effects and their relationship to the inverse-productivity puzzle in rural Uganda. [Gated published version]
Paper: Correlated non-classical measurement errors, ‘Second best’ policy inference, and the inverse size-productivity relationship in agriculture, by Kibrom A. Abay, Gashaw T. Abate, Christopher B. Barrett, and Tanguy Bernard (2019) -- Discusses non-classical measurement errors (those correlated with the true value of their variable) with an application to the inverse size-productivity puzzle. The authors compare precise but costly measurements to the bias caused by only reducing error in one variable as well as the bias caused by reducing error in no variables. [Gated published version]
Paper: Empirical assessment of subjective and objective soil fertility metrics in East Africa: Implications for researchers and policy makers, by Julia Berazneva, Linden McBride, Megan Sheahan, and David Guerena (2018) -- A comparison of subjective (reported) and objective (measured) soil fertility estimates, and their correlation. [Gated published version]
Survey guide: Spectral soil analysis & household surveys: A guidebook for integration, by Sydney Gourlay, Ermias Aynekulu, Calogero Carletto, and Keith Shepherd (2017) -- An LSMS guide to integrating soil analysis into household and farm surveys. It also includes an overview of the general feasibility of integration, comparisons of subjective and objective measures of soil fertility, and some example modules.
Measurement guide: DNA fingerprinting for crop varietal identification: Fit-for-purpose protocols, their costs and analytical implications, by Ana Poets, Kevin Silverstein, Philip Pardey, Sarah Hearne, and James Stevenson (2020) -- A guide to using DNA fingerprinting in order to measure crop variety.
Paper: Getting the (gender-disaggregated) lay of the land: Impact of survey respondent selection on measuring land ownership and rights, by Talip Kilic, Heather Moylan, and Gayatri Koolwal (2021) -- An empirical test of how the choice of survey respondent within a household affects measures of land ownership.

Skills and knowledge

Paper: Measuring agricultural knowledge and adoption, by Florence Kondylis, Valerie Mueller, and Siyao Jessica Zhu (2014) -- A gender-disaggregated test of measurement error in self-reported knowledge and adoption of three agricultural practices: intercropping, mulching, and strip tillage. [Gated published version]

Other inputs

Paper: Lemon technologies and adoption: Measurement, theory and evidence from agricultural markets in Uganda, by Tessa Bold, Kayuki C. Kaizzi, Jakob Svensson, and David Yanagizawa-Drott (2017) -- Tests for fertilizer and seed quality and their application to input markets in Uganda. [Gated]
- See also Michelson et al. 2021 for similar methods but contradictory findings. [Gated published version]
Paper: Do beliefs about herbicide quality correspond with actual quality in local markets? Evidence from Uganda, by Maha Ashour, Daniel Orth Gilligan, Jessica Blumer Hoel, and Naureen Iqbal Karachiwalla (2018) -- Compares tested quality of herbicide in Ugandan input markets against farmers’ beliefs of quality and prices.

Production

Book: Handbook on crop statistics: Improving methods for measuring crop area, production and yield, by the Global Strategy to improve Agricultural and Rural Statistics (GSARS); Food and Agriculture Organization of the United Nations (2018) -- An extensive guide to measuring crop area, production, and yields.
Code: Agricultural development indicator curation, by Leigh Anderson, Travis Reynolds, Pierre Biscaye, Didier Alia, David Coomes, Terry Fletcher, Jack Knauer, Josh Merfeld, Isabella Sun, Chelsea Sweeney, Emma Weaver, and Ayala Wineman -- Sample Stata code to construct agricultural development indicators from the Living Standards Measurement Study - Integrated Surveys on Agriculture (LSMS-ISA).
- See also their description of the data construction process in General construction decisions.
Brief: Methodology for estimation of crop area and crop yield under mixed and continuous cropping, by U.C. Sud, Tauqueer Ahmad, V.K. Gupta, Hukum Chandra, Prachi Misra Sahoo, Kaustav Aditya, Man Singh, and Ankur Biswas (2017) -- Discussion of different methods for measuring area (GPS, compass and rope) and yields (self-reports, crop cuts).
Paper: Methods of crop yield measurement on multi-cropped plots: Examples from Tanzania, by Ayala Wineman, Leigh Anderson, Travis Reynolds, and Pierre Biscaye (2019) -- A comparison of four different methods of measuring crop-area on multi-cropped plots, and then an exploration of the implications of those methods for crop-yield measurements and correlates.
Paper: Heterogeneity, measurement error, and misallocation: Evidence from African agriculture, by Douglas Gollin and Christopher Udry (2021) -- Expounds and applies a model for distinguishing between measurement error, unobserved heterogeneity, and potential misallocation in observed agricultural productivity differences. [Gated published version]
Paper: Reliability of recall in agricultural data, by Kathleen Beegle, Calogero Carletto, and Kristen Himelein (2012) -- A test for recall bias in input use and yield measurements. [Gated published version]
Paper: Can diaries help in improving agricultural production statistics? Evidence from Uganda, by Klaus Deininger, Calogero Carletto, Sara Savastano, and James Muwonge (2012) -- A comparison of the quality of agricultural production data collected through production diaries versus self-reports. [Gated published version]
Survey guide: Technical note on post-harvest losses, by Marco Tiberti (2021) -- Provides a guide to collecting on-farm harvest and post-harvest losses in farm and household surveys.
Paper: How accurate are yield estimates from crop cuts? Evidence from smallholder maize farms in Ethiopia, by Frederic Kosmowski, Jordan Chamberlin, Hailemariam Ayalew, Tesfaye Sida, Kibrom Abay, and Peter Craufurd (2021) -- An experimental comparison of yields from full crop harvest measurement with estimations from various field-sampling methods.

Satellite and geographic measurement

Paper: The use of satellite data for crop yield gap analysis, by David Lobell (2013) -- A review of the use of remote-sensing techniques in measuring crop yield-gaps.
Blog post: Measuring yields from space, by Florence Kondylis and Marshall Burke (2015) -- A review of the use of remote-sensing techniques in agricultural measurement.
Paper: Eyes in the sky, boots on the ground: Assessing satellite- and ground-based approaches to crop yield measurement and analysis, by David Lobell, George Azzari, Marshall Burke, Sydney Gourlay, Zhenong Jin, Talip Kilic, and Siobhan Murray (2019) -- A comparison of satellite measurement of crop yields against multiple ground-based measurement techniques, including the “gold-standard” of full-plot crop cuts. [Gated published version]
Paper: Sampling nomads: A new technique for remote, hard-to-reach, and mobile populations, by Kristen Himelein, Stephanie Eckman, and Siobhan Murray (2014) -- Introduces a new technique for sampling nomads and other hard-to-reach populations into surveys based on location within a radius of randomly selected geographic points.
Paper: Remote sensing of grassland production and management—A review, by Sophie Reinermann, Sarah Asam, and Claudia Kuenzer (2020) -- Provides a review of papers using satellite data to measure livestock production and grazing areas.

Corruption in governance and service provision

Overview

Corruption, which occurs when bureaucrats or elected officials misuse their positions or break rules for private gain, is difficult to measure due to its illicit and often secretive nature. Directly observing corrupt activities like bribery by government officials, neglect of official duties, or tax avoidance can be challenging as officials may change or conceal their behavior in response to being monitored. Further, traditional survey approaches are unlikely to elicit truthful responses as officials may be unwilling to confess to corruption due to social desirability bias. Alternative methods of measuring corruption that rely on asking about citizens’ experience with corrupt officials (e.g. “have you paid a bribe for a service before?”) or their perceptions of corruption may be biased, outdated, or incomparable across contexts (Olken 2009). Perception-based indices and rankings may also provide limited insight into the type, causes, or consequences of corruption in a given context (Banerjee, Mullainathan, and Hanna 2012).

Given these challenges, open questions around measuring corruption include: How can researchers measure corruption without distorting public officials’ behavior or eliciting a biased response? What is the best way of measuring social norms around corruption? What types of corruption, if any, can citizen reports shed light on? Can e-governance reforms that improve the collection of administrative data also improve our ability to measure corruption? More reliable measures of corruption help us better answer policy-relevant questions, such as how corruption affects the efficiency of public service delivery and how effective anticorruption policies and programs are.

While measuring corruption is difficult, researchers have made remarkable progress in the past few years, including through a variety of innovative approaches that directly measure corruption and begin tackling some of these questions. These include:

Surveyors accompanying truck drivers on delivery routes, dressed as their assistants, to record bribes paid to police at checkpoints (Olken and Barron 2009)
Combining GPS-tracked company vehicle data with administrative data to measure corrupt behavior among bureaucrats of a large public service provider (Schonholzer et al., ongoing)
Comparing villagers’ perceptions of corruption to an objective measure (e.g. the difference between government-reported expenditure for a road building project and the estimated cost of actually building the road according to independent engineers; Olken 2009).
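As a rough illustration of the last approach, an objective corruption measure of this kind can be computed as the gap between what the government reports spending on a project and what independent engineers estimate it cost to build. The sketch below is a minimal example under stated assumptions: the function names and the figures are hypothetical, and the log-difference version is one common way to express such a gap proportionally rather than a formula taken from this page.

```python
import math


def expenditure_gap(reported_cost, estimated_cost):
    """Absolute gap: reported spending minus the independent cost estimate."""
    return reported_cost - estimated_cost


def log_expenditure_gap(reported_cost, estimated_cost):
    """Proportional gap: difference in logs between reported and estimated cost.

    For small gaps this is close to the share of reported spending
    that cannot be accounted for.
    """
    return math.log(reported_cost) - math.log(estimated_cost)


if __name__ == "__main__":
    # Hypothetical road project: figures are invented for illustration only.
    reported = 120_000.0   # expenditure reported by the implementing government body
    estimated = 100_000.0  # independent engineers' estimate of the actual cost
    print(expenditure_gap(reported, estimated))
    print(round(log_expenditure_gap(reported, estimated), 3))
```

A measure like this can then be set against villagers' survey responses about perceived corruption in the same project to see how closely perceptions track the objective gap.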

The papers that follow include many more examples of methods that can be used to measure corruption in governance and service provision, including through audits, public expenditure tracking surveys, market inference, and more. For a discussion on the different measurement approaches and their applicability, see the MITx Micromasters Course on Political Economy and Economic Development.

- Aimee Barnes, Policy Associate, and Eliza Keller, Senior Policy & Communications Manager, for the J-PAL Political Economy and Governance sector

Datasets

The World Bank's Worldwide Governance Indicators (1996-2019) -- See the comprehensive documentation for a discussion of their methodology.
The World Bank’s Enterprise Surveys -- Surveys firms to collect information on informal payments for utilities, licenses, contracts etc.
- See Bai et al. (2019) for an example using data from the WB Enterprise surveys to compare corruption across countries.
International Crime Victims Surveys (ICVS) -- Includes a question on bribe payments by individuals in 49 countries.
- For an example, see Mocan (2008), who uses data from the ICVS to find correlations between being asked for a bribe and country characteristics. [Gated published version]
V-Dem: Global Standards, Local Knowledge -- Includes variables to measure corruption.

General resources

Paper: Eight questions about corruption, by Jakob Svensson (2005) -- Provides a definition of corruption and then discusses the level of corruption in different countries, the different ways to reduce corruption, and the impact of corruption on growth.
Paper: Corruption in developing countries, by Benjamin Olken and Rohini Pande (2012) -- A review of the different measurement techniques and the existing evidence.
Paper: Section 4 (measurement), of Corruption, by Abhijit Banerjee, Sendhil Mullainathan, and Rema Hanna (2012) -- A review of different measurement methods and their application in the literature.
Book: New advances in experimental research on corruption, edited by Danila Serra, Leonard Wantchekon, R. Mark Isaac, and Douglas A. Norton; Emerald Group Publishing Limited, Vol. 15 (2012) -- Reviews the research on corruption measurement and reduction generated from laboratory and field experiments. [Gated]
Paper: Survey techniques to measure and explain corruption, by Ritva Reinikka and Jakob Svensson (2003) -- Reviews the use of Public Expenditure Tracking Surveys (PETS), provider surveys, and enterprise surveys for measuring corruption in education, health, and private businesses.
Book: Are you being served? New tools for measuring services delivery, edited by Samia Amin, Jishnu Das, and Markus Goldstein; the World Bank, Vol. 1 (2008) -- Examples of using different methods and tools for measuring public service delivery.
Book: Advances in experimental political science, edited by James N. Druckman and Donald P. Green; Cambridge University Press, Vol. 1 (2021) -- A comprehensive guide to new experimental methods, data collection, analysis, and challenges. [Gated]
Book: Corruption: What everyone needs to know, by Ray Fisman and Miriam A. Golden; Oxford University Press (2017) -- An overview of corruption and its causes and consequences with examples from around the world. [Gated]

International indicators of corruption and governance

Paper: A measurement assessment approach: Assessing the varieties of democracy corruption measures, by Kelly M. McMann, Daniel Pemstein, Brigitte Seim, Jan Teorell, and Staffan I. Lindberg (2017) -- Develops an approach to assess the reliability and validity of corruption measures, and tests the assessment method on the V-Dem measure.
Paper: Are international databases on corruption reliable? A comparison of expert opinion surveys and household surveys in sub-Saharan Africa, by Mireille Razafindrakoto and François Roubaud (2010) -- A comparison of data from household surveys on perception of corruption and expert opinions to comment on the reliability of expert opinions and related biases. [Gated published version]

Specific approaches to measuring corruption

Through perception

Paper: Corruption perception vs corruption reality, by Benjamin Olken (2009) -- Examines the reliability of villagers’ perceptions of corruption in a project by comparing them with the “missing expenditure.” Missing expenditure is the difference between the reported expenditure on the project and independent engineers’ estimate of what it actually cost to build. [Gated published version]
Paper: Parochial politics: Ethnic preferences and politician corruption, by Abhijit Banerjee and Rohini Pande (2009) -- Uses expert surveys to measure perceptions of how corrupt a candidate is. The authors report a high correlation between journalists’ perceptions of a candidate and actual data.

Through survey estimates of bribes

Paper: Who must pay bribes and how much? Evidence from a cross-section of firms, by Jakob Svensson (2003) -- Examines information collected on bribe payments by surveying Ugandan firms. [Gated published version]
Paper: Firm growth and corruption: Empirical evidence from Vietnam, by Jie Bai, Seema Jayachandran, Edmund J. Malesky and Benjamin Olken (2019) -- Examines data collected from 10,000 firms on their perception of corruption and bribe payments in Vietnam. [Gated published version]
See also Reid and Weigel (2019), who use self-reported estimates to measure the impact of financial incentives on bribes.

Through direct observation

Paper: Comparing corruption in a laboratory and in the field in Burkina Faso and in Canada, by Olivier Armantier and Amadou Boly (2013) -- Compares the results of direct observation of corruption in laboratories in Canada and Burkina Faso to the corruption in the field in Burkina Faso. [Gated]
Paper: The simple economics of extortion: Evidence from trucking in Aceh, by Benjamin Olken and Patrick Barron (2009) -- Observed bribes paid by truck drivers at police checkpoints or weigh stations in Indonesia. They also asked truck drivers to self-report bribe payments to check the accuracy of reported values. [Gated published version]
- For papers on observing bribes see An empirical study of corruption in ports, by Sandra Sequeira and Simeon Djankov (2010), who use direct observation of bribes at ports in Mozambique and South Africa. In How to subvert democracy: Montesinos in Peru, John McMillan and Pablo Zoido (2004) use detailed records of the former Chief of Police, Montesinos; the paper compares the amount of bribes paid to different institutions.

By comparing estimated and actual expenditure

Public Expenditure Tracking Surveys (PETS):
- Survey guide: Using public expenditure tracking surveys to monitor projects and small-scale programs: A guidebook, by Margaret Koziol and Courtney Tolmie (2010) -- A practical guide for implementing PETS.
- Broad overview: Following the money: Do public expenditure tracking surveys matter?, by Geir Sundet (2008) -- Discusses the limitations of wide adoption of PETS and suggests solutions and alternative methods.
- Broad overview: Implementing public expenditure tracking surveys for results: Lessons from a decade of global experience, by Asli Gurkan, Kai Kaiser, and Doris Voorbraak (2009) -- A review of previous implementations of PETS and a summary of their lessons and challenges.
- Broad overview: Public expenditure tracking and service delivery surveys: A review of design and implementation issues, by Ritva Reinikka (2002) -- A presentation on the need, features, benefits and implementation of PETS.
- An example of using PETS is given in Local capture: Evidence from a central government transfer program in Uganda, by Ritva Reinikka and Jakob Svensson (2004), where the authors compared allocated grants to schools with the actual grants received to measure “leakage” or amount of funds siphoned off by politicians. This is one of the first uses of PETS to track corruption. [Gated published version]
Paper: Tax rates and tax evasion: Evidence from “missing imports” in China, by Raymond Fisman and Shang-Jin Wei (2004) -- Compares China’s reported imports and Hong Kong’s reported exports of the same products to measure the impact of the tariff schedule on reported trade. [Gated published version]
Paper: Monitoring corruption: Evidence from a field experiment in Indonesia, by Benjamin Olken (2007) -- Compares the reported cost of an infrastructure project with independent engineers’ estimates of what it actually cost. [Gated published version]
- For another paper on infrastructure projects, see Proposal for a new measure of corruption, illustrated with Italian data, by Miriam Golden and Lucio Picci (2005), where the authors compare the value of existing infrastructure with the amount allocated for the projects.
- For another paper using administrative data, see Corruption and the costs of redistribution: Micro evidence from Indonesia, by Benjamin Olken (2006). [Gated published version]
Paper: Just rewards? Local politics and public resource allocation in south India, by Timothy Besley, Rohini Pande, and Vijayendra Rao (2011) -- Measures the likelihood of being a beneficiary of a transfer program based on status and power. [Gated published version]
Another example is given by Banerjee et al. (2020), who use digital and mobile monitoring to reduce leakages in public spending.
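
The expenditure comparisons described in this subsection share a simple arithmetic core, sketched below in Python. This is an illustrative share-based version only: the function names and all figures are hypothetical, and the cited papers use their own definitions, which may differ (for example, in how the gap is scaled).

```python
def leakage_share(allocated: float, received: float) -> float:
    """Share of allocated funds that did not reach the intended recipient."""
    if allocated <= 0:
        raise ValueError("allocated must be positive")
    return (allocated - received) / allocated


def missing_expenditure_share(reported: float, estimated_actual: float) -> float:
    """Share of reported spending not accounted for by an independent cost estimate."""
    if reported <= 0:
        raise ValueError("reported must be positive")
    return (reported - estimated_actual) / reported


# Hypothetical figures, for illustration only.
grant_leakage = leakage_share(allocated=1000.0, received=200.0)
road_gap = missing_expenditure_share(reported=500.0, estimated_actual=400.0)
print(f"leakage share: {grant_leakage:.2f}")
print(f"missing expenditure share: {road_gap:.2f}")
```

In practice, the difficult part is obtaining credible values for the inputs (grants actually received, or an engineer's cost estimate), which is what the survey guides above address.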

From market inference

Paper: Estimating the value of political connections, by Raymond Fisman (2001) -- Obtains a measure of firms’ political connectedness from a local consulting firm and compares how stock prices changed when the president fell ill to estimate the value of the connections. [Gated published version]
- A similar approach is taken in Fisman et al. (2012), who use market reactions in the US to measure the value of political connections with Vice President Dick Cheney. [Gated published version]
Paper: Public sector pay and corruption: Measuring bribery from micro data, by Yuriy Gorodnichenko and Klara Sabirianova Peter -- Compares the pay and consumption gap between private and public sector employees to measure the amount of bribes or informal payments to public sector employees in Ukraine. [Gated published version]

Using audits

Paper: Electoral accountability and corruption: Evidence from the audits of local governments, by Claudio Ferraz and Frederico Finan (2011) -- Uses audit reports on Brazilian municipalities to create a measure of corruption. [Gated published version]
Paper: Governance and the effectiveness of public health subsidies: Evidence from Ghana, Kenya and Uganda, by Rebecca Dizon-Ross, Pascaline Dupas, and Jonathan Robinson (2017) -- Measures corruption in health program administration with audits, back-checks and decoy visits.
Paper: Missing in action: Teacher and health worker absence in developing countries, by Nazmul Chaudhury, Jeffrey Hammer, Michael Kremer, Karthik Muralidharan and F. Halsey Rogers (2006) -- A review of surveys recording teacher and health worker absence during unannounced visits in six developing countries.
- Another example is Duflo, Hanna and Ryan (2012), who use cameras to reduce instances of teacher absenteeism.

Through other methods

Paper: Using field experiments in international relations: A randomized study of anonymous incorporation, by Michael Findley, Daniel Nielson, and J.C. Sharman (2013) -- Measures the willingness of firms to break the law by proposing the formation of a shell company. [Gated published version]
Paper: Misunderestimating corruption, by Aart Kraay and Peter Murrell (2016) -- A tool-specific example of using random response methods to measure the prevalence of reticent respondents in an enterprise survey. [Gated published version]

Discrimination

Overview

Improving diversity, equity, and inclusion in a particular context requires first understanding the nature and extent of bias and discrimination. For instance, there may be disproportionately fewer individuals of a certain marginalized group employed in a certain industry. This could be due to discrimination within the industry’s hiring practices, to other factors that reduce the ability of individuals from within the marginalized group to gain the skills necessary to enter the industry, or both. Disentangling these drivers is important for determining what types of interventions will effectively address this issue.

This section provides resources and tools to help researchers and practitioners better measure bias and discrimination. This includes a handbook chapter and numerous papers providing an overview of the types of research methods often used to identify discrimination in different contexts, such as audit and correspondence studies, list randomization, and more. It also includes more practical considerations for implementing these methods, including insight into when each can be especially useful and their limitations. In some cases, this overview is provided in the context of specific types of discrimination, such as race or gender, as well as in certain thematic areas, such as housing. It also provides a number of resources for those seeking to measure discrimination within the labor market. These include a synthesis of RCT evidence on hiring discrimination, as well as a book chapter and several papers that provide practical suggestions and methodologies for using many of the tools mentioned above specifically in labor market settings.

- Anupama Dathan, Policy Manager for the J-PAL Health sector
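
As an illustration of list randomization, one of the methods mentioned in the overview above, the standard difference-in-means estimator can be sketched in a few lines of Python. The setup assumed here is the usual one: a control group reports how many of J innocuous statements apply to them, a treatment group sees the same list plus one sensitive statement, and the gap in mean counts estimates the prevalence of the sensitive item. The function name and all data are hypothetical.

```python
from statistics import mean


def list_experiment_estimate(control_counts, treatment_counts):
    """Estimated prevalence of the sensitive item: difference in mean item counts."""
    return mean(treatment_counts) - mean(control_counts)


# Hypothetical responses: number of statements each respondent agrees with.
control = [1, 2, 2, 3, 2, 2]    # list of innocuous items only
treatment = [2, 3, 2, 3, 2, 3]  # same list plus the sensitive item

prevalence = list_experiment_estimate(control, treatment)
print(f"estimated prevalence of the sensitive item: {prevalence:.2f}")
```

The estimate is only meaningful under random assignment to the two lists and honest counting; real applications also report standard errors, which this sketch omits.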

General overview

Book section: Field experiments on discrimination, by Marianne Bertrand and Esther Duflo, in Handbook of Field Experiments, edited by Abhijit Banerjee and Esther Duflo; North Holland, Volume 1, 309-395 (2017) -- A review of the literature on the prevalence and costs of discrimination against women and minority groups, with a particular focus on the labor market. [Gated published version]
Paper: Evidence on discrimination in consumer markets, by John Yinger (1998) -- Discusses measurement of race and gender-based discrimination in housing, car and fast-food sales.
Paper: Field experiments of discrimination in the market place, by P. A. Riach and J. Rich (2002) -- Review of evidence on discrimination against women and minority groups in the labor, housing and product markets. [Gated published version]
Paper: The size of the LGBT population and the magnitude of anti-gay sentiment are substantially underestimated, by Katherine B. Coffman, Lucas C. Coffman, and Keith M. Marzilli Ericson (2016) -- The authors use a “veiled methodology” to evaluate implicit bias against members of the LGBTQ population. [Gated published version]
Paper: Implicit discrimination, by Marianne Bertrand, Dolly Chugh and Sendhil Mullainathan (2005) -- Presents implicit discrimination as a driver of discrimination and discusses methods to test it. [Gated published version]

Labor market discrimination

J-PAL Policy insight: Changing resumes to reduce hiring discrimination, by Lisa Corsetto (2019) -- Summarizes the evidence on hiring discrimination based on randomized resume studies.
Book sections: An overview of auditing for discrimination by Michael E. Fix, George C. Galster, Raymond J. Struyk (Chapter 1) and Audit studies: Their methods and findings by James J. Heckman and Peter Siegelman (Chapter 5) in Clear and convincing evidence: Measurement of discrimination in America, edited by Michael E. Fix, Raymond J. Struyk; Urban Institute Press (1993) -- An overview of using auditing for measuring discrimination. [Gated]
Paper: Detecting discrimination in audit and correspondence studies, by David Neumark (2012) -- Provides a methodology for measuring discrimination using data from audit and correspondence studies. [Gated published version]
Paper: Experimental research on labor market discrimination, by David Neumark (2018) -- Discusses various approaches to testing for discrimination.
Paper: Detecting discrimination, by James Heckman (1998) -- A review of the audit method for measuring discrimination in the labor market.
- See Bailey (2013) for an example of the audit method for measuring bias associated with sexual orientation and gender identity. [Gated published version]
Paper: Race discrimination: An economic perspective, by Kevin Lang and Ariella Kahn-Lang Spitzer (2020) -- Discusses techniques for measuring taste-based and statistical discrimination in many settings, including the labor market and the workplace. Also presents evidence of discrimination.
Paper: Bias in the workplace: Consistent evidence of sexual orientation and gender identity discrimination 1998–2008, by M.V. Lee Badgett, Brad Sears, Holning Lau and Deborah Ho (2008) -- Summarizes the evidence produced by different methods to measure discrimination faced by members of the LGBTQ population.
Paper: Race and gender in the labor market, by Joseph G. Altonji and Rebecca M. Blank (1999), in the Handbook of labor economics; Elsevier B.V., Volume 3, 3143-3259 -- Discusses the evidence from different methods. [Gated published version]

Education

Overview

Please note that this section focuses on measurement challenges associated with K-12 education. While some of these may also be applicable to higher education, measurement in higher education presents unique challenges that are not included in this section.

Beyond the intrinsic value-add of education, schooling's association with higher income, greater health, more civic participation, and other benefits (Duflo 2001) has made increasing access to and quality of education a significant policy concern globally and a primary focus of RCTs in the social sciences. Education field experiments have predominantly centered on finding cost-effective ways to improve participation and learning outcomes, and a growing body of evidence has begun to reveal general themes about "what works." However, despite substantial advancements over the last decade, difficulties in measurement often limit our understanding of the impact of social programs.

At the broadest level, much work has been done in order to identify measurement issues caused by the high potential for spillover effects in education studies and to identify both potential solutions in study design and when they are feasible (see Muralidharan 2017 for a discussion). However, studies of specific outcomes reveal other potential measurement errors. For example, attendance is a standard indicator for measuring student participation and teacher absenteeism. However, administrative data in general is subject to several biases (Feeney et al., 2015), and administrative data in education has been found in certain cases to have systematic errors (Singh, 2021). When feasible, unannounced visits can help combat this issue (Muralidharan 2017).

Similarly, test scores are the most popular indicators of learning outcomes. However, test construction often lacks transparency and systematic design. This can impact the replicability, scale and generalizability of a study, as well as cause other measurement issues, such as impact underestimation because the test was too difficult for those in treatment and control conditions (see Singh (2015) for a discussion, Muralidharan, Singh, and Ganimian (2016) for a demonstration, and Muralidharan (2017) for essential principles of test design).

Test scores, however, may not adequately capture other critical holistic skills that go beyond literacy and numeracy. These include cognitive skills, social skills, emotional skills, creative skills, and physical skills of children, and are also commonly known as socioemotional skills, soft skills, life skills, 21st-century skills, cognitive and noncognitive skills, nonacademic skills, and many more. Holistic skills can be particularly prone to both random and systematic measurement error for additional reasons, for instance because skills like creativity or perseverance are less tangible and more multidimensional. Kautz et al. (2014) provide a review of the literature on measuring holistic skills and an overview of some of the tools for doing so. The subsection on “Skills and Effort” in the Labor section also provides guidance on measuring skills as they relate to economic output. More recently, Macours et al. (2023) review over 230 RCTs and provide a snapshot of which skills are measured, which types of measures are used, and how their validity and reliability are being assessed. The review also provides guiding questions for researchers to consider in their evaluation designs to improve the practice of conducting, documenting, and sharing their validation work. Laajaj and Macours (2021) provide a helpful appendix that includes an introduction to psychometric concepts and methods and, as an example, reports the index construction rationale for their own evaluation. Finally, the psychology, medicine, and health literatures include extensive guides and examples of instrument validation work and holistic skills measurement (see Cavallera et al. (2023) and Fernald et al. (2018)).

Besides student skills and performance, it can also be difficult to measure and compare the quality or effort of teachers. One commonly used method, comparing students’ pre and post scores, may suffer from biases as enumerated in Rothstein (2008). Guarino et al. (2012) provide a comparison of value-added methods to other measures of teacher quality.

There are crucial policy implications to these measurement errors and the many others not mentioned. This section provides real-world datasets from the World Bank and national and regional assessments as well as resources that discuss different educational outcomes, options for measuring these outcomes, challenges that arise when measuring different effects, and strategies to overcome potential issues. Please see Muralidharan (2017) and the other resources below for a more in-depth discussion of the significant measurement issues discussed here.

- Demitria Wack and Jessica Williams, Senior Policy Associate and Policy Associate for the J-PAL Education sector, and Thanh Nguyen, J-PAL Education Lead

Datasets

General:

World Bank EdStats -- Data on over 4000 indicators from 214 economies using data collected from 1970 to the present.

Datasets of test scores:

Global data set on education quality (1965–2015), by Altinok, Nadir, Noam Angrist, and Harry Anthony Patrinos (2018) -- A World Bank paper introducing a dataset of globally comparable and longitudinal education data.
National assessment of educational progress, by the National Center for Education Statistics -- National, state, and district-level assessment of American children on different subjects.
Program for international student assessment, by the Organisation for Economic Co-operation and Development -- “PISA measures 15-year-olds’ ability to use their reading, mathematics and science knowledge and skills to meet real-life challenges” (OECD).
Trends in international mathematics and science study (TIMSS) and Progress in international reading literacy study (PIRLS), from the International Association for the Evaluation of Educational Achievement (IEA) -- Comparative assessments in mathematics, science and reading for more than 60 countries.

General resources

Book section: Field experiments in education in developing countries, by Karthik Muralidharan, in The Handbook of economic field experiments; Elsevier, Volume 2: 323-385 (2017) -- Discusses the relevant questions in education, summarizes the evidence, discusses limitations and presents a toolkit on design of surveys, collection of data and measurement of outcomes. [Gated published version]
Online course: Lecture titled “Measurement III (Behavior, Education, Gender and Empowerment)” in Week 6 of J-PAL Micromasters Course: 102x Designing and running randomised evaluations -- The module on learning explains the use and statistics of Item response theory (IRT) for tests measuring learning levels.
Survey guide: Education modules, by Paul Glewwe, in Designing household survey questionnaires for developing countries; World Bank, Volume 1: 143-176 (2000) -- Provides an introductory framework for understanding the different indicators needed for different policy questions.
Book section: Improving school education outcomes in developing countries: Evidence, knowledge gaps, and policy implications, by Paul Glewwe and Karthik Muralidharan, in The Handbook of the economics of education; Elsevier, Volume 5: 653-743 (2016) -- A review of the evidence on education interventions. [Gated published version]

Measuring literacy and numeracy: Developing valid and reliable tests

Blog post: How standard is a standard deviation? A cautionary note on using SDs to compare across impact evaluations in education, by Abhijeet Singh (2015) -- Explains the issues with comparing changes in standard deviations across contexts and proposes some ways of remedying the issue.
- Das and Zajonc (2008) use item response theory to standardise test scores for comparability across countries. For another example of item response theory in educational assessments, see Muralidharan, Singh, and Ganimian (2019); their appendix contains a discussion of methodology. [Gated published version]
Paper: Validating the ASER testing tools: Comparisons with reading fluency measures and the Read India measures, by Shaher Banu Vagh (2009) -- Tests the reliability and validity of ASER data from two studies.
Paper: International surveys of educational achievement: how robust are the findings?, by Giorgina Brown, John Micklewright, Sylke V. Schnepf, and Robert Waldmann (2007) -- Compares test scores from four international surveys to ascertain the robustness of the measures. [Gated published version]
Paper: Internationally comparable mathematics scores for fourteen African countries, by Justin Sandefur (2018) -- Provides a methodology to create internationally comparable scores using regional test scores. [Gated published version]
Survey guide: Understanding what works in oral reading assessments: Recommendations from donors, implementers and practitioners, by the UNESCO Institute for Statistics (2016) -- A comprehensive guide to conducting oral assessments and analysing the data.
Paper: The measurement of student ability in modern assessment systems, by Brian Jacob and Jesse Rothstein (2016) -- Discusses the design of tests and statistical methods to convert student test responses to a quantitative measure on a scale.
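
Several of the resources above rely on item response theory (IRT). As a rough illustration of the idea, the sketch below implements the two-parameter logistic item response function, a common textbook form in which the probability of a correct answer depends on the gap between a student's ability and the item's difficulty. It is not necessarily the model used in any of the cited papers, and the function name, parameter names, and values are hypothetical.

```python
import math


def prob_correct(ability: float, difficulty: float, discrimination: float = 1.0) -> float:
    """Two-parameter logistic model: P(correct) = 1 / (1 + exp(-a * (theta - b)))."""
    return 1.0 / (1.0 + math.exp(-discrimination * (ability - difficulty)))


# Hypothetical students of low, average, and high ability.
abilities = [-1.0, 0.0, 1.0]

# An item of average difficulty separates the three students clearly.
well_targeted = [prob_correct(theta, difficulty=0.0) for theta in abilities]

# A very hard item leaves every student with a low chance of success,
# so responses to it carry little information about ability differences.
too_hard = [prob_correct(theta, difficulty=4.0) for theta in abilities]

print([round(p, 2) for p in well_targeted])
print([round(p, 2) for p in too_hard])
```

The second list shows why a test that is too difficult for the sample can understate differences between groups: almost nobody answers correctly regardless of ability.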

Measuring holistic skills

Paper: Measurement of holistic skills in RCTs: Review and guidelines, by Karen Macours, Jessica Williams, and Samuel Wolf (2023) -- Review of measurement strategies used in over 200 RCTs aimed at measuring and improving skills beyond typical literacy and numeracy skills in children, as well as guiding questions for researchers to ask themselves when designing new RCTs with these outcomes.
- See Appendix - Tools and papers for tools and papers reviewed for this resource [direct download].
Paper: Fostering and measuring skills: Improving cognitive and non-cognitive skills to promote lifetime success, by Tim Kautz, James J. Heckman, Ron Diris, Bas ter Weel, and Lex Borghans (2014) -- Review of the methods to measure cognitive and non-cognitive skills. The paper also explores the correlation of skills with economic success and the evidence of interventions to improve cognitive and non-cognitive skills among children and adolescents.
- See Alan, Boneva, and Ertac (2016) and Gneezy et al. (2017) for examples of studies measuring non-cognitive skills.
Paper: Measuring skills in developing countries, by Rachid Laajaj and Karen Macours (2021) -- An experiment designed to test the reliability and validity of commonly used skills measures; it was first implemented in Western Kenya and then replicated in Colombia.
Paper: Protocol for validation of the Global Scales for Early Development (GSED) for children under 3 years of age in seven countries, by Vanessa Cavallera et al. (2021) -- The seven-country validation of the WHO-developed Global Scales for Early Development (GSED) to measure children’s early development up to 3 years of age.
Book: Measuring child development: A toolkit for doing it right, by Lia Fernald, Elizabeth Prado, Patricia Kariger, and Abbie Raikes (2017) -- A practical guide developed by the World Bank for selection and adaptation of child development measurements for use in research in low- and middle-income countries.

Student and teacher participation and effort

Paper: Missing in action: Teacher and health worker absence in developing countries, by Nazmul Chaudhury, Jeffrey Hammer, Michael Kremer, Karthik Muralidharan and F. Halsey Rogers (2006) -- A review of surveys recording teacher and health worker absence during unannounced visits in six developing countries.
Paper: Examining the reliability of self-reported data on school participation, by Sarah Baird and Berk Özler (2012) -- A comparison of self-reported and administrative data on school attendance in Malawi. [Gated]

Teacher quality and value add

Paper: Measuring the impacts of teachers I: Evaluating bias in teacher value-added estimates, by Raj Chetty, John N. Friedman, and Jonah E. Rockoff (2014) -- Measures bias in value-added estimates by comparing the estimates to other measures created using quasi-experimental methods from school and tax records.
- See Kane and Staiger (2008) for an example of using value-added methods to measure teacher performance.
Paper: Teacher quality in educational production: Tracking, decay, and student achievement, by Jesse Rothstein (2010) -- Tests the assumptions implicit in value-added models used to measure teacher performance. [Gated published version]
Paper: Can value-added measures of teacher performance be trusted?, by Cassandra M. Guarino, Mark D. Reckase and Jeffrey M. Wooldridge (2012) -- Compares value-added methods to other methods to measure teacher effectiveness. [Gated published version]
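
For readers new to the idea, the simplest form of a value-added measure, comparing students' pre and post scores by teacher, can be sketched as below. This is a deliberately naive gain-score version under strong assumptions (no controls, no adjustment for how students are assigned to teachers, no shrinkage), which is exactly the kind of estimate whose biases the papers above examine. The function name and data are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def naive_value_added(records):
    """Mean student score gain per teacher, relative to the overall mean gain.

    records: iterable of (teacher_id, pre_score, post_score) tuples.
    """
    gains = defaultdict(list)
    for teacher, pre, post in records:
        gains[teacher].append(post - pre)
    overall = mean(g for teacher_gains in gains.values() for g in teacher_gains)
    return {teacher: mean(gs) - overall for teacher, gs in gains.items()}


# Hypothetical test scores, for illustration only.
scores = [
    ("A", 50, 60),
    ("A", 40, 52),
    ("B", 55, 58),
    ("B", 45, 50),
]
value_added = naive_value_added(scores)
print(value_added)
```

If students are sorted to teachers on unobserved characteristics, differences like these reflect the sorting as well as teacher effectiveness, which is why the models discussed in the resources above add controls and test their identifying assumptions.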

Household spending on education

Survey guide: The LSMS Guidebook to Measuring household expenditure on education, by Gbemisola Oseni et al. (2018) -- A guide to measuring the time and money spent by households on education.

Energy and environment

Overview

Research in environment, energy, and climate change encompasses a range of topics, each with measurement challenges and opportunities unique to the sector. These topics include greenhouse gas emission reductions; measures to help people cope and live with the effects of climate change; pollution reduction and sustainable natural resource management; and access to affordable, reliable, and clean energy sources.

Generating evidence in environment, energy, and climate change is becoming increasingly urgent as emissions increase, global warming progresses, and communities start to feel the effects of a changing climate. Climate change is highly inequitable, with low-income communities being hardest hit by climate and weather shocks, while at the same time having the fewest resources to adapt. To better understand the impacts of programs, technologies, and policies, researchers are exploring new ways of combining different sources of data. Combining remote sensing data, satellite data, or administrative data collected by governments and utility companies with ground-truth and survey data has the potential to unlock insights about the efficacy of climate solutions. Technological innovations in, for example, sensor technology can produce more granular data on air quality and pollution (Khanna, 2000; O’Neill et al., 2003), allowing for new and innovative combinations with data on welfare losses and health. Policies and research on environment and climate change often face challenges in using available data to predict and measure the impacts of environmental shocks - a challenge that is being met with developments in predictive modeling to inform policy and humanitarian interventions.

Lastly, understanding human behavior and household-level measures to face environmental and energy challenges as well as mitigate and adapt to climate change opens up more questions. Researchers tackle these questions by studying incentive structures (Jayachandran et al., 2017; Hanna et al., 2016), effective regulation enforcement and monitoring (Duflo et al., 2013; Ghanem and Zhang, 2014), and energy consumption and conservation behavior (Burgess et al., 2020; Lee et al., 2020).

- Maike Pfeiffer, Policy Associate, and Andrea Cristina Ruiz, Policy Manager, for the J-PAL Environment, Energy, and Climate Change sector

Environment: General resources

The reading lists for MIT’s Environmental policy and economics (Allcott, Spring 2011) and Environmental economics and government responses to market failure (Greenstone, Spring 2005) are good primers for measurement issues in environmental economics. Though the measurement issues are more focused on cost benefit analysis, the Energy economics (Joskow, Spring 2007) course also has a reading list that may be helpful.

Book section: Environment modules, by Dale Whittington, in Designing household survey questionnaires for developing countries; World Bank, Volume 2, 5-30 (2000) -- An overview of measurement issues surrounding indicators relevant for environmental policy, including sections on contingent valuation, measuring resource use, and capturing environmental priorities. Has a particular focus on LSMS methods and includes example questionnaire modules.

Environment: Measuring benefits

General resources

Paper: Nonmarket valuation of environmental resources: An interpretive appraisal, by V. Kerry Smith (1993) -- A literature review of the pros and cons of various methods of nonmarket valuation of environmental resources; covers both indirect (e.g., revealed preference) and direct (e.g., WTP surveys) methods. [Gated]
Book: The measurement of environmental and resource values, by A. Myrick Freeman III, Joseph A. Herriges, and Catherine L. Kling; RFF Press, Vol. 3 (2014) [direct download link] -- A graduate-school level textbook that covers the valuation of environmental resources. Includes chapters on topics including, but not limited to, hedonic pricing, environmental quality as a factor input, and valuing longevity and health.

Hedonic method

Paper: Hedonic prices, property values and measuring environmental benefits: A survey of the issues, by A. Myrick Freeman III (1979) -- A review of the theoretical and empirical literatures on using hedonic pricing models to measure environmental benefits and demand functions.
Paper: Can markets value air quality? A meta-analysis of hedonic property value models, by V. Kerry Smith and Ju-Chin Huang (1995) -- A meta-analysis of empirical estimates of air-quality demand from hedonic property value models.
Paper: Wages, rents, and the quality of life, by Jennifer Roback (1982) -- Introduces a city-level measure of quality of life determined by wage and rent values.
For a relatively recent example of using housing-price data to measure air quality demand, see Chay and Greenstone (2005). [Gated published version]

Environment: Measuring bads and costs

Measuring pollution using air quality sensors

Broad overview, Data: The EPA’s Air Emission Measurement Center (EMC) and Daily Air Quality Tracker -- The Air Emission Measurement Center compiles resources, including a comprehensive list of available tests, for measuring pollutant emissions from smokestacks and other industrial sources. The Daily Air Quality Tracker provides current and historical data on air quality at different levels of geographic aggregation.
Paper: Measuring environmental quality: an index of pollution, by Neha Khanna (2000) -- Introduces a new method for creating an air quality/pollution index, which uses the epidemiological dose-response functions of pollutants to aggregate by welfare loss. Compares the index to the EPA’s Pollutant Standards Index (PSI). [Gated]
Paper: Using unobtrusive sensors to measure and minimize Hawthorne effects: Evidence from cookstoves, by Andrew M. Simons, Theresa Beltramo, Garrick Blalock, and David I. Levine (2017) -- The Data in brief has a more detailed description of methods, including using kitchen performance tests, stove use monitors, and UCB Particulate and Temperature Sensors. [Gated published version]
Paper: Ozone exposure among Mexico City outdoor workers, by Marie S. O’Neill et al. (2003) -- Compares air pollution data gathered from both fixed and personal monitors; ozone concentrations measured by fixed monitors are generally higher than those measured by personal monitors.
Paper: Effortless perfection: Do Chinese cities manipulate air pollution data?, by Dalia Ghanem and Junjie Zhang (2014) -- Introduces a set of tests for whether reported air pollution data has been manipulated, and tests the method on data from Chinese cities, finding suggestive evidence of manipulation.
For examples of research using air quality data, see Lavy, Ebenstein, and Roth (2014), Chang et al. (2016), and Ebenstein et al. (2017). For more technical papers on the variability of sensor measurement, see Blanchard and Tanenbaum (2003) [gated], who demonstrate the within-week variability of pollution measures, and Braniš et al. (2005) [gated], who examine the impact of local human activity on pollution measurement. For an example of research using continuous emissions monitoring systems, see Greenstone et al. (forthcoming).

Measuring pollution using audits

Paper: Truth-telling by third-party auditors and the response of polluting firms: Experimental evidence from India, by Esther Duflo, Michael Greenstone, Rohini Pande, and Nicholas Ryan (2013) -- Compares air and water pollution data gathered from audits and audit back-checks, pollution sampling, and administrative reports on regulatory action.
Paper: EIA practice in India and its evaluation using SWOT analysis, by Ritu Paliwal (2006) -- A discussion of the strengths and weaknesses of data taken from Environmental Impact Assessments (EIA) in India, as well as recommendations for improving that system. [Gated]
Paper: Does severe air pollution affect audit judgment?: Evidence from China, by Feng Chen, Xiaofeng Peng, and Jianguang Zeng (2017) -- Provides evidence that local air pollution can negatively affect the quality of audits.

Measuring deforestation using satellite imagery

Paper: Humid tropical forest disturbance alerts using Landsat data, by Matthew C. Hansen et al. (2015) -- Introduces and tests a methodology for measuring tropical forest disturbances using Landsat data. The tests were performed in Peru, the Republic of the Congo, and Indonesia, and the methodology is currently used by MAAP (the Monitoring of the Andean Amazon Project), among others.
Paper: Cash for carbon: A randomized trial of payments for ecosystem services to reduce deforestation, by Seema Jayachandran et al. (2017) -- Contains a description of using satellite imagery to measure deforestation in small areas.

Anthropometric measurement of costs

Paper: Up in smoke: The influence of household behavior on the long-run impact of improved cooking stoves, by Rema Hanna, Esther Duflo, and Michael Greenstone (2016) -- Measures the health effects of smoke exposure through carbon monoxide exhalation measurements, health recall data, biometric tests, and spirometry readings. [Gated published version]

Energy: General resources

Book section: Environment modules by Dale Whittington, in Designing household survey questionnaires for developing countries; World Bank, Volume 2, 5-30 (2000) -- An overview of measurement issues surrounding indicators relevant for environmental policy, including sections on contingent valuation, measuring resource use, and capturing environmental priorities. Has a particular focus on LSMS methods and includes example questionnaire modules.
Paper: Does household electrification supercharge economic development?, by Kenneth Lee, Edward Miguel, and Catherine Wolfram (2020) -- Challenges to measuring electrification (and combining measures of electrification).

Energy: Access and use

Data: The World Bank’s Sustainable Energy for All Databank -- Contains country-level indicators of energy consumption and production, with a particular focus on renewable energy.
Paper: Energy poverty: What you measure matters, by Lauren C. Culver (2017) -- A review of the current methods of measuring energy poverty, as well as a discussion of their strengths and weaknesses and of indices combining different pieces of them.
Paper: The U.S. Department of Energy’s Review of selected home energy auditing tools (2010) -- A review of home energy audit tools, including a literature review of past tests of audit tools as well as a comparison of ten selected tools.
- For an example of research using home energy audits, see Fowlie, Greenstone, and Wolfram (2018) [Gated published version].
Survey guide: The International Atomic Energy Agency’s Energy indicators for sustainable development: Guidelines and methodologies (2005) -- Guidelines for constructing macroeconomic energy indicators.
Paper: Measuring energy access: Supporting a global target, by Morgan Bazilian et al. (2015) -- A review of available measures of energy access, a discussion of how to combine them into indices, and a proposed framework for improving the measurement of energy access and poverty.
Paper: Measuring energy poverty: Focusing on what matters, by Patrick Nussbaumer, Morgan Bazilian, Vijay Modi, and Kandeh K. Yumkella (2012) -- Includes a review of past literature on measuring energy poverty and introduces a new energy poverty index, the Multidimensional Energy Poverty Index. [Gated published version]
Paper: Towards a new measurement of energy poverty: A cross-community analysis of rural Pakistan, by Bilal Mirza and Adam Szirmai (2010) -- Provides an overview of a specially-designed energy poverty instrument (the Energy Poverty Survey) and descriptive statistics from its use in rural Pakistan, and introduces a new energy poverty index that takes energy inconveniences, household size, and energy shortfalls into account.
For an example of research using administrative data to measure energy use, see Jack and Smith (2016).

Energy demand

Paper: Demand for electricity on the global electrification frontier, by Robin Burgess, Michael Greenstone, Nicholas Ryan, and Anant Sudarshan (2020) -- Revealed preference approach to measuring demand and estimation of substitution between electricity sources; structural model to study policy counterfactuals.
- See also Welfare benefits of decentralized solar energy for the rural poor in India by the same authors.

Financial inclusion

Overview

Financial inclusion, or access by individuals and businesses to quality and affordable financial products and services that meet their needs, is an increasingly common goal among policymakers. Measuring financial inclusion can help policymakers assess the current state of financial inclusion, set goals, and monitor progress towards achieving them. Moreover, research on these topics helps build our understanding of how interventions can support financial inclusion efforts. Data on financial inclusion has grown and improved in quality over the past decade.

To measure financial inclusion, researchers usually collect data from users (demand-side) or providers (supply-side). On the demand-side, surveys can capture an individual’s, household’s, or business’s access to and use of financial services. One challenge with surveys is that they rely on self-reported data: respondents may not remember some financial decisions and may not want to provide accurate responses given the sensitivities of talking about one’s finances. On the supply-side, researchers can use administrative data from financial service providers or regulators. However, one limitation is that data from a single financial institution does not provide a full picture of an individual’s financial behavior, particularly if they use multiple institutions or do not use one at all, a case which is particularly common for low-income individuals. Finally, there are efforts to develop indices that capture the many dimensions of financial inclusion and aggregate measures of financial inclusion at a macro-level.

This section discusses various approaches to measuring financial inclusion: using survey data, using administrative or non-survey data, constructing indices of financial inclusion, and using aggregate/macroeconomic measures of financial inclusion.

- Mikaela Rabb and Sam Carter, former Senior Policy Associates for the J-PAL Finance sector

General resources

Data: The IPA Dataverse contains survey data and instruments from their series of Consumer Protection in Digital Finance Users Surveys, as well as survey instruments and data from many finance-related RCTs.
Brief: IPA’s Building resilience through financial inclusion: A review of existing evidence and knowledge gaps, by Danielle Moore, Zahra Niazi, Rebecca Rouse, and Berber Kramer (2019) -- A research brief that provides a review of research on financial inclusion, as well as areas for further research.
Paper: Financial inclusion – measuring progress and progress in measuring, by Thorsten Beck (2016) -- An overview of issues in measuring financial inclusion as well as a review of recent advances in methods and data availability. Discusses indicators at the individual, household, firm, and aggregate levels.
Blog post: 10 useful data sources for measuring financial inclusion, by Karina Broens Nielsen -- A short list of data sources for measuring financial inclusion.

Using survey/administrative data

Using survey data

Data: Measuring financial inclusion: The Global Findex Database, by Asli Demirgüç-Kunt and Leora Klapper (2012) -- An introduction to and first analysis of the World Bank’s Findex Database, which contains survey data on financial inclusion.
- See also the Global Findex Database
Paper: Measuring household usage of financial services: Does it matter how or whom you ask?, by Robert Cull and Kinnon Scott (2010) -- A randomized test of respondent effects on measures of financial inclusion; measures tabulated from a survey of the head of household are compared to those taken from all members of a household and those taken from a randomly-selected member of the household. [Gated published version]
Survey guide & questionnaire: IPA’s Financial health survey manual -- A guide to implementing IPA’s Financial Health Survey, which measures individuals’ access-to-funds, access-to-finance, and financial behavior. Includes the instrument itself.
Paper: Measuring financial health around the globe, by Lasse Brune, Dean Karlan, and Rebecca Rouse (2020) -- A companion paper to the above survey guide, it provides a brief literature review of measuring financial health, describes the creation of IPA’s Financial Health Survey, and provides results from a validation of the tool and advice for use.
For an example of research using financial inclusion measured through survey data, see Beck, Demirgüç-Kunt and Peria (2008), who measure barriers to financial inclusion [Gated].

Using administrative/non-survey data

Data & survey guide: Measuring financial access: 10 years of the IMF Financial Access Survey, by Marco Espinosa-Vega et al. (2020) -- An overview of the IMF’s Financial Access Survey (FAS), including a short literature review of measuring financial inclusion, a discussion of and guide to the instrument, and an introduction to future areas of research in financial inclusion measurement.
- See also the Financial Access Survey
Brief: IPA’s Social media usage by digital finance consumers: Analysis of consumer complaints in Kenya, Nigeria, and Uganda from July 2019 - July 2020, (2021) -- Demonstrates the feasibility of using AI to extract and analyze social media information on digital finance consumers.
For examples of research using financial inclusion measured through administrative data, see Mian (2006) [gated], who uses administrative data on loans, and Beck, Ioannidou, and Schäfer (2017), who use administrative data on lending policies and practices. [gated published version]

Constructing indices or using aggregate measures

Constructing indices

Paper: Index of financial inclusion – A measure of financial sector inclusiveness, by Mandira Sarma (2012) -- Introduces a new index for measuring financial inclusion by aggregating measures taken from banking sector indicators.
Paper: Assessing countries’ financial inclusion standing— A new composite index, by Goran Amidžić, Alexander Massara, and André Mialou (2014) -- Constructing index weights using factor analysis.
Paper: Measuring financial inclusion: A multidimensional index, by Noelia Cámara and David Tuesta (2014) -- Constructing index weights using principal components analysis.
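As a rough illustration of what "constructing index weights using principal components analysis" involves, the sketch below standardizes a few indicators, takes the first principal component of their correlation matrix, and uses its loadings as weights. It is a generic, single-stage PCA index and not the specific procedure of any paper listed above; the indicator names and values are hypothetical, and the leading eigenvector is approximated with simple power iteration so that only the Python standard library is needed.

```python
import math

def standardize(columns):
    """Z-score each indicator so that units of measurement do not drive the weights."""
    out = []
    for col in columns:
        mean = sum(col) / len(col)
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / len(col))
        out.append([(v - mean) / sd for v in col])
    return out

def correlation_matrix(z_cols):
    """Correlation matrix of already-standardized indicator columns."""
    n = len(z_cols[0])
    return [[sum(a * b for a, b in zip(ci, cj)) / n for cj in z_cols] for ci in z_cols]

def first_component(matrix, iterations=500):
    """Approximate the leading eigenvector of a symmetric matrix by power iteration."""
    k = len(matrix)
    w = [1.0 / math.sqrt(k)] * k
    for _ in range(iterations):
        w = [sum(row[j] * w[j] for j in range(k)) for row in matrix]
        norm = math.sqrt(sum(v * v for v in w))
        w = [v / norm for v in w]
    return w

# Hypothetical country-level indicators (one list per indicator, one entry per country).
accounts_per_1000_adults = [10, 30, 55, 70, 90]
branches_per_100k_adults = [5, 12, 20, 33, 41]
atms_per_100k_adults = [20, 45, 60, 80, 95]

z = standardize([accounts_per_1000_adults, branches_per_100k_adults, atms_per_100k_adults])
weights = first_component(correlation_matrix(z))

# Index value per country: weighted sum of that country's standardized indicators.
index = [sum(w * col[i] for w, col in zip(weights, z)) for i in range(len(z[0]))]

print("weights:", [round(w, 3) for w in weights])
print("index:", [round(v, 3) for v in index])
```

Because the indicators are standardized first, the resulting index is centered at zero and is only meaningful for ranking or comparing units within the same sample. Factor analysis, as in the Amidžić et al. entry, is a related but distinct approach to deriving weights.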

Aggregate/macroeconomic measures

Paper: Household financial assets in the process of development, by Patrick Honohan (2006) -- Discusses various sources of data on financial inclusion and introduces an econometric method for estimating a country’s level of financial inclusion.
Data: Section IV of the World Bank’s Financial inclusion strategies reference framework, by Douglas Pearce and Claudia Ruiz Ortega (2012) -- Discussion of measuring financial inclusion followed by a list of available surveys and indicators.
Paper: Discussing measures of financial inclusion for the main euro area countries, by Giorgio Nuzzo & Stefano Piermattei (2019) -- A discussion of various indices of financial inclusion and their applicability for euro area countries; further introduces the diffusion of electronic cards into the indices and tests the results using survey data. [Gated]

Gender

Overview

Outcomes that indicate progress in the sphere of gender equality are often difficult to define and measure. For example, abstract concepts such as women’s agency and empowerment are difficult to define, making it challenging to identify indicators that capture meaningful changes in them. Indicators may additionally need to be tailored to the local context to accurately measure women’s agency or empowerment in any given region (Donald et al., 2017), while self-reported indicators of gender equality or empowerment, such as individuals’ gender attitudes, may be subject to reporting bias. For instance, participants may report having more progressive attitudes than they actually do if they believe that is aligned with what the surveyor wants to hear or with generally accepted social norms (J-PAL Women’s empowerment measurement guide). Due to intra-household power dynamics, women may answer questions differently than men, or may answer differently depending on who is present for the interview (Goldstein, 2011). In addition, researchers and practitioners must always consider the potential unintended effects and ethical implications of collecting sensitive information related to gender, such as information on women’s experience of gender-based violence (IPA’s GBV survey guide).

Some important questions that research on measuring gender-related outcomes attempts to answer are: i) how are some abstract concepts related to gender equality, such as women’s agency and empowerment, defined and measured? ii) what are some standardizable indicators of gender-related outcomes that can also be tailored to local contexts? iii) how can researchers and practitioners design survey instruments to reduce reporting bias and gather accurate information related to gender? iv) what are some ethical issues that researchers and practitioners must consider while collecting sensitive information related to gender?

This section includes an array of resources that provide answers to the above questions. The section begins with a list of repositories of measures of women’s agency and empowerment (e.g., EMERGE website, UCSD) and examples of questionnaires and survey instruments that minimize bias and help gather accurate information (e.g., appendices of the J-PAL Women’s empowerment measurement guide; DHS women’s status module). It also lists several resources on defining and measuring important outcomes related to women’s agency (e.g., Kabeer, 1999; Donald et al., 2017), household decision-making power (e.g., Glennerster and Walsh, 2017; Doss and Quisumbing, 2018), women’s economic empowerment (e.g., Glennerster and Diaz-Martin, 2017; Anand et al., 2019), and women’s empowerment in agriculture (e.g., Koolwal, 2019; Lambrecht et al., 2017). This section also includes resources that delve into various sources of bias in surveys (e.g., Goldstein, 2013; Jayachandran, 2017) and others on ethical considerations when collecting sensitive information (e.g., IPA’s GBV Survey Guide; WHO and PATH GBV Guide).

Gender issues cut across many different sections. While we try to provide a comprehensive overview of measurement issues related to gender, it may be advisable to refer to particular sections to learn more about how to measure gendered experience in a particular field. Also note that guides on measurement of gender-based discrimination are included in the larger section on measuring discrimination.

- Yvette Ramirez, Policy Manager for the J-PAL Gender sector

Data Sources

EMERGE website, UCSD -- EMERGE records and tests different measures of women’s empowerment. This includes measures of autonomy, outcomes, agency, etc.
Women’s Empowerment in Agriculture Index (WEAI), International Food Policy Research Institute -- Survey-based index designed to measure the empowerment, agency, and inclusion of women in the agricultural sector.

Questionnaires

Survey guide: Appendix 1: Examples of survey questions related to women’s empowerment and Appendix 2: Examples of non-survey instruments and tips on using them to A practical guide to measuring women's and girls' empowerment in impact evaluations, by Rachel Glennerster, Claire Walsh, and Lucia Diaz-Martin (2018) -- Guides on survey and non-survey methods for measuring indicators in the realms of economic, social, intimate partner and family, political and civic, psychological, education, and health.
Questionnaire: DHS women’s status module -- Module on measuring the role of women in the household.

Empowerment, autonomy, agency, and household decision-making

Agency, autonomy, economic empowerment and decision-making are interwoven with each other and form part of the much larger construct of female empowerment. As such, the allocation of papers into buckets is necessarily somewhat arbitrary.

Empowerment

Survey guide: A practical guide to measuring women's and girls' empowerment in impact evaluations, by Rachel Glennerster, Claire Walsh, and Lucia Diaz-Martin (2018) -- Presents the challenges of measuring empowerment and provides tools that help design research and aid data collection.
Blog post: We talk a lot about empowerment, but how do we measure it?, by Jed Friedman (2011) -- An overview of a few papers that measure women’s autonomy.
Book: Measuring women’s economic empowerment: Lessons from South America, edited by Susana Martinez-Restrepo and Laura Ramos-Jaimes; International Development Research Centre (2017) -- A guide on evaluating women’s agency in making economic decisions and measuring economic outcomes based on research in South America. Refer to “Chapter 9: Methodological Considerations in Measuring Women’s Economic Empowerment in South America” (p. 131) for a detailed discussion of measurement issues.
Paper: Resources, agency, achievements: Reflections on the measurement of women’s empowerment, by Naila Kabeer (1999) -- Addresses methodological issues with respect to measuring women’s material freedom, freedom of decision making, and achievements. [Gated published version]
Paper: Power vs money: Alternative approaches to reducing child marriage in Bangladesh, a randomized control trial, by Nina Buchmann, Erica Field, Rachel Glennerster, Shahana Nazneen, Svetlana Pimkina, and Iman Sen (2018) -- Measures the impact of an empowerment program on observed outcomes for girls and women. It also provides a good discussion of the researchers’ methodology.

Agency

Paper: Measuring women’s agency, by Aletheia Donald, Gayatri Koolwal, Jeannie Annan, Kathryn Falb, and Markus Goldstein (2017) -- Evaluates different methodologies for measuring agency against a multi-dimensional framework of agency as goal-setting, ability to achieve goals, and acting on goals. Also assesses how each method adapts to the sub-Saharan context.
Paper: Using machine learning and qualitative interviews to design a five-question women’s agency index, by Seema Jayachandran, Monica Biradavolu and Jan Cooper (2021) -- Creates a five-question index of women’s agency. The questions are chosen based on their correlation to coded qualitative interviews of women in Haryana, India. The paper also provides a short literature review of the different methods used to measure agency.
Paper: The SWPER index for women's empowerment in Africa: development and validation of an index based on survey data, by Ewerling et al. (2017) -- Based on Demographic and Health Survey data on partnered women from 34 African countries, the SWPER index includes 15 questions on three dimensions of empowerment: attitudes to violence, social independence, and decision making.

Household decision-making

Blog post: Is it time to rethink how we measure women’s household decision-making power in impact evaluations?, by Rachel Glennerster and Claire Walsh (2017) -- Critiques the use and applicability of common survey questions on decision making.
Book section: Gender, household behavior, and rural development, by Cheryl Doss and Agnes Quisumbing (2018) -- Reviews literature on intra-household resource sharing among men and women.
Paper: Understanding the measurement of women’s autonomy: Illustrations from Bangladesh and Ghana, by Gregory Seymour and Amber Peterman (2017) -- Evaluates the validity of indicators to measure household decision making by women.
Paper: Measuring women’s decision making: Indicator choice and survey design experiments from cash and food transfer evaluations in Ecuador, Uganda, and Yemen, by Amber Peterman et al. (2015) -- Reviews the effectiveness and accuracy of different decision making indicators, created to measure empowerment in the context of cash and food transfer programs in Ecuador, Yemen, and Uganda.
Paper: Intrahousehold bargaining and resource allocation in developing countries, by Cheryl Doss (2013) -- Reviews the framework and the evidence on intrahousehold bargaining power of women.

Economic empowerment and labor force participation

Blog post: What does women’s labor force participation really tell us about women’s empowerment?, by Rachel Glennerster and Lucia Diaz-Martin (2017) -- Evaluates whether labor force participation is linked to empowerment and proposes other measures that may be a better indicator of empowerment.
Paper: Practical tools and frameworks for measuring agency in women’s economic empowerment, by Mansi Anand, Anna Mecagni, and Maryam Piracha (2019) -- Reviews different tools that have been used to measure women’s agency in programs that promote economic empowerment.
Paper: Sensitivity of work participation rates to survey design, by Neerad Deshmukh, Sonalde Desai, Santanu Pramanik, and Dinesh Kumar Tiwari (2020) -- Examines different measurement tools for capturing women’s participation in the labor market.

Measuring women’s employment and empowerment in agriculture

Paper: Improving the measurement of rural women's employment: Global momentum and survey research priorities, by Gayatri Koolwal (2019) -- Provides a framework for issues to consider when measuring rural women’s labor force participation using data from Ethiopia, Malawi, Nigeria, and Uganda Living Standards Measurement Study-Integrated Surveys on Agriculture and pilot data from the International Labour Organization.
Paper: Changing gender roles in agriculture, by Isabel Lambrecht, Monica Schuster, Sarah Asare, and Laura Pelleriaux (2017) -- Documents changes in gender patterns in land, property rights, market participation, agricultural inputs, and employment.

Effect of intra-household differences on measurement

Paper: Moving from the household to the individual: Multidimensional poverty analysis, by Ramya M. Vijaya, Rahul Lahoti, and Hema Swaminathan (2014) -- Differences between using household-level and individual-level measures of poverty. A new measure to examine resource allocation within households.
Paper: Husband-wife survey responses in Malawi, by Kate Miller, Eliya Msiyaphazi Zulu, and Susan Cotts Watkins [Gated]. A blog post about the article: Getting to better data: Do men say yes more often than women?, by Markus Goldstein (2013) -- The difference between the responses of husbands and wives.
Blog post: Love and secrets, by Markus Goldstein (2011) -- Differences in expenditure reporting by spouses.

Gender-based violence

Survey guide: Innovations for Poverty Action (IPA)’s The safe and ethical conduct of violence research -- Policies and best practices for conducting surveys for violence research (gender based or otherwise).
Paper: The World Health Organization and Program for Appropriate Technology in Health (PATH)’s Researching violence against women: A practical guide for researchers and activists -- Practical guide for conducting research on gender-based violence in low-resource settings including guidance on study design, collecting data, and presenting results.
Paper: Method matters: The underreporting of intimate partner violence, by Claire Cullen (2022) -- Compares estimated intimate partner violence (IPV) reporting rates using a list experiment and standard survey methodology and discusses the magnitude and predictors of misreporting.
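For readers unfamiliar with list experiments, the sketch below shows the standard difference-in-means estimator under simple assumptions. A control group is read a list of non-sensitive statements, a treatment group is read the same list plus the sensitive statement, and respondents report only how many statements are true for them. The difference in mean counts estimates the prevalence of the sensitive item, which can then be compared with the rate obtained from direct questioning. All numbers here are hypothetical, and this is not the estimation code of any paper listed above.

```python
import math
from statistics import mean, variance

def list_experiment_estimate(control_counts, treatment_counts):
    """Difference-in-means prevalence estimate and its standard error."""
    estimate = mean(treatment_counts) - mean(control_counts)
    # Standard error for a difference in means of two independent samples.
    se = math.sqrt(
        variance(treatment_counts) / len(treatment_counts)
        + variance(control_counts) / len(control_counts)
    )
    return estimate, se

# Hypothetical item counts: control hears 3 statements, treatment hears the same 3
# plus the sensitive statement.
control = [1, 2, 2, 3, 1, 2, 2, 1, 3, 2]
treatment = [2, 2, 3, 3, 1, 3, 2, 2, 3, 2]

estimate, se = list_experiment_estimate(control, treatment)

# Hypothetical share answering "yes" when asked the sensitive question directly.
direct_report_rate = 0.20
underreporting_gap = estimate - direct_report_rate

print(f"list-experiment prevalence: {estimate:.2f} (SE {se:.2f})")
print(f"gap relative to direct reports: {underreporting_gap:.2f}")
```

List-experiment estimates are typically much noisier than direct questions, so realistic samples need to be far larger than this toy example, and the design assumes that adding the sensitive item does not change how respondents count the other items.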

Gender preferences

Blog post: Odds are you’re measuring son preference incorrectly, by Seema Jayachandran (2017) -- Discussion on survey questions for measuring preferences for sons and daughters.
- Paper: Fertility decline and missing women, by Seema Jayachandran (2017) -- Models the joint preference for sons and for a smaller family to evaluate how fertility decline, combined with sex-selective abortion, skews the sex ratio in Haryana.

Health

Overview

Field experiments in health economics help to answer a variety of questions related to the take-up and delivery of health products and services. From helping to better determine the factors that motivate individuals to adopt healthy behaviors to identifying innovations that improve the delivery of health services, this type of research is an important input to strengthening health systems and improving health outcomes around the world. Accurately measuring baseline, intermediate, and final health outcomes is a critical component of determining whether a given policy or program was effective. Some metrics, such as HIV prevalence, can be measured through relatively straightforward tests. But other outcomes are trickier to measure. For instance, child malnutrition is a key predictor of mortality. What is the best measure of malnutrition rates? Height-for-age, weight-for-height, mid-arm circumference, iron deficiency anemia, and more can all be appropriate in certain situations. Which one should a researcher choose given the context and their research questions? Use of modern contraceptives is an important measure of fertility, but respondents may be tempted to report regular use, even if this is untrue, if they feel they should be using them. How can researchers avoid this type of desirability bias?

This section, categorized according to outcomes and health conditions, compiles resources to guide researchers through these and other health measurement challenges. Produced by experts including the World Health Organization, UNICEF, and pioneering researchers in the field, these resources range from survey design guides to best practices for measuring tricky outcomes. In instances where multiple metrics may be appropriate, they also provide suggestions on how to help determine the best indicator(s).

- Anupama Dathan, Policy Manager for the J-PAL Health sector

General resources

Questionnaire: Model surveys from the Demographic and Health Surveys program -- Provides a high-level overview of the DHS’s four main questionnaires (Man, Woman, Household, and Biomarker), and provides links to current and past modules.
Book section: Health modules by Paul Gertler, Elaina Rose, and Paul Glewwe, in Designing household survey questionnaires for developing countries; World Bank, Volume 1: 177-216 (2000) -- An overview of indicators relevant for health policy, a discussion of survey methods used to capture those indicators, and annotated example questionnaire modules. Has a particular focus on LSMS methods.
Online course: J-PAL’s Measuring health outcomes in field surveys course -- Contains lectures and interactive material on all aspects of measuring health outcomes in field surveys: measuring individual and population health, selecting health indicators and measurement tools, questionnaire development, and practical and ethical issues for data collection.
Paper: The impact of recall periods on reported morbidity and health seeking behavior, by Jishnu Das, Jeffrey Hammer, and Carolina Sánchez-Paramo (2012) -- An experimental comparison of different recall periods on different reported health outcomes, including morbidity, doctor visits, time spent sick, and use of self-medication. Includes an exploration of the effects among different subgroups of the sample. [Gated published version]
Blog post: Quantifying the Hawthorne effect, by Jed Friedman and Brinda Gokul (2014) -- A compilation and short literature review of papers attempting to quantify the Hawthorne effect in health studies.

Health indicators

Conventional indicators

Broad overview: The World Health Organization’s Global reference list of 100 core health indicators (plus health-related SDGs), 2018 -- A list of 100 core health indicators compiled by the WHO, with a broad classification, a definition, possible disaggregations, preferred measurement technique, and preferred data source listed for each indicator.
Survey guide: Disability measurement in household surveys: A guidebook for designing household survey questionnaires, by Marco Tiberti and Valentina Costa (2020) -- An LSMS guidebook on measuring disabilities. Includes an exploration of the definition of a disability, a framework for measurement, and a comparison of three sets of modules: those by the Washington Group, by DHS, and by the WHO. Includes sample modules.

Composite measures

Paper: Measuring the health of populations: explaining composite indicators, by Adnan A. Hyder, Prasanthi Puvanachandra, and Richard H. Morrow (2012) -- A review of the main composite health measures, including DALYs, HeaLYs, and QALYs, their construction, and their advantages and limitations.

Anthropometric data

Paper: The National Academy of Sciences’ Biological and clinical data collection in population surveys in less developed countries (2000) -- A discussion of the logistics, ethics, and usefulness of adding anthropometry measures to large-scale household surveys and censuses; contains disease-specific discussion and guidance.
Book section: Anthropometry modules by Harold Alderman, in Designing household survey questionnaires for developing countries; World Bank, Volume 1: 251-272 (2000) -- Sample anthropometry modules, along with a discussion of when to include one in a larger health module, and for which members of the household.
Survey guide: The FANTA project’s Guide to anthropometry: A practical tool for program planners, managers, and implementers, by Kristen Cashin and Lesley Oot (2018) -- A comprehensive practical guide to anthropometry for four groups: children 0-5 years of age, children and adolescents 5 to 19, pregnant and postpartum women and girls, and adults 18 years and older. Contains example protocols and recommended equipment.
Paper: Methodologic issues in measuring physical activity and physical fitness when evaluating the role of dietary supplements for physically active people, by William L. Haskell and Michaela Kiernan (2000) -- The “Measurement Of Physical Activity And Physical Fitness” section provides an introduction to a number of indicators used to measure physical activity and fitness.

Early childhood development (general)

Questionnaire: The Caregiver Reported Early Childhood Development Instruments (CREDI) -- A set of instruments for measuring early childhood development that rely heavily on caregiver reporting and have been validated specifically for children ages 0 to 3 across developing countries. The site contains the instrument, an overview, and a user guide.
Survey guide: UNICEF’s Multiple indicator cluster surveys: Delivering robust data on children and women across the globe, by Shane Khan and Attila Hancioglu (2019) -- An overview of UNICEF’s MICS, including a list of available modules and discussions of sample selection and data quality.
Survey guide: A first initiative to create regionally comparative data on child development in four Latin American countries: Technical annex, by Aimee Verdisco et al. (2015) -- A retrospective on the PRIDI initiative of the IADB: includes a discussion of the development of the initiative and its implementation, as well as the final instrument used in gathering the data.

Early childhood development (cognitive)

Journal issue: ScienceDirect’s collection of articles on the Bayley-III Scale -- A collection of journal articles on the Bayley-III Scale, an instrument designed to assess the developmental functioning of infants, toddlers, and young children aged between 1 and 42 months; contains articles on the individual scales that make up the Bayley-III, as well as an international review of research that employs it and reviews of similar developmental tools.
Questionnaire: Tools from the U.S. Bureau of Labor Statistics:
- The Peabody Picture Vocabulary Test -- An overview of the Peabody Picture Vocabulary Test, which measures verbal ability and scholastic aptitude for individuals 2.5-40 years of age. Includes links to the instrument itself, its technical report, and similar cognitive development tools.
- The Home Observation for Measurement of the Environment (HOME) -- An overview of the Home Observation for Measurement of the Environment module, which measures the quality of a child’s home environment. Includes links to the instrument itself, its technical report, and similar tools from the BLS.

Early childhood development (physical)

Journal issue: WHO child growth standards, by Mercedes de Onis, Cutberto Garza, Adelheid W. Onyango, and Reynaldo Martorell (2006) -- A collection of articles that describe the WHO Child Growth Reference Study and the standards that came out of it; includes the standards themselves as well as multiple articles discussing the development and validity of the study. [Direct download]
Paper: Using height-for-age differences (HAD) instead of height-for-age z-scores (HAZ) for the meaningful measurement of population-level catch-up in linear growth in children less than 5 years of age, by Jef L. Leroy et al. (2015) -- Argues for the use of height-for-age differences (HAD) instead of height-for-age z-scores (HAZ) when examining child growth in the same population over time because of the cross-sectional nature of HAZ; tests both HAD and HAZ using DHS and COHORTS data.
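For readers new to the two measures compared in the Leroy et al. entry above, the definitions can be written side by side. This is a sketch in assumed notation (the symbols are not taken from the paper): h_i is child i's measured height in centimeters, and M and σ are the median and standard deviation of height in the growth standard for a child of the same age a_i and sex s_i.

```latex
\[
\mathrm{HAZ}_i = \frac{h_i - M(a_i, s_i)}{\sigma(a_i, s_i)},
\qquad
\mathrm{HAD}_i = h_i - M(a_i, s_i)
\]
```

The two measures differ only in the division by σ. Since the reference standard deviation changes with age, the same absolute height deficit in centimeters can correspond to different z-scores at different ages, which is the kind of comparison problem the paper examines.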

Nutrition

Survey guide: The World Health Organization’s Indicators for assessing infant and young child feeding practices: definitions and measurement methods (2021) -- A set of 17 recommended indicators for measuring IYCF and their definitions and measurement tips; contains example questionnaire modules, example surveyor instructions, and tips for adapting the instruments to specific contexts.
Survey guide: The Food Insecurity Experience Scale: Development of a global standard for monitoring hunger worldwide, by Terri J. Ballard, Anne W. Kepple, and Carlo Cafiero (2013) -- A discussion of the FAO’s Food Insecurity Experience Scale (FIES) and its development; includes a review of other food insecurity and hunger indicators, the questions that form the scale, and advice for translation and implementation.
Paper: What do we really know? Metrics for food insecurity and undernutrition, by Hartwig de Haen, Stephan Klasen, and Matin Qaim (2011) -- A comparison of three of the main ways to measure chronic food insecurity: the FAO indicator of undernourishment, childhood anthropometrics, and household food consumption surveys. Tests the correlation between the three measures and discusses each of their strengths and pitfalls. [Gated published version]
Paper: Development, validation and utilisation of food-frequency questionnaires – a review, by Janet Cade, Rachel Thompson, Victoria Burley, and Daniel Warm (2002) -- A literature review of food-frequency questionnaires and their development and validation.

Sexual and reproductive health

Paper: Sex, lies, and measurement: Consistency tests for indirect response survey methods, by Erica Chuang, Pascaline Dupas, Elise Huillery, and Juliette Seban (2021) -- Introduces internal consistency tests for list randomization and randomized response technique, and applies them in data collection on sexual and reproductive health in Côte d’Ivoire and Cameroon. [Gated published version]
Book section: Fertility modules by Indu Bhushan and Raylynn Oliver in Designing household survey questionnaires for developing countries; World Bank, Volume 2: 31-48 (2000) -- Example and annotated short and long questionnaires for gathering fertility data in household (particularly LSMS) surveys, as well as recommendations for implementation.
- For an example of research measuring sexual and reproductive health, see Fertility decline and missing women, by Seema Jayachandran (2017).
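For readers unfamiliar with list randomization, one of the indirect response methods examined in the Chuang et al. entry above, the usual prevalence estimate is a difference in means between two randomly assigned groups. The sketch below illustrates only that textbook estimator, not the consistency tests the paper introduces; the function name and all response data are hypothetical.

```python
def list_experiment_prevalence(control_counts, treatment_counts):
    """Estimate the prevalence of a sensitive item from a list experiment.

    control_counts: number of items endorsed by each respondent shown only
        the non-sensitive items.
    treatment_counts: number of items endorsed by each respondent shown the
        same list plus the sensitive item.
    Assumes random assignment to the two lists and that adding the sensitive
    item does not change answers to the other items.
    """
    if not control_counts or not treatment_counts:
        raise ValueError("both groups need at least one respondent")
    mean_control = sum(control_counts) / len(control_counts)
    mean_treatment = sum(treatment_counts) / len(treatment_counts)
    # The difference in mean counts is attributed to the sensitive item.
    return mean_treatment - mean_control


# Hypothetical responses, for illustration only.
control = [1, 2, 2, 3]
treatment = [2, 2, 3, 3]
estimate = list_experiment_prevalence(control, treatment)
print(f"Estimated prevalence: {estimate:.2f}")
```

Because no individual respondent reveals an answer to the sensitive item, the method yields a group-level estimate only, and it is typically noisier than a direct question asked of the same sample.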

Healthcare quality

Healthcare quality/patient satisfaction

Blog post: Pitfalls of patient satisfaction surveys and how to avoid them, by David Evans (2018) -- A summary of two papers studying the validity and reliability of patient satisfaction surveys; discusses the results from the papers and provides advice on how to improve patient satisfaction surveys, including links to two recommended instruments.
Paper: Bias in patient satisfaction surveys: a threat to measuring healthcare quality, by Felipe Dunsch, David K. Evans, Mario Macis, and Qiao Wang (2018) -- A test of whether positive framing in agree/disagree questions in patient satisfaction surveys biases estimates upward.
Paper: Which doctor? Combining vignettes and item response to measure doctor quality, by Jishnu Das and Jeffrey Hammer (2005) -- Introduces a new method for measuring doctor quality--combining survey vignettes and Item Response Theory (IRT)--and tests the method using data from urban India. [Gated published version]

Using audits and mystery shoppers

Blog post: Mystery clients in development research, by David Evans (2015) -- A compilation and review of papers using mystery clients in development research, with the majority focusing on health.
- For examples of health research using mystery shoppers, see Hetzel et al. (2008), who use them in medical referrals, and Fitzpatrick (2015), who uses them in drug markets.
Paper: Missing in action: Teacher and health worker absence in developing countries, by Nazmul Chaudhury, Jeffrey Hammer, Michael Kremer, Karthik Muralidharan, and F. Halsey Rogers (2006) -- A review of surveys recording teacher and health worker absence during unannounced visits in six developing countries.

Housing stability and homelessness

Overview

Housing instability is both a function of and a catalyst for poverty. Maintaining stable housing is a necessary prerequisite in many cases for health, employment, education, and a host of other fundamental needs. The scope and complexity of housing instability and homelessness highlight the need for rigorous evidence on the effectiveness of strategies to prevent and reduce homelessness. A first step in generating this evidence is defining and measuring homelessness and housing instability adequately.

Unfortunately, the measurement of housing instability is complicated by the variety of definitions in use and the lack of a widely established measurement system. For instance, in the United States, children who share housing with others (living “doubled up”) qualify for assistance under some programs, but not others. Moreover, the scope of people experiencing homelessness can vary by orders of magnitude depending on which definition one uses; including children who are living in doubled up conditions increases estimates of the number of children experiencing homelessness by a factor of 10 from the standard “point in time” (PIT) count. An emerging literature looks at how to measure housing stability using techniques for reaching mobile populations and consumer reference data (e.g. Phillips (2020); Kalton (2001)).

Further challenges arise in counting people who are unsheltered (those sleeping outside or in places not meant for human habitation), who are typically part of the PIT count in the United States; some studies have found that PIT counts can understate the rate of unsheltered homelessness by as much as 50 percent (e.g. Evans, Phillips, and Ruffini (2019)).

The resources included below cover survey and administrative data methods for counting people experiencing housing instability and homelessness, on topics ranging from oversampling to ensure adequate representation of minority groups to methods for including hard-to-reach subpopulations. Reflecting J-PAL’s internal expertise, the resources below center on housing instability and homelessness in the United States; we welcome suggestions for additional resources to include, particularly those based in or relevant to other countries.

- Rohit Naimpally, Senior Research and Policy Manager, for J-PAL's Reducing and Preventing Homelessness Initiative

General resources

Broad overview: Measuring homelessness and the extent of the problem, in Reducing and preventing homelessness: A review of the evidence and charting a research agenda, by William N. Evans, David C. Phillips, and Krista Ruffini (2019) -- Section 2 provides an overview of the different methods for measuring homelessness and associated challenges. It also defines homelessness and discusses methods to reduce it.
Survey guide: U.S. Department of Housing and Urban Development’s A guide to counting unsheltered homeless people (2008) -- Defines homelessness, reviews some common methods for measuring it and presents some common issues in the US context.
Paper: Defining and measuring homelessness, by Volker Busch-Geertsema (2014) -- A review of the definitions, measurement and prevalence of homelessness in the EU.

Using administrative data

Paper: The potential of linked administrative data for advancing homelessness research and policy, by Dennis P. Culhane (2016) -- Discusses how linked administrative data can help identify welfare programs used by homeless populations and uncover gaps that need to be addressed to reduce homelessness.
Paper: Measuring housing stability with consumer reference data, by David C. Phillips (2020) -- Uses home addresses in consumer data to track changes in housing for low-income groups. [Gated published version]
Broad overview: 2016 AHAR: Part 2 - Estimates of homelessness in the U.S., by Larry Buron, Tom McCall, and Claudia D. Solari (2017) -- Measuring homelessness using administrative data (in this case the U.S.’s Homeless Management Information System (HMIS)).

Survey methods

Broad overview: Sampling and weighting a survey of homeless persons, by Pascal Ardilly and David Le Blanc (2001) -- Uses weight sharing to correct for duplication.
Broad overview: Methods for oversampling rare subpopulations in social surveys, by Graham Kalton (2009) -- Statistical framework for oversampling rare subgroups in broad surveys.
Broad overview: Practical methods for sampling rare and mobile populations, by Graham Kalton (2001) -- The statistical framework for sampling methods that can be used for measuring homelessness.
Book section: Towards a strategy for counting the homeless, by Tracy Peressini, Lynn McDonald, and J. David Hulchanski in Finding home: Policy options for addressing homelessness in Canada; Canadian Observatory on Homelessness, Toronto (2010) -- Argues in favor of service-based methods, compares them with other methods, and presents an example.

Capture-recapture methods (plant capture, mark-recapture etc.)

Broad overview: Can we measure homelessness? A critical evaluation of ‘Capture-Recapture’, by Malcolm Williams (2010) -- Discusses the possibilities and limitations of the “capture-recapture” method for measuring homelessness.
- Examples of the capture-recapture method include: Berry (2007); D'Onise, Wang, and McDermott (2007) [gated]; and Fisher, Turner, Pugh, and Taylor (1994).
Paper: Estimating numbers of unsheltered homeless people through plant-capture and postcount survey methods, by Kim Hopper, Marybeth Shinn, Eugene Laska, Morris Meisner, and Joseph Wanderling (2008) -- Testing PIT estimates against two other counting methods.
- For a similar paper on the 1990 Census, see Issues in the use of a plant-capture method for estimating the size of the street dwelling population by Elizabeth Martin, Eugene Laska, Kim Hopper, Morris Meisner, and Joe Wanderling (1997).
Paper: Bayesian estimation of the size of a street-dwelling homeless population, by Lawrence C. McCandless, Michelle L. Patterson, Lauren B. Currie, Akm Moniruzzaman, and Julian M. Somers (2016) -- Creating confidence intervals for homeless populations using plant-capture techniques in Edmonton.
Paper: Estimating the prevalence of hard‐to‐reach populations: The illustration of mark‐recapture methods in the study of homelessness, by Ian Shaw, Michael Bloor, Richard Cormack, and Howard Williamson (1996) -- Using mark-recapture methods to estimate the size of the homeless population. [Gated]
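As a rough illustration of the idea behind the capture-recapture resources above, the sketch below implements the textbook two-sample Lincoln-Petersen estimator and Chapman's small-sample correction. It is not taken from any of the listed papers, which use more elaborate designs (for example, plant-capture counts and Bayesian models); the function names and the counts in the example are hypothetical.

```python
def lincoln_petersen(marked_first: int, caught_second: int, recaptured: int) -> float:
    """Two-sample population size estimate: N = (n1 * n2) / m.

    marked_first:  people identified ("marked") in the first count (n1)
    caught_second: people identified in the second count (n2)
    recaptured:    people who appear in both counts (m)
    """
    _validate(marked_first, caught_second, recaptured)
    if recaptured == 0:
        raise ValueError("estimator is undefined when no one is recaptured")
    return marked_first * caught_second / recaptured


def chapman(marked_first: int, caught_second: int, recaptured: int) -> float:
    """Chapman's variant, which is less biased in small samples and
    remains defined when there are no recaptures."""
    _validate(marked_first, caught_second, recaptured)
    return (marked_first + 1) * (caught_second + 1) / (recaptured + 1) - 1


def _validate(n1: int, n2: int, m: int) -> None:
    # Recaptures cannot exceed the size of either count.
    if min(n1, n2, m) < 0 or m > min(n1, n2):
        raise ValueError("counts must be non-negative and m <= min(n1, n2)")


# Hypothetical counts, for illustration only.
print(lincoln_petersen(200, 150, 30))   # 1000.0
print(round(chapman(200, 150, 30), 1))  # 978.1
```

Both estimators assume a closed population between the two counts and an equal chance of being counted for everyone, assumptions that are hard to satisfy for mobile populations and that several of the papers above discuss.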

Labor

Overview

While governments invest a lot of resources into active labor market policies, evidence on the effectiveness of many of these programs is inconclusive. Reliable micro- and macro-economic data on labor market outcomes is essential for policymakers to understand the needs in their labor markets and to assess the impact of their policies. Key areas for research include training people in skills demanded by the labor market, helping them search for work, and reducing discrimination.

An important challenge for research in the labor space is inconsistency in the ways researchers measure key labor market indicators such as work, employment, unemployment, inactivity, skills, ability, and productivity. Small differences in survey features--such as the design of a questionnaire, the length of the labor module, the way the survey is implemented or even the wording of a question--can have outsize effects on labor market statistics (Dillon et al. 2012). A second challenge is that reliable administrative data is limited in low-income countries given the prevalence of self-employment or seasonal work and the complexity of capturing migration. People living in low-income countries usually have a portfolio of formal and informal activities. Tracking their earnings can be a sensitive task.

Questionnaire modules using either single keyword questions, activity lists, or time diaries are often used. A common concern about the accuracy of labor data collected relates to statistics on women and youth. Indeed, these two population subgroups tend to engage more often in "atypical" types of work or in domestic work, which can lead to systematic under- or over-reporting depending on the wording of a question. Collecting accurate data on skills and job satisfaction is also challenging (Friedman, 2012). Cognitive skills measurements such as Raven and Stroop tests are usually reliable and consistent. Soft skills measures are usually more subjective because they are self-reported, though psychometric tests and exercises are increasingly being used to enhance the accuracy of those measures (Laajaj and Macours, 2017).

The International Labour Organization has published definitions of the key concepts in labor that serve as a reference for surveys. However, not all statistical offices across the world use those definitions (Desiere and Costa, 2019), and it is difficult to harmonize key labor outcome variables.

This section compiles resources that explore and discuss these issues, moving from general resources and measurement challenges to resources and tools for measuring skills and effort, productivity, and job satisfaction.

- Victoire Fribourg, Policy Associate, and Lisa Corsetto, Policy Manager, for the J-PAL Jobs and Opportunity Initiative

General resources

Book section: Employment modules by Julie Anderson Schaffner, in Designing household survey questionnaires for developing countries; World Bank, Volume 1: 217-250 (2000) -- Details the key policy concerns and the required data, and provides a few prototypes of employment modules.
Paper: Employment data in household surveys, by Sam Desiere and Valentina Costa (2019) -- Discusses the methodological challenges related to measuring employment indicators and reviews the different kinds of surveys used in different studies.

Bias and sensitivity in labor statistics

Paper: Conceptualizing the labor force: The underestimation of women's economic activities, by Lourdes Beneria (1981) -- Challenges the underlying beliefs about “work” and provides an ideological framework for understanding work based on welfare. [Gated]
Paper: Measuring labor market dynamics: Gross flows of workers and jobs, by Joseph A. Ritter (1993) -- Introduces the concept of labor flows and different ways to measure them.
Paper: Do labor statistics depend on how and to whom the questions are asked? Results from a survey experiment in Tanzania, by Pieter Serneels, Elena Bardasi, Kathleen Beegle and Andrew Dillon (2011) -- Conducts an experiment to measure the effect of survey design on key labor force statistics. [Gated published version]
Paper: Explaining variation in child labor statistics, by Andrew Dillon, Elena Bardasi, Kathleen Beegle, and Pieter Serneels (2012) -- Examines the effect of survey design on the measurement of child labor by comparing the results from two modules to a self-reported measure. The variation in results leads them to argue that the concept of work can be made more explicit to respondents through small changes in the questionnaire. [Gated published version]
Paper: Sensitivity of work participation rates to survey design, by Neerad Deshmukh, Sonalde Desai, Santanu Pramanik, and Dinesh Kumar Tiwari (2020) -- Examines different measurement tools for capturing female participation in the labor market.

Skills and effort

Paper and questionnaire: Measuring skills in developing countries, by Rachid Laajaj and Karen Macours (2019) -- Using psychometric tests in developing countries, with an application to agriculture. [Gated published version]
- The online appendix contains the questionnaires and discusses the construction of the paper’s various indices.
Questionnaire: The annex of the World Bank’s STEPSkills: Overview and initial results presentation (2010), which contains survey modules.
Survey guide: Measuring non-cognitive skills: Psychometric validation of scales, by Marta Grabowska -- Provides an overview of the different tools used to validate scales used to measure non-cognitive skills. It also provides sample Stata code.
Paper: Reducing information asymmetries in the youth labor market of Jordan with psychometrics and skill based tests, by Matthew Groh, David McKenzie, and Tara Vishwanath (2015) -- Psychometric tests to measure soft skills. The online appendix (pp. 14-16) provides further details about the tests. [Gated published version]
- Associated blog post: Hard measurement of soft skills, by David McKenzie (2014).
Paper: The selection of talent: Experimental and structural evidence from Ethiopia, by Girum Abebe, Stefano Caria, and Esteban Ortiz-Ospina (2021) -- Uses measures of worker talent to estimate the effect of labor market frictions on recruitment. [Gated published version]
- The appendix contains detailed information on their measurement tools.

Productivity

Journal issue: The World Bank productivity project (2021) -- Includes papers on measuring agricultural, labor, and national productivity.
Paper: Understanding productivity: Lessons from longitudinal microdata, by Eric J. Bartelsman and Mark Doms (2000) -- Reviews papers on the distribution of productivity across firms, factors driving productivity, and changes over time using firm-level data. [Gated published version]
- See Akogun et al. (2017) for an investigation of the correlation between physical activity and productivity using firm-level data.

Job satisfaction

Blog post: Job satisfaction matters…and the measurement of job satisfaction matters, by Jed Friedman (2012) -- Summarizes recent work on job satisfaction and challenges associated with its measurement.
Paper: Reliability and validity of instruments measuring job satisfaction -- A systematic review, by N. van Saane, J. K. Sluiter, J. H. A. M. Verbeek, and M. H. W. Frings‐Dresen (2003) -- Reviews the reliability and validity of several instruments for measuring job satisfaction in addition to discussing their ability to measure responsiveness.

Microenterprises and firms

Overview

Firms do not simply provide goods and services for the economy; they also generate jobs and secure income for workers. Firm-related policies can potentially have large impacts on poverty alleviation through quality employment. However, firms are an understudied area in the experimental economics literature and in policy analysis. Part of the challenge stems from the onerous cost of conducting rigorous firm-level research at scale, while additional constraints persist on the data, measurement, and methodological fronts. These challenges are more pronounced for low- and middle-income countries and the informal sector, given the scarcity of existing reliable data and more nuanced measurement challenges. Therefore, it is important for researchers to leverage and build on existing firm-level datasets before they embark on their own data collection efforts.

There are three central questions on how to measure the performance of microenterprises and firms, and their contribution to economic development:

What are some of the most commonly used firm-level datasets that are currently available, and how have researchers been using them to study the role of firms in development?
What are the most accurate and cost-effective techniques to obtain representative and comparable firm-level indicators across countries and time?
How can researchers leverage innovative survey instruments and data sources to measure firm characteristics and outcomes that are not reflected in existing official statistics or directly observable (e.g., entrepreneurial activity, business and management practices, balance sheet data, productivity, etc.)?

This section compiles resources that explore and discuss these issues, starting with an overview of existing datasets, followed by a general description of measurement resources, and concluding with more specific sections regarding the measurement of profits, inventories, business practices, and entrepreneurship.

- Siena Harlin, Senior Policy & Communications Associate, and Daniela Muhaj, Senior Policy & Research Associate, for the J-PAL Firms sector

Datasets

As noted in the introduction to this section, collecting new, high-quality data on firms can be complex and expensive; further, the universe of existing data on firms across the world is rich and varied. Therefore, before moving on to sections more in line with the structure of the rest of the resource, we provide a non-comprehensive introduction to some of the already-available datasets and data sources that researchers have previously used to conduct research on firms.

Accounts data

A few organizations, most prominently Dun & Bradstreet and Bureau van Dijk (Orbis), maintain proprietary datasets on public and private companies and entities across the globe. While the exact variables that the organizations track may differ, the datasets work by assigning each firm in their database a unique identifier and then tracking harmonized statistics.

Dataset: Dun & Bradstreet’s Data Cloud -- A collection of data on firms’ characteristics, including credit scores and ratings, trade experiences, corporate family trees and beneficial ownership, banking data, business registrations, relevant contacts and principals, and buyer intent.
- See the accompanying booklet for more information [direct download]
Dataset: Bureau van Dijk’s Orbis database -- A database with firms’ characteristics, including financial strength metrics, patents and intellectual property, corporate family trees and beneficial ownership, royalty agreements, data on individuals associated with the firms, and marine vessels.
- The accompanying brochure contains more information [direct download]
- How to construct nationally representative firm level data from the Orbis global database: New facts and aggregate implications, by Sebnem Kalemli-Ozcan et al. (2019) -- Provides guidance on using Orbis data to construct nationally representative data on firms.
- See Why do firms in developing countries have low productivity? by Nicholas Bloom, Aprajit Mahajan, David McKenzie, and John Roberts (2010) for an example of research using the Orbis database.

Data from statistical agencies

Data on firms from governmental statistical agencies generally comes in two forms: survey data and administrative records. Examples of the former include the Small Business Pulse Survey and the Annual Survey of Entrepreneurs from the U.S. Census; these surveys aim for representativeness as opposed to completeness, and generally provide rich data on a specific aspect of firms in their country. Administrative records, like Brazil’s RAIS, generally have much larger and more comprehensive samples, but may contain less or less-specific data. Specific examples include:

Datasets: Business surveys from the U.S. Census -- A list of the business surveys conducted yearly by the U.S. Census; ranges from the Small Business Pulse Survey to the Annual Survey of Entrepreneurs to Export Statistics.
Dataset: RAIS (Brazil) -- Socioeconomic data on employees at firms in the formal sector in Brazil.
- Ricardo Dahis provides Stata code for cleaning the dataset from 1985 to 2018.
- See Corruption and firms, by Emanuele Colonnelli and Mounu Prem (2021) for an example of research using RAIS.

VAT data

Tax registry data, especially data on Value Added Taxes (VATs), have become prominent in the study of firm behavior, dynamics, and outcomes. VAT data is especially valuable because the taxes generally involve a “paper trail” from the firms selling raw materials to the final retailer of a good or service, which helps in studying firm linkages, buyer-supplier networks, and tax compliance. Specific examples of research using VAT data include:

Broad overview: Real-time VAT data: Applications for monetary policy and financial stability, by Elias Albagli and Antonio Martner (n.d.) -- A set of examples from the Central Bank of Chile on the types of research made possible through the use of Chile’s VAT data.
- Paper: See No taxation without information: Deterrence and self-enforcement in the Value Added Tax, by Dina Pomeranz (2015) for an example of RCT research using Chilean VAT data
J-PAL evaluation summary: Social incentives and tax compliance in Bangladesh, by Mushfiq Mobarak, Raj Chetty, Monica Singhal, Nasiruddin Ahmed, and Aminur Rahman
Paper: Capitalists in the twenty-first century, by Matthew Smith, Danny Yagan, Owen Zidar, and Eric Zwick (2019) -- The authors link deidentified firm- and individual-level tax data from the U.S. [Gated published version]
Evaluation summary: Prices, firm-to-firm trade, and import substitution in Rwanda by John Spray, Yuhei Miyauchi, and Jie Bai

Unstructured/alternate data

Beyond the standard forms of data described above, recent innovations in data collection, processing, and analysis have allowed researchers interested in firm dynamics to draw on new sources and types of data. In particular, recent research has used text, satellite, and network data to broaden the types of questions, outcomes, and models available to researchers studying firms.

Paper: The diffusion of disruptive technologies, by Nicholas Bloom, Tarek Alexander Hassan, Aakash Kalyani, Josh Lerner, and Ahmed Tahoun (2021) -- Uses text data from patents, earnings calls, and job postings in order to study the diffusion of technology.
Paper: The view from above: Applications of satellite data in economics, by Dave Donaldson and Adam Storeygard (2016) -- Describes various past, present, and potential future uses of satellite data in economic research, including using satellites to estimate property value and other firm assets.
Paper: Networks and trade, by Andrew B. Bernard and Andreas Moxnes (2017) -- Uses firm-to-firm transaction-level data to assess the pattern of trade and production networks; an example of applying network models to firm data.
Paper: Global labor flow network reveals the hierarchical organization and dynamics of geo-industrial clusters, by Jaehyuk Park et al. (2018) -- An example of using social network data in order to study firm dynamics.

Non-data resources

General

Website: David McKenzie’s website includes sample survey questions, research papers on measurement and survey design, and links to other resources.
Book section: Household enterprise modules, by Wim Vijverberg and Donald Mead in Designing Household Survey Questionnaires for Developing Countries (2000) -- Provides annotated example LSMS modules on measuring household enterprises and discusses basic data prerequisites when analyzing the modules.
Paper: Measuring income from family enterprises with household surveys, by Wim Vijverberg (1992) -- Discusses various methods for measuring income from family enterprises with household surveys, and tests these measurements using data from Living Standards Measurement Study (LSMS) surveys in Côte d'Ivoire and Ghana [Gated published version]
Paper: Measuring the return on household enterprise: what matters most for whom?, by Krislert Samphantharak and Robert Townsend (2012) -- Reviews survey methods for measuring returns on assets in household enterprises, and tests those methods against a more robust instrument, analysing which sub-topics matter the most in determining return on assets (ROA), and for which sub-groups the results are most sensitive [Gated published version]
Paper: A comparison of EPI sampling, probability sampling, and compact segment sampling methods for micro and small enterprises, by Li-Wei Chao et al. (2012) -- Tests three different methods of sampling small enterprises from unreliable or incomplete registries [Gated published version]
Paper: Call me maybe: Experimental evidence on frequency and medium effects in microenterprise surveys, by Robert Garlick, Kate Orkin, and Simon Quinn (2020) -- Tests the reliability of phone and in-person surveys when collecting microenterprise survey data [Gated published version]

Profits

Paper: Measuring microenterprise profits: Must we ask how the sausage is made? by Suresh de Mel, David McKenzie, and Christopher Woodruff (2009) -- A literature review of experiments conducted to test the presence and severity of data collection and validity issues in microenterprise profit data.
Paper: Measuring the unmeasured: aggregating, anchoring, and adjusting to estimate small business performance, by Stephen Anderson, Christy Lazicky, and Bilal Zia (2021) -- Introduces a new survey methodology for measuring business sales and profits by combining data triangulation and dynamic adjustment to increase the precision of estimates.
- See also AAA Survey Tool from the World Bank Development Research Group
Paper: Using PDA consistency checks to increase the precision of profits and sales measurement in panels, by Marcel Fafchamps, David McKenzie, Simon Quinn, and Christopher Woodruff (2012) -- A test of the feasibility of using Personal Digital Assistants (PDAs) to collect data on profits and sales in microenterprises, checking against cross-sectional survey data. [Gated published version]

Inventories

Paper: Radio Frequency (un)Identification: Results from a proof-of-concept trial of the use of RFID technology to measure microenterprise turnover in Sri Lanka, by Suresh de Mel, Dammika Herath, David McKenzie, and Yuvraj Pathak (2014) -- A test of the feasibility of using RFID technology to track microenterprise inventories; also tests recall survey questions.
- Associated blog post: A technology letdown: Efforts at using RFID technology to track microenterprise inventories, by David McKenzie (2014) -- A shorter summary of the above.

Business practices

Blog post: Measuring business practices in small firms, by David McKenzie (2015) -- A review of the measurement techniques used in the paper immediately below, as well as a short discussion of a few issues in measuring business practices and links to the survey instruments used.
- For research measuring business practices in small firms, see also McKenzie and Woodruff (2015).
Paper: What are we learning from business training and entrepreneurship evaluations around the developing world?, by David McKenzie and Christopher Woodruff (2014) -- Includes a section on measurement issues in evaluations of business training programs.

Entrepreneurship

Blog post: Measuring entrepreneurship, part (I) and part (II) by Markus Goldstein and Francisco Campos (2012) -- These two blog posts together form a short literature review of the issues involved in measuring entrepreneurship in developing countries, as well as many of the standard methods and recent innovations in entrepreneurship measurement.
Book section: Who are the microenterprise owners? Evidence from Sri Lanka on Tokman versus De Soto, by Suresh de Mel, David McKenzie, and Christopher Woodruff (2010), in International Differences in Entrepreneurship -- Contains a description of a microenterprise survey conducted in Sri Lanka that sought to gather data on the characteristics of microenterprise owners.
- For research using characteristics of entrepreneurs, see also de Mel, McKenzie, and Woodruff (2009), who measure innovation, and Cole, Sampson, and Zia (2010), who measure risk preferences.

Poverty, consumption, and income

Overview

Accurate and precise measures of income, consumption, assets, prices, and poverty are pivotal in ensuring that social programs are equitably and efficiently targeted and in the estimation of some of the most common primary outcomes of policy interventions. However, complicating this measurement are two larger facts. First, the measurement issues surrounding these five topics are highly, and to a certain extent mechanically, interlinked; measurement error in one can filter through to estimates of the others. Second, the indicators associated with each concept vary dramatically by context. Whereas indicators like the HIV-positivity rate or the concentration of CO2 in the atmosphere can generally be transferred across contexts, the goods and services that make up consumption bundles, income schemes, and asset classes vary over time and geography.

This section compiles resources to guide researchers through the above issues and other cross-cutting topics such as respondent and recall effects, but also delves into more specific topics. It is split into five subsections, each focused on one of the five topics above, and each starting with resources that form a general introduction to the measurement of that topic and proceeding with papers and guides that address specific challenges within it. The first, consumption, covers both general and food consumption and provides resources on issues like the effect of the level of reporting and length of the reference period (e.g., Beegle et al., 2012), differences in reporting from individuals vs. households (e.g., Sununtnasuk and Fiedler, 2017), and difficulties and tools associated with measuring hunger (e.g., Friedman et al., 2014). The subsection on assets guides readers through issues in, and innovative tools for, valuation (e.g., Kochar, 2000 and Marx, Stoker, and Suri, 2016), using assets in the estimation of other indicators such as poverty and inequality (e.g., Filmer and Pritchett, 2001 and McKenzie, 2005), and measuring asset indicators when there are incentives to misreport (e.g., individuals’ rights to land as in FAO et al., 2019).

The income subsection provides resources on income measurement in general but focuses on aspects of income for which reliable and precise measures are more difficult to obtain, including non-labor and informal-wage income (McKay, 2000) and income expectations (Hensel, n.d.). The subsection on prices contains discussion of common issues facing researchers collecting and using price data, including its variation by season (e.g., Gilbert et al., 2016) and by unobserved quality differences (e.g., McKelvey, 2011). Finally, the poverty subsection focuses on three larger issues: how best to estimate, predict, and target poverty (e.g., Elbers, Lanjouw, and Lanjouw, 2003; Brown, Ravallion, and van de Walle, 2018; and Banerjee et al., 2016); the extent and prevalence of questionnaire (e.g., Kilic and Sohnesen, 2017) and respondent (e.g., Silverio-Murillo, 2018) effects on poverty measures; and considerations for using adult equivalence scales for measuring child poverty (e.g., Ravallion, 2015).

- Jack Cavanagh, Senior Research, Education, and Training Associate, and Ximena Mercado Garcia, MSRP Intern

Consumption

General consumption resources

Book section: Consumption, by Angus Deaton and Margaret Grosh, in Designing Household Survey Questionnaires for Developing Countries; World Bank, Volume 1; 91-133 (2000) -- An overview of consumption measurement, with particular emphasis on LSMS methods.
Book section: Consumption modules, by Angus Deaton and Margaret Grosh, in Designing Household Survey Questionnaires for Developing Countries; World Bank, Volume 3 (2000) -- Example questionnaire modules for measuring consumption.
Paper: Guidelines for constructing consumption aggregates for welfare analysis, by Angus Deaton and Salman Zaidi (2002) -- Guide for creating household level aggregates of consumption, price indices for comparing those households, and adjustments based on household composition; contains sample Stata and SPSS code.
Paper: Household consumption: Research questions, measurement issues, and data collection strategies, by Luigi Pistaferri (2015) -- A context-specific overview of consumption surveys and their issues in the United States. [Gated published version]
Paper: Methods of household consumption measurement through surveys: Experimental results from Tanzania, by Kathleen Beegle, Joachim De Weerdt, Jed Friedman, and John Gibson (2012) -- An experimental comparison of eight different consumption measures in a low-income context. [Gated published version]
Paper: Improving consumption measurement and other survey data through CAPI: Evidence from a randomized experiment, by Bet Caeyers, Neil Chalmers, & Joachim De Weerdt (2012) -- An experimental comparison of consumption data gathered by PAPI and CAPI. [Gated published version]

Food consumption and hunger

Blog post: Challenges in counting the world’s hungry, by Jed Friedman, Joachim De Weerdt, Kathleen Beegle, and John Gibson (2014) -- An overview of measurement issues related to hunger, and a summary of a paper comparing the different survey methods.
Survey guide: World Bank LSMS guide to Food data collection in household consumption and expenditure surveys (2019) -- A general guide from the LSMS on food data collection at the household level; contains example questionnaire modules.
Survey guide: The use of non-standard units for the collection of food quantity: A guidebook for improving the measurement of food consumption and agricultural production in living standards surveys, by Gbemisola Oseni, Josefine Durazo, and Kevin McGee (2017) -- A tool-specific guide from the LSMS on how to use non-standard units of food (e.g., a “bunch” of bananas) for food consumption measurement; contains example questionnaire modules and images.
Paper: Decomposing response errors in food consumption measurement, by Jed Friedman, Kathleen Beegle, Joachim De Weerdt, and John Gibson (2016) -- A comparison experiment studying different measures of food consumption and decomposing their errors by consumption unit.
Journal Issue: Food Policy’s Special Issue on Consumption Measurement (2017), with an overview provided in the introductory article Food counts. Measuring food consumption and expenditures in household consumption and expenditure surveys (HCES), by Alberto Zezza, Calogero Carletto, John L. Fiedler, Pietro Gennari, and Dean Jolliffe (2017).
- Blog post: What do we measure when we measure food consumption?, by John Gibson and Alberto Zezza (2018) -- Provides a summary of the above journal issue, which contains papers covering diary vs. recall, food away from home (FAFH), individual vs. household consumption, and food acquisition vs. consumption.

Assets

General resources

Paper: Economic well-being: Concepts and measurement with asset data, by Matthew Wai-Poi, Seymour Spilerman, and Florencia Torche (2008) -- A comparison of different asset indices and weighting methods, including how they compare with other measures of economic well-being (consumption, income, wealth).
Paper: Methodological experiment on measuring asset ownership from a gender perspective, by Talip Kilic and Heather Moylan (2016) -- An experimental comparison of different methods of measuring individual-level asset ownership (as opposed to household ownership), with a particular emphasis on differences in ownership by gender.

Savings/wealth

Book section: Savings, by Anjini Kochar, in Designing Household Survey Questionnaires for Developing Countries; World Bank, Volume 1, 183-211 (2000) -- An overview of savings measurement, with a particular focus on LSMS methods. Includes example questionnaire modules.
Paper: Estimating wealth effects without expenditure data—or tears: An application to educational enrollments in states of India, by Deon Filmer and Lant H. Pritchett (2001) -- An explanation, test, and example implementation of using Principal Components Analysis (PCA) to measure household wealth.
Paper: Measuring wealth in rural communities: Lessons from the Sanitation, Hygiene, Infant Nutrition Efficacy (SHINE) trial, by Bernard Chasekwa et al. (2018) -- A context-specific discussion of using Principal Components Analysis in rural areas in LMICs.
Paper: Is the wealth index a proxy for consumption expenditure? A systematic review, by L.D. Howe, J.R. Hargreaves, S. Gabrysch, and S.R.A. Huttly (2009) -- A meta-analysis/literature review of using wealth as a proxy for consumption. [Gated]
Paper: Approaches and alternatives to the wealth index to measure socioeconomic status using survey data: A critical interpretive synthesis, by Mathieu J. P. Poirier, Karen A. Grépin, and Michel Grignon (2019) -- A meta-analysis/literature review of using wealth indices as a measure of socio-economic status.

Physical assets

Paper: Measuring the role of livestock in the household economy, by Alberto Zezza, Ugo Pica-Ciamarra, Harriet K. Mugera, Titus Mwisomba, and Patrick Okello (2016) -- A guide to measuring household livestock assets. Includes an example questionnaire module.
Survey guide: The FAO, UN-Habitat, and World Bank’s Measuring individuals’ rights to land (2019) -- A tool-specific guide to their proposed standardized module used to measure an individual’s right to land; includes the example questionnaire module itself, a guide to its use, and a discussion of the larger measurement issue.
Book section: Housing modules, by Stephen Malpezzi, in Designing Household Survey Questionnaires for Developing Countries; World Bank, Volume 1, 293-314 (2000) -- An overview of housing measurement (particularly in low- and middle-income countries), with a particular focus on LSMS methods. Includes example questionnaire modules.
Paper: There is no free house: Ethnic patronage in a Kenyan slum, by Benjamin Marx, Thomas M. Stoker, and Tavneet Suri (2016) -- Details a new method to use satellite imagery to estimate housing value by measuring the luminosity of light reflected from metal roofs.
Paper: Using satellite imagery to revolutionize creation of tax maps and local revenue collection, by Daniel Ayalew Ali, Klaus Deininger, and Michael Wild (2018) -- Details a new method to use satellite imagery to estimate property value by examining the building’s footprint.

Measuring inequality with assets

Paper: Measuring inequality with asset indicators, by David McKenzie (2005) -- An explanation, test, and an example implementation of using asset indicators as a substitute for income data in measuring inequality of living standards.
Paper: Is inequality underestimated in Egypt? Evidence from house prices, by Roy van der Weide, Christoph Lakner, and Elena Ianchovichina (2016) -- An explanation, test, and example implementation of using housing prices as a substitute for income-tax data in measuring income inequality.

Income

Book section: Transfers and other nonlabor income, by Andrew McKay, in Designing Household Survey Questionnaires for Developing Countries; World Bank, Volume 1, 273-292 (2000) -- An overview of measuring “miscellaneous” income (largely transfers, but also other types including rental income and lottery winnings), with a particular focus on LSMS methods. Includes example questionnaire modules.
Book section: Should the survey measure total household income? by Andrew McKay, in Designing Household Survey Questionnaires for Developing Countries; World Bank, Volume 1, 83-104 (2000) -- An overview of the advantages and disadvantages of attempting to measure total household income in surveys, as well as the circumstances in which that should be the objective. Has a particular focus on LSMS methods and surveys.
Paper: Measuring aggregate welfare in developing countries: How well do national accounts and surveys agree? by Martin Ravallion (2004) -- A comparison of income survey results to national accounts of consumption.
Paper: How reliable are income data collected with a single question? by John Micklewright and Sylke V. Schnepf (2007) -- A comparison of household income data collected by a single question against data collected through a more extensive module using data from the UK.
Blog post: Measuring income expectations using phone surveys, by Lukas Hensel -- A guided example of measuring income expectations through phone surveying.
Blog post: Measuring inequality isn’t easy or straightforward - Here’s why, by Christoph Lakner, Mario Negre, José Cuesta, and Ani Silwal (2016) -- An overview of the difficulties of comparing consumption- and income-based inequality.

Prices

Book section: Community and price data, by Elizabeth Frankenberg, in Designing Household Survey Questionnaires for Developing Countries; World Bank, Volume 1, 315-338 (2000) -- An overview of measuring community price data, including the selection of consumption items and the cases in which measurement is appropriate. Has a particular focus on LSMS methods and surveys. Includes example questionnaire modules.
Paper: Prices for poverty analysis in Africa, by Isis Gaddis (2016) -- A review of the different methodologies used to create consumer price indices, with a particular focus on Africa.
Paper: Quality, quantity, and spatial variation of price, by Angus Deaton (1988) -- A discussion and guide to using community price data to estimate price elasticities of demand, with an example from Côte d'Ivoire.
Paper: Price, unit value, and quality demanded, by Christopher McKelvey (2011) -- A test of the method developed by Deaton in the above paper with price data from Indonesia, and a measure of the extent to which quality differences may bias price elasticity estimates. [Gated published version]
Paper: Trying to understand the PPPs in ICP 2011: Why are the results so different?, by Angus Deaton (2017) -- A discussion of the 2011 PPP data from the ICP, including an extensive overview of how the ICP PPP data is created.
Paper: The measurement of poverty with geographic and intertemporal price dispersion: Evidence from Rwanda, by Christophe Muller (2008) -- An examination of the extent to which prices vary across seasons and regions (particularly in rural areas) and a test of how that variation can impact aggregate poverty estimates. [Gated published version]
Paper: Price seasonality in Africa: Measurement and extent, by Christopher L. Gilbert, Luc Christiaensen, and Jonathan Kaminski (2016) -- Measures the extent of price seasonality in multiple African countries, and tests the accuracy of different methods for estimating that seasonality.

Poverty

General resources

Book section: Chapters 1-6, by Jonathan Haughton and Shahidur R. Khandker, in Handbook on Poverty and Inequality; World Bank (2009) -- Introduction to measures of poverty and inequality, with a particular emphasis on poverty indices and their robustness checks.
Book section: Chapters 3-6, by Martin Ravallion, in The Economics of Poverty; Oxford University Press (2015) -- Introduction to measures of poverty and inequality, with a particular emphasis on using survey data to measure poverty.
Paper: Counting the world's poor: Problems and possible solutions, by Angus Deaton (2001) -- A discussion of the major difficulties involved in creating aggregate measures of inequality and poverty.
Paper: Measuring poverty in a growing world (or measuring growth in a poor world), by Angus Deaton (2005) -- A comparison of aggregate poverty and inequality measures calculated from national household surveys and national account data.
Paper: Measuring and understanding behavior, welfare, and poverty, by Angus Deaton (2016) -- A review/summary of Deaton’s work on using household surveys to measure welfare and poverty; part I, in particular, is relevant.

Poverty mapping/imputation

Paper: Micro-level estimation of poverty and inequality, by Chris Elbers, Jean O. Lanjouw, & Peter Lanjouw (2003) -- Introduces, discusses, and tests a census-to-survey poverty imputation technique. [Gated published version]
Paper: Updating poverty estimates at frequent intervals in the absence of consumption data, by Hai-Anh H. Dang, Peter F. Lanjouw, and Umar Serajuddin (2014) -- Introduces, discusses, and tests a survey-to-survey poverty imputation technique.
- Associated blog post: How to overcome the (almost insurmountable) task of tracking poverty trends without good consumption data?, by Hai-Anh Dang (2014)
Paper: Small area estimation-based prediction methods to track poverty, by Luc Christiaensen, Peter Lanjouw, Jill Luoto, and David Stifel (2011) -- An empirical comparison of three different Small Area Estimation methods used to impute poverty.

Questionnaire and respondent effects

Paper: Moving from the household to the individual: Multidimensional poverty analysis, by Ramya M. Vijaya, Rahul Lahoti, and Hema Swaminathan (2014) -- Details an individual-level poverty index and tests it against household-level indices. [Gated published version]
Paper: Same question but different answer: Experimental evidence on questionnaire design’s impact on poverty measured by proxies, by Talip Kilic and Thomas Sohnesen (2017) -- An experimental comparison of answers to long and short questionnaires, along with a test of how the differences impact proxy-based poverty predictions. [Gated published version]
- Associated blog post: 9 pages or 66 pages? Questionnaire design’s impact on proxy-based poverty measurement, by Talip Kilic (2015).
Paper: Wife vs. husband: Does it matter who answers the survey? by Adan Silverio-Murillo (2018) -- A comparison of survey answers provided by husband-wife pairs.
- Associated blog post: Does it matter who answers the survey to identify families in poverty? by Adan Silverio-Murillo (2018)

Targeting and proxy indicators

Paper: A poor means test? Econometric targeting in Africa, by Caitlin Brown, Martin Ravallion, and Dominique van de Walle (2018) -- A comparison of standard proxy means tests using data from nine African countries; also tests some methodological tweaks, including a poverty-quantile method that performed well. [Gated published version]
Paper: Better to be indirect? Testing the accuracy and cost-savings of community leader surveys for poverty targeting, by Jennifer Alix-Garcia and Katharine R.E. Sims (2019) -- A comparison of the accuracy and cost-effectiveness of poverty indices created from surveys of individuals and “key informants”.
Paper: The (lack of) distortionary effects of proxy-means tests: Results from a nationwide experiment in Indonesia, by Abhijit Banerjee, Rema Hanna, Benjamin A. Olken, & Sudarno Sumarto (2020) -- A randomized evaluation of whether census questions on household assets influence consumption and consumption reporting behavior.
J-PAL Evaluation summary: Improving targeting of a conditional cash transfer program in Indonesia, by Abhijit Banerjee, Benjamin A. Olken, Rema Hanna, Vivi Alatas, Ririn Purnamasari, & Matthew Wai-Poi (2016) -- A randomized comparison of self-targeting and proxy-means testing for poverty.

Scale sensitivity of poverty measures

Paper: On testing the scale sensitivity of poverty measures, by Martin Ravallion (2015) -- Scale sensitivity tests for poverty measures are sensitive to the choice of reference demographics. [Gated]
Paper: Measuring child poverty in South Africa: Sensitivity to the choice of equivalence scale and an updated profile, by Judith Christine Streak, Derek Yu, and Servaas Van der Berg (2009) -- A test of the sensitivity of child poverty estimates to the adult equivalence scale used in South Africa. [Gated]

Recall periods and interview effects

Overview

This section includes resources on measurement error from recall periods and interview effects.

As covered in our Introduction to measurement and indicators resource, recall bias can arise in questions posed to survey respondents about events, processes, or decisions that occurred in the past. Questions like “how much fertilizer did you use last year?” and “how many times were you sick in the last month?” can be very helpful in gathering survey data on infrequent or lumpy events, and extending the time horizon can help better capture variation in the underlying concept. But there are tradeoffs. The longer the time horizon, and the less noteworthy the event, the harder it may be for respondents to remember accurately, and the more likely they are to use potentially biasing mental heuristics like anchoring to guide their answers (Godlonton, Hernandez, and Murphy 2018; see also Table 3 in Introduction to measurement and indicators for a list of common biasing heuristics). Further, the extent of measurement error may differ between subgroups if there are interactions between the variable separating the subgroups and the concept being measured – for example, differing rates of visits to doctors and perceptions of the normality of illness between poorer and richer households can cause comparisons of health measures between the two groups to differ both quantitatively and qualitatively depending on whether a short or long recall period is used (Das, Hammer, and Sanchez-Paramo 2012).

The subsection on recall bias compiles resources that seek to measure the extent to which these biases operate in different research areas, including agriculture (Beegle et al. 2012), health (Das et al. 2012), and microenterprises (de Mel et al. 2014). It also includes resources that introduce novel ways to reduce the chance of recall period bias. These largely involve finding innovative ways to increase the frequency of data collection so that short-term memory dominates: among others, Wiseman et al. (2005) provide a guide to using diaries to collect data in resource-poor settings, and de Mel et al. (2014) use a novel technology (RFID chips) to collect high-frequency data in microenterprises.

Resources in the second subsection, on interview and question effects, explore the extent to which the survey or interview questions themselves can influence respondent behavior. There are a couple of channels through which this could bias measurement: question-behavior effects cover situations in which asking respondents about intentions or behavior impacts the behavior itself, e.g., through shifting perceptions of social desirability (Dholakia 2010; Fitzsimons and Moore 2008), a response “freezing” effect in panel surveys (Bridge et al. 1977), or a “self-prophesying” effect in situations where the respondent is asked about future behaviors/intentions (Smith et al. 2003). Interview effects, on the other hand, can occur even without any elicitation of intentions or other behavioral prompts (see, e.g., Zwane et al. 2011). The papers in this subsection provide theory and evidence for when these effects are most likely to influence behavior (Feldman and Lynch, Jr. 1988) and recommendations for question modifications to ameliorate some of the effects (Fitzsimons and Moore 2008).

- Jack Cavanagh, Senior Research, Education, and Training Associate, and Sarah Kopper, Associate Director of Research

Recall periods

Paper: The impact of recall periods on reported morbidity and health seeking behavior, by Jishnu Das, Jeffrey Hammer, and Carolina Sanchez-Paramo (2012) -- An experimental test of the effect of recall length on reported health outcomes and behaviors; includes tests for heterogeneous effects, particularly with regard to income level. [Gated published version]
Blog post: Do you remember? Measuring anchoring bias in recall data, by Susan Godlonton, Manuel Hernandez, and Mike Murphy (2018) -- Tests whether and to what extent survey participants use their answers from the most recent period as an anchor when answering recall questions; tests the results for both objective (income, wages, and working hours) and subjective (reports of happiness, health, stress, and wellbeing) measures.
Paper: Using diaries to collect data in resource-poor settings: questions on design and implementation, by V Wiseman, L Conteh, and F Matovu (2005) -- A guide to using diaries, and particularly pictorial diaries, for data collection in resource-poor settings with populations with lower rates of literacy and higher rates of mobility.
- For an example of research using diaries, see Kohler and Thornton (2011).
Paper: Radio frequency (un)identification: Results from a proof-of-concept trial of the use of RFID technology to measure microenterprise turnover in Sri Lanka, by Suresh de Mel, Dammika Herath, David McKenzie, and Yuvraj Pathak (2014) -- Also tests recall survey questions.
Paper: Reliability of recall in agricultural data, by Kathleen Beegle, Calogero Carletto, & Kristen Himelein (2012) -- Recall bias in input use and yield measurements. [Gated published version]
Blog post: Quantifying the Hawthorne effect, by Jed Friedman and Brinda Gokul (2014) -- A summary and compilation of recent studies that attempted to quantify Hawthorne effects in health research; in particular notes how “stealth recall” data can be used to help estimate Hawthorne effects.

Interview/question effects

Paper: Interviewing changes attitudes—sometimes, by R. Gary Bridge, et al. (1977) -- A survey experimental test of interview effects, with a particular focus on the potential “freezing” effect in panel data.
Paper: Self-generated validity and other effects of measurement on belief, attitude, intention, and behavior, by Jack M. Feldman and John G. Lynch, Jr (1988) -- A theory of how and when the questions measuring belief, attitude, intention, and behavior themselves affect the answers.
Paper: Do intentions really predict behavior? Self-generated validity effects in survey research, by Pierre Chandon, Vicki G. Morwitz, & Werner J. Reinartz (2005) -- Introduces and tests a method for examining the extent of “self-generated validity” – when measurement inflates the association between intentions and behavior.
Paper: Being surveyed can change later behavior and related parameter estimates, by Alix Peterson Zwane et al. (2011) -- Experimentally tests whether asking questions related to health and household finances affects use of a related product using non-self-reported data.
Book section: A critical review of question–behavior effect research, by U.M. Dholakia, in Review of Marketing Research, Vol. 7, edited by N.K. Malhotra; Emerald Publishing Limited (2010) -- A literature review of research on question-behavior effects, which separates research into two streams: one where effects are related to social desirability and one where they are related to measurement alone. [Gated]
Paper: Should we ask our children about sex, drugs and rock & roll? Potentially harmful effects of asking questions about risky behaviors, by Gavan J. Fitzsimons and Sarah G. Moore (2008) -- Provides a literature review of research on question-behavior effects, applies the research to the hypothetical real-world scenario of asking children sensitive questions, and provides recommendations for moderating question-behavior effects in similar situations. [Gated published version]
Paper: Self‐prophecy effects and voter turnout: An experimental replication, by Jennifer K. Smith, Alan S. Gerber, and Anton Orlich (2003) -- A replication of an earlier experiment on self-prophesying effects in voting in the United States, applied to a larger and more representative sample.

Sensitive questions

Overview

Sensitive information can be among the hardest data to gather accurately and ethically, but at the same time this data can often be the most informative for answering certain research questions, particularly in studies relating to health, gender, and crime, violence, and conflict. What makes a question sensitive depends on culture and context, but information relating to identity, illegal activities, and socially unacceptable behavior is almost always sensitive. Complicating accuracy, respondents may not answer truthfully due to social desirability bias or embarrassment, or because they feel that a different answer is strategic. Beyond this, they may choose not to answer at all due to privacy concerns or discomfort. On the ethical side, extra considerations (such as enumerator demographics and training, interview environment, and availability of referral resources) are necessary when asking sensitive questions; further, the questions themselves might harm research participants through various channels, including the potential for retraumatization. For more on the ethics of sensitive questions or human subjects research in general, see our resources on Survey design and the Ethical conduct of randomized evaluations.

Therefore, the main questions that research on measuring sensitive subjects attempts to answer are: i) What are the best techniques to get accurate information on sensitive topics, and which situations is each technique best suited for? and ii) What additional ethical considerations do research teams need to take into account when measuring sensitive questions, and how do those considerations differ across survey media?

This section compiles resources that attempt to answer those questions; it begins with resources that provide an overview of what sensitive questions are (e.g., Blair (2015)) and why they are difficult to measure (e.g., Özler (2013) and Fitzsimons and Moore (2008)), moves on to a set of papers that compare and validate current methods for measuring sensitive topics (e.g., Chuang et al. (2020)), and then finally contains sections on three of the most popular measurement techniques: list randomization, randomized response technique, and implicit association testing. In these sections you will find articles discussing each method’s use and studies of their validity (see Droitcour et al. (2004), Blair, Imai, and Zhou (2015), and Kondylis et al. (2019), respectively, for a higher-level introduction of each technique).

- Jack Cavanagh, Senior Research, Education, and Training Associate and Sarah Gault, Training Manager

General resources

Blog post: Economists have experiments figured out. What’s next? (Hint: It’s measurement), by Berk Özler (2013) -- Reviews why sensitive questions are tricky and makes a case for there being a wide breadth of applicability for techniques used to measure sensitive questions.
Brief: Survey methods for sensitive topics, by Graeme Blair (2015) -- A review of survey techniques used to address incentives to conceal truthful responses; the techniques include survey administration practices and three experimental methods: randomized response technique, list experiments, and endorsement experiments; examples are provided.
Book section: Chapter 5: Outcomes, in Running randomized evaluations, by Rachel Glennerster and Kudzai Takavarasha; Princeton University Press (2014) -- Includes an overview of non-survey instruments for difficult to measure outcomes, including sensitive topics. [Gated published version]
Paper: Should we ask our children about sex, drugs and rock & roll? Potentially harmful effects of asking questions about risky behaviors, by Gavan J. Fitzsimons and Sarah G. Moore (2008) -- Discusses potential explanations for the question-behavior effect; highlights the potential impact of this effect on screening adolescents for risky behavior, and suggests ways in which screening can be carried out to minimize its impact. [Gated published version]
Blog post: Surveying on sensitive topics: Using audio computer assisted self-interviewing, by Claire Cullen and Mahreen Mahmud -- Provides an overview of the use of ACASI in measuring sensitive topics, describes the method, and provides evidence from two randomized evaluations on the validity of the technique.

Validating indirect response survey methods

Paper: Measuring the measurement error: A method to qualitatively validate survey data, by Christopher Blattman, Julian Jamison, Tricia Koroknay-Palicz, Katherine Rodrigues, and Margaret Sheridan (2016) -- Develops and describes a method for assessing survey data through intensive qualitative work and applies the method to a survey containing questions on sensitive topics in Liberia.
Paper: An empirical validation study of popular survey methodologies for sensitive questions, by Bryn Rosenfeld, Kosuke Imai, and Jacob N. Shapiro (2015) -- Tests direct survey questions, list randomization, endorsement experiments, and randomized response technique against administrative election data on a sensitive topic to test the validity of the techniques.
Paper: Sex, lies, and measurement: Do indirect response survey methods work?, by Erica Chuang, Pascaline Dupas, Elise Huillery, and Juliette Seban (2020) -- Develops internal consistency checks for both list experiments and randomized response techniques and tests both in Côte d’Ivoire and Cameroon.
- Associated blog post: Sex, lies, and measurement: Do indirect response survey methods work? (No…), by Berk Özler (2019).
Paper: Private but misunderstood? Evidence on measuring intimate partner violence via self-interviewing in rural Liberia and Malawi, by David Sungho Park, Shilpa Aggarwal, Dahyeon Jeong, Naresh Kumar, Jonathan Robinson, and Alan Spearot (2021) -- A measurement experiment that compares measures of intimate partner violence taken either from self-interviewing (SI) or face-to-face interviewing (FTFI).
Paper: Method matters: Underreporting of intimate partner violence in Nigeria and Rwanda, by Claire Cullen (2020) -- A measurement experiment that compares measures of intimate partner violence taken from list randomization, self-interviewing (SI), and face-to-face interviewing (FTFI).

List randomization

Paper: The item count technique as a method of indirect questioning: A review of its development and a case study application, by Judith Droitcour et al., in Measurement Errors in Surveys (2004) -- Provides an introduction to list randomization and goes through a case study of its implementation.
Paper: List randomization for sensitive behavior: An application for measuring use of loan proceeds, by Dean S. Karlan & Jonathan Zinman (2012) -- Introduces list randomization in an economic-research setting (microfinance) and tests it against direct questioning for reports of loan spending. [Gated published version]
Blog post: List experiments for sensitive questions – a methods bleg, by Berk Özler (2017) -- Introduces list experiments, goes through the pros and cons of using them as a measurement tool, and describes situations in which their use would be reasonable.
Book section: List Randomization, in Field experiments in discrimination, by Marianne Bertrand and Esther Duflo (2016) -- Contains an introduction to list randomization and then provides a literature review of studies that have used it to measure an array of sensitive topics.
Paper: Social desirability bias in voter turnout reports: Tests using the item count technique, by Allyson L. Holbrook and Jon A. Krosnick (2010) -- Tests the validity of list randomization against self-reports and administrative data on voting behavior for both telephone and self-administered internet surveys.
For examples of research papers using list randomization, see Jamison, Karlan, and Raffler (2013) and Coffman, Coffman, and Ericson (2017).
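
As a rough illustration of how list randomization data are typically analyzed, the sketch below computes the standard difference-in-means estimate. Respondents in a control group report how many items on a list of non-sensitive statements apply to them; a treatment group sees the same list plus the sensitive item, and the gap in mean counts estimates the prevalence of the sensitive item. The function name and the response counts are hypothetical and made up for illustration; they are not drawn from any of the papers above.

```python
from math import sqrt
from statistics import mean, variance


def list_experiment_estimate(control_counts, treatment_counts):
    """Difference-in-means estimate of the sensitive item's prevalence.

    control_counts: item counts reported by respondents shown only the
        non-sensitive items.
    treatment_counts: item counts reported by respondents shown the same
        items plus the sensitive one.
    Returns (estimate, standard_error).
    """
    estimate = mean(treatment_counts) - mean(control_counts)
    # Standard error for a difference in means of two independent groups,
    # using sample variances.
    se = sqrt(
        variance(treatment_counts) / len(treatment_counts)
        + variance(control_counts) / len(control_counts)
    )
    return estimate, se


# Made-up illustrative responses (not real survey data).
control = [1, 2, 2, 3, 1, 2, 0, 2]
treatment = [2, 2, 3, 3, 1, 3, 1, 2]

estimate, se = list_experiment_estimate(control, treatment)
print(f"Estimated prevalence: {estimate:.3f} (SE {se:.3f})")
```

Because the estimate is a difference between two noisy group means, list experiments generally need larger samples than direct questions to reach the same precision, a tradeoff several of the resources above discuss.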

Randomized Response Technique (RRT)

Paper: Design and analysis of the randomized response technique, by Graeme Blair, Kosuke Imai, and Yang-Yang Zhou (2015) -- Reviews the development of RRT, introduces different variants of the technique, including forced response design and disguised response design, and provides illustrations using data from militant groups in Nigeria.
Paper: Meta-analysis of randomized response research: Thirty-five years of validation, by Gerty J. L. M. Lensvelt-Mulders et al. (2005) -- Conducts a meta-analysis of validation papers for RRT; the meta-analysis includes 6 papers conducting individual validation studies and 32 papers conducting comparative studies. [Gated published version]
Paper: More is not always better: An experimental individual-level validation of the randomized response technique and the crosswise model, by Marc Höglinger and Ben Jann (2018) -- Investigates the validity of a number of variants of RRT using data on actual cheating behavior taken from online surveys.
Paper: Surveying multiple sensitive attributes using an extension of the randomized-response technique, by Morten Moshagen and Jochen Musch (2012) -- Introduces and tests an extension of RRT that allows for the gathering of data of multiple sensitive subjects with only one randomization and controls for non-adherence. [Gated published version]
Paper: Detecting nonadherence without loss in efficiency: A simple extension of the crosswise model, by Daniel W. Heck, Adrian Hoffmann & Morten Moshagen (2017) -- Proposes an extension of the crosswise model, a variant of RRT, that detects nonadherence while preserving statistical power.
For examples of research papers using randomized response technique, see Gavin et al. (2010), who use it to study the illegal use of natural resources in the United States, and Krumpal (2012) [Gated], who uses it to study anti-semitism in Germany.
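
To make the forced response design mentioned above more concrete, the following is a minimal sketch of the usual estimator, under the assumption that a randomizing device (for example, a die) forces a "yes" with known probability, forces a "no" with known probability, and otherwise asks for a truthful answer. The observed share of "yes" answers then equals the forced-yes probability plus the truth probability times the true prevalence, which can be solved for the prevalence. The function name, probabilities, and counts are hypothetical and chosen only for illustration.

```python
from math import sqrt


def forced_response_estimate(n_yes, n_total, p_forced_yes, p_forced_no):
    """Estimate prevalence under a forced response design.

    Model assumed here: P(yes) = p_forced_yes + p_truth * prevalence,
    where p_truth = 1 - p_forced_yes - p_forced_no.
    Returns (prevalence_estimate, standard_error).
    """
    p_truth = 1.0 - p_forced_yes - p_forced_no
    if p_truth <= 0:
        raise ValueError("Forced-response probabilities must sum to less than 1.")
    share_yes = n_yes / n_total
    prevalence = (share_yes - p_forced_yes) / p_truth
    # Binomial standard error of the observed share, rescaled by p_truth.
    se = sqrt(share_yes * (1.0 - share_yes) / n_total) / p_truth
    return prevalence, se


# Hypothetical example: a six-sided die where a 6 forces "yes", a 1 forces
# "no", and 2 through 5 ask for the truth; 180 of 600 respondents say "yes".
prevalence, se = forced_response_estimate(180, 600, 1 / 6, 1 / 6)
print(f"Estimated prevalence: {prevalence:.3f} (SE {se:.3f})")
```

In small samples this estimator can fall below 0 or above 1, and it assumes respondents follow the randomization instructions; the validation papers listed above examine how well that assumption holds in practice.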

Implicit association testing (IAT)

Blog post: How to implicit association test, by Florence Kondylis, Lucia Andrade, Arianna Legovini, Kate Vyborny, and Astrid Zwager (2019) -- Introduces IATs and goes through the practical steps for designing a custom test.
- Implicit association examples can be found at projectimplicit.org. Software used to calculate the IAT algorithm highlighted in Greenwald et al. below can be found in Meade (2009).
Book section: Implicit association tests, in Field experiments in discrimination, by Marianne Bertrand and Esther Duflo (2016) -- Introduces IATs, goes through their development, and provides a literature review of their use and validation.
Paper: Understanding and using the implicit association test: I. An improved scoring algorithm, by Anthony G. Greenwald, Mahzarin R. Banaji, and Brian A. Nosek (2003) -- Evaluates different scoring techniques for IATs.
Paper: Predicting ethnic and racial discrimination: A meta-analysis of IAT criterion studies, by Oswald et al. (2013) -- Conducts a meta-analysis of IAT validation/criterion studies which includes papers covering an array of test-subjects, versions of the IAT, and strategies for measuring bias from the IAT data.
- Examples of research papers using implicit association tests include Reuben, Sapienza, and Zingales (2014), Carlana (2018), Arcuri et al. (2008), and Green et al. (2007). They apply IAT in studies focused on labor, education, voting, and health, respectively.

Subjective questions

Overview

Subjective questions assess subjective psychological states and are not verifiable by external observation or records. They aim to measure a number of different things:

Beliefs about the object (“Is your child’s school providing them with an adequate education?”)
Expectations, i.e., plans for future actions (“Do you plan to enroll your child in the same school next year?”)
Attitudes about the object (attitudes are distinct from beliefs in that they measure judgements on normative instead of positive issues: “Your neighbor is trying to decide if they should send their child to secondary school or have them work instead. If the child attends secondary school they potentially could work in a higher paying job in the future, but they wouldn’t be earning money for the family in the present. Ultimately, your neighbor decides to send their child to school. Do you agree with their decision?”)

While social scientists have made strides in the past decades in improving measurement of subjective questions, important challenges still persist on the definition and quality of indicators. These include the difficulty of creating and comparing measures of subjective welfare, like life satisfaction, happiness, and subjective poverty measures (Friedman 2011); the increased time it often takes to get point estimates for subjective questions (Delavande, Giné, and McKenzie 2011); and the reliability and precision of subjective expectations (McKenzie 2016).

This section provides resources discussing advances and challenges in the measurement of two specific topics: 1) subjective wellbeing, and 2) subjective expectations. The resources on subjective wellbeing focus on its definition, guidelines on measurement, and best practices for constructing comparable indicators. They explore measurement issues related to multiple facets of wellbeing, including but not limited to meaning in life and autonomy (Samman 2007), hope and aspirations (Wydick 2013), and social connectedness (Zavaleta, Samuel, and Mills 2014). The subjective expectations section offers a general overview of the subject (Manski 2004 and Attanasio 2009), as well as papers discussing recent advances in methods (Delavande, Giné, and McKenzie 2011). Cross-cutting both of the subsections are discussions of the extent to which subjective measures may vary across time and space, potentially confounding attempts to create comparable indicators (Kahneman and Krueger 2006; Beegle et al. 2012). For a discussion of subjective questions in the context of survey design, see the J-PAL research resource on Survey design.

- Daniela Muhaj, Senior Policy and Research Associate, and Sarah Gault, Training Manager

Measuring wellbeing

Journal issue: Wellbeing in developing countries: From theory to research, by Ian Gough and J. Allister McGregor (2007) -- Examines poverty by evaluating the following: 1) ideas of human functioning; 2) capabilities and needs; 3) the analysis of livelihoods and resource use; and 4) research on subjective wellbeing and happiness. Draws on insights from psychology, economics, anthropology, sociology, political science and development to understand wellbeing from different perspectives. [Gated]
Broad overview: OECD’s Guidelines on measuring subjective well-being (2013) -- Provides guidelines on collecting and measuring subjective wellbeing across eleven domains, ranging from jobs, health and housing to civic engagement and the environment.
Paper: Psychological and subjective well-being: A proposal for internationally comparable indicators, by Emma Samman (2007) -- Constructs comparable survey based indicators on four aspects of wellbeing: 1) meaning in life; 2) the three ‘basic psychological needs’ of autonomy, competence and relatedness, following self-determination theory; 3) domain-specific and overall life satisfaction; and 4) happiness. [Gated published version]
Blog post: What is the “good life”? Can we measure it?, by Jed Friedman (2011) -- Provides guidance for measuring satisfaction and welfare beyond traditional consumption or income approaches.
Blog post: Measuring hope: Guest post by Bruce Wydick (2013) -- Discusses the role of hope and aspirations in life outcomes from a measurement, modeling, and policy perspective.
Paper: Social isolation: A conceptual and measurement proposal, by Diego Zavaleta, Kim Samuel and China Mills (2014) -- Provides a working definition of social connectedness, highlights the relevance of isolation for poverty analysis, and proposes several indicators to measure social connectedness.
Paper: The ability to go about without shame: A proposal for internationally comparable indicators, by Diego Zavaleta (2007) -- Develops eight indicators to measure specific aspects of shame and humiliation. [Gated published version]
Paper: Aspirations: An approach to measurement with validation using Ethiopian data, by Tanguy Bernard and Alemayehu Seyoum Taffesse (2014) -- Proposes a set of measurement tools for measuring aspirations.
Paper: Measuring poverty using qualitative perceptions of consumption adequacy, by Menno Pradhan & Martin Ravallion (2000) -- Creates a measure of welfare based on perceived consumption adequacy collected using surveys in Nepal and Jamaica.
Paper: Poor, or just feeling poor? On using subjective data in measuring poverty, by Martin Ravallion (2012) -- Studies the case for and against using subjective measures of poverty, and introduces challenges in such measures, including choice of covariates, frame-of-reference effects, and underlying heterogeneity in trade-offs.
Paper: Developments in the measurement of subjective well-being, by Daniel Kahneman and Alan B. Krueger (2006) -- Reviews research done on measuring subjective wellbeing and introduces a new measure, the u-index, which aims to capture the amount of time people spend in an unpleasant state.
Paper: Frame-of-reference bias in subjective welfare, by Kathleen Beegle, Kristen Himelein, and Martin Ravallion (2012) -- Uses vignettes to test for frame-of-reference bias, or the bias that comes from personal heterogeneity in subjective scales, in Tajikistan. [Gated published version]

Measuring subjective expectations

Paper: Measuring subjective expectations in developing countries: A critical review and new evidence, by Adeline Delavande, Xavier Giné, and David McKenzie (2011) -- Provides a summary and review of new papers on using subjective probabilities and expectations across country settings. [Gated published version]
- Associated blog post: What’s new in measuring subjective expectations?, by David McKenzie (2016)
Paper: Measuring expectations, by Charles F. Manski (2004) -- Discusses the literature on measuring expectations through subjective probabilities, advances in this topic, and areas for future research. [Gated published version]
Paper: Expectations and perceptions in developing countries: Their measurement and their use, by Orazio P. Attanasio (2009) -- Reviews recent progress on the measurement of subjective expectations in low-and middle-income countries, and discusses possible avenues for additional research. [Gated published version]

Using games to measure trust, preferences, and risk aversion

Overview

There has been a growing interest in understanding how social norms and preferences affect behavior and decision making and thereby economic and political phenomena such as economic growth, poverty, or corruption. Understanding how they evolve and whether they are malleable is therefore of similar interest.

However, measuring norms and preferences such as trust, fairness, or risk aversion is not straightforward. Answers to survey questions asking directly about an individual’s preference may not be accurate due to either general biases such as interviewer effects (Binswanger 1980) and social desirability bias, or more specific biases that arise because of differences in conceptualizing abstract concepts like “trust” (Glaeser et al. 2000; Meki n.d.).

To mitigate these challenges, researchers often use behavioral experiments or games to measure preferences. During these experiments, participants are asked to choose between different options - preferably in an incentive-compatible way - and preferences are derived from individual choices within the experiment instead of relying on survey questions, often yielding more accurate measures of the concept of interest. Experiments to elicit preferences or norms can either be carried out in university labs (so-called lab experiments) or in more naturalistic settings (often referred to as lab-in-the-field experiments).

While both types of experiments help solve many of the issues that arise in survey measurements of norms and preferences, there are tradeoffs between the two. Lab experiments ensure tight control over the setting and thereby allow the researcher to eliminate potential confounders. However, they might not be a good proxy of how individuals make decisions in real life and often involve non-representative samples (e.g., university students). On the other hand, while lab-in-the-field experiments may be better able to mimic real-life decisions using a more theoretically relevant population, they provide less control over the setting, and hence results can be noisier and/or situation specific and therefore less generalizable and replicable (Gneezy and Imas 2017).

Finally, there are further considerations to keep in mind when using games to measure preferences and norms. For example, measures from experimental games have been shown to be sensitive to the timing of the experiment (Zelenski et al. 2003), its name (Liberman et al. 2004), and the set of choice options (List 2007). Other papers address the difficulty of disentangling related concepts (e.g., Ashraf et al. 2006) or the validity of the proposed measures more generally (e.g., Dean and Sautmann 2019).

This section compiles resources that provide an overview of, and introduce innovative solutions to, these and similar difficulties in using games to measure norms and preferences. It begins with resources that provide an introduction to using experiments in economics, listing common games and highlighting challenges. It then contains two sections that discuss the measurement of trust, cooperation, and fairness, and of risk and time preferences, respectively.

- Katharina Kaeppel, Senior Research and Training Associate and Michala Riis-Vestergaard, Postdoctoral Training Associate

General resources

Paper: Field experiments in economics: An introduction, by Jeffrey P. Carpenter, Glenn W. Harrison and John A. List (2005) -- Provides a broad introduction to field experimental games in economics, including a short survey and justification of their use.
Paper: Behavioural development economics: Lessons from field labs in the developing world, by Juan Camilo Cardenas and Jeffrey Carpenter (2006) -- A review of the use of field experiments to elicit preferences in low- and middle-income countries.
Paper: Lab in the field: Measuring preferences in the wild, by Uri Gneezy and Alex Imas (2017) -- An overview of the use of lab in the field experiments, particularly their connection with and use within larger RCTs; also provides a methodology for testing the external validity of lab in the field experiments. [Gated published version]
Paper: Measuring social norms and preferences using experimental games: A guide for social scientists, by C. F. Camerer & E. Fehr (2002) -- Goes through a selection of games that have been used by social scientists to measure social norms and preferences.
Paper: Research with experimental games: Questioning practice and interpretation, by Cecile Jackson (2011) -- Provides an introduction to key games used in both development economics and anthropology, and argues for increased incorporation of social theory in experimental economics.
Paper: Consistency in the time of experiment participation and personality correlates: a methodological note, by John M. Zelenski, Cheryl L. Rusting, and Randy J. Larsen (2003) -- Tests whether the time and date of sign-ups for lab experiments are correlated with any underlying traits in participants. [Gated]
Paper: The name of the game: Predictive power of reputations versus situational labels in determining prisoner’s dilemma game moves, by Varda Liberman, Steven M. Samuels, Lee Ross (2004) -- Tests whether the name of the game can affect the outcome. [Gated published version]
Paper: Traditional trust measurement and the risk confound: An experiment in rural Paraguay, by Laura Schechter (2007) -- Explores the connection between behavior in trust and risk games; specifically, tests whether they measure two separate indicators or whether they both pick up measures of risk.

Using distributional games to measure trust, cooperation, and fairness

Paper: Using experimental economics to measure social capital and predict financial decisions, by Dean Karlan (2005) -- Tests whether survey and lab in the field measures of social capital are predictive of real world behavior, in this case repayment of group loans. [Gated published version]
Paper: What do people bring into the game? Experiments in the field about cooperation in the commons, by Juan-Camilo Cárdenas and Elinor Ostrom (2004) -- Tests the effect of both a host of sociodemographic and behavioral variables and repeated plays on the outcomes of cooperation games. [Gated published version]
- For an example of research using cooperation games, see Velez et al. (2009).
Paper: Measuring trust, by Edward Glaeser, David Laibson, Jose Scheinkman, and Christine Soutter (2000) -- Compares the results of experiments eliciting trusting behavior and trustworthiness with standard survey measures of the two concepts.
- For examples of research using trust games, see Buchan et al. (2006), Danielson and Holm (2007) and Ligon and Schechter (2012).
Paper: Decomposing trust and trustworthiness, by Nava Ashraf, Iris Bohnet, and Nikita Piankov (2006) -- Attempts to isolate the determinants of trust and trustworthiness by running dictator and investment games in multiple contexts.
Paper: Let me sleep on it: Delay reduces rejection rates in ultimatum games, by Veronika Grimm and Friederike Mengel (2011) -- Examines the effect of delayed decisions on the acceptance of low offers in ultimatum games.
- For an example of research on fairness using ultimatum games, see Henrich et al. (2006).
Paper: On the interpretation of giving in dictator games, by John List (2007) -- Tests whether changing the action set in dictator games by allowing dictators to take money as well as give it changes the average outcome, and therefore the interpretation, of dictator games.
- For examples of research using dictator games, see Batista et al. (2013) and Binzel et al. (2013).

Risk and time preferences

Blog post: Measuring time preferences using behavioural tasks with monetary rewards, by Giulio Schinaia -- Outlines two survey tasks used to measure time preferences: multiple price lists and convex time budgets.
Paper: Measuring time preferences: A comparison of experimental methods, by James Andreoni, Michael A. Kuhn, and Charles Sprenger (2015) -- Tests the predictive validity of two common lab-based measures of time preferences: convex time budgets and double multiple price lists.
Paper: Credit constraints and the measurement of time preferences, by Mark Dean and Anja Sautmann (2019) -- Uses data from an experiment in Mali to argue that incentivized experiments may not elicit time preference parameters, but can instead provide useful information on financial shocks and constraints.
Blog post: Measuring risk attitudes in the field, by Muhammad Meki -- A discussion of and guide to common field measures of risk.
Paper: On the measurement of risk aversion from experimental data, by Mette Wik, Tewodros Aragie Kebede, Olvar Bergland, & Stein T. Holden (2007) -- Uses repeated games to determine the effect of wealth and gender on experimental measures of risk aversion.
Paper: Attitudes toward risk: Experimental measurement in rural India, by Hans Binswanger (1980) -- Tests experimental measures of risk against measures collected from surveys.
Paper: Eliciting risk and time preferences, by Steffen Andersen, Glenn W. Harrison, Morten I. Lau, and E. Elisabet Rutström (2008) -- Designs and tests a method for jointly estimating time and risk preferences.
Paper: Risk aversion and incentive effects: Comment, by Glenn W. Harrison, Eric Johnson, Melayne M. McInnes, and E. Elisabet Rutström (2005) -- Shows that prior experience can affect behavior in games.
Paper: Revising commitments: Field evidence on the adjustment of prior choices, by Xavier Giné, Jessica Goldberg, Dan Silverman, and Dean Yang (2016) -- Outlines and implements a test for the revision of prior time-dependent commitments.

Last updated February 2024. These resources are a collaborative effort. If you notice a bug or have a suggestion for additional content, please fill out this form.

Acknowledgments

We thank Aimee Barnes, Sarah Baum, Sam Carter, Anupama Dathan, Therese David, Maya Duru, Sarah Gault, Nilmini Herath, Eliza Keller, Tithee Mukhopadhyay, Kyle Murphy, Rohit Naimpally, William Pariente, Maike Pfeiffer, Mikaela Rabb, Andrea Cristina Ruiz, Emily Sylvia, and Caroline Tangoren for helpful review and comments, and Manvi Govil and Ximena Mercado Garcia for their help copy-editing the resource. Any errors are our own.

For example, Jayne et al. (2018) calculate 853 million USD spent by 10 African countries on input subsidy programs in 2014.

That is, that productivity is highest around a plot’s borders, an observation widely recognized in the agronomic literature.
