
Data quality checks

Summary

High-frequency checks, back-checks, and spot-checks can be used to detect programming errors, surveyor errors, data fabrication, poorly understood questions, and other issues. The results of these checks can also be useful in improving your survey, identifying enumerator effects, and assessing the reliability of your outcome measures. This resource describes use cases and how to implement each type of check, as well as special considerations relating to administrative data.

Introduction

This section covers how to check the quality of data through three types of checks:

  • High-Frequency Checks (HFCs) are daily or weekly checks for data irregularities.
  • Back-checks (BCs) are short, audit-style surveys of respondents who have already been surveyed.
  • Spot-checks (SCs) are unanticipated visits by senior field staff to verify enumerators are surveying when and where they should be. 

For each type of check, we cover the underlying logic, the implementation process, and how to use their results. Where available, we reference template forms and do-files to facilitate implementation of these methods.

High-frequency checks (HFCs)

As the name suggests, HFCs are checks of incoming data conducted on a regular basis (ideally daily). High-frequency checks can be run on survey data or administrative data. Regardless of the source, they should be run on as much of the data as possible.

For survey data, HFCs are used for identifying and correcting data errors, monitoring survey progress, measuring enumerator performance, and detecting data fraud. HFCs play a similar role for administrative data but can also be used to check its coherence (the degree to which the administrative data are comparable to other data sources) and its accuracy (e.g., information on any known sources of errors in the administrative data) (Iwig et al. 2013).

Types of HFCs

HFCs fall into five broad categories:

  1. To detect errors: Identify whether there are issues with the survey coding or problems with specific questions (see the first code sketch after this list).
    1. Survey coding: Suppose question 1a asks “Do you have children under the age of 18?” followed by question 1b: “(if yes) Are they in school?” If respondents who answer “no” to the first question are shown the second question, then the skip pattern is not working correctly and should be fixed.
    2. Missing data: Are some questions skipped more than others? Are there questions that no respondents answered? This may indicate a programming error.
    3. Categorical variables: Are respondents selecting the given categories, or are many respondents selecting “None of the above” or “Other”? If conducting a survey, you may want to add categories or modify your existing ones.
    4. Too many similar responses: Is there a question where all respondents answer in the same way?
    5. Outliers: Are some respondents reporting values drastically higher or lower than the average response? Do these variables need to be top or bottom coded? Many outlier checks can be directly programmed into the survey, either to flag responses or bar responses that are outside the acceptable range.
    6. Respondent IDs: Are there duplicates of your unique identifiers? If so, does the reason why make sense? (e.g., one circumstance in which there may be duplicates of unique IDs is when surveyors have to end and restart an interview.) Are there blank or invalid IDs? This might be a sign your surveyors are not interviewing the correct respondent. 
  2. To monitor survey progress and track respondents: Checking these variables allows research teams to forecast how long it will take to complete a round of surveys while also identifying surveyors who are performing poorly. 
    1. How long do surveyors take to do one survey?
    2. How many surveys do surveyors complete in a day?
    3. Are the surveys being completed in one sitting or do respondents take breaks or stop the survey early?
    4. Are the correct respondents tracked and surveyed? Can you match respondents between rounds of data collection and across sources of data?
    5. Variables that measure survey progress might not be present in the data per se, but they can be constructed. You can collapse the dataset by enumerator in order to get this information. SurveyCTO automatically generates some variables that can be used here, such as SubmissionDate, starttime, and endtime.
  3. To monitor surveyor performance: Identify whether there are differences in responses that correspond to particular surveyors (see the second code sketch after this list).
    1. Distribution checks: Is it the case that one of your surveyors is reporting households with drastically higher incomes than others? You should look at the distribution of missing values, “I don’t know/Refuse to answer,” and “No” responses to skip-order questions to detect if surveyors are fraudulently shortening the survey to make their job easier.
    2. Number of outliers: Similar to the check for outliers when looking for data errors, but now you should check the number of outliers each enumerator has. A high number of outliers might mean an enumerator needs to be retrained or might indicate that they are fabricating data.
    3. Number of inconsistent responses: Check if some surveyors have high numbers of impossible responses (e.g., they report the head of a household is 30 but has a 28-year-old child, or they report the respondent has a college degree but is illiterate). This is also a sign the enumerator might need more training or is fabricating data.
    4. Productivity: Examine the count of surveys completed, communities covered, refusal rates (the respondent refuses to be interviewed), and tracking rates (percent of targeted respondents reached) by enumerator.
  4. To detect data fraud:
    1. Duration of survey: Extremely short surveys might be an indication that the surveyor fabricated the data. 
    2. Location audits using GPS: Depending on your devices, you might be able to record the GPS location of the interviews, which will allow you to see if the surveyor is where they are supposed to be, or if they are staying in one place and completing multiple surveys, which might be a sign of fraud. Note that collecting GPS data requires IRB approval.
    3. Audio audits: Some survey platforms, like SurveyCTO, allow research teams to collect audio recordings. These recordings can either be listened to closely to see if the enumerator was asking the questions correctly, or can be analyzed to determine if there were multiple speakers or if there was any speech at all. Note that recording audio requires IRB approval. These checks might detect surveyors who are cutting costs by taking the survey themselves and making up data.
    4. Suspiciously high number of “no” responses for skip orders: Questions that only trigger additional questions if a respondent answers “yes” might be fraudulently reported as “no” so that the surveyor has to do less work. This can be detected by comparing the rates of “no” responses across surveyors.
    5. Suspiciously short sections: With some surveying platforms, you can code “speed limits” on questions, which will either forbid an enumerator from moving past a question until a certain time has passed or will flag questions where the enumerator advanced too quickly. This requires some up-front piloting of questions in order to know what the average amount of time spent on each question is. 
  5. Special considerations for administrative data:
    1. Research teams should work with data providers to determine which variables can be checked for coherence (e.g., the average household income in this data should be no more than 2% off of the average household income reported in some other data source) as well as for accuracy (e.g., no more than 5% of households should be missing an income report in a given month).
    2. Detecting errors in administrative data is similar to detecting errors in survey data. In addition to the basic checks mentioned above you should also check variables for coherence and accuracy. Many administrative datasets are panel data, so you can perform additional logic checks (e.g., do respondents’ ages increase over time?).
    3. Tracking respondents is a primary goal with administrative data, both over time and across datasets. Check whether unique respondent IDs ever change (for instance, when someone moves out of their parents’ house and creates a new household).
    4. As you are not collecting the data, you might not know who was interviewed by which enumerator. Ideally, you will work with the data provider to get this information. If the data provider is unwilling to share it, you should share any problematic observations with the provider so they can work with their enumerators to ensure data quality.
    5. Your ability to detect data fraud depends largely on the coherence rules you determine with the data provider. Finding a high-quality dataset with similar respondents or in a similar context will help you determine if the data you are provided looks real or fraudulent.
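The first sketch below illustrates, in R, a few of the error-detection checks described above; the same logic would apply in a Stata do-file. The file name and column names (resp_id, has_children, children_in_school, income) are placeholders and should be replaced with your own survey’s identifiers.

```r
# Minimal HFC error-detection sketch in base R; file and column names are hypothetical.
svy <- read.csv("latest_submissions.csv", stringsAsFactors = FALSE)

# Duplicate or blank respondent IDs
dup_ids <- svy$resp_id[duplicated(svy$resp_id)]
n_blank <- sum(is.na(svy$resp_id) | svy$resp_id == "")

# Missing-data rates by question (unusually high rates may signal a programming error)
miss_rate <- sort(sapply(svy, function(x) mean(is.na(x))), decreasing = TRUE)
head(miss_rate, 10)

# Skip-pattern check: no one who answered "no" to 1a should have an answer to 1b
skip_violation <- svy$has_children %in% "no" & !is.na(svy$children_in_school)

# Outliers: flag values more than 3 standard deviations from the mean
z <- (svy$income - mean(svy$income, na.rm = TRUE)) / sd(svy$income, na.rm = TRUE)
outlier_flag <- !is.na(z) & abs(z) > 3

# Collect flagged submissions for review
flags <- svy[skip_violation | outlier_flag | svy$resp_id %in% dup_ids, ]
```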
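The second sketch, continuing with the same hypothetical data frame, collapses the data by enumerator to monitor productivity, survey duration, refusals, and suspiciously high rates of skip-shortening “no” answers. The enum_id, duration_min, and consented columns are again assumptions, not SurveyCTO defaults.

```r
# Per-enumerator monitoring sketch (base R); column names are hypothetical.
by_enum <- do.call(rbind, lapply(split(svy, svy$enum_id), function(d) {
  data.frame(
    enum_id     = d$enum_id[1],
    n_surveys   = nrow(d),                               # productivity
    median_mins = median(d$duration_min, na.rm = TRUE),  # very short surveys are a red flag
    pct_no_kids = mean(d$has_children %in% "no"),        # skip-shortening check
    pct_refused = mean(d$consented %in% "no")            # refusal rate
  )
}))

# Unusually fast or skip-heavy enumerators float to the top
by_enum[order(by_enum$median_mins), ]
```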

Implementing HFCs

There are three main ways to implement HFCs:

  1. Custom do-files: This entails developing a do-file or R script that checks for the above data quality issues. (J-PAL staff and affiliates: See example custom HFC Stata and R code and an HFC template.) Customized do-files have the advantage of being flexible and are especially useful when standardized tools will not suit your needs, but they require time upfront to develop. Not every potential data quality issue is foreseeable, so custom do-files might need periodic updating.
  2. IPA user-written commands: Innovations for Poverty Action (IPA) developed commands to conduct HFCs. These also require an upfront investment in order to understand what each command does and how to use them. 
  3. SurveyCTO built-in features can be used to automate many data quality checks.

Regardless of implementation method, it is best to prepare HFC procedures before enumerators go to the field.

On a daily basis, the Research Assistant should download the new data, run the HFC code on it, flag any issues, and send flagged responses to the PI/Research Manager. This is usually done by creating a spreadsheet with some basic information on the respondent (i.e., their unique ID, location, phone number, and the problematic response) so that field staff can contact them to verify their response. Once field teams have verified the data, a do-file can be used to fix or reconcile any errors (important: never directly edit or overwrite the raw data! Always make edits in a do-file). This do-file can be updated regularly to incorporate new edits as you conduct HFCs on incoming batches of data.
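As an illustration of that daily step, the sketch below writes flagged submissions to a dated spreadsheet for field verification and keeps corrections in a separate, re-runnable script. It builds on the hypothetical `flags` and `svy` objects from the earlier sketch; the location, phone, and issue columns, the respondent ID, and the corrected value are all invented for illustration.

```r
# Export flagged submissions for field verification (base R sketch; builds on the
# `flags` data frame from the HFC sketch above; column names are hypothetical).
flags$issue <- "outlier, duplicate ID, or skip-pattern violation"  # short note for the field team
to_field <- flags[, c("resp_id", "location", "phone", "issue")]
write.csv(to_field, file = paste0("hfc_flags_", Sys.Date(), ".csv"), row.names = FALSE)

# Corrections live in a separate script; the raw file is never edited directly.
svy_clean <- svy
svy_clean$income[svy_clean$resp_id == "H-1042"] <- 1500  # hypothetical fix, verified with the respondent
```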

On an ongoing (i.e., weekly or monthly) basis, the RA should maintain the HFC code (e.g., making necessary adjustments). Changes to the HFC code should be made if you modify the survey (e.g., adding a response that was commonly given as an “Other- please specify” to the set of options). As more data is collected, you may be able to perform additional tests, such as comparing surveyors in one district to surveyors in another, or comparing responses recorded by the same surveyor in different districts. You may want to modify the code to include these tests as time goes on. Discuss with your PIs how often modifications should be made to the HFC code.

Back-checks (BCs)

A back-check is when previously interviewed respondents are re-interviewed by a new enumerator using a shortened version of the original survey. The responses to the back-check survey are then compared to the respondent’s original responses to detect discrepancies. Back-checks are used for three main purposes: i) to hold surveyors accountable by verifying surveys are actually occurring, ii) to assess how well surveyors are administering the survey, and iii) to gauge the reliability of a survey measure by seeing how respondents’ answers change between the main and back-check surveys.

An important limitation to back-checks, however, is that it is sometimes difficult to distinguish between these three explanations (or other potential explanations) for a given discrepancy.

Selecting variables for the back-check survey

Variables to be included fall into three distinct categories, defined below. For each question (or variable) included in the back-check survey, you will need to determine the range of acceptable deviation. For example, you might decide that consumption could vary by as much as 10% from one survey to the next, while some variables (e.g., age, gender) should not vary within the timeframe of your survey.

  1. Type 1 variables check whether the surveyors are a) performing the interview and b) with the right respondent. These are questions whose answers should never change, regardless of the interviewer, location, or time of day. Examples include gender, house structure, age (within a certain range), and past events (e.g., marriage, school attendance in the last year).
  2. Type 2 variables assess how well the surveyors are administering the survey. The responses to these questions are unlikely to change, but they are questions where the team will be tempted to cut corners. These may have been difficult for surveyors to understand or to administer due to complexity or sensitivity, including categorization questions (i.e., the surveyor categorizes the respondent’s answer), questions with a lot of examples, and skip questions (i.e., questions which if answered a certain way would shorten the survey). 
  3. Type 3 variables check the stability of your measures on key outcomes. They should include key outcomes, stratification variables, and other variables that are integral to understanding the intervention. These may or may not change over time. Examples of variables to include are income, consumption, quantities of inputs or goods, labor supply, or plot size, plot yield, etc.

Implementing back-checks

Once you have your list of back-check questions, follow standard survey procedures and have your back-check team administer it. This team should not be the same team conducting the original survey; you may have to hire and train additional staff. As such, back-checking surveys can carry a high cost. One money-saving alternative can be to record telephone numbers of respondents so that surveyors can call respondents instead of traveling to their locations. At the very least, the enumerator conducting the back-check should not be the same enumerator who conducted the original interview.

After the back-check surveys are complete, compare the responses in the original survey to the responses in the back-check survey. This can be done through a custom do-file (J-PAL staff and affiliates: see J-PAL’s template) or tools like IPA’s user-written commands. Responses that vary significantly between the two surveys (as defined above) should be flagged as an error. SurveyCTO has tools for conducting back-checks within the Monitor tab.
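A minimal comparison sketch in R is below. It assumes the original and back-check data sit in two CSV files that share resp_id; the variable names, the 10% tolerance, and the assumption that the original enumerator’s ID (enum_id) appears only in the original file are all placeholders to adapt to your own survey.

```r
# Back-check comparison sketch (base R); file names, column names, and tolerances are hypothetical.
orig <- read.csv("original_survey.csv", stringsAsFactors = FALSE)
bc   <- read.csv("back_check_survey.csv", stringsAsFactors = FALSE)
m    <- merge(orig, bc, by = "resp_id", suffixes = c("_orig", "_bc"))

# Type 1 variable: should never change between surveys
err_gender <- m$gender_orig != m$gender_bc

# Type 3 variable: allow, say, a 10% deviation before flagging
err_consump <- abs(m$consumption_bc - m$consumption_orig) > 0.10 * abs(m$consumption_orig)

# Error rates overall and by the original enumerator (enum_id is assumed to
# appear only in the original file, so merge() leaves its name unchanged)
mean(err_gender, na.rm = TRUE)
tapply(err_gender, m$enum_id, mean, na.rm = TRUE)
```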

J-PAL’s Research Protocols encourage research teams to back-check at least 10% of respondents, as a best practice. Each enumerator should have at least one of their respondents back-checked, and any differences should be well-documented and reconciled.

Using the results of back-checks

  • Analyzing type 1 variables, you should look at the overall error rate. If it is higher than 10%, this is a red flag that there may be systemic problems in the questionnaire or its administration, or that surveyors are fabricating data. Furthermore, you should examine error rates by surveyor and by question. If you have a large survey, you might consider looking at error rates by team and location. If errors are found, you may want to modify the problematic questions, retrain surveyors, or even let some survey staff go if they continue to produce high error rates after retraining.
  • The analysis of type 2 variables looks similar to the analysis of type 1. Consider the error rate both overall and broken down by surveyor and by question. Error rates above 10% on these questions should prompt conversations with your leadership team. If errors are found, it is advisable to re-train enumerators, meet with survey teams to review survey protocols, and edit the questionnaire (with explicit permission from PIs).
  • To analyze type 3 variables, examine the overall error rates by question and perform stability checks (e.g., a t-test; see the sketch below) on these variables to see whether there are statistically significant differences between the original and back-check data. If you find high rates of errors in type 3 variables, you should discuss them with your PI.
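The stability check mentioned for type 3 variables can be as simple as a paired t-test on the merged data frame from the back-check sketch above; the consumption columns and the flag vectors are again hypothetical names carried over from that sketch.

```r
# Paired t-test for a type 3 variable, using the merged data frame `m`
# from the back-check comparison sketch (column names hypothetical).
t.test(m$consumption_orig, m$consumption_bc, paired = TRUE)

# Overall error rate by question can be tabulated the same way, e.g.:
colMeans(cbind(gender = err_gender, consumption = err_consump), na.rm = TRUE)
```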

Spot-checks (SCs)

Spot-checks are when research staff observe surveyors conducting interviews. These are usually conducted by higher-level members of the research staff, such as the Research Manager, Research Assistant, Field Coordinator, or senior surveyors. According to J-PAL’s Research Protocols, it is a suggested best practice that 15% of surveys are spot-checked. One approach is to check a higher percentage of surveys at the beginning of data collection to catch errors early, then decrease the percentage checked over time (Robert 2019).

What should you look for in spot-checks?

The goals of spot-checks are:

  • To confirm surveys are happening
  • To observe the quality of the surveys and surveyors. Areas to focus on include:
    • Do participants seem to understand the survey?
    • Do surveyors seem to understand the survey?
    • Does the survey take too long?

Implementing spot-checks

Plan your spot-checks so that they are at least unpredictable (if not random) to the enumerators. You want to observe enumerators doing surveys as they would in the absence of observation. Therefore, surveyors should not know ahead of time which surveys will be observed. Upon arriving at the survey, enumerators should be asked if they are comfortable being observed. If enumerators are uncomfortable, you should consider why this is the case (e.g., are they concerned that they will be fired for poor surveying?).

Next, all observers must be introduced to the respondent:

  • It is best to introduce them in a general, non-threatening way so that you don't make the respondent nervous about the extra scrutiny.
  • International observers can be particularly disruptive, so minimize the number of spot-checks that include international observers.
  • Generally, you should be concerned about Hawthorne effects during spot-checks. To minimize the risk of this, enumerators should be familiar with the research staff conducting the spot-check. 
  • Observers should fill out a spot-check form during the interview (J-PAL staff and affiliates: see an example spot-check form). Spot-check forms should include a rating of the enumerator, flagged areas for follow up (e.g., rewording of a question, etc), and any notes about the interview.

Finally, the data from the spot-check forms should not be accessible by the enumerators.

Using the results of spot-checks

Spot-check data can be used to test for enumerator effects: since the form includes a question rating the enumerator’s quality, you can see whether responses differ based on how the enumerator is rated. You may also need to retrain enumerators who consistently receive low quality ratings. Spot-checks also allow research teams to directly observe how respondents answer questions. Questions that cause respondents to become upset, uncomfortable, or confused should be reworked to avoid this.

Last updated July 2020.

These resources are a collaborative effort. If you notice a bug or have a suggestion for additional content, please fill out this form.

Acknowledgments

We thank Maya Duru and Jack Cavanagh for helpful comments. Any errors are our own.

Additional Resources
  1. IPA User-written command: bcstats

  2. IPA User-written command: ipacheck

  3. J-PAL HFC exercises (J-PAL internal resource)

  4. J-PAL Research Protocol Checklist 

  5. J-PAL Template Back-check do-file (J-PAL internal resource)

  6. J-PAL Template HFC do-file and R script (J-PAL internal resource)

  7. J-PAL Template monitoring form (J-PAL internal resource)

  8. SurveyCTO: Survey design for data quality

Iwig, William, Michael Berning, Paul Marck, and Mark Prell. 2013. “Data Quality Assessment Tool for Administrative Data.” Federal Committee on Statistical Methodology Working Paper 46.

Robert, Christopher. 2019. “Collecting High Quality Data - Accurate Data.” Lecture, delivered as part of the 2T 2019 semester of J-PAL 102x: Designing and Running Randomized Evaluations, Cambridge, MA.
