Take practical steps toward de-identifying and publishing research data with J-PAL’s new guides


We are pleased to announce the publication of two new methods guides on de-identifying and publishing research data. These guides draw on more than a decade of J-PAL’s experience publishing research data from randomized evaluations in the social sciences. They offer practical advice for students, researchers, and anyone else publishing their own or others’ data.

Researchers who plan to publish their data should make every effort to minimize the risk that their study participants can be re-identified, as ethical standards, IRB protocols, and legal regulations commonly require. This is done through a process known as de-identification, in which variables that could be used to identify individuals are either masked through techniques such as aggregation or encoding, or removed from the dataset altogether.
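To make these three techniques concrete, here is a minimal Python sketch under stated assumptions: the records, the field names (`name`, `phone`, `birth_date`, `village`, `income`), and the helper `deidentify` are all hypothetical and are not taken from the J-PAL guides. It removes direct identifiers, encodes village names as arbitrary codes, and aggregates full birth dates to birth year.

```python
import random

# Hypothetical survey records; all field names and values are illustrative assumptions.
records = [
    {"name": "Participant One", "phone": "555-0101", "birth_date": "1988-04-17", "village": "North", "income": 1200},
    {"name": "Participant Two", "phone": "555-0102", "birth_date": "1975-11-02", "village": "South", "income": 950},
    {"name": "Participant Three", "phone": "555-0103", "birth_date": "1990-06-30", "village": "North", "income": 1430},
]

# Direct identifiers (assumed for this example): removed from the dataset altogether.
DIRECT_IDENTIFIERS = {"name", "phone"}


def deidentify(rows, direct_identifiers=DIRECT_IDENTIFIERS):
    """Return (de-identified rows, encoding key). The key must be stored separately and securely."""
    # Encoding: map each village to an arbitrary code. Shuffling first means the
    # codes do not follow the alphabetical order of the original names.
    villages = sorted({row["village"] for row in rows})
    random.SystemRandom().shuffle(villages)
    key = {village: f"V{i:03d}" for i, village in enumerate(villages, start=1)}

    cleaned = []
    for row in rows:
        # Removal: drop every direct identifier.
        new_row = {k: v for k, v in row.items() if k not in direct_identifiers}
        # Aggregation: coarsen the full birth date (YYYY-MM-DD) to the birth year.
        new_row["birth_year"] = int(new_row.pop("birth_date")[:4])
        # Replace the village name with its code.
        new_row["village"] = key[row["village"]]
        cleaned.append(new_row)
    return cleaned, key


public_rows, village_key = deidentify(records)
for row in public_rows:
    print(row)
```

Which variables to remove, encode, or aggregate depends on the dataset and its context; this sketch only illustrates the mechanics of each technique and is not a substitute for the review process the guides describe.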

About the guides

The Guide to Publishing Research Data includes:

  • A list of factors to consider before publishing data, such as what information was provided to study participants and the IRB, the sensitivity of the data collected, and legal requirements
  • Sample consent form language that will allow future publication of de-identified data
  • A checklist for preparing data for publication
  • And more

The accompanying Guide to De-Identifying Data approaches de-identification as a process that reduces the risk of identifying individuals. It includes:

  • An overview of personally identifiable information (PII) and the responsibility of data users not to use data to try to identify human subjects
  • Recommendations for handling direct identifiers (such as full name, social security number, or phone number), as well as indirect identifiers (such as month/year of birth, nationality, or gender)
  • Guidance on de-identification steps to take throughout the research process, such as encrypting all data containing identifying information as soon as possible
  • A list of common identifiers, including those labeled by the United States’ Health Insurance Portability and Accountability Act (HIPAA) guidelines as direct identifiers
  • And more

Why publish de-identified research data

Increasing the availability of research data benefits researchers, policy partners who supported the studies, students who learn from using the data, and, importantly, the people from whom the data was collected. Data sharing can provide many benefits and opportunities to the research community, including:

  • Allowing for re-use of the data by researchers, policymakers, students, and teachers around the world
  • Providing opportunities for new research, such as meta-analyses and studies of the external validity and generalizability of results
  • Enabling the replication and confirmation of published results as well as sensitivity or complementary analyses

J-PAL has been committed to making research more transparent for over a decade and supports the publication of de-identified research data in a digital repository such as J-PAL and IPA’s Datahub for Field Experiments in Economics and Public Policy, the Harvard Institute for Quantitative Social Science Dataverse, the Inter-university Consortium for Political and Social Research at the University of Michigan, or the Yale Institution for Social and Policy Studies Data Archive.

For more on J-PAL’s work on increasing transparency in research, please see here.

Some text in this post is excerpted from the J-PAL Guide to De-Identifying Data and the J-PAL Guide to Publishing Research Data.

Posted by Sarah Kopper, Research Manager, J-PAL Global, Anja Sautmann, Director of Research, Education, and Training, J-PAL, and James Turitto, Senior Research Manager, J-PAL Global