Medicaid and Children’s Health Insurance Program (CHIP) files

Centers for Medicare and Medicaid Services (CMS), Research Data Assistance Center (ResDAC), Chronic Conditions Warehouse (CCW)

These datasets contain enrollment and claims data for persons enrolled in Medicaid and the Children’s Health Insurance Program (CHIP) in all 50 states and DC. 

Unit of Observation:
Individual beneficiary; Medicaid claim
Personally Identifiable Information Available for Linking:
Yes
Geography:
United States of America
Years Available:
Varies by state. MAX files are available for 1999-2013; files for 2013-2015 are available as a combination of MAX and TAF; TAF data are available through 2019. See the File Availability Chart and the RIF Availability Search Engine.
Cost:
Paid
Frequency of Updates:
TAF: Monthly. MAX: Quarterly.
Universe:

Medicaid fee-for-service beneficiaries and eligible children enrolled in CHIP in all 50 states and DC. Limited coverage of beneficiaries in managed care organizations.

Access

Medicaid and CHIP research files are research-friendly administrative files curated by CMS from information that each state's Medicaid and CHIP agencies submit via the Transformed Medicaid Statistical Information System (T-MSIS) and, previously, via the now-decommissioned Medicaid Statistical Information System (MSIS). In their raw form, these data are difficult to use for research because they mix records from different time periods. CMS therefore derives the Medicaid Analytic eXtract (MAX) and the T-MSIS Analytic Files (TAF) from the MSIS and T-MSIS data; these files are organized by calendar year and include both fee-for-service and managed care claims data.

Whether MAX or TAF data are available for a given state depends on the year that state transitioned to reporting through T-MSIS. Most states transitioned between 2013 and 2015, and by 2016 all states were reporting through T-MSIS.
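As a rough planning aid, the choice between MAX and TAF for one state and year can be sketched as a comparison against the year that state began reporting through T-MSIS. This is a minimal sketch under a simplifying assumption: each state has a single transition year, and data for that year may be split between MAX and TAF. The function name and the example transition year are hypothetical; actual coverage should be confirmed with the File Availability Chart.

```python
def analytic_file_for_year(year: int, transition_year: int) -> str:
    """Return the file family likely covering one state's data for a calendar year.

    Assumes `transition_year` is the (hypothetical) year the state began
    reporting through T-MSIS: earlier years fall in MAX, later years in TAF.
    """
    if year < transition_year:
        return "MAX"
    if year > transition_year:
        return "TAF"
    # In the transition year itself, coverage may be split across both files.
    return "MAX and/or TAF"


# Hypothetical state that switched to T-MSIS in 2014.
for y in (2012, 2014, 2016):
    print(y, analytic_file_for_year(y, transition_year=2014))
```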

To access CMS files, researchers must submit data requests to the Research Data Assistance Center (ResDAC) at the University of Minnesota, the intermediary that processes requests for these data. ResDAC then forwards the request materials to CMS for review and approval. Data are available to researchers as Research Identifiable Files (RIF). The following are required to gain access to MAX and TAF files:

  • A formal Data Use Agreement (DUA)
  • A RIF Specifications Worksheet detailing the data requested
  • A RIF Study Protocol describing the objectives, background, methods, and importance of the proposed research
  • A RIF request letter
  • An executive summary
  • Approval from the CMS Privacy Board (IRB Common Rule and HIPAA Waiver Approval)

ResDAC's Request Material Tool gives more detail on the documentation a specific study requires.

Researchers have historically accessed CMS data through a Physical Research Data Request process, in which data are mailed to users on external media. Alternatively, researchers can access Medicaid data through the CMS Virtual Research Data Center (VRDC), hosted by the Chronic Conditions Warehouse (CCW), the CMS data distributor.

Researchers working in the VRDC have direct access to approved data files and conduct their analysis within the CMS secure environment. Users must go through Remote Identity Proofing (RIDP) prior to obtaining a CCW User ID to access the VRDC.

According to the CCW, the processes for accessing data through a Physical Research Data Request and through the VRDC are very similar. The CCW website details the Physical Research Data Request process, including all necessary forms and tips for completing them, as well as the process for accessing records through the VRDC.

Timeline for Access

ResDAC recommends planning a minimum of 4 to 5 months between a draft application for data and receipt of the data. Privacy Board approval may take from 6 to 8 weeks, and data processing may last 4-6 weeks. For more details, visit the ResDAC CMS Timeline.
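Because the recommendation is a minimum, the window below is the earliest point at which data might plausibly arrive, not a forecast. A small sketch that turns a draft-application date into that window; `add_months` and `earliest_receipt_window` are hypothetical helper names, and the 4- and 5-month defaults come from the ResDAC recommendation above.

```python
import calendar
from datetime import date


def add_months(d: date, months: int) -> date:
    """Shift a date by whole calendar months, clamping to the last day of the target month."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))


def earliest_receipt_window(draft_date: date, min_months: int = 4, max_months: int = 5) -> tuple[date, date]:
    """Earliest planning window for data receipt, using the 4-to-5-month minimum."""
    return add_months(draft_date, min_months), add_months(draft_date, max_months)


print(earliest_receipt_window(date(2024, 1, 31)))
```

Clamping to the month's last day keeps a January 31 draft date from producing an invalid date such as June 31.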

Physical Research Data Request: Data must be destroyed upon reaching the expiration date of the DUA. Researchers must request an extension to continue working with the data beyond that time.

Virtual Research Data Center Request: Project data are stored within the VRDC for three years. This covers only analysis files, not the raw data files. The analysis files are retained so researchers can access them if they need to respond to questions during journal publication. However, researchers lose access to the source files once their license expires.
 

Lag Time

MSIS data were reported quarterly, and seven fiscal-year quarters were needed to create one calendar year of MAX data. T-MSIS, which replaced MSIS and is used to produce TAF, is reported monthly. As of June 2022, data are available through 2015 for MAX and through 2019 for TAF.
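To see how seven quarters can map onto one calendar year, the sketch below assumes they are the four federal fiscal-year quarters spanning January through December plus three subsequent quarters of run-out for late claims and adjustments. That decomposition is an assumption for illustration, not stated on this page. The federal fiscal year runs October through September and is named for the calendar year in which it ends; the function names are hypothetical.

```python
def fiscal_quarter(year: int, month: int) -> tuple[int, int]:
    """Map a calendar month to (federal fiscal year, quarter); FY runs October-September."""
    fiscal_year = year + 1 if month >= 10 else year
    quarter = ((month - 10) % 12) // 3 + 1
    return fiscal_year, quarter


def max_source_quarters(calendar_year: int, run_out_quarters: int = 3) -> list[tuple[int, int]]:
    """List the fiscal quarters assumed to feed one calendar year of MAX data."""
    # The four quarters that begin in January, April, July, and October.
    quarters = [fiscal_quarter(calendar_year, month) for month in (1, 4, 7, 10)]
    fiscal_year, quarter = quarters[-1]
    # Assumed run-out: the quarters immediately following the calendar year.
    for _ in range(run_out_quarters):
        if quarter < 4:
            quarter += 1
        else:
            fiscal_year, quarter = fiscal_year + 1, 1
        quarters.append((fiscal_year, quarter))
    return quarters


print(max_source_quarters(2012))
```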

Cost

For Physical Research Data Requests, CMS generally bases its charges on the number of beneficiaries in the requested cohort, the number of files requested, frequency, whether a finder file is needed, and file update status. Use the cost estimator tool to generate an estimate specific to cohort size and data requested; the ResDAC Assistance Desk can provide a formal cost estimate once a completed specifications worksheet is submitted to it by email. See CMS Fee Information for complete descriptions of fee types and more information.

For a data set (i.e., one file for one year of data) covering up to 1 million beneficiaries, the fee is estimated at about $2,000 for MAX and between $3,500 and $4,000 for TAF. The total fee also includes a finder file fee. A finder file identifies the people the researcher wants to include in the study; researchers can also supply their own finder file by submitting encrypted personal identifiers. For more detailed information, see the Finder File Encryption Policy.
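The per-data-set figures lend themselves to a back-of-the-envelope calculation. A minimal sketch, assuming the fee scales linearly with the number of data sets and ignoring the other pricing factors listed above. The function and constant names are hypothetical, the finder file fee is left as an input because no amount is given here, and the cost estimator tool or a formal ResDAC estimate should take precedence.

```python
MAX_FEE_PER_DATA_SET = 2_000            # approximate, from the text above
TAF_FEE_PER_DATA_SET = (3_500, 4_000)   # low and high, from the text above
MAX_BENEFICIARIES_COVERED = 1_000_000


def estimate_data_set_fees(n_files, n_years, fee_range, n_beneficiaries, finder_file_fee=0):
    """Return a (low, high) fee estimate for a Physical Research Data Request.

    A data set is one file for one year of data. `finder_file_fee` is a
    placeholder because this page gives no amount for it.
    """
    if n_beneficiaries > MAX_BENEFICIARIES_COVERED:
        raise ValueError("per-data-set estimates only cover cohorts up to 1 million beneficiaries")
    low, high = fee_range
    n_data_sets = n_files * n_years
    return n_data_sets * low + finder_file_fee, n_data_sets * high + finder_file_fee


# Three TAF files for two years, 250,000 beneficiaries, finder file fee not yet known.
print(estimate_data_set_fees(3, 2, TAF_FEE_PER_DATA_SET, 250_000))
# MAX has a single point estimate, so pass it as both ends of the range.
print(estimate_data_set_fees(5, 1, (MAX_FEE_PER_DATA_SET, MAX_FEE_PER_DATA_SET), 400_000))
```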

For VRDC requests, there is a seat access fee, a project fee, and a space/usage cost. Seats must be renewed every year, and the project fee is also an annual charge. A formal cost estimate is recommended for an accurate picture of VRDC fees. The standard seat access fee is $25,000 per user (seat). Researchers are also subject to a data-specific one-time project fee starting at $15,000. Requested files that exceed 500 GB of space incur an additional $2,000 fee. Access can be renewed or terminated on a quarterly or annual basis ($25,000 annual renewal fee, $6,250 quarterly renewal fee).
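The VRDC figures can be combined the same way. A minimal sketch under stated assumptions: renewal fees are charged per seat, the project fee is charged once at setup at its $15,000 minimum (add it to each renewal year if your project is billed annually), and the surcharge applies only above 500 GB. All names are hypothetical; a formal cost estimate should take precedence.

```python
SEAT_FEE = 25_000              # per user (seat)
PROJECT_FEE_MINIMUM = 15_000   # data-specific; the actual amount may be higher
LARGE_DATA_FEE = 2_000         # charged when requested files exceed 500 GB
LARGE_DATA_THRESHOLD_GB = 500
ANNUAL_RENEWAL_FEE = 25_000
QUARTERLY_RENEWAL_FEE = 6_250


def vrdc_initial_cost(n_seats: int, data_size_gb: float, project_fee: int = PROJECT_FEE_MINIMUM) -> int:
    """First-year VRDC cost: seats, the project fee, and the large-data surcharge if it applies."""
    cost = n_seats * SEAT_FEE + project_fee
    if data_size_gb > LARGE_DATA_THRESHOLD_GB:
        cost += LARGE_DATA_FEE
    return cost


def vrdc_renewal_cost(n_seats: int, quarters: int) -> int:
    """Renewal cost for `quarters` additional quarters, assuming fees are charged per seat."""
    full_years, extra_quarters = divmod(quarters, 4)
    per_seat = full_years * ANNUAL_RENEWAL_FEE + extra_quarters * QUARTERLY_RENEWAL_FEE
    return n_seats * per_seat


# Two seats with 600 GB of requested data, then renewal for six more quarters.
print(vrdc_initial_cost(2, 600))
print(vrdc_renewal_cost(2, 6))
```

Since four quarterly renewals equal one annual renewal at these rates, the split between the two only matters for partial years.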
 

Linking

According to ResDAC, MAX and TAF files are available to researchers by custom cohort (e.g., a list of predefined study participants, or all beneficiaries in state X with claim type Y). For linking, the researcher needs to submit a finder file containing identifiers for the cohort. The Finder File Encryption Policy describes the shipping address and requirements for finder file submission, identifier options for finder files and crosswalks, and required formats for submitted finder files. According to CCW's Finder File Encryption Policy, finder files must use identifiers of the following types:

Identifiers Available for Linking

  • Beneficiary IDs received from a previous data shipment from CMS
  • Health Insurance Claim (HIC) numbers
  • Social Security numbers
  • Medicare Beneficiary Identifier numbers
  • RES_ID / State Code - Identifies resident in the national repository
  • Unique Physician Identification Number
  • National Provider Identifiers
  • Employer Identification Number/Tax Identification Number
  • Secondary identifiers: last name*, date of birth, ZIP code, and partial SSN or HIC (*last name is an unreliable search criterion; users should expect a lower match rate when using last name to link records)
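Before submitting a finder file, it can help to check how many records carry a primary identifier versus only the secondary set. A minimal pre-submission sketch; the column names, the nine-digit SSN check, and the rule that a secondary match needs all four secondary fields are assumptions. The sketch does not implement the encryption or file formats required by the Finder File Encryption Policy.

```python
import re
from collections import Counter

# Hypothetical column names for a researcher-built finder file.
PRIMARY_FIELDS = ("bene_id", "hic", "mbi", "ssn")
SECONDARY_FIELDS = ("last_name", "dob", "zip", "partial_ssn")


def digits_only(value) -> str:
    """Strip punctuation such as dashes from an identifier."""
    return re.sub(r"\D", "", value or "")


def classify_record(record: dict) -> str:
    """Label a record by the strongest identifier set it carries."""
    for field in PRIMARY_FIELDS:
        value = record.get(field)
        if field == "ssn":
            # Treat an SSN as usable only if it has exactly nine digits.
            if len(digits_only(value)) == 9:
                return "primary"
        elif value:
            return "primary"
    if all(record.get(field) for field in SECONDARY_FIELDS):
        return "secondary"  # relies on last name, so expect a lower match rate
    return "incomplete"


def summarize_finder_file(records) -> Counter:
    """Count records by identifier class."""
    return Counter(classify_record(r) for r in records)


records = [
    {"ssn": "123-45-6789"},
    {"last_name": "Smith", "dob": "1980-02-03", "zip": "60637", "partial_ssn": "6789"},
    {"last_name": "Jones"},
]
print(summarize_finder_file(records))
```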

Linking to Outside Data Sources

According to the RIF Data Use Agreement, researchers must request permission prior to matching CMS data with any external data sources, or with CMS files not listed in the initial DUA. 

Data Contents

Partial List of Variables

Inpatient file (claim file): diagnoses, procedures, discharge status, length of stay, payment amount, and discharge and procedure dates.

Long-Term Care file (claim file): facility type, dates of service, and discharge status; contains diagnosis codes but does not contain procedure codes.

Other Therapy/Other Services file (claim file): diagnosis codes, procedure codes, and date of service.

Personal Summary file (enrollment file): demographic data (e.g., date of birth, gender, race), basis of eligibility, maintenance assistance status, monthly enrollment status, and utilization summary.

Prescription Drug file (claim file): drug code, days' supply, charges and payments, and provider identifier; does not contain diagnosis or procedure codes.
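Analyses often combine a claim file with the Personal Summary enrollment file. A minimal sketch of that step using only the standard library, assuming rows have already been read into dictionaries. The field names (`STATE_CD`, `MSIS_ID`, `EL_SEX_CD`, `MDCD_PYMT_AMT`) are loosely modeled on MAX naming but should be treated as assumptions and checked against the record layouts. The sketch keys on state plus beneficiary ID, assuming that ID is only unique within a state, and keeps every enrollee, including those with no inpatient claims.

```python
from collections import defaultdict


def summarize_inpatient_use(person_summary, inpatient_claims):
    """Left-join per-beneficiary inpatient claim counts and payments onto enrollment rows."""
    counts = defaultdict(int)
    payments = defaultdict(float)
    for claim in inpatient_claims:
        key = (claim["STATE_CD"], claim["MSIS_ID"])
        counts[key] += 1
        payments[key] += float(claim["MDCD_PYMT_AMT"])
    rows = []
    for person in person_summary:
        key = (person["STATE_CD"], person["MSIS_ID"])
        row = dict(person)
        # .get() leaves enrollees without claims at zero without adding keys.
        row["ip_claims"] = counts.get(key, 0)
        row["ip_payment"] = payments.get(key, 0.0)
        rows.append(row)
    return rows


# Hypothetical sample rows; real files have many more fields.
person_summary = [
    {"STATE_CD": "IL", "MSIS_ID": "A1", "EL_SEX_CD": "F"},
    {"STATE_CD": "IL", "MSIS_ID": "B2", "EL_SEX_CD": "M"},
]
inpatient_claims = [
    {"STATE_CD": "IL", "MSIS_ID": "A1", "MDCD_PYMT_AMT": "1200.50"},
    {"STATE_CD": "IL", "MSIS_ID": "A1", "MDCD_PYMT_AMT": "300"},
    # Same ID in another state is treated as a different person.
    {"STATE_CD": "WI", "MSIS_ID": "B2", "MDCD_PYMT_AMT": "800"},
]

for row in summarize_inpatient_use(person_summary, inpatient_claims):
    print(row)
```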
 

J-PAL Randomized Evaluations Using this Data Set

Jacob, Brian A., Max Kapustin, and Jens Ludwig. 2014. “The Impact of Housing Assistance on Child Outcomes: Evidence from a Randomized Housing Lottery.” The Quarterly Journal of Economics 130(1): 465-506. doi: 10.1093/qje/qju030.

Other Research Using this Data Set

Pollack, Craig E., Shawn Du, Amanda L. Blackford, and Bradley Herring. 2019. “Experiment to Decrease Neighborhood Poverty had Limited Effects on Emergency Department Use.” Health Affairs 38(9): 1442-1450. doi: 10.1377/hlthaff.2019.00452.
