SynPUFs are synthetic public use files that let users interested in Medicare claims data become familiar with those data without the large financial investment of purchasing them. They give data analysts and software developers the opportunity to develop programs and products using the same formats and variable names that appear in the actual CMS data files. After working with these synthetic files, users should be much better informed about which CMS data products they would need to acquire to meet their analytic needs.
We see the files being used to:
- Allow data entrepreneurs to develop software and applications that may eventually be applied to actual CMS claims data;
- Train researchers in the use and complexity of analyses with CMS claims data before they begin the process of obtaining access to actual CMS data; and,
- Support safe data mining innovations that may reveal unanticipated knowledge gains while preserving beneficiary privacy.
Because of the synthetic processes used to create them, these files have very limited inferential value for drawing conclusions about Medicare beneficiaries. They do, however, make realistic Medicare claims data files available more quickly and at lower cost, which should spur the innovation needed to achieve the goals of better care for beneficiaries and improved health of the population.
The first SynPUF available is the Data Entrepreneur's Synthetic PUF (DE-SynPUF). The DE-SynPUF is built from a 5 percent random sample of Medicare beneficiaries in 2008 and their claims from 2008 through 2010. It contains five types of data: Beneficiary Summary, Inpatient Claims, Outpatient Claims, Carrier Claims, and Prescription Drug Events. Each file type contains the same variables in every year.
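As a rough illustration of how an analyst might begin working with files organized this way, the sketch below counts claims per beneficiary across two claim types. It assumes that the beneficiary file and the claims files share a beneficiary identifier column. The column names (`BENE_ID`, `CLAIM_ID`, `YEAR`) and the tiny inline extracts are hypothetical placeholders, not the documented DE-SynPUF variable names or real records; consult the file documentation for the actual layout.

```python
import csv
import io
from collections import Counter

# Hypothetical miniature extracts standing in for the real files.
# BENE_ID, CLAIM_ID, and YEAR are placeholder column names (assumptions),
# and the rows are invented for illustration only.
BENEFICIARY_CSV = """BENE_ID,BIRTH_YEAR
A001,1936
A002,1941
A003,1929
"""

CLAIMS_CSV = {
    "inpatient": """BENE_ID,CLAIM_ID,YEAR
A001,I-1,2008
A003,I-2,2010
""",
    "outpatient": """BENE_ID,CLAIM_ID,YEAR
A001,O-1,2008
A001,O-2,2009
A002,O-3,2010
""",
}


def read_rows(text):
    """Parse CSV text into a list of dictionaries keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def claim_counts(beneficiary_csv, claims_csv_by_type):
    """Count claims of each type for every beneficiary in the summary file."""
    beneficiaries = {row["BENE_ID"] for row in read_rows(beneficiary_csv)}
    counts = {bene_id: Counter() for bene_id in beneficiaries}
    for claim_type, text in claims_csv_by_type.items():
        for row in read_rows(text):
            bene_id = row["BENE_ID"]
            # Skip claims whose beneficiary is not in the summary file.
            if bene_id in counts:
                counts[bene_id][claim_type] += 1
    return counts


counts = claim_counts(BENEFICIARY_CSV, CLAIMS_CSV)
for bene_id in sorted(counts):
    print(bene_id, dict(counts[bene_id]))
```

Because each file type carries the same variables in every year, the same parsing logic could in principle be reused across the 2008 through 2010 data without per-year adjustments.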