Inactive
Notice ID: 2032H8-24-R-00003
The Internal Revenue Service (IRS) Statistics of Income Division (SOI) has a continuing need to create and test a new fully synthetic Public Use File (PUF), train SOI data science staff on applying the techniques necessary to create a PUF, develop methodology for improving synthetic datasets in general, and work with IRS IT and data security experts to evaluate security plans as part of a proposed validation process. The goal of this project is to continue expanding two innovative statistical methodologies, high-quality synthetic tax data and a validation process, both designed to allow and expand research access to administrative tax data while protecting privacy. These are both cutting-edge approaches. Over the last seven years, the new methods have been adapted to the large scale and complexity of tax datasets, and ways have been found to improve quality while protecting privacy. Using these methods, a synthetic supplemental PUF was created containing information about people who did not file income tax returns or have an obligation to file. Those data had never been publicly available before, but a synthetic version was produced that was safe to release. Together, the PUF and the supplemental file provide a more complete picture of the range of households in the US, including those with low incomes. A synthetic version of the 2012 PUF was created, and a 2013 synthetic PUF is currently being tested. The evaluation of the 2012 synthetic PUF and the methodological contributions embodied in it were the subject of articles in the National Tax Journal and statistical journals.