ISR Awards

Synthetic Data Generation for Small Area Estimation

Government agencies face increasing demand for publicly available datasets covering small geographic areas. The Census Bureau regularly collects information from small geographic areas and is therefore uniquely positioned to meet this demand. However, the Bureau cannot release small area identifiers because data containing them do not satisfy disclosure restrictions (Census Dissertation Fellowship RFP Topic I). Current disclosure avoidance practices either suppress geographic detail or make the data available only through Research Data Centers (RDCs), and neither approach fully meets the demand for publicly available small area data. Synthetic data generation is an alternative that may permit the release of data products with enough geographic detail to estimate small area statistics. A further benefit of the synthetic data approach is that it can incorporate complex sample design features and increase analytic sample sizes to the point where data users can apply unweighted direct estimation methods for small area estimation (Census Dissertation Fellowship RFP Topic F). I propose to test a new method for generating synthetic microdata that may be publicly released with small area identifiers to permit small area estimation. Throughout this proposal, I define "small areas" as counties, though the proposed framework can be extended to lower levels of geography (e.g., Census tracts, block groups) or to other domains of interest. This research has implications for both data confidentiality and data utility, and it may yield public-use data products, such as microdata files and/or tables, with more detailed categories than are currently released for small geographic areas and other domains of interest.
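To make "unweighted direct estimation" concrete, the following is a minimal Python sketch, not the proposed synthesis method. It assumes the released synthetic microdata are a list of hypothetical (county code, outcome) records and computes, for each county, the simple sample mean and its naive standard error, with no survey weights. The function name `direct_county_means` and the records are illustrative only. The standard error here ignores the extra variability introduced by synthesis; valid inference from synthetic data typically requires multiple synthetic copies and appropriate combining rules.

```python
from collections import defaultdict
from math import sqrt


def direct_county_means(records):
    """Unweighted direct estimates (hypothetical helper).

    For each county, return (mean, standard error, n) of the outcome over the
    synthetic records in that county. No survey weights are used, and the
    standard error does not account for synthesis variability.
    """
    groups = defaultdict(list)
    for county, y in records:
        groups[county].append(y)

    estimates = {}
    for county, ys in groups.items():
        n = len(ys)
        mean = sum(ys) / n
        if n > 1:
            # Sample variance with n - 1 denominator, then SE of the mean.
            var = sum((y - mean) ** 2 for y in ys) / (n - 1)
            se = sqrt(var / n)
        else:
            se = float("nan")  # SE undefined for a single record
        estimates[county] = (mean, se, n)
    return estimates


# Hypothetical synthetic records: (county FIPS code, outcome value)
synthetic = [("01001", 52.0), ("01001", 48.0), ("01001", 50.0),
             ("01003", 61.0), ("01003", 59.0)]

for county, (mean, se, n) in sorted(direct_county_means(synthetic).items()):
    print(f"{county}: mean={mean:.1f}, se={se:.2f}, n={n}")
```

With these made-up records, the sketch prints a mean of 50.0 (n=3) for county 01001 and 60.0 (n=2) for county 01003. The point of the design is that, if synthesis yields enough records per county, such simple per-county summaries could stand in for design-weighted estimators.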