September 19, 2016

Michigan Program in Survey Methodology Graduate Hanzhi Zhou and Her Colleagues Publish New Techniques for Imputing Missing Values in Complex Sample Survey Data Sets

By Brady West

Working with her SRC colleagues Michael Elliott and Trivellore Raghunathan, Hanzhi Zhou, a 2014 PhD graduate of the Michigan Program in Survey Methodology (MPSM), recently published three articles from her dissertation. They give survey statisticians important new techniques for imputing missing data in complex sample survey data sets: “A Two-Step Semiparametric Method to Accommodate Sampling Weights in Multiple Imputation” in Biometrics, “Synthetic Multiple-Imputation Procedure for Multistage Complex Samples” in the Journal of Official Statistics (JOS), and “Multiple Imputation in Two-Stage Cluster Samples using the Weighted Finite Population Bayesian Bootstrap” in the Journal of Survey Statistics and Methodology (JSSAM).

All three articles fall under the general theme of Hanzhi’s dissertation, which focused on innovative techniques for imputing missing values in complex sample survey data. Before these articles appeared, the common guidance was to include the complex sample design features (stratum codes, cluster codes, and sampling weights) as predictor variables in all imputation models. This raised questions about appropriate model specification, however, and several researchers pointed to the need for more principled techniques that better reflect the finite population sampling that gave rise to the survey data. These articles present variations on the basic idea of “uncomplexing” complex sample survey data for imputation purposes: simulating finite populations of interest from the collected survey data (including missing values), and then imputing the missing values in the simulated finite population data sets under simple random sampling assumptions.
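For readers who want a concrete picture of the “uncomplexing” idea, here is a minimal Python sketch. It is not the authors’ code. It assumes a weighted Polya-urn draw in the spirit of the weighted finite population Bayesian bootstrap named in the JSSAM title, with selection probabilities that should be checked against the articles, and it uses a simple random hot deck as a stand-in for a real imputation model. The function names `weighted_polya_population` and `hot_deck_impute` and the toy data are hypothetical. The sketch handles weights only, generates a single synthetic population, and omits the rules for combining estimates across multiple synthetic populations.

```python
import numpy as np

def weighted_polya_population(weights, rng):
    """Indices of sampled units forming one synthetic population of size N.

    Assumed scheme: weights are >= 1 and sum to (roughly) the population
    size N; the N - n unobserved units are drawn from a weighted Polya urn.
    """
    w = np.asarray(weights, dtype=float)
    if np.any(w < 1.0):
        raise ValueError("this sketch assumes every weight is at least 1")
    n = w.size
    N = int(round(w.sum()))
    extra = N - n                      # units to add beyond the sample
    counts = np.zeros(n)               # times each unit has been re-drawn
    draws = np.empty(extra, dtype=int)
    step = extra / n                   # urn increment after each draw
    for k in range(extra):
        probs = w - 1.0 + counts * step
        probs = probs / probs.sum()    # normalize to guard against rounding
        i = rng.choice(n, p=probs)
        counts[i] += 1
        draws[k] = i
    # The observed sample itself is always part of the synthetic population.
    return np.concatenate([np.arange(n), draws])

def hot_deck_impute(y, rng):
    """Fill NaNs by sampling observed values with equal probability."""
    y = np.asarray(y, dtype=float).copy()
    missing = np.isnan(y)
    donors = y[~missing]
    y[missing] = rng.choice(donors, size=int(missing.sum()))
    return y

# Toy data (hypothetical): four sampled units, one with a missing value.
rng = np.random.default_rng(0)
y = np.array([2.0, np.nan, 5.0, 7.0])
weights = np.array([10, 5, 3, 2])

idx = weighted_polya_population(weights, rng)   # weights "undone" here
population_y = hot_deck_impute(y[idx], rng)     # SRS-style imputation
```

The point of the design is that the weights are used only when building the synthetic population; the imputation step then treats that population as if it were a simple random sample, so the weights never need to enter the imputation model as predictors.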

In the Biometrics article, the authors focus on incorporating survey weights into multiple imputation inference; weights are arguably the most difficult design element to incorporate. In the JSSAM article, the authors extend their methodology to account for clustering, and they use simulation studies and analyses of real data to demonstrate that the proposed methodology greatly improves survey estimates relative to imputations that ignore the complex sampling features entirely. The authors also note that the methods are straightforward to implement using existing software. In the JOS article, the authors extend the approach to general settings that incorporate weights, clusters, and stratification. They examine the prior approaches used for imputation in these settings, noting the inefficiencies these approaches introduce in complex sample designs with many first-stage strata and the uncertainty involved in modeling the relationships between sampling weights and key survey variables. The authors once again use simulation studies and analyses of real survey data to demonstrate the advantages of their proposed approach relative to approaches that incorporate sampling features in the imputation models. They also provide R code for implementing their proposed methodology.

This work has important practical implications for survey statisticians charged with imputing missing values in complex sample survey data. The authors also note several directions for future research, including adjustment for unit nonresponse. In sum, the authors have developed a novel framework for dealing with item-missing data in surveys, one that should lead to many interesting multiple imputation approaches for complex sample survey data in the near future.