Publications

The Importance of Modeling the Sampling Design in Multiple Imputation for Missing Data

The theory of multiple imputation for missing data requires that imputations be made conditional on the sampling design. However, most standard software packages for performing model-based multiple imputation assume simple random samples, leading many practitioners not to account for complex sample design features, such as stratification and clustering, in their imputations. Theory predicts that analyses of such multiply-imputed data sets can yield biased estimates from the design-based perspective. In this article, we illustrate through simulation that (i) the bias can be severe when the design features are related to the survey variables of interest, and (ii) the bias can be reduced by controlling for the design features in the imputation models. The simulations also illustrate that conditioning on irrelevant design features in the imputation models can yield conservative inferences, provided that the models include other relevant predictors. These results suggest a prescription for imputers: the safest course of action is to include design variables in the specification of imputation models. Using real data, we demonstrate a simple approach for incorporating complex design features that can be used with some of the standard software packages for creating multiple imputations.
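To make points (i) and (ii) concrete, the following is a minimal sketch, not the simulation design used in the article. It assumes a hypothetical two-stratum population in which the smaller stratum is oversampled and the survey variable differs sharply between strata, with values missing completely at random. All names and numbers (`simulate`, the stratum sizes and means, the 30% missingness rate) are illustrative assumptions. For brevity the imputations are drawn from a normal distribution with plug-in estimates of the mean and standard deviation, which is a simplification of proper multiple imputation (no posterior draw of the parameters).

```python
import numpy as np


def simulate(seed=0, m=5):
    """Compare weighted means after imputing with and without stratum information.

    Returns (true_mean, estimate_ignoring_strata, estimate_conditioning_on_strata).
    All population settings below are hypothetical.
    """
    rng = np.random.default_rng(seed)

    pop_sizes = np.array([9000, 1000])       # assumed stratum population sizes
    stratum_means = np.array([0.0, 10.0])    # survey variable depends on stratum
    n_per_stratum = 500                      # equal allocation oversamples stratum 1
    true_mean = float(np.sum(pop_sizes * stratum_means) / pop_sizes.sum())

    stratum = np.repeat([0, 1], n_per_stratum)
    y = rng.normal(stratum_means[stratum], 1.0)
    weights = (pop_sizes / n_per_stratum)[stratum]   # design weights N_h / n_h
    missing = rng.random(y.size) < 0.3               # missing completely at random

    def impute(condition_on_stratum):
        if condition_on_stratum:
            groups = [stratum == 0, stratum == 1]    # one imputation model per stratum
        else:
            groups = [np.ones(y.size, dtype=bool)]   # one model, design ignored
        estimates = []
        for _ in range(m):
            completed = y.copy()
            for g in groups:
                observed = y[g & ~missing]
                fill = g & missing
                # Simplified imputation: plug-in normal draws from the observed values.
                completed[fill] = rng.normal(
                    observed.mean(), observed.std(ddof=1), int(fill.sum())
                )
            # Design-based (weighted) estimate of the population mean.
            estimates.append(np.average(completed, weights=weights))
        return float(np.mean(estimates))   # average over the m completed data sets

    return true_mean, impute(False), impute(True)


if __name__ == "__main__":
    truth, ignoring, conditioning = simulate()
    print(f"true mean:                 {truth:.2f}")
    print(f"imputation ignoring strata: {ignoring:.2f}")
    print(f"imputation using strata:    {conditioning:.2f}")
```

In this setup the imputation model that ignores the strata pulls missing values toward the unweighted respondent mean, so the weighted estimate drifts away from the population mean, while the model fitted separately within each stratum stays close to it.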

The Importance of Modeling the Sampling Design in Multiple Imputation for Missing Data. Available from: https://www.researchgate.net/publication/237835785_The_Importance_of_Modeling_the_Sampling_Design_in_Multiple_Imputation_for_Missing_Data [accessed Oct 2, 2017].