Publications

A comparative study of doubly robust estimators of the mean with missing data

Doubly robust (DR) estimators of the mean with missing data are compared. An estimator is DR if it remains consistent when either the regression of the missing variable on the observed variables or the missing-data mechanism is correctly specified. One method is to include the inverse of the propensity score as a linear term in the imputation model [D. Firth and K.E. Bennett, Robust models in probability sampling, J. R. Statist. Soc. Ser. B. 60 (1998), pp. 3–21; D.O. Scharfstein, A. Rotnitzky, and J.M. Robins, Adjusting for nonignorable drop-out using semiparametric nonresponse models (with discussion), J. Am. Statist. Assoc. 94 (1999), pp. 1096–1146; H. Bang and J.M. Robins, Doubly robust estimation in missing data and causal inference models, Biometrics 61 (2005), pp. 962–972]. Another method is to calibrate the predictions from a parametric model by adding a mean of the weighted residuals [J.M. Robins, A. Rotnitzky, and L.P. Zhao, Estimation of regression coefficients when some regressors are not always observed, J. Am. Statist. Assoc. 89 (1994), pp. 846–866; D.O. Scharfstein, A. Rotnitzky, and J.M. Robins, Adjusting for nonignorable drop-out using semiparametric nonresponse models (with discussion), J. Am. Statist. Assoc. 94 (1999), pp. 1096–1146]. The penalized spline propensity prediction (PSPP) method incorporates the propensity score in the model non-parametrically [R.J.A. Little and H. An, Robust likelihood-based analysis of multivariate data with missing values, Statist. Sin. 14 (2004), pp. 949–968; G. Zhang and R.J. Little, Extensions of the penalized spline propensity prediction method of imputation, Biometrics 65(3) (2008), pp. 911–918]. All these methods have consistency properties under misspecification of regression models, but their comparative efficiency and confidence coverage in finite samples have received little attention.
In this paper, we compare the root mean square error (RMSE), confidence interval width, and non-coverage rate of these methods under various mean and response propensity functions. We study the effects of sample size and robustness to model misspecification. The PSPP method yields estimates with smaller RMSE and narrower confidence intervals than the other methods in most situations. It also yields confidence coverage close to the 95% nominal level, provided the sample size is not too small.
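
As a rough illustration of two of the DR constructions described in the abstract, the sketch below computes (a) the calibration estimator, which adds the mean of the inverse-propensity-weighted residuals to the mean of the regression predictions, and (b) the estimator that includes the inverse propensity score as a linear term in the imputation model. It is not the paper's code: the function names, the simulated data, the linear outcome model, and the Newton-Raphson logistic fit are all assumptions made for this example, and the PSPP method is not shown.

```python
import numpy as np


def fit_logistic(X, r, n_iter=25):
    """Newton-Raphson logistic regression; X must already contain an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (r - p)
        hess = X.T @ (X * (p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(hess, grad)
    return beta


def fit_ols(X, y):
    """Ordinary least squares coefficients."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


def dr_calibration_mean(y, r, design_outcome, propensity):
    """Mean of regression predictions plus mean of inverse-propensity-weighted residuals."""
    resp = r == 1
    coef = fit_ols(design_outcome[resp], y[resp])
    pred = design_outcome @ coef
    resid = np.where(resp, y - pred, 0.0)  # residuals exist only for respondents
    return float(pred.mean() + np.mean(resid / propensity))


def dr_covariate_mean(y, r, design_outcome, propensity):
    """Mean of predictions from a regression that adds 1/propensity as a linear term."""
    resp = r == 1
    design_aug = np.column_stack([design_outcome, 1.0 / propensity])
    coef = fit_ols(design_aug[resp], y[resp])
    return float((design_aug @ coef).mean())


# Hypothetical simulated data: the true mean of y is 1, and response depends on x.
rng = np.random.default_rng(0)
n = 20000
x = rng.normal(size=n)
y_full = 1.0 + 2.0 * x + rng.normal(size=n)
p_true = 1.0 / (1.0 + np.exp(-(0.5 + x)))
r = (rng.uniform(size=n) < p_true).astype(int)
y = np.where(r == 1, y_full, np.nan)  # y is missing for nonrespondents

X_lin = np.column_stack([np.ones(n), x])  # correctly specified design
X_const = np.ones((n, 1))                 # misspecified: ignores x
p_hat = 1.0 / (1.0 + np.exp(-X_lin @ fit_logistic(X_lin, r)))
p_const = np.full(n, r.mean())            # misspecified propensity model

cc_mean = float(y_full[r == 1].mean())    # complete-case mean, biased here
est_calib = dr_calibration_mean(y, r, X_const, p_hat)         # wrong outcome model, right propensity
est_cov = dr_covariate_mean(y, r, X_const, p_hat)             # wrong outcome model, right propensity
est_outcome_ok = dr_calibration_mean(y, r, X_lin, p_const)    # right outcome model, wrong propensity

print("complete-case:", cc_mean)
print("calibration, propensity correct:", est_calib)
print("inverse-propensity covariate, propensity correct:", est_cov)
print("calibration, outcome model correct:", est_outcome_ok)
```

In this simulated setting the complete-case mean is biased because response depends on x, while each DR estimate should land near the true mean of 1 as long as one of its two working models is correct. The sketch says nothing about the RMSE, interval width, or coverage comparisons reported in the paper.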