Real-life survey data are often unrepresentative because of selection bias and nonresponse. Classical design-based corrections use weighting to match the sample to the target population. Unit weights are constructed as the product of the inverse probability of inclusion (selection and response) and calibration ratio factors. Low inclusion probabilities or small sample sizes can produce extreme weights, and volatile weights can yield unstable estimators, especially when the weights are only weakly correlated with the survey outcomes. When estimating quantities more complicated than means and quantiles, such as regression coefficients, it is unclear whether the weights should be applied at all, or how, when they reflect multiple adjustments.
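As a minimal sketch of the weight construction described above, the Python below multiplies inverse selection and response probabilities and then applies a calibration ratio factor per group. The function names (`base_weights`, `calibrate`, `kish_neff`) and the toy inputs are hypothetical; calibration is shown as simple poststratification to known group totals, which is only one of several calibration schemes. Kish's effective sample size, (Σw)²/Σw², is used here as a standard summary of how much weight variability reduces the information in a sample.

```python
def base_weights(p_select, p_respond):
    # Inverse probability of inclusion; assumes inclusion probability = P(selected) * P(respond | selected)
    return [1.0 / (ps * pr) for ps, pr in zip(p_select, p_respond)]


def calibrate(weights, groups, pop_totals):
    # Ratio (poststratification) factor per group: known total N_g / sum of weights in group g
    sums = {}
    for w, g in zip(weights, groups):
        sums[g] = sums.get(g, 0.0) + w
    return [w * pop_totals[g] / sums[g] for w, g in zip(weights, groups)]


def kish_neff(weights):
    # Kish effective sample size: equals n for equal weights, shrinks as weights become more variable
    s = sum(weights)
    return s * s / sum(w * w for w in weights)


# Hypothetical toy sample: the last unit has a very low selection probability
p_select = [0.1, 0.1, 0.1, 0.01]
p_respond = [0.5, 0.5, 0.8, 0.5]
groups = ["a", "a", "b", "b"]
pop_totals = {"a": 100.0, "b": 100.0}

w = calibrate(base_weights(p_select, p_respond), groups, pop_totals)
print([round(x, 2) for x in w], round(kish_neff(w), 2))
```

In this toy case the single low-probability unit carries most of group `b`'s weight, so the effective sample size drops to roughly 2.9 out of 4 units even though the calibrated weights reproduce the group totals exactly.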
Model-based approaches fit regression models conditional on all the variables that affect the probability of inclusion. Computational and modeling challenges arise, however, because one may need to adjust for deep interactions among the weighting variables, to select among candidate models, or to include enough variables in the model for the assumption of ignorable nonresponse to be reasonable. Practical concerns also arise with complex survey designs such as cluster sampling, when only incomplete information about the clusters in the population is available.
We propose to develop and implement a unified framework for survey weighting through novel modifications of multilevel regression and poststratification (MRP), a survey-adjustment approach popular in social science research that incorporates design-based information into a modeling framework. In MRP, data are partially pooled during the modeling process, and the resulting local estimates are then combined via poststratification to obtain population inference. This smoothed estimation borrows information from neighboring poststratification cells and supports flexible multilevel modeling strategies, yielding performance that is robust to model misspecification.
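The two MRP stages can be illustrated with a deliberately simplified sketch. Here partial pooling is a normal-normal shrinkage of each cell mean toward a common mean `mu`, with the within-cell variance `sigma2` and between-cell variance `tau2` assumed known; this is a stand-in for fitting a full multilevel regression and is not the framework proposed above. Poststratification then averages the cell estimates using population cell counts. All names and numbers are hypothetical.

```python
def partial_pool(cell_means, cell_sizes, mu, sigma2, tau2):
    # Precision-weighted compromise between each cell's sample mean and the common mean mu.
    # Cells with few (or zero) respondents are pulled strongly (or entirely) toward mu.
    pooled = []
    for ybar, n in zip(cell_means, cell_sizes):
        prec_data = n / sigma2
        prec_prior = 1.0 / tau2
        pooled.append((prec_data * ybar + prec_prior * mu) / (prec_data + prec_prior))
    return pooled


def poststratify(cell_estimates, pop_counts):
    # Population estimate = sum_j N_j * theta_j / sum_j N_j
    total = sum(pop_counts)
    return sum(N * est for N, est in zip(pop_counts, cell_estimates)) / total


# Hypothetical example: the third cell is empty in the sample (its 0.0 mean is a placeholder)
cell_means = [0.6, 0.4, 0.0]
cell_sizes = [50, 30, 0]
mu = (0.6 * 50 + 0.4 * 30) / 80  # grand sample mean, 0.525
pooled = partial_pool(cell_means, cell_sizes, mu, sigma2=0.25, tau2=0.01)
print([round(p, 4) for p in pooled], round(poststratify(pooled, [1000, 1000, 2000]), 4))
```

Because the empty cell receives the common mean rather than no estimate at all, it can still be poststratified; a real multilevel model would instead predict it from cell-level covariates and neighboring cells.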
National Science Foundation
09/01/2017 to 09/30/2019