Conditional Properties of Some Estimators in Stratified Sampling

The prediction properties of the stratified expansion estimator, the separate and combined ratio estimators, and the separate and combined regression estimators are studied under a model appropriate to a population stratified on a size variable. Several estimators of variance for each total estimator are considered, including standard ones from probability sampling theory, alternative choices derived from a superpopulation model, and the jackknife. The theory is tested in an empirical study using a real population. Earlier studies of the ratio and regression estimators under simple random sampling plans have illustrated that the conditional properties of those estimators, and of the linearization variance estimators often used with them, can differ markedly from their unconditional properties and be less desirable. Whether similar results hold for stratified samples and estimators has been the subject of some debate. This article illustrates both theoretically and empirically that the use of stratification and reasonably large samples is not necessarily sufficient to produce reliable inferences. Over all stratified samples, estimators may be unbiased and confidence intervals may have nominal coverage probabilities, but conditional on certain sample characteristics this may not be true.
To illustrate the prediction theory for the stratified expansion estimator, the combined ratio estimator, and the combined regression estimator, a simulation study was conducted using a population of iron and steel foundries. The variable $y$, whose population total was estimated, was employment in a particular month, and the auxiliary variable $x$ was the employment one year earlier for each establishment. The population was divided into five size strata based on the auxiliary variable, and two sets of 2,000 stratified simple random samples were selected: one set with a total sample of 50 units and one with a total of 200.
In samples that were extreme, as measured by the size of the stratified sample mean of $x$, the estimators of totals and the conventional variance estimators generally performed poorly. For the ratio and regression estimators, the variance estimators derived from model-based considerations and the jackknife variance estimator had better conditional properties than the conventional choices but also had difficulties in extreme samples. The empirical results illustrate that subtle departures from a straight-line regression of $y$ on $x$ can lead to poor conditional inferences using either the combined ratio or regression estimators. The results give support to survey practices such as stratified systematic sampling, stratified balanced sampling, and fine substratification that restrict randomization beyond the degree of stratification studied here.
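As a rough illustration of the quantities involved, the sketch below computes the stratified expansion estimator, the combined ratio estimator, and the stratified sample mean of $x$ for a single stratified simple random sample. It is only a sketch under stated assumptions: the population is synthetic (not the foundry data), the five strata are formed as equal-count groups of units sorted by $x$, the allocation of 10 units per stratum is arbitrary, and all function names are hypothetical.

```python
import numpy as np


def stratified_srs(strata, n_h, rng):
    """Draw a simple random sample without replacement within each stratum."""
    return [rng.choice(idx, size=n, replace=False) for idx, n in zip(strata, n_h)]


def expansion_estimator(v, strata, sample):
    """Stratified expansion estimator of a total: sum over strata of N_h * sample mean."""
    return sum(len(idx) * v[s].mean() for idx, s in zip(strata, sample))


def combined_ratio_estimator(y, x, strata, sample):
    """Combined ratio estimator: ratio of expansion estimates times the known x total."""
    t_y = expansion_estimator(y, strata, sample)
    t_x = expansion_estimator(x, strata, sample)
    return t_y / t_x * x.sum()


def stratified_mean(v, strata, sample):
    """Stratified sample mean: expansion estimate divided by the population size."""
    n_pop = sum(len(idx) for idx in strata)
    return expansion_estimator(v, strata, sample) / n_pop


# Synthetic population (assumption): x is a skewed size variable, y is roughly
# proportional to x with variance increasing in x.
rng = np.random.default_rng(0)
N = 1000
x = np.sort(rng.gamma(shape=2.0, scale=50.0, size=N))
y = 1.1 * x + rng.normal(scale=np.sqrt(x))

# Five size strata (assumption: equal numbers of units after sorting on x).
strata = np.array_split(np.arange(N), 5)
n_h = [10] * 5  # total sample of 50 units, equal allocation (assumption)

sample = stratified_srs(strata, n_h, rng)
print("true total of y:      ", y.sum())
print("expansion estimate:   ", expansion_estimator(y, strata, sample))
print("combined ratio est.:  ", combined_ratio_estimator(y, x, strata, sample))
print("stratified mean of x: ", stratified_mean(x, strata, sample))
print("population mean of x: ", x.mean())
```

Repeating the draw many times and grouping the samples by how far the stratified mean of $x$ falls from the population mean of $x$ is one way to examine conditional, rather than unconditional, behavior of the estimators.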