Researchers often wish to characterize small areas using data from a sample of the local population. Although averages based on random samples provide unbiased estimates, the variance of those estimates can be quite large when samples are small. This problem arises in the use of published census data (such as data for census tracts) as well as in many large-scale health-related surveys. This project evaluates the performance of Bayesian models for small area estimation (SAE), with a particular focus on population data of the type provided by the census.
The project exploits the availability of a new source: 100% population data from the Censuses of 1880 and 1940, in which people's locations within cities can be geocoded. This allows multiple samples to be drawn at will from the population, and small area estimates from these samples to be compared with the actual population values. The same methods can be applied with non-geocoded data for enumeration districts (EDs) in low-density suburbs and for non-central-city counties. The approach is innovative in three other major ways. First, most assessments of SAE model performance focus on the bias and variance of estimates of area means and proportions. The project deals with another concern that is substantively important to spatial demographers: the variance across areas (which is used to assess spatial inequality and residential segregation). Second, recognizing that most contemporary large-scale surveys, including the census, use sample weights to represent complex survey designs and correct for patterns of unit non-response, we specifically assess methods to incorporate sample weights in the hierarchical Bayesian framework. Little is known about the use of sample weights in unit-level spatial Bayesian models. This project will examine and further develop a new extension of spatial models to deal with sample weights. Third, the 1940 estimates for Chicago will draw on dependence in both time and space, using data at the ED level from 1930.
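The resampling design described above can be illustrated with a minimal sketch. It is not the project's method: it uses a synthetic population in place of the census data, simple random sampling without weights, and direct sample means in place of Bayesian models. All names and parameter values (`simulate`, `n_areas`, `sample_frac`, and so on) are hypothetical. The sketch draws repeated samples from a fully known population, compares area estimates with the true area values, and shows why the variance across areas is a concern of its own: sampling error in each area's estimate tends to inflate the apparent variance across areas.

```python
import random
import statistics


def simulate(n_areas=50, pop_per_area=400, sample_frac=0.05, n_reps=200, seed=0):
    """Repeatedly sample a known synthetic population and compare
    direct area estimates with the true area proportions."""
    rng = random.Random(seed)

    # Hypothetical population: each area has its own true proportion
    # of a binary trait (an assumption made only for illustration).
    populations = []
    for _ in range(n_areas):
        p = rng.uniform(0.2, 0.6)
        populations.append([1 if rng.random() < p else 0
                            for _ in range(pop_per_area)])

    true_means = [statistics.fmean(area) for area in populations]
    true_var = statistics.pvariance(true_means)

    n = max(1, round(sample_frac * pop_per_area))  # sample size per area
    mse_per_rep = []
    var_per_rep = []
    for _ in range(n_reps):
        # Simple random sample without replacement within each area.
        est = [statistics.fmean(rng.sample(area, n)) for area in populations]
        sq_err = [(e - t) ** 2 for e, t in zip(est, true_means)]
        mse_per_rep.append(statistics.fmean(sq_err))
        var_per_rep.append(statistics.pvariance(est))

    return {
        "true_between_area_variance": true_var,
        "mean_estimated_between_area_variance": statistics.fmean(var_per_rep),
        "mean_squared_error_of_area_means": statistics.fmean(mse_per_rep),
    }


result = simulate()
for name, value in result.items():
    print(f"{name}: {value:.5f}")
```

Under these assumptions, the average between-area variance computed from the sample estimates exceeds the true between-area variance by roughly the mean squared error of the area estimates. This is the kind of distortion that an assessment focused only on area means would miss.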
In comparing results from different models, we will examine a range of scenarios: varying sample proportions and the population size and geographic scale of small areas, estimating both means and proportions, comparing the effects of different sample designs, and assessing the contribution of drawing on multiple indicators. The results of this work will include guidelines for investigators on the most appropriate statistical method to use under the different design scenarios.