Combining Exposure Information from Various Sources in an Analysis of a Case-Control Study

This paper describes a method for estimating disease-exposure odds ratios in a case-control study where information on the exposure variable is available from several, possibly imperfect, sources. A hybrid approach is developed in which a Bayesian perspective is used to combine the information from the multiple sources, while the final analysis of the disease-exposure association is likelihood-based and incorporates design considerations from a frequentist perspective, namely the matching of cases and controls on certain characteristics. The basic analytical strategy uses Gibbs sampling to draw several sets of values of the actual exposure variable at random from its posterior distribution, conditional on the exposure ascertained from the several sources and on other pertinent variables. Each set of drawn values of the actual exposure variable, together with the confounding variables, is used as the independent variables in a conditional logistic regression model with case-control status as the dependent variable. The resulting point estimates and their covariance matrices are then combined across the drawn sets. The method is applied to the population-based case-control study that motivated this research, which examines the risk of primary cardiac arrest in relation to the intake of n-3 polyunsaturated fatty acids derived mainly from fish and seafood. This hybrid strategy was developed for pragmatic reasons: these data will be used for several analyses, from differing perspectives, by different analysts. Hence, this paper also evaluates, from a frequentist perspective, the sampling properties of the resulting estimates through a simulation study that resembles the actual data set analysed in many respects.
The simulation results show that the log-odds ratio estimate obtained with the method described in this paper performs better in terms of bias, mean-square error and confidence interval coverage than the estimate obtained by using only one of the sources as the exposure variable.
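To make the Gibbs sampling step concrete, the sketch below draws several sets of a latent true exposure from its posterior given several imperfect reports. It rests on simplifying assumptions that are not taken from the paper: the true exposure and every source are binary, the sources are conditionally independent given the true exposure, each source has its own sensitivity and specificity, and the priors are Beta distributions (Beta(3, 1) on accuracies, chosen here only to keep the latent classes from swapping labels). The function name `gibbs_latent_exposure` and all its parameters are hypothetical, and the sketch ignores case-control status and the other covariates that the paper conditions on.

```python
import random


def gibbs_latent_exposure(sources, n_draws=5, burn_in=200, thin=50,
                          prior_acc=(3.0, 1.0), prior_prev=(1.0, 1.0), seed=0):
    """Draw sets of a latent binary exposure X_i from its posterior.

    sources: list of n rows, each a list of K binary (0/1) exposure reports.
    Assumed (illustrative) model: reports are conditionally independent given
    X_i, with source-specific sensitivity se[j] and specificity sp[j].
    Returns n_draws thinned posterior draws of the full exposure vector.
    """
    rng = random.Random(seed)
    n, k = len(sources), len(sources[0])
    # Initialise the latent exposure with a majority vote across sources.
    x = [1 if 2 * sum(row) > k else 0 for row in sources]
    draws = []
    for it in range(1, burn_in + n_draws * thin + 1):
        # Step 1: draw prevalence, sensitivities and specificities given x.
        n1 = sum(x)
        prev = rng.betavariate(prior_prev[0] + n1, prior_prev[1] + n - n1)
        se, sp = [], []
        for j in range(k):
            tp = sum(1 for xi, row in zip(x, sources) if xi == 1 and row[j] == 1)
            tn = sum(1 for xi, row in zip(x, sources) if xi == 0 and row[j] == 0)
            se.append(rng.betavariate(prior_acc[0] + tp, prior_acc[1] + n1 - tp))
            sp.append(rng.betavariate(prior_acc[0] + tn,
                                      prior_acc[1] + (n - n1) - tn))
        # Step 2: draw each x_i from its full conditional given the parameters.
        for i, row in enumerate(sources):
            p1, p0 = prev, 1.0 - prev
            for j, s in enumerate(row):
                p1 *= se[j] if s == 1 else (1.0 - se[j])
                p0 *= (1.0 - sp[j]) if s == 1 else sp[j]
            x[i] = 1 if rng.random() < p1 / (p1 + p0) else 0
        # Keep every `thin`-th state after burn-in as one drawn exposure set.
        if it > burn_in and (it - burn_in) % thin == 0:
            draws.append(list(x))
    return draws


# Hypothetical usage: 400 subjects, true exposure prevalence 0.3, and three
# sources that each report the true status correctly with probability 0.9.
_rng = random.Random(1)
truth = [1 if _rng.random() < 0.3 else 0 for _ in range(400)]
reports = [[x if _rng.random() < 0.9 else 1 - x for _ in range(3)]
           for x in truth]
draws = gibbs_latent_exposure(reports, n_draws=5, burn_in=100, thin=20)
```

Thinning between retained states is a simple way to make the drawn sets closer to independent; each retained set would then be analysed separately, as the abstract describes.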
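The per-draw analysis fits a conditional logistic regression. As a minimal sketch, assuming 1:1 matched pairs and a single exposure with no confounders (the paper's model also includes confounding variables), each pair contributes exp(βx_case) / (exp(βx_case) + exp(βx_control)) = logistic(β(x_case − x_control)) to the conditional likelihood, so concordant pairs drop out. The hypothetical function `clogit_pairs` maximises this likelihood by Newton-Raphson and returns the inverse observed information as the variance.

```python
import math


def _score_info(beta, diffs):
    """Score and observed information of the 1:1 conditional log-likelihood."""
    score = info = 0.0
    for d in diffs:
        p = 1.0 / (1.0 + math.exp(-beta * d))  # logistic(beta * d)
        score += d * (1.0 - p)
        info += d * d * p * (1.0 - p)
    return score, info


def clogit_pairs(case_x, control_x, tol=1e-10, max_iter=50):
    """Conditional MLE of the log-odds ratio for 1:1 matched pairs.

    Raises ZeroDivisionError if there are no discordant pairs (no information).
    """
    diffs = [xc - xk for xc, xk in zip(case_x, control_x)]
    beta = 0.0
    for _ in range(max_iter):
        score, info = _score_info(beta, diffs)
        step = score / info  # Newton-Raphson update
        beta += step
        if abs(step) < tol:
            break
    _, info = _score_info(beta, diffs)  # information at the final estimate
    return beta, 1.0 / info


# Hypothetical data: 30 pairs with only the case exposed, 10 with only the
# control exposed, and 20 concordant pairs that carry no information.
case_x = [1] * 30 + [0] * 10 + [1] * 10 + [0] * 10
control_x = [0] * 30 + [1] * 10 + [1] * 10 + [0] * 10
beta_hat, var_hat = clogit_pairs(case_x, control_x)
```

For a binary exposure this reproduces the familiar matched-pairs result β̂ = log(n10/n01) with variance 1/n10 + 1/n01, here log 3 and 1/30 + 1/10. Because only the case-control difference enters, the same code also accepts continuous drawn exposure values.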
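The abstract states only that the point estimates and covariance matrices from the drawn sets are "combined". A natural choice, assumed here rather than taken from the text, is the standard multiple-imputation combining rules (Rubin's rules): average the m estimates, average the m within-draw covariance matrices W, compute the between-draw covariance B of the estimates, and take W + (1 + 1/m)B as the total covariance. The function name `combine_estimates` is hypothetical.

```python
def combine_estimates(estimates, covariances):
    """Combine m sets of estimates with Rubin's multiple-imputation rules.

    estimates:   list of m vectors (lists of length p)
    covariances: list of m p x p matrices (lists of lists)
    Returns (q_bar, total_cov) where total_cov = W + (1 + 1/m) * B.
    """
    m, p = len(estimates), len(estimates[0])
    if m < 2:
        raise ValueError("need at least two drawn sets to estimate B")
    # Combined point estimate: mean of the m estimates.
    q_bar = [sum(q[j] for q in estimates) / m for j in range(p)]
    # Within-draw covariance W: mean of the m covariance matrices.
    w = [[sum(u[r][c] for u in covariances) / m for c in range(p)]
         for r in range(p)]
    # Between-draw covariance B of the point estimates (divisor m - 1).
    b = [[sum((q[r] - q_bar[r]) * (q[c] - q_bar[c]) for q in estimates)
          / (m - 1) for c in range(p)] for r in range(p)]
    total = [[w[r][c] + (1.0 + 1.0 / m) * b[r][c] for c in range(p)]
             for r in range(p)]
    return q_bar, total


# Hypothetical log-odds ratio estimates and variances from three drawn sets.
est = [[1.0], [1.2], [1.4]]
cov = [[[0.10]], [[0.12]], [[0.14]]]
q_bar, total_cov = combine_estimates(est, cov)
```

Here W = 0.12 and B = 0.04, so the total variance is 0.12 + (4/3)(0.04) ≈ 0.173; the extra term reflects uncertainty about the actual exposure that a single-source analysis would ignore.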