Maximum Likelihood Estimation for Mixed Continuous and Categorical Data with Missing Values

Maximum likelihood procedures for analysing mixed continuous and categorical data with missing values are presented. The general location model of Olkin & Tate (1961) and extensions introduced by Krzanowski (1980, 1982) form the basis for our methods. Maximum likelihood estimation with incomplete data is achieved by an application of the EM algorithm (Dempster, Laird & Rubin, 1977). Special cases of the algorithm include Orchard & Woodbury's (1972) algorithm for incomplete normal samples, Fuchs's (1982) algorithms for log linear modelling of partially classified contingency tables, and Day's (1969) algorithm for multivariate normal mixtures. Applications include: (a) imputation of missing values, (b) logistic regression and discriminant analysis with missing predictors and unclassified observations, (c) linear regression with missing continuous and categorical predictors, and (d) parametric cluster analysis with incomplete data. Methods are illustrated using data from the St Louis Risk Research Project.
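
To make the EM step concrete, the following is a minimal sketch of the simplest special case named above: an incomplete multivariate normal sample with no categorical variables. It is not the general location model algorithm of the paper. The function name `em_mvn_missing`, the coding of missing values as NaN, the starting values and the fixed iteration count are all assumptions made for illustration. The E-step replaces each missing block by its conditional mean given the observed block and adds the conditional covariance to the second-moment statistic; the M-step recomputes the mean and covariance from the completed sufficient statistics.

```python
import numpy as np


def em_mvn_missing(X, n_iter=100):
    """EM estimates of the mean and covariance of a multivariate normal
    sample in which missing entries are coded as NaN (illustrative sketch)."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    miss = np.isnan(X)

    # Starting values: available-case means and a diagonal covariance.
    mu = np.nanmean(X, axis=0)
    sigma = np.diag(np.nanvar(X, axis=0))

    for _ in range(n_iter):
        t1 = np.zeros(p)        # expected sum of x
        t2 = np.zeros((p, p))   # expected sum of x x^T

        # E-step: expected sufficient statistics, one row at a time.
        for i in range(n):
            m = miss[i]
            o = ~m
            x = X[i].copy()
            c = np.zeros((p, p))  # conditional covariance of the missing block
            if m.any():
                if o.any():
                    # Regression of the missing block on the observed block.
                    b = sigma[np.ix_(m, o)] @ np.linalg.inv(sigma[np.ix_(o, o)])
                    x[m] = mu[m] + b @ (x[o] - mu[o])
                    c[np.ix_(m, m)] = sigma[np.ix_(m, m)] - b @ sigma[np.ix_(o, m)]
                else:
                    # Nothing observed: fall back on the current parameters.
                    x[m] = mu[m]
                    c = sigma.copy()
            t1 += x
            t2 += np.outer(x, x) + c

        # M-step: maximum likelihood estimates from the completed statistics.
        mu = t1 / n
        sigma = t2 / n - np.outer(mu, mu)

    return mu, sigma
```

With complete data the loop reproduces the sample mean and the maximum likelihood (divisor n) covariance after a single iteration. A production version would monitor the observed-data log-likelihood for convergence rather than run a fixed number of iterations.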