Statistical analysis of masked data

A model-based likelihood theory is presented for the analysis of data masked for confidentiality purposes. The theory builds on frameworks for missing data and treatment assignment, and on a theory for coarsened data. It distinguishes between a model for the masking selection mechanism, which determines which data values are masked, and a model for the masking treatment mechanism, which specifies how the masking is carried out. The framework is applied to a variety of masking methods, including randomized response, subsampling of cases or variables, deletion, coarsening by grouping or rounding, imputation, aggregation, noise injection and simulation of artificial records.
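
As an informal illustration of one of the listed methods, the following minimal sketch simulates randomized response on a binary variable and recovers the population proportion from the masked reports. It assumes the standard Warner design, in which each value is reported truthfully with a known probability and flipped otherwise; this is not necessarily the formulation used in the paper. The function names, the truth probability 0.8, the true proportion 0.3 and the sample size are all hypothetical choices made for the example. Under these assumptions, the flip rule can be read as a masking treatment mechanism applied to every value.

```python
import random


def mask_randomized_response(true_values, p_truth, rng):
    """Report each binary value truthfully with probability p_truth,
    otherwise report its complement (Warner-style design, assumed here)."""
    return [v if rng.random() < p_truth else 1 - v for v in true_values]


def estimate_proportion(masked_values, p_truth):
    """Estimate the true proportion of ones from the masked reports.

    The probability of a reported one is
        lam = p_truth * pi + (1 - p_truth) * (1 - pi),
    so solving for pi gives the estimator below. The result is clipped
    to [0, 1] because the unclipped value can fall outside that range.
    """
    if abs(2 * p_truth - 1) < 1e-12:
        raise ValueError("p_truth = 0.5 makes the proportion unidentifiable")
    lam = sum(masked_values) / len(masked_values)
    pi_hat = (lam - (1 - p_truth)) / (2 * p_truth - 1)
    return min(1.0, max(0.0, pi_hat))


# Hypothetical simulation settings, chosen only for illustration.
rng = random.Random(0)
true_values = [1 if rng.random() < 0.3 else 0 for _ in range(20000)]
masked = mask_randomized_response(true_values, p_truth=0.8, rng=rng)
pi_hat = estimate_proportion(masked, p_truth=0.8)
print(f"estimated proportion: {pi_hat:.3f}")
```

The estimate is possible only because the analyst knows the flip probability; if the masking mechanism were unknown or ignored, the raw proportion of masked ones would be biased toward 0.5.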