A case study of record linkage

Record linkage is the process of pairing records from two files and selecting the pairs that belong to the same entity. The basic framework uses a match weight to measure the likelihood of a correct match and a decision rule to classify record pairs as “true” or “false” matches. The weight thresholds for classifying a record pair as matched or unmatched depend on the desired control over linkage errors. Current methods to determine the selection thresholds and estimate linkage errors can
provide divergent results, depending on the type of linkage error and the approach to linkage. This paper presents a case
study that uses existing linkage methods to link record pairs but a new simulation approach (SimRate) to help determine
selection thresholds and estimate linkage errors. SimRate uses the observed distribution of data in matched and unmatched
pairs to generate a large simulated set of record pairs, assigns a match weight to each pair based on specified match rules,
and uses the weight curves of the simulated pairs for error estimation.
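
The simulation idea described above can be illustrated with a minimal Python sketch. It is not the SimRate implementation. It assumes a Fellegi-Sunter-style log-ratio match weight, independent field agreements, and hypothetical per-field agreement probabilities (the field names and values in `FIELDS` are invented for illustration). The helper names `match_weight`, `simulate_pairs`, and `error_rates` are likewise assumptions of this sketch.

```python
import math
import random

# Hypothetical per-field agreement probabilities (illustrative values only):
# m = P(field agrees | true match), u = P(field agrees | non-match).
FIELDS = {
    "surname": (0.95, 0.01),
    "birth_year": (0.90, 0.05),
    "zip_code": (0.85, 0.10),
}

def match_weight(agreements):
    """Sum log2 agreement/disagreement ratios over fields (assumed match rule)."""
    total = 0.0
    for field, agrees in agreements.items():
        m, u = FIELDS[field]
        total += math.log2(m / u) if agrees else math.log2((1 - m) / (1 - u))
    return total

def simulate_pairs(n, is_match, rng):
    """Draw n simulated pairs and return their match weights."""
    idx = 0 if is_match else 1  # pick m for matches, u for non-matches
    weights = []
    for _ in range(n):
        agreements = {f: rng.random() < probs[idx] for f, probs in FIELDS.items()}
        weights.append(match_weight(agreements))
    return weights

def error_rates(match_w, nonmatch_w, threshold):
    """Estimate error rates when pairs with weight >= threshold are linked."""
    linked_matches = sum(w >= threshold for w in match_w)
    linked_nonmatches = sum(w >= threshold for w in nonmatch_w)
    linked = linked_matches + linked_nonmatches
    # Share of linked pairs that are not true matches.
    false_match_rate = linked_nonmatches / linked if linked else 0.0
    # Share of true matches that were not linked.
    false_nonmatch_rate = 1 - linked_matches / len(match_w)
    return false_match_rate, false_nonmatch_rate

if __name__ == "__main__":
    rng = random.Random(0)
    match_w = simulate_pairs(10_000, True, rng)
    nonmatch_w = simulate_pairs(100_000, False, rng)
    for t in (0, 5, 10):
        fmr, fnmr = error_rates(match_w, nonmatch_w, t)
        print(f"threshold={t:>2}  false match={fmr:.4f}  false non-match={fnmr:.4f}")
```

Scanning a range of thresholds in this way traces out the two error curves, from which a threshold meeting a target error level could be read off. In practice the agreement probabilities would be estimated from the observed matched and unmatched pairs rather than fixed by hand.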