Record linkage is the process of identifying candidate duplicate records in datasets and determining whether each pair of records in fact refers to the same real-world entity. Records in record linkage commonly have name, location, and date attributes. Comparing two records can frequently be accomplished by resolving each attribute pair into a similarity score using a comparison metric. More advanced metrics have been created by feeding the outputs of a wide variety of such metrics into an artificial neural network that outputs a regression score. For example, a date comparison metric based on the difference in days between two dates can be combined with one that compares the similarity of the dates' strings; the result joins a metric that reflects statistical significance with a metric that scores data entry errors. In this research, we compare simple and aggregate similarity metrics for dates, names, and locations on a large, post-blocking genealogical database.
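The date example above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the method evaluated in this work: the function names, the exponential decay with a 365-day scale, the use of Python's `difflib.SequenceMatcher` as the string comparison, and in particular the fixed-weight average, which is a simple stand-in for the trained neural network aggregator.

```python
import math
from datetime import date
from difflib import SequenceMatcher


def day_difference_similarity(a: date, b: date, scale_days: float = 365.0) -> float:
    """Map the absolute gap in days to (0, 1]; identical dates score 1.0.

    The exponential decay and the 365-day scale are illustrative assumptions.
    """
    gap = abs((a - b).days)
    return math.exp(-gap / scale_days)


def date_string_similarity(a: date, b: date) -> float:
    """Compare the ISO-formatted date strings character by character.

    A single mistyped digit leaves most of the string intact, so the
    score stays high even when the dates are decades apart.
    """
    return SequenceMatcher(None, a.isoformat(), b.isoformat()).ratio()


def aggregate_date_similarity(a: date, b: date, weights=(0.5, 0.5)) -> float:
    """Combine both scores with fixed weights.

    Hypothetical stand-in: the text describes a neural network that learns
    this combination; a weighted average only shows the idea.
    """
    w_gap, w_str = weights
    return w_gap * day_difference_similarity(a, b) + w_str * date_string_similarity(a, b)


if __name__ == "__main__":
    typo_pair = (date(1880, 3, 12), date(1830, 3, 12))   # one digit differs
    close_pair = (date(1880, 3, 12), date(1880, 4, 1))   # twenty days apart
    for label, (x, y) in (("typo", typo_pair), ("close", close_pair)):
        print(label,
              round(day_difference_similarity(x, y), 3),
              round(date_string_similarity(x, y), 3),
              round(aggregate_date_similarity(x, y), 3))
```

For the pair that differs by a single digit, the day-difference score is close to zero while the string score remains high, which is the kind of complementary evidence an aggregate metric is meant to exploit.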

