Maths encyclopedia and lessons  
Search

Mathematics Encyclopedia and Lessons

 
     
 

Lessons

Popular
Subjects

algebra
arithmetic
calculus
equations
geometry
differential equations
trigonometry
number theory
probability theory
more
 

References

applied mathematics
mathematical games
mathematicians
more
 
 

Record linkage

Basics

Record linkage refers to the task of finding identical entries in two or more files. The initial idea goes back to Simon Newcomb, who tried to link two files containing personal data.

In 1969, Felegi and Sunter formalized these ideas. Their pioneering work "A Theory For Record Lankage" is, still today, the mathematical tool for any record linkage application.

Mathematical Model

In an application with two files, A and B, denote the rows (records) by α(a) in file A and β(b) in file B. Assign K characteristics to each record. The set of records that represent identical entities is defined by

M = \left\{ (a,b); a=b; a \in A; b \in B \right\}

and the complement of set M, namely set U representing different entities is defined as

U = \{ (a,b); a \neq b; a \in A, b \in B \}.

A vector, γ is defined, that contains the coded agreements and disagreements on each characteristic:

\gamma \left[ \alpha ( a ), \beta ( b ) \right] = \{ \gamma^{1} \left[ \alpha ( a ) , \beta ( b ) \right] ,...,	\gamma^{K} \left[ \alpha ( a ), \beta ( b ) \right] \}

where K is a subscript for the characteristics (sex, age, martial status, etc.) in the files. The conditional probabilities of observing a specific vector γ given (a, b) \in M, (a, b) \in U are defined as

m(\gamma) = P \left\{ \gamma \left[ \alpha (a), \beta (b) \right] | (a,b) \in M \right\} =  \sum_{(a, b) \in M} P \left\{\gamma\left[ \alpha(a), \beta(b) \right] \right\} \cdot                  P \left[ (a, b) | M\right]

and

u(\gamma) = P \left\{ \gamma \left[ \alpha (a), \beta (b) \right] | (a,b) \in U \right\} =  \sum_{(a, b) \in U} P \left\{\gamma\left[ \alpha(a), \beta(b) \right] \right\} \cdot                  P \left[ (a, b) | U\right], respectively.

01-04-2007 01:18:14
The contents of this article are licensed from Wikipedia.org
under the GNU Free Documentation License. How to see transparent copy