Table 2 A comparison of the most common fields in the created synthetic data and the original data it was based on

From: The effect of data cleaning on record linkage quality

| Surname (top 5) | Synthetic (per cent) | Original (per cent) | Male forename (top 5) | Synthetic (per cent) | Original (per cent) |
| --- | --- | --- | --- | --- | --- |
| Missing value | 1.98 | | Missing value | 1.99 | |
| Smith | 0.92 | 0.94 | John | 3.44 | 3.47 |
| Jones | 0.55 | 0.55 | David | 3.09 | 3.09 |
| Brown | 0.46 | 0.46 | Michael | 2.95 | 2.95 |
| Williams | 0.46 | 0.46 | Peter | 2.87 | 2.88 |
| Taylor | 0.44 | 0.44 | Robert | 2.47 | 2.47 |

| Female forename (top 5) | Synthetic (per cent) | Original (per cent) | Postcode (top 5) | Synthetic (per cent) | Original (per cent) |
| --- | --- | --- | --- | --- | --- |
| Missing value | 1.99 | | Missing value | 1.01 | |
| Margaret | 1.57 | 1.56 | 6210 | 2.84 | 2.84 |
| Susan | 1.35 | 1.34 | 6163 | 2.33 | 2.34 |
| Patricia | 1.22 | 1.22 | 6027 | 2.06 | 2.05 |
| Jennifer | 1.19 | 1.20 | 6155 | 2.02 | 2.02 |
| Elizabeth | 1.05 | 1.05 | 6065 | 2.00 | 1.98 |