BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:Handling identifier error rate variation in data linkage of large 
 administrative data sources - Katie Harron (London School of Hygiene and T
 ropical Medicine)
DTSTART:20160913T123000Z
DTEND:20160913T130000Z
UID:TALK67330@talks.cam.ac.uk
CONTACT:INI IT
DESCRIPTION:<span>Co-authors: Gareth Hagger-Johnson (Administrative Dat
 a Research Centre for England\, University College London)\, Ruth Gi
 lbert (Institute of Child Health\, University College London)\, Harv
 ey Goldstein (University of Bristol and University College London) <b
 r></span> <span><br>Background: Linkage of administrative data with n
 o unique identifier often relies on probabilistic linkage. Variation i
 n data quality at the individual or organisational level can adversel
 y affect match weight estimation\, and potentially introduce selecti
 on bias into the linked data if subgroups of individuals are more lik
 ely to link than others. We quantified individual and organisational v
 ariation in identifier error in a large administrative dataset (Hospi
 tal Episode Statistics\; HES) and incorporated this informa
 tion into a match probability estimation model. Methods: A stratifie
 d sample of 10\,000 admission records was extracted from 2011/2012 HE
 S for three cohorts aged 0-1\, 5-6 and 18-19 years. A reference stan
 dard was created by linking via NHS number with the Personal Demogra
 phics Service to obtain up-to-date identifiers. Based on aggregate t
 ables\, we calculated identifier error rates for sex\, date of birth a
 nd postcode\, investigated whether these errors depended on individu
 al characteristics\, and evaluated variation between organisations. W
 e used a log-linear model to estimate match probabilities\, and us
 ed a simulation study to compare readmission rates based on traditio
 nal match weights. Results: Match probabilities differed significant
 ly according to age (p<0.0001)\, ethnicity (p=0.0005) and sex (p<0.0
 001). Match probabilities were lower for males than for females (od
 ds ratio 0.84\; 95% CI 0.81-0.86)\; lower for the 5-6 and 18-19 ye
 ar cohorts than for infants (OR 0.39\; 95% CI 0.37-0.40 and OR 0.3
 7\; 95% CI 0.36-0.39\, respectively)\; and\, among ethnic groups\, lo
 west for Asian ethnicity (odds ratio 0.89\; 95% CI 0.84-0.94 compar
 ed with White ethnicity). Results from the simulation study will be p
 resented. Discussion: We provide empirical evidence on identifier err
 or variation in a widely used administrative dataset. We propose tha
 t modelling identifier error rates and covariates\, and incorporatin
 g these characteristics into match probability estimation\, can impr
 ove the quality of linkage.</span>
LOCATION:Seminar Room 1\, Newton Institute
END:VEVENT
END:VCALENDAR
