BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:Handling identifier error rate variation in data linkage of large administrative data sources - Katie Harron (London School of Hygiene and Tropical Medicine)
DTSTART:20160913T123000Z
DTEND:20160913T130000Z
UID:TALK67330@talks.cam.ac.uk
CONTACT:INI IT
DESCRIPTION:Co-authors: Gareth Hagger-Johnson (Administrative Data Research Centre for England\, University College London)\, Ruth Gilbert (Institute of Child Health\, University College London)\, Harvey Goldstein (University of Bristol and University College London)
 \nBackground: Linkage of administrative data with no unique identifier often relies on probabilistic linkage. Variation in data quality at the individual or organisational level can adversely affect match weight estimation\, and can potentially introduce selection bias into the linked data if some subgroups of individuals are more likely to link than others. We quantified individual and organisational variation in identifier error in a large administrative dataset (Hospital Episode Statistics\; HES) and incorporated this information into a match probability estimation model.
 \nMethods: A stratified sample of 10\,000 admission records was extracted from 2011/2012 HES for three cohorts aged 0-1\, 5-6 and 18-19 years. A reference standard was created by linking via NHS number to the Personal Demographics Service to obtain up-to-date identifiers. Using aggregate tables\, we calculated identifier error rates for sex\, date of birth and postcode\, investigated whether these errors depended on individual characteristics\, and evaluated variation between organisations. We used a log-linear model to estimate match probabilities\, and a simulation study to compare readmission rates based on traditional match weights.
 \nResults: Match probabilities differed significantly by age (p<0.0001)\, ethnicity (p=0.0005) and sex (p<0.0001). Match probabilities were lower for males than for females (odds ratio 0.84\; 95% CI 0.81-0.86)\; lower for the 5-6 and 18-19 year cohorts than for infants (OR 0.39\; 95% CI 0.37-0.40 and OR 0.37\; 95% CI 0.36-0.39 respectively)\; and\, among ethnic groups\, lowest for Asian ethnicity (OR 0.89\; 95% CI 0.84-0.94 compared with White ethnicity). Results from the simulation study will be presented.
 \nDiscussion: We provide empirical evidence on identifier error variation in a widely used administrative dataset. We propose that modelling identifier error rates and their covariates\, and incorporating these characteristics into match probability estimation\, can improve the quality of linkage.
LOCATION:Seminar Room 1\, Newton Institute
END:VEVENT
END:VCALENDAR