BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:Deduplicating databases of deaths in war: advances in adaptive blo
 cking\, pairwise classification\, and clustering - Patrick Ball (Human Rig
 hts Data Analysis Group)
DTSTART:20160912T110000Z
DTEND:20160912T113000Z
UID:TALK67310@talks.cam.ac.uk
CONTACT:INI IT
DESCRIPTION:Violent inter-state and civil wars are documented with lists o
 f the casualties\, each of which constitutes a partial\, non-probability s
 ample of the universe of deaths. There are often several lists\, with dupl
 icate entries within each list and among the lists\, requiring record link
 age to deduplicate the lists and create a unique enumeration of the known 
 dead.\n\nThis talk will explore how we do record linkage\, including: new 
 advances in generating and learning from training data\; an adaptive 
 blocking approach\; pairwise classification with string\, date\, and 
 integer features and several classifiers\; and a hybrid clustering method.
  Assessment metrics will be proposed for each stage\, with real-world 
 results from deduplicating more than 420\,000 records of Syrian people 
 killed since 2011.
LOCATION:Seminar Room 1\, Newton Institute
END:VEVENT
END:VCALENDAR
