Deduplicating databases of deaths in war: advances in adaptive blocking, pairwise classification, and clustering

If you have a question about this talk, please contact INI IT.

DLAW02 - Data linkage: techniques, challenges and applications

Violent inter-state and civil wars are documented with lists of casualties, each of which constitutes a partial, non-probability sample of the universe of deaths. There are often several lists, with duplicate entries both within each list and across lists, so record linkage is required to deduplicate them and create a unique enumeration of the known dead.

This talk will explore how we do record linkage, including: new advances in generating and learning from training data; an adaptive blocking approach; pairwise classification with string, date, and integer features and several classifiers; and a hybrid clustering method. Assessment metrics will be proposed for each stage, with real-world results from deduplicating more than 420,000 records of Syrian people killed since 2011.
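
As a rough illustration of the three stages named above (blocking, pairwise classification, clustering), here is a minimal Python sketch. It is not the speaker's method. It makes several simplifying assumptions: a fixed blocking key stands in for adaptive blocking, a hand-set rule stands in for a trained classifier, and connected components stand in for the hybrid clustering method. The records, field layout, thresholds, and function names are all invented for the example.

```python
from collections import defaultdict
from datetime import date
from difflib import SequenceMatcher
from itertools import combinations

# Invented toy records (id, name, date of death, age); not real data.
RECORDS = [
    (0, "ahmad khalil", date(2012, 3, 14), 34),
    (1, "ahmed khalil", date(2012, 3, 15), 34),
    (2, "ahmad khaleel", date(2012, 3, 14), 35),
    (3, "omar haddad", date(2013, 7, 2), 21),
    (4, "omar hadad", date(2013, 7, 2), 21),
    (5, "amal saleh", date(2012, 3, 14), 50),
]


def block_key(rec):
    """Fixed (non-adaptive) blocking key: first letter of name, year of death."""
    _, name, died, _ = rec
    return (name[0], died.year)


def candidate_pairs(records):
    """Yield only pairs that share a block, avoiding all-vs-all comparison."""
    blocks = defaultdict(list)
    for rec in records:
        blocks[block_key(rec)].append(rec)
    for members in blocks.values():
        yield from combinations(members, 2)


def features(a, b):
    """One string, one date, and one integer feature for a record pair."""
    name_sim = SequenceMatcher(None, a[1], b[1]).ratio()
    day_gap = abs((a[2] - b[2]).days)
    age_gap = abs(a[3] - b[3])
    return name_sim, day_gap, age_gap


def is_match(a, b):
    """Hand-set thresholds (assumed) standing in for a trained classifier."""
    name_sim, day_gap, age_gap = features(a, b)
    return name_sim >= 0.8 and day_gap <= 3 and age_gap <= 2


def cluster(records):
    """Merge matched pairs into clusters via union-find (connected components)."""
    parent = {rec[0]: rec[0] for rec in records}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in candidate_pairs(records):
        if is_match(a, b):
            parent[find(a[0])] = find(b[0])

    groups = defaultdict(list)
    for rec in records:
        groups[find(rec[0])].append(rec[0])
    return sorted(sorted(g) for g in groups.values())


clusters = cluster(RECORDS)
print(clusters)
```

On this toy input the sketch groups records 0, 1, and 2 as one person, records 3 and 4 as another, and leaves record 5 on its own. Connected components is the simplest possible clustering choice and will chain records together through any single false-positive link, which is one reason a real pipeline would want per-stage assessment metrics.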

This talk is part of the Isaac Newton Institute Seminar Series.

© 2006-2024 Talks.cam, University of Cambridge.