BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//Talks.cam//talks.cam.ac.uk//
X-WR-CALNAME:Talks.cam
BEGIN:VEVENT
SUMMARY:Generating partially synthetic data to protect confidentiality in 
 survey microdata - Robin Mitra\, University of Southampton
DTSTART:20121127T143000Z
DTEND:20121127T153000Z
UID:TALK39607@talks.cam.ac.uk
CONTACT:Dr Jack Bowden
DESCRIPTION:There is often a tension between the needs of researchers to a
 ccess data for analysis and the needs of data-holding organizations to pro
 tect confidentiality.  There are various approaches data-holding organizat
 ions can apply\, such as data swapping or top coding\, that alter values i
 n the data so as to protect confidential information. However\, these meth
 ods typically alter the statistical properties of the data\, and thus redu
 ce the utility of the data.\n\nAnother approach data-holding organizations
  could employ is to replace values in the data with multiple imputations t
 o create partially synthetic data sets. As the synthetic data comprise a m
 ix of actual and simulated values\, confidentiality risks are mitigated to
  an extent. The imputations are typically drawn from a statistical model t
 hat seeks to capture relationships between all the variables in the data\,
  so the synthetic data should replicate the statistical properties present
  in the original data. Thus\, users of the synthetic data should\, in theo
 ry\, draw similar conclusions to those that would have been obtained from 
 an analysis of the original data.\n\nIn this talk\, I will review the synt
 hetic data approach to protecting confidentiality\, and consider risks ass
 ociated with this approach. I will also describe an application of this ap
 proach to protecting confidentiality in the UK 1991 Sample of Anonymised R
 ecords (SARs). The SARs is a 2% sample of the UK census data containing ov
 er 1 million records with mainly categorical variables. This makes it chal
 lenging to form statistical models for synthesis as well as to decide whic
 h values in the data should be synthesized.\n\nReferences:\n\nReiter\, J
 . P. (2003). Inference for partially synthetic\, public use microdata set
 s. Survey Methodology\, 29\, 181–188.\n\nReiter\, J. P. (2005). Using
  CART to generate partially synthetic\, public use microdata. Journal
  of Official Statistics\, 21\, 441–462.\n
LOCATION:Large Seminar Room\, 1st Floor\, Institute of Public Health\, Un
 iversity Forvie Site\, Robinson Way\, Cambridge
END:VEVENT
END:VCALENDAR