Generating partially synthetic data to protect confidentiality in survey microdata

Robin Mitra, University of Southampton
Tuesday 27 November 2012, 14:30-15:30
Large Seminar Room, 1st Floor, Institute of Public Health, University Forvie Site, Robinson Way, Cambridge.

If you have a question about this talk, please contact Dr Jack Bowden.

There is often a tension between researchers' need to access data for analysis and data-holding organizations' need to protect confidentiality. Organizations can apply various approaches, such as data swapping or top coding, that alter values in the data to protect confidential information. However, these methods typically alter the statistical properties of the data and thus reduce its utility.
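To make the utility loss concrete, here is a minimal sketch of top coding on simulated data. The income distribution, the 95th-percentile cap, and the helper name `top_code` are all illustrative assumptions, not details from the talk.

```python
import random
import statistics


def top_code(values, cap):
    """Replace every value above `cap` with `cap` itself."""
    return [min(v, cap) for v in values]


rng = random.Random(0)

# Hypothetical right-skewed incomes (simulated, not real survey data).
incomes = [rng.lognormvariate(10, 0.8) for _ in range(5000)]

# Assumed disclosure rule: cap incomes at roughly the 95th percentile.
cap = sorted(incomes)[int(0.95 * len(incomes))]
coded = top_code(incomes, cap)

mean_before = statistics.mean(incomes)
mean_after = statistics.mean(coded)
sd_before = statistics.stdev(incomes)
sd_after = statistics.stdev(coded)

print(f"mean: {mean_before:.0f} -> {mean_after:.0f}")
print(f"sd:   {sd_before:.0f} -> {sd_after:.0f}")
```

Capping the upper tail lowers both the mean and the standard deviation, so an analyst working from the top-coded file would underestimate both.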

Another approach data-holding organizations could employ is to replace values in the data with multiple imputations to create partially synthetic data sets. As the synthetic data comprise a mix of actual and simulated values, confidentiality risks are mitigated to an extent. The imputations are typically drawn from a statistical model that seeks to capture relationships between all the variables in the data, so the synthetic data should replicate the statistical properties present in the original data. Thus, users of the synthetic data should, in theory, draw similar conclusions to those that would have been obtained from an analysis of the original data.
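The idea can be sketched in a few lines, under strong simplifying assumptions. The two-variable records, the linear synthesis model, the rule "synthesize income for ages above 50", and all function names are hypothetical. For brevity the sketch draws from the fitted model with plug-in parameter estimates, whereas a careful implementation would also draw the parameters from their posterior distribution. The `combine` step uses the variance formula for partially synthetic data, T = u_bar + b/m, as given in Reiter (2003).

```python
import random
import statistics


def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b, residual sd)."""
    mx, my = statistics.mean(x), statistics.mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    rss = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    return a, b, (rss / (len(x) - 2)) ** 0.5


def synthesize(records, replace, m, rng):
    """Return m copies of `records`; income is redrawn where replace(r) is true."""
    a, b, s = fit_line([r["age"] for r in records],
                       [r["income"] for r in records])
    datasets = []
    for _ in range(m):
        copy = []
        for r in records:
            new = dict(r)
            if replace(r):
                # Simulated value from the fitted model (plug-in estimates).
                new["income"] = rng.gauss(a + b * r["age"], s)
            copy.append(new)
        datasets.append(copy)
    return datasets


def combine(estimates, variances):
    """Point estimate and variance T = u_bar + b/m across m synthetic data sets."""
    m = len(estimates)
    q_bar = statistics.mean(estimates)
    between = statistics.variance(estimates)
    u_bar = statistics.mean(variances)
    return q_bar, u_bar + between / m


rng = random.Random(1)

# Hypothetical "original" data: age is released as-is, income is sensitive.
records = []
for _ in range(2000):
    age = rng.uniform(20, 65)
    records.append({"age": age, "income": 10 + 0.5 * age + rng.gauss(0, 5)})

# Assumed selection rule: synthesize income only for records with age > 50.
synthetic = synthesize(records, lambda r: r["age"] > 50, m=5, rng=rng)

# Analyst's view: estimate mean income in each data set, then combine.
n = len(records)
estimates = [statistics.mean(r["income"] for r in ds) for ds in synthetic]
variances = [statistics.variance([r["income"] for r in ds]) / n
             for ds in synthetic]
q_bar, t_var = combine(estimates, variances)
original_mean = statistics.mean(r["income"] for r in records)

print(f"original mean: {original_mean:.2f}")
print(f"synthetic estimate: {q_bar:.2f} (variance {t_var:.4f})")
```

Because the simulated data here really do follow the linear model, the combined estimate lands close to the original mean. With real survey data, the quality of the synthesis model determines how well such conclusions carry over.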

In this talk, I will review the synthetic data approach to protecting confidentiality, and consider risks associated with this approach. I will also describe an application of this approach to protecting confidentiality in the UK 1991 Sample of Anonymised Records (SARs). The SARs is a 2% sample of the UK census data containing over 1 million records with mainly categorical variables. This makes it challenging to form statistical models for synthesis as well as to decide which values in the data should be synthesized.

References:

Reiter, J. P. (2003). Inference for partially synthetic, public use microdata sets. Survey Methodology 29, 181–188.

Reiter, J. P. (2005). Using CART to generate partially synthetic, public use microdata. Journal of Official Statistics 21, 441–462.

This talk is part of the MRC Biostatistics Unit Seminars series.
