BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//talks.cam.ac.uk//v3//EN
BEGIN:VTIMEZONE
TZID:Europe/London
BEGIN:DAYLIGHT
TZOFFSETFROM:+0000
TZOFFSETTO:+0100
TZNAME:BST
DTSTART:19700329T010000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=-1SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:+0100
TZOFFSETTO:+0000
TZNAME:GMT
DTSTART:19701025T020000
RRULE:FREQ=YEARLY;BYMONTH=10;BYDAY=-1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
CATEGORIES:Isaac Newton Institute Seminar Series
SUMMARY:Data compression with statistical guarantees - Sylvia Richards
 on (University of Cambridge)
DTSTART;TZID=Europe/London:20170703T133000
DTEND;TZID=Europe/London:20170703T141500
UID:TALK73129AThttp://talks.cam.ac.uk
URL:http://talks.cam.ac.uk/talk/index/73129
DESCRIPTION:Joint talk with Daniel Ahfock (MRC Biostatistics Unit @ Uni
 versity of Cambridge)\n\nThe talk is concerned with translating rece
 nt ideas from computer science on probabilistic data-compression tec
 hniques into a statistical framework that can be 'safely' applied to s
 peed up linear regression analyses for very large sample sizes in bio
 medicine. Our motivation is to facilitate the use of multivariate reg
 ression and model exploration in tall data sets\, so that\, for exam
 ple\, genetic association analyses carried out on hundreds of thousa
 nds of subjects can investigate multivariate effects for a set of exp
 lanatory features\, rather than being restricted to one-feature-at-a-ti
 me associations for computational feasibility.\n\nAmong the many appr
 oaches to dealing with tall data\, probabilistic data-compression tec
 hniques using random linear mappings\, developed in the computer sci
 ence community and known as sketching\, are particularly suitable fo
 r linear regression problems. In the first part of the talk\, we wil
 l present a hierarchical representation of sketching\, which allo
 ws the statistical (distributional) properties of different sketchi
 ng algorithms to be derived. In particular\, we will discuss how th
 e signal-to-noise ratio in the original data set is important for th
 e choice of sketching algorithm. In the second part of the talk\, w
 e will further refine some of the approximation guarantees and consi
 der iterative sketches. The talk will be illustrated with a genetic a
 nalysis of the link between a blood cell trait and the HLA region\, i
 nvolving a sample of 130\,000 people.\n\nhttp://arxiv.org/abs/1706.03665
LOCATION:Seminar Room 1\, Newton Institute
CONTACT:INI IT
END:VEVENT
END:VCALENDAR