BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//talks.cam.ac.uk//v3//EN
BEGIN:VTIMEZONE
TZID:Europe/London
BEGIN:DAYLIGHT
TZOFFSETFROM:+0000
TZOFFSETTO:+0100
TZNAME:BST
DTSTART:19700329T010000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=-1SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:+0100
TZOFFSETTO:+0000
TZNAME:GMT
DTSTART:19701025T020000
RRULE:FREQ=YEARLY;BYMONTH=10;BYDAY=-1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
CATEGORIES:AI+Pizza
SUMMARY:AI + Pizza October 2018 - Microsoft Research Cambri
dge/University of Cambridge
DTSTART;TZID=Europe/London:20181026T173000
DTEND;TZID=Europe/London:20181026T190000
UID:TALK112918AThttp://talks.cam.ac.uk
URL:http://talks.cam.ac.uk/talk/index/112918
DESCRIPTION:*Speaker 1* - Marton Havasi \n\n*Title* - Minimal
Random Code Learning: Getting Bits Back from Compr
essed Model Parameters\n\n*Abstract* - While deep
neural networks are a highly successful model clas
s\, their large memory footprint puts considerable
strain on energy consumption\, communication band
width\, and storage requirements.\nConsequently\,
model size reduction has become an utmost goal in
deep learning. A typical approach is to train a se
t of deterministic weights\, while applying certai
n techniques such as pruning and quantization\, in or
der that the empirical weight distribution be
comes amenable to Shannon-style coding schemes. Ho
wever\, as shown in this paper\, relaxing weight d
eterminism and using a full variational distributi
on over weights allows for more efficient coding
schemes and consequently higher compression rates
. In particular\, following the classical bits-bac
k argument\, we encode the network weights using a
random sample\, requiring only a number of bits corres
ponding to the Kullback-Leibler divergence
between the sampled variational distribution and t
he encoding distribution. By imposing a constraint
on the Kullback-Leibler divergence\, we are able
to explicitly control the compression rate\, whi
le optimizing the expected loss on the training se
t. The employed encoding scheme can be shown to be
close to the optimal information-theoretical lowe
r bound\, with respect to the employed variational
family. Our method sets new state-of-the-art in
neural network compression\, as it strictly domin
ates previous approaches in a Pareto sense: On the
benchmarks LeNet-5/MNIST and VGG-16/CIFAR-10\, ou
r approach yields the best test performance for a
fixed memory budget\, and vice versa\, it achiev
es the highest compression rates for a fixed test
performance.\nJoint work with Robert Peharz and Jo
sé Miguel Hernández-Lobato\n\n\n*Speaker 2*
- Patrick Fernandes\n\n*Title* - Structured Neural
Summarization\n\n*Abstract* - Summarization of lo
ng sequences into a concise statement is a core pr
oblem in natural language processing\, requiring n
on-trivial understanding of the input. Based on th
e promising results of graph neural networks on hi
ghly structured data\, we develop a framework to e
xtend existing sequence encoders with a graph comp
onent that can reason about long-distance relation
ships in weakly structured data such as text. In a
n extensive evaluation\, we show that the resultin
g hybrid sequence-graph models outperform both pur
e sequence models as well as pure graph models on
a range of summarization tasks.
LOCATION:Auditorium\, Microsoft Research Ltd\, 21 Station R
oad\, Cambridge\, CB1 2FB
CONTACT:Microsoft Research Cambridge Talks Admins
END:VEVENT
END:VCALENDAR