BEGIN:VCALENDAR
VERSION:2.0
PRODID:-//talks.cam.ac.uk//v3//EN
BEGIN:VTIMEZONE
TZID:Europe/London
BEGIN:DAYLIGHT
TZOFFSETFROM:+0000
TZOFFSETTO:+0100
TZNAME:BST
DTSTART:19700329T010000
RRULE:FREQ=YEARLY;BYMONTH=3;BYDAY=-1SU
END:DAYLIGHT
BEGIN:STANDARD
TZOFFSETFROM:+0100
TZOFFSETTO:+0000
TZNAME:GMT
DTSTART:19701025T020000
RRULE:FREQ=YEARLY;BYMONTH=10;BYDAY=-1SU
END:STANDARD
END:VTIMEZONE
BEGIN:VEVENT
CATEGORIES:Microsoft Research Machine Learning and Perception
Seminars
SUMMARY:Multi-Label Learning with Millions of Categories & Ge
neralized Multiple Kernel Learning with a Million
Kernels - Manik Varma\, Microsoft Research\, India
DTSTART;TZID=Europe/London:20120719T140000
DTEND;TZID=Europe/London:20120719T150000
UID:TALK38622AThttp://talks.cam.ac.uk
URL:http://talks.cam.ac.uk/talk/index/38622
DESCRIPTION:Multi-Label Learning with Millions of Categories\n
\nOur objective is to build an algorithm for class
ifying a data point into a set of labels when the
output space contains millions of categories. This
is a relatively novel setting in supervised learn
ing and brings forth interesting challenges such a
s efficient training and prediction\, learning fro
m only positively labeled data with missing and in
correct labels\, and handling label correlations. We
propose a random forest based solution for jointl
y tackling these issues. We develop a novel extens
ion of random forests for multi-label classificati
on which can learn from positive data alone and ca
n scale to large data sets. We generate real value
d beliefs indicating the state of labels and adapt
our classifier to train on these belief vectors s
o as to compensate for missing and noisy labels. I
n addition\, we modify the random forest cost func
tion to avoid overfitting in high dimensional feat
ure spaces and learn short\, balanced trees. Final
ly\, we write highly efficient training routines
which let us train on problems with more than a hu
ndred million data points\, million-dimensional s
parse feature vectors\, and over ten million ca
tegories. Extensive experiments reveal that our pr
oposed solution is not only significantly better t
han other multi-label classification algorithms bu
t also more than 10% better than the state-of-th
e-art NLP-based techniques for suggesting bid phra
ses for online search advertisers.\n\nGeneralized
Multiple Kernel Learning with a Million Kernels\n\
nMultiple Kernel Learning (MKL) aims to learn the
kernel in an SVM from training data. Many MKL form
ulations have been proposed and some have proved e
ffective in certain applications. Nevertheless\, a
s MKL is a nascent field\, many more formulations
need to be developed to generalize across domains
and meet the challenges of real world applications
. However\, each MKL formulation typically necessi
tates the development of a specialized optimizatio
n algorithm. The lack of an efficient\, general pu
rpose optimizer capable of handling a wide range o
f formulations presents a significant challenge to
those looking to take MKL out of the lab and into
the real world.\n\nThis problem was somewhat all
eviated by the development of the Generalized Mult
iple Kernel Learning (GMKL) formulation which admi
ts fairly general kernel parameterizations and reg
ularizers subject to mild constraints. However\, t
he projected gradient descent GMKL optimizer is in
efficient as the computation of the step size and
a reasonably accurate objective function value or
gradient direction are all expensive. We overcome
these limitations by developing a Spectral Project
ed Gradient (SPG) descent optimizer which: a) take
s into account second order information in selecti
ng step sizes\; b) employs a non-monotone step siz
e selection criterion requiring fewer function eva
luations\; c) is robust to gradient noise\, and d)
can take quick steps when far away from the optim
um.\n\nWe show that our proposed SPG-GMKL optimize
r can be an order of magnitude faster than project
ed gradient descent on even small and medium sized
datasets. In some cases\, SPG-GMKL can even outpe
rform state-of-the-art specialized optimization al
gorithms developed for a single MKL formulation. F
urthermore\, we demonstrate that SPG-GMKL can scal
e well beyond gradient descent to large problems i
nvolving a million kernels or half a million data
points. Our code and implementation are available
publicly.
LOCATION:Large public lecture room\, Microsoft Research Ltd
\, 7 J J Thomson Avenue (Off Madingley Road)\, Cam
bridge
CONTACT:Microsoft Research Cambridge Talks Admins
END:VEVENT
END:VCALENDAR