Multi-Label Learning with Millions of Categories & Generalized Multiple Kernel Learning with a Million Kernels
- Speaker: Manik Varma, Microsoft Research, India
- Date & Time: Thursday 19 July 2012, 14:00-15:00
- Venue: Large public lecture room, Microsoft Research Ltd, 7 J J Thomson Avenue (Off Madingley Road), Cambridge
Abstract
Multi-Label Learning with Millions of Categories
Our objective is to build an algorithm that classifies a data point into a set of labels when the output space contains millions of categories. This is a relatively novel setting in supervised learning and raises interesting challenges, such as efficient training and prediction, learning from only positively labeled data with missing and incorrect labels, and handling label correlations. We propose a random-forest-based solution for jointly tackling these issues. We develop a novel extension of random forests for multi-label classification which can learn from positive data alone and can scale to large data sets. We generate real-valued beliefs indicating the state of labels and adapt our classifier to train on these belief vectors so as to compensate for missing and noisy labels. In addition, we modify the random forest cost function to avoid overfitting in high-dimensional feature spaces and to learn short, balanced trees. Finally, we write highly efficient training routines which let us train on problems with more than a hundred million data points, over a million-dimensional sparse feature vectors, and over ten million categories. Extensive experiments reveal that our proposed solution is not only significantly better than other multi-label classification algorithms but also more than 10% better than state-of-the-art NLP-based techniques for suggesting bid phrases to online search advertisers.
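The belief-vector idea can be illustrated with a minimal sketch. This is a hypothetical smoothing scheme, not the authors' exact procedure: observed positive-only labels are softened into real-valued beliefs by propagating mass to co-occurring labels, and a classifier would then be trained on these soft targets rather than on the incomplete binary labels.

```python
import numpy as np

# Toy illustration (hypothetical scheme, not the paper's exact method):
# turn positive-only label observations into real-valued belief vectors
# by boosting labels that co-occur with the observed positives, so that
# a downstream multi-label classifier can compensate for missing labels.
Y_obs = np.array([[1, 0, 1, 0],   # observed positives: 3 points, 4 labels
                  [1, 1, 0, 0],
                  [0, 0, 1, 1]], dtype=float)

C = Y_obs.T @ Y_obs                                   # label co-occurrence counts
P = C / np.maximum(C.sum(axis=1, keepdims=True), 1)   # row-normalized transitions

# Observed positives keep full belief; correlated labels get partial belief.
beliefs = np.clip(Y_obs + 0.5 * (Y_obs @ P), 0.0, 1.0)
print(beliefs)
```

Here `beliefs` is a dense real-valued matrix: entries that were observed positive stay at 1, while unobserved labels correlated with the positives receive a small nonzero belief, approximating the "missing label" compensation described above.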
Generalized Multiple Kernel Learning with a Million Kernels
Multiple Kernel Learning (MKL) aims to learn the kernel in an SVM from training data. Many MKL formulations have been proposed and some have proved effective in certain applications. Nevertheless, as MKL is a nascent field, many more formulations need to be developed to generalize across domains and meet the challenges of real world applications. However, each MKL formulation typically necessitates the development of a specialized optimization algorithm. The lack of an efficient, general purpose optimizer capable of handling a wide range of formulations presents a significant challenge to those looking to take MKL out of the lab and into the real world.
This problem was somewhat alleviated by the development of the Generalized Multiple Kernel Learning (GMKL) formulation, which admits fairly general kernel parameterizations and regularizers subject to mild constraints. However, the projected gradient descent GMKL optimizer is inefficient, as computing the step size, a reasonably accurate objective function value, or a gradient direction is expensive. We overcome these limitations by developing a Spectral Projected Gradient (SPG) descent optimizer which: a) takes second-order information into account when selecting step sizes; b) employs a non-monotone step size selection criterion requiring fewer function evaluations; c) is robust to gradient noise; and d) can take quick steps when far from the optimum.
We show that our proposed SPG-GMKL optimizer can be an order of magnitude faster than projected gradient descent even on small and medium-sized datasets. In some cases, SPG-GMKL even outperforms state-of-the-art specialized optimization algorithms developed for a single MKL formulation. Furthermore, we demonstrate that SPG-GMKL scales well beyond gradient descent to large problems involving a million kernels or half a million data points. Our code and implementation are publicly available.
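As a rough illustration of the optimizer's ingredients, a Barzilai-Borwein (spectral) step size combined with a non-monotone line search, here is a minimal SPG sketch on a toy non-negative least-squares problem. The function names, constants, and the toy problem are illustrative assumptions, not the SPG-GMKL implementation.

```python
import numpy as np

def spg(f, grad, proj, x0, max_iter=500, M=10, tol=1e-10):
    """Minimal Spectral Projected Gradient sketch (Birgin-Martinez-Raydan style).

    Uses a Barzilai-Borwein spectral step and a non-monotone line search
    that only requires sufficient decrease w.r.t. the max of the last M
    objective values, so fewer function evaluations are needed."""
    x = proj(x0)
    g = grad(x)
    hist = [f(x)]                              # recent objective values
    alpha = 1.0                                # initial spectral step
    for _ in range(max_iter):
        d = proj(x - alpha * g) - x            # projected descent direction
        if np.linalg.norm(d) < tol:
            break
        f_ref = max(hist[-M:])                 # non-monotone reference
        lam = 1.0
        while f(x + lam * d) > f_ref + 1e-4 * lam * g.dot(d):
            lam *= 0.5                         # backtracking
        x_new = x + lam * d
        g_new = grad(x_new)
        s, y = x_new - x, g_new - g
        sy = s.dot(y)
        alpha = s.dot(s) / sy if sy > 1e-12 else 1.0   # BB spectral step
        alpha = min(max(alpha, 1e-10), 1e10)           # safeguard bounds
        x, g = x_new, g_new
        hist.append(f(x))
    return x

# Toy problem: min ||Ax - b||^2 subject to x >= 0 (projection = clipping).
rng = np.random.default_rng(0)
A = rng.standard_normal((30, 5))
x_true = np.abs(rng.standard_normal(5))
b = A @ x_true
f = lambda x: 0.5 * np.sum((A @ x - b) ** 2)
grad = lambda x: A.T @ (A @ x - b)
proj = lambda x: np.maximum(x, 0.0)

x_hat = spg(f, grad, proj, np.zeros(5))
print(f(x_hat))
```

The non-monotone criterion (comparing against `max(hist[-M:])` rather than the latest value) is what lets the method accept occasional uphill steps and move quickly when far from the optimum, matching points b) and d) above in spirit.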
Series: This talk is part of the Microsoft Research Machine Learning and Perception Seminars series.
