University of Cambridge > Talks.cam > Microsoft Research Machine Learning and Perception Seminars > Multi-Label Learning with Millions of Categories & Generalized Multiple Kernel Learning with a Million Kernels

Multi-Label Learning with Millions of Categories & Generalized Multiple Kernel Learning with a Million Kernels

Add to your list(s) Download to your calendar using vCal

If you have a question about this talk, please contact Microsoft Research Cambridge Talks Admins.

This event may be recorded and made available internally or externally via http://research.microsoft.com. Microsoft will own the copyright of any recordings made. If you do not wish to have your image/voice recorded please consider this before attending

Multi-Label Learning with Millions of Categories

Our objective is to build an algorithm for classifying a data point into a set of labels when the output space contains millions of categories. This is a relatively novel setting in supervised learning and brings forth interesting challenges such as efficient training and prediction, learning from only positively labeled data with missing and incorrect labels and handling label correlations. We propose a random forest based solution for jointly tackling these issues. We develop a novel extension of random forests for multi-label classification which can learn from positive data alone and can scale to large data sets. We generate real valued beliefs indicating the state of labels and adapt our classifier to train on these belief vectors so as to compensate for missing and noisy labels. In addition, we modify the random forest cost function to avoid overfitting in high dimensional feature spaces and learn short, balanced trees. Finally, we write highly efficient training routines which let us train on problems with more than a hundred million data points, over a million dimensional sparse feature vector and over ten million categories. Extensive experiments reveal that our proposed solution is not only significantly better than other multi-label classification algorithms but also more than 10\% better than the state-of-the-art NLP based techniques for suggesting bid phrases for online search advertisers.

Generalized Multiple Kernel Learning with a Million Kernels

Multiple Kernel Learning (MKL) aims to learn the kernel in an SVM from training data. Many MKL formulations have been proposed and some have proved effective in certain applications. Nevertheless, as MKL is a nascent field, many more formulations need to be developed to generalize across domains and meet the challenges of real world applications. However, each MKL formulation typically necessitates the development of a specialized optimization algorithm. The lack of an efficient, general purpose optimizer capable of handling a wide range of formulations presents a significant challenge to those looking to take MKL out of the lab and into the real world.

This problem was somewhat alleviated by the development of the Generalized Multiple Kernel Learning (GMKL) formulation which admits fairly general kernel parameterizations and regularizers subject to mild constraints. However, the projected gradient descent GMKL optimizer is inefficient as the computation of the step size and a reasonably accurate objective function value or gradient direction are all expensive. We overcome these limitations by developing a Spectral Projected Gradient (SPG) descent optimizer which: a) takes into account second order information in selecting step sizes; b) employs a non-monotone step size selection criterion requiring fewer function evaluations; c) is robust to gradient noise, and d) can take quick steps when far away from the optimum.

We show that our proposed SPG -GMKL optimizer can be an order of magnitude faster than projected gradient descent on even small and medium sized datasets. In some cases, SPG -GMKL can even outperform state-of-the-art specialized optimization algorithms developed for a single MKL formulation. Furthermore, we demonstrate that SPG -GMKL can scale well beyond gradient descent to large problems involving a million kernels or half a million data points. Our code and implementation are available publically.

This talk is part of the Microsoft Research Machine Learning and Perception Seminars series.

Tell a friend about this talk:

This talk is included in these lists:

Note that ex-directory lists are not shown.

 

© 2006-2019 Talks.cam, University of Cambridge. Contact Us | Help and Documentation | Privacy and Publicity