
Randomized tree ensembles: output kernels and variable importances


If you have a question about this talk, please contact Microsoft Research Cambridge Talks Admins.

This event may be recorded and made available internally or externally via http://research.microsoft.com. Microsoft will own the copyright of any recordings made. If you do not wish to have your image or voice recorded, please consider this before attending.

Methods based on ensembles of randomized trees, such as random forests and extremely randomized trees, underlie many successful applications in various domains, including computer vision and bioinformatics. Their main advantages include statistical and computational efficiency, ease of use, flexibility, and interpretability. This talk focuses on two methodological developments around these methods.

First, we will present a principled generalization of classification and regression trees that makes predictions in a kernel-induced output space. From a sample of input feature vectors and a Gram matrix of output kernel values, the resulting method, called output kernel trees, learns a model of an output kernel as a function of the input features. This generalization naturally opens tree-based methods to structured output prediction and supervised kernel learning. The practical interest of the method will be illustrated on the problem of supervised graph inference.

The second part of the talk will be devoted to variable importances derived from ensembles of randomized trees. Despite growing interest and practical use in various scientific areas, these variable importances are not well understood from a theoretical point of view. In an attempt to fill this gap, we will present a theoretical analysis of the mean decrease impurity variable importances as measured by an ensemble of totally randomized trees in asymptotic conditions. In particular, we will show that the importance of a variable is equal to zero if and only if the variable is irrelevant, and that the importance of a relevant variable is invariant with respect to the removal or the addition of irrelevant variables. These properties will be illustrated, and we will discuss how they may change in the case of non-totally randomized trees such as random forests and extremely randomized trees.
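As a rough illustration of the first part of the abstract, the sketch below shows how a regression-tree split could be scored using only a Gram matrix of output kernel values. It relies on the standard identity that the variance of a set of points in a kernel-induced feature space equals the mean of the diagonal kernel values minus the mean of all pairwise kernel values. The function names `kernel_variance` and `split_score` and the toy data are hypothetical, and this is an assumption about how such a generalization can be written down, not the speaker's implementation.

```python
from typing import Sequence


def kernel_variance(gram: Sequence[Sequence[float]], idx: Sequence[int]) -> float:
    """Variance of the outputs indexed by idx in the kernel-induced space.

    Only Gram-matrix entries are used:
        var = mean_i k(y_i, y_i) - mean_{i,j} k(y_i, y_j)
    """
    n = len(idx)
    if n == 0:
        return 0.0
    diag = sum(gram[i][i] for i in idx) / n
    pairwise = sum(gram[i][j] for i in idx for j in idx) / (n * n)
    return diag - pairwise


def split_score(gram: Sequence[Sequence[float]],
                left: Sequence[int],
                right: Sequence[int]) -> float:
    """Variance reduction obtained by splitting left + right into two children."""
    parent = list(left) + list(right)
    n = len(parent)
    return (kernel_variance(gram, parent)
            - len(left) / n * kernel_variance(gram, left)
            - len(right) / n * kernel_variance(gram, right))


# Toy data (hypothetical): scalar outputs with a linear kernel k(a, b) = a * b,
# for which the kernel variance reduces to the ordinary variance.
ys = [1.0, 2.0, 10.0, 11.0]
gram = [[a * b for b in ys] for a in ys]

good_split = split_score(gram, left=[0, 1], right=[2, 3])
poor_split = split_score(gram, left=[0, 2], right=[1, 3])
```

With the linear kernel used here, `kernel_variance(gram, [0, 1, 2, 3])` is 20.5, the ordinary population variance of `ys`, and the split that separates the small outputs from the large ones scores higher than the mixed split (20.25 against 0.25). Replacing the Gram matrix with one computed from a kernel on structured outputs, such as graph nodes, leaves the code unchanged, which is the sense in which the tree only ever needs kernel values.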

This talk is part of the Microsoft Research Cambridge, public talks series.


© 2006-2024 Talks.cam, University of Cambridge.