Randomized tree ensembles: output kernels and variable importances
- Speaker: Pierre Geurts, University of Liège
- Date & Time: Tuesday 26 November 2013, 10:00 - 11:00
- Venue: Auditorium, Microsoft Research Ltd, 21 Station Road, Cambridge, CB1 2FB
Abstract
Methods based on ensembles of randomized trees, such as random forests and extremely randomized trees, underlie many successful applications in domains such as computer vision and bioinformatics. Their main advantages include statistical and computational efficiency, ease of use, flexibility, and interpretability. This talk focuses on two methodological developments around these methods.

First, we will present a principled generalization of classification and regression trees that makes predictions in a kernel-induced output space. From a sample of input feature vectors and a Gram matrix of output kernel values, the resulting method, called output kernel trees, learns a model of the output kernel as a function of the input features. This generalization naturally opens tree-based methods to structured output prediction and supervised kernel learning. The practical interest of the method will be illustrated on the problem of supervised graph inference.

The second part of the talk will be devoted to variable importances derived from ensembles of randomized trees. Despite growing interest and practical use in various scientific areas, these variable importances are not well understood from a theoretical point of view. In an attempt to fill this gap, we will present a theoretical analysis of the mean decrease impurity variable importances, as measured by an ensemble of totally randomized trees in asymptotic conditions. In particular, we show that the importance of a variable is zero if and only if the variable is irrelevant, and that the importance of a relevant variable is invariant to the removal or addition of irrelevant variables. These properties will be illustrated, and we will discuss how they may change for trees that are not totally randomized, such as random forests and extremely randomized trees.
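The two properties stated for variable importances can be checked numerically on a toy problem. The following is a minimal sketch, not the speaker's code, and it rests on several assumptions that are not in the abstract: inputs are categorical, each split is multiway on a variable drawn uniformly among those not yet used on the branch, the impurity is Shannon entropy, and "asymptotic conditions" is modelled by working on the exact joint distribution and averaging over every possible tree. The names `mdi`, `xor_distribution`, and `importances` are hypothetical.

```python
from itertools import product
from math import log2


def entropy(weights):
    """Shannon entropy in bits of a list of unnormalized positive weights."""
    total = sum(weights)
    return -sum(w / total * log2(w / total) for w in weights if w > 0)


def label_weights(cells):
    """Probability mass of each output label inside a node."""
    mass = {}
    for (_, y), p in cells.items():
        mass[y] = mass.get(y, 0.0) + p
    return list(mass.values())


def mdi(cells, unused, imp, weight=1.0):
    """Accumulate expected impurity decreases over all totally randomized trees.

    cells:  {(x, y): probability} restricted to the current node
    unused: variables not yet split on along this branch
    weight: probability that a random tree contains this node via this path
    """
    if not unused or not cells:
        return
    node_mass = sum(cells.values())
    h_node = entropy(label_weights(cells))
    for v in unused:  # each unused variable is drawn with prob 1/len(unused)
        children = {}
        for (x, y), p in cells.items():
            children.setdefault(x[v], {})[(x, y)] = p
        h_children = sum(
            sum(c.values()) * entropy(label_weights(c)) for c in children.values()
        )
        pick = weight / len(unused)
        # Mass-weighted impurity decrease obtained by splitting this node on v.
        imp[v] += pick * (node_mass * h_node - h_children)
        rest = [u for u in unused if u != v]
        for child in children.values():
            mdi(child, rest, imp, pick)


def xor_distribution(n_vars):
    """Uniform binary inputs with Y = X0 XOR X1; variables 2 and up are irrelevant."""
    inputs = list(product((0, 1), repeat=n_vars))
    return {(x, x[0] ^ x[1]): 1.0 / len(inputs) for x in inputs}


def importances(n_vars):
    imp = [0.0] * n_vars
    mdi(xor_distribution(n_vars), list(range(n_vars)), imp)
    return imp


imp2 = importances(2)  # only the two relevant variables
imp3 = importances(3)  # same problem plus one irrelevant variable
print(imp2, imp3)
```

Under these assumptions, the irrelevant third variable should receive an importance of zero, and the importances of the two relevant variables should be the same in `imp2` and `imp3`. The sketch says nothing about random forests or extremely randomized trees, where the split variable is not drawn uniformly at random.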
Series
This talk is part of the Microsoft Research Cambridge, public talks series.