Learning generative models of images by factoring appearance and shape
- 👤 Speaker: Nicolas Heess
- 📅 Date & Time: Tuesday 29 March 2011, 09:30 - 11:30
- 📍 Venue: Small lecture theatre, Microsoft Research Ltd, 7 J J Thomson Avenue (Off Madingley Road), Cambridge
Abstract
Rich prior knowledge of the visual world is crucial for many vision tasks, and much effort has been devoted to formalizing such knowledge for use in computer vision systems and elsewhere. Generative, probabilistic models are an appealing framework for this purpose: they allow us to reason about uncertainty and, importantly, they are particularly amenable to unsupervised learning. Despite considerable effort, however, a comprehensive generative model that can represent image structure at different levels of abstraction and scale, and that still allows for efficient inference and learning, remains largely elusive.
In my talk I will discuss some steps towards this long-term goal. One hallmark of natural images is the variability of visual characteristics across different image regions and the presence of sharp boundaries between regions, which arise from objects occluding each other. Many generative models of generic natural images have difficulty representing this type of structure, and I will present a model that addresses this problem. It builds on concepts from the computer vision literature, such as the layered representation of images, and combines them with ideas from ‘deep’, unsupervised learning. I will first describe the basic building block of the model, the Masked Restricted Boltzmann Machine, which allows occlusion boundaries to be modeled by factoring out the appearance of an image region from its shape. This model also has a natural extension to images of realistic size: the Field of Masked RBMs models an image in terms of a large number of independent small and partially overlapping ‘objects’, each of which has an associated shape and appearance. Finally, I will discuss where this leaves us in the quest for a comprehensive representation of visual structure. I will outline how the Field of Masked RBMs naturally gives rise to a compositional, hierarchical framework for modeling images at different scales and levels of abstraction, and I will talk about some of the challenges ahead.
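As a rough illustration of what separating shape from appearance means in a layered image representation, the sketch below composes a toy image from two appearance layers and a binary mask. It shows only the general layered-image idea and is not the Masked RBM from the talk; the function name `composite` and the toy pixel values are assumptions made for this example.

```python
def composite(mask, foreground, background):
    """Combine two appearance layers using a binary shape mask.

    mask[i][j] == 1 selects the foreground pixel, 0 the background pixel.
    All three arguments are equally sized lists of rows.
    (Illustrative helper only; not taken from the talk.)
    """
    return [
        [f if m else b for m, f, b in zip(mrow, frow, brow)]
        for mrow, frow, brow in zip(mask, foreground, background)
    ]


# Hypothetical 3x4 toy example: a textured "object" occluding a flat background.
mask = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
foreground = [[9, 8, 9, 8], [8, 9, 8, 9], [9, 8, 9, 8]]
background = [[1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1]]

image = composite(mask, foreground, background)
for row in image:
    print(row)
```

Because the mask alone determines where the sharp boundary falls, the shape of a region can change without touching either appearance layer, and vice versa.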
Joint work with Nicolas Le Roux, John Winn, and Jamie Shotton
Series
This talk is part of the Microsoft Research Cambridge, public talks series.