Learning generative models of images by factoring appearance and shape

Rich prior knowledge of the visual world is crucial for many vision tasks, and much effort has been devoted to formalizing such knowledge for use in computer vision systems and elsewhere. Generative, probabilistic models are an appealing framework for this purpose: they allow us to reason about uncertainty and, importantly, they are particularly amenable to unsupervised learning. Despite considerable effort, however, a comprehensive generative model that can represent image structure at different levels of abstraction and scale, and that still allows for efficient inference and learning, remains largely elusive.

In my talk I will discuss some steps towards this long-term goal. One hallmark of natural images is that visual characteristics vary across image regions, and that these regions are separated by sharp boundaries which arise when objects occlude each other. Many generative models of generic natural images have difficulty representing this type of structure, and I will present a model that addresses this problem. It builds on concepts from the computer vision literature, such as the layered representation of images, and combines them with ideas from ‘deep’ unsupervised learning. I will first describe the basic building block of the model, the Masked Restricted Boltzmann Machine (Masked RBM), which allows occlusion boundaries to be modeled by factoring out the appearance of an image region from its shape. This model also has a natural extension to images of realistic size: the Field of Masked RBMs models an image in terms of a large number of independent, small, and partially overlapping ‘objects’, each of which has an associated shape and appearance. Finally, I will discuss where this leaves us in the quest for a comprehensive representation of visual structure. I will outline how the Field of Masked RBMs naturally gives rise to a compositional, hierarchical framework for modeling images at different scales and levels of abstraction, and I will talk about some of the challenges ahead.
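
The idea of factoring appearance from shape can be illustrated with a generic layered composition, in which a binary mask (the shape) selects, pixel by pixel, between two appearance layers. The sketch below is only an illustration of that layered-image idea under simple assumptions; it is not the Masked RBM itself, which the abstract does not specify in detail. The function name `compose` and the toy 3x3 values are hypothetical.

```python
def compose(mask, appearance_fg, appearance_bg):
    """Layered composition: where mask is 1 take the foreground pixel,
    otherwise take the background pixel. All inputs are equal-sized
    lists of rows. (Hypothetical helper for illustration only.)"""
    return [
        [fg if m else bg for m, fg, bg in zip(m_row, fg_row, bg_row)]
        for m_row, fg_row, bg_row in zip(mask, appearance_fg, appearance_bg)
    ]


# Shape: a binary mask with a sharp diagonal occlusion boundary.
mask = [
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]

# Appearance: two uniform "textures" (toy intensity values).
bright = [[0.9] * 3 for _ in range(3)]
dark = [[0.1] * 3 for _ in range(3)]

# Same shape, two different foreground/background appearances.
patch_a = compose(mask, bright, dark)
patch_b = compose(mask, dark, bright)
```

Because the mask is held separately from the two appearance layers, the same boundary can be reused with any pair of appearances, which is the sense in which shape and appearance are modeled independently here.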

Joint work with Nicolas Le Roux, John Winn, and Jamie Shotton

This talk is part of the Microsoft Research Cambridge, public talks series.
