
Learning visual representations


If you have a question about this talk, please contact Rachel Fogg.

Learnable representations, and deep convolutional neural networks (CNNs) in particular, have become the preferred way of extracting visual features for image understanding tasks, from object recognition and detection to semantic segmentation.

In this talk I will discuss several recent advances in deep representations for computer vision. First, I will review modern CNN architectures and their training. Then, I will illustrate state-of-the-art networks with an example in text spotting, showing that, using only synthetic training data and a sufficiently large deep model, it is possible to learn to map image regions directly to entire words, effectively training a 90k-way image classifier that achieves state-of-the-art text spotting performance. I will also briefly touch on other applications of deep learning in object recognition and discuss feature universality and transfer learning.
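As a rough illustration of this word-classification view of text spotting, the sketch below (in PyTorch; the lexicon size, input resolution, and architecture are assumptions for illustration, not the network discussed in the talk) trains a CNN whose final layer scores every word in the dictionary at once:

    import torch
    import torch.nn as nn

    LEXICON_SIZE = 90_000  # one output class per dictionary word (assumed lexicon size)

    class WordClassifier(nn.Module):
        def __init__(self, num_words: int = LEXICON_SIZE):
            super().__init__()
            # Small convolutional trunk over grey-scale word crops (assumed 32x100 pixels).
            self.features = nn.Sequential(
                nn.Conv2d(1, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
                nn.Conv2d(128, 256, 3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d((4, 13)),
            )
            # A single linear layer maps the features to a score for every word in the lexicon.
            self.classifier = nn.Linear(256 * 4 * 13, num_words)

        def forward(self, x):
            return self.classifier(self.features(x).flatten(1))

    model = WordClassifier()
    criterion = nn.CrossEntropyLoss()
    optimiser = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

    # Stand-ins for synthetically rendered word images and their word indices.
    images = torch.randn(8, 1, 32, 100)
    labels = torch.randint(0, LEXICON_SIZE, (8,))

    loss = criterion(model(images), labels)
    loss.backward()
    optimiser.step()

In this framing the only labels needed are word indices, which is why purely synthetic renderings of dictionary words suffice as training data.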

In the last part of the talk I will move to the problem of understanding deep networks, which remain, by and large, black boxes, and present two recent results. The first is a set of visualisation techniques for investigating which visual information is retained or learned by a visual representation. The second is a method for investigating how geometric transformations are represented in a CNN, as well as for establishing whether two CNNs, learned on different tasks, are in fact equivalent.
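As a rough sketch of one inversion-style visualisation of this kind (the network, layer choice, and regulariser below are assumptions for illustration, not necessarily the methods presented in the talk), one can optimise an image from noise so that its CNN features match those of a reference image, revealing what information the representation retains:

    import torch
    import torchvision.models as models

    # Frozen convolutional feature extractor (an untrained AlexNet here;
    # in practice a pretrained network would be used).
    cnn = models.alexnet(weights=None).features.eval()
    for p in cnn.parameters():
        p.requires_grad_(False)

    reference = torch.rand(1, 3, 224, 224)   # stand-in for a real photograph
    target = cnn(reference).detach()         # feature representation to reproduce

    x = torch.rand(1, 3, 224, 224, requires_grad=True)  # image being reconstructed
    optimiser = torch.optim.Adam([x], lr=0.05)

    for step in range(200):
        optimiser.zero_grad()
        match = (cnn(x) - target).pow(2).mean()  # feature-matching term
        # Small total-variation prior to keep the reconstruction smooth.
        tv = (x[..., 1:, :] - x[..., :-1, :]).abs().mean() + \
             (x[..., :, 1:] - x[..., :, :-1]).abs().mean()
        (match + 1e-2 * tv).backward()
        optimiser.step()

    # x now approximates a pre-image of the reference's feature representation.

For the second result, an analogous (and equally hypothetical) sketch would fit a small linear map from one CNN's features to another's and measure how well the mapped features support the second network's task, though the exact procedure in the talk may differ.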

This talk is part of the CUED Computer Vision Research Seminars series.

