
Multi-view Learning of Speech Feature Spaces


If you have a question about this talk, please contact Dr Marcus Tomalin.

Many learning tasks (classification, regression, clustering) can be improved when multiple views of the data are available. The “views” may be natural ones, such as audio vs. images vs. text, or more abstract, such as arbitrary subsets of the observation vector. Multi-view learning algorithms, such as co-training, take advantage of the relationships between the views. In this work, we explore two-view learning of feature spaces: given two views of the training data, we learn a transformation of each view that, in some sense, best predicts the other view. Importantly, we can then apply the learned transformations even when only one view (e.g. audio) is available at test time. For this talk, I will focus on work using canonical correlation analysis (CCA), in which a linear projection of each view is learned such that the two views’ projections are maximally correlated. I will describe recent experiments showing improvements on clustering tasks (speaker clustering of audio and/or video, and topic clustering of Wikipedia pages) and on a speaker identification task. Time permitting, I will describe additional ongoing work in speech and language at TTI-C.
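To make the CCA idea in the abstract concrete, here is a minimal NumPy sketch of textbook linear CCA. It is not the speaker's implementation: the function names (`fit_cca`, `project`), the small ridge term `reg`, and the synthetic two-view data are all assumptions made for illustration. It learns one projection per view from paired training data, then applies only the first view's projection at test time.

```python
import numpy as np


def fit_cca(X, Y, k, reg=1e-6):
    """Fit linear CCA on paired views X (n x dx) and Y (n x dy).

    Returns the view means, projection matrices A (dx x k) and B (dy x k),
    and the top-k canonical correlations. `reg` is an assumed ridge term
    that keeps the covariance matrices well conditioned.
    """
    mx, my = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mx, Y - my
    n = X.shape[0]
    Cxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / (n - 1)

    # Whiten each view with a Cholesky factor: Cxx = Lx Lx^T, Cyy = Ly Ly^T.
    Lx = np.linalg.cholesky(Cxx)
    Ly = np.linalg.cholesky(Cyy)

    # T = Lx^{-1} Cxy Ly^{-T}; its singular values are the canonical correlations.
    T = np.linalg.solve(Lx, Cxy)
    T = np.linalg.solve(Ly, T.T).T
    U, s, Vt = np.linalg.svd(T, full_matrices=False)

    # Map the singular vectors back to the original (unwhitened) coordinates.
    A = np.linalg.solve(Lx.T, U[:, :k])
    B = np.linalg.solve(Ly.T, Vt.T[:, :k])
    return mx, my, A, B, s[:k]


def project(X, mean, W):
    """Apply a learned single-view projection; the other view is not needed."""
    return (X - mean) @ W


def make_views(n, rng):
    """Synthetic data (an assumption): both views are noisy functions of z."""
    z = rng.normal(size=(n, 1))
    wx = np.array([[1.0, -0.5, 0.3, 0.0, 0.8]])
    wy = np.array([[0.7, 0.2, -1.0, 0.4]])
    X = z @ wx + 0.1 * rng.normal(size=(n, 5))
    Y = z @ wy + 0.1 * rng.normal(size=(n, 4))
    return z, X, Y


rng = np.random.default_rng(0)
_, X_train, Y_train = make_views(2000, rng)
mx, my, A, B, corrs = fit_cca(X_train, Y_train, k=1)

# At test time only view 1 (think "audio") is available.
z_test, X_test, _ = make_views(500, rng)
feats = project(X_test, mx, A)
test_corr = abs(np.corrcoef(feats[:, 0], z_test[:, 0])[0, 1])
```

In this toy setup the learned one-dimensional feature `feats` should track the shared latent variable `z` closely, even though the second view is never seen at test time. Features of this kind could then be passed to a clustering or classification step.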

Joint work with Kamalika Chaudhuri (UCSD), Sham Kakade (TTI-C), Karthik Sridharan (TTI-C), and Mark Stoehr (U. Chicago)

This talk is part of the Machine Intelligence Laboratory Speech Seminars series.


