
General teacher-student learning for automatic speech recognition


If you have a question about this talk, please contact Anton Ragni.

Teacher-student learning is a general framework that can be used to transfer knowledge from one or more models to another. It has found various applications in automatic speech recognition, for tasks such as compressing a large model or an ensemble of models, and domain adaptation. In its standard form, teacher-student learning propagates information from one or more teacher models to a student model by minimising the KL-divergence between their per-frame state-cluster posterior distributions at the Neural Network (NN) outputs.

This form of teacher-student learning is limited in two respects. First, only frame-level posterior information is propagated from the teachers to the student. This form of information may not effectively capture the sequential nature of speech data, or the interactions between the acoustic, alignment, and language models. Second, all models are required to use the same set of state clusters. This in turn requires all models to use the same set of sub-word units, Hidden Markov Model (HMM) alignment model topology, context-dependency, and language model. Furthermore, all models are required to use the NN-HMM topology. This restricts the situations in which teacher-student learning may be applied. In particular, it limits the forms of diversity allowed within an ensemble that is to be compressed using teacher-student learning.

This talk presents several proposals to generalise the teacher-student learning framework to overcome these limitations. Different sets of state clusters can be allowed between the teacher and student models by minimising the KL-divergence between per-frame logical context-dependent state posteriors. The sequential nature of speech data can be taken into account by using sequence-level criteria. These sequence-level criteria can potentially also remove all restrictions on the required topological similarities between the teacher and student models.
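As an illustration only, the standard frame-level criterion described in the abstract could be sketched as follows. This is a minimal sketch rather than the speaker's implementation: the function names, the toy posteriors, and the equal-weight averaging of the teachers are all assumptions made for the example. It does rely on the restriction noted in the abstract, namely that teachers and student share the same set of state clusters, so their posteriors can be compared index by index.

```python
import math


def softmax(logits):
    """Convert a list of logits into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def average_teachers(teacher_posteriors):
    """Equal-weight average of several teachers' per-frame posteriors.

    teacher_posteriors[m][t][s] is teacher m's posterior for state
    cluster s at frame t. Equal weighting is an assumption of this sketch.
    """
    num_teachers = len(teacher_posteriors)
    num_frames = len(teacher_posteriors[0])
    num_states = len(teacher_posteriors[0][0])
    return [
        [
            sum(tp[t][s] for tp in teacher_posteriors) / num_teachers
            for s in range(num_states)
        ]
        for t in range(num_frames)
    ]


def frame_kl_loss(teacher, student, eps=1e-12):
    """Sum over frames of KL(teacher || student) over state clusters."""
    loss = 0.0
    for p_frame, q_frame in zip(teacher, student):
        for p, q in zip(p_frame, q_frame):
            if p > 0.0:  # terms with p == 0 contribute nothing
                loss += p * math.log(p / max(q, eps))
    return loss


# Toy data (hypothetical): 2 teachers, 2 frames, 3 shared state clusters.
teacher_a = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]
teacher_b = [[0.5, 0.3, 0.2], [0.2, 0.6, 0.2]]
combined = average_teachers([teacher_a, teacher_b])

# Student posteriors obtained from hypothetical NN output logits.
student = [softmax([1.0, 0.0, -1.0]), softmax([0.0, 1.0, 0.0])]

print(frame_kl_loss(combined, student))
```

Each frame contributes to the loss independently, which is why this criterion on its own cannot capture sequence-level structure. In practice the student's NN parameters would be trained by gradient descent on this quantity; that step is omitted here.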

This talk is part of the CUED Speech Group Seminars series.

