
Automatic Speech Recognition in a State-of-Flux


If you have a question about this talk, please contact Dr Kate Knill.

Abstract: Initiated by the successful application of deep neural network modeling to large vocabulary automatic speech recognition (ASR), the last decade has brought a considerable diversification of ASR architectures. Following the classical state-of-the-art hidden Markov model (HMM) based architecture, connectionist temporal classification (CTC), attention-based encoder-decoder models, the recurrent neural network transducer (RNN-T) and its monotonic variants, as well as segmental approaches including inverted HMM architectures, have been introduced. All of these architectures show competitive performance, and the question arises which of them will finally prevail and define the new state of the art in large vocabulary ASR. This presentation provides a comparative review of current architectures in the context of the Bayes decision rule. Relations and equivalences between architectures are derived, the utilization of data is considered, and in particular the role of language modeling within integrated end-to-end architectures is discussed.
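For orientation, the Bayes decision rule referred to in the abstract is the standard starting point for statistical ASR: the recogniser outputs the word sequence with maximal posterior probability given the acoustic observations. In common notation (the symbols below are the usual textbook ones, not taken from the talk itself):

    \hat{W} = \arg\max_W \, p(W \mid X) = \arg\max_W \, p(W) \, p(X \mid W)

Classical HMM systems realise this rule via the factorization on the right, with a separate acoustic model p(X|W) and language model p(W), whereas end-to-end approaches such as CTC, attention-based encoder-decoder models and the RNN-T model the posterior p(W|X) more directly, which is why the role of an external language model in integrated architectures becomes a question in its own right.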

Bio: Ralf Schlüter serves as Academic Director and Lecturer (Privatdozent) in the Department of Computer Science of the Faculty of Computer Science, Mathematics and Natural Sciences at RWTH Aachen University. He leads the Automatic Speech Recognition Group at the Lehrstuhl Informatik 6: Human Language Technology and Pattern Recognition. He studied physics at RWTH Aachen University and Edinburgh University and received his Diploma in Physics (1995), his doctorate in Computer Science (2000), and his Habilitation in Computer Science (2019), all from RWTH Aachen University. Dr. Schlüter works on all aspects of automatic speech recognition and has led the scientific work of the Lehrstuhl Informatik 6 in the area of automatic speech recognition in many large national and international research projects, e.g. EU-Bridge and TC-STAR (EU), Babel (US-IARPA), and Quaero (French OSEO).

This talk is provided through the ISCA International Virtual Seminar Programme.

This talk is part of the CUED Speech Group Seminars series.
