Multi-Head State Space Model for Sequence Modeling
- Speaker: Yassir Fathullah, Speech Group, Cambridge University Engineering Department
- Date & Time: Tuesday 11 October 2022, 15:00-16:00
- Venue: Hybrid: LT6, First floor Baker building, Engineering Dept or Zoom: https://eng-cam.zoom.us/j/81927138251?pwd=TVd3MXliV003dUdYVlFwU2NDWGpmdz09
Abstract
Recently, state space models (SSMs) have shown promising results on sequence modeling tasks. However, a limitation of existing work is that SSMs are usually introduced or initialized in a homogeneous way, encouraging the model to capture only similar temporal dynamics across different features. In this talk, we propose a multi-head state space model (MSSM), in which parallel heads are introduced to learn different temporal dynamics on sequence data. Furthermore, we propose a novel variant of the Transformer, referred to as the Stateformer, which combines MSSMs with attention. Experiments on large-scale automatic speech recognition (ASR) and language modeling tasks show the MSSM outperforming a range of attention-based baselines. The Stateformer further improves performance, achieving state-of-the-art results on the LibriSpeech ASR task.
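The abstract's central idea, running several independently parameterized SSM heads in parallel so each can learn its own temporal dynamics, can be illustrated with a minimal sketch. This is not the talk's actual model: the head structure, state sizes, and decay-style state matrices below are illustrative assumptions, using a plain discrete linear SSM recurrence.

```python
import numpy as np

def ssm_head(u, A, B, C):
    """One discrete linear SSM head: x_t = A x_{t-1} + B u_t, y_t = C x_t."""
    T, _ = u.shape
    x = np.zeros(A.shape[0])
    ys = []
    for t in range(T):
        x = A @ x + B @ u[t]
        ys.append(C @ x)
    return np.stack(ys)  # shape (T, d_out)

def multi_head_ssm(u, heads):
    """Run each head on the same input and concatenate outputs featurewise,
    so different heads can model different temporal dynamics."""
    return np.concatenate([ssm_head(u, A, B, C) for A, B, C in heads], axis=-1)

rng = np.random.default_rng(0)
T, d_in, n_state, d_out = 16, 4, 8, 2
u = rng.standard_normal((T, d_in))

# Two heads with different state matrices (fast vs slow decay) -- a stand-in
# for the heterogeneous dynamics the MSSM is designed to capture.
heads = []
for decay in (0.9, 0.5):
    A = decay * np.eye(n_state)
    B = 0.1 * rng.standard_normal((n_state, d_in))
    C = 0.1 * rng.standard_normal((d_out, n_state))
    heads.append((A, B, C))

y = multi_head_ssm(u, heads)
print(y.shape)  # (16, 4): two heads, each contributing d_out = 2 features
```

The concatenated head outputs would then typically pass through a shared projection, analogous to the output projection in multi-head attention.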
Research performed during a research internship at Meta (AI Speech), California.
Series: This talk is part of the CUED Speech Group Seminars series.
Included in Lists
- Cambridge Forum of Science and Humanities
- Cambridge Language Sciences
- Cambridge talks
- Chris Davis' list
- CUED Speech Group Seminars
- Guy Emerson's list
- Information Engineering Division seminar list
- PhD related
Note: Ex-directory lists are not shown.