Multi-Head State Space Model for Sequence Modeling

Yassir Fathullah, Speech Group, Cambridge University Engineering Department
Tuesday 11 October 2022, 15:00-16:00
Hybrid: LT6, First floor Baker building, Engineering Dept or Zoom: https://eng-cam.zoom.us/j/81927138251?pwd=TVd3MXliV003dUdYVlFwU2NDWGpmdz09.

If you have a question about this talk, please contact Dr Kate Knill.

Recently, state space models (SSMs) have shown promising results on sequence modeling tasks. However, a potential limitation of existing work is that SSMs are usually introduced or initialized in a homogeneous way, which encourages the model to capture only similar temporal dynamics across different features. In this talk, we propose a multi-head state space model (MSSM), in which parallel heads are introduced to learn different temporal dynamics on sequence data. Furthermore, we propose a novel variant of the Transformer, referred to as the Stateformer, which combines MSSMs with attention. Experiments on large-scale automatic speech recognition (ASR) and language modeling tasks show the MSSM outperforming a range of attention-based baselines. The Stateformer further improves performance, achieving state-of-the-art performance on the LibriSpeech ASR task.
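
As a rough illustration of the idea of parallel heads with different temporal dynamics, the toy sketch below runs several scalar linear state space recurrences over the same input, each with its own decay. It is not the MSSM presented in the talk: the function names `ssm_head` and `multi_head_ssm`, the scalar state, and the choice of input weight `1 - a` are all assumptions made for illustration only.

```python
from typing import List, Sequence


def ssm_head(u: Sequence[float], a: float, b: float, c: float) -> List[float]:
    """Run one scalar linear state space recurrence over the sequence u.

    x_t = a * x_{t-1} + b * u_t,   y_t = c * x_t,   starting from x = 0.
    (Hypothetical toy head, not the model from the talk.)
    """
    x = 0.0
    ys = []
    for u_t in u:
        x = a * x + b * u_t  # state update
        ys.append(c * x)     # readout
    return ys


def multi_head_ssm(u: Sequence[float], decays: Sequence[float]) -> List[List[float]]:
    """Apply one head per decay value and return outputs indexed [time][head]."""
    # Assumption: input weight b = 1 - a, output weight c = 1.
    per_head = [ssm_head(u, a=a, b=1.0 - a, c=1.0) for a in decays]
    # Transpose from [head][time] to [time][head].
    return [list(step) for step in zip(*per_head)]


# Response of a fast-forgetting head (a = 0.5) and a slow-forgetting head (a = 0.9)
# to a unit impulse at the first time step.
impulse = [1.0, 0.0, 0.0, 0.0]
outputs = multi_head_ssm(impulse, decays=[0.5, 0.9])
for t, step in enumerate(outputs):
    print(t, step)
```

Because each head has a different decay, the same input produces responses that fade at different rates, which is the sense in which separate heads can track different time scales in this toy setting.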

This research was performed during a research internship at Meta (AI Speech), California.

This talk is part of the CUED Speech Group Seminars series.
