
Itakura-Saito nonnegative factorizations of the power spectrogram for music signal decomposition


If you have a question about this talk, please contact Rachel Fogg.

Nonnegative matrix factorization (NMF) is a popular linear regression technique in machine learning and signal/image processing. Much of the research on this topic has been driven by applications in audio. For example, NMF has been applied successfully to automatic music transcription and audio source separation, where the data is usually taken to be the magnitude spectrogram of the sound signal, and the Euclidean distance or the Kullback-Leibler divergence is used as the measure of fit between the original spectrogram and its approximate factorization.
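For concreteness, the factorization problem described above can be written as follows. The notation is an assumption made for illustration and does not appear in the abstract: $V$ is the nonnegative $F \times N$ spectrogram, $W$ ($F \times K$) is the dictionary, $H$ ($K \times N$) is the activation matrix, and $d$ is the scalar measure of fit.

```latex
\min_{W \ge 0,\; H \ge 0} \; D(V \mid WH)
  = \sum_{f=1}^{F} \sum_{n=1}^{N} d\bigl(v_{fn} \mid [WH]_{fn}\bigr),
\qquad
\begin{aligned}
  d_{\mathrm{EUC}}(x \mid y) &= \tfrac{1}{2}\,(x - y)^2, \\
  d_{\mathrm{KL}}(x \mid y)  &= x \log\frac{x}{y} - x + y .
\end{aligned}
```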

After a brief overview of NMF, this presentation will show evidence for the relevance of factorizing the power spectrogram using the Itakura-Saito (IS) divergence. Indeed, IS-NMF is shown to be connected to maximum likelihood inference of variance parameters in a well-defined statistical model of superimposed Gaussian components, and this model is in turn shown to be well suited to audio. Furthermore, the statistical setting opens the door to Bayesian approaches and to a variety of computational inference techniques. We discuss in particular model order selection strategies and Markov regularization of the activation matrix, which accounts for time persistence in audio.
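As a rough illustration of IS-NMF, the sketch below factorizes a toy power spectrogram with plain multiplicative updates for the IS divergence, d_IS(x|y) = x/y - log(x/y) - 1. It is a minimal sketch under stated assumptions, not the speaker's implementation: the function names `is_divergence` and `is_nmf`, the `eps` safeguard, the random initialization, and the toy data are all hypothetical choices made here. The toy data follows the Gaussian view mentioned above, in which a power-spectrogram entry with variance [WH]_fn is exponentially distributed with that mean.

```python
import numpy as np


def is_divergence(V, V_hat):
    """Itakura-Saito divergence between V and V_hat, summed over all entries."""
    R = V / V_hat
    return float(np.sum(R - np.log(R) - 1.0))


def is_nmf(V, K, n_iter=200, eps=1e-12, seed=0):
    """Approximate a strictly positive matrix V (F x N) as W @ H.

    Uses multiplicative updates for the IS divergence. W is F x K and
    H is K x N; both stay nonnegative because every update multiplies
    by a ratio of nonnegative terms. eps only guards against division
    by zero and is an implementation assumption.
    """
    rng = np.random.default_rng(seed)
    F, N = V.shape
    W = rng.random((F, K)) + eps
    H = rng.random((K, N)) + eps
    costs = []
    for _ in range(n_iter):
        V_hat = W @ H + eps
        H *= (W.T @ (V / V_hat**2)) / (W.T @ (1.0 / V_hat) + eps)
        V_hat = W @ H + eps
        W *= ((V / V_hat**2) @ H.T) / ((1.0 / V_hat) @ H.T + eps)
        costs.append(is_divergence(V, W @ H + eps))
    return W, H, costs


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    # Hypothetical ground truth: 3 spectral templates, 50 time frames.
    W_true = rng.random((20, 3))
    H_true = rng.random((3, 50))
    # Exponential noise with mean W_true @ H_true mimics the power of
    # zero-mean complex Gaussian components with those variances.
    V = (W_true @ H_true) * rng.exponential(size=(20, 50)) + 1e-9
    W, H, costs = is_nmf(V, K=3)
    print(f"IS divergence: first iteration {costs[0]:.2f}, last {costs[-1]:.2f}")
```

One property worth noting is that the IS divergence is scale invariant, d_IS(cx|cy) = d_IS(x|y), so low-energy and high-energy spectrogram entries carry comparable weight in the fit. This is one commonly cited reason it suits the large dynamic range of audio.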

This presentation will also address extensions of NMF to the multichannel case, for both instantaneous and convolutive recordings, possibly underdetermined, leading to nonnegative tensor factorizations under novel structures. In particular, we will present audio source separation results on real-world stereo musical excerpts.

References:

C. Févotte, N. Bertin and J.-L. Durrieu. “Nonnegative matrix factorization with the Itakura-Saito divergence. With application to music analysis,” Neural Computation, vol. 21, no. 3, Mar. 2009.

A. Ozerov and C. Févotte. “Multichannel nonnegative matrix factorization in convolutive mixtures for audio source separation,” IEEE Trans. Audio, Speech and Language Processing, 2010 (to appear).

This talk is part of the Signal Processing and Communications Lab Seminars series.
