Thèse De Doctorat

Abstract

Given an audio signal that is a mixture of several sources, such as a music piece with several instruments or a radio interview with several speakers, single-channel audio source separation aims at recovering each of the source signals when the mixture signal is recorded with only one microphone. Since there are fewer sensors (one microphone) than sources, there is a priori an infinite number of solutions to this problem that are not related to the original source signals. The key ingredient in single-channel audio source separation is to decide what kind of additional information must be provided to disambiguate the problem. In the last decade, nonnegative matrix factorization (NMF) has become a major building block in source separation. In layman's terms, NMF consists of learning to describe a collection of audio signals as linear combinations of typical atoms forming a dictionary. Source separation algorithms are then built on the idea that each atom can be assigned unambiguously to a source. However, since the dictionary is learnt on a mixture of several sources, there is no guarantee that each atom corresponds to one source rather than to a mixture of them. In other words, the dictionary atoms are not interpretable a priori; they must be made so by using additional information in the learning process. In this thesis we provide three main contributions to blind source separation methods based on NMF. Our first contribution is a group-sparsity inducing penalty specifically tailored for Itakura-Saito NMF: in many music tracks, there are whole intervals where at least one source is inactive. The group-sparsity penalty we propose makes it possible to identify these intervals blindly and to learn source-specific dictionaries. As a consequence, those learned dictionaries can be used to do source separation in other parts of the track where several sources are active.
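To make the description of NMF as a dictionary of atoms more concrete, here is a minimal sketch of plain Itakura-Saito NMF on a power spectrogram, followed by Wiener-like soft masks once atoms have been assigned to sources. It is not the group-sparsity algorithm proposed in the thesis: it uses the standard multiplicative updates with exponent 1/2, and the function names (`is_nmf`, `source_masks`) and the atom-to-source labels are assumptions introduced for illustration only.

```python
import numpy as np

def is_divergence(V, V_hat):
    """Itakura-Saito divergence, summed over all time-frequency bins."""
    R = V / V_hat
    return float(np.sum(R - np.log(R) - 1.0))

def is_nmf(V, n_atoms, n_iter=100, seed=0, eps=1e-12):
    """Approximate a strictly positive power spectrogram V (freq x frames) by W @ H."""
    rng = np.random.default_rng(seed)
    n_freq, n_frames = V.shape
    W = rng.random((n_freq, n_atoms)) + eps    # dictionary: one atom per column
    H = rng.random((n_atoms, n_frames)) + eps  # activations: one row per atom
    costs = []
    for _ in range(n_iter):
        # Multiplicative update of the activations (exponent 1/2 variant).
        V_hat = W @ H
        H *= np.sqrt((W.T @ (V / V_hat**2)) / (W.T @ (1.0 / V_hat)))
        # Multiplicative update of the dictionary, using the refreshed model.
        V_hat = W @ H
        W *= np.sqrt(((V / V_hat**2) @ H.T) / ((1.0 / V_hat) @ H.T))
        costs.append(is_divergence(V, W @ H))
    return W, H, costs

def source_masks(W, H, atom_to_source, n_sources, eps=1e-12):
    """Soft masks per source, ASSUMING each atom is assigned to exactly one source."""
    V_hat = W @ H + eps
    labels = np.asarray(atom_to_source)
    return [(W[:, labels == s] @ H[labels == s, :]) / V_hat
            for s in range(n_sources)]
```

Multiplying each mask elementwise with the mixture's complex spectrogram and inverting the transform would give one estimated signal per source. The hard part, which the sketch simply takes as given through `atom_to_source`, is precisely the assignment of atoms to sources discussed in the abstract.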
These two tasks of identification and separation are performed simultaneously in one run of group-sparsity Itakura-Saito NMF. Our second contribution is an online algorithm for Itakura-Saito NMF that allows learning dictionaries on very large audio tracks. Indeed, the memory complexity of a batch implementation of NMF grows linearly with the length of the recordings and becomes prohibitive for signals longer than an hour. In contrast, our online algorithm is able to learn NMF on arbitrarily long signals with limited memory usage. Our third contribution deals with user-informed NMF. In short mixed signals, blind learning becomes very hard and sparsity does not retrieve interpretable dictionaries. Our contribution is very similar in spirit to inpainting. It relies on the empirical fact that, when observing the spectrogram of a mixture signal, an overwhelming proportion of it consists of regions where only one source is active. We describe an extension of NMF to take into account time-frequency localized

tel-00797093, version 1 - 5 Mar 2013
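The claim that batch NMF memory grows linearly with recording length can be illustrated with a back-of-the-envelope estimate. The sketch below counts only the spectrogram and the activation matrix, both of which have one column per frame. The sampling rate, window size, hop size, number of atoms and 8-byte floats are all assumed values, not settings taken from the thesis.

```python
def batch_nmf_memory_bytes(duration_s, sample_rate=44100, n_fft=1024,
                           hop=512, n_atoms=50, bytes_per_value=8):
    """Rough size of V (freq x frames) plus H (atoms x frames) held in memory."""
    n_frames = 1 + int(duration_s * sample_rate) // hop  # grows with duration
    n_freq = n_fft // 2 + 1                              # fixed by the window size
    return (n_freq + n_atoms) * n_frames * bytes_per_value
```

Under these assumed settings, one hour of audio needs about 1.4 GB for these two matrices alone, and doubling the duration doubles the figure. The dictionary, of size `n_freq` by `n_atoms`, does not depend on the signal length at all, which is why an algorithm that only keeps the dictionary and a small buffer of frames can run with bounded memory.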

Cite this paper

@inproceedings{Lefvre2013ThseDD, title={Th{\`e}se De Doctorat}, author={Augustin Lef{\`e}vre}, year={2013} }