Intelligent Single-Channel Methods for Multi-Source Audio Analysis


This thesis investigates the potential of recent machine learning methods for the challenging task of single-channel, multi-source audio analysis, i.e., information extraction from single-channel audio in which the sources of interest (e.g., speech) are mixed with multiple interfering sources. First, it is shown that source separation by recently proposed techniques for non-negative matrix factorization can significantly improve recognition performance compared to the state-of-the-art approach of training the recognition system on multi-source data. Second, it is shown that by formulating the source separation problem itself as a recognition task, state-of-the-art methods for supervised training of recognition systems, such as deep neural network models, can be used to achieve previously unseen performance in single-channel source separation. In this context, supervised training of non-negative models is introduced as well. The task of multi-source recognition as defined above is exemplified by challenging real-world speech separation and recognition problems in which speech is mixed with non-stationary background noise such as music, and world-leading results in international evaluation campaigns are demonstrated for this task. Furthermore, state-of-the-art results are presented in selected music information retrieval applications involving polyphonic audio, such as characterizing the singer or transcribing the music into a score.
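As a minimal illustration of the non-negative matrix factorization approach the abstract refers to (a generic sketch, not the thesis's own implementation), the following example factorizes a toy magnitude spectrogram V into non-negative basis spectra W and activations H via standard multiplicative updates, then recovers one source estimate with a Wiener-style soft mask. The toy data and all variable names are illustrative assumptions.

```python
import numpy as np

def nmf(V, rank, n_iter=200, seed=0):
    """Factorize a non-negative matrix V (freq x time) as V ~ W @ H
    using multiplicative updates for the Euclidean cost."""
    rng = np.random.default_rng(seed)
    F, T = V.shape
    W = rng.random((F, rank)) + 1e-3   # basis spectra (freq x rank)
    H = rng.random((rank, T)) + 1e-3   # activations  (rank x time)
    for _ in range(n_iter):
        # multiplicative updates keep W and H non-negative by construction
        H *= (W.T @ V) / (W.T @ W @ H + 1e-9)
        W *= (V @ H.T) / (W @ H @ H.T + 1e-9)
    return W, H

# Toy mixture spectrogram: two "sources" with disjoint spectral patterns.
F, T = 8, 20
s1 = np.outer(np.array([1, 0, 1, 0, 1, 0, 1, 0]),
              np.abs(np.sin(np.arange(T))) + 0.1)
s2 = np.outer(np.array([0, 1, 0, 1, 0, 1, 0, 1]),
              np.abs(np.cos(np.arange(T))) + 0.1)
V = s1 + s2

W, H = nmf(V, rank=2)
V_hat = W @ H
err = np.linalg.norm(V - V_hat) / np.linalg.norm(V)
print(f"relative reconstruction error: {err:.3f}")

# Wiener-style soft mask: attribute each time-frequency bin to component 0
# in proportion to its share of the model's total energy.
mask = (W[:, :1] @ H[:1, :]) / (V_hat + 1e-9)
s_est = mask * V
```

In a real separation front end, V would come from the short-time Fourier transform of the noisy recording, W would typically be learned in advance from clean training data for each source, and the masked spectrogram would be inverted back to a time-domain signal before recognition.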

