Automatic Music Transcription and Audio Source Separation


In this article, we give an overview of a range of approaches to the analysis and separation of musical audio. In particular, we consider the problems of automatic music transcription and audio source separation, which are of particular interest to our group. Monophonic music transcription, where only a single note is present at any one time, can be tackled using an autocorrelation-based method. For polyphonic music transcription, with several notes at any time, other approaches can be used, such as a blackboard model or a multiple-cause/sparse coding method. The latter is based on ideas and methods related to independent component analysis (ICA), a method for sound source separation.

Over the last decade or so, and particularly since the publication of Bregman's seminal book on Auditory Scene Analysis (Bregman 1990), there has been an increasing interest in the problem of Computational Auditory Scene Analysis (CASA): how to design computer-based models that can analyze an auditory scene.

Imagine you are standing in a busy street among a crowd of people. You can hear traffic noise, footsteps of people nearby, the bleeping of a pedestrian crossing, your mobile phone ringing, and colleagues behind you having a conversation. Despite all these different sound sources, you have a pretty good idea of what is going on around you. It is more than just a mess of overlapping noise, and if you try hard you can concentrate on one of these sources if it is important to you (such as the conversation behind you).

Building computer models that can do the same has proved to be a very difficult problem. It requires both separation of many sound sources and analysis of the content of these sources. However, a few authors have begun to tackle this problem in recent years, with some success (see e.g. Ellis 1996).

One particular aspect of auditory scene analysis of interest to our group is automatic music transcription. Here, the sound sources are one or more instruments playing a piece of music, and we wish to analyze this to identify the instruments that are playing, and when and for how long each note is played. From this analysis we should then be able to produce a written musical score that shows each note and its duration in conventional written music notation (for conventional Western music, at least). In principle, this musical score could then be used to recreate the musical piece that was played. We …
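
As a rough illustration of the autocorrelation-based approach to monophonic transcription mentioned in the abstract, the following Python sketch estimates the pitch of a single audio frame by finding the lag at which the frame best matches a shifted copy of itself. It is a minimal sketch under stated assumptions, not the method used in the article: the function names, the search range (`fmin`, `fmax`), and the mapping to MIDI note numbers are all hypothetical choices made for this example.

```python
import math


def estimate_pitch_autocorr(samples, sample_rate, fmin=50.0, fmax=1000.0):
    """Estimate the fundamental frequency (Hz) of one monophonic frame.

    Returns None if no positive autocorrelation peak is found
    (for example, for a silent frame). fmin and fmax are assumed
    search limits, not values taken from the article.
    """
    n = len(samples)
    mean = sum(samples) / n
    x = [s - mean for s in samples]  # remove any DC offset

    # Only search lags that correspond to plausible pitches.
    min_lag = max(1, int(sample_rate / fmax))
    max_lag = min(n - 1, int(sample_rate / fmin))

    best_lag, best_value = None, 0.0
    for lag in range(min_lag, max_lag + 1):
        # Unnormalized autocorrelation at this lag.
        r = sum(x[i] * x[i + lag] for i in range(n - lag))
        if r > best_value:
            best_lag, best_value = lag, r

    if best_lag is None:
        return None
    return sample_rate / best_lag


def frequency_to_midi(freq_hz):
    """Map a frequency to the nearest MIDI note number (A4 = 440 Hz = 69)."""
    return round(69 + 12 * math.log2(freq_hz / 440.0))


# Usage: a synthetic 440 Hz sine wave sampled at 8 kHz.
sample_rate = 8000
frame = [math.sin(2 * math.pi * 440.0 * i / sample_rate) for i in range(1024)]
f0 = estimate_pitch_autocorr(frame, sample_rate)
print(f0, frequency_to_midi(f0))
```

Because the lag is restricted to whole samples, the estimate is quantized; a practical transcriber would typically interpolate around the peak and track the pitch across successive frames to find note onsets and durations.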

DOI: 10.1080/01969720290040777