One Microphone Source Separation

Abstract

Source separation, or computational auditory scene analysis, attempts to extract individual acoustic objects from input which contains a mixture of sounds from different sources, altered by the acoustic environment. Unmixing algorithms such as ICA and its extensions recover sources by reweighting multiple observation sequences, and thus cannot operate when only a single observation signal is available. I present a technique called refiltering which recovers sources by a nonstationary reweighting ("masking") of frequency sub-bands from a single recording, and argue for the application of statistical algorithms to learning this masking function. I present results of a simple factorial HMM system which learns on recordings of single speakers and can then separate mixtures using only one observation signal by computing the masking function and then refiltering.

Imagine listening to many pianos being played simultaneously. If each pianist were striking keys randomly, it would be very difficult to tell which note came from which piano. But if each were playing a coherent song, separation would be much easier because of the structure of music. Now imagine teaching a computer to do the separation by showing it many musical scores as "training data".

Typical auditory perceptual input contains a mixture of sounds from different sources, altered by the acoustic environment. Any biological or artificial hearing system must extract individual acoustic objects or streams in order to do successful localization, denoising and recognition. Bregman [1] called this process auditory scene analysis in analogy to vision. Source separation, or computational auditory scene analysis (CASA), is the practical realization of this problem via computer analysis of microphone recordings and is very similar to the musical task described above. It has been investigated by research groups with different emphases.
The CASA community have focused on both multiple and single microphone source separation problems under highly realistic acoustic conditions, but have used almost exclusively hand-designed systems which include substantial knowledge of the human auditory system and its psychophysical characteristics (e.g. [2,3]). Unfortunately, it is difficult to incorporate large amounts of detailed statistical knowledge about the problem into such an approach. On the other hand, machine learning researchers, especially those working on independent components analysis (ICA) and related algorithms, have focused on the case of multiple microphones in simplified mixing environments and have used powerful "blind" statistical techniques. These "unmixing" algorithms (even those which attempt to recover more sources than signals) …
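
The refiltering idea from the abstract (a time-varying reweighting of frequency sub-bands of one recording) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the sub-bands come from a short-time Fourier transform, the function names `stft`, `istft` and `refilter` are hypothetical, the test signals are synthetic, and the mask is an oracle binary mask computed from the known sources. In the paper the masking function is instead inferred by a learned factorial HMM, which this sketch does not attempt.

```python
import numpy as np

def stft(x, n_fft=256, hop=64):
    """Hann-windowed short-time Fourier transform, shape (frames, bins)."""
    win = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * win for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)

def istft(spec, n_fft=256, hop=64):
    """Weighted overlap-add inverse of stft()."""
    win = np.hanning(n_fft)
    n_frames = spec.shape[0]
    out = np.zeros((n_frames - 1) * hop + n_fft)
    norm = np.zeros_like(out)
    frames = np.fft.irfft(spec, n=n_fft, axis=1)
    for i in range(n_frames):
        out[i * hop:i * hop + n_fft] += frames[i] * win
        norm[i * hop:i * hop + n_fft] += win ** 2
    # Guard against division by zero where the window vanishes (signal edges).
    return out / np.maximum(norm, 1e-8)

def refilter(mixture, mask, n_fft=256, hop=64):
    """Reweight each time-frequency cell of a single recording by mask."""
    return istft(stft(mixture, n_fft, hop) * mask, n_fft, hop)

# Synthetic example (assumed signals): a steady tone plus a rising chirp.
fs = 8000
t = np.arange(fs) / fs
s1 = np.sin(2 * np.pi * 500 * t)                     # 500 Hz tone
s2 = np.sin(2 * np.pi * (1000 * t + 1000 * t ** 2))  # chirp, 1000 to 3000 Hz
mix = s1 + s2                                        # the single observation

# Oracle binary mask (assumption): keep cells where source 1 dominates.
mask = (np.abs(stft(s1)) > np.abs(stft(s2))).astype(float)
est = refilter(mix, mask)

# Compare away from the signal edges, where the window normalization is fragile.
interior = slice(256, -256)
err_est = np.mean((est[interior] - s1[interior]) ** 2)
err_mix = np.mean((mix[interior] - s1[interior]) ** 2)
print("error ratio (refiltered / unprocessed):", err_est / err_mix)
```

Because the chirp moves through frequency, the mask differs from frame to frame, which is what makes the reweighting nonstationary. The oracle mask only shows what a good masking function can achieve; estimating it from the mixture alone is the learning problem the paper addresses.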
