Sparse Representations in Audio and Music: From Coding to Source Separation

@article{Plumbley2010SparseRI,
  title={Sparse Representations in Audio and Music: From Coding to Source Separation},
  author={Mark D. Plumbley and Thomas Blumensath and Laurent Daudet and R{\'e}mi Gribonval and Mike E. Davies},
  journal={Proceedings of the IEEE},
  year={2010},
  volume={98},
  pages={995--1005}
}
Sparse representations have proved a powerful tool in the analysis and processing of audio signals and already lie at the heart of popular coding standards such as MP3 and Dolby AAC. In this paper we give an overview of a number of current and emerging applications of sparse representations in areas from audio coding, audio enhancement and music transcription to blind source separation solutions that can solve the "cocktail party problem." In each case we will show how the prior assumption that…

Informed Audio Source Separation from Compressed Linear Stereo Mixtures
TLDR
This paper uses an MPEG-AAC codec and shows that the ISS process is quite robust to compression, opening the way for "real-world" karaoke/soloing/remixing applications for downloadable music.
Methods of Single-Channel Music Source Separation
Music source separation refers to the process of recovering original music sources from a mixture of two or more musical sound sources. Although music source separation is important even when the
Sparse denoising of audio by greedy time-frequency shrinkage
  • Gautam Bhattacharya, P. Depalle
  • Computer Science, Engineering
    2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2014
TLDR
This work presents an analysis of MP in the context of audio denoising, by interpreting the algorithm as a simple shrinkage approach, and proposes several approaches to improve its performance and robustness.
Audio Denoising by Generalized Time-Frequency Thresholding
TLDR
In audio processing, different collections of windowed Fourier or cosine bases have proven to serve as well adapted dictionaries for most audio signals of relevance for humans, in particular speech and music.
Sparse and structured decomposition of audio signals on hybrid dictionaries using musical priors.
TLDR
Evaluation on monophonic and complex polyphonic excerpts of real music signals shows that the proposed approach provides results whose quality measured by the signal-to-noise ratio is competitive with state-of-the-art approaches, and more coherent with the semantic content of the signal.
A Multichannel Audio Denoising Formulation Based on Spectral Sparsity
  • I. Bayram
  • Computer Science
    IEEE/ACM Transactions on Audio, Speech, and Language Processing
  • 2015
TLDR
This work considers the estimation of an audio source from multiple noisy observations, where the correlation between noise in the different observations is low, and proposes a two-stage method that assumes that the signal of interest has a sparse time-frequency representation.
"Sparsification" of Audio Signals Using the MDCT/IntMDCT and a Psychoacoustic Model - Application to Informed Audio Source Separation
TLDR
This paper revisits the irrelevance filtering analysis-synthesis approach, applies the sparsification process to the informed source separation (ISS) problem, and shows that it significantly decreases the computational cost at the ISS decoder.
Lossy audio signal compression via structured sparse decomposition and compressed sensing
TLDR
A least absolute shrinkage and selection operator (LASSO) is employed to obtain a sparse and structured decomposition of the audio signals into tonal and transient layers; both resulting layers are then compressed by a CS method.
Investigating the Potential of Pseudo Quadrature Mirror Filter-Banks in Music Source Separation Tasks
TLDR
This work investigates the potential of an optimized pseudo quadrature mirror filter-bank (PQMF) as a T-F representation for music source separation tasks and suggests that the PQMF maintains the aforementioned desirable properties and can be regarded as an alternative for representing mixtures of musical signals.
An overview of informed audio source separation
TLDR
In recent years, much research has focused on informed separation, which consists in using additional available information about the sources to improve the separation quality.

References

SHOWING 1-10 OF 66 REFERENCES
Hybrid representations for audiophonic signal encoding
Audio Signal Representations for Indexing in the Transform Domain
TLDR
This new audio codec allows efficient transform-domain audio indexing for three different applications, namely beat tracking, chord recognition, and musical genre classification, and is compared with the two standard MP3 and AAC codecs in terms of performance and computation time.
Sparse Overcomplete Decomposition for Single Channel Speaker Separation
TLDR
This work proposes an algorithm for separating multiple speakers from a mixed single-channel recording, based on a model proposed by Raj and Smaragdis (2005) and a probabilistic framework to achieve sparsity.
Sparse representations of polyphonic music
Low Bit-Rate Object Coding of Musical Audio Using Bayesian Harmonic Models
TLDR
This work proposes a family of probabilistic signal models combining learned object priors and various perceptually motivated distortion measures for very low bit-rate coding purposes and designs efficient algorithms to infer object parameters and builds a coder based on the interpolation of frequency and amplitude parameters.
Audio source separation with a single sensor
TLDR
This paper addresses the problem of audio source separation with one single sensor, using a statistical model of the sources, based on a learning step from samples of each source separately, during which Gaussian scaled mixture models (GSMM) are trained.
Union of MDCT Bases for Audio Coding
TLDR
This paper investigates the use of sparse overcomplete decompositions for audio coding by using a bitplane encoding approach, which provides a fine-grain scalable coder that can seamlessly operate from very low bitrates up to transparency.
Underdetermined blind source separation using sparse representations
OBJECT CODING OF HARMONIC SOUNDS USING SPARSE AND STRUCTURED REPRESENTATIONS
TLDR
A novel object-based coding is presented, which allows the computation of objects in a reasonable computational time, and appears to perform better than transform and parametric coders on solo or duo harmonic instruments at 8 kbit/s and 2 kbit/s.
Blind separation of dependent sources using the "time-frequency ratio of mixtures" approach
  • F. Abrard, Y. Deville
  • Computer Science
    Seventh International Symposium on Signal Processing and Its Applications, 2003. Proceedings.
  • 2003
TLDR
The principles of the TIFROM approach are recalled and it is shown that, unlike independent component analysis methods, this approach can separate dependent signals, provided there exist some areas in the time-frequency plane where only one source occurs.