In this paper, we present a new technique for monaural source separation in musical mixtures, which uses the knowledge of the musical score. This information is used to initialize an algorithm which computes a parametric decomposition of the spectrogram based on non-negative matrix factorization (NMF). This algorithm provides time-frequency masks which are…
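The idea of score-informed initialization can be illustrated with a minimal sketch. It assumes a plain NMF with Euclidean multiplicative updates, not the parametric decomposition of the paper, and every name in it (`score_informed_nmf`, `soft_masks`, the toy `score` matrix) is hypothetical. The activations are set to zero wherever the score says a note is silent; multiplicative updates keep those zeros, so the score constrains the factorization throughout. Soft time-frequency masks are then formed as each note's share of the model.

```python
import random

def matmul(A, B):
    """Product of two matrices stored as lists of rows."""
    Bt = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def update(X, num, den, eps):
    """Element-wise multiplicative update X * num / den (zeros stay zero)."""
    return [[x * n / (d + eps) for x, n, d in zip(xr, nr, dr)]
            for xr, nr, dr in zip(X, num, den)]

def score_informed_nmf(V, W, H, n_iter=200, eps=1e-12):
    """Plain Euclidean NMF, V ~ W H, started from a score-derived H."""
    for _ in range(n_iter):
        Wt = transpose(W)
        H = update(H, matmul(Wt, V), matmul(Wt, matmul(W, H)), eps)
        Ht = transpose(H)
        W = update(W, matmul(V, Ht), matmul(matmul(W, H), Ht), eps)
    return W, H

def soft_masks(W, H, eps=1e-12):
    """One mask per note: share of that note in the modelled spectrogram."""
    WH = matmul(W, H)
    return [[[W[f][k] * H[k][t] / (WH[f][t] + eps) for t in range(len(H[0]))]
             for f in range(len(W))] for k in range(len(H))]

# Toy data (assumed): 4 frequency bins, 2 notes, 5 frames.
random.seed(0)
W_true = [[1.0, 0.1], [0.5, 0.2], [0.1, 1.0], [0.2, 0.6]]
score = [[1, 1, 0, 0, 1],   # note 0 is active in frames 0, 1, 4
         [0, 1, 1, 1, 0]]   # note 1 is active in frames 1, 2, 3
V = matmul(W_true, [[float(s) for s in row] for row in score])

W0 = [[random.random() + 0.1 for _ in range(2)] for _ in range(4)]
H0 = [[float(s) for s in row] for row in score]  # zeros where the score is silent
W, H = score_informed_nmf(V, W0, H0)
masks = soft_masks(W, H)
```

Multiplying each mask element-wise with the mixture spectrogram would give one estimated spectrogram per note.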
Real-world sounds often exhibit non-stationary spectral characteristics, such as those produced by a harpsichord or a guitar. The classical Non-negative Matrix Factorization (NMF) needs a large number of atoms to accurately decompose the spectrogram of such sounds. An extension of NMF is proposed hereafter which includes time-frequency activations based on ARMA…
In a number of vibration applications, the systems under study are slightly non-linear. It is thus of great importance to have a way to model and to measure these non-linearities in the frequency range of use. A cascade of Hammerstein models conveniently describes a large class of non-linearities. A simple method based on a phase property of…
In this paper, we propose a new method for singing voice detection based on a Bidirectional Long Short-Term Memory (BLSTM) Recurrent Neural Network (RNN). This classifier is able to take both past and future temporal context into account to decide on the presence or absence of singing voice, thus using the inherent sequential aspect of a short-term feature…
Real-world sounds often exhibit time-varying spectral shapes, as observed in the spectrogram of a harpsichord tone or that of a transition between two pronounced vowels. Whereas the standard non-negative matrix factorization (NMF) assumes fixed spectral atoms, an extension is proposed where the temporal activations (coefficients of the decomposition on the…
In this paper, we present a new method for decomposing musical spectrograms. This method is similar to shift-invariant Probabilistic Latent Component Analysis, but, whereas the latter works with constant-Q spectrograms (i.e. with a logarithmic frequency resolution), our technique is designed to decompose standard short-time Fourier transform spectrograms (i.e.…
Audio rendering systems are always slightly nonlinear. Their non-linearities must be modeled and measured for quality evaluation and control purposes. A cascade of Hammerstein models describes a large class of non-linearities. To identify the elements of such a model, a method based on a phase property of exponential sine sweeps is proposed. A complete model…
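The phase property of exponential sine sweeps can be checked numerically. The sketch below assumes the common sweep phase φ(t) = 2π f1 L (exp(t/L) − 1) with L = T / ln(f2/f1); the exact parametrization used in the paper may differ. Under this assumption, k φ(t) = φ(t + L ln k) − 2π f1 L (k − 1), so when f1 L is an integer the k-th harmonic of the sweep is the sweep itself advanced by L ln k. The function names and numerical values are hypothetical.

```python
import math

def sweep_phase(t, f1, L):
    """Phase of an exponential sine sweep starting at frequency f1 (Hz)."""
    return 2.0 * math.pi * f1 * L * (math.exp(t / L) - 1.0)

def harmonic_delay(k, L):
    """Time advance at which the sweep coincides with its k-th harmonic."""
    return L * math.log(k)

# Assumed sweep parameters: 20 Hz to 20 kHz in roughly 5 seconds.
f1, f2, T_target = 20.0, 20000.0, 5.0

# Round the rate parameter so that f1 * L is an integer, which makes the
# constant phase offset 2*pi*f1*L*(k-1) a multiple of 2*pi for every k.
L = round(f1 * T_target / math.log(f2 / f1)) / f1
T = L * math.log(f2 / f1)  # actual sweep duration after rounding

def harmonic_mismatch(k, t):
    """Difference between the k-th harmonic and the time-advanced sweep."""
    lhs = math.sin(k * sweep_phase(t, f1, L))
    rhs = math.sin(sweep_phase(t + harmonic_delay(k, L), f1, L))
    return abs(lhs - rhs)

worst = max(harmonic_mismatch(k, t)
            for k in (2, 3, 5) for t in (0.0, 0.1, 1.0, 2.5))
print(f"sweep duration: {T:.3f} s, worst mismatch: {worst:.2e}")
```

Because each harmonic is a time-shifted copy of the sweep, deconvolving the measured response by the sweep separates the contributions of the different non-linearity orders in time.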
We propose in this paper a simple fusion framework for underdetermined audio source separation. This framework can be applied to a wide variety of source separation algorithms provided that they estimate time-frequency masks. Fusion principles have been successfully implemented for classification tasks. Although it is similar to classification, audio…
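One simple way to combine mask-based separators is a weighted average of their time-frequency masks. The sketch below is an assumption made for illustration, not necessarily the fusion rule of the paper; `fuse_masks` and the toy masks are hypothetical.

```python
def fuse_masks(masks, weights=None):
    """Weighted average of time-frequency masks (lists of rows, same shape).

    masks   -- one mask per separation algorithm, values in [0, 1]
    weights -- optional non-negative weights; uniform when omitted
    """
    n = len(masks)
    if weights is None:
        weights = [1.0 / n] * n
    total = sum(weights)
    n_bins, n_frames = len(masks[0]), len(masks[0][0])
    return [[sum(w * m[f][t] for w, m in zip(weights, masks)) / total
             for t in range(n_frames)]
            for f in range(n_bins)]

# Two toy masks for the same target source, from two hypothetical algorithms.
mask_a = [[1.0, 0.0], [0.5, 0.5]]
mask_b = [[0.0, 0.0], [0.5, 1.0]]
fused = fuse_masks([mask_a, mask_b])
```

Since the fused values remain in [0, 1], the result can be applied to the mixture spectrogram exactly like any single mask.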
In this paper, we present a complete proof that the β-divergence is a particular case of Bregman divergence. This little-known result makes it possible to straightforwardly apply theorems about Bregman divergences to β-divergences. This is of interest for numerous applications since these divergences are widely used, for instance in…
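The scalar case of this result can be verified directly. The derivation below is a sketch for x, y > 0 using one possible generating function; the paper's own choice may differ by affine terms, which do not change the resulting Bregman divergence.

```latex
% beta-divergence for \beta \in \mathbb{R} \setminus \{0, 1\}:
d_\beta(x \mid y) = \frac{1}{\beta(\beta-1)}
  \left( x^\beta + (\beta-1)\, y^\beta - \beta\, x\, y^{\beta-1} \right)

% Generating function, strictly convex on (0, +\infty):
\varphi_\beta(x) = \frac{x^\beta}{\beta(\beta-1)}, \qquad
\varphi_\beta'(x) = \frac{x^{\beta-1}}{\beta-1}, \qquad
\varphi_\beta''(x) = x^{\beta-2} > 0

% Associated Bregman divergence:
D_{\varphi_\beta}(x \mid y)
  = \varphi_\beta(x) - \varphi_\beta(y) - \varphi_\beta'(y)\,(x - y)
  = \frac{x^\beta - y^\beta - \beta\, y^{\beta-1} (x - y)}{\beta(\beta-1)}
  = d_\beta(x \mid y)

% Limit cases:
% \beta = 1, \varphi(x) = x \log x:  D_\varphi(x \mid y) = x \log\frac{x}{y} - x + y   (Kullback-Leibler)
% \beta = 0, \varphi(x) = -\log x:   D_\varphi(x \mid y) = \frac{x}{y} - \log\frac{x}{y} - 1   (Itakura-Saito)
```

For β = 2 the generating function is x²/2 and the divergence reduces to half the squared Euclidean distance, (x − y)²/2.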
In this paper, we present a new method to perform underdetermined audio source separation using a spoken or sung reference signal to inform the separation process. This method explicitly models possible differences between the spoken reference and the target signal, such as pitch differences and time lag. We show that the proposed algorithm outperforms…