Learn More
In this paper we present a methodology for analyzing polyphonic musical passages comprised by notes that exhibit a harmonically fixed spectral profile (such as piano notes). Taking advantage of this unique note structure we can model the audio content of the musical passage by a linear basis transform and use non-negative matrix decomposition methods to(More)
In this paper we employ information theoretic algorithms, previously used for separating instantaneous mixtures of sources, for separating convolved mixtures in the frequency domain. It is observed that convolved mixing in the time domain corresponds to instantaneous mixing in the frequency domain. Such mixing can be inverted using simpler and more robust(More)
In this paper we describe a model developed for the analysis of acoustic spectra. Unlike decom-positions techniques that can result in difficult to interpret results this model explicitly models spectra as distributions and extracts sets of additive and semantically useful components that facilitate a variety of applications ranging from source separation,(More)
In this paper we describe a technique that allows the extraction of multiple local shift-invariant features from analysis of non-negative data of arbitrary dimensionality. Our approach employs a probabilistic latent variable model with sparsity constraints. We demonstrate its utility by performing feature extraction in a variety of domains ranging from(More)
In this paper, we present a convolutive basis decomposition method and its application on simultaneous speakers separation from monophonic recordings. The model we propose is a convolutive version of the nonnegative matrix factorization algorithm. Due to the nonnegativity constraint this type of coding is very well suited for intuitively and efficiently(More)
In this paper we present an extension to the Non-Negative Matrix Factorization algorithm which is capable of identifying components with temporal structure. We demonstrate the use of this algorithm in the magnitude spectrum domain, where we employ it to perform extraction of multiple sound objects from a single channel auditory scene. This work may not be(More)
Perceived-pitch tracking of potentially aperiodic sounds, as well as pitch tracking of multiple simultaneous sources, is shown to be feasible using a probabilistic methodology. The use of a shift-invariant representation in the constant-Q domain allows the modeling of perceived pitch changes as vertical shifts of spectra. This enables the tracking of these(More)
In this paper we describe a methodology for model-based single channel separation of sounds. We present a sparse latent variable model that can learn sounds based on their distribution of time/frequency energy. This model can then be used to extract known types of sounds from mixtures in two scenarios. One being the case where all sound types in the mixture(More)
In this paper we present an algorithm for the separation of multiple speakers from mixed single-channel recordings by latent variable decomposition of the speech spectrogram. We model each magnitude spectral vector in the short-time Fourier transform of a speech signal as the outcome of a discrete random process that generates frequency bin indices. The(More)
Separating singing voices from music accompaniment is an important task in many applications, such as music information retrieval, lyric recognition and alignment. Music accompaniment can be assumed to be in a low-rank subspace, because of its repetition structure; on the other hand, singing voices can be regarded as relatively sparse within songs. In this(More)