Learn More
The aims of the SpeechDat-Car project are to develop a set of speech databases to support training and testing of multilingual speech recognition applications in the car environment. As a result, a total of ten (10) equivalent and similar resources will be created. The 10 languages are Danish, each language 600 sessions will be recorded (from at least 300(More)
This paper introduces a fast implementation of the power iteration method for subspace tracking, based on an approximation that is less restrictive than the well-known projection approximation. This algorithm, referred to as the fast approximated power iteration (API) method, guarantees the orthonormality of the subspace weighting matrix at each iteration.(More)
Extracting the main melody from a polyphonic music recording seems natural even to untrained human listeners. To a certain extent it is related to the concept of source separation, with the human ability of focusing on a specific source in order to extract relevant information. In this paper, we propose a new approach for the estimation and extraction of(More)
We propose a statistical learning approach for the automatic detection of vocal regions in a polyphonic musical signal. A support vector model, based on a large feature set, is employed to discriminate accompanied singing voice from pure instrumental regions. We propose a temporal smoothing of the posterior probabilities with a hidden Markov model that(More)
Music signal processing may appear to be the junior relation of the large and mature field of speech signal processing, not least because many techniques and representations originally developed for speech have been applied to music, often with good results. However, music signals possess specific acoustic and structural characteristics that distinguish(More)
The purpose of this article is to present new advances in music transcription and source separation with a focus on drum signals. A complete drum transcription system is described, which combines information from the original music signal and a drum track enhanced version obtained by source separation. In addition to efficient fusion strategies to take into(More)
In this paper, we introduce the use of conditional random fields (CRFs) for the audio-to-score alignment task. This framework encompasses the statistical models which are used in the literature and allows for more flexible dependency structures. In particular, it allows observation functions to be computed from several analysis frames. Three different CRF(More)
We address the issue of underdetermined source separation in a particular informed configuration where both the sources and the mixtures are known during a so-called encoding stage. This knowledge enables the computation of a side-information which is small enough to be inaudibly embedded into the mixtures. At the decoding stage, the sources are no longer(More)
The present research deals with audio events detection in noisy environments for a multimedia surveillance application. In surveillance or homeland security most of the systems aiming to automatically detect abnormal situations are only based on visual clues while, in some situations, it may be easier to detect a given event using the audio information.(More)
When designing an audio processing system, the target tasks often influence the choice of a data representation or transformation. Low-level time-frequency representations such as the short-time Fourier transform (STFT) are popular, because they offer a meaningful insight on sound properties for a low computational cost. Conversely, when higher level(More)