Learn More
Distant-microphone automatic speech recognition (ASR) remains a challenging goal in everyday environments involving multiple background sources and reverberation. This paper is intended to be a reference on the 2nd 'CHiME' Challenge, an initiative designed to analyze and evaluate the performance of ASR systems in a real-world domestic environment. Two(More)
This paper presents a new multiplicative algorithm for non-negative matrix factorization with β-divergence. The derived update rules have a similar form to those of the conventional multiplicative algorithm, only differing through the presence of an exponent term depending on β. The convergence is theoretically proven for any real-valued β based on the(More)
In this paper, we present a simple and fast method to separate a monaural audio signal into harmonic and percussive components, which is much useful for multi-pitch analysis, automatic music transcription, drum detection, modification of music, and so on. Exploiting the differences in the spectro-grams of harmonic and percussive components, the objective(More)
Model-based methods and deep neural networks have both been tremendously successful paradigms in machine learning. In model-based methods, problem domain knowledge can be built into the constraints of the model, typically at the expense of difficulties during inference. In contrast, deterministic deep neural networks are constructed in such a way that(More)
Wiener filtering is one of the most widely used methods in audio source separation. It is often applied on time-frequency representations of signals, such as the short-time Fourier transform (STFT), to exploit their short-term stationarity, but so far the design of the Wiener time-frequency mask did not take into account the necessity for the output(More)
PROBLEM • Goal: Separate speech signal from background noise given a single channel recording of both • Assumption: available training data with ground truths Speech& Background& Mixture& Speech/background& Separa6on& Speech&es6mate& Background&audio& es6mate& +& =& • In the time domain y(τ) = s(τ) + n(τ) • Problem: Given mixed STFT y and given training(More)
This paper proposes a statistical model of speech fundamental frequency (F0) contours, based on the formulation of the discrete-time stochastic process version of the Fujisaki model, which is known as a well-founded mathematical model representing the control mechanism of vocal fold vibration. There are two important motivations for this statistical(More)
The objective of single-channel source separation is to accurately recover source signals from mixtures. Non-negative matrix factorization (NMF) is a popular approach for this task, yet previous NMF approaches have not optimized directly this objective, despite some efforts in this direction. Our paper introduces discriminative training of the NMF basis(More)