Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE. Furthermore, this paper evaluates…
Distant-microphone automatic speech recognition (ASR) remains a challenging goal in everyday environments involving multiple background sources and reverberation. This paper is intended to be a reference on the 2nd 'CHiME' Challenge, an initiative designed to analyze and evaluate the performance of ASR systems in a real-world domestic environment. Two…
This paper presents a new multiplicative algorithm for non-negative matrix factorization with β-divergence. The derived update rules have a form similar to those of the conventional multiplicative algorithm, differing only in the presence of an exponent term that depends on β. Convergence is theoretically proven for any real-valued β based on the…
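To make the idea of "the conventional update plus a β-dependent exponent" concrete, here is a minimal NumPy sketch. The truncated abstract does not give the exponent, so the sketch assumes the form used in the majorization-minimization literature (Févotte and Idier): γ(β) = 1/(2 − β) for β < 1, 1 for 1 ≤ β ≤ 2, and 1/(β − 1) for β > 2. The function names `beta_divergence`, `gamma_exponent`, and `nmf_beta` are hypothetical, and this is not presented as the paper's exact algorithm.

```python
import numpy as np

def beta_divergence(V, Vhat, beta):
    """Sum of element-wise beta-divergences d_beta(V | Vhat)."""
    if beta == 0:  # Itakura-Saito divergence
        R = V / Vhat
        return np.sum(R - np.log(R) - 1.0)
    if beta == 1:  # generalized Kullback-Leibler divergence
        return np.sum(V * np.log(V / Vhat) - V + Vhat)
    return np.sum((V**beta + (beta - 1.0) * Vhat**beta - beta * V * Vhat**(beta - 1.0))
                  / (beta * (beta - 1.0)))

def gamma_exponent(beta):
    # ASSUMED exponent gamma(beta), taken from the MM literature, not from the abstract.
    if beta < 1:
        return 1.0 / (2.0 - beta)
    if beta <= 2:
        return 1.0
    return 1.0 / (beta - 1.0)

def nmf_beta(V, K, beta, n_iter=200, seed=0, eps=1e-12):
    """Factorize non-negative V (F x N) as W @ H with W (F x K), H (K x N)."""
    rng = np.random.default_rng(seed)
    F, N = V.shape
    W = rng.random((F, K)) + eps
    H = rng.random((K, N)) + eps
    g = gamma_exponent(beta)
    for _ in range(n_iter):
        # Conventional multiplicative ratio for H, raised to the power g.
        Vhat = W @ H
        H *= ((W.T @ (V * Vhat**(beta - 2.0))) / (W.T @ Vhat**(beta - 1.0))) ** g
        # Same for W, using the refreshed reconstruction.
        Vhat = W @ H
        W *= (((V * Vhat**(beta - 2.0)) @ H.T) / (Vhat**(beta - 1.0) @ H.T)) ** g
    return W, H
```

With g = 1 the loop reduces to the familiar heuristic multiplicative rules; the exponent only shrinks the step outside 1 ≤ β ≤ 2, which is where the plain rules lose their monotonicity guarantee.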
In this paper, we present a simple and fast method to separate a monaural audio signal into harmonic and percussive components, which is useful for multi-pitch analysis, automatic music transcription, drum detection, music modification, and so on. Exploiting the differences in the spectrograms of harmonic and percussive components, the objective…
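The underlying intuition is that harmonic components form horizontal ridges (smooth along time) in a spectrogram, while percussive components form vertical ridges (smooth along frequency). The abstract's own objective function is truncated, so this sketch swaps in a different, well-known technique that exploits the same anisotropy: median filtering along each axis followed by soft masking (Fitzgerald's median-filter HPSS). The kernel size, mask exponent, and function names are illustrative assumptions.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def median_filter_1d(X, size, axis):
    """Median filter of odd length `size` along one axis, with edge padding."""
    pad = size // 2
    pad_width = [(0, 0)] * X.ndim
    pad_width[axis] = (pad, pad)
    Xp = np.pad(X, pad_width, mode="edge")
    windows = sliding_window_view(Xp, size, axis=axis)  # window dimension appended last
    return np.median(windows, axis=-1)

def hpss_median(S, kernel=17, power=2.0, eps=1e-12):
    """Split a magnitude spectrogram S (freq x time) into harmonic and percussive parts.

    NOTE: median-filter HPSS, swapped in for illustration; not the paper's objective.
    """
    H = median_filter_1d(S, kernel, axis=1)  # smooth along time: enhances horizontal ridges
    P = median_filter_1d(S, kernel, axis=0)  # smooth along frequency: enhances vertical ridges
    Hp, Pp = H**power, P**power
    mask_h = Hp / (Hp + Pp + eps)            # soft (Wiener-like) harmonic mask in [0, 1]
    return S * mask_h, S * (1.0 - mask_h)    # the two parts sum back to S
```

Because the two masks are complementary, the harmonic and percussive outputs add back exactly to the input spectrogram; a larger `power` makes the masks closer to binary.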
Model-based methods and deep neural networks have both been tremendously successful paradigms in machine learning. In model-based methods, we can easily express our problem-domain knowledge in the constraints of the model at the expense of difficulties during inference. Deterministic deep neural networks are constructed in such a way that inference is…
Wiener filtering is one of the most widely used methods in audio source separation. It is often applied to time-frequency representations of signals, such as the short-time Fourier transform (STFT), to exploit their short-term stationarity, but so far the design of the Wiener time-frequency mask has not taken into account the necessity for the output…
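For reference, the conventional STFT-domain Wiener mask that the abstract starts from gives each source j the fraction v_j / Σ_k v_k of every time-frequency bin, where v_j is an estimated power spectrogram. The minimal sketch below assumes those power estimates are already available and uses hypothetical function names. It deliberately implements only this standard mask: nothing here forces the masked STFTs to be consistent, i.e. to be STFTs of actual time-domain signals, which is the limitation the paper addresses.

```python
import numpy as np

def wiener_masks(source_psds, eps=1e-12):
    """Standard per-source Wiener masks from estimated power spectrograms.

    source_psds: array of shape (J, F, T) with non-negative estimates v_j(f, t).
    Returns masks m_j = v_j / sum_k v_k; they sum to (almost) 1 at every bin.
    """
    V = np.asarray(source_psds, dtype=float)
    return V / (V.sum(axis=0, keepdims=True) + eps)

def apply_masks(mixture_stft, masks):
    """Multiply the complex mixture STFT (F, T) by each mask; returns (J, F, T)."""
    return masks * mixture_stft[None, ...]
```

Since the masks sum to one, the source estimates add back to the mixture STFT bin by bin; inverting each estimate with an inverse STFT and re-analyzing it would in general not reproduce the masked coefficients.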
This paper proposes a statistical model of speech fundamental frequency (F0) contours based on a discrete-time stochastic-process formulation of the Fujisaki model, a well-founded mathematical model of the control mechanism of vocal fold vibration. There are two important motivations for this statistical…
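As background, the classical (deterministic, continuous-time) Fujisaki model writes log F0 as a baseline ln Fb plus phrase components, each an impulse response Gp(t) = α²t·e^(−αt), and accent components, each a pair of step responses Ga(t) = min[1 − (1 + βt)e^(−βt), γ] triggered at onset T1 and offset T2. The sketch below generates a contour from that classical model only; it does not reproduce the paper's discrete-time stochastic version. The constants α = 3, β = 20, γ = 0.9 are commonly quoted values used here as assumptions, and all function names are hypothetical.

```python
import numpy as np

def phrase_response(t, alpha=3.0):
    """Gp(t) = alpha^2 * t * exp(-alpha * t) for t >= 0, else 0."""
    t = np.asarray(t, dtype=float)
    tp = np.maximum(t, 0.0)  # clamp so exp() never sees large positive arguments
    return np.where(t >= 0, alpha**2 * tp * np.exp(-alpha * tp), 0.0)

def accent_response(t, beta=20.0, gamma=0.9):
    """Ga(t) = min(1 - (1 + beta*t) * exp(-beta*t), gamma) for t >= 0, else 0."""
    t = np.asarray(t, dtype=float)
    tp = np.maximum(t, 0.0)
    g = 1.0 - (1.0 + beta * tp) * np.exp(-beta * tp)
    return np.where(t >= 0, np.minimum(g, gamma), 0.0)

def fujisaki_log_f0(t, fb, phrases, accents, alpha=3.0, beta=20.0, gamma=0.9):
    """log F0(t) from the classical Fujisaki model.

    phrases: list of (T0, Ap) phrase commands.
    accents: list of (T1, T2, Aa) accent commands with onset T1 < offset T2.
    """
    t = np.asarray(t, dtype=float)
    y = np.full(t.shape, np.log(fb))
    for t0, ap in phrases:
        y += ap * phrase_response(t - t0, alpha)
    for t1, t2, aa in accents:
        y += aa * (accent_response(t - t1, beta, gamma) - accent_response(t - t2, beta, gamma))
    return y
```

For example, `np.exp(fujisaki_log_f0(t, 100.0, [(0.5, 0.5)], [(1.0, 1.3, 0.4)]))` stays at the 100 Hz baseline until the phrase command at 0.5 s and then rises smoothly, with an extra accent bump between roughly 1.0 s and 1.3 s.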
This paper presents a nonparametric Bayesian extension of non-negative matrix factorization (NMF) for music signal analysis. Instrument sounds often exhibit non-stationary spectral characteristics. We introduce infinite-state spectral bases into NMF to represent time-varying spectra in polyphonic music signals. We describe our extension of NMF with…
With the advancement of technology, both assisted listening devices and speech communication devices are becoming more portable and more frequently used. As a consequence, users of devices such as hearing aids, cochlear implants, and mobile telephones expect their devices to work robustly anywhere and at any time. This holds in particular for…