This paper describes systems for the enhancement and recognition of distant speech recorded in reverberant rooms. Our speech enhancement (SE) system handles reverberation with blind deconvolution using linear filtering estimated by exploiting the temporal correlation of observed reverberant speech signals. Additional noise reduction is then performed using …
In this paper, we introduce a system for recognizing speech in the presence of multiple rapidly time-varying noise sources. The main components of the proposed approach are a model-based speech enhancement pre-processor and an adaptation technique to optimize the integration between the pre-processor and the recognizer. The speech enhancement pre-processor …
In this paper, two weighted distance measures, the weighted K-L divergence and the Bayesian criterion-based distance measure, are proposed to efficiently reduce the number of Gaussian mixture components in the HMM-based acoustic model. Conventional distance measures such as the K-L divergence and the Bhattacharyya distance consider only distribution parameters (i.e. …
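For orientation, the two conventional measures named above can be sketched for diagonal-covariance Gaussians as follows. This is a minimal illustration of the standard closed forms only; it does not implement the weighted measures the paper proposes, whose definitions are not given in this excerpt. The function names and the list-of-floats representation are assumptions made for the example.

```python
import math

def kl_divergence(mu_p, var_p, mu_q, var_q):
    """K-L divergence KL(p || q) between two diagonal-covariance Gaussians.

    Each argument is a list of per-dimension means or variances.
    """
    total = 0.0
    for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q):
        total += math.log(vq / vp) + (vp + (mp - mq) ** 2) / vq - 1.0
    return 0.5 * total

def bhattacharyya_distance(mu_p, var_p, mu_q, var_q):
    """Bhattacharyya distance between two diagonal-covariance Gaussians."""
    mean_term = 0.0
    var_term = 0.0
    for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q):
        v_avg = 0.5 * (vp + vq)  # per-dimension average variance
        mean_term += (mp - mq) ** 2 / v_avg
        var_term += math.log(v_avg / math.sqrt(vp * vq))
    return 0.125 * mean_term + 0.5 * var_term

if __name__ == "__main__":
    # Two 1-D unit-variance Gaussians whose means differ by 1.
    print(kl_divergence([0.0], [1.0], [1.0], [1.0]))
    print(bhattacharyya_distance([0.0], [1.0], [1.0], [1.0]))
```

Both functions depend only on the means and variances of the two components, which is the property of the conventional measures that the abstract points out.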
This paper addresses error type classification in continuous speech recognition (CSR). In CSR, errors are classified into three types, namely substitution, insertion, and deletion errors, by aligning a recognized word sequence with its reference transcription using a dynamic programming (DP) procedure. We propose a method for deriving …
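The conventional DP alignment mentioned above can be sketched as a unit-cost edit-distance alignment followed by a backtrace. This is a generic illustration under assumed unit costs and an assumed tie-breaking order (diagonal first, then deletion, then insertion); it is not the method the paper proposes, and the function name `classify_errors` is hypothetical.

```python
def classify_errors(ref, hyp):
    """Align hyp to ref with unit-cost DP and count each error type.

    ref, hyp: lists of words. Returns a dict of counts.
    """
    n, m = len(ref), len(hyp)
    # cost[i][j] = minimum edit cost of aligning ref[:i] with hyp[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i  # all reference words deleted
    for j in range(1, m + 1):
        cost[0][j] = j  # all hypothesis words inserted
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            cost[i][j] = min(diag, cost[i - 1][j] + 1, cost[i][j - 1] + 1)

    # Backtrace from the end, preferring the diagonal move on ties.
    counts = {"correct": 0, "substitution": 0, "deletion": 0, "insertion": 0}
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            key = "correct" if ref[i - 1] == hyp[j - 1] else "substitution"
            counts[key] += 1
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            counts["deletion"] += 1  # reference word with no hypothesis word
            i -= 1
        else:
            counts["insertion"] += 1  # hypothesis word with no reference word
            j -= 1
    return counts

if __name__ == "__main__":
    print(classify_errors("a b c".split(), "a c".split()))
```

When several alignments share the same minimum cost, the per-type counts depend on the tie-breaking order, although the total number of errors does not.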
Recurrent neural networks (RNNs) have recently been applied as classifiers for sequential labeling problems. In this paper, deep bidirectional RNNs (DBRNNs) are applied for the first time to error detection in automatic speech recognition (ASR), which is a sequential labeling problem. We investigate three types of ASR error detection tasks, i.e. …
A speech signal captured by a distant microphone is generally contaminated by background noise, which severely degrades the audible quality and intelligibility of the observed signal. To resolve this issue, speech enhancement has been intensively studied. In this paper, we consider text-informed speech enhancement, where the enhancement process is guided …
This paper presents our real-time meeting analyzer for monitoring conversations in an ongoing group meeting. The goal of the system is to recognize automatically “who is speaking what” in an online manner for meeting assistance. Our system continuously captures the utterances and face poses of each speaker using a microphone array and an …
Language modeling based on n-gram local probabilities does not take the length of the word sequence into account. Because of this property, the optimal values of the language weight and word insertion penalty for balancing acoustic and linguistic probabilities are affected by the length of the word sequence. To deal with this problem, a new language model is …
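As background, the role of the language weight and word insertion penalty can be illustrated with the log-linear score commonly used in ASR decoding. This is a sketch of that conventional score only, not of the new language model the paper proposes; the function name, the parameter names, and the sign convention (a penalty added once per word) are assumptions for the example.

```python
def combined_score(acoustic_logprob, lm_logprob, num_words,
                   lm_weight, insertion_penalty):
    """Conventional log-linear decoding score for one hypothesis.

    score = log P(X|W) + lm_weight * log P(W) + insertion_penalty * |W|
    """
    return acoustic_logprob + lm_weight * lm_logprob + insertion_penalty * num_words

if __name__ == "__main__":
    # Hypothetical values: a 5-word and a 10-word hypothesis, each word
    # contributing the same n-gram log-probability of -4.0.
    for num_words in (5, 10):
        lm_logprob = -4.0 * num_words
        print(num_words, combined_score(-100.0, lm_logprob, num_words,
                                        lm_weight=10.0, insertion_penalty=-0.5))
```

Because the n-gram log-probability is a sum over words, both the weighted language model term and the penalty term grow with the number of words, which is why the best values of the two parameters can shift with sequence length.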