Atsunori Ogawa

In this paper, we introduce a system for recognizing speech in the presence of multiple rapidly time-varying noise sources. The main components of the proposed approach are a model-based speech enhancement pre-processor and an adaptation technique to optimize the integration between the pre-processor and the recognizer. The speech enhancement pre-processor …
This paper describes systems for the enhancement and recognition of distant speech recorded in reverberant rooms. Our speech enhancement (SE) system handles reverberation with blind deconvolution, using linear filtering estimated by exploiting the temporal correlation of observed reverberant speech signals. Additional noise reduction is then performed using …
A speech signal captured by a distant microphone is generally contaminated by background noise, which severely degrades the audible quality and intelligibility of the observed signal. To resolve this issue, speech enhancement has been intensively studied. In this paper, we consider text-informed speech enhancement, where the enhancement process is guided …
In this paper, two weighted distance measures, the weighted K-L divergence and the Bayesian criterion-based distance measure, are proposed to efficiently reduce the Gaussian mixture components in the HMM-based acoustic model. Conventional distance measures such as the K-L divergence and the Bhattacharyya distance consider only distribution parameters (i.e. …
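The weighted measures themselves are the paper's contribution and are not spelled out in this snippet; as background, the conventional K-L divergence they extend has a closed form for the diagonal-covariance Gaussians typical of HMM acoustic models. A minimal sketch (function names are illustrative, not from the paper):

```python
import math

def kl_diag_gaussians(mu0, var0, mu1, var1):
    """Closed-form KL(N0 || N1) for two diagonal-covariance Gaussians.

    mu*, var* are per-dimension lists of means and variances.
    """
    return 0.5 * sum(
        math.log(v1 / v0) + (v0 + (m0 - m1) ** 2) / v1 - 1.0
        for m0, v0, m1, v1 in zip(mu0, var0, mu1, var1)
    )
```

Mixture reduction then merges the pair of components with the smallest divergence; the paper's weighted variants additionally account for information beyond the distribution parameters alone.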
In this study, we explore an i-vector based adaptation of deep neural networks (DNNs) in noisy environments. We first demonstrate the importance of encapsulating environment and channel variability into i-vectors for DNN adaptation in noisy conditions. To obtain robust i-vectors without losing noise and channel variability information, we investigate …
This paper presents our real-time meeting analyzer for monitoring conversations in an ongoing group meeting. The goal of the system is to recognize automatically “who is speaking what” in an online manner for meeting assistance. Our system continuously captures the utterances and face poses of each speaker using a microphone array and …
N-gram language models assign only local probabilities and do not take the length of the word sequence into account. Due to this property, the optimal values of the language weight and word insertion penalty for balancing acoustic and linguistic probabilities are affected by the length of the word sequence. To deal with this problem, a new language model is …
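The balance described above is conventionally realized as a log-linear combination of acoustic and language model scores plus a per-word penalty; the hypothesis length enters through the penalty term. A minimal sketch of this standard decoding score (the parameter values are illustrative, not from the paper):

```python
def decode_score(log_p_acoustic, log_p_lm, num_words,
                 lm_weight=10.0, word_penalty=-0.5):
    """Combined hypothesis score used to rank recognition candidates.

    lm_weight scales the language model log-probability against the
    acoustic score; word_penalty is added once per hypothesized word,
    so its effect grows with sequence length.
    """
    return log_p_acoustic + lm_weight * log_p_lm + word_penalty * num_words
```

Because the penalty contribution scales with `num_words`, a single fixed (weight, penalty) pair cannot be optimal for both short and long utterances, which is the mismatch the proposed language model addresses.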
This paper proposes an English speech recognition system which can recognize English speech spoken by both non-native (i.e. Japanese) and native English speakers. The system uses a bilingual pronunciation lexicon in which each word has both English and Japanese phoneme transcriptions. The Japanese transcription is constructed considering typical …