Randy Gomez

Speech recognition under reverberant conditions is a difficult task. Most dereverberation techniques used to address this problem enhance the reverberant waveform independently of the speech recognizer. In this paper, we improve the conventional Spectral Subtraction-based (SS) dereverberation technique. In our proposed approach, the dereverberation(More)
Automatic speech recognition (ASR) in reverberant environments is a challenging task. Most dereverberation techniques address this problem through signal processing and enhance the reverberant waveform independently of the speech recognizer. In this paper, we propose a novel scheme to perform dereverberation in relation to the likelihood of the back-end(More)
We propose a robust and fast dereverberation technique for real-time speech recognition applications. First, we effectively identify the late reflection components of the room impulse response. We use this information together with the concept of spectral subtraction (SS) to remove the late reflection components of the reverberant signal. In the absence of(More)
Speech is one of the most natural media for human communication, which makes it vital to human-robot interaction. In real environments where robots are deployed, distant-talking speech recognition is difficult to realize due to the effects of reverberation. This leads to the degradation of speech recognition and understanding, and hinders a seamless(More)
In real-time speech recognition applications, there is a need to implement a fast and reliable adaptation algorithm. We propose a method to reduce the adaptation time of unsupervised speaker adaptation based on HMM-sufficient statistics. We use only a single arbitrary utterance without transcriptions in selecting the N-best speakers' sufficient statistics(More)
This paper presents a new multimodal system for group dynamics and interaction analysis. The framework is composed of a mic array and multiview video cameras placed on a digital signage display which serves as a support for interaction. We show that visual information processing can be used to localize nonverbal communication events and synchronized with(More)
This paper presents a novel multimodal system designed for multi-party human-machine interaction understanding. The design of human-computer interfaces for multiple users is challenging because simultaneous processing of actions and reactions has to be consistent. The proposed system consists of a large display equipped with multiple sensing devices:(More)
In this paper, we show a method that significantly improves on our previous work in single-channel dereverberation. The proposed method is more robust to changes in speaker position in distant-talking ASR. First, we update the room transfer function (RTF) and weighting parameters for dereverberation to the target speaker position. This scheme corrects speech(More)
Speech enhancement is a common approach to address the effects of degradation due to noise and channel contamination. This approach is intended to suppress the unwanted signal and recover the clean speech. In this paper, we focus on two simple and computationally inexpensive methods: Wiener filtering (WF) and spectral subtraction (SS). Conventionally, these are(More)
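
For readers unfamiliar with the two methods named in the last abstract, the following is a minimal sketch of the textbook per-frequency-bin gain functions for power spectral subtraction and Wiener filtering. It is not the authors' implementation: the function names, the `floor` parameter, and the assumption that a noise (or late-reflection) power estimate is already available for each bin are all illustrative.

```python
import math


def spectral_subtraction_gain(noisy_power, noise_power, floor=0.01):
    """Amplitude gain of textbook power spectral subtraction for one bin.

    `floor` (hypothetical parameter) bounds the power-domain gain from
    below so that over-subtraction never yields a negative power.
    """
    if noisy_power <= 0.0:
        return math.sqrt(floor)
    ratio = 1.0 - noise_power / noisy_power
    return math.sqrt(max(ratio, floor))


def wiener_gain(noisy_power, noise_power, floor=0.01):
    """Textbook Wiener gain S / (S + N), with the clean power S
    estimated as max(noisy_power - noise_power, 0)."""
    if noisy_power <= 0.0:
        return floor
    clean_estimate = max(noisy_power - noise_power, 0.0)
    return max(clean_estimate / noisy_power, floor)


def enhance_frame(noisy_magnitudes, noise_powers, gain_fn):
    """Apply a per-bin gain to the magnitude spectrum of one frame."""
    return [
        gain_fn(mag * mag, noise) * mag
        for mag, noise in zip(noisy_magnitudes, noise_powers)
    ]


if __name__ == "__main__":
    frame = [2.0, 1.0, 0.5]    # illustrative magnitude spectrum
    noise = [1.0, 1.0, 1.0]    # illustrative noise power per bin
    print(enhance_frame(frame, noise, spectral_subtraction_gain))
    print(enhance_frame(frame, noise, wiener_gain))
```

With this particular clean-power estimate, the Wiener gain equals the square of the spectral subtraction amplitude gain wherever the floor is not active, so the Wiener filter attenuates low-SNR bins more strongly.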