Kshitiz Kumar

We present an algorithm for dereverberation of speech signals for automatic speech recognition (ASR) applications. Often ASR systems are presented with speech that has been recorded in environments that include noise and reverberation. The performance of ASR systems degrades with increasing levels of noise and reverberation. While many algorithms have been …
In this paper, we present a new two-microphone approach that improves speech recognition accuracy when speech is masked by other speech. The algorithm improves on previous systems that have been successful in separating signals based on differences in arrival time of signal components from two microphones. The present algorithm differs from these efforts in …
Almost all current automatic speech recognition (ASR) systems conventionally append delta and double-delta cepstral features to static cepstral features. In this work we describe a modified feature-extraction procedure in which the time-difference operation is performed in the spectral domain, rather than in the cepstral domain as is presently done.
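The delta and double-delta features mentioned above are conventionally computed with a regression formula over a short window of frames. The sketch below shows that standard computation (the window half-width `M = 2` is an assumed setting, and the toy feature matrix is synthetic); in this paper's variant the same differencing would be applied to spectral rather than cepstral features.

```python
import numpy as np

def delta(features, M=2):
    """Regression-based delta features over a (frames x dims) matrix.

    Standard formula: d_t = sum_m m * (c_{t+m} - c_{t-m}) / (2 * sum_m m^2),
    with edge frames replicated so the output has the same shape as the input.
    """
    T, _ = features.shape
    denom = 2 * sum(m * m for m in range(1, M + 1))
    padded = np.pad(features, ((M, M), (0, 0)), mode='edge')
    d = np.zeros_like(features)
    for m in range(1, M + 1):
        d += m * (padded[M + m:M + m + T] - padded[M - m:M - m + T])
    return d / denom

# Toy static features: a linear ramp over 10 frames, 13 dims.
x = np.arange(10, dtype=float)[:, None] * np.ones((1, 13))
dx = delta(x)    # a linear ramp yields a constant delta of 1 away from edges
ddx = delta(dx)  # the double-delta of a ramp is ~0 away from edges
```

Appending `dx` and `ddx` to `x` gives the conventional 39-dimensional feature vector when `x` holds 13 static cepstra.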
Current state-of-the-art speech recognition systems work quite well in controlled environments, but their performance degrades severely under realistic acoustical conditions in reverberant environments. In this paper we build on recent developments that represent reverberation as a filtering operation in the cepstral feature domain, and we formulate a …
In this paper we present a new method of signal processing for robust speech recognition using two microphones. The method, loosely based on the human binaural hearing system, consists of passing the speech signals detected by two microphones through bandpass filters. We develop a spatial masking function based on normalized cross-correlation, which …
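As a rough illustration of correlation-based masking, the sketch below computes a per-frame zero-lag normalized cross-correlation between two microphone signals and converts it into a soft gain: frames where the two channels agree (a source near the midline) pass through, while poorly correlated frames are attenuated. The frame length, floor, and exponent are assumed values, and the actual algorithm operates band by band after the bandpass filtering described above.

```python
import numpy as np

def ncc_mask(x_left, x_right, frame_len=160, floor=0.1, power=2.0):
    """Per-frame zero-lag normalized cross-correlation between two
    microphone signals, turned into a soft gain in [floor**power, 1]."""
    n_frames = min(len(x_left), len(x_right)) // frame_len
    gains = np.empty(n_frames)
    for t in range(n_frames):
        l = x_left[t * frame_len:(t + 1) * frame_len]
        r = x_right[t * frame_len:(t + 1) * frame_len]
        denom = np.sqrt(np.dot(l, l) * np.dot(r, r)) + 1e-12
        rho = np.dot(l, r) / denom          # normalized cross-correlation
        gains[t] = max(floor, rho) ** power  # floored, sharpened soft mask
    return gains

# Identical signals on both mics -> correlation ~1, gain ~1.
sig = np.sin(2 * np.pi * 440 * np.arange(1600) / 16000.0)
g_same = ncc_mask(sig, sig)

# Uncorrelated noise on the second mic -> low correlation, gain near the floor.
rng = np.random.default_rng(0)
g_noise = ncc_mask(sig, rng.standard_normal(1600))
```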
Some recent dereverberation approaches that have been effective for automatic speech recognition (ASR) applications model reverberation as a linear convolution operation in the spectral domain and derive a factorization that decomposes the spectra of reverberated speech into those of clean speech and a room-response filter. Typically, a general non-negative …
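The factorization idea can be illustrated on a single frequency channel: model the reverberated magnitude as a convolution of non-negative clean-speech magnitudes with a short non-negative filter, and estimate both factors with alternating multiplicative updates. This is a generic NMF-style sketch under assumed update rules and filter length, not the paper's algorithm.

```python
import numpy as np

def nn_deconv(y, K=4, n_iter=300, seed=0):
    """Decompose one non-negative spectral channel as
    y[t] ~= sum_k h[k] * x[t-k] via Lee-Seung-style multiplicative
    updates for the Euclidean error (alternating between x and h)."""
    rng = np.random.default_rng(seed)
    T = len(y)
    x = rng.random(T) + 0.1   # clean-speech magnitudes (non-negative)
    h = rng.random(K) + 0.1   # room-response filter taps (non-negative)
    eps = 1e-12

    def corr(a, f):           # c[s] = sum_k f[k] * a[s + k]
        return np.convolve(a, f[::-1])[K - 1:K - 1 + T]

    for _ in range(n_iter):
        yhat = np.convolve(h, x)[:T]
        x *= corr(y, h) / (corr(yhat, h) + eps)   # update clean speech
        yhat = np.convolve(h, x)[:T]
        num = np.array([np.dot(y[k:], x[:T - k]) for k in range(K)])
        den = np.array([np.dot(yhat[k:], x[:T - k]) for k in range(K)]) + eps
        h *= num / den                            # update filter taps
    return x, h

# Fit a synthetic "reverberated" channel built from known factors.
rng = np.random.default_rng(1)
x_true = rng.random(50)
h_true = np.array([1.0, 0.6, 0.3, 0.1])
y = np.convolve(h_true, x_true)[:50]
x_est, h_est = nn_deconv(y)
```

Note the usual NMF caveat: the factorization is only identifiable up to a scale exchanged between `x` and `h`.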
In this work, we study the problem of detecting audio-visual (AV) synchronization in video segments containing a speaker in frontal head pose. The problem has important applications in biometrics, for example spoofing detection, and it constitutes an important step in the AV segmentation necessary for deriving AV fingerprints in multimodal speaker …
In this paper, we introduce profile view (PV) lip reading, a scheme for speaker-dependent isolated-word speech recognition. We provide historical motivation for PV from the importance of profile images in facial animation for lip reading, and we present feature-extraction schemes for PV as well as for the traditional frontal view (FV) approach. We compare lip …
In this paper, we present a noise robustness algorithm called Small Power Boosting (SPB). We observe that in the spectral domain, time-frequency bins with smaller power are more affected by additive noise. The conventional way of handling this problem is to estimate the noise from the test utterance and perform normalization or subtraction. In our work, in …
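A minimal sketch of the boosting idea: raise each time-frequency power bin to at least a fixed fraction of the peak power in its frequency channel, so that the small-power bins most affected by additive noise are floored rather than left to be dominated by the noise estimate. The fraction `alpha` and the per-channel peak reference are assumptions here, not the paper's exact formulation.

```python
import numpy as np

def small_power_boost(power, alpha=0.02):
    """Floor each time-frequency power bin at alpha times the peak
    power (over time) of its frequency channel."""
    peak = power.max(axis=0, keepdims=True)  # per-channel peak over time
    return np.maximum(power, alpha * peak)

# Rows = frames, columns = frequency channels.
P = np.array([[100.0,  0.5],
              [  1.0, 50.0]])
boosted = small_power_boost(P, alpha=0.02)
# Bins below 2% of their channel's peak are raised to that floor.
```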
It is well known that binaural processing is very useful for separating incoming sound sources as well as for improving the intelligibility of speech in reverberant environments. This paper describes and compares a number of ways in which the classic model of interaural cross-correlation proposed by Jeffress, quantified by Colburn, and further elaborated by …
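The Jeffress model localizes a source by finding the interaural delay at which the two ear signals coincide. A minimal single-band sketch: scan candidate lags and return the one maximizing the normalized cross-correlation (the lag range and single-channel treatment are assumed simplifications; the full model runs this per critical band on neurally transduced signals).

```python
import numpy as np

def itd_lag(left, right, max_lag=20):
    """Return the lag d (in samples) maximizing the normalized
    cross-correlation sum_n left[n + d] * right[n] -- a coincidence
    (delay-line) picture of interaural time difference.
    A negative lag means the right channel is delayed relative to the left."""
    best_lag, best_score = 0, -np.inf
    for d in range(-max_lag, max_lag + 1):
        if d >= 0:
            a, b = left[d:], right[:len(right) - d]
        else:
            a, b = left[:len(left) + d], right[-d:]
        score = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12)
        if score > best_score:
            best_lag, best_score = d, score
    return best_lag

# Right channel delayed by 3 samples relative to the left.
rng = np.random.default_rng(0)
s = rng.standard_normal(500)
left = s
right = np.concatenate([np.zeros(3), s[:-3]])
```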