Ashish Verma

We propose a three-stage pixel-based visual front end for automatic speechreading (lipreading) that results in significantly improved recognition performance of spoken words or phonemes. The proposed algorithm is a cascade of three transforms applied to a three-dimensional video region of interest that contains the speaker’s mouth area. The first stage is a typical image …
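The abstract is cut off before naming the transforms, so the sketch below only illustrates the general shape of such a pixel-based front end: a 2D image transform applied per frame to the mouth region of interest, with low-order coefficients kept as features. The choice of DCT and the coefficient count are assumptions for illustration, not the paper's specific three-stage cascade.

```python
# Hedged sketch of a pixel-based visual front end: a per-frame 2D DCT on the
# mouth region of interest, keeping only low-order coefficients. The DCT and
# the value of `keep` are assumptions; the paper's actual cascade is not shown.
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix of size n x n."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    mat = np.cos(np.pi * (2 * i + 1) * k / (2 * n)) * np.sqrt(2.0 / n)
    mat[0, :] = np.sqrt(1.0 / n)
    return mat

def frame_features(roi_video, keep=24):
    """roi_video: (frames, height, width) grayscale mouth ROI.
    Returns (frames, keep) low-order 2D-DCT coefficients per frame."""
    _, h, w = roi_video.shape
    dh, dw = dct_matrix(h), dct_matrix(w)
    feats = []
    for frame in roi_video:
        coeffs = dh @ frame @ dw.T             # 2D DCT of the frame
        feats.append(coeffs.flatten()[:keep])  # crude low-order selection
    return np.array(feats)

# Toy usage: 10 frames of a 32x32 mouth region.
video = np.random.default_rng(1).random((10, 32, 32))
print(frame_features(video).shape)  # (10, 24)
```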
In this paper, we propose an approach to audio search that requires no language-specific resources. This approach is most useful in scenarios where no training data exists to build an automatic speech recognition (ASR) system for a language, e.g. most regional languages or dialects. In this approach, a multilayer perceptron …
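The abstract is truncated right after naming the multilayer perceptron, so the sketch below only shows one common language-independent way to use such frame-level outputs for audio search: comparing posterior sequences of a spoken query and a longer utterance with dynamic time warping. This matching scheme and the cosine frame cost are assumptions, not necessarily the paper's method.

```python
# Hedged sketch (assumption): DTW over MLP-style frame posteriors as a
# language-independent similarity between a spoken query and an utterance.
import numpy as np

def dtw_distance(query_post, utt_post):
    """Align two (frames x classes) posterior sequences with plain DTW."""
    q, u = len(query_post), len(utt_post)
    cost = np.full((q + 1, u + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, q + 1):
        for j in range(1, u + 1):
            a, b = query_post[i - 1], utt_post[j - 1]
            # Frame cost: 1 - cosine similarity between posterior vectors.
            d = 1.0 - float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9)
            cost[i, j] = d + min(cost[i - 1, j], cost[i, j - 1], cost[i - 1, j - 1])
    return cost[q, u] / (q + u)  # length-normalised alignment cost

# Toy usage with random arrays standing in for MLP posteriors.
rng = np.random.default_rng(0)
query = rng.random((20, 40))
utterance = rng.random((80, 40))
print(dtw_distance(query, utterance))
```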
Detection of filled pauses is a challenging research problem with several practical applications. It can be used to evaluate the spoken fluency skills of the speaker, to improve the performance of automatic speech recognition systems, or to predict the mental state of the speaker. This paper presents an algorithm for filled pause detection that is based …
A popular approach for keyword search in speech files is the phone lattice search. Recently, minimum edit distance (MED) has been used as a measure of similarity between strings, rather than simple string matching, when searching the phone lattice for the keyword. In this paper, we propose a variation of the MED in which the substitution penalties are …
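The abstract mentions an MED whose substitution penalties are not uniform, but it is cut off before describing how they are set. As a rough illustration of that general idea, the sketch below implements a weighted edit distance over phone strings where substitution costs come from a user-supplied confusion table; the function name and the example costs are placeholders, not the paper's penalty scheme.

```python
# Hedged sketch: weighted minimum edit distance between two phone strings.
# The substitution cost table `sub_cost` is a placeholder; the paper derives
# its own substitution penalties, which are not reproduced here.

def weighted_med(ref, hyp, sub_cost, ins_cost=1.0, del_cost=1.0):
    """Dynamic-programming edit distance with per-pair substitution costs."""
    n, m = len(ref), len(hyp)
    # dp[i][j] = cost of aligning ref[:i] with hyp[:j]
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = dp[i - 1][0] + del_cost
    for j in range(1, m + 1):
        dp[0][j] = dp[0][j - 1] + ins_cost
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0.0 if ref[i - 1] == hyp[j - 1] else sub_cost.get(
                (ref[i - 1], hyp[j - 1]), 1.0)
            dp[i][j] = min(dp[i - 1][j] + del_cost,   # deletion
                           dp[i][j - 1] + ins_cost,   # insertion
                           dp[i - 1][j - 1] + sub)    # substitution / match
    return dp[n][m]

# Example: acoustically similar phones get a lower substitution penalty.
sub_cost = {("p", "b"): 0.3, ("b", "p"): 0.3, ("s", "z"): 0.4, ("z", "s"): 0.4}
keyword = ["b", "a", "t"]
lattice_path = ["p", "a", "t"]
print(weighted_med(keyword, lattice_path, sub_cost))  # 0.3
```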
This paper describes a morphing-based, audio-driven facial animation system. Based on an incoming audio stream, a face image is animated with full lip synchronization and synthesized expressions. A novel scheme to implement a language-independent system for audio-driven facial animation, given a speech recognition system for just one language, in our case, …
A novel bacterium (strain NIO-1002(T)) belonging to the genus Microbacterium was isolated from a marine sediment sample in Chorao Island, Goa Province, India. Its morphology, physiology, biochemical features and 16S rRNA gene sequence were characterized. Cells of this strain were Gram-stain-positive, non-motile, non-spore-forming rods that formed …
We consider the problem of combining visual cues with audio signals for the purpose of improved automatic machine recognition of speech. Although significant progress has been made in machine transcription of large vocabulary continuous speech (LVCSR) over the last few years, the technology to date is most effective only under controlled conditions such as …
Voice conversion techniques attempt to modify a speech signal so that it is perceived as if spoken by another speaker, different from the original speaker. In this paper, we present a novel approach to voice conversion. Our approach uses acoustic models based on units of speech, such as phones and diphones, for voice conversion. These models can be …