Dhananjay Ram

Learn More
Sparse representation has been shown to be a powerful modeling framework for classification and detection tasks. In this paper, we propose a new keyword detection algorithm based on sparse representation of the posterior exemplars. The posterior exemplars are phone conditional probabilities obtained from a deep neural network. This method relies on the(More)
We cast the problem of query by example spoken term detection (QbE-STD) as subspace detection where query and background are modeled as a union of low-dimensional subspaces. The speech exemplars used for subspace modeling consist of class-conditional posterior probabilities obtained from deep neural network (DNN). The query and background training exemplars(More)
We cast the query by example spoken term detection (QbESTD) problem as subspace detection where query and background subspaces are modeled as union of low-dimensional subspaces. The speech exemplars used for subspace modeling are class-conditional posterior probabilities estimated using deep neural network (DNN). The query and background training exemplars(More)
Neural sequence-to-sequence models achieve remarkable performance not only in machine translation (MT) but also in other language processing tasks. One of the reasons for their effectiveness is the ability of their decoder to capture contextual information through its recurrent layer. However, its sequential modeling over-emphasizes the local context, i.e.(More)
State of the art query by example spoken term detection (QbE-STD) systems rely on representation of speech in terms of sequences of class-conditional posterior probabilities estimated by deep neural network (DNN). The posteriors are often used for pattern matching or dynamic time warping (DTW). Exploiting posterior probabilities as speech representation(More)
Speech is a complex signal produced by a highly constrained articulation machinery. Neuro and psycholinguistic theories assert that speech can be decomposed into molecules of structured atoms. Although characterization of the atoms is controversial, the experiments support the notion of invariant speech codes governing speech production and perception. We(More)
This work demonstrates an application of different real-time speech technologies, exploited in an online gaming scenario. The game developed for this purpose is inspired by the famous television based quizgame show, “Who wants to be a millionaire”, in which multiple-choice questions of increasing difficulty are asked to the participant. Textto-speech(More)
In this work, a Bayesian approach to speaker normalization is proposed to compensate for the degradation in performance of a speaker independent speech recognition system. The speaker normalization method proposed herein uses the technique of vocal tract length normalization (VTLN). The VTLN parameters are estimated using a novel Bayesian approach which(More)