Learn More
Sparse representation has been shown to be a powerful mod-eling framework for classification and detection tasks. In this paper, we propose a new keyword detection algorithm based on sparse representation of the posterior exemplars. The posterior exemplars are phone conditional probabilities obtained from a deep neural network. This method relies on the(More)
We cast the problem of query by example spoken term detection (QbE-STD) as subspace detection where query and background are modeled as a union of low-dimensional subspaces. The speech ex-emplars used for subspace modeling consist of class-conditional posterior probabilities obtained from deep neural network (DNN). The query and background training(More)
We cast the query by example spoken term detection (QbE-STD) problem as subspace detection where query and background subspaces are modeled as union of low-dimensional subspaces. The speech exemplars used for subspace model-ing are class-conditional posterior probabilities estimated using deep neural network (DNN). The query and background training(More)
State of the art query by example spoken term detection (QbE-STD) systems rely on representation of speech in terms of sequences of class-conditional posterior probabilities estimated by deep neural network (DNN). The posteriors are often used for pattern matching or dynamic time warping (DTW). Exploiting posterior probabilities as speech representation(More)
In this work, a Bayesian approach to speaker normalization is proposed to compensate for the degradation in performance of a speaker independent speech recognition system. The speaker normalization method proposed herein uses the technique of vocal tract length normalization (VTLN). The VTLN parameters are estimated using a novel Bayesian approach which(More)
—Speech is a complex signal produced by a highly constrained articulation machinery. Neuro and psycholinguistic theories assert that speech can be decomposed into molecules of structured atoms. Although characterization of the atoms is controversial, the experiments support the notion of invariant speech codes governing speech production and perception. We(More)
This work demonstrates an application of different real-time speech technologies, exploited in an online gaming scenario. The game developed for this purpose is inspired by the famous television based quiz-game show, " Who wants to be a millionaire " , in which multiple-choice questions of increasing difficulty are asked to the participant. Text-to-speech(More)
Dynamic time warping (DTW) is an algorithm to find out the similarity between two temporal sequences of varying length. Previous works in this field can be traced back to as early as [1], for automatic speech recognition (ASR). Although this technique became obsolete for ASR with the advent of Hidden Markov Models (HMM) [2] and Deep Neural Network (DNN)(More)
  • 1