This paper describes several approaches to keyword spotting (KWS) in informal continuous speech. We compare acoustic keyword spotting, spotting in word lattices generated by large vocabulary continuous speech recognition (LVCSR), and a hybrid approach making use of phoneme lattices generated by a phoneme recognizer. The systems are compared on carefully defined…
This paper deals with a hybrid word-subword recognition system for spoken term detection. Decoding is driven by a hybrid recognition network, and the decoder directly produces hybrid word-subword lattices. One phone model and two multigram models were tested to represent sub-word units. The systems were evaluated in terms of spoken term detection accuracy and…
In this paper, we describe the "Spoken Web Search" task, held as part of the 2013 MediaEval campaign. The purpose of this task is to perform audio search in multiple languages and acoustic conditions, with very few resources available for each individual language. This year the data contains audio from nine different languages and is…
We present two techniques that yield improved keyword spotting (KWS) performance under the ATWV/MTWV performance measures: (i) score normalization, where the scores of different keywords become commensurate with each other and correspond more closely to the probability of being correct than raw posteriors; and (ii) system combination,…
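The score-normalization idea can be illustrated with a minimal sketch. Sum-to-one normalization per keyword is one common form used for ATWV-style metrics; the function name, the `gamma` exponent, and the toy scores below are illustrative assumptions, not necessarily the exact method of the paper.

```python
from collections import defaultdict

def sum_to_one_normalize(detections, gamma=1.0):
    """Normalize detection scores per keyword so that scores of
    different keywords become commensurate (sum-to-one normalization).

    detections: list of (keyword, raw_score) pairs.
    gamma: optional exponent applied before normalizing.
    """
    totals = defaultdict(float)
    for kw, score in detections:
        totals[kw] += score ** gamma
    return [(kw, (score ** gamma) / totals[kw]) for kw, score in detections]

# Toy example: a rare keyword's single hit is boosted relative to
# a frequent keyword's many competing hits.
dets = [("apple", 0.9), ("apple", 0.1), ("banana", 0.4)]
normed = sum_to_one_normalize(dets)
```

After normalization, a single detection of a keyword gets score 1.0 regardless of its raw posterior, which is what makes per-keyword thresholds comparable.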
This paper compares sub-word-based methods for the spoken term detection (STD) task and phone recognition. Sub-word units are needed to search for out-of-vocabulary words. We compared words, phones, and multigrams. The maximal length and pruning of multigrams were investigated first. Then two constrained methods of multigram training were…
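Multigrams are variable-length phone sequences treated as units; segmenting a phone string into multigrams can be done with a Viterbi search over a unit inventory. The sketch below is a generic illustration of that idea under an assumed inventory of unit log-probabilities; the function name, `max_len` parameter, and toy inventory are illustrative, not the paper's training procedure.

```python
import math

def viterbi_multigram_segmentation(phones, unit_logprob, max_len=4):
    """Segment a phone sequence into multigram units by Viterbi search.

    unit_logprob: dict mapping a tuple of phones to its log-probability.
    max_len: maximal multigram length (the bound the abstract mentions
    investigating). Returns the best-scoring segmentation, or None.
    """
    n = len(phones)
    best = [(-math.inf, None)] * (n + 1)  # (score, backpointer)
    best[0] = (0.0, None)
    for end in range(1, n + 1):
        for start in range(max(0, end - max_len), end):
            unit = tuple(phones[start:end])
            lp = unit_logprob.get(unit)
            if lp is None or best[start][0] == -math.inf:
                continue
            score = best[start][0] + lp
            if score > best[end][0]:
                best[end] = (score, start)
    # Backtrack from the end of the sequence.
    segs, end = [], n
    while end > 0:
        start = best[end][1]
        if start is None:
            return None  # no segmentation covers the whole string
        segs.append(tuple(phones[start:end]))
        end = start
    return segs[::-1]

# Toy inventory of multigram units with assumed probabilities.
inv = {("h", "e"): math.log(0.5),
       ("l",): math.log(0.2),
       ("l", "o"): math.log(0.3)}
segs = viterbi_multigram_segmentation(["h", "e", "l", "l", "o"], inv)
```

Pruning, in this setting, amounts to dropping low-probability units from the inventory between training iterations.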
In this paper, we describe the "Query by Example Search on Speech Task" (QUESST, formerly SWS, "Spoken Web Search"), held as part of the MediaEval 2014 evaluation campaign. As in previous years, the proposed task requires performing language-independent audio search in a low-resource scenario. This year, the task has been designed to get as close as…
We submitted two approaches as the required runs for the Spoken Web Search task: Acoustic Keyword Spotting (AKWS) as the primary one and Dynamic Time Warping (DTW) as the secondary one. We aimed at building a simple phone-based language-dependent system. We experimented with a universal context bottleneck neural network classifier with 3-state phone…
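The DTW approach mentioned above aligns a spoken query against a search utterance frame by frame. The sketch below shows plain DTW on scalar sequences to keep it short; real query-by-example systems compare frame-level phone-posterior vectors with a distance such as cosine or negative log dot-product, and the distance function here is an illustrative stand-in.

```python
import math

def dtw_distance(query, utterance, dist=lambda a, b: abs(a - b)):
    """Dynamic Time Warping cost between two sequences.

    Builds the standard cumulative-cost matrix with insertion,
    deletion, and match moves, and returns the total alignment cost.
    """
    n, m = len(query), len(utterance)
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            c = dist(query[i - 1], utterance[j - 1])
            D[i][j] = c + min(D[i - 1][j],      # skip a query frame
                              D[i][j - 1],      # skip an utterance frame
                              D[i - 1][j - 1])  # match both frames
    return D[n][m]

# Identical sequences align with zero cost; a stretched copy
# also aligns at zero cost because frames may repeat.
cost_same = dtw_distance([1, 2, 3], [1, 2, 3])       # → 0.0
cost_stretched = dtw_distance([1, 2, 3], [1, 2, 2, 3])  # → 0.0
```

For spotting, the same recurrence is typically run in "subsequence" mode, where the query may start and end anywhere in the utterance.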
The thesis investigates keyword spotting and spoken term detection (STD), which are considered subsets of spoken document retrieval. It deals with two-phase approaches, where speech is first processed by a speech recognizer and the search for queries is performed in the output of this recognizer. A standard large vocabulary continuous speech recognizer…
We present the three approaches submitted to the Spoken Web Search task. Two of them rely on Acoustic Keyword Spotting (AKWS), while the other relies on Dynamic Time Warping (DTW). Features are 3-state phone posteriors. Results suggest that applying a Karhunen-Loève transform to the log-phone posteriors representing the query to build a GMM/HMM for each query and a…