This paper describes several approaches to keyword spotting (KWS) for informal continuous speech. We compare acoustic keyword spotting, spotting in word lattices generated by large-vocabulary continuous speech recognition, and a hybrid approach making use of phoneme lattices generated by a phoneme recognizer. The systems are compared on carefully defined …
This paper describes several approaches to keyword spotting (KWS) based on Gaussian mixture (GM) hidden Markov models (HMMs). Context-independent and context-dependent phoneme models are used in our system. The system was trained and evaluated on informal continuous speech. We used different complexities of KWS recognition networks and different types of phoneme …
We present two techniques that are shown to yield improved Keyword Spotting (KWS) performance when using the ATWV/MTWV performance measures: (i) score normalization, where the scores of different keywords become commensurate with each other and more closely correspond to the probability of being correct than raw posteriors; and (ii) system combination, …
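The exact normalization used in the paper is not given in this abstract; as a minimal sketch of the general idea, one common variant rescales each keyword's detection scores to sum to one, so that scores of different keywords become commensurate (the function name and data layout here are hypothetical):

```python
def normalize_scores(detections):
    """Per-keyword sum-to-one score normalization (illustrative sketch).

    detections: dict mapping keyword -> list of raw posterior scores
    Returns a dict with each keyword's scores rescaled to sum to 1,
    making scores of different keywords directly comparable when a
    single global detection threshold is applied.
    """
    normalized = {}
    for kw, scores in detections.items():
        total = sum(scores)
        normalized[kw] = [s / total if total > 0 else 0.0 for s in scores]
    return normalized
```

With such a normalization, a rare keyword whose raw posteriors are uniformly low is no longer systematically suppressed relative to frequent keywords when one threshold is shared across the whole keyword list.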
This paper compares sub-word-based methods for the spoken term detection (STD) task and phone recognition. Sub-word units are needed to search for out-of-vocabulary words. We compared words, phones, and multigrams. The maximal length and pruning of multigrams were investigated first. Then two constrained methods of multigram training were …
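The training methods themselves are not detailed in this abstract; as a rough sketch of how a maximal-length constraint enters multigram decoding, the following hypothetical function segments a phone sequence into multigrams by dynamic programming, considering only units up to `max_len` phones (the lexicon and probabilities are illustrative, not the paper's):

```python
import math

def segment_multigrams(phones, lexicon, max_len=4):
    """Viterbi segmentation of a phone sequence into multigrams.

    phones:  list of phone symbols
    lexicon: dict mapping a multigram (tuple of phones) -> probability
    max_len: maximal multigram length (the length constraint)
    Returns the most likely segmentation as a list of multigrams,
    or None if the sequence cannot be covered by the lexicon.
    """
    n = len(phones)
    best = [float("-inf")] * (n + 1)  # best log-prob of each prefix
    best[0] = 0.0
    back = [None] * (n + 1)           # length of last multigram used
    for i in range(1, n + 1):
        for l in range(1, min(max_len, i) + 1):
            unit = tuple(phones[i - l:i])
            if unit in lexicon:
                score = best[i - l] + math.log(lexicon[unit])
                if score > best[i]:
                    best[i] = score
                    back[i] = l
    if back[n] is None:
        return None
    segs, i = [], n
    while i > 0:                      # backtrack through the lattice
        l = back[i]
        segs.append(tuple(phones[i - l:i]))
        i -= l
    return segs[::-1]
```

Lowering `max_len` prunes long units from consideration, which is the kind of constraint whose effect on STD accuracy the paper investigates.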
This paper deals with a hybrid word-subword recognition system for spoken term detection. The decoding is driven by a hybrid recognition network, and the decoder directly produces hybrid word-subword lattices. One phone model and two multigram models were tested to represent sub-word units. The systems were evaluated in terms of spoken term detection accuracy and …
In this paper, we describe the “Query by Example Search on Speech Task” (QUESST, formerly SWS, “Spoken Web Search”), held as part of the MediaEval 2014 evaluation campaign. As in previous years, the proposed task requires performing language-independent audio search in a low-resource scenario. This year, the task has been designed to get as close as …
In this paper, we describe the “Spoken Web Search” Task, which is being held as part of the 2013 MediaEval campaign. The purpose of this task is to perform audio search in multiple languages and acoustic conditions, with very few resources available for each individual language. This year the data contains audio from nine different languages and is …
This paper summarizes our work for the MediaEval 2013 Spoken Web Search task evaluations. The task was Query-by-Example (search for spoken queries within spoken data). We submitted a system composed of 26 subsystems, of which 13 are based on Acoustic Keyword Spotting and 13 on Dynamic Time Warping. All of them use three-state phoneme posteriors as input features. …
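The DTW subsystems are not specified further in this abstract; a minimal sketch of the usual setup for query-by-example search is subsequence DTW over posterior vectors, with a free start and end on the utterance side so the query can match any contiguous region (the local distance and normalization below are common choices, not necessarily the paper's):

```python
import math

def subseq_dtw(query, utt):
    """Subsequence DTW between a query and an utterance, both given as
    sequences of posterior vectors (one vector per frame).

    Local distance: -log of the dot product of two posterior vectors,
    floored to avoid log(0). The start and end of the alignment are
    free on the utterance side, so the query may match any contiguous
    region. Returns a per-query-frame cost (lower = better match).
    """
    def dist(p, q):
        return -math.log(max(sum(a * b for a, b in zip(p, q)), 1e-10))

    nq, nu = len(query), len(utt)
    INF = float("inf")
    D = [[INF] * (nu + 1) for _ in range(nq + 1)]
    for j in range(nu + 1):
        D[0][j] = 0.0  # free start anywhere in the utterance
    for i in range(1, nq + 1):
        for j in range(1, nu + 1):
            D[i][j] = dist(query[i - 1], utt[j - 1]) + min(
                D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return min(D[nq][1:]) / nq  # free end; normalize by query length
```

In practice the resulting costs are turned into detection scores and calibrated across queries, which is where score normalization (as in the systems above) comes in.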
Features based on a hierarchy of neural networks with compressive layers – Stacked Bottle-Neck (SBN) features – were recently shown to provide excellent performance in LVCSR systems. This paper summarizes several techniques investigated in our work towards the Babel 2014 evaluations: (1) using several versions of fundamental frequency (F0) estimates, (2) …