Constructing sub-word units for spoken term detection

Charl Johannes van Heerden, Damianos G. Karakos, Karthik Narasimhan, Marelie Hattingh Davel, Richard M. Schwartz
2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Spoken term detection, especially of out-of-vocabulary (OOV) keywords, benefits from the use of sub-word systems. We experiment with different language-independent approaches to sub-word unit generation, generating both syllable-like and morpheme-like units, and demonstrate how the performance of syllable-like units can be improved by artificially increasing the number of unique units. The effect of unit choice is empirically evaluated using the eight languages from the 2016 IARPA BABEL…

On the Use of Grapheme Models for Searching in Large Spoken Archives
This paper explores the use of grapheme-based word and sub-word models for spoken term detection (STD) and achieves STD performance comparable to phoneme-based models, without the additional burden of G2P conversion.
ALBAYZIN 2018 spoken term detection evaluation: a multi-domain international evaluation in Spanish
The obtained results suggest that the STD task is still in progress and performance is highly sensitive to changes in the data domain.
In this paper we describe the 2016 BBN conversational telephone speech keyword spotting system, the culmination of four years of research and development under the IARPA Babel program. The system was…
ODSQA: Open-Domain Spoken Question Answering Dataset
This paper releases Open-Domain Spoken Question Answering Dataset (ODSQA), the largest real SQA dataset, and finds that ASR errors have catastrophic impact on SQA, and that data augmentation on text-based QA training examples can improve SQA.
The 2016 BBN Georgian telephone speech keyword spotting system
Describes the 2016 BBN conversational telephone speech keyword spotting system, the culmination of four years of research and development under the IARPA Babel program, and presents the technological breakthroughs in building top-performing keyword spotting systems for new languages.
Mitigating the Impact of Speech Recognition Errors on Spoken Question Answering by Adversarial Domain Adaptation
This work proposes to mitigate the ASR errors by aligning the mismatch between ASR hypotheses and their corresponding reference transcriptions by applying an adversarial model to this domain adaptation task.
Efficient query-by-example spoken document retrieval combining phone multigram representation and dynamic time warping
Experiments performed on the MediaEval 2014 Query-by-Example Search on Speech (QUESST 2014) evaluation framework suggest that the phone multigram representation for QbESDR is a successful approach, and the assessed combinations with a DTW-based strategy lead to more efficient and effective QbESDR systems.
Research on Chinese New Word Recognition Method
A major bottleneck in Chinese word segmentation technology is the lack of recognition of OOV words, particularly in a specific field. In this paper, an unsupervised method based on the combination of…
A Feature-granularity Training Strategy for Chinese Spoken Question Answering (基於特徵粒度之訓練策略於中文口語問答系統之應用)
In a spoken question answering (SQA) system, a straightforward strategy is to transcribe given speech utterances into text using an ASR system. After that, classic methods can be readily used to the…
Induced Inflection-Set Keyword Search in Speech
This work provides a recipe and evaluation set for the community to use as an extrinsic measure of the performance of inflection generation approaches and indicates how lexeme-set search performance changes with the number of hypothesized inflections.


Cross-word sub-word units for low-resource keyword spotting
This work investigates the use of sub-word lexical units for the detection of out-of-vocabulary (OOV) keywords in the keyword spotting task and demonstrates that cross-word sub-word units achieve performance on OOV keywords similar to that of other types of sub-word units, but can be combined with them to produce further gains.
Using Pronunciation-Based Morphological Subword Units to Improve OOV Handling in Keyword Search
This paper systematically investigates morphology-based subword modeling approaches on seven low-resource languages and shows that using morphological subword units (morphs) in speech recognition decoding is substantially better than expanding word-decoded lattices into subword units including phones, syllables and morphs.
Towards using hybrid word and fragment units for vocabulary independent LVCSR systems
It is shown that a hybrid system which combines words and data-driven, variable-length sub-word units has better phone accuracy than word-only systems and is better at detecting out-of-vocabulary (OOV) terms and representing them phonetically.
Code-switched English Pronunciation Modeling for Swahili Spoken Term Detection
Modelling strategies for English code-switched words as found in a Swahili spoken term detection system are investigated to significantly improve the detection performance of these words.
Comparing decoding strategies for subword-based keyword spotting in low-resourced languages
This paper investigates the use of subword lexical units for keyword spotting and finds that ignoring word boundaries improves the detection of OOV keywords without significantly impacting in-vocabulary keyword detection.
Subword and phonetic search for detecting out-of-vocabulary keywords
Syllable units are the best of the subword units for OOV keyword detection using fuzzy phonetic search, and these methods combine very well, sometimes resulting in ATWV scores for OOV terms that are not far below those of IV terms.
A new method for OOV detection using hybrid word/fragment system
A new method for detecting regions with out-of-vocabulary words in the output of a large vocabulary continuous speech recognition (LVCSR) system that outperforms existing methods published in the literature.
Subword speech recognition for detection of unseen words
Experiments show that the proposed subword recognizer outperforms other subword systems in terms of phonetic keyword search accuracy measured on queries that consist of words not present in the training data.
Improvements on transducing syllable lattice to word lattice for keyword search
Proposes a weighted finite state transducer (WFST) based syllable decoding and transduction method for keyword search (KWS), and compares it in detail with sub-word search and phone confusion methods.
Analysis of keyword spotting performance across IARPA babel languages
This work demonstrates that ATWV is keyword dependent, and that this must be accounted for in any cross-language analysis, and shows that while performance across languages does not track with any particular feature of the language, it is correlated with inter-annotator agreement.