Constructing sub-word units for spoken term detection

@article{Heerden2017ConstructingSU,
  title={Constructing sub-word units for spoken term detection},
  author={Charl Johannes van Heerden and Damianos G. Karakos and Karthik Narasimhan and Marelie Hattingh Davel and Richard M. Schwartz},
  journal={2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  year={2017},
  pages={5780--5784}
}
Spoken term detection, especially of out-of-vocabulary (OOV) keywords, benefits from the use of sub-word systems. We experiment with different language-independent approaches to sub-word unit generation, generating both syllable-like and morpheme-like units, and demonstrate how the performance of syllable-like units can be improved by artificially increasing the number of unique units. The effect of unit choice is empirically evaluated using the eight languages from the 2016 IARPA BABEL… 

On the Use of Grapheme Models for Searching in Large Spoken Archives

This paper explores the use of grapheme-based word and sub-word models for spoken term detection (STD) and achieves STD performance comparable to that of phoneme-based models, without the additional burden of G2P conversion.

Induced Inflection-Set Keyword Search in Speech

This work provides a recipe and evaluation set for the community to use as an extrinsic measure of the performance of inflection generation approaches and indicates how lexeme-set search performance changes with the number of hypothesized inflections.

Spoken Term Detection and Relevance Score Estimation Using Dot-Product of Pronunciation Embeddings

A novel approach to spoken term detection (STD) in large spoken archives using deep LSTM networks is presented. It builds on a previous approach that used Siamese neural networks for STD and naturally extends it to directly localize a spoken term and estimate its relevance score.

Deep LSTM Spoken Term Detection using Wav2Vec 2.0 Recognizer

A bootstrapping approach that allows the transfer of the knowledge contained in traditional pronunciation vocabulary of DNN-HMM hybrid ASR into the context of grapheme-based Wav2Vec in the task of spoken term detection over a large set of spoken documents is described.

ALBAYZIN 2018 spoken term detection evaluation: a multi-domain international evaluation in Spanish

The obtained results suggest that the STD task is still in progress and performance is highly sensitive to changes in the data domain.

SPEECH KEYWORD SPOTTING SYSTEM

In this paper we describe the 2016 BBN conversational telephone speech keyword spotting system; the culmination of four years of research and development under the IARPA Babel program. The system was

ODSQA: Open-Domain Spoken Question Answering Dataset

This paper releases the Open-Domain Spoken Question Answering Dataset (ODSQA), the largest real SQA dataset, and finds that ASR errors have a catastrophic impact on SQA and that data augmentation on text-based QA training examples can improve SQA.

The 2016 BBN Georgian telephone speech keyword spotting system

This paper describes the 2016 BBN conversational telephone speech keyword spotting system, the culmination of four years of research and development under the IARPA Babel program, and presents the technological breakthroughs in building top-performing keyword spotting processing systems for new languages.

Mitigating the Impact of Speech Recognition Errors on Spoken Question Answering by Adversarial Domain Adaptation

This work proposes to mitigate the ASR errors by aligning the mismatch between ASR hypotheses and their corresponding reference transcriptions by applying an adversarial model to this domain adaptation task.

References

Cross-word sub-word units for low-resource keyword spotting

This work investigates the use of sub-word lexical units for the detection of out-of-vocabulary (OOV) keywords in the keyword spotting task and demonstrates that cross-word sub-word units achieve similar performance on OOV keywords as other types of sub-word units, but can be combined with them to produce further gains.

Using Pronunciation-Based Morphological Subword Units to Improve OOV Handling in Keyword Search

This paper systematically investigates morphology-based subword modeling approaches on seven low-resource languages and shows that using morphological subword units (morphs) in speech recognition decoding is substantially better than expanding word-decoded lattices into subword units, including phones, syllables and morphs.

Towards using hybrid word and fragment units for vocabulary independent LVCSR systems

It is shown that a hybrid system that combines words and data-driven, variable-length subword units has better phone accuracy than word-only systems and is better at detecting out-of-vocabulary (OOV) terms and representing them phonetically.

Comparing decoding strategies for subword-based keyword spotting in low-resourced languages

This paper investigates the use of subword lexical units for keyword spotting and finds that ignoring word boundaries improves the detection of OOV keywords without significantly impacting in-vocabulary keyword detection.

Subword and phonetic search for detecting out-of-vocabulary keywords

The syllable units are the best of the subword units for OOV keyword detection using fuzzy phonetic search, and these methods combine very well, sometimes resulting in ATWV scores for OOV terms which are not too far below those of IV terms.

A new method for OOV detection using hybrid word/fragment system

A new method for detecting regions with out-of-vocabulary words in the output of a large vocabulary continuous speech recognition (LVCSR) system that outperforms existing methods published in the literature.

Subword speech recognition for detection of unseen words

Experiments show that the proposed subword recognizer outperforms other subword systems in terms of phonetic keyword search accuracy measured on queries that consist of words not present in the training data.

Improvements on transducing syllable lattice to word lattice for keyword search

A weighted finite state transducer (WFST) based syllable decoding and transduction method for keyword search (KWS) is proposed and compared in detail with sub-word search and phone confusion methods.

Analysis of keyword spotting performance across IARPA babel languages

This work demonstrates that ATWV is keyword dependent, and that this must be accounted for in any cross-language analysis, and shows that while performance across languages does not track with any particular feature of the language, it is correlated with inter-annotator agreement.