The Kaldi OpenKWS System: Improving Low Resource Keyword Search
@inproceedings{Trmal2017TheKO,
title={The Kaldi OpenKWS System: Improving Low Resource Keyword Search},
author={Jan Trmal and Matthew Wiesner and Vijayaditya Peddinti and Xiaohui Zhang and Pegah Ghahremani and Yiming Wang and Vimal Manohar and Hainan Xu and Daniel Povey and Sanjeev Khudanpur},
booktitle={INTERSPEECH},
year={2017}
}
The IARPA BABEL program has stimulated worldwide research in keyword search technology for low resource languages, and the NIST OpenKWS evaluations are the de facto benchmark test for such capabilities. The 2016 OpenKWS evaluation featured Georgian speech, and had 10 participants from across the world. This paper describes the Kaldi system developed to assist IARPA in creating a competitive baseline against which participants were evaluated, and to provide a truly open source system to all…
31 Citations
ALBAYZIN 2018 spoken term detection evaluation: a multi-domain international evaluation in Spanish
- Computer Science
- EURASIP J. Audio Speech Music. Process.
- 2019
The obtained results suggest that the STD task is still in progress and performance is highly sensitive to changes in the data domain.
A General Procedure for Improving Language Models in Low-Resource Speech Recognition
- Computer Science
- 2019 International Conference on Asian Language Processing (IALP)
- 2019
Pre-trained word vectors using out-of-domain data are utilized to improve the performance of RNN/LSTM LMs for rescoring first-pass decoding results and, after improving LMs, 5.4-7.6% relative reduction of word error rate (WER) is generally achieved compared to the baseline ASR systems.
Keyword Spotting With Audio And Text Embeddings
- 2019
Keyword Spotting (KWS) systems allow detecting a set of spoken (pre-defined) keywords. Open-vocabulary KWS systems search for the keywords in the set of word hypotheses generated by an automatic…
Feature learning for efficient ASR-free keyword spotting in low-resource languages
- Computer Science, Engineering
- Comput. Speech Lang.
- 2021
The CNN-DTW keyword spotter using BNF-derived CAE features represents an efficient approach with competitive performance suited to rapid deployment in a severely under-resourced scenario.
The MIT Lincoln Laboratory / JHU / EPITA-LSE LRE17 System
- Computer Science, Engineering
- Odyssey
- 2018
The MITLL/JHU LRE17 submission represents a collaboration between researchers at MITLL and JHU with multiple sub-systems reflecting a range of language recognition technologies, including traditional MFCC/SDC i-vector systems, deep neural network (DNN) bottleneck feature based i-vector systems, state-of-the-art DNN x-vector systems and a sparse coding system.
Multilingual ASR with Massive Data Augmentation
- Computer Science, Engineering
- ArXiv
- 2019
This work presents a single grapheme-based ASR model learned on 7 geographically proximal languages, using standard hybrid BLSTM-HMM acoustic models with lattice-free MMI objective and evaluates the efficacy of multiple data augmentation alternatives within language, as well as their complementarity with multilingual modeling.
Multilingual Graphemic Hybrid ASR with Massive Data Augmentation
- Computer Science
- SLTU
- 2020
This work presents a single grapheme-based ASR model learned on 7 geographically proximal languages, using standard hybrid BLSTM-HMM acoustic models with lattice-free MMI objective and evaluates the efficacy of multiple data augmentation alternatives within language, as well as their complementarity with multilingual modeling.
The JHU Speech LOREHLT 2017 System: Cross-Language Transfer for Situation-Frame Detection
- Computer Science
- ArXiv
- 2018
A language agnostic approach combining universal acoustic modeling, evaluation-language-to-English machine translation (MT) and an English-language topic classifier is presented, which requires no transcribed speech in the given evaluation language, nor even in a related language.
An Investigative Study of Multi-Modal Cross-Lingual Retrieval
- Computer Science
- CLSSTS
- 2020
This paper focuses on the use case of retrieving text and speech documents in Swahili using English queries, which was the main focus of the OpenCLIR shared task, and develops separate components for automatic translation (AT), speech processing (SP) and information retrieval (IR).
On the Use of Grapheme Models for Searching in Large Spoken Archives
- Computer Science
- 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2018
This paper explores the possibility of using grapheme-based word and sub-word models in the task of spoken term detection (STD) and achieves STD performance comparable with phoneme-based models but without the additional burden of G2P conversion.
References
A keyword search system using open source software
- Computer Science
- 2014 IEEE Spoken Language Technology Workshop (SLT)
- 2014
Provides an overview of a speech-to-text (STT) and keyword search (KWS) system architecture built primarily on top of the Kaldi toolkit and expands on a few highlights. The system was developed…
Score normalization and system combination for improved keyword spotting
- Computer Science
- 2013 IEEE Workshop on Automatic Speech Recognition and Understanding
- 2013
Two techniques are shown to yield improved Keyword Spotting (KWS) performance when using the ATWV/MTWV performance measures, which resulted in the highest performance for the official surprise language evaluation for the IARPA-funded Babel project in April 2013.
The TAO of ATWV: Probing the mysteries of keyword search performance
- Computer Science
- 2013 IEEE Workshop on Automatic Speech Recognition and Understanding
- 2013
This analysis quantifies the potential ATWV gains from improving the number of true hits and the overall quality of the detection scores in the authors' system's posting lists and shows that system combination improves their systems' ATWVs via a small increase in the number of true hits in the posting lists.
The 2016 BBN Georgian telephone speech keyword spotting system
- Computer Science
- 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2017
This paper describes the 2016 BBN conversational telephone speech keyword spotting system, the culmination of four years of research and development under the IARPA Babel program, and presents the technological breakthroughs in building top-performing keyword spotting systems for new languages.
Using proxies for OOV keywords in the keyword search task
- Computer Science
- 2013 IEEE Workshop on Automatic Speech Recognition and Understanding
- 2013
Empirical results when searching for the Babel/NIST evaluation keywords in the Babel 10 hour development-test speech collection show that searching for word proxies in the word index significantly outperforms searching for phonetic representations of OOV words in a phone index.
Multilingual representations for low resource speech recognition and keyword search
- Computer Science
- 2015 IEEE Workshop on Automatic Speech Recognition and Understanding (ASRU)
- 2015
This paper examines the impact of multilingual acoustic representations on Automatic Speech Recognition (ASR) and keyword search (KWS) for low resource languages in the context of the OpenKWS15 evaluation of the IARPA Babel program and shows that these multilingual representations significantly improve ASR and KWS performance.
Syllable based keyword search: Transducing syllable lattices to word lattices
- Computer Science
- 2014 IEEE Spoken Language Technology Workshop (SLT)
- 2014
This paper presents a weighted finite state transducer (WFST) based syllable decoding and transduction framework for keyword search (KWS), and shows that this method can effectively perform KWS on both IV and OOV keywords, and yields up to 0.03 Actual Term-Weighted Value (ATWV) improvement over searching keywords directly in subword lattices.
Purely Sequence-Trained Neural Networks for ASR Based on Lattice-Free MMI
- Computer Science
- INTERSPEECH
- 2016
A method to perform sequence-discriminative training of neural network acoustic models without the need for frame-level cross-entropy pre-training is described, using the lattice-free version of the maximum mutual information (MMI) criterion: LF-MMI.
Unicode-based graphemic systems for limited resource languages
- Computer Science
- 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2015
This paper proposes a simple approach for building graphemic systems for any language written in Unicode, where the attributes for graphemes are automatically derived using features from the Unicode character descriptions in decision tree construction.
The Kaldi Speech Recognition Toolkit
- Computer Science
- 2011
The design of Kaldi is described, a free, open-source toolkit for speech recognition research that provides a speech recognition system based on finite-state automata together with detailed documentation and a comprehensive set of scripts for building complete recognition systems.


