Timers and Such: A Practical Benchmark for Spoken Language Understanding with Numbers
@article{Lugosch2021TimersAS,
  title   = {Timers and Such: A Practical Benchmark for Spoken Language Understanding with Numbers},
  author  = {Loren Lugosch and Piyush Papreja and Mirco Ravanelli and Abdelwahab Heba and Titouan Parcollet},
  journal = {ArXiv},
  year    = {2021},
  volume  = {abs/2104.01604}
}
This paper introduces Timers and Such, a new open-source dataset of spoken English commands for common voice-control use cases involving numbers. We describe the gap in existing spoken language understanding (SLU) datasets that Timers and Such fills, the design and creation of the dataset, and experiments with several ASR-based and end-to-end baseline models; the code for these baselines is available as part of the SpeechBrain toolkit.
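The dataset targets intent classification over spoken commands that contain numbers. As a rough illustration of the task's output format, here is a toy rule-based parser; the intent names (SetTimer, SetAlarm, SimpleMath, UnitConversion) follow the use cases the abstract describes, but the keyword rules and dictionary structure are hypothetical simplifications, not the paper's models:

```python
import re

# Hypothetical keyword rules keyed by intent labels modeled on the dataset's
# timer / alarm / simple-math / unit-conversion use cases.
INTENT_KEYWORDS = {
    "SetTimer": ("timer",),
    "SetAlarm": ("alarm", "wake"),
    "SimpleMath": ("plus", "minus", "divided"),
    "UnitConversion": ("convert",),
}

def parse_command(transcript: str) -> dict:
    """Toy rule-based SLU: map a transcript to an intent plus any numbers."""
    text = transcript.lower()
    intent = next(
        (name for name, kws in INTENT_KEYWORDS.items()
         if any(kw in text for kw in kws)),
        "Unknown",
    )
    numbers = [float(n) for n in re.findall(r"\d+(?:\.\d+)?", text)]
    return {"intent": intent, "numbers": numbers}

print(parse_command("set a timer for 5 minutes"))
# {'intent': 'SetTimer', 'numbers': [5.0]}
```

The baselines in the paper replace these hand-written rules with learned ASR-based and end-to-end models, but the intent-plus-values output shape is the same kind of target.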
6 Citations
SpeechBrain: A General-Purpose Speech Toolkit
- Computer Science, ArXiv
- 2021
The core architecture of SpeechBrain is described, designed to support several tasks of common interest, allowing users to naturally conceive, compare and share novel speech processing pipelines.
Match to Win: Analysing Sequences Lengths for Efficient Self-Supervised Learning in Speech and Audio
- Computer Science, 2022 IEEE Spoken Language Technology Workshop (SLT)
- 2023
This paper provides the first empirical study of SSL pre-training at different specified sequence lengths, linked to various downstream tasks, and finds that training on short sequences can dramatically reduce resource costs while retaining satisfactory performance on all tasks.
SLUE Phase-2: A Benchmark Suite of Diverse Spoken Language Understanding Tasks
- Computer Science, ArXiv
- 2022
This work introduces several new annotated SLU benchmark tasks based on freely available speech data, which complement existing benchmarks and address gaps in the SLU evaluation landscape.
The Spoken Language Understanding MEDIA Benchmark Dataset in the Era of Deep Learning: data updates, training and evaluation tools
- Computer Science, LREC
- 2022
A complete recipe for the French MEDIA SLU dataset, including data preparation, training, and evaluation scripts, has been built and integrated into SpeechBrain, a popular all-in-one open-source conversational AI toolkit based on PyTorch.
Finstreder: Simple and fast Spoken Language Understanding with Finite State Transducers using modern Speech-to-Text models
- Computer Science, ArXiv
- 2022
This paper presents a simple method for embedding intents and entities into Finite State Transducers, and, in combination with a pretrained general-purpose Speech-to-Text model, allows building SLU-models without any additional training.
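The core idea above is to express intents as paths through a finite-state machine over recognized words. A minimal sketch of that idea follows; the state names and transition table are invented for illustration (finstreder itself compiles intents and entities into full finite state transducers composed with a pretrained speech-to-text model):

```python
# Toy finite-state acceptor over word sequences. Transitions map
# (state, word) -> next state, or to ("accept", intent) on a final arc.
TRANSITIONS = {
    ("q0", "set"): "q1",
    ("q1", "timer"): ("accept", "SetTimer"),
    ("q1", "alarm"): ("accept", "SetAlarm"),
}

def run_fsa(words):
    """Walk the transition table; return the accepted intent or None."""
    state = "q0"
    for w in words:
        nxt = TRANSITIONS.get((state, w))
        if nxt is None:
            return None          # no arc for this word: reject
        if isinstance(nxt, tuple):
            return nxt[1]        # reached an accepting arc
        state = nxt
    return None                  # ran out of words before accepting

print(run_fsa(["set", "timer"]))  # SetTimer
print(run_fsa(["play", "music"]))  # None
```

Because the grammar is compiled ahead of time, no additional model training is needed to add a new command, which is the property the paper exploits.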
A Fine-tuned Wav2vec 2.0/HuBERT Benchmark For Speech Emotion Recognition, Speaker Verification and Spoken Language Understanding
- Computer Science, ArXiv
- 2021
This work explores partial and entire fine-tuning of wav2vec 2.0 and HuBERT pre-trained models for three non-ASR speech tasks: speech emotion recognition, speaker verification, and spoken language understanding.
References
SHOWING 1-10 OF 61 REFERENCES
SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing
- Computer Science, EMNLP
- 2018
SentencePiece, a language-independent subword tokenizer and detokenizer designed for Neural-based text processing, finds that it is possible to achieve comparable accuracy to direct subword training from raw sentences.
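Subword tokenizers of this kind split rare words into smaller vocabulary pieces. The sketch below illustrates the general idea with a greedy longest-match segmentation over a toy vocabulary; this is a deliberate simplification (SentencePiece itself trains BPE or unigram language-model segmentations directly from raw sentences):

```python
def subword_tokenize(word: str, vocab: set) -> list:
    """Greedy longest-match segmentation into subword pieces.
    Simplified illustration only, not SentencePiece's actual algorithm."""
    pieces, start = [], 0
    while start < len(word):
        end = len(word)
        # Shrink the candidate span until it matches a vocabulary piece.
        while end > start and word[start:end] not in vocab:
            end -= 1
        if end == start:
            return ["<unk>"]  # no piece covers this position
        pieces.append(word[start:end])
        start = end
    return pieces

vocab = {"time", "r", "s", "under", "stand", "ing"}
print(subword_tokenize("timers", vocab))         # ['time', 'r', 's']
print(subword_tokenize("understanding", vocab))  # ['under', 'stand', 'ing']
```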
SLURP: A Spoken Language Understanding Resource Package
- Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
- 2020
Spoken Language Understanding on the Edge
- Computer Science, 2019 Fifth Workshop on Energy Efficient Machine Learning and Cognitive Computing - NeurIPS Edition (EMC2-NIPS)
- 2019
The design of an embedded, private-by-design SLU system is outlined and it is shown that it has performance on-par with cloud-based commercial solutions.
Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks
- Computer Science, ICML
- 2006
This paper presents a novel method for training RNNs to label unsegmented sequences directly, removing the need for pre-segmented training data and external post-processing.
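CTC achieves this with a many-to-one mapping from per-frame outputs to a label sequence: repeated symbols are merged and a special blank symbol is removed. A minimal sketch of that collapse step (the blank symbol `-` is an arbitrary choice here):

```python
def ctc_collapse(frames, blank="-"):
    """Apply CTC's many-to-one mapping: merge adjacent repeats, drop blanks."""
    out = []
    prev = None
    for sym in frames:
        if sym != prev and sym != blank:
            out.append(sym)
        prev = sym  # track the previous frame to merge repeats
    return "".join(out)

# Many frame-level paths map to the same label sequence:
print(ctc_collapse(list("--hh-e-ll-ll--o-")))  # hello
```

The blank lets CTC emit genuinely repeated labels (e.g. the two l's in "hello") by separating them with a blank frame; training sums over all frame paths that collapse to the target.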
SpeechBrain: A General-Purpose Speech Toolkit
- Computer Science, ArXiv
- 2021
The core architecture of SpeechBrain is described, designed to support several tasks of common interest, allowing users to naturally conceive, compare and share novel speech processing pipelines.
A Streaming End-to-End Framework For Spoken Language Understanding
- Computer Science, IJCAI
- 2021
This paper proposes a streaming end-to-end framework that can process multiple intents in an online and incremental way, employing a unidirectional RNN trained with the connectionist temporal classification (CTC) criterion.
SPLAT: Speech-Language Joint Pre-Training for Spoken Language Understanding
- Computer Science, NAACL
- 2021
A novel semi-supervised learning framework, SPLAT, to jointly pre-train the speech and language modules and improves the previous state-of-the-art performance on the Spoken SQuAD dataset by more than 10%.
Integration of Pre-Trained Networks with Continuous Token Interface for End-to-End Spoken Language Understanding
- Computer Science, ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2022
This work proposes a simple and robust method for integrating an end-to-end SLU network through a novel interface, the Continuous Token Interface (CTI), and verifies that an NLU network pre-trained with a masked language model (MLM) objective can utilize the noisy textual representation produced by CTI.