Corpus ID: 233024867

Timers and Such: A Practical Benchmark for Spoken Language Understanding with Numbers

@article{Lugosch2021TimersAS,
  title={Timers and Such: A Practical Benchmark for Spoken Language Understanding with Numbers},
  author={Loren Lugosch and Piyush Papreja and Mirco Ravanelli and Abdelwahab Heba and Titouan Parcollet},
  journal={ArXiv},
  year={2021},
  volume={abs/2104.01604}
}
This paper introduces Timers and Such, a new open source dataset of spoken English commands for common voice control use cases involving numbers. We describe the gap in existing spoken language understanding datasets that Timers and Such fills, the design and creation of the dataset, and experiments with a number of ASR-based and end-to-end baseline models, the code for which has been made available as part of the SpeechBrain toolkit. 

SpeechBrain: A General-Purpose Speech Toolkit

The core architecture of SpeechBrain is described, designed to support several tasks of common interest, allowing users to naturally conceive, compare and share novel speech processing pipelines.

Match to Win: Analysing Sequences Lengths for Efficient Self-Supervised Learning in Speech and Audio

This paper provides the first empirical study of SSL pre-training at different specified sequence lengths, links this to various downstream tasks, and finds that training on short sequences can dramatically reduce resource costs while retaining satisfactory performance on all tasks.

SLUE Phase-2: A Benchmark Suite of Diverse Spoken Language Understanding Tasks

This work introduces several new annotated SLU benchmark tasks based on freely available speech data, which complement existing benchmarks and address gaps in the SLU evaluation landscape.

The Spoken Language Understanding MEDIA Benchmark Dataset in the Era of Deep Learning: data updates, training and evaluation tools

A complete recipe, including data preparation, training and evaluation scripts, has been built and integrated to SpeechBrain, an already popular open-source and all-in-one conversational AI toolkit based on PyTorch, for the use of the French MEDIA SLU dataset.

Finstreder: Simple and fast Spoken Language Understanding with Finite State Transducers using modern Speech-to-Text models

This paper presents a simple method for embedding intents and entities into Finite State Transducers, and, in combination with a pretrained general-purpose Speech-to-Text model, allows building SLU-models without any additional training.
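The idea of accepting a token sequence and emitting an intent can be sketched with a trie-style acceptor in plain Python. This is only an illustrative analogue, not finstreder's actual implementation (which uses true weighted Finite State Transducers); the patterns and intent names below are hypothetical.

```python
# Sketch of intent matching via a trie-style acceptor, in the spirit of
# embedding intents into an FST. Patterns and intent names are hypothetical.

def build_acceptor(patterns):
    """Build a trie of token sequences; terminal nodes carry the intent."""
    root = {}
    for tokens, intent in patterns:
        node = root
        for tok in tokens:
            node = node.setdefault(tok, {})
        node["<intent>"] = intent  # mark an accepting state
    return root

def match_intent(acceptor, tokens):
    """Walk the acceptor with the token sequence; return the intent or None."""
    node = acceptor
    for tok in tokens:
        if tok not in node:
            return None  # no transition for this token: reject
        node = node[tok]
    return node.get("<intent>")  # None if we stopped in a non-accepting state

patterns = [
    (("set", "timer"), "SetTimer"),
    (("what", "time", "is", "it"), "AskTime"),
]
acceptor = build_acceptor(patterns)
print(match_intent(acceptor, ("set", "timer")))  # SetTimer
```

A real FST would additionally carry weights and entity slots on its arcs and compose with the speech-to-text lattice rather than a single token sequence.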

A Fine-tuned Wav2vec 2.0/HuBERT Benchmark For Speech Emotion Recognition, Speaker Verification and Spoken Language Understanding

This work explores partial and entire fine-tuning of wav2vec 2.0 and HuBERT pre-trained models for three non-ASR speech tasks: Speech Emotion Recognition, Speaker Verification, and Spoken Language Understanding.

References

SHOWING 1-10 OF 61 REFERENCES

SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing

SentencePiece, a language-independent subword tokenizer and detokenizer designed for neural text processing, is shown to achieve accuracy comparable to conventional pipelines while training subword models directly from raw sentences.
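As a rough illustration of subword segmentation, the sketch below splits a word into pieces from a fixed vocabulary by greedy longest match. This is only a toy analogue: real SentencePiece learns its vocabulary from raw text with BPE or a unigram language model, and the vocabulary here is hypothetical.

```python
# Toy subword segmentation: greedy longest-match against a fixed vocabulary.
# Illustrative only; not SentencePiece's actual algorithm or vocabulary.

def segment(word, vocab, unk="<unk>"):
    """Split a word into the longest vocabulary pieces, left to right."""
    pieces, i = [], 0
    while i < len(word):
        for j in range(len(word), i, -1):  # try the longest piece first
            if word[i:j] in vocab:
                pieces.append(word[i:j])
                i = j
                break
        else:  # no vocabulary piece matches at this position
            pieces.append(unk)
            i += 1
    return pieces

vocab = {"time", "r", "s", "lang", "uage"}
print(segment("timers", vocab))    # ['time', 'r', 's']
print(segment("language", vocab))  # ['lang', 'uage']
```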

SLURP: A Spoken Language Understanding Resource Package

  • Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
  • 2020

Spoken Language Understanding on the Edge

  • Alaa Saade, A. Coucke, Maël Primet
  • Computer Science
    2019 Fifth Workshop on Energy Efficient Machine Learning and Cognitive Computing - NeurIPS Edition (EMC2-NIPS)
  • 2019
The design of an embedded, private-by-design SLU system is outlined, and it is shown to have performance on par with cloud-based commercial solutions.

Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks

This paper presents a novel method for training RNNs to label unsegmented sequences directly, thereby solving both problems of sequence learning and post-processing.

SpeechBrain: A General-Purpose Speech Toolkit

The core architecture of SpeechBrain is described, designed to support several tasks of common interest, allowing users to naturally conceive, compare and share novel speech processing pipelines.

Language Understanding

  • Education
    Encyclopedia of Autism Spectrum Disorders
  • 2021

A Streaming End-to-End Framework For Spoken Language Understanding

This paper proposes a streaming end-to-end framework that can process multiple intents in an online and incremental way, employing a unidirectional RNN trained with the connectionist temporal classification (CTC) criterion.

SPLAT: Speech-Language Joint Pre-Training for Spoken Language Understanding

A novel semi-supervised learning framework, SPLAT, to jointly pre-train the speech and language modules and improves the previous state-of-the-art performance on the Spoken SQuAD dataset by more than 10%.

Integration of Pre-Trained Networks with Continuous Token Interface for End-to-End Spoken Language Understanding

  • S. Seo, Donghyun Kwak, Bowon Lee
  • Computer Science
    ICASSP 2022 - 2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2022
This work proposes a simple and robust integration method for the E2E SLU network with a novel Interface, Continuous Token Interface (CTI), and verifies that the NLU network, pre-trained with Masked Language Model (MLM), can utilize a noisy textual representation of CTI.
...