End-to-End Spoken Language Understanding: Performance analyses of a voice command task in a low resource setting
@article{Desot2022EndtoEndSL,
  title={End-to-End Spoken Language Understanding: Performance analyses of a voice command task in a low resource setting},
  author={Thierry Desot and François Portet and Michel Vacher},
  journal={ArXiv},
  year={2022},
  volume={abs/2207.08179}
}
3 Citations
Toward Low-Cost End-to-End Spoken Language Understanding
- Computer Science · INTERSPEECH
- 2022
It is shown that the learning cost can be reduced while maintaining state-of-the-art performance using self-supervised learning (SSL) models, and an extensive analysis is proposed in which the cost of the models is measured in terms of training time and electric energy consumption.
Taxonomic Classification of IoT Smart Home Voice Control
- Computer Science · ArXiv
- 2022
A taxonomy of the voice control technologies present in commercial smart home systems is presented, and open-source libraries and devices that could support a cloud-free voice assistant are discussed.
COMPANIES' USAGE OF AI IN THE CZECH REPUBLIC
- Computer Science · 12
- 2022
The most used mechanism is image recognition, which is used by all of the sectors, and the least used are speech generation and machine learning.
References
Showing 10 of 74 references.
Corpus Generation for Voice Command in Smart Home and the Effect of Speech Synthesis on End-to-End SLU
- Computer Science · LREC
- 2020
This work presents the automatic generation process of a synthetic semantically-annotated corpus of French commands for smart-home to train pipeline and End-to-End (E2E) SLU models to jointly perform ASR and NLU.
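The generation process itself is not detailed in this summary. As a rough illustration only, here is a minimal sketch assuming a simple template grammar: the templates, the intent names (`set_device_on`, `set_device_off`) and the slot fillers below are hypothetical, not the authors' grammar. It expands French command patterns into (sentence, semantic annotation) pairs, which is the kind of text a speech synthesis step could then turn into training audio.

```python
from itertools import product

# Hypothetical templates: each French command pattern is paired with an intent.
TEMPLATES = [
    ("allume {device} {room}", "set_device_on"),
    ("éteins {device} {room}", "set_device_off"),
]

# Hypothetical slot fillers: surface form -> normalized slot value.
DEVICES = {"la lumière": "light", "la radio": "radio"}
ROOMS = {"dans la cuisine": "kitchen", "dans le salon": "living_room"}


def generate_corpus():
    """Expand every template with every combination of slot fillers."""
    corpus = []
    for (pattern, intent), (dev_text, dev_val), (room_text, room_val) in product(
        TEMPLATES, DEVICES.items(), ROOMS.items()
    ):
        sentence = pattern.format(device=dev_text, room=room_text)
        annotation = {"intent": intent, "device": dev_val, "room": room_val}
        corpus.append((sentence, annotation))
    return corpus


if __name__ == "__main__":
    for sentence, annotation in generate_corpus():
        print(sentence, annotation)
```

Because the annotation is produced together with the sentence, every generated example is labeled for free; two templates, two devices and two rooms already yield eight annotated commands.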
SLU for Voice Command in Smart Home: Comparison of Pipeline and End-to-End Approaches
- Computer Science · 2019 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)
- 2019
Results show that the E2E approach can reach performance similar to a state-of-the-art pipeline SLU despite a higher WER than the pipeline approach, and can benefit from artificially generated data to exhibit lower Concept Error Rates than the pipeline baseline for slot recognition.
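WER and Concept Error Rate are both commonly computed as an edit distance (substitutions + deletions + insertions) divided by the number of reference tokens, over words and over semantic concepts respectively. A minimal sketch assuming that standard definition follows; the example utterance and the concept labels are hypothetical. It shows how a transcription error can raise the WER while the concept sequence, and therefore the Concept Error Rate, stays correct.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion of a reference token
                curr[j - 1] + 1,         # insertion of a hypothesis token
                prev[j - 1] + (r != h),  # substitution (cost 0 on a match)
            ))
        prev = curr
    return prev[-1]


def error_rate(ref, hyp):
    """(S + D + I) / N, where N is the number of reference tokens."""
    if not ref:
        raise ValueError("reference must contain at least one token")
    return edit_distance(ref, hyp) / len(ref)


# Hypothetical example: one word is deleted, but all concepts are recovered.
ref_words = "allume la lumière dans la cuisine".split()
hyp_words = "allume lumière dans la cuisine".split()
ref_concepts = ["intent=set_device_on", "device=light", "room=kitchen"]
hyp_concepts = ["intent=set_device_on", "device=light", "room=kitchen"]

print(f"WER = {error_rate(ref_words, hyp_words):.3f}")        # WER = 0.167
print(f"CER = {error_rate(ref_concepts, hyp_concepts):.3f}")  # CER = 0.000
```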
Towards End-to-end Spoken Language Understanding
- Computer Science · 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2018
This study showed that the trained model can achieve reasonably good results and demonstrated that the model can capture semantic attention directly from the audio features.
Exploring ASR-free end-to-end modeling to improve spoken language understanding in a cloud-based dialog system
- Computer Science · 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU)
- 2017
This paper proposes an ASR-free, end-to-end (E2E) modeling approach to SLU for a cloud-based, modular spoken dialog system (SDS) and evaluates the effectiveness of the approach on crowdsourced data collected from non-native English speakers interacting with a conversational language learning application.
Speech To Semantics: Improve ASR and NLU Jointly via All-Neural Interfaces
- Computer Science · INTERSPEECH
- 2020
This work presents models that extract utterance intent directly from speech without intermediate text output, and contrasts these methods with a jointly trained end-to-end SLU model consisting of ASR and NLU subsystems connected by a neural-network-based interface instead of text, which produces transcripts as well as NLU interpretations.
Using Speech Synthesis to Train End-To-End Spoken Language Understanding Models
- Computer Science · ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2020
This work proposes a strategy to overcome the need to record in-domain speech for training E2E SLU models, in which speech synthesis is used to generate a large synthetic training dataset from several artificial speakers, and confirms the effectiveness of this approach with experiments on two open-source SLU datasets.
Towards End-to-End spoken intent recognition in smart home
- Computer Science · 2019 International Conference on Speech Technology and Human-Computer Dialogue (SpeD)
- 2019
Experiments on a corpus of voice commands acquired in a real smart home reveal that the state-of-the-art pipeline baseline is still superior to the E2E approach; however, using artificial data generation techniques, it is shown that significant improvement can be brought to the E2E model, enabling it to reach competitive performance.
Beyond ASR 1-best: Using word confusion networks in spoken language understanding
- Computer Science · Comput. Speech Lang.
- 2006
Learning Natural Language Understanding Systems from Unaligned Labels for Voice Command in Smart Homes
- Computer Science · 2019 IEEE International Conference on Pervasive Computing and Communications Workshops (PerCom Workshops)
- 2019
This paper proposes to use a sequence-to-sequence neural architecture to train NLU models which do not need aligned data and can jointly learn the intent, slot-label and slot-value prediction tasks.
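The target format is not given in this summary. As a hypothetical illustration (the bracketed serialization and the intent and slot names are assumptions, not the authors' format), the sketch below shows why a sequence-to-sequence target needs no word alignment: the label sequence lists the intent and the normalized slot values without referring to word positions, so the words "la lumière" can map to the value `light` without tagging each word individually.

```python
def to_target(annotation):
    """Serialize an utterance-level annotation into a flat target sequence.

    No word positions are stored, so no word-level (e.g. BIO) alignment
    is required. Values are assumed to contain no spaces or brackets.
    """
    tokens = [f"intent[{annotation['intent']}]"]
    for slot, value in annotation["slots"].items():
        tokens.append(f"{slot}[{value}]")
    return " ".join(tokens)


def from_target(target):
    """Parse a (predicted) target sequence back into intent and slot values."""
    intent, slots = None, {}
    for token in target.split():
        label, _, rest = token.partition("[")
        value = rest.rstrip("]")
        if label == "intent":
            intent = value
        else:
            slots[label] = value
    return {"intent": intent, "slots": slots}


# Hypothetical annotation for "allume la lumière dans la cuisine".
annotation = {"intent": "set_device_on", "slots": {"device": "light", "room": "kitchen"}}
target = to_target(annotation)
print(target)  # intent[set_device_on] device[light] room[kitchen]
```

A model trained on such targets predicts the intent, slot labels and slot values jointly as one output sequence, which matches the joint learning described above.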
Pretrained Semantic Speech Embeddings for End-to-End Spoken Language Understanding via Cross-Modal Teacher-Student Learning
- Computer Science · INTERSPEECH
- 2020
A novel training method is proposed that enables pretrained contextual embeddings to process acoustic features and is based on the teacher-student framework across speech and text modalities that aligns the acoustic and the semantic latent spaces.