The Norwegian Parliamentary Speech Corpus
@inproceedings{Solberg2022TheNP,
  title={The Norwegian Parliamentary Speech Corpus},
  author={Per Erik Solberg and Pablo Ortiz},
  booktitle={International Conference on Language Resources and Evaluation},
  year={2022}
}
The Norwegian Parliamentary Speech Corpus (NPSC) is a speech dataset with recordings of meetings from Stortinget, the Norwegian parliament. It is the first publicly available dataset of unscripted Norwegian speech designed for training automatic speech recognition (ASR) systems. The recordings are manually transcribed and annotated with language codes and speakers, and detailed metadata about the speakers is included. The transcriptions exist in both normalized and non-normalized…
One Citation
ParlaSpeech-HR - a Freely Available ASR Dataset for Croatian Bootstrapped from the ParlaMint Corpus
- Computer Science, PARLACLARIN
- 2022
This paper presents our bootstrapping efforts to produce the first large freely available Croatian automatic speech recognition (ASR) dataset, 1,816 hours in size, obtained from parliamentary…
References
BERT Attends the Conversation: Improving Low-Resource Conversational ASR
- Computer Science
- 2021
New, data-efficient training tasks for BERT models that improve the performance of automatic speech recognition (ASR) systems on conversational speech are proposed, and the performance of context-augmented rescoring methods is shown to depend strongly on the degree of spontaneity and the nature of the conversation.
Better Evaluation of ASR in Speech Translation Context Using Word Embeddings
- Computer Science, INTERSPEECH
- 2016
A simple extension of the WER metric is proposed that uses word embeddings to penalize substitution errors differently according to their context; it shows encouraging results in the evaluation of ASR in a spoken language translation context.
Towards acoustic model unification across dialects
- Computer Science, 2016 IEEE Spoken Language Technology Workshop (SLT)
- 2016
Two techniques are presented, distillation and multi-task learning (MTL). Both are shown to be superior to the jointly-trained model that is trained on all dialectal data, reducing word error rates by 4.2% and 0.6%, respectively.
Deep Speech 2 : End-to-End Speech Recognition in English and Mandarin
- Computer Science, ICML
- 2016
It is shown that an end-to-end deep learning approach can be used to recognize either English or Mandarin Chinese speech, two vastly different languages, and that it is competitive with the transcription of human workers when benchmarked on standard datasets.
Dialects in Norway: catching up with the rest of Europe?
- Linguistics
- 2009
Norway has sometimes been described as a sociolinguistic paradise with its abundant linguistic heterogeneity, both written and spoken. Dialect diversity has been and is still considerable…
Apertium: a free/open-source platform for rule-based machine translation
- Computer Science, Machine Translation
- 2011
The Apertium platform is summarised: the translation engine, the encoding of linguistic data, and the tools developed around the platform are discussed.
Scalable Modified Kneser-Ney Language Model Estimation
- Computer Science, ACL
- 2013
We present an efficient algorithm to estimate large modified Kneser-Ney models including interpolation. Streaming and sorting enables the algorithm to scale to much larger models by using a fixed…
Optuna: A Next-generation Hyperparameter Optimization Framework
- Computer Science, KDD
- 2019
New design criteria for next-generation hyperparameter optimization software are introduced, including a define-by-run API that allows users to construct the parameter search space dynamically, and an easy-to-setup, versatile architecture that can be deployed for various purposes.
Oversikt over innlesere i NB tale
- 2015
Table 03743: Pupils in primary and lower secondary school
- Statistics Norway
- 2020