• Corpus ID: 246285403

The Norwegian Parliamentary Speech Corpus

@inproceedings{Solberg2022TheNP,
  title={The Norwegian Parliamentary Speech Corpus},
  author={Per Erik Solberg and Pablo Ortiz},
  booktitle={International Conference on Language Resources and Evaluation},
  year={2022}
}
The Norwegian Parliamentary Speech Corpus (NPSC) is a speech dataset with recordings of meetings from Stortinget, the Norwegian parliament. It is the first publicly available dataset containing unscripted Norwegian speech designed for training automatic speech recognition (ASR) systems. The recordings are manually transcribed and annotated with language codes and speakers, and detailed metadata about the speakers is included. The transcriptions exist in both normalized and non-normalized… 

ParlaSpeech-HR - a Freely Available ASR Dataset for Croatian Bootstrapped from the ParlaMint Corpus

This paper presents our bootstrapping efforts of producing the first large freely available Croatian automatic speech recognition (ASR) dataset, 1,816 hours in size, obtained from parliamentary

References

BERT Attends the Conversation: Improving Low-Resource Conversational ASR

New, data-efficient training tasks for BERT models are proposed that improve the performance of automatic speech recognition (ASR) systems on conversational speech, and the performance of context-augmented rescoring methods is shown to depend strongly on the degree of spontaneity and the nature of the conversation.

Better Evaluation of ASR in Speech Translation Context Using Word Embeddings

A simple extension of the WER metric is proposed that uses word embeddings to penalize substitution errors differently according to their context; it shows encouraging results in the evaluation of ASR in a spoken language translation context.

Towards acoustic model unification across dialects

Two techniques, distillation and multi-task learning (MTL), are presented; both are shown to be superior to the jointly trained model that is trained on all dialectal data, reducing word error rates by 4.2% and 0.6%, respectively.

Deep Speech 2: End-to-End Speech Recognition in English and Mandarin

It is shown that an end-to-end deep learning approach can be used to recognize either English or Mandarin Chinese speech, two vastly different languages, and that it is competitive with the transcription of human workers when benchmarked on standard datasets.

Dialects in Norway: catching up with the rest of Europe?

Norway has sometimes been described as a sociolinguistic paradise with its abundant linguistic heterogeneity, both written and spoken. Dialect diversity has been and is still considerable

Apertium: a free/open-source platform for rule-based machine translation

The Apertium platform is summarised: the translation engine, the encoding of linguistic data, and the tools developed around the platform are discussed.

Scalable Modified Kneser-Ney Language Model Estimation

We present an efficient algorithm to estimate large modified Kneser-Ney models including interpolation. Streaming and sorting enable the algorithm to scale to much larger models by using a fixed

Optuna: A Next-generation Hyperparameter Optimization Framework

New design criteria for next-generation hyperparameter optimization software are introduced, including a define-by-run API that allows users to construct the parameter search space dynamically, and an easy-to-set-up, versatile architecture that can be deployed for various purposes.

Oversikt over innlesere i NB tale

  • 2015

Table 03743: Pupils in primary and lower secondary school

  • Statistics Norway
  • 2020