Swiss German Speech to Text system evaluation

Yanick Schraner, Christian Vibe Scheller, Michel Plüss, Manfred Vogel

We present an in-depth evaluation of four commercially available Speech-to-Text (STT) systems for Swiss German. The systems are anonymized and referred to as systems a-d in this report. We compare the four systems to our own STT model, referred to as FHNW from here on, and provide details on how we trained our model. To evaluate the models, we use two STT datasets from different domains: the Swiss Parliament Corpus (SPC) test set and a private dataset in the news domain with an even…

SwissDial: Parallel Multidialectal Corpus of Spoken Swiss German

Introduces the first annotated parallel corpus of spoken Swiss German across 8 major dialects, plus a Standard German reference, making available a basic dataset for data-driven NLP applications in Swiss German.

Common Voice: A Massively-Multilingual Speech Corpus

This work presents speech recognition experiments using Mozilla’s DeepSpeech Speech-to-Text toolkit and finds an average Character Error Rate improvement for twelve target languages. For most of these languages, these are the first published results on end-to-end Automatic Speech Recognition.

Automatic speech recognition and translation of a Swiss German dialect: Walliserdeutsch

A small database was collected, based mainly on broadcast news from a local radio station, to investigate the potential of automatic speech processing of Walliserdeutsch. The results suggest that both automatic speech recognition and statistical machine translation are feasible.

Swiss Parliaments Corpus, an Automatically Aligned Swiss German Speech to Standard German Text Corpus

Presents the Swiss Parliaments Corpus, an automatically aligned Swiss German speech to Standard German text corpus, built with an approach that creates a speech-to-text corpus in a fully automatic fashion, given an audio recording and the corresponding unaligned transcript.

Attention Is All You Need

Proposes a new simple network architecture, the Transformer, based solely on attention mechanisms and dispensing with recurrence and convolutions entirely. It generalizes well to other tasks, as shown by applying it successfully to English constituency parsing with both large and limited training data.

SDS-200: A Swiss German Speech to Standard German Text Corpus

We present SDS-200, a corpus of Swiss German dialectal speech with Standard German text translations, annotated with dialect, age, and gender information of the speakers. The dataset allows for

XLS-R: Self-supervised Cross-lingual Speech Representation Learning at Scale

XLS-R is presented, a large-scale model for cross-lingual speech representation learning based on wav2vec 2.0 that improves over the best known prior work on BABEL, MLS, CommonVoice as well as VoxPopuli, lowering error rates by 14-34% relative on average.

SwissText 2021 Task 3: Swiss German Speech to Standard German Text (short paper)

The objective was to maximize the BLEU score on a new test set covering a large part of the Swiss German dialect landscape. Four teams participated, with the winning contribution achieving a BLEU score of 46.0.

Fairseq S2T: Fast Speech-to-Text Modeling with Fairseq

State-of-the-art RNN-based as well as Transformer-based models and open-source detailed training recipes are implemented and seamlessly integrated into S2T workflows for multi-task learning or transfer learning.

The ParlSpeech V2 data set: Full-text corpora of 6.3 million parliamentary speeches in the key legislative chambers of nine representative democracies

ParlSpeech V2 contains complete full-text vectors of more than 6.3 million parliamentary speeches in the key legislative chambers of Austria, the Czech Republic, Germany, Denmark, the Netherlands,