Corpus ID: 246035920

Human and Automatic Speech Recognition Performance on German Oral History Interviews

@article{Gref2022HumanAA,
  title={Human and Automatic Speech Recognition Performance on German Oral History Interviews},
  author={Michael Gref and Nike Matthiesen and Christoph Schmidt and Sven Behnke and Joachim K{\"o}hler},
  journal={ArXiv},
  year={2022},
  volume={abs/2201.06841}
}
Automatic speech recognition systems have accomplished remarkable improvements in transcription accuracy in recent years. On some domains, models now achieve near-human performance. However, transcription performance on oral history has not yet reached human accuracy. In the present work, we investigate how large this gap between human and machine transcription still is. For this purpose, we analyze and compare transcriptions of three humans on a new oral history data set. We estimate a human… 
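The gap between human and machine transcription discussed above is typically quantified as word error rate (WER): the minimum number of word substitutions, deletions, and insertions needed to turn the hypothesis into the reference, divided by the reference length. A minimal sketch (function name and whitespace tokenization are illustrative, not taken from the paper):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate via Levenshtein distance over word tokens."""
    ref, hyp = reference.split(), hypothesis.split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = dp[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            dp[i][j] = min(sub, dp[i - 1][j] + 1, dp[i][j - 1] + 1)
    return dp[len(ref)][len(hyp)] / len(ref)
```

Published human-parity claims (e.g., on conversational telephone speech) compare exactly this metric between ASR output and careful human transcriptions of the same audio.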

A Study on the Ambiguity in Human Annotation of German Oral History Interviews for Perceived Emotion Recognition and Sentiment Analysis

TLDR
The ambiguity in human perception of emotions and sentiment in German oral history interviews is investigated, along with its impact on machine learning systems.

References

SHOWING 1-10 OF 23 REFERENCES

English Conversational Telephone Speech Recognition by Humans and Machines

TLDR
An independent set of human performance measurements on two conversational tasks is performed, and it is found that human performance may be considerably better than previously reported, giving the community a significantly harder goal to achieve.

Improved Transcription and Indexing of Oral History Interviews for Digital Humanities Research

TLDR
The workflow used by the Fraunhofer IAIS Audio Mining system to process long audio files and automatically create time-aligned transcriptions is described, with the aim of improving the system's transcription and indexing quality.

Comparing Human and Machine Errors in Conversational Speech Transcription

TLDR
It is found that the most frequent substitution, deletion and insertion error types of both outputs show a high degree of overlap, and the correlation between human and machine errors at the speaker level is quantified.

Toward Human Parity in Conversational Speech Recognition

TLDR
A human error rate on the widely used NIST 2000 test set for commercial bulk transcription is measured, suggesting that, given sufficient matched training data, conversational speech transcription engines are approximating human parity in both quantitative and qualitative terms.

Multi-Staged Cross-Lingual Acoustic Model Adaption for Robust Speech Recognition in Real-World Applications - A Case Study on German Oral History Interviews

TLDR
This work proposes and investigates an approach that performs a robust acoustic model adaption to a target domain in a cross-lingual, multi-staged manner and enables the exploitation of large-scale training data from other domains in both the same and other languages.

Two-Staged Acoustic Modeling Adaption for Robust Speech Recognition by the Example of German Oral History Interviews

TLDR
A two-staged approach to acoustic modeling that combines noise and reverberation data augmentation with transfer learning to robustly address challenges such as difficult acoustic recording conditions, spontaneous speech, and speech of elderly people is proposed.
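The noise component of the data augmentation mentioned above can be illustrated by mixing a noise signal into clean speech at a chosen signal-to-noise ratio. A minimal sketch with illustrative names, omitting the reverberation component:

```python
import numpy as np

def mix_at_snr(speech: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Scale `noise` so the speech-to-noise power ratio equals `snr_db`,
    then mix. Both inputs are 1-D float arrays of equal length
    (names and signal format are illustrative)."""
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    # Solve speech_power / (gain^2 * noise_power) = 10^(snr_db / 10) for gain.
    gain = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + gain * noise
```

Augmentation pipelines of this kind typically draw noise clips and SNR values at random per training utterance, so the acoustic model sees each transcript under many simulated recording conditions.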

Towards automatic transcription of large spoken archives - English ASR for the MALACH project

  • B. Ramabhadran, Jing Huang, M. Picheny
  • Computer Science
    2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03).
  • 2003
TLDR
This new testbed for developing speech recognition algorithms for the English speech in the MALACH corpus is described, and the performance of well-known techniques for building better acoustic models for the speaking styles seen in this corpus is reported.

Challenging the Boundaries of Speech Recognition: The MALACH Corpus

TLDR
It is proposed that the community place focus on the MALACH corpus to develop speech recognition systems that are more robust with respect to accents, disfluencies and emotional speech.

Speech recognition error analysis on the English MALACH corpus

TLDR
It was found that the signal-to-noise ratio and syllable rate were two dominant factors in explaining the overall word error rate, while there was no evidence of the impact of accent and speaker’s age on the recognition performance.

Operational Assessment of Keyword Search on Oral History

TLDR
This project assesses the resources necessary to make oral history searchable by means of automatic speech recognition (ASR). The results show search performance with a standard speech recognition system comparable to that obtained with hand-transcribed data, which is promising for increased accessibility of conversational speech and oral history archives.
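Once ASR has produced time-aligned transcriptions, keyword search over an archive reduces to matching query terms against the recognized words and returning their timestamps. A toy sketch, assuming a hypothetical `(start, end, word)` segment format rather than any particular system's output:

```python
def keyword_search(segments, keyword):
    """Return (start_time, end_time, word) hits for `keyword` in
    time-aligned ASR output. `segments` is a list of
    (start, end, word) tuples -- an illustrative format."""
    kw = keyword.lower()
    return [(s, e, w) for (s, e, w) in segments if w.lower() == kw]
```

Real keyword-search systems additionally handle recognition errors, e.g., by searching word lattices or confusion networks instead of the single best hypothesis.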