Analysis of Deep Clustering as Preprocessing for Automatic Speech Recognition of Sparsely Overlapping Speech

@article{Menne2019AnalysisOD,
  title={Analysis of Deep Clustering as Preprocessing for Automatic Speech Recognition of Sparsely Overlapping Speech},
  author={T. Menne and I. Sklyar and R. Schl{\"u}ter and H. Ney},
  journal={ArXiv},
  year={2019},
  volume={abs/1905.03500}
}
  • T. Menne, I. Sklyar, R. Schlüter, H. Ney
  • Published 2019
  • Computer Science, Engineering
  • ArXiv
  • Significant performance degradation of automatic speech recognition (ASR) systems is observed when the audio signal contains cross-talk. One of the recently proposed approaches to solve the problem of multi-speaker ASR is the deep clustering (DPCL) approach. Combining DPCL with a state-of-the-art hybrid acoustic model, we obtain a word error rate (WER) of 16.5 % on the commonly used wsj0-2mix dataset, which is the best performance reported thus far to the best of our knowledge. The wsj0-2mix…
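The abstract names deep clustering (DPCL) without describing it. In the standard DPCL formulation (from "Deep clustering: Discriminative embeddings for segmentation and separation"), a network maps each time-frequency bin of the mixture spectrogram to a unit-norm D-dimensional embedding. Training minimises ||V V^T - Y Y^T||_F^2, where V (N x D) stacks the embeddings of all N bins and Y (N x C) holds one-hot ideal speaker assignments. At test time, k-means on the embeddings gives one binary mask per speaker. The following is a minimal sketch, not the authors' implementation: it assumes the embeddings are already given (no network is modelled), uses plain Python lists, and uses a deterministic farthest-point initialisation for k-means. All function names are hypothetical.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def gram(a: Sequence[Sequence[float]], b: Sequence[Sequence[float]]) -> Matrix:
    """Return A^T B for row-major A (N x P) and B (N x Q), summing over the N bins."""
    p, q = len(a[0]), len(b[0])
    out = [[0.0] * q for _ in range(p)]
    for row_a, row_b in zip(a, b):
        for i in range(p):
            for j in range(q):
                out[i][j] += row_a[i] * row_b[j]
    return out


def frobenius_sq(m: Matrix) -> float:
    """Squared Frobenius norm of a matrix."""
    return sum(x * x for row in m for x in row)


def dpcl_loss(v: Matrix, y: Matrix) -> float:
    """||V V^T - Y Y^T||_F^2, expanded as ||V^T V||^2 - 2||V^T Y||^2 + ||Y^T Y||^2
    so that the N x N affinity matrices are never formed."""
    return (frobenius_sq(gram(v, v))
            - 2.0 * frobenius_sq(gram(v, y))
            + frobenius_sq(gram(y, y)))


def squared_distance(x: Sequence[float], c: Sequence[float]) -> float:
    return sum((xi - ci) ** 2 for xi, ci in zip(x, c))


def kmeans_masks(v: Matrix, num_speakers: int, iters: int = 20) -> Matrix:
    """Cluster bin embeddings with k-means; return one binary mask (length N) per speaker."""
    # Farthest-point initialisation: an assumption made here for determinism.
    centroids = [list(v[0])]
    while len(centroids) < num_speakers:
        farthest = max(v, key=lambda x: min(squared_distance(x, c) for c in centroids))
        centroids.append(list(farthest))
    labels = [0] * len(v)
    for _ in range(iters):
        # Assignment step: each bin goes to its nearest centroid.
        labels = [min(range(num_speakers), key=lambda k: squared_distance(x, centroids[k]))
                  for x in v]
        # Update step: move each centroid to the mean of its member embeddings.
        for k in range(num_speakers):
            members = [x for x, label in zip(v, labels) if label == k]
            if members:
                centroids[k] = [sum(col) / len(members) for col in zip(*members)]
    return [[1.0 if label == k else 0.0 for label in labels] for k in range(num_speakers)]
```

The expanded loss costs O(N D^2) instead of O(N^2 D), which matters because N = T x F is large for a spectrogram. When DPCL is used as ASR preprocessing, each mask would be reshaped to T x F and applied to the mixture spectrogram before feature extraction; how the paper feeds the separated signals to its hybrid acoustic model is not stated in the abstract.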
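The 16.5 % figure is a word error rate: the minimum number of word substitutions, deletions and insertions needed to turn the recognised word sequence into the reference, divided by the number of reference words. For two-speaker mixtures such as wsj0-2mix, a common convention is to score every assignment of hypotheses to reference speakers and keep the one with the fewest errors. That convention is assumed here and is not confirmed by the abstract. A minimal sketch with hypothetical function names:

```python
from itertools import permutations
from typing import Sequence


def edit_distance(reference: Sequence[str], hypothesis: Sequence[str]) -> int:
    """Minimum substitutions + deletions + insertions (Levenshtein distance on words)."""
    m = len(hypothesis)
    # prev[j] = distance between the reference prefix processed so far and hypothesis[:j]
    prev = list(range(m + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i] + [0] * m
        for j in range(1, m + 1):
            cost = 0 if ref_word == hypothesis[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of ref_word
                curr[j - 1] + 1,     # insertion of hypothesis[j - 1]
                prev[j - 1] + cost,  # substitution, or a match when cost == 0
            )
        prev = curr
    return prev[m]


def word_error_rate(reference: Sequence[str], hypothesis: Sequence[str]) -> float:
    if not reference:
        raise ValueError("reference must contain at least one word")
    return edit_distance(reference, hypothesis) / len(reference)


def multi_speaker_wer(references, hypotheses) -> float:
    """Pool errors over speakers, using the hypothesis-to-speaker permutation with the fewest errors."""
    if len(references) != len(hypotheses):
        raise ValueError("need exactly one hypothesis per reference speaker")
    total_words = sum(len(ref) for ref in references)
    if total_words == 0:
        raise ValueError("references must contain at least one word")
    errors = min(
        sum(edit_distance(ref, hyp) for ref, hyp in zip(references, perm))
        for perm in permutations(hypotheses)
    )
    return errors / total_words
```

For example, `word_error_rate("the cat sat on the mat".split(), "the cat sit on mat".split())` counts one substitution and one deletion over six reference words. Trying all permutations is cheap for two speakers (two assignments) and grows factorially with the number of speakers.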