Analysis of Deep Clustering as Preprocessing for Automatic Speech Recognition of Sparsely Overlapping Speech

@article{Menne2019AnalysisOD,
  title={Analysis of Deep Clustering as Preprocessing for Automatic Speech Recognition of Sparsely Overlapping Speech},
  author={T. Menne and I. Sklyar and R. Schl{\"u}ter and H. Ney},
  journal={ArXiv},
  year={2019},
  volume={abs/1905.03500}
}
  • T. Menne, I. Sklyar, R. Schlüter, H. Ney
  • Published 2019
  • Computer Science, Engineering
  • ArXiv
  • Significant performance degradation of automatic speech recognition (ASR) systems is observed when the audio signal contains cross-talk. One of the recently proposed approaches to solve the problem of multi-speaker ASR is the deep clustering (DPCL) approach. Combining DPCL with a state-of-the-art hybrid acoustic model, we obtain a word error rate (WER) of 16.5 % on the commonly used wsj0-2mix dataset, which is the best performance reported thus far to the best of our knowledge. The wsj0-2mix…
    11 Citations
    Streaming Multi-speaker ASR with RNN-T
    End-to-End Training of Time Domain Audio Separation and Recognition
    On End-to-end Multi-channel Time Domain Speech Separation in Reverberant Environments
    End-To-End Multi-Speaker Speech Recognition With Transformer
    Improving End-to-End Single-Channel Multi-Talker Speech Recognition
    MIMO-Speech: End-to-End Multi-Channel Multi-Speaker Speech Recognition
    LibriMix: An Open-Source Dataset for Generalizable Speech Separation
    SLOGD: Speaker Location Guided Deflation Approach to Speech Separation
    • S. Sivasankaran, E. Vincent, D. Fohr
    • Computer Science, Engineering
    • ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
    • 2020
