Corpus ID: 46116832

Enhancing LSTM RNN-Based Speech Overlap Detection by Artificially Mixed Data

@inproceedings{Hagerer2017EnhancingLR,
  title={Enhancing LSTM RNN-Based Speech Overlap Detection by Artificially Mixed Data},
  author={Gerhard Hagerer and Vedhas Pandit and Florian Eyben and Bj{\"o}rn Schuller},
  booktitle={Semantic Audio},
  year={2017}
}
This paper presents a new method for Long Short-Term Memory Recurrent Neural Network (LSTM) based speech overlap detection. To this end, speech overlap data is created artificially by mixing large amounts of speech utterances. Our elaborate training strategies and presented network structures demonstrate performance surpassing the considered state-of-the-art overlap detectors. Thereby we target the full ternary task of non-speech, speech, and overlap detection. Furthermore, speakers’ gender is… 
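The idea of creating overlap data by mixing utterances, together with the ternary labelling into non-speech, speech, and overlap, can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `mix_with_labels`, the per-sample activity flags, and the fixed sample offset are all assumptions made for illustration, and a real system would work on audio frames and acoustic features rather than raw Python lists.

```python
from typing import List, Sequence, Tuple

# Ternary target classes named in the abstract (the integer coding is an assumption).
NON_SPEECH, SPEECH, OVERLAP = 0, 1, 2


def mix_with_labels(
    sig_a: Sequence[float],
    act_a: Sequence[int],
    sig_b: Sequence[float],
    act_b: Sequence[int],
    offset_b: int,
) -> Tuple[List[float], List[int]]:
    """Hypothetical helper: add utterance B onto utterance A, shifted by offset_b samples.

    act_a / act_b are per-sample voice-activity flags (1 = speaker active).
    Returns the mixed signal and one ternary label per sample.
    """
    if len(sig_a) != len(act_a) or len(sig_b) != len(act_b):
        raise ValueError("each signal needs one activity flag per sample")
    if offset_b < 0:
        raise ValueError("offset_b must be non-negative")

    length = max(len(sig_a), offset_b + len(sig_b))
    mixed = [0.0] * length
    active = [0] * length  # number of active speakers per sample

    for i, (sample, flag) in enumerate(zip(sig_a, act_a)):
        mixed[i] += sample
        active[i] += int(bool(flag))
    for i, (sample, flag) in enumerate(zip(sig_b, act_b)):
        mixed[offset_b + i] += sample
        active[offset_b + i] += int(bool(flag))

    # 0 active speakers -> non-speech, 1 -> speech, 2 or more -> overlap.
    labels = [min(count, OVERLAP) for count in active]
    return mixed, labels


# Toy usage: B starts two samples into A, so only the third sample overlaps.
mixed, labels = mix_with_labels(
    [0.0, 0.5, 0.5, 0.0], [0, 1, 1, 0],
    [0.25, 0.25], [1, 1],
    offset_b=2,
)
print(labels)  # [0, 1, 2, 1]
```

Because the labels are derived from the activity flags of the source utterances, the mixed data comes with its overlap annotation for free, which is the practical appeal of artificial mixing.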
Overlap-Aware Diarization: Resegmentation Using Neural End-to-End Overlapped Speech Detection
TLDR
A neural Long Short-Term Memory-based architecture for overlap detection is detailed, which achieves state-of-the-art performance on the AMI, DIHARD, and ETAPE corpora and reveals promising directions for handling overlap.
CountNet: Estimating the Number of Concurrent Speakers Using Supervised Learning
TLDR
A unifying probabilistic paradigm is proposed, where deep neural network architectures are used to infer output posterior distributions, and convolutional recurrent neural networks outperform recurrent networks used in a previous study when adequate input features are used.
Gender Classification Based on the Non-Lexical Cues of Emergency Calls with Recurrent Neural Networks (RNN)
TLDR
It is concluded that new speech features could be effective in improving gender classification through a behavioral approach, notably including emergency calls.
Target-Speaker Voice Activity Detection with Improved i-Vector Estimation for Unknown Number of Speaker
Target-speaker voice activity detection (TS-VAD) has recently shown promising results for speaker diarization on highly overlapped speech. However, the original model requires a fixed (and known)
DOVER-Lap: A Method for Combining Overlap-Aware Diarization Outputs
TLDR
The method, DOVER-Lap, is inspired by the recently proposed DOVER algorithm, but is designed to handle overlapping segments in diarization outputs, and modifies the pair-wise incremental label mapping strategy used in DOVER.
SphereDiar: An Effective Speaker Diarization System for Meeting Data
In this paper, we present SphereDiar, a speaker diarization system composed of three novel subsystems: the Sphere-Speaker (SS) neural network, designed for speaker embedding extraction, a
Classification vs. Regression in Supervised Learning for Single Channel Speaker Count Estimation
TLDR
A state-of-the-art DNN audio model based on a Bi-directional Long Short-Term Memory network architecture for speaker count estimations is evaluated and results for five seconds speech segments in mixtures of up to ten speakers are shown.
Multi-Class Spectral Clustering with Overlaps for Speaker Diarization
TLDR
This paper describes a method for overlap-aware speaker diarization which performs spectral clustering of segments informed by the output of the overlap detector by transforming the discrete clustering problem into a convex optimization problem which is solved by eigen-decomposition.
Robust Laughter Detection for Wearable Wellbeing Sensing
TLDR
To build a noise-robust, online-capable laughter detector for behavioural monitoring on wearables, context-sensitive Long Short-Term Memory Deep Neural Networks are incorporated, which potentially improves the detection of vocal cues when the amount of training data is small and robustness and efficiency are required.
"Did you laugh enough today?" - Deep Neural Networks for Mobile and Wearable Laughter Trackers
TLDR
A mobile and wearable devices app that recognises laughter from speech in real-time based on a deep neural network architecture, which runs smoothly and robustly, even natively on a smartwatch.

References

Detecting overlapping speech with long short-term memory recurrent neural networks
TLDR
This work proposes a novel overlap detection system using Long Short-Term Memory (LSTM) recurrent neural networks, used to generate framewise overlap predictions which are applied for overlap detection.
Convolutive Non-Negative Sparse Coding and New Features for Speech Overlap Handling in Speaker Diarization
TLDR
The combination of features derived through convolutive non-negative sparse coding and new energy, spectral and voicing-related features within a conventional HMM system is reported, showing significant reductions in missed speech and speaker error.
Using linguistic information to detect overlapping speech
TLDR
This paper considers the problem of detecting segments of overlapping speech within meeting recordings using an HMM-based framework, and the use of linguistic information, where spoken content is used to improve overlap detection.
The Detection of Overlapping Speech with Prosodic Features for Speaker Diarization
TLDR
It is shown that the addition of prosodic features decreased overlap detection error and was used in speaker diarization to recover missed speech by assigning multiple speaker labels and to increase the purity of speaker clusters.
Speech overlap detection in a two-pass speaker diarization system
TLDR
This paper presents the two-pass speaker diarization system that was developed for the NIST RT09s evaluation, and a model for speech overlap detection is generated automatically.
Speech overlap detection and attribution using convolutive non-negative sparse coding
TLDR
Experimental results on NIST RT data show that the CNSC approach gives comparable results to a state-of-the-art hidden Markov model based overlap detector and in a practical diarization system, CNSC based speaker attribution is shown to reduce the speaker error by over 40% relative in overlapping segments.
Improved overlap speech diarization of meeting recordings using long-term conversational features
  • S. Yella, H. Bourlard
  • Computer Science
    2013 IEEE International Conference on Acoustics, Speech and Signal Processing
  • 2013
TLDR
A method to improve the short-term spectral feature based overlap detector by incorporating information from long-term conversational features in the form of speaker change statistics at segment level from the output of a diarization system is proposed.
Annotating and categorizing competition in overlap speech
TLDR
This paper proposes and evaluates an annotation scheme for these two overlap categories in the context of spontaneous and in-vivo human conversations, and analyzes the distinctive predictive characteristics of a very large set of high-dimensional acoustic features.
Overlapped speech detection for improved speaker diarization in multiparty meetings
TLDR
This work presents the initial work toward developing an overlap detection system for improved meeting diarization, and investigates various features, with a focus on high-precision performance for use in the detector, and examines performance results on a subset of the AMI Meeting Corpus.
Speech recognition robust against speech overlapping in monaural recordings of telephone conversations
TLDR
This paper uses a combination of garbage modeling and noise-robust monaural acoustic modeling to tackle the problem of automatic speech recognition at overlapping segments where the voices of the multiple speakers overlap.