The Third DIHARD Diarization Challenge
@inproceedings{Ryant2020TheTD,
  title={The Third DIHARD Diarization Challenge},
  author={Neville Ryant and Prachi Singh and Venkat Krishnamohan and Rajat Varma and Kenneth Ward Church and Christopher Cieri and Jun Du and Sriram Ganapathy and Mark Y. Liberman},
  booktitle={Interspeech},
  year={2020}
}
This paper introduces the third DIHARD challenge, the third in a series of speaker diarization challenges intended to improve the robustness of diarization systems to variation in recording equipment, noise conditions, and conversational domain. Speaker diarization is evaluated under two segmentation conditions (diarization from a reference speech segmentation vs. diarization from scratch) and 11 diverse domains. The domains span a range of recording conditions and interaction types, including…
61 Citations
USTC-NELSLIP System Description for DIHARD-III Challenge
- Computer Science
- ArXiv
- 2021
The innovation of the system lies in the combination of various front-end techniques to solve the diarization problem, including speech separation and target-speaker based voice activity detection (TS-VAD), combined with iterative data purification.
Domain-Dependent Speaker Diarization for the Third DIHARD Challenge
- Computer Science
- ArXiv
- 2021
This report presents the system developed by the ABSP Laboratory team for the third DIHARD speech diarization challenge, and reveals that the i-vector based method achieves considerably better performance than the x-vector based approach on the third DIHARD challenge dataset.
ABSP System for The Third DIHARD Challenge
- Computer Science
- ArXiv
- 2021
The primary contribution is the development of an acoustic domain identification (ADI) system for speaker diarization; the work investigates a speaker-embedding based ADI system and applies a domain-dependent threshold for agglomerative hierarchical clustering.
Adapting Speaker Embeddings for Speaker Diarisation
- Computer Science
- Interspeech
- 2021
Three techniques that can be used to better adapt the speaker embeddings for diarisation: dimensionality reduction, attention-based embedding aggregation, and non-speech clustering are proposed.
The Hitachi-JHU DIHARD III System: Competitive End-to-End Neural Diarization and X-Vector Clustering Systems Combined by DOVER-Lap
- Computer Science
- ArXiv
- 2021
This paper provides a detailed description of the Hitachi-JHU system that was submitted to the Third DIHARD Speech Diarization Challenge. The system outputs the ensemble results of the five…
Self-Supervised Representation Learning With Path Integral Clustering for Speaker Diarization
- Computer Science
- IEEE/ACM Transactions on Audio, Speech, and Language Processing
- 2021
The SSC algorithm improves significantly over the baseline system (relative improvements of 13% and 59% on CALLHOME and AMI datasets respectively in terms of diarization error rate (DER)).
Three-class Overlapped Speech Detection using a Convolutional Recurrent Neural Network
- Computer Science
- Interspeech
- 2021
The proposed overlapped speech detection model establishes state-of-the-art performance with a precision of 0.6648 and a recall of 0.3222 on the DIHARD II evaluation set, showing a 20% increase in recall along with higher precision.
High-resolution embedding extractor for speaker diarisation
- Computer Science
- ArXiv
- 2022
This study proposes a novel embedding extractor architecture, referred to as a high-resolution embedding extractor (HEE), which extracts multiple high-resolution embeddings from each speech segment, and proposes a training framework that artificially generates mixture data to train the HEE.
Target-Speaker Voice Activity Detection via Sequence-to-Sequence Prediction
- Computer Science
- ArXiv
- 2022
Experimental results show that larger speaker capacity and higher output resolution can significantly reduce the diarization error rate (DER), achieving new state-of-the-art results of 4.55% on the VoxConverse test set and 10.77% on Track 1 of the DIHARD-III evaluation set under the widely-used evaluation metrics.
Online Speaker Diarization with Graph-based Label Generation
- Computer Science
- Odyssey
- 2022
An online speaker diarization system that can handle long-time audio with low latency is presented; the framework combining the chkpt-AHC method and the label matching algorithm works well in the online setting.
References
The Second DIHARD Diarization Challenge: Dataset, task, and baselines
- Computer Science
- INTERSPEECH
- 2019
This paper introduces the second DIHARD challenge, the second in a series of speaker diarization challenges intended to improve the robustness of diarization systems to variation in recording…
Third DIHARD Challenge Evaluation Plan
- Computer Science
- ArXiv
- 2020
The third DIHARD challenge, the third in a series of speaker diarization challenges intended to improve the robustness of diarization systems to variation in recording equipment, noise conditions, and conversational domain, is introduced.
BUT System for DIHARD Speech Diarization Challenge 2018
- Computer Science
- INTERSPEECH
- 2018
This paper presents the approach developed by the BUT team for the DIHARD speech diarization challenge, which is based on the Bayesian Hidden Markov Model with eigenvoice priors system, and presents results obtained on the evaluation set.
LEAP Diarization System for the Second DIHARD Challenge
- Computer Science
- INTERSPEECH
- 2019
A modified VB-HMM model with posterior scaling is proposed, which provides significant improvements in the final diarization error rate (DER); an analysis of the proposed posterior scaling method shows that scaling results in improved discrimination among the HMM states in the VB-HMM.
BUT System for the Second DIHARD Speech Diarization Challenge
- Computer Science
- ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2020
This paper describes the winning systems developed by the BUT team for the four tracks of the Second DIHARD Speech Diarization Challenge and provides a comparison of the improvement given by each step and shares the implementation of the core of the system.
Speaker diarization with PLDA i-vector scoring and unsupervised calibration
- Computer Science
- 2014 IEEE Spoken Language Technology Workshop (SLT)
- 2014
A system that incorporates probabilistic linear discriminant analysis (PLDA) for i-vector scoring and uses unsupervised calibration of the PLDA scores to determine the clustering stopping criterion is proposed, and it is shown that PLDA scoring outperforms the same system with cosine scoring, and that overlapping segments reduce diarization error rate (DER) as well.
Diarization is Hard: Some Experiences and Lessons Learned for the JHU Team in the Inaugural DIHARD Challenge
- Computer Science
- INTERSPEECH
- 2018
Several key aspects of current state-of-the-art diarization methods are explored, such as training data selection, signal bandwidth for feature extraction, representations of speech segments (i-vector versus x-vector), and domain-adaptive processing.
Improved overlap speech diarization of meeting recordings using long-term conversational features
- Physics
- 2013 IEEE International Conference on Acoustics, Speech and Signal Processing
- 2013
A method to improve the short-term spectral feature based overlap detector by incorporating information from long-term conversational features in the form of speaker change statistics at segment level from the output of a diarization system is proposed.
Fully Supervised Speaker Diarization
- Computer Science
- ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
- 2019
A fully supervised speaker diarization approach, named unbounded interleaved-state recurrent neural networks (UIS-RNN), given extracted speaker-discriminative embeddings, which decodes in an online fashion while most state-of-the-art systems rely on offline clustering.
CHiME-6 Challenge: Tackling Multispeaker Speech Recognition for Unsegmented Recordings
- Physics
- 6th International Workshop on Speech Processing in Everyday Environments (CHiME 2020)
- 2020
Of note, Track 2 is the first challenge activity in the community to tackle an unsegmented multispeaker speech recognition scenario with a complete set of reproducible open source baselines providing speech enhancement, speaker diarization, and speech recognition modules.