The Third DIHARD Diarization Challenge

Neville Ryant, Prachi Singh, Venkat Krishnamohan, Rajat Varma, Kenneth Ward Church, Christopher Cieri, Jun Du, Sriram Ganapathy, Mark Y. Liberman
This paper introduces the third DIHARD challenge, the third in a series of speaker diarization challenges intended to improve the robustness of diarization systems to variation in recording equipment, noise conditions, and conversational domain. Speaker diarization is evaluated under two segmentation conditions (diarization from a reference speech segmentation vs. diarization from scratch) and 11 diverse domains. The domains span a range of recording conditions and interaction types, including… 

USTC-NELSLIP System Description for DIHARD-III Challenge

The innovation of the system lies in the combination of various front-end techniques to solve the diarization problem, including speech separation and target-speaker based voice activity detection (TS-VAD), combined with iterative data purification.

Domain-Dependent Speaker Diarization for the Third DIHARD Challenge

This report presents the system developed by the ABSP Laboratory team for the third DIHARD speech diarization challenge, and reveals that the i-vector based method achieves considerably better performance than the x-vector based approach on the third DIHARD challenge dataset.

ABSP System for The Third DIHARD Challenge

The primary contribution is the development of an acoustic domain identification (ADI) system for speaker diarization; the work investigates a speaker-embedding-based ADI system and applies a domain-dependent threshold for agglomerative hierarchical clustering.

Adapting Speaker Embeddings for Speaker Diarisation

Three techniques are proposed to better adapt speaker embeddings for diarisation: dimensionality reduction, attention-based embedding aggregation, and non-speech clustering.

The Hitachi-JHU DIHARD III System: Competitive End-to-End Neural Diarization and X-Vector Clustering Systems Combined by DOVER-Lap

This paper provides a detailed description of the Hitachi-JHU system that was submitted to the Third DIHARD Speech Diarization Challenge. The system outputs the ensemble results of the five

Self-Supervised Representation Learning With Path Integral Clustering for Speaker Diarization

The SSC algorithm improves significantly over the baseline system (relative improvements of 13% and 59% on CALLHOME and AMI datasets respectively in terms of diarization error rate (DER)).

Three-class Overlapped Speech Detection using a Convolutional Recurrent Neural Network

The proposed overlapped speech detection model establishes a state-of-the-art performance with a precision of 0.6648 and a recall of 0.3222 on the DIHARD II evaluation set, showing a 20% increase in recall along with higher precision.

High-resolution embedding extractor for speaker diarisation

This study proposes a novel embedding extractor architecture, referred to as a high-resolution embedding extractor (HEE), which extracts multiple high-resolution embeddings from each speech segment, and proposes an artificially generated mixture data training framework to train the proposed HEE.

Target-Speaker Voice Activity Detection via Sequence-to-Sequence Prediction

Experimental results show that larger speaker capacity and higher output resolution can significantly reduce the diarization error rate (DER), which achieves the new state-of-the-art performance of 4.55% on the VoxConverse test set and 10.77% on Track 1 of the DIHARD-III evaluation set under the widely-used evaluation metrics.

Online Speaker Diarization with Graph-based Label Generation

An online speaker diarization system that can handle long-time audio with low latency is proposed, and the framework combining the chkpt-AHC method and the label matching algorithm works well in the online setting.



The Second DIHARD Diarization Challenge: Dataset, task, and baselines

This paper introduces the second DIHARD challenge, the second in a series of speaker diarization challenges intended to improve the robustness of diarization systems to variation in recording

Third DIHARD Challenge Evaluation Plan

The third DIHARD challenge, the third in a series of speaker diarization challenges intended to improve the robustness of diarization systems to variation in recording equipment, noise conditions, and conversational domain, is introduced.

BUT System for DIHARD Speech Diarization Challenge 2018

This paper presents the approach developed by the BUT team for the DIHARD speech diarization challenge, which is based on the Bayesian Hidden Markov Model with eigenvoice priors system, and presents results obtained on the evaluation set.

LEAP Diarization System for the Second DIHARD Challenge

A modified VB-HMM model with posterior scaling is proposed, which provides significant improvements in the final diarization error rate (DER); an analysis performed using the proposed posterior scaling method shows that scaling results in improved discrimination among the HMM states in the VB-HMM.

BUT System for the Second DIHARD Speech Diarization Challenge

This paper describes the winning systems developed by the BUT team for the four tracks of the Second DIHARD Speech Diarization Challenge and provides a comparison of the improvement given by each step and shares the implementation of the core of the system.

Speaker diarization with PLDA i-vector scoring and unsupervised calibration

A system that incorporates probabilistic linear discriminant analysis (PLDA) for i-vector scoring and uses unsupervised calibration of the PLDA scores to determine the clustering stopping criterion is proposed, and it is shown that PLDA scoring outperforms the same system with cosine scoring, and that overlapping segments reduce diarization error rate (DER) as well.

Diarization is Hard: Some Experiences and Lessons Learned for the JHU Team in the Inaugural DIHARD Challenge

Several key aspects of current state-of-the-art diarization methods are explored, such as training data selection, signal bandwidth for feature extraction, representations of speech segments (i-vector versus x-vector), and domain-adaptive processing.

Improved overlap speech diarization of meeting recordings using long-term conversational features

S. Yella, H. Bourlard. 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, 2013.
A method to improve the short-term spectral feature based overlap detector by incorporating information from long-term conversational features in the form of speaker change statistics at segment level from the output of a diarization system is proposed.

Fully Supervised Speaker Diarization

A fully supervised speaker diarization approach, named unbounded interleaved-state recurrent neural networks (UIS-RNN), is proposed; given extracted speaker-discriminative embeddings, it decodes in an online fashion, while most state-of-the-art systems rely on offline clustering.

CHiME-6 Challenge: Tackling Multispeaker Speech Recognition for Unsegmented Recordings

Of note, Track 2 is the first challenge activity in the community to tackle an unsegmented multispeaker speech recognition scenario with a complete set of reproducible open source baselines providing speech enhancement, speaker diarization, and speech recognition modules.