A Sticky HDP-HMM With Application to Speaker Diarization

@article{Fox2011ASH,
  title={A Sticky HDP-HMM With Application to Speaker Diarization},
  author={Emily B. Fox and Erik B. Sudderth and Michael I. Jordan and Alan S. Willsky},
  journal={The Annals of Applied Statistics},
  year={2011},
  volume={5},
  pages={1020--1056}
}
We consider the problem of speaker diarization, the problem of segmenting an audio recording of a meeting into temporal segments corresponding to individual speakers. The problem is rendered particularly difficult by the fact that we are not allowed to assume knowledge of the number of people participating in the meeting. To address this problem, we take a Bayesian nonparametric approach to speaker diarization that builds on the hierarchical Dirichlet process hidden Markov model (HDP-HMM) of…
A left-to-right HDP-HMM with HDPM emissions
TLDR
This paper introduces three enhancements to the HDP-HMM: a left-to-right structure, needed for sequential decoding of speech; non-emitting initial and final states, required for modeling finite-length sequences; and HDP mixture emissions, which allow sharing of data across states.
Speaker Diarization: An Emerging Research
TLDR
This chapter presents the fundamentals of speaker diarization and the most significant works over the recent years on this topic.
An Adaptive Method for Cross-Recording Speaker Diarization
TLDR
This paper proposes a scalable unsupervised adaptation framework for two types of variability compensation, and investigates how unlabeled speakers can help improve between-recording variability estimation, to overcome the mismatch issue.
Efficient speaker diarization and low-latency speaker spotting
TLDR
The new task, coined low latency speaker spotting (LLSS), involves the rapid detection of known speakers within multi-speaker audio streams and involves the re-thinking of online diarization and the manner by which diarizing and detection sub-systems should best be combined.
Exploring methods of improving speaker accuracy for speaker diarization
TLDR
The focus of this work is to improve the speaker diarization error rate, and more specifically the speaker error rate by modifying the minimum duration constraint and incorporating novel purification techniques.
Speaker diarization and tracking in multiple-sensor environments
This thesis covers research on speaker recognition in real conditions such as meeting rooms, telephone-quality speech, and radio and TV broadcast news. The main…
Using deep neural networks for speaker diarisation
TLDR
A method involving a pretrained Speaker Separation Deep Neural Network (ssDNN) is investigated which performs speaker clustering and speaker segmentation using DNNs successfully for meeting data and with mixed results for broadcast media.
A Nonparametric Bayesian Approach to Acoustic Model Discovery
TLDR
An unsupervised model is presented that simultaneously segments the speech, discovers a proper set of sub-word units and learns a Hidden Markov Model for each induced acoustic unit and outperforms a language-mismatched acoustic model.
Speaker diarization in meetings domain
The purpose of this study is to develop robust techniques for speaker segmentation and clustering, with a focus on the meetings domain. The techniques examined can, however, be applied to other domains.
Speaker Diarization: Current Limitations and New Directions
Author(s): Knox, Mary Tai | Advisor(s): Morgan, Nelson | Abstract: Speaker diarization is the problem of determining "who spoke when" in an audio recording when the number and identities of the…

References

The ICSI RT07s Speaker Diarization System
TLDR
This paper used the most recent available version of the beam-forming toolkit, implemented a new speech/non-speech detector that does not require models trained on meeting data and performed the development on a much larger set of recordings.
Data-Driven Recomposition using the Hierarchical Dirichlet Process Hidden Markov Model
TLDR
This paper shows how HMMs can be used to synthesize new audio clips of unlimited length inspired by the temporal structure and perceptual content of a training recording or set of such recordings.
Improving Speaker Diarization
This paper describes the LIMSI speaker diarization system used in the RT-04F evaluation. The RT-04F system builds upon the LIMSI baseline data partitioner, which is used in the broadcast news…
An HDP-HMM for systems with state persistence
TLDR
A sampling algorithm is developed that employs a truncated approximation of the DP to jointly resample the full state sequence, greatly improving mixing rates and demonstrating the advantages of the sticky extension, and the utility of the HDP-HMM in real-world applications.
The Application of Hidden Markov Models in Speech Recognition
TLDR
The aim of this review is first to present the core architecture of a HMM-based LVCSR system and then to describe the various refinements which are needed to achieve state-of-the-art performance.
An overview of automatic speaker diarization systems
TLDR
An overview of the approaches currently used in a key area of audio diarization, namely speaker diarization, is provided, and their relative merits and limitations are discussed.
The MIT Lincoln Laboratory RT-04F Diarization Systems: Applications to Broadcast Audio and Telephone Conversations
TLDR
This paper describes the systems developed by MITLL and used in DARPA EARS Rich Transcription Fall 2004 (RT-04F) speaker diarization evaluation and presents experiments analyzing performance of the systems and a cross-cluster recombination approach that significantly improves performance.
E-HMM approach for learning and adapting sound models for speaker indexing
TLDR
This paper presents an iterative process for blind speaker indexing based on a HMM that reduces the miss detection of short utterances by exploiting all the information (detected speakers) as soon as it is available.
Partitioning and transcription of broadcast news data
TLDR
This paper reports on the recent work in transcribing broadcast news data, including the problem of partitioning the data into homogeneous segments prior to word recognition, using a continuous mixture density, tied-state cross-word context-dependent HMM system with a 65k trigram language model.
Evolutive HMM for multi-speaker tracking system
TLDR
The proposed speaker tracking system is defined in the case where all speaker identities are known beforehand, and is modeled as an evolutive HMM-like model, in which speaker models computed are added one by one.