A Temporal Extension of Latent Dirichlet Allocation for Unsupervised Acoustic Unit Discovery

Werner van der Merwe, Herman Kamper, Johan A. du Preez
Latent Dirichlet allocation (LDA) is widely used for unsupervised topic modelling on sets of documents. The model uses no temporal information, yet there is often a relationship between the topics of consecutive tokens. In this paper, we present an extension to LDA that uses a Markov chain to model temporal information. We use this new model for acoustic unit discovery from speech. As input tokens, the model takes a discretised encoding of speech from a vector…
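To make the contrast with standard LDA concrete, the following is a minimal generative sketch, not the authors' formulation. It assumes a first-order Markov chain over topics, so each token's topic depends on the previous token's topic instead of being drawn independently. The names `dirichlet` and `sample_sequence`, the "sticky" prior on the transition rows, and all sizes are hypothetical choices made for illustration only.

```python
import random


def dirichlet(alpha, rng):
    """Draw one sample from a Dirichlet distribution via normalised gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def sample_sequence(n_tokens, initial, transition, emission, rng):
    """Sample topics and tokens with a first-order Markov chain over topics."""
    topics, tokens = [], []
    for t in range(n_tokens):
        # Plain LDA would draw every topic from `initial`; here only the
        # first topic is, and later topics depend on the previous topic.
        probs = initial if t == 0 else transition[topics[-1]]
        topic = rng.choices(range(len(probs)), weights=probs)[0]
        token = rng.choices(range(len(emission[topic])), weights=emission[topic])[0]
        topics.append(topic)
        tokens.append(token)
    return topics, tokens


rng = random.Random(0)
n_topics, vocab_size = 3, 8  # hypothetical sizes

initial = dirichlet([1.0] * n_topics, rng)
# Assumed "sticky" prior: extra mass on the diagonal favours self-transitions.
transition = [
    dirichlet([6.0 if j == k else 1.0 for j in range(n_topics)], rng)
    for k in range(n_topics)
]
emission = [dirichlet([1.0] * vocab_size, rng) for _ in range(n_topics)]

topics, tokens = sample_sequence(20, initial, transition, emission, rng)
```

In this sketch the tokens stand in for the discretised speech codes mentioned in the abstract, and the topics stand in for acoustic units; inference over such a model is a separate matter and is not shown.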


A Nonparametric Bayesian Approach to Acoustic Model Discovery

An unsupervised model is presented that simultaneously segments the speech, discovers a proper set of sub-word units, and learns a hidden Markov model for each induced acoustic unit. The model outperforms a language-mismatched acoustic model.

Bayesian Subspace Hidden Markov Model for Acoustic Unit Discovery

Results show that this approach significantly outperforms previous HMM-based acoustic unit discovery systems and compares favorably with the Variational Auto Encoder-HMM.

Bayesian phonotactic Language Model for Acoustic Unit Discovery

This work proposes to improve the non-parametric Bayesian phone-loop model by incorporating a hierarchical Pitman-Yor-based bigram language model on top of the units' transitions, yielding an absolute improvement of 1–2% on the Normalized Mutual Information (NMI) metric.

Vector-quantized neural networks for acoustic unit discovery in the ZeroSpeech 2020 challenge

Two neural models are proposed to tackle the challenge of discrete representations of speech that separate phonetic content from speaker-specific details, using vector quantization to map continuous features to a finite set of codes.

Unsupervised acoustic unit discovery for speech synthesis using discrete latent-variable neural networks

It is shown that decoupled speaker conditioning intrinsically improves discrete acoustic representations, yielding competitive synthesis quality compared to the challenge baseline.

Learning Supervised Feature Transformations on Zero Resources for Improved Acoustic Unit Discovery

This work utilizes feature transformations that are common in supervised learning without having prior supervision to improve Dirichlet process Gaussian mixture model (DPGMM) based acoustic unit discovery and introduces a method for combining posteriorgram outputs of multiple clusterings to improve sound class discriminability.

Feature optimized DPGMM clustering for unsupervised subword modeling: A contribution to zerospeech 2017

This paper describes the unsupervised subword modeling pipeline for the zero resource speech challenge (ZeroSpeech) 2017, which takes raw audio recordings as input and applies inferred subword models to previously unseen data from a new set of speakers.

Latent Dirichlet Allocation

Unsupervised Neural-Based Graph Clustering for Variable-Length Speech Representation Discovery of Zero-Resource Languages

This work proposes an approach based on graph neural networks (GNNs), which learn a discrete encoding by contrastive predictive coding (CPC), and exploits the predetermined finite set of embeddings used by VQNNs to encode input data to obtain a coarsened speech representation.

Towards unsupervised phone and word segmentation using self-supervised vector-quantized neural networks

This work constrains pretrained self-supervised vector-quantized (VQ) neural networks so that blocks of contiguous feature vectors are assigned to the same code, thereby giving a variable-rate segmentation of the speech into discrete units.
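Reading a segmentation off contiguous code assignments can be illustrated with a small sketch. It covers only the final collapse of a frame-level code sequence into segments, not the constraint applied to the VQ networks in that work; the function name `codes_to_segments` and the example code sequence are hypothetical.

```python
from itertools import groupby


def codes_to_segments(codes):
    """Collapse runs of identical codes into (code, start, end) tuples.

    `start` is inclusive and `end` is exclusive, both in frame indices.
    """
    segments = []
    start = 0
    for code, run in groupby(codes):
        length = sum(1 for _ in run)
        segments.append((code, start, start + length))
        start += length
    return segments


# Eight frames with made-up code indices yield four variable-length segments.
segments = codes_to_segments([7, 7, 7, 2, 2, 9, 7, 7])
```

Because segment boundaries fall wherever the code changes, the number of units per second varies with the input, which is what makes the resulting segmentation variable-rate.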