• Corpus ID: 239016526

A Variational Bayesian Approach to Learning Latent Variables for Acoustic Knowledge Transfer

  • Hu Hu, Sabato Marco Siniscalchi, Chao-Han Huck Yang, Chin-Hui Lee
We propose a variational Bayesian (VB) approach to learning distributions of latent variables in deep neural network (DNN) models for cross-domain knowledge transfer, to address acoustic mismatches between training and testing conditions. Instead of carrying out point estimation as in conventional maximum a posteriori (MAP) estimation, which risks the curse of dimensionality when estimating a huge number of model parameters, we focus our attention on estimating a manageable number of latent… 
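The abstract above is truncated and does not spell out the paper's exact objective. As a minimal sketch, assume a mean-field Gaussian variational posterior q(z) = N(μ, diag(σ²)) over a small latent vector and a Gaussian prior (for example, one centred on source-domain estimates). In a VB objective of this kind, a closed-form KL divergence like the one below penalizes the posterior for drifting away from the prior, alongside an expected log-likelihood term on target-domain data. The function name and all numbers are hypothetical and for illustration only.

```python
import math

def gaussian_kl(mu_q, var_q, mu_p, var_p):
    """KL(N(mu_q, diag(var_q)) || N(mu_p, diag(var_p))), summed over dimensions."""
    total = 0.0
    for mq, vq, mp, vp in zip(mu_q, var_q, mu_p, var_p):
        # Per-dimension closed form for two univariate Gaussians.
        total += 0.5 * (math.log(vp / vq) + (vq + (mq - mp) ** 2) / vp - 1.0)
    return total

# Hypothetical 3-dimensional latent vector.
# Prior: assumed to be centred on source-domain values (illustrative numbers).
prior_mu, prior_var = [0.0, 0.5, -0.2], [1.0, 1.0, 1.0]
# Variational posterior after adapting to target-domain data (illustrative numbers).
post_mu, post_var = [0.1, 0.4, 0.0], [0.5, 0.8, 0.9]

print(gaussian_kl(post_mu, post_var, prior_mu, prior_var))
```

Unlike a MAP point estimate, the posterior here carries a variance per latent dimension, so the objective can express how uncertain each adapted latent variable remains.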



Maximum a posteriori adaptation of network parameters in deep models
This work formulates maximum a posteriori (MAP) adaptation of the parameters of a specially designed CD-DNN-HMM with an augmented linear hidden network connected to the output tied states, or senones, and compares it with the previously proposed feature-space MAP linear regression.
L-Vector: Neural Label Embedding for Domain Adaptation
  • Zhong Meng, Hu Hu, +4 authors Chin-Hui Lee
  • Engineering, Computer Science
    ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2020
A novel neural label embedding (NLE) scheme is proposed for the domain adaptation of a deep neural network (DNN) acoustic model with unpaired data samples from the source and target domains; it achieves up to 14.1% relative word error rate reduction over direct re-training with one-hot labels.
Scalable Factorized Hierarchical Variational Autoencoder Training
A hierarchical sampling training algorithm that addresses limitations in runtime, memory, and hyperparameter optimization is presented, together with a new visualization method for qualitatively evaluating performance in terms of interpretability and disentanglement.
Bayesian Unsupervised Batch and Online Speaker Adaptation of Activation Function Parameters in Deep Models for Automatic Speech Recognition
We present a Bayesian framework to obtain maximum a posteriori (MAP) estimates of a small set of hidden activation function parameters in context-dependent deep neural network hidden Markov models (CD-DNN-HMMs).
Auto-Encoding Variational Bayes
A stochastic variational inference and learning algorithm is introduced that scales to large datasets and, under some mild differentiability conditions, works even in the intractable case.
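A central device in that work is the reparameterization trick: writing a sample as z = μ + σ·ε with ε ~ N(0, 1) makes a Monte Carlo estimate of an expectation differentiable with respect to μ and σ. The toy sketch below is a hypothetical illustration, not code from the paper: it estimates the gradients of E[z²] by applying the chain rule through the reparameterized sample and compares them with the exact values (2μ, 2σ).

```python
import random

def reparam_grads(mu, sigma, n_samples=100_000, seed=0):
    """Monte Carlo gradients of E_{z ~ N(mu, sigma^2)}[z^2] w.r.t. mu and sigma,
    using the reparameterization z = mu + sigma * eps with eps ~ N(0, 1)."""
    rng = random.Random(seed)
    g_mu = 0.0
    g_sigma = 0.0
    for _ in range(n_samples):
        eps = rng.gauss(0.0, 1.0)
        z = mu + sigma * eps      # sample is a deterministic function of (mu, sigma)
        df_dz = 2.0 * z           # derivative of f(z) = z^2
        g_mu += df_dz * 1.0       # chain rule: dz/dmu = 1
        g_sigma += df_dz * eps    # chain rule: dz/dsigma = eps
    return g_mu / n_samples, g_sigma / n_samples

# Closed form: E[z^2] = mu^2 + sigma^2, so the exact gradients are (2*mu, 2*sigma).
print(reparam_grads(1.5, 0.5))  # approximately (3.0, 1.0)
```

In a real model, an automatic-differentiation framework would compute these chain-rule terms, and f would be the log-likelihood of the data given z.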
Relational Teacher Student Learning with Neural Label Embedding for Device Adaptation in Acoustic Scene Classification
A domain adaptation framework is proposed to address the device mismatch issue in acoustic scene classification, leveraging neural label embedding (NLE) and relational teacher-student learning (RTSL); experimental results confirm the effectiveness of the proposed approach in mismatched conditions.
Variational Bayesian Estimation and Clustering for Speech Recognition
Variational Bayesian estimation and clustering for speech recognition (VBEC) enables robust speech classification based on Bayesian predictive classification using VB posterior distributions, and it totally mitigates over-training effects while achieving high word accuracies.
Distilling the Knowledge in a Neural Network
This work shows that the acoustic model of a heavily used commercial system can be significantly improved by distilling the knowledge in an ensemble of models into a single model, and it introduces a new type of ensemble composed of one or more full models and many specialist models that learn to distinguish fine-grained classes the full models confuse.
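Distillation trains the student on the teacher's softened class probabilities, obtained by dividing the logits by a temperature T before the softmax. A minimal sketch in plain Python follows; the function names are hypothetical, only the soft-target term is shown (the paper also combines it with a standard loss on the hard labels), and the T² factor follows the paper's advice to keep soft-target gradient magnitudes comparable across temperatures.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; larger temperatures give flatter distributions."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature):
    """Cross-entropy of the student's softened outputs against the teacher's
    soft targets, scaled by T^2."""
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    ce = -sum(t * math.log(s) for t, s in zip(p_teacher, p_student))
    return temperature ** 2 * ce

teacher = [5.0, 2.0, 0.5]   # hypothetical teacher logits
student = [3.0, 2.5, 1.0]   # hypothetical student logits
print(softmax(teacher, 1.0))
print(softmax(teacher, 4.0))  # flatter: more weight on the non-argmax classes
print(distillation_loss(student, teacher, temperature=4.0))
```

The higher temperature exposes the relative probabilities the teacher assigns to wrong classes, which is the extra information the student learns from.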
Variational Information Distillation for Knowledge Transfer
An information-theoretic framework for knowledge transfer is proposed that formulates knowledge transfer as maximizing the mutual information between the teacher and student networks; it consistently outperforms existing methods.
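One way to maximize that mutual information in practice is through a variational lower bound: a variational distribution q(t | s) over teacher features t given student features s. Because the teacher's entropy does not depend on the student, maximizing the bound amounts to minimizing the negative log-likelihood of t under q. The sketch below assumes a Gaussian q whose mean is predicted from the student (passed in directly as `student_mean`) and whose per-dimension variance is σ² = softplus(α) + ε, with α learned. All names are hypothetical and constant terms are dropped.

```python
import math

def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return math.log1p(math.exp(-abs(x))) + max(x, 0.0)

def vid_loss(teacher_feat, student_mean, alpha, eps=1e-6):
    """Negative log-likelihood (up to a constant) of teacher features under
    N(student_mean, diag(sigma^2)), with sigma^2 = softplus(alpha) + eps."""
    loss = 0.0
    for t, mu, a in zip(teacher_feat, student_mean, alpha):
        var = softplus(a) + eps
        # log(sigma) + (t - mu)^2 / (2 * sigma^2)
        loss += 0.5 * math.log(var) + (t - mu) ** 2 / (2.0 * var)
    return loss

teacher = [1.0, -0.5, 0.3]
alpha = [0.0, 0.0, 0.0]  # hypothetical learned variance parameters
print(vid_loss(teacher, [0.9, -0.4, 0.3], alpha))  # student mean close to teacher
print(vid_loss(teacher, [0.0, 0.0, 0.0], alpha))   # farther from teacher: larger loss
```

The learned variance lets the loss down-weight feature dimensions the student cannot predict well, instead of forcing an exact match on every dimension.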
Variational Information Bottleneck for Effective Low-resource Audio Classification
The VIB framework is ready to use and can easily be combined with many other state-of-the-art network architectures, and it significantly outperforms baseline methods in classification accuracy in some low-resource settings.