Cyclical Annealing Schedule: A Simple Approach to Mitigating KL Vanishing

@inproceedings{Fu2019CyclicalAS,
  title={Cyclical Annealing Schedule: A Simple Approach to Mitigating KL Vanishing},
  author={Hao Fu and Chunyuan Li and Xiaodong Liu and Jianfeng Gao and Asli Celikyilmaz and Lawrence Carin},
  booktitle={NAACL},
  year={2019}
}
  • Hao Fu, Chunyuan Li, Xiaodong Liu, Jianfeng Gao, Asli Celikyilmaz, Lawrence Carin
  • Published in NAACL 2019
  • Computer Science
  • Variational autoencoders (VAEs) with an auto-regressive decoder have been applied to many natural language processing (NLP) tasks. The VAE objective consists of two terms, a reconstruction term and a KL regularization term, balanced by a weighting hyper-parameter β. One notorious training difficulty is that the KL term tends to vanish. In this paper we study different scheduling schemes for β, and show that KL vanishing is caused by the lack of good latent codes in training the decoder at the beginning of optimization. To remedy this, we propose a cyclical annealing schedule, which repeats the process of increasing β multiple times.
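The schedule re-weights the β-weighted VAE objective, E_{q(z|x)}[log p(x|z)] − β · KL(q(z|x) ‖ p(z)): β is annealed from 0 to 1, and repeating that ramp over several cycles lets later cycles reuse the informative latent codes learned in earlier ones. The following is a minimal sketch of such a cyclical schedule, assuming a linear ramp and the defaults reported in the paper (M = 4 cycles, with β increasing over the first R = 0.5 of each cycle); function and variable names are illustrative, not the authors' released API.

import math

def cyclical_beta(step, total_steps, n_cycles=4, ratio=0.5):
    """Cyclical annealing: beta ramps linearly from 0 to 1 over the
    first `ratio` fraction of each cycle, then stays at 1."""
    period = math.ceil(total_steps / n_cycles)  # steps per cycle
    tau = (step % period) / period              # position within the current cycle, in [0, 1)
    return min(1.0, tau / ratio)                # linear ramp, clipped at 1

In a training loop the schedule simply re-weights the KL term each step, e.g. loss = reconstruction_loss + cyclical_beta(step, total_steps) * kl_term; setting n_cycles=1 recovers a standard monotonic annealing baseline.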
    81 Citations, including:
    • Discretized Bottleneck in VAE: Posterior-Collapse-Free Sequence-to-Sequence Learning
    • Enhancing Variational Autoencoders with Mutual Information Neural Estimation for Text Generation
    • Discrete Auto-regressive Variational Attention Models for Text Modeling
    • Preventing Posterior Collapse with Levenshtein Variational Autoencoder
    • Discrete Variational Attention Models for Language Generation
    • Optimus: Organizing Sentences via Pre-trained Modeling of a Latent Space
    • Preventing Posterior Collapse in Sequence VAEs with Pooling
    • A Surprisingly Effective Fix for Deep Latent Variable Modeling of Text
    • A Batch Normalized Inference Network Keeps the KL Vanishing Away
    • Flexible Text Modeling with Semi-Implicit Latent Representations
