Corpus ID: 67856470

Latent Translation: Crossing Modalities by Bridging Generative Models

@article{Tian2019LatentTC,
  title={Latent Translation: Crossing Modalities by Bridging Generative Models},
  author={Yingtao Tian and Jesse Engel},
  journal={ArXiv},
  year={2019},
  volume={abs/1902.08261}
}
End-to-end optimization has achieved state-of-the-art performance on many specific problems, but there is no straightforward way to combine pretrained models for new problems. Here, we explore improving modularity by learning a post-hoc interface between two existing models to solve a new task. Specifically, we take inspiration from neural machine translation, and cast the challenging problem of cross-modal domain transfer as unsupervised translation between the latent spaces of pretrained…
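The abstract describes learning a post-hoc interface between the latent spaces of two pretrained models. The paper's actual method is an unsupervised shared-VAE bridge; as a deliberately simplified stand-in (not the paper's algorithm), the sketch below fits a supervised linear map between two synthetic latent spaces with least squares, just to make the "bridge between latents" idea concrete. All names and shapes here are illustrative.

```python
import numpy as np

# Toy illustration only: the paper learns an *unsupervised* shared-VAE bridge;
# here we cheat with paired samples and fit a linear bridge z_B ~ z_A @ W.
rng = np.random.default_rng(0)
true_W = rng.normal(size=(4, 4))                 # hypothetical relation between spaces
z_a = rng.normal(size=(256, 4))                  # latents from pretrained model A
z_b = z_a @ true_W.T + 0.01 * rng.normal(size=(256, 4))  # latents from model B

# Least-squares solve for the bridge: minimizes ||z_a @ W - z_b||^2.
W, *_ = np.linalg.lstsq(z_a, z_b, rcond=None)
translated = z_a @ W                             # map A-latents into B's latent space
err = float(np.mean((translated - z_b) ** 2))
print(f"bridge MSE: {err:.5f}")
```

Decoding `translated` with model B's decoder would then realize the cross-modal transfer; the paper replaces the linear map with a shared VAE trained without paired data.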
Variational Mixture-of-Experts Autoencoders for Multi-Modal Deep Generative Models
This work proposes a mixture-of-experts multi-modal variational autoencoder (MMVAE) for learning of generative models on different sets of modalities, including a challenging image language dataset, and demonstrates its ability to satisfy all four criteria, both qualitatively and quantitatively.
Multimodal Variational Autoencoders for Semi-Supervised Learning: In Defense of Product-of-Experts
A novel product-of-experts (PoE) based variational autoencoder that has these desired properties is proposed and an empirical evaluation shows that the PoE based models can outperform the contrasted models.
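The product-of-experts fusion referenced above has a closed form for Gaussian experts: the product of Gaussians is Gaussian, with summed precisions and a precision-weighted mean. The function below is a generic sketch of that standard identity, not code from the cited paper; the names are illustrative.

```python
import numpy as np

def poe_fuse(mus, logvars):
    """Fuse per-modality Gaussian posteriors N(mu_i, var_i) into a single Gaussian.

    The fused precision is the sum of expert precisions, and the fused mean
    is the precision-weighted average of expert means.
    """
    mus = np.asarray(mus, dtype=float)
    precisions = np.exp(-np.asarray(logvars, dtype=float))  # 1 / var_i
    var = 1.0 / precisions.sum(axis=0)                      # fused variance
    mu = var * (precisions * mus).sum(axis=0)               # precision-weighted mean
    return mu, var

# Two equally confident experts at 0 and 2: fused mean 1.0, variance halved to 0.5.
mu, var = poe_fuse(mus=[[0.0], [2.0]], logvars=[[0.0], [0.0]])
print(mu, var)
```

This is why PoE posteriors sharpen as more modalities are observed: each extra expert adds precision.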
Multimodal Few-Shot Learning with Frozen Language Models
The resulting system is a multimodal few-shot learner, with the surprising ability to learn a variety of new tasks when conditioned on examples, represented as a sequence of multiple interleaved image and text embeddings.
Self-supervised Disentanglement of Modality-Specific and Shared Factors Improves Multimodal Generative Models
A new multimodal generative model is introduced that integrates both modality-specific and shared factors and aggregates shared information across any subset of modalities efficiently and learns to disentangle these in a purely self-supervised manner.
Face-to-Music Translation Using a Distance-Preserving Generative Adversarial Network with an Auxiliary Discriminator
This work discovers that the distance preservation constraint in the generative adversarial model leads to reduced diversity in the translated audio samples, and proposes the use of an auxiliary discriminator to enhance the diversity of the translations while using the distance preservation constraint.
Optimal Unsupervised Domain Translation
A novel framework to efficiently compute optimal mappings in a dynamical setting that generalizes previous methods and enables a more explicit control over the computed optimal mapping and provides smooth interpolations between the two domains.
Multimodal Generative Learning Utilizing Jensen-Shannon-Divergence
A novel, efficient objective function that utilizes the Jensen-Shannon divergence for multiple distributions and simultaneously approximates the unimodal and joint multimodal posteriors directly via a dynamic prior is proposed.
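The generalized Jensen-Shannon divergence for multiple distributions mentioned above is the weighted mean of each distribution's KL divergence to their mixture. The sketch below implements that textbook definition for discrete distributions; it is a generic illustration, not the cited paper's (variational, dynamic-prior) objective.

```python
import numpy as np

def kl(p, q):
    """KL divergence between discrete distributions (q must be > 0 wherever p > 0)."""
    p, q = np.asarray(p, dtype=float), np.asarray(q, dtype=float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def js_divergence(dists, weights=None):
    """Generalized JS divergence: weighted mean KL of each distribution to the mixture."""
    dists = [np.asarray(d, dtype=float) for d in dists]
    if weights is None:
        weights = np.full(len(dists), 1.0 / len(dists))
    mixture = sum(w * d for w, d in zip(weights, dists))
    return float(sum(w * kl(d, mixture) for w, d in zip(weights, dists)))

p = [0.5, 0.5, 0.0]
q = [0.0, 0.5, 0.5]
print(js_divergence([p, q]))  # 0.5 * ln(2) for these half-overlapping distributions
```

Unlike plain KL, this quantity is symmetric in its arguments and bounded, which is what makes it attractive as a multi-distribution objective.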
AudioViewer: Learning to Visualize Sound (DRAFT: December 24, 2020)
Sensory substitution can help persons with perceptual deficits. In this work, we attempt to visualize audio with video. Our long-term goal is to create sound perception for hearing-impaired people,…
A VAE Conversion Method for Heterogeneous Data Inputs to Create Uniform Outputs for Diagnosis
The results show that the two-stage architecture can effectively perform anomaly detection and help to deal with heterogeneous data and model variability in multi-sensor systems.
Deep learning and the Global Workspace Theory
This work proposes a roadmap based on unsupervised neural translation between multiple latent spaces (neural networks trained for distinct tasks, on distinct sensory inputs and/or modalities) to create a unique, amodal Global Latent Workspace (GLW).

References

SHOWING 1-10 OF 42 REFERENCES
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
A new language representation model, BERT, designed to pre-train deep bidirectional representations from unlabeled text by jointly conditioning on both left and right context in all layers, which can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of tasks.
beta-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework
Learning an interpretable factorised representation of the independent data generative factors of the world without supervision is an important precursor for the development of artificial…
Towards Diverse and Natural Image Descriptions via a Conditional GAN
A new framework based on Conditional Generative Adversarial Networks (CGAN) is proposed, which jointly learns a generator to produce descriptions conditioned on images and an evaluator to assess how well a description fits the visual content.
A Hierarchical Latent Vector Model for Learning Long-Term Structure in Music
This work proposes the use of a hierarchical decoder, which first outputs embeddings for subsequences of the input and then uses these embeddings to generate each subsequence independently, thereby avoiding the "posterior collapse" problem, which remains an issue for recurrent VAEs.
Twin-GAN - Unpaired Cross-Domain Image Translation with Weight-Sharing GANs
  • J. Li
  • Computer Science, Mathematics
  • ArXiv
  • 2018
A framework for translating unlabeled images from one domain into analog images in another domain is presented, and it is shown that it is capable of learning semantic mappings for face images with no supervised one-to-one image mapping.
Latent Constraints: Learning to Generate Conditionally from Unconditional Generative Models
This paper develops a method to condition generation without retraining the model, combining attribute constraints with a universal "realism" constraint, which enforces similarity to the data distribution, and generates realistic conditional images from an unconditional variational autoencoder.
Unsupervised Image-to-Image Translation with Stacked Cycle-Consistent Adversarial Networks
Novel Stacked Cycle-Consistent Adversarial Networks (SCANs) are proposed by decomposing a single translation into multi-stage transformations, which not only boost the image translation quality but also enable higher-resolution image-to-image translation in a coarse-to-fine fashion.
Unsupervised Image-to-Image Translation Networks
This work makes a shared-latent space assumption and proposes an unsupervised image-to-image translation framework based on Coupled GANs that achieves state-of-the-art performance on benchmark datasets.
Large Scale GAN Training for High Fidelity Natural Image Synthesis
It is found that applying orthogonal regularization to the generator renders it amenable to a simple "truncation trick," allowing fine control over the trade-off between sample fidelity and variety by reducing the variance of the Generator's input.
Unsupervised Neural Machine Translation
This work proposes a novel method to train an NMT system in a completely unsupervised manner, relying on nothing but monolingual corpora, and consists of a slightly modified attentional encoder-decoder model that can be trained on monolingual corpora alone using a combination of denoising and backtranslation.