Latent Normalizing Flows for Many-to-Many Cross-Domain Mappings
@article{Mahajan2020LatentNF,
  title   = {Latent Normalizing Flows for Many-to-Many Cross-Domain Mappings},
  author  = {Shweta Mahajan and Iryna Gurevych and Stefan Roth},
  journal = {ArXiv},
  year    = {2020},
  volume  = {abs/2002.06661}
}
Learned joint representations of images and text form the backbone of several important cross-domain tasks such as image captioning. Prior work mostly maps both domains into a common latent representation in a purely supervised fashion. This is rather restrictive, however, as the two domains follow distinct generative processes. Therefore, we propose a novel semi-supervised framework, which models shared information between domains and domain-specific information separately. The information…
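The normalizing flows that model the latent distributions in such frameworks are built from invertible coupling layers. A minimal RealNVP-style affine coupling step is sketched below (an illustrative sketch only, not the authors' implementation; all names and the toy "networks" are hypothetical):

```python
import numpy as np

def affine_coupling_forward(z, scale_net, shift_net):
    """One affine coupling step: split z, transform one half conditioned
    on the other. Invertible by construction; log|det J| = sum of log-scales."""
    z1, z2 = np.split(z, 2, axis=-1)
    s = scale_net(z1)                 # log-scale, depends only on untouched half
    t = shift_net(z1)                 # shift, depends only on untouched half
    y2 = z2 * np.exp(s) + t
    log_det = np.sum(s, axis=-1)
    return np.concatenate([z1, y2], axis=-1), log_det

def affine_coupling_inverse(y, scale_net, shift_net):
    """Exact inverse: recompute s, t from the untouched half and undo."""
    y1, y2 = np.split(y, 2, axis=-1)
    s = scale_net(y1)
    t = shift_net(y1)
    z2 = (y2 - t) * np.exp(-s)
    return np.concatenate([y1, z2], axis=-1)

# Toy "networks": any functions of the untouched half keep the map invertible.
rng = np.random.default_rng(0)
W_s, W_t = rng.normal(size=(2, 2)), rng.normal(size=(2, 2))
scale_net = lambda x: np.tanh(x @ W_s)   # bounded log-scale for stability
shift_net = lambda x: x @ W_t

z = rng.normal(size=(4,))
y, log_det = affine_coupling_forward(z, scale_net, shift_net)
z_rec = affine_coupling_inverse(y, scale_net, shift_net)
assert np.allclose(z, z_rec)             # exact invertibility
```

Stacking such layers (with the halves alternating between layers) yields an expressive yet exactly invertible density on the latent space, which is what enables likelihood-based training of the shared representation.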
25 Citations
Diverse Image Captioning with Context-Object Split Latent Spaces
- 2020
Computer Science
NeurIPS
This work introduces a novel factorization of the latent space, termed context-object split, to model diversity in contextual descriptions across images and texts within the dataset, and extends this to images with novel objects and without paired captions in the training data.
Cross-Domain Latent Modulation for Variational Transfer Learning
- 2021
Computer Science
2021 IEEE Winter Conference on Applications of Computer Vision (WACV)
A cross-domain latent modulation mechanism within a variational autoencoder (VAE) framework to enable improved transfer learning, showing competitive performance in unsupervised domain adaptation and image-to-image translation.
Variational Transfer Learning using Cross-Domain Latent Modulation
- 2022
Computer Science
ArXiv
This work proposes to introduce a novel cross-domain latent modulation mechanism to a variational autoencoder framework so as to achieve effective transfer learning and demonstrates competitive performance.
Decoupling Global and Local Representations via Invertible Generative Flows
- 2021
Computer Science
This work demonstrates that with only architectural inductive biases, a generative model with a likelihood-based objective is capable of learning decoupled representations, requiring no explicit supervision.
Gradual Domain Adaptation via Normalizing Flows
- 2022
Computer Science
ArXiv
This work generates pseudo intermediate domains from normalizing flows and then uses them for gradual domain adaptation, which mitigates the problem of a large gap between source and target domains and improves the classification performance.
Constrained Density Matching and Modeling for Cross-lingual Alignment of Contextualized Representations
- 2022
Computer Science
ArXiv
This work introduces supervised and unsupervised density-based approaches, named Real-NVP and GAN-Real-NVP and driven by normalizing flows, to perform alignment; both dissect the alignment of multilingual subspaces into density matching and density modeling, and are complemented with validation criteria to guide the training process.
Learning Distinct and Representative Modes for Image Captioning
- 2022
Computer Science
NeurIPS
The innovative idea is to explore the rich modes in the training caption corpus to learn a set of mode embeddings, and further use them to control the mode of the generated captions for existing image captioning models, leading to better performance for both diversity and quality on the MSCOCO dataset.
Diverse Image Captioning with Grounded Style
- 2021
Computer Science
GCPR
The limitations of current stylized captioning datasets are analyzed, and COCO attribute-based augmentations are proposed to obtain varied stylized captions from COCO annotations, generating accurate captions with diverse styles that are grounded in the image.
Can Kernel Transfer Operators Help Flow Based Generative Models?
- 2020
Computer Science
This paper shows that a mapping to an RKHS enables deploying mature ideas from the kernel methods literature for flow-based generative models, and empirically demonstrates that this simple idea yields competitive results on popular datasets such as CelebA and promising results on a public 3D brain imaging dataset where the sample sizes are much smaller.
SceneTrilogy: On Scene Sketches and its Relationship with Text and Photo
- 2022
Computer Science
ArXiv
We for the first time extend multi-modal scene understanding to include free-hand scene sketches. This uniquely results in a trilogy of scene data modalities (sketch, text, and photo),…
49 References
M3D-GAN: Multi-Modal Multi-Domain Translation with Universal Attention
- 2019
Computer Science
ArXiv
A unified model, M3D-GAN, that can translate across a wide range of modalities and domains, and introduces a universal attention module that is jointly trained with the whole network and learns to encode a large range of domain information into a highly structured latent space.
Sequential Latent Spaces for Modeling the Intention During Diverse Image Captioning
- 2019
Computer Science
2019 IEEE/CVF International Conference on Computer Vision (ICCV)
This work proposes Seq-CVAE, which learns a latent space for every word and encourages this temporal latent space to capture the 'intention' of how to complete the sentence by mimicking a representation which summarizes the future.
Towards Diverse and Natural Image Descriptions via a Conditional GAN
- 2017
Computer Science
2017 IEEE International Conference on Computer Vision (ICCV)
A new framework based on Conditional Generative Adversarial Networks (CGAN) is proposed, which jointly learns a generator to produce descriptions conditioned on images and an evaluator to assess how well a description fits the visual content.
Diverse and Accurate Image Description Using a Variational Auto-Encoder with an Additive Gaussian Encoding Space
- 2017
Computer Science
NIPS
Two models are proposed that explicitly structure the latent space around K components corresponding to different types of image content, and combine components to create priors for images that contain multiple types of content simultaneously (e.g., several kinds of objects).
Deep Visual-Semantic Alignments for Generating Image Descriptions
- 2017
Computer Science
IEEE Transactions on Pattern Analysis and Machine Intelligence
A model that generates natural language descriptions of images and their regions based on a novel combination of Convolutional Neural Networks over image regions, bidirectional Recurrent Neural networks over sentences, and a structured objective that aligns the two modalities through a multimodal embedding is presented.
Semantics Disentangling for Text-To-Image Generation
- 2019
Computer Science
2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
A novel photo-realistic text-to-image generation model that implicitly disentangles semantics to both fulfill the high-level semantic consistency and low-level semantic diversity, and a visual-semantic embedding strategy by semantic-conditioned batch normalization to find diverse low-level semantics.
Learning Two-Branch Neural Networks for Image-Text Matching Tasks
- 2019
Computer Science
IEEE Transactions on Pattern Analysis and Machine Intelligence
This paper investigates two-branch neural networks for learning image-text similarity, applied to image-sentence matching and region-phrase matching, and proposes two network structures that produce different output representations.
Look, Imagine and Match: Improving Textual-Visual Cross-Modal Retrieval with Generative Models
- 2018
Computer Science
2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition
This work proposes to incorporate generative processes into the cross-modal feature embedding, through which it is able to learn not only the global abstract features but also the local grounded features of image-text pairs.
Latent Normalizing Flows for Discrete Sequences
- 2019
Computer Science
ICML
A VAE-based generative model is proposed which jointly learns a normalizing flow-based distribution in the latent space and a stochastic mapping to an observed discrete space; in this setting, it is found to be crucial for the flow-based distribution to be highly multimodal.
NICE: Non-linear Independent Components Estimation
- 2015
Computer Science, Mathematics
ICLR
We propose a deep learning framework for modeling complex high-dimensional densities called Non-linear Independent Component Estimation (NICE). It is based on the idea that a good representation is…
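The additive coupling layer at the heart of NICE (the volume-preserving special case of the affine coupling later used in RealNVP) can be sketched as follows (an illustrative sketch, not the paper's code; the toy coupling function `m` is hypothetical):

```python
import numpy as np

def additive_coupling(x):
    """NICE additive coupling: y1 = x1, y2 = x2 + m(x1).
    The Jacobian is triangular with unit diagonal, so log|det J| = 0
    (volume-preserving)."""
    x1, x2 = np.split(x, 2, axis=-1)
    return np.concatenate([x1, x2 + m(x1)], axis=-1)

def additive_coupling_inv(y):
    """Exact inverse: subtract the same m(y1) from the transformed half."""
    y1, y2 = np.split(y, 2, axis=-1)
    return np.concatenate([y1, y2 - m(y1)], axis=-1)

# m can be an arbitrary, even non-invertible, function; a toy layer here.
rng = np.random.default_rng(1)
W = rng.normal(size=(3, 3))
m = lambda h: np.tanh(h @ W)

x = rng.normal(size=(6,))
y = additive_coupling(x)
assert np.allclose(x, additive_coupling_inv(y))  # exact invertibility
```

Because the determinant of each layer is 1, the density of the data is obtained simply by evaluating the prior at the transformed point, which is what makes NICE trainable by exact maximum likelihood.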