Imagination-Augmented Natural Language Understanding

Yujie Lu, Wanrong Zhu, Xin Wang, Miguel P. Eckstein, William Yang Wang
Human brains integrate linguistic and perceptual information simultaneously to understand natural language, and hold the critical ability to render imaginations. Such abilities enable us to construct new abstract concepts or concrete objects, and are essential for applying practical knowledge to solve problems in low-resource scenarios. However, most existing methods for Natural Language Understanding (NLU) focus mainly on textual signals; they do not simulate human visual imagination…



Learning Transferable Visual Models From Natural Language Supervision
We use the 12 datasets from the well-studied evaluation suite introduced by (Kornblith et al., 2019) and add 15 additional datasets in order to assess the performance of models on a wider variety of…
Vokenization: Improving Language Understanding via Contextualized, Visually-Grounded Supervision
A technique named "vokenization" is developed that extrapolates multimodal alignments to language-only data by contextually mapping language tokens to their related images (which the authors call "vokens").
SWAG: A Large-Scale Adversarial Dataset for Grounded Commonsense Inference
This paper introduces the task of grounded commonsense inference, unifying natural language inference and commonsense reasoning, and proposes Adversarial Filtering (AF), a novel procedure that constructs a de-biased dataset by iteratively training an ensemble of stylistic classifiers, and using them to filter the data.
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
A benchmark of nine diverse NLU tasks, an auxiliary dataset for probing models for understanding of specific linguistic phenomena, and an online platform for evaluating and comparing models, which favors models that can represent linguistic knowledge in a way that facilitates sample-efficient learning and effective knowledge-transfer across tasks.
Generative Imagination Elevates Machine Translation
ImagiT first learns to generate a visual representation from the source sentence, and then utilizes both the source sentence and the “imagined representation” to produce a target translation, significantly outperforming text-only neural machine translation baselines.
Imagined Visual Representations as Multimodal Embeddings
This paper presents a simple and effective method that learns a language-to-vision mapping and uses its output visual predictions to build multimodal representations, providing a cognitively plausible way of building representations, consistent with the inherently re-constructive and associative nature of human memory.
Look, Imagine and Match: Improving Textual-Visual Cross-Modal Retrieval with Generative Models
This work proposes to incorporate generative processes into the cross-modal feature embedding, through which it is able to learn not only the global abstract features but also the local grounded features of image-text pairs.
Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank
A Sentiment Treebank that includes fine-grained sentiment labels for 215,154 phrases in the parse trees of 11,855 sentences and presents new challenges for sentiment compositionality, and introduces the Recursive Neural Tensor Network.
Semantic textual similarity benchmark. In Proceedings of the Fourth International Workshop on Semantic Evaluations (SemEval-2007), 2007.
GLUE: A multitask benchmark and analysis platform for natural language understanding, 2018.