Is the Elephant Flying? Resolving Ambiguities in Text-to-Image Generative Models

@article{Mehrabi2022IsTE,
  title={Is the Elephant Flying? Resolving Ambiguities in Text-to-Image Generative Models},
  author={Ninareh Mehrabi and Palash Goyal and Apurv Verma and J. Dhamala and Varun Kumar and Q. P. Hu and Kai-Wei Chang and Richard S. Zemel and A. G. Galstyan and Rahul Gupta},
  journal={ArXiv},
  year={2022},
  volume={abs/2211.12503}
}
Natural language often contains ambiguities that can lead to misinterpretation and miscommunication. While humans can handle ambiguities effectively by asking clarifying questions and/or relying on contextual cues and common-sense knowledge, resolving ambiguities can be notoriously hard for machines. In this work, we study ambiguities that arise in text-to-image generative models. We curate a benchmark dataset covering different types of ambiguities that occur in these systems. We then…

References


VISA: An Ambiguous Subtitles Dataset for Visual Scene-aware Machine Translation

This work introduces VISA, a new dataset of 40k Japanese-English parallel sentence pairs and corresponding video clips, in which the sentences have multiple possible translations with different meanings. The dataset is divided into Polysemy and Omission according to the cause of ambiguity.

Word Sense Disambiguation: Towards Interactive Context Exploitation from Both Word and Sense Perspectives

This work converts the nearly isolated decisions into interrelated ones by exposing senses in context when learning sense embeddings in a similarity-based Sense Aware Context Exploitation (SACE) architecture and enhances the context embedding learning with selected sentences from the same document, rather than utilizing only the sentence where each ambiguous word appears.

Do You See What I Mean? Visual Resolution of Linguistic Ambiguities

This work introduces a new multimodal corpus containing ambiguous sentences, representing a wide range of syntactic, semantic and discourse ambiguities, coupled with videos that visualize the different interpretations for each sentence.

Database Search Results Disambiguation for Task-Oriented Dialog Systems

Training on augmented dialog data improves the model’s ability to deal with ambiguous scenarios, without sacrificing performance on unmodified turns, and helps the model to improve performance on DSR-disambiguation even in the absence of in-domain data, suggesting that it can be learned as a universal dialog skill.

Calibrate Before Use: Improving Few-Shot Performance of Language Models

This work first estimates the model's bias towards each answer by asking for its prediction when given the training prompt and a content-free test input such as "N/A", and then fits calibration parameters that cause the prediction for this input to be uniform across answers.

Jam or Cream First? Modeling Ambiguity in Neural Machine Translation with SCONES

The softmax layer in neural machine translation is replaced with a multi-label classification layer that can model ambiguity more effectively, trained with the Single-label Contrastive Objective for Non-Exclusive Sequences (SCONES) loss function, which yields consistent BLEU score gains across six translation directions.

AmbigQA: Answering Ambiguous Open-domain Questions

This paper introduces AmbigQA, a new open-domain question answering task which involves predicting a set of question-answer pairs, where every plausible answer is paired with a disambiguated rewrite of the original question.

It’s Not Just Size That Matters: Small Language Models Are Also Few-Shot Learners

This work shows that performance similar to GPT-3 can be obtained with language models that are much “greener” in that their parameter count is several orders of magnitude smaller, and identifies key factors required for successful natural language understanding with small language models.

Language Models are Unsupervised Multitask Learners

It is demonstrated that language models begin to learn these tasks without any explicit supervision when trained on a new dataset of millions of webpages called WebText, suggesting a promising path towards building language processing systems which learn to perform tasks from their naturally occurring demonstrations.

Scaling Autoregressive Models for Content-Rich Text-to-Image Generation

The Pathways Autoregressive Text-to-Image (Parti) model is presented, which generates high-fidelity photorealistic images and supports content-rich synthesis involving complex compositions and world knowledge; the work also explores and highlights limitations of the models.