Pragmatic Factors in Image Description: The Case of Negations

Emiel van Miltenburg, Roser Morante, Desmond Elliott
We provide a qualitative analysis of the descriptions containing negations (no, not, n't, nobody, etc.) in the Flickr30K corpus, and a categorization of negation uses. Based on this analysis, we provide a set of requirements that an image description system should meet in order to generate sentences containing negations. As a pilot experiment, we used our categorization to manually annotate sentences containing negations in the Flickr30K corpus, with an agreement score of K=0.67. With this paper, we hope to…

Cross-linguistic differences and similarities in image descriptions

A cross-linguistic comparison of Dutch, English, and German image descriptions finds that these descriptions are similar in many respects, but the familiarity of crowd workers with the subjects of the images has a noticeable influence on the specificity of the descriptions.

Vision and Language Learning: From Image Captioning and Visual Question Answering towards Embodied Agents

A new automatic image caption evaluation metric, dubbed SPICE, is presented; it measures the quality of generated captions by analysing their semantic content and shows high correlation with human judgements.

Pragmatic descriptions of perceptual stimuli

A general model of the human image description process is presented, and a road map for future research in automatic image description, and the automatic description of perceptual stimuli in general is proposed.

Linguistic issues behind visual question answering

This paper extracts from pioneering computational linguistic work a list of desiderata that are used to review current computational achievements and claims that further research is needed to get to a unified approach which jointly encompasses all the underlying linguistic problems.

DIDEC: The Dutch Image Description and Eye-tracking Corpus

An initial analysis of self-corrections in image descriptions is provided, and a corpus of spoken Dutch image descriptions, paired with two sets of eye-tracking data is presented, to gain a deeper understanding of the image description task.

Measuring the Diversity of Automatic Image Descriptions

This paper considers the production of generic descriptions as a lack of diversity in the output, which is quantified using established metrics and two new metrics that frame image description as a word recall task, to evaluate system performance on the head of the vocabulary, as well as on the long tail, where system performance degrades.

Talking about other people: an endless range of possibilities

A taxonomy of different ways to talk about other people is presented, which serves as a reference point to think about how other people should be described, and can be used to classify and compute statistics about labels applied to people.

Varying image description tasks: spoken versus written descriptions

This paper investigates whether there are differences between written and spoken image descriptions, even if they are elicited through similar tasks, and compares descriptions produced in two languages, English and Dutch.

On the use of human reference data for evaluating automatic image descriptions

It is argued that there is a need for more detailed guidelines that take into account the needs of visually impaired users, but also the feasibility of generating suitable descriptions, as the quality of current image description datasets is insufficient.

On task effects in NLG corpus elicitation: a replication study using mixed effects modeling

A controlled replication of the study by Van Miltenburg et al. (2018b) contrasting spoken with written descriptions is presented, showing that the effects of modality largely disappear in a controlled setting.



From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions

This work proposes to use the visual denotations of linguistic expressions to define novel denotational similarity metrics, which are shown to be at least as beneficial as distributional similarities for two tasks that require semantic inference.

Unsupervised Learning of Narrative Schemas and their Participants

An unsupervised system for learning narrative schemas, coherent sequences or sets of events whose arguments are filled with participant semantic roles defined over words to improve on previous results in narrative/frame learning and induce rich frame-specific semantic roles.

A Natural History of Negation

This book offers a unique synthesis of past and current work on the structure, meaning, and use of negation and negative expressions, a topic that has engaged thinkers from Aristotle and the Buddha

The Berkeley FrameNet Project

This report will present the project's goals and workflow, and information about the computational tools that have been adapted or created in-house for this work.

Multi30K: Multilingual English-German Image Descriptions

This dataset extends the Flickr30K dataset with i) German translations created by professional translators over a subset of the English descriptions, and ii) descriptions crowdsourced independently of the original English descriptions.

The negation bias: when negations signal stereotypic expectancies.

Findings indicate that by using negations people implicitly communicate stereotypic expectancies and that negations play a subtle but powerful role in stereotype maintenance.

Automatic induction of FrameNet lexical units

This paper investigates the applicability of distributional and WordNet-based models to the task of lexical unit induction, i.e. the expansion of FrameNet with new lexical units, and shows that they achieve a good level of accuracy and coverage, especially when combined.

Scripts, plans, goals and understanding: an inquiry into human knowledge structures

For both people and machines, each in their own way, there is a serious problem in common of making sense out of what they hear, see, or are told about the world. The conceptual apparatus necessary

Stereotyping and Bias in the Flickr30K Dataset

Evidence is presented against the assumption that crowdsourced descriptions of the images in the Flickr30K dataset focus only on the information that can be obtained from the image alone, and a list of biases and unwarranted inferences is provided.


The notion of frames is defended with a number of examples, mostly from English, of different kinds of frame structures; the paper suggests informally and intuitively how the frame concept can figure in the explanation of the communication and comprehension processes, and ends with some hedged speculations on how the study of frames might appear in research on evolution toward language and on the evolution of language.