Multimodal Distributional Semantics

@article{Bruni2014MultimodalDS,
  title={Multimodal Distributional Semantics},
  author={Elia Bruni and Nam Khanh Tran and Marco Baroni},
  journal={J. Artif. Intell. Res.},
  year={2014},
  volume={49},
  pages={1--47}
}
Distributional semantic models derive computational representations of word meaning from the patterns of co-occurrence of words in text. Such models have been a success story of computational linguistics, being able to provide reliable estimates of semantic relatedness for the many semantic tasks requiring them. However, distributional models extract meaning information exclusively from text, which is an extremely impoverished basis compared to the rich perceptual sources that ground human… 
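The text-only approach the abstract describes can be sketched concretely: count how often words co-occur within a context window, then compare the resulting context vectors. Below is a minimal sketch with a toy corpus; the corpus, window size, and raw counts are illustrative assumptions (real models use large corpora and association weighting such as PMI, not raw counts).

```python
import numpy as np

def cooccurrence_matrix(corpus, window=2):
    """Count word co-occurrences within a symmetric context window."""
    vocab = sorted({w for sent in corpus for w in sent})
    index = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(vocab)))
    for sent in corpus:
        for i, w in enumerate(sent):
            lo, hi = max(0, i - window), min(len(sent), i + window + 1)
            for j in range(lo, hi):
                if i != j:
                    counts[index[w], index[sent[j]]] += 1
    return vocab, index, counts

def cosine(u, v):
    """Cosine similarity between two context vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

corpus = [
    "the cat chased the mouse".split(),
    "the dog chased the cat".split(),
    "the mouse ate the cheese".split(),
]
vocab, index, M = cooccurrence_matrix(corpus)

# Words that share contexts ("cat" and "dog" both follow "the" and
# precede/follow "chased") end up with similar rows in the matrix.
sim = cosine(M[index["cat"]], M[index["dog"]])
```

In this toy setting, "cat" and "dog" come out far more similar than "cat" and "cheese", which is the distributional effect the paper's text-based baseline relies on.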

Citations

Learning visually grounded meaning representations
TLDR
An approach is presented which draws inspiration from the successful application of attribute classifiers in image classification, and represents images and the concepts depicted by them by automatically predicted visual attributes, which can act as a substitute for human-produced attributes without any critical loss of information.
From distributional semantics to feature norms: grounding semantic models in human perceptual data
TLDR
This work presents an automatic method for predicting feature norms for new concepts by learning a mapping from a text-based distributional semantic space to a space built using feature norms, which is able to generalise feature-based concept representations.
Seeing the advantage: visually grounding word embeddings to better capture human semantic knowledge
TLDR
This paper creates visually grounded word embeddings by combining English text and images and compares them to popular text-based methods, to see if visual information allows the model to better capture cognitive aspects of word meaning.
Modeling the Structure and Dynamics of Semantic Processing
TLDR
The results indicate that bringing together distributional semantic networks and spreading of activation provides a good fit to both automatic lexical processing as well as more deliberate processing, above and beyond what has been reported for previous models that take into account only similarity resulting from network structure.
Exploitation of Co-reference in Distributional Semantics
TLDR
Two fundamentally different kinds of information that co-reference contributes to the distribution of words are identified, and both its general potential to improve distributional semantic models and several more specific hypotheses are examined.
Grounding Distributional Semantics in the Visual World
TLDR
This article reviews how methods from computer vision are exploited to tackle the fundamental problem of grounding distributional semantic models, bringing them closer to providing a full-fledged computational account of meaning.
Crossmodal Network-Based Distributional Semantic Models
TLDR
This work proposes the crossmodal extension of a two-tier text-based model, where semantic representations are encoded in the first layer, while the second layer is used for computing similarity between words.
Constructing Semantic Models From Words, Images, and Emojis
TLDR
This work improves on visual and affective representations, derived from state-of-the-art existing models, by choosing models that best fit available human semantic data and extending the number of concepts they cover. It finds that, given specific weights assigned to the models, adding both visual and affective representations improves performance.
Visually Grounded Meaning Representations
TLDR
A new model which uses stacked autoencoders to learn higher-level representations from textual and visual input is introduced which yields a better fit to behavioral data compared to baselines and related models which either rely on a single modality or do not make use of attribute-based input.

References

Showing 1-10 of 189 references
Distributional semantics with eyes: using image analysis to improve computational representations of word meaning
TLDR
This work claims that image analysis techniques can "return the favor" to the text processing community and be successfully used for a general-purpose representation of word meaning and shows how distinguishing between a concept and its context in images can improve the quality of the word meaning representations extracted from images.
Redundancy in Perceptual and Linguistic Experience: Comparing Feature-Based and Distributional Models of Semantic Representation
TLDR
It is argued that the amount of perceptual and other semantic information that can be learned from purely distributional statistics has been underappreciated and that future focus should be on understanding the cognitive mechanisms humans use to integrate the two sources.
Grounded Models of Semantic Representation
TLDR
Experimental results show that a closer correspondence to human data can be obtained by uncovering latent information shared among the textual and perceptual modalities rather than arriving at semantic knowledge by concatenating the two.
Distributional Semantics in Technicolor
TLDR
While visual models with state-of-the-art computer vision techniques perform worse than textual models in general tasks, they are as good or better models of the meaning of words with visual correlates such as color terms, even in a nontrivial task that involves nonliteral uses of such words.
Distributional Learning of Appearance
TLDR
The viability of distributional learning of appearance is assessed by looking at the performance of a computer system that interpolates, on the basis of distributional and appearance similarity, from words whose appearance it has been explicitly taught, in order to identify and name objects it has not been taught about.
Topics in semantic representation.
TLDR
This article analyzes the abstract computational problem underlying the extraction and use of gist, formulating this problem as a rational statistical inference that leads to a novel approach to semantic representation in which word meanings are represented in terms of a set of probabilistic topics.
The Distributional Hypothesis
TLDR
There is a correlation between distributional similarity and meaning similarity, which allows us to use the former to estimate the latter; one can then pose two very basic questions concerning the distributional hypothesis: what kinds of distributional properties to look for, and what differences, if any, hold between different kinds of distributional properties.
Extracting semantic representations from word co-occurrence statistics: stop-lists, stemming, and SVD
TLDR
This article investigates the use of three further factors, namely the application of stop-lists, word stemming, and dimensionality reduction using singular value decomposition (SVD), that have been used to provide improved performance elsewhere, and introduces an additional semantic task and explores the advantages of using a much larger corpus.
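The pipeline that entry describes (stop-list filtering, stemming, then SVD-based dimensionality reduction of the co-occurrence matrix) can be sketched as follows. The toy corpus, stop-list, and crude suffix-stripping stemmer are illustrative assumptions, not the cited paper's actual setup.

```python
import numpy as np

STOP = {"the", "a", "of", "and"}  # toy stop-list (assumption)

def stem(word):
    """Crude suffix-stripping stemmer, for illustration only."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def preprocess(sentences):
    """Drop stop words, then stem what remains."""
    return [[stem(w) for w in s if w not in STOP] for s in sentences]

corpus = preprocess([
    "the cats chased the mice".split(),
    "the dogs chased the cats".split(),
])

# Build a simple sentence-level co-occurrence matrix over the cleaned corpus.
vocab = sorted({w for s in corpus for w in s})
idx = {w: i for i, w in enumerate(vocab)}
M = np.zeros((len(vocab), len(vocab)))
for s in corpus:
    for i, w in enumerate(s):
        for j, c in enumerate(s):
            if i != j:
                M[idx[w], idx[c]] += 1

# Dimensionality reduction with truncated SVD: keep the top-k singular
# directions as dense word vectors.
k = 2
U, S, Vt = np.linalg.svd(M)
vectors = U[:, :k] * S[:k]  # k-dimensional word embeddings
```

Stop-listing shrinks the vocabulary to content words, stemming merges inflected variants ("cats"/"cat"), and SVD compresses the sparse count matrix into dense low-dimensional vectors, the same three factors the entry evaluates.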
A word at a time: computing word relatedness using temporal semantic analysis
TLDR
This paper proposes a new semantic relatedness model, Temporal Semantic Analysis (TSA), which captures temporal information in word semantics as a vector of concepts over a corpus of temporally ordered documents.
Integrating experiential and distributional data to learn semantic representations.
TLDR
Using a Bayesian probabilistic model, the authors demonstrate how word meanings can be learned by treating experiential and distributional data as a single joint distribution and learning the statistical structure that underlies it.