Visual grounding of abstract and concrete words: A response to Günther et al. (2020)

@article{Shahmohammadi2022VisualGO,
  title={Visual grounding of abstract and concrete words: A response to G{\"u}nther et al. (2020)},
  author={Hassan Shahmohammadi and Maria Heitmeier and Elnaz Shafaei-Bajestan and Hendrik P. A. Lensch and Harald Baayen},
  journal={ArXiv},
  year={2022},
  volume={abs/2206.15381}
}
Current computational models of word meaning mostly rely on textual corpora. While these approaches have been successful over the past decades, their lack of grounding in the real world remains an ongoing problem. In this paper, we focus on visual grounding of word embeddings and address two central questions. First, how can language benefit from vision in the process of visual grounding? And second, is there a link between visual grounding and abstract concepts? We investigate these…
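
In this literature, "visual grounding" typically means learning a mapping from textual word embeddings into an image-feature space on the subset of words that have pictures, and then applying that mapping zero-shot to every word, abstract ones included. The sketch below is a minimal, hypothetical version of that recipe using ridge regression and random stand-in data; it illustrates the general idea, not this paper's specific model.

```python
# Hypothetical sketch of zero-shot visual grounding: all data below is
# random stand-in data, and the ridge-regression map is an assumption,
# not the model used in the paper.
import numpy as np

rng = np.random.default_rng(0)

# 300-d text embeddings and 512-d image features for 1,000 (mostly
# concrete) words that are paired with photos in some image dataset.
text_emb = rng.normal(size=(1000, 300))   # e.g. word2vec/GloVe vectors
img_feat = rng.normal(size=(1000, 512))   # e.g. CNN features of images

# Closed-form ridge regression: W = (X^T X + lam*I)^(-1) X^T Y.
lam = 1.0
W = np.linalg.solve(text_emb.T @ text_emb + lam * np.eye(300),
                    text_emb.T @ img_feat)

# Zero-shot step: the same map projects *any* textual embedding into
# the visual space, including abstract words that were never paired
# with an image during training.
abstract_vec = rng.normal(size=300)   # stand-in for e.g. "justice"
predicted_visual = abstract_vec @ W   # its predicted visual vector
print(predicted_visual.shape)         # (512,)
```

Whether such extrapolated visual information actually benefits abstract words is one of the questions the paper takes up.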


References

Showing 1–10 of 73 references

Language with Vision: a Study on Grounded Word and Sentence Embeddings

A series of evaluations on word similarity benchmarks shows that visual grounding is beneficial not only for concrete words, but also for abstract words, as well as for contextualized embeddings trained on corpora of relatively modest size.

Learning Zero-Shot Multifaceted Visually Grounded Word Embeddings via Multi-Task Training

This paper argues that, because concrete and abstract words are processed differently in the brain, such approaches sacrifice the abstract knowledge obtained from textual statistics while acquiring perceptual information, and that word embeddings should instead be grounded implicitly.

Images of the unseen: extrapolating visual representations for abstract and concrete words in a data-driven computational model.

Results show that participants' judgements were in line with model predictions even for the most abstract words, suggesting that people can tap into prior experience to construct plausible visual representations for words whose referents they have never seen.

Incorporating Visual Semantics into Sentence Representations within a Grounded Space

A model that transfers visual information to textual representations by learning an intermediate representation space, the grounded space, is proposed and shown to outperform the previous state of the art on classification and semantic relatedness tasks.

Vector-Space Models of Semantic Representation From a Cognitive Perspective: A Discussion of Common Misconceptions

This article identifies common misconceptions that arise from incomplete descriptions, outdated arguments, and unclear distinctions between theory and implementation in models of semantic representation, and clarifies and amends these points to provide a theoretical basis for future research and discussion on vector-space models of semantic representation.

Can Language Models Encode Perceptual Structure Without Grounding? A Case Study in Color

A thorough case study on color finds that warmer colors are, on average, better aligned with the perceptual color space than cooler ones, suggesting an intriguing connection to recent findings on efficient communication in color naming.
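
One way this kind of perceptual alignment can be quantified is a representational-similarity comparison: correlate pairwise distances between color-term embeddings with pairwise distances between the corresponding colors in a perceptual space. The sketch below is a hedged illustration with random stand-in embeddings and RGB as a crude proxy for perceptual color space; the paper's own probing protocol may differ.

```python
# Representational-similarity sketch: do distances between color-term
# embeddings mirror distances between the colors themselves? The
# embeddings here are random placeholders, and RGB is only a rough
# stand-in for a perceptual color space such as CIELAB.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

color_terms = ["red", "orange", "yellow", "green", "blue", "purple"]
rgb = np.array([[255, 0, 0], [255, 165, 0], [255, 255, 0],
                [0, 128, 0], [0, 0, 255], [128, 0, 128]], dtype=float)

rng = np.random.default_rng(0)
emb = rng.normal(size=(len(color_terms), 300))  # placeholder embeddings

# Condensed pairwise-distance vectors for both spaces, then a rank
# correlation between them as the alignment score.
rho, p = spearmanr(pdist(emb, "cosine"), pdist(rgb, "euclidean"))
print(f"alignment (Spearman rho) = {rho:.3f}, p = {p:.3f}")
```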

Imagined Visual Representations as Multimodal Embeddings

This paper presents a simple and effective method that learns a language-to-vision mapping and uses its predicted visual vectors to build multimodal representations, a cognitively plausible construction consistent with the inherently reconstructive and associative nature of human memory.
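
The mechanism described above reduces to two steps: predict ("imagine") a visual vector for a word from its text embedding, then fuse the two views into one multimodal vector. A minimal hypothetical sketch, assuming an already-trained linear mapping W and concatenation of L2-normalized views as the fusion step:

```python
# Hypothetical fusion step: the mapping W and all vectors below are
# random placeholders; concatenating normalized views is one common
# fusion choice, not necessarily the one used in the paper.
import numpy as np

def imagine_and_fuse(text_vec: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Concatenate a text embedding with its predicted visual vector."""
    visual_vec = text_vec @ W  # "imagined" image features for the word
    t = text_vec / np.linalg.norm(text_vec)
    v = visual_vec / np.linalg.norm(visual_vec)
    return np.concatenate([t, v])  # multimodal embedding

rng = np.random.default_rng(1)
W = rng.normal(size=(300, 512))      # stands in for a learned mapping
word_vec = rng.normal(size=300)      # stands in for e.g. GloVe("dog")
print(imagine_and_fuse(word_vec, W).shape)  # (812,)
```

Because the visual vector is predicted rather than looked up, the same construction applies to words without any images, which is what makes the approach relevant to the abstract-word question here.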

Illustrative Language Understanding: Large-Scale Visual Grounding with Image Search

Picturebook, a large-scale lookup operation that grounds language via ‘snapshots’ of our physical world retrieved through image search, is introduced, and gate activations corresponding to Picturebook embeddings are shown to be highly correlated with human concreteness ratings.

Multimodal Distributional Semantics

This work proposes a flexible architecture for integrating text- and image-based distributional information and shows in a set of empirical tests that the integrated model is superior to the purely text-based approach, providing somewhat complementary semantic information relative to text alone.

Multimodal Word Meaning Induction From Minimal Exposure to Natural Text.

It is concluded that DSMs provide a convincing computational account of word learning even at the early stages in which a word is first encountered, and that the way they build meaning representations can offer new insights into human language acquisition.
...