Corpus ID: 225103287

A Visuospatial Dataset for Naturalistic Verb Learning

Dylan Ebert and Ellie Pavlick
We introduce a new dataset for training and evaluating grounded language models. Our data is collected within a virtual reality environment and is designed to emulate the quality of language data to which a pre-verbal child is likely to have access: That is, naturalistic, spontaneous speech paired with richly grounded visuospatial context. We use the collected data to compare several distributional semantics models for verb learning. We evaluate neural models based on 2D (pixel) features as…

Cross-situational word learning with multimodal neural networks
In order to learn the mappings from words to referents, children must integrate cooccurrence information across individually ambiguous pairs of scenes and utterances, a challenge known as…


Understanding Grounded Language Learning Agents
This work proposes a novel way to visualise and analyse semantic representation in grounded language learning agents that yields a plausible computational account of the observed effects and applies experimental paradigms from developmental psychology to this agent.
Understanding Early Word Learning in Situated Artificial Agents
This paper focuses on a simple neural network-based language learning agent, trained via policy-gradient methods, which can interpret single-word instructions in a simulated 3D world and proposes a novel method for visualising semantic representations in the agent.
A multimodal corpus for the evaluation of computational models for (grounded) language acquisition
This paper describes a German multimodal corpus designed to support the development and evaluation of models that learn rather complex grounded linguistic structures, e.g. syntactic patterns, from sub-symbolic input.
Grounded Models of Semantic Representation
Experimental results show that a closer correspondence to human data can be obtained by uncovering latent information shared among the textual and perceptual modalities rather than arriving at semantic knowledge by concatenating the two.
Human simulations of vocabulary learning
The work reported here experimentally investigates the hypothesis that vocabulary acquisition takes place via an incremental constraint-satisfaction procedure that bootstraps itself into successively more sophisticated linguistic representations which, in turn, enable new kinds of vocabulary learning.
Neural Naturalist: Generating Fine-Grained Image Comparisons
A new model called Neural Naturalist is proposed that uses a joint image encoding and comparative module to generate comparative language, and the results indicate promising potential for neural models to explain differences in visual embedding space using natural language.
Wordbank: an open repository for developmental vocabulary data
The MacArthur-Bates Communicative Development Inventories (CDIs) are a widely used family of parent-report instruments for easy and inexpensive data-gathering about early language…
Combining Language and Vision with a Multimodal Skip-gram Model
Since they propagate visual information to all words, the MMSKIP-GRAM models discover intriguing visual properties of abstract words, paving the way to realistic implementations of embodied theories of meaning.
Why Nouns Trump Verbs in Word Learning: New Evidence from Children and Adults in the Human Simulation Paradigm
The HSP task is modified to accommodate children and represents the first empirical demonstration that young children's noun advantage may be attributable, at least in part, to the distinct linguistic requirements underlying the acquisition of nouns and verbs.
Reading visually embodied meaning from the brain: Visually grounded computational models decode visual-object mental imagery induced by written text
By capturing latent visual-semantic structure, the authors' models provide a route into analyzing neural representations derived from past perceptual experience rather than stimulus-driven brain activity, and verify the benefit of combining multimodal data to model human-like semantic representations.