Learning language through pictures

@inproceedings{Chrupala2015LearningLT,
  title={Learning language through pictures},
  author={Grzegorz Chrupa{\l}a and {\'A}kos K{\'a}d{\'a}r and Afra Alishahi},
  booktitle={ACL},
  year={2015}
}
We propose Imaginet, a model of learning visually grounded representations of language from coupled textual and visual input. The model consists of two Gated Recurrent Unit networks with shared word embeddings, and uses a multi-task objective by receiving a textual description of a scene and trying to concurrently predict its visual representation and the next word in the sentence. Mimicking an important aspect of human language learning, it acquires meaning representations for individual words…
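
The architecture described in the abstract can be illustrated with a minimal forward-pass sketch in NumPy. It shows two GRU pathways reading the same word embeddings, one predicting an image feature vector and the other predicting the next word, combined into one multi-task loss. This is not the authors' implementation: the names `GRU` and `multitask_loss`, the weighting parameter `alpha`, the choice of mean squared error for the visual loss and cross-entropy for the textual loss, and all dimensions are assumptions made for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class GRU:
    """Minimal GRU layer, forward pass only (illustrative, biases omitted)."""
    def __init__(self, in_dim, hid_dim, rng, scale=0.1):
        shape = (hid_dim, in_dim + hid_dim)
        self.Wz = rng.normal(0.0, scale, shape)  # update gate weights
        self.Wr = rng.normal(0.0, scale, shape)  # reset gate weights
        self.Wh = rng.normal(0.0, scale, shape)  # candidate state weights
        self.hid_dim = hid_dim

    def run(self, xs):
        """Return the hidden state after each input vector in xs."""
        h = np.zeros(self.hid_dim)
        states = []
        for x in xs:
            xh = np.concatenate([x, h])
            z = sigmoid(self.Wz @ xh)
            r = sigmoid(self.Wr @ xh)
            cand = np.tanh(self.Wh @ np.concatenate([x, r * h]))
            h = (1.0 - z) * h + z * cand
            states.append(h)
        return np.stack(states)

def multitask_loss(word_ids, image_vec, params, alpha=0.5):
    """Weighted sum of a visual loss and a next-word loss (assumed form)."""
    E, gru_vis, gru_txt, V, L = params
    word_ids = np.asarray(word_ids)
    xs = E[word_ids]  # both pathways read the same (shared) embeddings

    # Visual pathway: final hidden state is mapped to an image feature vector.
    pred_img = V @ gru_vis.run(xs)[-1]
    loss_vis = np.mean((pred_img - image_vec) ** 2)  # assumed: MSE

    # Textual pathway: the state at step t scores the word at step t + 1.
    logits = gru_txt.run(xs[:-1]) @ L.T
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    targets = word_ids[1:]
    loss_txt = -np.mean(log_probs[np.arange(len(targets)), targets])

    total = alpha * loss_vis + (1.0 - alpha) * loss_txt
    return total, loss_vis, loss_txt

# Toy usage with random parameters and hypothetical sizes.
rng = np.random.default_rng(0)
vocab, emb_dim, hid_dim, img_dim = 10, 8, 16, 4
params = (
    rng.normal(0.0, 0.1, (vocab, emb_dim)),    # E: shared word embeddings
    GRU(emb_dim, hid_dim, rng),                # visual pathway
    GRU(emb_dim, hid_dim, rng),                # textual pathway
    rng.normal(0.0, 0.1, (img_dim, hid_dim)),  # V: state -> image features
    rng.normal(0.0, 0.1, (vocab, hid_dim)),    # L: state -> word scores
)
sentence = [1, 4, 2, 7]                 # word indices of a toy caption
image_vec = rng.normal(size=img_dim)    # stand-in for image features
total, loss_vis, loss_txt = multitask_loss(sentence, image_vec, params)
```

Because the embedding matrix `E` feeds both pathways, gradients from the visual and the textual loss would both update the word representations during training; the training loop itself is left out of this sketch.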

Citations

Publications citing this paper.

Deep Learning Under Privileged Information Using Heteroscedastic Dropout

  • 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition
  • 2018

References

Publications referenced by this paper.

ImageNet Large Scale Visual Recognition Challenge

  • International Journal of Computer Vision
  • 2014

Grounded Compositional Semantics for Finding and Describing Images with Sentences

  • Transactions of the Association for Computational Linguistics
  • 2014

Deep Visual-Semantic Alignments for Generating Image Descriptions

  • IEEE Transactions on Pattern Analysis and Machine Intelligence
  • 2014