How2: A Large-scale Dataset for Multimodal Language Understanding

Ramon Sanabria, Ozan Caglayan, Shruti Palaskar, Desmond Elliott, Loïc Barrault, Lucia Specia, Florian Metze
Human information processing is inherently multimodal, and language is best understood in a situated context. To achieve human-like language processing capabilities, machines should be able to jointly process multimodal data, not just text, images, or speech in isolation. Nevertheless, there are very few multimodal datasets to support such research, resulting in limited interaction among different research communities. In this paper, we introduce How2, a large-scale dataset of…
