Are You Smarter Than a Sixth Grader? Textbook Question Answering for Multimodal Machine Comprehension
- Aniruddha Kembhavi, Minjoon Seo, Dustin Schwenk, Jonghyun Choi, Ali Farhadi, Hannaneh Hajishirzi
- Computer Science · IEEE Conference on Computer Vision and Pattern…
- 21 July 2017
The task of Multi-Modal Machine Comprehension (M3C), which aims to answer multimodal questions given a context of text, diagrams, and images, is introduced, and state-of-the-art methods for textual machine comprehension and visual question answering are extended to the TQA dataset.
RoboTHOR: An Open Simulation-to-Real Embodied AI Platform
- Matt Deitke, Winson Han, Ali Farhadi
- Computer Science · IEEE/CVF Conference on Computer Vision and…
- 14 April 2020
RoboTHOR offers a framework of simulated environments paired with physical counterparts to systematically explore and overcome the challenges of simulation-to-real transfer, and a platform where researchers across the globe can remotely test their embodied models in the physical world.
X-LXMERT: Paint, Caption and Answer Questions with Multi-Modal Transformers
- Jaemin Cho, Jiasen Lu, Dustin Schwenk, Hannaneh Hajishirzi, Aniruddha Kembhavi
- Computer Science · EMNLP
- 23 September 2020
X-LXMERT is introduced, an extension to LXMERT with training refinements including discretizing visual representations, using uniform masking with a large range of masking ratios, and aligning the right pre-training datasets to the right objectives, which enables it to paint.
Imagine This! Scripts to Compositions to Videos
- Tanmay Gupta, Dustin Schwenk, Ali Farhadi, Derek Hoiem, Aniruddha Kembhavi
- Computer Science · ECCV
- 10 April 2018
This work presents the Composition, Retrieval, and Fusion Network (CRAFT), a model capable of learning knowledge from video-caption data and applying it while generating videos from novel captions, and evaluates CRAFT on semantic fidelity to caption, composition consistency, and visual quality.
Artificial Agents Learn Flexible Visual Representations by Playing a Hiding Game
This work is the first to show that embodied adversarial reinforcement learning agents playing cache, a variant of hide-and-seek, in a high-fidelity, interactive environment learn representations of their observations encoding information such as occlusion, object permanence, free space, and containment.
Learning Generalizable Visual Representations via Interactive Gameplay
Iconary: A Pictionary-Based Game for Testing Multimodal Communication with Drawings and Text
This work proposes models to play Iconary, a collaborative game of drawing and guessing based on Pictionary that poses a novel challenge for the research community, and shows that these models are skillful players able to employ world knowledge from language models to play with words unseen during training.