Bidirectional Attention Flow for Machine Comprehension
TLDR
The BiDAF network is introduced, a multi-stage hierarchical process that represents the context at different levels of granularity and uses a bi-directional attention flow mechanism to obtain a query-aware context representation without early summarization.
Don't Just Assume; Look and Answer: Overcoming Priors for Visual Question Answering
TLDR
GVQA explicitly disentangles the recognition of visual concepts present in the image from the identification of the plausible answer space for a given question, enabling the model to more robustly generalize across different distributions of answers.
Observing Human-Object Interactions: Using Spatial and Functional Compatibility for Recognition
TLDR
This work presents a Bayesian approach which applies spatial and functional constraints on each of the perceptual elements for coherent semantic interpretation and demonstrates the use of such constraints in recognition of actions from static images without using any motion information.
Human detection using partial least squares analysis
TLDR
This paper describes a human detection method that augments widely used edge-based features with texture and color information, providing a much richer descriptor set, and is shown to outperform state-of-the-art techniques on three varied datasets.
A Diagram is Worth a Dozen Images
TLDR
An LSTM-based method for syntactic parsing of diagrams and a DPG-based attention model for diagram question answering are devised, and a new dataset of diagrams with exhaustive annotations of constituents and relationships is compiled.
What’s Hidden in a Randomly Weighted Neural Network?
TLDR
It is empirically shown that as randomly weighted neural networks with fixed weights grow wider and deeper, an "untrained subnetwork" approaches a network with learned weights in accuracy.
Are You Smarter Than a Sixth Grader? Textbook Question Answering for Multimodal Machine Comprehension
TLDR
The task of Multi-Modal Machine Comprehension (M3C), which aims at answering multimodal questions given a context of text, diagrams, and images, is introduced, and state-of-the-art methods for textual machine comprehension and visual question answering are extended to the TQA dataset.
IQA: Visual Question Answering in Interactive Environments
TLDR
The Hierarchical Interactive Memory Network (HIMN), consisting of a factorized set of controllers that allows the system to operate at multiple levels of temporal abstraction, is proposed and outperforms popular single-controller methods on IQUAD V1.
C-VQA: A Compositional Split of the Visual Question Answering (VQA) v1.0 Dataset
TLDR
This paper proposes a new setting for Visual Question Answering where the test question-answer pairs are compositionally novel compared to training question-answer pairs, and presents a new compositional split of the VQA v1.0 dataset, called Compositional VQA (C-VQA).
RoboTHOR: An Open Simulation-to-Real Embodied AI Platform
TLDR
RoboTHOR offers a framework of simulated environments paired with physical counterparts to systematically explore and overcome the challenges of simulation-to-real transfer, and a platform where researchers across the globe can remotely test their embodied models in the physical world.