Bidirectional Attention Flow for Machine Comprehension
TLDR
We introduce the Bi-Directional Attention Flow (BIDAF) network, a multi-stage hierarchical process that represents the context at different levels of granularity and uses a bi-directional attention flow mechanism to obtain a query-aware context representation without early summarization.
Don't Just Assume; Look and Answer: Overcoming Priors for Visual Question Answering
TLDR
A number of studies have found that today's Visual Question Answering (VQA) models are heavily driven by superficial correlations in the training data and lack sufficient image grounding.
Observing Human-Object Interactions: Using Spatial and Functional Compatibility for Recognition
TLDR
We present a Bayesian approach for the interpretation of human-object interactions that integrates information from perceptual tasks such as scene analysis, human motion/pose estimation, manipulable object detection, and object reaction determination.
Human detection using partial least squares analysis
TLDR
We propose a human detection method that augments widely used edge-based features with texture and color information, providing us with a much richer descriptor set.
Are You Smarter Than a Sixth Grader? Textbook Question Answering for Multimodal Machine Comprehension
TLDR
We introduce the task of Multi-Modal Machine Comprehension (M3C), which aims at answering multimodal questions given a context of text, diagrams and images.
A Diagram is Worth a Dozen Images
TLDR
We study the problem of diagram interpretation, the challenging task of identifying the structure of a diagram and the semantics of its constituents and their relationships.
IQA: Visual Question Answering in Interactive Environments
TLDR
We propose the Hierarchical Interactive Memory Network (HIMN), consisting of a factorized set of controllers, allowing the system to operate at multiple levels of temporal abstraction.
What’s Hidden in a Randomly Weighted Neural Network?
TLDR
We empirically show that randomly weighted neural networks contain subnetworks which achieve impressive performance without ever training the weight values.
Vehicle Detection Using Partial Least Squares
TLDR
We describe a vehicle detector that improves upon previous approaches by incorporating a very large and rich set of image descriptors that captures the structural characteristics of objects.
C-VQA: A Compositional Split of the Visual Question Answering (VQA) v1.0 Dataset
TLDR
In this paper, we propose a new setting for Visual Question Answering where the test question-answer pairs are compositionally novel compared to those in the C-VQA train split.