The Role of Context for Object Detection and Semantic Segmentation in the Wild
TLDR
A novel deformable part-based model is proposed that exploits both local context around each candidate detection and global context at the level of the scene; this context significantly helps in detecting objects at all scales.
Beyond PASCAL: A benchmark for 3D object detection in the wild
TLDR
The PASCAL3D+ dataset is contributed: a novel and challenging dataset for 3D object detection and pose estimation, with more than 3,000 object instances per category on average.
Detect What You Can: Detecting and Representing Objects Using Holistic Models and Body Parts
TLDR
This work proposes a novel approach that handles large deformations and partial occlusions of animals by representing them in terms of body parts. Applied to the six animal categories in the PASCAL VOC dataset, it improves on the state of the art by 4.1% AP and provides a richer representation of objects.
AI2-THOR: An Interactive 3D Environment for Visual AI
TLDR
AI2-THOR consists of near photo-realistic 3D indoor scenes in which AI agents can navigate and interact with objects to perform tasks, facilitating the building of visually intelligent models.
Target-driven visual navigation in indoor scenes using deep reinforcement learning
TLDR
This paper proposes an actor-critic model whose policy is a function of the goal as well as the current state, which allows better generalization. It also proposes the AI2-THOR framework, which provides an environment with high-quality 3D scenes and a physics engine.
On Evaluation of Embodied Navigation Agents
TLDR
This document summarizes the consensus recommendations of a working group formed to study empirical methodology in navigation research. It discusses different problem statements and the role of generalization, presents evaluation measures, and provides standard scenarios that can be used for benchmarking.
ALFRED: A Benchmark for Interpreting Grounded Instructions for Everyday Tasks
TLDR
It is shown that a baseline model based on recent embodied vision-and-language tasks performs poorly on ALFRED, suggesting that there is significant room for developing innovative grounded visual language understanding models with this benchmark.
ObjectNet3D: A Large Scale Database for 3D Object Recognition
TLDR
A large-scale database for 3D object recognition is contributed. It consists of 100 categories, 90,127 images, 201,888 objects in these images, and 44,147 3D shapes, and it is useful for recognizing the 3D pose and 3D shape of objects from 2D images.
Visual Semantic Navigation using Scene Priors
TLDR
This work proposes using Graph Convolutional Networks to incorporate prior knowledge into a deep reinforcement learning framework, and shows that semantic knowledge significantly improves both performance and generalization to unseen scenes and/or objects.
OK-VQA: A Visual Question Answering Benchmark Requiring External Knowledge
TLDR
This paper addresses the task of knowledge-based visual question answering and provides a benchmark, called OK-VQA, in which the image content alone is not sufficient to answer the questions, encouraging methods that rely on external knowledge resources.