• Publications
  • Influence
Non-local Neural Networks
TLDR
This paper presents non-local operations as a generic family of building blocks for capturing long-range dependencies in computer vision and improves object detection/segmentation and pose estimation on the COCO suite of tasks. Expand
The Visual Object Tracking VOT2016 Challenge Results
TLDR
The Visual Object Tracking challenge VOT2016 goes beyond its predecessors by introducing a new semi-automatic ground truth bounding box annotation methodology and extending the evaluation system with the no-reset experiment. Expand
Ensemble of exemplar-SVMs for object detection and beyond
This paper proposes a conceptually simple but surprisingly powerful method which combines the effectiveness of a discriminative object detector with the explicit correspondence offered by aExpand
Training Region-Based Object Detectors with Online Hard Example Mining
TLDR
OHEM is a simple and intuitive algorithm that eliminates several heuristics and hyperparameters in common use that leads to consistent and significant boosts in detection performance on benchmarks like PASCAL VOC 2007 and 2012. Expand
Hollywood in Homes: Crowdsourcing Data Collection for Activity Understanding
TLDR
This work proposes a novel Hollywood in Homes approach to collect data, collecting a new dataset, Charades, with hundreds of people recording videos in their own homes, acting out casual everyday activities, and evaluates and provides baseline results for several tasks including action recognition and automatic description generation. Expand
Unsupervised Visual Representation Learning by Context Prediction
TLDR
It is demonstrated that the feature representation learned using this within-image context indeed captures visual similarity across images and allows us to perform unsupervised visual discovery of objects like cats, people, and even birds from the Pascal VOC 2011 detection dataset. Expand
Cross-Stitch Networks for Multi-task Learning
TLDR
This paper proposes a principled approach to learn shared representations in Convolutional Networks using multitask learning using a new sharing unit: "cross-stitch" unit that combines the activations from multiple networks and can be trained end-to-end. Expand
Target-driven visual navigation in indoor scenes using deep reinforcement learning
TLDR
This paper proposes an actor-critic model whose policy is a function of the goal as well as the current state, which allows better generalization and proposes the AI2-THOR framework, which provides an environment with high-quality 3D scenes and a physics engine. Expand
Supersizing self-supervision: Learning to grasp from 50K tries and 700 robot hours
TLDR
This paper takes the leap of increasing the available training data to 40 times more than prior work, leading to a dataset size of 50K data points collected over 700 hours of robot grasping attempts, which allows us to train a Convolutional Neural Network for the task of predicting grasp locations without severe overfitting. Expand
Unsupervised Discovery of Mid-Level Discriminative Patches
TLDR
The paper experimentally demonstrates the effectiveness of discriminative patches as an unsupervised mid-level visual representation, suggesting that it could be used in place of visual words for many tasks. Expand
...
1
2
3
4
5
...