Publications
Training Region-Based Object Detectors with Online Hard Example Mining
TLDR
OHEM is a simple and intuitive algorithm that eliminates several heuristics and hyperparameters in common use and leads to consistent and significant boosts in detection performance on benchmarks like PASCAL VOC 2007 and 2012.
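A minimal sketch of the hard-example selection idea behind OHEM, assuming PyTorch; the function and variable names are illustrative, and it ranks RoIs by classification loss only rather than the paper's combined classification-plus-box loss.

```python
import torch
import torch.nn.functional as F

def ohem_loss(cls_scores, labels, num_hard):
    """cls_scores: (N, C) logits for N RoIs; labels: (N,) class indices."""
    # Per-RoI loss with no reduction, so every RoI can be ranked by difficulty.
    per_roi_loss = F.cross_entropy(cls_scores, labels, reduction="none")
    # Keep only the num_hard highest-loss (hardest) RoIs for the gradient.
    hard_loss, hard_idx = torch.topk(per_roi_loss, k=min(num_hard, per_roi_loss.numel()))
    return hard_loss.mean(), hard_idx

# Example: 128 RoIs, 21 classes (PASCAL VOC + background), backprop only the hardest 32.
scores = torch.randn(128, 21, requires_grad=True)
labels = torch.randint(0, 21, (128,))
loss, idx = ohem_loss(scores, labels, num_hard=32)
loss.backward()
```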
Cross-Stitch Networks for Multi-task Learning
TLDR
This paper proposes a principled approach to learning shared representations in Convolutional Networks for multi-task learning, using a new sharing unit, the "cross-stitch" unit, which combines the activations from multiple networks and can be trained end-to-end.
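A minimal sketch of a cross-stitch unit, assuming PyTorch; names and initialization values are illustrative. It mixes the activations of two task networks with a learnable 2x2 matrix (a single scalar matrix per unit here; variants learn per-channel weights).

```python
import torch
import torch.nn as nn

class CrossStitch(nn.Module):
    def __init__(self):
        super().__init__()
        # Initialize near identity so each task starts mostly with its own features.
        self.alpha = nn.Parameter(torch.tensor([[0.9, 0.1], [0.1, 0.9]]))

    def forward(self, x_a, x_b):
        # x_a, x_b: same-shaped activations from task networks A and B.
        out_a = self.alpha[0, 0] * x_a + self.alpha[0, 1] * x_b
        out_b = self.alpha[1, 0] * x_a + self.alpha[1, 1] * x_b
        return out_a, out_b

# Example: mix two (batch, channels, H, W) feature maps and feed each back to its network.
unit = CrossStitch()
xa, xb = torch.randn(2, 64, 32, 32), torch.randn(2, 64, 32, 32)
ya, yb = unit(xa, xb)
```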
Revisiting Unreasonable Effectiveness of Data in Deep Learning Era
TLDR
It is found that performance on vision tasks increases logarithmically with the volume of training data, and it is shown that representation learning (or pre-training) still holds a lot of promise.
NEIL: Extracting Visual Knowledge from Web Data
TLDR
NEIL (Never Ending Image Learner), a computer program that runs 24 hours a day, 7 days a week to automatically extract visual knowledge from Internet data, is proposed in an attempt to develop the world's largest visual structured knowledge base with minimum human labeling effort.
A-Fast-RCNN: Hard Positive Generation via Adversary for Object Detection
TLDR
This paper proposes to learn an adversarial network that generates examples with occlusions and deformations; the goal of the adversary is to generate examples that are difficult for the object detector to classify, and the original detector and the adversary are learned jointly.
Tracking Emerges by Colorizing Videos
TLDR
The natural temporal coherency of color is leveraged to create a model that learns to colorize gray-scale videos by copying colors from a reference frame; in doing so, the model learns to track well enough to outperform the latest methods based on optical flow.
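A minimal sketch of the color-copying mechanism, assuming PyTorch; all names are illustrative. Target-frame pixel embeddings attend over reference-frame pixels, and the attention weights copy quantized colors during training (or tracking labels at test time).

```python
import torch
import torch.nn.functional as F

def copy_from_reference(ref_emb, tgt_emb, ref_colors, temperature=0.5):
    """ref_emb: (N, D) reference-pixel embeddings; tgt_emb: (M, D) target-pixel
    embeddings; ref_colors: (N, K) quantized color (or label) distribution."""
    sim = tgt_emb @ ref_emb.t() / temperature   # similarity of every target pixel to every reference pixel
    attn = F.softmax(sim, dim=1)                # pointer over reference pixels
    return attn @ ref_colors                    # (M, K) predicted colors / labels

# Example with random embeddings: 1024 reference and target pixels, 16 quantized color bins.
ref_e, tgt_e = torch.randn(1024, 64), torch.randn(1024, 64)
ref_c = F.one_hot(torch.randint(0, 16, (1024,)), 16).float()
pred = copy_from_reference(ref_e, tgt_e, ref_c)
```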
Beyond Skip Connections: Top-Down Modulation for Object Detection
TLDR
Inspired by the human visual pathway, this paper proposes top-down modulations as a way to incorporate fine details into the detection framework, and supplements the standard bottom-up, feedforward ConvNet with a top-down modulation (TDM) network, connected using lateral connections.
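A minimal sketch of one top-down modulation step with a lateral connection, assuming PyTorch; layer names and channel sizes are illustrative, not the paper's exact architecture.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopDownLateral(nn.Module):
    def __init__(self, bottom_ch, top_ch, out_ch):
        super().__init__()
        self.lateral = nn.Conv2d(bottom_ch, out_ch, kernel_size=1)            # transform bottom-up features
        self.top_down = nn.Conv2d(top_ch, out_ch, kernel_size=3, padding=1)   # transform top-down features

    def forward(self, bottom_feat, top_feat):
        # Upsample the coarse top-down feature to the bottom-up resolution, then merge.
        top = F.interpolate(self.top_down(top_feat), size=bottom_feat.shape[-2:], mode="nearest")
        return top + self.lateral(bottom_feat)

# Example: merge a coarse (7x7) top-down map with a finer (14x14) bottom-up map.
mod = TopDownLateral(bottom_ch=512, top_ch=256, out_ch=256)
out = mod(torch.randn(1, 512, 14, 14), torch.randn(1, 256, 7, 7))
```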
Enriching Visual Knowledge Bases via Object Discovery and Segmentation
TLDR
The approach combines the power of generative modeling for segmentation with the effectiveness of discriminative models for detection to learn and exploit top-down segmentation priors based on visual subcategories.
Data-driven visual similarity for cross-domain image matching
TLDR
A surprisingly simple method that estimates the relative importance of different features in a query image based on the notion of "data-driven uniqueness" is proposed, yielding a generic approach that does not depend on a particular image representation or a specific visual domain.
Detecting Human-Object Interactions via Functional Generalization
TLDR
This work presents an approach for detecting human-object interactions (HOIs) in images, based on the idea that humans interact with functionally similar objects in a similar manner, and demonstrates that using a generic object detector, the model can generalize to interactions involving previously unseen objects.