• Publications
ImageNet Large Scale Visual Recognition Challenge
TLDR
The creation of this benchmark dataset and the resulting advances in object recognition are described, and state-of-the-art computer vision accuracy is compared with human accuracy.
What's the Point: Semantic Segmentation with Point Supervision
TLDR
This work takes a natural step from image-level annotation towards stronger supervision: it asks annotators to point to an object if one exists, and incorporates this point supervision along with a novel objectness potential in the training loss function of a CNN model.
End-to-End Learning of Action Detection from Frame Glimpses in Videos
TLDR
A fully end-to-end approach for action detection in videos that learns to directly predict the temporal bounds of actions and uses REINFORCE to learn the agent's decision policy.
Every Moment Counts: Dense Detailed Labeling of Actions in Complex Videos
TLDR
A novel variant of long short-term memory deep networks is defined for modeling these temporal relations via multiple input and output connections, and it is shown that this model improves action labeling accuracy and further enables deeper understanding tasks ranging from structured retrieval to action prediction.
Attribute Learning in Large-Scale Datasets
TLDR
This work learns 20 visual attributes and uses them in a zero-shot transfer learning experiment as well as to make visual connections between semantically unrelated object categories.
Object-Centric Spatial Pooling for Image Classification
TLDR
A framework is proposed that learns object detectors using only image-level class labels (so-called weak labels); it is comparable in accuracy with state-of-the-art weakly supervised detection methods and significantly outperforms SPM-based pooling in image classification.
Human Uncertainty Makes Classification More Robust
TLDR
It is shown that, while contemporary classifiers fail to exhibit human-like uncertainty on their own, explicit training on this dataset closes this gap, supports improved generalization to increasingly out-of-training-distribution test datasets, and confers robustness to adversarial attacks.
Best of both worlds: Human-machine collaboration for object annotation
TLDR
This paper empirically validates the effectiveness of the human-in-the-loop labeling approach on the ILSVRC2014 object detection dataset and seamlessly integrates multiple computer vision models with multiple sources of human input in a Markov Decision Process.
CornerNet-Lite: Efficient Keypoint based Object Detection
TLDR
CornerNet-Lite is a combination of two efficient variants of CornerNet: CornerNet-Saccade, which uses an attention mechanism to eliminate the need for exhaustively processing all pixels of the image, and CornerNet-Squeeze, which introduces a new compact backbone architecture that addresses the two critical use cases in efficient object detection.
Towards fairer datasets: filtering and balancing the distribution of the people subtree in the ImageNet hierarchy
TLDR
This paper examines ImageNet, a large-scale ontology of images that has spurred the development of many modern computer vision methods, and considers three key factors within the person subtree of ImageNet that may lead to problematic behavior in downstream computer vision technology.