• Publications
  • Influence
ImageNet Large Scale Visual Recognition Challenge
TLDR
The creation of this benchmark dataset and the advances in object recognition that have been possible as a result are described, and the state-of-the-art computer vision accuracy with human accuracy is compared. Expand
SSD: Single Shot MultiBox Detector
TLDR
The approach, named SSD, discretizes the output space of bounding boxes into a set of default boxes over different aspect ratios and scales per feature map location, which makes SSD easy to train and straightforward to integrate into systems that require a detection component. Expand
DSSD : Deconvolutional Single Shot Detector
TLDR
This paper combines a state-of-the-art classifier with a fast detection framework and augments SSD+Residual-101 with deconvolution layers to introduce additional large-scale context in object detection and improve accuracy, especially for small objects. Expand
Attribute and simile classifiers for face verification
TLDR
Two novel methods for face verification using binary classifiers trained to recognize the presence or absence of describable aspects of visual appearance and a new data set of real-world images of public figures acquired from the internet. Expand
Classification using intersection kernel support vector machines is efficient
TLDR
It is shown that one can build histogram intersection kernel SVMs (IKSVMs) with runtime complexity of the classifier logarithmic in the number of support vectors as opposed to linear for the standard approach. Expand
SVM-KNN: Discriminative Nearest Neighbor Classification for Visual Category Recognition
TLDR
This work considers visual category recognition in the framework of measuring similarities, or equivalently perceptual distances, to prototype examples of categories and proposes a hybrid of these two methods which deals naturally with the multiclass setting, has reasonable computational complexity both in training and at run time, and yields excellent results in practice. Expand
Modeling Context in Referring Expressions
TLDR
This work focuses on incorporating better measures of visual context into referring expression models and finds that visual comparison to other objects within an image helps improve performance significantly. Expand
Recognizing action at a distance
TLDR
A novel motion descriptor based on optical flow measurements in a spatiotemporal volume for each stabilized human figure is introduced, and an associated similarity measure to be used in a nearest-neighbor framework is introduced. Expand
ParseNet: Looking Wider to See Better
TLDR
This work presents a technique for adding global context to deep convolutional networks for semantic segmentation, and achieves state-of-the-art performance on SiftFlow and PASCAL-Context with small additional computational cost over baselines. Expand
MatchNet: Unifying feature and metric learning for patch-based matching
TLDR
A unified approach to combining feature computation and similarity networks for training a patch matching system that improves accuracy over previous state-of-the-art results on patch matching datasets, while reducing the storage requirement for descriptors is confirmed. Expand
...
1
2
3
4
5
...