• Publications
  • Influence
Show and tell: A neural image caption generator
tl;dr
We present a generative model based on a deep recurrent architecture that combines recent advances in computer vision and machine translation and that can be used to generate natural sentences describing an image. Expand
  • 3,334
  • 463
DeepPose: Human Pose Estimation via Deep Neural Networks
tl;dr
We propose a method for human pose estimation based on Deep Neural Networks (DNNs). Expand
  • 1,391
  • 102
Generation and Comprehension of Unambiguous Object Descriptions
tl;dr
We propose a method that can generate an unambiguous description (known as a referring expression) of a specific object or region in an image, and which can also comprehend or interpret such an expression to infer which object is being described. Expand
  • 337
  • 79
Show and Tell: Lessons Learned from the 2015 MSCOCO Image Captioning Challenge
tl;dr
We present a generative model based on a deep recurrent architecture that combines recent advances in computer vision and machine translation and that can be used to generate natural sentences describing an image. Expand
  • 433
  • 67
Scalable Object Detection Using Deep Neural Networks
tl;dr
We propose a saliency-inspired neural network model for detection, which predicts a set of class-agnostic bounding boxes along with a single score for each box, corresponding to its likelihood of containing any object of interest. Expand
  • 754
  • 53
Deep Convolutional Ranking for Multilabel Image Annotation
tl;dr
In this work, we proposed to use ranking to train deep convolutional neural networks for multilabel image annotation problems. Expand
  • 300
  • 49
No Fuss Distance Metric Learning Using Proxies
tl;dr
We address the problem of distance metric learning (DML), defined as learning a distance consistent with a notion of semantic similarity. Expand
  • 183
  • 36
Deep Neural Networks for Object Detection
tl;dr
We present a simple and yet powerful formulation of object detection as a regression problem to object bounding box masks, that is not only classifying but also precisely localizing objects of various classes. Expand
  • 880
  • 31
Towards Accurate Multi-person Pose Estimation in the Wild
tl;dr
We propose a method for multi-person detection and 2-D pose estimation that achieves state-of-art results on the challenging COCO keypoints task. Expand
  • 312
  • 26
Cascaded Models for Articulated Pose Estimation
tl;dr
We address the problem of articulated human pose estimation by learning a coarse-to-fine cascade of pictorial structure models, where coarse models filter the pose space for the next level via their max-marginals. Expand
  • 213
  • 19