• Publications
Going deeper with convolutions
TLDR
We propose a deep convolutional neural network architecture codenamed Inception that achieves the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14).
  • 23,425
  • 3,018
OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks
TLDR
We present an integrated framework for using Convolutional Networks for classification, localization and detection with a single ConvNet.
  • 3,866
  • 316
Traffic sign recognition with multi-scale Convolutional Networks
TLDR
We apply Convolutional Networks (ConvNets) to the task of traffic sign classification as part of the GTSRB competition.
  • 586
  • 59
Learning Convolutional Feature Hierarchies for Visual Recognition
TLDR
We propose an unsupervised method for learning multi-stage hierarchies of sparse convolutional features.
  • 531
  • 40
Pedestrian Detection with Unsupervised Multi-stage Feature Learning
TLDR
We report state-of-the-art and competitive results on all major pedestrian datasets with a convolutional network model.
  • 721
  • 38
Convolutional neural networks applied to house numbers digit classification
TLDR
We classify digits of real-world house numbers using convolutional neural networks and establish a new state-of-the-art of 95.10% accuracy on the SVHN dataset.
  • 403
  • 30
Time-Contrastive Networks: Self-Supervised Learning from Video
TLDR
We propose a self-supervised approach for learning representations and robotic behaviors entirely from unlabeled videos recorded from multiple viewpoints, and study how this representation can be used in robotic imitation settings: imitating object interactions from videos of humans, and imitating human poses.
  • 254
  • 14
Attention for Fine-Grained Categorization
TLDR
This paper presents experiments extending the work of Ba et al. (2014) on recurrent neural models for attention into less constrained visual environments, specifically fine-grained categorization on the Stanford Dogs data set.
  • 120
  • 10
Time-Contrastive Networks: Self-Supervised Learning from Multi-view Observation
TLDR
We propose a self-supervised approach for learning representations of relationships between humans and their environment, including object interactions, attributes, and body pose, entirely from unlabeled videos recorded from multiple viewpoints.
  • 86
  • 10
Temporal Cycle-Consistency Learning
TLDR
We introduce a self-supervised representation learning method based on the task of temporal alignment between videos, trained with a differentiable cycle-consistency loss that can be used to find correspondences across time in multiple videos.
  • 53
  • 7