• Publications
Going deeper with convolutions
TLDR
We propose a deep convolutional neural network architecture codenamed Inception that achieves the new state of the art for classification and detection in the ImageNet Large-Scale Visual Recognition Challenge 2014 (ILSVRC14).
  • 20,909
  • 2727
Rethinking the Inception Architecture for Computer Vision
TLDR
Convolutional networks are at the core of most state-of-the-art computer vision solutions for a wide variety of tasks.
  • 9,250
  • 1437
TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems
TLDR
This paper describes the TensorFlow interface for expressing machine learning algorithms, and an implementation of that interface that we have built at Google.
  • 7,778
  • 889
Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning
TLDR
We study the combination of the two most recent ideas: Residual connections introduced by He et al. in [5] and the latest revised version of the Inception architecture.
  • 5,027
  • 658
Deep Neural Networks for Acoustic Modeling in Speech Recognition: The Shared Views of Four Research Groups
TLDR
DNNs for acoustic modeling in speech recognition can outperform Gaussian mixture models on speech recognition benchmarks.
  • 6,138
  • 234
Deep Neural Networks for Acoustic Modeling in Speech Recognition
TLDR
Deep neural networks with many hidden layers that are trained using new methods have been shown to outperform Gaussian mixture models on a variety of speech recognition benchmarks, sometimes by a large margin.
  • 2,003
  • 142
QT-Opt: Scalable Deep Reinforcement Learning for Vision-Based Robotic Manipulation
TLDR
We introduce QT-Opt, a scalable self-supervised vision-based reinforcement learning framework that can leverage over 580k real-world grasp attempts to train a deep neural network Q-function with over 1.2M parameters to perform closed-loop, real-world grasping that generalizes to 96% grasp success on unseen objects.
  • 336
  • 28
Improving the speed of neural networks on CPUs
TLDR
This paper shows that simple techniques can dramatically enhance the performance of neural network-based systems.
  • 569
  • 26
YouTube-BoundingBoxes: A Large High-Precision Human-Annotated Data Set for Object Detection in Video
TLDR
We introduce a new large-scale data set of video URLs with densely-sampled object bounding box annotations called YouTube-BoundingBoxes (YT-BB).
  • 186
  • 22
On rectified linear units for speech processing
TLDR
We show that we can improve generalization and make training of deep networks faster and simpler by substituting the logistic units with rectified linear units.
  • 420
  • 17