Publications (sorted by influence)
Distributed Representations of Words and Phrases and their Compositionality
TLDR: We present a simple method for finding phrases in text, and show that learning good vector representations for millions of phrases is possible.
  • Citations: 19,609 · Influence: 2,930
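As a rough illustration of the paper's phrase-detection heuristic, score(a, b) = (count(ab) − δ) / (count(a) · count(b)), with pairs scoring above a threshold merged into single tokens, here is a minimal Python sketch; the `delta` and `threshold` values are illustrative, not the paper's tuned settings:

```python
from collections import Counter

def find_phrases(tokens, delta=5, threshold=0.001):
    """Score adjacent word pairs as in the word2vec phrase heuristic:
    score(a, b) = (count(ab) - delta) / (count(a) * count(b)).
    Pairs scoring above `threshold` are treated as a single phrase."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    phrases = set()
    for (a, b), n_ab in bigrams.items():
        score = (n_ab - delta) / (unigrams[a] * unigrams[b])
        if score > threshold:
            phrases.add((a, b))
    return phrases
```

The discount `delta` prevents very rare co-occurrences from being merged; the paper runs this pass repeatedly to build phrases longer than two words.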
Efficient Estimation of Word Representations in Vector Space
TLDR: We propose two novel model architectures for computing continuous vector representations of words from very large data sets.
  • Citations: 15,675 · Influence: 2,655
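A minimal sketch of where the skip-gram architecture (one of the two proposed models) gets its training signal: each word predicts the words within a small context window around it. The function name and window size are illustrative:

```python
def skipgram_pairs(tokens, window=2):
    """Generate (center, context) training pairs for skip-gram:
    each word predicts the words within `window` positions of it."""
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                yield center, tokens[j]

# list(skipgram_pairs("the quick brown fox".split(), window=1))
# -> [('the', 'quick'), ('quick', 'the'), ('quick', 'brown'), ...]
```

CBOW, the other architecture, inverts this: the averaged context vectors predict the center word.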
TensorFlow: A system for large-scale machine learning
TLDR: TensorFlow is a machine learning system that operates at large scale and in heterogeneous environments.
  • Citations: 8,078 · Influence: 974
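As a small illustration of the heterogeneous-execution idea, a sketch using TensorFlow's public device-placement API; the array sizes and device strings are arbitrary:

```python
import tensorflow as tf

# TensorFlow assigns each operation to a device; tf.device pins ops
# explicitly, and the runtime inserts transfers between devices.
with tf.device("/CPU:0"):
    a = tf.random.normal([1024, 1024])
    b = tf.random.normal([1024, 1024])

# Run the matmul on a GPU if one is visible, else fall back to CPU.
device = "/GPU:0" if tf.config.list_physical_devices("GPU") else "/CPU:0"
with tf.device(device):
    c = tf.matmul(a, b)
print(c.device)
```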
TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems
TLDR: This paper describes the TensorFlow interface for expressing machine learning algorithms, and an implementation of that interface that we have built at Google.
  • Citations: 7,741 · Influence: 886
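A minimal sketch of the interface idea: Python code is traced into a dataflow graph that the runtime can optimize and execute. This uses the modern `tf.function` API rather than the explicit graph-construction calls of the 2015 whitepaper:

```python
import tensorflow as tf

@tf.function  # traces the Python body into a reusable dataflow graph
def affine(x, w, b):
    return tf.matmul(x, w) + b

x, w, b = tf.ones([2, 3]), tf.ones([3, 4]), tf.zeros([4])
graph = affine.get_concrete_function(x, w, b).graph
print([op.type for op in graph.get_operations()])
# the traced ops include 'MatMul' and 'AddV2'
```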
Distilling the Knowledge in a Neural Network
TLDR: A very simple way to improve the performance of almost any machine learning algorithm is to train many different models on the same data and then to average their predictions.
  • Citations: 4,563 · Influence: 682
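A minimal NumPy sketch of the distillation objective this paper proposes as a cheaper alternative to serving the full ensemble: blend cross-entropy on the hard labels with cross-entropy against the teacher's temperature-softened outputs. The `T` and `alpha` values here are illustrative:

```python
import numpy as np

def softmax(z, T=1.0):
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, hard_labels,
                      T=4.0, alpha=0.5):
    """Hard-label cross-entropy plus cross-entropy against the teacher's
    temperature-softened outputs. The T**2 factor keeps the soft term's
    gradient magnitude comparable as T changes, as in the paper."""
    p_teacher = softmax(teacher_logits, T)
    soft = -(p_teacher * np.log(softmax(student_logits, T))).sum(-1).mean()
    log_p = np.log(softmax(student_logits))
    hard = -log_p[np.arange(len(hard_labels)), hard_labels].mean()
    return alpha * hard + (1 - alpha) * (T ** 2) * soft
```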
Bigtable: A Distributed Storage System for Structured Data
TLDR: Bigtable is a distributed storage system for managing structured data that is designed to scale to a very large size: petabytes of data across thousands of commodity servers.
  • Citations: 4,727 · Influence: 372
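A toy, in-memory sketch of Bigtable's data model, a sorted map from (row key, column, timestamp) to an uninterpreted string, with versions ordered newest-first; `TinyTable` is a hypothetical name, not an actual client API:

```python
import bisect

class TinyTable:
    """Toy version of Bigtable's data model: a sorted map from
    (row_key, column, timestamp) to an uninterpreted byte string."""
    def __init__(self):
        self._keys = []   # sorted (row, column, -timestamp) triples
        self._vals = {}

    def put(self, row, column, timestamp, value):
        key = (row, column, -timestamp)  # newest version sorts first
        if key not in self._vals:
            bisect.insort(self._keys, key)
        self._vals[key] = value

    def read(self, row, column):
        """Return the most recent value for (row, column), if any."""
        i = bisect.bisect_left(self._keys, (row, column, float("-inf")))
        if i < len(self._keys) and self._keys[i][:2] == (row, column):
            return self._vals[self._keys[i]]
        return None
```

Because rows are kept in sorted order, a real Bigtable can serve efficient range scans over row-key prefixes, which clients exploit when choosing key layouts.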
In-datacenter performance analysis of a tensor processing unit
TLDR: This paper evaluates a Tensor Processing Unit (TPU), deployed in datacenters since 2015, that accelerates the inference phase of neural networks (NNs).
  • Citations: 1,835 · Influence: 265
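The TPU described in the paper performs 8-bit integer multiply-accumulate for inference; a NumPy sketch of that arithmetic style, using a simplified symmetric quantization scheme rather than the TPU's exact pipeline:

```python
import numpy as np

def quantized_matmul(a_fp, b_fp):
    """Sketch of TPU-style inference arithmetic: quantize inputs to
    int8, multiply-accumulate in a wide int32 accumulator, rescale."""
    a_scale = max(np.abs(a_fp).max() / 127.0, 1e-12)
    b_scale = max(np.abs(b_fp).max() / 127.0, 1e-12)
    a_q = np.round(a_fp / a_scale).astype(np.int8)
    b_q = np.round(b_fp / b_scale).astype(np.int8)
    acc = a_q.astype(np.int32) @ b_q.astype(np.int32)  # no overflow
    return acc.astype(np.float32) * (a_scale * b_scale)
```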
Large Scale Distributed Deep Networks
TLDR: We have developed a software framework called DistBelief that can utilize computing clusters with thousands of machines to train large models. Within this framework, we have developed two algorithms for large-scale distributed training: (i) Downpour SGD, an asynchronous stochastic gradient descent procedure supporting a large number of model replicas, and (ii) Sandblaster, a framework that supports a variety of distributed batch optimization procedures, including a distributed implementation of L-BFGS.
  • Citations: 2,434 · Influence: 259
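A single-process toy of the Downpour SGD idea: workers compute gradients on their own data shards and apply them to shared parameters asynchronously, without coordinating with one another. A real deployment shards the parameters across many server machines; the lock below only keeps each in-process NumPy update atomic:

```python
import threading
import numpy as np

# A shared parameter vector stands in for the parameter server.
params = np.zeros(4)
lock = threading.Lock()

def worker(shard, lr=0.01):
    for x in shard:
        grad = 2 * (params - x)     # gradient of ||params - x||^2
        with lock:                  # atomicity only; replicas do not
            params[:] -= lr * grad  # otherwise synchronize

shards = [np.random.randn(200, 4) + 1.0 for _ in range(4)]
threads = [threading.Thread(target=worker, args=(s,)) for s in shards]
for t in threads: t.start()
for t in threads: t.join()
print(params)  # drifts toward the shards' common mean (~1)
```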
Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
TLDR: Neural Machine Translation (NMT) is an end-to-end learning approach for automated translation, with the potential to overcome many of the weaknesses of conventional phrase-based translation systems.
  • Citations: 3,039 · Influence: 244
DeViSE: A Deep Visual-Semantic Embedding Model
TLDR: We present a new deep visual-semantic embedding model trained to identify visual objects using both labeled image data and semantic information gleaned from unannotated text.
  • Citations: 1,462 · Influence: 181
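A minimal NumPy sketch of the hinge rank loss DeViSE trains with: the projected image embedding should score higher (by dot product) against its label's word vector than against other labels' vectors, by at least a margin. The argument names are illustrative:

```python
import numpy as np

def devise_loss(image_vec, label_vec, wrong_vecs, margin=0.1):
    """Hinge rank loss: penalize any wrong label whose word vector
    scores within `margin` of the correct label's score."""
    correct = label_vec @ image_vec   # similarity to the true label
    wrong = wrong_vecs @ image_vec    # similarities to other labels
    return np.maximum(0.0, margin - correct + wrong).sum()
```

Because labels live in a word-embedding space rather than a fixed softmax, a model trained this way can make plausible zero-shot predictions for labels it never saw in training.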