Distributed Representations of Words and Phrases and their Compositionality
tl;dr
We present a simple method for finding phrases in text, and show that learning good vector representations for millions of phrases is possible.
  • 18,433
  • 2,803
Efficient Estimation of Word Representations in Vector Space
tl;dr
We propose two novel model architectures for computing continuous vector representations of words from very large data sets.
  • 14,780
  • 2,564
TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems
tl;dr
This paper describes the TensorFlow interface for expressing machine learning algorithms, and an implementation of that interface that we have built at Google.
  • 7,410
  • 862
Large Scale Distributed Deep Networks
tl;dr
We have developed a software framework called DistBelief that can utilize computing clusters with thousands of machines to train large models. Within this framework, we have developed two algorithms for large-scale distributed training: (i) Downpour SGD, an asynchronous stochastic gradient descent procedure supporting a large number of model replicas, and (ii) Sandblaster, a framework that supports a variety of distributed batch optimization procedures, including a distributed implementation of L-BFGS.
  • 2,338
  • 256
Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
tl;dr
Neural Machine Translation (NMT) is an end-to-end learning approach for automated translation, with the potential to overcome many of the weaknesses of conventional phrase-based translation systems.
  • 2,746
  • 224
DeViSE: A Deep Visual-Semantic Embedding Model
tl;dr
We present a new deep visual-semantic embedding model trained to identify visual objects using both labeled image data and semantic information gleaned from unannotated text.
  • 1,388
  • 180
Wide & Deep Learning for Recommender Systems
tl;dr
In this paper, we present Wide & Deep learning---jointly trained wide linear models and deep neural networks---to combine the benefits of memorization and generalization for recommender systems.
  • 930
  • 136
Zero-Shot Learning by Convex Combination of Semantic Embeddings
tl;dr
We propose a simple method for constructing an image embedding system from any existing n-way image classifier and a semantic word embedding model that contains the n class labels in its vocabulary.
  • 558
  • 110
Google’s Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation
tl;dr
We propose a simple solution for translating between multiple languages with a single Neural Machine Translation (NMT) model, taking advantage of multilingual data to improve NMT.
  • 808
  • 98
Building high-level features using large scale unsupervised learning
tl;dr
We consider the problem of building high-level, class-specific feature detectors from only unlabeled data.
  • 1,860
  • 90