Distributed Representations of Words and Phrases and their Compositionality
- Tomas Mikolov, Ilya Sutskever, Kai Chen, G. Corrado, J. Dean
- Computer Science · NIPS
- 16 October 2013
This paper presents a simple method for finding phrases in text, shows that learning good vector representations for millions of phrases is possible, and describes a simple alternative to the hierarchical softmax called negative sampling.
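For reference, the negative sampling objective introduced in this paper replaces the full softmax over the vocabulary with k negative words sampled from a noise distribution P_n(w); with the paper's notation, where v_{w_I} is the input word vector and v'_{w_O} the output vector of the observed context word, the per-pair objective is:

$$\log \sigma\!\left({v'_{w_O}}^{\top} v_{w_I}\right) + \sum_{i=1}^{k} \mathbb{E}_{w_i \sim P_n(w)}\!\left[\log \sigma\!\left(-{v'_{w_i}}^{\top} v_{w_I}\right)\right]$$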
Efficient Estimation of Word Representations in Vector Space
- Tomas Mikolov, Kai Chen, G. Corrado, J. Dean
- Computer Science · International Conference on Learning Representations (ICLR)
- 16 January 2013
Two novel model architectures for computing continuous vector representations of words from very large data sets are proposed, and these vectors are shown to provide state-of-the-art performance on a test set measuring syntactic and semantic word similarities.
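A minimal illustration of how such word vectors are queried for syntactic and semantic regularities; the four-dimensional vectors below are invented for demonstration and are not from the paper:

```python
# Illustrative sketch (not from the paper): analogy queries like
# "king - man + woman ~= queen" via vector arithmetic and cosine similarity.
# The toy 4-dimensional vectors are made up purely for demonstration.
import numpy as np

vectors = {
    "king":  np.array([0.8, 0.6, 0.1, 0.3]),
    "man":   np.array([0.7, 0.1, 0.0, 0.2]),
    "woman": np.array([0.7, 0.2, 0.9, 0.2]),
    "queen": np.array([0.8, 0.7, 1.0, 0.3]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Analogy query: vector arithmetic followed by nearest-neighbour search.
query = vectors["king"] - vectors["man"] + vectors["woman"]
best = max((w for w in vectors if w not in {"king", "man", "woman"}),
           key=lambda w: cosine(query, vectors[w]))
print(best)  # "queen" for these toy vectors
```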
TensorFlow: A system for large-scale machine learning
- Martín Abadi, P. Barham, Xiaoqiang Zheng
- Computer Science · USENIX Symposium on Operating Systems Design and Implementation (OSDI)
- 27 May 2016
The TensorFlow dataflow model is described, and the compelling performance that TensorFlow achieves for several real-world applications is demonstrated.
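A minimal sketch of the dataflow idea using the current TensorFlow 2.x Python API (which postdates this paper): a Python function is traced into a graph of operations that the runtime can place and execute across devices.

```python
# Minimal sketch, assuming TensorFlow 2.x is installed.
import tensorflow as tf

@tf.function  # traces the Python function into a dataflow graph of ops
def affine(x, w, b):
    return tf.matmul(x, w) + b

x = tf.constant([[1.0, 2.0]])
w = tf.constant([[0.5], [0.25]])
b = tf.constant([0.1])
print(affine(x, w, b).numpy())  # [[1.1]]
```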
Distilling the Knowledge in a Neural Network
- Geoffrey E. Hinton, Oriol Vinyals, J. Dean
- Computer Science · ArXiv
- 9 March 2015
This work shows that the acoustic model of a heavily used commercial system can be significantly improved by distilling the knowledge in an ensemble of models into a single model. It also introduces a new type of ensemble composed of one or more full models and many specialist models that learn to distinguish fine-grained classes the full models confuse.
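A small numpy sketch of the distillation idea, with made-up logits: the student is trained to match the teacher's class probabilities softened by a temperature T.

```python
# Illustrative numpy sketch of knowledge distillation (soft-target term only).
# The logit values and temperature below are invented for demonstration.
import numpy as np

def softmax(z, T=1.0):
    z = np.asarray(z, dtype=float) / T
    z -= z.max()                      # numerical stability
    e = np.exp(z)
    return e / e.sum()

teacher_logits = np.array([6.0, 2.0, 1.0])
student_logits = np.array([4.0, 3.0, 1.5])
T = 4.0                               # higher T exposes relative class similarities

p_teacher = softmax(teacher_logits, T)
p_student = softmax(student_logits, T)

# Cross-entropy between softened teacher and student distributions:
# the soft-target part of the distillation loss the student minimises.
soft_loss = -(p_teacher * np.log(p_student)).sum()
print(p_teacher.round(3), soft_loss.round(4))
```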
TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems
- Martín Abadi, Ashish Agarwal, Xiaoqiang Zheng
- Computer Science · ArXiv
- 14 March 2016
This paper describes the TensorFlow interface and an implementation of that interface built at Google, which has been used for conducting research and for deploying machine learning systems into production in more than a dozen areas of computer science and other fields.
Bigtable: A Distributed Storage System for Structured Data
- Fay W. Chang, J. Dean, R. Gruber
- Computer Science · TOCS
- 6 November 2006
This paper describes the simple data model provided by Bigtable, which gives clients dynamic control over data layout and format, as well as the design and implementation of Bigtable.
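A toy sketch of that data model: a sparse, sorted map from (row key, column key, timestamp) to an uninterpreted byte string. The webtable-style row and column names follow the paper's running example; the lookup code itself is only illustrative and says nothing about Bigtable's actual storage layer.

```python
# Toy in-memory mimic of Bigtable's (row, "family:qualifier", timestamp) -> bytes map.
table = {}

def put(row, column, timestamp, value):
    table[(row, column, timestamp)] = value

def get_latest(row, column):
    versions = [(ts, v) for (r, c, ts), v in table.items()
                if r == row and c == column]
    return max(versions)[1] if versions else None  # newest timestamp wins

put("com.cnn.www", "contents:", 1, b"<html>v1</html>")
put("com.cnn.www", "contents:", 2, b"<html>v2</html>")
put("com.cnn.www", "anchor:cnnsi.com", 1, b"CNN")
print(get_latest("com.cnn.www", "contents:"))  # b'<html>v2</html>'
```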
In-datacenter performance analysis of a tensor processing unit
- N. Jouppi, C. Young, D. Yoon
- Computer Science · International Symposium on Computer Architecture
- 16 April 2017
This paper evaluates a custom ASIC, called a Tensor Processing Unit (TPU), that has been deployed in datacenters since 2015 and accelerates the inference phase of neural networks (NN), and compares it to a server-class Intel Haswell CPU and an Nvidia K80 GPU, which are contemporaries deployed in the same datacenters.
Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
- Yonghui Wu, M. Schuster, J. Dean
- Computer Science · ArXiv
- 26 September 2016
GNMT, Google's Neural Machine Translation system, is presented, which attempts to address many of the weaknesses of conventional phrase-based translation systems and provides a good balance between the flexibility of "character"-delimited models and the efficiency of "word"-delimited models.
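That balance comes from segmenting text into subword units (wordpieces). The sketch below is a simplified greedy longest-match segmenter over an invented vocabulary, not the actual wordpiece model used in GNMT, but it shows how rare words decompose into common units while falling back to characters when needed.

```python
# Simplified greedy longest-match subword segmentation (illustrative only).
def wordpiece(word, vocab):
    pieces, i = [], 0
    while i < len(word):
        j = len(word)
        while j > i and word[i:j] not in vocab:
            j -= 1
        if j == i:                 # no known piece: fall back to a single character
            pieces.append(word[i])
            i += 1
        else:
            pieces.append(word[i:j])
            i = j
    return pieces

vocab = {"un", "break", "able", "jump", "ing"}   # invented toy vocabulary
print(wordpiece("unbreakable", vocab))           # ['un', 'break', 'able']
```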
Large Scale Distributed Deep Networks
- J. Dean, G. Corrado, A. Ng
- Computer Science · NIPS
- 3 December 2012
This paper considers the problem of training a deep network with billions of parameters using tens of thousands of CPU cores and develops two algorithms for large-scale distributed training, Downpour SGD and Sandblaster L-BFGS, which increase the scale and speed of deep network training.
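A minimal single-process sketch of the Downpour SGD idea: several model replicas asynchronously fetch shared parameters, compute gradients on their own data, and push (possibly stale) updates back to a parameter server. The quadratic toy loss, thread count, and hyperparameters are invented for illustration and do not reflect the paper's distributed implementation.

```python
# Asynchronous SGD with a shared "parameter server", mimicked with threads.
import threading
import numpy as np

params = np.zeros(2)                      # parameter-server state
lock = threading.Lock()
target = np.array([3.0, -1.0])            # minimiser of the toy quadratic loss

def replica(shard_seed, steps=200, lr=0.05):
    rng = np.random.default_rng(shard_seed)
    for _ in range(steps):
        with lock:
            w = params.copy()             # fetch current parameters
        grad = 2 * (w - target) + rng.normal(scale=0.01, size=2)  # noisy local gradient
        with lock:
            params[:] -= lr * grad        # push possibly stale update

threads = [threading.Thread(target=replica, args=(s,)) for s in range(4)]
for t in threads: t.start()
for t in threads: t.join()
print(params.round(2))                    # close to [ 3. -1.]
```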
DeViSE: A Deep Visual-Semantic Embedding Model
- Andrea Frome, G. Corrado, Tomas Mikolov
- Computer Science · NIPS
- 5 December 2013
This paper presents a new deep visual-semantic embedding model trained to identify visual objects using both labeled image data and semantic information gleaned from unannotated text, and shows that the semantic information can be exploited to make predictions about tens of thousands of image labels not observed during training.
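The model is trained with a similarity-based hinge rank loss that pushes an image's projected embedding closer to its label's word vector than to other labels' vectors; with notation as in the paper, where $\vec{v}(\text{image})$ is the core visual network output, $M$ the learned linear projection, and $\vec{t}_j$ the word embedding of label $j$:

$$\text{loss}(\text{image}, \text{label}) = \sum_{j \neq \text{label}} \max\!\left[0,\ \text{margin} - \vec{t}_{\text{label}}\, M\, \vec{v}(\text{image}) + \vec{t}_j\, M\, \vec{v}(\text{image})\right]$$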
...