Corpus ID: 6287870

# TensorFlow: A system for large-scale machine learning

@article{Abadi2016TensorFlowAS,
title={TensorFlow: A system for large-scale machine learning},
author={Mart{\'i}n Abadi and Paul Barham and Jianmin Chen and Z. Chen and Andy Davis and Jeffrey Dean and Matthieu Devin and Sanjay Ghemawat and Geoffrey Irving and Michael Isard and Manjunath Kudlur and Josh Levenberg and Rajat Monga and Sherry Moore and Derek Gordon Murray and Benoit Steiner and Paul A. Tucker and Vijay Vasudevan and Pete Warden and Martin Wicke and Yuan Yu and Xiaoqiang Zheng},
journal={ArXiv},
year={2016},
volume={abs/1605.08695}
}
TensorFlow is a machine learning system that operates at large scale and in heterogeneous environments. TensorFlow uses dataflow graphs to represent computation, shared state, and the operations that mutate that state. It maps the nodes of a dataflow graph across many machines in a cluster, and within a machine across multiple computational devices, including multicore CPUs, general-purpose GPUs, and custom-designed ASICs known as Tensor Processing Units (TPUs). This architecture gives…
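The dataflow-graph idea the abstract describes can be illustrated with a minimal sketch: nodes are operations, edges carry values, and evaluation lazily pulls only the subgraph needed for the requested output. This is illustrative pure Python under assumed names, not TensorFlow's actual API.

```python
# Hypothetical minimal dataflow graph: each Node holds an operation and
# its upstream dependencies; `run` evaluates a node by recursively
# evaluating its inputs, memoizing results so shared subgraphs run once.

class Node:
    def __init__(self, op, inputs=()):
        self.op = op          # callable computing this node's value
        self.inputs = inputs  # upstream Node dependencies

def constant(value):
    return Node(lambda: value)

def add(a, b):
    return Node(lambda x, y: x + y, (a, b))

def mul(a, b):
    return Node(lambda x, y: x * y, (a, b))

def run(node, cache=None):
    """Evaluate `node`, memoizing each subgraph result in `cache`."""
    cache = {} if cache is None else cache
    if node not in cache:
        args = [run(n, cache) for n in node.inputs]
        cache[node] = node.op(*args)
    return cache[node]

# y = (2 + 3) * 4 expressed as a graph, then evaluated lazily.
y = mul(add(constant(2), constant(3)), constant(4))
print(run(y))  # -> 20
```

Because the graph is an explicit data structure rather than eagerly executed code, a runtime is free to partition its nodes across devices and machines, which is the mapping the abstract refers to.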
## 11,606 Citations
Performance Analysis of Just-in-Time Compilation for Training TensorFlow Multi-Layer Perceptrons
The TensorFlow system [1] has been developed to provide a general, efficient and scalable framework for writing Machine Learning (ML) applications. With the rapid advancement and popularity of ML, …
Improving the Performance of Distributed TensorFlow with RDMA
This work presents an RDMA-capable design of TensorFlow that scales well with training size and achieves nearly 6× performance improvement over the original gRPC-based distributed TensorFlow.
TensorBow: Supporting Small-Batch Training in TensorFlow
Deep neural networks are trained using mini-batch Stochastic Gradient Descent (SGD) on specialised hardware accelerators such as GPUs. Existing training systems support the configuration of the …
A Performance Evaluation of Distributed TensorFlow
• 2017
TensorFlow is a deep learning framework which is developed by Google. The computations in TensorFlow are implemented and expressed as dataflow graphs of multidimensional array data, which is referred …
Benchmarking TensorFlow on a personal computer not specialised for machine learning
Many recent advancements of modern technologies can be attributed to the rapid growth of the machine learning field and especially deep learning. A big challenge for deep learning is that the learning …
Towards Evaluation of Tensorflow Performance in a Distributed Compute Environment
• Computer Science
TPCTC
• 2018
The results show that with the right choice of input parameters and appropriate hardware, GPU-equipped general-purpose compute clusters can provide comparable deep learning training performance to specialized machines designed for AI workloads.
Tensor Relational Algebra for Distributed Machine Learning System Design
• Computer Science
Proc. VLDB Endow.
• 2021
The TRA is a set-based algebra based on the relational algebra that is easily executed with high efficiency in a parallel or distributed environment, and amenable to automatic optimization.
EasyDist: An End-to-End Distributed Deep Learning Tool for Cloud
• Computer Science
• 2019
EasyDist is an end-to-end DDL tool that preserves the single-node programming model by leveraging distributed TensorFlow between a Keras interface and public cloud infrastructure. Evaluation of EasyDist on publicly available benchmark datasets and models shows that model accuracy is not compromised and that training times can be reduced by up to ~6-8× compared to single-machine settings.
Fast Distributed Deep Learning over RDMA
• Computer Science
EuroSys
• 2019
This work shows that RPC is suboptimal for distributed deep learning computation, especially on an RDMA-capable network; its graph analyzer examines both the dataflow graph and the tensors to optimize memory allocation and remote data access using this interface.
TensorLayer: A Versatile Library for Efficient Deep Learning Development
TensorLayer is a Python-based versatile deep learning library that provides high-level modules abstracting sophisticated operations over neural layers, network models, training data and dependent training jobs, and has transparent module interfaces that allow developers to flexibly embed low-level controls within a backend engine.

## References

Showing 1-10 of 111 references
SparkNet: Training Deep Networks in Spark
• Computer Science, Mathematics
ICLR
• 2016
This work introduces SparkNet, a framework for training deep networks in Spark using a simple parallelization scheme for stochastic gradient descent that scales well with the cluster size and tolerates very high-latency communication.
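The "simple parallelization scheme" in the SparkNet summary can be sketched as periodic model averaging: each worker runs a few local SGD steps on its data shard, then the driver averages the resulting weights. The toy quadratic loss and all names below are illustrative assumptions, not SparkNet's implementation.

```python
# Hypothetical model-averaging round for data-parallel SGD on a scalar
# parameter w, minimizing (w - x)^2 over each worker's local samples x.

def local_sgd(w, data, lr=0.1, steps=5):
    """Run `steps` passes of plain SGD over one worker's shard."""
    for _ in range(steps):
        for x in data:
            grad = 2.0 * (w - x)
            w -= lr * grad
    return w

def averaged_round(w, shards):
    """Broadcast w, train locally per shard, then average the models."""
    local_models = [local_sgd(w, shard) for shard in shards]
    return sum(local_models) / len(local_models)

w = 0.0
shards = [[1.0, 1.0], [3.0, 3.0]]  # two workers with different data
for _ in range(20):
    w = averaged_round(w, shards)
print(round(w, 2))  # converges toward the global mean, 2.0
```

Averaging infrequently is what makes the scheme tolerate high-latency communication: workers only exchange models once per round rather than once per gradient step.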
TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems
The TensorFlow interface and an implementation of that interface that is built at Google are described, which has been used for conducting research and for deploying machine learning systems into production across more than a dozen areas of computer science and other fields.
Project Adam: Building an Efficient and Scalable Deep Learning Training System
• Computer Science
OSDI
• 2014
The design and implementation of a distributed system called Adam comprised of commodity server machines to train large deep neural network models that exhibits world-class performance, scaling and task accuracy on visual recognition tasks and shows that task accuracy improves with larger models.
MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems
• Tianqi Chen, +7 authors Zheng Zhang
• Computer Science
ArXiv
• 2015
The API design and the system implementation of MXNet are described, and it is explained how embedding of both symbolic expression and tensor operation is handled in a unified fashion.
Large Scale Distributed Deep Networks
This paper considers the problem of training a deep network with billions of parameters using tens of thousands of CPU cores and develops two algorithms for large-scale distributed training, Downpour SGD and Sandblaster L-BFGS, which increase the scale and speed of deep network training.
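Downpour SGD popularized the asynchronous parameter-server pattern: workers pull the current parameters, compute gradients on their own shard, and push updates without waiting for each other. The sketch below simulates that loop in a single process with assumed names; real systems run the workers concurrently, so pulls may observe stale parameters.

```python
# Hypothetical parameter server for asynchronous SGD on a scalar
# parameter, minimizing (w - x)^2 for each worker's sample x.

class ParameterServer:
    def __init__(self, w, lr=0.1):
        self.w = w
        self.lr = lr
    def pull(self):
        return self.w
    def push(self, grad):
        self.w -= self.lr * grad  # applied immediately, possibly stale

def worker_step(ps, x):
    w = ps.pull()         # may be stale relative to other workers
    grad = 2.0 * (w - x)  # gradient of (w - x)^2
    ps.push(grad)

ps = ParameterServer(w=0.0)
shards = [1.0, 3.0]  # one sample per simulated worker
for _ in range(100):
    for x in shards:      # interleaved, asynchronous-style updates
        worker_step(ps, x)
print(round(ps.w, 2))
```

Note that the interleaved updates settle into a small oscillation around the optimum rather than converging exactly; that staleness-induced noise is the usual trade-off of asynchronous training.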
Caffe: Convolutional Architecture for Fast Feature Embedding
Caffe provides multimedia scientists and practitioners with a clean and modifiable framework for state-of-the-art deep learning algorithms and a collection of reference models for training and deploying general-purpose convolutional neural networks and other deep models efficiently on commodity architectures.
Building high-level features using large scale unsupervised learning
• Quoc V. Le, +5 authors A. Ng
• Computer Science, Mathematics
2013 IEEE International Conference on Acoustics, Speech and Signal Processing
• 2013
Contrary to what appears to be a widely-held intuition, the experimental results reveal that it is possible to train a face detector without having to label images as containing a face or not.
Theano: A Python framework for fast computation of mathematical expressions
The performance of Theano is compared against Torch7 and TensorFlow on several machine learning models and recently-introduced functionalities and improvements are discussed.
GeePS: scalable deep learning on distributed GPUs with a GPU-specialized parameter server
• Computer Science
EuroSys
• 2016
GeePS enables a state-of-the-art single-node GPU implementation to scale well, such as to 13 times the number of training images processed per second on 16 machines (relative to the original optimized single-node code), and achieves a higher training throughput with just four GPU machines than a state-of-the-art CPU-only system achieves with 108 machines.
Rethinking the Inception Architecture for Computer Vision
• Computer Science
2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
• 2016
This work explores ways to scale up networks that aim to utilize the added computation as efficiently as possible, through suitably factorized convolutions and aggressive regularization.