Chainer: A Deep Learning Framework for Accelerating the Research Cycle

@inproceedings{tokui2019chainer,
  title={Chainer: A Deep Learning Framework for Accelerating the Research Cycle},
  author={Seiya Tokui and Ryosuke Okuta and Takuya Akiba and Yusuke Niitani and Toru Ogawa and Shunta Saito and Shuji Suzuki and Kota Uenishi and Brian K. Vogel and Hiroyuki Yamazaki Vincent},
  booktitle={Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery \& Data Mining},
  year={2019}
}
Software frameworks for neural networks play a key role in the development and application of deep learning methods. In this paper, we introduce the Chainer framework, which aims to provide a flexible, intuitive, and high-performance means of implementing the full range of deep learning models needed by researchers and practitioners. Chainer provides acceleration using graphics processing units with a familiar NumPy-like API through CuPy, supports general and dynamic models in Python through…
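The "general and dynamic models" the abstract refers to are built in a define-by-run style: the computation graph is recorded while ordinary Python code executes, so control flow can change the graph on every forward pass. A minimal, self-contained sketch of that idea (a toy scalar autograd in plain Python, not Chainer's actual API):

```python
# Define-by-run sketch: the graph is recorded as operations execute,
# so Python control flow (if/for) can reshape it on every forward pass.

class Var:
    def __init__(self, value, parents=(), grad_fn=None):
        self.value = value        # scalar value
        self.parents = parents    # upstream Vars in the recorded graph
        self.grad_fn = grad_fn    # maps incoming grad -> grads for parents
        self.grad = 0.0

    def __add__(self, other):
        return Var(self.value + other.value, (self, other),
                   lambda g: (g, g))

    def __mul__(self, other):
        return Var(self.value * other.value, (self, other),
                   lambda g: (g * other.value, g * self.value))

    def backward(self, g=1.0):
        self.grad += g
        if self.grad_fn:
            for parent, pg in zip(self.parents, self.grad_fn(g)):
                parent.backward(pg)

x = Var(3.0)
y = x * x + x          # the graph is built while this line runs
y.backward()
print(y.value, x.grad)  # 12.0 and dy/dx = 2x + 1 = 7.0
```

Because the graph is rebuilt per call, wrapping the forward computation in a Python `if` or `for` "just works", which is what makes dynamic architectures such as recursive networks straightforward to express.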


ChainerRL: A Deep Reinforcement Learning Library

ChainerRL is an open-source Deep Reinforcement Learning library built using Python and the Chainer deep learning framework that implements a comprehensive set of DRL algorithms and techniques drawn from the state-of-the-art research in the field.

Pixyz: a library for developing deep generative models

A new DGM library called Pixyz is proposed that is faster than existing probabilistic modeling languages in learning simple DGMs and can be used to implement complex DGMs in a simple and concise manner, which is difficult to do with existing libraries.

NNBlocks: a Blockly framework for AI computing

A visual-programming approach that executes artificial intelligence (AI) computations using block-based tools is proposed, and a web-based NNBlocks framework that uses this approach to integrate with TVM is developed.

Towards a Scalable and Distributed Infrastructure for Deep Learning Applications

Phylanx presents a productivity-oriented frontend where user Python code is translated to a futurized execution tree that can be executed efficiently on multiple nodes using the C++ standard library for parallelism and concurrency (HPX), leveraging fine-grained threading and an active messaging task-based runtime system.

TensorX: Extensible API for Neural Network Model Design and Deployment

TensorX is a Python library for prototyping, design, and deployment of complex neural network models in TensorFlow, aiming to make available high-level components like neural network layers that are, in effect, stateful functions, easy to compose and reuse.

Swift for TensorFlow: A portable, flexible platform for deep learning

Deep learning platform Swift for TensorFlow combines a language-integrated automatic differentiation system and multiple Tensor implementations within a modern ahead-of-time compiled language oriented around mutable value semantics.

PrototypeML: A Neural Network Integrated Design and Development Environment

This paper details the deep learning development deficiencies that drove the implementation of PrototypeML, and proposes a hybrid approach to resolve these issues without limiting network expressiveness or reducing code quality.

torch.fx: Practical Program Capture and Transformation for Deep Learning in Python

This work designs a program capture and transformation library for PyTorch written entirely in Python and optimized for high developer productivity by ML practitioners, and presents case studies showing how torch.fx enables workflows previously inaccessible in the PyTorch ecosystem.

SoftNeuro: Fast Deep Inference using Multi-platform Optimization

This work proposes SoftNeuro, a novel, high-performance inference framework with efficient performance tuning that maximizes the inference performance by profiling various routines for each layer and selecting the fastest path.

ODEN: Live Programming for Neural Network Architecture Editing

This work proposes leveraging live-programming techniques in NN architecture editing with an always-on visualization, implements that live visualization, and integrates it into an IDE called ODEN that seamlessly supports the “edit→experiment→edit→···” loop.



Caffe: Convolutional Architecture for Fast Feature Embedding

Caffe provides multimedia scientists and practitioners with a clean and modifiable framework for state-of-the-art deep learning algorithms and a collection of reference models for training and deploying general-purpose convolutional neural networks and other deep models efficiently on commodity architectures.

MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems

The API design and the system implementation of MXNet are described, and it is explained how embedding of both symbolic expression and tensor operation is handled in a unified fashion.

Large Scale Distributed Deep Networks

This paper considers the problem of training a deep network with billions of parameters using tens of thousands of CPU cores and develops two algorithms for large-scale distributed training, Downpour SGD and Sandblaster L-BFGS, which increase the scale and speed of deep network training.
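The parameter-server pattern behind Downpour SGD can be sketched in a few lines: workers fetch (possibly stale) parameters, compute a gradient on their own data shard, and push updates without synchronizing with one another. The toy problem below (fitting a slope with one parameter, workers taking turns rather than running truly asynchronously) is an illustrative simplification, not the paper's system:

```python
# Toy parameter-server sketch in the spirit of Downpour SGD.
import random

def grad(w, batch):
    # gradient of mean squared error for the model y = w * x
    return sum(2 * (w * x - y) * x for x, y in batch) / len(batch)

random.seed(0)
data = [(x, 3.0 * x) for x in [random.uniform(-1, 1) for _ in range(40)]]
shards = [data[i::4] for i in range(4)]   # one data shard per worker

server_w = 0.0                            # parameter-server state
lr = 0.1
for step in range(200):
    worker = step % 4                     # workers take turns (async in reality)
    w_stale = server_w                    # fetch current parameters
    g = grad(w_stale, shards[worker])     # compute local gradient on own shard
    server_w -= lr * g                    # push the update to the server

print(server_w)  # close to the true slope 3.0
```

The real system adds the interesting parts this sketch omits: many replicas updating concurrently, sharded parameter servers, and adaptive per-parameter learning rates, all of which tolerate the staleness simulated here.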

Scale out for large minibatch SGD: Residual network training on ImageNet-1K with improved accuracy and reduced time to train

The challenges and novel solutions needed to train ResNet-50 in this large-scale environment are described, and the novel Collapsed Ensemble (CE) technique is introduced, which achieves 77.5% top-1 accuracy, similar to that of a ResNet-152, while training an unmodified ResNet-50 topology for the same fixed training budget.

An introduction to computational networks and the computational network toolkit (invited talk)

The computational network toolkit (CNTK), an implementation of CN that supports both GPU and CPU, is introduced; the architecture and key components of CNTK, the command-line options for using it, and the network definition and model editing language are described.

Mesh-TensorFlow: Deep Learning for Supercomputers

Mesh-TensorFlow is introduced, a language for specifying a general class of distributed tensor computations, and is used to implement an efficient data-parallel, model-parallel version of the Transformer sequence-to-sequence model, surpassing state-of-the-art results on the WMT'14 English-to-French translation task and the one-billion-word language modeling benchmark.

Sequence to Sequence Learning with Neural Networks

This paper presents a general end-to-end approach to sequence learning that makes minimal assumptions on the sequence structure, and finds that reversing the order of the words in all source sentences improved the LSTM's performance markedly, because doing so introduced many short term dependencies between the source and the target sentence which made the optimization problem easier.

Horovod: fast and easy distributed deep learning in TensorFlow

Horovod is an open-source library that addresses both obstacles to scaling: it employs efficient inter-GPU communication via ring reduction and requires only a few lines of modification to user code, enabling faster, easier distributed training in TensorFlow.
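The ring reduction Horovod relies on is bandwidth-optimal because each node sends only 1/N of the data per step. A small simulation of the two phases (scatter-reduce, then allgather), with one vector element per chunk for simplicity, is sketched below; real implementations overlap these messages with computation:

```python
# Ring-allreduce simulation: N nodes end up with the elementwise sum of
# their vectors after 2*(N-1) steps, each moving only one chunk per node.

def ring_allreduce(vectors):
    n = len(vectors)
    assert all(len(v) == n for v in vectors)   # one chunk per node (chunk = 1 element)
    chunks = [list(v) for v in vectors]

    # scatter-reduce: after n-1 steps node i holds the full sum of chunk (i+1) % n
    for t in range(n - 1):
        msgs = [(i, (i - t) % n, chunks[i][(i - t) % n]) for i in range(n)]
        for i, c, val in msgs:                 # all sends use pre-step values
            chunks[(i + 1) % n][c] += val

    # allgather: circulate the fully reduced chunks for another n-1 steps
    for t in range(n - 1):
        msgs = [(i, (i + 1 - t) % n, chunks[i][(i + 1 - t) % n]) for i in range(n)]
        for i, c, val in msgs:
            chunks[(i + 1) % n][c] = val

    return chunks

result = ring_allreduce([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(result)  # every node holds [12, 15, 18]
```

Dividing each element by N at the end turns the sum into the gradient average used for synchronous data-parallel SGD.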

Squeeze-and-Excitation Networks

This work proposes a novel architectural unit, termed the “Squeeze-and-Excitation” (SE) block, that adaptively recalibrates channel-wise feature responses by explicitly modelling interdependencies between channels, and shows that these blocks can be stacked to form SENet architectures that generalise extremely effectively across different datasets.
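The recalibration itself is a small squeeze/excite/rescale pipeline: global average pooling, a bottleneck MLP, a sigmoid gate, and a per-channel multiply. A NumPy sketch with random placeholder weights (a real SE block learns `w1` and `w2`, and the reduction ratio `r` is a hyperparameter):

```python
import numpy as np

# Squeeze-and-Excitation sketch: pool spatial dims to per-channel stats,
# pass them through a bottleneck MLP, then rescale each channel.

def se_block(x, w1, w2):
    # x: feature map of shape (C, H, W)
    z = x.mean(axis=(1, 2))               # squeeze: global average pool -> (C,)
    s = np.maximum(w1 @ z, 0.0)           # excite, reduce: FC + ReLU -> (C // r,)
    s = 1.0 / (1.0 + np.exp(-(w2 @ s)))   # excite, expand: FC + sigmoid -> (C,)
    return x * s[:, None, None]           # rescale: per-channel gating

rng = np.random.default_rng(0)
C, H, W, r = 8, 4, 4, 2
x = rng.standard_normal((C, H, W))
w1 = rng.standard_normal((C // r, C))     # reduction weights (placeholder)
w2 = rng.standard_normal((C, C // r))     # expansion weights (placeholder)
y = se_block(x, w1, w2)
print(y.shape)  # (8, 4, 4): same shape, channels scaled by gates in (0, 1)
```

Because the sigmoid gates lie in (0, 1), the block can only attenuate channels relative to the input, which is what makes it cheap to drop into existing architectures such as ResNets.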

Feature Pyramid Networks for Object Detection

This paper exploits the inherent multi-scale, pyramidal hierarchy of deep convolutional networks to construct feature pyramids with marginal extra cost and achieves state-of-the-art single-model results on the COCO detection benchmark without bells and whistles.
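The top-down pathway that builds those pyramids is simple at its core: upsample the coarser map 2x and add a lateral connection from the finer backbone level. The sketch below uses nearest-neighbor upsampling on 2-D arrays and omits the paper's 1x1 lateral and 3x3 output convolutions, so it illustrates only the merge structure:

```python
import numpy as np

# Feature-pyramid sketch: merge coarse top-down maps with finer lateral
# maps via 2x nearest-neighbor upsampling plus elementwise addition.

def upsample2x(x):
    return x.repeat(2, axis=0).repeat(2, axis=1)

def build_pyramid(features):
    # features: backbone maps ordered fine -> coarse, each half the size
    top = features[-1]
    pyramid = [top]
    for lateral in reversed(features[:-1]):
        top = lateral + upsample2x(top)   # top-down merge with lateral map
        pyramid.append(top)
    return pyramid[::-1]                  # ordered fine -> coarse again

c3, c4, c5 = np.ones((8, 8)), np.ones((4, 4)), np.ones((2, 2))
p3, p4, p5 = build_pyramid([c3, c4, c5])
print(p3.shape, p4.shape, p5.shape)  # (8, 8) (4, 4) (2, 2)
```

Each output level keeps the spatial resolution of its backbone level while carrying semantics propagated down from the coarsest map, which is why a single detector head can be shared across all pyramid levels.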