# Acceleration of Deep Neural Network Training with Resistive Cross-Point Devices: Design Considerations

@article{Tayfun2016AccelerationOD,
  title={Acceleration of Deep Neural Network Training with Resistive Cross-Point Devices: Design Considerations},
  author={Tayfun G{\"o}kmen and Yurii A. Vlasov},
  journal={Frontiers in Neuroscience},
  year={2016},
  volume={10}
}

In recent years, deep neural networks (DNNs) have demonstrated significant business impact in large-scale analysis and classification tasks such as speech recognition, visual object detection, and pattern extraction. […] Key Result: A system consisting of a cluster of RPU accelerators will be able to tackle Big Data problems with trillions of parameters that are impossible to address today, for example, natural speech recognition and translation between all world languages, real-time analytics on large…

## 279 Citations

Zero-shifting Technique for Deep Neural Network Training on Resistive Cross-point Arrays

- Computer Science · ArXiv
- 2019

A concept of a symmetry point is introduced, and a zero-shifting technique is proposed that compensates for update imbalance by programming the reference device, changing the zero-value point of the weight. Network performance is shown to improve dramatically for imbalanced synapse devices.
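The symmetry-point idea can be illustrated with a toy device model; the saturating update rule and all constants below are illustrative assumptions, not the paper's measured devices:

```python
import numpy as np

def pulse(g, direction, alpha=0.02, gmin=0.0, gmax=1.0):
    """Toy saturating device: the conductance step shrinks as g
    approaches the bound it is moving toward (asymmetric in general)."""
    if direction > 0:
        return g + alpha * (gmax - g)   # potentiation
    return g - alpha * (g - gmin)       # depression

def find_symmetry_point(g0=0.9, n=2000):
    """Alternate +/- pulses: the conductance drifts to the point where
    an up step and a down step cancel (the device's symmetry point)."""
    g = g0
    for _ in range(n):
        g = pulse(g, +1)
        g = pulse(g, -1)
    return g

# Zero-shifting: program the reference device to the symmetry point so
# the effective weight w = g - g_ref is zero exactly where the device's
# up and down updates are balanced.
g_ref = find_symmetry_point()
```

With this device model the alternating-pulse sequence converges to a conductance near 0.495, where potentiation and depression steps cancel.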

TxSim: Modeling Training of Deep Neural Networks on Resistive Crossbar Systems

- Computer Science · IEEE Transactions on Very Large Scale Integration (VLSI) Systems
- 2021

TxSim is proposed: a fast and customizable modeling framework for functionally evaluating DNN training on crossbar-based hardware under the impact of nonidealities. It achieves computational efficiency by mapping crossbar evaluations to well-optimized Basic Linear Algebra Subprograms (BLAS) routines and incorporates speedup techniques that further reduce simulation time with minimal impact on accuracy.
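The simulation strategy described (crossbar evaluations delegated to BLAS-backed routines, with nonidealities layered on top) might look like the minimal sketch below; the noise model, ADC step, and parameter values are illustrative assumptions, not TxSim's actual code:

```python
import numpy as np

def crossbar_mvm(weights, x, sigma=0.02, adc_bits=8, rng=None):
    """Functional model of one crossbar matrix-vector multiply:
    device-to-device variability as multiplicative conductance noise,
    the analog product itself delegated to NumPy's BLAS-backed matmul,
    and a uniform ADC quantization on the readout."""
    rng = rng or np.random.default_rng()
    noisy_w = weights * (1.0 + sigma * rng.standard_normal(weights.shape))
    y = noisy_w @ x                     # executed by an optimized BLAS gemv
    scale = np.abs(y).max() or 1.0      # full-scale range of the ADC
    levels = 2 ** (adc_bits - 1)
    return np.round(y / scale * levels) / levels * scale
```

Because the expensive inner product is a plain `@` call, the nonideality modeling adds only elementwise work on top of the optimized BLAS kernel.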

Stochastic learning in deep neural networks based on nanoscale PCMO device characteristics

- Computer Science · Neurocomputing
- 2018

Training Deep Convolutional Neural Networks with Resistive Cross-Point Devices

- Computer Science · Front. Neurosci.
- 2017

This work shows how to map the convolutional layers to fully connected RPU arrays such that the parallelism of the hardware can be fully utilized in all three cycles of the backpropagation algorithm.
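The mapping described, unrolling a convolution into one dense matrix product so a crossbar array can execute it in a single analog step, can be sketched with a toy single-channel im2col; function names and sizes here are illustrative, not the paper's implementation:

```python
import numpy as np

def im2col(x, k):
    """Unfold every k-by-k patch of a single-channel image into a column,
    turning the convolution into one matrix product that maps directly
    onto a fully connected RPU array."""
    H, W = x.shape
    cols = [x[i:i + k, j:j + k].ravel()
            for i in range(H - k + 1) for j in range(W - k + 1)]
    return np.stack(cols, axis=1)       # shape: (k*k, num_output_positions)

# Each conv filter becomes one row of the (fully connected) weight matrix.
x = np.arange(16, dtype=float).reshape(4, 4)
filters = np.random.randn(3, 9)         # three 3x3 filters, flattened
out = filters @ im2col(x, 3)            # convolution as a matrix multiply
```

Since forward, backward, and update cycles all reduce to products against the same unfolded matrix, the array's parallelism is usable in all three.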

Perspective on training fully connected networks with resistive memories: Device requirements for multiple conductances of varying significance

- Computer Science · Journal of Applied Physics
- 2018

Simulations evaluating the final generalization accuracy of a trained four-neuron-layer fully connected network quantify the required dynamic range, the tolerable device-to-device variability in both maximum conductance and maximum conductance change, the tolerable pulse-to-pulse variability in conductance changes, and the tolerable device yield.

Algorithm for Training Neural Networks on Resistive Device Arrays

- Computer Science · Frontiers in Neuroscience
- 2020

A new training algorithm, the so-called "Tiki-Taka" algorithm, is presented that eliminates the stringent symmetry requirement on resistive crossbar arrays while maintaining the aforementioned power and speed benefits.
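A highly simplified numerical sketch of the two-matrix idea behind Tiki-Taka follows; the update rule, constants, and names are illustrative assumptions, and the actual algorithm includes device-aware transfer scheduling:

```python
import numpy as np

def tiki_taka_update(A, C, x, err, lr=0.05, tau=0.05, gamma=1.0):
    """One simplified step in the spirit of Tiki-Taka: gradient outer
    products are written to an auxiliary array A (which hovers near
    zero, i.e. near the devices' symmetry point, relaxing the symmetry
    requirement), and A is slowly transferred into the main array C.
    The effective weight is W = gamma * A + C."""
    A -= lr * np.outer(err, x)   # fast gradient writes land on A
    C += tau * A                 # slow transfer from A into C
    return gamma * A + C         # effective weight read by the network
```

At convergence the error and A both settle to zero, so C alone carries the learned weights while A only ever stores small, short-lived corrections.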

Analog CMOS-based resistive processing unit for deep neural network training

- Computer Science · 2017 IEEE 60th International Midwest Symposium on Circuits and Systems (MWSCAS)
- 2017

An analog CMOS-based RPU design (CMOS RPU) is proposed that can store and process data locally and be operated in a massively parallel manner; its functionality and feasibility for accelerating DNN training are evaluated.

Design and characterization of superconducting nanowire-based processors for acceleration of deep neural network training

- Computer Science · Nanotechnology
- 2019

The superconducting nanowire-based processing element as a crosspoint device has many programmable non-volatile states that can be used to perform analog multiplication, and these states are intrinsically discrete due to quantization of flux, which provides symmetric switching characteristics.

Vesti: Energy-Efficient In-Memory Computing Accelerator for Deep Neural Networks

- Computer Science · IEEE Transactions on Very Large Scale Integration (VLSI) Systems
- 2020

A new DNN accelerator is designed to support configurable multibit activations and large-scale DNNs seamlessly while substantially improving the chip-level energy-efficiency with favorable accuracy tradeoff compared to conventional digital ASIC.

Training LSTM Networks With Resistive Cross-Point Devices

- Computer Science · Front. Neurosci.
- 2018

This work further extends the RPU concept to training recurrent neural networks (RNNs), namely LSTMs, and finds that RPU device variations and hardware noise are enough to mitigate overfitting, reducing the need for dropout.

## References

Showing 1–10 of 59 references

On-Chip Sparse Learning Acceleration With CMOS and Resistive Synaptic Devices

- Computer Science · IEEE Transactions on Nanotechnology
- 2015

This paper co-optimizes algorithm, architecture, circuit, and device for real-time, energy-efficient on-chip hardware acceleration of sparse coding, and shows that a 65 nm implementation of the CMOS ASIC and PARCA scheme accelerates sparse coding computation by 394× and 2140×, respectively, compared to software running on an eight-core CPU.

A 240 G-ops/s Mobile Coprocessor for Deep Neural Networks

- Computer Science · 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops
- 2014

The nn-X system is presented, a scalable, low-power coprocessor for enabling real-time execution of deep neural networks, able to achieve a peak performance of 227 G-ops/s, which translates to a performance per power improvement of 10 to 100 times that of conventional mobile and desktop processors.

DaDianNao: A Machine-Learning Supercomputer

- Computer Science · 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture
- 2014

This article introduces a custom multi-chip machine-learning architecture, showing that, on a subset of the largest known neural network layers, it is possible to achieve a speedup of 450.65x over a GPU, and reduce the energy by 150.31x on average for a 64-chip system.

Large-scale neural networks implemented with non-volatile memory as the synaptic weight element: Comparative performance analysis (accuracy, speed, and power)

- Computer Science · 2015 IEEE International Electron Devices Meeting (IEDM)
- 2015

It is shown that NVM-based systems could potentially offer faster and lower-power ML training than GPU-based hardware, despite the inherent random and deterministic imperfections of such devices.

Scaling-up resistive synaptic arrays for neuro-inspired architecture: Challenges and prospect

- Computer Science · 2015 IEEE International Electron Devices Meeting (IEDM)
- 2015

A circuit-level macro simulator is developed to explore the design trade-offs and evaluate the overhead of the proposed mitigation strategies as well as project the scaling trend of the neuro-inspired architecture.

Memristor-Based Multilayer Neural Networks With Online Gradient Descent Training

- Computer Science · IEEE Transactions on Neural Networks and Learning Systems
- 2015

The proposed memristor-based circuit can compactly implement hardware multilayer neural networks (MNNs) trainable by scalable algorithms based on online gradient descent (e.g., backpropagation), and its utility and robustness are demonstrated.

Training and operation of an integrated neuromorphic network based on metal-oxide memristors

- Computer Science · Nature
- 2015

This work reports the experimental implementation of transistor-free metal-oxide memristor crossbars, with device variability sufficiently low to allow operation of integrated neural networks, demonstrated in a simple network: a single-layer perceptron (an algorithm for linear classification).

Deep learning with COTS HPC systems

- Computer Science · ICML
- 2013

This paper presents technical details and results from the authors' own system based on Commodity Off-The-Shelf High Performance Computing (COTS HPC) technology, a cluster of GPU servers with InfiniBand interconnects and MPI, and shows that it can scale to networks with over 11 billion parameters using just 16 machines.

Experimental Demonstration and Tolerancing of a Large-Scale Neural Network (165 000 Synapses) Using Phase-Change Memory as the Synaptic Weight Element

- Computer Science · IEEE Transactions on Electron Devices
- 2015

It is shown that a bidirectional NVM with a symmetric, linear conductance response of high dynamic range is capable of delivering the same high classification accuracies on this problem as a conventional, software-based implementation of this same network.

A generic systolic array building block for neural networks with on-chip learning

- Computer Science · IEEE Trans. Neural Networks
- 1993

The two-dimensional systolic array system presented is an attempt to define a novel computer architecture inspired by neurobiology that is composed of generic building blocks for basic operations rather than predefined neural models.