ISAAC: A Convolutional Neural Network Accelerator with In-Situ Analog Arithmetic in Crossbars

@article{Shafiee2016ISAACAC,
  title={ISAAC: A Convolutional Neural Network Accelerator with In-Situ Analog Arithmetic in Crossbars},
  author={Ali Shafiee and Anirban Nag and Naveen Muralimanohar and Rajeev Balasubramonian and John Paul Strachan and Miao Hu and R. Stanley Williams and Vivek Srikumar},
  journal={2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA)},
  year={2016},
  pages={14-26}
}
A number of recent efforts have attempted to design accelerators for popular machine learning algorithms, such as those involving convolutional and deep neural networks (CNNs and DNNs). These algorithms typically involve a large number of multiply-accumulate (dot-product) operations. A recent project, DaDianNao, adopts a near data processing approach, where a specialized neural functional unit performs all the digital arithmetic operations and receives input weights from adjacent eDRAM banks…
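For context on the in-situ analog arithmetic these designs share, here is a minimal sketch (illustrative names and device values, not taken from the paper) of the dot product a memristive crossbar computes: weights are programmed as cell conductances, inputs are applied as word-line voltages, and each bit-line current sums the element-wise products by Ohm's and Kirchhoff's laws.

import numpy as np

# Illustrative sketch only: map weights onto an assumed conductance
# window [g_min, g_max], then read the analog dot product as bit-line
# currents, I_j = sum_i G_ij * V_i. All parameter values are hypothetical.
def crossbar_dot(weights, voltages, g_min=1e-6, g_max=1e-4):
    w_min, w_max = weights.min(), weights.max()
    conductances = g_min + (weights - w_min) * (g_max - g_min) / (w_max - w_min)
    return conductances.T @ voltages  # one summed current per bit line

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 3))  # 4 word lines (inputs) x 3 bit lines (outputs)
v = rng.standard_normal(4)       # input voltages
print(crossbar_dot(W, v))

In ISAAC this analog step stands in for the digital multiply-accumulate units of designs like DaDianNao; the citing papers below differ mainly in how they pipeline, quantize, and convert such analog sums back to digital.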
A Versatile ReRAM-based Accelerator for Convolutional Neural Networks
TLDR
This work proposes a multi-tile ReRAM accelerator for supporting multiple CNN topologies, where each tile processes one or more layers in a pipelined fashion, and designs every tile with 9 processing elements that operate in a systolic fashion.
Trained Biased Number Representation for ReRAM-Based Neural Network Accelerators
TLDR
A new CNN training and implementation approach that implements weights using a trained biased number representation, which can achieve near full-precision model accuracy with as little as 2-bit weights and 2-bit activations on the CIFAR datasets (a generic quantization sketch appears after this list).
ATRIA: A Bit-Parallel Stochastic Arithmetic Based Accelerator for In-DRAM CNN Processing
TLDR
ATRIA significantly improves the latency, throughput, and efficiency of processing CNN inferences by performing 16 MAC operations in only five consecutive memory operation cycles, compared to the best-performing in-DRAM accelerator from prior work.
PipeLayer: A Pipelined ReRAM-Based Accelerator for Deep Learning
TLDR
PipeLayer is presented, a ReRAM-based PIM accelerator for CNNs that supports both training and testing, with a highly parallel design based on the notions of parallelism granularity and weight replication, enabling highly pipelined execution of both training and testing without introducing the potential stalls of previous work.
Input-Splitting of Large Neural Networks for Power-Efficient Accelerator with Resistive Crossbar Memory Array
TLDR
It is demonstrated that any CNN model can be represented with multiple arrays without using intermediate partial sums, and the ADC power of the proposed design is 32x smaller and the total chip power is 3x smaller than those of the baseline design.
Analog Weights in ReRAM DNN Accelerators
TLDR
This paper presents a novel scheme for alleviating the single-bit-per-device restriction by exploiting the frequency dependence of v-i plane hysteresis, assigning kernel information not only to the device conductance but also, in part, to the frequency of a time-varying input.
PANTHER: A Programmable Architecture for Neural Network Training Harnessing Energy-Efficient ReRAM
TLDR
PANTHER, an ISA-programmable training accelerator with compiler support, is developed and can be integrated into other accelerators in the literature to enhance their efficiency.
MAX2: An ReRAM-Based Neural Network Accelerator That Maximizes Data Reuse and Area Utilization
TLDR
A multi-tile ReRAM accelerator framework for supporting multiple CNN topologies that maximizes on-chip data reuse and reduces on-chip bandwidth to minimize energy consumption due to data movement, with a detailed energy and area breakdown of each component at the PE level, tile level, and system level.
Processing Convolutional Neural Networks on Cache
TLDR
This paper proposes and assesses a novel mechanism that operates at the cache level, leveraging both data proximity and parallel processing capabilities, enabled by dedicated fully-digital vector Functional Units (FUs), and demonstrates the integration of this mechanism into a conventional Central Processing Unit (CPU).
Deep Learning Acceleration with Neuron-to-Memory Transformation
TLDR
A novel framework, called RAPIDNN, which performs neuron-to-memory transformation in order to accelerate DNNs in a highly parallel architecture, achieving 68.4× and 49.5× energy efficiency improvements and a 48.9× speedup compared to ISAAC and PipeLayer, the state-of-the-art DNN accelerators, while ensuring less than 0.5% quality loss.
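To make the low-precision claims above concrete (e.g., the trained biased number representation entry), here is a generic signed 2-bit uniform quantizer; it is an illustrative stand-in, since the trained biased representation learns its levels and bias during training rather than fixing them as this sketch does.

import numpy as np

# Generic 2-bit signed quantizer with four levels {-2, -1, 0, 1} * scale.
# Purely illustrative; 'scale' would normally be chosen or learned per
# layer, and a biased code would shift these levels.
def quantize_2bit(w, scale=0.25):
    return np.clip(np.round(w / scale), -2, 1) * scale

w = np.array([0.31, -0.08, 0.55, -0.42])
print(quantize_2bit(w))  # each weight snapped to one of four levels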

References

Showing 1-10 of 92 references
Accelerating Deep Convolutional Neural Networks Using Specialized Hardware
TLDR
Hardware specialization in the form of GPGPUs, FPGAs, and ASICs offers a promising path towards major leaps in processing capability while achieving high energy efficiency, and combining multiple FPGAs over a low-latency communication fabric offers further opportunity to train and evaluate models of unprecedented size and quality.
DaDianNao: A Machine-Learning Supercomputer
  • Yunji Chen, Tao Luo, +8 authors O. Temam
  • Computer Science
  • 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture
  • 2014
TLDR
This article introduces a custom multi-chip machine-learning architecture, showing that, on a subset of the largest known neural network layers, it is possible to achieve a speedup of 450.65x over a GPU, and reduce the energy by 150.31x on average for a 64-chip system.
ShiDianNao: Shifting vision processing closer to the sensor
TLDR
This paper proposes an accelerator which is 60x more energy efficient than the previous state-of-the-art neural network accelerator, designed down to the layout at 65 nm, with a modest footprint and consuming only 320 mW, but still about 30x faster than high-end GPUs.
Dot-product engine for neuromorphic computing: Programming 1T1M crossbar to accelerate matrix-vector multiplication
TLDR
The Dot-Product Engine (DPE) is developed as a high-density, high-power-efficiency accelerator for approximate matrix-vector multiplication, together with a conversion algorithm that maps arbitrary matrix values appropriately to memristor conductances in a realistic crossbar array (a hedged sketch of one signed mapping appears after this reference list).
CNP: An FPGA-based processor for Convolutional Networks
TLDR
The implementation exploits the inherent parallelism of ConvNets, takes full advantage of multiple hardware multiply-accumulate units on the FPGA, and can be used for low-power, lightweight embedded vision systems for micro-UAVs and other small robots.
EIE: Efficient Inference Engine on Compressed Deep Neural Network
  • Song Han, Xingyu Liu, +4 authors W. Dally
  • Computer Science
  • 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA)
  • 2016
TLDR
An energy efficient inference engine (EIE) that performs inference on this compressed network model and accelerates the resulting sparse matrix-vector multiplication with weight sharing, and is 189x and 13x faster when compared to CPU and GPU implementations of the same DNN without compression.
Minerva: Enabling Low-Power, Highly-Accurate Deep Neural Network Accelerators
The continued success of Deep Neural Networks (DNNs) in classification tasks has sparked a trend of accelerating their execution with specialized hardware. While published designs easily give an…
Origami: A Convolutional Network Accelerator
TLDR
This paper presents the first convolutional network accelerator which is scalable to network sizes that are currently only handled by workstation GPUs, but remains within the power envelope of embedded systems.
DianNao: a small-footprint high-throughput accelerator for ubiquitous machine-learning
TLDR
This study designs an accelerator for large-scale CNNs and DNNs, with a special emphasis on the impact of memory on accelerator design, performance and energy, and shows that it is possible to design an accelerator with a high throughput, capable of performing 452 GOP/s in a small footprint.
A digital neurosynaptic core using embedded crossbar memory with 45pJ per spike in 45nm
TLDR
This work fabricated a key building block of a modular neuromorphic architecture, a neurosynaptic core, with 256 digital integrate-and-fire neurons and a 1024×256 bit SRAM crossbar memory for synapses using IBM's 45nm SOI process, leading to ultra-low active power consumption.
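Since memristor conductances are non-negative, mapping arbitrary signed matrix values onto them, as the Dot-Product Engine entry above describes, needs some encoding. The sketch below shows one common differential-pair scheme, an illustrative assumption rather than the DPE paper's actual conversion algorithm (which also compensates for wire resistance and device nonlinearity): positive and negative parts go to separate columns whose bit-line currents are subtracted.

import numpy as np

# Differential-pair sketch for signed matrix-vector products on a
# crossbar: g_plus stores positive parts, g_minus stores magnitudes of
# negative parts; subtracting the two column currents recovers the sign.
# g_max and the linear scaling are illustrative assumptions.
def dpe_matvec(W, v, g_max=1e-4):
    scale = g_max / np.abs(W).max()
    g_plus = np.clip(W, 0.0, None) * scale
    g_minus = np.clip(-W, 0.0, None) * scale
    return (g_plus.T @ v - g_minus.T @ v) / scale

W = np.array([[0.5, -1.0], [2.0, 0.25]])
v = np.array([1.0, -0.5])
print(dpe_matvec(W, v))  # agrees with W.T @ v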