• Corpus ID: 36570826

Dynamic Stripes: Exploiting the Dynamic Precision Requirements of Activation Values in Neural Networks

Alberto Delmas, Patrick Judd, Sayeh Sharify, Andreas Moshovos
Stripes is a Deep Neural Network (DNN) accelerator that uses bit-serial computation to offer performance that is proportional to the fixed-point precision of the activation values. The fixed-point precisions are determined a priori using profiling and are selected at a per layer granularity. This paper presents Dynamic Stripes, an extension to Stripes that detects precision variance at runtime and at a finer granularity. This extra level of precision reduction increases performance by 41% over… 
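
As a rough illustration of the idea in this abstract, the sketch below compares a fixed per-layer precision with precisions detected per group of activations at runtime. The group size, the unsigned activation values, and the function names are assumptions made for illustration, not details taken from the paper; bit-serial processing time is modeled simply as one cycle per bit of precision per group.

```python
def required_bits(values):
    """Smallest unsigned bit width that represents every value in the group."""
    return max((v.bit_length() for v in values), default=0)


def serial_cycles(activations, group_size=16, layer_precision=None):
    """Model bit-serial processing time as one cycle per bit per group.

    If layer_precision is given, every group uses that fixed width
    (per-layer profiling); otherwise each group uses its own detected width.
    """
    cycles = 0
    for start in range(0, len(activations), group_size):
        group = activations[start:start + group_size]
        bits = layer_precision if layer_precision is not None else required_bits(group)
        cycles += max(bits, 1)  # assumption: a group takes at least one cycle
    return cycles


# Hypothetical activation values: only the second group needs 8 bits.
acts = [3, 1, 0, 2, 200, 17, 5, 9]
static = serial_cycles(acts, group_size=4, layer_precision=required_bits(acts))
dynamic = serial_cycles(acts, group_size=4)
```

In this toy input the per-layer width is set by the single largest value, so every group pays for 8 bits, while per-group detection lets the first group finish in 2 cycles.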


DPRed: Making Typical Activation Values Matter In Deep Learning Computing

This work proposes Dynamic Precision Reduction (DPRed) Stripes (DPRS), a data-parallel hardware accelerator that adjusts precision on-the-fly to accommodate the values of the activations it processes concurrently, and extends DPRS to exploit activation and weight precisions for fully-connected layers.

Characterizing Sources of Ineffectual Computations in Deep Learning Networks

This work analyzes to what extent value properties persist in a broader set of neural network models and data sets, and shows that these properties persist, albeit to a different degree, and identifies opportunities for future accelerator design efforts.

DPRed: Making Typical Activation and Weight Values Matter In Deep Learning Computing

Designs where the time required to process each group of activations and/or weights scales proportionally to the precision they use for convolutional and fully-connected layers improve execution time and energy efficiency for both dense and sparse networks.

Identifying and Exploiting Ineffectual Computations to Enable Hardware Acceleration of Deep Learning

This article summarizes work on hardware accelerators for inference with Deep Learning Neural Networks, focusing on properties of the DNN value stream that can be exploited at the hardware level to reduce execution time and off- and on-chip communication and storage, resulting in higher energy efficiency.

Bit-Tactical: A Software/Hardware Approach to Exploiting Value and Bit Sparsity in Neural Networks

Two back-end designs chosen to target bit-sparsity in activations, rather than value-sparsity, are empirically motivated, with two benefits: a) they avoid handling the dynamically sparse whole-value activation stream, and b) they uncover more ineffectual work.

Bit-Tactical: Exploiting Ineffectual Computations in Convolutional Neural Networks: Which, Why, and How

We show that, during inference with Convolutional Neural Networks (CNNs), more than 2x to 8x ineffectual work can be exposed if, instead of targeting those weights and activations that are zero, we…

Value-Based Deep-Learning Acceleration

This article summarizes the recent work on value-based hardware accelerators for image classification using Deep Convolutional Neural Networks (CNNs) by exploiting runtime value properties that are difficult or impossible to discern in advance.

Characterizing Sources of Ineffectual Computations in Deep Learning Networks

It is demonstrated that such properties persist in more recent and thus more accurate and better performing image classification networks, models for image applications other than classification such as image segmentation and low-level computational imaging, and Long-Short-Term-Memory models for non-image applications such as those for natural language processing.

Reconfigurable Multi-Input Adder Design for Deep Neural Network Accelerators

Two efficient designs of reconfigurable multi-input adders for deep neural network accelerators enable bit-width adaptive computing in neural network layers, which improves computing throughput.

Laconic Deep Learning Computing

It is shown that if multiplications are decomposed down to the bit level, the amount of work performed during inference for image classification models can be consistently reduced by two orders of magnitude.
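
To make the bit-level decomposition concrete, here is a minimal sketch that multiplies two non-negative integers by summing one power of two per pair of set bits, and counts how many such pairs exist. Plain binary encoding, non-negative operands, and the function names are assumptions for illustration; the encoding used in the paper may differ.

```python
def effectual_bit_products(a, w):
    """Number of (activation bit, weight bit) pairs where both bits are 1."""
    return bin(a).count("1") * bin(w).count("1")


def bit_level_multiply(a, w):
    """Multiply non-negative integers using only pairs of set bits."""
    total = 0
    for i in range(a.bit_length()):
        if not (a >> i) & 1:
            continue  # a zero activation bit contributes nothing
        for j in range(w.bit_length()):
            if (w >> j) & 1:
                total += 1 << (i + j)  # one single-bit product
    return total


product = bit_level_multiply(5, 6)       # 0b101 * 0b110
work = effectual_bit_products(5, 6)      # pairs of set bits
```

For these operands only 4 single-bit products are needed, compared with the 64 bit pairs an 8-bit by 8-bit multiplier nominally covers.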



Bit-Pragmatic Deep Neural Network Computing

This work proposes Pragmatic (PRA), a massively data-parallel architecture that eliminates most of the ineffectual computations on-the-fly, improving performance and energy efficiency compared to state-of-the-art high-performance accelerators.

DaDianNao: A Machine-Learning Supercomputer

  • Yunji Chen, Tao Luo, O. Temam
  • Computer Science
  • 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture
  • 2014
This article introduces a custom multi-chip machine-learning architecture, showing that, on a subset of the largest known neural network layers, it is possible to achieve a speedup of 450.65x over a GPU, and reduce the energy by 150.31x on average for a 64-chip system.

A low-power DCT core using adaptive bitwidth and arithmetic activity exploiting signal correlations and quantization

This work describes the implementation of a discrete cosine transform (DCT) core compression system targeted to low-power video (MPEG2 MP@ML) and still-image (JPEG) applications. It exhibits two…

Stripes: Bit-serial deep neural network computing

Stripes (STR) relies on bit-serial compute units and on the parallelism that is naturally present within DNNs to improve performance and energy with no accuracy loss, and provides a new degree of adaptivity enabling on-the-fly trade-offs among accuracy, performance, and energy.
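
The following is a minimal software sketch of a bit-serial inner product, in which activations are consumed one bit per step while weights are used whole, so the number of steps equals the activation precision. Unsigned activations, least-significant-bit-first order, and the function name are assumptions for illustration and do not describe the actual hardware design.

```python
def bit_serial_dot(activations, weights, precision):
    """Inner product that processes one activation bit position per step."""
    acc = 0
    for bit in range(precision):  # one step per bit of activation precision
        # Add the weights whose activation has this bit set (no multiplier needed).
        partial = sum(w for a, w in zip(activations, weights) if (a >> bit) & 1)
        acc += partial << bit     # weight the partial sum by the bit position
    return acc


acts = [3, 5, 2]        # hypothetical unsigned activations, all fit in 3 bits
weights = [4, -1, 7]    # hypothetical signed weights
result = bit_serial_dot(acts, weights, precision=3)
```

Running the same call with a larger `precision` gives the same result but takes more steps, which is the sense in which execution time scales with the precision used.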