Leveraging the error resilience of machine-learning applications for designing highly energy efficient accelerators

@article{Du2014LeveragingTE,
  title={Leveraging the error resilience of machine-learning applications for designing highly energy efficient accelerators},
  author={Zidong Du and Krishna V. Palem and Lingamneni Avinash and Olivier Temam and Yunji Chen and Chengyong Wu},
  journal={2014 19th Asia and South Pacific Design Automation Conference (ASP-DAC)},
  year={2014},
  pages={201-206}
}
  • Zidong Du, Krishna V. Palem, Lingamneni Avinash, Olivier Temam, Yunji Chen, Chengyong Wu
  • Published 20 February 2014
  • Computer Science
  • 2014 19th Asia and South Pacific Design Automation Conference (ASP-DAC)
In recent years, inexact computing has been increasingly regarded as one of the most promising approaches for reducing energy consumption in applications that can tolerate a degree of inaccuracy. Driven by the principle of trading tolerable amounts of application accuracy for significant resource savings (the resources being the energy consumed, the critical-path delay, and the silicon area), this approach has so far been limited to certain application domains. In this paper, we…
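
To make the accuracy-for-resources trade concrete, here is a minimal Python sketch (illustrative only, not taken from the paper) of an inexact adder that ignores the k low-order bits of its operands, a crude stand-in for pruned carry logic. In hardware, fewer active bits mean less switching energy; the sketch only measures the accuracy side of the trade.

import random

def inexact_add(a: int, b: int, k: int) -> int:
    """Add two integers while ignoring the k low-order bits of each operand."""
    mask = ~((1 << k) - 1)
    return (a & mask) + (b & mask)

random.seed(0)
pairs = [(random.randrange(2**16), random.randrange(2**16)) for _ in range(10000)]
for k in range(0, 9, 2):
    err = sum(abs((a + b) - inexact_add(a, b, k)) for a, b in pairs) / len(pairs)
    print(f"truncated bits = {k}  mean absolute error = {err:.1f}")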

Citations

Retraining-based timing error mitigation for hardware neural networks
TLDR
Experimental results show that timing errors in NN accelerators can be effectively tamed for different applications; when timing errors significantly affect the output results, it is proposed to retrain the accelerators to update their weights, thus circumventing critical timing errors.
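
As a rough illustration of the retraining idea (a toy logistic model, not the paper's accelerator or error model), the sketch below injects random sign flips into a fraction of the multiply results during training, a crude stand-in for timing faults, so the learned weights adapt to the faulty datapath:

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(256, 8))
y = (X.sum(axis=1) > 0).astype(float)        # toy binary task

def faulty_dot(X, W, p=0.05):
    """Dot product in which each product term flips sign with probability p."""
    prods = X * W
    flips = rng.random(prods.shape) < p      # timing-error stand-in
    return np.where(flips, -prods, prods).sum(axis=1)

W = rng.normal(scale=0.1, size=8)
for step in range(500):                      # retrain *with* the error model active
    out = 1.0 / (1.0 + np.exp(-faulty_dot(X, W)))
    W -= 0.1 * X.T @ (out - y) / len(y)      # logistic-regression gradient step
acc = ((1.0 / (1.0 + np.exp(-faulty_dot(X, W))) > 0.5) == (y > 0.5)).mean()
print(f"training accuracy under injected faults: {acc:.2f}")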
Resilience-Aware Frequency Tuning for Neural-Network-Based Approximate Computing Chips
TLDR
The experimental results show that timing errors in neural circuits can be effectively tamed for different applications, so that the circuits can operate at higher clock rates under a specified quality constraint, or be dynamically scaled across a wide range of frequency states with only minor accuracy loss.
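
The tuning loop itself is simple; the sketch below uses a made-up error-vs-frequency curve and a linear quality-degradation model (both assumptions; a real flow would measure them on the fabricated chip):

def timing_error_rate(freq_mhz: float) -> float:
    """Assumed: timing errors appear past 800 MHz and grow linearly."""
    return max(0.0, (freq_mhz - 800.0) / 2000.0)

def quality_loss(error_rate: float) -> float:
    """Assumed: application quality degrades linearly with error rate (in %)."""
    return 40.0 * error_rate

freq, budget = 600.0, 2.0                     # start conservatively; allow 2% quality loss
while quality_loss(timing_error_rate(freq + 50)) <= budget:
    freq += 50                                # the next frequency state is still safe
print(f"highest safe frequency: {freq:.0f} MHz")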
Colony of NPUs: Scaling the Efficiency of Neural Accelerators
TLDR
A Neural Processing Unit (NPU) is proposed as a programmable approximate accelerator: a neural network is trained to mimic an approximable region of the original code, and that region is then replaced with an efficient computation of the learned model.
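
The mimic step can be sketched in a few lines (the target function, network topology, and use of scikit-learn are illustrative; the real NPU is a hardware unit fed by a compiler workflow):

import numpy as np
from sklearn.neural_network import MLPRegressor

def approximable_region(x, y):
    """Stand-in for a hot, error-tolerant code region."""
    return np.sin(x) * np.cos(y)

rng = np.random.default_rng(0)
inputs = rng.uniform(-np.pi, np.pi, size=(5000, 2))
targets = approximable_region(inputs[:, 0], inputs[:, 1])

npu = MLPRegressor(hidden_layer_sizes=(8, 8), max_iter=2000, random_state=0)
npu.fit(inputs, targets)                      # "compile" the region into a NN
test = rng.uniform(-np.pi, np.pi, size=(1000, 2))
err = np.abs(npu.predict(test) - approximable_region(test[:, 0], test[:, 1]))
print(f"mean absolute error of the mimic: {err.mean():.4f}")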
Bit Error Robustness for Energy-Efficient DNN Accelerators
TLDR
This paper shows that a combination of robust fixed-point quantization, weight clipping, and random bit error training (RandBET) significantly improves robustness against random bit errors in (quantized) DNN weights, which leads to high energy savings from both low-voltage operation and low-precision quantization.
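
A hedged sketch of the error-injection half of the recipe (the 8-bit format, clipping range, and error rate are assumptions for illustration; in RandBET this injection sits inside the training loop):

import numpy as np

rng = np.random.default_rng(0)

def quantize(w, clip=0.5):
    """Clip weights, then map them to 8-bit codes."""
    wc = np.clip(w, -clip, clip)                      # weight clipping
    return np.round((wc + clip) / (2 * clip) * 255).astype(np.uint8)

def dequantize(q, clip=0.5):
    return q.astype(np.float64) / 255.0 * (2 * clip) - clip

def inject_bit_errors(q, p=0.01):
    """Flip each stored bit independently with probability p."""
    bits = (rng.random(q.shape + (8,)) < p).astype(np.uint8)
    flip_mask = (bits << np.arange(8, dtype=np.uint8)).sum(axis=-1).astype(np.uint8)
    return q ^ flip_mask

w = rng.normal(scale=0.2, size=100)
w_faulty = dequantize(inject_bit_errors(quantize(w)))
print("mean weight perturbation at 1% bit errors:", np.abs(w_faulty - w).mean())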
Reducing Energy of Approximate Feature Extraction in Heterogeneous Architectures for Sensor Inference via Energy-Aware Genetic Programming
TLDR
A heterogeneous architecture for embedded sensor inference, demonstrated in custom silicon, maps programmable feature extraction to an accelerator via genetic programming; robust energy models are incorporated into the genetic-programming algorithm to improve the energy-approximation Pareto frontier.
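
The core of such a search is an energy-aware fitness function. A minimal sketch (the candidate programs, accuracy curve, and energy model below are all invented; the paper fits its models to the custom silicon):

import random

def energy_model(program):
    """Assumed: modeled energy grows with program size."""
    return len(program)

def accuracy(program):
    """Assumed: larger feature programs help accuracy, with diminishing returns."""
    return 0.95 - 0.6 / (1 + len(program))

def fitness(program, lam=0.02):
    # Reward accuracy, penalize modeled energy: candidates at better
    # energy-approximation Pareto positions score higher.
    return accuracy(program) - lam * energy_model(program)

random.seed(0)
population = [[random.choice("+-*") for _ in range(random.randint(1, 20))]
              for _ in range(30)]
best = max(population, key=fitness)
print(f"best program length: {len(best)}, fitness: {fitness(best):.3f}")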
Prediction-Based Quality Control for Approximate Accelerators
TLDR
This work uses neural networks as an alternative prediction mechanism for quality control that also provides a realistic reference point to evaluate the effectiveness of the table-based predictor.
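
A toy version of the idea (the "accelerator", its error model, and the hand-picked predictor feature are all illustrative; the paper trains its predictors on real accelerator outputs): a classifier predicts, per input, whether the approximate output would violate the quality bound, and risky inputs fall back to the exact path.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 2.0 * np.pi, size=5000)
exact = np.sin(x)
approx = np.sin(np.round(x, 1))                 # stand-in approximate accelerator
violates = (np.abs(exact - approx) > 0.02).astype(int)

feature = np.abs(np.cos(x)).reshape(-1, 1)      # error grows with the local slope
predictor = LogisticRegression().fit(feature, violates)
risky = predictor.predict(feature).astype(bool) # predicted quality violations
print(f"accelerator invoked on {100.0 * (~risky).mean():.1f}% of inputs; "
      f"exact fallback on the rest")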
Float-Fix: An Efficient and Hardware-Friendly Data Type for Deep Neural Network
TLDR
This paper proposes a new data type called Float-Fix (FF), describes its structure, compares it with other data types, and shows that the hardware cost of converters between 16-bit fixed point and FF is very small.
AxBench: A Benchmark Suite for Approximate Computing Across the System Stack
TLDR
AxBench, a general, diverse, and representative multi-framework benchmark suite for CPUs, GPUs, and hardware design, is developed and introduced; the study finds that NPUs offer higher performance and energy efficiency than loop perforation on both CPUs and GPUs.
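
Loop perforation, the software baseline in that comparison, is easy to sketch (illustrative only): skip a fixed fraction of loop iterations and accept the resulting error.

import numpy as np

def perforated_mean(data, skip=2):
    """Execute only every skip-th iteration: 1/skip of the work."""
    return data[::skip].mean()

rng = np.random.default_rng(0)
data = rng.normal(loc=5.0, scale=1.0, size=100_000)
print(f"exact mean:      {data.mean():.4f}")
print(f"perforated mean: {perforated_mean(data, skip=4):.4f}  (25% of the work)")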
A survey of neural network accelerators
TLDR
This review of neural network accelerators, covering recent related work such as the DianNao-family accelerators, can serve as a reference for hardware researchers in the area of neural networks.
On quality trade-off control for approximate computing using iterative training
TLDR
A novel optimization framework advocates an iterative training process that coordinates the training of the classifier and the accelerator with a judicious selection of training data, and integrates a dynamic threshold tuning algorithm to maximize invocation of the accelerator under the quality requirement.
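
The threshold-tuning step can be sketched as follows (the per-input error and confidence distributions are synthetic; the framework measures them on real training data): scan thresholds from permissive to strict and keep the first, i.e. most-invoking, setting that meets the quality requirement.

import numpy as np

rng = np.random.default_rng(0)
error = np.abs(rng.normal(0.0, 1.0, 10000))                 # per-input accelerator error
confidence = np.exp(-error) + rng.normal(0.0, 0.05, 10000)  # noisy "safe" score
quality_req = 0.4                                           # mean error allowed when invoked

best = None
for thr in np.linspace(0.05, 0.95, 50):          # permissive -> strict
    invoked = confidence > thr
    if invoked.any() and error[invoked].mean() <= quality_req:
        best = (thr, invoked.mean())             # first pass = max invocation
        break
print("chosen threshold and invocation rate:", best)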
...

References

SHOWING 1-10 OF 24 REFERENCES
A defect-tolerant accelerator for emerging high-performance applications
  • O. Temam
  • Computer Science
  • 2012 39th Annual International Symposium on Computer Architecture (ISCA)
  • 2012
TLDR
It is empirically shown that the conceptual error tolerance of neural networks does translate into defect tolerance in hardware neural networks, paving the way for their introduction in heterogeneous multi-cores as intrinsically defect-tolerant and energy-efficient accelerators.
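
The claim is easy to reproduce in software (a toy experiment, assuming stuck-at-zero synapses as the defect model; the paper evaluates hardware-level faults): accuracy degrades gracefully as weights are knocked out, rather than collapsing.

import numpy as np
from sklearn.datasets import load_digits
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)
net = MLPClassifier(hidden_layer_sizes=(32,), max_iter=600, random_state=0).fit(X, y)

rng = np.random.default_rng(0)
for defect_rate in (0.0, 0.05, 0.10, 0.20):
    saved = [w.copy() for w in net.coefs_]
    for w in net.coefs_:
        w[rng.random(w.shape) < defect_rate] = 0.0   # stuck-at-zero synapses
    print(f"defect rate {defect_rate:.2f}: accuracy {net.score(X, y):.3f}")
    for w, s in zip(net.coefs_, saved):
        w[:] = s                                      # restore before the next trial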
BenchNN: On the broad potential application scope of hardware neural network accelerators
TLDR
Software neural network implementations of five RMS applications from the PARSEC benchmark suite are developed and evaluated, highlighting that a hardware neural network accelerator is indeed compatible with many of the emerging high-performance workloads currently accepted as benchmarks for high-performance microarchitectures.
Synthesizing Parsimonious Inexact Circuits through Probabilistic Design Techniques
TLDR
Two novel design approaches, called Probabilistic Pruning and Probabilistic Logic Minimization, are proposed to realize inexact circuits with zero hardware overhead; each can independently achieve normalized gains as large as 2x-9.5x in the energy-delay-area product for relative error magnitudes ranging from 10^-4% to 8%, compared to corresponding conventional correct circuits.
Energy parsimonious circuit design through probabilistic pruning
TLDR
This paper presents a novel design-level technique, probabilistic pruning, which realizes inexact circuits by pruning the portions of a circuit that have a lower probability of being active; this serves as the basis for architectural modifications that yield significant savings in energy, delay, and area.
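
A behavioural toy of the profiling step (real pruning operates on gate-level netlists and their activity probabilities; here the bit positions of a 16-bit adder stand in for circuit nodes, and the assumed workload only exercises small values):

import numpy as np

rng = np.random.default_rng(0)
a = rng.integers(0, 2**10, size=20000)   # representative inputs use small values
b = rng.integers(0, 2**10, size=20000)

sums = a + b
activity = ((sums >> np.arange(16)[:, None]) & 1).mean(axis=1)  # P(bit is active)
prunable = np.flatnonzero(activity < 0.05)
print("prunable bit positions of the 16-bit adder:", prunable)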
Conservation cores: reducing the energy of mature computations
TLDR
A toolchain for automatically synthesizing c-cores from application source code is presented; it is demonstrated that c-cores can significantly reduce energy and energy-delay for a wide range of applications, and that patching can extend the useful lifetime of individual c-cores to match that of conventional processors.
Parsimonious Circuits for Error-Tolerant Applications through Probabilistic Logic Minimization
TLDR
This work proposes a novel technique called Probabilistic Logic Minimization, which synthesizes an inexact circuit in the first place and therefore incurs zero hardware overhead; normalized gains as high as 2x-9.5x in the energy-delay-area product can be obtained compared to the corresponding correct designs.
A digital neurosynaptic core using embedded crossbar memory with 45pJ per spike in 45nm
TLDR
This work fabricates a key building block of a modular neuromorphic architecture, a neurosynaptic core with 256 digital integrate-and-fire neurons and a 1024×256-bit SRAM crossbar memory for synapses, in IBM's 45 nm SOI process, achieving ultra-low active power consumption.
Approximate logic synthesis for error tolerant applications
  • Doochul Shin, S. Gupta
  • Computer Science
  • 2010 Design, Automation & Test in Europe Conference & Exhibition (DATE 2010)
  • 2010
TLDR
A logic synthesis approach is proposed for the new problem of identifying how to exploit a given error-rate threshold to maximally reduce the area of the synthesized circuit.
Optimizing energy to minimize errors in dataflow graphs using approximate adders
TLDR
This work presents a method to optimally distribute a given energy budget among adders in a dataflow graph so as to minimize expected errors and demonstrates this method on a finite impulse response filter and a Fast Fourier Transform.
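
Greedy allocation gives the flavour of the method (the per-adder error curves, output-significance weights, and greedy rule are invented; the paper derives the distribution optimally from the dataflow graph of the FIR filter or FFT being optimized):

import numpy as np

def adder_error(level):
    """Assumed: an adder's expected error halves with each extra energy level."""
    return 2.0 ** -level

significance = np.array([8.0, 4.0, 2.0, 1.0])  # how strongly each adder's error
                                               # is amplified at the graph output
alloc = np.ones(4, dtype=int)                  # every adder starts at level 1
max_level, budget = 8, 12                      # extra energy units to distribute
for _ in range(budget):
    gains = [significance[i] * (adder_error(alloc[i]) - adder_error(alloc[i] + 1))
             if alloc[i] < max_level else 0.0
             for i in range(len(alloc))]
    alloc[int(np.argmax(gains))] += 1          # spend where the error drop is biggest
print("energy levels per adder:", alloc)
print("expected output error:  ", (significance * adder_error(alloc)).sum())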
Selective flexibility: Breaking the rigidity of datapath merging
TLDR
This paper combines flexibility and efficiency in the design and synthesis of domain-specific datapaths by merging all individual paths from the Data Flow Graphs of the target applications, leading to a minimal set of required resources and generating a domain-specific rectangular lattice.
...