Publications
DianNao: a small-footprint high-throughput accelerator for ubiquitous machine-learning
TLDR
This study designs an accelerator for large-scale CNNs and DNNs, with special emphasis on the impact of memory on accelerator design, performance, and energy, and shows that a high-throughput accelerator capable of performing 452 GOP/s can be built in a small footprint.
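At its core, such a design streams tiles of inputs and synapses through small on-chip buffers into a fixed-point multiply/add pipeline. Below is a minimal Python sketch of that tiled pattern, assuming illustrative tile sizes and a Q7.8 fixed-point format; the buffer names in the comments (NBin, SB, NBout) follow the paper's terminology, but the code is a schematic model, not the hardware design.

    import numpy as np

    FRAC_BITS = 8  # assumed Q7.8 16-bit fixed-point format

    def to_fixed(x):
        """Quantize floats to 16-bit fixed point."""
        return np.clip(np.round(x * (1 << FRAC_BITS)), -32768, 32767).astype(np.int64)

    def layer_tiled(inputs, weights, Tn=16, Ti=16):
        """One fully connected layer computed in (Tn x Ti) tiles, mirroring how
        data streams through small on-chip buffers: NBin holds input neurons,
        SB holds synapses, NBout accumulates partial sums."""
        n_out, n_in = weights.shape
        out = np.zeros(n_out, dtype=np.int64)             # NBout: partial sums
        for no in range(0, n_out, Tn):                    # one tile of output neurons
            for ni in range(0, n_in, Ti):                 # one tile of inputs (NBin load)
                w_tile = weights[no:no+Tn, ni:ni+Ti]      # SB load
                out[no:no+Tn] += w_tile @ inputs[ni:ni+Ti]  # multipliers + adder tree
        return out >> FRAC_BITS                           # rescale products back to Q7.8

    rng = np.random.default_rng(0)
    x = to_fixed(rng.uniform(-1, 1, 64))
    W = to_fixed(rng.uniform(-1, 1, (32, 64)))
    print(layer_tiled(x, W)[:4])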
ShiDianNao: Shifting vision processing closer to the sensor
TLDR
This paper proposes an accelerator that is 60x more energy efficient than the previous state-of-the-art neural network accelerator; designed down to the layout at 65 nm, it has a modest footprint, consumes only 320 mW, and is still about 30x faster than high-end GPUs.
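Much of the energy win comes from reusing sensor data inside a processing-element array instead of re-reading it from a buffer for every output. Here is a toy 1D model of that inter-PE shifting; the array width and read counters are illustrative assumptions, not the chip's actual 2D dataflow.

    def conv1d_shift(pixels, kernel, n_pes):
        """Each PE computes one convolution output; between taps, pixels are
        shifted between neighboring PEs so only one fresh buffer read is
        needed per tap instead of one per PE."""
        outs, reads = [], 0
        n_out = len(pixels) - len(kernel) + 1
        for base in range(0, n_out, n_pes):
            pes = min(n_pes, n_out - base)
            acc = [0] * pes
            regs = [pixels[base + i] for i in range(pes)]   # initial fill from buffer
            reads += pes
            for k, w in enumerate(kernel):
                for i in range(pes):
                    acc[i] += w * regs[i]                   # MAC on the held pixel
                if k + 1 < len(kernel):
                    # shift: PE i reuses PE i+1's pixel; only the last PE fetches
                    regs = regs[1:] + [pixels[base + pes + k]]
                    reads += 1
            outs.extend(acc)
        return outs, reads

    y, reads = conv1d_shift(list(range(20)), [1, 2, 1], n_pes=8)
    print(y[:4], "buffer reads:", reads, "vs naive:", len(y) * 3)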
Cambricon-X: An accelerator for sparse neural networks
TLDR
A novel accelerator, Cambricon-X, is proposed to exploit the sparsity and irregularity of NN models for increased efficiency; experimental results show that it achieves, on average, a 7.23x speedup and 6.43x energy saving over the state-of-the-art NN accelerator.
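The sparsity exploitation boils down to storing only nonzero weights with compact indexes and gathering just the input neurons they need. A minimal sketch using a generic step (delta) index encoding, illustrative rather than the chip's exact format:

    def compress(weights):
        """Keep nonzeros; store each one's distance from the previous nonzero
        (small deltas need few index bits)."""
        vals, steps, last = [], [], -1
        for i, w in enumerate(weights):
            if w != 0:
                vals.append(w)
                steps.append(i - last)
                last = i
        return vals, steps

    def sparse_dot(vals, steps, neurons):
        acc, pos = 0, -1
        for w, s in zip(vals, steps):
            pos += s                      # indexing step: recover the position
            acc += w * neurons[pos]       # MAC only on the selected neurons
        return acc

    w = [0, 0, 3, 0, -2, 0, 0, 5]
    x = [1, 2, 3, 4, 5, 6, 7, 8]
    vals, steps = compress(w)
    assert sparse_dot(vals, steps, x) == sum(wi * xi for wi, xi in zip(w, x))
    print(vals, steps)                    # [3, -2, 5] [3, 2, 3]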
Cambricon: An Instruction Set Architecture for Neural Networks
TLDR
Based on a comprehensive analysis of existing NN techniques, this paper proposes a novel domain-specific Instruction Set Architecture (ISA) for NN accelerators, called Cambricon: a load-store architecture that integrates scalar, vector, matrix, logical, data transfer, and control instructions.
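To make the load-store, multi-type instruction idea concrete, here is a tiny interpreter for an invented ISA in the same spirit: data moves between memory and an on-chip scratchpad explicitly, and compute instructions touch only on-chip operands. The mnemonics and operand formats below are hypothetical illustrations, not Cambricon's actual instruction set.

    import numpy as np

    def run(program, memory):
        scratch = {}                                   # on-chip scratchpad
        for op, *args in program:
            if op == "VLOAD":                          # data transfer: mem -> scratchpad
                dst, addr, n = args
                scratch[dst] = memory[addr:addr + n].copy()
            elif op == "MLOAD":
                dst, addr, rows, cols = args
                scratch[dst] = memory[addr:addr + rows * cols].reshape(rows, cols).copy()
            elif op == "MMV":                          # matrix: matrix-vector multiply
                dst, m, v = args
                scratch[dst] = scratch[m] @ scratch[v]
            elif op == "VRELU":                        # vector: elementwise max(0, x)
                dst, src = args
                scratch[dst] = np.maximum(scratch[src], 0)
            elif op == "VSTORE":                       # data transfer: scratchpad -> mem
                src, addr = args
                memory[addr:addr + len(scratch[src])] = scratch[src]
        return memory

    mem = np.zeros(32)
    mem[0:6] = [1, -2, 3, 4, 0, -1]                    # 2x3 weight matrix W
    mem[8:11] = [1, 2, 3]                              # input vector x
    prog = [("MLOAD", "W", 0, 2, 3),
            ("VLOAD", "x", 8, 3),
            ("MMV", "y", "W", "x"),
            ("VRELU", "y", "y"),
            ("VSTORE", "y", 16)]
    print(run(prog, mem)[16:18])                       # one ReLU(W @ x) layer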
Cambricon-S: Addressing Irregularity in Sparse Neural Networks through A Cooperative Software/Hardware Approach
TLDR
A software-based coarse-grained pruning technique, together with local quantization, significantly reduces the size of indexes and improves the network compression ratio, and a hardware accelerator is designed to efficiently handle the remaining irregularity of sparse synapses and neurons.
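The software half can be sketched as block-wise pruning plus a per-block (local) quantization scale, so one index covers a whole block of synapses rather than a single weight. The block size, keep ratio, and 8-bit codes below are assumptions for illustration:

    import numpy as np

    def block_prune_quantize(W, block=4, keep_ratio=0.5, bits=8):
        """Drop whole weight blocks with the smallest L1 norms, then quantize
        each surviving block with its own local scale."""
        rows, cols = W.shape
        blocks = W.reshape(rows, cols // block, block)
        norms = np.abs(blocks).sum(axis=2)              # importance score per block
        thresh = np.quantile(norms, 1 - keep_ratio)
        qmax = 2 ** (bits - 1) - 1
        kept = []
        for r in range(rows):
            for b in range(cols // block):
                if norms[r, b] >= thresh:               # keep this block
                    w = blocks[r, b]
                    scale = np.abs(w).max() / qmax or 1.0   # local per-block scale
                    q = np.round(w / scale).astype(np.int8)
                    kept.append((r, b, scale, q))       # one index per block
        return kept

    rng = np.random.default_rng(1)
    W = rng.normal(size=(4, 16))
    kept = block_prune_quantize(W)
    print(f"kept {len(kept)} of {W.size // 4} blocks; one index per 4 weights")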
Leveraging the error resilience of machine-learning applications for designing highly energy efficient accelerators
TLDR
This paper proposes to expand the application scope, error tolerance, and energy savings of inexact computing systems through neural network architectures, and demonstrates that the proposed inexact neural network accelerator could achieve 43.91%-62.49% savings in energy consumption.
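A quick way to see the error-resilience argument is to emulate inexact multipliers by truncating low-order product bits and measuring how little a layer's output moves. The truncation model and tensor sizes below are illustrative assumptions, not the paper's hardware:

    import numpy as np

    def inexact_layer(x, W, drop_bits):
        """Fixed-point layer where each product loses its low `drop_bits` bits,
        emulating cheaper, approximate multiplier hardware."""
        prods = W * x                                   # exact integer products
        prods = (prods >> drop_bits) << drop_bits       # inexact: truncate LSBs
        return prods.sum(axis=1)

    rng = np.random.default_rng(0)
    x = rng.integers(-128, 128, size=256)
    W = rng.integers(-128, 128, size=(16, 256))
    exact = inexact_layer(x, W, 0)
    for bits in (2, 4, 6, 8):
        approx = inexact_layer(x, W, bits)
        err = np.abs(approx - exact).max() / np.abs(exact).max()
        print(f"drop {bits} LSBs -> max relative output error {err:.4%}")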
Neuromorphic accelerators: A comparison between neuroscience and machine-learning approaches
TLDR
This study identifies the key sources of inaccuracy of SNN+STDP, which relate less to the loss of information from spike coding than to the nature of the STDP learning algorithm, and concludes that for applications requiring permanent online learning and moderate accuracy, SNN+STDP hardware accelerators could be a very cost-efficient solution.
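For reference, the standard pair-based STDP rule at the heart of that analysis updates a synapse from the relative timing of pre- and postsynaptic spikes, with exponential time windows; the constants below are illustrative, not those used in the study:

    import math

    def stdp_dw(t_pre, t_post, a_plus=0.01, a_minus=0.012, tau=20.0):
        """Weight change for one pre/post spike pair (times in ms):
        potentiate when pre precedes post, depress otherwise."""
        dt = t_post - t_pre
        if dt >= 0:
            return a_plus * math.exp(-dt / tau)     # pre before post: strengthen
        return -a_minus * math.exp(dt / tau)        # post before pre: weaken

    for dt in (-40, -10, 0, 10, 40):
        print(f"dt = {dt:+4d} ms -> dw = {stdp_dw(0, dt):+.5f}")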
Fixed-Point Back-Propagation Training
TLDR
By keeping the data distribution stable through layer-wise precision-adaptive quantization, this paper directly trains deep neural networks using low-bit-width fixed-point data with guaranteed accuracy, without changing hyperparameters.
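The layer-wise precision-adaptive idea can be sketched as choosing each layer's fixed-point format from that tensor's own dynamic range, so the quantized distribution tracks the original. The 8-bit word and the format-selection rule here are assumptions for illustration:

    import numpy as np

    def adaptive_quantize(t, word_bits=8):
        """Pick fractional bits so the tensor's max magnitude just fits,
        then quantize to `word_bits`-bit fixed point."""
        max_abs = np.abs(t).max()
        int_bits = max(0, int(np.ceil(np.log2(max_abs + 1e-12))) + 1)  # incl. sign
        frac_bits = word_bits - int_bits
        scale = 2.0 ** frac_bits
        qmax = 2 ** (word_bits - 1) - 1
        q = np.clip(np.round(t * scale), -qmax - 1, qmax)
        return q / scale, frac_bits

    rng = np.random.default_rng(0)
    for name, layer in [("conv1 grads", rng.normal(0, 0.02, 1000)),
                        ("fc2 grads", rng.normal(0, 3.0, 1000))]:
        dq, fb = adaptive_quantize(layer)
        print(f"{name}: {fb} fractional bits, "
              f"mean abs error {np.abs(dq - layer).mean():.5f}")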
Rubik: A Hierarchical Architecture for Efficient Graph Learning
TLDR
This work proposes a lightweight graph reordering methodology, incorporated into a GCN accelerator architecture equipped with a customized cache design that fully utilizes graph-level data reuse, along with a mapping methodology aware of data reuse and task-level parallelism to handle various graph inputs effectively.
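A toy version of the reordering idea: renumber vertices so that frequently reused (high-degree) ones get adjacent IDs, packing their feature rows together in cache during GCN aggregation. Degree sorting here is a generic locality heuristic standing in for the paper's actual reordering algorithm:

    def reorder_by_degree(n, edges):
        """Relabel vertices so the hottest (highest-degree) ones come first."""
        deg = [0] * n
        for u, v in edges:
            deg[u] += 1
            deg[v] += 1
        order = sorted(range(n), key=lambda v: -deg[v])
        new_id = {old: new for new, old in enumerate(order)}
        return [(new_id[u], new_id[v]) for u, v in edges], new_id

    def aggregate(n, edges, feats):
        """One GCN-style neighbor-sum step (self term omitted for brevity)."""
        out = [[0.0] * len(feats[0]) for _ in range(n)]
        for u, v in edges:
            for k in range(len(feats[0])):
                out[u][k] += feats[v][k]
                out[v][k] += feats[u][k]
        return out

    edges = [(0, 3), (1, 3), (2, 3), (3, 4), (4, 5)]
    new_edges, mapping = reorder_by_degree(6, edges)
    print("hub vertex 3 renumbered to", mapping[3])    # hub gets id 0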
A High-Throughput Neural Network Accelerator
TLDR
A concrete design at 65 nm that can perform 496 16-bit fixed-point operations in parallel every 1.02 ns, that is, 452 GOP/s, in a 3.02 mm², 485 mW footprint (excluding main memory accesses).
...