DianNao: a small-footprint high-throughput accelerator for ubiquitous machine-learning
TLDR
This study designs an accelerator for large-scale CNNs and DNNs, with a special emphasis on the impact of memory on accelerator design, performance, and energy, and shows that it is possible to design an accelerator with high throughput, capable of performing 452 GOP/s in a small footprint.
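As a rough back-of-the-envelope check (the unit counts and clock below are hypothetical illustrations, not DianNao's reported design point), peak throughput in GOP/s follows directly from the number of arithmetic units and the clock rate:

    # Hypothetical figures for illustration only, not DianNao's actual configuration.
    num_multipliers = 256          # parallel multiply units
    num_adders = 256               # parallel add units
    clock_ghz = 0.9                # clock frequency in GHz
    ops_per_cycle = num_multipliers + num_adders
    peak_gops = ops_per_cycle * clock_ghz
    print(f"peak throughput ~= {peak_gops:.0f} GOP/s")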
DaDianNao: A Machine-Learning Supercomputer
TLDR
This article introduces a custom multi-chip machine-learning architecture, showing that, on a subset of the largest known neural network layers, it is possible to achieve a speedup of 450.65x over a GPU, and reduce the energy by 150.31x on average for a 64-chip system.
ShiDianNao: Shifting vision processing closer to the sensor
TLDR
This paper proposes an accelerator that is 60x more energy efficient than the previous state-of-the-art neural network accelerator; designed down to the layout at 65 nm, it has a modest footprint, consumes only 320 mW, and is still about 30x faster than high-end GPUs.
Cambricon-X: An accelerator for sparse neural networks
TLDR
A novel accelerator, Cambricon-X, is proposed to exploit the sparsity and irregularity of NN models for increased efficiency; experimental results show that this accelerator achieves, on average, a 7.23x speedup and 6.43x energy saving against the state-of-the-art NN accelerator.
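A minimal software sketch of the underlying idea of exploiting weight sparsity, indexing only non-zero synapses so multiplications by zero are skipped; the compressed format below is a generic one for illustration, not Cambricon-X's actual indexing hardware:

    import numpy as np

    def compress(weights, eps=1e-8):
        """Keep only non-zero weights plus their positions (generic sparse format)."""
        idx = np.nonzero(np.abs(weights) > eps)[0]
        return weights[idx], idx

    def sparse_dot(values, idx, activations):
        """Multiply-accumulate over non-zero weights only, skipping zeros entirely."""
        return float(np.dot(values, activations[idx]))

    w = np.array([0.0, 0.5, 0.0, 0.0, -1.2, 0.0, 0.3, 0.0])
    a = np.random.rand(8)
    vals, idx = compress(w)
    assert abs(sparse_dot(vals, idx, a) - float(np.dot(w, a))) < 1e-6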
Cambricon: An Instruction Set Architecture for Neural Networks
TLDR
This paper proposes a novel domain-specific Instruction Set Architecture (ISA) for NN accelerators, called Cambricon, which is a load-store architecture that integrates scalar, vector, matrix, logical, data transfer, and control instructions, based on a comprehensive analysis of existing NN techniques.
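A toy sketch of why vector and matrix instructions help: a single high-level instruction expresses work that would otherwise take thousands of scalar instructions. The mnemonics and register model below are invented for illustration and are not the actual Cambricon encoding:

    import numpy as np

    def run(program, regs):
        """Interpret a tiny hypothetical load-store program with matrix/vector ops."""
        for op, *args in program:
            if op == "MMV":          # matrix-vector multiply: dst = M @ v
                dst, m, v = args
                regs[dst] = regs[m] @ regs[v]
            elif op == "VADD":       # element-wise vector add: dst = a + b
                dst, a, b = args
                regs[dst] = regs[a] + regs[b]
        return regs

    regs = {"W": np.random.rand(64, 64), "x": np.random.rand(64), "b": np.random.rand(64)}
    regs = run([("MMV", "y", "W", "x"), ("VADD", "y", "y", "b")], regs)
    # Two instructions express a 64x64 layer; a pure scalar ISA would need ~4096 MACs.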
PuDianNao: A Polyvalent Machine Learning Accelerator
TLDR
An ML accelerator called PuDianNao is presented, which accommodates seven representative ML techniques, including k-means, k-nearest neighbors, naive Bayes, support vector machine, linear regression, classification tree, and deep neural network; it can perform up to 1056 GOP/s while consuming only 596 mW.
Cambricon-S: Addressing Irregularity in Sparse Neural Networks through A Cooperative Software/Hardware Approach
TLDR
A software-based coarse-grained pruning technique, together with local quantization, significantly reduces the size of indexes and improves the network compression ratio; a hardware accelerator is designed to efficiently address the remaining irregularity of sparse synapses and neurons.
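A minimal sketch of the two software ideas mentioned above, coarse-grained (block-wise) pruning and local (per-block) quantization; the block size, keep ratio, and bit width are arbitrary illustrative choices, not the paper's settings:

    import numpy as np

    def coarse_grained_prune(w, block=4, keep_ratio=0.5):
        """Prune whole blocks of weights at once, so one index covers a block,
        shrinking index storage compared with element-wise (fine-grained) pruning."""
        blocks = w.reshape(-1, block)
        norms = np.abs(blocks).sum(axis=1)
        threshold = np.quantile(norms, 1.0 - keep_ratio)
        mask = (norms >= threshold)[:, None]
        return (blocks * mask).reshape(w.shape)

    def local_quantize(w, block=4, bits=4):
        """Quantize each block with its own scale (local quantization)."""
        blocks = w.reshape(-1, block)
        scales = np.abs(blocks).max(axis=1, keepdims=True) + 1e-12
        levels = 2 ** (bits - 1) - 1
        q = np.round(blocks / scales * levels)
        return (q * scales / levels).reshape(w.shape)

    w = np.random.randn(32)
    w_compressed = local_quantize(coarse_grained_prune(w))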
Neuromorphic accelerators: A comparison between neuroscience and machine-learning approaches
TLDR
This study identifies the key sources of inaccuracy of SNN+STDP, which are less related to the loss of information due to spike coding than to the nature of the STDP learning algorithm, and outlines that, for the category of applications that require permanent online learning and moderate accuracy, SNN+STDP hardware accelerators could be a very cost-efficient solution.
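For reference, a minimal sketch of the standard pair-based STDP weight update that such SNN hardware implements; the time constants and amplitudes are generic textbook values, not those evaluated in the study:

    import math

    def stdp_dw(delta_t_ms, a_plus=0.01, a_minus=0.012, tau_plus=20.0, tau_minus=20.0):
        """Pair-based STDP: potentiate when the presynaptic spike precedes the
        postsynaptic spike (delta_t > 0), depress otherwise."""
        if delta_t_ms > 0:
            return a_plus * math.exp(-delta_t_ms / tau_plus)
        return -a_minus * math.exp(delta_t_ms / tau_minus)

    # Pre fires 5 ms before post -> potentiation; 5 ms after -> depression.
    print(stdp_dw(5.0), stdp_dw(-5.0))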
DaDianNao: A Neural Network Supercomputer
TLDR
A custom multi-chip machine-learning architecture is introduced, containing a combination of custom storage and computational units, with electrical and optical inter-chip interconnects considered separately; it is shown that, on a subset of the largest known neural network layers, it is possible to achieve a speedup of 656.63× over a GPU and reduce the energy by 184.05× on average for a 64-chip system.
IMR: High-Performance Low-Cost Multi-Ring NoCs
TLDR
This paper presents a novel type of multi-ring NoC called isolated multi-ring (IMR), which can even support chip multiprocessors (CMPs) with 1,024 cores, and observes from experiments that IMR significantly outperforms its competitors in both saturation throughput and latency across all scenarios considered.
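As a generic point of comparison (textbook topology arithmetic under uniform traffic, not IMR-specific figures), average hop counts show why a naive single ring stops scaling and why multiple rings or meshes are needed at large core counts:

    import math

    def ring_avg_hops(n):
        # Bidirectional ring with shortest-path routing averages ~n/4 hops.
        return n / 4.0

    def mesh_avg_hops(n):
        # A sqrt(n) x sqrt(n) 2D mesh averages ~2*sqrt(n)/3 hops.
        side = math.sqrt(n)
        return 2.0 * side / 3.0

    for n in (64, 256, 1024):
        print(n, round(ring_avg_hops(n), 1), round(mesh_avg_hops(n), 1))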
...
...