DianNao: a small-footprint high-throughput accelerator for ubiquitous machine-learning
This study designs an accelerator for large-scale CNNs and DNNs, with special emphasis on the impact of memory on accelerator design, performance, and energy, and shows that it is possible to build a high-throughput accelerator capable of performing 452 GOP/s in a small footprint.
DaDianNao: A Machine-Learning Supercomputer
- Yunji Chen, Tao Luo, O. Temam
- Computer Science, 47th Annual IEEE/ACM International Symposium on…
- 13 December 2014
This article introduces a custom multi-chip machine-learning architecture and shows that, on a subset of the largest known neural network layers, it is possible to achieve a speedup of 450.65x over a GPU and reduce energy by 150.31x on average for a 64-chip system.
ShiDianNao: Shifting vision processing closer to the sensor
- Zidong Du, Robert Fasthuber, O. Temam
- Computer Science, ACM/IEEE 42nd Annual International Symposium on…
- 13 June 2015
This paper proposes an accelerator that is 60x more energy-efficient than the previous state-of-the-art neural network accelerator; designed down to the layout at 65 nm, it has a modest footprint, consumes only 320 mW, and is still about 30x faster than high-end GPUs.
Cambricon-X: An accelerator for sparse neural networks
- Shijin Zhang, Zidong Du, Yunji Chen
- Computer Science, 49th Annual IEEE/ACM International Symposium on…
- 15 October 2016
This paper proposes a novel accelerator, Cambricon-X, that exploits the sparsity and irregularity of NN models for increased efficiency; experimental results show that the accelerator achieves, on average, a 7.23x speedup and 6.43x energy savings over the state-of-the-art NN accelerator.
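The storage idea behind sparsity-exploiting accelerators of this kind can be illustrated in a few lines: keep only the nonzero weights plus their positions, and iterate over the nonzeros at compute time. This is a hedged sketch of the general idea only; Cambricon-X's actual indexing modules and processing-element organization are hardware-specific, and the function names below are illustrative.

```python
# Toy sketch of sparse weight storage: values + position indexes.
# Illustrative only -- not Cambricon-X's actual indexing scheme.

def compress(weights):
    """Return (values, indexes) for the nonzero entries of a weight row."""
    pairs = [(w, i) for i, w in enumerate(weights) if w != 0.0]
    values = [w for w, _ in pairs]
    indexes = [i for _, i in pairs]
    return values, indexes

def sparse_dot(values, indexes, activations):
    """Dot product that only touches nonzero weights, skipping the zeros."""
    return sum(v * activations[i] for v, i in zip(values, indexes))
```

With 90%+ sparsity, typical of pruned networks, this representation stores and multiplies roughly a tenth of the original entries, which is the efficiency source the abstract refers to.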
Cambricon: An Instruction Set Architecture for Neural Networks
- Shaoli Liu, Zidong Du, Tianshi Chen
- Computer Science, ACM/IEEE 43rd Annual International Symposium on…
- 18 June 2016
Based on a comprehensive analysis of existing NN techniques, this paper proposes a novel domain-specific Instruction Set Architecture (ISA) for NN accelerators, called Cambricon: a load-store architecture that integrates scalar, vector, matrix, logical, data transfer, and control instructions.
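As a rough software analogy, a load-store architecture confines memory access to explicit data-transfer instructions, while compute instructions operate only on registers. The sketch below illustrates that split with a toy interpreter; the opcodes, register names, and memory model are all illustrative assumptions, not the actual Cambricon encoding.

```python
# Toy load-store machine: only LOAD/STORE touch memory; vector and
# scalar compute instructions read and write registers. Illustrative
# only -- not the real Cambricon ISA.

def run(program, memory):
    regs = {}  # vector registers, each holding a Python list
    for op, *args in program:
        if op == "LOAD":            # data transfer: memory -> register
            dst, addr, n = args
            regs[dst] = memory[addr:addr + n]
        elif op == "STORE":         # data transfer: register -> memory
            src, addr = args
            memory[addr:addr + len(regs[src])] = regs[src]
        elif op == "VADD":          # vector op: elementwise add
            dst, a, b = args
            regs[dst] = [x + y for x, y in zip(regs[a], regs[b])]
        elif op == "SMUL":          # scalar op: scale a vector register
            dst, a, k = args
            regs[dst] = [x * k for x in regs[a]]
    return memory
```

Keeping memory access in dedicated instructions is what lets the compute instructions stay simple and regular, which is one motivation the abstract gives for a load-store design.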
PuDianNao: A Polyvalent Machine Learning Accelerator
An ML accelerator called PuDianNao is presented, which accommodates seven representative ML techniques (k-means, k-nearest neighbors, naive Bayes, support vector machine, linear regression, classification tree, and deep neural network), performs up to 1056 GOP/s, and consumes only 596 mW.
Cambricon-S: Addressing Irregularity in Sparse Neural Networks through A Cooperative Software/Hardware Approach
- Xuda Zhou, Zidong Du, Yunji Chen
- Computer Science, 51st Annual IEEE/ACM International Symposium on…
- 1 October 2018
A software-based coarse-grained pruning technique, combined with local quantization, significantly reduces index size and improves the network compression ratio; a hardware accelerator is then designed to efficiently handle the remaining irregularity of sparse synapses and neurons.
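The index-size reduction comes from pruning weights in whole blocks rather than one at a time, so a single index marks each surviving block instead of one index per nonzero weight. Below is a hedged toy sketch of that idea; the block size, the mean-magnitude criterion, and the threshold are illustrative choices, not the paper's actual values.

```python
# Toy coarse-grained pruning: zero out entire weight blocks with small
# average magnitude, keeping one index per surviving block.
# Illustrative parameters only -- not the paper's configuration.

def coarse_grained_prune(weights, block=4, threshold=0.5):
    """Zero each length-`block` chunk whose mean |w| is below `threshold`.

    Returns (pruned_weights, kept_block_indexes); the index list has one
    entry per kept block rather than one per nonzero weight.
    """
    pruned, kept = [], []
    for i in range(0, len(weights), block):
        chunk = weights[i:i + block]
        if sum(abs(w) for w in chunk) / len(chunk) >= threshold:
            pruned.extend(chunk)
            kept.append(i // block)      # one index covers the whole block
        else:
            pruned.extend([0.0] * len(chunk))
    return pruned, kept
```

Compared with fine-grained pruning, this shrinks index storage by roughly the block size and keeps the surviving weights in dense, regularly shaped chunks, which is the irregularity reduction the abstract describes.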
Neuromorphic accelerators: A comparison between neuroscience and machine-learning approaches
- Zidong Du, D. B. Rubin, O. Temam
- Computer Science, 48th Annual IEEE/ACM International Symposium on…
- 5 December 2015
This study identifies the key sources of inaccuracy in SNN+STDP, which relate less to the loss of information from spike coding than to the nature of the STDP learning algorithm, and concludes that for applications requiring permanent online learning and moderate accuracy, SNN+STDP hardware accelerators could be a very cost-efficient solution.
DaDianNao: A Neural Network Supercomputer
This article introduces a custom multi-chip machine-learning architecture combining custom storage and computational units with separate electrical and optical inter-chip interconnects, and shows that, on a subset of the largest known neural network layers, it is possible to achieve a speedup of 656.63x over a GPU and reduce energy by 184.05x on average for a 64-chip system.
IMR: High-Performance Low-Cost Multi-Ring NoCs
- Shaoli Liu, Tianshi Chen, Yunji Chen
- Computer Science, IEEE Transactions on Parallel and Distributed…
- 1 June 2016
This paper presents a novel type of multi-ring NoC called isolated multi-ring (IMR), which can support chip multiprocessors (CMPs) with as many as 1,024 cores; experiments show that IMR significantly outperforms its competitors in both saturation throughput and latency across all scenarios considered.