Publications
Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding
TLDR
We introduce "deep compression", a three-stage pipeline of pruning, trained quantization, and Huffman coding that together reduce the storage requirement of neural networks by 35x to 49x without affecting their accuracy.
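For intuition, here is a minimal Python sketch of the three stages applied to a single weight matrix. The pruning threshold, codebook size, and binning scheme are assumed purely for illustration; the paper retrains the network at each stage and learns the shared weights with k-means rather than the uniform bins used here.

```python
# Illustrative sketch of the three deep-compression stages on one weight matrix
# (thresholds and cluster counts are assumed, not the paper's implementation).
import numpy as np
from collections import Counter
from heapq import heappush, heappop, heapify

rng = np.random.default_rng(0)
W = rng.normal(size=(64, 64)).astype(np.float32)

# 1) Pruning: zero out small-magnitude weights.
mask = np.abs(W) > 0.5                      # assumed magnitude threshold
W_pruned = W * mask

# 2) Trained quantization / weight sharing: map surviving weights to a small
#    codebook (the paper uses k-means plus retraining; simple bins shown here).
nonzero = W_pruned[mask]
codebook = np.linspace(nonzero.min(), nonzero.max(), 16)   # 16 shared weights
indices = np.abs(nonzero[:, None] - codebook[None, :]).argmin(axis=1)

# 3) Huffman coding of the codebook indices: frequent indices get short codes.
freq = Counter(indices.tolist())
heap = [[n, [sym, ""]] for sym, n in freq.items()]
heapify(heap)
while len(heap) > 1:
    lo, hi = heappop(heap), heappop(heap)
    for pair in lo[1:]:
        pair[1] = "0" + pair[1]
    for pair in hi[1:]:
        pair[1] = "1" + pair[1]
    heappush(heap, [lo[0] + hi[0]] + lo[1:] + hi[1:])
codes = {sym: code for sym, code in heappop(heap)[1:]}
coded_bits = sum(len(codes[i]) for i in indices.tolist())
print(f"dense: {W.size * 32} bits, Huffman-coded indices: {coded_bits} bits")
```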
SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <1MB model size
TLDR
We propose SqueezeNet, a CNN architecture that has 50× fewer parameters than AlexNet and maintains AlexNet-level accuracy on ImageNet.
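The building block behind this parameter reduction is the Fire module, which first squeezes the channel count with 1×1 convolutions and then expands with a mix of 1×1 and 3×3 filters. A minimal PyTorch sketch, with channel counts chosen for illustration rather than reproducing the paper's exact configuration:

```python
# Minimal sketch of a SqueezeNet-style "Fire" module (illustrative sizes).
import torch
import torch.nn as nn

class Fire(nn.Module):
    def __init__(self, in_ch, squeeze_ch, expand1x1_ch, expand3x3_ch):
        super().__init__()
        # Squeeze: 1x1 convs reduce the channels feeding the expand layer.
        self.squeeze = nn.Conv2d(in_ch, squeeze_ch, kernel_size=1)
        # Expand: cheap 1x1 and 3x3 filters, concatenated along channels.
        self.expand1x1 = nn.Conv2d(squeeze_ch, expand1x1_ch, kernel_size=1)
        self.expand3x3 = nn.Conv2d(squeeze_ch, expand3x3_ch, kernel_size=3, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.relu(self.squeeze(x))
        return torch.cat(
            [self.relu(self.expand1x1(x)), self.relu(self.expand3x3(x))], dim=1
        )

# Example: 96 input channels squeezed to 16, then expanded to 64 + 64.
y = Fire(96, 16, 64, 64)(torch.randn(1, 96, 55, 55))
print(y.shape)  # torch.Size([1, 128, 55, 55])
```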
Principles and Practices of Interconnection Networks
TLDR
This book provides a detailed and comprehensive presentation of the basic principles of interconnection network design, clearly illustrating them with numerous examples, chapter exercises, and case studies.
Learning both Weights and Connections for Efficient Neural Network
TLDR
We present a method that reduces the storage and computation required by neural networks by an order of magnitude, without affecting their accuracy, by learning only the important connections.
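A rough Python sketch of the prune-then-retrain idea follows. The target sparsity and gradient-masking scheme are assumed for illustration; the paper iterates training, pruning, and retraining rather than pruning once.

```python
# Illustrative magnitude pruning: remove low-magnitude connections, then
# retrain the survivors while keeping pruned positions at zero.
import numpy as np

def prune_by_magnitude(weights, sparsity=0.9):
    """Zero out the smallest-magnitude fraction of weights; return weights and mask."""
    threshold = np.quantile(np.abs(weights).ravel(), sparsity)
    mask = np.abs(weights) > threshold
    return weights * mask, mask

W = np.random.default_rng(1).normal(size=(256, 256))
W_sparse, mask = prune_by_magnitude(W, sparsity=0.9)

# During retraining, gradients at pruned positions are masked so removed
# connections stay removed:
#   W_sparse -= learning_rate * (grad * mask)
print(f"remaining connections: {mask.sum()} of {mask.size}")
```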
EIE: Efficient Inference Engine on Compressed Deep Neural Network
TLDR
We propose an energy-efficient inference engine (EIE) that performs inference on a compressed network model and accelerates the resulting sparse matrix-vector multiplication with weight sharing.
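The core computation EIE accelerates looks roughly like the Python sketch below: a sparse matrix-vector product over a compressed-column weight matrix whose nonzeros come from a small codebook of shared weights, skipping zero activations entirely. The data layout and sizes here are illustrative, not the hardware's actual storage format.

```python
# Sketch of sparse matrix-vector multiply with weight sharing (illustrative).
import numpy as np
from scipy.sparse import csc_matrix

rng = np.random.default_rng(2)
codebook = np.array([-0.5, -0.1, 0.1, 0.5], dtype=np.float32)   # shared weights

# Random sparse weight matrix whose nonzeros are codebook entries.
dense = rng.choice(codebook, size=(128, 128)) * (rng.random((128, 128)) < 0.1)
W = csc_matrix(dense)

x = np.maximum(rng.normal(size=128), 0).astype(np.float32)       # ReLU output
y = np.zeros(128, dtype=np.float32)

# Column-wise accumulation: only columns with a nonzero activation are visited,
# and within each column only stored (nonzero) weights contribute.
for j in np.flatnonzero(x):
    start, end = W.indptr[j], W.indptr[j + 1]
    y[W.indices[start:end]] += W.data[start:end] * x[j]

assert np.allclose(y, dense @ x, atol=1e-4)
```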
Deadlock-Free Message Routing in Multiprocessor Interconnection Networks
TLDR
A deadlock-free routing algorithm can be generated for arbitrary interconnection networks using the concept of virtual channels.
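A toy Python example of the virtual-channel idea on a unidirectional ring: packets switch to a second virtual channel after crossing a fixed "dateline" node, which breaks the cyclic channel dependency that could otherwise deadlock. This shows only the classic ring special case, not the paper's general construction for arbitrary networks.

```python
# Dateline virtual-channel assignment on an 8-node unidirectional ring.
N = 8  # number of routers

def route(src, dst):
    """Return the list of (next_node, virtual_channel) hops from src to dst."""
    hops, node, vc = [], src, 0
    while node != dst:
        nxt = (node + 1) % N
        if nxt == 0:          # crossing the dateline forces the higher VC
            vc = 1
        hops.append((nxt, vc))
        node = nxt
    return hops

print(route(6, 3))
# [(7, 0), (0, 1), (1, 1), (2, 1), (3, 1)]
```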
Route packets, not wires: on-chip interconnection networks
Using on-chip interconnection networks in place of ad-hoc global wiring structures the top-level wires on a chip and facilitates modular design. With this approach, system modules (processors, …)
Memory access scheduling
TLDR
We introduce memory access scheduling, a technique that improves the performance of a memory system by reordering memory references to exploit locality within the 3-D memory structure.
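A small Python sketch of row-hit-first reordering, one simple instance of such a scheduling policy; the request format and open-row bookkeeping are assumed for illustration, not the paper's exact scheduler.

```python
# Reorder pending DRAM requests so that accesses hitting an open row go first.
from collections import deque

pending = deque([  # (bank, row, column) of outstanding references
    (0, 5, 3), (1, 2, 0), (0, 5, 7), (0, 9, 1), (0, 5, 8),
])
open_row = {0: 5, 1: 7}   # currently open row per bank

schedule = []
while pending:
    # Prefer the oldest request that hits a bank's open row; otherwise
    # fall back to the oldest request (which will open a new row).
    hit = next((r for r in pending if open_row.get(r[0]) == r[1]), None)
    req = hit if hit is not None else pending[0]
    pending.remove(req)
    open_row[req[0]] = req[1]
    schedule.append(req)

print(schedule)
# [(0, 5, 3), (0, 5, 7), (0, 5, 8), (1, 2, 0), (0, 9, 1)]
```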
SqueezeNet: AlexNet-level accuracy with 50× fewer parameters and <0.5 MB model size
Recent research on deep convolutional neural networks (CNNs) has focused primarily on improving accuracy. For a given accuracy level, it is typically possible to identify multiple CNN architectures that achieve that accuracy level.
SCNN: An accelerator for compressed-sparse convolutional neural networks
TLDR
This paper introduces the Sparse CNN (SCNN) accelerator architecture, which improves performance and energy efficiency by exploiting the zero-valued weights that stem from network pruning during training and the zero-valued activations that arise from the common ReLU operator.
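A toy Python illustration of the underlying idea: iterate only over nonzero weights and nonzero activations and scatter their products to the output positions they affect. A 1-D convolution is used here for brevity; the actual SCNN dataflow is considerably more elaborate.

```python
# Multiply only nonzero weights by nonzero activations; zeros never enter
# the datapath (1-D convolution for illustration).
import numpy as np

activations = np.array([0, 1.5, 0, 0, 2.0, 0, 0, 0.5])   # post-ReLU, mostly zero
weights     = np.array([0.3, 0, 0, -0.2, 0])              # pruned, mostly zero

out = np.zeros(len(activations) + len(weights) - 1)
nz_a = np.flatnonzero(activations)
nz_w = np.flatnonzero(weights)

# All-to-all products of the two nonzero lists, scattered to output positions.
for i in nz_a:
    for j in nz_w:
        out[i + j] += activations[i] * weights[j]

assert np.allclose(out, np.convolve(activations, weights))
```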