Deep Compression: Compressing Deep Neural Network with Pruning, Trained Quantization and Huffman Coding
This work introduces "deep compression", a three-stage pipeline of pruning, trained quantization, and Huffman coding that together reduce the storage requirements of neural networks by 35x to 49x without affecting their accuracy.
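The three stages can be sketched end-to-end on a toy weight matrix. The threshold, 4-entry codebook, and plain Lloyd clustering below are illustrative stand-ins for the paper's trained, retraining-in-the-loop pipeline:

```python
import heapq
from collections import Counter

import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 8)).astype(np.float32)

# Stage 1: magnitude pruning -- zero out weights below a threshold.
threshold = 0.5
W_pruned = np.where(np.abs(W) > threshold, W, 0.0)

# Stage 2: cluster the surviving weights into a small shared codebook
# (plain Lloyd iterations standing in for the paper's trained quantization).
nonzero = W_pruned[W_pruned != 0]
n_clusters = 4
centroids = np.linspace(nonzero.min(), nonzero.max(), n_clusters)
for _ in range(10):
    assign = np.abs(nonzero[:, None] - centroids[None, :]).argmin(axis=1)
    for k in range(n_clusters):
        if (assign == k).any():
            centroids[k] = nonzero[assign == k].mean()

# Stage 3: Huffman-code the cluster indices so frequent codes get short bits.
freqs = Counter(assign.tolist())
heap = [[f, [sym, ""]] for sym, f in freqs.items()]
heapq.heapify(heap)
while len(heap) > 1:
    lo, hi = heapq.heappop(heap), heapq.heappop(heap)
    for pair in lo[1:]:
        pair[1] = "0" + pair[1]
    for pair in hi[1:]:
        pair[1] = "1" + pair[1]
    heapq.heappush(heap, [lo[0] + hi[0]] + lo[1:] + hi[1:])
codes = {sym: code for sym, code in heap[0][1:]}
```

After all three stages the matrix is represented by a sparsity mask, a tiny codebook, and variable-length cluster indices instead of dense 32-bit floats.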
SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <1MB model size
- Forrest N. Iandola, M. Moskewicz, Khalid Ashraf, Song Han, W. Dally, K. Keutzer
- Computer Science, ArXiv
- 24 February 2016
This work proposes SqueezeNet, a small DNN architecture that achieves AlexNet-level accuracy on ImageNet with 50x fewer parameters and can be compressed to less than 0.5MB (510x smaller than AlexNet).
Learning both Weights and Connections for Efficient Neural Network
A method that reduces the storage and computation required by neural networks by an order of magnitude without affecting their accuracy, learning only the important connections and pruning the redundant ones with a three-step process.
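The three-step method (train, prune, retrain) can be illustrated on a toy linear model; the model, data, and 50% pruning ratio are assumptions for this sketch, not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 10))
true_w = np.zeros(10)
true_w[:3] = [2.0, -1.5, 0.5]                  # only 3 connections matter
y = X @ true_w + 0.01 * rng.normal(size=200)

def train(w, mask, steps=500, lr=0.05):
    """Gradient descent on squared error; masked weights stay at zero."""
    for _ in range(steps):
        grad = X.T @ (X @ (w * mask) - y) / len(y)
        w = w - lr * grad * mask
    return w * mask

# Step 1: train densely to learn which connections are important.
w = train(np.zeros(10), np.ones(10))
# Step 2: prune the 50% of connections with the smallest magnitude.
mask = (np.abs(w) >= np.sort(np.abs(w))[5]).astype(float)
# Step 3: retrain the surviving connections to recover accuracy.
w = train(w, mask)
```

Because the dense training run reveals which weights carry signal, the magnitude-based prune keeps the three true connections and retraining recovers the original fit.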
Principles and Practices of Interconnection Networks
This book offers a detailed and comprehensive presentation of the basic principles of interconnection network design, clearly illustrating them with numerous examples, chapter exercises, and case studies, allowing a designer to see all the steps of the process from abstract design to concrete implementation.
EIE: Efficient Inference Engine on Compressed Deep Neural Network
- Song Han, Xingyu Liu, W. Dally
- Computer Science, ACM/IEEE 43rd Annual International Symposium on…
- 4 February 2016
An energy-efficient inference engine (EIE) that performs inference directly on the compressed network model, accelerating the resulting sparse matrix-vector multiplication with weight sharing; it is 189x and 13x faster than CPU and GPU implementations, respectively, of the same DNN without compression.
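The core computation can be sketched as a sparse matrix-vector multiply whose stored values are indices into a shared codebook; the CSR layout and 4-entry codebook below are illustrative, not EIE's exact encoding:

```python
import numpy as np

codebook = np.array([-1.0, -0.25, 0.25, 1.0])   # shared weight values

# CSR-encoded 3x4 matrix: nonzeros store codebook indices, not floats
indptr  = np.array([0, 2, 3, 5])     # row start offsets into indices/codes
indices = np.array([0, 3, 1, 0, 2])  # column of each nonzero
codes   = np.array([3, 0, 2, 1, 3])  # codebook index of each nonzero

def spmv_shared(x):
    """Sparse matrix-vector multiply with weight sharing via the codebook."""
    y = np.zeros(len(indptr) - 1)
    for row in range(len(y)):
        for p in range(indptr[row], indptr[row + 1]):
            y[row] += codebook[codes[p]] * x[indices[p]]
    return y

x = np.array([1.0, 2.0, 3.0, 4.0])
```

This computes the same product as the equivalent dense matrix while touching only the five nonzeros and a 4-entry table, which is the memory-traffic saving the accelerator exploits.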
Deadlock-Free Message Routing in Multiprocessor Interconnection Networks
A deadlock-free routing algorithm can be generated for arbitrary interconnection networks using the concept of virtual channels, which is used to develop deadlock-free routing algorithms for k-ary n-cubes, cube-connected cycles, and shuffle-exchange networks.
Route packets, not wires: on-chip interconnection networks
The concept of on-chip networks is introduced, a simple network is sketched, and some challenges in the architecture and design of these networks are discussed.
Memory access scheduling
- S. Rixner, W. Dally, U. Kapasi, P. Mattson, John Douglas Owens
- Computer Science, Proceedings of 27th International Symposium on…
- 1 May 2000
This paper introduces memory access scheduling, a technique that improves the performance of a memory system by reordering memory references to exploit locality within the 3-D memory structure.
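A toy greedy scheduler conveys the idea: among pending DRAM requests, serve row-buffer hits before opening a new row. The one-cycle-hit / three-cycle-miss timings and single-queue policy are illustrative assumptions, far simpler than the paper's scheduler:

```python
def schedule(requests):
    """requests: list of (bank, row); returns (service order, total cycles)."""
    pending = list(requests)
    open_row = {}                 # bank -> currently open row
    order, cycles = [], 0
    while pending:
        # prefer any request that hits the row already open in its bank
        hit = next((r for r in pending if open_row.get(r[0]) == r[1]), None)
        req = hit if hit is not None else pending[0]   # else oldest request
        pending.remove(req)
        cycles += 1 if hit is not None else 3          # hit: 1 cycle, miss: 3
        open_row[req[0]] = req[1]
        order.append(req)
    return order, cycles

# alternating rows in one bank: FIFO order would pay 4 misses (12 cycles)
reqs = [(0, 5), (0, 9), (0, 5), (0, 9)]
```

Reordering groups the two row-5 and the two row-9 accesses, paying two misses and two hits (8 cycles) where FIFO order pays four misses.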
SCNN: An accelerator for compressed-sparse convolutional neural networks
- A. Parashar, Minsoo Rhu, W. Dally
- Computer Science, ACM/IEEE 44th Annual International Symposium on…
- 23 May 2017
The Sparse CNN (SCNN) accelerator architecture is introduced, which improves performance and energy efficiency by exploiting the zero-valued weights that stem from network pruning during training and the zero-valued activations that arise from the common ReLU operator.
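The key idea, multiplying only nonzero weights by nonzero activations and scattering each product to its output coordinate, can be sketched on a 1-D convolution (sizes and values here are illustrative):

```python
import numpy as np

acts    = np.array([0, 2.0, 0, 0, 3.0, 0])   # post-ReLU activations (sparse)
weights = np.array([0.5, 0, -1.0])           # pruned filter (sparse)

# compressed representation: (value, position) pairs for the nonzeros only
nz_a = [(v, i) for i, v in enumerate(acts) if v != 0]
nz_w = [(v, j) for j, v in enumerate(weights) if v != 0]

out = np.zeros(len(acts) - len(weights) + 1)
mults = 0
for a, i in nz_a:                 # Cartesian product of the nonzeros
    for w, j in nz_w:
        pos = i - j               # output coordinate for a "valid" convolution
        if 0 <= pos < len(out):
            out[pos] += a * w
            mults += 1
```

Here only two products land in the output range, versus twelve multiply-accumulates for the dense loop; SCNN exploits exactly this product-of-nonzeros structure in hardware.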
Technology-Driven, Highly-Scalable Dragonfly Topology
- John Kim, W. Dally, Steve Scott, Dennis Abts
- Computer Science, International Symposium on Computer Architecture
- 1 June 2008
The dragonfly topology is introduced, which uses a group of high-radix routers as a virtual router to increase the effective radix of the network; selective virtual-channel discrimination and the use of credit round-trip latency to both sense and signal channel congestion give throughput and latency approaching those of an ideal adaptive routing algorithm.
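The scaling argument behind the virtual-router idea can be written as back-of-envelope arithmetic using the usual dragonfly parameters p (terminals per router), a (routers per group), and h (global channels per router); the chosen values below are illustrative:

```python
def dragonfly_scale(p, a, h):
    """p terminals/router, a routers/group, h global channels/router."""
    groups = a * h + 1             # at most one global channel to each other group
    terminals = groups * a * p     # maximum network size
    virtual_radix = a * (p + h)    # the group seen as one high-radix router
    return groups, terminals, virtual_radix

# e.g. groups of eight radix-15 routers (p=4, a=8, h=4)
# act as one radix-64 virtual router and reach 1056 terminals
print(dragonfly_scale(4, 8, 4))    # -> (33, 1056, 64)
```

Grouping thus multiplies the effective radix by roughly a, which is what lets the dragonfly reach large networks with very low hop counts.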