EIE: Efficient Inference Engine on Compressed Deep Neural Network
- Song Han, Xingyu Liu, W. Dally
- Computer Science · ACM/IEEE 43rd Annual International Symposium on…
- 4 February 2016
An energy-efficient inference engine (EIE) is proposed that performs inference on this compressed network model and accelerates the resulting sparse matrix-vector multiplication with weight sharing; it is 189x and 13x faster than CPU and GPU implementations of the same DNN without compression.
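The core operation EIE accelerates can be illustrated in software. Below is a minimal sketch (not the EIE hardware design itself, and all names are hypothetical) of a sparse matrix-vector multiply in CSR form where each stored nonzero is a small index into a shared codebook, in the style of deep-compression weight sharing:

```python
import numpy as np

def sparse_matvec_weight_shared(indptr, col_idx, code_idx, codebook, x):
    """Compute y = W @ x for a CSR matrix whose nonzero values are
    codebook[code_idx[k]] rather than full-precision weights."""
    n_rows = len(indptr) - 1
    y = np.zeros(n_rows)
    for row in range(n_rows):
        # Walk this row's nonzeros; each stores a column and a codebook index.
        for k in range(indptr[row], indptr[row + 1]):
            y[row] += codebook[code_idx[k]] * x[col_idx[k]]
    return y

# Tiny example: a 2x3 matrix with three nonzeros drawn from a 2-entry codebook.
indptr = [0, 2, 3]                  # row k's nonzeros are [indptr[k], indptr[k+1])
col_idx = [0, 2, 1]                 # column of each nonzero
code_idx = [1, 0, 1]                # codebook index of each nonzero
codebook = np.array([0.5, 2.0])     # shared weight values
x = np.array([1.0, 2.0, 3.0])
y = sparse_matvec_weight_shared(indptr, col_idx, code_idx, codebook, x)
```

Storing 4-bit codebook indices instead of full weights is what lets the compressed model fit in on-chip SRAM, which is where EIE's energy savings come from.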
Light Field Photography with a Hand-held Plenoptic Camera
This paper presents a camera that samples the 4D light field on its sensor in a single photographic exposure. This is achieved by inserting a microlens array between the sensor and main lens,…
Forwarding metamorphosis: fast programmable match-action processing in hardware for SDN
The RMT (reconfigurable match tables) model is proposed, a new RISC-inspired pipelined architecture for switching chips, and the essential minimal set of action primitives needed to specify how headers are processed in hardware is identified.
The future of wires
Wires that shorten in length as technologies scale have delays that either track gate delays or grow slowly relative to gate delays, which is good news since these "local" wires dominate chip wiring.
The Stanford Dash multiprocessor
The overall goals and major features of the directory architecture for shared memory (Dash), a distributed directory-based protocol that provides cache coherence without compromising scalability, are presented.
Architectural support for copy and tamper resistant software
The hardware implementation of a form of execute-only memory (XOM) that allows instructions stored in memory to be executed but not otherwise manipulated is studied, indicating that it is possible to create a normal multi-tasking machine where nearly all applications can be run in XOM mode.
TETRIS: Scalable and Efficient Neural Network Acceleration with 3D Memory
The hardware architecture and software scheduling and partitioning techniques for TETRIS, a scalable NN accelerator using 3D memory, are presented and it is shown that despite the use of small SRAM buffers, the presence of 3D memory simplifies dataflow scheduling for NN computations.
Clustered voltage scaling technique for low-power design
Low-power digital design
- M. Horowitz, T. Indermaur, R. Gonzalez
- Engineering · Proceedings of IEEE Symposium on Low Power…
- 10 October 1994
Recently there has been a surge of interest in low-power devices and design techniques. While many papers have been published describing power-saving techniques for use in digital systems, trade-offs…
Understanding sources of inefficiency in general-purpose chips
The sources of these performance and energy overheads in general-purpose processing systems are explored by quantifying the overheads of a 720p HD H.264 encoder running on a general-purpose CMP system and exploring methods to eliminate these overheads by transforming the CPU into a specialized system for H.264 encoding.