Virtual-channel flow control
TLDR
Network throughput can be increased by dividing the buffer storage associated with each network channel into several virtual channels [DalSei].
  • 708
  • 83
ESE: Efficient Speech Recognition Engine with Sparse LSTM on FPGA
TLDR
We propose a load-balance-aware pruning method that can compress the LSTM model size by 20x (10x from pruning and 2x from quantization) with negligible loss of prediction accuracy.
  • 333
  • 63
A VLSI Architecture for Concurrent Data Structures
TLDR
Concurrent data structures simplify the development of concurrent programs by encapsulating commonly used mechanisms for synchronization and communication into data structures.
  • 229
  • 9
A programming system for the Imagine media processor
TLDR
This thesis describes a programming system that enables efficient application development for Imagine and other architectures that include these innovations.
  • 101
  • 9
Network and processor architecture for message-driven computers
  • 101
  • 9
A Deep Neural Network Compression Pipeline: Pruning, Quantization, Huffman Encoding
TLDR
We introduce a three-stage pipeline (pruning, quantization, and Huffman encoding) whose stages work together to reduce the storage requirement of neural networks by 35× to 49×.
  • 48
  • 9
Flow control and micro-architectural mechanisms for extending the performance of interconnection networks
TLDR
In recent years, interconnection network fabrics, historically used in high-end multiprocessor systems, have been deployed in a wide spectrum of communication systems: I/O interconnects, high-speed network switches, terabit Internet routers, and on-chip interconnects.
  • 58
  • 7
The message-driven processor
  • 50
  • 6
Processor coupling: integrating compile time and runtime scheduling for parallelism
TLDR
We present processor coupling, a mechanism for controlling multiple high-performance floating-point ALUs to exploit both instruction-level and inter-thread parallelism by using compile-time and runtime scheduling.
  • 117
  • 5
The J-machine multicomputer: an architectural evaluation
TLDR
The MIT J-Machine multicomputer has been constructed to study the role of a set of primitive mechanisms in providing efficient support for parallel computing.
  • 183
  • 4