• Publications
Stripes: Bit-serial deep neural network computing
Motivated by the variance in the numerical precision requirements of Deep Neural Networks (DNNs) [1], [2], Stripes (STR), a hardware accelerator, is presented whose execution time scales almost…
  • 190
  • 30
  • Open Access
Cnvlutin: Ineffectual-Neuron-Free Deep Neural Network Computing
This work observes that a large fraction of the computations performed by Deep Neural Networks (DNNs) are intrinsically ineffectual as they involve a multiplication where one of the inputs is zero.
  • 229
  • 28
  • Open Access
Dependence based prefetching for linked data structures
We introduce a dynamic scheme that captures the access patterns of linked data structures and can be used to predict future accesses with high accuracy. Our technique exploits the dependence…
  • 323
  • 20
  • Open Access
Low-leakage asymmetric-cell SRAM
We introduce a novel family of asymmetric dual-Vt SRAM cell designs that reduce leakage power in caches while maintaining low access latency. Our designs exploit the strong bias towards zero at the…
  • 159
  • 20
  • Open Access
Streamlining inter-operation memory communication via data dependence prediction
  • A. Moshovos, G. Sohi
  • Computer Science
  • Proceedings of 30th Annual International…
  • 1 December 1997
We revisit memory hierarchy design viewing memory as an inter-operation communication agent. This perspective leads to the development of novel methods of performing inter-operation memory…
  • 153
  • 16
  • Open Access
JETTY: filtering snoops for reduced energy consumption in SMP servers
We propose methods for reducing the energy consumed by snoop requests in snoopy bus-based symmetric multiprocessor (SMP) systems. Observing that a large fraction of snoops do not find copies in many…
  • 196
  • 12
  • Open Access
A Tagless Coherence Directory
A key challenge in architecting a CMP with many cores is maintaining cache coherence in an efficient manner. Directory-based protocols avoid the bandwidth overhead of snoop-based protocols, and…
  • 137
  • 11
  • Open Access
Multi-grain coherence directories
Conventional directory coherence operates at the finest granularity possible, that of a cache block. While simple, this organization fails to exploit frequent application behavior: at any given point…
  • 46
  • 11
  • Open Access