This work observes that a large fraction of the computations performed by Deep Neural Networks (DNNs) are intrinsically ineffectual as they involve a multiplication where one of the inputs is zero. This observation motivates Cnvlutin (CNV), a value-based approach to hardware acceleration that eliminates most of these ineffectual operations, …
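The core idea can be illustrated in a few lines. The sketch below is not the CNV hardware design; it is a hypothetical software analogue showing how skipping zero-valued operands removes multiplications without changing the result. The function names, sizes, and ReLU-style input are assumptions made for illustration.

```python
# Minimal sketch (assumption, not the paper's design): MAC terms where the
# activation is zero contribute nothing, so a value-based scheme can elide them.
import numpy as np

def dense_macs(acts, weights):
    """Baseline: performs every multiplication, zero or not."""
    total, out = 0, 0.0
    for a, w in zip(acts, weights):
        out += a * w
        total += 1
    return out, total

def zero_skipping_macs(acts, weights):
    """Value-based: only issues MACs for nonzero activations."""
    issued, out = 0, 0.0
    for a, w in zip(acts, weights):
        if a != 0.0:              # ineffectual term elided
            out += a * w
            issued += 1
    return out, issued

rng = np.random.default_rng(0)
# ReLU-style activations: roughly half the values are exactly zero.
acts = np.maximum(rng.standard_normal(1024), 0.0)
weights = rng.standard_normal(1024)

ref, dense = dense_macs(acts, weights)
val, issued = zero_skipping_macs(acts, weights)
assert np.isclose(ref, val)       # same result, fewer multiplications
print(f"skipped {dense - issued} of {dense} multiplications")
```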
This work investigates how using reduced precision data in Convolutional Neural Networks (CNNs) affects network accuracy during classification. Unlike previous work, this study considers networks where each layer may use different precision data. Our key result is the observation that the tolerance of CNNs to reduced precision data not only varies across …
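As a rough illustration of per-layer precision, the sketch below quantizes each layer's weights to its own bit width and reports the resulting error. The bit widths, layer shapes, and the quantize helper are invented for this example and are not the study's methodology.

```python
# Hedged sketch: each layer gets its own bit width; tolerance varies per layer.
import numpy as np

def quantize(x, bits):
    """Uniform symmetric quantization of x to the given bit width."""
    scale = np.abs(x).max() / (2 ** (bits - 1) - 1)
    return np.round(x / scale) * scale

rng = np.random.default_rng(1)
layers = [rng.standard_normal((64, 64)) for _ in range(4)]
bits_per_layer = [8, 6, 5, 8]     # illustrative: tolerance differs by layer

for i, (w, b) in enumerate(zip(layers, bits_per_layer)):
    err = np.abs(w - quantize(w, b)).max()
    print(f"layer {i}: {b} bits, max quantization error {err:.4f}")
```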
Motivated by the variance in the numerical precision requirements of Deep Neural Networks (DNNs) [1], [2], we present Stripes (STR), a hardware accelerator whose execution time scales almost proportionally with the length of the numerical representation used. STR relies on bit-serial compute units and on the parallelism that is naturally present within …
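A toy model of the bit-serial principle, not the STR datapath: feeding one activation bit per cycle makes a multiply cost exactly p cycles, so execution time tracks the precision p. The operand values below are arbitrary.

```python
# Minimal model: cycles per multiply equal the activation's bit width p.
def bit_serial_mul(a, w, p):
    """Multiply unsigned a (p bits) by w, one activation bit per cycle."""
    acc, cycles = 0, 0
    for i in range(p):            # one cycle per bit of precision
        bit = (a >> i) & 1
        acc += (w << i) if bit else 0
        cycles += 1
    return acc, cycles

for p in (16, 11, 8):             # fewer bits -> proportionally fewer cycles
    val, cyc = bit_serial_mul(a=173 & ((1 << p) - 1), w=59, p=p)
    print(f"{p}-bit activation: {cyc} cycles, product {val}")
```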
This work exploits the tolerance of Deep Neural Networks (DNNs) to reduced precision numerical representations and specifically, their recently demonstrated ability to tolerate representations of different precision per layer while maintaining accuracy. This flexibility enables improvements over conventional DNN implementations that use a single, uniform …
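One way to see the payoff, under the assumption of a bit-serial design whose work is proportional to bit width: total work becomes the sum of bits times MACs per layer instead of a uniform 16 bits everywhere. The layer names, bit widths, and MAC counts below are made up for illustration.

```python
# Back-of-the-envelope sketch: per-layer precision vs. one uniform format.
# (layer name, tolerable bit width, MAC count) -- all values illustrative.
layers = [("conv1", 10, 1.1e8), ("conv2", 8, 2.2e8), ("conv3", 11, 1.5e8)]

uniform = sum(16 * macs for _, _, macs in layers)        # 16 bits everywhere
tailored = sum(bits * macs for _, bits, macs in layers)  # per-layer formats
print(f"bit-serial work reduced to {tailored / uniform:.0%} of uniform")
```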
Modern smartphones comprise several processing and input/output units that communicate mostly through main memory. As a result, memory represents a critical performance bottleneck for smartphones. This work introduces a set of emerging workloads for smartphones and characterizes the performance of several memory controller policies and …
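To make the policy comparison concrete, here is a toy model, an assumption of this edit rather than the paper's workloads or simulator, contrasting FCFS with FR-FCFS scheduling: requests that hit the currently open DRAM row are cheaper to serve, and FR-FCFS reorders to exploit that. The latency constants are illustrative.

```python
# Toy memory controller model: FCFS serves in arrival order; FR-FCFS
# prioritizes the oldest request that hits the open row, saving a
# precharge/activate on each row hit.
ROW_HIT, ROW_MISS = 1, 3           # illustrative relative latencies

def service(queue, fr_fcfs):
    open_row, cycles = None, 0
    pending = list(queue)
    while pending:
        pick = 0
        if fr_fcfs:                # reorder: oldest row hit first
            for i, row in enumerate(pending):
                if row == open_row:
                    pick = i
                    break
        row = pending.pop(pick)
        cycles += ROW_HIT if row == open_row else ROW_MISS
        open_row = row
    return cycles

reqs = [0, 1, 0, 1, 0, 1, 0, 1]    # two streams interleaving DRAM rows
print("FCFS cycles:   ", service(reqs, fr_fcfs=False))
print("FR-FCFS cycles:", service(reqs, fr_fcfs=True))
```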
Research Interests: Computer architecture, many-core architectures, on-chip networks, cache coherence protocols, interconnection networks, and emerging applications for many-core architectures.
We quantify a source of ineffectual computations when processing the multiplications of the convolutional layers in Deep Neural Networks (DNNs) and propose Pragmatic (PRA), an architecture that exploits it to improve performance and energy efficiency. The source of these ineffectual computations is best understood in the context of conventional multipliers …
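The sketch below illustrates that source of ineffectual work in software terms, not the PRA pipeline: a conventional p-bit multiplier processes every bit position of an operand, but only its nonzero bits add anything to the product, so processing just the set-bit positions bounds work by the operand's essential bit content. The values are chosen arbitrarily.

```python
# Hedged sketch: only the nonzero bits of the activation are effectual.
def essential_bit_mul(a, w):
    """Multiply using only the set bits of activation a."""
    acc, terms = 0, 0
    i = 0
    while a >> i:
        if (a >> i) & 1:           # only effectual bit positions do work
            acc += w << i
            terms += 1
        i += 1
    return acc, terms

a, w, p = 0b0000001000000100, 59, 16
val, terms = essential_bit_mul(a, w)
assert val == a * w                # same product from far fewer terms
print(f"product {val}: {terms} effectual terms vs {p} bit positions")
```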