Patrick Judd

This work observes that a large fraction of the computations performed by Deep Neural Networks (DNNs) are intrinsically ineffectual as they involve a multiplication where one of the inputs is zero. This observation motivates Cnvlutin (CNV), a value-based approach to hardware acceleration that eliminates most of these ineffectual operations, …
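As a rough illustration of the observation this abstract describes (not the CNV hardware itself), the sketch below skips multiply-accumulate steps whose activation operand is zero and reports how much work was avoided. The function name and the post-ReLU-style data are hypothetical.

```python
# Sketch: skip products whose activation input is zero and count how many
# multiplications were actually effectual.

def dot_skip_zero_activations(activations, weights):
    total, effectual, acc = len(activations), 0, 0
    for a, w in zip(activations, weights):
        if a == 0:          # ineffectual product: contributes nothing
            continue
        effectual += 1
        acc += a * w
    return acc, effectual, total

# Hypothetical post-ReLU activations: many zeros, as the abstract observes.
acts = [0, 3, 0, 0, 1, 0, 2, 0]
wts  = [5, -1, 2, 4, 7, -3, 1, 6]
result, done, total = dot_skip_zero_activations(acts, wts)
print(result, f"{done}/{total} multiplications were effectual")
```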
This work investigates how using reduced precision data in Convolutional Neural Networks (CNNs) affects network accuracy during classification. Unlike previous work, this study considers networks where each layer may use different precision data. Our key result is the observation that the tolerance of CNNs to reduced precision data not only varies across …
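A minimal sketch of the kind of experiment described here, assuming simple uniform fixed-point quantization of activations; the per-layer bit widths and the data are illustrative, not values from the study.

```python
# Sketch: quantize each layer's activations to its own fixed-point precision,
# the per-layer setting the abstract studies.
import numpy as np

def quantize(x, int_bits, frac_bits):
    """Round x to a signed fixed-point grid with the given bit budget."""
    scale = 2.0 ** frac_bits
    lo = -(2 ** (int_bits + frac_bits - 1))
    hi = 2 ** (int_bits + frac_bits - 1) - 1
    return np.clip(np.round(x * scale), lo, hi) / scale

# Hypothetical per-layer precisions: (integer bits, fractional bits).
layer_precisions = [(2, 8), (1, 6), (1, 4)]
x = np.random.randn(4)
for i, (ib, fb) in enumerate(layer_precisions):
    x = np.maximum(quantize(x, ib, fb), 0.0)   # stand-in for a layer + ReLU
    print(f"layer {i}: {ib + fb}-bit activations -> {x}")
```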
The numerical representation precision required by the computations performed by Deep Neural Networks (DNNs) varies across networks and between layers of the same network. This observation motivates a precision-based approach to acceleration which takes into account both the computational structure and the required numerical precision representation. This …
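To make the "performance proportional to precision" idea concrete, here is a hedged software sketch of a bit-serial inner product: the activations are consumed one bit per step, so the number of steps equals the activation precision. The loop structure is illustrative only, not the accelerator's datapath.

```python
# Sketch: bit-serial inner product. Each step consumes one bit of every
# activation, so an inner product over p-bit activations takes p steps.

def bit_serial_dot(activations, weights, p):
    acc = 0
    for bit in range(p):                      # one step per precision bit
        partial = sum(((a >> bit) & 1) * w    # 1-bit x full-precision products
                      for a, w in zip(activations, weights))
        acc += partial << bit                 # weight the partial by 2^bit
    return acc

acts = [5, 3, 0, 7]        # unsigned activations that fit in p bits
wts  = [2, -1, 4, 1]
p = 3                      # hypothetical per-layer precision
assert bit_serial_dot(acts, wts, p) == sum(a * w for a, w in zip(acts, wts))
print(bit_serial_dot(acts, wts, p), "computed in", p, "bit-serial steps")
```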
This work exploits the tolerance of Deep Neural Networks (DNNs) to reduced precision numerical representations and specifically, their recently demonstrated ability to tolerate representations of different precision per layer while maintaining accuracy. This flexibility enables improvements over conventional DNN implementations that use a single, uniform …
Modern smartphones comprise several processing and input/output units that communicate mostly through main memory. As a result, memory represents a critical performance bottleneck for smartphones. This work introduces a set of emerging workloads for smartphones and characterizes the performance of several memory controller policies and …
Deep Neural Networks expose a high degree of parallelism, making them amenable to highly data parallel architectures. However, data-parallel architectures often accept inefficiency in individual computations for the sake of overall efficiency. We show that on average, activation values of convolutional layers during inference in modern Deep Convolutional …
Stripes is a Deep Neural Network (DNN) accelerator that uses bit-serial computation to offer performance that is proportional to the fixed-point precision of the activation values. The fixed-point precisions are determined a priori using profiling and are selected at a per layer granularity. This paper presents Dynamic Stripes, an extension to Stripes that …
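The extension described here detects at runtime how many bits a group of activations actually needs, rather than relying only on profiled per-layer precisions. Below is a small sketch of that idea, assuming unsigned activations and a hypothetical group size; it is not the Dynamic Stripes hardware mechanism.

```python
# Sketch: per-group dynamic precision detection. The precision a group of
# unsigned activations needs is set by its largest value, so groups of small
# values finish in fewer bit-serial steps.

def needed_bits(group):
    return max(v.bit_length() for v in group) or 1   # at least 1 bit

def dynamic_precision(activations, group_size=4):    # group size is hypothetical
    groups = [activations[i:i + group_size]
              for i in range(0, len(activations), group_size)]
    return [(g, needed_bits(g)) for g in groups]

acts = [0, 1, 3, 2, 9, 0, 4, 1]
for group, bits in dynamic_precision(acts):
    print(group, "->", bits, "bit-serial steps instead of a fixed profile width")
```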
Loom (LM), a hardware inference accelerator for Convolutional Neural Networks (CNNs), is presented. In LM every bit of data precision that can be saved translates to proportional performance gains. Specifically, for convolutional layers LM's execution time scales inversely with the precisions of both weights and activations. For …
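A back-of-the-envelope way to read that scaling claim, assuming a uniform 16-bit fixed-point baseline (the baseline width and the example precisions are my assumptions, not numbers from the paper): if a convolutional layer uses Pw-bit weights and Pa-bit activations, the ideal speedup grows as (16 x 16) / (Pw x Pa).

```python
# Sketch of the stated scaling for convolutional layers, relative to an
# assumed uniform 16-bit baseline.
BASELINE_BITS = 16

def ideal_conv_speedup(p_weights, p_activations):
    return (BASELINE_BITS * BASELINE_BITS) / (p_weights * p_activations)

# Hypothetical per-layer precisions, for illustration only.
for pw, pa in [(16, 16), (8, 8), (5, 7), (2, 4)]:
    print(f"Pw={pw:2d}, Pa={pa:2d} -> ideal speedup {ideal_conv_speedup(pw, pa):5.1f}x")
```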
We discuss several modifications and extensions over the previously proposed Cnvlutin (CNV) accelerator for convolutional and fully-connected layers of Deep Learning Networks. We first describe different encodings of the activations that are deemed ineffectual. The encodings have different memory overhead and energy characteristics. We propose using a level of …
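One way to picture an encoding of ineffectual activations is a zero-skipping (offset, value) format, where only effectual values are stored together with their positions. The sketch below is a generic illustration with hypothetical names, not one of the specific encodings evaluated in the paper.

```python
# Sketch: encode only the effectual (non-zero) activations as (offset, value)
# pairs so ineffectual entries cost neither storage nor multiplications.

def encode_effectual(activations):
    return [(i, a) for i, a in enumerate(activations) if a != 0]

def dot_encoded(encoded, weights):
    return sum(a * weights[i] for i, a in encoded)

acts = [0, 0, 6, 0, 1, 0, 0, 3]
wts  = [4, 2, 1, 5, 3, 7, 2, 2]
enc = encode_effectual(acts)
print(enc)                    # [(2, 6), (4, 1), (7, 3)]
print(dot_encoded(enc, wts))  # same result as the dense dot product: 15
```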