Publications
Soft-to-Hard Vector Quantization for End-to-End Learning Compressible Representations
We present a new approach to learn compressible representations in deep architectures with an end-to-end training strategy. Our method is based on a soft (continuous) relaxation of quantization and …
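To make the idea in this abstract concrete, here is a minimal, self-contained Python sketch of a soft-to-hard quantizer: entries are softly assigned to a codebook via a softmax over negative squared distances, and increasing the sharpness parameter sigma anneals the assignment toward hard nearest-neighbor quantization. The function names, codebook values, and annealing parameter are illustrative assumptions, not the paper's actual code.

```python
# Minimal sketch of a soft-to-hard (annealed) quantizer. The codebook `centers`
# and the sharpness `sigma` are illustrative; in the paper's setting both the
# representation and the codebook are learned end-to-end.
import numpy as np

def soft_quantize(z, centers, sigma):
    """Softly assign each entry of z to the codebook via a softmax over
    negative squared distances; larger sigma -> closer to hard assignment."""
    d = (z[:, None] - centers[None, :]) ** 2   # (len(z), len(centers)) distances
    w = np.exp(-sigma * d)
    w /= w.sum(axis=1, keepdims=True)          # soft assignment weights
    return w @ centers                         # soft-quantized values

def hard_quantize(z, centers):
    """Hard nearest-neighbor quantization, as used at inference time."""
    idx = np.argmin((z[:, None] - centers[None, :]) ** 2, axis=1)
    return centers[idx]

# Toy usage: as sigma grows, the soft output approaches the hard one.
z = np.array([0.12, -0.4, 0.9])
centers = np.array([-0.5, 0.0, 0.5, 1.0])
for sigma in (1.0, 10.0, 100.0):
    print(sigma, soft_quantize(z, centers, sigma), hard_quantize(z, centers))
```

In an end-to-end setting, the soft version is differentiable in both z and centers and can sit inside the training graph, while the hard version is what one would deploy at inference time.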
CAS-CNN: A deep convolutional neural network for image compression artifact suppression
TLDR: This work presents a novel 12-layer deep convolutional network for image compression artifact suppression with hierarchical skip connections and a multi-scale loss function, and shows that a network trained for a specific quality factor (QF) is resilient to the QF used to compress the input image.
Origami: A Convolutional Network Accelerator
TLDR: This paper presents the first convolutional network accelerator that scales to network sizes currently handled only by workstation GPUs, while remaining within the power envelope of embedded systems.
YodaNN: An Architecture for Ultralow Power Binary-Weight CNN Acceleration
TLDR: This paper presents an accelerator optimized for binary-weight CNNs that significantly outperforms the state of the art in energy and area efficiency, removes the need for expensive multiplications, and reduces I/O bandwidth and storage.
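As a side note on why binary weights remove multiplications (an illustration with hypothetical names, not code from YodaNN): with weights constrained to {-1, +1}, every multiply-accumulate collapses into a signed addition, which is what a binary-weight accelerator exploits in hardware. The sketch below checks this equivalence.

```python
# Illustration only: a dot product with weights in {-1, +1} needs no multipliers,
# just signed accumulation of the activations.
import numpy as np

def binary_dot(x, w_sign):
    """Dot product with weights restricted to {-1, +1}: adds and subtracts only."""
    assert set(np.unique(w_sign)) <= {-1, 1}
    return x[w_sign == 1].sum() - x[w_sign == -1].sum()

x = np.array([0.3, -1.2, 0.7, 2.0])
w = np.array([1, -1, 1, 1])
print(binary_dot(x, w), np.dot(x, w))  # identical results
```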
Origami: A 803-GOp/s/W Convolutional Network Accelerator
TLDR: This work presents a new architecture, design, and implementation, together with the first reported silicon measurements of such an accelerator, outperforming previous work in power, area, and I/O efficiency.
Accelerating real-time embedded scene labeling with convolutional networks
TLDR: This paper presents an optimized convolutional network implementation suitable for real-time scene labeling on embedded platforms and demonstrates a 1.5x throughput improvement over a modern desktop CPU at a power budget of only 11 W.
YodaNN: An Ultra-Low Power Convolutional Neural Network Accelerator Based on Binary Weights
TLDR: This work presents a hardware accelerator optimized for BinaryConnect CNNs that achieves 1510 GOp/s on a core area of only 1.33 MGE with a power dissipation of 153 mW in UMC 65 nm technology at 1.2 V.
Soft-to-Hard Vector Quantization for End-to-End Learned Compression of Images and Neural Networks
In this work we present a new approach to learn compressible representations in deep architectures with an end-to-end training strategy. Our method is based on a soft (continuous) relaxation of …
CBinfer: Exploiting Frame-to-Frame Locality for Faster Convolutional Network Inference on Video Streams
TLDR: This work adopts an orthogonal viewpoint and proposes a novel algorithm that exploits the spatio-temporal sparsity of pixel changes, resulting in an average speed-up of 9.1x over cuDNN on the Tegra X2 platform with negligible accuracy loss and lower power consumption.
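To illustrate the change-based idea hinted at in this summary (a rough sketch with hypothetical names, not CBinfer's actual algorithm or its CUDA implementation): a per-pixel change mask is computed between consecutive frames, and the convolution output is recomputed only where the mask fires, reusing cached results elsewhere. A 1x1 single-channel convolution keeps the example trivial to verify.

```python
# Rough sketch of change-based inference on a video stream: recompute outputs
# only where the input changed since the previous frame.
import numpy as np

def changed_pixels(prev_frame, frame, tau=0.05):
    """Threshold per-pixel differences to obtain the sparse change mask."""
    return np.abs(frame - prev_frame) > tau

def update_conv1x1(cached_out, frame, mask, weight):
    """Recompute a 1x1, single-channel convolution only at changed locations."""
    out = cached_out.copy()
    out[mask] = weight * frame[mask]
    return out

# Toy usage on two nearly identical frames.
rng = np.random.default_rng(0)
f0 = rng.random((64, 64))
f1 = f0.copy()
f1[10:14, 20:24] += 0.5                      # a small changed region
w = 0.7
out0 = w * f0                                # full computation on the first frame
mask = changed_pixels(f0, f1)
out1 = update_conv1x1(out0, f1, mask, w)
assert np.allclose(out1, w * f1)             # identical to full recomputation
print("recomputed pixels:", mask.sum(), "of", mask.size)
```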
InfiniTime: Multi-sensor wearable bracelet with human body harvesting
TLDR: Simulations using energy intake measurements from solar and TEG modules confirm that InfiniTime achieves self-sustainability with indoor lighting levels and body heat for several realistic applications featuring data acquisition from the on-board camera and multiple sensors, as well as visualization and wireless connectivity.