• Publications
Federated Learning with Non-IID Data
TLDR
We show that the accuracy of federated learning drops significantly, by up to 55%, for neural networks trained on highly skewed non-IID data, where each client device trains on a single class of data.
  • 222
  • 27
Hello Edge: Keyword Spotting on Microcontrollers
TLDR
We evaluate and explore neural network architectures for running keyword spotting (KWS) on resource-constrained microcontrollers.
  • 124
  • 27
CMSIS-NN: Efficient Neural Network Kernels for Arm Cortex-M CPUs
TLDR
We develop CMSIS-NN, a set of efficient kernels that maximize the performance and minimize the memory footprint of neural network (NN) applications on Arm Cortex-M processors targeted at intelligent IoT edge devices.
  • 93
  • 27
Scalable and modularized RTL compilation of Convolutional Neural Networks onto FPGA
TLDR
In this work, we present a scalable solution that combines the flexibility of high-level synthesis with the finer-level optimization of an RTL implementation for end-to-end CNN implementations.
  • 88
  • 7
PrivyNet: A Flexible Framework for Privacy-Preserving Deep Neural Network Training with A Fine-Grained Privacy Control
TLDR
We propose PrivyNet, a flexible framework that enables DNN training on the cloud while simultaneously protecting data privacy.
  • 27
  • 6
Deep Convolutional Neural Network Inference with Floating-point Weights and Fixed-point Activations
TLDR
We propose using floating-point representation for weights and fixed-point numbers for activations for CNN inference and demonstrate it on popular large-scale CNNs.
  • 52
  • 5
ALAMO: FPGA acceleration of deep learning algorithms with a modularized RTL compiler
TLDR
This work presents a scalable solution that achieves the flexibility and reduced design time of high-level synthesis and the near-optimality of an RTL implementation.
  • 28
  • 2
Enabling Deep Learning at the IoT Edge
TLDR
We introduce CMSIS-NN, a library of optimized software kernels to enable deployment of NNs on Cortex-M cores, using keyword spotting as an example.
  • 13
  • 1
Not All Ops Are Created Equal!
TLDR
We show that throughput and energy vary by up to 5X across different neural network operation types on an off-the-shelf Arm Cortex-M7 microcontroller.
  • 11
  • 1
High-performance face detection with CPU-FPGA acceleration
TLDR
In this paper, we propose a suite of acceleration techniques to enable real-time face detection on a CPU-FPGA platform, based on a state-of-the-art face detection algorithm that employs a large number of simple classifiers.
  • 7
  • 1