Federated Learning with Non-IID Data
TLDR
We show that the accuracy of federated learning drops significantly, by up to 55%, for neural networks trained on highly skewed non-IID data, where each client device trains on a single class of data.
  • 219
  • 27
Hello Edge: Keyword Spotting on Microcontrollers
TLDR
We evaluate and explore neural network architectures for running keyword spotting (KWS) on resource-constrained microcontrollers.
  • 124
  • 27
CMSIS-NN: Efficient Neural Network Kernels for Arm Cortex-M CPUs
TLDR
We develop CMSIS-NN, a set of efficient kernels that maximize the performance and minimize the memory footprint of neural network (NN) applications on Arm Cortex-M processors targeted at intelligent IoT edge devices.
  • 93
  • 27
Bit Fusion: Bit-Level Dynamically Composable Architecture for Accelerating Deep Neural Networks
TLDR
This work introduces dynamic bit-level fusion/decomposition as a new dimension in the design of DNN accelerators.
  • 146
  • 15
PrivyNet: A Flexible Framework for Privacy-Preserving Deep Neural Network Training with A Fine-Grained Privacy Control
TLDR
We propose PrivyNet, a flexible framework that enables DNN training on the cloud while simultaneously protecting data privacy.
  • 27
  • 6
Deep Convolutional Neural Network Inference with Floating-point Weights and Fixed-point Activations
TLDR
We propose using floating-point representation for weights and fixed-point numbers for activations for CNN inference and demonstrate it on popular large-scale CNNs.
  • 52
  • 5
SlackProbe: A low overhead in situ on-line timing slack monitoring methodology
TLDR
We observe that most existing slack monitoring methods focus exclusively on monitoring path-ending registers, which is not cost-efficient in terms of power and area.
  • 47
  • 4
DDRO: A novel performance monitoring methodology based on design-dependent ring oscillators
TLDR
We develop a systematic approach to the synthesis of multiple design-dependent monitors, as well as a corresponding delay estimation method.
  • 37
  • 4
SlackProbe: A Flexible and Efficient In Situ Timing Slack Monitoring Methodology
TLDR
In situ monitoring is an accurate way to monitor circuit delay or timing slack, but usually incurs significant overhead.
  • 29
  • 3
Synthesis and Analysis of Design-Dependent Ring Oscillator (DDRO) Performance Monitors
TLDR
We develop a systematic approach for the synthesis of multiple design-dependent monitors, as well as the corresponding calibration and delay estimation methods.
  • 20
  • 3