Federated Learning with Non-IID Data
This work presents a strategy to improve training on non-IID data by creating a small subset of data which is globally shared between all the edge devices, and shows that accuracy can be increased by 30% for the CIFAR-10 dataset with only 5% globally shared data.
CMSIS-NN: Efficient Neural Network Kernels for Arm Cortex-M CPUs
This work presents CMSIS-NN, a set of efficient kernels developed to maximize the performance and minimize the memory footprint of neural network (NN) applications on Arm Cortex-M processors targeted at intelligent IoT edge devices.
Hello Edge: Keyword Spotting on Microcontrollers
This work shows that these neural network architectures can be optimized to fit within the memory and compute constraints of microcontrollers without sacrificing accuracy, and explores the depthwise separable convolutional neural network (DS-CNN), comparing it against other neural network architectures.
Throughput-Optimized OpenCL-based FPGA Accelerator for Large-Scale Convolutional Neural Networks
This work presents a systematic design space exploration methodology to maximize the throughput of an OpenCL-based FPGA accelerator for a given CNN model, considering the FPGA's resource constraints such as on-chip memory, registers, computational resources and external memory bandwidth.
Exploring sub-20nm FinFET design with Predictive Technology Models
Predictive MOSFET models are critical for early-stage design-technology co-optimization and circuit design research. In this work, predictive technology models (PTM) for FinFET devices are generated for 5 technology nodes corresponding to the years 2012-2020 on the ITRS roadmap.
Bit Fusion: Bit-Level Dynamically Composable Architecture for Accelerating Deep Neural Network
This work designs Bit Fusion, a bit-flexible accelerator consisting of an array of bit-level processing elements that dynamically fuse to match the bitwidth of individual DNN layers, and compares it to two state-of-the-art DNN accelerators, Eyeriss and Stripes.
TIMBER: Time borrowing and error relaying for online timing error resilience
TIMBER, a technique for online timing error resilience that masks timing errors by borrowing time from successive pipeline stages, can recover timing margins without instruction replay or roll-back support.
RecNMP: Accelerating Personalized Recommendation with Near-Memory Processing
This work proposes RecNMP, which provides a scalable solution to improve system throughput, supports a broad range of sparse embedding models, and is specifically tailored to production environments with heavy co-location of operators on a single server.
Impact of Technology and Voltage Scaling on the Soft Error Susceptibility in Nanoscale CMOS
  • V. Chandra, R. Aitken
  • Computer Science
  • IEEE International Symposium on Defect and Fault…
  • 1 October 2008
This work shows that in sub-65 nm technology nodes with aggressive voltage scaling, it is equally critical to solve the soft error problems in logic (latches, flip-flops) as it is in SRAMs.
PrivyNet: A Flexible Framework for Privacy-Preserving Deep Neural Network Training with A Fine-Grained Privacy Control
PrivyNet, a flexible framework that enables DNN training on the cloud while simultaneously protecting data privacy, is proposed and validated, demonstrating that PrivyNet is efficient and can help explore and optimize the trade-off between privacy loss and accuracy.