• Corpus ID: 238856982

Bandwidth Utilization Side-Channel on ML Inference Accelerators

  • Sarbartha Banerjee, Shijia Wei, Prakash Ramrakhyani, Mohit Tiwari
Accelerators used for machine learning (ML) inference provide large performance benefits over CPUs. Securing confidential models during inference against off-chip side-channel attacks is critical to harnessing this performance advantage in practice. Data and memory address encryption has recently been proposed to defend against off-chip attacks. In this paper, we demonstrate that bandwidth utilization on the interface between accelerators and the weight storage can serve as a side-channel for leaking…

InvisiMem: Smart memory defenses for memory bus side channel
It is demonstrated that smart memory (memory with compute capability and a packetized interface) can dramatically simplify this problem and incur one to two orders of magnitude lower overheads in performance, space, energy, and memory bandwidth than prior solutions.
Reverse Engineering Convolutional Neural Networks Through Side-channel Information Leaks
This study shows that even with data encryption, the adversary can infer the underlying network structure by exploiting the memory and timing side-channels, and reveals the importance of hiding off-chip memory access pattern to truly protect confidential CNN models.
EIE: Efficient Inference Engine on Compressed Deep Neural Network
  • Song Han, Xingyu Liu, +4 authors W. Dally
  • Computer Science
    2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA)
  • 2016
An energy-efficient inference engine (EIE) that performs inference on this compressed network model and accelerates the resulting sparse matrix-vector multiplication with weight sharing; it is 189x and 13x faster than CPU and GPU implementations, respectively, of the same DNN without compression.
Camouflage: Memory Traffic Shaping to Mitigate Timing Attacks
Camouflage introduces the novel idea of shaping memory requests' and responses' inter-arrival time into a pre-determined distribution for security purposes, even creating additional fake traffic if needed, and offers a tunable trade-off between system security and system performance.
ObfusMem: A low-overhead access obfuscation for trusted memories
This work proposes a new approach to access pattern obfuscation, called ObfusMem, which adds the memory to the trusted computing base, incorporates cryptographic engines within the memory, and encrypts commands and addresses on the memory bus, so that the access pattern is cryptographically obfuscated from external observers.
DaDianNao: A Machine-Learning Supercomputer
  • Yunji Chen, Tao Luo, +8 authors O. Temam
  • Computer Science
    2014 47th Annual IEEE/ACM International Symposium on Microarchitecture
  • 2014
This article introduces a custom multi-chip machine-learning architecture, showing that, on a subset of the largest known neural network layers, it is possible to achieve a speedup of 450.65x over a GPU, and reduce the energy by 150.31x on average for a 64-chip system.
In-datacenter performance analysis of a tensor processing unit
  • N. Jouppi, C. Young, +73 authors D. Yoon
  • Computer Science
    2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA)
  • 2017
This paper evaluates a custom ASIC, called a Tensor Processing Unit (TPU), deployed in datacenters since 2015 that accelerates the inference phase of neural networks (NN), and compares it to a server-class Intel Haswell CPU and an Nvidia K80 GPU, which are contemporaries deployed in the same datacenters.
TVM: End-to-End Optimization Stack for Deep Learning
This work proposes TVM, an end-to-end optimization stack that exposes graph-level and operator-level optimizations to provide performance portability to deep learning workloads across diverse hardware back-ends, and discusses the optimization challenges specific to deep learning that TVM solves.
VTA: An Open Hardware-Software Stack for Deep Learning
This work proposes VTA, a programmable deep learning architecture template designed to be extensible in the face of evolving workloads, and proposes a runtime system equipped with a JIT compiler for flexible code-generation and heterogeneous execution that enables effective use of the VTA architecture.
Stealing Machine Learning Models via Prediction APIs
Simple, efficient attacks are shown that extract target ML models with near-perfect fidelity for popular model classes including logistic regression, neural networks, and decision trees against the online services of BigML and Amazon Machine Learning.