Memory-Efficient Deep Learning Inference in Trusted Execution Environments

Jean-Baptiste Truong, W. Lynn Gallagher, Tian Guo, Robert J. Walls
2021 IEEE International Conference on Cloud Engineering (IC2E)
This study identifies two key bottlenecks to executing deep neural networks in trusted execution environments (TEEs) and proposes techniques to alleviate them: page thrashing during the execution of convolutional layers, and the decryption of large weight matrices in fully-connected layers. For the former, we propose a novel partitioning scheme, y-plane partitioning, designed to (i) provide consistent execution time when the layer output is large compared to the TEE secure memory; and (ii… 
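The truncated abstract names y-plane partitioning but not its mechanics. A minimal sketch of the general idea, assuming the scheme amounts to computing the convolution output in bands along one spatial axis so that each band's working set fits in a fixed secure-memory budget (the band size `band_rows`, the single-channel convolution, and all function names here are illustrative, not the paper's implementation):

```python
import numpy as np

def conv2d_full(x, w):
    """Naive valid 2D convolution (single channel), for reference."""
    kh, kw = w.shape
    oh, ow = x.shape[0] - kh + 1, x.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * w)
    return out

def conv2d_y_partitioned(x, w, band_rows):
    """Compute the convolution in horizontal bands of output rows.

    Each band touches only the input rows it needs, so the working set
    stays small -- a toy stand-in for partitioning along the y axis,
    where band_rows would be chosen so a band fits in secure memory.
    """
    kh, kw = w.shape
    oh = x.shape[0] - kh + 1
    bands = []
    for start in range(0, oh, band_rows):
        stop = min(start + band_rows, oh)
        # Output rows [start, stop) need input rows [start, stop + kh - 1)
        x_slice = x[start:stop + kh - 1, :]
        bands.append(conv2d_full(x_slice, w))
    return np.vstack(bands)
```

Because each band is independent, the enclave never holds more than one band's input and output at a time, which is the property that avoids page thrashing when the full layer output exceeds secure memory.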


Seculator: A Fast and Secure Neural Processing Unit
The proposed accelerator architecture, Seculator, encodes memory access patterns to build a small hardware-based tile version number generator for a given layer and stores layer-level MACs, which together eliminate the need for a MAC cache and a tile version number store.


Vessels: efficient and scalable deep learning prediction on trusted processors
Vessels is presented, a new system that overcomes the SGX memory limitation through memory-usage optimization techniques and achieves highly efficient and scalable deep learning prediction while providing strong data confidentiality and integrity with SGX.
Occlumency: Privacy-preserving Remote Deep-learning Inference Using SGX
This paper designs a suite of novel techniques to accelerate DL inference inside the enclave with a limited memory size, and implements Occlumency, a novel cloud-driven solution built on Caffe and designed to protect user privacy without compromising the benefit of using powerful cloud resources.
Sponge Examples: Energy-Latency Attacks on Neural Networks
It is shown how adversaries can exploit carefully-crafted sponge examples, which are inputs designed to maximise energy consumption and latency, to drive machine learning (ML) systems towards their worst-case performance.
Privado: Practical and Secure DNN Inference
This work addresses the three main challenges that SGX-based DNN inferencing faces, namely, security, ease-of-use, and performance.
Quantizing deep convolutional networks for efficient inference: A whitepaper
An overview of techniques for quantizing convolutional neural networks for inference with integer weights and activations is presented, recommending per-channel quantization of weights and per-layer quantization of activations as the preferred scheme for hardware acceleration and kernel optimization.
Delphi: A Cryptographic Inference Service for Neural Networks
This work designs, implements, and evaluates DELPHI, a secure prediction system that allows two parties to execute neural network inference without revealing either party’s data, and develops a hybrid cryptographic protocol that improves upon the communication and computation costs over prior work.
Practical Black-Box Attacks against Machine Learning
This work introduces the first practical demonstration of an attacker controlling a remotely hosted DNN without knowledge of its internals, and finds that this black-box attack strategy is capable of evading defense strategies previously found to make adversarial example crafting harder.
Chiron: Privacy-preserving Machine Learning as a Service
Chiron is evaluated on popular deep learning models, focusing on benchmark image classification tasks such as CIFAR and ImageNet; the results show that its training performance and the accuracy of the resulting models are practical for common uses of ML-as-a-service.
CryptoNets: applying neural networks to encrypted data with high throughput and accuracy
It is shown that the cloud service is capable of applying the neural network to the encrypted data to make encrypted predictions, and also return them in encrypted form, which allows high throughput, accurate, and private predictions.
Incremental Network Quantization: Towards Lossless CNNs with Low-Precision Weights
Extensive experiments on the ImageNet classification task using almost all well-known deep CNN architectures, including AlexNet, VGG-16, GoogleNet, and ResNets, testify to the efficacy of the proposed INQ, showing that at 5-bit quantization the models achieve higher accuracy than their 32-bit floating-point references.
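INQ's low-precision form constrains each weight to zero or a signed power of two, so multiplications become bit shifts. A heavily simplified one-shot sketch of that weight mapping (the actual method quantizes weights incrementally in groups, with retraining between steps, and uses a more careful rounding rule; the exponent bounds `n1`/`n2` and the zero threshold here are illustrative):

```python
import numpy as np

def quantize_pow2(w, n1, n2):
    """Map each weight to +/-2^p with p in [n2, n1], or to zero.

    Weights with magnitude below 2^(n2 - 1) are pruned to zero;
    the rest are rounded to the nearest exponent in log2 space.
    """
    sign = np.sign(w)
    mag = np.abs(w)
    floor = 2.0 ** (n2 - 1)
    p = np.round(np.log2(np.maximum(mag, floor)))  # avoid log2(0)
    p = np.clip(p, n2, n1)
    q = sign * 2.0 ** p
    q[mag < floor] = 0.0  # too small to represent: prune
    return q
```

With 5 bits one can encode the sign, a zero flag, and a small range of exponents, which matches the bit width reported in the experiments above.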