THE-X: Privacy-Preserving Transformer Inference with Homomorphic Encryption

@inproceedings{Chen2022THEXPT,
  title={THE-X: Privacy-Preserving Transformer Inference with Homomorphic Encryption},
  author={Tianyu Chen and Hangbo Bao and Shaohan Huang and Li Dong and Binxing Jiao and Daxin Jiang and Haoyi Zhou and Jianxin Li and Furu Wei},
  booktitle={Findings of the Association for Computational Linguistics: ACL 2022},
  year={2022}
}
As more and more pre-trained language models adopt on-cloud deployment, privacy concerns grow quickly, mainly due to the exposure of plain-text user data (e.g., search history, medical records, bank accounts). Privacy-preserving inference of transformer models is in demand among cloud service users. To protect privacy, it is an attractive choice to compute only with ciphertext in homomorphic encryption (HE). However, enabling pre-trained model inference on ciphertext data is difficult due to the…
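A minimal sketch of what "computing only with ciphertext" looks like, assuming the open-source TenSEAL library's CKKS API (context, ckks_vector, mm); this illustrates HE inference in general, not THE-X's actual pipeline.

# Illustrative sketch (not THE-X itself): a tiny linear layer with a square
# activation evaluated directly on CKKS ciphertexts, assuming TenSEAL.
import tenseal as ts

# Client side: create a CKKS context and encrypt the input features.
context = ts.context(
    ts.SCHEME_TYPE.CKKS,
    poly_modulus_degree=8192,
    coeff_mod_bit_sizes=[60, 40, 40, 60],
)
context.global_scale = 2 ** 40
context.generate_galois_keys()

x = [0.1, -0.3, 0.5, 0.2]                # plaintext user features
enc_x = ts.ckks_vector(context, x)       # only the ciphertext leaves the client

# Server side: computes on ciphertext, never sees x.
W = [[0.5, -0.2], [0.1, 0.4], [-0.3, 0.2], [0.7, 0.1]]   # 4x2 plaintext weights
b = [0.05, -0.02]
enc_h = enc_x.mm(W) + b                  # encrypted linear layer
enc_out = enc_h * enc_h                  # square as an HE-friendly activation

# Client side: only the secret-key holder can decrypt the result.
print(enc_out.decrypt())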

Iron: Private Inference on Transformers

This work proposes a customized homomorphic encryption-based protocol for matrix multiplication that crucially relies on a novel compact packing technique, and designs efficient protocols for three non-linear functions by integrating advanced underlying protocols with specialized optimizations.
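Iron's compact packing itself is not reproduced here, but the following numpy sketch shows the kind of slot-level primitive such HE matrix-multiplication protocols build on: the classic diagonal method, which computes a matrix-vector product using only slot rotations (stood in for by np.roll) and element-wise multiplications.

# Sketch of the classic "diagonal" packing for a matrix-vector product.
# This is NOT Iron's packing; np.roll stands in for an HE slot rotation and
# * for a slot-wise ciphertext multiplication.
import numpy as np

def diag_matvec(A, x):
    """Compute A @ x using only rotations and element-wise products."""
    n = A.shape[0]
    acc = np.zeros(n)
    for k in range(n):
        diag_k = np.array([A[i, (i + k) % n] for i in range(n)])  # k-th generalized diagonal
        acc = acc + diag_k * np.roll(x, -k)                        # rotate, multiply, accumulate
    return acc

A = np.arange(16, dtype=float).reshape(4, 4)
x = np.array([1.0, 2.0, 3.0, 4.0])
assert np.allclose(diag_matvec(A, x), A @ x)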

Toward Privacy-preserving Text Embedding Similarity with Homomorphic Encryption

Through text embedding inversion tests, it is proved that the benchmark datasets are vulnerable to inversion attacks and that another privacy-preserving approach, dχ-privacy, a relaxed variant of Local Differential Privacy, fails to prevent them.

TextFusion: Privacy-Preserving Pre-trained Model Inference via Token Fusion

This work proposes TextFusion, a novel method for preserving inference privacy that trains a Fusion Predictor to dynamically fuse token representations, hiding multiple private token representations behind a single unrecognizable one.

References


nGraph-HE2: A High-Throughput Framework for Neural Network Inference on Encrypted Data

The proposed nGraph-HE2 framework leverages the CKKS scheme, whose support for real numbers is friendly to data science, and a client-aided two-party approach to computing activation functions, enabling privacy-preserving inference on standard, pre-trained models with their native activation functions and number fields.
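A hedged sketch of the client-aided idea, again assuming TenSEAL for the CKKS operations: the server evaluates linear layers on ciphertext and, for a non-polynomial activation, hands the ciphertext back to the key holder, who decrypts, applies the native function, and re-encrypts (network transport and masking of intermediate values are omitted).

# Sketch of a client-aided activation round, assuming TenSEAL's CKKS API.
import tenseal as ts

context = ts.context(ts.SCHEME_TYPE.CKKS, poly_modulus_degree=8192,
                     coeff_mod_bit_sizes=[60, 40, 40, 60])
context.global_scale = 2 ** 40
context.generate_galois_keys()

def client_relu(enc_pre, ctx):
    """Key holder decrypts, applies ReLU in the clear, re-encrypts."""
    pre = enc_pre.decrypt()
    post = [max(0.0, v) for v in pre]
    return ts.ckks_vector(ctx, post)

enc_x = ts.ckks_vector(context, [0.4, -1.2, 0.7])
enc_pre = enc_x * 2.0                        # server: a toy plaintext-weight layer
enc_post = client_relu(enc_pre, context)     # one interactive round for the activation
print(enc_post.decrypt())                    # [0.8, 0.0, 1.4] up to CKKS noise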

MP2ML: a mixed-protocol machine learning framework for private inference

MP2ML is a machine learning framework that integrates nGraph-HE with the secure two-party computation framework ABY to execute DL inference while maintaining the privacy of both the input data and the model weights, and is compatible with popular DL frameworks such as TensorFlow.

TextHide: Tackling Data Privacy for Language Understanding Tasks

The proposed TextHide requires all participants to add a simple encryption step to prevent an eavesdropping attacker from recovering private text data, and it fits well with the popular framework of fine-tuning pre-trained language models for any sentence or sentence-pair task.

Natural Language Understanding with Privacy-Preserving BERT

This work investigates the privacy and utility implications of applying dχ-privacy, a variant of Local Differential Privacy, to BERT fine-tuning in NLU applications and proposes privacy-adaptive LM pretraining methods, which can boost the utility of BERT dramatically while retaining the same level of privacy protection.
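For concreteness, a sketch of the standard dχ-privacy mechanism for word embeddings that this line of work analyzes (not these papers' exact pipelines): perturb the embedding with noise whose density is proportional to exp(−ε‖z‖), sampled as a uniform direction scaled by a Gamma-distributed magnitude, then optionally snap back to the nearest vocabulary vector.

# Sketch of the standard d_x-privacy mechanism on embeddings (toy data).
import numpy as np

def dx_privacy_noise(dim, eps, rng):
    """Sample z with density proportional to exp(-eps * ||z||)."""
    direction = rng.normal(size=dim)
    direction /= np.linalg.norm(direction)
    magnitude = rng.gamma(shape=dim, scale=1.0 / eps)   # norm of the noise
    return magnitude * direction

def privatize(embedding, vocab_embeddings, eps, rng):
    noisy = embedding + dx_privacy_noise(embedding.shape[0], eps, rng)
    nearest = np.argmin(np.linalg.norm(vocab_embeddings - noisy, axis=1))
    return vocab_embeddings[nearest]                     # release a real token's vector

rng = np.random.default_rng(0)
vocab = rng.normal(size=(1000, 300))                     # toy vocabulary table
print(privatize(vocab[42], vocab, eps=10.0, rng=rng).shape)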

Differentially Private Representation for NLP: Formal Guarantee and An Empirical Study on Privacy and Fairness

Experimental results on benchmark datasets under various parameter settings demonstrate that DPNR largely reduces privacy leakage without significantly sacrificing the main task performance.

Fully homomorphic encryption using ideal lattices

This work proposes a fully homomorphic encryption scheme that allows one to evaluate circuits over encrypted data without being able to decrypt, and describes a public key encryption scheme using ideal lattices that is almost bootstrappable.

Deep Leakage from Gradients

This work shows that it is possible to obtain the private training data from the publicly shared gradients, names this leakage Deep Leakage from Gradients (DLG), and empirically validates its effectiveness on both computer vision and natural language processing tasks.
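A minimal PyTorch sketch of the DLG attack loop on a toy linear model (omitting the paper's full recipe): the attacker optimizes dummy inputs and soft labels until the gradients they induce match the gradients the victim shared.

# Toy DLG-style gradient-matching attack; model and data are placeholders.
import torch
import torch.nn.functional as F

torch.manual_seed(0)
model = torch.nn.Linear(20, 5)

# Victim computes and "shares" gradients on its private example.
x_true = torch.randn(1, 20)
y_true = torch.tensor([3])
loss = F.cross_entropy(model(x_true), y_true)
true_grads = torch.autograd.grad(loss, model.parameters())

# Attacker knows the model and the shared gradients, not the data.
dummy_x = torch.randn(1, 20, requires_grad=True)
dummy_y = torch.randn(1, 5, requires_grad=True)          # soft label, as in DLG
opt = torch.optim.LBFGS([dummy_x, dummy_y])

def closure():
    opt.zero_grad()
    pred = model(dummy_x)
    dummy_loss = torch.sum(-F.softmax(dummy_y, dim=-1) * F.log_softmax(pred, dim=-1))
    dummy_grads = torch.autograd.grad(dummy_loss, model.parameters(), create_graph=True)
    grad_diff = sum(((dg - tg) ** 2).sum() for dg, tg in zip(dummy_grads, true_grads))
    grad_diff.backward()
    return grad_diff

for _ in range(30):
    opt.step(closure)
print(torch.norm(dummy_x - x_true))   # shrinks as the reconstruction improves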

iDLG: Improved Deep Leakage from Gradients

This paper finds that sharing gradients definitely leaks the ground-truth labels and proposes a simple but reliable approach to extract accurate data from the gradients, which is valid for any differentiable model trained with cross-entropy loss over one-hot labels and is named Improved DLG (iDLG).
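The core iDLG observation can be checked in a few lines of numpy: with cross-entropy over a one-hot label, the gradient with respect to the logits is softmax(z) − y, so the ground-truth class is the only index with a negative entry; the same sign structure shows up in the shared last-layer weight gradients when the penultimate activations are non-negative, which is what lets the label be read off without any optimization.

# Toy check of the iDLG sign argument for cross-entropy with a one-hot label.
import numpy as np

rng = np.random.default_rng(0)
logits = rng.normal(size=7)
true_label = 4
onehot = np.eye(7)[true_label]

softmax = np.exp(logits) / np.exp(logits).sum()
grad_logits = softmax - onehot           # dL/dz for cross-entropy with one-hot y

recovered = int(np.argmin(grad_logits))  # the only negative entry marks the label
assert recovered == true_label
print(grad_logits)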

Calibrating Noise to Sensitivity in Private Data Analysis

The study is extended to general functions f, proving that privacy can be preserved by calibrating the standard deviation of the noise according to the sensitivity of the function f, which is the amount that any single argument to f can change its output.
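This is the result behind the Laplace mechanism; a one-query sketch: release f(D) plus Laplace noise with scale sensitivity/ε.

# Laplace mechanism: noise scale calibrated to the query's sensitivity.
import numpy as np

def laplace_mechanism(value, sensitivity, eps, rng):
    """epsilon-differentially-private release of a scalar query."""
    return value + rng.laplace(loc=0.0, scale=sensitivity / eps)

rng = np.random.default_rng(0)
ages = np.array([23, 35, 41, 29, 52])

# Counting query: any single person changes the count by at most 1.
private_count = laplace_mechanism(len(ages), sensitivity=1.0, eps=0.5, rng=rng)
print(private_count)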

Privacy enabled Financial Text Classification using Differential Privacy and Federated Learning

This work proposes a contextualized transformer (BERT and RoBERTa) based text classification model integrated with privacy features like Differential Privacy (DP) and Federated Learning (FL) and evaluates it on the Financial Phrase Bank dataset.
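A toy sketch of one round of the generic DP-FedAvg-style recipe such a system builds on (not the paper's exact configuration): clip each client's update, average, and add Gaussian noise to the aggregate.

# One federated-averaging round with a simple differential-privacy step (toy sizes).
import numpy as np

def dp_fedavg_round(global_weights, client_updates, clip_norm, noise_std, rng):
    clipped = []
    for delta in client_updates:
        norm = np.linalg.norm(delta)
        clipped.append(delta * min(1.0, clip_norm / (norm + 1e-12)))  # bound each update
    avg = np.mean(clipped, axis=0)
    avg += rng.normal(scale=noise_std, size=avg.shape)                # mask individual contributions
    return global_weights + avg

rng = np.random.default_rng(0)
w = np.zeros(10)
updates = [rng.normal(size=10) for _ in range(8)]        # one update per client
w = dp_fedavg_round(w, updates, clip_norm=1.0, noise_std=0.1, rng=rng)
print(w)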