Differentially Private Bias-Term only Fine-tuning of Foundation Models

Zhiqi Bu, Yu-Xiang Wang, Sheng Zha, George Karypis

We study the problem of differentially private (DP) fine-tuning of large pre-trained models, a recent privacy-preserving approach suitable for solving downstream tasks with sensitive data. Existing work has demonstrated that high accuracy is possible under strong privacy constraints, yet requires significant computational overhead or modifications to the network architecture. We propose differentially private bias-term fine-tuning (DP-BiTFiT), which matches the state-of-the-art accuracy for DP…

Differentially Private Image Classification from Features

It is found that, while the commonly used logistic regression performs better than linear regression in the non-private setting, the situation is reversed in the private setting. A novel optimization algorithm called DP-FC is proposed, which leverages feature covariance instead of the Hessian of the logistic regression loss and performs well across all ε values the authors tried.



Differentially Private Fine-tuning of Language Models

We give simpler, sparser, and faster algorithms for differentially private fine-tuning of large-scale pre-trained language models, which achieve the state-of-the-art privacy versus utility tradeoffs on

Unlocking High-Accuracy Differentially Private Image Classification through Scale

It is demonstrated that DP-SGD on over-parameterized models can perform significantly better than previously thought and is believed to be a step towards closing the accuracy gap between private and non-private image classification benchmarks.

Large Language Models Can Be Strong Differentially Private Learners

Empirical results reveal that private learning with pretrained language models tends not to suffer from dimension-dependent performance degradation, contrary to the conventional wisdom that DP optimization fails at learning high-dimensional models, and is on par with or outperforms other methods that execute gradient updates in low-dimensional spaces.

Scaling up Differentially Private Deep Learning with Fast Per-Example Gradient Clipping

New methods for per-example gradient clipping that are compatible with auto-differentiation and provide better GPU utilization are derived by analyzing the back-propagation equations of Rényi Differential Privacy.

Automatic Clipping: Differentially Private Deep Learning Made Easier and Stronger

This work proposes an easy-to-use replacement, called AutoClipping, that eliminates the need to tune the clipping threshold R for any DP optimizer, including DP-SGD, DP-Adam, DP-LAMB and many others. It shows that automatic clipping outperforms or matches the state of the art and can be easily employed with minimal changes to existing codebases.

Large Scale Transfer Learning for Differentially Private Image Classification

This work zooms in on the ImageNet dataset and demonstrates that, similar to the non-private case, pre-training over-parameterized models on a large public dataset can lead to substantial gains when the model is fine-tuned privately, by systematically comparing private and non-private models across a range of huge batch sizes.

Do Not Let Privacy Overbill Utility: Gradient Embedding Perturbation for Private Learning

An algorithm, Gradient Embedding Perturbation, is proposed for training differentially private deep models with decent accuracy under a modest privacy guarantee, which greatly helps to break the dimensional barrier of private learning.

Large Scale Private Learning via Low-rank Reparametrization

This work is the first able to apply differential privacy on the BERT model and achieve an average accuracy of 83.9% on four downstream tasks with ε = 8, which is within 5% loss compared to the non-private baseline but enjoys much lower privacy leakage risk.

Toward Training at ImageNet Scale with Differential Privacy

Initial lessons from this effort to investigate how to perform differentially private training at scale are shared, showing approaches that help make DP training faster, as well as model types and settings of the training process that tend to work better in the DP setting.

Normalized/Clipped SGD with Perturbation for Differentially Private Non-Convex Optimization

This paper studies two algorithms, DP-SGD and DP-NSGD, which clip or normalize per-sample gradients to bound the sensitivity and then add noise to obfuscate the exact information, and demonstrates that these two algorithms achieve similar best accuracy.
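
The difference between clipping and normalizing per-sample gradients can be illustrated with a minimal sketch in plain Python. This is not the paper's implementation: the function names, the `regularizer` term added to the norm, and the way the noise scale is set (`noise_multiplier * sensitivity`) are assumptions chosen for illustration only.

```python
import math
import random


def l2_norm(vec):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(x * x for x in vec))


def clip(grad, threshold):
    """Scale grad down only if its L2 norm exceeds `threshold`."""
    norm = l2_norm(grad)
    scale = min(1.0, threshold / norm) if norm > 0 else 1.0
    return [scale * x for x in grad]


def normalize(grad, regularizer):
    """Divide grad by (norm + regularizer), so the result has norm below 1.

    `regularizer` is an assumed small positive constant that avoids
    division by zero; it is not taken from the summary above.
    """
    norm = l2_norm(grad)
    return [x / (norm + regularizer) for x in grad]


def private_mean_gradient(per_sample_grads, transform, sensitivity,
                          noise_multiplier, rng):
    """Bound each per-sample gradient, sum, add Gaussian noise, average."""
    bounded = [transform(g) for g in per_sample_grads]
    dim = len(bounded[0])
    summed = [sum(g[i] for g in bounded) for i in range(dim)]
    # Assumed convention: noise standard deviation scales with the
    # per-sample sensitivity bound enforced by `transform`.
    sigma = noise_multiplier * sensitivity
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [x / len(per_sample_grads) for x in noisy]


rng = random.Random(0)
grads = [[3.0, 4.0], [0.3, 0.4], [0.0, 0.0]]
clipped_update = private_mean_gradient(
    grads, lambda g: clip(g, 1.0), 1.0, 1.0, rng)
normalized_update = private_mean_gradient(
    grads, lambda g: normalize(g, 0.01), 1.0, 1.0, rng)
```

Clipping leaves small gradients unchanged, whereas normalization rescales every gradient to roughly unit norm. In both cases each sample's contribution is bounded before noise is added.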