Privacy Leakage of Adversarial Training Models in Federated Learning Systems

Jingyang Zhang, Yiran Chen, and Hai Helen Li. In 2022 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).
Adversarial Training (AT) is crucial for obtaining deep neural networks that are robust to adversarial attacks, yet recent works found that it could also make models more vulnerable to privacy attacks. In this work, we further reveal this unsettling property of AT by designing a novel privacy attack that is practically applicable to the privacy-sensitive Federated Learning (FL) systems. Using our method, the attacker can exploit AT models in the FL system to accurately reconstruct users… 

Robust or Private? Adversarial Training Makes Models More Vulnerable to Privacy Attacks

This work demonstrates how model inversion attacks, which extract training data directly from the model and were previously thought to be intractable, become feasible when attacking a robustly trained model.

Inverting Gradients - How easy is it to break privacy in federated learning?

It is shown that it is actually possible to faithfully reconstruct images at high resolution from knowledge of their parameter gradients, demonstrating that such a break of privacy is possible even for trained deep networks.
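For deep networks, gradient-inversion attacks like the one above recover inputs by optimizing a dummy input until its gradients match the observed ones. The underlying leakage can be shown in closed form in a toy setting: for a single linear classifier layer with cross-entropy loss, the shared gradients determine the private input exactly. The sketch below is a minimal illustration under that assumption, not the paper's method; all names and dimensions are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Hypothetical toy setup: a single linear classifier layer W x + b with
# cross-entropy loss; the "client" computes a gradient on one private input.
d, k = 8, 3
W = rng.normal(size=(k, d))
b = rng.normal(size=k)
x_private = rng.normal(size=d)   # the secret training example
y = 1                            # its ground-truth label index

p = softmax(W @ x_private + b)
e = p.copy()
e[y] -= 1.0                      # dL/dlogits = p - onehot(y)
grad_W = np.outer(e, x_private)  # dL/dW = (p - onehot(y)) x^T  (shared)
grad_b = e                       # dL/db = p - onehot(y)        (shared)

# Attacker: row i of grad_W equals grad_b[i] * x, so the private input is
# recovered exactly from the shared gradients -- no optimization needed here.
i = int(np.argmax(np.abs(grad_b)))
x_reconstructed = grad_W[i] / grad_b[i]

print(np.allclose(x_reconstructed, x_private))  # True
```

For multi-layer networks this closed form no longer exists, which is why practical attacks instead minimize a distance (e.g. cosine) between dummy and observed gradients.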

Towards Deep Learning Models Resistant to Adversarial Attacks

This work studies the adversarial robustness of neural networks through the lens of robust optimization, and suggests the notion of security against a first-order adversary as a natural and broad security guarantee.
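The "first-order adversary" referred to above is typically instantiated as Projected Gradient Descent (PGD) on the input. A minimal sketch of an L-infinity PGD attack, using a toy binary logistic-regression model as a stand-in (an assumption for illustration; the dimensions and step sizes are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical toy model: binary logistic regression p = sigmoid(w.x + b)
# with true label y = 1 and cross-entropy loss.
d = 16
w = rng.normal(size=d)
bias = 0.0
x = rng.normal(size=d)           # clean input

def bce_loss(x_in):
    # cross-entropy loss for label y = 1
    return -np.log(sigmoid(w @ x_in + bias))

def loss_grad_x(x_in):
    # dL/dx = (p - y) * w for binary cross-entropy with y = 1
    return (sigmoid(w @ x_in + bias) - 1.0) * w

# PGD: repeatedly step in the sign of the input gradient (ascending the
# loss), then project back into an L-infinity ball of radius eps around x.
eps, alpha, steps = 0.3, 0.05, 40
x_adv = x.copy()
for _ in range(steps):
    x_adv = x_adv + alpha * np.sign(loss_grad_x(x_adv))  # ascent step
    x_adv = np.clip(x_adv, x - eps, x + eps)             # projection

print(bce_loss(x_adv) > bce_loss(x))  # True: the perturbation raises the loss
```

Adversarial training then minimizes the loss on such worst-case perturbed inputs instead of the clean ones.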

Privacy Risks of Securing Machine Learning Models against Adversarial Examples

This paper measures the success of membership inference attacks against six state-of-the-art defense methods that mitigate the risk of adversarial examples, and proposes two new inference methods that exploit structural properties of robust models on adversarially perturbed data.

RobustBench: a standardized adversarial robustness benchmark

This work evaluates the robustness of models in its benchmark with AutoAttack, an ensemble of white-box and black-box attacks recently shown in a large-scale study to improve almost all robustness evaluations compared to the original publications.

Do Adversarially Robust ImageNet Models Transfer Better?

It is found that adversarially robust models, while less accurate, often perform better than their standard-trained counterparts when used for transfer learning; this work focuses on adversarially robust ImageNet classifiers.

Membership Inference Attacks Against Machine Learning Models

This work quantitatively investigates how machine learning models leak information about the individual data records on which they were trained and empirically evaluates the inference techniques on classification models trained by commercial "machine learning as a service" providers such as Google and Amazon.

Adversarial Robustness as a Prior for Learned Representations

This work shows that robust optimization can be re-cast as a tool for enforcing priors on the features learned by deep neural networks, and indicates adversarial robustness as a promising avenue for improving learned representations.

iDLG: Improved Deep Leakage from Gradients

This paper finds that sharing gradients definitively leaks the ground-truth labels and proposes a simple but reliable approach to extract accurate data from the gradients. The approach, named Improved DLG (iDLG), is valid for any differentiable model trained with cross-entropy loss over one-hot labels.
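iDLG's core observation is that with cross-entropy over one-hot labels, the gradient with respect to the logits is p - onehot(y), which is negative only at the ground-truth index, so the label can be read off the sign pattern of the last layer's weight gradient. A minimal numpy sketch under toy assumptions (a hypothetical ReLU feature vector feeding one linear layer; sizes and names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Hypothetical toy network: non-negative ReLU features h feed a final
# linear layer.  Row j of the last weight matrix has gradient
# (p_j - y_j) * h; since h >= 0 and p_j > 0, only the ground-truth row
# is non-positive -- the sign pattern iDLG exploits.
feat, k = 16, 5
h = np.maximum(rng.normal(size=feat), 0.0)  # ReLU features, h >= 0
W = rng.normal(size=(k, feat))
y_true = 3

p = softmax(W @ h)
e = p.copy()
e[y_true] -= 1.0
grad_W = np.outer(e, h)                     # shared last-layer gradient

# Attacker: the label is the row whose gradient entries sum to a
# negative value (all other rows sum to a positive value).
y_leaked = int(np.argmin(grad_W.sum(axis=1)))
print(y_leaked)  # 3, the ground-truth label
```

With the label fixed analytically, the remaining DLG-style optimization over the dummy input becomes easier and more accurate.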

Model Inversion Attacks that Exploit Confidence Information and Basic Countermeasures

A new class of model inversion attack is developed that exploits confidence values revealed along with predictions; it is able to estimate whether a respondent in a lifestyle survey admitted to cheating on their significant other, and to recover recognizable images of people's faces given only their names.