• Corpus ID: 231942635

Label Leakage and Protection in Two-party Split Learning

  • Authors: Oscar Li, Jiankai Sun, Xin Yang, Weihao Gao, Hongyi Zhang, Junyuan Xie, Virginia Smith, Chong Wang
  • Published 17 February 2021
  • Computer Science
  • ArXiv
Two-party split learning is a popular technique for learning a model across feature-partitioned data. In this work, we explore whether it is possible for one party to steal the private label information from the other party during split training, and whether there are methods that can protect against such attacks. Specifically, we first formulate a realistic threat model and propose a privacy loss metric to quantify label leakage in split learning. We then show that there exist two simple yet e… 
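Although the abstract is truncated here, the attack setting it describes can be illustrated with a minimal sketch of a norm-scoring attack: in an imbalanced binary task with a sigmoid/BCE head, the per-example cut-layer gradient tends to be much larger for rare positive examples, so the non-label party can threshold gradient norms to guess labels. Everything below (the synthetic logits, the 10% positive rate, the midpoint threshold) is an illustrative assumption, not the paper's exact procedure:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic imbalanced binary task: roughly 10% positives.
n = 1000
y = (rng.random(n) < 0.1).astype(float)

# An under-trained model scores everything near the negative base
# rate, so positives incur large loss and large gradients.
logits = rng.normal(loc=-2.0, scale=0.5, size=n)

# Per-example gradient of BCE w.r.t. the logit: sigmoid(z) - y.
g = 1.0 / (1.0 + np.exp(-logits)) - y
grad_norm = np.abs(g)  # scalar cut layer, so the norm is just |g|

# Norm attack: predict "positive" when the gradient norm exceeds a
# threshold (here the midpoint between the two class means).
thr = 0.5 * (grad_norm[y == 1].mean() + grad_norm[y == 0].mean())
accuracy = ((grad_norm > thr).astype(float) == y).mean()
```

With the classes this well separated in gradient magnitude, the thresholding recovers nearly every label, which is what makes the leakage dangerous.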


Differentially Private Label Protection in Split Learning
This work proposes TPSL (Transcript Private Split Learning), a generic gradient perturbation based split learning framework that provides provable differential privacy guarantee and is found to have a better utility-privacy trade-off than baselines.
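As a rough illustration of the gradient-perturbation idea (not TPSL's actual mechanism or its privacy accounting), one can clip each cut-layer gradient and add Gaussian noise before it crosses the party boundary; the `clip` and `sigma` values below are arbitrary illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(1)

def perturb_gradient(g, clip=1.0, sigma=2.0):
    """Clip the cut-layer gradient to norm `clip`, then add Gaussian
    noise, so the transcript sent to the other party is randomized."""
    norm = np.linalg.norm(g)
    g_clipped = g * min(1.0, clip / max(norm, 1e-12))
    return g_clipped + rng.normal(0.0, sigma * clip, size=g.shape)

g = np.array([3.0, 4.0])     # norm 5, clipped down to norm 1
noisy = perturb_gradient(g)  # what the non-label party receives
```

Clipping bounds each example's influence and the noise scale is calibrated to that bound, which is the standard Gaussian-mechanism recipe behind such guarantees.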
Gradient Inversion Attack: Leaking Private Labels in Two-Party Split Learning
This paper proposes Gradient Inversion Attack (GIA), a label leakage attack that allows an adversarial input owner to learn the label owner’s private labels by exploiting the gradient information obtained during split learning.
ExPLoit: Extracting Private Labels in Split Learning
ExPLoit is proposed – a label-leakage attack that allows an adversarial input-owner to extract the private labels of the label-owner during split-learning using a novel loss function that combines gradient-matching and several regularization terms developed using key properties of the dataset and models.
Clustering Label Inference Attack against Practical Split Learning
Experimental results validate that the proposed approach is scalable and robust under different settings (e.g., cut layer positions, epochs, and batch sizes) for practical split learning.
Defending Label Inference and Backdoor Attacks in Vertical Federated Learning
It is shown that private labels can be reconstructed even when only batch-averaged gradients, rather than sample-level gradients, are revealed, and it is demonstrated that label inference attacks can be successfully blocked by this technique without hurting main task accuracy as much as existing methods do.
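The batch-averaged leakage can be seen from a simple algebraic identity for a softmax/cross-entropy head: the averaged last-layer bias gradient equals the mean predicted probabilities minus the per-class label frequencies. The sketch below verifies the identity using the true probabilities; a real attacker would have to estimate them (e.g. near-uniform at initialization), so this is an illustration rather than a full attack:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(3)
B, k = 8, 4
labels = rng.integers(0, k, size=B)
probs = softmax(rng.normal(size=(B, k)))

# Batch-averaged last-layer bias gradient for cross-entropy:
# mean over examples of (p_i - onehot(y_i)).
g = (probs - np.eye(k)[labels]).mean(axis=0)

# Rearranging recovers per-class label counts from the shared
# gradient, given (an estimate of) the mean predicted probabilities.
counts = np.round(B * (probs.mean(axis=0) - g)).astype(int)
```

Here `counts` matches the true per-class label counts of the batch exactly, because the true probabilities were plugged in; estimation error in practice degrades but does not eliminate the leakage.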
UnSplit: Data-Oblivious Model Inversion, Model Stealing, and Label Inference Attacks Against Split Learning
It is shown that an honest-but-curious split learning server, equipped only with the knowledge of the client neural network architecture, can recover the input samples and obtain a functionally similar model to the client model, without the client being able to detect the attack.
Residue-based Label Protection Mechanisms in Vertical Logistic Regression
Experimental results show that both the additive and multiplicative noise mechanisms achieve label protection with only a slight drop in model testing accuracy, and that the hybrid mechanism achieves label protection without any testing accuracy degradation, demonstrating the effectiveness of the protection techniques.
User-Level Label Leakage from Gradients in Federated Learning
This work investigates Label Leakage from Gradients (LLG), a novel attack to extract the labels of the users’ training data from their shared gradients, and suggests that gradient compression is a practical technique to mitigate the attack.
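The direction cue that LLG exploits can be seen in miniature for a single example with a softmax/cross-entropy head: the last-layer bias gradient is p − onehot(y), so only the true class receives a negative entry. A minimal sketch (the logits and label below are arbitrary):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Single-example cross-entropy: d(loss)/d(logit_j) = p_j - [j == y],
# so only the true class gets a negative gradient entry.
logits = np.array([1.0, -0.5, 2.0])
y = 1
grad_bias = softmax(logits) - np.eye(3)[y]

# The attacker reads the label off the sign and magnitude of the
# shared last-layer gradient.
inferred = int(np.argmin(grad_bias))
```

Batching and gradient compression blur this signal, which is why the cited work combines direction with magnitude and why compression doubles as a mitigation.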
Batch Label Inference and Replacement Attacks in Black-Boxed Vertical Federated Learning
This paper explores the possibility of recovering labels in the vertical federated learning setting with HE-protected communication, shows that private labels can be reconstructed with high accuracy by training a gradient inversion model, and demonstrates that label inference and replacement attacks can be successfully blocked by this technique without hurting main task accuracy as much as existing methods do.
User Label Leakage from Gradients in Federated Learning
This work proposes Label Leakage from Gradients (LLG), a novel attack that extracts the labels of the users' training data from their shared gradients by exploiting the direction and magnitude of the gradients to determine the presence or absence of any label.


Practical Secure Aggregation for Privacy-Preserving Machine Learning
This protocol allows a server to compute the sum of large, user-held data vectors from mobile devices in a secure manner, and can be used, for example, in a federated learning setting, to aggregate user-provided model updates for a deep neural network.
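The core cancellation trick behind such secure aggregation can be sketched with pairwise additive masks: each pair of users agrees on a random mask that one adds and the other subtracts, so individual updates look random to the server while the sum stays exact. This toy version omits the real protocol's key agreement, secret sharing, and dropout handling:

```python
import numpy as np

rng = np.random.default_rng(42)
n_users, dim = 4, 3
updates = rng.normal(size=(n_users, dim))  # private model updates

# Pairwise masks r_ij (derived from shared secrets in the real
# protocol): user i adds r_ij, user j subtracts it.
masks = {(i, j): rng.normal(size=dim)
         for i in range(n_users) for j in range(i + 1, n_users)}

masked = updates.copy()
for (i, j), r in masks.items():
    masked[i] += r
    masked[j] -= r

# The server sees only masked vectors, yet their sum is exact
# because every mask appears once with each sign.
aggregate = masked.sum(axis=0)
```

The hard part the paper actually solves is making this robust when users drop out mid-protocol, which the sketch deliberately ignores.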
A Differentially Private Stochastic Gradient Descent Algorithm for Multiparty Classification
A new differentially private algorithm is proposed for the multiparty setting that uses a stochastic gradient descent based procedure to directly optimize the overall multiparty objective, rather than combining classifiers learned from optimizing local objectives.
Sample Complexity Bounds for Differentially Private Learning
An upper bound on the sample requirement of learning with label privacy is provided that depends on a measure of closeness to the unlabeled data distribution and applies to the non-realizable as well as the realizable case.
Protection Against Reconstruction and Its Applications in Private Federated Learning
In large-scale statistical learning, data collection and model fitting are moving increasingly toward peripheral devices---phones, watches, fitness trackers---away from centralized data collection.
Differentially Private Federated Learning: A Client Level Perspective
The aim is to hide clients' contributions during training, balancing the trade-off between privacy loss and model performance; empirical studies suggest that, given a sufficiently large number of participating clients, this procedure can maintain client-level differential privacy at only a minor cost in model performance.
On Deep Learning with Label Differential Privacy
A novel algorithm, Randomized Response with Prior (RRWithPrior), is proposed, which can provide more accurate results while maintaining the same level of privacy guaranteed by RR, and is applied to learn neural networks with label differential privacy (LabelDP).
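Plain k-ary randomized response (RR), which RRWithPrior refines by incorporating prior information, can be sketched as follows; the label, `k`, and `eps` values below are arbitrary illustrative choices:

```python
import numpy as np

def randomized_response(y, k, eps, rng):
    """k-ary randomized response: keep the true label with probability
    exp(eps) / (exp(eps) + k - 1), otherwise emit a uniformly random
    *other* label. This satisfies eps label differential privacy."""
    p_keep = np.exp(eps) / (np.exp(eps) + k - 1)
    if rng.random() < p_keep:
        return y
    return int(rng.choice([c for c in range(k) if c != y]))

rng = np.random.default_rng(0)
out = [randomized_response(2, k=10, eps=4.0, rng=rng) for _ in range(1000)]
```

With eps = 4 and k = 10 the true label survives about 86% of the time; RRWithPrior improves on this utility at the same eps by concentrating the randomization on a-priori plausible labels.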
Scalable Private Set Intersection Based on OT Extension
This article focuses on PSI protocols that are secure against semi-honest adversaries and take advantage of the most recent efficiency improvements in Oblivious Transfer (OT) extension, proposes significant optimizations to previous PSI protocols, and suggests a new PSI protocol whose runtime is superior to that of existing protocols.
Secure Logistic Regression Based on Homomorphic Encryption: Design and Evaluation
The first homomorphically encrypted logistic regression outsourcing model is presented, based on the critical observation that the precision loss of classification models is sufficiently small that the decision plane stays the same, providing practical support for mainstream learning models.
Distributed Differential Privacy via Shuffling
Evidence that the power of the shuffled model lies strictly between those of the central and local models is given: for a natural restriction of the model, it is shown that shuffled protocols for a widely studied selection problem require exponentially higher sample complexity than do central-model protocols.
A General Approach to Adding Differential Privacy to Iterative Training Procedures
In this work we address the practical challenges of training machine learning models on privacy-sensitive datasets by introducing a modular approach that minimizes changes to training algorithms.