• Corpus ID: 231861410

CaPC Learning: Confidential and Private Collaborative Learning

Christopher A. Choquette-Choo, Natalie Dullerud, Adam Dziedzic, Yunxiang Zhang, Somesh Jha, Nicolas Papernot, Xiao Wang
Machine learning benefits from large training datasets, which may not always be possible to collect by any single entity, especially when using privacy-sensitive data. In many contexts, such as healthcare and finance, separate parties may wish to collaborate and learn from each other’s data but are prevented from doing so due to privacy regulations. Some regulations prevent explicit sharing of data between parties by joining datasets in a central location (confidentiality). Others also limit… 

Private, Efficient, and Accurate: Protecting Models Trained by Multi-party Learning with Differential Privacy

PEA (Private, Efficient, Accurate) is a solution that consists of a secure differentially private stochastic gradient descent (DPSGD) protocol and two optimization methods, and is implemented in two open-source MPL frameworks.


  • Computer Science
  • 2021
Three new privacy-preserving multi-label mechanisms are proposed (Binary, τ, and Powerset voting), which enable multi-label CaPC; these mechanisms are shown to collaboratively improve models in a multi-site (distributed) setting.

SafeNet: The Unreasonable Effectiveness of Ensembles in Private Collaborative Learning

It is argued that model ensembles, implemented in the framework called SafeNet, are a highly MPC-amenable way to avoid many adversarial ML attacks, and that the simplicity, cheap setup, and robustness properties of ensembling make it a strong choice for training ML models privately in MPC.

ProgFed: Effective, Communication, and Computation Efficient Federated Learning by Progressive Training

Extensive results on a broad range of architectures, including CNNs, ResNet, ConvNets, and U-nets, and diverse tasks from simple classification to medical image segmentation show that the ProgFed approach saves up to 20% computation and up to 63% communication costs for converged models.

A Comprehensive Survey of Privacy-preserving Federated Learning

A comprehensive and systematic survey of PPFL based on the proposed 5W-scenario-based taxonomy is presented, which analyzes the privacy leakage risks in FL from five aspects, summarizes existing methods, and identifies future research directions.

In Differential Privacy, There is Truth: On Vote Leakage in Ensemble Private Learning

This work observes that the noise PATE adds, which makes its predictions stochastic, enables new forms of leakage of sensitive information, and encourages future work to consider privacy holistically rather than treat differential privacy as a panacea.

Private and Reliable Neural Network Inference

This work presents Phoenix, the first system that enables privacy-preserving NN inference with robustness and fairness guarantees, and is believed to be the first work to bridge the areas of client data privacy and reliability guarantees for NNs.

Privacy-Preserving Federated Recurrent Neural Networks

RHODE is the first system that provides the building blocks for training RNNs and their variants under encryption in a federated learning setting, and a novel packing scheme, multi-dimensional packing, is proposed for better utilization of Single Instruction, Multiple Data operations under encryption.

Disparate Impact in Differential Privacy from Gradient Misalignment

This work studies the causes of unfairness in DPSGD and identifies gradient misalignment due to inequitable gradient clipping as the most significant source, which leads to a new method for reducing unfairness by preventing gradient misalignment in DPSGD.
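As a rough illustration of where clipping enters DPSGD, the sketch below clips each per-example gradient to a fixed L2 norm, sums the clipped gradients, adds Gaussian noise, and averages. It follows the standard DPSGD recipe rather than the method of the paper summarized above; the names `clip_gradient`, `dp_sgd_gradient`, `clip_norm`, and `noise_multiplier` are assumptions chosen for this example.

```python
import math
import random


def clip_gradient(grad, clip_norm):
    """Scale a gradient so its L2 norm is at most clip_norm (hypothetical helper)."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_sgd_gradient(per_example_grads, clip_norm, noise_multiplier, rng=None):
    """Clip per-example gradients, sum them, add Gaussian noise, then average.

    The noise standard deviation is noise_multiplier * clip_norm, as in the
    usual DPSGD formulation. This is a sketch, not a vetted private implementation.
    """
    rng = rng or random.Random()
    clipped = [clip_gradient(g, clip_norm) for g in per_example_grads]
    dim = len(per_example_grads[0])
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    noisy = [s + rng.gauss(0.0, noise_multiplier * clip_norm) for s in summed]
    return [v / len(per_example_grads) for v in noisy]


# Two examples: the first has a large gradient and is scaled down by clipping,
# the second is already within the bound and passes through unchanged.
grads = [[3.0, 4.0], [0.0, 0.5]]
private_grad = dp_sgd_gradient(grads, clip_norm=1.0, noise_multiplier=1.1)
```

Because only the large gradient is rescaled, the averaged direction differs from the unclipped average, which is one way to picture the unequal clipping that the summary refers to.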

SafeNet: Mitigating Data Poisoning Attacks on Private Machine Learning

The SafeNet framework for building ensemble models in MPC with formal guarantees of robustness to data poisoning attacks is proposed and demonstrated; it reduces backdoor attack success from 100% to 0% for a neural network model while achieving 39× faster training and 36× less communication than the four-party MPC framework of Dalskov et al.



Semi-supervised Knowledge Transfer for Deep Learning from Private Training Data

Private Aggregation of Teacher Ensembles (PATE) is demonstrated; it combines, in a black-box fashion, multiple models trained with disjoint datasets, such as records from different subsets of users, and achieves state-of-the-art privacy/utility trade-offs on MNIST and SVHN.

Scalable Private Learning with PATE

This work shows how PATE can scale to learning tasks with large numbers of output classes and uncurated, imbalanced training data with errors, introduces new noisy aggregation mechanisms for teacher ensembles that are more selective and add less noise, and proves their tighter differential-privacy guarantees.
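For orientation, noisy aggregation of teacher votes can be sketched as a noisy argmax: count the teachers' predicted labels, add independent Gaussian noise to each count, and release only the winning class. This is a minimal sketch of the basic idea, not the more selective mechanisms the paper introduces; the function name `noisy_argmax` and the parameter `sigma` are assumptions made for this example.

```python
import random
from collections import Counter


def noisy_argmax(teacher_votes, num_classes, sigma, rng=None):
    """Return the class whose vote count is largest after adding Gaussian noise.

    teacher_votes: one predicted class index per teacher model.
    sigma: standard deviation of the noise added to every class count (assumed name).
    """
    rng = rng or random.Random()
    counts = Counter(teacher_votes)
    noisy_counts = [counts.get(c, 0) + rng.gauss(0.0, sigma) for c in range(num_classes)]
    return max(range(num_classes), key=lambda c: noisy_counts[c])


# Five teachers vote on a 3-class query; only the noisy winner is released.
label = noisy_argmax([0, 1, 1, 2, 1], num_classes=3, sigma=2.0)
```

When the teachers agree strongly, the noise rarely changes the outcome; when the vote is close, the released label can differ between runs.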

InstaHide: Instance-hiding Schemes for Private Distributed Learning

InstaHide, a simple encryption of training images that can be plugged into existing distributed deep learning pipelines, is introduced; it is efficient, and applying it during training has only a minor effect on test accuracy.

Federated learning in medicine: facilitating multi-institutional collaborations without sharing patient data

It is shown that federated learning among 10 institutions results in models reaching 99% of the model quality achieved with centralized data, and the effects of data distribution across collaborating institutions on model quality and learning patterns are investigated.

SecureML: A System for Scalable Privacy-Preserving Machine Learning

This paper presents new and efficient protocols for privacy preserving machine learning for linear regression, logistic regression and neural network training using the stochastic gradient descent method, and implements the first privacy preserving system for training neural networks.

MP2ML: a mixed-protocol machine learning framework for private inference

MP2ML is a machine learning framework which integrates nGraph-HE and the secure two-party computation framework ABY to execute DL inference while maintaining the privacy of both the input data and model weights and is compatible with popular DL frameworks such as TensorFlow.

CryptoNets: applying neural networks to encrypted data with high throughput and accuracy

It is shown that the cloud service is capable of applying the neural network to the encrypted data to make encrypted predictions, and also return them in encrypted form, which allows high throughput, accurate, and private predictions.

TextHide: Tackling Data Privacy for Language Understanding Tasks

The proposed TextHide requires all participants to add a simple encryption step to prevent an eavesdropping attacker from recovering private text data, and it fits well with the popular framework of fine-tuning pre-trained language models for any sentence or sentence-pair task.

Learning Differentially Private Recurrent Language Models

This work builds on recent advances in the training of deep networks on user-partitioned data and privacy accounting for stochastic gradient descent and adds user-level privacy protection to the federated averaging algorithm, which makes "large step" updates from user-level data.

The Algorithmic Foundations of Differential Privacy

The preponderance of this monograph is devoted to fundamental techniques for achieving differential privacy, and application of these techniques in creative combinations, using the query-release problem as an ongoing example.