Privacy-Preserving Gradient Boosting Decision Trees

@inproceedings{Li2020PrivacyPreservingGB,
  title={Privacy-Preserving Gradient Boosting Decision Trees},
  author={Q. Li and Zhaomin Wu and Zeyi Wen and Bingsheng He},
  booktitle={AAAI},
  year={2020}
}
The Gradient Boosting Decision Tree (GBDT) has been a popular machine learning model for a wide range of tasks in recent years. In this paper, we study how to improve the model accuracy of GBDT while preserving the strong guarantee of differential privacy. Sensitivity and privacy budget are two key design aspects for the effectiveness of differentially private models. Existing solutions for GBDT with differential privacy suffer from significant accuracy loss due to overly loose sensitivity bounds and ineffective… 
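The abstract is cut off above, but the two levers it names, sensitivity and privacy budget, can be made concrete with a small illustration. The Python sketch below is not the authors' algorithm; it only shows the generic pattern behind differentially private GBDT training: bound the sensitivity of each released statistic (here, a leaf value) and spend part of the privacy budget perturbing it with Laplace noise. The function name, the parameters v_max, eps_leaf, and lam, and the deliberately crude range-based sensitivity bound are illustrative assumptions, not values from the paper.

import numpy as np

def private_leaf_value(gradients, hessians, eps_leaf, v_max, lam=1.0, rng=None):
    """Illustrative differentially private leaf value for a GBDT regression tree.

    gradients, hessians: per-instance first/second-order loss derivatives
    eps_leaf: privacy budget assumed to be assigned to this leaf
    v_max: assumed clipping bound on the leaf value
    lam: L2 regularization on the leaf weight
    """
    rng = rng or np.random.default_rng()
    g = np.asarray(gradients, dtype=float)
    h = np.asarray(hessians, dtype=float)
    # Newton-style leaf value, as in XGBoost/LightGBM-style GBDTs.
    value = -g.sum() / (h.sum() + lam)
    # Clipping bounds the value's range, so adding or removing one instance
    # changes the clipped value by at most 2 * v_max. This is a deliberately
    # loose bound; deriving tighter, gradient-based sensitivity bounds is
    # precisely what the paper is about.
    value = float(np.clip(value, -v_max, v_max))
    sensitivity = 2.0 * v_max
    # Laplace mechanism: noise scale = sensitivity / epsilon gives eps_leaf-DP
    # for this single leaf release.
    return value + rng.laplace(loc=0.0, scale=sensitivity / eps_leaf)

In a full training loop the total budget would additionally have to be split across the trees of the ensemble (and, within each tree, between split selection and leaf values), which is the privacy budget allocation question the abstract points to.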

Citations

Practical Federated Gradient Boosting Decision Trees
TLDR
This paper studies a practical federated environment with relaxed privacy constraints, where a dishonest party might obtain some information about the other parties' data, but it is still impossible for the dishonest party to derive the actual raw data of other parties.
Federated Bayesian Optimization via Thompson Sampling
TLDR
Federated Thompson sampling (FTS) is presented, which overcomes a number of key challenges of FBO and FL in a principled way and provides a theoretical convergence guarantee that is robust against heterogeneous agents, a major challenge in FL and FBO.
MixBoost: A Heterogeneous Boosting Machine
TLDR
A Heterogeneous Newton Boosting Machine (HNBM), in which the base hypothesis class may vary across boosting iterations, is studied, and a particular realization of an HNBM, MixBoost, is described that achieves better generalization loss than competing boosting frameworks without taking significantly longer to tune.
A Survey on Federated Learning Systems: Vision, Hype and Reality for Data Privacy and Protection
TLDR
A comprehensive review on federated learning systems is conducted and a thorough categorization is provided according to six different aspects, including data distribution, machine learning model, privacy mechanism, communication architecture, scale of federation and motivation of federation.
Model-Agnostic Round-Optimal Federated Learning via Knowledge Transfer
TLDR
This paper proposes a novel federated learning algorithm FedKT that needs only a single communication round and can be applied to any classification model, and develops the differentially private versions of FedKT and theoretically analyzes the privacy loss.
Private Boosted Decision Trees via Smooth Re-Weighting
TLDR
This work proposes and tests a practical algorithm for boosting decision trees that guarantees differential privacy, and shows that this boosting algorithm can produce better model sparsity and accuracy than other differentially private ensemble classifiers.
SoK: Privacy-Preserving Collaborative Tree-based Model Learning
TLDR
This work surveys the literature on distributed and privacy-preserving training of tree-based models and systematizes its knowledge along four axes: the learning algorithm, the collaborative model, the protection mechanism, and the threat model, providing for the first time a framework for analyzing the information leakage that occurs in distributed tree-based model learning.
DPSNN: A Differentially Private Spiking Neural Network
TLDR
This study combines the differential privacy (DP) algorithm with SNNs and proposes the differentially private spiking neural network (DPSNN), which injects noise into the gradient; because the SNN transmits information in discrete spike trains, the differentially private SNN can maintain strong privacy protection while still ensuring high accuracy.
Federated Learning for Healthcare Domain - Pipeline, Applications and Challenges
TLDR
This paper aims to lay out existing research and list the possibilities of federated learning for healthcare industries, along with the challenges, methods, and applications a practitioner should be aware of in federated learning.
...

References

Showing 1-10 of 31 references
Practical Federated Gradient Boosting Decision Trees
TLDR
This paper studies a practical federated environment with relaxed privacy constraints, where a dishonest party might obtain some information about the other parties' data, but it is still impossible for the dishonest party to derive the actual raw data of other parties.
Privacy-preserving logistic regression
TLDR
This paper addresses the important tradeoff between privacy and learnability when designing algorithms for learning from private databases, by providing a privacy-preserving regularized logistic regression algorithm based on a new privacy-preserving technique.
Differentially private classification with decision tree ensemble
InPrivate Digging: Enabling Tree-based Distributed Data Mining with Differential Privacy
TLDR
This paper designs and implements a privacy-preserving system for gradient boosting decision tree (GBDT), where different regression trees trained by multiple data owners can be securely aggregated into an ensemble and demonstrates that the system can provide a strong privacy protection for individual data owners while maintaining the prediction accuracy of the original trained model.
Data mining with differential privacy
TLDR
This paper addresses the problem of data mining with formal privacy guarantees, given a data access interface based on the differential privacy framework, by considering the privacy and the algorithmic requirements simultaneously and focusing on decision tree induction as a sample application.
DimBoost: Boosting Gradient Boosting Decision Tree to Higher Dimensions
TLDR
This paper conducts a careful investigation of existing systems by developing a performance model with respect to the dimensionality of the data, and implements a series of optimizations to improve the performance of collective communications.
Deep Learning with Differential Privacy
TLDR
This work develops new algorithmic techniques for learning and a refined analysis of privacy costs within the framework of differential privacy, and demonstrates that deep neural networks can be trained with non-convex objectives, under a modest privacy budget, and at a manageable cost in software complexity, training efficiency, and model quality.
LightGBM: A Highly Efficient Gradient Boosting Decision Tree
TLDR
It is proved that, since data instances with larger gradients play a more important role in the computation of information gain, GOSS can obtain a quite accurate estimate of the information gain with a much smaller data size; the resulting GBDT implementation is called LightGBM.
Gradient Boosted Decision Trees for High Dimensional Sparse Output
TLDR
This paper studies gradient boosted decision trees (GBDT) when the output space is high-dimensional and sparse, and proposes a new GBDT variant, GBDT-SPARSE, to address this setting by employing L0 regularization.
Differentially private data release for data mining
TLDR
This paper proposes the first anonymization algorithm for the non-interactive setting based on the generalization technique, which first probabilistically generalizes the raw data and then adds noise to guarantee ε-differential privacy.
...