Privacy-Preserving Gradient Boosting Decision Trees
@inproceedings{Li2020PrivacyPreservingGB,
  title     = {Privacy-Preserving Gradient Boosting Decision Trees},
  author    = {Q. Li and Zhaomin Wu and Zeyi Wen and Bingsheng He},
  booktitle = {AAAI},
  year      = {2020}
}
The Gradient Boosting Decision Tree (GBDT) has been a popular machine learning model for various tasks in recent years. In this paper, we study how to improve the model accuracy of GBDT while preserving the strong guarantee of differential privacy. Sensitivity and privacy budget are two key design aspects for the effectiveness of differentially private models. Existing solutions for GBDT with differential privacy suffer from significant accuracy loss due to overly loose sensitivity bounds and ineffective…
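The two design aspects named above, tight sensitivity bounds and careful privacy-budget allocation, can be made concrete with a small sketch. The snippet below is not the paper's algorithm; it is an illustrative combination of the exponential mechanism for split selection and Laplace noise on leaf values, under the assumption that per-instance gradients are clipped to g_max. All function names and the lam regularizer are hypothetical.

```python
# Illustrative sketch only (not the paper's exact method): two differentially
# private building blocks a DP-GBDT might combine, assuming |g| <= g_max for
# every per-instance gradient so the stated sensitivities hold.
import numpy as np

def dp_choose_split(candidate_gains, epsilon, sensitivity, rng=None):
    """Exponential mechanism: pick split i with probability proportional to
    exp(epsilon * gain_i / (2 * sensitivity))."""
    if rng is None:
        rng = np.random.default_rng()
    logits = epsilon * np.asarray(candidate_gains, dtype=float) / (2.0 * sensitivity)
    logits -= logits.max()                      # numerical stability
    probs = np.exp(logits)
    return rng.choice(len(probs), p=probs / probs.sum())

def dp_leaf_value(gradients, epsilon, g_max, lam=1.0, rng=None):
    """Leaf output -sum(g) / (n + lam) plus Laplace noise, using the assumed
    sensitivity bound g_max / (1 + lam), which holds when each |g| <= g_max."""
    if rng is None:
        rng = np.random.default_rng()
    raw = -np.sum(gradients) / (len(gradients) + lam)
    return raw + rng.laplace(scale=(g_max / (1.0 + lam)) / epsilon)
```

A full training loop would additionally have to split the total budget epsilon across trees and across internal nodes versus leaves, which is exactly where budget-allocation choices affect accuracy.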
22 Citations
Federated Bayesian Optimization via Thompson Sampling
- Computer Science, NeurIPS
- 2020
Federated Thompson sampling (FTS) is presented, which overcomes a number of key challenges of federated Bayesian optimization (FBO) and federated learning (FL) in a principled way and provides a theoretical convergence guarantee that is robust against heterogeneous agents, a major challenge in FL and FBO.
MixBoost: A Heterogeneous Boosting Machine
- Computer Science, ArXiv
- 2020
A Heterogeneous Newton Boosting Machine (HNBM), in which the base hypothesis class may vary across boosting iterations, is studied, and a particular realization of an HNBM, MixBoost, is described that achieves better generalization loss than competing boosting frameworks without taking significantly longer to tune.
A Survey on Federated Learning Systems: Vision, Hype and Reality for Data Privacy and Protection
- Computer Science, IEEE Transactions on Knowledge and Data Engineering
- 2021
A comprehensive review on federated learning systems is conducted and a thorough categorization is provided according to six different aspects, including data distribution, machine learning model, privacy mechanism, communication architecture, scale of federation and motivation of federation.
Model-Agnostic Round-Optimal Federated Learning via Knowledge Transfer
- Computer Science, ArXiv
- 2020
This paper proposes a novel federated learning algorithm, FedKT, that needs only a single communication round and can be applied to any classification model; it also develops differentially private versions of FedKT and theoretically analyzes their privacy loss.
Novel hybrid models between bivariate statistics, artificial neural networks and boosting algorithms for flood susceptibility assessment.
- Engineering, Journal of Environmental Management
- 2020
Private Boosted Decision Trees via Smooth Re-Weighting
- Computer Science, ArXiv
- 2022
This work proposes and tests a practical algorithm for boosting decision trees that guarantees differential privacy, and shows that this boosting algorithm can produce better model sparsity and accuracy than other differentially private ensemble classifiers.
SoK: Privacy-Preserving Collaborative Tree-based Model Learning
- Computer Science, Proc. Priv. Enhancing Technol.
- 2021
This work surveys the literature on distributed and privacy-preserving training of tree-based models and systematizes its knowledge along four axes: the learning algorithm, the collaborative model, the protection mechanism, and the threat model, providing for the first time a framework for analyzing the information leakage that occurs in distributed tree-based model learning.
A privacy-preserving multi-agent updating framework for self-adaptive tree model
- Computer Science, Peer-to-Peer Netw. Appl.
- 2022
Constraint Enforcement on Decision Trees: a Survey
- Computer Science, ACM Computing Surveys
- 2022
A survey of works that attempt to learn decision trees under constraints is proposed, along with a flexible taxonomy of the constraints applied to decision trees and of the methods used in the literature to handle them.
Federated Learning for Healthcare Domain - Pipeline, Applications and Challenges
- Computer Science, Medicine, ACM Transactions on Computing for Healthcare
- 2022
This paper lays out existing research on federated learning for the healthcare industry and surveys the possibilities, challenges, methods, and applications that a practitioner in this area should be aware of.
References
Showing 1-10 of 31 references
Privacy-preserving logistic regression
- Computer Science, NIPS
- 2008
This paper addresses the important tradeoff between privacy and learnability when designing algorithms for learning from private databases, by providing a privacy-preserving regularized logistic regression algorithm based on a new privacy-preserving technique.
Differentially private classification with decision tree ensemble
- Computer Science, Appl. Soft Comput.
- 2018
InPrivate Digging: Enabling Tree-based Distributed Data Mining with Differential Privacy
- Computer Science, IEEE INFOCOM 2018 - IEEE Conference on Computer Communications
- 2018
This paper designs and implements a privacy-preserving system for gradient boosting decision trees (GBDT), in which regression trees trained by multiple data owners can be securely aggregated into an ensemble, and demonstrates that the system provides strong privacy protection for individual data owners while maintaining the prediction accuracy of the original trained model.
Data mining with differential privacy
- Computer Science, KDD
- 2010
This paper addresses the problem of data mining with formal privacy guarantees, given a data access interface based on the differential privacy framework; it considers the privacy and algorithmic requirements simultaneously, focusing on decision tree induction as a sample application.
DimBoost: Boosting Gradient Boosting Decision Tree to Higher Dimensions
- Computer Science, SIGMOD Conference
- 2018
This paper conducts a careful investigation of existing systems by developing a performance model with respect to the dimensionality of the data, and implements a series of optimizations to improve the performance of collective communications.
Deep Learning with Differential Privacy
- Computer Science, CCS
- 2016
This work develops new algorithmic techniques for learning and a refined analysis of privacy costs within the framework of differential privacy, and demonstrates that deep neural networks can be trained with non-convex objectives, under a modest privacy budget, and at a manageable cost in software complexity, training efficiency, and model quality.
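For context on the gradient-perturbation idea this reference describes, here is a minimal sketch of one DP-SGD-style update under the standard assumptions (per-example clipping to L2 norm clip_norm, Gaussian noise with standard deviation noise_multiplier * clip_norm); the function and its parameters are illustrative, not the paper's code.

```python
# Minimal, illustrative DP-SGD step: clip each per-example gradient to L2 norm
# clip_norm, sum the clipped gradients, add Gaussian noise calibrated to the
# clip norm, then average and take an ordinary gradient step.
import numpy as np

def dp_sgd_step(params, per_example_grads, clip_norm, noise_multiplier, lr, rng=None):
    if rng is None:
        rng = np.random.default_rng()
    clipped = [g * min(1.0, clip_norm / (np.linalg.norm(g) + 1e-12))
               for g in per_example_grads]
    noisy_sum = np.sum(clipped, axis=0) + rng.normal(
        scale=noise_multiplier * clip_norm, size=params.shape)
    return params - lr * noisy_sum / len(per_example_grads)
```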
LightGBM: A Highly Efficient Gradient Boosting Decision Tree
- Computer Science, NIPS
- 2017
It is proved that, since data instances with larger gradients play a more important role in the computation of information gain, GOSS can obtain a quite accurate estimate of the information gain from a much smaller data sample; the resulting highly efficient GBDT implementation is called LightGBM.
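As a rough sketch of the GOSS idea summarized above: keep the top-a fraction of instances by gradient magnitude, randomly sample a b fraction of the remainder, and up-weight the sampled small-gradient instances by (1 - a) / b so the information-gain estimate stays approximately unbiased. The parameter names follow the common description of GOSS and are illustrative, not LightGBM's API.

```python
# Rough sketch of Gradient-based One-Side Sampling (GOSS); not LightGBM's code.
import numpy as np

def goss_sample(gradients, a=0.2, b=0.1, rng=None):
    if rng is None:
        rng = np.random.default_rng()
    n = len(gradients)
    order = np.argsort(-np.abs(gradients))       # instances sorted by |gradient|, descending
    top_k = int(a * n)
    top_idx = order[:top_k]                      # always keep the large-gradient instances
    sampled = rng.choice(order[top_k:], size=int(b * n), replace=False)
    idx = np.concatenate([top_idx, sampled])
    weights = np.ones(len(idx))
    weights[top_k:] = (1.0 - a) / b              # compensate the down-sampled small gradients
    return idx, weights
```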
Gradient Boosted Decision Trees for High Dimensional Sparse Output
- Computer Science, ICML
- 2017
This paper studies gradient boosted decision trees (GBDT) when the output space is high-dimensional and sparse, and proposes a new GBDT variant, GBDT-SPARSE, that resolves this problem by employing L0 regularization.
Differentially private data release for data mining
- Computer Science, KDD
- 2011
This paper proposes the first anonymization algorithm for the non-interactive setting based on the generalization technique, which first probabilistically generalizes the raw data and then adds noise to guarantee ε-differential privacy.
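The noise-addition step mentioned here typically reduces to the Laplace mechanism applied to count queries on the generalized data; the snippet below is a generic illustration of that mechanism (a counting query has sensitivity 1), not the paper's algorithm.

```python
# Generic Laplace-mechanism illustration: adding Laplace(1/epsilon) noise to a
# count (sensitivity 1) yields an epsilon-differentially-private release.
import numpy as np

def noisy_count(true_count, epsilon, rng=None):
    if rng is None:
        rng = np.random.default_rng()
    return true_count + rng.laplace(scale=1.0 / epsilon)
```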
Privacy-preserving deep learning
- Computer Science, 2015 53rd Annual Allerton Conference on Communication, Control, and Computing (Allerton)
- 2015
This paper presents a practical system that enables multiple parties to jointly learn an accurate neural-network model for a given objective without sharing their input datasets, and exploits the fact that the optimization algorithms used in modern deep learning, namely, those based on stochastic gradient descent, can be parallelized and executed asynchronously.