Corpus ID: 239050036

User-Level Private Learning via Correlated Sampling

@article{Ghazi2021UserLevelPL,
  title={User-Level Private Learning via Correlated Sampling},
  author={Badih Ghazi and Ravi Kumar and Pasin Manurangsi},
  journal={ArXiv},
  year={2021},
  volume={abs/2110.11208}
}
Most works on learning with differential privacy (DP) have focused on the setting where each user has a single sample. In this work, we consider the setting where each user holds m samples and the privacy protection is enforced at the level of each user’s entire data. We show that, in this setting, we can learn with far fewer users. Specifically, we show that, as long as each user receives sufficiently many samples, we can learn any privately learnable class via an (ε, δ)-DP algorithm…
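
For concreteness, the user-level privacy notion invoked here is the standard one (our notation: n users, each holding an m-sample record):

\[
\Pr[A(D) \in S] \;\le\; e^{\varepsilon} \, \Pr[A(D') \in S] + \delta \qquad \text{for every event } S,
\]

where D = (x_1, …, x_n) and D′ = (x′_1, …, x′_n), with each x_i ∈ X^m one user’s full record, differ in at most one user’s entire record. Item-level DP lets D and D′ differ in a single sample only, so a user-level guarantee must withstand the replacement of all m samples of one user at once, which is what makes savings in the number of users nontrivial.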

Citations

Efficient multivariate low-degree tests via interactive oracle proofs of proximity for polynomial codes
TLDR
The first interactive oracle proofs of proximity (IOPPs) for tensor products of Reed-Solomon codes and for Reed-Muller codes (evaluations of polynomials with bounds on individual degrees) are presented.
Private and Online Learnability are Equivalent
TLDR
It is proved that a class is PAC-learnable by an (approximate) differentially private algorithm if and only if it has finite Littlestone dimension, which implies a qualitative equivalence between online learnability and private PAC learnability.
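
A standard worked example (ours, not from this paper) that makes the equivalence concrete is the class of threshold functions:

\[
\mathrm{Ldim}\bigl(\{\mathbf{1}[x \ge t] : t \in \{1,\dots,N\}\}\bigr) = \lfloor \log_2 N \rfloor,
\qquad
\mathrm{Ldim}\bigl(\{\mathbf{1}[x \ge t] : t \in \mathbb{R}\}\bigr) = \infty,
\]

so thresholds over a finite grid are both online learnable and (approximately) privately learnable, whereas thresholds over the reals are neither.
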
Learning with User-Level Privacy
TLDR
User-level DP protects a user’s entire contribution, providing more stringent but more realistic protection against information leakage. It is shown that for high-dimensional mean estimation, empirical risk minimization with smooth losses, stochastic convex optimization, and learning hypothesis classes with finite metric entropy, the privacy cost decreases as O(1/√m) as users provide more samples.
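
To see where the O(1/√m) comes from, here is a minimal sketch (our illustration, not the paper’s algorithm) for one-dimensional mean estimation: each user reports only their sample mean, which concentrates within about sigma/√m of the truth, so the clipping radius, and with it the noise scale, can shrink as m grows. The function names and the slack factor are ours.

import numpy as np

def user_level_dp_mean(user_data, eps, center, sigma_bound):
    """Estimate a mean under user-level (eps, 0)-DP.

    user_data: list of n arrays, each holding one user's m i.i.d. samples.
    Each user contributes only their sample mean; clipping it to a window
    of radius ~ sigma_bound/sqrt(m) around a coarse center lets the noise
    scale shrink as m grows.
    """
    n = len(user_data)
    m = len(user_data[0])
    radius = 4.0 * sigma_bound / np.sqrt(m)   # slack factor 4 is an assumption
    means = np.array([u.mean() for u in user_data])
    clipped = np.clip(means, center - radius, center + radius)
    sensitivity = 2.0 * radius / n            # replacing one user's whole record
    return clipped.mean() + np.random.laplace(scale=sensitivity / eps)

# toy usage: 100 users, 64 samples each from N(0.3, 1); 'center' stands in for
# a coarse preliminary estimate that one would obtain privately in practice
rng = np.random.default_rng(0)
data = [rng.normal(0.3, 1.0, size=64) for _ in range(100)]
print(user_level_dp_mean(data, eps=1.0, center=0.0, sigma_bound=1.0))
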
Private Federated Learning Without a Trusted Server: Optimal Algorithms for Convex Losses
TLDR
This paper provides tight upper and lower bounds for LDP convex/strongly convex federated stochastic optimization with homogeneous (i.i.d.) client data, and shows that similar rates are attainable for smooth losses with arbitrary heterogeneous client data distributions, via a linear-time accelerated LDP algorithm.

References

Showing 1-10 of 77 references
Simultaneous Private Learning of Multiple Concepts
TLDR
Lower bounds are given showing that even for very simple concept classes, the sample cost of private multi-learning must grow polynomially in k, and some multi-learners are given that require fewer samples than the basic strategy.
Locally Private Learning without Interaction Requires Separation
TLDR
This work shows that the margin complexity of a class of Boolean functions is a lower bound on the complexity of any non-interactive LDP algorithm for distribution-independent PAC learning of the class, and complements this lower bound with a new efficient learning algorithm whose complexity is polynomial in the margin complexity of the class.
On the geometry of differential privacy
TLDR
The lower bound is strong enough to separate the concept of differential privacy from the notion of approximate differential privacy, where an upper bound of O(√d/ε) can be achieved.
Bounding User Contributions: A Bias-Variance Trade-off in Differential Privacy
TLDR
It is shown that in general there is a “sweet spot” that depends on measurable properties of the dataset, but that there is also a concrete cost to privacy that cannot be avoided simply by collecting more data.
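
A toy sketch (ours, not the paper’s estimator) of that trade-off: capping each user at k values bounds the user-level sensitivity of a sum, so a small k discards data (bias) while a large k forces more noise (variance); sweeping k exposes the “sweet spot”.

import numpy as np

def dp_sum_with_cap(user_values, k, eps, value_bound):
    """User-level DP estimate of a grand total over values in [0, value_bound].

    Keeping at most k values per user bounds the user-level sensitivity
    at k * value_bound: small k wastes data, large k inflates the noise.
    """
    total = sum(float(np.sum(u[:k])) for u in user_values)
    sensitivity = k * value_bound              # one user affects <= k values
    return total + np.random.laplace(scale=sensitivity / eps)

# toy usage: 200 users holding between 1 and 49 values in [0, 1]; sweep k
rng = np.random.default_rng(1)
users = [rng.uniform(0.0, 1.0, size=int(rng.integers(1, 50))) for _ in range(200)]
true_total = sum(float(u.sum()) for u in users)
for k in (1, 5, 20, 50):
    est = dp_sum_with_cap(users, k, eps=1.0, value_bound=1.0)
    print(f"k={k:2d}  estimate={est:8.1f}  true={true_total:8.1f}")
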
Characterizing the Sample Complexity of Pure Private Learners
TLDR
A combinatorial characterization of the sample size sufficient and necessary to learn a class of concepts under pure differential privacy is given, and a similar characterization holds for the database size needed for computing a large class of optimization problems under pure differential privacy, and also for the well studied problem of private data release.
Smoothly Bounding User Contributions in Differential Privacy
TLDR
This work proposes a method which smoothly bounds user contributions by setting appropriate weights on data points and applies it to estimating the mean/quantiles, linear regression, and empirical risk minimization and shows that the algorithm provably outperforms the sample limiting algorithm.
Amplification by Shuffling: From Local to Central Differential Privacy via Anonymity
TLDR
It is shown, via a new and general privacy amplification technique, that any permutation-invariant algorithm satisfying ε-local differential privacy will satisfy (O(ε√(log(1/δ)/n)), δ)-central differential privacy.
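
A minimal sketch (ours) of the shuffled model for one bit per user: each user applies ε-local randomized response, and the shuffler uniformly permutes the reports, severing the link between users and messages; the amplification result above says the shuffled view then satisfies much stronger central DP.

import math
import random

def randomized_response(bit, eps0):
    """eps0-local-DP report of one bit: truthful w.p. e^eps0 / (e^eps0 + 1)."""
    p_truth = math.exp(eps0) / (math.exp(eps0) + 1.0)
    return bit if random.random() < p_truth else 1 - bit

def shuffled_reports(bits, eps0):
    """Randomize locally, then permute: the analyzer sees only the multiset."""
    reports = [randomized_response(b, eps0) for b in bits]
    random.shuffle(reports)
    return reports

def debiased_mean(reports, eps0):
    """Invert the bias: E[report] = (1 - p) + (2p - 1) * bit."""
    p = math.exp(eps0) / (math.exp(eps0) + 1.0)
    return (sum(reports) / len(reports) - (1.0 - p)) / (2.0 * p - 1.0)

# toy usage: 1000 users, 30% of whom hold a 1
random.seed(0)
bits = [1] * 300 + [0] * 700
print(debiased_mean(shuffled_reports(bits, eps0=1.0), eps0=1.0))  # ~0.3
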
Differentially Private Nonparametric Regression Under a Growth Condition
TLDR
It is shown that under the relaxed condition lim inf_{η↓0} η · sfat_η(H) = 0, H is privately learnable, establishing the first nonparametric private learnability guarantee for classes H whose sequential fat-shattering dimension sfat_η(H) diverges as η ↓ 0.
Distributed Differential Privacy via Shuffling
TLDR
Evidence that the power of the shuffled model lies strictly between those of the central and local models is given: for a natural restriction of the model, it is shown that shuffled protocols for a widely studied selection problem require exponentially higher sample complexity than do central-model protocols.
Efficient noise-tolerant learning from statistical queries
TLDR
This paper formalizes a new but related model of learning from statistical queries, and demonstrates the generality of the statistical query model, showing that practically every class learnable in Valiant’s model and its variants can also be learned in the new model (and thus can be learned in the presence of noise).
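
A minimal sketch (ours) of the statistical query model the summary refers to: the learner never touches individual examples, only expectations of bounded query functions answered up to an additive tolerance, which is exactly the slack that makes SQ algorithms noise-tolerant.

import random

def sq_oracle(examples, query, tau):
    """Answer E[query(x, y)] within additive tolerance tau.

    Simulated here from a finite sample; a worst-case oracle may return
    any value within tau of the truth, so an SQ learner must already
    tolerate that slack.
    """
    avg = sum(query(x, y) for x, y in examples) / len(examples)
    return avg + random.uniform(-tau, tau)

# toy usage: the learner probes how well feature 0 predicts the label
random.seed(0)
examples = []
for _ in range(1000):
    x = (random.choice([0, 1]), random.choice([0, 1]))
    examples.append((x, x[0]))                # label equals feature 0
agree = sq_oracle(examples, lambda x, y: float(x[0] == y), tau=0.05)
print(agree)  # ~1.0, so an SQ learner would pick feature 0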