A Differentially Private Kernel Two-Sample Test

@inproceedings{Raj2019ADP,
  title={A Differentially Private Kernel Two-Sample Test},
  author={Anant Raj and Ho Chung Leon Law and D. Sejdinovic and Mijung Park},
  booktitle={ECML/PKDD},
  year={2019}
}
Kernel two-sample testing is a useful statistical tool in determining whether data samples arise from different distributions without imposing any parametric assumptions on those distributions. However, raw data samples can expose sensitive information about individuals who participate in scientific studies, which makes the current tests vulnerable to privacy breaches. Hence, we design a new framework for kernel two-sample testing conforming to differential privacy constraints, in order to …
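The full mechanism sits behind the truncation above, but the sensitivity-calibration idea common to this line of work can be sketched. The snippet below is a minimal illustration, not the paper's construction: it computes the biased empirical MMD with a Gaussian kernel (bounded by 1, so replacing one of n samples moves the statistic by at most 2/n) and releases it through the Laplace mechanism for epsilon-differential privacy. All function names are illustrative.

```python
import numpy as np

def gaussian_gram(A, B, bandwidth=1.0):
    """Gaussian RBF Gram matrix: k(a, b) = exp(-||a - b||^2 / (2 * bandwidth^2))."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2 * A @ B.T
    return np.exp(-sq / (2 * bandwidth**2))

def biased_mmd(X, Y, bandwidth=1.0):
    """Biased empirical MMD: RKHS distance between the empirical mean embeddings."""
    m = gaussian_gram(X, X, bandwidth).mean()
    n = gaussian_gram(Y, Y, bandwidth).mean()
    c = gaussian_gram(X, Y, bandwidth).mean()
    return np.sqrt(max(m + n - 2 * c, 0.0))

def dp_mmd(X, Y, epsilon, bandwidth=1.0, rng=None):
    """epsilon-DP release of the MMD statistic via the Laplace mechanism (a sketch).

    With a kernel bounded by 1, replacing one of the n points in X (or Y)
    moves its empirical mean embedding by at most 2/n in RKHS norm, so the
    statistic's global sensitivity is 2/n.
    """
    rng = rng or np.random.default_rng()
    n = min(len(X), len(Y))
    sensitivity = 2.0 / n
    return biased_mmd(X, Y, bandwidth) + rng.laplace(scale=sensitivity / epsilon)
```

Privatizing the final scalar is only one design point; calibrating the test threshold to the noisy statistic is the nontrivial part that a full framework has to address.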
Citations

Application of Kernel Hypothesis Testing on Set-valued Data
We present a general framework for kernel hypothesis testing on distributions of sets of individual examples. Sets may represent many common data sources such as groups of observations in time …
MONK - Outlier-Robust Mean Embedding Estimation by Median-of-Means
This paper shows how the recently emerged median-of-means principle can be used to design estimators for the kernel mean embedding and MMD with strong resistance to outliers and optimal sub-Gaussian deviation bounds under mild assumptions.
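MONK itself is built around a minimax optimization over blocks; the sketch below only illustrates the median-of-means principle in its simplest form (split into blocks, estimate per block, take the median), reusing biased_mmd from the sketch above. It is a simplification for illustration, not the paper's estimator.

```python
import numpy as np

def mom_mmd2(X, Y, num_blocks=10, bandwidth=1.0):
    """Median-of-means style MMD^2: median of block-wise biased MMD^2 estimates.

    The median limits the influence of blocks contaminated by outliers.
    This is a simplified illustration, not the MONK estimator.
    """
    Xb = np.array_split(X, num_blocks)
    Yb = np.array_split(Y, num_blocks)
    estimates = [biased_mmd(xb, yb, bandwidth) ** 2 for xb, yb in zip(Xb, Yb)]
    return float(np.median(estimates))
```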

References

Showing 1-10 of 35 references.
Differentially Private Chi-Squared Hypothesis Testing: Goodness of Fit and Independence Testing
Hypothesis testing is a useful statistical tool in determining whether a given model should be rejected based on a sample from the population. Sample data may contain sensitive information about …
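As a rough illustration of the perturbation step (the paper's actual contribution, corrected null distributions for the noisy statistic, is omitted here): each individual changes at most two histogram cells by one, so per-cell Laplace noise of scale 2/epsilon makes the counts epsilon-differentially private, after which the usual chi-squared statistic can be formed. The function below is hypothetical and assumes the total sample size is public.

```python
import numpy as np

def dp_chi2_gof(counts, expected_probs, epsilon, rng=None):
    """Chi-squared goodness-of-fit statistic on Laplace-noised counts (a sketch).

    One individual's record moves the count vector by at most 2 in L1 norm,
    so Laplace(2/epsilon) noise per cell is epsilon-DP. NOTE: plain
    chi-squared critical values are no longer valid for the noisy statistic;
    deriving corrected thresholds is the point of the paper.
    """
    rng = rng or np.random.default_rng()
    counts = np.asarray(counts, dtype=float)
    noisy = counts + rng.laplace(scale=2.0 / epsilon, size=len(counts))
    # Simplification: treats the total sample size as publicly known.
    expected = counts.sum() * np.asarray(expected_probs, dtype=float)
    return float(np.sum((noisy - expected) ** 2 / expected))
```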
A Kernel Two-Sample Test
This work proposes a framework for analyzing and comparing distributions, used to construct statistical tests that determine whether two samples are drawn from different distributions, and presents two distribution-free tests based on large deviation bounds for the maximum mean discrepancy (MMD).
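The statistic underlying this test is compact enough to state directly. Below is a minimal sketch, assuming a Gaussian kernel and reusing gaussian_gram from above: the unbiased U-statistic estimate of MMD^2, paired with a permutation-based p-value (one of several calibration options; the paper's distribution-free thresholds differ).

```python
import numpy as np

def mmd2_unbiased(X, Y, bandwidth=1.0):
    """Unbiased (U-statistic) estimate of MMD^2 with a Gaussian kernel."""
    n, m = len(X), len(Y)
    Kxx = gaussian_gram(X, X, bandwidth)
    Kyy = gaussian_gram(Y, Y, bandwidth)
    Kxy = gaussian_gram(X, Y, bandwidth)
    # Exclude diagonal terms so the within-sample averages are unbiased.
    term_x = (Kxx.sum() - np.trace(Kxx)) / (n * (n - 1))
    term_y = (Kyy.sum() - np.trace(Kyy)) / (m * (m - 1))
    return term_x + term_y - 2 * Kxy.mean()

def permutation_pvalue(X, Y, num_perms=200, bandwidth=1.0, rng=None):
    """p-value by re-splitting the pooled sample, valid under the null."""
    rng = rng or np.random.default_rng()
    observed = mmd2_unbiased(X, Y, bandwidth)
    pooled = np.vstack([X, Y])
    count = 0
    for _ in range(num_perms):
        perm = rng.permutation(len(pooled))
        Xp, Yp = pooled[perm[:len(X)]], pooled[perm[len(X):]]
        count += mmd2_unbiased(Xp, Yp, bandwidth) >= observed
    return (count + 1) / (num_perms + 1)
```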
A Fast, Consistent Kernel Two-Sample Test
A novel estimate of the null distribution is computed from the eigenspectrum of the Gram matrix on the aggregate sample from P and Q, at lower computational cost than the bootstrap.
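A hedged sketch of the spectral idea, assuming equal sample sizes n and the approximation that, under the null, n * MMD^2_u behaves like sum_l lam_l * (z_l^2 - 2) with z_l ~ N(0, 2), where the lam_l are estimated from the centered Gram matrix of the pooled sample. Normalization constants here are illustrative and may differ from the paper's; gaussian_gram is reused from above.

```python
import numpy as np

def spectral_null_samples(X, Y, num_samples=1000, bandwidth=1.0, rng=None):
    """Approximate null samples of n * MMD^2_u from the Gram matrix spectrum.

    Illustrative only: eigenvalues of the kernel integral operator are
    estimated by eigenvalues of the centered pooled Gram matrix divided by
    the pooled sample size.
    """
    rng = rng or np.random.default_rng()
    Z = np.vstack([X, Y])
    N = len(Z)
    K = gaussian_gram(Z, Z, bandwidth)
    H = np.eye(N) - np.ones((N, N)) / N       # centering matrix
    lam = np.linalg.eigvalsh(H @ K @ H) / N   # empirical eigenvalue estimates
    lam = lam[lam > 1e-12]
    z = rng.normal(scale=np.sqrt(2.0), size=(num_samples, len(lam)))
    return (z**2 - 2.0) @ lam
```

One would then reject when n * mmd2_unbiased(X, Y) exceeds, say, np.quantile(spectral_null_samples(X, Y), 0.95), avoiding the repeated statistic evaluations a bootstrap requires.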
Optimal kernel choice for large-scale two-sample tests
The new kernel selection approach yields a more powerful test than earlier kernel selection heuristics, and makes the kernel selection and test procedures suited to data streams, where the observations cannot all be stored in memory.
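A crude sketch of power-oriented kernel choice, not the paper's linear-time streaming criterion: among candidate bandwidths, pick the one maximizing the ratio of the estimated MMD^2 to an estimate of its standard deviation, here a bootstrap stand-in for the paper's closed-form variance estimate. mmd2_unbiased is reused from above.

```python
import numpy as np

def select_bandwidth(X, Y, candidates, num_boot=50, rng=None):
    """Pick the bandwidth maximizing estimated MMD^2 / std(MMD^2) (a sketch)."""
    rng = rng or np.random.default_rng()
    best, best_score = None, -np.inf
    for bw in candidates:
        stats = []
        for _ in range(num_boot):
            # Crude bootstrap estimate of the statistic's variability.
            ix = rng.integers(0, len(X), len(X))
            iy = rng.integers(0, len(Y), len(Y))
            stats.append(mmd2_unbiased(X[ix], Y[iy], bw))
        score = mmd2_unbiased(X, Y, bw) / (np.std(stats) + 1e-12)
        if score > best_score:
            best, best_score = bw, score
    return best
```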
Fast Two-Sample Testing with Analytic Representations of Probability Measures
A class of nonparametric two-sample tests with cost linear in the sample size, based on an ensemble of distances between analytic functions representing each distribution. These tests give a better power/time tradeoff than competing approaches, and in some cases better outright power than even the most expensive quadratic-time tests.
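One member of this family, the mean-embedding (ME) style test, is easy to sketch: compare the two samples' kernel features at J random test locations and form a Hotelling-type statistic, approximately chi-squared with J degrees of freedom under the null. The location sampling and regularization below are simplifications chosen for illustration; gaussian_gram is reused from above.

```python
import numpy as np

def me_test_statistic(X, Y, num_locations=5, bandwidth=1.0, rng=None):
    """Linear-time mean-embedding test statistic (illustrative simplification).

    z_i has entries k(x_i, t_j) - k(y_i, t_j) for random locations t_j; the
    statistic n * zbar^T S^{-1} zbar is approximately chi2(J) under H0.
    """
    rng = rng or np.random.default_rng()
    n = min(len(X), len(Y))
    # Draw test locations from a Gaussian fitted to the pooled sample.
    pooled = np.vstack([X[:n], Y[:n]])
    T = rng.normal(pooled.mean(0), pooled.std(0) + 1e-12,
                   size=(num_locations, X.shape[1]))
    Zx = gaussian_gram(X[:n], T, bandwidth)   # n x J features for X
    Zy = gaussian_gram(Y[:n], T, bandwidth)   # n x J features for Y
    D = Zx - Zy
    zbar = D.mean(0)
    S = np.cov(D, rowvar=False) + 1e-8 * np.eye(num_locations)
    return float(n * zbar @ np.linalg.solve(S, zbar))
```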
The Limits of Two-Party Differential Privacy
These bounds expose a dramatic gap between the accuracy achievable by differentially private data analysis and the accuracy obtainable when privacy is relaxed to a computational variant of differential privacy.
Local Private Hypothesis Testing: Chi-Square Tests
This work analyzes locally private chi-square tests for goodness-of-fit and independence testing, problems previously studied in the traditional curator model of differential privacy, in order to explore the design of private hypothesis tests in the local model.
Differentially Private Learning with Kernels
This paper derives differentially private learning algorithms with provable utility (error) bounds in the standard model of releasing a differentially private predictor, using three simple but practical approaches.
Differentially Private Empirical Risk Minimization
This work proposes a new method, objective perturbation, for privacy-preserving machine learning algorithm design, and shows both theoretically and empirically that it is superior to the previous state of the art, output perturbation, in managing the inherent tradeoff between privacy and learning performance.
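A minimal sketch of the contrast, assuming L2-regularized logistic regression with feature vectors of norm at most 1 and labels in {-1, +1}: instead of noising the learned weights afterwards (output perturbation), a random linear term is added to the objective before optimizing. The noise scale and the omitted smoothness adjustment are simplifications of the actual analysis.

```python
import numpy as np

def sample_spherical_laplace(dim, scale, rng):
    """Draw b with density proportional to exp(-||b|| / scale)."""
    direction = rng.normal(size=dim)
    direction /= np.linalg.norm(direction)
    radius = rng.gamma(shape=dim, scale=scale)
    return radius * direction

def objective_perturbed_logreg(X, y, lam, epsilon, steps=2000, lr=0.1, rng=None):
    """Objective perturbation sketch: perturb the regularized logistic loss,
    then optimize. Constants are illustrative, not the paper's calibration."""
    rng = rng or np.random.default_rng()
    n, d = X.shape
    b = sample_spherical_laplace(d, scale=2.0 / epsilon, rng=rng)
    w = np.zeros(d)
    for _ in range(steps):
        margins = np.clip(y * (X @ w), -30, 30)
        # Gradient of the average logistic loss log(1 + exp(-y * x.w)).
        grad_loss = -(X * (y / (1 + np.exp(margins)))[:, None]).mean(axis=0)
        grad = grad_loss + lam * w + b / n   # perturbation enters the objective
        w -= lr * grad
    return w
```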
Differential privacy for functions and functional data
This work shows that adding an appropriate Gaussian process to the function of interest yields differential privacy, and develops methods for releasing functions while preserving differential privacy.
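A hedged sketch of that recipe on a grid, with illustrative constants rather than the paper's exact calibration: release a Gaussian kernel density estimate plus a draw from a Gaussian process whose covariance is the same kernel, scaled by an RKHS-sensitivity/epsilon factor in the style of the Gaussian mechanism. gaussian_gram is reused from above; one-dimensional data is assumed.

```python
import numpy as np

def private_kde_on_grid(data, grid, epsilon, delta, bandwidth=1.0, rng=None):
    """(epsilon, delta)-DP style release of a Gaussian KDE on a grid (a sketch).

    The KDE is a mean of RKHS feature maps bounded by 1, so swapping one of
    the n points moves it by at most 2/n in RKHS norm; the GP noise uses the
    same kernel as its covariance. Constants are illustrative.
    """
    rng = rng or np.random.default_rng()
    data = np.asarray(data, dtype=float).reshape(-1, 1)
    grid = np.asarray(grid, dtype=float).reshape(-1, 1)
    n = len(data)
    kde = gaussian_gram(grid, data, bandwidth).mean(axis=1)
    sensitivity = 2.0 / n
    scale = sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon
    cov = gaussian_gram(grid, grid, bandwidth) + 1e-10 * np.eye(len(grid))
    noise = rng.multivariate_normal(np.zeros(len(grid)), cov)
    return kde + scale * noise
```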