A Kernel Two-Sample Test

Abstract

We propose a framework for analyzing and comparing distributions, which we use to construct statistical tests to determine if two samples are drawn from different distributions. Our test statistic is the largest difference in expectations over functions in the unit ball of a reproducing kernel Hilbert space (RKHS), and is called the maximum mean discrepancy (MMD). We present two distribution-free tests based on large deviation bounds for the MMD, and a third test based on the asymptotic distribution of this statistic. The MMD can be computed in quadratic time, although efficient linear time approximations are available. Our statistic is an instance of an integral probability metric, and various classical metrics on distributions are obtained when alternative function classes are used in place of an RKHS. We apply our two-sample tests to a variety of problems, including attribute matching for databases using the Hungarian marriage method, where they perform strongly. Excellent performance is also obtained when comparing distributions over graphs, for which these are the first such tests.
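
As a rough illustration of the statistic described in the abstract, the sketch below computes an unbiased quadratic-time estimate of the squared MMD and a linear-time approximation of it. It assumes a Gaussian kernel with a fixed bandwidth `sigma`, which the abstract does not specify. The function names (`gaussian_kernel`, `mmd2_unbiased`, `mmd2_linear`) are hypothetical and chosen only for this example. It does not implement the tests themselves (thresholds from large deviation bounds or from the asymptotic distribution), only the statistic they are built on.

```python
import numpy as np


def gaussian_kernel(a, b, sigma=1.0):
    """Gaussian kernel matrix between rows of a and rows of b (assumed kernel choice)."""
    sq_dists = (
        np.sum(a ** 2, axis=1)[:, None]
        + np.sum(b ** 2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clamp tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * sigma ** 2))


def mmd2_unbiased(x, y, sigma=1.0):
    """Quadratic-time unbiased estimate of the squared MMD between samples x and y."""
    m, n = len(x), len(y)
    kxx = gaussian_kernel(x, x, sigma)
    kyy = gaussian_kernel(y, y, sigma)
    kxy = gaussian_kernel(x, y, sigma)
    # Within-sample terms exclude the diagonal (i == j) entries.
    term_xx = (kxx.sum() - np.trace(kxx)) / (m * (m - 1))
    term_yy = (kyy.sum() - np.trace(kyy)) / (n * (n - 1))
    # The cross term averages over all m * n pairs.
    return term_xx + term_yy - 2.0 * kxy.mean()


def mmd2_linear(x, y, sigma=1.0):
    """Linear-time estimate: average a kernel contrast over disjoint pairs of points."""
    m = (min(len(x), len(y)) // 2) * 2  # use an even number of points from each sample
    x1, x2 = x[0:m:2], x[1:m:2]
    y1, y2 = y[0:m:2], y[1:m:2]

    def k(a, b):
        # Row-wise Gaussian kernel between matched rows of a and b.
        return np.exp(-np.sum((a - b) ** 2, axis=1) / (2.0 * sigma ** 2))

    h = k(x1, x2) + k(y1, y2) - k(x1, y2) - k(x2, y1)
    return h.mean()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(400, 2))
    y_same = rng.normal(size=(400, 2))            # same distribution as x
    y_shift = rng.normal(loc=2.0, size=(400, 2))  # mean-shifted distribution
    print("same, quadratic:   ", mmd2_unbiased(x, y_same))
    print("shifted, quadratic:", mmd2_unbiased(x, y_shift))
    print("same, linear:      ", mmd2_linear(x, y_same))
    print("shifted, linear:   ", mmd2_linear(x, y_shift))
```

When both samples come from the same distribution, the estimates should hover around zero (the unbiased estimate can be slightly negative), while a shift in the mean should give a clearly positive value. The linear-time version touches each point once, so it trades a noisier estimate for lower cost.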
