A Kernel Two-Sample Test
This work proposes a framework for analyzing and comparing distributions, which is used to construct statistical tests to determine if two samples are drawn from different distributions, and presents two distribution-free tests based on large deviation bounds for the maximum mean discrepancy (MMD).
A Kernel Method for the Two-Sample-Problem
This work proposes two statistical tests to determine if two samples are from different distributions, and applies this approach to a variety of problems, including attribute matching for databases using the Hungarian marriage method, where the test performs strongly.
Measuring Statistical Dependence with Hilbert-Schmidt Norms
We propose an independence criterion based on the eigen-spectrum of covariance operators in reproducing kernel Hilbert spaces (RKHSs), consisting of an empirical estimate of the Hilbert-Schmidt norm ...
Correcting Sample Selection Bias by Unlabeled Data
A nonparametric method is presented that directly produces resampling weights without distribution estimation, by matching the distributions of the training and testing sets in feature space.
Integrating structured biological data by Kernel Maximum Mean Discrepancy
A novel statistical test of whether two samples are from the same distribution, compatible with both multivariate and structured data, that is fast, easy to implement, and works well, as confirmed by the experiments.
A Hilbert Space Embedding for Distributions
We describe a technique for comparing distributions without the need for density estimation as an intermediate step. Our approach relies on mapping the distributions into a reproducing kernel Hilbert ...
Demystifying MMD GANs
The situation with bias in GAN loss functions raised by recent work is clarified, and it is shown that gradient estimators used in the optimization process for both MMD GANs and Wasserstein GANs are unbiased, but learning a discriminator based on samples leads to biased gradients for the generator parameters.
A Kernel Statistical Test of Independence
A novel test of the independence hypothesis for one particular kernel independence measure, the Hilbert-Schmidt independence criterion (HSIC), which outperforms established contingency table and functional correlation-based tests, with an advantage that is greater for multivariate data.
Ranking on Data Manifolds
A simple universal ranking algorithm for data lying in the Euclidean space, such as text or image data, to rank the data with respect to the intrinsic manifold structure collectively revealed by a great amount of data.
Optimal kernel choice for large-scale two-sample tests
The new kernel selection approach yields a more powerful test than earlier kernel selection heuristics, and makes the kernel selection and test procedures suited to data streams, where the observations cannot all be stored in memory.
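Several of the entries above build on the maximum mean discrepancy (MMD) as a distance between samples. As a rough illustration only, the following minimal sketch computes the standard unbiased estimate of the squared MMD in plain Python. The Gaussian kernel, the fixed bandwidth of 1.0, the sample sizes, and the function names `gaussian_kernel` and `mmd2_unbiased` are assumptions made for this example and are not taken from any of the listed papers; it does not implement the test thresholds or the kernel selection those papers describe.

```python
import math
import random


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel between two equal-length vectors (assumed kernel choice)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mmd2_unbiased(xs, ys, bandwidth=1.0):
    """Unbiased estimate of the squared MMD between samples xs and ys.

    xs and ys are lists of vectors; each needs at least two points.
    """
    m, n = len(xs), len(ys)
    # Within-sample averages skip the diagonal terms (i == j).
    k_xx = sum(gaussian_kernel(xs[i], xs[j], bandwidth)
               for i in range(m) for j in range(m) if i != j)
    k_yy = sum(gaussian_kernel(ys[i], ys[j], bandwidth)
               for i in range(n) for j in range(n) if i != j)
    # The cross-sample average uses every pair.
    k_xy = sum(gaussian_kernel(x, y, bandwidth) for x in xs for y in ys)
    return k_xx / (m * (m - 1)) + k_yy / (n * (n - 1)) - 2.0 * k_xy / (m * n)


# Illustrative data: two samples from N(0, 1) and one from N(2, 1).
rng = random.Random(0)
sample_a = [[rng.gauss(0.0, 1.0)] for _ in range(100)]
sample_b = [[rng.gauss(0.0, 1.0)] for _ in range(100)]
sample_shifted = [[rng.gauss(2.0, 1.0)] for _ in range(100)]

mmd2_same = mmd2_unbiased(sample_a, sample_b)
mmd2_diff = mmd2_unbiased(sample_a, sample_shifted)
print("same distribution:   ", mmd2_same)
print("shifted distribution:", mmd2_diff)
```

The estimate for the two samples from the same distribution should be close to zero (it can be slightly negative, since the estimator is unbiased rather than non-negative), while the estimate against the shifted sample should be clearly larger. Turning this statistic into a test requires a rejection threshold, which this sketch does not compute.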