Corpus ID: 245650554

Kernel Two-Sample Tests in High Dimension: Interplay Between Moment Discrepancy and Dimension-and-Sample Orders

@inproceedings{Yan2021KernelTT,
  title={Kernel Two-Sample Tests in High Dimension: Interplay Between Moment Discrepancy and Dimension-and-Sample Orders},
  author={Jie Yan and Xianyang Zhang},
  year={2021}
}
Motivated by the increasing use of kernel-based metrics for high-dimensional and large-scale data, we study the asymptotic behavior of kernel two-sample tests when the dimension and sample sizes both diverge to infinity. We focus on the maximum mean discrepancy (MMD) with the kernel of the form k(x, y) = f(‖x − y‖²/γ), including MMD with the Gaussian kernel and the Laplacian kernel, and the energy distance as special cases. We derive asymptotic expansions of the kernel two-sample statistics…
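To make the objects in the abstract concrete, the following is a minimal numerical sketch (not the authors' code) of the unbiased sample MMD² for kernels of the form k(x, y) = f(‖x − y‖²/γ), with the Gaussian kernel, the Laplacian kernel, and the energy-distance "kernel" as the special cases mentioned above. The median-heuristic bandwidth and all function names are illustrative assumptions.

```python
import numpy as np

def sq_dists(X, Y):
    """Pairwise squared Euclidean distances ||x - y||^2 between rows of X and Y."""
    return ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)

def mmd2_unbiased(X, Y, kernel):
    """Unbiased (U-statistic) estimate of MMD^2 between samples X (n x p) and Y (m x p)."""
    n, m = len(X), len(Y)
    Kxx, Kyy, Kxy = kernel(X, X), kernel(Y, Y), kernel(X, Y)
    term_x = (Kxx.sum() - np.trace(Kxx)) / (n * (n - 1))  # drop diagonal: unbiased form
    term_y = (Kyy.sum() - np.trace(Kyy)) / (m * (m - 1))
    return term_x + term_y - 2.0 * Kxy.mean()

# Kernels of the form k(x, y) = f(||x - y||^2 / gamma) studied in the paper.
def gaussian_kernel(gamma):
    return lambda X, Y: np.exp(-sq_dists(X, Y) / gamma)           # f(u) = exp(-u)

def laplacian_kernel(gamma):
    return lambda X, Y: np.exp(-np.sqrt(sq_dists(X, Y) / gamma))  # f(u) = exp(-sqrt(u))

def energy_kernel():
    # Energy distance corresponds (up to centering) to the negative Euclidean distance.
    return lambda X, Y: -np.sqrt(sq_dists(X, Y))                  # f(u) = -sqrt(u), gamma = 1

rng = np.random.default_rng(0)
p = 200                                    # high-dimensional setting
X = rng.normal(size=(100, p))
Y = rng.normal(loc=0.05, size=(120, p))    # small mean shift
Z = np.vstack([X, Y])
gamma = np.median(sq_dists(Z, Z))          # median-heuristic bandwidth (an assumption)
print(mmd2_unbiased(X, Y, gaussian_kernel(gamma)))
```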
Citations

Angle Based Dependence Measures in Metric Spaces
TLDR
This article introduces a general framework for angle-based independence testing using a reproducing kernel Hilbert space equipped with a Gaussian measure, which can be adapted to different types of data, such as high-dimensional vectors or symmetric positive definite matrices.

References

SHOWING 1-10 OF 52 REFERENCES
Two Sample Testing in High Dimension via Maximum Mean Discrepancy
TLDR
This work establishes central limit theorems for the studentized sample MMD as both the dimension $p$ and the sample sizes $n,m$ diverge to infinity, and the results suggest that the accuracy of the normal approximation can improve with dimensionality.
A new framework for distance and kernel-based metrics in high dimensions
TLDR
A new class of metrics is proposed that inherits the desirable properties of the energy distance, the maximum mean discrepancy/(generalized) distance covariance, and the Hilbert-Schmidt Independence Criterion, and is capable of detecting homogeneity of, or completely characterizing independence between, the low-dimensional marginal distributions in the high-dimensional setup.
Generalized kernel distance covariance in high dimensions: non-null CLTs and power universality
TLDR
The key step in the proof of the non-null central limit theorem is a precise expansion of the mean and variance of the sample distance covariance in high dimensions, which shows that the non-null Gaussian approximation of the sample distance covariance involves a rather subtle interplay between the dimension-to-sample ratio and the dependence between X and Y.
On the Decreasing Power of Kernel and Distance Based Nonparametric Hypothesis Tests in High Dimensions
TLDR
It is demonstrated that the power of these tests drops polynomially against fair alternatives as the dimension increases, which advances the current understanding of the power of modern nonparametric hypothesis tests in high dimensions.
Adaptivity and Computation-Statistics Tradeoffs for Kernel and Distance based High Dimensional Two Sample Testing
TLDR
This paper formally characterizes the power of popular tests for GDA, such as the Maximum Mean Discrepancy with the Gaussian kernel (gMMD) and bandwidth-dependent variants of the Energy Distance with the Euclidean norm (eED), in the high-dimensional MDA regime.
A Kernel Two-Sample Test
TLDR
This work proposes a framework for analyzing and comparing distributions, which is used to construct statistical tests that determine whether two samples are drawn from different distributions, and presents two distribution-free tests based on large deviation bounds for the maximum mean discrepancy (MMD).
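As a rough illustration of how such a kernel two-sample test can be carried out in practice, the sketch below calibrates a (biased, V-statistic) Gaussian-kernel MMD² by permuting the pooled sample. This is a generic permutation calibration, not the large-deviation-bound thresholds proposed in the cited paper; the bandwidth choice and all function names are assumptions.

```python
import numpy as np

def mmd2_biased(Z1, Z2, gamma):
    """Biased (V-statistic) MMD^2 with the Gaussian kernel exp(-||x - y||^2 / gamma)."""
    def K(A, B):
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-d2 / gamma)
    return K(Z1, Z1).mean() + K(Z2, Z2).mean() - 2.0 * K(Z1, Z2).mean()

def mmd_permutation_test(X, Y, gamma, n_perm=500, seed=0):
    """Approximate p-value for H0: P = Q by permuting the pooled sample."""
    rng = np.random.default_rng(seed)
    n = len(X)
    Z = np.vstack([X, Y])
    observed = mmd2_biased(X, Y, gamma)
    exceed = 0
    for _ in range(n_perm):
        idx = rng.permutation(len(Z))
        exceed += mmd2_biased(Z[idx[:n]], Z[idx[n:]], gamma) >= observed
    return (1 + exceed) / (1 + n_perm)

rng = np.random.default_rng(1)
X = rng.normal(size=(60, 50))
Y = rng.normal(scale=1.3, size=(60, 50))                   # variance difference
gamma = np.median(((X[:, None] - Y[None]) ** 2).sum(-1))   # median heuristic (assumption)
print(mmd_permutation_test(X, Y, gamma))
```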
Distance-based and RKHS-based dependence metrics in high dimension
TLDR
The theoretical and simulation results shed light on the limitation of the distance/Hilbert-Schmidt covariance when used jointly in the high-dimensional setting and suggest the aggregation of marginal distance/Hilbert-Schmidt covariances as a useful alternative.
EFFECT OF HIGH DIMENSION: BY AN EXAMPLE OF A TWO SAMPLE PROBLEM
With the rapid development of modern computing techniques, statisticians are dealing with data of much higher dimension. Consequently, due to their loss of accuracy or power, some classical…
The spectrum of kernel random matrices
TLDR
Surprisingly, it is shown that in high dimensions, and for the models the authors analyze, the problem becomes essentially linear, which is at odds with heuristics sometimes used to justify the usage of these methods.
High-dimensional Change-point Detection Using Generalized Homogeneity Metrics
TLDR
This work develops a nonparametric methodology to detect an unknown number of change-points in an independent sequence of high-dimensional observations, tests for the significance of the estimated change-point locations, and rigorously derives the corresponding limiting distributions under the high-dimension medium-sample-size (HDMSS) framework.
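To illustrate the basic idea of change-point detection via a homogeneity metric, the sketch below scans for a single change-point by maximizing a size-weighted energy distance between the left and right segments. This is only loosely related to the cited paper: it is not the authors' generalized homogeneity metrics, multiple-change-point procedure, or HDMSS theory, and all names and the weighting are illustrative assumptions.

```python
import numpy as np

def pairwise_dists(Z):
    """Full matrix of Euclidean distances between rows of Z."""
    return np.sqrt(((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=-1))

def energy_distance_from_dists(D, left, right):
    """V-statistic energy distance between index sets, from a precomputed distance matrix."""
    Dxy = D[np.ix_(left, right)]
    Dxx = D[np.ix_(left, left)]
    Dyy = D[np.ix_(right, right)]
    return 2 * Dxy.mean() - Dxx.mean() - Dyy.mean()

def scan_single_changepoint(Z, min_seg=10):
    """Pick the split maximizing a size-weighted energy distance between the left
    and right segments: a simple single-change-point scan, not the multiple-change-point
    procedure of the cited paper."""
    n = len(Z)
    D = pairwise_dists(Z)
    idx = np.arange(n)
    best_t, best_stat = None, -np.inf
    for t in range(min_seg, n - min_seg):
        w = t * (n - t) / n                      # weight favoring balanced splits
        stat = w * energy_distance_from_dists(D, idx[:t], idx[t:])
        if stat > best_stat:
            best_t, best_stat = t, stat
    return best_t, best_stat

rng = np.random.default_rng(2)
p = 100
Z = np.vstack([rng.normal(size=(80, p)),
               rng.normal(loc=0.3, size=(80, p))])   # mean shift at index 80
print(scan_single_changepoint(Z))
```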