# Nonparametric Two-Sample Hypothesis Testing for Random Graphs with Negative and Repeated Eigenvalues.

@article{Agterberg2020NonparametricTH, title={Nonparametric Two-Sample Hypothesis Testing for Random Graphs with Negative and Repeated Eigenvalues.}, author={Joshua Agterberg and Minh Tang and Carey E. Priebe}, journal={arXiv: Statistics Theory}, year={2020} }

We propose a nonparametric two-sample test statistic for low-rank, conditionally independent edge random graphs whose edge probability matrices have negative eigenvalues and arbitrarily close eigenvalues. Our proposed test statistic involves using the maximum mean discrepancy applied to suitably rotated rows of a graph embedding, where the rotation is estimated using optimal transport. We show that our test statistic, appropriately scaled, is consistent for sufficiently dense graphs, and we…
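
As a rough illustration of the two ingredients named in the abstract, the sketch below embeds an adjacency matrix using its largest-magnitude eigenvalues (so negative eigenvalues are retained) and computes a maximum mean discrepancy between the rows of two embeddings. It is a minimal sketch under assumptions, not the authors' procedure: the function names, the Gaussian kernel, and the bandwidth are illustrative choices, and the optimal-transport rotation step described in the abstract is omitted entirely.

```python
import numpy as np

def spectral_embedding(A, d):
    """Rows of U |S|^(1/2) for the d eigenvalues of symmetric A largest in magnitude.

    Assumption: this scaling is one common convention, chosen for illustration.
    """
    vals, vecs = np.linalg.eigh(A)
    # Sort by absolute value so that large negative eigenvalues are kept.
    idx = np.argsort(np.abs(vals))[::-1][:d]
    return vecs[:, idx] * np.sqrt(np.abs(vals[idx]))

def gaussian_kernel(X, Y, bandwidth):
    """Gaussian kernel matrix between the rows of X and the rows of Y."""
    sq = (np.sum(X**2, axis=1)[:, None]
          + np.sum(Y**2, axis=1)[None, :]
          - 2.0 * X @ Y.T)
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))

def mmd_squared(X, Y, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared MMD between row samples."""
    return (gaussian_kernel(X, X, bandwidth).mean()
            + gaussian_kernel(Y, Y, bandwidth).mean()
            - 2.0 * gaussian_kernel(X, Y, bandwidth).mean())
```

Embeddings computed separately for two graphs are generally not aligned with each other, so comparing `mmd_squared(spectral_embedding(A, d), spectral_embedding(B, d))` directly would not be meaningful without an alignment step such as the estimated rotation the abstract describes.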

## 7 Citations

### Lost in the Shuffle: Testing Power in the Presence of Errorful Network Vertex Labels

- Computer Science
- 2022

This paper considers the degradation of power in two-sample graph hypothesis testing when there are misaligned/label-shuffled vertices across networks, and theoretically explores the power loss due to shuffling for a pair of hypothesis tests based on Frobenius norm differences between estimated edge probability matrices or between adjacency matrices.
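
To make the setting concrete, the sketch below computes a squared Frobenius norm difference between two adjacency matrices and relabels a subset of vertices in one of them. The helper names and the choice to shuffle the first `k` vertices are hypothetical, introduced only for illustration; the cited paper's actual test statistics, shuffling model, and calibration are not reproduced here.

```python
import numpy as np

def frobenius_statistic(A, B):
    """Squared Frobenius norm of the difference between two matrices."""
    return np.linalg.norm(A - B, ord="fro") ** 2

def shuffle_vertices(A, k, rng):
    """Randomly permute the labels of the first k vertices of A (hypothetical helper)."""
    n = A.shape[0]
    perm = np.arange(n)
    perm[:k] = rng.permutation(k)
    # Apply the same permutation to rows and columns so A stays a valid adjacency matrix.
    return A[np.ix_(perm, perm)]
```

Comparing `frobenius_statistic(A, B)` with `frobenius_statistic(A, shuffle_vertices(B, k, rng))` for increasing `k` gives a simple way to see how label errors change the statistic on simulated graphs.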

### Bias-Variance Tradeoffs in Joint Spectral Embeddings

- Computer Science
- 2020

An explicit bias-variance tradeoff for latent position estimates produced by the omnibus embedding of arXiv:1705.09355 is established and an analytic bias expression is revealed, a uniform concentration bound on the residual term is derived, and a central limit theorem characterizing the distributional properties of these estimates is proved.

### Entrywise Estimation of Singular Vectors of Low-Rank Matrices With Heteroskedasticity and Dependence

- Mathematics
- IEEE Transactions on Information Theory
- 2022

We propose an estimator for the singular vectors of high-dimensional low-rank matrices corrupted by additive subgaussian noise, where the noise matrix is allowed to have dependence within rows and…

### Graphon based Clustering and Testing of Networks: Algorithms and Theory

- Computer Science
- ICLR
- 2022

This work proposes a novel graph distance based on sorting-and-smoothing graphon estimators, and presents two clustering algorithms that achieve state-of-the-art results and proves the statistical consistency of both algorithms under Lipschitz assumptions on the graph degrees.

### Higher-order accurate two-sample network inference and network hashing

- Computer Science
- 2022

This article proposes the first provably higher-order accurate two-sample inference method by comparing network moments, and establishes strong finite-sample theoretical guarantees, including rate-optimality properties.

### Graphon based Clustering and Testing of Networks: Algorithms and Theory

- Computer Science
- 2021

This work proposes methods for clustering multiple graphs, without vertex correspondence, that are inspired by the recent literature on estimating graphons (symmetric functions corresponding to the infinite-vertex limit of graphs), and proposes a novel graph distance based on sorting-and-smoothing graphon estimators.

### Valid Two-Sample Graph Testing via Optimal Transport Procrustes and Multiscale Graph Correlation with Applications in Connectomics

- Computer Science
- 2021

It is demonstrated that substituting the MMD test with multiscale graph correlation (MGC) test leads to a more powerful test both in synthetic and in simulated data, and there is not sufficient evidence to reject the null hypothesis that the two hemispheres are equally distributed.

## References


### A Semiparametric Two-Sample Hypothesis Testing Problem for Random Graphs

- Mathematics, Computer Science
- 2017

A semiparametric two-sample hypothesis testing problem for a class of latent position random graphs is considered: a notion of consistency is formulated, and a valid test is proposed for the hypothesis that two finite-dimensional random dot product graphs on a common vertex set have the same generating latent positions.

### Limit theorems for eigenvectors of the normalized Laplacian for random graphs

- Mathematics, Computer Science
- The Annals of Statistics
- 2018

We prove a central limit theorem for the components of the eigenvectors corresponding to the $d$ largest eigenvalues of the normalized Laplacian matrix of a finite dimensional random dot product…

### Two-Sample Tests for Large Random Graphs Using Network Statistics

- Mathematics, Computer Science
- COLT
- 2017

The main contribution of the paper is a general formulation of the problem based on concentration of network statistics, and consequently, a consistent two-sample test that arises as the natural solution for this problem.

### Two-sample hypothesis testing for inhomogeneous random graphs

- Mathematics, Computer Science
- 2020

If $m$ is small, then the minimax separation is too large for some popular choices of $d$, including total variation distance between corresponding distributions, which implies that some models that are widely separated in $d$ cannot be distinguished for small $m$, and hence, the testing problem is generally not solvable in these cases.

### Two-sample Test of Community Memberships of Weighted Stochastic Block Models.

- Computer Science, Mathematics
- 2018

A test statistic based on singular subspace distance is developed, and its limiting distribution is derived under weighted stochastic block models with dense graphs.

### The Kato–Temple inequality and eigenvalue concentration with applications to graph inference

- Mathematics
- 2017

We present an adaptation of the Kato–Temple inequality for bounding perturbations of eigenvalues with applications to statistical inference for random graphs, specifically hypothesis testing and…

### Statistical Inference on Random Dot Product Graphs: a Survey

- Computer Science, Mathematics
- J. Mach. Learn. Res.
- 2017

This survey paper describes a comprehensive paradigm for statistical inference on random dot product graphs, a paradigm centered on spectral embeddings of adjacency and Laplacian matrices, and investigates several real-world applications, including community detection and classification in large social networks and the determination of functional and biologically relevant network properties from an exploratory data analysis of the Drosophila connectome.

### A goodness-of-fit test for stochastic block models

- Mathematics
- 2016

The stochastic block model is a popular tool for studying community structures in network data. We develop a goodness-of-fit test for the stochastic block model. The test statistic is based on the…

### A central limit theorem for an omnibus embedding of multiple random graphs and implications for multiscale network inference

- Mathematics, Computer Science
- 2017

An "omnibus" embedding in which multiple graphs on the same vertex set are jointly embedded into a single space with a distinct representation for each graph is described, which achieves near-optimal inference accuracy and allows the identification of specific brain regions associated with population-level differences.

### Bias-Variance Tradeoffs in Joint Spectral Embeddings

- Computer Science
- 2020

An explicit bias-variance tradeoff for latent position estimates produced by the omnibus embedding of arXiv:1705.09355 is established and an analytic bias expression is revealed, a uniform concentration bound on the residual term is derived, and a central limit theorem characterizing the distributional properties of these estimates is proved.