Valid Two-Sample Graph Testing via Optimal Transport Procrustes and Multiscale Graph Correlation with Applications in Connectomics

Jaewon Chung, Bijan K. Varjavand, Jesús Arroyo, Anton Alyakin, Joshua Agterberg, Minh Tang, Joshua T. Vogelstein, Carey E. Priebe
Testing whether two graphs come from the same distribution is of interest in many real-world scenarios, including brain network analysis. Under the random dot product graph model, the nonparametric hypothesis testing framework consists of embedding the graphs using the adjacency spectral embedding (ASE), followed by aligning the embeddings using the median flip heuristic, and finally applying the nonparametric maximum mean discrepancy (MMD) test to obtain a p-value. Using synthetic data… 
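The baseline pipeline described in the abstract (ASE, then median flip alignment, then an MMD test) can be sketched as follows. This is a minimal NumPy sketch under our own assumptions, not the authors' code. The embedding dimension `d`, the Gaussian kernel with a fixed `bandwidth`, the unbiased MMD estimate, and the permutation p-value are illustrative choices, and all function names are hypothetical. The sketch does not implement the optimal transport Procrustes alignment or the MGC test named in the title.

```python
import numpy as np

# Hypothetical helpers: a sketch of the ASE -> median flip -> MMD pipeline, not the authors' code.


def ase(A, d):
    """Adjacency spectral embedding: top-d eigenpairs of symmetric A by |eigenvalue|."""
    vals, vecs = np.linalg.eigh(A)
    idx = np.argsort(np.abs(vals))[::-1][:d]
    return vecs[:, idx] * np.sqrt(np.abs(vals[idx]))


def median_flip(X1, X2):
    """Flip each column of X1 whose median has the opposite sign to the matching column of X2."""
    signs = np.sign(np.median(X1, axis=0)) * np.sign(np.median(X2, axis=0))
    signs[signs == 0] = 1  # leave columns with a zero median unflipped
    return X1 * signs, X2


def gaussian_kernel(X, Y, bandwidth):
    """Gaussian (RBF) kernel matrix between the rows of X and the rows of Y."""
    sq = (X ** 2).sum(1)[:, None] + (Y ** 2).sum(1)[None, :] - 2 * X @ Y.T
    return np.exp(-np.maximum(sq, 0.0) / (2 * bandwidth ** 2))


def mmd2_unbiased(X, Y, bandwidth):
    """Unbiased estimate of squared MMD (within-sample diagonal terms excluded)."""
    m, n = len(X), len(Y)
    Kxx = gaussian_kernel(X, X, bandwidth)
    Kyy = gaussian_kernel(Y, Y, bandwidth)
    Kxy = gaussian_kernel(X, Y, bandwidth)
    return ((Kxx.sum() - np.trace(Kxx)) / (m * (m - 1))
            + (Kyy.sum() - np.trace(Kyy)) / (n * (n - 1))
            - 2 * Kxy.mean())


def mmd_permutation_test(X, Y, bandwidth=1.0, n_perm=200, seed=0):
    """Permutation p-value: shuffle the pooled rows and recompute the statistic."""
    rng = np.random.default_rng(seed)
    observed = mmd2_unbiased(X, Y, bandwidth)
    Z = np.vstack([X, Y])
    m = len(X)
    null = np.empty(n_perm)
    for b in range(n_perm):
        perm = rng.permutation(len(Z))
        null[b] = mmd2_unbiased(Z[perm[:m]], Z[perm[m:]], bandwidth)
    pvalue = (1 + np.sum(null >= observed)) / (1 + n_perm)
    return observed, pvalue


def two_sample_graph_test(A1, A2, d=2, **test_kwargs):
    """Embed each graph, align the embeddings with the median flip, then run the MMD test."""
    X1, X2 = median_flip(ase(A1, d), ase(A2, d))
    return mmd_permutation_test(X1, X2, **test_kwargs)
```

Calling `two_sample_graph_test(A1, A2, d=2)` on two symmetric adjacency matrices returns the MMD statistic and a permutation p-value. The median flip only fixes the signs of individual columns. It cannot correct a general orthogonal misalignment between the two embeddings.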
2 Citations

Bias-Variance Tradeoffs in Joint Spectral Embeddings

An explicit bias-variance tradeoff is established for latent position estimates produced by the omnibus embedding of arXiv:1705.09355. An analytic expression for the bias is given, a uniform concentration bound on the residual term is derived, and a central limit theorem characterizing the distributional properties of these estimates is proved.

Adversarial contamination of networks in the setting of vertex nomination: a new trimming method

A new trimming method is proposed that operates in model space and can address both block-structure contamination and white-noise contamination. Compared to direct trimming, it is more amenable to theoretical analysis and shows superior performance in a number of simulations.

Correcting a Nonparametric Two-sample Graph Hypothesis Test for Graphs with Different Numbers of Vertices

The paper considers testing whether two random graphs have equal generating distributions, with an application to subgraphs of the Drosophila connectome. It shows that CASE remedies the exchangeability problem of the original test and demonstrates, via a simulation study, the validity and consistency of the test that uses CASE.

A Semiparametric Two-Sample Hypothesis Testing Problem for Random Graphs

A semiparametric two-sample hypothesis testing problem is considered for a class of latent position random graphs. A notion of consistency is formulated, and a valid test is proposed for the hypothesis that two finite-dimensional random dot product graphs on a common vertex set have the same generating latent positions.

Testing for Equivalence of Network Distribution Using Subgraph Counts

Simulation experiments are presented, along with an illustrative example on a sample of brain networks in which highly creative individuals’ brains are found to present significantly more short cycles than those of less creative people.

Statistical Inference on Random Dot Product Graphs: a Survey

This survey paper describes a comprehensive paradigm for statistical inference on random dot product graphs, a paradigm centered on spectral embeddings of adjacency and Laplacian matrices, and investigates several real-world applications, including community detection and classification in large social networks and the determination of functional and biologically relevant network properties from an exploratory data analysis of the Drosophila connectome.

Network dependence testing via diffusion maps and distance-based correlations

It is proved that the new method yields a consistent test statistic under mild distributional assumptions on the graph structure, and it is demonstrated that it is able to efficiently identify the most informative graph embedding with respect to the diffusion time.

From Distance Correlation to Multiscale Graph Correlation

A new framework is proposed that generalizes distance correlation (Dcorr) to multiscale graph correlation (MGC). It motivates a theoretically sound sample MGC and allows a number of desirable properties to be proved, including the universal consistency, convergence, and almost unbiasedness of the sample version.
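Because MGC generalizes distance correlation, a sketch of the classical sample Dcorr may help fix ideas. The block below is the standard biased (V-statistic) version built from double-centered distance matrices, written in NumPy with hypothetical function names. It is not MGC itself, which additionally computes local correlations over nearest-neighbor scales and selects among them. It may also differ from the sample statistics analyzed in the paper.

```python
import numpy as np


def double_centered_distances(X):
    """Pairwise Euclidean distance matrix of the rows of X, double-centered."""
    X = np.asarray(X, dtype=float)
    if X.ndim == 1:
        X = X[:, None]  # treat a 1-D array as n samples of a single feature
    diff = X[:, None, :] - X[None, :, :]
    D = np.sqrt((diff ** 2).sum(axis=-1))
    # Subtract row and column means, add back the grand mean.
    return D - D.mean(axis=0, keepdims=True) - D.mean(axis=1, keepdims=True) + D.mean()


def dcorr(X, Y):
    """Classical (biased, V-statistic) sample distance correlation, a value in [0, 1]."""
    A = double_centered_distances(X)
    B = double_centered_distances(Y)
    dcov2 = (A * B).mean()    # squared distance covariance
    dvar2_x = (A * A).mean()  # squared distance variance of X
    dvar2_y = (B * B).mean()  # squared distance variance of Y
    if dvar2_x * dvar2_y == 0:
        return 0.0  # a constant sample has zero distance variance
    return float(np.sqrt(max(dcov2, 0.0) / np.sqrt(dvar2_x * dvar2_y)))
```

Unlike Pearson correlation, this statistic detects nonlinear dependence such as `y = x**2` for symmetric `x`. A p-value can be obtained, for example, by permuting the rows of `Y` and recomputing the statistic.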

Two-sample hypothesis testing for inhomogeneous random graphs

If the sample size $m$ is small, the minimax separation is too large for some popular choices of the distance $d$ between models, including the total variation distance between the corresponding distributions. Hence some models that are widely separated in $d$ cannot be distinguished for small $m$, and the testing problem is generally not solvable in these cases.

A Central Limit Theorem for an Omnibus Embedding of Multiple Random Dot Product Graphs

A central limit theorem is proved for the omnibus embedding. It is also shown that simultaneously embedding the graphs into a common space allows them to be compared without performing pairwise alignments of their embeddings.
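To illustrate why a joint embedding avoids pairwise alignment, here is a minimal NumPy sketch. It assumes the omnibus matrix form in which block (i, j) is (A_i + A_j)/2, and that the embedding is the adjacency spectral embedding of that matrix. The function names are hypothetical. Because all graphs are embedded by a single eigendecomposition, their latent position estimates share one coordinate system.

```python
import numpy as np

# Assumed omnibus form: block (i, j) = (A_i + A_j) / 2. Helper names are hypothetical.


def omnibus_matrix(graphs):
    """Stack m adjacency matrices (each n x n) into an (mn x mn) omnibus matrix."""
    m, n = len(graphs), graphs[0].shape[0]
    M = np.empty((m * n, m * n))
    for i in range(m):
        for j in range(m):
            M[i * n:(i + 1) * n, j * n:(j + 1) * n] = (graphs[i] + graphs[j]) / 2
    return M


def omnibus_embedding(graphs, d):
    """ASE of the omnibus matrix, split into one n x d block per input graph."""
    M = omnibus_matrix(graphs)
    vals, vecs = np.linalg.eigh(M)
    idx = np.argsort(np.abs(vals))[::-1][:d]  # top-d eigenpairs by magnitude
    Z = vecs[:, idx] * np.sqrt(np.abs(vals[idx]))
    n = graphs[0].shape[0]
    return [Z[i * n:(i + 1) * n] for i in range(len(graphs))]
```

As a quick sanity check of the shared coordinate system, passing the same graph twice returns two identical blocks, with no sign flip or Procrustes step needed.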

Two-Sample Tests for Large Random Graphs Using Network Statistics

The main contribution of the paper is a general formulation of the problem based on concentration of network statistics, and consequently, a consistent two-sample test that arises as the natural solution for this problem.

The Chi-Square Test of Distance Correlation

It is proved that the chi-square test can be valid and universally consistent for testing independence, and a testing power inequality with respect to the permutation test is established.