• Corpus ID: 229298036

Nonparametric Two-Sample Hypothesis Testing for Random Graphs with Negative and Repeated Eigenvalues.

Joshua Agterberg, Minh Tang, Carey E. Priebe. arXiv: Statistics Theory.
We propose a nonparametric two-sample test statistic for low-rank, conditionally independent edge random graphs whose edge probability matrices have negative eigenvalues and arbitrarily close eigenvalues. Our proposed test statistic involves using the maximum mean discrepancy applied to suitably rotated rows of a graph embedding, where the rotation is estimated using optimal transport. We show that our test statistic, appropriately scaled, is consistent for sufficiently dense graphs, and we… 
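The two ingredients the abstract names, a graph embedding and the maximum mean discrepancy (MMD) between embedded rows, can be sketched as below. This is only an illustration under stated assumptions, not the authors' implementation: the function names, the Gaussian kernel, and the `bandwidth` parameter are hypothetical choices, and the optimal-transport rotation step is omitted entirely. The embedding keeps the eigenvalues largest in magnitude, so negative eigenvalues are retained.

```python
import numpy as np

def spectral_embedding(A, d):
    """Embed a symmetric adjacency matrix into d dimensions.

    Keeps the d eigenvalues largest in absolute value (so negative
    eigenvalues are retained) and scales eigenvectors by sqrt(|eigenvalue|).
    Hypothetical helper for illustration only.
    """
    vals, vecs = np.linalg.eigh(A)
    idx = np.argsort(-np.abs(vals))[:d]
    return vecs[:, idx] * np.sqrt(np.abs(vals[idx]))

def gaussian_kernel(X, Y, bandwidth):
    """Gaussian kernel matrix between rows of X and rows of Y (assumed kernel)."""
    sq = (np.sum(X**2, axis=1)[:, None]
          + np.sum(Y**2, axis=1)[None, :]
          - 2.0 * X @ Y.T)
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))

def mmd2_unbiased(X, Y, bandwidth=1.0):
    """Unbiased estimate of squared MMD between the row samples X and Y."""
    n, m = len(X), len(Y)
    Kxx = gaussian_kernel(X, X, bandwidth)
    Kyy = gaussian_kernel(Y, Y, bandwidth)
    Kxy = gaussian_kernel(X, Y, bandwidth)
    # Exclude the diagonal in the within-sample terms.
    term_x = (Kxx.sum() - np.trace(Kxx)) / (n * (n - 1))
    term_y = (Kyy.sum() - np.trace(Kyy)) / (m * (m - 1))
    return term_x + term_y - 2.0 * Kxy.mean()
```

In a two-sample setting one would embed each graph, align the two embeddings (the step the abstract attributes to optimal transport, not shown here), and then compare the rows with `mmd2_unbiased`.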


Valid Two-Sample Graph Testing via Optimal Transport Procrustes and Multiscale Graph Correlation with Applications in Connectomics

It is demonstrated that replacing the MMD test with the multiscale graph correlation (MGC) test leads to a more powerful test on both synthetic and simulated data, and that there is not sufficient evidence to reject the null hypothesis that the two hemispheres are equally distributed.

Lost in the Shuffle: Testing Power in the Presence of Errorful Network Vertex Labels

This paper considers the degradation of power in two-sample graph hypothesis testing when there are misaligned/label-shuffled vertices across networks, and theoretically explores the power loss due to shuffling for a pair of hypothesis tests based on Frobenius norm differences between estimated edge probability matrices or between adjacency matrices.

Bias-Variance Tradeoffs in Joint Spectral Embeddings

An explicit bias-variance tradeoff for latent position estimates produced by the omnibus embedding of arXiv:1705.09355 is established and an analytic bias expression is revealed, a uniform concentration bound on the residual term is derived, and a central limit theorem characterizing the distributional properties of these estimates is proved.

Higher-order accurate two-sample network inference and network hashing

This article proposes the first provably higher-order accurate two-sample inference method by comparing network moments, and establishes strong finite-sample theoretical guarantees, including rate-optimality properties.

Entrywise Estimation of Singular Vectors of Low-Rank Matrices With Heteroskedasticity and Dependence

We propose an estimator for the singular vectors of high-dimensional low-rank matrices corrupted by additive subgaussian noise, where the noise matrix is allowed to have dependence within rows and

Graphon based Clustering and Testing of Networks: Algorithms and Theory

This work proposes a novel graph distance based on sorting-and-smoothing graphon estimators, and presents two clustering algorithms that achieve state-of-the-art results and proves the statistical consistency of both algorithms under Lipschitz assumptions on the graph degrees.


This work proposes methods for clustering multiple graphs without vertex correspondence, inspired by the recent literature on estimating graphons (symmetric functions corresponding to the infinite-vertex limit of graphs), and introduces a novel graph distance based on sorting-and-smoothing graphon estimators.

The connectome of an insect brain

The synaptic-resolution connectome of an insect brain with rich behavior, including learning, value-computation, and action-selection, is mapped, comprising 3,013 neurons and 544,000 synapses, and its connection types, neuron types, and circuit motifs are characterized.



A Semiparametric Two-Sample Hypothesis Testing Problem for Random Graphs

A semiparametric two-sample hypothesis testing problem for a class of latent position random graphs is considered: a notion of consistency is formulated, and a valid test is proposed for the hypothesis that two finite-dimensional random dot product graphs on a common vertex set have the same generating latent positions.

Limit theorems for eigenvectors of the normalized Laplacian for random graphs

We prove a central limit theorem for the components of the eigenvectors corresponding to the $d$ largest eigenvalues of the normalized Laplacian matrix of a finite dimensional random dot product

Two-Sample Tests for Large Random Graphs Using Network Statistics

The main contribution of the paper is a general formulation of the problem based on concentration of network statistics, and consequently, a consistent two-sample test that arises as the natural solution for this problem.

Two-sample hypothesis testing for inhomogeneous random graphs

If $m$ is small, then the minimax separation is too large for some popular choices of $d$, including total variation distance between corresponding distributions, which implies that some models that are widely separated in $d$ cannot be distinguished for small $m$, and hence, the testing problem is generally not solvable in these cases.

Two-sample Test of Community Memberships of Weighted Stochastic Block Models.

A test statistic based on singular subspace distance is developed, and its limiting distribution is derived under weighted stochastic block models with dense graphs.

The Kato–Temple inequality and eigenvalue concentration with applications to graph inference

We present an adaptation of the Kato–Temple inequality for bounding perturbations of eigenvalues with applications to statistical inference for random graphs, specifically hypothesis testing and

Statistical Inference on Random Dot Product Graphs: a Survey

This survey paper describes a comprehensive paradigm for statistical inference on random dot product graphs, a paradigm centered on spectral embeddings of adjacency and Laplacian matrices, and investigates several real-world applications, including community detection and classification in large social networks and the determination of functional and biologically relevant network properties from an exploratory data analysis of the Drosophila connectome.

Higher-Order Correct Multiplier Bootstraps for Count Functionals of Networks

This paper proposes a new class of multiplier bootstraps for count functionals and proposes linear and quadratic approximations to the multiplier bootstrap, which correspond to the first and second-order Hayek projections of an approximating U-statistic, respectively.

A goodness-of-fit test for stochastic block models

The stochastic block model is a popular tool for studying community structures in network data. We develop a goodness-of-fit test for the stochastic block model. The test statistic is based on the

A central limit theorem for an omnibus embedding of multiple random graphs and implications for multiscale network inference

An "omnibus" embedding in which multiple graphs on the same vertex set are jointly embedded into a single space with a distinct representation for each graph is described, which achieves near-optimal inference accuracy and allows the identification of specific brain regions associated with population-level differences.