Corpus ID: 245906421

Multiple Hypothesis Testing To Estimate The Number of Communities in Sparse Stochastic Block Models

Chetkar Jha, Mingyao Li, Ian Barnett

Network-based clustering methods frequently require the number of communities to be specified a priori. Moreover, most of the existing methods for estimating the number of communities assume the number of communities to be fixed and not scale with the network size n. The few methods that assume the number of communities to increase with the network size n are only valid when the average degree d of a network grows at least as fast as O(n) (i.e., the dense case) or lies within a narrow range… 

Determining the Number of Communities in Degree-corrected Stochastic Block Models

This work introduces a method that combines spectral clustering with binary segmentation and guarantees an upper bound for the pseudo likelihood ratio statistic when the model is over-fitted, and establishes the consistency of the estimator for the true number of communities.

Estimating the number of communities in networks by spectral methods

This work proposes a simple and very fast method for estimating the number of communities based on the spectral properties of certain graph operators, such as the non-backtracking matrix and the Bethe Hessian matrix, which performs well under several models and a wide range of parameters.
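
As a rough illustration of the spectral idea summarized above, the sketch below counts the negative eigenvalues of the Bethe Hessian H(r) = (r^2 - 1) I - r A + D, where A is the adjacency matrix and D the diagonal degree matrix. It is a minimal sketch, not the cited authors' implementation: the function names are hypothetical, and the default r = sqrt(mean degree) is an assumed heuristic choice.

```python
import numpy as np


def bethe_hessian(adj, r=None):
    """Return H(r) = (r^2 - 1) I - r A + D for a symmetric 0/1 adjacency matrix."""
    adj = np.asarray(adj, dtype=float)
    degrees = adj.sum(axis=1)
    if r is None:
        # Assumption: r = sqrt(mean degree); other choices of r are possible.
        r = np.sqrt(degrees.mean())
    n = adj.shape[0]
    return (r ** 2 - 1.0) * np.eye(n) - r * adj + np.diag(degrees)


def estimate_num_communities(adj, r=None):
    """Estimate the number of communities as the count of negative eigenvalues."""
    eigenvalues = np.linalg.eigvalsh(bethe_hessian(adj, r))
    return int(np.sum(eigenvalues < 0))


# Toy input (illustrative only): three disjoint 10-node cliques.
toy_adj = np.kron(np.eye(3), np.ones((10, 10)) - np.eye(10))
print(estimate_num_communities(toy_adj))  # prints 3
```

On the toy graph every node has degree 9, so r = 3 and H = 17 I - 3 A; the three clique eigenvalues of A (equal to 9) map to -10, and the remaining eigenvalues (equal to -1) map to 20, giving three negative eigenvalues. Sparse random networks are far noisier than this toy case, so the count should be read as an estimate.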

Pseudo-likelihood methods for community detection in large sparse networks

It is proved that pseudo-likelihood provides consistent estimates of the communities under a mild condition on the starting value, for the case of a block model with two communities.

Spectral clustering and the high-dimensional stochastic blockmodel

The asymptotic results in this paper are the first clustering results that allow the number of clusters in the model to grow with the number of nodes, hence the name high-dimensional.

Network Cross-Validation for Determining the Number of Communities in Network Data

It is proved that the probability of under-selection vanishes as the number of nodes increases, under mild conditions satisfied by a wide range of popular community recovery algorithms.

A survey on theoretical advances of community detection in networks

A survey on the recent theoretical advances of community detection, including graph cut methods, profile likelihoods, the pseudo‐likelihood method, the variational method, belief propagation, spectral clustering, and semidefinite relaxations of the stochastic blockmodel.

Hierarchical Community Detection by Recursive Partitioning

It is argued that constructing a hierarchical tree of communities is more interpretable, and in some regimes more accurate, than finding a single partition of the network; a natural framework for analyzing the algorithm’s theoretical performance, the binary tree stochastic block model, is also proposed.

Stochastic blockmodels with growing number of classes

It is shown that the fraction of misclassified network nodes converges in probability to zero under maximum likelihood fitting when the number of classes is allowed to grow as the root of the network size and the average network degree grows at least poly-logarithmically in this size.

Consistency of community detection in networks under degree-corrected stochastic block models

It is found that methods based on the degree-corrected stochastic block model are consistent under a wider class of models and that modularity-type methods require parameter constraints for consistency, whereas likelihood-based methods do not.

Likelihood-based model selection for stochastic block models

An approach based on the log likelihood ratio statistic is considered and its asymptotic properties under model misspecification are analyzed, showing that the limiting distribution of the statistic is normal in the case of underfitting and obtaining its convergence rate in the case of overfitting.