Classification and estimation in the Stochastic Block Model based on the empirical degrees

Authors: Antoine Channarond, Jean-Jacques Daudin, Stéphane Robin
Venue: arXiv: Statistics Theory
The Stochastic Block Model (Holland et al., 1983) is a mixture model for heterogeneous network data. Unlike the usual statistical framework, new nodes give additional information about the previous ones in this model. Thereby the distribution of the degrees concentrates in points conditionally on the node class. We show under a mild assumption that classification, estimation and model selection can actually be achieved with no more than the empirical degree data. We provide an algorithm able to… 
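The concentration of the degrees around class-specific values can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions, not the authors' algorithm (the abstract is cut off before describing it): it draws a two-class SBM, computes normalized degrees, and splits the nodes at the largest gap in the sorted degrees. The function names (`simulate_sbm`, `normalized_degrees`, `largest_gap_split`) and all parameter values are hypothetical choices made for this example.

```python
import random


def simulate_sbm(n, proportions, connectivity, seed=0):
    """Draw an undirected SBM graph (assumed setup, no self-loops).

    Returns the hidden class labels and the adjacency matrix as nested lists.
    """
    rng = random.Random(seed)
    classes = list(range(len(proportions)))
    labels = rng.choices(classes, weights=proportions, k=n)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            # Edge probability depends only on the classes of the two endpoints.
            if rng.random() < connectivity[labels[i]][labels[j]]:
                adj[i][j] = adj[j][i] = 1
    return labels, adj


def normalized_degrees(adj):
    """Degree of each node divided by the number of other nodes."""
    n = len(adj)
    return [sum(row) / (n - 1) for row in adj]


def largest_gap_split(values, n_classes):
    """Cluster 1-D values by cutting the sorted sequence at its largest gaps."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    # gaps[k] is the distance between the k-th and (k+1)-th sorted values.
    gaps = [(values[order[k + 1]] - values[order[k]], k)
            for k in range(len(order) - 1)]
    cuts = {k for _, k in sorted(gaps, reverse=True)[: n_classes - 1]}
    assignment = [0] * len(values)
    cluster = 0
    for rank, i in enumerate(order):
        assignment[i] = cluster
        if rank in cuts:
            cluster += 1  # the next sorted value starts a new cluster
    return assignment


if __name__ == "__main__":
    # Hypothetical parameters: two balanced classes with well-separated
    # expected normalized degrees (about 0.65 and 0.25).
    labels, adj = simulate_sbm(400, [0.5, 0.5], [[0.9, 0.4], [0.4, 0.1]])
    clusters = largest_gap_split(normalized_degrees(adj), n_classes=2)
    mismatches = sum(c != l for c, l in zip(clusters, labels))
    # Cluster labels are arbitrary, so count errors up to label swapping.
    print("misclassified nodes:", min(mismatches, len(labels) - mismatches))
```

Cluster numbers are arbitrary, so any comparison with the true classes has to be made up to a relabeling. The split relies on the classes having distinct expected degrees; when two classes share the same expected degree, degrees alone cannot separate them.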

The Highest Dimensional Stochastic Blockmodel with a Regularized Estimator

This is the first paper to explicitly introduce and demonstrate the advantages of statistical regularization in a parametric form for network analysis in the high dimensional Stochastic Block model.

Posterior Contraction Rates for Stochastic Block Models

This article undertakes a theoretical investigation of the posterior distribution of the parameters in a stochastic block model and shows that one obtains near-optimal rates of posterior contraction with routinely used multinomial-Dirichlet priors on cluster indicators and uniform or general Beta prior on the probabilities of the random edge indicators.

Convergence of the groups posterior distribution in latent or stochastic block models

This work establishes sufficient conditions for the groups posterior distribution to converge (as the size of the data increases) to a Dirac mass located at the actual (random) groups configuration.

An empirical Bayes approach to stochastic blockmodels and graphons: shrinkage estimation and model selection

A hierarchical model is proposed and a novel empirical Bayes estimate of the connectivity matrix of a stochastic block model to approximate the graphon function is developed, which introduces a new model selection criterion for choosing the number of communities.

Fast and Consistent Algorithm for the Latent Block Model

The Largest Gaps algorithm is introduced, for simultaneously clustering both rows and columns of a matrix to form homogeneous blocks, and the paper proves the procedure to be consistent under the LBM.

Bayesian Community Detection

A Bayesian estimator of the underlying class structure in the stochastic block model is introduced and it is shown that this estimator is strongly consistent when the expected degree is at least of order $\log^2{n}$, where $n$ is the number of nodes in the network.

Uniform estimation in stochastic block models is slow

This paper explicitly quantifies the empirically observed phenomenon that estimation under a stochastic block model (SBM) is hard if the model contains classes that are similar, and derives lower and upper bounds for estimation along specific submodels.

Uncertainty quantification in the stochastic block model with an unknown number of classes

The frequentist properties of Bayesian statistical inference for the stochastic block model, with an unknown number of classes of varying sizes, are studied, and credible sets are shown to be confidence sets or are enlarged to form confidence sets.

Consistency of spectral clustering in stochastic block models

It is shown that, under mild conditions, spectral clustering applied to the adjacency matrix of the network can consistently recover hidden communities even when the order of the maximum expected degree is as small as $\log n$ with $n$ the number of nodes.

Co-clustering through Latent Bloc Model: a Review

We present here model-based co-clustering methods, with a focus on the latent block model (LBM). We introduce several specifications of the LBM (standard, sparse, Bayesian) and review some



Stochastic blockmodels with growing number of classes

It is shown that the fraction of misclassified network nodes converges in probability to zero under maximum likelihood fitting when the number of classes is allowed to grow as the root of the network size and the average network degree grows at least poly-logarithmically in this size.

A mixture model for random graphs

The degree distribution and the clustering coefficient associated with this model are given, and a variational method to estimate its parameters and a model selection criterion to select the number of classes are proposed, which allows us to deal with large networks containing thousands of vertices.

Asymptotic analysis of the stochastic block model for modular networks and its algorithmic applications

This paper uses the cavity method of statistical physics to obtain an asymptotically exact analysis of the phase diagram of the stochastic block model, a commonly used generative model for social and biological networks, and develops a belief propagation algorithm for inferring functional groups or communities from the topology of the network.

Estimation and Prediction for Stochastic Blockstructures

A statistical approach to a posteriori blockmodeling for digraphs and valued digraphs is proposed. The probability model assumes that the vertices of the digraph are partitioned into several

Consistency of maximum-likelihood and variational estimators in the Stochastic Block Model

The identifiability of SBM is proved, while asymptotic properties of maximum-likelihood and variational estimators are provided, and the consistency of these estimators is settled, which is, to the best of the authors' knowledge, the first result of this type for variational estimators with random graphs.

New consistent and asymptotically normal parameter estimates for random‐graph mixture models

It is established that the overall structure of an affiliation model can be (asymptotically) caught by the description of the network in terms of its number of triads and edges, when the number n of nodes increases to ∞.

The method of moments and degree distributions for network models

Probability models on graphs are becoming increasingly important in many applications, but statistical tools for fitting such models are not yet well developed. Here we propose a general method of

Spectral clustering and the high-dimensional stochastic blockmodel

The asymptotic results in this paper are the first clustering results that allow the number of clusters in the model to grow with the number of nodes, hence the name high-dimensional.

Stochastic blockmodels: First steps

Graphical Models, Exponential Families, and Variational Inference

The variational approach provides a complementary alternative to Markov chain Monte Carlo as a general source of approximation methods for inference in large-scale statistical models.