Pseudo-likelihood methods for community detection in large sparse networks

Arash A. Amini, Aiyou Chen, Peter J. Bickel, Elizaveta Levina
Annals of Statistics
Many algorithms have been proposed for fitting network models with communities, but most of them do not scale well to large networks, and often fail on sparse networks. Here we propose a new fast pseudo-likelihood method for fitting the stochastic block model for networks, as well as a variant that allows for an arbitrary degree distribution by conditioning on degrees. We show that the algorithms perform well under a range of settings, including on very sparse networks, and illustrate on the… 
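For readers unfamiliar with the model being fitted, the following is a minimal sketch of sampling a network from a stochastic block model. It is not the pseudo-likelihood algorithm from the paper; the function name `sample_sbm` and its parameters are hypothetical, chosen only for illustration.

```python
import numpy as np

def sample_sbm(labels, P, seed=None):
    """Sample a symmetric, loop-free adjacency matrix from a stochastic block model.

    labels: length-n integer array of community labels in {0, ..., K-1}
    P: K x K symmetric matrix of edge probabilities between communities
    (Both names are illustrative assumptions, not notation from the paper.)
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    n = labels.size
    # Edge probability for every node pair, looked up from the block matrix.
    probs = np.asarray(P)[np.ix_(labels, labels)]
    # Draw only the strict upper triangle, then mirror it so A stays symmetric.
    upper = np.triu(rng.random((n, n)) < probs, k=1)
    return (upper | upper.T).astype(int)

labels = np.array([0] * 5 + [1] * 5)
A = sample_sbm(labels, [[0.5, 0.05], [0.05, 0.5]], seed=0)
```

Sparse networks correspond to entries of `P` that shrink as the number of nodes grows, which is the regime the abstract highlights.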

Fast Network Community Detection with Profile-Pseudo Likelihood Methods.

A novel likelihood-based approach that decouples row and column labels in the likelihood function, which enables a fast alternating maximization; the new method is computationally efficient, performs well for both small- and large-scale networks, and has a provable convergence guarantee.

A pseudo-likelihood approach to community detection in weighted networks

It is proved that the estimates obtained by the proposed pseudo-likelihood community estimation algorithm are consistent under the assumption of homogeneous networks, a weighted analogue of the planted partition model, and it is shown that they work well in practice for both homogeneous and heterogeneous networks.

Detecting Overlapping Communities in Networks Using Spectral Methods

An efficient spectral algorithm for estimating the community memberships is developed, which deals with the overlaps by employing the K-medians algorithm rather than the usual K-means for clustering in the spectral domain.

Spectral redemption in clustering sparse networks

A way of encoding sparse data using a “nonbacktracking” matrix is introduced, and it is shown that the corresponding spectral algorithm performs optimally for some popular generative models, including the stochastic block model.
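As a rough illustration of the encoding mentioned above, here is a minimal sketch that builds the non-backtracking matrix under one common convention: rows and columns are indexed by directed edges, and the entry for (u→v, w→x) is 1 when v = w and x ≠ u. The function name and the dense representation are assumptions made for clarity; a practical implementation would use sparse storage.

```python
import numpy as np

def nonbacktracking_matrix(edges):
    """Build the non-backtracking matrix of an undirected graph.

    edges: iterable of (u, v) pairs, each undirected edge listed once.
    Returns (B, directed), where directed[i] is the directed edge for row i.
    """
    edges = list(edges)
    # Each undirected edge contributes two directed edges.
    directed = [(u, v) for u, v in edges] + [(v, u) for u, v in edges]
    m = len(directed)
    B = np.zeros((m, m), dtype=int)
    for i, (u, v) in enumerate(directed):
        for j, (w, x) in enumerate(directed):
            # u->v may be followed by w->x if they chain (v == w)
            # and the walk does not immediately return (x != u).
            if v == w and x != u:
                B[i, j] = 1
    return B, directed

B, directed = nonbacktracking_matrix([(0, 1), (1, 2), (0, 2)])
```

Spectral methods of this kind then work with the leading eigenvectors of `B` rather than those of the adjacency matrix.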

Distribution-Free Models for Community Detection

This paper develops an efficient spectral algorithm to fit a Distribution-Free Model (DFM) for networks in which nodes are partitioned into different communities, and introduces a noise matrix to show that the proposed algorithm stably yields consistent community detection under DFM.

Multiple Hypothesis Testing To Estimate The Number of Communities in Sparse Stochastic Block Models

Network-based clustering methods frequently require the number of communities to be specified a priori. Moreover, most of the existing methods for estimating the number of communities assume the

Covariate Regularized Community Detection in Sparse Graphs

This article examines sparse networks in conjunction with finite dimensional sub-Gaussian mixtures as covariates under moderate separation conditions and proposes a simple optimization framework which improves clustering accuracy when the two sources carry partial information about the cluster memberships, and hence perform poorly on their own.

Estimating the number of communities in networks by spectral methods

This work proposes a simple and very fast method for estimating the number of communities based on the spectral properties of certain graph operators, such as the non-backtracking matrix and the Bethe Hessian matrix, which performs well under several models and a wide range of parameters.
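A minimal sketch of one estimator in this spirit: form the Bethe Hessian H(r) = (r² − 1)I − rA + D and count its negative eigenvalues. The choice r = √(mean degree) and the function name are assumptions made for this example; the cited work may use other choices of r or additional corrections.

```python
import numpy as np

def estimate_num_communities(A):
    """Count negative eigenvalues of the Bethe Hessian of adjacency matrix A."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    degrees = A.sum(axis=1)
    # Assumed choice of r: square root of the average degree.
    r = np.sqrt(degrees.mean())
    H = (r**2 - 1) * np.eye(n) - r * A + np.diag(degrees)
    # H is symmetric, so eigvalsh returns real eigenvalues.
    eigenvalues = np.linalg.eigvalsh(H)
    return int(np.sum(eigenvalues < 0))

# Two disjoint cliques of 10 nodes each: an idealized two-community network.
clique = np.ones((10, 10)) - np.eye(10)
A = np.block([[clique, np.zeros((10, 10))],
              [np.zeros((10, 10)), clique]])
k_hat = estimate_num_communities(A)  # → 2
```

The dense eigendecomposition is used only for readability; for large sparse networks one would compute just the smallest few eigenvalues with a sparse solver.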

Determining the Number of Communities in Degree-corrected Stochastic Block Models

This work introduces a method that combines spectral clustering with binary segmentation and guarantees an upper bound for the pseudo likelihood ratio statistic when the model is over-fitted, and establishes the consistency of the estimator for the true number of communities.

A survey on theoretical advances of community detection in networks

A survey on the recent theoretical advances of community detection, including graph cut methods, profile likelihoods, the pseudo‐likelihood method, the variational method, belief propagation, spectral clustering, and semidefinite relaxations of the stochastic blockmodel.



An efficient and principled method for detecting communities in networks

This work describes a method for finding overlapping communities based on a principled statistical approach using generative network models and shows how the method can be implemented using a fast, closed-form expectation-maximization algorithm that allows us to analyze networks of millions of nodes in reasonable running times.

Spectral clustering and the high-dimensional stochastic blockmodel

The asymptotic results in this paper are the first clustering results that allow the number of clusters in the model to grow with the number of nodes, hence the name high-dimensional.

Stochastic blockmodels and community structure in networks

  • B. Karrer, M. Newman
  • Computer Science
    Physical review. E, Statistical, nonlinear, and soft matter physics
  • 2011
This work demonstrates how the generalization of blockmodels to incorporate this missing element leads to an improved objective function for community detection in complex networks and proposes a heuristic algorithm for community detection using this objective function or its non-degree-corrected counterpart.

Null models for network data

This work shows how the logistic-linear model and the implicit log-linear model may be viewed as instances of a broader class of null models, with the property that all members of this class give rise to essentially the same likelihood-based estimates of link probabilities in sparse graph regimes.

Consistency of community detection in networks under degree-corrected stochastic block models

It is found that methods based on the degree-corrected stochastic block model are consistent under a wider class of models and that modularity-type methods require parameter constraints for consistency, whereas likelihood-based methods do not.
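To make "modularity-type methods" concrete, here is a minimal sketch of the standard Newman-Girvan modularity score, Q = (1/2m) Σ_ij (A_ij − k_i k_j / 2m) δ(c_i, c_j), for a given partition. The function name is hypothetical, and this only evaluates the criterion for fixed labels; it does not search over partitions.

```python
import numpy as np

def modularity(A, labels):
    """Newman-Girvan modularity of a partition of an undirected graph."""
    A = np.asarray(A, dtype=float)
    labels = np.asarray(labels)
    degrees = A.sum(axis=1)
    two_m = degrees.sum()  # twice the number of edges
    # Indicator of node pairs assigned to the same community.
    same = labels[:, None] == labels[None, :]
    # Observed minus expected edge weight, summed over same-community pairs.
    return float(((A - np.outer(degrees, degrees) / two_m) * same).sum() / two_m)

# Two disjoint triangles, each treated as its own community.
triangle = np.ones((3, 3)) - np.eye(3)
A = np.block([[triangle, np.zeros((3, 3))],
              [np.zeros((3, 3)), triangle]])
Q = modularity(A, [0, 0, 0, 1, 1, 1])  # → 0.5
```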

Detecting community structure in networks

A number of more recent algorithms that appear to work well with real-world network data, including algorithms based on edge betweenness scores, on counts of short loops in networks, and on voltage differences in resistor networks, are described.

Asymptotic analysis of the stochastic block model for modular networks and its algorithmic applications

This paper uses the cavity method of statistical physics to obtain an asymptotically exact analysis of the phase diagram of the stochastic block model, a commonly used generative model for social and biological networks, and develops a belief propagation algorithm for inferring functional groups or communities from the topology of the network.

Mixture models and exploratory analysis in networks

A general technique for detecting structural features in large-scale network data that works by dividing the nodes of a network into classes such that the members of each class have similar patterns of connection to other nodes is described.

Uncovering latent structure in valued graphs: A variational approach

This work presents a model-based strategy to uncover groups of nodes in valued graphs that can be used for a wide span of parametric random graph models and allows covariates to be included.

Finding and evaluating community structure in networks.

  • M. Newman, M. Girvan
  • Computer Science
    Physical review. E, Statistical, nonlinear, and soft matter physics
  • 2004
It is demonstrated that the algorithms proposed are highly effective at discovering community structure in both computer-generated and real-world network data, and can be used to shed light on the sometimes dauntingly complex structure of networked systems.