Evaluating Overfit and Underfit in Models of Network Community Structure

@article{Ghasemian2020EvaluatingOA,
  title={Evaluating Overfit and Underfit in Models of Network Community Structure},
  author={Amir Ghasemian and Homa Hosseinmardi and Aaron Clauset},
  journal={IEEE Transactions on Knowledge and Data Engineering},
  year={2020},
  volume={32},
  pages={1722--1735}
}
A common graph mining task is community detection, which seeks an unsupervised decomposition of a network into groups based on statistical regularities in network connectivity. Although many such algorithms exist, community detection's No Free Lunch theorem implies that no algorithm can be optimal across all inputs. However, little is known in practice about how different algorithms overfit or underfit real networks, or how to reliably assess such behavior across algorithms. Here, we present a…

Community Detection in Bipartite Networks with Stochastic Blockmodels
TLDR
This work introduces a Bayesian nonparametric formulation of the stochastic block model and a corresponding algorithm to efficiently find communities in bipartite networks which parsimoniously chooses the number of communities, and expands the understanding of the complicated optimization landscape associated with community detection tasks.
Discovering Communities of Community Discovery
  • M. Coscia
  • Computer Science
    2019 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM)
  • 2019
TLDR
This paper creates an Algorithm Similarity Network (ASN), whose nodes are the community detection approaches, connected if they return similar groupings, and discovers that the ASN contains well-separated groups, making it a sensible tool for practitioners, aiding their choice of algorithms fitting their analytic needs.
Community Detection in Dynamic Networks: Equivalence Between Stochastic Blockmodels and Evolutionary Spectral Clustering
TLDR
This paper introduces a novel dynamic SBM where the evolution of communities over time is modeled with pairwise Markov random fields and shows the equivalence of evolutionary spectral clustering to a variant of dynamic stochastic blockmodel.
On community structure in complex networks: challenges and opportunities
TLDR
This work focuses on generative models of communities in complex networks and their role in developing a strong foundation for community detection algorithms, and introduces deterministic strategies that have proven to be very efficient in controlling epidemic outbreaks, but require complete knowledge of the network.
Community structure: A comparative evaluation of community detection methods
TLDR
This paper provides comprehensive analyses on computation time, community size distribution, a comparative evaluation of methods according to their optimization schemes as well as a comparison of their partitioning strategy through validation metrics, and proposes ways to classify community detection methods.
Revealing consensus and dissensus between network partitions
TLDR
This work provides a comprehensive set of methods designed to characterize and summarize complex populations of partitions in a manner that captures not only the existing consensus, but also the dissensus between elements of the population.
Using bootstrap procedures for testing the modular partition inferred via leading eigenvector community detection algorithm
TLDR
An adapted bootstrap-based procedure based on Shimodaira’s multiscale bootstrap algorithm to derive approximately unbiased p-values for the module partitions of observations datasets is proposed.
Descriptive vs. inferential community detection: pitfalls, myths and half-truths
TLDR
It is argued that inferential methods are more typically aligned with clearer scientific questions, yield more robust results, and should be in many cases preferred.
Estimating the Similarity of Community Detection Methods Based on Cluster Size Distribution
TLDR
This paper proposes a novel approach to estimate the similarity between community detection methods using the size density distributions of the communities that they detect, and shows that there is a very clear distinction between the partitioning strategies of different community detection methods.

References

SHOWING 1-10 OF 78 REFERENCES
Learning Latent Block Structure in Weighted Networks
TLDR
This model learns from both the presence and weight of edges, allowing it to discover structure that would otherwise be hidden when weights are discarded or thresholded, and a Bayesian variational algorithm is described for efficiently approximating this model's posterior distribution over latent block structures.
Hypothesis testing for automated community detection in networks
TLDR
This work theoretically establishes the limiting distribution of the principal eigenvalue of the suitably centred and scaled adjacency matrix and uses that distribution for the test of the hypothesis that a random graph is of Erdős–Rényi (noise) type, and designs a recursive bipartitioning algorithm, which naturally uncovers nested community structure.
Finding Statistically Significant Communities in Networks
TLDR
OSLOM (Order Statistics Local Optimization Method), the first method capable of detecting clusters in networks while accounting for edge directions, edge weights, overlapping communities, hierarchies and community dynamics, is presented.
Finding and evaluating community structure in networks.
  • M. Newman, M. Girvan
  • Computer Science
    Physical Review E, Statistical, Nonlinear, and Soft Matter Physics
  • 2004
TLDR
It is demonstrated that the algorithms proposed are highly effective at discovering community structure in both computer-generated and real-world network data, and can be used to shed light on the sometimes dauntingly complex structure of networked systems.
Statistical properties of community structure in large social and information networks
TLDR
It is found that a generative model, in which new edges are added via an iterative "forest fire" burning process, is able to produce graphs exhibiting a network community structure similar to that observed in nearly every network dataset examined.
Spectral redemption in clustering sparse networks
TLDR
A way of encoding sparse data using a “nonbacktracking” matrix is introduced, and it is shown that the corresponding spectral algorithm performs optimally for some popular generative models, including the stochastic block model.
node2vec: Scalable Feature Learning for Networks
TLDR
In node2vec, an algorithmic framework for learning continuous feature representations for nodes in networks, a flexible notion of a node's network neighborhood is defined and a biased random walk procedure is designed, which efficiently explores diverse neighborhoods.
Community detection in networks: Structural communities versus ground truth
TLDR
It is shown that traditional community detection methods fail to find the metadata groups in many large networks, and that either the current modeling of community structure has to be substantially modified, or that metadata groups may not be recoverable from topology alone.
The ground truth about metadata and community detection in networks
TLDR
It is proved that no algorithm can uniquely solve community detection, and a general No Free Lunch theorem for community detection is proved, which implies that there can be no algorithm that is optimal for all possible community detection tasks.
Hierarchical block structures and high-resolution model selection in large networks
TLDR
A nested generative model is constructed that, through a complete description of the entire network hierarchy at multiple scales, enables the detection of modular structure at levels far beyond those possible with current approaches, and is based on the principle of parsimony.