Scalable Community Detection via Parallel Correlation Clustering

@article{Shi2021ScalableCD,
  title={Scalable Community Detection via Parallel Correlation Clustering},
  author={Jessica Shi and Laxman Dhulipala and David Eisenstat and Jakub Lacki and Vahab S. Mirrokni},
  journal={Proc. VLDB Endow.},
  year={2021},
  volume={14},
  pages={2305--2313}
}
Graph clustering and community detection are central problems in modern data mining. The increasing need for analyzing billion-scale data calls for faster and more scalable algorithms for these problems. There are certain trade-offs between the quality and speed of such clustering algorithms. In this paper, we design scalable algorithms that achieve high quality when evaluated based on ground truth. We develop a generalized sequential and shared-memory parallel framework based on the LAMBDACC… 

Faster Deterministic Approximation Algorithms for Correlation Clustering and Cluster Deletion

This paper proves new relationships between correlation clustering problems and edge labeling problems related to the principle of strong triadic closure, and develops faster techniques that are purely combinatorial, based on computing maximal matchings in certain auxiliary graphs and hypergraphs.

Hierarchical Agglomerative Graph Clustering in Poly-Logarithmic Depth

It is shown that ParHAC obtains a 50.1x speedup on average over the best sequential baseline, while achieving quality similar to the exact HAC algorithm, and can cluster one of the largest publicly available graph datasets with 124 billion edges in a little over three hours using a commodity multicore machine.

Almost 3-Approximate Correlation Clustering in Constant Rounds

This work builds on the work of Yoshida, Yamamoto, and Ito on bounding the “query complexity” of greedy maximal independent set in analyzing the approximation ratio of any algorithm, and introduces a simple $O(1/\varepsilon)$-round parallel algorithm.

Sublinear Time and Space Algorithms for Correlation Clustering via Sparse-Dense Decompositions

A new approach for solving (minimum disagreement) correlation clustering that results in sublinear algorithms with highly efficient time and space complexity for this problem is presented, with a novel connection to sparse-dense graph decompositions that are used extensively in the graph coloring literature.

Correlation Clustering via Strong Triadic Closure Labeling: Fast Approximation Algorithms and Practical Lower Bounds

This work presents faster approximation algorithms that avoid linear programming relaxations, for two well-studied special cases: cluster editing and cluster deletion, by drawing new connections to edge labeling problems related to the principle of strong triadic closure.

Constant Approximation for Normalized Modularity and Associations Clustering

A linear time constant-approximate algorithm is given for the objective of graph clustering, which implies the first constant-factor approximation algorithms for normalized modularity and normalized associations.

References

Engineering Parallel Algorithms for Community Detection in Massive Networks

This work designs and implements efficient parallel community detection heuristics: the first large-scale parallelization of the well-known Louvain method, an extension of the method adding refinement, and an ensemble scheme combining the above.

Scalable Multi-threaded Community Detection in Social Networks

This work improves the performance of the authors' parallel community detection algorithm by 20% on the massively multithreaded Cray XMT, evaluates its performance on the next-generation Cray XMT2, and extends its reach to Intel-based platforms with OpenMP.

Scalable static and dynamic community detection using Grappolo

This work presents several parallelization heuristics for fast community detection using the Louvain method as the serial template and implements them in a software library called Grappolo, which is used on static graphs as the first step towards community detection on streaming graphs.

Parallel Heuristics for Scalable Community Detection

Compared to the serial Louvain implementation, the parallel implementation is able to produce community outputs with a higher modularity for most of the inputs tested, in a comparable number of iterations, while providing real speedups of up to 8× using 32 threads.

High quality, scalable and parallel community detection for large real graphs

Scalable Community Detection is proposed, a novel disjoint community detection algorithm that is able to run up to two orders of magnitude faster than practical existing solutions by exploiting the parallelism of current multi-core processors, enabling us to process graphs of unprecedented size in short execution times.

Parallel Modularity-Based Community Detection on Large-Scale Graphs

This work designs a parallel hierarchical graph clustering algorithm that uses modularity as the clustering criterion to effectively extract community structures in large graphs of different types by investigating graph partitioning and distribution schemes on distributed memory architectures and conducting clustering in a divide-and-conquer manner.

Scalable Community Detection with the Louvain Algorithm

This paper presents and evaluates a parallel community detection algorithm derived from the state-of-the-art Louvain modularity maximization method, which is able to parallelize graphs with up to 138 billion edges on 8,192 Blue Gene/Q nodes and 1,024 P7-IH nodes.

A Correlation Clustering Framework for Community Detection

This paper introduces a new community detection framework called LambdaCC that is based on a specially weighted version of correlation clustering, and shows that, by increasing a single resolution parameter λ, its objective effectively interpolates between two different strategies in graph clustering: finding a sparse cut and forming dense subgraphs.

Distributed Louvain Algorithm for Graph Community Detection

The design of a distributed memory implementation of the Louvain algorithm for parallel community detection is presented, which begins with an arbitrarily partitioned distributed graph input, and employs several heuristics to speed up the computation of the different steps of the Louvain algorithm.

Adaptive parallel Louvain community detection on a multicore platform