# Scalable Community Detection via Parallel Correlation Clustering

@article{Shi2021ScalableCD,
title={Scalable Community Detection via Parallel Correlation Clustering},
author={Jessica Shi and Laxman Dhulipala and David Eisenstat and Jakub Lacki and Vahab S. Mirrokni},
journal={Proc. VLDB Endow.},
year={2021},
volume={14},
pages={2305-2313}
}
• Published 1 July 2021
• Computer Science
• Proc. VLDB Endow.
Graph clustering and community detection are central problems in modern data mining. The increasing need for analyzing billion-scale data calls for faster and more scalable algorithms for these problems. There are certain trade-offs between the quality and speed of such clustering algorithms. In this paper, we design scalable algorithms that achieve high quality when evaluated based on ground truth. We develop a generalized sequential and shared-memory parallel framework based on the LAMBDACC…
6 Citations


This paper proves new relationships between correlation clustering problems and edge labeling problems related to the principle of strong triadic closure, and develops faster techniques that are purely combinatorial, based on computing maximal matchings in certain auxiliary graphs and hypergraphs.
• Computer Science
ArXiv
• 2022
It is shown that ParHAC obtains a 50.1x speedup on average over the best sequential baseline, while achieving quality similar to the exact HAC algorithm, and can cluster one of the largest publicly available graph datasets with 124 billion edges in a little over three hours using a commodity multicore machine.
• Computer Science
2022 IEEE 63rd Annual Symposium on Foundations of Computer Science (FOCS)
• 2022
This work builds on the work of Yoshida, Yamamoto, and Ito on bounding the “query complexity” of greedy maximal independent set in analyzing the approximation ratio of any algorithm, and introduces a simple $O(1/\varepsilon)$-round parallel algorithm.
• Computer Science
ITCS
• 2022
A new approach for solving (minimum disagreement) correlation clustering that results in sublinear algorithms with highly efficient time and space complexity for this problem is presented, with a novel connection to sparse-dense graph decompositions that are used extensively in the graph coloring literature.
This work presents faster approximation algorithms that avoid linear programming relaxations for two well-studied special cases, cluster editing and cluster deletion, by drawing new connections to edge labeling problems related to the principle of strong triadic closure.
• Computer Science, Mathematics
ArXiv
• 2022
A linear time constant-approximate algorithm is given for the objective of graph clustering, which implies the first constant-factor approximation algorithms for normalized modularity and normalized associations.

## References


• Computer Science
IEEE Transactions on Parallel and Distributed Systems
• 2016
This work designs and implements efficient parallel community detection heuristics: the first large-scale parallelization of the well-known Louvain method, an extension of the method that adds refinement, and an ensemble scheme combining the two.
• Computer Science
2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum
• 2012
This work improves the performance of the authors' parallel community detection algorithm by 20% on the massively multithreaded Cray XMT, evaluates its performance on the next-generation Cray XMT2, and extends its reach to Intel-based platforms with OpenMP.
• Computer Science
2017 IEEE High Performance Extreme Computing Conference (HPEC)
• 2017
This work presents several parallelization heuristics for fast community detection using the Louvain method as the serial template and implements them in a software library called Grappolo, which is used on static graphs as the first step towards community detection on streaming graphs.
• Computer Science
2014 IEEE International Parallel & Distributed Processing Symposium Workshops
• 2014
Compared to the serial Louvain implementation, the parallel implementation is able to produce community outputs with a higher modularity for most of the inputs tested, in a comparable number of iterations, while providing real speedups of up to 8× using 32 threads.
• Computer Science
WWW
• 2014
Scalable Community Detection is proposed, a novel disjoint community detection algorithm that is able to run up to two orders of magnitude faster than practical existing solutions by exploiting the parallelism of current multi-core processors, enabling us to process graphs of unprecedented size in short execution times.
• Computer Science
2015 IEEE International Conference on Cluster Computing
• 2015
This work designs a parallel hierarchical graph clustering algorithm that uses modularity as the clustering criterion to effectively extract community structures in large graphs of different types by investigating graph partitioning and distribution schemes on distributed memory architectures and conducting clustering in a divide-and-conquer manner.
• Computer Science
2015 IEEE International Parallel and Distributed Processing Symposium
• 2015
This paper presents and evaluates a parallel community detection algorithm derived from the state-of-the-art Louvain modularity maximization method, which is able to parallelize graphs with up to 138 billion edges on 8,192 Blue Gene/Q nodes and 1,024 P7-IH nodes.
• Computer Science
WWW
• 2018
This paper introduces a new community detection framework called LambdaCC that is based on a specially weighted version of correlation clustering, and shows that, by increasing this parameter, its objective effectively interpolates between two different strategies in graph clustering: finding a sparse cut and forming dense subgraphs.
• Computer Science
2018 IEEE International Parallel and Distributed Processing Symposium (IPDPS)
• 2018
The design of a distributed memory implementation of the Louvain algorithm for parallel community detection is presented, which begins with an arbitrarily partitioned distributed graph input, and employs several heuristics to speed up the computation of the different steps of the Louvain algorithm.
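
The LambdaCC entry above describes a weighted correlation clustering objective whose resolution parameter trades off sparse cuts against dense subgraphs. The sketch below is one way to illustrate that trade-off. It assumes the standard unweighted variant with unit node weights, in which separating the endpoints of an edge costs `1 - lam` and placing a non-adjacent pair in the same cluster costs `lam`. The function name `lambdacc_disagreement` and the toy graph are hypothetical and are not taken from any of the listed papers.

```python
from itertools import combinations


def lambdacc_disagreement(nodes, edges, labels, lam):
    """Disagreement cost of a clustering under an assumed unit-weight LambdaCC objective.

    nodes  -- iterable of node ids
    edges  -- iterable of undirected (u, v) pairs
    labels -- dict mapping node id to cluster id
    lam    -- resolution parameter, assumed to lie in (0, 1)
    """
    edge_set = {frozenset(e) for e in edges}
    cost = 0.0
    for u, v in combinations(list(nodes), 2):
        same_cluster = labels[u] == labels[v]
        is_edge = frozenset((u, v)) in edge_set
        if is_edge and not same_cluster:
            cost += 1.0 - lam  # an edge cut by the clustering
        elif not is_edge and same_cluster:
            cost += lam  # a non-adjacent pair placed together
    return cost


# Toy graph: two triangles joined by the single edge (2, 3).
nodes = range(6)
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]

one_cluster = {v: 0 for v in nodes}
two_triangles = {v: 0 if v < 3 else 1 for v in nodes}

for lam in (0.05, 0.5):
    print(lam,
          lambdacc_disagreement(nodes, edges, one_cluster, lam),
          lambdacc_disagreement(nodes, edges, two_triangles, lam))
```

On this toy graph, a small `lam` makes the single large cluster cheaper, while a larger `lam` favors splitting into the two triangles. The quadratic loop over all node pairs is only for readability and would not scale to the graph sizes discussed in these papers.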