# Parallel Correlation Clustering on Big Graphs

@inproceedings{Pan2015ParallelCC,
  title={Parallel Correlation Clustering on Big Graphs},
  author={Xinghao Pan and Dimitris Papailiopoulos and Samet Oymak and Benjamin Recht and Kannan Ramchandran and Michael I. Jordan},
  booktitle={NIPS},
  year={2015}
}

Given a similarity graph between items, correlation clustering (CC) groups similar items together and dissimilar ones apart. One of the most popular CC algorithms is KwikCluster: an algorithm that serially clusters neighborhoods of vertices, and obtains a 3-approximation ratio. Unfortunately, KwikCluster in practice requires a large number of clustering rounds, a potential bottleneck for large graphs.
We present C4 and ClusterWild!, two algorithms for parallel correlation clustering that run…
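
The serial pivot procedure that the abstract attributes to KwikCluster can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes the graph is given as a list of positive (similar) edges with every other pair treated as dissimilar, and the names `kwik_cluster`, `positive_edges`, and `seed` are hypothetical.

```python
import random


def kwik_cluster(vertices, positive_edges, seed=None):
    """Sketch of serial pivot-based clustering (assumed input format).

    vertices: iterable of hashable vertex ids.
    positive_edges: iterable of (u, v) pairs marked as similar.
    Returns a list of clusters, each a set of vertices.
    """
    rng = random.Random(seed)

    # Adjacency over positive edges only.
    neighbors = {v: set() for v in vertices}
    for u, v in positive_edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    # Visit vertices in a uniformly random order; each unclustered
    # vertex reached becomes the pivot of one clustering round.
    order = list(neighbors)
    rng.shuffle(order)

    clustered = set()
    clusters = []
    for pivot in order:
        if pivot in clustered:
            continue
        # The pivot takes all of its still-unclustered positive neighbors.
        cluster = {pivot} | (neighbors[pivot] - clustered)
        clustered |= cluster
        clusters.append(cluster)
    return clusters


if __name__ == "__main__":
    # Two disjoint triangles of similar items.
    edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]
    print(kwik_cluster(range(6), edges, seed=0))
```

Each pass through the loop that selects a new pivot corresponds to one clustering round, so a sparse graph with many small neighborhoods needs many rounds, which is the serial bottleneck the abstract points to.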

#### 55 Citations

Faster Deterministic Approximation Algorithms for Correlation Clustering and Cluster Deletion

- Computer Science
- ArXiv
- 2021

This paper proves new relationships between correlation clustering problems and edge labeling problems related to the principle of strong triadic closure, and develops faster techniques that are purely combinatorial, based on computing maximal matchings in certain auxiliary graphs and hypergraphs.

Correlation Clustering in Constant Many Parallel Rounds

- Computer Science
- ICML
- 2021

This work proposes a massively parallel computation (MPC) algorithm for correlation clustering that is considerably faster than prior work and is the first that can provably approximate a clustering problem on graphs using only a constant number of MPC rounds in the sublinear memory regime.

Scalable Community Detection via Parallel Correlation Clustering

- Computer Science
- Proc. VLDB Endow.
- 2021

This paper develops a generalized sequential and shared-memory parallel framework based on the LAMBDACC objective, which encompasses modularity and correlation clustering, and shows that this framework improves the state-of-the-art trade-offs between speed and quality of scalable community detection.

Motif and Hypergraph Correlation Clustering

- Computer Science
- IEEE Transactions on Information Theory
- 2020

This work introduces several variants of motif correlation clustering and then describes polynomial-time clustering algorithms that provide constant approximation guarantees for the problems at hand, and shows that these clustering problems are NP-hard.

Edge partitioning of large graphs

- Computer Science
- 2017

A vertex-cut approach is applied, instead of the traditional edge-cut method, to balance the workload in distributed graph processing, and both communication overhead and runtime are greatly reduced compared to existing approaches.

A Distributed GPU-based Correlation Clustering Algorithm for Large-scale Signed Social Networks

- 2017

When applied to signed networks, the Correlation Clustering (CC) problem is an important tool for studying how balanced a social group is and whether this group might evolve to a possible…

Correlation Clustering in Data Streams

- Computer Science
- ICML
- 2015

This work develops data structures based on linear sketches that allow the "quality" of a given node-partition to be measured, and presents space-efficient algorithms for the convex programming required, as well as approaches to reduce the adaptivity of the sampling.

Correlation Clustering and Biclustering With Locally Bounded Errors

- Computer Science, Mathematics
- IEEE Transactions on Information Theory
- 2018

This work provides a rounding algorithm which converts “fractional clusterings” into discrete clusterings while causing only a constant-factor blowup in the number of errors at each vertex.

Parallelism in Randomized Incremental Algorithms

- Computer Science, Mathematics
- SPAA
- 2016

It is shown that most sequential randomized incremental algorithms are in fact parallel, and three types of dependences found in the algorithms studied are identified and a framework for analyzing each type of algorithm is presented.

Parallelism in Randomized Incremental Algorithms

- Computer Science
- J. ACM
- 2020

This article shows the first incremental Delaunay triangulation algorithm with optimal work and polylogarithmic depth, and identifies three types of algorithms based on their dependencies and presents a framework for analyzing each type.

#### References

Correlation clustering in general weighted graphs

- Computer Science, Mathematics
- Theor. Comput. Sci.
- 2006

An O(log n)-approximation algorithm is given for the general case based on a linear-programming rounding and the "region-growing" technique for K_{r,r}-minor-free graphs and it is proved that this linear program has a gap of Ω(log n), and therefore the approximation is tight under this approach.

Correlation clustering with a fixed number of clusters

- Computer Science, Mathematics
- SODA '06
- 2006

This paper focuses on the situation when the number of clusters is stipulated to be a small constant k, and finds that for every k, there is a polynomial time approximation scheme for both maximizing agreements and minimizing disagreements.

Correlation clustering in MapReduce

- Computer Science
- KDD
- 2014

This paper obtains a new algorithm for correlation clustering that is easily implementable in computational models such as MapReduce and streaming, and runs in a small number of rounds.

Correlation clustering

- Mathematics, Computer Science
- The 43rd Annual IEEE Symposium on Foundations of Computer Science, 2002. Proceedings.
- 2002

This formulation is motivated from a document clustering problem in which one has a pairwise similarity function f learned from past data, and the goal is to partition the current set of documents in a way that correlates with f as much as possible; it can also be viewed as a kind of "agnostic learning" problem.

Near Optimal LP Rounding Algorithm for Correlation Clustering on Complete and Complete k-partite Graphs

- Computer Science, Mathematics
- STOC
- 2015

These results improve a long line of work on approximation algorithms for correlation clustering in complete graphs, previously culminating in a ratio of 2.5 by Ailon, Charikar and Newman.

Layered label propagation: a multiresolution coordinate-free ordering for compressing social networks

- Computer Science, Physics
- WWW
- 2011

Experiments performed show that combining the order produced by the proposed algorithm with the WebGraph compression framework provides a major increase in compression with respect to all currently known techniques, both on web graphs and on social networks.

Correlation clustering: from theory to practice

- Computer Science, Mathematics
- KDD
- 2014

This tutorial aims to show how correlation clustering can be a powerful addition to the toolkit of the data mining researcher and practitioner, and to encourage discussions and further research in the area.

Greedy sequential maximal independent set and matching are parallel on average

- Mathematics, Computer Science
- SPAA '12
- 2012

It is shown that for any graph, and for a random ordering of the vertices, the dependence length of the sequential greedy MIS algorithm is polylogarithmic (O(log^2 n) with high probability).

Correlation Clustering: maximizing agreements via semidefinite programming

- Mathematics, Computer Science
- SODA '04
- 2004

This work gives a 0.7666-approximation algorithm for maximizing agreements on any graph, even when the edges have non-negative weights (along with labels) and the goal is to maximize the weight of agreements.

Clustering with qualitative information

- Computer Science, Mathematics
- J. Comput. Syst. Sci.
- 2005

A factor 4 approximation for minimization on complete graphs and a factor O(log n) approximation for general graphs are demonstrated, and the APX-hardness of minimization on complete graphs is proved.