Scaling up Correlation Clustering through Parallelism and Concurrency Control

Xinghao Pan and Dimitris S. Papailiopoulos
Given a similarity graph between items, correlation clustering (CC) aims to group similar items together and dissimilar ones apart. One of the most popular CC algorithms is KwikCluster: a simple peeling scheme that offers a 3-approximation ratio. Unfortunately, KwikCluster is inherently sequential and can require a large number of peeling rounds. This can be a significant bottleneck when scaling up to big graphs. Recent proposals to parallelize KwikCluster encounter challenges in scaling up…
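The peeling scheme mentioned above can be illustrated with a minimal sketch of the standard serial formulation of KwikCluster: pick a random unclustered vertex as a pivot, group it with its still-unclustered similar neighbors, peel that cluster off the graph, and repeat. The function name `kwik_cluster`, its parameters, and the edge-list input format are illustrative assumptions, not taken from the paper; this is not the parallel method the paper proposes.

```python
import random


def kwik_cluster(vertices, positive_edges, seed=None):
    """Serial pivot-based peeling (hypothetical helper, for illustration only).

    vertices: iterable of hashable item ids.
    positive_edges: iterable of (u, v) pairs marked "similar".
    Pairs not listed are treated as "dissimilar".
    """
    rng = random.Random(seed)

    # Adjacency over similar (positive) edges only.
    neighbors = {v: set() for v in vertices}
    for u, v in positive_edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    # A uniformly random permutation fixes the order in which pivots are tried.
    order = list(neighbors)
    rng.shuffle(order)

    unclustered = set(order)
    clusters = []
    for pivot in order:
        if pivot not in unclustered:
            continue  # already peeled away as a neighbor of an earlier pivot
        # One peeling round: the pivot plus its remaining similar neighbors.
        cluster = {pivot} | (neighbors[pivot] & unclustered)
        unclustered -= cluster
        clusters.append(cluster)
    return clusters


if __name__ == "__main__":
    items = [1, 2, 3, 4, 5]
    similar = [(1, 2), (2, 3), (1, 3), (4, 5)]
    print(kwik_cluster(items, similar, seed=0))
```

Each round depends on which vertices earlier rounds removed, which is why the loop is sequential: a later pivot cannot be processed until the earlier clusters have been peeled off.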

