Scaling up Correlation Clustering through Parallelism and Concurrency Control

@inproceedings{Pan2014ScalingUC,
  title={Scaling up Correlation Clustering through Parallelism and Concurrency Control},
  author={Xinghao Pan and Dimitris S. Papailiopoulos},
  year={2014}
}
Given a similarity graph between items, correlation clustering (CC) aims to group similar items together and dissimilar ones apart. One of the most popular CC algorithms is KwikCluster: a simple peeling scheme that offers a 3-approximation ratio. Unfortunately, KwikCluster is inherently sequential and can require a large number of peeling rounds. This can be a significant bottleneck when scaling up to big graphs. Recent proposals to parallelize KwikCluster encounter challenges in scaling up…
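
The peeling scheme mentioned in the abstract can be illustrated with a minimal serial sketch. This is not the authors' code: the function names `kwik_cluster` and `disagreements`, the edge-list input format, and the `seed` parameter are assumptions made for illustration. The sketch assumes the usual formulation in which each round picks a uniformly random unclustered vertex as a pivot, groups it with its still-unclustered similar neighbors, and peels that group off the graph.

```python
import random
from itertools import combinations


def kwik_cluster(vertices, positive_edges, seed=None):
    """Serial KwikCluster sketch (hypothetical helper, not the paper's code).

    vertices: iterable of hashable ids.
    positive_edges: iterable of (u, v) pairs marked "similar";
    every other pair is treated as "dissimilar".
    """
    rng = random.Random(seed)
    vertices = list(vertices)
    neighbors = {v: set() for v in vertices}
    for u, v in positive_edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    # Visiting vertices in a random permutation and skipping the ones
    # already clustered is equivalent to drawing a uniformly random
    # pivot from the remaining vertices in each peeling round.
    order = vertices[:]
    rng.shuffle(order)

    clustered = set()
    clusters = []
    for pivot in order:
        if pivot in clustered:
            continue
        # One peeling round: the pivot plus its unclustered similar neighbors.
        cluster = {pivot} | (neighbors[pivot] - clustered)
        clustered |= cluster
        clusters.append(cluster)
    return clusters


def disagreements(clusters, positive_edges):
    """Count similar pairs split apart plus dissimilar pairs grouped together."""
    positive = {frozenset(e) for e in positive_edges}
    label = {v: i for i, c in enumerate(clusters) for v in c}
    cut_positive = sum(1 for e in positive if len({label[v] for v in e}) > 1)
    joined_negative = sum(
        1
        for c in clusters
        for pair in combinations(c, 2)
        if frozenset(pair) not in positive
    )
    return cut_positive + joined_negative


if __name__ == "__main__":
    # Two disjoint triangles: any pivot order recovers them exactly.
    edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5)]
    result = kwik_cluster(range(6), edges, seed=0)
    print(sorted(sorted(c) for c in result), disagreements(result, edges))
```

Each iteration of the loop depends on which vertices earlier pivots have already claimed, which is the sequential dependency the abstract points to as the obstacle to scaling.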
