Corpus ID: 1928684

Parallel Correlation Clustering on Big Graphs

@inproceedings{Pan2015ParallelCC,
  title={Parallel Correlation Clustering on Big Graphs},
  author={Xinghao Pan and Dimitris Papailiopoulos and Samet Oymak and Benjamin Recht and Kannan Ramchandran and Michael I. Jordan},
  booktitle={NIPS},
  year={2015}
}
Given a similarity graph between items, correlation clustering (CC) groups similar items together and dissimilar ones apart. One of the most popular CC algorithms is KwikCluster: an algorithm that serially clusters neighborhoods of vertices and achieves a 3-approximation ratio. Unfortunately, KwikCluster in practice requires a large number of clustering rounds, a potential bottleneck for large graphs. We present C4 and ClusterWild!, two algorithms for parallel correlation clustering that run…
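KwikCluster, as described above, repeatedly picks an unclustered vertex as a pivot and clusters it together with its still-unclustered '+' neighbors. The following is a minimal Python sketch of that serial pivot loop, assuming an unweighted complete graph given as a dict `positive` that maps each vertex to its set of '+' neighbors (every other pair is treated as '-'). The function name `kwikcluster` and its parameters are illustrative and are not taken from the paper's code.

```python
import random
from typing import Dict, Hashable, Iterable, List, Set


def kwikcluster(vertices: Iterable[Hashable],
                positive: Dict[Hashable, Set[Hashable]],
                seed: int = 0) -> List[Set[Hashable]]:
    """Serial pivot-based KwikCluster (illustrative sketch).

    Assumes a complete signed graph: positive[v] holds the '+' neighbors
    of v, and every pair not listed there is a '-' edge.
    """
    rng = random.Random(seed)
    order = list(vertices)
    rng.shuffle(order)            # one random permutation fixes the pivot order
    unclustered = set(order)
    clusters: List[Set[Hashable]] = []
    for pivot in order:           # serial: each pivot depends on all earlier ones
        if pivot not in unclustered:
            continue              # already absorbed by an earlier pivot
        # The pivot takes every '+' neighbor that is still unclustered.
        cluster = {pivot} | (positive.get(pivot, set()) & unclustered)
        unclustered -= cluster
        clusters.append(cluster)
    return clusters
```

Scanning one random permutation and skipping already-clustered vertices is equivalent to drawing a uniformly random unclustered pivot at each step. The loop also makes the serial bottleneck visible: whether a vertex becomes a pivot depends on every pivot chosen before it.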
Faster Deterministic Approximation Algorithms for Correlation Clustering and Cluster Deletion
TLDR
This paper proves new relationships between correlation clustering problems and edge labeling problems related to the principle of strong triadic closure, and develops faster techniques that are purely combinatorial, based on computing maximal matchings in certain auxiliary graphs and hypergraphs.
Correlation Clustering in Constant Many Parallel Rounds
TLDR
This work proposes a massively parallel computation (MPC) algorithm for correlation clustering that is considerably faster than prior work and is the first that can provably approximate a clustering problem on graphs using only a constant number of MPC rounds in the sublinear memory regime.
Scalable Community Detection via Parallel Correlation Clustering
TLDR
This paper develops a generalized sequential and shared-memory parallel framework based on the LambdaCC objective, which encompasses modularity and correlation clustering, and shows that this framework improves the state-of-the-art trade-offs between speed and quality of scalable community detection.
Motif and Hypergraph Correlation Clustering
TLDR
This work introduces several variants of motif correlation clustering, shows that these clustering problems are NP-hard, and describes polynomial-time clustering algorithms that provide constant approximation guarantees for the problems at hand.
Edge partitioning of large graphs
TLDR
A novel vertex-cut approach is applied, instead of the traditional edge-cut method, to achieve a balanced workload in distributed graph processing, and both communication and runtime overheads are greatly reduced compared with existing approaches.
A Distributed GPU-based Correlation Clustering Algorithm for Large-scale Signed Social Networks
When applied to signed networks, the Correlation Clustering (CC) problem is an important tool for studying how balanced a social group is and whether this group might evolve to a possible…
Correlation Clustering in Data Streams
TLDR
This work develops data structures based on linear sketches that allow the "quality" of a given node-partition to be measured, and presents space-efficient algorithms for the convex programming required, as well as approaches to reduce the adaptivity of the sampling.
Correlation Clustering and Biclustering With Locally Bounded Errors
TLDR
This work provides a rounding algorithm which converts “fractional clusterings” into discrete clusterings while causing only a constant-factor blowup in the number of errors at each vertex.
Parallelism in Randomized Incremental Algorithms
TLDR
It is shown that most sequential randomized incremental algorithms are in fact parallel; three types of dependences found in the algorithms studied are identified, and a framework for analyzing each type of algorithm is presented.
Parallelism in Randomized Incremental Algorithms
TLDR
This article shows the first incremental Delaunay triangulation algorithm with optimal work and polylogarithmic depth, and identifies three types of algorithms based on their dependencies and presents a framework for analyzing each type.

References

Showing 1-10 of 32 references
Correlation clustering in general weighted graphs
TLDR
An O(log n)-approximation algorithm is given for the general case, based on linear-programming rounding and the "region-growing" technique; results are also given for K_{r,r}-minor-free graphs, and it is proved that this linear program has a gap of Ω(log n), so the approximation is tight under this approach.
Correlation clustering with a fixed number of clusters
TLDR
This paper focuses on the situation when the number of clusters is stipulated to be a small constant k, and finds that for every k, there is a polynomial time approximation scheme for both maximizing agreements and minimizing disagreements.
Correlation clustering in MapReduce
TLDR
This paper obtains a new algorithm for correlation clustering that is easily implementable in computational models such as MapReduce and streaming, and runs in a small number of rounds.
Correlation clustering
  • N. Bansal, A. Blum, Shuchi Chawla
  • Mathematics, Computer Science
  • The 43rd Annual IEEE Symposium on Foundations of Computer Science, 2002. Proceedings.
  • 2002
TLDR
This formulation is motivated from a document clustering problem in which one has a pairwise similarity function f learned from past data, and the goal is to partition the current set of documents in a way that correlates with f as much as possible; it can also be viewed as a kind of "agnostic learning" problem.
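In the complete-graph version of this formulation, each pair of items is labeled '+' (similar) or '-' (dissimilar), and a clustering is scored by its agreements ('+' pairs inside a cluster, '-' pairs across clusters) and its disagreements (all remaining pairs). The following is a minimal Python sketch of that scoring, assuming unweighted labels, a clustering given as a vertex-to-cluster-id dict, and '+' pairs stored as frozensets; every pair not in `positive` is treated as '-'. The function name is hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Hashable, Set, Tuple


def agreements_disagreements(label: Dict[Hashable, int],
                             positive: Set[FrozenSet[Hashable]]) -> Tuple[int, int]:
    """Count agreements and disagreements of a clustering (illustrative sketch).

    label maps each vertex to a cluster id; positive holds the '+' pairs
    as frozensets, and all other pairs are assumed to be '-'.
    """
    agree = disagree = 0
    for u, v in combinations(label, 2):       # every unordered pair once
        same_cluster = label[u] == label[v]
        is_plus = frozenset((u, v)) in positive
        # A pair agrees when '+' lands inside a cluster or '-' lands across clusters.
        if same_cluster == is_plus:
            agree += 1
        else:
            disagree += 1
    return agree, disagree
```

Because agreements plus disagreements always equals the total number of pairs, both objectives share the same optimal clustering; they differ only in how approximation ratios are measured.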
Near Optimal LP Rounding Algorithm for Correlation Clustering on Complete and Complete k-partite Graphs
TLDR
These results improve a long line of work on approximation algorithms for correlation clustering in complete graphs, previously culminating in a ratio of 2.5 by Ailon, Charikar and Newman.
Layered label propagation: a multiresolution coordinate-free ordering for compressing social networks
TLDR
Experiments performed show that combining the order produced by the proposed algorithm with the WebGraph compression framework provides a major increase in compression with respect to all currently known techniques, both on web graphs and on social networks.
Correlation clustering: from theory to practice
TLDR
The aim of this tutorial is to show how correlation clustering can be a powerful addition to the toolkit of the data mining researcher and practitioner, and to encourage discussion and further research in the area.
Greedy sequential maximal independent set and matching are parallel on average
TLDR
It is shown that for any graph, and for a random ordering of the vertices, the dependence length of the sequential greedy MIS algorithm is polylogarithmic (O(log^2 n) with high probability).
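The sequential greedy MIS algorithm scans vertices in a fixed order and keeps a vertex if none of its neighbors has been kept. Its chain of dependences can be illustrated with a round-based simulation in which every remaining vertex that precedes all of its remaining neighbors joins the MIS in the same round. The sketch below uses hypothetical function names and an adjacency dict `adj` that must contain every vertex; it returns the same set as the serial scan together with the number of rounds, which serves here only as an illustration of the dependence length. The pivots KwikCluster selects under a permutation are exactly the greedy MIS of the '+' graph under that permutation, which is why this result bears on parallelizing it.

```python
from typing import Dict, Hashable, List, Set, Tuple


def greedy_mis(order: List[Hashable],
               adj: Dict[Hashable, Set[Hashable]]) -> Set[Hashable]:
    """Sequential greedy MIS: keep v if no neighbor of v was kept earlier."""
    mis: Set[Hashable] = set()
    for v in order:
        if not (adj[v] & mis):
            mis.add(v)
    return mis


def parallel_greedy_mis(order: List[Hashable],
                        adj: Dict[Hashable, Set[Hashable]]) -> Tuple[Set[Hashable], int]:
    """Round-based simulation that yields the same MIS as greedy_mis.

    In each round, every remaining vertex ranked before all of its
    remaining neighbors is added; those vertices and their neighbors
    are then removed. Returns (mis, number_of_rounds).
    """
    rank = {v: i for i, v in enumerate(order)}
    remaining = set(order)
    mis: Set[Hashable] = set()
    rounds = 0
    while remaining:
        rounds += 1
        # The lowest-ranked remaining vertex is always ready, so each round makes progress.
        ready = {v for v in remaining
                 if all(rank[v] < rank[u] for u in adj[v] if u in remaining)}
        mis |= ready
        removed = set(ready)
        for v in ready:
            removed |= adj[v]
        remaining -= removed
    return mis, rounds
```

On the path 0-1-2-3-4 scanned in the order 0, 1, 2, 3, 4, each round admits a single vertex and the simulation needs three rounds; the result above bounds the dependence length under a random order by O(log^2 n) with high probability.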
Correlation Clustering: maximizing agreements via semidefinite programming
  • C. Swamy
  • Mathematics, Computer Science
  • SODA '04
  • 2004
TLDR
This work gives a 0.7666-approximation algorithm for maximizing agreements on any graph, even when the edges have non-negative weights (along with labels) and the goal is to maximize the weight of agreements.
Clustering with qualitative information
TLDR
A factor-4 approximation for minimization on complete graphs and a factor-O(log n) approximation for general graphs are demonstrated, and the APX-hardness of minimization on complete graphs is proved.