Clique Counting in MapReduce

@article{Finocchi2015CliqueCI,
  title={Clique Counting in MapReduce},
  author={Irene Finocchi and Marco Finocchi and Emanuele G. Fusco},
  journal={Journal of Experimental Algorithmics (JEA)},
  year={2015},
  volume={20},
  pages={1--20}
}
We tackle the problem of counting the number q_k of k-cliques in large-scale graphs, for any constant k ≥ 3. Clique counting is essential in a variety of applications, including social network analysis. Our algorithms make it possible to compute q_k for several real-world graphs and shed light on its growth rate as a function of k. Even for small values of k, the number q_k of k-cliques can be on the order of tens or hundreds of trillions. As k increases, different graph instances show different…
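To make the counted quantity concrete, here is a minimal sequential sketch of exact k-clique counting on a small in-memory graph. It is not the paper's MapReduce algorithm: it assumes the graph fits in memory, that vertex labels are mutually comparable (for tie-breaking), and it uses a degree-based vertex ordering so that each clique is counted exactly once. The function name `count_k_cliques` is hypothetical.

```python
def count_k_cliques(edges, k):
    """Count k-cliques in an undirected simple graph given as an edge list.

    Illustrative sketch only (assumption: the whole graph fits in memory).
    """
    # Build undirected adjacency sets, ignoring self-loops and duplicate edges.
    adj = {}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    # Total order on vertices: by degree, ties broken by label.
    order = sorted(adj, key=lambda v: (len(adj[v]), v))
    rank = {v: i for i, v in enumerate(order)}

    # Orient every edge from the lower-ranked to the higher-ranked endpoint.
    out = {v: {w for w in adj[v] if rank[w] > rank[v]} for v in adj}

    def extend(candidates, remaining):
        # 'candidates' are adjacent to, and ranked above, every vertex chosen so far.
        if remaining == 0:
            return 1
        return sum(extend(candidates & out[v], remaining - 1) for v in candidates)

    return extend(set(adj), k)


if __name__ == "__main__":
    # Complete graph on 4 vertices: every 3-subset is a triangle.
    k4 = [(a, b) for a in range(4) for b in range(a + 1, 4)]
    print(count_k_cliques(k4, 3))
```

Because vertices are always added in increasing rank order, each clique is reached along exactly one path of the recursion, so no division by k! is needed at the end.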
The Power of Pivoting for Exact Clique Counting
TLDR
The main insight is the construction of a Succinct Clique Tree (SCT) that stores a compressed unique representation of all cliques in an input graph using a technique called pivoting, a classic approach by Bron-Kerbosch to reduce the recursion tree of backtracking algorithms for maximal cliques.
Counting cliques in parallel without a cluster: Engineering a fork/join algorithm for shared-memory platforms
TLDR
This work develops simple and fast multicore parallel algorithms for counting the number of k-cliques in large undirected graphs, for any small constant k; they largely outperform the running times of highly optimized sequential solutions and scale gracefully to non-trivial values of k even on medium/large graphs.
Listing k-cliques in Sparse Real-World Graphs
TLDR
This work revisits the iconic algorithm of Chiba and Nishizeki and develops the most efficient parallel algorithm for listing all k-cliques in graphs containing up to tens of millions of edges, which is faster than state-of-the-art algorithms, while boasting an excellent degree of parallelism.
Ordering heuristics for k-clique listing
TLDR
A comprehensive comparison of all the state-of-the-art k-clique listing and counting algorithms is presented, and a new color ordering heuristic, based on greedy graph coloring techniques and able to significantly prune unpromising search paths, is proposed.
On approximating the number of k-cliques in sublinear time
TLDR
A general algorithm is obtained that works for any k≥ 3 by designing a procedure that samples each k-clique incident to a given set S of vertices with approximately equal probability, and it is proved that the query complexity of the algorithm is essentially optimal.
A Fast and Provable Method for Estimating Clique Counts Using Turán's Theorem
TLDR
A new randomized algorithm is presented that provably approximates the number of k-cliques, for any constant k, using the classic Turán's theorem, and it is found that TURÁN-SHADOW is generally much faster and more accurate than a range of other sampling algorithms.
Provably and Efficiently Approximating Near-cliques using the Turán Shadow: PEANUTS
TLDR
A space-efficient adaptation of the Turán Shadow sampling approach, recently introduced by Jain and Seshadhri, that constructs a large recursion tree that represents cliques in a graph, and designs a novel algorithm that builds an estimator for near-cliques, using an online, compact construction of the Turán Shadow.
Triangle counting in large networks: a review
TLDR
The existing methods of triangle counting, ranging from sequential to parallel, single‐machine to distributed, exact to approximate, and off‐line to streaming are discussed, and experimental results of performance comparison among a set of approximate triangle counting methods are presented.
How the Degeneracy Helps for Triangle Counting in Graph Streams
TLDR
A streaming algorithm is given that can circumvent the lower bound for low degeneracy graphs, Õ(mK/T), which is significantly smaller than both m^{3/2}/T and m/√T.
Faster sublinear approximations of k-cliques for low arboricity graphs (Talya Eden, 11 Nov 2018)
Given query access to an undirected graph G, we consider the problem of computing a (1 ± ε)-approximation of the number of k-cliques in G. The standard query model for general graphs allows for

References

SHOWING 1-10 OF 59 REFERENCES
Counting triangles and the curse of the last reducer
TLDR
This work describes a sequential triangle counting algorithm and shows how to adapt it to the MapReduce setting, and presents a new algorithm designed specifically for the MapReduce framework that achieves a factor of 10-100 speedup over the naive approach.
DOULION: counting triangles in massive graphs with a coin
TLDR
A practical method, from which all triangle counting algorithms can potentially benefit, is proposed; it works with high accuracy, typically more than 99%, and gives significant speedups, reaching ≈ 130 times faster performance.
Estimating Clustering Indexes in Data Streams
TLDR
Random sampling algorithms that with probability at least 1 - δ compute a (1 ± ε)-approximation of the clustering coefficient and of the number of bipartite clique subgraphs of a graph given as an incidence stream of edges provide a basic tool to analyze the structure of dense clusters in large graphs.
Counting Arbitrary Subgraphs in Data Streams
TLDR
This work provides the first non-trivial estimator for approximately counting the number of occurrences of an arbitrary subgraph H of constant size in a (large) graph G and demonstrates the applicability of the estimator by analyzing its concentration for several graphs H and the case where G is a power law graph.
Colorful triangle counting and a MapReduce implementation
TLDR
A new randomized algorithm for counting triangles in graphs that colors the vertices of G with N=1/p colors uniformly at random, counts monochromatic triangles, i.e., triangles whose vertices have the same color, and scales that count appropriately.
MapReduce Triangle Enumeration With Guarantees
TLDR
This work is the first to give guarantees on the maximum load of each reducer for an arbitrary input graph; it is competitive with existing methods, improving the performance by a factor of up to 2×, and can significantly increase the size of datasets that can be processed.
PATRIC: a parallel algorithm for counting triangles in massive networks
TLDR
An efficient MPI-based distributed memory parallel algorithm, called PATRIC, is presented for counting triangles in massive networks; it scales well to networks with billions of nodes and can compute the exact number of triangles in a network with one billion nodes and 10 billion edges in 16 minutes.
Mining Large Networks with Subgraph Counting
TLDR
Data-stream algorithms that approximate the number of all subgraphs of three and four vertices in directed and undirected networks are developed and achieve very good precision in clustering networks with similar structure.
Counting triangles in data streams
TLDR
Two space bounded random sampling algorithms that compute an approximation of the number of triangles in an undirected graph given as a stream of edges are presented and they provide a basic tool to analyze the structure of large graphs.
Enumerating subgraph instances using map-reduce
TLDR
This paper exploits the techniques of [1] for computing multiway joins (evaluating conjunctive queries) in a single map-reduce round for the simplest sample graph, the triangle, and addresses the matter of optimizing computation cost.