# Clique Counting in MapReduce

@article{Finocchi2015CliqueCI, title={Clique Counting in MapReduce}, author={Irene Finocchi and Marco Finocchi and Emanuele G. Fusco}, journal={Journal of Experimental Algorithmics (JEA)}, year={2015}, volume={20}, pages={1 - 20} }

We tackle the problem of counting the number q_k of k-cliques in large-scale graphs, for any constant k ≥ 3. Clique counting is essential in a variety of applications, including social network analysis. Our algorithms make it possible to compute q_k for several real-world graphs and shed light on its growth rate as a function of k. Even for small values of k, the number q_k of k-cliques can be on the order of tens or hundreds of trillions. As k increases, different graph instances show different…
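To make the quantity q_k concrete, here is a minimal sequential sketch in Python. It is not the MapReduce algorithm of the paper, whose details the abstract does not give. It assumes a small in-memory graph and uses a degree-based vertex ordering (a common technique, chosen here as an assumption) so that each clique is counted exactly once. The function name `count_k_cliques` and the example graph `K4` are hypothetical.

```python
def count_k_cliques(edges, k):
    """Count k-cliques in a simple undirected graph given as an edge list."""
    if k < 1:
        raise ValueError("k must be at least 1")

    # Build adjacency sets, ignoring self-loops.
    adj = {}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    # Assumed ordering: rank vertices by degree (ties broken by sort stability).
    rank = {v: i for i, v in enumerate(sorted(adj, key=lambda v: len(adj[v])))}

    # Keep only neighbours of higher rank, so every clique is explored
    # along exactly one increasing-rank path.
    higher = {v: {w for w in adj[v] if rank[w] > rank[v]} for v in adj}

    def extend(candidates, remaining):
        # 'candidates' are vertices adjacent to all vertices chosen so far.
        if remaining == 0:
            return 1
        return sum(extend(candidates & higher[v], remaining - 1)
                   for v in candidates)

    return sum(extend(higher[v], k - 1) for v in adj)


# Hypothetical example: the complete graph on four vertices.
K4 = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
triangles_in_k4 = count_k_cliques(K4, 3)
```

The recursion depth equals k, which matches the setting of a small constant k. The running time still grows quickly with k on dense graphs, which is consistent with the abstract's remark that q_k itself can reach trillions.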

## 27 Citations

The Power of Pivoting for Exact Clique Counting

- Computer Science · WSDM
- 2020

The main insight is the construction of a Succinct Clique Tree (SCT) that stores a compressed unique representation of all cliques in an input graph using a technique called pivoting, a classic approach by Bron-Kerbosch to reduce the recursion tree of backtracking algorithms for maximal cliques.

Counting cliques in parallel without a cluster: Engineering a fork/join algorithm for shared-memory platforms

- Computer Science · Inf. Sci.
- 2019

Simple and fast multicore parallel algorithms for counting the number of k-cliques in large undirected graphs, for any small constant k, that largely outperform the running times of highly optimized sequential solutions and gracefully scale to non-trivial values of k even on medium/large graphs are developed.

Listing k-cliques in Sparse Real-World Graphs

- Computer Science · WWW
- 2018

This work revisits the iconic algorithm of Chiba and Nishizeki and develops the most efficient parallel algorithm for listing all k-cliques in graphs containing up to tens of millions of edges, which is faster than state-of-the-art algorithms while boasting an excellent degree of parallelism.

Ordering heuristics for k-clique listing

- Computer Science · Proc. VLDB Endow.
- 2020

A comprehensive comparison of all the state-of-the-art k-clique listing and counting algorithms is presented, and a new color-ordering heuristic based on greedy graph coloring, which is able to significantly prune unpromising search paths, is proposed.

On approximating the number of k-cliques in sublinear time

- Computer Science, Mathematics · STOC
- 2018

A general algorithm is obtained that works for any k ≥ 3 by designing a procedure that samples each k-clique incident to a given set S of vertices with approximately equal probability, and it is proved that the query complexity of the algorithm is essentially optimal.

A Fast and Provable Method for Estimating Clique Counts Using Turán's Theorem

- Computer Science · WWW
- 2017

A new randomized algorithm is presented that provably approximates the number of k-cliques, for any constant k, using the classic Turán's theorem, and it is found that TURÁN-SHADOW is generally much faster and more accurate than a range of other sampling algorithms.

Provably and Efficiently Approximating Near-cliques using the Turán Shadow: PEANUTS

- Computer Science · WWW
- 2020

A space-efficient adaptation of the Turán Shadow sampling approach, recently introduced by Jain and Seshadhri, that constructs a large recursion tree that represents cliques in a graph, and designs a novel algorithm that builds an estimator for near-cliques, using an online, compact construction of the Turán Shadow.

Triangle counting in large networks: a review

- Computer Science · Wiley Interdiscip. Rev. Data Min. Knowl. Discov.
- 2018

The existing methods of triangle counting, ranging from sequential to parallel, single‐machine to distributed, exact to approximate, and off‐line to streaming are discussed, and experimental results of performance comparison among a set of approximate triangle counting methods are presented.

How the Degeneracy Helps for Triangle Counting in Graph Streams

- Computer Science, Physics · PODS
- 2020

A streaming algorithm is given that can circumvent the lower bound for low-degeneracy graphs: Õ(mκ/T), which is significantly smaller than both m^(3/2)/T and m/√T.

Faster sublinear approximations of k-cliques for low arboricity graphs

- 2018

Given query access to an undirected graph G, we consider the problem of computing a (1 ± ε)-approximation of the number of k-cliques in G. The standard query model for general graphs allows for…

## References


Counting triangles and the curse of the last reducer

- Computer Science · WWW
- 2011

This work describes a sequential triangle counting algorithm and shows how to adapt it to the MapReduce setting, and presents a new algorithm designed specifically for the MapReduce framework that achieves a 10-100× speedup over the naive approach.
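As an illustration of how triangle counting can be phrased as map and reduce rounds, the following Python sketch simulates two rounds in a single process. It is a generic wedge-closing scheme under stated assumptions, not necessarily the algorithm of the work summarized above: edges are routed to their lower-degree endpoint, each reducer emits pairs of its neighbours, and a second round keeps the pairs that are themselves edges. The name `mapreduce_triangle_count` is hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def mapreduce_triangle_count(edges):
    """Count triangles with two simulated map/reduce rounds."""
    # Normalise to a set of undirected edges without self-loops.
    undirected = {frozenset(e) for e in edges if e[0] != e[1]}

    degree = defaultdict(int)
    for e in undirected:
        for x in e:
            degree[x] += 1

    def rank(v):
        # Assumed total order: degree first, then repr as a tie-breaker.
        return (degree[v], repr(v))

    # Round 1, map: key each edge by its lower-ranked endpoint.
    groups = defaultdict(list)
    for e in undirected:
        low, high = sorted(e, key=rank)
        groups[low].append(high)

    # Round 1, reduce: every pair of neighbours of a key forms an open wedge.
    wedges = defaultdict(int)
    for neighbours in groups.values():
        for a, b in combinations(neighbours, 2):
            wedges[frozenset((a, b))] += 1

    # Round 2: a wedge closes into a triangle if its endpoints are adjacent.
    return sum(count for pair, count in wedges.items() if pair in undirected)
```

Routing each edge to its lower-degree endpoint keeps the neighbour lists of high-degree vertices short, which limits the quadratic pair generation in any single reducer.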

DOULION: counting triangles in massive graphs with a coin

- Mathematics, Computer Science · KDD
- 2009

A practical method, from which all triangle counting algorithms can potentially benefit, is proposed; it works with high accuracy, typically more than 99%, and gives significant speedups of up to ≈ 130 times.

Estimating Clustering Indexes in Data Streams

- Mathematics, Computer Science · ESA
- 2007

Random sampling algorithms that with probability at least 1 - δ compute a (1 ± ε)-approximation of the clustering coefficient and of the number of bipartite clique subgraphs of a graph given as an incidence stream of edges provide a basic tool to analyze the structure of dense clusters in large graphs.

Counting Arbitrary Subgraphs in Data Streams

- Mathematics, Computer Science · ICALP
- 2012

This work provides the first non-trivial estimator for approximately counting the number of occurrences of an arbitrary subgraph H of constant size in a (large) graph G and demonstrates the applicability of the estimator by analyzing its concentration for several graphs H and the case where G is a power law graph.

Colorful triangle counting and a MapReduce implementation

- Mathematics, Computer Science · Inf. Process. Lett.
- 2012

A new randomized algorithm for counting triangles in graphs that colors the vertices of G with N=1/p colors uniformly at random, counts monochromatic triangles, i.e., triangles whose vertices have the same color, and scales that count appropriately.
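The colouring scheme summarized above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: with N colours, a fixed triangle is monochromatic with probability 1/N² (the second and third vertices must each match the first), so the monochromatic count is multiplied by N² to estimate the total. The names `count_triangles` and `colorful_triangle_estimate` and the `seed` parameter are hypothetical.

```python
import random


def count_triangles(edges):
    """Exact triangle count of a simple undirected graph."""
    adj = {}
    for u, v in edges:
        if u != v:
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
    # Each triangle is seen from its 3 edges, each in 2 directions.
    return sum(len(adj[u] & adj[v]) for u in adj for v in adj[u]) // 6


def colorful_triangle_estimate(edges, num_colors, seed=None):
    """Estimate the triangle count by counting monochromatic triangles."""
    rng = random.Random(seed)

    # Colour every vertex uniformly at random with one of N colours.
    color = {}
    for u, v in edges:
        for x in (u, v):
            if x not in color:
                color[x] = rng.randrange(num_colors)

    # A triangle is monochromatic exactly when all three of its edges are,
    # so it suffices to count triangles among monochromatic edges.
    kept = [(u, v) for u, v in edges if color[u] == color[v]]

    # Scale by 1 / p^2 = N^2, the inverse of the survival probability.
    return count_triangles(kept) * num_colors ** 2
```

With `num_colors=1` every edge is kept and the estimate equals the exact count. Larger values shrink the graph that has to be processed, at the cost of higher variance in the estimate.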

MapReduce Triangle Enumeration With Guarantees

- Computer Science · CIKM
- 2014

This work is the first to give guarantees on the maximum load of each reducer for an arbitrary input graph; it is competitive with existing methods, improving performance by a factor of up to 2×, and can significantly increase the size of datasets that can be processed.

PATRIC: a parallel algorithm for counting triangles in massive networks

- Computer Science · CIKM
- 2013

An efficient MPI-based distributed memory parallel algorithm, called PATRIC, for counting triangles in massive networks, which scales well to networks with billions of nodes and can compute the exact number of triangles in a network with one billion nodes and 10 billion edges in 16 minutes.

Mining Large Networks with Subgraph Counting

- Computer Science · 2008 Eighth IEEE International Conference on Data Mining
- 2008

Data-stream algorithms that approximate the number of all subgraphs of three and four vertices in directed and undirected networks are developed and achieve very good precision in clustering networks with similar structure.

Counting triangles in data streams

- Mathematics, Computer Science · PODS '06
- 2006

Two space bounded random sampling algorithms that compute an approximation of the number of triangles in an undirected graph given as a stream of edges are presented and they provide a basic tool to analyze the structure of large graphs.

Enumerating subgraph instances using map-reduce

- Computer Science · 2013 IEEE 29th International Conference on Data Engineering (ICDE)
- 2013

This paper exploits the techniques of [1] for computing multiway joins (evaluating conjunctive queries) in a single map-reduce round for the simplest sample graph, the triangle, and addresses the matter of optimizing computation cost.