Efficiently Counting Vertex Orbits of All 5-vertex Subgraphs, by EVOKE

@article{Pashanasangi2020EfficientlyCV,
  title={Efficiently Counting Vertex Orbits of All 5-vertex Subgraphs, by EVOKE},
  author={Noujan Pashanasangi and C. Seshadhri},
  journal={Proceedings of the 13th International Conference on Web Search and Data Mining},
  year={2020}
}
Subgraph counting is a fundamental task in network analysis. Algorithmic work typically focuses on total counting, where the goal is to count the total frequency of a (small) pattern subgraph in a large input graph. But many applications require local counts (also called vertex orbit counts): for every vertex v of the input graph, one needs the number of occurrences of the pattern subgraph that involve v. This provides a rich set of vertex features that can be used in machine learning tasks, especially…
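The difference between total counts and local (vertex orbit) counts can be illustrated with a small sketch. The code below is a brute-force illustration for 3-vertex patterns only and is not the EVOKE algorithm; the function names (`build_adjacency`, `local_triangle_counts`, `local_wedge_orbit_counts`) are hypothetical and chosen for this example. A triangle has a single vertex orbit, while a wedge (2-path) has two: its center and its endpoints. Wedges are counted here as non-induced subgraphs.

```python
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map, ignoring self-loops and duplicate edges."""
    adj = {}
    for u, v in edges:
        if u == v:
            continue
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def local_triangle_counts(adj):
    """For every vertex v, count the triangles that contain v (brute force)."""
    counts = {v: 0 for v in adj}
    for v, nbrs in adj.items():
        # Each pair of adjacent neighbors of v closes one triangle through v.
        for a, b in combinations(nbrs, 2):
            if b in adj[a]:
                counts[v] += 1
    return counts


def local_wedge_orbit_counts(adj):
    """For every vertex v, count non-induced wedges with v as center and as endpoint."""
    center = {}
    endpoint = {}
    for v, nbrs in adj.items():
        d = len(nbrs)
        # v is the center of one wedge per unordered pair of its neighbors.
        center[v] = d * (d - 1) // 2
        # v is an endpoint of a wedge v-u-w for every neighbor u and every w != v adjacent to u.
        endpoint[v] = sum(len(adj[u]) - 1 for u in nbrs)
    return center, endpoint


if __name__ == "__main__":
    # A triangle 0-1-2 with a pendant vertex 3 attached to vertex 2.
    adj = build_adjacency([(0, 1), (1, 2), (0, 2), (2, 3)])
    tri = local_triangle_counts(adj)
    center, endpoint = local_wedge_orbit_counts(adj)
    print("triangles per vertex:", tri)
    print("total triangles:", sum(tri.values()) // 3)
    print("wedge center counts:", center)
    print("wedge endpoint counts:", endpoint)
```

The total count is recovered from the local counts by dividing their sum by the number of pattern vertices in the orbit (3 for triangles, 1 for wedge centers, 2 for wedge endpoints), which is why local counts carry strictly more information than the total.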
Distributed subgraph counting
TLDR
This work develops a new general approach to count any k pattern graphs with any orbits selected by homomorphism counting, which can be solved by relational algebra using joins, group-by and aggregation.
Distributed Subgraph Counting: A General Approach
TLDR
This work develops a new general approach to count any k pattern graphs with any orbits selected by homomorphism counting, which can be solved by relational algebra using joins, group-by and aggregation.
BFS based distributed algorithm for parallel local directed sub-graph enumeration
TLDR
VDMC (Vertex specific Distributed Motif Counting), a fully distributed algorithm to optimally count all the 3- and 4-vertex connected directed graphs (sub-graph motifs) associated with each vertex of a graph, is proposed; its efficacy is linear in the number of counted motifs.
Lightning Fast and Space Efficient k-clique Counting
TLDR
This work develops two novel dynamic programming based k-color set sampling techniques to efficiently estimate the k-clique counts, where a k-color set contains k nodes with k different colors, which are extremely efficient and accurate.
Faster and Generalized Temporal Triangle Counting, via Degeneracy Ordering
TLDR
A new algorithm, DOTTT (Degeneracy Oriented Temporal Triangle Totaler), that exactly counts all directed variants of (δ1,3, δ2,3), and it is proved that DOTTT runs in O(mκ log m) time, where m is the number of (temporal) edges and κ is the graph degeneracy (max core number).
MaNIACS: Approximate Mining of Frequent Subgraph Patterns through Sampling
TLDR
MaNIACS leverages properties of the MNI-frequency to aggressively prune the pattern search space, and thus to reduce the time spent in exploring subspaces containing no frequent patterns, and returns high-quality collections of frequent patterns in large graphs up to two orders of magnitude faster than the exact algorithm.
odeN: Simultaneous Approximation of Multiple Motif Counts in Large Temporal Networks
TLDR
odeN is a sampling-based algorithm that provides an accurate approximation of all the counts of the motifs in temporal networks in a fraction of the time needed by state-of-the-art methods, and it also reports more accurate approximations than such methods.
Counting five-node subgraphs
TLDR
This work proves the main result that induced subgraph counts follow as linear combinations of non-induced counts, using short and purely combinatorial arguments that can be adapted to derive count formulae for larger subgraphs.
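As a small illustration of this kind of relation (the 3-vertex case only, stated here as a well-known identity and not taken from the cited paper), let $d_v$ be the degree of vertex $v$, $T$ the number of triangles, $W$ the number of non-induced wedges (2-paths), and $W_{\mathrm{ind}}$ the number of induced wedges. These symbols are introduced only for this example.

```latex
\begin{align*}
  W &= \sum_{v} \binom{d_v}{2}, \\
  W_{\mathrm{ind}} &= W - 3T .
\end{align*}
```

Each triangle contains three non-induced wedges, one centered at each of its vertices, and none of them is induced, so subtracting $3T$ from $W$ leaves exactly the induced wedges.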
Near-Linear Time Homomorphism Counting in Bounded Degeneracy Graphs: The Barrier of Long Induced Cycles
TLDR
The following are proved: if the largest induced cycle in H has length at most $5$, then there is an $O(m\log m)$ algorithm for counting H-homomorphisms in bounded degeneracy graphs, and there is a constant $\gamma > 0$, such that there is no $o(m^{1+\gamma})$ time algorithm.
Motif-based spectral clustering of weighted directed networks
TLDR
It is concluded that motif-based spectral clustering is a valuable tool for analysis of directed and bipartite weighted networks, which is also scalable and easy to implement.

References

ESCAPE: Efficiently Counting All 5-Vertex Subgraphs
TLDR
It is proved that it suffices to enumerate only four specific subgraphs (three of them have fewer than 5 vertices) to exactly count all 5-vertex patterns. This gives the first practical algorithm for 5-vertex pattern counting that runs at this scale, and it is able to compute counts of graphs with tens of millions of edges in minutes on a commodity machine.
Path Sampling: A Fast and Provable Method for Estimating 4-Vertex Subgraph Counts
TLDR
A sampling algorithm that provably and accurately approximates the frequencies of all 4-vertex pattern subgraphs is provided, based on a novel technique of 3-path sampling and a special pruning scheme to decrease the variance in estimates.
Scalable Subgraph Counting: The Methods Behind The Madness
TLDR
This tutorial will also cover methods for subgraph analysis on “big data” computational models such as the streaming model and models of parallel and distributed computation.
Efficient semi-streaming algorithms for local triangle counting in massive graphs
TLDR
This is the first paper that addresses the problem of local triangle counting with a focus on the efficiency issues arising in massive graphs and proposes two approximation algorithms, which are based on the idea of min-wise independent permutations.
An Algorithm to Automatically Generate the Combinatorial Orbit Counting Equations
TLDR
Two new techniques are presented that automatically generate the equations needed to count graphlets with 4, 5 and 6 vertices and can be used to count larger graphlets than previously possible.
Efficient Graphlet Counting for Large Networks
TLDR
This paper proposes a fast, efficient, and parallel algorithm for counting graphlets of size k={3,4}-nodes that takes only a fraction of the time to compute when compared with the current methods used, and is on average 460x faster than current methods.
DOULION: counting triangles in massive graphs with a coin
TLDR
A practical method, from which all triangle counting algorithms can potentially benefit, is proposed; it works with high accuracy, typically more than 99%, and gives significant speedups, reaching ≈ 130 times faster performance.
Efficient Triangle Counting in Large Graphs via Degree-Based Vertex Partitioning
TLDR
This paper presents an efficient triangle-counting approximation algorithm that can be adapted to the semistreaming model with space usage and a constant number of passes over the graph stream, and applies its methods to various networks with several millions of edges and gets excellent results.
MOSS-5: A Fast Method of Approximating Counts of 5-Node Graphlets in Large Graphs
TLDR
This work provides a fast sampling method and unbiased estimators of graphlet counts, and also derives simple yet exact formulas for the variances of the estimators, which are of great value in practice: the variances can be used to bound the estimates’ errors and determine the smallest necessary sampling budget for a desired accuracy.
SAHAD: Subgraph Analysis in Massive Networks Using Hadoop
TLDR
SAHAD is the first such Hadoop-based subgraph/subtree analysis algorithm, performs significantly better than prior approaches for very large graphs and templates, and is also amenable to running quite easily on Amazon EC2, without the need for any system-level optimization.