A Simple Sublinear-Time Algorithm for Counting Arbitrary Subgraphs via Edge Sampling

@inproceedings{Assadi2019ASS,
  title={A Simple Sublinear-Time Algorithm for Counting Arbitrary Subgraphs via Edge Sampling},
  author={Sepehr Assadi and Mikhail Kapralov and Sanjeev Khanna},
  booktitle={ITCS},
  year={2019}
}
In the subgraph counting problem, we are given an input graph $G(V, E)$ and a target graph $H$; the goal is to estimate the number of occurrences of $H$ in $G$. Our focus here is on designing sublinear-time algorithms for approximately counting occurrences of $H$ in $G$ in the setting where the algorithm is given query access to $G$. This problem has been studied in several recent papers that primarily focused on specific families of graphs $H$ such as triangles, cliques, and stars. However…
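To make the query model concrete, here is a minimal sketch of how uniform edge samples combined with degree, neighbor, and pair queries give an unbiased estimator for the simplest target graph, $H$ = triangle. It is an illustration of the model under stated assumptions, not the algorithm of this paper (which handles arbitrary $H$). The `GraphOracle` class and `estimate_triangles` function are hypothetical names; the oracle simply wraps an in-memory adjacency map with integer vertex labels. For a sampled edge $e$, the sample equals $t_e$ (the number of triangles through $e$) in expectation, so the sample mean estimates $3T/m$.

```python
import random


class GraphOracle:
    """Hypothetical oracle: degree, neighbor, pair and uniform-edge-sample queries."""

    def __init__(self, adj, seed=0):
        self._nbrs = {v: sorted(ns) for v, ns in adj.items()}
        self._sets = {v: set(ns) for v, ns in adj.items()}
        self._edges = [(u, v) for u in adj for v in adj[u] if u < v]
        self._rng = random.Random(seed)
        self.m = len(self._edges)

    def degree(self, v):
        return len(self._nbrs[v])

    def neighbor(self, v, i):
        return self._nbrs[v][i]  # i-th neighbor of v, 0-indexed

    def pair(self, u, v):
        return v in self._sets[u]

    def sample_edge(self):
        return self._rng.choice(self._edges)


def estimate_triangles(oracle, samples, seed=1):
    """Unbiased estimator: each sample X has E[X] = 3T/m, so T is about m * mean(X) / 3."""
    rng = random.Random(seed)
    total = 0
    for _ in range(samples):
        u, v = oracle.sample_edge()
        # Work from the lower-degree endpoint, which caps each sample at min degree.
        if oracle.degree(u) > oracle.degree(v):
            u, v = v, u
        d = oracle.degree(u)
        w = oracle.neighbor(u, rng.randrange(d))
        # Pr[w closes a triangle with (u, v)] = t_e / d, so X = d * indicator has mean t_e.
        if oracle.pair(v, w):
            total += d
    return oracle.m * total / (3 * samples)
```

On $K_4$ (four triangles) the estimate concentrates around 4; on a triangle-free graph it is exactly 0.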
Linear Time Subgraph Counting, Graph Degeneracy, and the Chasm at Size Six
TLDR
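It is proved that for all $k < 6$, Sub-Cnt$_k$ can be solved in linear time in bounded degeneracy graphs, whereas for all $k \ge 6$ it cannot be solved even in near-linear time, assuming the Triangle Detection Conjecture, which is a standard conjecture in fine-grained complexity.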
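This result is stated for bounded degeneracy graphs. The degeneracy of $G$ is the smallest $\kappa$ such that every subgraph of $G$ has a vertex of degree at most $\kappa$, and it can be computed by repeatedly deleting a minimum-degree vertex. The sketch below (function name `degeneracy_ordering` is hypothetical) uses a quadratic scan for clarity; a bucket queue brings the same peeling procedure down to $O(n + m)$.

```python
def degeneracy_ordering(adj):
    """Return (degeneracy, peeling order); adj maps each vertex to a set of neighbors."""
    deg = {v: len(ns) for v, ns in adj.items()}
    removed = set()
    order = []
    k = 0
    for _ in range(len(adj)):
        # Pick a remaining vertex of minimum current degree.
        v = min((u for u in adj if u not in removed), key=deg.__getitem__)
        k = max(k, deg[v])  # degeneracy = largest degree seen at removal time
        order.append(v)
        removed.add(v)
        for u in adj[v]:
            if u not in removed:
                deg[u] -= 1
    return k, order
```

Trees have degeneracy 1, cycles 2, and $K_4$ has degeneracy 3.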
The Arboricity Captures the Complexity of Sampling Edges
TLDR
The upper bound is tight (up to poly-logarithmic factors and the dependence on $\varepsilon$), as $\Omega(n\alpha/m)$ queries, where $\alpha$ is the arboricity of $G$, are necessary even for the easier task of sampling edges from any distribution over $E$ that is close to uniform in total variation distance.
Amortized Edge Sampling
TLDR
An algorithm is presented that samples multiple edges from a distribution that is pointwise $\epsilon$-close to the uniform distribution in an amortized-efficient fashion: if the number of required samples $q$ is known in advance, the overall cost is $O^*(\sqrt{q} \cdot n/\sqrt{m})$, which is strictly preferable to the $O^*(q \cdot n/\sqrt{m})$ cost of $q$ independent invocations of a single-edge sampler.
A Lower Bound on Cycle-Finding in Sparse Digraphs
TLDR
An information-theoretic lower bound is proved, showing that for N-vertex graphs with constant outdegree any algorithm for this problem must make $\tilde{\Omega}(N^{5/9})$ queries to an adjacency list representation of $G$.
Sampling connected subgraphs: nearly-optimal mixing time bounds, nearly-optimal $\epsilon$-uniform sampling, and perfect uniform sampling
We study the connected subgraph sampling problem: given an integer $k \ge 3$ and a simple graph $G=(V,E)$, sample a connected induced $k$-node subgraph of $G$ (also called $k$-graphlet) uniformly at random…
Near-Linear Time Homomorphism Counting in Bounded Degeneracy Graphs: The Barrier of Long Induced Cycles
TLDR
The following are proved: if the largest induced cycle in $H$ has length at most 5, then there is an $O(m \log m)$ algorithm for counting $H$-homomorphisms in bounded degeneracy graphs; if it has length at least 6, then, under standard fine-grained complexity assumptions, there is a constant $\gamma > 0$ such that there is no $o(m^{1+\gamma})$ time algorithm.
Towards a Decomposition-Optimal Algorithm for Counting and Sampling Arbitrary Motifs in Sublinear Time
TLDR
This work presents a new algorithm for sampling and approximately counting arbitrary motifs which, up to $\mathrm{poly}(\log n)$ factors, is always at least as good as previous results, and for most graphs $G$ is strictly better.
Nearly optimal edge estimation with independent set queries
TLDR
This work studies the problem of estimating the number of edges of an unknown, undirected graph $G=([n],E)$ with access to an independent set oracle, and gives an algorithm that is optimal up to a factor of $\mathrm{poly}(\log n, 1/\epsilon)$.
Efficient and near-optimal algorithms for sampling connected subgraphs
TLDR
This work gives a near-optimal mixing time bound for the classic $k$-graphlet random walk, as a function of the mixing time of $G$, and the first efficient algorithm for uniform graphlet sampling.
Almost Optimal Bounds for Sublinear-Time Sampling of k-Cliques: Sampling Cliques is Harder Than Counting
TLDR
The lower bound follows from a construction of a family of graphs with arboricity $\alpha$ such that in each graph there are $n_k$ cliques, where one of these cliques is "hidden" and hence hard to sample.

References

SHOWING 1-10 OF 57 REFERENCES
Sublinear-Time Algorithms for Counting Star Subgraphs via Edge Sampling
TLDR
With the ability to sample edges uniformly at random, it is shown that one can achieve faster algorithms for approximating the number of star subgraphs, bypassing the lower bounds established in prior work for the model without edge sampling.
Arboricity and Subgraph Listing Algorithms
TLDR
A new simple strategy for edge-searching a graph, useful for various subgraph listing problems, is introduced, and an upper bound on the arboricity $a(G)$ is established for a graph $G$: $a(G) \le \lceil (2m + n)^{1/2} \rceil$, where $n$ and $m$ are the numbers of vertices and edges of $G$.
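This edge-searching strategy is best known through its triangle-listing procedure, which runs in $O(a(G) \cdot m)$ time: process vertices in non-increasing degree order, mark the neighbors of the current vertex, scan each marked neighbor's adjacency list for other marked vertices, then delete the current vertex. The sketch below follows that textbook outline; the function name `chiba_nishizeki_triangles` is hypothetical and the input is assumed to be a dict mapping each vertex to a set of neighbors.

```python
def chiba_nishizeki_triangles(adj):
    """List every triangle exactly once; adj maps each vertex to a set of neighbors."""
    g = {v: set(ns) for v, ns in adj.items()}  # private copy, vertices get deleted
    order = sorted(g, key=lambda v: len(g[v]), reverse=True)  # non-increasing degree
    triangles = []
    for v in order:
        marked = set(g[v])  # mark all current neighbors of v
        for u in list(marked):
            for w in g[u]:
                if w in marked:  # v, u, w are pairwise adjacent
                    triangles.append((v, u, w))
            marked.discard(u)  # unmark u so the pair {u, w} is not reported twice
        for u in g[v]:  # delete v before processing the next vertex
            g[u].discard(v)
        del g[v]
    return triangles
```

Deleting each processed vertex is what guarantees that every triangle is reported once, from its earliest vertex in the ordering.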
On Sampling Edges Almost Uniformly
TLDR
The algorithm is optimal in the sense that any algorithm that samples an edge from an almost-uniform distribution must perform $\Omega(n / \sqrt{m})$ queries.
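The light/heavy idea behind almost-uniform edge sampling can be sketched as follows. This is a simplified illustration, not a transcription of the paper: it assumes the exact edge count $m$ is known (a practical version would work with an estimate), takes adjacency as a dict of lists, and uses the degree threshold $t = \lceil \sqrt{2m/\varepsilon} \rceil$. A trial picks a uniform vertex $u$ and a uniform index in $[0, t)$; it succeeds only if $u$ is light and the index hits an actual neighbor $w$. With probability 1/2 it returns the oriented edge $(u, w)$; otherwise, if $w$ is heavy, it takes one more uniform step from $w$. The function name `sample_edge` is hypothetical.

```python
import math
import random


def sample_edge(adj, m, eps, rng):
    """Return an oriented edge whose distribution is pointwise close to uniform.

    Light oriented edges are returned with probability 1/(2nt) per trial; an edge
    leaving a heavy vertex w gets that base value times (light neighbors of w)/deg(w),
    which is at least 1 - eps because there are at most 2m/t heavy vertices.
    """
    vertices = list(adj)
    t = math.ceil(math.sqrt(2 * m / eps))  # a vertex is light iff its degree <= t
    while True:
        u = rng.choice(vertices)
        i = rng.randrange(t)
        nbrs = adj[u]
        if len(nbrs) > t or i >= len(nbrs):
            continue  # u is heavy or the index overshoots: this trial fails
        w = nbrs[i]
        if rng.random() < 0.5:
            return (u, w)  # light branch: oriented edge out of a light vertex
        if len(adj[w]) > t:
            return (w, rng.choice(adj[w]))  # heavy branch: one hop out of heavy w
        # w is light, so the heavy branch fails; retry
```

Each trial succeeds with probability at least $(1-\varepsilon)\, m/(nt)$, so the expected number of trials is $O(n/\sqrt{\varepsilon m})$, consistent with the $\Omega(n/\sqrt{m})$ lower bound stated above.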
Sublinear Time Estimation of Degree Distribution Moments: The Degeneracy Connection
TLDR
This work revisits the classic problem of estimating the degree distribution moments of an undirected graph and designs a new, significantly simpler algorithm that exactly matches the bounds of Gonen-Ron-Shavitt, and has a much simpler proof.
Counting stars and other small subgraphs in sublinear time
Detecting and counting the number of copies of certain subgraphs (also known as network motifs or graphlets) is motivated by applications in a variety of areas ranging from Biology to…
Approximating the Minimum Spanning Tree Weight in Sublinear Time
TLDR
It is shown how estimates on the number of components in various subgraphs of G can be used to estimate the weight of its MST, and a nearly matching lower bound of $\Omega( dw \varepsilon^{-2} )$ is proved on the probe and time complexity of any approximation algorithm for MST weight.
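The connection to component counting rests on an exact identity: for a connected graph with integer edge weights in $\{1, \dots, w\}$, $\mathrm{MST}(G) = n - w + \sum_{i=1}^{w-1} c^{(i)}$, where $c^{(i)}$ is the number of connected components of the subgraph $G^{(i)}$ that keeps only edges of weight at most $i$. The sublinear algorithm estimates each $c^{(i)}$ by sampling; the sketch below computes them exactly with union-find, purely to check the identity on small inputs. Both function names are hypothetical.

```python
def count_components(n, edges):
    """Number of connected components of the graph on vertices 0..n-1 (union-find)."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    comps = n
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            comps -= 1
    return comps


def mst_weight_via_components(n, weighted_edges, w):
    """MST weight of a connected graph with weights in 1..w, via n - w + sum of c_i."""
    total = n - w
    for i in range(1, w):
        light = [(u, v) for u, v, wt in weighted_edges if wt <= i]
        total += count_components(n, light)
    return total
```

For example, on the path 0-1-2-3 with weights 3, 1, 2 (so $w = 3$), $c^{(1)} = 3$ and $c^{(2)} = 2$, giving $4 - 3 + 3 + 2 = 6$, which is the weight of the path itself.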
Approximately Counting Triangles in Sublinear Time
  • T. Eden, Amit Levi, D. Ron
    2015 IEEE 56th Annual Symposium on Foundations of Computer Science
  • 2015
TLDR
This work designs a sublinear-time algorithm for approximating the number of triangles in a graph, where the algorithm is given query access to the graph, and proves that $\Omega\left(\frac{n}{t^{1/3}} + \min\left\{m, \frac{m^{3/2}}{t}\right\}\right)$ queries are necessary, where $t$ is the number of triangles, thus establishing that the query complexity of this algorithm is optimal up to polylogarithmic factors in $n$.
The Sketching Complexity of Graph and Hypergraph Counting
TLDR
A tight bound is given on the sketching complexity of counting the number of occurrences of a small subgraph H in a bounded degree graph G presented as a stream of edge updates, and it is shown that the space complexity of the problem is governed by the fractional vertex cover number of the graph H.
A near-optimal sublinear-time algorithm for approximating the minimum vertex cover size
TLDR
An algorithm is given that outputs a $(2, \epsilon)$-estimate of the size of a minimum vertex cover, with query complexity and running time $O(n) \cdot \mathrm{poly}(1/\epsilon)$; the result is nearly optimal.
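A $(2, \epsilon)$-estimate is a value between the minimum vertex cover size and twice that size plus $\epsilon n$. One standard route to such an estimate uses a maximal matching $M$: its matched vertices form a vertex cover of size $2|M|$, which is at most twice the optimum, so sampling vertices and asking whether each is matched estimates this size to within about $\pm \epsilon n$. The sketch below uses a random-rank greedy matching oracle (an idea due to Nguyen and Onak): an edge is in $M$ iff no adjacent edge of lower rank is in $M$. It is not the algorithm of this entry and makes no attempt to match its complexity bound; `LocalMatchingOracle` and `estimate_vertex_cover` are hypothetical names, and vertices are assumed to be comparable labels.

```python
import random


class LocalMatchingOracle:
    """Hypothetical local oracle for greedy maximal matching under random edge ranks."""

    def __init__(self, adj, seed=0):
        self.adj = adj
        self.rng = random.Random(seed)
        self.ranks = {}
        self.memo = {}

    def _key(self, u, v):
        return (u, v) if u < v else (v, u)

    def _rank(self, e):
        if e not in self.ranks:
            self.ranks[e] = self.rng.random()  # ranks are drawn lazily, on first touch
        return self.ranks[e]

    def in_matching(self, u, v):
        e = self._key(u, v)
        if e not in self.memo:
            r = self._rank(e)
            lower = [self._key(x, y) for x in e for y in self.adj[x]
                     if self._key(x, y) != e and self._rank(self._key(x, y)) < r]
            lower.sort(key=self._rank)  # check adjacent lower-ranked edges in rank order
            # Recursion terminates because ranks strictly decrease along each call chain.
            self.memo[e] = not any(self.in_matching(*f) for f in lower)
        return self.memo[e]

    def is_matched(self, v):
        return any(self.in_matching(v, u) for u in self.adj[v])


def estimate_vertex_cover(adj, samples, seed=0):
    """Estimate |V(M)|, which lies between the optimum and twice the optimum."""
    oracle = LocalMatchingOracle(adj, seed)
    rng = random.Random(seed + 1)
    vertices = list(adj)
    hits = sum(oracle.is_matched(rng.choice(vertices)) for _ in range(samples))
    return len(vertices) * hits / samples
```

On the path 0-1-2-3 (optimum 2), the matched set is either $\{1, 2\}$ or all four vertices, depending on the random ranks.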
On Approximating the Minimum Vertex Cover in Sublinear Time and the Connection to Distributed Algorithms
TLDR
The lower bound holds for every multiplicative factor $\alpha$ and small constant $\epsilon$ as long as $\bar{d} = O(n/\alpha)$; this means that for dense graphs it is not possible to design an algorithm whose complexity is $o(n)$.