# A Simple Sublinear-Time Algorithm for Counting Arbitrary Subgraphs via Edge Sampling

@inproceedings{Assadi2019ASS, title={A Simple Sublinear-Time Algorithm for Counting Arbitrary Subgraphs via Edge Sampling}, author={Sepehr Assadi and Mikhail Kapralov and Sanjeev Khanna}, booktitle={ITCS}, year={2019} }

In the subgraph counting problem, we are given an input graph $G(V, E)$ and a target graph $H$; the goal is to estimate the number of occurrences of $H$ in $G$. Our focus here is on designing sublinear-time algorithms for approximately counting occurrences of $H$ in $G$ in the setting where the algorithm is given query access to $G$. This problem has been studied in several recent papers, which primarily focused on specific families of graphs $H$ such as triangles, cliques, and stars. However…
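To make the setting concrete, the following Python sketch estimates the number of triangles in a small graph using only uniform edge samples, random-neighbor lookups, and adjacency checks. It is a simple textbook-style unbiased estimator for the special case where $H$ is a triangle, not the algorithm of the paper; the function names and the in-memory adjacency representation are assumptions made for this illustration.

```python
import random
from itertools import combinations


def build_adjacency(edges):
    """Return {vertex: set(neighbors)} for an undirected simple graph."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def count_triangles_exact(edges):
    """Brute-force triangle count, used only as a reference value."""
    adj = build_adjacency(edges)
    return sum(
        1
        for a, b, c in combinations(sorted(adj), 3)
        if b in adj[a] and c in adj[a] and c in adj[b]
    )


def estimate_triangles(edges, num_samples, rng):
    """Average of an unbiased one-sample triangle estimator.

    Assumes `edges` lists each undirected edge exactly once, with no
    self-loops. One sample: draw a uniform edge (u, v), draw a uniform
    neighbor w of the lower-degree endpoint u, and output m * deg(u) / 3
    if w is also adjacent to v, else 0. Each triangle is reached through
    any of its 3 edges, hence the division by 3.
    """
    adj = build_adjacency(edges)
    neighbors = {v: sorted(nbrs) for v, nbrs in adj.items()}
    m = len(edges)
    total = 0.0
    for _ in range(num_samples):
        u, v = rng.choice(edges)            # uniform edge sample
        if len(adj[u]) > len(adj[v]):       # start from the lower-degree endpoint
            u, v = v, u
        w = rng.choice(neighbors[u])        # uniform random neighbor of u
        if w in adj[v]:                     # adjacency (pair) check closes the triangle
            total += m * len(adj[u]) / 3
    return total / num_samples


# Usage on the complete graph K5, which has C(5, 3) = 10 triangles.
k5_edges = list(combinations(range(5), 2))
print("exact:", count_triangles_exact(k5_edges))
print("estimate:", estimate_triangles(k5_edges, 20000, random.Random(0)))
```

Starting from the lower-degree endpoint does not affect unbiasedness; it only reduces the variance of a single sample, which is what determines how many samples are needed for a given accuracy.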

## 33 Citations

Linear Time Subgraph Counting, Graph Degeneracy, and the Chasm at Size Six

- Computer Science, Mathematics
- ITCS
- 2020

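Near-linear time algorithms are given for sub-cnt$_k$ for all $k < 6$ in bounded degeneracy graphs, and it is proved that for all $k \geq 6$, sub-cnt$_k$ cannot be solved even in near-linear time, assuming a standard conjecture in fine-grained complexity.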

The Arboricity Captures the Complexity of Sampling Edges

- Mathematics, Computer Science
- ICALP
- 2019

The upper bound is tight (up to poly-logarithmic factors and the dependence on $\varepsilon$), as a matching number of queries is necessary even for the easier task of sampling edges from any distribution over $E$ that is close to uniform in total variation distance.

Amortized Edge Sampling

- Mathematics, Computer Science
- ArXiv
- 2020

An algorithm is presented that allows one to sample multiple edges from a distribution that is pointwise $\epsilon$-close to the uniform distribution in an amortized-efficient fashion. If the number of required samples $q$ is known in advance, the overall cost is $O^*(\sqrt{q} \cdot (n/\sqrt{m}))$, which is strictly preferable to the $O^*(q \cdot (n/\sqrt{m}))$ cost of drawing the $q$ samples one at a time.

A Lower Bound on Cycle-Finding in Sparse Digraphs

- Computer Science, Mathematics
- SODA
- 2020

An information-theoretic lower bound is proved, showing that for $N$-vertex graphs with constant outdegree, any cycle-finding algorithm must make $\tilde{\Omega}(N^{5/9})$ queries to an adjacency list representation of $G$.

Sampling connected subgraphs: nearly-optimal mixing time bounds, nearly-optimal $\epsilon$-uniform sampling, and perfect uniform sampling

- Physics
- 2020

We study the connected subgraph sampling problem: given an integer $k \ge 3$ and a simple graph $G=(V,E)$, sample a connected induced $k$-node subgraph of $G$ (also called $k$-graphlet) uniformly at…

Near-Linear Time Homomorphism Counting in Bounded Degeneracy Graphs: The Barrier of Long Induced Cycles

- Computer Science, Mathematics
- SODA
- 2021

The following are proved: if the largest induced cycle in $H$ has length at most $5$, then there is an $O(m \log m)$ algorithm for counting $H$-homomorphisms in bounded degeneracy graphs; if it has length at least $6$, then, assuming standard fine-grained complexity conjectures, there is a constant $\gamma > 0$ such that there is no $o(m^{1+\gamma})$-time algorithm.

Towards a Decomposition-Optimal Algorithm for Counting and Sampling Arbitrary Motifs in Sublinear Time

- Computer Science
- APPROX-RANDOM
- 2021

This work presents a new algorithm for sampling and approximately counting arbitrary motifs which, up to $\mathrm{poly}(\log n)$ factors, is always at least as good as previous results, and for most graphs $G$ is strictly better.

Nearly optimal edge estimation with independent set queries

- Computer Science, Mathematics
- SODA
- 2020

This work studies the problem of estimating the number of edges of an unknown, undirected graph $G=([n],E)$ with access to an independent set oracle, and gives an algorithm that is shown to be optimal up to a factor of $\mathrm{poly}(\log n, 1/\epsilon)$.

Efficient and near-optimal algorithms for sampling connected subgraphs

- Computer Science
- STOC
- 2021

This work gives a near-optimal mixing time bound for the classic $k$-graphlet random walk, as a function of the mixing time of $G$, and the first efficient algorithm for uniform graphlet sampling.

Almost Optimal Bounds for Sublinear-Time Sampling of k-Cliques: Sampling Cliques is Harder Than Counting

- Mathematics, Computer Science
- ArXiv
- 2020

The lower bound follows from a construction of a family of graphs with arboricity $\alpha$ such that in each graph there are $n_k$ cliques, where one of these cliques is "hidden" and hence hard to sample.

## References

Sublinear-Time Algorithms for Counting Star Subgraphs via Edge Sampling

- Computer Science, Mathematics
- Algorithmica
- 2017

With the ability to sample edges randomly, it is shown that one can achieve faster algorithms for approximating the number of star subgraphs, bypassing the lower bounds of prior work.

Arboricity and Subgraph Listing Algorithms

- Mathematics, Computer Science
- SIAM J. Comput.
- 1985

A new simple strategy for edge-searching of a graph, which is useful for various subgraph listing problems, is introduced, and an upper bound on the arboricity $a(G)$ of a graph $G$ is established: $a(G) \leq \lceil (2m + n)^{1/2} / 2 \rceil$, where $n$ is the number of vertices and $m$ the number of edges in $G$.

On Sampling Edges Almost Uniformly

- Computer Science, Mathematics
- SOSA
- 2018

The algorithm is optimal in the sense that any algorithm that samples an edge from an almost-uniform distribution must perform $\Omega(n / \sqrt{m})$ queries.

Sublinear Time Estimation of Degree Distribution Moments: The Degeneracy Connection

- Mathematics, Computer Science
- ICALP
- 2017

This work revisits the classic problem of estimating the degree distribution moments of an undirected graph and designs a new, significantly simpler algorithm that exactly matches the bounds of Gonen-Ron-Shavitt, and has a much simpler proof.

Counting stars and other small subgraphs in sublinear time

- Computer Science, Mathematics
- SODA '10
- 2010

Detecting and counting the number of copies of certain subgraphs (also known as *network motifs* or *graphlets*) is motivated by applications in a variety of areas ranging from Biology to…

Approximating the Minimum Spanning Tree Weight in Sublinear Time

- Mathematics, Computer Science
- SIAM J. Comput.
- 2005

It is shown how estimates on the number of components in various subgraphs of G can be used to estimate the weight of its MST, and a nearly matching lower bound of $\Omega( dw \varepsilon^{-2} )$ is proved on the probe and time complexity of any approximation algorithm for MST weight.

Approximately Counting Triangles in Sublinear Time

- Mathematics, Computer Science
- 2015 IEEE 56th Annual Symposium on Foundations of Computer Science
- 2015

This work designs a sublinear-time algorithm for approximating the number of triangles in a graph, where the algorithm is given query access to the graph, and proves that $\Omega(n/t^{1/3} + \min\{m, m^{3/2}/t\})$ queries are necessary, thus establishing that the query complexity of this algorithm is optimal up to polylogarithmic factors in $n$.

The Sketching Complexity of Graph and Hypergraph Counting

- Computer Science, Mathematics
- 2018 IEEE 59th Annual Symposium on Foundations of Computer Science (FOCS)
- 2018

A tight bound is given on the sketching complexity of counting the number of occurrences of a small subgraph H in a bounded degree graph G presented as a stream of edge updates, and it is shown that the space complexity of the problem is governed by the fractional vertex cover number of the graph H.

A near-optimal sublinear-time algorithm for approximating the minimum vertex cover size

- Computer Science, Mathematics
- SODA
- 2012

An algorithm is given that outputs a $(2, \epsilon)$-estimate of the size of a minimum vertex cover, with query complexity and running time $O(n) \cdot \mathrm{poly}(1/\epsilon)$, and the result is shown to be nearly optimal.

On Approximating the Minimum Vertex Cover in Sublinear Time and the Connection to Distributed Algorithms

- Computer Science, Mathematics
- Electron. Colloquium Comput. Complex.
- 2005

The lower bound holds for every multiplicative factor $\alpha$ and small constant $\epsilon$ as long as $\bar{d} = O(n/\alpha)$; in particular, for dense graphs it is not possible to design an algorithm whose complexity is $o(n)$.