Counting Butterflies from a Large Bipartite Graph Stream

@article{SaneiMehri2018CountingBF,
  title={Counting Butterflies from a Large Bipartite Graph Stream},
  author={Seyed-Vahid Sanei-Mehri and Y. Zhang and Ahmet Erdem Sariy{\"u}ce and Srikanta Tirthapura},
  journal={ArXiv},
  year={2018},
  volume={abs/1812.03398}
}
We consider the estimation of properties on massive bipartite graph streams, where each edge represents a connection between entities in two different partitions. We present sublinear-space one-pass algorithms for accurately estimating the number of butterflies in the graph stream. Our estimates have provable guarantees on their quality, and experiments show promising tradeoffs between space and accuracy. We also present extensions to sliding windows. While there are many works on counting…
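
To make the quantity being estimated concrete, the following is a minimal Python sketch, not the paper's algorithms. It assumes edges arrive as (left, right) pairs with no duplicates. The function names `count_butterflies` and `estimate_butterflies` are hypothetical. The first function counts butterflies (2x2 bicliques) exactly; the second is a generic edge-sampling baseline that keeps each edge with probability `p` and rescales by `1 / p**4`, since a butterfly survives only if all four of its edges are kept.

```python
import random
from collections import defaultdict
from itertools import combinations
from math import comb


def count_butterflies(edges):
    """Exactly count butterflies (2x2 bicliques) in a bipartite edge list."""
    # Map each left vertex to the set of right vertices it touches.
    neighbors = defaultdict(set)
    for left, right in edges:
        neighbors[left].add(right)

    total = 0
    # Two left vertices sharing c right neighbors form C(c, 2) butterflies.
    for a, b in combinations(neighbors, 2):
        shared = len(neighbors[a] & neighbors[b])
        total += comb(shared, 2)
    return total


def estimate_butterflies(stream, p, seed=None):
    """One-pass estimate: sample each edge with probability p, then rescale.

    Illustrative baseline only (assumption); it stores roughly a p-fraction
    of the stream and is unbiased because each butterfly survives sampling
    with probability p**4.
    """
    rng = random.Random(seed)
    sample = [edge for edge in stream if rng.random() < p]
    return count_butterflies(sample) / p**4
```

For example, the complete bipartite graph with three vertices on each side contains C(3,2) * C(3,2) = 9 butterflies, which `count_butterflies` reports exactly; with `p < 1` the estimate varies from run to run around that value.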

References

Butterfly Counting in Bipartite Networks
This work focuses on counting the number of occurrences of a "butterfly", a complete 2x2 biclique, the simplest cohesive higher-order structure in a bipartite graph, and develops a suite of randomized algorithms that can quickly approximate the number of butterflies in a graph with a provable guarantee on accuracy.
Counting Arbitrary Subgraphs in Data Streams
This work provides the first non-trivial estimator for approximately counting the number of occurrences of an arbitrary subgraph H of constant size in a (large) graph G and demonstrates the applicability of the estimator by analyzing its concentration for several graphs H and the case where G is a power law graph.
Approximate Counting of Cycles in Streams
This work considers the subgraph counting problem in data streams and develops the first non-trivial algorithm for approximately counting cycles of an arbitrary but fixed size and improves drastically upon the naive sampling algorithm.
Parallel triangle counting in massive streaming graphs
This paper presents the design and implementation of a fast parallel algorithm for estimating the number of triangles in a massive undirected graph whose edges arrive as a stream; the algorithm is designed for shared-memory multicore machines and can make efficient use of parallelism and the memory hierarchy.
Triangle Counting in Dynamic Graph Streams
A new algorithm estimating the number of triangles in dynamic graph streams where edges can be both inserted and deleted is presented, and it is shown that this algorithm achieves better time and space complexity than previous solutions for various graph classes, for example sparse graphs with a relatively small number of triangles.
Counting and Sampling Triangles from a Graph Stream
This paper presents a new space-efficient algorithm for counting and sampling triangles--and more generally, constant-sized cliques--in a massive graph whose edges arrive as a stream. Compared to…
Rectangle Counting in Large Bipartite Graphs
Three different types of algorithms are proposed to cope with different data volumes and the availability of computing resources for rectangle counting, which has many important applications where data is modeled as bipartite graphs.
Parallel Algorithms for Butterfly Computations
A framework called ParButterfly is designed that contains new parallel algorithms for the following problems on processing butterflies: global counting, per-vertex counting, per-edge counting, tip decomposition (vertex peeling), and wing decomposition (edge peeling).
ESCAPE: Efficiently Counting All 5-Vertex Subgraphs
It is proved that it suffices to enumerate only four specific subgraphs (three of them have fewer than 5 vertices) to exactly count all 5-vertex patterns. This is the first practical algorithm for 5-vertex pattern counting that runs at this scale, and it is able to compute counts of graphs with tens of millions of edges in minutes on a commodity machine.
Efficient Graphlet Counting for Large Networks
This paper proposes a fast, efficient, and parallel algorithm for counting graphlets of size k={3,4}-nodes that takes only a fraction of the time to compute when compared with the current methods used, and is on average 460x faster than current methods.