I/O efficient bisimulation partitioning on very large directed acyclic graphs

Jelle Hellings, G. Fletcher, Herman J. Haverkort
In this paper we introduce the first efficient external-memory algorithm to compute the bisimilarity equivalence classes of a directed acyclic graph (DAG). DAGs are commonly used to model data in a wide variety of practical applications, ranging from XML documents and data provenance models, to web taxonomies and scientific workflows. In the study of efficient reasoning over massive graphs, the notion of node bisimilarity plays a central role. For example, grouping together bisimilar nodes in… 
8 Citations

Incremental Maintenance of the Minimum Bisimulation of Cyclic Graphs

This paper proposes a novel hybrid algorithm that is the first maintenance algorithm that guarantees minimum bisimulation of cyclic graphs and presents an experimental study on both synthetic and real-data graphs that verified the efficiency and effectiveness of the algorithms.

The Graph Signature: A Scalable Query Optimization Index for RDF Graph Databases Using Bisimulation and Trace Equivalence Summarization

The authors propose the Graph Signature Index, a novel and scalable approach to indexing and querying large data graphs that summarizes a graph and executes queries on the summaries instead of on the original graph.

I/O-Efficient Algorithms on Triangle Listing and Counting

A new algorithm is developed that is provably I/O and CPU efficient at the same time, without making any assumption on the input G at all, and outperforms the existing competitors by a factor of over an order of magnitude in the authors' extensive experimentation.

I/O efficient: computing SCCs in massive graphs

A weak order is explored, based on which a new single-phase algorithm is proposed that combines the tree construction and tree search phases into a single phase and, together with three new optimization techniques, can significantly reduce the number of I/Os and the CPU cost.

Massive graph triangulation

A new algorithm is developed that is provably I/O and CPU efficient at the same time, without making any assumption on the input G at all, and outperformed the existing competitors by a factor of over an order of magnitude in extensive experimentation.

Context-Free Path Queries on RDF Graphs

This paper presents cfSPARQL, an extension of SPARQL query language equipped with context-free grammars, which is strictly more expressive than property paths and nested expressions and can be used for modelling graph similarities, graph summarization and ontology alignment.

sGrapp: Butterfly Approximation in Streaming Graphs

An empirical analysis is conducted to uncover temporal organizing principles of butterflies in real streaming graphs and then an approximate adaptive window-based algorithm is introduced, sGrapp, for counting butterflies as well as its optimized version sGrapp-x, designed to operate efficiently and effectively over any graph stream with any temporal behavior.

Indexing for Graph Query Evaluation

The evaluation of queries on graph databases is often facilitated by index data structures, which can be the primary representation of the graph or be a secondary access path to elements of the graph.



Optimizing Incremental Maintenance of Minimal Bisimulation of Cyclic Graphs

This paper proposes a maintenance algorithm for a minimal bisimulation of a cyclic graph, in the style of merging, and proposes a feature-based optimization to prune the computation on non-bisimilar SCCs.

Querying DAG-shaped Execution Traces Through Views

It is shown that the particular DAG shape of BP execution traces makes the problem easier than for general graphs, yet harder than for XML trees.

Engineering a Topological Sorting Algorithm for Massive Graphs

In an I/O-efficient algorithm for topologically sorting directed acyclic graphs (DAGs), IterTS consistently outperformed PeelTS and ReachTS, by at least an order of magnitude in most cases.

Linear Computation of the Maximum Simultaneous Forward and Backward Bisimulation for Node-Labeled Trees

It is proved that the result equals the maximum F&B-bisimulation, which is used to speed up pattern matching in tree and graph data.

An efficient algorithm for computing bisimulation equivalence

External-memory graph algorithms

We present a collection of new techniques for designing and analyzing efficient external-memory algorithms for graph problems and illustrate how these techniques can be applied to a wide variety of

Index Structures for Path Expressions

In recent years there has been an increased interest in managing data that does not conform to traditional data models, like the relational or object oriented model, and the term semistructured data has been used to refer to such data.

Covering indexes for branching path queries

In this paper, we ask if the traditional relational query acceleration techniques of summary tables and covering indexes have analogs for branching path expression queries over tree- or

On sorting strings in external memory (extended abstract)

This paper addresses for the first time the I/O complexity of the problem of sorting strings in external memory, which is a fundamental component of many large-scale text applications and shows, somewhat counterintuitively, that the length of the strings relative to the block size depends upon the size of the memory.