Scaling-up reasoning and advanced analytics on BigData

Tyson Condie, Ariyam Das, Matteo Interlandi, Alexander Shkapsky, Mohan Yang and Carlo Zaniolo. Theory and Practice of Logic Programming, pages 806-845.
Abstract: BigDatalog is an extension of Datalog that achieves performance and scalability on both Apache Spark and multicore systems, to the point that its graph analytics outperform those written in GraphX. Looking back, we see how this realizes the ambitious goal pursued by deductive database researchers beginning 40 years ago: combining the rigor and power of logic in expressing queries and reasoning with the performance and scalability by which relational databases managed…
Demonstration of LogicLib: An Expressive Multi-Language Interface over Scalable Datalog System
Logic Library (LogicLib) is a library of recursive algorithms written in Datalog that can be executed in BigDatalog, a Datalog engine on top of Apache Spark developed by the authors. It encapsulates complex logic-based algorithms into high-level APIs that simplify development and provide a unified interface akin to that of Spark MLlib.
BigData Applications from Graph Analytics to Machine Learning by Aggregates in Recursion
This paper shows that with PreM, a wide spectrum of classical algorithms of practical interest can be concisely expressed in declarative languages by using aggregates in recursion, enabling their execution with superior performance and scalability.
Formal semantics and high performance in declarative machine learning using Datalog
It is shown that using aggregates in recursive Datalog programs allows ML applications to be expressed concisely while retaining a strictly declarative formal semantics. This is achieved by introducing simple conditions under which the semantics of recursive programs is guaranteed to be equivalent to that of aggregate-stratified ones.
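The equivalence condition can be illustrated with the classic shortest-path example. The Python sketch below, using a hypothetical weighted edge list, computes minimum path costs twice: once in aggregate-stratified fashion (derive every path cost, then apply `min`) and once with `min` applied inside every recursive iteration, which is the kind of rewrite PreM is meant to justify. It is a toy illustration under these assumptions, not the paper's formal condition or its implementation, and the function names are hypothetical.

```python
# Hypothetical weighted arcs (source, target, cost). The graph is acyclic so that
# the stratified baseline, which enumerates every path cost, terminates.
ARCS = [("a", "b", 1), ("b", "c", 2), ("a", "c", 5), ("c", "d", 1), ("b", "d", 4)]


def stratified_min(arcs):
    """Aggregate-stratified plan: derive all path costs first, then apply min."""
    paths = set(arcs)  # path(X, Y, D) <- arc(X, Y, D).
    delta = set(paths)
    while delta:  # path(X, Y, D1 + D2) <- path(X, Z, D1), arc(Z, Y, D2).
        derived = {(x, y, d1 + d2)
                   for (x, z, d1) in delta
                   for (z2, y, d2) in arcs if z == z2}
        delta = derived - paths
        paths |= delta
    best = {}
    for x, y, d in paths:  # spath(X, Y, min<D>) <- path(X, Y, D).
        best[(x, y)] = min(d, best.get((x, y), d))
    return best


def min_in_recursion(arcs):
    """Min pushed into the recursion: keep only the current minimum per (X, Y)."""
    best = {}
    for x, y, d in arcs:
        best[(x, y)] = min(d, best.get((x, y), d))
    delta = dict(best)
    while delta:
        improved = {}
        for (x, z), d1 in delta.items():
            for (z2, y, d2) in arcs:
                d = d1 + d2
                if z == z2 and d < min(best.get((x, y), float("inf")),
                                       improved.get((x, y), float("inf"))):
                    improved[(x, y)] = d
        best.update(improved)  # only strict improvements are propagated
        delta = improved
    return best
```

On this acyclic graph both functions return the same costs, e.g. 4 for the pair ("a", "d"). Only the min-in-recursion version remains usable if a cycle such as ("d", "a", 1) is added: with non-negative costs it propagates only strict improvements and therefore terminates, whereas the set of all path costs becomes infinite.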
RaSQL: Greater Power and Performance for Big Data Analytics with Recursive-aggregate-SQL on Spark
The RaSQL system, which extends Spark SQL with new constructs for aggregates in recursion and the corresponding implementation techniques, matches and often surpasses the performance of other systems, including Apache Giraph, GraphX and Myria.
A Case for Stale Synchronous Distributed Model for Declarative Recursive Computation
It is proved that PreM-optimized lock-free and decomposable parallel semi-naive evaluations produce the same results as the single-executor programs, and that PreM can be assimilated into the data-parallel computation plans of different distributed systems, irrespective of whether these follow bulk synchronous parallel (BSP) or asynchronous computing models.
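A minimal sketch of a decomposable parallel semi-naive evaluation, assuming the linear transitive-closure program tc(X, Y) <- arc(X, Y) and tc(X, Y) <- tc(X, Z), arc(Z, Y). Because the recursive rule passes X unchanged from the body to the head, partitioning tc by a hash of X lets each worker iterate on its own partition with a replicated copy of arc, without locks or data exchange between iterations; the union of the partitions equals the single-executor result. Threads stand in for distributed executors, the names `semi_naive_tc` and `partitioned_tc` are hypothetical, and the sketch does not model PreM or the stale synchronous model itself.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical edge relation arc(Z, Y), replicated to every worker.
ARC = {(1, 2), (2, 3), (3, 4), (4, 2), (5, 1)}


def semi_naive_tc(arc, sources=None):
    """Semi-naive evaluation of
         tc(X, Y) <- arc(X, Y).
         tc(X, Y) <- tc(X, Z), arc(Z, Y).
    restricted to tuples whose X is in `sources` (all X when sources is None)."""
    succ = {}
    for z, y in arc:
        succ.setdefault(z, set()).add(y)
    tc = {(x, y) for x, y in arc if sources is None or x in sources}
    delta = set(tc)
    while delta:
        # Join only the new tuples (delta) with arc; X is carried through unchanged.
        derived = {(x, y) for x, z in delta for y in succ.get(z, ())}
        delta = derived - tc
        tc |= delta
    return tc


def partitioned_tc(arc, workers=3):
    """Hash-partition tc on X; each worker runs semi-naive on its own partition."""
    xs = {x for x, _ in arc}
    parts = [{x for x in xs if hash(x) % workers == w} for w in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda part: semi_naive_tc(arc, part), parts))
    return set().union(*results)
```

Partitioning on X is what makes the plan decomposable: no derived tuple ever needs to move to another partition, so the only shared state is the read-only arc relation.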
Rethinking Defeasible Reasoning: A Scalable Approach
This work designs a new logic for defeasible reasoning that ensures scalability by design, and establishes several properties of the logic, including its relation to existing defeasibility logics.
Parallel Logic Programming: A Sequel
This survey reviews research in parallel logic programming since 2001. It restricts its attention to the parallelization of the major logic programming languages (Prolog, Datalog, Answer Set Programming), with an emphasis on automated parallelization and on preserving the sequential observable semantics of these languages.
Defeasible Reasoning via Datalog¬
This work addresses the problem of compiling defeasible theories to Datalog programs and identifies structural properties of DL(∂||) that support efficient implementation and/or approximation of the conclusions of defeasible theories in the logic, compared with other defeasibility logics.
Research on Collaborative Application of Power Big Data and External Data
The power industry has laid the foundation for the era of big data. To further raise the level of intelligence and automation in power regulation, this work comprehensively analyzes the core technologies of power big data.


Declarative Languages and Scalable Systems for Graph Analytics and Knowledge Discovery
This work asks whether it is possible to design an efficient query language that simplifies the writing of advanced analytical applications and provides a unified environment for their development and deployment on multiple platforms, and answers positively by demonstrating extensions of the logic-based query language Datalog.
Efficient Computation of the Well-Founded Semantics over Big Data
This work proposes and evaluates a parallel approach to computing the well-founded semantics using the MapReduce framework. It is believed to be the first work to address large-scale nonmonotonic reasoning without the restriction of stratification, for predicates of arbitrary arity.
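For intuition about the non-stratified programs this work targets, consider the standard win-move game, win(X) <- move(X, Y), not win(Y), which is not stratified because win depends negatively on itself. The sketch below computes its well-founded model on a single machine with Van Gelder's alternating fixpoint, not with the MapReduce approach of the paper; the move relation and the function names are illustrative assumptions.

```python
# Hypothetical move relation for the game win(X) <- move(X, Y), not win(Y).
MOVES = [("a", "b"), ("b", "a"), ("b", "c"), ("c", "d"), ("e", "c")]


def gamma(assumed_win, moves):
    """Least model with `not win(Y)` evaluated against a fixed set of wins.
    The reduct has no positive recursion here, so it is computed directly."""
    return {x for x, y in moves if y not in assumed_win}


def well_founded_win(moves):
    """Alternating fixpoint: returns (true, false, undefined) sets of positions."""
    nodes = {n for edge in moves for n in edge}
    true = set()
    while True:
        possible = gamma(true, moves)      # overestimate of win
        new_true = gamma(possible, moves)  # underestimate of win
        if new_true == true:
            break
        true = new_true
    return true, nodes - possible, possible - true
```

Here c is won (it can move to d, which has no moves), d and e are lost, and a and b, which can only move to each other or to the winning position c, remain undefined.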
WebPIE: A Web-scale Parallel Inference Engine using MapReduce
Large-scale Parallel Stratified Defeasible Reasoning
This paper considers inconsistency-tolerant reasoning in the form of defeasible logic, and analyzes how parallelization with the MapReduce framework can be used to reason with defeasible rules over huge datasets.
Big Data Analytics with Datalog Queries on Spark
This work proposes compilation and optimization techniques that tackle the important problem of efficiently supporting recursion in Spark, and experimentally compares the resulting system with other state-of-the-art large-scale Datalog systems to verify the efficacy of these techniques and the effectiveness of Spark in supporting Datalog-based analytics.
Declarative BigData Algorithms via Aggregates and Relational Database Dependencies
The proposed algorithm templates are based on simple extensions of functional dependencies (FDs) and multivalued dependencies (MVDs), whereby properties such as the mixed transitivity of MVDs and FDs are used to prove the validity of these powerful declarative algorithms.
Scaling Datalog for Machine Learning on Big Data
This paper argues for using recursive queries to program a variety of machine learning systems, so that database query optimization techniques can identify effective execution plans and the resulting runtime plans can be executed on a single unified data-parallel query processing engine.
Scaling up the performance of more powerful Datalog systems on multicore machines
Extending RDBMS technology to achieve performance and scalability for queries that are much more powerful than those of SQL-2 has been the goal of deductive database research for more than thirty…
Parallel Bottom-Up Evaluation of Logic Programs: DeALS on Shared-Memory Multicore Machines
A technique is described that finds an efficient hash partitioning strategy for the tables, minimizing the use of locks during the evaluation of logic programs in DeALS. DeALS achieves competitive performance on non-recursive programs compared with commercial RDBMSs and superior performance on recursive programs compared with other existing systems.
Extending the power of datalog recursion
The proposed extension of Datalog enables the efficient formulation of queries that could not be expressed efficiently, or could not be expressed at all, in Datalog with stratified negation and aggregates. The paper shows that diffusion models and PageRank computations can be easily expressed and efficiently implemented with it.
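PageRank is a recursive computation with a sum aggregate: each node's rank depends on the ranks of its predecessors. The sketch below is the standard power iteration, with the recursive rule it mirrors written as a comment in an informal Datalog-like notation (not the paper's syntax). It assumes a small hypothetical graph in which every node has at least one outgoing edge, so no rank mass is lost, and uses the conventional damping factor 0.85.

```python
# Hypothetical directed graph; every node has at least one outgoing edge.
GRAPH = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}


def pagerank(graph, damping=0.85, tol=1e-10, max_iter=200):
    """Power iteration; each round recomputes, for every node Y,
         rank(Y) = (1 - d)/N + d * sum{ rank(X) / outdeg(X) : arc(X, Y) }."""
    nodes = sorted(set(graph) | {y for ys in graph.values() for y in ys})
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        new = {v: (1.0 - damping) / n for v in nodes}
        for x, succs in graph.items():
            share = damping * rank[x] / len(succs)  # X splits its rank evenly
            for y in succs:
                new[y] += share
        change = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if change < tol:  # stop once the ranks have (numerically) converged
            break
    return rank
```

Node d has no incoming edges, so its rank stays at (1 - 0.85)/4 = 0.0375, while c, which receives edges from all other nodes, ends up with the highest rank.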