Formal semantics and high performance in declarative machine learning using Datalog

Authors: Jin Wang, Jiacheng Wu, Mingda Li, Jiaqi Gu, Ariyam Das, Carlo Zaniolo
Journal: VLDB J.
With an escalating arms race to adopt machine learning (ML) in diverse application domains, there is an urgent need to support declarative machine learning over distributed data platforms. Toward this goal, a new framework is needed where users can specify ML tasks in a manner where programming is decoupled from the underlying algorithmic and system concerns. In this paper, we argue that declarative abstractions based on Datalog are natural fits for machine learning and propose a purely… 
Developing Big-Data Application as Queries: an Aggregate-Based approach
This paper discusses how classical algorithms can be expressed concisely using queries with aggregates in recursion that have a rigorous declarative semantics, and what modifications are needed on such programs to have an efficient and scalable fixpoint-based operational semantics.
Optimizing Parallel Recursive Datalog Evaluation on Multicore Machines
This paper presents DCDatalog, an in-memory Datalog engine designed specifically for modern shared-memory multicore architectures, and proposes a dynamic scheduling strategy that generates the parallel execution plan on the fly while reducing concurrent accesses to shared memory.
Data Sensitivity and Classification Management: A Declarative Approach
An approach is proposed to express and compute data sensitivity and multidimensional data classification at fine granularity using a declarative logic programming language, which separates security requirement definitions and deduction from implementation details.
Recent Progress in Conversational AI
A brief review of recent progress in conversational AI, including commonly adopted techniques, notable works, well-known competitions from academia and industry, and widely used datasets.
Demonstration of LogicLib: An Expressive Multi-Language Interface over Scalable Datalog System
This demonstration presents Logic Library, a library of recursive algorithms written in Datalog that can be executed in BigDatalog, the authors' Datalog engine built on top of Apache Spark. The library encapsulates complex logic-based algorithms into high-level APIs that simplify development and provide a unified interface akin to that of Spark MLlib.


BigData Applications from Graph Analytics to Machine Learning by Aggregates in Recursion
This paper shows that with PreM, a wide spectrum of classical algorithms of practical interest can be concisely expressed in declarative languages by using aggregates in recursion, enabling their execution with superior performance and scalability.
Scaling Datalog for Machine Learning on Big Data
This paper argues for using recursive queries to program a variety of machine learning systems, with database query optimization techniques identifying effective execution plans; the resulting runtime plans can be executed on a single unified data-parallel query processing engine.
Optimizing recursive queries with monotonic aggregates in DeALS
This paper describes how DeALS extends the definitions of monotonic aggregates and modifies their syntax to enable a concise expression of applications that, without them, could not be expressed in performance-conducive ways, or could not be expressed at all. It also introduces novel implementation and optimization techniques that outperform traditional approaches, including semi-naive evaluation.
SystemML: Declarative Machine Learning on Spark
This paper describes SystemML on Apache Spark, end to end, including insights into various optimizer and runtime techniques as well as performance characteristics.
A Case for Stale Synchronous Distributed Model for Declarative Recursive Computation
It is proved that PreM-optimized lock-free and decomposable parallel semi-naive evaluations produce the same results as the single-executor programs, and that PreM can be assimilated into the data-parallel computation plans of different distributed systems, irrespective of whether these follow bulk synchronous parallel (BSP) or asynchronous computing models.
Declarative BigData Algorithms via Aggregates and Relational Database Dependencies
These templates are based on simple extensions of Functional and Multivalued Dependencies whereby properties such as the mixed transitivity of MVDs and FDs are used to prove the validity of these powerful declarative algorithms.
RASQL: A Powerful Language and its System for Big Data Applications
This paper proposes the Recursive-aggregate-SQL (RASQL) language and its system on top of Apache Spark to express and execute complex queries and declarative algorithms in many applications, such as graph search and machine learning.
KDDLog: Performance and Scalability in Knowledge Discovery by Declarative Queries with Aggregates
This paper introduces KDDLog, a scalable framework that leverages recursive queries with aggregates and the authors' newly proposed chain aggregates to let users build or customize knowledge discovery models with concise and expressive queries, and proposes specialized compilation techniques for semi-naive fixpoint computation in the presence of aggregates.
MLog: Towards Declarative In-Database Machine Learning
This paper demonstrates how query/program optimization techniques can be leveraged to translate MLog programs into native TensorFlow programs, and shows that the performance of the automatically generated TensorFlow programs is comparable to that of hand-optimized ones.
Asynchronous and Fault-Tolerant Recursive Datalog Evaluation in Shared-Nothing Engines
It is found that no single method outperforms others but rather that application properties must drive the selection of the iterative query execution model.