Corpus ID: 8359892

Incremental Maintenance of Regression Models over Joins

Milos Nikolic and Dan Olteanu
This paper introduces a principled incremental view maintenance (IVM) mechanism for in-database computation described by rings. We exemplify our approach by introducing the covariance matrix ring that we use for learning linear regression models over arbitrary equi-join queries. Our approach is a higher-order IVM algorithm that exploits the factorized structure of joins and aggregates to avoid redundant computation and improve performance. We implemented it in DBToaster, which uses program… 
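To make the idea of a "covariance matrix ring" concrete, here is a minimal sketch. It assumes the usual encoding of a ring element as a triple (count, vector of feature sums, matrix of sums of pairwise products). Addition is componentwise. Multiplication combines the aggregates of two independent sets of tuples, which is what a join on a shared key needs once the data is grouped by that key. The class `Cov` and the helpers `per_key`, `join_aggregate`, and `apply_r_update` are hypothetical names for illustration and are not taken from the paper or DBToaster. The example uses two relations with one feature each, joined on an integer key.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Cov:
    """Hypothetical covariance-ring element over n features:
    count c, vector of sums s, and matrix of sums of pairwise products Q."""
    c: float
    s: List[float]
    Q: List[List[float]]

    @staticmethod
    def zero(n: int) -> "Cov":
        return Cov(0.0, [0.0] * n, [[0.0] * n for _ in range(n)])

    @staticmethod
    def lift(n: int, idx: int, x: float, mult: float = 1.0) -> "Cov":
        # One tuple contributing value x to feature idx; mult = -1 encodes a delete.
        e = Cov.zero(n)
        e.c = mult
        e.s[idx] = mult * x
        e.Q[idx][idx] = mult * x * x
        return e

    def __add__(self, o: "Cov") -> "Cov":
        n = len(self.s)
        return Cov(self.c + o.c,
                   [self.s[i] + o.s[i] for i in range(n)],
                   [[self.Q[i][j] + o.Q[i][j] for j in range(n)] for i in range(n)])

    def __mul__(self, o: "Cov") -> "Cov":
        # Aggregates of the cross product of two independent sets of tuples.
        n = len(self.s)
        return Cov(self.c * o.c,
                   [o.c * self.s[i] + self.c * o.s[i] for i in range(n)],
                   [[o.c * self.Q[i][j] + self.c * o.Q[i][j]
                     + self.s[i] * o.s[j] + o.s[i] * self.s[j]
                     for j in range(n)] for i in range(n)])


def per_key(rel, idx: int, n: int = 2) -> Dict[int, Cov]:
    # Partial aggregate of one relation (key, value), grouped by the join key.
    views: Dict[int, Cov] = defaultdict(lambda: Cov.zero(n))
    for key, x in rel:
        views[key] = views[key] + Cov.lift(n, idx, x)
    return views


def join_aggregate(R, S) -> Cov:
    # Aggregate over R(key, x) JOIN S(key, y) without materializing the join.
    vr, vs = per_key(R, 0), per_key(S, 1)
    total = Cov.zero(2)
    for key in vr:
        if key in vs:
            total = total + vr[key] * vs[key]
    return total


def apply_r_update(total: Cov, vs: Dict[int, Cov], key: int, x: float,
                   mult: float = 1.0) -> Cov:
    # Delta rule for a single-tuple change to R: add lift(x) * V_S[key].
    if key not in vs:
        return total
    return total + Cov.lift(2, 0, x, mult) * vs[key]
```

With `R = [(1, 2.0), (1, 3.0), (2, 5.0)]` and `S = [(1, 10.0), (2, 1.0), (3, 4.0)]`, `join_aggregate` yields count 3, sums `[10.0, 21.0]`, and `Q = [[38.0, 55.0], [55.0, 201.0]]`. These are exactly the quantities needed to form the normal equations of a linear regression over the join. Inserting or deleting an R-tuple only multiplies one lifted element by the maintained per-key view of S. The negative multiplicity used for deletes relies on the ring having additive inverses.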
1 Citation
Enumeration on Trees under Relabelings
This work reuses the circuit-based enumeration structure from earlier work, develops techniques to maintain its index under node relabelings, and shows how enumeration under relabelings can be applied to evaluate practical query languages, such as aggregate, group-by, and parameterized queries.


F: Regression Models over Factorized Views
F, a system for building regression models over database views, can outperform the state-of-the-art systems MADlib, R, and Python StatsModels by orders of magnitude on real-world datasets; the paper also demonstrates the effective use of F for model selection.
Aggregation and Ordering in Factorised Databases
It is shown how factorisation coupled with partial aggregation can effectively reduce the number of operations needed for query evaluation and how factorisations of query results can support enumeration of tuples in desired orders as efficiently as listing them from the unfactorised, sorted results.
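The two claims above, partial aggregation over a factorisation and ordered enumeration from it, can be illustrated on the simplest join R(A, B) ⋈ S(B, C). This is a small sketch under stated assumptions: set semantics, integer values, and a factorisation that groups the result by the join variable B. The function names are hypothetical and do not come from any factorised-database system. Counts and sums are computed per B-value from the factor lists. Tuples are enumerated in (B, A, C) order without sorting the flat result.

```python
from collections import defaultdict
from itertools import product


def factorise(R, S):
    """Hypothetical sketch: factorise R(A, B) JOIN S(B, C) over the join variable B.
    Each B-value b maps to (sorted A-values paired with b in R,
    sorted C-values paired with b in S); set semantics are assumed."""
    a_of, c_of = defaultdict(set), defaultdict(set)
    for a, b in R:
        a_of[b].add(a)
    for b, c in S:
        c_of[b].add(c)
    return {b: (sorted(a_of[b]), sorted(c_of[b]))
            for b in sorted(a_of.keys() & c_of.keys())}


def join_count(f):
    # Partial aggregation: |A-list| * |C-list| per b; no result tuple is listed.
    return sum(len(As) * len(Cs) for As, Cs in f.values())


def sum_a_times_c(f):
    # SUM(A * C) = sum over b of (sum of As) * (sum of Cs), by distributivity.
    return sum(sum(As) * sum(Cs) for As, Cs in f.values())


def enumerate_bac(f):
    # Yields (b, a, c) in lexicographic order directly from the sorted factors.
    for b, (As, Cs) in f.items():
        for a, c in product(As, Cs):
            yield (b, a, c)
```

For `R = [(1, 10), (2, 10), (3, 20)]` and `S = [(10, 5), (10, 6), (20, 7), (30, 8)]`, the factorisation holds two B-groups. The count is 5 and `SUM(A * C)` is 54, both obtained without building the five result tuples. The gap between the factorised and flat sizes grows with the product of the group sizes.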
Incremental query evaluation in a ring of databases
The paper constructs the algebraic structure of a ring of databases and uses it as the foundation for a query calculus that can express powerful aggregate queries. It shows that, for non-nested queries, each individual aggregate value can be incrementally maintained using a constant amount of work.
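The constant-work claim can be seen on a single two-way join. The following is a minimal sketch, not the paper's calculus or compiler. It maintains `COUNT(R(A, B) ⋈ S(B, C))` together with two hypothetical auxiliary views holding per-B multiplicities. Each single-tuple insert (`mult=+1`) or delete (`mult=-1`) then changes the result by one dictionary lookup and one multiplication. Deletes work because multiplicities live in a ring with additive inverses.

```python
from collections import defaultdict


class CountJoinView:
    """Hypothetical sketch: maintain Q = COUNT(R(A, B) JOIN S(B, C)) under
    single-tuple inserts (mult=+1) and deletes (mult=-1)."""

    def __init__(self) -> None:
        self.q = 0                      # the maintained aggregate value
        self.m_r = defaultdict(int)     # m_r[b] = multiplicity of B = b in R
        self.m_s = defaultdict(int)     # m_s[b] = multiplicity of B = b in S

    def update_r(self, b: int, mult: int = 1) -> None:
        # Delta of Q for a change to R is mult * m_s[b]: O(1) work.
        self.q += mult * self.m_s[b]
        # The auxiliary view is maintained by its own (trivial) delta.
        self.m_r[b] += mult

    def update_s(self, b: int, mult: int = 1) -> None:
        # Symmetric delta for a change to S.
        self.q += mult * self.m_r[b]
        self.m_s[b] += mult
```

A and C do not affect the count, so each update passes in only the B-value of the changed tuple. Suppose R receives B-values `1, 1, 2` and S receives `1, 2, 2, 3`. The view then holds `q == 4`, and deleting one R-tuple with B = 1 brings it to 3, matching a full recount.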
LINVIEW: incremental view maintenance for complex analytical queries
This paper develops a framework, called LINVIEW, for capturing deltas of linear algebra programs and understanding their computational cost, and develops techniques based on matrix factorizations to contain epidemics of change in linear algebra.
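Keeping deltas in factored, low-rank form is one way to stop a small change from spreading into a dense recomputation. Here is a simple illustration of that idea under assumed conditions: a matrix product `C = A B` where `A` receives a rank-1 change `u vᵀ`. It is not LINVIEW's framework or API, and the function names are hypothetical. The delta of `C` is again rank-1, `u (vᵀ B)`, so the update costs quadratic rather than cubic work.

```python
from typing import List

Matrix = List[List[float]]


def matmul(A: Matrix, B: Matrix) -> Matrix:
    # Naive product: O(n * k * m) work, the cost of full re-evaluation.
    k, m = len(B), len(B[0])
    return [[sum(A[i][t] * B[t][j] for t in range(k)) for j in range(m)]
            for i in range(len(A))]


def add_outer(A: Matrix, u: List[float], v: List[float]) -> Matrix:
    # Returns A + u v^T (a rank-1 change).
    return [[A[i][j] + u[i] * v[j] for j in range(len(A[0]))]
            for i in range(len(A))]


def maintain_product(C: Matrix, B: Matrix, u: List[float], v: List[float]) -> Matrix:
    """Given C = A B and the change A' = A + u v^T, return A' B = C + u (v^T B).
    The delta stays rank-1: w = v^T B costs O(k * m), the update O(n * m)."""
    w = [sum(v[t] * B[t][j] for t in range(len(B))) for j in range(len(B[0]))]
    return add_outer(C, u, w)
```

With `A = [[1, 2], [3, 4]]`, `B = [[5, 6], [7, 8]]`, `u = [1, 0]`, and `v = [0, 2]`, `maintain_product` turns `C = [[19, 22], [43, 50]]` into `[[33, 38], [43, 50]]`. This is the same result as recomputing `(A + u vᵀ) B`.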
DBToaster: higher-order delta processing for dynamic, frequently fresh views
This article presents the DBToaster system, which keeps materialized views of standard SQL queries continuously fresh as data changes very rapidly, and supports tens of thousands of complete view refreshes per second for a wide range of queries.
How to Win a Hot Dog Eating Contest: Distributed Incremental View Maintenance with Batch Updates
This paper identifies the cases in which batch processing can boost the performance of incremental view maintenance, but also demonstrates that tuple-at-a-time processing can often achieve better performance in local mode.
Learning Generalized Linear Models Over Normalized Data
The paper introduces a new approach named factorized learning, which pushes ML computations through joins and avoids redundancy in both I/O and computation. It is often substantially faster than the alternatives but not always the fastest, which necessitates a cost-based approach.
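"Pushing ML computations through joins" can be sketched for linear regression over a key-foreign-key join, in the spirit of factorized learning. This is not the paper's code, and several pieces are assumptions: the schema (`S` rows are `(fk, y, xs)` and `R` maps a key to its feature vector `xr`), the squared loss, and all function names. The factorized version computes each R-tuple's partial inner product once and sums residuals per foreign key before touching R's features. The baseline redoes that work for every joined tuple.

```python
from typing import Dict, List, Tuple

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def grad_materialized(S, R: Dict[int, Vector], w_s: Vector, w_r: Vector) -> Tuple[Vector, Vector]:
    """Baseline: gradient of 0.5 * sum((w . x - y)^2) over the joined tuples,
    touching R's features once per S-tuple."""
    g_s, g_r = [0.0] * len(w_s), [0.0] * len(w_r)
    for fk, y, xs in S:
        xr = R[fk]
        err = dot(w_s, xs) + dot(w_r, xr) - y
        for i, x in enumerate(xs):
            g_s[i] += err * x
        for j, x in enumerate(xr):
            g_r[j] += err * x
    return g_s, g_r


def grad_factorized(S, R: Dict[int, Vector], w_s: Vector, w_r: Vector) -> Tuple[Vector, Vector]:
    """Factorized variant: R's partial inner products are computed once per R-tuple,
    and residuals are summed per foreign key before multiplying by R's features."""
    partial = {rid: dot(w_r, xr) for rid, xr in R.items()}
    g_s = [0.0] * len(w_s)
    err_sum: Dict[int, float] = {}
    for fk, y, xs in S:
        err = dot(w_s, xs) + partial[fk] - y
        for i, x in enumerate(xs):
            g_s[i] += err * x
        err_sum[fk] = err_sum.get(fk, 0.0) + err
    g_r = [0.0] * len(w_r)
    for rid, e in err_sum.items():
        for j, x in enumerate(R[rid]):
            g_r[j] += e * x
    return g_s, g_r
```

Both functions return the same gradient. The factorized one does work on R's features proportional to the number of distinct R-tuples rather than the number of joined tuples. Whether that wins depends on the fan-out of the join, which is consistent with the need for a cost-based choice mentioned above.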
Utilizing IDs to Accelerate Incremental View Maintenance
The paper proposes an ID-based IVM system for a large subset of SQL that includes the algebraic operators selection, join, grouping and aggregation, generalized projection involving functions, antisemijoin (and therefore negation/difference), and union.
Skew strikes back: new developments in the theory of join algorithms
This survey describes recent work on join algorithms with provable worst-case optimal runtime guarantees and provides a simpler, unified description of these algorithms that is useful for theory-minded readers, algorithm designers, and systems implementors.
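The variable-at-a-time strategy behind these algorithms can be shown on the triangle query. This is a hypothetical sketch in the style of Generic Join, not code from the survey. Each variable is bound by intersecting the candidate values offered by every relation that mentions it, instead of joining two relations at a time. The stated runtime bound additionally assumes that each intersection runs in time roughly proportional to its smaller operand. This sketch does not enforce or prove that.

```python
from collections import defaultdict


def triangle_join(R, S, T):
    """Hypothetical sketch of a variable-at-a-time join in the style of Generic Join,
    for Q(a, b, c) = R(a, b) JOIN S(b, c) JOIN T(a, c). Each variable is bound by
    intersecting the candidate values offered by every relation that contains it."""
    r_a, s_b, t_a = defaultdict(set), defaultdict(set), defaultdict(set)
    for a, b in R:
        r_a[a].add(b)   # a -> {b : (a, b) in R}
    for b, c in S:
        s_b[b].add(c)   # b -> {c : (b, c) in S}
    for a, c in T:
        t_a[a].add(c)   # a -> {c : (a, c) in T}
    s_keys = set(s_b)
    out = []
    for a in set(r_a) & set(t_a):            # bind A using R and T
        for b in r_a[a] & s_keys:            # bind B using R (given a) and S
            for c in s_b[b] & t_a[a]:        # bind C using S (given b) and T (given a)
                out.append((a, b, c))
    return sorted(out)
```

On the edge list `[(1, 2), (2, 3), (1, 3), (3, 4)]`, used for all three relations, the only result is `(1, 2, 3)`. Unlike a pairwise plan, the algorithm never builds the intermediate two-way join of R and S. On skewed inputs that intermediate join can be much larger than the final output.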
Materialized Views
This monograph provides an accessible introduction and reference to materialized views, explains their core ideas, highlights recent developments, and points out their sometimes subtle connections to other research topics in databases.