• Corpus ID: 8270786

Lara: A Key-Value Algebra underlying Arrays and Relations

Dylan Hutchison, Bill Howe, Dan Suciu
Data processing systems roughly group into families such as relational, array, graph, and key-value. [...] We describe the operations and objects of Lara (union, join, and ext on associative tables) and show her properties and equivalences to other algebras. Multi-system optimization has a bright future, in which we proffer Lara for the role of universal connector.
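The union and join operations named in the abstract can be illustrated with a minimal sketch. It assumes a simplified reading: an associative table maps tuples of named key attributes to a single value, union aggregates onto the key attributes the two tables share, and join combines values on the union of their key attributes. The `Table` class and the function names are hypothetical and are not taken from the paper or from any Lara implementation; `ext` is omitted.

```python
from operator import add, mul


class Table:
    """Toy associative table: named key attributes, one value per key tuple."""

    def __init__(self, keys, rows):
        self.keys = tuple(keys)   # names of the key attributes, e.g. ("i", "j")
        self.rows = dict(rows)    # {key_tuple: value}


def union(a, b, op=add):
    """Project both tables onto their shared key attributes, aggregating with op."""
    common = tuple(k for k in a.keys if k in b.keys)
    out = {}
    for t in (a, b):
        idx = [t.keys.index(k) for k in common]
        for key, val in t.rows.items():
            proj = tuple(key[i] for i in idx)
            out[proj] = op(out[proj], val) if proj in out else val
    return Table(common, out)


def join(a, b, op=mul):
    """Key the result on all attributes of a and b; combine values where shared keys agree."""
    shared = [k for k in a.keys if k in b.keys]
    extra = [k for k in b.keys if k not in a.keys]
    out = {}
    for ka, va in a.rows.items():
        for kb, vb in b.rows.items():
            if all(ka[a.keys.index(k)] == kb[b.keys.index(k)] for k in shared):
                key = ka + tuple(kb[b.keys.index(k)] for k in extra)
                out[key] = op(va, vb)
    return Table(a.keys + tuple(extra), out)


A = Table(("i", "j"), {(0, 0): 1, (0, 1): 2})
B = Table(("j", "k"), {(0, 0): 3, (1, 0): 4})
J = join(A, B)    # keyed on (i, j, k)
U = union(A, B)   # keyed on (j,)
```

The nested loop in `join` is quadratic and stands in for whatever join strategy a real system would use; it is kept only to make the key semantics visible.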
LaraDB: A Minimalist Kernel for Linear and Relational Algebra Computation
The LARADB implementation outperforms Accumulo's native MapReduce integration on a core task involving join and aggregation in the form of matrix multiply, especially at smaller scales that are typically a poor fit for scale-out approaches.
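To make "join and aggregation in the form of matrix multiply" concrete, here is a small in-memory sketch. It assumes sparse matrices stored as dictionaries keyed by (row, column); the function name `matmul_join_agg` is hypothetical, and the code does not reflect how LaraDB or Accumulo execute the task.

```python
from collections import defaultdict


def matmul_join_agg(a, b):
    """Multiply sparse matrices a: {(i, j): v} and b: {(j, k): v}."""
    # Index b by its row key j so the join on j becomes a lookup.
    b_by_j = defaultdict(list)
    for (j, k), v in b.items():
        b_by_j[j].append((k, v))

    out = defaultdict(int)
    for (i, j), va in a.items():
        for k, vb in b_by_j.get(j, ()):   # join: match a's column with b's row
            out[(i, k)] += va * vb        # aggregation: sum products over j
    return dict(out)


a = {(0, 0): 1, (0, 1): 2, (1, 1): 3}
b = {(0, 0): 4, (1, 0): 5, (1, 1): 6}
c = matmul_join_agg(a, b)
```

The join step pairs entries that agree on the shared index `j`, and the aggregation step sums the resulting products for each output key `(i, k)`.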
On the Expressiveness of LARA: A Unified Language for Linear and Relational Algebra
This work shows LARA to be expressively complete with respect to first-order logic with aggregation, and distinguishes two main cases depending on the level of genericity that queries are required to satisfy.
TDM: A Tensor Data Model for Logical Data Independence in Polystore Systems
This paper presents a Tensor Data Model to achieve logical data independence in polystore systems, defining a tensor-based data model extended with typed schemas that use associative arrays.
Towards scalable dataframe systems
This paper reports on the experience building Modin, a scaled-up implementation of the most widely-used and complex dataframe API today, Python's pandas, and proposes a simple data model and algebra for dataframes to ground discussion in the field.
Expressiveness of Matrix and Tensor Query Languages in terms of ML Operators
This short paper studies a matrix and a tensor query language that have been recently proposed in the database literature and shows, by using examples, how these proposals are in line with the practical interest in rethinking tensor abstractions.
Transactions on Large-Scale Data- and Knowledge-Centered Systems XLII
This work proposes a novel system, called SD-TOPK, which evaluates top-k queries over encrypted distributed data without decrypting the data in the nodes where they are stored. The system is implemented and evaluated over synthetic and real databases.
Associative array model of SQL, NoSQL, and NewSQL databases
This work presents the SQL relational model in terms of associative arrays and identifies the key mathematical properties that are preserved within SQL that include associativity, commutativity, distributivity, identities, annihilators, and inverses.


Associative Arrays: Unified Mathematics for Spreadsheets, Databases, Matrices, and Graphs
Associative arrays reduce the effort required to pass data between steps in a data processing system, allow steps to be interchanged with full confidence that the results will be unchanged, and make it possible to recognize when steps can be simplified or eliminated.
A multi-set extended relational algebra: a formal approach to a practical issue
  • P. Grefen, R. D. By
  • Computer Science
    Proceedings of 1994 IEEE 10th International Conference on Data Engineering
  • 1994
This paper proposes a complete extended relational algebra with multi-set semantics, having a clear formal background and a close connection to the standard relational algebra, that includes constructs that extend the algebra to a complete sequential database manipulation language.
Principles of Programming with Complex Objects and Collection Types
Graphulo implementation of server-side sparse matrix multiply in the Accumulo database
This work presents a server-side implementation of GraphBLAS sparse matrix multiplication that leverages Accumulo's native, high-performance iterators, and offers it as a core component of the Graphulo library, which will deliver matrix math primitives for graph analytics within Accumulo.
The Combinatorial BLAS: design, implementation, and applications
The parallel Combinatorial BLAS is described, which consists of a small but powerful set of linear algebra primitives specifically targeting graph and data mining applications, and an extensible library interface and some guiding principles for future development are provided.
Dynamic distributed dimensional data model (D4M) database and computation system
  • J. Kepner, W. Arcand, Charles Yee
  • Computer Science
    2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
  • 2012
D4M (Dynamic Distributed Dimensional Data Model) has been developed to provide a mathematically rich interface to tuple stores (and structured query language “SQL” databases), making it possible to create composable analytics with significantly less effort than traditional approaches.
Declarative Data Cleaning: Language, Model, and Algorithms
This paper presents a language, an execution model, and algorithms that enable users to express data cleaning specifications declaratively and perform the cleaning efficiently. Experimental results report on the assessment of the proposed framework for data cleaning.
Summingbird: A Framework for Integrating Batch and Online MapReduce Computations
The key insight is that certain algebraic structures provide the theoretical foundation for integrating batch and online processing in a seamless fashion. As a result, Summingbird imposes constraints on the types of aggregations that can be performed, although in practice these constraints have not been found overly restrictive for a broad range of analytics tasks at Twitter.
A Demonstration of the BigDAWG Polystore System
BigDAWG is presented, a reference implementation of a new architecture for "Big Data" applications that showcases novel approaches for querying across multiple storage engines, data visualization, and scalable real-time analytics.
First Steps in Relational Lattice
An introduction to relational lattice with emphasis on formal algebraic laws is given; new results include the Spight distributivity criteria and their applications to query transformations.