Corpus ID: 13702834

A Database System with Amnesia

@inproceedings{Kersten2017ADS,
  title={A Database System with Amnesia},
  author={Martin L. Kersten and Lefteris Sidirourgos},
  booktitle={CIDR},
  year={2017}
}
Abstract: Big Data comes with huge challenges. Its volume and velocity make handling, curating, and analytical processing a costly affair. Even to simply “look at” the data within an a priori defined budget and with a guaranteed interactive response time might be impossible to achieve. Commonly applied scale-out approaches will hit the technology and monetary wall soon, if they have not done so already. Likewise, blindly rejecting data when the channels are full, or reducing the data resolution at the… 
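
The “amnesia” idea in the abstract – keep operating within a hard storage budget by forgetting data – can be sketched in a few lines. The class name and the LRU-style eviction policy below are illustrative assumptions, not the paper's actual decay algorithm:

```python
from collections import OrderedDict

class AmnesiaStore:
    """Toy key-value store with a hard storage budget.

    When the budget is exceeded, the least-recently-accessed records
    are forgotten (a hypothetical LRU-style decay policy; the paper
    studies richer, query-driven forgetting strategies).
    """

    def __init__(self, budget):
        self.budget = budget          # maximum number of records kept
        self.data = OrderedDict()     # iteration order tracks recency

    def put(self, key, value):
        if key in self.data:
            self.data.move_to_end(key)
        self.data[key] = value
        while len(self.data) > self.budget:
            self.data.popitem(last=False)   # forget the coldest record

    def get(self, key):
        if key not in self.data:
            return None                     # the record was forgotten
        self.data.move_to_end(key)          # a touched record stays alive
        return self.data[key]
```

Unlike a cache, there is no backing store: a forgotten record is gone for good, which is exactly the trade-off the paper asks database architects to confront.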

Citations

Getting Rid of Data

  • T. Milo
  • Computer Science
    ACM J. Data Inf. Qual.
  • 2020
TLDR
The logical, algorithmic, and methodological foundations required for the systematic disposal of large-scale data, for constraints enforcement and for the development of applications over the retained information are discussed.

DBEst: Revisiting Approximate Query Processing Engines with Machine Learning Models

TLDR
DBEst is presented, a system based on machine learning models (regression models and probability density estimators) that can complement existing systems; its advantages are substantiated using queries and data from the TPC-DS benchmark and real-life datasets, compared against state-of-the-art AQP engines.
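
The model-based AQP idea can be illustrated with a small stdlib-only sketch (synthetic data, a hand-rolled histogram and least-squares fit; a hypothetical illustration, not DBEst's implementation): train a density estimate and a regression model once, then answer an aggregate range query from the models alone, never touching the raw table again.

```python
import random

random.seed(0)
n = 10_000
xs = [random.uniform(0, 10) for _ in range(n)]
ys = [3 * x + random.gauss(0, 1) for x in xs]   # synthetic "table" t(x, y)

# Density estimator: histogram of x over 50 equal-width bins.
BINS, LO, HI = 50, 0.0, 10.0
width = (HI - LO) / BINS
counts = [0] * BINS
for x in xs:
    counts[min(int((x - LO) / width), BINS - 1)] += 1
centers = [LO + (i + 0.5) * width for i in range(BINS)]

# Regression estimator: least-squares fit y ~ slope * x + intercept.
mx = sum(xs) / n
my = sum(ys) / n
slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
intercept = my - slope * mx

def avg_y(a, b):
    """Model-only answer to: SELECT AVG(y) FROM t WHERE a <= x <= b.

    Weight the regression prediction at each bin center by the bin's
    estimated tuple count from the density model.
    """
    num = den = 0.0
    for c, w in zip(centers, counts):
        if a <= c <= b:
            num += w * (slope * c + intercept)
            den += w
    return num / den
```

Once the models are fitted, the raw rows could be discarded; `avg_y(2, 4)` returns a value close to the true average (9 for this synthetic data) at negligible cost.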

Model Joins: Enabling Analytics Over Joins of Absent Big Tables

TLDR
A framework, Model Join, is presented that addresses the challenge of learning and performing knowledge-discovery and analytics tasks without access to the raw-data tables; it generates a uniform and independent sample that is a high-quality approximation of the actual raw-data join.

Data Errors: Symptoms, Causes and Origins

TLDR
A vision for automating data disposal – disposal by design – that takes processing, regulatory, and storage constraints into account is presented, along with three concrete examples that address aspects of this vision.

From Auto-tuning One Size Fits All to Self-designed and Learned Data-intensive Systems

TLDR
New opportunities to design data systems, data structures, and algorithms that can adapt to both data and query workloads are surveyed, along with how machine-learning-inspired designs and a detailed mapping of the possible design space of solutions can drive innovation toward tailored systems.

Learning Data Structure Alchemy

TLDR
This work proposes the construction of an engine, a Data Alchemist, which learns how to blend fine-grained data structure design principles to automatically synthesize brand new data structures.

Less Data Delivers Higher Effectiveness for Keyword Queries

TLDR
This paper proposes an approach that answers most queries effectively using only a relatively small subset of the database, together with a method that predicts whether a query is better answered using this subset or the entire database.

Decaying Telco Big Data with Data Postdiction

TLDR
This paper presents a novel decaying operator for Telco Big Data (TBD), coined TBD-DP (Data Postdiction), which relies on existing Machine Learning (ML) algorithms to abstract TBD into compact models that can be stored and queried when necessary.

References

Showing 1–10 of 20 references

SciBORQ: Scientific data management with Bounds On Runtime and Quality

TLDR
This paper proposes SciBORQ, a framework for scientific data exploration that gives precise control over runtime and quality of query answering, and presents novel techniques to derive multiple interesting data samples, called impressions.

BlinkDB: queries with bounded errors and bounded response times on very large data

TLDR
BlinkDB allows users to trade-off query accuracy for response time, enabling interactive queries over massive data by running queries on data samples and presenting results annotated with meaningful error bars.
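
The accuracy-for-latency trade-off BlinkDB exposes can be sketched with a plain uniform sample and a CLT-based error bar (an illustrative simplification; BlinkDB itself maintains pre-built stratified samples):

```python
import math
import random

random.seed(42)
# The full "table": 200k values the exact query would have to scan.
population = [random.gauss(100.0, 15.0) for _ in range(200_000)]

sample = random.sample(population, 10_000)   # run the query on a sample only
n = len(sample)
mean = sum(sample) / n                       # approximate AVG(...)
var = sum((v - mean) ** 2 for v in sample) / (n - 1)
ci95 = 1.96 * math.sqrt(var / n)             # 95% error bar via the CLT

print(f"AVG ~ {mean:.2f} +/- {ci95:.2f} (95% confidence)")
```

Shrinking the sample widens the error bar and shortens the response time; that is the knob an AQP engine hands back to the user.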

Anti-Caching: A New Approach to Database Management System Architecture

TLDR
The results show that for more highly skewed workloads the anti-caching architecture has a performance advantage of up to 9× over either of the other architectures tested, for a data size 8× larger than memory.

Larger-than-memory data management on modern storage hardware for in-memory OLTP database systems

TLDR
Several approaches to the design decisions in implementing cold-data storage are explored; choosing the best strategy based on the hardware improves throughput by 92–340% over a generic configuration.

Dynamic and Transparent Data Tiering for In-Memory Databases in Mixed Workload Environments

TLDR
This paper presents a new approach for in-memory databases that exploits data relevance, placing less relevant data onto a NAND flash device; it can efficiently evict a substantial share of the data stored in memory while suffering a performance loss of less than 30%.

Identifying hot and cold data in main-memory databases

TLDR
This work proposes to log record accesses – possibly only a sample, to reduce overhead – and to perform offline analysis to estimate record access frequencies; exponential smoothing is found to produce very accurate estimates, leading to higher hit rates than the best caching techniques.
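
The estimator described above can be sketched as follows (`ALPHA` and the per-step update scheme are illustrative assumptions, not the paper's tuned configuration):

```python
ALPHA = 0.05   # smoothing factor: weight given to the newest observation

def update_scores(scores, accessed, all_keys):
    """One logical time step of exponential smoothing.

    Every record's score decays by (1 - ALPHA); records accessed in this
    step get an extra ALPHA. A real system would sample the access log
    rather than observe every access, to reduce overhead.
    """
    hits = set(accessed)
    for k in all_keys:
        obs = 1.0 if k in hits else 0.0
        scores[k] = ALPHA * obs + (1 - ALPHA) * scores.get(k, 0.0)
    return scores

scores = {}
keys = ["r1", "r2", "r3"]
for _ in range(200):
    update_scores(scores, ["r1"], keys)   # r1 is accessed constantly: hot
update_scores(scores, ["r2"], keys)       # r2 is touched exactly once

hot = max(scores, key=scores.get)         # r1 keeps the highest score
```

A record's score converges toward its long-run access frequency, so ranking by score separates hot records (to keep in memory) from cold ones (to evict).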

A formal framework for database sampling

Capturing the Laws of (Data) Nature

TLDR
This work proposes to harvest the statistical models that users attach to the stored data as part of their analysis and use them to advance physical data storage and approximate query answering to unprecedented levels of performance.

MonetDB: Two Decades of Research in Column-oriented Database Architectures

TLDR
This paper gives a brief overview of the MonetDB technology as it developed over the past two decades and the main research highlights which drive the current MonetDB design and form the basis for its future evolution.

Big Data Space Fungus

TLDR
The corollary of the chessboard payment scheme using rice (or wheat) is that not only can you not find enough of it in the universe, you will also never be able to consume the amount collected; the lesson is: don’t collect more rice than you can eat, otherwise it will rot away in storage.