Corpus ID: 13702834

A Database System with Amnesia

@inproceedings{Kersten2017ADS,
  title={A Database System with Amnesia},
  author={Martin L. Kersten and Lefteris Sidirourgos},
  booktitle={CIDR},
  year={2017}
}
Abstract: Big Data comes with huge challenges. Its volume and velocity make handling, curating, and analytical processing a costly affair. Even to simply "look at" the data within an a priori defined budget and with a guaranteed interactive response time might be impossible to achieve. Commonly applied scale-out approaches will soon hit the technology and monetary wall, if they have not done so already. Likewise, blindly rejecting data when the channels are full, or reducing the data resolution at the…
Citations

Getting Rid of Data
The logical, algorithmic, and methodological foundations required for the systematic disposal of large-scale data, for constraint enforcement, and for the development of applications over the retained information are discussed.
DBEst: Revisiting Approximate Query Processing Engines with Machine Learning Models
In the era of big data, computing exact answers to analytical queries becomes prohibitively expensive, which greatly increases the value of efficient approximate approaches. DBEst is a system based on Machine Learning models (regression models and probability density estimators) that can complement existing systems; its advantages are substantiated using queries and data from the TPC-DS benchmark and real-life datasets, compared against state-of-the-art AQP engines.
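The entry above describes answering aggregates from learned models instead of scanning data. A minimal sketch of that idea (illustrative only, not DBEst's actual code): fit a least-squares line on a small sample, then answer a range-AVG query from the model alone.

```python
import random

# Hypothetical illustration of model-based approximate query processing:
# train a regression on a 1% sample, then answer AVG(y) for an x-range
# from the model without touching the full table again.
random.seed(0)

# Full "table": 100,000 rows of (x, y) with y roughly 3*x + noise.
table = [(x, 3.0 * x + random.gauss(0, 5))
         for x in (random.uniform(0, 100) for _ in range(100_000))]

# Least-squares fit on a small sample.
sample = random.sample(table, 1_000)
n = len(sample)
sx = sum(x for x, _ in sample); sy = sum(y for _, y in sample)
sxx = sum(x * x for x, _ in sample); sxy = sum(x * y for x, y in sample)
slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
intercept = (sy - slope * sx) / n

def approx_avg_y(lo, hi):
    """Approximate AVG(y) WHERE lo <= x <= hi. For a linear model over a
    uniform x-range, the mean of y is the model at the range midpoint."""
    return intercept + slope * (lo + hi) / 2.0

# Compare against the exact answer (which needs a full scan).
hits = [y for x, y in table if 40 <= x <= 60]
exact = sum(hits) / len(hits)
print(round(approx_avg_y(40, 60), 1), round(exact, 1))
```

The model is a few floats, while the exact answer required scanning every row; that size gap is what makes model-based AQP attractive once data must be discarded.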
From Auto-tuning One Size Fits All to Self-designed and Learned Data-intensive Systems
New opportunities to design data systems, data structures, and algorithms that can adapt to both data and query workloads are surveyed, showing how machine-learning-inspired designs and a detailed mapping of the possible design space of solutions can drive innovation toward tailored systems.
Learning Data Structure Alchemy
This work proposes the construction of an engine, a Data Alchemist, which learns how to blend fine-grained data structure design principles to automatically synthesize brand-new data structures.
Less Data Delivers Higher Search Effectiveness for Keyword Queries
Many users, such as scientists, do not know the schema and/or content of their databases and therefore cannot precisely formulate their information needs in formal query languages such as SQL. This paper proposes an approach that uses only a relatively small subset of the database to answer most keyword queries effectively, together with a method that predicts whether a query is answered more effectively using this subset or the entire database.
Decaying Telco Big Data with Data Postdiction
This paper presents a novel decaying operator for Telco Big Data (TBD), coined TBD-DP (Data Postdiction), which relies on existing Machine Learning (ML) algorithms to abstract TBD into compact models that can be stored and queried when necessary.
Continuous Decaying of Telco Big Data with Data Postdiction
This paper presents two novel decaying operators for Telco Big Data, coined TBD-DP and CTBD-DP, founded on the notion of Data Postdiction, which aims to make a statement about the past value of a tuple that no longer exists because it had to be deleted to free up disk space.
Learning Key-Value Store Design
These properties allow us to envision a new class of self-designing key-value stores with a substantially improved ability to adapt to workload and hardware changes by transitioning between drastically different data structure designs to assume a diverse set of performance properties at will.

References

Showing 10 of 20 references
SciBORQ: Scientific data management with Bounds On Runtime and Quality
This paper proposes SciBORQ, a framework for scientific data exploration that gives precise control over runtime and quality of query answering, and presents novel techniques to derive multiple interesting data samples, called impressions.
BlinkDB: queries with bounded errors and bounded response times on very large data
BlinkDB allows users to trade off query accuracy for response time, enabling interactive queries over massive data by running queries on data samples and presenting results annotated with meaningful error bars.
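The sampling idea behind bounded-error answers can be sketched in a few lines (a hedged illustration, not BlinkDB's implementation): run AVG on a uniform sample and attach a 95% confidence interval from the Central Limit Theorem.

```python
import math
import random

random.seed(1)
# Synthetic "very large" column with true mean 40.
data = [random.expovariate(1 / 40.0) for _ in range(1_000_000)]

def approx_avg(values, sample_size=10_000):
    """Estimate AVG over `values` from a uniform sample, returning the
    estimate and the half-width of a 95% confidence interval."""
    s = random.sample(values, sample_size)
    mean = sum(s) / sample_size
    var = sum((v - mean) ** 2 for v in s) / (sample_size - 1)
    half = 1.96 * math.sqrt(var / sample_size)   # CLT-based error bar
    return mean, half

mean, half = approx_avg(data)
print(f"AVG ≈ {mean:.2f} ± {half:.2f}")  # e.g. roughly 40 ± 0.8
```

Shrinking the sample makes the query faster but widens the error bar, which is exactly the accuracy-for-latency trade-off the entry describes.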
Anti-Caching: A New Approach to Database Management System Architecture
The results show that for highly skewed workloads the anti-caching architecture has a performance advantage of up to 9× over either of the other architectures tested, for a data size 8× larger than memory.
Larger-than-memory data management on modern storage hardware for in-memory OLTP database systems
Several approaches to the design decisions in implementing cold-data storage are explored; choosing the best strategy for the given hardware improves throughput by 92-340% over a generic configuration.
Dynamic and Transparent Data Tiering for In-Memory Databases in Mixed Workload Environments
This paper presents a new approach for in-memory databases that exploits data relevance and places less relevant data onto a NAND flash device; it is able to efficiently evict a substantial share of the data stored in memory while suffering a performance loss of less than 30%.
Identifying hot and cold data in main-memory databases
This work proposes to log record accesses (possibly only a sample, to reduce overhead) and perform offline analysis to estimate record access frequencies; exponential smoothing produces very accurate estimates, leading to higher hit rates than the best caching techniques.
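The exponential-smoothing estimator described above is simple to sketch (names, the smoothing factor, and the epoch structure are illustrative assumptions, not the paper's code): each epoch, a record's frequency estimate blends its observed access count with its history, so unseen records decay toward zero.

```python
ALPHA = 0.05  # smoothing factor (assumed value)

def update_estimates(estimates, epoch_accesses, all_records):
    """One smoothing step per epoch: estimates[r] combines this epoch's
    observed access count for r with the running estimate. Records with
    no accesses this epoch decay geometrically toward zero."""
    counts = {}
    for r in epoch_accesses:             # possibly only a sample of the log
        counts[r] = counts.get(r, 0) + 1
    for r in all_records:
        estimates[r] = (ALPHA * counts.get(r, 0)
                        + (1 - ALPHA) * estimates.get(r, 0.0))
    return estimates

est = {}
records = ["a", "b", "c"]
for _ in range(50):                      # "a" is hot, "b" warm, "c" cold
    est = update_estimates(est, ["a", "a", "b"], records)

hot_order = sorted(records, key=est.get, reverse=True)
print(hot_order)  # ['a', 'b', 'c']
```

Ranking records by these estimates is what lets a larger-than-memory system keep hot records in RAM and evict (or, in the amnesia setting, forget) the cold tail.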
A formal framework for database sampling
The sampling strategy presented here is applied to improve the data quality of a (legacy) database: it incrementally identifies the set of tuples that cause inconsistencies in the database and should therefore be addressed by the data cleaning process.
Capturing the Laws of (Data) Nature
This work proposes to harvest the statistical models that users attach to the stored data as part of their analysis and use them to advance physical data storage and approximate query answering to unprecedented levels of performance.
MonetDB: Two Decades of Research in Column-oriented Database Architectures
This paper gives a brief overview of the MonetDB technology as it developed over the past two decades and the main research highlights which drive the current MonetDB design and form the basis for its future evolution.
Specification-based data reduction in dimensional data warehouses
Effective techniques for data reduction are presented that enable the gradual aggregation of detailed data as the data ages, allowing the maintenance of more compact, consolidated data and compliance with privacy requirements.
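Gradual age-based aggregation can be sketched as follows (the schema, cutoff, and rollup granularity are assumptions for illustration, not the paper's specification language): detail rows older than a cutoff are rolled up to monthly totals and the detail is discarded.

```python
from collections import defaultdict
from datetime import date

def reduce_by_age(rows, cutoff):
    """rows: (day, amount) tuples. Recent rows keep full resolution;
    rows older than `cutoff` survive only as per-month totals."""
    detail, monthly = [], defaultdict(float)
    for day, amount in rows:
        if day >= cutoff:
            detail.append((day, amount))              # keep detail
        else:
            monthly[(day.year, day.month)] += amount  # aggregate, drop detail
    return detail, dict(monthly)

rows = [(date(2016, 11, 3), 10.0),
        (date(2016, 11, 20), 5.0),
        (date(2017, 1, 2), 7.5)]
detail, monthly = reduce_by_age(rows, cutoff=date(2017, 1, 1))
print(detail)   # [(datetime.date(2017, 1, 2), 7.5)]
print(monthly)  # {(2016, 11): 15.0}
```

Aggregate queries over old periods remain answerable from the rollup, while per-row detail for those periods is deliberately forgotten, the same storage-for-resolution trade the amnesia paper argues for.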