RStore: efficient multiversion document management in the cloud

@article{Bhattacherjee2017RStoreEM,
  title={RStore: efficient multiversion document management in the cloud},
  author={Souvik Bhattacherjee and Amol Deshpande},
  journal={Proceedings of the 2017 Symposium on Cloud Computing},
  year={2017}
}
Motivation. The iterative and exploratory nature of the data science process, combined with an increasing need to support debugging, historical queries, auditing, provenance, and reproducibility, warrants the need to store and query a large number of versions of a dataset. This realization has led to many efforts at building data management systems that support versioning as a first-class construct, both in academia [1, 3, 5, 6] and in industry (e.g., git, Datomic, noms). These systems typically…
A Vision for Managing Extreme-Scale Data Hoards
  • J. Logan, Kshitij Mehta, M. Wolf
  • Computer Science
    2019 IEEE 39th International Conference on Distributed Computing Systems (ICDCS)
  • 2019
TLDR
This work introduces the Hoarde abstraction as an attempt to formalize a way of looking at collections of data to make them more tractable for later use and leverages middleware and systems infrastructures for scientific and technical data management.

References

Decibel: The Relational Dataset Branching System
TLDR
Introduces Decibel, the Relational Dataset Branching System, a new relational storage system with built-in version control designed to address shortcomings of current versioned storage engine designs.
Scalable SQL and NoSQL data stores
TLDR
This paper examines a number of SQL and so-called "NoSQL" data stores designed to scale simple OLTP-style application loads over many servers, and contrasts the new systems on their data model, consistency mechanisms, storage mechanisms, durability guarantees, availability, query support, and other dimensions.
Archiving scientific data
TLDR
An archiving tool for XML data that is capable of providing meaningful change descriptions and can also efficiently support a variety of basic functions concerning the evolution of data such as retrieving any specific version from the archive and querying the temporal history of any element.
Principles of Dataset Versioning: Exploring the Recreation/Storage Tradeoff
TLDR
Presents a prototype version management system that aims to serve as a foundation for the DataHub system for facilitating collaborative data science, and proposes a suite of inexpensive heuristics, drawing on techniques from the delay-constrained scheduling and spanning tree literature, to solve these problems.
Ground: A Data Context Service
TLDR
This paper frames the challenges of managing data context with basic ABCs: Applications, Behavior, and Change, and presents the initial design of a common metamodel and API, and explores the current state of the storage solutions that could serve the needs of a data context service.
Storing and querying versioned documents in the cloud
  • University of Maryland, College Park. Accessible at: https://www.cs.umd.edu/~bsouvik/paper/tech-report.pdf
  • 2017