DDFlasks: Deduplicated Very Large Scale Data Store
@inproceedings{Maia2017DDFlasksDV,
  title     = {DDFlasks: Deduplicated Very Large Scale Data Store},
  author    = {Francisco Maia and Jo{\~a}o Paulo and F{\'a}bio Coelho and Francisco Neves and J. Pereira and R. Oliveira},
  booktitle = {DAIS},
  year      = {2017}
}
With the increasing number of connected devices, it becomes essential to find novel data management solutions that can leverage their computational and storage capabilities. However, developing very large scale data management systems requires tackling a number of challenging distributed systems problems, namely continuous failures and high levels of node churn. In this context, epidemic-based protocols have proven suitable and effective, and have been successfully used to build DataFlasks, an…
References
DATAFLASKS: Epidemic Store for Massive Scale Systems
- Computer Science · 2014 IEEE 33rd International Symposium on Reliable Distributed Systems
- 2014
This paper proposes a novel data store solely based on epidemic (or gossip-based) protocols that leverages the capacity of these protocols to provide data persistence guarantees even in highly dynamic, massive scale systems.
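As a rough illustration of the epidemic approach this entry describes (not DataFlasks' actual protocol, and with hypothetical names), the sketch below simulates push-based gossip dissemination: every node that holds an item forwards it to a few random peers each round, so the item reaches all nodes in roughly logarithmically many rounds despite individual node failures.

```python
import random

# Push-based epidemic (gossip) dissemination, simulated. Each round,
# every node that already holds the item forwards it to `fanout`
# randomly chosen peers; the item reaches all nodes in O(log N)
# rounds with high probability, despite individual node failures.
def gossip_rounds(num_nodes: int, fanout: int = 3, seed: int = 1) -> int:
    rng = random.Random(seed)
    informed = {0}                        # node 0 initially stores the item
    rounds = 0
    while len(informed) < num_nodes:
        newly = set()
        for _ in informed:
            newly.update(rng.sample(range(num_nodes), fanout))
        informed |= newly
        rounds += 1
    return rounds

if __name__ == "__main__":
    print(gossip_rounds(10_000))          # typically around ten rounds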
Tradeoffs in Scalable Data Routing for Deduplication Clusters
- Computer Science · FAST
- 2011
A cluster-based deduplication system that can deduplicate with high throughput, support deduplication ratios comparable to that of a single system, and maintain a low variation in the storage utilization of individual nodes is presented.
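A minimal sketch of the general idea behind such routing (illustrative only, not the paper's exact algorithms): chunks are assigned to nodes by hashing their content fingerprints, so identical chunks always reach the same node and can be deduplicated with a purely local lookup.

```python
import hashlib

# Content-based chunk routing (illustrative): a chunk's placement is
# derived from the hash of its contents, so identical chunks always
# land on the same node and duplicates are detected locally.
def fingerprint(chunk: bytes) -> int:
    return int.from_bytes(hashlib.sha1(chunk).digest(), "big")

def route(chunk: bytes, num_nodes: int) -> int:
    return fingerprint(chunk) % num_nodes    # mod-hash placement

chunks = [b"block-A", b"block-B", b"block-A"]    # "block-A" is a duplicate
print([route(c, num_nodes=4) for c in chunks])   # duplicates map to one node
```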
Probabilistic deduplication for cluster-based storage systems
- Computer Science · SoCC '12
- 2012
This paper proposes Produck, a stateful yet lightweight cluster-based backup system that provides deduplication rates close to those of a single-node system at very low computational cost and with minimal memory overhead; its two main contributions are a lightweight probabilistic node-assignment mechanism and a new bucket-based load-balancing strategy.
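The sketch below is an illustrative stand-in for this kind of sketch-based assignment, not Produck's actual probabilistic-counting mechanism: a batch of chunks is sent to the node whose compact fingerprint sample suggests the largest overlap, falling back to the least-loaded node when no node looks promising.

```python
import hashlib

# Illustrative stand-in for sketch-based node assignment: send a batch
# of chunks to the node whose compact sample of fingerprints overlaps
# it the most; fall back to the least-loaded node on zero overlap.
def fp(chunk: bytes) -> int:
    return int.from_bytes(hashlib.sha1(chunk).digest(), "big")

def sample_sketch(chunks: list[bytes], k: int = 8) -> set[int]:
    return set(sorted(fp(c) for c in chunks)[:k])    # k smallest fingerprints

def assign(batch: list[bytes], node_sketches: list[set[int]], loads: list[int]) -> int:
    s = sample_sketch(batch)
    overlaps = [len(s & ns) for ns in node_sketches]
    best = max(range(len(node_sketches)), key=overlaps.__getitem__)
    if overlaps[best] == 0:                          # no likely duplicates anywhere
        best = min(range(len(loads)), key=loads.__getitem__)
    return best

nodes = [sample_sketch([b"a", b"b"]), sample_sketch([b"x", b"y"])]
print(assign([b"x", b"z"], nodes, loads=[10, 20]))   # 1: overlaps node 1's sketch
```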
A Scalable Inline Cluster Deduplication Framework for Big Data Protection
- Computer Science · Middleware
- 2012
Cluster deduplication has become a widely deployed technology in data protection services for Big Data, used to satisfy service-level agreement (SLA) requirements. However, it remains a great…
Cassandra: a decentralized structured storage system
- Computer Science · OPSR
- 2010
Cassandra is a distributed storage system for managing very large amounts of structured data spread out across many commodity servers, while providing highly available service with no single point of…
High Availability, Scalable Storage, Dynamic Peer Networks: Pick Two
- Computer Science · HotOS
- 2003
This work applies a simple resource usage model to measured behavior from the Gnutella file-sharing network to argue that large-scale cooperative storage is limited by likely dynamics and cross-system bandwidth rather than by local disk space.
HYDRAstor: A Scalable Secondary Storage
- Computer Science · FAST
- 2009
This paper concentrates on the back-end, which is, to the authors' knowledge, the first commercial implementation of a scalable, high-performance content-addressable secondary storage system delivering global duplicate elimination, per-block user-selectable failure resiliency, and self-maintenance, including automatic recovery from failures with data and network overlay rebuilding.
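A minimal sketch of the content-addressable idea underlying such systems (illustrative only; it omits erasure coding, the overlay network, and self-maintenance): a block's address is the hash of its contents, so writing the same data twice keeps a single physical copy.

```python
import hashlib

# Content-addressable block store with duplicate elimination
# (illustrative): the address of a block is the hash of its contents,
# so duplicate writes are stored only once.
class ContentAddressableStore:
    def __init__(self) -> None:
        self._blocks: dict[str, bytes] = {}

    def put(self, data: bytes) -> str:
        address = hashlib.sha256(data).hexdigest()
        self._blocks.setdefault(address, data)    # duplicate writes are no-ops
        return address

    def get(self, address: str) -> bytes:
        return self._blocks[address]

store = ContentAddressableStore()
a1 = store.put(b"backup payload")
a2 = store.put(b"backup payload")                 # deduplicated write
assert a1 == a2 and len(store._blocks) == 1
```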
Extreme Binning: Scalable, parallel deduplication for chunk-based file backup
- Computer Science · 2009 IEEE International Symposium on Modeling, Analysis & Simulation of Computer and Telecommunication Systems
- 2009
This paper presents Extreme Binning, a scalable deduplication technique for non-traditional backup workloads made up of individual files with no locality among consecutive files in a given window of time.
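A hedged sketch of the binning idea (illustrative; the actual system uses content-defined chunking and a primary index of representative IDs): a file's representative chunk ID, defined as its minimum chunk fingerprint, selects the bin that deduplicates the file, so similar files tend to share a bin without any cross-node index lookups.

```python
import hashlib

# The binning idea, illustratively: a file's representative chunk ID
# (its minimum chunk fingerprint) selects the bin that deduplicates it,
# so similar files usually land in the same bin.
def chunks(data: bytes, size: int = 8) -> list[bytes]:
    # fixed-size chunking for brevity
    return [data[i:i + size] for i in range(0, len(data), size)]

def representative_id(data: bytes) -> int:
    return min(int.from_bytes(hashlib.sha1(c).digest(), "big") for c in chunks(data))

def bin_for(data: bytes, num_bins: int) -> int:
    return representative_id(data) % num_bins

# Two nearly identical files usually share their minimum fingerprint,
# and therefore their bin.
print(bin_for(b"quarterly report v1: revenue up, costs flat, margin stable", 8))
print(bin_for(b"quarterly report v2: revenue up, costs flat, margin stable", 8))
```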
On the Expressiveness and Trade-Offs of Large Scale Tuple Stores
- Computer Science · OTM Conferences
- 2010
This paper introduces DataDroplets, a novel tuple store that shifts the current trade-off towards the needs of common business users, providing additional consistency guarantees and higher-level data processing primitives that smooth the migration path for existing applications.
Bigtable: A Distributed Storage System for Structured Data
- Computer Science · TOCS
- 2008
This paper describes the simple data model provided by Bigtable, which gives clients dynamic control over data layout and format, as well as the design and implementation of Bigtable.
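A toy version of that data model, assuming nothing about Bigtable's real API (all names here are hypothetical): a sparse map indexed by row key, column, and timestamp, where each cell keeps multiple timestamped versions and reads return the newest one.

```python
from collections import defaultdict

# Toy (row key, column, timestamp) -> value map with versioned cells.
# Hypothetical names; not Bigtable's API.
class TinyTable:
    def __init__(self) -> None:
        # row key -> column -> list of (timestamp, value), newest first
        self._rows: dict[str, dict[str, list[tuple[int, bytes]]]] = defaultdict(dict)

    def put(self, row: str, column: str, timestamp: int, value: bytes) -> None:
        versions = self._rows[row].setdefault(column, [])
        versions.append((timestamp, value))
        versions.sort(key=lambda tv: tv[0], reverse=True)   # newest first

    def get(self, row: str, column: str) -> bytes:
        return self._rows[row][column][0][1]                # latest version

table = TinyTable()
table.put("com.example/www", "contents:html", 1, b"<html>v1</html>")
table.put("com.example/www", "contents:html", 2, b"<html>v2</html>")
assert table.get("com.example/www", "contents:html") == b"<html>v2</html>"
```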