Clustering-based fragmentation and data replication for flexible query answering in distributed databases

@article{Wiese2014ClusteringbasedFA,
  title={Clustering-based fragmentation and data replication for flexible query answering in distributed databases},
  author={Lena Wiese},
  journal={Journal of Cloud Computing},
  year={2014},
  volume={3},
  pages={1--15}
}
  • L. Wiese
  • Published 28 October 2014
  • Computer Science
  • Journal of Cloud Computing
One feature of cloud storage systems is data fragmentation (or sharding) so that data can be distributed over multiple servers and subqueries can be run in parallel on the fragments. On the other hand, flexible query answering can enable a database system to find related information for a user whose original query cannot be answered exactly. Query generalization is a way to implement flexible query answering on the syntax level. In this paper we study a clustering-based fragmentation for the… 
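The interplay of fragmentation and flexible query answering described in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm: the cluster contents, the `diagnosis` attribute, and the function names are all hypothetical, and the clusters are fixed by hand rather than computed. The sketch only shows the general idea that, once rows are fragmented by cluster, a query with no exact answer can fall back to related rows from a single fragment.

```python
from collections import defaultdict

# Hypothetical, hand-made clusters of attribute values (an assumption for
# illustration; a real system would derive them from a similarity measure).
CLUSTERS = {
    "respiratory": {"cough", "asthma", "bronchitis"},
    "injury": {"fracture", "sprain"},
}

def cluster_of(value):
    """Return the label of the cluster containing value, or None."""
    for label, members in CLUSTERS.items():
        if value in members:
            return label
    return None

def fragment(rows, attr):
    """Horizontally fragment rows by the cluster of each row's attr value."""
    fragments = defaultdict(list)
    for row in rows:
        fragments[cluster_of(row[attr])].append(row)
    return dict(fragments)

def flexible_answer(fragments, attr, value):
    """Return exact matches if any exist; otherwise return the related rows
    stored in the fragment whose cluster contains value."""
    label = cluster_of(value)
    candidates = fragments.get(label, [])
    exact = [row for row in candidates if row[attr] == value]
    if exact or label is None:
        return exact  # unknown values are not generalized
    return candidates

ROWS = [
    {"patient": "p1", "diagnosis": "asthma"},
    {"patient": "p2", "diagnosis": "fracture"},
    {"patient": "p3", "diagnosis": "bronchitis"},
]
FRAGMENTS = fragment(ROWS, "diagnosis")
# No row has diagnosis "cough", so the respiratory fragment is returned.
RELATED = flexible_answer(FRAGMENTS, "diagnosis", "cough")
```

In this toy setup only one fragment has to be consulted per query, which is why each fragment could live on a separate server.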
Access Patterns Optimization in Distributed Databases Using Data Reallocation
TLDR
The proposed method gathers incremental knowledge about data access patterns and database statistics to solve the following problem: online re-allocation of the fragments in order to constantly optimize the query response time.
A linear approach to distributed database optimization using data reallocation
TLDR
This paper proposes a solution for minimizing raw data transfers between distant nodes by re-arranging and replicating data online within the constraints of the original database architecture, using incremental knowledge gathered online about data access patterns and database statistics.
Horizontal Fragmentation and Replication for Multiple Relaxation Attributes
TLDR
This paper provides a formulation of the data replication problem (DRP) for horizontal fragmentations with overlapping fragments and devises a recovery procedure based on these fragmentations.
A horizontal fragmentation method based on data semantics
TLDR
This paper proposes a new horizontal fragmentation scheme based on clustering and finds performance and time improvements when working with clustered fragments.
A Replication Scheme for Multiple Fragmentations with Overlapping Fragments
TLDR
The data replication problem (DRP) is extended by not only considering hard constraints to ensure a fixed replication factor but also adding soft constraints that express desired data locality of fragments.
Hybridized fragmentation of very large databases using clustering
  • Sandhya Harikumar, R. Ramachandran
  • Computer Science
    2015 IEEE International Conference on Signal Processing, Informatics, Communication and Energy Systems (SPICES)
  • 2015
TLDR
This paper presents a unique approach to hybridized fragmentation: applying a subspace clustering algorithm yields a set of fragments that partition the data with respect to tuples as well as attributes, giving good hybridized fragments for distributed databases.
A Case Study of Snapshot Replication and Transfer of Data in Distributed Databases
TLDR
The experimental results show that, for both vertical and horizontal fragmentation, the proposed approach to replicating a distributed database is efficient and significantly improves performance in terms of data transfer time, load sharing, and update of the database fragmentation.
Ontology-Driven Data Partitioning and Recovery for Flexible Query Answering
TLDR
A method is presented to partition the data by using an ontology that semantically guides the query relaxation; if several different partitioning strategies are applied in parallel, a lookup table is maintained in order to recover the ontology-driven partitioning in case of data loss or server failure.
A Comprehensive Taxonomy of Fragmentation and Allocation Techniques in Distributed Database Design
TLDR
This article proposes a comprehensive taxonomy of the available fragmentation and allocation techniques in distributed database design and discusses some case studies of these techniques for a deeper understanding of their achievements and limitations.
Application of data fragmentation and replication methods in the cloud: a review
TLDR
This review covers the literature of the last eight years on data fragmentation and replication methods applied in a cloud environment, to determine whether the methods take both techniques into account together, consider a cloud database, are easy to implement, focus on improving database performance, and provide a cost model.

References

SHOWING 1-10 OF 42 REFERENCES
Taxonomy-Based Fragmentation for Anti-instantiation in Distributed Databases
  • L. Wiese
  • Computer Science
    2013 IEEE/ACM 6th International Conference on Utility and Cloud Computing
  • 2013
TLDR
This paper studies a taxonomy-based fragmentation for the generalization operator Anti-Instantiation with which related information can be found in distributed data.
Lookup Tables: Fine-Grained Partitioning for Distributed Databases
TLDR
This work presents the design of a data distribution layer that efficiently stores lookup tables and maintains them in the presence of inserts, deletes, and updates, and shows greater potential for further scale-out on Wikipedia, Twitter, and TPC-E workloads.
Relaxation as a platform for cooperative answering
TLDR
The relaxation method expands the scope of a query by relaxing the constraints implicit in the query, which allows the database to return answers related to the original query as well as the literal answers themselves.
CoBase: A scalable and extensible cooperative information system
TLDR
CoBase has been demonstrated to answer imprecise queries for transportation and logistic planning applications, and its methodology is being applied to matching medical image features and to approximate matching of emitter signals in electronic warfare applications.
Principles of Distributed Database Systems
TLDR
This third edition of a classic textbook can be used to teach at the senior undergraduate and graduate levels and concentrates on fundamental theories as well as techniques and algorithms in distributed data management.
Schism: a Workload-Driven Approach to Database Replication and Partitioning
TLDR
Schism consistently outperforms simple partitioning schemes, and in some cases proves superior to the best known manual partitioning, reducing the cost of distributed transactions by up to 30%.
Principles of Distributed Database Systems, Third Edition
TLDR
This third edition of a classic textbook can be used to teach at the senior undergraduate and graduate levels and concentrates on fundamental theories as well as techniques and algorithms in distributed data management.
Bigtable: A Distributed Storage System for Structured Data
TLDR
This paper describes the simple data model provided by Bigtable, which gives clients dynamic control over data layout and format, as well as the design and implementation of Bigtable.
Machine learning for online query relaxation
TLDR
A novel algorithm, loqr, is introduced; it is designed to relax queries that are in disjunctive normal form and contain a mixture of discrete and continuous attributes.
Providing ranked cooperative query answers using the metricized knowledge abstraction hierarchy