RemusDB: transparent high availability for database systems

@article{Minhas2012RemusDBTH,
  title={RemusDB: transparent high availability for database systems},
  author={Umar Farooq Minhas and Shriram Rajagopalan and Brendan Cully and Ashraf Aboulnaga and Kenneth Salem and Andrew Warfield},
  journal={The VLDB Journal},
  year={2012},
  volume={22},
  pages={29--45}
}
In this paper, we present a technique for building a high-availability (HA) database management system (DBMS). The proposed technique can be applied to any DBMS with little or no customization, and with reasonable performance overhead. Our approach is based on Remus, a commodity HA solution implemented in the virtualization layer, that uses asynchronous virtual machine state replication to provide transparent HA and failover capabilities. We show that while Remus and similar systems can protect… 
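Remus keeps the primary running between frequent checkpoints and holds back outgoing network traffic until the backup has acknowledged the checkpoint that produced it. The toy Python model below illustrates that output-commit rule. It is a minimal sketch, not Remus or RemusDB code. The class names `Primary` and `Backup`, the dict standing in for VM memory, and the single method call standing in for checkpoint transmission and acknowledgement are all hypothetical simplifications.

```python
from dataclasses import dataclass, field


@dataclass
class Backup:
    """Backup host (hypothetical model): holds the last checkpoint it received."""
    state: dict = field(default_factory=dict)

    def apply_checkpoint(self, dirty: dict) -> bool:
        # Apply the state changed during the epoch, then acknowledge it.
        self.state.update(dirty)
        return True


@dataclass
class Primary:
    """Primary host (hypothetical model): runs ahead but buffers its output."""
    backup: Backup
    state: dict = field(default_factory=dict)
    dirty: dict = field(default_factory=dict)
    pending_output: list = field(default_factory=list)
    released_output: list = field(default_factory=list)

    def write(self, key, value) -> None:
        # Execution is not blocked by replication; the change is only
        # recorded as dirty so that the next checkpoint includes it.
        self.state[key] = value
        self.dirty[key] = value

    def send(self, packet) -> None:
        # Outbound packets are held until the checkpoint covering the state
        # that produced them has been acknowledged by the backup.
        self.pending_output.append(packet)

    def end_epoch(self) -> None:
        # Ship the epoch's dirty state; release buffered output only on ack.
        if self.backup.apply_checkpoint(dict(self.dirty)):
            self.released_output.extend(self.pending_output)
            self.pending_output.clear()
            self.dirty.clear()


if __name__ == "__main__":
    backup = Backup()
    primary = Primary(backup)
    primary.write("balance", 100)
    primary.send("COMMIT OK")
    print(primary.released_output)  # [] (client has not seen the reply yet)
    primary.end_epoch()
    print(primary.released_output)  # ['COMMIT OK']
    primary.write("balance", 50)
    primary.send("COMMIT OK 2")
    # Simulated primary failure here, before end_epoch(): the backup resumes
    # from the last acknowledged checkpoint.
    print(backup.state)  # {'balance': 100}
```

Suppose the primary fails after the second write but before `end_epoch()`. The reply produced by the unacknowledged write was never released, so no client observed state that is lost on failover. Real Remus checkpoints whole-VM state many times per second and transmits each checkpoint while the next epoch is already executing. The model collapses that into one call to keep the rule visible.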
Elasca: Workload-Aware Elastic Scalability for Partition Based Database Systems
TLDR
Elasca consists of a mechanism for enabling elastic scalability and a workload-aware optimizer that determines optimal partition placement and migration plans; the optimizer minimizes the computing resources required and balances load effectively without compromising system performance.
Database high availability using SHADOW systems
TLDR
The results of a performance evaluation are presented, showing that write offloading enables SHADOW to outperform traditional hot standby replication and even a standalone DBMS that does not provide high availability.
Ginja: one-dollar cloud-based disaster recovery for databases
TLDR
Ginja is a disaster recovery (DR) solution for transactional database management systems (DBMSs) that uses only cloud storage services such as Amazon S3; it works at the file-system level to efficiently capture data updates and replicate them to a remote cloud storage service.
Madeus: Database Live Migration Middleware under Heavy Workloads for Cloud Environment
TLDR
Madeus provides efficient database live migration by implementing the lazy snapshot isolation rule (LSIR) under snapshot isolation, which enables concurrent propagation of syncsets, the datasets needed to synchronize the slave databases with the master databases.
High availability of databases for cloud
TLDR
This paper proposes a solution that builds on an existing system, Threshold Based File Replication (TBFR), and uses the current system load to distribute load.
Database High Availability using SHADOW Systems by Xin Pan
TLDR
This thesis presents SHADOW systems, a new technique for database high availability that avoids the overhead of database-managed synchronized replication, while ensuring that no updates will be lost during a failover.
Application level ballooning for efficient server consolidation
TLDR
This work extends ballooning to applications so that memory can be efficiently and effectively moved between virtualized instances as the demands of each change over time, with significantly lower memory requirements.
Next generation JDBC database drivers for performance, transparent caching, load balancing, and scale-out
TLDR
Experimental results demonstrate how queries cached by the driver can improve query response times by an order of magnitude and reduce the overall load on the database system by up to 50+.
High Availability for Database Systems in Geographically Distributed Cloud Computing Environments
TLDR
CAC-DB takes advantage of this shared storage to ensure that the DBMS service remains available and transactionally consistent in the face of failures up to the loss of one or more data centers.
...

References

Showing 1-10 of 48 references
The Design and Evaluation of a Practical System for Fault-Tolerant Virtual Machines
We have implemented a commercial enterprise-grade system for providing fault-tolerant virtual machines, based on the approach of replicating the execution of a primary virtual machine (VM) via a…
The design of a practical system for fault-tolerant virtual machines
TLDR
An easy-to-use, commercial system that automatically restores redundancy after failure requires many additional components beyond replicated VM execution, and this work has designed and implemented these extra components and addressed many practical issues encountered in supporting VMs running enterprise applications.
Xen and the art of virtualization
TLDR
This paper presents Xen, an x86 virtual machine monitor that allows multiple commodity operating systems to share conventional hardware in a safe and resource-managed fashion without sacrificing either performance or functionality; Xen considerably outperforms competing commercial and freely available solutions.
Remus: High Availability via Asynchronous Virtual Machine Replication (Best Paper)
TLDR
Remus provides an extremely high degree of fault tolerance, to the point that a running system can transparently continue execution on an alternate physical host in the face of failure with only seconds of downtime, while completely preserving host state such as active network connections.
Live migration of virtual machines
TLDR
The design options for migrating OSes running services with liveness constraints are considered, the concept of writable working set is introduced, and the design, implementation and evaluation of high-performance OS migration built on top of the Xen VMM are presented.
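The iterative pre-copy idea behind the writable working set can be sketched as a small Python simulation. It rests on assumptions made only for illustration. The first round copies every page while the VM keeps running. Each later round re-sends only the pages dirtied during the previous round. The VM is paused for a final stop-and-copy once the dirty set shrinks to a threshold or a round limit is reached. The function `precopy_migrate`, the `dirtied_during_round` callback, and the `stop_threshold` and `max_rounds` parameters are hypothetical, not taken from the Xen implementation.

```python
from typing import Callable, Iterable


def precopy_migrate(
    num_pages: int,
    dirtied_during_round: Callable[[int], Iterable[int]],
    stop_threshold: int = 8,
    max_rounds: int = 30,
) -> tuple[list[int], int]:
    """Simulate iterative pre-copy migration (hypothetical stopping rule).

    Returns the number of pages sent in each live (pre-copy) round and the
    number of pages sent in the final stop-and-copy phase, which is the part
    of the transfer that contributes to downtime.
    """
    to_send = set(range(num_pages))  # round 0 copies all memory while the VM runs
    rounds: list[int] = []
    for r in range(max_rounds):
        rounds.append(len(to_send))
        # Pages written by the still-running VM while round r was in flight
        # must be re-sent in the next round.
        to_send = set(dirtied_during_round(r))
        if len(to_send) <= stop_threshold:
            break
    # Pause the VM and copy whatever is still dirty (the writable working set).
    return rounds, len(to_send)


if __name__ == "__main__":
    hot = set(range(4))  # hypothetical pages rewritten in every round

    def workload(r: int) -> set[int]:
        # Hypothetical "cool" pages whose write rate halves each round.
        return hot | set(range(100, 100 + (64 >> r)))

    print(precopy_migrate(1024, workload))  # ([1024, 68, 36, 20, 12], 8)
```

The hot pages are rewritten in every round, so they never drain during pre-copy. They form the writable working set, which bounds how many pages must be sent while the VM is paused. If that set is larger than the threshold, the loop runs until `max_rounds` and the whole set is copied during downtime.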
Automatic virtual machine configuration for database workloads
TLDR
A virtualization design advisor is introduced that uses information about the anticipated workloads of each database system to recommend workload-specific configurations offline; runtime information collected after the recommended configurations are deployed can then be used to refine the recommendation and to handle changes in the workload.
The Recovery Box: Using Fast Recovery to Provide High Availability in the UNIX Environment
TLDR
Using the recovery box implementation in Sprite, a UNIX-like distributed operating system, a Sprite file server recovers in 26 seconds and a database manager with ten remote client processes recovers in six seconds, fast enough that many users and applications will not care that the system crashed.
A "flight data recorder" for enabling full-system multiprocessor deterministic replay
TLDR
A practical, low-overhead hardware recorder for cache-coherent multiprocessors, called the flight data recorder (FDR), continuously records execution, even on deployed systems, and logs it for post-mortem analysis.
SecondSite: disaster tolerance as a service
TLDR
This paper describes the design and implementation of SecondSite, a cloud-based service for disaster tolerance that extends the Remus virtualization-based high-availability system by allowing groups of virtual machines to be replicated across data centers over wide-area Internet links.
Deploying Database Appliances in the Cloud
TLDR
An end-to-end solution is presented for one tuning problem that arises when database appliances are deployed in the cloud: partitioning the CPU capacity of a physical machine among multiple database appliances running on that machine.
...