Mariposa: a wide-area distributed database system

  • Michael Stonebraker, Paul M. Aoki, Witold Litwin, Avi Pfeffer, Adam Sah, Jeff Sidell, Carl Staelin, Andrew Yu
  • The VLDB Journal
Abstract. The requirements of wide-area distributed database systems differ dramatically from those of local-area network systems. In a wide-area network (WAN) configuration, individual sites usually report to different system administrators, have different access and charging algorithms, install site-specific data type extensions, and have different constraints on servicing remote requests. Typical of the last point are production transaction environments, which are fully engaged during normal… 

The state of the art in distributed query processing

This paper presents the “textbook” architecture for distributed query processing along with a series of techniques that are particularly useful for distributed database systems. It also discusses different kinds of distributed systems, such as client-server, middleware (multitier), and heterogeneous database systems, and shows how query processing works in each.

Site selection for real-time client request handling

  • V. Kanitkar, A. Delis
  • Computer Science
  • Proceedings. 19th IEEE International Conference on Distributed Computing Systems (Cat. No.99CB37003)
  • 1999
This paper proposes a load-sharing mechanism that oversees the shipment of data and transactions in order to increase the efficiency of a client-server cluster and makes use of the concept of grouped locks to schedule the movement of data objects in the cluster in a more efficient manner.

Dynamic Query Operator Scheduling for Wide-Area Remote Access

This paper focuses on the dynamic scheduling of query operators in the context of query scrambling, and shows that scrambling rescheduling is effective in hiding the impact of delays on query response time for a number of different delay scenarios.

Performance of Adaptive Query Processing in the Mariposa Distributed Database Management System

Results are presented which show that in multi-user situations, when response time is used as a metric, the Mariposa system outperforms a static optimizer by causing work to be distributed more evenly among the available sites. They also show that the overhead introduced by Mariposa’s bidding protocol is insignificant when used with large, expensive queries and is outweighed by the benefits of load balancing.

Proxy-server architectures for OLAP

This paper proposes an architecture for OLAP cache servers (OCS). An OCS is the equivalent of a proxy server for web documents, but it is designed to accommodate data from warehouses and support OLAP operations. It is complementary both to existing OLAP cache systems and to distributed OLAP approaches.

H2O: An Autonomic, Resource-Aware Distributed Database System

The requirements for an autonomic, resource-aware distributed database which enables data to be backed up and shared without complex manual administration are discussed and the design and implementation of H2O are presented.

Efficient Processing of Client Transactions in Real-Time

A load-sharing framework is presented that oversees the shipment of data and transactions so as to increase the efficiency of a cluster consisting of a server and a number of clients. The framework uses the concept of grouped locks, along with transaction deadline information, to schedule the movement of data objects in the cluster more efficiently.

Review of dynamic query optimization strategies in distributed database

Traditional optimization strategies for non-autonomous distributed database systems, both static and various dynamic ones, are reviewed, and the suitability of these strategies for autonomous systems is analyzed.

The architecture of an autonomic, resource-aware, workstation-based distributed database system

This thesis describes the design and implementation of a workstation-based database system and investigates its viability by evaluating its performance against existing clustered database systems and testing its availability during machine failures.

DYTAF: Dynamic Table Fragmentation in Distributed Database Systems

DYTAF is presented, a decentralized approach for dynamic table fragmentation and allocation in distributed database systems based on observation of the access patterns of sites to tables, with the aim of maximizing the number of local accesses relative to accesses from remote sites.



An economic paradigm for query processing and data migration in Mariposa

This work presents the protocols which underlie the Mariposa economy, a database system under construction at Berkeley which combines the best features of traditional distributed database systems, object-oriented DBMSs, tertiary memory file systems and distributed file systems.

Mariposa: a new architecture for distributed data

The design of Mariposa is described, an experimental distributed data management system that provides high performance in an environment of high data mobility and heterogeneous host capabilities and a general, flexible platform for the development of new algorithms for distributed query optimization, storage management, and scalable data storage structures.

Query processing in a system for distributed databases (SDD-1)

The semijoin operator is defined, the reasons it is an effective reduction operator are explained, and an algorithm is presented that constructs a cost-effective program of semijoins, given an envelope and a database.

R* Optimizer Validation and Performance Evaluation for Distributed Queries

This paper extends an earlier optimizer validation and performance evaluation of R* to distributed queries, i.e., single SQL statements having tables at multiple sites, confirming the accuracy of R*’s message cost model and the significant contribution of local (CPU and I/O) costs, even for a medium-speed network.

Decentralizing a global naming service for improved performance and fault tolerance

This paper proposes a three-level naming architecture that consists of global, administrational, and managerial naming mechanisms, each optimized to meet the performance, reliability, and security requirements at its own level.

Data placement in Bubba

It is argued that data placement, especially declustering, in a highly parallel system must be considered early in the design, so that mechanisms can be included for supporting variable declustering, for minimizing the most significant overheads associated with large-scale declustering, and for gathering the required statistics.

An economy for managing replicated data in autonomous decentralized systems

A new approach to performing resource allocation in autonomous distributed computer systems is explored and it is shown that the economy can substantially improve performance by varying the placement and number of copies of each data object.

Data replication in Mariposa

It is shown how the replica control mechanism can be used to provide consistent, although potentially stale, views of data across many machines without expensive per-transaction synchronization.

An Economic Paradigm for Query Processing and

The protocols that underlie Mariposa, a distributed database system under construction at Berkeley, are presented; they adopt an underlying economic paradigm for both query execution and storage management issues.

A Microeconomic Approach to Optimal Resource Allocation in Distributed Computer Systems

Decentralized algorithms for optimally distributing a divisible resource in a distributed computer system are examined. They have several attractive properties, including their simplicity and distributed nature, the computation of feasible and increasingly better resource allocations as the result of each iteration, and rapid convergence.