A Study of Designing Distributed and Persistent Computing Systems

Abstract

This thesis describes a new distributed computing system in which all programs, data, and execution images may be both distributed and persistent. The essence of distributed computing can be regarded as information exchange between the address spaces of different computer sites; the essence of persistent computing can be regarded as information exchange between the volatile virtual address space of an application session and a persistent store. This thesis presents a distributed shared repository (DSR) designed on this observation. The DSR allows application programs to handle distribution and persistence in a unified framework.

To distribute any statically typed data, including higher-order functions and pointers, this thesis proposes the higher-order remote procedure call technique. This technique can also provide a basis for a multilanguage abstract data type system and a multilanguage persistent system. As an application of this technique, the design and implementation of the Distributed C language are also described.

To improve the productivity of distributed data processing, this thesis presents a language based on list comprehensions. The language allows programmers to describe most typical data retrieval operations very concisely and at a high level of description. By applying simple program transformation rules, several independently described functions can be synthesized into a single function, and functions can be easily transformed into lower-level procedural programs that communicate and synchronize with one another in a shared-nothing distributed environment.

To share persistent data among independently developed application programs, the type information of the persistent data should be kept independent of the application programs. For this purpose, the complex object file system presented in this thesis manages types as well as the persistent data itself.
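The synthesis of independently described functions mentioned above can be illustrated with ordinary Python list comprehensions. This is only a sketch of the general idea, not the thesis's language or its transformation rules: the `Employee` record, the sample data, and all function names are hypothetical, and the fused version is written by hand rather than derived mechanically.

```python
from typing import NamedTuple


class Employee(NamedTuple):
    # Hypothetical record type used only for this illustration.
    name: str
    dept: str
    salary: int


EMPLOYEES = [
    Employee("Aoki", "sales", 520),
    Employee("Baba", "research", 610),
    Employee("Chiba", "research", 480),
]


def in_dept(rows, dept):
    """First independently written retrieval: select by department."""
    return [e for e in rows if e.dept == dept]


def names_above(rows, threshold):
    """Second independently written retrieval: project names by salary."""
    return [e.name for e in rows if e.salary > threshold]


def composed(rows, dept, threshold):
    """Naive composition: builds an intermediate list between the two steps."""
    return names_above(in_dept(rows, dept), threshold)


def fused(rows, dept, threshold):
    """Hand-fused equivalent: one comprehension, one pass, no intermediate list."""
    return [e.name for e in rows if e.dept == dept and e.salary > threshold]
```

Both `composed` and `fused` return the same result for any input; the fused form simply avoids materializing the intermediate collection, which is the kind of saving such a transformation aims for.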
In the file system, persistent objects are composed of basic data types, predefined constructor types such as the tuple or set types, user-defined types, or a combination of these. A complex object can be composed of several subobjects by referencing the object identifiers (OIDs) of the subobjects. One of the most costly operations on complex objects is navigation, which dereferences a reference held as an object identifier, since each navigation generally causes a disk access. This thesis proposes a persistent caching technique, a kind of replication technique that uses secondary storage. The technique reduces the number of disk accesses in navigation operations: storing a replica of some or all of the subobjects within the page containing the parent object eliminates the need for additional disk accesses when dereferencing the object identifiers of the subobjects. To maintain consistency between a replica and the original, the technique uses an invalidation scheme based on time-stamps. Under this scheme, update propagation to replicas is delayed until the replica is referenced, which minimizes the overhead of update propagation.

I deeply thank the committee members of my thesis, Akinori Yonezawa (Chair), Masami Hagiya, Kei Hiraki, Kentaro Shimizu, and Shojiro Nishio.
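The time-stamp invalidation idea behind persistent caching can be sketched as follows. This is a minimal in-memory simulation under stated assumptions, not the thesis's implementation: the `Store` and `ParentPage` classes, the logical clock, and the assumption that the time-stamp table can be consulted without a disk access are all hypothetical.

```python
import itertools


class Store:
    """Simulated object store; `read` stands in for a disk access."""

    def __init__(self):
        self._disk = {}                   # oid -> value (simulated secondary storage)
        self.stamp = {}                   # oid -> last-update time-stamp (assumed cheap to consult)
        self._clock = itertools.count(1)  # logical clock, assumed for this sketch
        self.disk_reads = 0

    def write(self, oid, value):
        # Updating the original only advances its time-stamp;
        # no replica is touched at this point.
        self._disk[oid] = value
        self.stamp[oid] = next(self._clock)

    def read(self, oid):
        self.disk_reads += 1
        return self._disk[oid]


class ParentPage:
    """Page of a parent object holding a replica of one subobject."""

    def __init__(self, store, child_oid):
        self.store = store
        self.child_oid = child_oid
        self.replica = None
        self.replica_stamp = 0  # 0 means "no valid replica yet"

    def deref(self):
        # Propagation is delayed until reference time: the replica is
        # refreshed only if its time-stamp is older than the original's.
        current = self.store.stamp[self.child_oid]
        if self.replica_stamp != current:
            self.replica = self.store.read(self.child_oid)
            self.replica_stamp = current
        return self.replica
```

In this sketch, repeated calls to `deref` on an unchanged subobject cost one simulated disk read in total, and an update to the original costs nothing for the replica until the next `deref`.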


Cite this paper

@inproceedings{Kato1992ASO, title={A Study of Designing Distributed and Persistent Computing Systems}, author={Kazuhiko Kato}, year={1992} }