Swiss Federal Institute of Technology in Lausanne, Switzerland (EPFL)
Computer Science Department (DI)
Software Engineering Laboratory (LGL)


I SOFTWARE ENGINEERING LABORATORY 11.35
II Replication of Non-Deterministic Objects, Thesis No. 1903

III Keywords

Distributed Systems, Fault Tolerance, Replication, Semi-Active Replication, Transactions, RAPIDS, Group Communication, Ada 95

IV Thesis author

Thomas Wolf

Thesis director

Prof. Alfred Strohmeier

V Description (objectives, methods, perspectives):

This thesis discusses the replication of non-deterministic objects in distributed systems to achieve fault tolerance against crash failures. The replicated objects are the virtual nodes of a distributed application. Replication is treated as a concern to be dealt with only when a distributed application is configured, not while it is developed. Hence, replication of virtual nodes should be transparent to the application.

Like all measures to achieve fault tolerance, replication introduces redundancy into the system. The main difficulty is guaranteeing the consistency of all replicas, so that together they behave as if the object were not replicated (replication transparency). This becomes harder when active objects (such as virtual nodes) are replicated and these objects can themselves be clients of further objects in the distributed application.

The problems of replication of active non-deterministic objects are analyzed in the context of distributed Ada 95 applications. The ISO standard for Ada 95 defines a model for distributed execution based on remote procedure calls (RPC). Virtual nodes in Ada 95 use this as their sole communication paradigm, but they may contain tasks to execute activities concurrently, thus making the execution potentially non-deterministic due to implicit timing dependencies. Such non-determinism cannot be avoided by choosing deterministic tasking policies.
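
The effect of such timing dependencies on replicas can be illustrated with a small sketch. It is written in Python rather than Ada 95 and is not taken from the thesis; the two task functions and the starting value are hypothetical. It simply enumerates the two orders in which two concurrent tasks might update shared state and shows that the final state depends on the order, which is why two replicas given identical inputs may still diverge.

```python
from itertools import permutations


def add_interest(balance):
    # Hypothetical task A: add 10 percent (integer arithmetic).
    return balance + balance // 10


def withdraw_fifty(balance):
    # Hypothetical task B: subtract a fixed amount.
    return balance - 50


def run_node(schedule, initial=100):
    """Apply the tasks' updates in the order the scheduler happened to pick."""
    balance = initial
    for task in schedule:
        balance = task(balance)
    return balance


# Map each possible interleaving (by task name) to the resulting state.
outcomes = {
    tuple(task.__name__ for task in order): run_node(order)
    for order in permutations([add_interest, withdraw_fifty])
}

for order, result in outcomes.items():
    print(" then ".join(order), "->", result)
```

Both schedules are legal executions of the same program, so a replica that happens to follow one and a replica that happens to follow the other end up in different states unless the order is coordinated between them.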

VI Major contributions

The main contributions of this work are the following:
  • An analysis of replication for fault tolerance of objects that may behave non-deterministically. This analysis is done in the context of Ada 95, where the objects correspond to partitions, i.e., virtual nodes.

    Two different approaches to replication of non-deterministic objects are examined. A first approach is based upon the transparent implementation of remote procedure calls as nested subtransactions, but it turns out that this method is not suitable for achieving transparent replication.

    A second approach assumes a piecewise deterministic computation model, in which deterministic execution state intervals are separated by non-deterministically occurring events. I show that this model preserves the correctness of Ada 95 partitions and can be used to offer replication in a transparent way.

  • A prototype implementation of a replication manager for Ada 95 partitions called RAPIDS (Replicated Ada Partitions In Distributed Systems), which is based upon the second approach using a piecewise deterministic model of computation. It employs semi-active replication for maintaining replica consistency.
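
The combination of a piecewise deterministic model with semi-active replication can be sketched as follows. This is a minimal illustration in Python under stated assumptions, not the RAPIDS implementation: the `Replica` class, the `leader_step` and `run` functions, and the request format are all hypothetical. It assumes the usual leader/follower reading of semi-active replication, in which one replica resolves each non-deterministic event and the others replay its decision. Here the only non-deterministic event is the choice of which pending request is served next; everything between two such choices is deterministic.

```python
import random


class Replica:
    """A replica whose execution is deterministic between logged events."""

    def __init__(self):
        self.state = 0
        self.history = []

    def apply(self, request):
        # Deterministic state interval: the same request applied to the
        # same state always produces the same new state.
        op, value = request
        if op == "add":
            self.state += value
        elif op == "mul":
            self.state *= value
        self.history.append(request)


def leader_step(leader, pending, rng):
    """Resolve one non-deterministic event on the leader.

    The random choice stands in for a timing-dependent scheduling decision.
    The chosen request is returned so it can be forwarded to the followers.
    """
    request = pending.pop(rng.randrange(len(pending)))
    leader.apply(request)
    return request


def run(requests, seed, follower_count=2):
    rng = random.Random(seed)
    leader = Replica()
    followers = [Replica() for _ in range(follower_count)]
    pending = list(requests)
    while pending:
        decision = leader_step(leader, pending, rng)
        # Followers make no choice of their own; they replay the decision.
        for follower in followers:
            follower.apply(decision)
    return leader, followers


leader, followers = run([("add", 2), ("mul", 3), ("add", 5)], seed=1)
print(leader.state, [f.state for f in followers])
```

Whatever order the leader picks, the followers reach the same state because they replay its choices instead of making their own. The sketch omits the parts that make a real system hard: reliable, ordered delivery of decisions to the group, and promoting a follower when the leader crashes.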

For more information, see the thesis itself.

VII Main publications

Th. Wolf: "Fault Tolerance in Distributed Ada 95", Position paper; in Proceedings of the 8th International Real-Time Ada Workshop, Ravenscar, North Yorkshire, UK, April 1997. Published as Ada Letters XVII(5), Sept./Oct. 1997, pp. 106-110. Also available as TR 99/302, Computer Science Department, EPFL.

Th. Wolf: "Replication of Non-Deterministic Objects", PhD Thesis #1903, Ecole Polytechnique Fédérale de Lausanne (EPFL), Nov. 1998.

Th. Wolf: "Transparent Replication for Fault Tolerance in Distributed Ada 95", Position paper for the 9th International Real-Time Ada Workshop, Tallahassee FL, USA, March 1999. To be published in the workshop proceedings, which will appear in Ada Letters. Also available as TR 99/307, Computer Science Department, EPFL.

Th. Wolf, A. Strohmeier: "Fault Tolerance by Transparent Replication for Distributed Ada 95", accepted by Ada Europe '99, to be held in Santander, Spain, June 1999. To appear in the conference proceedings, which will be published in Lecture Notes in Computer Science (LNCS), Springer-Verlag. Also available as TR 99/305, Computer Science Department, EPFL.


Last update: July 21, 1999 / Thomas Wolf