Low overhead checkpointing and rollback recovery scheme for distributed systems

Zhijun Tong, Richard Y. Kain, W. T. Tsai

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

14 Scopus citations

Abstract

A major obstacle in implementing a rollback recovery scheme for fault tolerance in a concurrent distributed system is the domino effect. A low overhead checkpointing scheme is proposed to prevent this effect. Each process saves its state periodically. The state-save synchronization among processes is implemented by bounding clock drifts. A communication protocol that assures that all saved states are consistent is developed.
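The abstract does not spell out the protocol, so the following is only a minimal sketch of one well-known way to keep periodic checkpoints consistent, not the authors' actual scheme. It assumes each process checkpoints on its own loosely synchronized local clock, tags every message with its current checkpoint interval, and takes a forced checkpoint before delivering a message from a later interval. The names `Process`, `Message`, `period`, and `epoch` are hypothetical and introduced purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    payload: str
    epoch: int  # sender's checkpoint interval at send time (assumed tag)


@dataclass
class Process:
    name: str
    period: float                      # nominal checkpoint period (assumed)
    epoch: int = 0                     # index of the current interval
    state: list = field(default_factory=list)
    checkpoints: dict = field(default_factory=dict)  # epoch -> saved state

    def __post_init__(self):
        self.checkpoints[0] = []       # initial state counts as checkpoint 0

    def take_checkpoint(self, epoch):
        # Save the current state for every interval up to `epoch`,
        # so that no checkpoint index is missing if intervals are skipped.
        for e in range(self.epoch + 1, epoch + 1):
            self.checkpoints[e] = list(self.state)
        self.epoch = max(self.epoch, epoch)

    def tick(self, local_clock):
        # Periodic checkpoint driven by this process's own (drifting) clock.
        due = int(local_clock // self.period)
        if due > self.epoch:
            self.take_checkpoint(due)

    def send(self, payload):
        return Message(payload, self.epoch)

    def receive(self, msg):
        # Forced checkpoint: a message from a later interval must not be
        # recorded in a checkpoint that precedes its send.
        if msg.epoch > self.epoch:
            self.take_checkpoint(msg.epoch)
        self.state.append(msg.payload)


def demo():
    p = Process("p", period=10.0)
    q = Process("q", period=10.0)
    p.tick(10.2)            # p's clock runs slightly ahead: checkpoint 1
    q.tick(9.9)             # q's clock lags: still in interval 0
    q.receive(p.send("m"))  # q checkpoints before delivering "m"
    return p, q
```

In `demo()`, the message is sent after checkpoint 1 of `p`, and the forced checkpoint guarantees it is absent from checkpoint 1 of `q`, so rolling both processes back to checkpoint 1 leaves no message that was received but never sent. A bound on clock drift would limit how often such forced checkpoints occur; this sketch does not model that bound.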

Original language: English (US)
Title of host publication: Proceedings - Symposium on Reliability in Distributed Software and Database Systems
Editors: Anon
Place of Publication: Piscataway, NJ, United States
Publisher: Publ by IEEE
Pages: 12-20
Number of pages: 9
State: Published - 1989
Externally published: Yes
Event: Proceedings of the Eighth Symposium on Reliable Distributed Systems - Seattle, WA, USA
Duration: Oct 10 1989 to Oct 12 1989

Other

Other: Proceedings of the Eighth Symposium on Reliable Distributed Systems
City: Seattle, WA, USA
Period: 10/10/89 to 10/12/89

ASJC Scopus subject areas

  • Software

Cite this

Tong, Z., Kain, R. Y., & Tsai, W. T. (1989). Low overhead checkpointing and rollback recovery scheme for distributed systems. In Anon (Ed.), Proceedings - Symposium on Reliability in Distributed Software and Database Systems (pp. 12-20). Publ by IEEE.