Rollback recovery in distributed systems using loosely synchronized clocks

Zhijun Tong, Richard Y. Kain, W. T. Tsai

Research output: Contribution to journal (Article)

26 Citations (Scopus)

Abstract

A rollback recovery scheme for distributed systems is proposed. The state-save synchronization among processes is implemented by bounding clock drifts such that no state-save synchronization messages are required. Since the clocks are only loosely synchronized, the synchronization overhead can be negligible in many applications. An interprocess communication protocol which encodes state-save progress information within message frames is introduced to checkpoint consistent system states. A rollback recovery algorithm that will force a minimum number of nodes to roll back after failures is developed.
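The abstract does not give the protocol's details, so the following is only a minimal sketch of the general idea it describes: each process saves state when its own clock crosses a period boundary, and every message frame carries the sender's state-save index so that a receiver whose clock lags can save first. The class names, the `period` parameter, and the rule in `receive` are illustrative assumptions, not the authors' algorithm, and the rollback step is not modeled.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    payload: str
    ckpt_index: int  # sender's state-save index, carried in the frame (assumed encoding)


@dataclass
class Process:
    name: str
    period: float                 # hypothetical state-save period, in seconds
    ckpt_index: int = 0           # index of the latest state save
    state: list = field(default_factory=list)
    saved: dict = field(default_factory=dict)  # ckpt_index -> copy of state

    def _save(self, index: int) -> None:
        self.ckpt_index = index
        self.saved[index] = list(self.state)

    def tick(self, local_clock: float) -> None:
        """Save state if the local clock has entered a new period."""
        due = int(local_clock // self.period)
        if due > self.ckpt_index:
            self._save(due)

    def send(self, payload: str) -> Message:
        return Message(payload, self.ckpt_index)

    def receive(self, msg: Message) -> None:
        # If the sender is already in a later interval, save before
        # delivering, so this save never records a message that the
        # sender's matching save does not know was sent.
        if msg.ckpt_index > self.ckpt_index:
            self._save(msg.ckpt_index)
        self.state.append(msg.payload)


# Two processes whose clocks differ by a small, bounded amount.
p = Process("p", period=10.0)
q = Process("q", period=10.0)
p.tick(10.2)           # p's clock has crossed the boundary: save 1
m = p.send("hello")    # frame carries index 1
q.tick(9.9)            # q's clock has not crossed yet: no save
q.receive(m)           # q saves index 1 first, then delivers
```

In this sketch no dedicated synchronization message is exchanged: the only coordination is the index that rides on ordinary traffic, which is the property the abstract emphasizes.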

Original language: English (US)
Pages (from-to): 246-251
Number of pages: 6
Journal: IEEE Transactions on Parallel and Distributed Systems
Volume: 3
Issue number: 2
DOI: 10.1109/71.127264
State: Published - Mar 1992

Fingerprint

Rollback Recovery
Distributed Systems
Clocks
Synchronization
Recovery
Checkpoint
Communication Protocol
Network protocols
Vertex of a graph

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Electrical and Electronic Engineering
  • Theoretical Computer Science

Cite this

Rollback recovery in distributed systems using loosely synchronized clocks. / Tong, Zhijun; Kain, Richard Y.; Tsai, W. T.

In: IEEE Transactions on Parallel and Distributed Systems, Vol. 3, No. 2, 03.1992, p. 246-251.

Tong, Zhijun ; Kain, Richard Y. ; Tsai, W. T. / Rollback recovery in distributed systems using loosely synchronized clocks. In: IEEE Transactions on Parallel and Distributed Systems. 1992 ; Vol. 3, No. 2. pp. 246-251.
@article{3a272712006a402bb7970324fd8603e9,
title = "Rollback recovery in distributed systems using loosely synchronized clocks",
abstract = "A rollback recovery scheme for distributed systems is proposed. The state-save synchronization among processes is implemented by bounding clock drifts such that no state-save synchronization messages are required. Since the clocks are only loosely synchronized, the synchronization overhead can be negligible in many applications. An interprocess communication protocol which encodes state-save progress information within message frames is introduced to checkpoint consistent system states. A rollback recovery algorithm that will force a minimum number of nodes to roll back after failures is developed.",
author = "Tong, Zhijun and Kain, {Richard Y.} and Tsai, {W. T.}",
year = "1992",
month = "3",
doi = "10.1109/71.127264",
language = "English (US)",
volume = "3",
pages = "246--251",
journal = "IEEE Transactions on Parallel and Distributed Systems",
issn = "1045-9219",
publisher = "IEEE Computer Society",
number = "2",
}

TY - JOUR
T1 - Rollback recovery in distributed systems using loosely synchronized clocks
AU - Tong, Zhijun
AU - Kain, Richard Y.
AU - Tsai, W. T.
PY - 1992/3
Y1 - 1992/3
N2 - A rollback recovery scheme for distributed systems is proposed. The state-save synchronization among processes is implemented by bounding clock drifts such that no state-save synchronization messages are required. Since the clocks are only loosely synchronized, the synchronization overhead can be negligible in many applications. An interprocess communication protocol which encodes state-save progress information within message frames is introduced to checkpoint consistent system states. A rollback recovery algorithm that will force a minimum number of nodes to roll back after failures is developed.
AB - A rollback recovery scheme for distributed systems is proposed. The state-save synchronization among processes is implemented by bounding clock drifts such that no state-save synchronization messages are required. Since the clocks are only loosely synchronized, the synchronization overhead can be negligible in many applications. An interprocess communication protocol which encodes state-save progress information within message frames is introduced to checkpoint consistent system states. A rollback recovery algorithm that will force a minimum number of nodes to roll back after failures is developed.
UR - http://www.scopus.com/inward/record.url?scp=0026825917&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=0026825917&partnerID=8YFLogxK
U2 - 10.1109/71.127264
DO - 10.1109/71.127264
M3 - Article
VL - 3
SP - 246
EP - 251
JO - IEEE Transactions on Parallel and Distributed Systems
JF - IEEE Transactions on Parallel and Distributed Systems
SN - 1045-9219
IS - 2
ER -