Rollback recovery in distributed systems using loosely synchronized clocks

Zhijun Tong; Richard Y. Kain; W. T. Tsai

doi:10.1109/71.127264

Computing and Augmented Intelligence, School of (IAFSE-SCAI)

Research output: Contribution to journal › Article › peer-review

27 Scopus citations

Abstract

A rollback recovery scheme for distributed systems is proposed. The state-save synchronization among processes is implemented by bounding clock drifts such that no state-save synchronization messages are required. Since the clocks are only loosely synchronized, the synchronization overhead can be negligible in many applications. An interprocess communication protocol which encodes state-save progress information within message frames is introduced to checkpoint consistent system states. A rollback recovery algorithm that will force a minimum number of nodes to roll back after failures is developed.

Original language	English (US)
Pages (from-to)	246-251
Number of pages	6
Journal	IEEE Transactions on Parallel and Distributed Systems
Volume	3
Issue number	2
DOIs	https://doi.org/10.1109/71.127264
State	Published - Mar 1992

ASJC Scopus subject areas

Computational Theory and Mathematics
Electrical and Electronic Engineering
Theoretical Computer Science

Cite this

@article{3a272712006a402bb7970324fd8603e9,
  title = "Rollback recovery in distributed systems using loosely synchronized clocks",
  abstract = "A rollback recovery scheme for distributed systems is proposed. The state-save synchronization among processes is implemented by bounding clock drifts such that no state-save synchronization messages are required. Since the clocks are only loosely synchronized, the synchronization overhead can be negligible in many applications. An interprocess communication protocol which encodes state-save progress information within message frames is introduced to checkpoint consistent system states. A rollback recovery algorithm that will force a minimum number of nodes to roll back after failures is developed.",
  author = "Tong, Zhijun and Kain, {Richard Y.} and Tsai, {W. T.}",
  year = "1992",
  month = mar,
  doi = "10.1109/71.127264",
  language = "English (US)",
  volume = "3",
  pages = "246--251",
  journal = "IEEE Transactions on Parallel and Distributed Systems",
  issn = "1045-9219",
  publisher = "IEEE Computer Society",
  number = "2",
}

TY  - JOUR
T1  - Rollback recovery in distributed systems using loosely synchronized clocks
AU  - Tong, Zhijun
AU  - Kain, Richard Y.
AU  - Tsai, W. T.
PY  - 1992/3
Y1  - 1992/3
N2  - A rollback recovery scheme for distributed systems is proposed. The state-save synchronization among processes is implemented by bounding clock drifts such that no state-save synchronization messages are required. Since the clocks are only loosely synchronized, the synchronization overhead can be negligible in many applications. An interprocess communication protocol which encodes state-save progress information within message frames is introduced to checkpoint consistent system states. A rollback recovery algorithm that will force a minimum number of nodes to roll back after failures is developed.
AB  - A rollback recovery scheme for distributed systems is proposed. The state-save synchronization among processes is implemented by bounding clock drifts such that no state-save synchronization messages are required. Since the clocks are only loosely synchronized, the synchronization overhead can be negligible in many applications. An interprocess communication protocol which encodes state-save progress information within message frames is introduced to checkpoint consistent system states. A rollback recovery algorithm that will force a minimum number of nodes to roll back after failures is developed.
UR  - http://www.scopus.com/inward/record.url?scp=0026825917&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=0026825917&partnerID=8YFLogxK
U2  - 10.1109/71.127264
DO  - 10.1109/71.127264
M3  - Article
AN  - SCOPUS:0026825917
SN  - 1045-9219
VL  - 3
SP  - 246
EP  - 251
JO  - IEEE Transactions on Parallel and Distributed Systems
JF  - IEEE Transactions on Parallel and Distributed Systems
IS  - 2
ER  -
