Abstract
A rollback recovery scheme for distributed systems is proposed. The state-save synchronization among processes is implemented by bounding clock drifts such that no state-save synchronization messages are required. Since the clocks are only loosely synchronized, the synchronization overhead can be negligible in many applications. An interprocess communication protocol which encodes state-save progress information within message frames is introduced to checkpoint consistent system states. A rollback recovery algorithm that will force a minimum number of nodes to roll back after failures is developed.
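To make the idea of carrying state-save progress inside message frames concrete, here is a minimal Python sketch. It is not the paper's protocol: the names `Message`, `Process`, `epoch`, `tick`, and `receive` are hypothetical, and the rule that a receiver saves its state before consuming a message from a later epoch is an assumption in the spirit of the abstract. Each process saves state on its own local timer, and the piggybacked epoch covers the window in which loosely synchronized clocks disagree.

```python
from dataclasses import dataclass, field


@dataclass
class Message:
    sender: str
    payload: str
    epoch: int  # sender's state-save count, carried in the frame (assumed field)


@dataclass
class Process:
    name: str
    period: float  # local-clock interval between state saves (assumed parameter)
    epoch: int = 0
    state: list = field(default_factory=list)
    checkpoints: dict = field(default_factory=dict)

    def __post_init__(self):
        # Epoch 0 is the initial, empty state.
        self.checkpoints[0] = []

    def save_state(self):
        # Record a copy of the current state under the next epoch number.
        self.epoch += 1
        self.checkpoints[self.epoch] = list(self.state)

    def tick(self, local_time):
        # Timer-driven state save: no synchronization messages are exchanged.
        while self.epoch < int(local_time // self.period):
            self.save_state()

    def send(self, payload):
        # Stamp the outgoing frame with the sender's current epoch.
        return Message(self.name, payload, self.epoch)

    def receive(self, msg):
        # If the sender is already in a later epoch, save state first so the
        # checkpoint for that epoch does not include this message.
        while self.epoch < msg.epoch:
            self.save_state()
        self.state.append(msg.payload)
```

In this sketch, a receiver whose clock lags slightly behind the sender's still records its checkpoint before the message takes effect, so the set of checkpoints sharing an epoch number never contains a message that was received but not yet sent.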
Original language | English (US) |
---|---|
Pages (from-to) | 246-251 |
Number of pages | 6 |
Journal | IEEE Transactions on Parallel and Distributed Systems |
Volume | 3 |
Issue number | 2 |
State | Published - Mar 1992 |
ASJC Scopus subject areas
- Computational Theory and Mathematics
- Electrical and Electronic Engineering
- Theoretical Computer Science