DESIGN AND EVALUATION OF A FAULT-TOLERANT MULTIPROCESSOR USING HARDWARE RECOVERY BLOCKS.

Yann-Hang Lee, Kang G. Shin

Research output: Contribution to journal, Article

25 Citations (Scopus)

Abstract

The authors consider the design and evaluation of a fault-tolerant multiprocessor with a rollback recovery mechanism. The mechanism is based on the hardware recovery block, which is a hardware equivalent of the software recovery block. The hardware recovery blocks are constructed by consecutive state-save operations and several state-save units in every processor and memory module. Upon detection of failure, the multiprocessor reconfigures itself to replace the faulty module and then the process originally assigned to the faulty module retreats to one of the previously saved states in order to resume fault-free execution. Due to random interactions among cooperating processes and also due to asynchrony in the state-savings, the rollback of a process may propagate to others, and thus the need for multiple-step rollbacks may arise. In the worst case, when all the available saved states are exhausted, the processes have to restart from the beginning as if they were executed in a system without any rollback recovery mechanism. A mathematical model is proposed to calculate both the coverage of multistep rollback recovery and the risk of restart. The mean and variance of execution time of a given task with occurrence of rollbacks and/or restarts are evaluated.

Original language: English (US)
Pages (from-to): 113-124
Number of pages: 12
Journal: IEEE Transactions on Computers
Volume: C-33
Issue number: 2
State: Published - Feb 1984
Externally published: Yes

Fingerprint

Rollback Recovery
Multiprocessor
Fault-tolerant
Restart
Recovery
Hardware
Evaluation
Module
Execution Time
Consecutive
Coverage
Fault
Mathematical Model
Calculate
Unit
Software
Design
Interaction
Mathematical models
Data storage equipment

ASJC Scopus subject areas

  • Hardware and Architecture
  • Electrical and Electronic Engineering

Cite this

DESIGN AND EVALUATION OF A FAULT-TOLERANT MULTIPROCESSOR USING HARDWARE RECOVERY BLOCKS. / Lee, Yann-Hang; Shin, Kang G.

In: IEEE Transactions on Computers, Vol. C-33, No. 2, 02.1984, p. 113-124.

@article{cd1c90bde9434a8391e9f0d6b86e7746,
title = "DESIGN AND EVALUATION OF A FAULT-TOLERANT MULTIPROCESSOR USING HARDWARE RECOVERY BLOCKS.",
abstract = "The authors consider the design and evaluation of a fault-tolerant multiprocessor with a rollback recovery mechanism. The mechanism is based on the hardware recovery block which is a hardware equivalent to the software recovery block. The hardware recovery blocks are constructed by consecutive state-save operations and several state-save units in every processor and memory module. Upon detection of failure, the multiprocessor reconfigures itself to replace the faulty module and then the process originally assigned to the faulty module retreats to one of the previously saved states in order to resume fault-free execution. Due to random interactions among cooperating processes and also due to asynchrony in the state-savings, the rollback of a process may propagate to others, and thus the need for multiple-step rollbacks may arise. In the worst case, when all the available saved states are exhausted, the processes have to restart from the beginning as if they were executed in a system without any rollback recovery mechanism. A mathematical model is proposed to calculate both the coverage of multistep rollback recovery and the risk of restart. The mean and variance of execution time of a given task with occurrence of rollbacks and/or restarts is evaluated.",
author = "Yann-Hang Lee and Shin, {Kang G.}",
year = "1984",
month = "2",
language = "English (US)",
volume = "C-33",
pages = "113--124",
journal = "IEEE Transactions on Computers",
issn = "0018-9340",
publisher = "IEEE Computer Society",
number = "2",
}

TY - JOUR

T1 - DESIGN AND EVALUATION OF A FAULT-TOLERANT MULTIPROCESSOR USING HARDWARE RECOVERY BLOCKS.

AU - Lee, Yann-Hang

AU - Shin, Kang G.

PY - 1984/2

Y1 - 1984/2

N2 - The authors consider the design and evaluation of a fault-tolerant multiprocessor with a rollback recovery mechanism. The mechanism is based on the hardware recovery block which is a hardware equivalent to the software recovery block. The hardware recovery blocks are constructed by consecutive state-save operations and several state-save units in every processor and memory module. Upon detection of failure, the multiprocessor reconfigures itself to replace the faulty module and then the process originally assigned to the faulty module retreats to one of the previously saved states in order to resume fault-free execution. Due to random interactions among cooperating processes and also due to asynchrony in the state-savings, the rollback of a process may propagate to others, and thus the need for multiple-step rollbacks may arise. In the worst case, when all the available saved states are exhausted, the processes have to restart from the beginning as if they were executed in a system without any rollback recovery mechanism. A mathematical model is proposed to calculate both the coverage of multistep rollback recovery and the risk of restart. The mean and variance of execution time of a given task with occurrence of rollbacks and/or restarts is evaluated.

AB - The authors consider the design and evaluation of a fault-tolerant multiprocessor with a rollback recovery mechanism. The mechanism is based on the hardware recovery block which is a hardware equivalent to the software recovery block. The hardware recovery blocks are constructed by consecutive state-save operations and several state-save units in every processor and memory module. Upon detection of failure, the multiprocessor reconfigures itself to replace the faulty module and then the process originally assigned to the faulty module retreats to one of the previously saved states in order to resume fault-free execution. Due to random interactions among cooperating processes and also due to asynchrony in the state-savings, the rollback of a process may propagate to others, and thus the need for multiple-step rollbacks may arise. In the worst case, when all the available saved states are exhausted, the processes have to restart from the beginning as if they were executed in a system without any rollback recovery mechanism. A mathematical model is proposed to calculate both the coverage of multistep rollback recovery and the risk of restart. The mean and variance of execution time of a given task with occurrence of rollbacks and/or restarts is evaluated.

UR - http://www.scopus.com/inward/record.url?scp=0021382125&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0021382125&partnerID=8YFLogxK

M3 - Article

AN - SCOPUS:0021382125

VL - C-33

SP - 113

EP - 124

JO - IEEE Transactions on Computers

JF - IEEE Transactions on Computers

SN - 0018-9340

IS - 2

ER -