Optimal Checkpointing of Real-Time Tasks

Kang G. Shin; Tein Hsiang Lin; Yann Hang Lee

doi:10.1109/TC.1987.5009472

Research output: Contribution to journal › Article › peer-review

74 Scopus citations

Abstract

Analytical models for the design and evaluation of checkpointing of real-time tasks are developed. First, the execution of a real-time task is modeled under a common assumption of perfect coverage of online detection mechanisms (which is termed a basic model). Then, the model is generalized (to an extended model) to include more realistic cases, i.e., imperfect coverages of online detection mechanisms and acceptance tests. Finally, we determine an optimal placement of checkpoints to minimize the mean task execution time while the probability of an unreliable result (or lack of confidence) is kept below a specified level. In the basic model, it is shown that equidistant intercheckpoint intervals are optimal, whereas this is not necessarily true in the extended model. An algorithm for calculating the optimal number of checkpoints and intercheckpoint intervals is presented with some numerical examples for the extended model.
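
The equidistant result stated for the basic model can be illustrated with a small numerical sketch. This is not the paper's model or algorithm. It assumes a simplified textbook setting: failures arrive as a Poisson process with rate lam, every failure is detected immediately (perfect coverage), and a failure rolls the task back to the start of the current segment. The function names and the parameters lam, checkpoint_cost, and restart_cost are hypothetical, chosen only for illustration.

```python
import math


def mean_segment_time(tau, lam, restart_cost=0.0):
    """Expected time to complete tau units of work in one segment.

    Assumes Poisson failures with rate lam, instant detection, and a
    restart of the segment after each failure (illustrative model only).
    """
    if lam == 0:
        return tau
    # Closed form for restart-from-segment-start under exponential failures.
    return (1.0 / lam + restart_cost) * math.expm1(lam * tau)


def mean_task_time(intervals, lam, checkpoint_cost, restart_cost=0.0):
    """Sum of expected segment times; each segment also pays checkpoint_cost."""
    return sum(
        mean_segment_time(tau + checkpoint_cost, lam, restart_cost)
        for tau in intervals
    )


def best_equidistant(total_work, lam, checkpoint_cost, max_segments=50):
    """Search the number of equal segments that minimizes the mean task time."""
    best = None
    for n in range(1, max_segments + 1):
        t = mean_task_time([total_work / n] * n, lam, checkpoint_cost)
        if best is None or t < best[1]:
            best = (n, t)
    return best
```

Because the per-segment cost is convex in the segment length, splitting the same total work into equal segments never does worse than an uneven split with the same number of segments in this simplified setting; comparing mean_task_time([25.0] * 4, 0.01, 1.0) with mean_task_time([10.0, 20.0, 30.0, 40.0], 0.01, 1.0) shows the effect. The sketch does not cover the extended model with imperfect coverage, where the abstract notes that equidistant intervals need not be optimal.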

Original language	English (US)
Pages (from-to)	1328-1341
Number of pages	14
Journal	IEEE Transactions on Computers
Volume	C-36
Issue number	11
DOIs	https://doi.org/10.1109/TC.1987.5009472
State	Published - Nov 1987
Externally published	Yes

Keywords

Checkpointing
failure coverages
mean task execution time
on-line detection mechanisms and acceptance tests
optimal placement of checkpoints
probability of an unreliable result
rollback and restart failure recovery

ASJC Scopus subject areas

Software
Theoretical Computer Science
Hardware and Architecture
Computational Theory and Mathematics

Access to Document

10.1109/TC.1987.5009472

Cite this

@article{2276b7daa1b041888069969b7131b8c2,
  title = "Optimal Checkpointing of Real-Time Tasks",
  abstract = "Analytical models for the design and evaluation of checkpointing of Real-Time tasks are developed. First, the execution of a Real-Time task is modeled under a common assumption of perfect coverage of online detection mechanisms (which is termed a basic model). Then, the model is generalized (to an extended model) to include more realistic cases, i.e., imperfect coverages of online detection mechanisms and acceptance tests. Finally, we determine an optimal placement of checkpoints to minimize the mean task execution time while the probability of an unreliable result (or lack of confidence) is kept below a specified level. In the basic model, it is shown that equidistant intercheckpoint intervals are optimal, whereas this is not necessarily true in the extended model. An algorithm for calculating the optimal number of checkpoints and intercheckpoint intervals is presented with some numerical examples for the extended model.",
  keywords = "Checkpointing, failure coverages, mean task execution time, on-line detection mechanisms and acceptance tests, optimal placement of checkpoints, probability of an unreliable result, rollback and restart failure recovery",
  author = "Shin, {Kang G.} and Lin, {Tein Hsiang} and Lee, {Yann Hang}",
  year = "1987",
  month = nov,
  doi = "10.1109/TC.1987.5009472",
  language = "English (US)",
  volume = "C-36",
  pages = "1328--1341",
  journal = "IEEE Transactions on Computers",
  issn = "0018-9340",
  publisher = "IEEE Computer Society",
  number = "11",
}

TY - JOUR
T1 - Optimal Checkpointing of Real-Time Tasks
AU - Shin, Kang G.
AU - Lin, Tein Hsiang
AU - Lee, Yann Hang
PY - 1987/11
Y1 - 1987/11
N2 - Analytical models for the design and evaluation of checkpointing of Real-Time tasks are developed. First, the execution of a Real-Time task is modeled under a common assumption of perfect coverage of online detection mechanisms (which is termed a basic model). Then, the model is generalized (to an extended model) to include more realistic cases, i.e., imperfect coverages of online detection mechanisms and acceptance tests. Finally, we determine an optimal placement of checkpoints to minimize the mean task execution time while the probability of an unreliable result (or lack of confidence) is kept below a specified level. In the basic model, it is shown that equidistant intercheckpoint intervals are optimal, whereas this is not necessarily true in the extended model. An algorithm for calculating the optimal number of checkpoints and intercheckpoint intervals is presented with some numerical examples for the extended model.
AB - Analytical models for the design and evaluation of checkpointing of Real-Time tasks are developed. First, the execution of a Real-Time task is modeled under a common assumption of perfect coverage of online detection mechanisms (which is termed a basic model). Then, the model is generalized (to an extended model) to include more realistic cases, i.e., imperfect coverages of online detection mechanisms and acceptance tests. Finally, we determine an optimal placement of checkpoints to minimize the mean task execution time while the probability of an unreliable result (or lack of confidence) is kept below a specified level. In the basic model, it is shown that equidistant intercheckpoint intervals are optimal, whereas this is not necessarily true in the extended model. An algorithm for calculating the optimal number of checkpoints and intercheckpoint intervals is presented with some numerical examples for the extended model.
KW - Checkpointing
KW - failure coverages
KW - mean task execution time
KW - on-line detection mechanisms and acceptance tests
KW - optimal placement of checkpoints
KW - probability of an unreliable result
KW - rollback and restart failure recovery
UR - http://www.scopus.com/inward/record.url?scp=0023456347&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=0023456347&partnerID=8YFLogxK
U2 - 10.1109/TC.1987.5009472
DO - 10.1109/TC.1987.5009472
M3 - Article
AN - SCOPUS:0023456347
SN - 0018-9340
VL - C-36
SP - 1328
EP - 1341
JO - IEEE Transactions on Computers
JF - IEEE Transactions on Computers
IS - 11
ER -
