InCheck: An In-Application Recovery Scheme for Soft Errors

Moslem Didehban, Sai Ram Dheeraj Lokam, Aviral Shrivastava

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

5 Citations (Scopus)

Abstract

An ideal solution for soft error tolerance should hide the effect of soft errors from the user and provide correct results at the expected time. Software solutions are attractive because they can provide flexible reliability without imposing any hardware modifications. Our investigation of state-of-the-art error recovery techniques reveals that they suffer from poor coverage (the ability to detect and correctly recover from soft errors). This paper presents InCheck (In-Application Checkpointing and Recovery) as an effective, safe, and timely software technique for complete error coverage. The key features of InCheck are verified register preservation, single memory location checkpoints, and safe and timely recovery. To evaluate the effectiveness of InCheck, we performed more than 210,000 fault injection experiments on different hardware components of an ARM Cortex-A53-like processor running MiBench applications. The original and SWIFTR (state-of-the-art) protected programs suffered from 8000 and 1800 instances of wrong outputs, respectively, but no failures occurred when the programs were protected by InCheck.

Original language: English (US)
Title of host publication: Proceedings of the 54th Annual Design Automation Conference 2017, DAC 2017
Publisher: Institute of Electrical and Electronics Engineers Inc.
Volume: Part 128280
ISBN (Electronic): 9781450349277
DOIs: https://doi.org/10.1145/3061639.3062265
State: Published - Jun 18 2017
Event: 54th Annual Design Automation Conference, DAC 2017 - Austin, United States
Duration: Jun 18 2017 to Jun 22 2017

Other

Other: 54th Annual Design Automation Conference, DAC 2017
Country: United States
City: Austin
Period: 6/18/17 to 6/22/17

Fingerprint

Soft Error
Recovery
Coverage
Hardware
Error Recovery
Fault Injection
Checkpointing
Checkpoint
Software
Preservation
Tolerance
Evaluate
Output
Data storage equipment
Experiment
Experiments

ASJC Scopus subject areas

  • Computer Science Applications
  • Control and Systems Engineering
  • Electrical and Electronic Engineering
  • Modeling and Simulation

Cite this

Didehban, M., Lokam, S. R. D., & Shrivastava, A. (2017). InCheck: An In-Application Recovery Scheme for Soft Errors. In Proceedings of the 54th Annual Design Automation Conference 2017, DAC 2017 (Vol. Part 128280). [40] Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1145/3061639.3062265

@inproceedings{4f212d04a9e04396a4d7a9eb68a5b164,
title = "InCheck: An In-Application Recovery Scheme for Soft Errors",
abstract = "An ideal solution for soft error tolerance should hide the effect of soft errors from the user and provide correct results at the expected time. Software solutions are attractive because they can provide flexible reliability without imposing any hardware modifications. Our investigation of state-of-the-art error recovery techniques reveals that they suffer from poor coverage (the ability to detect and correctly recover from soft errors). This paper presents InCheck (In-Application Checkpointing and Recovery) as an effective, safe, and timely software technique for complete error coverage. The key features of InCheck are verified register preservation, single memory location checkpoints, and safe and timely recovery. To evaluate the effectiveness of InCheck, we performed more than 210,000 fault injection experiments on different hardware components of an ARM Cortex-A53-like processor running MiBench applications. The original and SWIFTR (state-of-the-art) protected programs suffered from 8000 and 1800 instances of wrong outputs, respectively, but no failures occurred when the programs were protected by InCheck.",
author = "Moslem Didehban and Lokam, {Sai Ram Dheeraj} and Aviral Shrivastava",
year = "2017",
month = "6",
day = "18",
doi = "10.1145/3061639.3062265",
language = "English (US)",
volume = "Part 128280",
booktitle = "Proceedings of the 54th Annual Design Automation Conference 2017, DAC 2017",
publisher = "Institute of Electrical and Electronics Engineers Inc.",
address = "United States",

}

TY - GEN

T1 - InCheck

T2 - An In-Application Recovery Scheme for Soft Errors

AU - Didehban, Moslem

AU - Lokam, Sai Ram Dheeraj

AU - Shrivastava, Aviral

PY - 2017/6/18

Y1 - 2017/6/18

N2 - An ideal solution for soft error tolerance should hide the effect of soft errors from the user and provide correct results at the expected time. Software solutions are attractive because they can provide flexible reliability without imposing any hardware modifications. Our investigation of state-of-the-art error recovery techniques reveals that they suffer from poor coverage (the ability to detect and correctly recover from soft errors). This paper presents InCheck (In-Application Checkpointing and Recovery) as an effective, safe, and timely software technique for complete error coverage. The key features of InCheck are verified register preservation, single memory location checkpoints, and safe and timely recovery. To evaluate the effectiveness of InCheck, we performed more than 210,000 fault injection experiments on different hardware components of an ARM Cortex-A53-like processor running MiBench applications. The original and SWIFTR (state-of-the-art) protected programs suffered from 8000 and 1800 instances of wrong outputs, respectively, but no failures occurred when the programs were protected by InCheck.

UR - http://www.scopus.com/inward/record.url?scp=85023645294&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85023645294&partnerID=8YFLogxK

U2 - 10.1145/3061639.3062265

DO - 10.1145/3061639.3062265

M3 - Conference contribution

VL - Part 128280

BT - Proceedings of the 54th Annual Design Automation Conference 2017, DAC 2017

PB - Institute of Electrical and Electronics Engineers Inc.

ER -