Expert: Effective and flexible error protection by redundant multithreading

Hwisoo So, Moslem Didehban, Yohan Ko, Aviral Shrivastava, Kyoungwoo Lee

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

3 Citations (Scopus)

Abstract

Resiliency is a first-order design concern in modern microprocessor design. Compiler-level Redundant MultiThreading (RMT) schemes are promising because of their capability to detect the manifestation of hardware transient and permanent faults. In this work, we propose EXPERT, a compiler-level RMT scheme which can detect the manifestation of hardware faults in all hardware components. The EXPERT transformation generates a checker thread for the program's main execution thread. These redundant threads execute simultaneously on two physically different cores of a multi-core processor. They perform mostly the same computations; however, after each memory write operation committed by the main thread, the checker thread loads the written data back from memory and checks it against its own locally computed values. If they match, execution continues. Otherwise, an error flag is raised. Our processor-wide statistical transient and permanent fault injection experiments show that EXPERT's error coverage is ∼65x better than that of the state-of-the-art scheme.
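The load-back check described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: EXPERT is a compiler-level transformation that runs the two threads on separate cores, whereas this hand-written example uses ordinary Python threads, a list standing in for memory, and a queue to tell the checker which store has committed. The names `compute`, `run_protected`, and `fault_at` are hypothetical, and the injected bit flip only simulates a fault.

```python
import queue
import threading


def compute(i):
    # Stand-in for the program's real computation (hypothetical).
    return i * i + 1


def run_protected(n, fault_at=None):
    """Run n stores alongside a checker thread; return True if an error was flagged."""
    memory = [0] * n               # shared "memory", one address per store
    committed = queue.Queue()      # main -> checker: addresses of committed stores
    error_flag = threading.Event()

    def main_thread():
        for addr in range(n):
            value = compute(addr)
            if addr == fault_at:
                value ^= 1         # simulated single-bit fault (assumption)
            memory[addr] = value   # memory write by the main thread
            committed.put(addr)    # notify the checker that the store committed
        committed.put(None)        # end-of-program marker

    def checker_thread():
        while True:
            addr = committed.get()
            if addr is None:
                return
            expected = compute(addr)       # checker's own redundant computation
            if memory[addr] != expected:   # load back the written data and compare
                error_flag.set()           # mismatch: raise the error flag
                return

    threads = [threading.Thread(target=main_thread),
               threading.Thread(target=checker_thread)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return error_flag.is_set()


if __name__ == "__main__":
    print("fault-free run flagged:", run_protected(8))
    print("faulty run flagged:", run_protected(8, fault_at=3))
```

Each store in the sketch goes to a distinct address, so the checker never races with a later overwrite; a real scheme would need its own synchronization for that case, which the abstract does not detail.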

Original language: English (US)
Title of host publication: Proceedings of the 2018 Design, Automation and Test in Europe Conference and Exhibition, DATE 2018
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 533-538
Number of pages: 6
Volume: 2018-January
ISBN (Electronic): 9783981926316
DOIs: https://doi.org/10.23919/DATE.2018.8342065
State: Published - Apr 19 2018
Event: 2018 Design, Automation and Test in Europe Conference and Exhibition, DATE 2018 - Dresden, Germany
Duration: Mar 19 2018 → Mar 23 2018

Other

Other: 2018 Design, Automation and Test in Europe Conference and Exhibition, DATE 2018
Country: Germany
City: Dresden
Period: 3/19/18 → 3/23/18

Fingerprint

Hardware
Data storage equipment
Microprocessor chips
Thread
Experiments
Fault

ASJC Scopus subject areas

  • Safety, Risk, Reliability and Quality
  • Hardware and Architecture
  • Software
  • Information Systems and Management

Cite this

So, H., Didehban, M., Ko, Y., Shrivastava, A., & Lee, K. (2018). Expert: Effective and flexible error protection by redundant multithreading. In Proceedings of the 2018 Design, Automation and Test in Europe Conference and Exhibition, DATE 2018 (Vol. 2018-January, pp. 533-538). Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.23919/DATE.2018.8342065

Expert: Effective and flexible error protection by redundant multithreading. / So, Hwisoo; Didehban, Moslem; Ko, Yohan; Shrivastava, Aviral; Lee, Kyoungwoo.

Proceedings of the 2018 Design, Automation and Test in Europe Conference and Exhibition, DATE 2018. Vol. 2018-January. Institute of Electrical and Electronics Engineers Inc., 2018. p. 533-538.

So, H, Didehban, M, Ko, Y, Shrivastava, A & Lee, K 2018, Expert: Effective and flexible error protection by redundant multithreading. in Proceedings of the 2018 Design, Automation and Test in Europe Conference and Exhibition, DATE 2018. vol. 2018-January, Institute of Electrical and Electronics Engineers Inc., pp. 533-538, 2018 Design, Automation and Test in Europe Conference and Exhibition, DATE 2018, Dresden, Germany, 3/19/18. https://doi.org/10.23919/DATE.2018.8342065
So H, Didehban M, Ko Y, Shrivastava A, Lee K. Expert: Effective and flexible error protection by redundant multithreading. In Proceedings of the 2018 Design, Automation and Test in Europe Conference and Exhibition, DATE 2018. Vol. 2018-January. Institute of Electrical and Electronics Engineers Inc. 2018. p. 533-538 https://doi.org/10.23919/DATE.2018.8342065
So, Hwisoo; Didehban, Moslem; Ko, Yohan; Shrivastava, Aviral; Lee, Kyoungwoo. / Expert: Effective and flexible error protection by redundant multithreading. Proceedings of the 2018 Design, Automation and Test in Europe Conference and Exhibition, DATE 2018. Vol. 2018-January. Institute of Electrical and Electronics Engineers Inc., 2018. pp. 533-538.
@inproceedings{9aa6690cc22a4486b9302905eb464310,
title = "Expert: Effective and flexible error protection by redundant multithreading",
abstract = "Resiliency is a first-order design concern in modern microprocessor design. Compiler-level Redundant MultiThreading (RMT) schemes are promising because of their capability to detect the manifestation of hardware transient and permanent faults. In this work, we propose EXPERT, a compiler-level RMT scheme which can detect the manifestation of hardware faults in all hardware components. The EXPERT transformation generates a checker thread for the program's main execution thread. These redundant threads execute simultaneously on two physically different cores of a multi-core processor. They perform mostly the same computations; however, after each memory write operation committed by the main thread, the checker thread loads the written data back from memory and checks it against its own locally computed values. If they match, execution continues. Otherwise, an error flag is raised. Our processor-wide statistical transient and permanent fault injection experiments show that EXPERT's error coverage is ∼65x better than that of the state-of-the-art scheme.",
author = "Hwisoo So and Moslem Didehban and Yohan Ko and Aviral Shrivastava and Kyoungwoo Lee",
year = "2018",
month = "4",
day = "19",
doi = "10.23919/DATE.2018.8342065",
language = "English (US)",
volume = "2018-January",
pages = "533--538",
booktitle = "Proceedings of the 2018 Design, Automation and Test in Europe Conference and Exhibition, DATE 2018",
publisher = "Institute of Electrical and Electronics Engineers Inc.",

}

TY - GEN

T1 - Expert

T2 - Effective and flexible error protection by redundant multithreading

AU - So, Hwisoo

AU - Didehban, Moslem

AU - Ko, Yohan

AU - Shrivastava, Aviral

AU - Lee, Kyoungwoo

PY - 2018/4/19

Y1 - 2018/4/19

AB - Resiliency is a first-order design concern in modern microprocessor design. Compiler-level Redundant MultiThreading (RMT) schemes are promising because of their capability to detect the manifestation of hardware transient and permanent faults. In this work, we propose EXPERT, a compiler-level RMT scheme which can detect the manifestation of hardware faults in all hardware components. The EXPERT transformation generates a checker thread for the program's main execution thread. These redundant threads execute simultaneously on two physically different cores of a multi-core processor. They perform mostly the same computations; however, after each memory write operation committed by the main thread, the checker thread loads the written data back from memory and checks it against its own locally computed values. If they match, execution continues. Otherwise, an error flag is raised. Our processor-wide statistical transient and permanent fault injection experiments show that EXPERT's error coverage is ∼65x better than that of the state-of-the-art scheme.

UR - http://www.scopus.com/inward/record.url?scp=85048956772&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85048956772&partnerID=8YFLogxK

U2 - 10.23919/DATE.2018.8342065

DO - 10.23919/DATE.2018.8342065

M3 - Conference contribution

AN - SCOPUS:85048956772

VL - 2018-January

SP - 533

EP - 538

BT - Proceedings of the 2018 Design, Automation and Test in Europe Conference and Exhibition, DATE 2018

PB - Institute of Electrical and Electronics Engineers Inc.

ER -