TY - GEN
T1 - Expert
T2 - 2018 Design, Automation and Test in Europe Conference and Exhibition, DATE 2018
AU - So, Hwisoo
AU - Didehban, Moslem
AU - Ko, Yohan
AU - Shrivastava, Aviral
AU - Lee, Kyoungwoo
N1 - Funding Information:
This work was partially supported by funding from NSF CCF 1055094 (CAREER); by the Global Ph.D. Fellowship Program through the NRF, funded by the Ministry of Education (NRF-2016H1A2A1909470); and by the Next-Generation Information Computing Development Program through the NRF, funded by the Ministry of Science, ICT, and Future Planning (NRF-2015M3C4A7065522).
Publisher Copyright:
© 2018 EDAA.
PY - 2018/4/19
Y1 - 2018/4/19
N2 - Resiliency is a first-order design concern in modern microprocessor design. Compiler-level Redundant MultiThreading (RMT) schemes are promising because of their capability to detect the manifestation of hardware transient and permanent faults. In this work, we propose EXPERT, a compiler-level RMT scheme that can detect the manifestation of hardware faults in all hardware components. The EXPERT transformation generates a checker thread for the program's main execution thread. These redundant threads execute simultaneously on two physically different cores of a multi-core processor. They perform mostly the same computations; however, after each memory write operation committed by the main thread, the checker thread loads the written data back from memory and checks it against its own locally computed value. If they match, execution continues. Otherwise, an error flag is raised. Our processor-wide statistical transient and permanent fault injection experiments show that EXPERT error coverage is ∼65x better than the state-of-the-art scheme.
AB - Resiliency is a first-order design concern in modern microprocessor design. Compiler-level Redundant MultiThreading (RMT) schemes are promising because of their capability to detect the manifestation of hardware transient and permanent faults. In this work, we propose EXPERT, a compiler-level RMT scheme that can detect the manifestation of hardware faults in all hardware components. The EXPERT transformation generates a checker thread for the program's main execution thread. These redundant threads execute simultaneously on two physically different cores of a multi-core processor. They perform mostly the same computations; however, after each memory write operation committed by the main thread, the checker thread loads the written data back from memory and checks it against its own locally computed value. If they match, execution continues. Otherwise, an error flag is raised. Our processor-wide statistical transient and permanent fault injection experiments show that EXPERT error coverage is ∼65x better than the state-of-the-art scheme.
UR - http://www.scopus.com/inward/record.url?scp=85048956772&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85048956772&partnerID=8YFLogxK
U2 - 10.23919/DATE.2018.8342065
DO - 10.23919/DATE.2018.8342065
M3 - Conference contribution
AN - SCOPUS:85048956772
T3 - Proceedings of the 2018 Design, Automation and Test in Europe Conference and Exhibition, DATE 2018
SP - 533
EP - 538
BT - Proceedings of the 2018 Design, Automation and Test in Europe Conference and Exhibition, DATE 2018
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 19 March 2018 through 23 March 2018
ER -