Abstract
Error resilience is the primary design concern for safety- and mission-critical applications. Redundant MultiThreading (RMT) is one of the most promising soft and hard error resilience strategies because it does not require additional hardware modification. While the state-of-the-art software RMT scheme can achieve a high degree of error protection, our detailed investigation revealed that it suffers from performance overhead and insufficient fault coverage. This paper proposes EXPERTISE, a compiler-level RMT scheme that can detect the manifestation of hardware faults in all processor components. EXPERTISE transformation generates a checker-thread for the main execution thread. These redundant threads are executed simultaneously on two physically different cores of a multicore processor and perform almost the same computations. After each memory write operation is committed by the main-thread, the checker-thread loads back the written data from the memory and checks it against its own locally computed values. If they match, the execution continues. Otherwise, the error flag is raised. In order to evaluate the effectiveness of the proposed solution, we performed soft and hard error injection experiments on all the different hardware components of an ARM Cortex53-like μ-architecturally simulated microprocessor. Based on statistical fault injection campaigns, we have found that EXPERTISE provides 188× better fault coverage with 27% faster performance as compared to the state-of-the-art scheme.
Original language | English (US) |
---|---|
Article number | 3546073 |
Journal | ACM Transactions on Architecture and Code Optimization |
Volume | 19 |
Issue number | 4 |
DOIs | |
State | Published - Sep 16 2022 |
Externally published | Yes |
Keywords
- redundant multithreading
- Soft error
- transient fault
ASJC Scopus subject areas
- Software
- Information Systems
- Hardware and Architecture