TY - GEN
T1 - NEMESIS
T2 - 36th IEEE/ACM International Conference on Computer-Aided Design, ICCAD 2017
AU - Didehban, Moslem
AU - Shrivastava, Aviral
AU - Lokam, Sai Ram Dheeraj
N1 - Funding Information:
This work was supported by funding from National Science Foundation grant CCF 1055094 (CAREER).
PY - 2017/12/13
Y1 - 2017/12/13
N2 - Soft errors are considered the main reliability challenge for sub-nanoscale microprocessors. Software-level soft error resilience schemes are desirable because they require no hardware modifications and their protection can be tuned to the application requirements. However, existing software-level error-tolerant schemes do not provide a high level of protection. In this work, we present NEMESIS, a compiler-level, fine-grain soft error detection, diagnosis, and recovery technique that can provide a high degree of error resiliency. NEMESIS runs three versions of the computation and detects soft errors by checking the results of all memory write and branch operations. In the case of a mismatch, the NEMESIS recovery routine reverts the effect of the error from the architectural state of the program, and the program resumes its normal execution. Our extensive μ-architectural-level fault injection experiments show that the NEMESIS transformation is able to detect all soft errors and recover from 97% of detected errors.
AB - Soft errors are considered the main reliability challenge for sub-nanoscale microprocessors. Software-level soft error resilience schemes are desirable because they require no hardware modifications and their protection can be tuned to the application requirements. However, existing software-level error-tolerant schemes do not provide a high level of protection. In this work, we present NEMESIS, a compiler-level, fine-grain soft error detection, diagnosis, and recovery technique that can provide a high degree of error resiliency. NEMESIS runs three versions of the computation and detects soft errors by checking the results of all memory write and branch operations. In the case of a mismatch, the NEMESIS recovery routine reverts the effect of the error from the architectural state of the program, and the program resumes its normal execution. Our extensive μ-architectural-level fault injection experiments show that the NEMESIS transformation is able to detect all soft errors and recover from 97% of detected errors.
KW - Compiler Optimization
KW - Reliability
KW - Silent Data Corruption
KW - Soft Errors
UR - http://www.scopus.com/inward/record.url?scp=85030676683&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85030676683&partnerID=8YFLogxK
U2 - 10.1109/ICCAD.2017.8203792
DO - 10.1109/ICCAD.2017.8203792
M3 - Conference contribution
AN - SCOPUS:85030676683
T3 - IEEE/ACM International Conference on Computer-Aided Design, Digest of Technical Papers, ICCAD
SP - 297
EP - 304
BT - 2017 IEEE/ACM International Conference on Computer-Aided Design, ICCAD 2017
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 13 November 2017 through 16 November 2017
ER -