TY - GEN
T1 - Comprehensive Failure Analysis against Soft Errors from Hardware and Software Perspectives
AU - Ko, Yohan
AU - So, Hwisoo
AU - Jung, Jinhyo
AU - Lee, Kyoungwoo
AU - Shrivastava, Aviral
N1 - Funding Information:
This work was partially supported by funding from National Science Foundation Grants No. CNS 1525855, CPS 1646235, CCF 1723476 - the NSF/Intel joint research center for Computer Assisted Programming for Heterogeneous Architectures (CAPA), 2014-3-00035 (High Performance and Scalable Manycore Operating System, IITP, MSIT), and Samsung Electronics Co., Ltd. (FOUNDRY-202108DD007F).
Publisher Copyright:
© 2021 IEEE.
PY - 2021
Y1 - 2021
N2 - With technology scaling, reliability against soft errors is becoming an important design concern for modern embedded systems. To avoid the high cost and performance overheads of full protection techniques, several studies have turned to selective protection techniques. This increases the need to accurately identify the most vulnerable components or instructions in a system. In this paper, we analyze the vulnerability of a system from both the hardware and software perspectives through intensive fault injection trials. From the hardware perspective, we find the most vulnerable hardware components by calculating component-wise failure rates. From the software perspective, we identify the most vulnerable instructions using a novel root-cause instruction analysis. Our results show that minimal protection can reduce the failure rate of a system to only 12.40%.
AB - With technology scaling, reliability against soft errors is becoming an important design concern for modern embedded systems. To avoid the high cost and performance overheads of full protection techniques, several studies have turned to selective protection techniques. This increases the need to accurately identify the most vulnerable components or instructions in a system. In this paper, we analyze the vulnerability of a system from both the hardware and software perspectives through intensive fault injection trials. From the hardware perspective, we find the most vulnerable hardware components by calculating component-wise failure rates. From the software perspective, we identify the most vulnerable instructions using a novel root-cause instruction analysis. Our results show that minimal protection can reduce the failure rate of a system to only 12.40%.
KW - Failure Analysis
KW - Fault Injection
KW - Reliability
KW - Soft Error
KW - Transient Fault
UR - http://www.scopus.com/inward/record.url?scp=85123955245&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=85123955245&partnerID=8YFLogxK
U2 - 10.1109/ICCD53106.2021.00041
DO - 10.1109/ICCD53106.2021.00041
M3 - Conference contribution
AN - SCOPUS:85123955245
T3 - Proceedings - IEEE International Conference on Computer Design: VLSI in Computers and Processors
SP - 204
EP - 207
BT - Proceedings - 2021 IEEE 39th International Conference on Computer Design, ICCD 2021
PB - Institute of Electrical and Electronics Engineers Inc.
T2 - 39th IEEE International Conference on Computer Design, ICCD 2021
Y2 - 24 October 2021 through 27 October 2021
ER -