Root cause analysis of soft-error-induced failures from hardware and software perspectives

Jinhyo Jung, Yohan Ko, Hwisoo So, Kyoungwoo Lee, Aviral Shrivastava

Research output: Contribution to journal › Article › peer-review

Abstract

Because the dangers of soft errors are increasing with continued technology scaling, reliability against soft errors is becoming an important design concern for modern embedded systems. Various schemes have been proposed to protect embedded systems from the threat of soft errors, but they incur considerable overheads in terms of cost and performance. Selective protection techniques seem promising because they can achieve high levels of protection with low overhead. Though these techniques can be applied to any system, the most vulnerable parts must first be identified. We, therefore, present CFA, a comprehensive failure analysis framework that can analyze the vulnerability of microarchitectural components and software instructions through intensive fault injection campaigns. With CFA, we also explore the vulnerability of ten benchmarks from the MiBench benchmark suite. We found that protecting a part of the system heavily affects the reliability of the other parts. Therefore, all combinations of protection methods must be examined to present the most efficient and effective protection guidelines. Throughout the experiments, we observed that protection methods offered by single-perspective analyses are sub-optimal. On the other hand, CFA finds the optimal solution in every case, reducing the AVF of a system by up to 82% with minimal protection.

Original language: English (US)
Article number: 102652
Journal: Journal of Systems Architecture
Volume: 130
State: Published - Sep 2022

Keywords

  • Failure analysis
  • Fault injection
  • Reliability
  • Soft error
  • Transient fault

ASJC Scopus subject areas

  • Software
  • Hardware and Architecture
