A Compiler Technique for Processor-Wide Protection From Soft Errors in Multithreaded Environments

Moslem Didehban, Aviral Shrivastava

Research output: Contribution to journal (Article)

2 Citations (Scopus)

Abstract

Aggressive transistor scaling and near-threshold computing have rendered modern microprocessors susceptible to soft errors. Software approaches that protect computations against soft errors are desirable because they offer flexible protection and are suitable for mixed-criticality systems. In particular, fine-grained instruction-duplication techniques are deemed the most effective; however, many existing instruction-duplication techniques either suffer from many vulnerable intervals or are not suitable for multithreaded environments. In this paper, we present multithreaded near zero silent data corruption (MZDC), a software scheme that provides high-level, processor-wide error coverage in multithreaded environments. MZDC duplicates all of a program's instructions and uses diagnosis blocks after replicated memory operations to overcome the inconsistency issue in a multithreaded environment. Statistical fault injection experiments on a dual-core ARM Cortex-A53 microarchitecturally simulated microprocessor show that, on average, MZDC achieves more than 37× better fault coverage than the state of the art.
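To make the idea of instruction duplication with checks around memory operations concrete, here is a minimal Python sketch. It is not the MZDC transformation itself, which operates on compiled code and is not detailed in this abstract. All names (`duplicated_load`, `duplicated_add`, `checked_store`, `flip_bit`, `run_demo`) are hypothetical, and the retry loop in `duplicated_load` is only an assumed stand-in for a diagnosis step: in a multithreaded program two redundant loads may differ because another thread wrote in between, so a mismatch alone does not prove a fault.

```python
class SoftErrorDetected(Exception):
    """Raised when redundant copies of a value disagree."""


def duplicated_load(memory, addr, max_retries=3):
    """Load memory[addr] into a master copy and a shadow copy.

    Hypothetical diagnosis step (an assumption, not the paper's design):
    re-read until both copies agree or the retry budget runs out.
    """
    for _ in range(max_retries):
        master = memory[addr]
        shadow = memory[addr]
        if master == shadow:
            return master, shadow
    raise SoftErrorDetected(f"load from address {addr} never became consistent")


def duplicated_add(a, b):
    """Add two (master, shadow) pairs, computing each stream independently."""
    return a[0] + b[0], a[1] + b[1]


def checked_store(memory, addr, value):
    """Compare the two streams before the value is written to memory."""
    master, shadow = value
    if master != shadow:
        raise SoftErrorDetected(f"streams disagree before store to {addr}")
    memory[addr] = master


def flip_bit(value, bit):
    """Model a transient single-bit upset in one register."""
    return value ^ (1 << bit)


def run_demo():
    memory = {0: 2, 1: 3}
    total = duplicated_add(duplicated_load(memory, 0), duplicated_load(memory, 1))
    checked_store(memory, 2, total)  # fault-free path: both streams agree

    # Inject an upset into the master stream only, then attempt a store.
    corrupted = (flip_bit(total[0], 2), total[1])
    try:
        checked_store(memory, 3, corrupted)
        detected = False
    except SoftErrorDetected:
        detected = True
    return memory, detected
```

Calling `run_demo()` returns the final memory contents and a flag indicating whether the injected bit flip was caught before it reached memory. The sketch checks only at stores for brevity; where checks are placed is a design choice this abstract does not specify.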

Original language: English (US)
Journal: IEEE Transactions on Reliability
DOI: 10.1109/TR.2018.2793098
State: Accepted/In press - Feb 9 2018

Fingerprint

Microprocessor chips
Transistors
Data storage equipment
Experiments

Keywords

  • Compiler transformation
  • Hardware
  • Instruction sets
  • Microprocessors
  • multithreading
  • Registers
  • reliability
  • soft errors
  • Transient analysis
  • transient faults

ASJC Scopus subject areas

  • Safety, Risk, Reliability and Quality
  • Electrical and Electronic Engineering

Cite this

@article{6442a67289834867b87e151bfa77cf16,
title = "A Compiler Technique for Processor-Wide Protection From Soft Errors in Multithreaded Environments",
keywords = "Compiler transformation, Hardware, Instruction sets, Microprocessors, multithreading, Registers, reliability, soft errors, Transient analysis, transient faults",
author = "Moslem Didehban and Aviral Shrivastava",
year = "2018",
month = "2",
day = "9",
doi = "10.1109/TR.2018.2793098",
language = "English (US)",
journal = "IEEE Transactions on Reliability",
issn = "0018-9529",
publisher = "Institute of Electrical and Electronics Engineers Inc.",

}

TY - JOUR

T1 - A Compiler Technique for Processor-Wide Protection From Soft Errors in Multithreaded Environments

AU - Didehban, Moslem

AU - Shrivastava, Aviral

PY - 2018/2/9

Y1 - 2018/2/9

KW - Compiler transformation

KW - Hardware

KW - Instruction sets

KW - Microprocessors

KW - multithreading

KW - Registers

KW - reliability

KW - soft errors

KW - Transient analysis

KW - transient faults

UR - http://www.scopus.com/inward/record.url?scp=85041823284&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85041823284&partnerID=8YFLogxK

U2 - 10.1109/TR.2018.2793098

DO - 10.1109/TR.2018.2793098

M3 - Article

JO - IEEE Transactions on Reliability

JF - IEEE Transactions on Reliability

SN - 0018-9529

ER -