Design techniques to improve the resilience of computing systems: Architectural layer

Aviral Shrivastava, Kyoungwoo Lee, Hwisoo So, Jinhyo Jung, Prudhvi Gali

Research output: Chapter in Book/Report/Conference proceeding > Chapter

Abstract

Unreliable hardware components affect a computing system at several levels: from incorrect transistor outputs, to incorrect values in memory elements, to incorrect program variables and control flow, and finally to application failure. Resilience is the ability of a system to tolerate errors when they occur, and it comprises two main aspects: (i) how to detect errors and (ii) how to recover from them. The lower the level of abstraction at which an error can be detected and corrected, the less disruption it causes to the upper layers of the computing abstraction. This chapter gives an overview of the techniques at the processor architecture level for detecting and correcting errors.

Original language: English (US)
Title of host publication: Cross-Layer Reliability of Computing Systems
Publisher: Institution of Engineering and Technology
Pages: 43-94
Number of pages: 52
ISBN (Electronic): 9781785617973
State: Published - Jan 1 2020

Keywords

  • Computing abstraction
  • Computing system
  • Error correction codes
  • Error detection codes
  • Incorrect transistor outputs
  • Memory elements
  • Processor architecture
  • Resilience
  • Unreliable hardware components

ASJC Scopus subject areas

  • Engineering(all)
