Abstract
Unreliable hardware components will affect computing systems at several levels: from incorrect transistor outputs, to incorrect values in memory elements, to incorrect program variables and control flow, and finally to application failure. Resilience is the ability of a system to tolerate errors when they occur, and it comprises two main aspects: (i) how to detect errors and (ii) how to recover from them. The lower the level of abstraction at which an error is detected and corrected, the less disruption it causes to the upper layers of computing abstraction. This chapter gives an overview of techniques at the processor architecture level for detecting and correcting errors.
Original language | English (US) |
---|---|
Title of host publication | Cross-Layer Reliability of Computing Systems |
Publisher | Institution of Engineering and Technology |
Pages | 43-94 |
Number of pages | 52 |
ISBN (Electronic) | 9781785617973 |
State | Published - Jan 1 2020 |
Keywords
- Computing abstraction
- Computing system
- Error correction codes
- Error detection codes
- Incorrect transistor outputs
- Memory elements
- Processor architecture
- Resilience
- Unreliable hardware components
ASJC Scopus subject areas
- Engineering(all)