Error Detection Process—Model, Design, and Its Impact on Computer Performance

Kang G. Shin, Yann Hang Lee

Research output: Contribution to journal (Article)

36 Scopus citations

Abstract

Conventionally, reliability analyses either assume that a fault/error is detected immediately as it occurs, or ignore damage caused by imperfect detection mechanisms and error latency, namely, the time interval between the occurrence of an error and the detection of that error. In this paper we consider a remedy for this problem. We first propose a model to describe the entire error detection process and then apply the model to the analysis of the impact of error detection on computer performance under moderate assumptions. Error latency is used to measure the effectiveness of detection mechanisms. Due to the presence of error latency, (i) it is possible to have undetected errors at the end of process execution making the computation result unreliable, and (ii) even if all errors were detected before the completion of process, it is required to apply complicated error recovery resulting in considerable computation loss. We have used the model to (1) predict the probability of producing an unreliable result, and (2) estimate the loss of computation due to fault and/or error. The former can be used as a measure of lack of confidence in the computation results whereas the latter is important to the timing analysis, particularly for real-time computations. Various error recovery techniques and their associated overheads are considered for the estimation of the computation loss which can be used for analyzing suitability for time-critical applications. Finally, a design problem associated with the error detection process is discussed and a feasible design space is outlined.
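
As a rough illustration of how error latency leads to an unreliable result, the sketch below computes the probability that at least one error is still undetected when a process finishes. It is not the model from the paper. It assumes, purely for illustration, that errors arrive as a Poisson process with rate `error_rate`, that each error's latency is independent and exponentially distributed with rate `detection_rate`, and that the process runs for `run_time`; the function name and all three parameters are hypothetical.

```python
import math


def prob_unreliable_result(error_rate, detection_rate, run_time):
    """Probability that at least one error is still latent at process completion.

    Illustrative assumptions (not taken from the paper):
      - errors occur as a Poisson process with rate `error_rate`;
      - each error's latency is independent and exponential with rate
        `detection_rate` (mean latency = 1 / detection_rate);
      - the process executes for `run_time` time units.
    """
    if error_rate < 0 or detection_rate <= 0 or run_time < 0:
        raise ValueError("rates and run time must be non-negative, detection_rate > 0")

    # An error occurring at time t is still undetected at run_time with
    # probability exp(-detection_rate * (run_time - t)). Thinning the Poisson
    # process by this probability gives a Poisson count of latent errors with
    # mean: error_rate * (1 - exp(-detection_rate * run_time)) / detection_rate.
    expected_latent = (
        error_rate * (1.0 - math.exp(-detection_rate * run_time)) / detection_rate
    )

    # The result is unreliable if that count is at least one.
    return 1.0 - math.exp(-expected_latent)


if __name__ == "__main__":
    # Shorter mean latency (larger detection_rate) lowers the probability.
    for detection_rate in (0.1, 1.0, 10.0):
        p = prob_unreliable_result(0.01, detection_rate, 100.0)
        print(f"detection_rate={detection_rate}: P(unreliable) = {p:.6f}")
```

Under these assumptions the probability falls as the mean latency shrinks, which is consistent with the abstract's use of error latency as a measure of the effectiveness of detection mechanisms.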

Original language: English (US)
Pages (from-to): 529-540
Number of pages: 12
Journal: IEEE Transactions on Computers
Volume: C-33
Issue number: 6
State: Published - Jun 1984
Externally published: Yes

Keywords

  • Computation loss
  • diagnostics
  • error detection
  • latent errors/faults
  • recovery methods
  • unreliable results

ASJC Scopus subject areas

  • Software
  • Theoretical Computer Science
  • Hardware and Architecture
  • Computational Theory and Mathematics
