A mechanism for online diagnosis of hard faults in microprocessors

Fred A. Bower, Daniel J. Sorin, Sule Ozev

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

60 Citations (Scopus)

Abstract

We develop a microprocessor design that tolerates hard faults, including fabrication defects and in-field faults, by leveraging existing microprocessor redundancy. To do this, we must detect and correct errors, diagnose hard faults at the granularity of field deconfigurable units (FDUs), and deconfigure FDUs with hard faults. In our reliable microprocessor design, we use DIVA dynamic verification to detect and correct errors. Our new scheme for diagnosing hard faults tracks each instruction's occupancy of core structures from decode until commit. If a DIVA checker detects an error in an instruction, it increments a small saturating error counter for every FDU used by that instruction, including that DIVA checker. A hard fault in an FDU quickly drives that FDU's error counter above a threshold, which diagnoses the fault. For deconfiguration, we use previously developed schemes for functional units and buffers, and we present a scheme for deconfiguring DIVA checkers. Experimental results show that our reliable microprocessor quickly and accurately diagnoses each injected hard fault and continues to function, albeit with somewhat degraded performance.
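The diagnosis step summarized in the abstract can be modeled in a few lines. The following is a minimal Python sketch, not the authors' hardware design: the FDU names (alu0, alu1, checker0, checker1), the 4-bit counter width, the threshold of 7, and the round-robin steering of instructions to ALUs and DIVA checkers are all hypothetical choices made for illustration. Each time a checker flags an instruction, every FDU that instruction occupied, including the checker itself, has its saturating counter incremented, and an FDU is diagnosed once its counter rises above the threshold.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FaultDiagnoser:
    """Per-FDU saturating error counters (illustrative model, not the paper's RTL)."""

    counter_bits: int = 4  # hypothetical counter width; saturates at 2**bits - 1
    threshold: int = 7     # hypothetical threshold; diagnose when count > threshold
    counters: dict[str, int] = field(default_factory=dict)
    diagnosed: set[str] = field(default_factory=set)

    def __post_init__(self) -> None:
        self.max_count = (1 << self.counter_bits) - 1
        # The counter must be able to exceed the threshold, or nothing is ever diagnosed.
        if not 0 <= self.threshold < self.max_count:
            raise ValueError("threshold must be below the counter's saturation value")

    def record_error(self, fdus_used: list[str]) -> set[str]:
        """Handle one checker-detected error.

        fdus_used lists every FDU the instruction occupied between decode and
        commit, including the DIVA checker that caught the error. Returns the
        FDUs whose counters crossed the threshold on this error.
        """
        newly_diagnosed: set[str] = set()
        for fdu in fdus_used:
            count = min(self.counters.get(fdu, 0) + 1, self.max_count)  # saturate
            self.counters[fdu] = count
            if count > self.threshold and fdu not in self.diagnosed:
                self.diagnosed.add(fdu)
                newly_diagnosed.add(fdu)
        return newly_diagnosed


def simulate(num_instrs: int, faulty_fdu: str,
             diag: FaultDiagnoser) -> tuple[int, set[str]] | None:
    """Toy workload: every instruction that touches faulty_fdu is flagged as an error."""
    alus = ["alu0", "alu1"]
    checkers = ["checker0", "checker1"]
    for i in range(num_instrs):
        used = [alus[i % 2], checkers[(i // 2) % 2]]  # hypothetical steering
        if faulty_fdu in used:
            newly = diag.record_error(used)
            if newly:
                return i, newly  # index of the instruction that triggered diagnosis
    return None


if __name__ == "__main__":
    print(simulate(100, "alu1", FaultDiagnoser()))  # prints (15, {'alu1'})
```

In this toy run the faulty ALU is the only FDU common to every flagged instruction, so its counter reaches 8 while each checker's counter is only at 4; this is the intuition behind diagnosing by shared occupancy. The sketch does not model clearing counters after transient errors or the subsequent deconfiguration of the diagnosed FDU, since the abstract does not specify those details.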

Original language: English (US)
Title of host publication: Proceedings of the Annual International Symposium on Microarchitecture, MICRO
Pages: 197-208
Number of pages: 12
DOI: https://doi.org/10.1109/MICRO.2005.8
State: Published - 2005
Externally published: Yes
Event: MICRO-38: 38th Annual IEEE/ACM International Symposium on Microarchitecture - Barcelona, Spain
Duration: Nov 12 2005 - Nov 16 2005


Fingerprint

Microprocessor chips
Redundancy
Fabrication
Defects

ASJC Scopus subject areas

  • Engineering(all)

Cite this

Bower, F. A., Sorin, D. J., & Ozev, S. (2005). A mechanism for online diagnosis of hard faults in microprocessors. In Proceedings of the Annual International Symposium on Microarchitecture, MICRO (pp. 197-208). [1540960] https://doi.org/10.1109/MICRO.2005.8

@inproceedings{70638fba1be3452185f321f1e1cee2b1,
title = "A mechanism for online diagnosis of hard faults in microprocessors",
author = "Bower, {Fred A.} and Sorin, {Daniel J.} and Sule Ozev",
year = "2005",
doi = "10.1109/MICRO.2005.8",
language = "English (US)",
isbn = "0769524400",
pages = "197--208",
booktitle = "Proceedings of the Annual International Symposium on Microarchitecture, MICRO",

}

TY - GEN

T1 - A mechanism for online diagnosis of hard faults in microprocessors

AU - Bower, Fred A.

AU - Sorin, Daniel J.

AU - Ozev, Sule

PY - 2005

Y1 - 2005

UR - http://www.scopus.com/inward/record.url?scp=33749413197&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=33749413197&partnerID=8YFLogxK

U2 - 10.1109/MICRO.2005.8

DO - 10.1109/MICRO.2005.8

M3 - Conference contribution

AN - SCOPUS:33749413197

SN - 0769524400

SN - 9780769524405

SP - 197

EP - 208

BT - Proceedings of the Annual International Symposium on Microarchitecture, MICRO

ER -