Autonomic microprocessor execution via self-repairing arrays

Fred A. Bower; Sule Ozev; Daniel J. Sorin

doi:10.1109/TDSC.2005.44

Research output: Contribution to journal › Article › peer-review

Abstract

To achieve high reliability despite hard faults that occur during operation and to achieve high yield despite defects introduced at fabrication, a microprocessor must be able to tolerate hard faults. In this paper, we present a framework for autonomic self-repair of the array structures in microprocessors (e.g., reorder buffer, instruction window, etc.). The framework consists of three aspects: 1) detecting/diagnosing the fault, 2) recovering from the resultant error, and 3) mapping out the faulty portion of the array. For each aspect, we present design options. Based on this framework, we develop two particular schemes for self-repairing array structures (SRAS). Simulation results show that one of our SRAS schemes adds some performance overhead in the fault-free case, but that both of them mask hard faults 1) with less hardware overhead cost than higher-level redundancy (e.g., IBM mainframes) and 2) without the per-error performance penalty of existing low-cost techniques that combine error detection with pipeline flushes for backward error recovery (BER). When hard faults are present in arrays, due to operational faults or fabrication defects, SRAS schemes outperform BER due to not having to frequently flush the pipeline.
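The three aspects named in the abstract (detect/diagnose, recover, map out) can be illustrated with a toy software model. This is only a sketch under stated assumptions, not the SRAS hardware design from the paper: the class name, the write-then-compare check, the per-entry error counter, and the threshold of 2 are all hypothetical choices made for illustration.

```python
class SelfRepairingArray:
    """Toy model (hypothetical): an array that maps out hard-faulty entries."""

    def __init__(self, size, fault_threshold=2):
        self.cells = [None] * size
        self.stuck = {}                    # injected hard faults: index -> stuck value
        self.error_counts = [0] * size     # diagnosis: errors observed per entry
        self.mapped_out = [False] * size   # repair: entries excluded from use
        self.fault_threshold = fault_threshold  # assumed value, not from the paper

    def inject_hard_fault(self, index, stuck_value):
        """Simulate a hard fault: the entry always holds stuck_value."""
        self.stuck[index] = stuck_value

    def usable_entries(self):
        """Indices that have not been mapped out."""
        return [i for i, bad in enumerate(self.mapped_out) if not bad]

    def write_checked(self, index, value):
        """Write, then detect an error by comparing against the intended value."""
        self.cells[index] = self.stuck.get(index, value)
        if self.cells[index] == value:
            return True
        # Diagnose: repeated errors at the same entry suggest a hard fault.
        self.error_counts[index] += 1
        if self.error_counts[index] >= self.fault_threshold:
            self.mapped_out[index] = True  # map out the faulty entry
        return False

    def store(self, value):
        """Recover from a detected error by retrying on the next usable entry."""
        for index in self.usable_entries():
            if self.write_checked(index, value):
                return index
        raise RuntimeError("no fault-free entry available")


arr = SelfRepairingArray(4)
arr.inject_hard_fault(0, stuck_value=-1)
arr.store(7)   # error at entry 0 is detected, write is retried on entry 1
arr.store(8)   # second error at entry 0 crosses the threshold
print(arr.usable_entries())  # entry 0 is no longer offered for use
```

Once entry 0 is mapped out, later writes never touch it, so they pay no per-error recovery cost. This loosely mirrors the abstract's contrast between self-repair and schemes that flush the pipeline on every error.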

Original language	English (US)
Pages (from-to)	297-310
Number of pages	14
Journal	IEEE Transactions on Dependable and Secure Computing
Volume	2
Issue number	4
DOIs	https://doi.org/10.1109/TDSC.2005.44
State	Published - 2005
Externally published	Yes

Keywords

Logic design reliability and testing
Microcomputers
Microprocessors

ASJC Scopus subject areas

Electrical and Electronic Engineering
General Computer Science

Cite this

@article{718e022e35b34769bc226a8f021573c1,

title = "Autonomic microprocessor execution via self-repairing arrays",

abstract = "To achieve high reliability despite hard faults that occur during operation and to achieve high yield despite defects introduced at fabrication, a microprocessor must be able to tolerate hard faults. In this paper, we present a framework for autonomic self-repair of the array structures in microprocessors (e.g., reorder buffer, instruction window, etc.). The framework consists of three aspects: 1) detecting/diagnosing the fault, 2) recovering from the resultant error, and 3) mapping out the faulty portion of the array. For each aspect, we present design options. Based on this framework, we develop two particular schemes for self-repairing array structures (SRAS). Simulation results show that one of our SRAS schemes adds some performance overhead in the fault-free case, but that both of them mask hard faults 1) with less hardware overhead cost than higher-level redundancy (e.g., IBM mainframes) and 2) without the per-error performance penalty of existing low-cost techniques that combine error detection with pipeline flushes for backward error recovery (BER). When hard faults are present in arrays, due to operational faults or fabrication defects, SRAS schemes outperform BER due to not having to frequently flush the pipeline.",

keywords = "Logic design reliability and testing, Microcomputers, Microprocessors",

author = "Bower, {Fred A.} and Ozev, Sule and Sorin, {Daniel J.}",

note = "Funding Information: This material is based on work supported by the US National Science Foundation under grants CCR-0309164, CCF-0444516, and EIA-9972879, the National Aeronautics and Space Administration under grant NNG04GQ06G, Intel Corporation, IBM, and a Duke Warren Faculty Scholarship. The authors thank Alvy Lebeck, Tong Li, Pete Marinos, and Ismet Bayraktaroglu for their insightful comments and criticisms of this work. This paper partially includes research that appeared in the Proceedings of the 2004 International Conference on Dependable Systems and Networks [8].",

year = "2005",

doi = "10.1109/TDSC.2005.44",

language = "English (US)",

volume = "2",

pages = "297--310",

journal = "IEEE Transactions on Dependable and Secure Computing",

issn = "1545-5971",

publisher = "Institute of Electrical and Electronics Engineers Inc.",

number = "4",

}

TY - JOUR

T1 - Autonomic microprocessor execution via self-repairing arrays

AU - Bower, Fred A.

AU - Ozev, Sule

AU - Sorin, Daniel J.

N1 - Funding Information: This material is based on work supported by the US National Science Foundation under grants CCR-0309164, CCF-0444516, and EIA-9972879, the National Aeronautics and Space Administration under grant NNG04GQ06G, Intel Corporation, IBM, and a Duke Warren Faculty Scholarship. The authors thank Alvy Lebeck, Tong Li, Pete Marinos, and Ismet Bayraktaroglu for their insightful comments and criticisms of this work. This paper partially includes research that appeared in the Proceedings of the 2004 International Conference on Dependable Systems and Networks [8].

PY - 2005

Y1 - 2005

AB - To achieve high reliability despite hard faults that occur during operation and to achieve high yield despite defects introduced at fabrication, a microprocessor must be able to tolerate hard faults. In this paper, we present a framework for autonomic self-repair of the array structures in microprocessors (e.g., reorder buffer, instruction window, etc.). The framework consists of three aspects: 1) detecting/diagnosing the fault, 2) recovering from the resultant error, and 3) mapping out the faulty portion of the array. For each aspect, we present design options. Based on this framework, we develop two particular schemes for self-repairing array structures (SRAS). Simulation results show that one of our SRAS schemes adds some performance overhead in the fault-free case, but that both of them mask hard faults 1) with less hardware overhead cost than higher-level redundancy (e.g., IBM mainframes) and 2) without the per-error performance penalty of existing low-cost techniques that combine error detection with pipeline flushes for backward error recovery (BER). When hard faults are present in arrays, due to operational faults or fabrication defects, SRAS schemes outperform BER due to not having to frequently flush the pipeline.

KW - Logic design reliability and testing

KW - Microcomputers

KW - Microprocessors

UR - http://www.scopus.com/inward/record.url?scp=30344441095&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=30344441095&partnerID=8YFLogxK

U2 - 10.1109/TDSC.2005.44

DO - 10.1109/TDSC.2005.44

M3 - Article

AN - SCOPUS:30344441095

SN - 1545-5971

VL - 2

SP - 297

EP - 310

JO - IEEE Transactions on Dependable and Secure Computing

JF - IEEE Transactions on Dependable and Secure Computing

IS - 4

ER -
