Correlation of no trouble found errors to negative bias temperature instability

Robert LiVolsi, Kevin McCormick, Myra Torres, Jyothi Velamala, Rui Zheng, Yu Cao

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

No Trouble Found (NTF) and Cannot Duplicate (CND) errors in modern digital electronics are increasingly prevalent, occurring at a rate of 50-60% under conventional benchtop diagnostics [1]. This work correlates NTF diagnostic errors with Negative Bias Temperature Instability (NBTI), a prominent degradation mode and self-annealing mechanism in sub-100 nm CMOS technology. NBTI degradation is reproduced in laboratory experiments on 90 nm Freescale MPC7448 microprocessors. Accelerated aging via in situ thermal and voltage cycling is conducted while benchmark scripts run on the MPC7448. Observed faults include premature program termination, corruption of system services, L1 and L2 cache errors reported by the kernel, and total system failure. After 8 hours of rest, the system boots normally with no indication of degradation. Final system failure is observed after several faults. Conventional Built-In Test (BIT) fails to detect these faults when the system is rebooted. Various control tests and test profiles are used to accelerate NBTI degradation on the microprocessor samples. The challenge posed by this faulty behavior is distinguishing a healthy system from one whose degradation is leading to failure. Analysis techniques are used to show separation between healthy and degraded data, and independent NBTI research at Arizona State University is used to correlate NBTI behavior with NTF diagnostic errors.

Original language: English (US)
Title of host publication: IEEE Aerospace Conference Proceedings
DOI: https://doi.org/10.1109/AERO.2011.5747585
State: Published - 2011
Event: 2011 IEEE Aerospace Conference, AERO 2011 - Big Sky, MT, United States
Duration: Mar 5 2011 to Mar 12 2011

Fingerprint

Degradation
System failures
Microprocessors
Temperature
Microprocessor chips
Digital electronics
Corruption
Annealing
Seats
Health
CMOS
Indication
Electronic equipment
Aging of materials
Negative bias temperature instability
Cycles
Electric potential

ASJC Scopus subject areas

  • Aerospace Engineering
  • Space and Planetary Science

Cite this

LiVolsi, R., McCormick, K., Torres, M., Velamala, J., Zheng, R., & Cao, Y. (2011). Correlation of no trouble found errors to negative bias temperature instability. In IEEE Aerospace Conference Proceedings [5747585] https://doi.org/10.1109/AERO.2011.5747585

@inproceedings{ecbc3fd1de764e20a31a2811d0201231,
title = "Correlation of no trouble found errors to negative bias temperature instability",
author = "Robert LiVolsi and Kevin McCormick and Myra Torres and Jyothi Velamala and Rui Zheng and Yu Cao",
year = "2011",
doi = "10.1109/AERO.2011.5747585",
language = "English (US)",
isbn = "9781424473502",
booktitle = "IEEE Aerospace Conference Proceedings",
}
