The interval between the occurrence of a fault and the detection of the error it causes is split in two by the moment the error is generated: fault latency (from fault occurrence to error generation) and error latency (from error generation to error detection). Because the moment of error generation is not directly observable, previous work has dealt only with the sum of the fault and error latencies, which makes it impossible to analyze their separate effects. To remedy this deficiency, the authors 1) present a new methodology for indirectly measuring fault latency, 2) derive the distribution of fault latency from the methodology, and 3) apply the knowledge of fault latency to the analysis of two important examples. The proposed methodology has been implemented to measure fault latency in the Fault-Tolerant Multiprocessor (FTMP) at the NASA Airlab. The experimental results show wide variations in the mean fault latencies of different function circuits within FTMP. The measured distributions of fault latency are also shown to have monotone hazard rates. Consequently, the Gamma and Weibull distributions are selected as candidate distributions of fault latency and fitted to the measurements by least squares.
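As a brief reminder of why monotone hazard rates point to these two families, the standard textbook definitions can be written as follows. The symbols $T$, $f$, $F$, $h$, $k$, $\lambda$, and $\alpha$ are assumptions chosen for illustration and are not notation taken from the paper.

```latex
% Hazard rate of a latency T with density f and distribution function F
h(t) = \frac{f(t)}{1 - F(t)}, \qquad t > 0 .

% Weibull distribution with shape k > 0 and scale \lambda > 0
F(t) = 1 - \exp\!\left[-\left(\frac{t}{\lambda}\right)^{k}\right],
\qquad
h(t) = \frac{k}{\lambda}\left(\frac{t}{\lambda}\right)^{k-1} .

% h(t) is increasing for k > 1, constant for k = 1, and decreasing for k < 1.
% The Gamma distribution with shape \alpha behaves the same way:
% increasing hazard for \alpha > 1, constant for \alpha = 1, decreasing for \alpha < 1.
```

In both families the hazard rate is monotone for every value of the shape parameter, and a single parameter determines whether it rises or falls, which makes them natural candidates when the measured hazard rate is monotone.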
ASJC Scopus subject areas
- Theoretical Computer Science
- Hardware and Architecture
- Computational Theory and Mathematics