A hierarchical view of fault-tolerant distributed computers is presented, in which a distributed computing system is viewed as composed of interconnected, interacting functional modules. Each module, modeled by a directed state graph, is governed by internal random failure events and counteracting recovery processes, as well as by random external events coupled in from other modules. It is shown that, under certain assumptions, the system is governed by a multidimensional Markov process whose components are non-Markov module processes. The mathematical properties of this model are formally analyzed. Performance measures are derived from the steady-state distribution and the visitation rate of each system and module state. A numerical example illustrates the model's practical application, and the results are shown to closely fit actual statistical data collected on an AT&T Bell Laboratories Electronic Switching System.
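To make the two performance measures concrete, the sketch below computes them for a single module treated as a plain continuous-time Markov chain. This is a simplification: the model above has non-Markov module processes inside a multidimensional Markov process, which this sketch does not reproduce. The three states (up, degraded, down), the transition rates, and the function names `steady_state` and `visitation_rates` are hypothetical. The steady-state distribution solves πQ = 0 with Σπᵢ = 1, and the visitation rate is taken here to mean the long-run number of entries into a state per unit time, which in steady state equals πᵢqᵢ, where qᵢ = −Qᵢᵢ is the total outflow rate of state i.

```python
from fractions import Fraction


def steady_state(Q):
    """Solve pi Q = 0 with sum(pi) = 1 for an irreducible CTMC generator Q.

    Uses Gauss-Jordan elimination on exact fractions. One balance equation is
    redundant (the rows of Q sum to zero), so it is replaced by normalization.
    """
    n = len(Q)
    # Transpose so the unknowns pi form a column vector: Q^T pi^T = 0.
    A = [[Fraction(Q[j][i]) for j in range(n)] for i in range(n)]
    b = [Fraction(0)] * n
    # Replace the last (redundant) equation with sum(pi) = 1.
    A[-1] = [Fraction(1)] * n
    b[-1] = Fraction(1)
    for col in range(n):
        # Swap in a row with a nonzero entry in this column.
        pivot = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        # Clear this column in every other row.
        for r in range(n):
            if r != col and A[r][col] != 0:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * c for a, c in zip(A[r], A[col])]
                b[r] -= f * b[col]
    # A is now diagonal.
    return [b[i] / A[i][i] for i in range(n)]


def visitation_rates(Q, pi):
    """Long-run entries per unit time into each state: pi_i * q_i, q_i = -Q[i][i]."""
    return [p * -Fraction(Q[i][i]) for i, p in enumerate(pi)]


# Hypothetical module: 0 = up, 1 = degraded, 2 = down.
# up -> degraded at rate 2; degraded -> up (recovery) at rate 6;
# degraded -> down at rate 1; down -> up (repair) at rate 3.
Q = [
    [-2, 2, 0],
    [6, -7, 1],
    [3, 0, -3],
]

pi = steady_state(Q)
nu = visitation_rates(Q, pi)
print(pi)  # fractions 21/29, 6/29, 2/29
print(nu)  # fractions 42/29, 42/29, 6/29
```

Exact `Fraction` arithmetic keeps this small example free of round-off. A model with a large product state space, such as the multidimensional process described above, would more likely call for a sparse numerical solver.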