Markov reliability models of fault-tolerant distributed computing systems

M. Liron, B. Melamed, S. S. Yau

Research output: Contribution to journal › Article

2 Scopus citations

Abstract

A hierarchical view of fault-tolerant distributed computers is presented, in which a distributed computing system is composed of interconnected, interacting, functional modules. Each module, modeled by a directed-state graph, is governed by internal random failure events and counteracting recovery processes, and also by coupling of external random events from other modules. It is shown that, under certain assumptions, the system is governed by a multidimensional Markov process, with non-Markov module processes as components. Mathematical properties of this model are formally analyzed. Performance measures are found from the steady-state distribution and visitation rate of each system and module state. A numerical example illustrates the model's practical application. The results are shown to fit very well with actual statistical data collected on an AT&T Bell Laboratories Electronic Switching System.
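As a rough illustration of how performance measures can be read off a steady-state distribution, the sketch below solves the balance equations of a small continuous-time Markov chain in pure Python. It is not the paper's multidimensional model: the single two-state (up/down) module, the failure rate `lam`, the recovery rate `mu`, and the function names are all hypothetical. "Visitation rate" is taken here to mean the long-run rate of entries into a state, which is an assumption about the paper's definition.

```python
from typing import List


def steady_state(Q: List[List[float]]) -> List[float]:
    """Solve pi Q = 0 with sum(pi) = 1 for an irreducible CTMC generator Q."""
    n = len(Q)
    # Transpose Q so the unknowns form a column vector: Q^T pi = 0.
    A = [[Q[j][i] for j in range(n)] for i in range(n)]
    b = [0.0] * n
    # One balance equation is redundant; replace it with normalization.
    A[n - 1] = [1.0] * n
    b[n - 1] = 1.0
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n):
                A[r][c] -= f * A[col][c]
            b[r] -= f * b[col]
    # Back substitution.
    pi = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(A[r][c] * pi[c] for c in range(r + 1, n))
        pi[r] = (b[r] - s) / A[r][r]
    return pi


def visitation_rates(Q: List[List[float]], pi: List[float]) -> List[float]:
    """Long-run rate of entries into each state: sum over j != i of pi_j * q_ji."""
    n = len(Q)
    return [sum(pi[j] * Q[j][i] for j in range(n) if j != i) for i in range(n)]


# Hypothetical module: state 0 = up, state 1 = down.
lam = 0.01  # assumed failure rate (per hour)
mu = 1.0    # assumed recovery rate (per hour)
Q = [[-lam, lam],
     [mu, -mu]]

pi = steady_state(Q)
rates = visitation_rates(Q, pi)
print("steady-state distribution:", pi)
print("visitation rates:", rates)
```

For this two-state chain the result can be checked by hand: the steady-state availability is `mu / (lam + mu)`, and in steady state the rate of entries into "down" equals the rate of entries into "up".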

Original language: English (US)
Pages (from-to): 183-206
Number of pages: 24
Journal: Information Sciences
Volume: 40
Issue number: 3
State: Published - Dec 31 1986
Externally published: Yes

ASJC Scopus subject areas

  • Software
  • Control and Systems Engineering
  • Theoretical Computer Science
  • Computer Science Applications
  • Information Systems and Management
  • Artificial Intelligence
