Markov reliability models of fault-tolerant distributed computing systems

M. Liron, B. Melamed, Sik-Sang Yau

Research output: Contribution to journal › Article

2 Citations (Scopus)

Abstract

A hierarchical view of fault-tolerant distributed computers is presented, viewing a distributed computing system as composed of interconnected, interacting, functional modules. Each module, modeled by a directed-state graph, is governed by internal random failure events and counteracting recovery processes, and also by coupling of external random events from other modules. It is shown that, under certain assumptions, the system is governed by a multidimensional Markov process, with non-Markov module processes as components. Mathematical properties of this model are formally analyzed. Performance measures are found from the steady-state distribution and visitation rate of each system and module state. A numerical example is presented exemplifying its practical application. The results are shown to fit very well the actual statistical data collected on an AT&T Bell Laboratories Electronic Switching System.
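
As a rough illustration of the kind of quantities the abstract refers to, the sketch below computes a steady-state distribution and per-state visitation rates for a generic continuous-time Markov chain. It is not the paper's model: the two-state up/down module, the failure rate `lam`, the repair rate `mu`, and the helper names `steady_state` and `visitation_rates` are all assumptions chosen for the example. Here the visitation rate of a state is taken to be the steady-state rate at which the chain enters it, which equals the rate at which it leaves.

```python
from fractions import Fraction


def steady_state(Q):
    """Solve pi * Q = 0 with sum(pi) = 1 for a small irreducible generator Q."""
    n = len(Q)
    # Work with the transposed system Q^T x = 0, replacing the last
    # (redundant) balance equation by the normalization sum(x) = 1.
    A = [[Fraction(Q[j][i]) for j in range(n)] for i in range(n)]
    A[-1] = [Fraction(1)] * n
    b = [Fraction(0)] * (n - 1) + [Fraction(1)]
    # Gauss-Jordan elimination in exact rational arithmetic.
    for col in range(n):
        pivot = next(r for r in range(col, n) if A[r][col] != 0)
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(n):
            if r != col and A[r][col] != 0:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * c for a, c in zip(A[r], A[col])]
                b[r] -= f * b[col]
    return [b[i] / A[i][i] for i in range(n)]


def visitation_rates(Q, pi):
    """Steady-state entry rate of each state: pi_i times its total exit rate."""
    return [pi[i] * -Fraction(Q[i][i]) for i in range(len(Q))]


# Hypothetical module: state 0 = up, state 1 = down (rates per hour, assumed).
lam = Fraction(1, 1000)  # assumed failure rate
mu = Fraction(1, 10)     # assumed recovery rate
Q = [[-lam, lam],
     [mu, -mu]]

pi = steady_state(Q)
rates = visitation_rates(Q, pi)
print("steady state:", pi)
print("visitation rates:", rates)
```

With these assumed rates the module is up a fraction 100/101 of the time, and both states are visited at the same rate, as expected for a two-state chain that must alternate between them.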

Original language: English (US)
Pages (from-to): 183-206
Number of pages: 24
Journal: Information Sciences
Volume: 40
Issue number: 3
DOI: 10.1016/0020-0255(86)90057-5
State: Published - Dec 31 1986
Externally published: Yes

Fingerprint

Switching systems
Distributed computer systems
Distributed Computing
Fault-tolerant
Markov processes
Computer systems
Recovery
Module
Switching Systems
Model
Steady-state Distribution
Performance Measures
Markov Process
Distributed computing
Fault
Electronics
Internal
Numerical Examples
Graph in graph theory

ASJC Scopus subject areas

  • Artificial Intelligence
  • Computer Science Applications
  • Information Systems
  • Information Systems and Management
  • Statistics, Probability and Uncertainty
  • Electrical and Electronic Engineering
  • Statistics and Probability

Cite this

Markov reliability models of fault-tolerant distributed computing systems. / Liron, M.; Melamed, B.; Yau, Sik-Sang.

In: Information Sciences, Vol. 40, No. 3, 31.12.1986, p. 183-206.

@article{c2a906d43aa74684a40b609780a02838,
title = "Markov reliability models of fault-tolerant distributed computing systems",
abstract = "A hierarchical view of fault-tolerant distributed computers is presented, viewing a distributed computing system as composed of interconnected, interacting, functional modules. Each module, modeled by a directed-state graph, is governed by internal random failure events and counteracting recovery processes, and also by coupling of external random events from other modules. It is shown that, under certain assumptions, the system is governed by a multidimensional Markov process, with non-Markov module processes as components. Mathematical properties of this model are formally analyzed. Performance measures are found from the steady-state distribution and visitation rate of each system and module state. A numerical example is presented exemplifying its practical application. The results are shown to fit very well the actual statistical data collected on an AT\&T Bell Laboratories Electronic Switching System.",
author = "M. Liron and B. Melamed and Sik-Sang Yau",
year = "1986",
month = "12",
day = "31",
doi = "10.1016/0020-0255(86)90057-5",
language = "English (US)",
volume = "40",
pages = "183--206",
journal = "Information Sciences",
issn = "0020-0255",
publisher = "Elsevier Inc.",
number = "3",
}

TY  - JOUR
T1  - Markov reliability models of fault-tolerant distributed computing systems
AU  - Liron, M.
AU  - Melamed, B.
AU  - Yau, Sik-Sang
PY  - 1986/12/31
Y1  - 1986/12/31
N2  - A hierarchical view of fault-tolerant distributed computers is presented, viewing a distributed computing system as composed of interconnected, interacting, functional modules. Each module, modeled by a directed-state graph, is governed by internal random failure events and counteracting recovery processes, and also by coupling of external random events from other modules. It is shown that, under certain assumptions, the system is governed by a multidimensional Markov process, with non-Markov module processes as components. Mathematical properties of this model are formally analyzed. Performance measures are found from the steady-state distribution and visitation rate of each system and module state. A numerical example is presented exemplifying its practical application. The results are shown to fit very well the actual statistical data collected on an AT&T Bell Laboratories Electronic Switching System.
AB  - A hierarchical view of fault-tolerant distributed computers is presented, viewing a distributed computing system as composed of interconnected, interacting, functional modules. Each module, modeled by a directed-state graph, is governed by internal random failure events and counteracting recovery processes, and also by coupling of external random events from other modules. It is shown that, under certain assumptions, the system is governed by a multidimensional Markov process, with non-Markov module processes as components. Mathematical properties of this model are formally analyzed. Performance measures are found from the steady-state distribution and visitation rate of each system and module state. A numerical example is presented exemplifying its practical application. The results are shown to fit very well the actual statistical data collected on an AT&T Bell Laboratories Electronic Switching System.
UR  - http://www.scopus.com/inward/record.url?scp=0022952824&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=0022952824&partnerID=8YFLogxK
U2  - 10.1016/0020-0255(86)90057-5
DO  - 10.1016/0020-0255(86)90057-5
M3  - Article
AN  - SCOPUS:0022952824
VL  - 40
SP  - 183
EP  - 206
JO  - Information Sciences
JF  - Information Sciences
SN  - 0020-0255
IS  - 3
ER  -