Simulating crash failures with many faulty processors

Rida Bazzi, Gil Neiger

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

11 Scopus citations

Abstract

The difficulty of designing fault-tolerant distributed algorithms increases with the severity of failures that an algorithm must tolerate. This paper considers methods that automatically translate algorithms tolerant of simple crash failures into ones tolerant of more severe omission failures. These translations simplify the design task by allowing algorithm designers to assume that processors fail only by stopping. Earlier results had suggested that these translations must, in general, have limited fault-tolerance: that crash failures could not be simulated unless a majority of processors remained correct throughout any execution. We show that this limitation does not apply when considering a broad range of distributed computing problems that includes most classical problems in the field. We do this by exhibiting a hierarchy of translations, each with different fault-tolerance and complexity; for any number of possible failures, we give an appropriate translation. Each of these translations is shown to be optimal with respect to the joint measures of fault-tolerance and round-complexity (the round-complexity of a translation is the number of communication rounds that the translation uses to simulate one round of the original algorithm). That is, the hierarchy of translations is matched by a corresponding hierarchy of impossibility results. Furthermore, this hierarchy has more structure than that seen for other failure models, indicating that the relationship between crash and omission failures is more complex than had been previously thought.

Original language: English (US)
Title of host publication: Distributed Algorithms - 6th International Workshop, WDAG 1992, Proceedings
Editors: Adrian Segall, Shmuel Zaks
Publisher: Springer Verlag
Pages: 166-184
Number of pages: 19
ISBN (Print): 9783540561880
State: Published - 1992
Externally published: Yes
Event: 6th International Workshop on Distributed Algorithms, WDAG 1992 - Haifa, Israel
Duration: Nov 2 1992 to Nov 4 1992

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 647 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 6th International Workshop on Distributed Algorithms, WDAG 1992
Country/Territory: Israel
City: Haifa
Period: 11/2/92 to 11/4/92

ASJC Scopus subject areas

  • Theoretical Computer Science
  • General Computer Science
