### Abstract

The difficulty of designing fault-tolerant distributed algorithms increases with the severity of failures that an algorithm must tolerate. This paper considers methods that automatically translate algorithms tolerant of simple crash failures into ones tolerant of more severe omission failures. These translations simplify the design task by allowing algorithm designers to assume that processors fail only by stopping. Earlier results had suggested that these translations must, in general, have limited fault-tolerance: that crash failures could not be simulated unless a majority of processors remained correct throughout any execution. We show that this limitation does not apply when considering a broad range of distributed computing problems that includes most classical problems in the field. We do this by exhibiting a hierarchy of translations, each with different fault-tolerance and complexity; for any number of possible failures, we give an appropriate translation. Each of these translations is shown to be optimal with respect to the joint measures of fault-tolerance and round-complexity (the round-complexity of a translation is the number of communication rounds that the translation uses to simulate one round of the original algorithm). That is, the hierarchy of translations is matched by a corresponding hierarchy of impossibility results. Furthermore, this hierarchy has more structure than that seen for other failure models, indicating that the relationship between crash and omission failures is more complex than had been previously thought.
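The distinction the abstract draws between crash and omission failures, and its definition of round-complexity, can be illustrated with a minimal sketch. It assumes a synchronous round model and is not the paper's translation. The names `Processor`, `omit_to`, `deliver_round`, and `translated_rounds` are hypothetical, and the crash model is simplified so that a crashed processor sends nothing at all (the usual definition also allows a partial send in the round where the crash occurs).

```python
from dataclasses import dataclass, field


@dataclass
class Processor:
    pid: int
    # Crash failure (simplified): once set, the processor sends nothing.
    crashed: bool = False
    # Omission failure: receivers whose messages from this processor are lost
    # in this round. The processor itself keeps running. Hypothetical knob.
    omit_to: set = field(default_factory=set)


def deliver_round(procs, payload):
    """Run one synchronous all-to-all round.

    Returns {receiver_pid: {sender_pid: message}}.
    """
    inbox = {p.pid: {} for p in procs}
    for sender in procs:
        if sender.crashed:
            continue  # crash: silent towards every receiver
        for receiver in procs:
            if receiver.pid in sender.omit_to:
                continue  # omission: only this one message is dropped
            inbox[receiver.pid][sender.pid] = payload(sender.pid)
    return inbox


def translated_rounds(original_rounds, round_complexity):
    """Rounds used by a translated algorithm, where round_complexity is the
    number of communication rounds spent simulating one original round."""
    return original_rounds * round_complexity


procs = [Processor(0), Processor(1, crashed=True), Processor(2, omit_to={0})]
inbox = deliver_round(procs, lambda pid: f"m{pid}")
```

In this run, processor 1 is silent towards everyone, while processor 2 is silent only towards processor 0. Receivers can therefore disagree about whether processor 2 is still alive, which is the kind of inconsistency a crash-to-omission translation has to mask.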

Original language | English (US) |
---|---|
Title of host publication | Distributed Algorithms - 6th International Workshop, WDAG 1992, Proceedings |
Publisher | Springer Verlag |
Pages | 166-184 |
Number of pages | 19 |
Volume | 647 LNCS |
ISBN (Print) | 9783540561880 |
State | Published - 1992 |
Externally published | Yes |
Event | 6th International Workshop on Distributed Algorithms, WDAG 1992 - Haifa, Israel |
Duration | Nov 2 1992 → Nov 4 1992 |

### Publication series

Name | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
---|---|
Volume | 647 LNCS |
ISSN (Print) | 0302-9743 |
ISSN (Electronic) | 1611-3349 |

### Other

Other | 6th International Workshop on Distributed Algorithms, WDAG 1992 |
---|---|
Country | Israel |
City | Haifa |
Period | 11/2/92 → 11/4/92 |

### ASJC Scopus subject areas

- Theoretical Computer Science
- Computer Science (all)

### Cite this

**Simulating crash failures with many faulty processors.** / Bazzi, Rida; Neiger, Gil.

*Distributed Algorithms - 6th International Workshop, WDAG 1992, Proceedings*. Vol. 647 LNCS, Springer Verlag, 1992. pp. 166-184 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 647 LNCS).

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

TY - GEN

T1 - Simulating crash failures with many faulty processors

AU - Bazzi, Rida

AU - Neiger, Gil

PY - 1992

Y1 - 1992

UR - http://www.scopus.com/inward/record.url?scp=0342365620&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0342365620&partnerID=8YFLogxK

M3 - Conference contribution

SN - 9783540561880

VL - 647 LNCS

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 166

EP - 184

BT - Distributed Algorithms - 6th International Workshop, WDAG 1992, Proceedings

PB - Springer Verlag

ER -