A deadlock-free and connectivity-guaranteed methodology for achieving fault-tolerance in on-chip networks

Pengju Ren, Xiaowei Ren, Sudhanshu Sane, Michel A. Kinsy, Nanning Zheng

Research output: Contribution to journal › Article › peer-review

17 Scopus citations

Abstract

To improve the reliability of on-chip network based systems, we design a deadlock-free routing technique that is more resilient to component failures and guarantees a higher degree of node connectivity. The routing methodology consists of three key steps. First, we determine the maximal connected subgraph of the faulty network by checking whether the defective components happen to be cut vertices or bridges of the network topology. A precise fault diagnosis mechanism is used to identify partially defective routers. Second, we construct an acyclic channel dependency graph that breaks all cycles and preserves the connectivity of the maximal connected subgraph. This is done through the cycle-breaking and connectivity-guaranteed (CBCG) algorithm. Finally, we introduce a fault-tolerant adaptive routing scheme that can be used with or without virtual channels for network congestion avoidance and high-throughput routing. The simulation results show both the effectiveness and robustness of the proposed approach. For an 8 × 8 2D-Mesh with 40 percent of link damage, full connectivity and deadlock freedom are still achieved without disabling any faultless router in 98.18 percent of the simulations. In a 2D-Torus, this percentage is even higher (99.93 percent). The hardware overhead for supporting the introduced features is minimal. An on-line implementation of CBCG using a TSMC 65 nm library has only 0.966 and 1.139 percent area overhead for the 8 × 8 and 16 × 16 2D-Meshes, respectively.
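
The first step described above, finding the maximal connected subgraph of a faulty network, can be illustrated with a minimal sketch. This is not the paper's CBCG algorithm or its cut-vertex and bridge check. It is a plain breadth-first search over a 2D-Mesh with some links removed, and the function names `mesh_links` and `largest_connected_component` are hypothetical names chosen for this illustration.

```python
from collections import deque


def mesh_links(width, height):
    """Return the undirected links of a width x height 2D-Mesh (illustrative helper)."""
    links = set()
    for x in range(width):
        for y in range(height):
            if x + 1 < width:
                links.add(frozenset({(x, y), (x + 1, y)}))
            if y + 1 < height:
                links.add(frozenset({(x, y), (x, y + 1)}))
    return links


def largest_connected_component(nodes, links):
    """Return the largest set of nodes that can still reach each other over `links`."""
    adjacency = {node: set() for node in nodes}
    for link in links:
        a, b = tuple(link)
        adjacency[a].add(b)
        adjacency[b].add(a)

    visited = set()
    best = set()
    for start in nodes:
        if start in visited:
            continue
        # Breadth-first search collects every node reachable from `start`.
        component = {start}
        queue = deque([start])
        while queue:
            current = queue.popleft()
            for neighbor in adjacency[current]:
                if neighbor not in component:
                    component.add(neighbor)
                    queue.append(neighbor)
        visited |= component
        if len(component) > len(best):
            best = component
    return best


# Assumed fault pattern: both links of corner router (0, 0) are broken,
# which isolates that router from the rest of an 8 x 8 mesh.
nodes = [(x, y) for x in range(8) for y in range(8)]
faulty = {frozenset({(0, 0), (1, 0)}), frozenset({(0, 0), (0, 1)})}
component = largest_connected_component(nodes, mesh_links(8, 8) - faulty)
print(len(component))  # 63: every router except the isolated corner
```

In this example the two faulty links happen to be the only links of one router, so the remaining 63 routers form the maximal connected subgraph on which a routing scheme would then have to stay deadlock-free.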

Original language: English (US)
Article number: 7093169
Pages (from-to): 353-366
Number of pages: 14
Journal: IEEE Transactions on Computers
Volume: 65
Issue number: 2
DOIs
State: Published - Feb 1 2016
Externally published: Yes

Keywords

  • Channel Dependency Graph
  • Fault-tolerance
  • Network-on-chip
  • Reliability
  • Routing algorithm

ASJC Scopus subject areas

  • Software
  • Theoretical Computer Science
  • Hardware and Architecture
  • Computational Theory and Mathematics
