Bandwidth-optimal complete exchange on wormhole-routed 2D/3D torus networks: A diagonal-propagation approach

Yu Chee Tseng, Ting Hsien Lin, Sandeep Gupta, Dhabaleswar K. Panda

Research output: Contribution to journal › Article

30 Citations (Scopus)

Abstract

All-to-all personalized communication, or complete exchange, is at the heart of numerous applications in parallel computing. Several complete exchange algorithms have been proposed in the literature for wormhole meshes. However, these algorithms, when applied to tori, cannot take advantage of wrap-around interconnections to implement complete exchange with reduced latency. In this paper, a new diagonal-propagation approach is proposed to develop a set of complete exchange algorithms for 2D and 3D tori. This approach exploits the symmetric interconnections of tori and yields a communication schedule consisting of several contention-free phases. These algorithms are indirect in nature and use message combining to reduce the number of phases (message start-ups). It is shown that these algorithms effectively use the bisection bandwidth of a torus, which is twice that of an equal-sized mesh, to achieve complete exchange in almost half the best known complete exchange time on an equal-sized mesh. The effectiveness of these algorithms is verified through simulation studies for varying system and technological parameters. It is also demonstrated that synchronous implementations of these algorithms (which introduce barriers between phases) lead to reduced latency for complete exchange with large messages, while the asynchronous ones are better for smaller messages.
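The bisection argument in the abstract can be illustrated with a small sketch. It is not taken from the paper: the function names `crossing_links` and `exchange_lower_bound` are hypothetical, the network is assumed to be an n x n grid with n even and at least 4, and the cost model (one message block per link direction per time unit, no start-up cost) is a deliberate simplification.

```python
def crossing_links(n, wraparound):
    """Count undirected links of an n x n grid that cross a vertical bisection.

    The cut separates columns 0..n/2-1 from columns n/2..n-1.
    With wraparound=True the grid is a torus, otherwise a mesh.
    """
    assert n % 2 == 0 and n >= 4, "sketch assumes an even side length of at least 4"
    half = n // 2
    links = set()
    for r in range(n):
        for c in range(n):
            neighbors = []
            # Link to the right neighbor (wraps to column 0 on a torus).
            if c + 1 < n:
                neighbors.append((r, c + 1))
            elif wraparound:
                neighbors.append((r, 0))
            # Link to the lower neighbor (wraps to row 0 on a torus).
            if r + 1 < n:
                neighbors.append((r + 1, c))
            elif wraparound:
                neighbors.append((0, c))
            for r2, c2 in neighbors:
                # A link crosses the cut when its endpoints lie in different halves.
                if (c < half) != (c2 < half):
                    links.add(frozenset(((r, c), (r2, c2))))
    return len(links)


def exchange_lower_bound(n, wraparound):
    """Bisection lower bound on complete exchange, in block-transfer time units.

    Each of the N/2 nodes in one half sends one block to each of the N/2
    nodes in the other half, so (N/2)^2 blocks must cross the cut in each
    direction. Assumption: every link carries one block per direction per
    time unit and start-up costs are ignored.
    """
    nodes = n * n
    blocks_one_way = (nodes // 2) ** 2
    return blocks_one_way / crossing_links(n, wraparound)
```

Under these assumptions, `crossing_links(n, True)` is twice `crossing_links(n, False)`, so the torus bound from `exchange_lower_bound` is half the mesh bound, which matches the "almost half" improvement the abstract reports.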

Original language: English (US)
Pages (from-to): 380-396
Number of pages: 17
Journal: IEEE Transactions on Parallel and Distributed Systems
Volume: 8
Issue number: 4
DOI: 10.1109/71.588613
State: Published - 1997
Externally published: Yes

Fingerprint

Optimal Bandwidth
Wormhole
Torus
Propagation
Bandwidth
Exchange Algorithm
Mesh
Interconnection
Latency
Bisection
Communication
Parallel processing systems
Contention
Parallel Computing
Schedule
Simulation Study

Keywords

  • Collective communication
  • Complete exchange
  • Distributed memory systems
  • Interprocessor communication
  • Parallel computing
  • Torus
  • Wormhole routing

ASJC Scopus subject areas

  • Electrical and Electronic Engineering
  • Theoretical Computer Science
  • Computational Theory and Mathematics

Cite this

Bandwidth-optimal complete exchange on wormhole-routed 2D/3D torus networks: A diagonal-propagation approach. / Tseng, Yu Chee; Lin, Ting Hsien; Gupta, Sandeep; Panda, Dhabaleswar K.

In: IEEE Transactions on Parallel and Distributed Systems, Vol. 8, No. 4, 1997, p. 380-396.


@article{9a41b02688054922bf2e7b7227279f8b,
title = "Bandwidth-optimal complete exchange on wormhole-routed 2D/3D torus networks: A diagonal-propagation approach",
abstract = "All-to-all personalized communication, or complete exchange, is at the heart of numerous applications in parallel computing. Several complete exchange algorithms have been proposed in the literature for wormhole meshes. However, these algorithms, when applied to tori, cannot take advantage of wrap-around interconnections to implement complete exchange with reduced latency. In this paper, a new diagonal-propagation approach is proposed to develop a set of complete exchange algorithms for 2D and 3D tori. This approach exploits the symmetric interconnections of tori and allows to develop a communication schedule consisting of several contention-free phases. These algorithms are indirect in nature and they use message combining to reduce the number of phases (message start-ups). It is shown that these algorithms effectively use the bisection bandwidth of a torus which is twice that for an equal sized mesh, to achieve complete exchange in time which is almost half of the best known complete exchange time on an equal sized mesh. The effectiveness of these algorithms is verified through simulation studies for varying system and technological parameters. It is also demonstrated that synchronous implementations of these algorithms (by introducing barriers between phases) lead to reduced latency for complete exchange with large messages, while the asynchronous ones are better for smaller messages.",
keywords = "Collective communication, Complete exchange, Distributed memory systems, Interprocessor communication, Parallel computing, Torus, Wormhole routing",
author = "Tseng, {Yu Chee} and Lin, {Ting Hsien} and Gupta, Sandeep and Panda, {Dhabaleswar K.}",
year = "1997",
doi = "10.1109/71.588613",
language = "English (US)",
volume = "8",
pages = "380--396",
journal = "IEEE Transactions on Parallel and Distributed Systems",
issn = "1045-9219",
publisher = "IEEE Computer Society",
number = "4",
}

TY - JOUR

T1 - Bandwidth-optimal complete exchange on wormhole-routed 2D/3D torus networks

T2 - A diagonal-propagation approach

AU - Tseng, Yu Chee

AU - Lin, Ting Hsien

AU - Gupta, Sandeep

AU - Panda, Dhabaleswar K.

PY - 1997

Y1 - 1997

AB - All-to-all personalized communication, or complete exchange, is at the heart of numerous applications in parallel computing. Several complete exchange algorithms have been proposed in the literature for wormhole meshes. However, these algorithms, when applied to tori, cannot take advantage of wrap-around interconnections to implement complete exchange with reduced latency. In this paper, a new diagonal-propagation approach is proposed to develop a set of complete exchange algorithms for 2D and 3D tori. This approach exploits the symmetric interconnections of tori and allows to develop a communication schedule consisting of several contention-free phases. These algorithms are indirect in nature and they use message combining to reduce the number of phases (message start-ups). It is shown that these algorithms effectively use the bisection bandwidth of a torus which is twice that for an equal sized mesh, to achieve complete exchange in time which is almost half of the best known complete exchange time on an equal sized mesh. The effectiveness of these algorithms is verified through simulation studies for varying system and technological parameters. It is also demonstrated that synchronous implementations of these algorithms (by introducing barriers between phases) lead to reduced latency for complete exchange with large messages, while the asynchronous ones are better for smaller messages.

KW - Collective communication

KW - Complete exchange

KW - Distributed memory systems

KW - Interprocessor communication

KW - Parallel computing

KW - Torus

KW - Wormhole routing

UR - http://www.scopus.com/inward/record.url?scp=0031120391&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0031120391&partnerID=8YFLogxK

U2 - 10.1109/71.588613

DO - 10.1109/71.588613

M3 - Article

AN - SCOPUS:0031120391

VL - 8

SP - 380

EP - 396

JO - IEEE Transactions on Parallel and Distributed Systems

JF - IEEE Transactions on Parallel and Distributed Systems

SN - 1045-9219

IS - 4

ER -