A technique for overlapping computation and communication for block recursive algorithms

S. K.S. Gupta, C. H. Huang, P. Sadayappan, R. W. Johnson

Research output: Contribution to journalArticle

4 Scopus citations

Abstract

This paper presents a design methodology for developing efficient distributed-memory parallel programs for block recursive algorithms such as the fast Fourier transform (FFT) and bitonic sort. This design methodology is specifically suited for most modern supercomputers having a distributed-memory architecture with a circuit-switched or wormhole routed mesh or a hypercube interconnection network. A mathematical framework based on the tensor product and other matrix operations is used for representing algorithms. Communication-efficient implementations with effectively overlapped computation and communication are achieved by manipulating the mathematical representation using the tensor product algebra. Performance results for FFT programs on the Intel Paragon are presented.

Original languageEnglish (US)
Pages (from-to)73-90
Number of pages18
JournalConcurrency Practice and Experience
Volume10
Issue number2
DOIs
StatePublished - Feb 1998
Externally publishedYes

ASJC Scopus subject areas

  • Engineering(all)

Fingerprint Dive into the research topics of 'A technique for overlapping computation and communication for block recursive algorithms'. Together they form a unique fingerprint.

  • Cite this