A technique for overlapping computation and communication for block recursive algorithms

S. K. S. Gupta; C. H. Huang; P. Sadayappan; R. W. Johnson

doi:10.1002/(SICI)1096-9128(199802)10:2<73::AID-CPE289>3.0.CO;2-N

Research output: Contribution to journal, Article, peer-reviewed

5 Scopus citations

Abstract

This paper presents a design methodology for developing efficient distributed-memory parallel programs for block recursive algorithms such as the fast Fourier transform (FFT) and bitonic sort. This design methodology is specifically suited for most modern supercomputers having a distributed-memory architecture with a circuit-switched or wormhole routed mesh or a hypercube interconnection network. A mathematical framework based on the tensor product and other matrix operations is used for representing algorithms. Communication-efficient implementations with effectively overlapped computation and communication are achieved by manipulating the mathematical representation using the tensor product algebra. Performance results for FFT programs on the Intel Paragon are presented.
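The abstract describes representing an FFT with tensor (Kronecker) products, permutations and other matrix operations, but does not give the formulas. As a minimal sketch, assuming the standard Cooley-Tukey factorization from the tensor-product FFT literature, F_{mn} = (F_m ⊗ I_n) T^{mn}_n (I_m ⊗ F_n) L^{mn}_m, which may not match the paper's own notation or formulation exactly, the Python below builds each factor with NumPy and checks the product against `numpy.fft.fft`. Here F_k is the k-point DFT matrix, I_k the identity, T^{mn}_n a diagonal "twiddle" matrix and L^{mn}_m the stride-m permutation. The helper names (`dft_matrix`, `stride_perm`, `twiddle`, `cooley_tukey_matrix`, `cooley_tukey_apply`) are hypothetical and are not taken from the paper.

```python
import numpy as np

# Sketch only: notation follows the common tensor-product FFT convention,
# assumed here rather than taken from the paper.


def dft_matrix(k: int) -> np.ndarray:
    """Dense k x k DFT matrix F_k[j, l] = exp(-2*pi*i*j*l/k) (numpy.fft sign convention)."""
    j = np.arange(k)
    return np.exp(-2j * np.pi * np.outer(j, j) / k)


def stride_perm(n: int, s: int) -> np.ndarray:
    """Permutation L^n_s gathering x with stride s: x[0], x[s], x[2s], ..., x[1], x[1+s], ..."""
    idx = np.arange(n).reshape(n // s, s).T.reshape(-1)
    return np.eye(n)[idx]


def twiddle(m: int, n: int) -> np.ndarray:
    """Diagonal T^{mn}_n: entry i*n + j holds w^(i*j), with w = exp(-2*pi*i/(m*n))."""
    k = np.outer(np.arange(m), np.arange(n)).reshape(-1)
    return np.diag(np.exp(-2j * np.pi * k / (m * n)))


def cooley_tukey_matrix(m: int, n: int) -> np.ndarray:
    """F_{mn} = (F_m kron I_n) T^{mn}_n (I_m kron F_n) L^{mn}_m, built from dense factors."""
    return (
        np.kron(dft_matrix(m), np.eye(n))
        @ twiddle(m, n)
        @ np.kron(np.eye(m), dft_matrix(n))
        @ stride_perm(m * n, m)
    )


def cooley_tukey_apply(x: np.ndarray, m: int, n: int) -> np.ndarray:
    """Matrix-free version: each factor becomes a reshape or a batched FFT."""
    y = x.reshape(n, m).T                 # L^{mn}_m: row i holds x[i], x[i+m], x[i+2m], ...
    y = np.fft.fft(y, axis=1)             # I_m kron F_n: m independent length-n DFTs, one per row
    y = y * np.exp(-2j * np.pi * np.outer(np.arange(m), np.arange(n)) / (m * n))  # T^{mn}_n
    y = np.fft.fft(y, axis=0)             # F_m kron I_n: length-m DFTs taken across the rows
    return y.reshape(-1)


if __name__ == "__main__":
    m, n = 4, 8
    rng = np.random.default_rng(0)
    x = rng.standard_normal(m * n) + 1j * rng.standard_normal(m * n)
    print(np.allclose(cooley_tukey_matrix(m, n) @ x, np.fft.fft(x)))  # True
    print(np.allclose(cooley_tukey_apply(x, m, n), np.fft.fft(x)))    # True
```

In the usual reading of this notation, a factor I_m ⊗ A applies A independently to m contiguous blocks, so it needs no communication when each block sits on one processor, while a permutation such as L^{mn}_m moves data between blocks and corresponds to interprocessor communication. Rewriting the product with tensor-product identities changes where those permutations fall, which is presumably the kind of manipulation the abstract refers to; the sketch itself does not model communication or its overlap with computation.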

Original language	English (US)
Pages (from-to)	73-90
Number of pages	18
Journal	Concurrency: Practice and Experience
Volume	10
Issue number	2
DOIs	https://doi.org/10.1002/(SICI)1096-9128(199802)10:2<73::AID-CPE289>3.0.CO;2-N
State	Published - Feb 1998
Externally published	Yes

ASJC Scopus subject areas

General Engineering

Cite this

@article{f1826b16f81e4a75b0b7d224978e3c18,
  title = "A technique for overlapping computation and communication for block recursive algorithms",
  abstract = "This paper presents a design methodology for developing efficient distributed-memory parallel programs for block recursive algorithms such as the fast Fourier transform (FFT) and bitonic sort. This design methodology is specifically suited for most modern supercomputers having a distributed-memory architecture with a circuit-switched or wormhole routed mesh or a hypercube interconnection network. A mathematical framework based on the tensor product and other matrix operations is used for representing algorithms. Communication-efficient implementations with effectively overlapped computation and communication are achieved by manipulating the mathematical representation using the tensor product algebra. Performance results for FFT programs on the Intel Paragon are presented.",
  author = "Gupta, {S. K.S.} and Huang, {C. H.} and P. Sadayappan and Johnson, {R. W.}",
  year = "1998",
  month = feb,
  doi = "10.1002/(SICI)1096-9128(199802)10:2<73::AID-CPE289>3.0.CO;2-N",
  language = "English (US)",
  volume = "10",
  pages = "73--90",
  journal = "Concurrency Practice and Experience",
  issn = "1040-3108",
  publisher = "John Wiley and Sons Ltd",
  number = "2",
}

TY  - JOUR
T1  - A technique for overlapping computation and communication for block recursive algorithms
AU  - Gupta, S. K.S.
AU  - Huang, C. H.
AU  - Sadayappan, P.
AU  - Johnson, R. W.
PY  - 1998/2
Y1  - 1998/2
N2  - This paper presents a design methodology for developing efficient distributed-memory parallel programs for block recursive algorithms such as the fast Fourier transform (FFT) and bitonic sort. This design methodology is specifically suited for most modern supercomputers having a distributed-memory architecture with a circuit-switched or wormhole routed mesh or a hypercube interconnection network. A mathematical framework based on the tensor product and other matrix operations is used for representing algorithms. Communication-efficient implementations with effectively overlapped computation and communication are achieved by manipulating the mathematical representation using the tensor product algebra. Performance results for FFT programs on the Intel Paragon are presented.
AB  - This paper presents a design methodology for developing efficient distributed-memory parallel programs for block recursive algorithms such as the fast Fourier transform (FFT) and bitonic sort. This design methodology is specifically suited for most modern supercomputers having a distributed-memory architecture with a circuit-switched or wormhole routed mesh or a hypercube interconnection network. A mathematical framework based on the tensor product and other matrix operations is used for representing algorithms. Communication-efficient implementations with effectively overlapped computation and communication are achieved by manipulating the mathematical representation using the tensor product algebra. Performance results for FFT programs on the Intel Paragon are presented.
UR  - http://www.scopus.com/inward/record.url?scp=0032002843&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=0032002843&partnerID=8YFLogxK
U2  - 10.1002/(SICI)1096-9128(199802)10:2<73::AID-CPE289>3.0.CO;2-N
DO  - 10.1002/(SICI)1096-9128(199802)10:2<73::AID-CPE289>3.0.CO;2-N
M3  - Article
AN  - SCOPUS:0032002843
SN  - 1040-3108
VL  - 10
SP  - 73
EP  - 90
JO  - Concurrency Practice and Experience
JF  - Concurrency Practice and Experience
IS  - 2
ER  - 
