A technique for overlapping computation and communication for block recursive algorithms

Sandeep Gupta, C. H. Huang, P. Sadayappan, R. W. Johnson

Research output: Contribution to journal (Article)

4 Citations (Scopus)

Abstract

This paper presents a design methodology for developing efficient distributed-memory parallel programs for block recursive algorithms such as the fast Fourier transform (FFT) and bitonic sort. This design methodology is specifically suited for most modern supercomputers having a distributed-memory architecture with a circuit-switched or wormhole-routed mesh or a hypercube interconnection network. A mathematical framework based on the tensor product and other matrix operations is used for representing algorithms. Communication-efficient implementations with effectively overlapped computation and communication are achieved by manipulating the mathematical representation using the tensor product algebra. Performance results for FFT programs on the Intel Paragon are presented.
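
As an illustration of what a tensor-product representation of a block recursive algorithm looks like, the sketch below numerically checks the standard Cooley-Tukey factorization of the DFT matrix, F_rs = (F_r ⊗ I_s) T (I_r ⊗ F_s) L, where T is a diagonal twiddle-factor matrix and L is a stride permutation. This record does not reproduce the paper's own formulas, so the notation, the function names (`dft_matrix`, `kron`, `twiddle`, `stride_perm`, `factored_dft`), and the choice of this particular identity are assumptions made for illustration, not the authors' method. It is plain sequential Python and does not model communication or overlap.

```python
import cmath


def dft_matrix(n):
    """n x n DFT matrix F_n with entries w^(j*k), w = exp(-2*pi*i/n)."""
    w = cmath.exp(-2j * cmath.pi / n)
    return [[w ** (j * k) for k in range(n)] for j in range(n)]


def identity(n):
    return [[1.0 if j == k else 0.0 for k in range(n)] for j in range(n)]


def kron(a, b):
    """Tensor (Kronecker) product: each entry of a scales a full copy of b."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]


def matmul(a, b):
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def twiddle(r, s):
    """Diagonal matrix whose entry at index j*s + k is w_{rs}^(j*k)."""
    n = r * s
    w = cmath.exp(-2j * cmath.pi / n)
    diag = [w ** (j * k) for j in range(r) for k in range(s)]
    return [[diag[i] if i == c else 0.0 for c in range(n)] for i in range(n)]


def stride_perm(r, s):
    """Stride-r permutation: output position a*s + b reads input b*r + a."""
    n = r * s
    p = [[0.0] * n for _ in range(n)]
    for a in range(r):
        for b in range(s):
            p[a * s + b][b * r + a] = 1.0
    return p


def factored_dft(r, s):
    """Multiply out (F_r (x) I_s) * T * (I_r (x) F_s) * L for n = r*s."""
    stages = [
        kron(dft_matrix(r), identity(s)),  # r-point DFTs at stride s
        twiddle(r, s),                     # pointwise twiddle scaling
        kron(identity(r), dft_matrix(s)),  # r independent s-point DFTs
        stride_perm(r, s),                 # data reordering
    ]
    result = stages[0]
    for m in stages[1:]:
        result = matmul(result, m)
    return result


def max_abs_diff(a, b):
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


if __name__ == "__main__":
    # The factored form should agree with F_8 up to rounding error.
    print(max_abs_diff(dft_matrix(8), factored_dft(2, 4)))
```

In a factorization of this shape, a term I_r ⊗ F_s describes r independent s-point transforms on contiguous blocks, while the permutation L describes data movement; on a distributed-memory machine the latter would typically correspond to interprocessor communication, which is the part an implementation would try to overlap with computation.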

Original language: English (US)
Pages (from-to): 73-90
Number of pages: 18
Journal: Concurrency Practice and Experience
Volume: 10
Issue number: 2
State: Published - Feb 1998
Externally published: Yes

Fingerprint

Block Algorithm
Distributed Memory
Recursive Algorithm
Fast Fourier transform
Fast Fourier transforms
Tensor Product
Design Methodology
Tensors
Overlapping
Hypercube networks
Memory architecture
Wormhole
Interconnection networks (circuit switching)
Supercomputers
Parallel Programs
Communication
Interconnection Networks
Supercomputer
Efficient Implementation
Hypercube

ASJC Scopus subject areas

  • Engineering(all)

Cite this

A technique for overlapping computation and communication for block recursive algorithms. / Gupta, Sandeep; Huang, C. H.; Sadayappan, P.; Johnson, R. W.

In: Concurrency Practice and Experience, Vol. 10, No. 2, 02.1998, p. 73-90.

@article{f1826b16f81e4a75b0b7d224978e3c18,
title = "A technique for overlapping computation and communication for block recursive algorithms",
author = "Sandeep Gupta and Huang, {C. H.} and P. Sadayappan and Johnson, {R. W.}",
year = "1998",
month = "2",
language = "English (US)",
volume = "10",
pages = "73--90",
journal = "Concurrency Computation Practice and Experience",
issn = "1532-0626",
publisher = "John Wiley and Sons Ltd",
number = "2",

}

TY - JOUR

T1 - A technique for overlapping computation and communication for block recursive algorithms

AU - Gupta, Sandeep

AU - Huang, C. H.

AU - Sadayappan, P.

AU - Johnson, R. W.

PY - 1998/2

Y1 - 1998/2

UR - http://www.scopus.com/inward/record.url?scp=0032002843&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0032002843&partnerID=8YFLogxK

M3 - Article

AN - SCOPUS:0032002843

VL - 10

SP - 73

EP - 90

JO - Concurrency Computation Practice and Experience

JF - Concurrency Computation Practice and Experience

SN - 1532-0626

IS - 2

ER -