A framework for generating distributed-memory parallel programs for block recursive algorithms

Sandeep Gupta, C. H. Huang, P. Sadayappan, R. W. Johnson

Research output: Contribution to journal › Article

16 Citations (Scopus)

Abstract

A framework for synthesizing communication-efficient distributed-memory parallel programs for block recursive algorithms such as the fast Fourier transform (FFT) and Strassen's matrix multiplication is presented. This framework is based on an algebraic representation of the algorithms, which involves the tensor (Kronecker) product and other matrix operations. This representation is useful in analyzing the communication implications of computation partitioning and data distributions. The programs are synthesized under two different target program models. These two models are based on different ways of managing the distribution of data for optimizing communication. The first model uses point-to-point interprocessor communication primitives, whereas the second model uses data redistribution primitives involving collective all-to-many communication. These two program models are shown to be suitable for different ranges of problem size. The methodology is illustrated by synthesizing communication-efficient programs for the FFT. This framework has been incorporated into the EXTENT system for automatic generation of parallel/vector programs for block recursive algorithms.
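The tensor (Kronecker) product view of the FFT mentioned in the abstract can be illustrated with the standard Cooley-Tukey factorization, F_n = (F_r ⊗ I_s) T (I_r ⊗ F_s) L, for n = r·s, where L is a stride permutation and T is a diagonal matrix of twiddle factors. The sketch below is a minimal illustration under that textbook formulation, not the paper's own notation or the code generated by EXTENT; all function names are hypothetical and chosen only for this example.

```python
import cmath


def dft(x):
    """Direct O(n^2) DFT, used as the small kernel F_s and as a reference."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]


def stride_permute(x, r):
    """Stride permutation L: gather the elements of x at stride r."""
    s = len(x) // r
    return [x[r * j2 + j1] for j1 in range(r) for j2 in range(s)]


def apply_i_kron_f(x, r):
    """(I_r kron F_s): r independent DFTs on contiguous blocks of length s."""
    s = len(x) // r
    out = []
    for b in range(r):
        out.extend(dft(x[b * s:(b + 1) * s]))
    return out


def apply_twiddle(x, r):
    """Diagonal twiddle matrix T: scale element (j1, k2) by w_n^(j1*k2)."""
    n = len(x)
    s = n // r
    return [x[j1 * s + k2] * cmath.exp(-2j * cmath.pi * j1 * k2 / n)
            for j1 in range(r) for k2 in range(s)]


def apply_f_kron_i(x, r):
    """(F_r kron I_s): s independent DFTs of length r on elements at stride s."""
    n = len(x)
    s = n // r
    out = [0j] * n
    for k2 in range(s):
        col = dft([x[j1 * s + k2] for j1 in range(r)])
        for k1 in range(r):
            out[k1 * s + k2] = col[k1]
    return out


def fft_tensor(x, r):
    """Apply the four factors right to left: L, I_r kron F_s, T, F_r kron I_s."""
    y = stride_permute(x, r)
    y = apply_i_kron_f(y, r)
    y = apply_twiddle(y, r)
    return apply_f_kron_i(y, r)


if __name__ == "__main__":
    x = [complex(i, 2 * i - 3) for i in range(8)]
    err = max(abs(a - b) for a, b in zip(fft_tensor(x, 2), dft(x)))
    print("matches direct DFT:", err < 1e-9)
```

If the intermediate vector is block-distributed over r processors with s contiguous elements each, the I_r ⊗ F_s stage touches only local data, whereas the F_r ⊗ I_s stage combines one element from every block. This is the kind of communication consequence of a data distribution that the algebraic representation makes visible.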

Original language: English (US)
Pages (from-to): 137-153
Number of pages: 17
Journal: Journal of Parallel and Distributed Computing
Volume: 34
Issue number: 2
DOIs: 10.1006/jpdc.1996.0051
State: Published - May 1 1996
Externally published: Yes

Fingerprint

Block Algorithm
Parallel Programs
Distributed Memory
Recursive Algorithm
Data storage equipment
Communication
Fast Fourier transform
Fast Fourier transforms
Interprocessor Communication
Model
Kronecker Product
Matrix multiplication
Data Distribution
Redistribution
Tensor Product
Partitioning
Tensors
Framework
Target
Methodology

ASJC Scopus subject areas

  • Computer Science Applications
  • Hardware and Architecture
  • Control and Systems Engineering

Cite this

A framework for generating distributed-memory parallel programs for block recursive algorithms. / Gupta, Sandeep; Huang, C. H.; Sadayappan, P.; Johnson, R. W.

In: Journal of Parallel and Distributed Computing, Vol. 34, No. 2, 01.05.1996, p. 137-153.


@article{1a22c9f9eafb489891fb63367b0d42f5,
title = "A framework for generating distributed-memory parallel programs for block recursive algorithms",
abstract = "A framework for synthesizing communication-efficient distributed-memory parallel programs for block recursive algorithms such as the fast Fourier transform (FFT) and Strassen's matrix multiplication is presented. This framework is based on an algebraic representation of the algorithms, which involves the tensor (Kronecker) product and other matrix operations. This representation is useful in analyzing the communication implications of computation partitioning and data distributions. The programs are synthesized under two different target program models. These two models are based on different ways of managing the distribution of data for optimizing communication. The first model uses point-to-point interprocessor communication primitives, whereas the second model uses data redistribution primitives involving collective all-to-many communication. These two program models are shown to be suitable for different ranges of problem size. The methodology is illustrated by synthesizing communication-efficient programs for the FFT. This framework has been incorporated into the EXTENT system for automatic generation of parallel/vector programs for block recursive algorithms.",
author = "Gupta, Sandeep and Huang, {C. H.} and Sadayappan, P. and Johnson, {R. W.}",
year = "1996",
month = "5",
day = "1",
doi = "10.1006/jpdc.1996.0051",
language = "English (US)",
volume = "34",
pages = "137--153",
journal = "Journal of Parallel and Distributed Computing",
issn = "0743-7315",
publisher = "Academic Press Inc.",
number = "2",
}

TY - JOUR

T1 - A framework for generating distributed-memory parallel programs for block recursive algorithms

AU - Gupta, Sandeep

AU - Huang, C. H.

AU - Sadayappan, P.

AU - Johnson, R. W.

PY - 1996/5/1

Y1 - 1996/5/1

N2 - A framework for synthesizing communication-efficient distributed-memory parallel programs for block recursive algorithms such as the fast Fourier transform (FFT) and Strassen's matrix multiplication is presented. This framework is based on an algebraic representation of the algorithms, which involves the tensor (Kronecker) product and other matrix operations. This representation is useful in analyzing the communication implications of computation partitioning and data distributions. The programs are synthesized under two different target program models. These two models are based on different ways of managing the distribution of data for optimizing communication. The first model uses point-to-point interprocessor communication primitives, whereas the second model uses data redistribution primitives involving collective all-to-many communication. These two program models are shown to be suitable for different ranges of problem size. The methodology is illustrated by synthesizing communication-efficient programs for the FFT. This framework has been incorporated into the EXTENT system for automatic generation of parallel/vector programs for block recursive algorithms.


UR - http://www.scopus.com/inward/record.url?scp=0030143875&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0030143875&partnerID=8YFLogxK

U2 - 10.1006/jpdc.1996.0051

DO - 10.1006/jpdc.1996.0051

M3 - Article

AN - SCOPUS:0030143875

VL - 34

SP - 137

EP - 153

JO - Journal of Parallel and Distributed Computing

JF - Journal of Parallel and Distributed Computing

SN - 0743-7315

IS - 2

ER -