Automatic data and computation decomposition on distributed memory parallel computers

Peizong Z. Lee, Zvi Meir Kedem

Research output: Contribution to journal › Article

Abstract

To exploit parallelism on shared memory parallel computers (SMPCs), it is natural to focus on decomposing the computation (mainly by distributing the iterations of the nested Do-loops). In contrast, on distributed memory parallel computers (DMPCs), the decomposition of computation and the distribution of data must both be handled in order to balance the computation load and to minimize the migration of data. We propose and validate experimentally a method for handling computations and data synergistically to minimize the overall execution time on DMPCs. The method is based on a number of novel techniques, also presented in this article. The core idea is to rank the "importance" of data arrays in a program and specify some of them as dominant. The intuition is that the dominant arrays are the ones whose migration would be the most expensive. Using the correspondence between iteration space mapping vectors and distributed dimensions of the dominant data array in each nested Do-loop allows us to design algorithms for determining data and computation decompositions at the same time. Based on the data distribution, the computation decomposition for each nested Do-loop is determined using either the "owner computes" rule or the "owner stores" rule with respect to the dominant data array. If all temporal dependence relations across iteration partitions are regular, we use tiling to allow pipelining and the overlapping of computation and communication. However, in order to use tiling on DMPCs, we needed to extend the existing techniques for determining tiling vectors and tile sizes, as they were originally suited for SMPCs only. The overall method is illustrated on programs for the 2D heat equation, for Gaussian elimination with pivoting, and for the 2D fast Fourier transform, on a linear processor array and on a 2D processor grid.
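A minimal sketch of the "owner computes" rule the abstract refers to, not the authors' implementation: the rows of a 1D array are block-distributed over P logical processors, and each processor executes only the loop iterations that write into elements it owns. All names (`owner`, `block`, the array sizes) are illustrative assumptions; real DMPC code would run the processor loops concurrently and fetch neighbor-owned boundary elements via message passing.

```python
# Hypothetical sketch: block distribution + owner-computes decomposition
# of a 1D heat-equation-style stencil sweep.

N, P = 16, 4                 # array size and number of logical processors
block = N // P               # assume P divides N evenly, for simplicity

def owner(i):
    """Processor that owns element i under a block distribution."""
    return i // block

a = [float(i) for i in range(N)]   # current values
b = [0.0] * N                      # next values

# One Jacobi-style sweep, split by owner. On a real DMPC the p-loop
# iterations run concurrently on separate processors.
for p in range(P):
    for i in range(1, N - 1):
        if owner(i) == p:          # owner computes: p writes only b[i] it owns
            b[i] = 0.5 * (a[i - 1] + a[i + 1])
```

Under this decomposition, the only remote accesses are the reads of `a[i-1]` and `a[i+1]` at block boundaries, which is what tiling and computation/communication overlap would then optimize.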

Original language: English (US)
Pages (from-to): 1-50
Number of pages: 50
Journal: ACM Transactions on Programming Languages and Systems
Volume: 24
Issue number: 1
DOIs: 10.1145/509705.509706
State: Published - Jan 2002

Keywords

  • Algorithms
  • Computation decomposition
  • D.3.4 [Programming Languages]: Processors - compilers
  • Data alignment
  • Data distribution
  • Distributed-memory computers
  • Dominant data array
  • E.1 [Data Structures]: arrays
  • Languages
  • Optimization

ASJC Scopus subject areas

  • Computer Graphics and Computer-Aided Design
  • Software

Cite this

Automatic data and computation decomposition on distributed memory parallel computers. / Lee, Peizong Z.; Kedem, Zvi Meir.

In: ACM Transactions on Programming Languages and Systems, Vol. 24, No. 1, 01.2002, p. 1-50.

Research output: Contribution to journal › Article

@article{9d7f51133642470b93ca2dbad2f4cd54,
title = "Automatic data and computation decomposition on distributed memory parallel computers",
abstract = "To exploit parallelism on shared memory parallel computers (SMPCs), it is natural to focus on decomposing the computation (mainly by distributing the iterations of the nested Do-loops). In contrast, on distributed memory parallel computers (DMPCs), the decomposition of computation and the distribution of data must both be handled in order to balance the computation load and to minimize the migration of data. We propose and validate experimentally a method for handling computations and data synergistically to minimize the overall execution time on DMPCs. The method is based on a number of novel techniques, also presented in this article. The core idea is to rank the {"}importance{"} of data arrays in a program and specify some of them as dominant. The intuition is that the dominant arrays are the ones whose migration would be the most expensive. Using the correspondence between iteration space mapping vectors and distributed dimensions of the dominant data array in each nested Do-loop allows us to design algorithms for determining data and computation decompositions at the same time. Based on the data distribution, the computation decomposition for each nested Do-loop is determined using either the {"}owner computes{"} rule or the {"}owner stores{"} rule with respect to the dominant data array. If all temporal dependence relations across iteration partitions are regular, we use tiling to allow pipelining and the overlapping of computation and communication. However, in order to use tiling on DMPCs, we needed to extend the existing techniques for determining tiling vectors and tile sizes, as they were originally suited for SMPCs only. The overall method is illustrated on programs for the 2D heat equation, for Gaussian elimination with pivoting, and for the 2D fast Fourier transform, on a linear processor array and on a 2D processor grid.",
keywords = "Algorithms, Computation decomposition, D.3.4 [Programming Languages]: Processors - compilers, Data alignment, Data distribution, Distributed-memory computers, Dominant data array, E.1 [Data Structures]: arrays, Languages, Optimization",
author = "Lee, {Peizong Z.} and Kedem, {Zvi Meir}",
year = "2002",
month = jan,
doi = "10.1145/509705.509706",
language = "English (US)",
volume = "24",
pages = "1--50",
journal = "ACM Transactions on Programming Languages and Systems",
issn = "0164-0925",
publisher = "Association for Computing Machinery (ACM)",
number = "1",

}

TY - JOUR

T1 - Automatic data and computation decomposition on distributed memory parallel computers

AU - Lee, Peizong Z.

AU - Kedem, Zvi Meir

PY - 2002/1

Y1 - 2002/1

N2 - To exploit parallelism on shared memory parallel computers (SMPCs), it is natural to focus on decomposing the computation (mainly by distributing the iterations of the nested Do-loops). In contrast, on distributed memory parallel computers (DMPCs), the decomposition of computation and the distribution of data must both be handled in order to balance the computation load and to minimize the migration of data. We propose and validate experimentally a method for handling computations and data synergistically to minimize the overall execution time on DMPCs. The method is based on a number of novel techniques, also presented in this article. The core idea is to rank the "importance" of data arrays in a program and specify some of them as dominant. The intuition is that the dominant arrays are the ones whose migration would be the most expensive. Using the correspondence between iteration space mapping vectors and distributed dimensions of the dominant data array in each nested Do-loop allows us to design algorithms for determining data and computation decompositions at the same time. Based on the data distribution, the computation decomposition for each nested Do-loop is determined using either the "owner computes" rule or the "owner stores" rule with respect to the dominant data array. If all temporal dependence relations across iteration partitions are regular, we use tiling to allow pipelining and the overlapping of computation and communication. However, in order to use tiling on DMPCs, we needed to extend the existing techniques for determining tiling vectors and tile sizes, as they were originally suited for SMPCs only. The overall method is illustrated on programs for the 2D heat equation, for Gaussian elimination with pivoting, and for the 2D fast Fourier transform, on a linear processor array and on a 2D processor grid.

AB - To exploit parallelism on shared memory parallel computers (SMPCs), it is natural to focus on decomposing the computation (mainly by distributing the iterations of the nested Do-loops). In contrast, on distributed memory parallel computers (DMPCs), the decomposition of computation and the distribution of data must both be handled in order to balance the computation load and to minimize the migration of data. We propose and validate experimentally a method for handling computations and data synergistically to minimize the overall execution time on DMPCs. The method is based on a number of novel techniques, also presented in this article. The core idea is to rank the "importance" of data arrays in a program and specify some of them as dominant. The intuition is that the dominant arrays are the ones whose migration would be the most expensive. Using the correspondence between iteration space mapping vectors and distributed dimensions of the dominant data array in each nested Do-loop allows us to design algorithms for determining data and computation decompositions at the same time. Based on the data distribution, the computation decomposition for each nested Do-loop is determined using either the "owner computes" rule or the "owner stores" rule with respect to the dominant data array. If all temporal dependence relations across iteration partitions are regular, we use tiling to allow pipelining and the overlapping of computation and communication. However, in order to use tiling on DMPCs, we needed to extend the existing techniques for determining tiling vectors and tile sizes, as they were originally suited for SMPCs only. The overall method is illustrated on programs for the 2D heat equation, for Gaussian elimination with pivoting, and for the 2D fast Fourier transform, on a linear processor array and on a 2D processor grid.

KW - Algorithms

KW - Computation decomposition

KW - D.3.4 [Programming Languages]: Processors - compilers

KW - Data alignment

KW - Data distribution

KW - Distributed-memory computers

KW - Dominant data array

KW - E.1 [Data Structures]: arrays

KW - Languages

KW - Optimization

UR - http://www.scopus.com/inward/record.url?scp=0040027475&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0040027475&partnerID=8YFLogxK

U2 - 10.1145/509705.509706

DO - 10.1145/509705.509706

M3 - Article

VL - 24

SP - 1

EP - 50

JO - ACM Transactions on Programming Languages and Systems

JF - ACM Transactions on Programming Languages and Systems

SN - 0164-0925

IS - 1

ER -