Petascale direct numerical simulation of blood flow on 200K cores and heterogeneous architectures

Abtin Rahimian, Ilya Lashuk, Shravan K. Veerapaneni, Aparna Chandramowlishwaran, Dhairya Malhotra, Logan Moon, Rahul Sampath, Aashay Shringarpure, Jeffrey Vetter, Richard Vuduc, Denis Zorin, George Biros

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

We present a fast, petaflop-scalable algorithm for Stokesian particulate flows. Our goal is the direct simulation of blood, which we model as a mixture of a Stokesian fluid (plasma) and red blood cells (RBCs). Directly simulating blood is a challenging multiscale, multiphysics problem. We report simulations with up to 200 million deformable RBCs. The largest simulation amounts to 90 billion unknowns in space. In terms of the number of cells, we improve the state of the art by several orders of magnitude: the previous largest simulation, at the same physical fidelity as ours, resolved the flow of O(1,000-10,000) RBCs. Our approach has three distinct characteristics: (1) we faithfully represent the physics of RBCs by using nonlinear solid mechanics to capture the deformations of each cell; (2) we accurately resolve the long-range, N-body, hydrodynamic interactions between RBCs (which are mediated by the surrounding plasma); and (3) we allow for a highly non-uniform distribution of RBCs in space. The new method has been implemented in the software library MOBO (for "Moving Boundaries"). We designed MOBO to support parallelism at all levels, including inter-node distributed-memory parallelism, intra-node shared-memory parallelism, data parallelism (vectorization), and fine-grained multithreading for GPUs. We have implemented and optimized the majority of the computation kernels on both Intel/AMD x86 and NVIDIA Tesla/Fermi platforms in single and double floating-point precision. Overall, the code has scaled to 256 CPU-GPUs on the TeraGrid's Lincoln cluster and to 200,000 AMD cores of the Oak Ridge National Laboratory's Jaguar PF system. In our largest simulation, we achieved 0.7 Petaflop/s of sustained performance on Jaguar.
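To make the core computational kernel concrete: in a boundary-integral treatment of Stokes flow, each discretization point on a cell surface exerts a force on the plasma, and the velocity it induces everywhere else is given by the free-space Stokes Green's function (the Stokeslet, or Oseen tensor). The sketch below is not MOBO code and makes simplifying assumptions (point forces in an unbounded fluid, no near-singular quadrature corrections, hypothetical function and variable names); it only illustrates the direct O(N^2) form of the long-range N-body interaction that the paper's fast, scalable summation machinery is designed to avoid evaluating naively.

# Minimal O(N^2) sketch of the free-space Stokeslet (Oseen tensor) interaction.
# Illustrative only; the direct-summation strategy and names are assumptions, not MOBO.
import numpy as np

def stokeslet_velocities(points, forces, viscosity=1.0):
    """Velocity induced at each point by point forces at all other points.

    points  : (N, 3) point locations in an unbounded Stokes fluid
    forces  : (N, 3) point forces
    returns : (N, 3) velocities, u_i = sum_j G(x_i - x_j) f_j, where
              G(r) = (1 / (8 pi mu)) * (I / |r| + r r^T / |r|^3)
    """
    r = points[:, None, :] - points[None, :, :]   # pairwise displacements r_ij
    dist = np.linalg.norm(r, axis=-1)             # pairwise distances |r_ij|
    np.fill_diagonal(dist, np.inf)                # drop the singular self term
    inv_dist = 1.0 / dist
    r_dot_f = np.einsum('ijk,jk->ij', r, forces)  # r_ij . f_j
    u = (forces[None, :, :] * inv_dist[:, :, None]
         + r * (r_dot_f * inv_dist**3)[:, :, None]).sum(axis=1)
    return u / (8.0 * np.pi * viscosity)

# Example: velocities induced by 1,000 randomly placed point forces.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.uniform(size=(1000, 3))
    f = rng.normal(size=(1000, 3))
    print(stokeslet_velocities(x, f).shape)       # (1000, 3)

The quadratic cost of this direct evaluation is exactly what becomes prohibitive at the scales reported above (hundreds of millions of cells), which is why the interactions must instead be computed with a scalable fast summation scheme distributed across many nodes and GPUs.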

Original language: English (US)
Title of host publication: 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2010
DOIs: https://doi.org/10.1109/SC.2010.42
State: Published - 2010
Event: 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2010 - New Orleans, LA, United States
Duration: Nov 13, 2010 - Nov 19, 2010

Other

Other: 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2010
Country: United States
City: New Orleans, LA
Period: 11/13/10 - 11/19/10

Fingerprint

  • Direct numerical simulation
  • Blood
  • Plasmas
  • Data storage equipment
  • Program processors
  • Mechanics
  • Hydrodynamics
  • Physics
  • Cells
  • Fluids

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Hardware and Architecture

Cite this

Rahimian, A., Lashuk, I., Veerapaneni, S. K., Chandramowlishwaran, A., Malhotra, D., Moon, L., ... Biros, G. (2010). Petascale direct numerical simulation of blood flow on 200K cores and heterogeneous architectures. In 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2010 [5644910]. https://doi.org/10.1109/SC.2010.42

@inproceedings{4eb6b7c75c394a81b6f02dc697e33389,
title = "Petascale direct numerical simulation of blood flow on 200K cores and heterogeneous architectures",
abstract = "We present a fast, petaflop-scalable algorithm for Stokesian particulate flows. Our goal is the direct simulation of blood, which we model as a mixture of a Stokesian fluid (plasma) and red blood cells (RBCs). Directly simulating blood is a challenging multiscale, multiphysics problem. We report simulations with up to 200 million deformable RBCs. The largest simulation amounts to 90 billion unknowns in space. In terms of the number of cells, we improve the state-of-the art by several orders of magnitude: the previous largest simulation, at the same physical fidelity as ours, resolved the flow of O(1,000-10,000) RBCs. Our approach has three distinct characteristics: (1) we faithfully represent the physics of RBCs by using nonlinear solid mechanics to capture the deformations of each cell; (2) we accurately resolve the long-range, N-body, hydrodynamic interactions between RBCs (which are caused by the surrounding plasma); and (3) we allow for the highly non-uniform distribution of RBCs in space. The new method has been implemented in the software library MOBO (for {"}Moving Boundaries{"}). We designed MOBO to support parallelism at all levels, including inter-node distributed memory parallelism, intra-node shared memory parallelism, data parallelism (vectorization), and fine-grained multithreading for GPUs. We have implemented and optimized the majority of the computation kernels on both Intel/AMD x86 and NVidia's Tesla/Fermi platforms for single and double floating point precision. Overall, the code has scaled on 256 CPU-GPUs on the Teragrid's Lincoln cluster and on 200,000 AMD cores of the Oak Ridge National Laboratory's Jaguar PF system. In our largest simulation, we have achieved 0.7 Petaflops/s of sustained performance on Jaguar.",
author = "Abtin Rahimian and Ilya Lashuk and Veerapaneni, {Shravan K.} and Aparna Chandramowlishwaran and Dhairya Malhotra and Logan Moon and Rahul Sampath and Aashay Shringarpure and Jeffrey Vetter and Richard Vuduc and Denis Zorin and George Biros",
year = "2010",
doi = "10.1109/SC.2010.42",
language = "English (US)",
isbn = "9781424475575",
booktitle = "2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis, SC 2010",

}
