Highly scalable Ab initio genomic motif identification

Benoit Marchand, Vladimir B. Bajic, Dinesh K. Kaushik

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

We present results of scaling an ab initio motif family identification system, Dragon Motif Finder (DMF), to 65,536 processor cores of IBM Blue Gene/P. DMF seeks groups of mutually similar polynucleotide patterns within a set of genomic sequences and builds various motif families from them. Such information is relevant to many problems in the life sciences. Prior attempts to scale such ab initio motif-finding algorithms achieved limited success. We solve the scalability issues using a combination of mixed-mode MPI-OpenMP parallel programming, master-slave work assignment, multi-level workload distribution, multi-level MPI collectives, and serial optimizations. The algorithm scaled very well (94% parallel efficiency on 65,536 cores relative to 256 cores on a modest-size problem), and the final speedup with respect to the original serial code exceeded 250,000 once serial optimizations were included. This enabled us to carry out many large-scale ab initio motif-finding simulations in a few hours, whereas the original serial code would have needed decades of execution time.
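
To make the efficiency figure concrete, the sketch below works through the arithmetic behind "94% parallel efficiency on 65,536 cores relative to 256 cores". It assumes the usual strong-scaling definition (fixed problem size, efficiency = measured speedup divided by the ideal speedup implied by the core ratio); the abstract does not state which definition the authors used, and the function names here are illustrative only.

```python
def relative_efficiency(t_base, p_base, t_scaled, p_scaled):
    """Parallel efficiency of a run on p_scaled cores relative to a
    baseline run on p_base cores (assumed strong-scaling definition)."""
    measured_speedup = t_base / t_scaled   # how much faster the larger run was
    ideal_speedup = p_scaled / p_base      # perfect scaling with core count
    return measured_speedup / ideal_speedup


def relative_speedup(efficiency, p_base, p_scaled):
    """Speedup over the baseline run implied by a given relative efficiency."""
    return efficiency * p_scaled / p_base


# Figures quoted in the abstract: 94% efficiency, 256 -> 65,536 cores.
implied = relative_speedup(0.94, 256, 65_536)
print(f"Implied speedup over the 256-core run: {implied:.2f}x")
```

Under that assumption, a 256-fold increase in core count at 94% efficiency corresponds to roughly a 240-fold speedup over the 256-core run. The overall factor of more than 250,000 quoted in the abstract is measured against the original serial code and also includes the serial optimizations, so it cannot be derived from the efficiency figure alone.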

Original language: English (US)
Title of host publication: Proceedings of 2011 SC - International Conference for High Performance Computing, Networking, Storage and Analysis
DOI: https://doi.org/10.1145/2063384.2063459
State: Published - Dec 14 2011
Event: 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, SC11 - Seattle, WA, United States
Duration: Nov 12 2011 → Nov 18 2011

Other

Other: 2011 International Conference for High Performance Computing, Networking, Storage and Analysis, SC11
Country: United States
City: Seattle, WA
Period: 11/12/11 → 11/18/11

Keywords

  • Data-flow parallel processing
  • Master-slave MPI parallel processing
  • Mixed-mode MPI-OpenMP parallel processing
  • Multi-level MPI collective operations
  • Multi-level workload distribution
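
As a rough illustration of the master-slave work assignment named in the keywords, the sketch below hands out tasks from a shared queue to a pool of workers that pull the next task as soon as they are free. It is a toy stand-in, not DMF's implementation: it uses Python threads from the standard library rather than MPI or OpenMP, and the task (counting occurrences of one k-mer per sequence) and all function names are hypothetical.

```python
import queue
import threading


def count_kmer(sequence, kmer):
    """Count (possibly overlapping) occurrences of kmer in sequence."""
    k = len(kmer)
    return sum(1 for i in range(len(sequence) - k + 1)
               if sequence[i:i + k] == kmer)


def master_worker_count(sequences, kmer, n_workers=4):
    """Master fills a task queue; workers pull tasks until it is empty."""
    tasks = queue.Queue()
    results = queue.Queue()

    # Master side: enqueue every task before any worker starts.
    for idx, seq in enumerate(sequences):
        tasks.put((idx, seq))

    def worker():
        # Worker side: dynamic assignment, one task at a time.
        while True:
            try:
                idx, seq = tasks.get_nowait()
            except queue.Empty:
                return  # no work left
            results.put((idx, count_kmer(seq, kmer)))

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # Master side: gather results back into input order.
    counts = [0] * len(sequences)
    while not results.empty():
        idx, value = results.get()
        counts[idx] = value
    return counts


print(master_worker_count(["ACGTACGT", "TTTT", "ACGACG"], "ACG"))
```

Pulling tasks on demand, rather than splitting the input into fixed equal shares up front, keeps workers busy when individual tasks take uneven amounts of time, which is the usual motivation for this pattern.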

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Computer Science Applications

Cite this

Marchand, B., Bajic, V. B., & Kaushik, D. K. (2011). Highly scalable Ab initio genomic motif identification. In Proceedings of 2011 SC - International Conference for High Performance Computing, Networking, Storage and Analysis [56] https://doi.org/10.1145/2063384.2063459

@inproceedings{4dd62c3f2c7d4b4799b64be47a1d1daf,
title = "Highly scalable Ab initio genomic motif identification",
abstract = "We present results of scaling an ab initio motif family identification system, Dragon Motif Finder (DMF), to 65,536 processor cores of IBM Blue Gene/P. DMF seeks groups of mutually similar polynucleotide patterns within a set of genomic sequences and builds various motif families from them. Such information is relevant to many problems in the life sciences. Prior attempts to scale such ab initio motif-finding algorithms achieved limited success. We solve the scalability issues using a combination of mixed-mode MPI-OpenMP parallel programming, master-slave work assignment, multi-level workload distribution, multi-level MPI collectives, and serial optimizations. The algorithm scaled very well (94{\%} parallel efficiency on 65,536 cores relative to 256 cores on a modest-size problem), and the final speedup with respect to the original serial code exceeded 250,000 once serial optimizations were included. This enabled us to carry out many large-scale ab initio motif-finding simulations in a few hours, whereas the original serial code would have needed decades of execution time.",
keywords = "Data-flow parallel processing, Master-slave MPI parallel processing, Mixed-mode MPI-OpenMP parallel processing, Multi-level MPI collective operations, Multi-level workload distribution",
author = "Benoit Marchand and Bajic, {Vladimir B.} and Kaushik, {Dinesh K.}",
year = "2011",
month = "12",
day = "14",
doi = "10.1145/2063384.2063459",
language = "English (US)",
isbn = "9781450307710",
booktitle = "Proceedings of 2011 SC - International Conference for High Performance Computing, Networking, Storage and Analysis",

}

TY  - GEN
T1  - Highly scalable Ab initio genomic motif identification
AU  - Marchand, Benoit
AU  - Bajic, Vladimir B.
AU  - Kaushik, Dinesh K.
PY  - 2011/12/14
Y1  - 2011/12/14
AB  - We present results of scaling an ab initio motif family identification system, Dragon Motif Finder (DMF), to 65,536 processor cores of IBM Blue Gene/P. DMF seeks groups of mutually similar polynucleotide patterns within a set of genomic sequences and builds various motif families from them. Such information is relevant to many problems in the life sciences. Prior attempts to scale such ab initio motif-finding algorithms achieved limited success. We solve the scalability issues using a combination of mixed-mode MPI-OpenMP parallel programming, master-slave work assignment, multi-level workload distribution, multi-level MPI collectives, and serial optimizations. The algorithm scaled very well (94% parallel efficiency on 65,536 cores relative to 256 cores on a modest-size problem), and the final speedup with respect to the original serial code exceeded 250,000 once serial optimizations were included. This enabled us to carry out many large-scale ab initio motif-finding simulations in a few hours, whereas the original serial code would have needed decades of execution time.
KW  - Data-flow parallel processing
KW  - Master-slave MPI parallel processing
KW  - Mixed-mode MPI-OpenMP parallel processing
KW  - Multi-level MPI collective operations
KW  - Multi-level workload distribution
UR  - http://www.scopus.com/inward/record.url?scp=83155173350&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=83155173350&partnerID=8YFLogxK
U2  - 10.1145/2063384.2063459
DO  - 10.1145/2063384.2063459
M3  - Conference contribution
SN  - 9781450307710
BT  - Proceedings of 2011 SC - International Conference for High Performance Computing, Networking, Storage and Analysis
ER  -