Efficient resource oblivious algorithms for multicores with false sharing

Richard Cole, Vijaya Ramachandran

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

Abstract

We consider algorithms for a multicore environment in which each core has its own private cache and false sharing can occur. False sharing happens when two or more processors access the same block (i.e., cache-line) in parallel, and at least one processor writes into a location in the block. False sharing causes different processors to have inconsistent views of the data in the block, and many of the methods currently used to resolve these inconsistencies can cause large delays. We analyze the cost of false sharing both for variables stored on the execution stacks of the parallel tasks and for output variables. Our main technical contribution is to establish a low cost for this overhead for the class of multithreaded block-resilient HBP (Hierarchical Balanced Parallel) computations. Using this and other techniques, we develop block-resilient HBP algorithms with low false sharing costs for several fundamental problems including scans, matrix multiplication, FFT, sorting, and hybrid block-resilient HBP algorithms for list ranking and graph connected components. Most of these algorithms are derived from known multicore algorithms, but are further refined to achieve a low false sharing overhead. Our algorithms make no mention of machine parameters, and our analysis of the false sharing overhead is mostly in terms of the number of tasks generated in parallel during the computation, and thus applies to a variety of schedulers.
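The definition of false sharing given in the abstract can be illustrated with a small model. The following is only a sketch under stated assumptions, not the authors' analysis or any of their algorithms: the `Access` record, the `contended_blocks` helper, and the block size `B = 8` are hypothetical names and values chosen for illustration. It flags every block that is accessed by two or more processors with at least one write. The padded layout is shown purely for contrast; it depends on knowing the block size, whereas the algorithms described above make no mention of machine parameters.

```python
from collections import defaultdict
from typing import Iterable, NamedTuple, Set


class Access(NamedTuple):
    proc: int    # id of the processor issuing the access (hypothetical model)
    addr: int    # word address being accessed
    write: bool  # True for a write, False for a read


def contended_blocks(accesses: Iterable[Access], block_size: int) -> Set[int]:
    """Return blocks accessed by >= 2 processors with at least one write.

    All accesses are assumed to happen in parallel, which matches the
    condition stated in the abstract.
    """
    procs_per_block = defaultdict(set)
    written = set()
    for a in accesses:
        block = a.addr // block_size  # block (cache line) holding this address
        procs_per_block[block].add(a.proc)
        if a.write:
            written.add(block)
    return {
        block
        for block, procs in procs_per_block.items()
        if len(procs) >= 2 and block in written
    }


B = 8  # assumed block size in words, for illustration only

# Four processors each write their own output word, stored contiguously.
packed = [Access(p, p, True) for p in range(4)]

# The same writes with each output word placed in its own block.
padded = [Access(p, p * B, True) for p in range(4)]

print(contended_blocks(packed, B))  # {0}
print(contended_blocks(padded, B))  # set()
```

In the packed layout all four output words fall in block 0, so every write contends with the others even though no two processors touch the same word. In the padded layout each processor owns its block and no block is flagged.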

Original language: English (US)
Title of host publication: Proceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium, IPDPS 2012
Pages: 201-214
Number of pages: 14
DOI: https://doi.org/10.1109/IPDPS.2012.28
State: Published - 2012
Event: 2012 IEEE 26th International Parallel and Distributed Processing Symposium, IPDPS 2012 - Shanghai, China
Duration: May 21 2012 to May 25 2012

Other

Other: 2012 IEEE 26th International Parallel and Distributed Processing Symposium, IPDPS 2012
Country: China
City: Shanghai
Period: 5/21/12 to 5/25/12

Fingerprint

Parallel algorithms
Costs
Sorting
Fast Fourier transforms

Keywords

  • cache-efficiency
  • false-sharing
  • multicores

ASJC Scopus subject areas

  • Software

Cite this

Cole, R., & Ramachandran, V. (2012). Efficient resource oblivious algorithms for multicores with false sharing. In Proceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium, IPDPS 2012 (pp. 201-214). [6267836] https://doi.org/10.1109/IPDPS.2012.28

@inproceedings{c180e2b667c94cab914dc82c36765e2c,
title = "Efficient resource oblivious algorithms for multicores with false sharing",
abstract = "We consider algorithms for a multicore environment in which each core has its own private cache and false sharing can occur. False sharing happens when two or more processors access the same block (i.e., cache-line) in parallel, and at least one processor writes into a location in the block. False sharing causes different processors to have inconsistent views of the data in the block, and many of the methods currently used to resolve these inconsistencies can cause large delays. We analyze the cost of false sharing both for variables stored on the execution stacks of the parallel tasks and for output variables. Our main technical contribution is to establish a low cost for this overhead for the class of multithreaded block-resilient HBP (Hierarchical Balanced Parallel) computations. Using this and other techniques, we develop block-resilient HBP algorithms with low false sharing costs for several fundamental problems including scans, matrix multiplication, FFT, sorting, and hybrid block-resilient HBP algorithms for list ranking and graph connected components. Most of these algorithms are derived from known multicore algorithms, but are further refined to achieve a low false sharing overhead. Our algorithms make no mention of machine parameters, and our analysis of the false sharing overhead is mostly in terms of the number of tasks generated in parallel during the computation, and thus applies to a variety of schedulers.",
keywords = "cache-efficiency, false-sharing, multicores",
author = "Richard Cole and Vijaya Ramachandran",
year = "2012",
doi = "10.1109/IPDPS.2012.28",
language = "English (US)",
isbn = "9780769546759",
pages = "201--214",
booktitle = "Proceedings of the 2012 IEEE 26th International Parallel and Distributed Processing Symposium, IPDPS 2012",

}
