Resource oblivious sorting on multicores

Richard Cole, Vijaya Ramachandran

Research output: Contribution to journal, Article

Abstract

We present a deterministic sorting algorithm, Sample, Partition, and Merge Sort (SPMS), that interleaves the partitioning of a sample sort with merging. Sequentially, it sorts n elements in O(n log n) time cache-obliviously with an optimal number of cache misses. The parallel complexity (or critical path length) of the algorithm is O(log n log log n), which improves on previous bounds for deterministic sample sort. The algorithm also has low false sharing costs. When scheduled by a work-stealing scheduler in a multicore computing environment with a global shared memory and p cores, each having a cache of size M organized in blocks of size B, the costs of the additional cache misses and false sharing misses due to this parallel execution are bounded by the cost of O(S M/B) and O(S B) cache misses, respectively, where S is the number of steals performed during the execution. Finally, SPMS is resource oblivious in that the dependence on machine parameters appears only in the analysis of its performance and not within the algorithm itself.
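
For orientation only, the sample-then-partition idea that the abstract builds on can be sketched as a plain sequential sample sort. This is not SPMS: it does not interleave merging, it is not parallel, and it makes no cache-oblivious guarantees. The names `sample_sort` and `base_size`, and the choice of a roughly sqrt(n)-spaced deterministic sample, are assumptions made for illustration and do not come from the paper.

```python
from bisect import bisect_right


def sample_sort(items, base_size=16):
    """Sequential sketch of a deterministic sample sort (illustrative, not SPMS)."""
    n = len(items)
    if n <= base_size:
        return sorted(items)

    # Deterministic sample: every step-th element, with step about sqrt(n).
    step = max(2, int(n ** 0.5))
    pivots = sample_sort(items[::step], base_size)

    # Partition: an element goes to the bucket indexed by the number of
    # pivots that are less than or equal to it.
    buckets = [[] for _ in range(len(pivots) + 1)]
    for x in items:
        buckets[bisect_right(pivots, x)].append(x)

    result = []
    for bucket in buckets:
        if len(bucket) == n:
            # No progress (e.g. all keys equal): fall back to the built-in sort.
            result.extend(sorted(bucket))
        else:
            result.extend(sample_sort(bucket, base_size))
    return result
```

Calling `sample_sort(data)` on a list returns a new sorted list and leaves `data` unchanged. The fallback branch only exists to guarantee termination on inputs with many duplicate keys.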

Original language: English (US)
Article number: a23
Journal: ACM Transactions on Parallel Computing
Volume: 3
Issue number: 4
DOI: 10.1145/3040221
State: Published - Mar 1 2017

Keywords

  • Cache oblivious
  • merge sort
  • Sample sort
  • Sorting

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Computer Science Applications
  • Hardware and Architecture
  • Software
  • Modeling and Simulation

Cite this

Resource oblivious sorting on multicores. / Cole, Richard; Ramachandran, Vijaya.

In: ACM Transactions on Parallel Computing, Vol. 3, No. 4, a23, 01.03.2017.

@article{abe001caaa5449f3982a9889568f59fa,
title = "Resource oblivious sorting on multicores",
abstract = "We present a deterministic sorting algorithm, Sample, Partition, and Merge Sort (SPMS), that interleaves the partitioning of a sample sort with merging. Sequentially, it sorts n elements in O(n log n) time cache-obliviously with an optimal number of cache misses. The parallel complexity (or critical path length) of the algorithm is O(log n log log n), which improves on previous bounds for deterministic sample sort. The algorithm also has low false sharing costs. When scheduled by a work-stealing scheduler in a multicore computing environment with a global shared memory and p cores, each having a cache of size M organized in blocks of size B, the costs of the additional cache misses and false sharing misses due to this parallel execution are bounded by the cost of O(S M/B) and O(S B) cache misses, respectively, where S is the number of steals performed during the execution. Finally, SPMS is resource oblivious in that the dependence on machine parameters appears only in the analysis of its performance and not within the algorithm itself.",
keywords = "Cache oblivious, merge sort, Sample sort, Sorting",
author = "Richard Cole and Vijaya Ramachandran",
year = "2017",
month = "3",
day = "1",
doi = "10.1145/3040221",
language = "English (US)",
volume = "3",
journal = "ACM Transactions on Parallel Computing",
issn = "2329-4949",
publisher = "Association for Computing Machinery (ACM)",
number = "4",
}

TY - JOUR

T1 - Resource oblivious sorting on multicores

AU - Cole, Richard

AU - Ramachandran, Vijaya

PY - 2017/3/1

Y1 - 2017/3/1

AB - We present a deterministic sorting algorithm, Sample, Partition, and Merge Sort (SPMS), that interleaves the partitioning of a sample sort with merging. Sequentially, it sorts n elements in O(n log n) time cache-obliviously with an optimal number of cache misses. The parallel complexity (or critical path length) of the algorithm is O(log n log log n), which improves on previous bounds for deterministic sample sort. The algorithm also has low false sharing costs. When scheduled by a work-stealing scheduler in a multicore computing environment with a global shared memory and p cores, each having a cache of size M organized in blocks of size B, the costs of the additional cache misses and false sharing misses due to this parallel execution are bounded by the cost of O(S M/B) and O(S B) cache misses, respectively, where S is the number of steals performed during the execution. Finally, SPMS is resource oblivious in that the dependence on machine parameters appears only in the analysis of its performance and not within the algorithm itself.

KW - Cache oblivious

KW - merge sort

KW - Sample sort

KW - Sorting

UR - http://www.scopus.com/inward/record.url?scp=85054897853&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85054897853&partnerID=8YFLogxK

U2 - 10.1145/3040221

DO - 10.1145/3040221

M3 - Article

VL - 3

JO - ACM Transactions on Parallel Computing

JF - ACM Transactions on Parallel Computing

SN - 2329-4949

IS - 4

M1 - a23

ER -