### Abstract

We present a deterministic sorting algorithm, Sample, Partition, and Merge Sort (SPMS), that interleaves the partitioning of a sample sort with merging. Sequentially, it sorts n elements in O(n log n) time cache-obliviously with an optimal number of cache misses. The parallel complexity (or critical path length) of the algorithm is O(log n log log n), which improves on previous bounds for deterministic sample sort. The algorithm also has low false sharing costs. When scheduled by a work-stealing scheduler in a multicore computing environment with a global shared memory and p cores, each having a cache of size M organized in blocks of size B, the costs of the additional cache misses and false sharing misses due to this parallel execution are bounded by the cost of O(S M/B) and O(S B) cache misses, respectively, where S is the number of steals performed during the execution. Finally, SPMS is resource oblivious in that the dependence on machine parameters appears only in the analysis of its performance and not within the algorithm itself.
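For readers unfamiliar with the sample-sort idea that SPMS builds on, the following is a minimal sequential sketch, not the SPMS algorithm itself. It assumes a randomized sample with a fixed seed (SPMS is deterministic), does not interleave partitioning with merging, and makes no attempt at cache-obliviousness or parallelism. The function name `sample_sort` and the parameters `bucket_count` and `cutoff` are hypothetical and chosen only for illustration.

```python
import random
from bisect import bisect_right


def sample_sort(items, bucket_count=4, cutoff=16, rng=None):
    """Plain sequential sample sort; an illustration only, not SPMS."""
    rng = rng or random.Random(0)  # fixed seed: an assumption for repeatability
    items = list(items)
    if len(items) <= cutoff:
        return sorted(items)  # small inputs: fall back to the built-in sort

    # Draw a small sample and sort it to obtain evenly spaced pivots.
    sample = sorted(rng.sample(items, min(len(items), 4 * bucket_count)))
    step = len(sample) // bucket_count
    pivots = [sample[i * step] for i in range(1, bucket_count)]

    # Partition: route each element to the bucket delimited by the pivots.
    buckets = [[] for _ in range(bucket_count)]
    for x in items:
        buckets[bisect_right(pivots, x)].append(x)

    # If one bucket received everything (e.g. all keys equal), stop recursing.
    if any(len(b) == len(items) for b in buckets):
        return sorted(items)

    # Buckets are ordered relative to each other, so sorting each bucket
    # and concatenating the results yields the fully sorted sequence.
    result = []
    for b in buckets:
        result.extend(sample_sort(b, bucket_count, cutoff, rng))
    return result
```

Calling `sample_sort(data)` returns a new sorted list and leaves `data` unchanged. The sketch only shows the sample-then-partition structure; the bounds stated in the abstract apply to SPMS, not to this code.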

Original language | English (US) |
---|---|
Article number | a23 |
Journal | ACM Transactions on Parallel Computing |
Volume | 3 |
Issue number | 4 |
DOIs | https://doi.org/10.1145/3040221 |
State | Published - Mar 1 2017 |

### Keywords

- Cache oblivious
- Merge sort
- Sample sort
- Sorting

### ASJC Scopus subject areas

- Computational Theory and Mathematics
- Computer Science Applications
- Hardware and Architecture
- Software
- Modeling and Simulation

### Cite this

Cole, R., & Ramachandran, V. (2017). Resource oblivious sorting on multicores. *ACM Transactions on Parallel Computing*, *3*(4), [a23]. https://doi.org/10.1145/3040221

Research output: Contribution to journal › Article

TY - JOUR

T1 - Resource oblivious sorting on multicores

AU - Cole, Richard

AU - Ramachandran, Vijaya

PY - 2017/3/1

Y1 - 2017/3/1

KW - Cache oblivious

KW - merge sort

KW - Sample sort

KW - Sorting

UR - http://www.scopus.com/inward/record.url?scp=85054897853&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85054897853&partnerID=8YFLogxK

U2 - 10.1145/3040221

DO - 10.1145/3040221

M3 - Article

VL - 3

JO - ACM Transactions on Parallel Computing

JF - ACM Transactions on Parallel Computing

SN - 2329-4949

IS - 4

M1 - a23

ER -