Fast four-way parallel radix sorting on GPUs

Linh Ha, Jens Krüger, Cláudio T. Silva

Research output: Contribution to journal, Article

Abstract

Efficient sorting is a key requirement for many computer science algorithms. Accelerating existing techniques and developing new sorting approaches are crucial for many real-time graphics scenarios, database systems, and numerical simulations, to name just a few. Sorting is one of the most fundamental operations for organizing and filtering the ever-growing, massive amounts of data gathered on a daily basis. While optimal sorting models for serial execution on a single processor exist, efficient parallel sorting remains a challenge. In this paper, we present a hardware-optimized parallel implementation of the radix sort algorithm that results in a significant speedup over existing sorting implementations. We outperform all known Graphics Processing Unit (GPU) based sorting systems by about a factor of two and eliminate restrictions on the sorting key space. This makes our algorithm not only the fastest, but also the first general GPU sorting solution.

Original language: English (US)
Pages (from-to): 2368-2378
Number of pages: 11
Journal: Computer Graphics Forum
Volume: 28
Issue number: 8
DOI: 10.1111/j.1467-8659.2009.01542.x
State: Published - Dec 2009

Fingerprint

Sorting
Graphics processing unit
Processing
Computer science
Hardware
Computer simulation

Keywords

  • Collision detection
  • GPGPU
  • GPU sorting
  • HPC
  • Parallel sorting

ASJC Scopus subject areas

  • Computer Networks and Communications

Cite this

Fast four-way parallel radix sorting on GPUs. / Ha, Linh; Krüger, Jens; Silva, Cláudio T.

In: Computer Graphics Forum, Vol. 28, No. 8, 12.2009, p. 2368-2378.

Ha, Linh ; Krüger, Jens ; Silva, Cláudio T. / Fast four-way parallel radix sorting on GPUs. In: Computer Graphics Forum. 2009 ; Vol. 28, No. 8. pp. 2368-2378.
@article{b7395438b4504d5badf790fa1eca90c7,
title = "Fast four-way parallel radix sorting on GPUs",
abstract = "Efficient sorting is a key requirement for many computer science algorithms. Accelerating existing techniques and developing new sorting approaches are crucial for many real-time graphics scenarios, database systems, and numerical simulations, to name just a few. Sorting is one of the most fundamental operations for organizing and filtering the ever-growing, massive amounts of data gathered on a daily basis. While optimal sorting models for serial execution on a single processor exist, efficient parallel sorting remains a challenge. In this paper, we present a hardware-optimized parallel implementation of the radix sort algorithm that results in a significant speedup over existing sorting implementations. We outperform all known Graphics Processing Unit (GPU) based sorting systems by about a factor of two and eliminate restrictions on the sorting key space. This makes our algorithm not only the fastest, but also the first general GPU sorting solution.",
keywords = "Collision detection, GPGPU, GPU sorting, HPC, Parallel sorting",
author = "Ha, Linh and Kr{\"u}ger, Jens and Silva, {Cl{\'a}udio T.}",
year = "2009",
month = "12",
doi = "10.1111/j.1467-8659.2009.01542.x",
language = "English (US)",
volume = "28",
pages = "2368--2378",
journal = "Computer Graphics Forum",
issn = "0167-7055",
publisher = "Wiley-Blackwell",
number = "8",
}

TY - JOUR

T1 - Fast four-way parallel radix sorting on GPUs

AU - Ha, Linh

AU - Krüger, Jens

AU - Silva, Cláudio T.

PY - 2009/12

Y1 - 2009/12

AB - Efficient sorting is a key requirement for many computer science algorithms. Accelerating existing techniques and developing new sorting approaches are crucial for many real-time graphics scenarios, database systems, and numerical simulations, to name just a few. Sorting is one of the most fundamental operations for organizing and filtering the ever-growing, massive amounts of data gathered on a daily basis. While optimal sorting models for serial execution on a single processor exist, efficient parallel sorting remains a challenge. In this paper, we present a hardware-optimized parallel implementation of the radix sort algorithm that results in a significant speedup over existing sorting implementations. We outperform all known Graphics Processing Unit (GPU) based sorting systems by about a factor of two and eliminate restrictions on the sorting key space. This makes our algorithm not only the fastest, but also the first general GPU sorting solution.

KW - Collision detection

KW - GPGPU

KW - GPU sorting

KW - HPC

KW - Parallel sorting

UR - http://www.scopus.com/inward/record.url?scp=72249121198&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=72249121198&partnerID=8YFLogxK

U2 - 10.1111/j.1467-8659.2009.01542.x

DO - 10.1111/j.1467-8659.2009.01542.x

M3 - Article

VL - 28

SP - 2368

EP - 2378

JO - Computer Graphics Forum

JF - Computer Graphics Forum

SN - 0167-7055

IS - 8

ER -