On universal classes of extremely random constant-time hash functions

Research output: Contribution to journal › Article

Abstract

A family of functions F that map [0, m - 1] into [0, n - 1] is said to be k-wise independent if any tuple of k distinct points in [0, m - 1] has a corresponding image, for a randomly selected f ∈ F, that is uniformly distributed in [0, n - 1]^k. This paper shows that for suitably fixed ε < 1 and any k < m^ε, there are families of k-wise independent functions that can be evaluated in constant time for the standard random access model of computation. It is also proven that any such family requires a storage array of m^δ random seeds for a suitable δ < 1. These seeds can be pseudorandom values precomputed from an initial O(k) random seeds. A simple adaptation yields n^ε-wise independent functions that require n^δ storage in many cases where m ≫ n. Lower bounds are presented to show that neither storage requirement can be materially reduced. Previous constructions of random functions having constant evaluation time and sublinear storage exhibited only a constant degree of independence. Unfortunately, the explicit randomized constructions, while requiring a constant number of operations, are far too slow for any practical application. However, nonconstructive existence arguments are given, which suggest that this factor might be eliminated. The problem of eliminating this factor is shown to be equivalent to a fundamental open question in graph theory. As a consequence of these constructions, many probabilistic algorithms, from traditional hashing to Ranade's emulation of common PRAM algorithms, can for the first time be shown to achieve, up to constant factors, their expected asymptotic performance for a programmable, albeit formal and currently impractical, model of computation, and a research direction is now available that may eventually lead to implementations that are fast and provably sound.
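
The definition of k-wise independence above can be checked exhaustively on a toy example. The sketch below is not the paper's constant-time construction: it uses the classical family of polynomials of degree less than k over a prime field, which takes O(k) time per evaluation and assumes m = n = p for a prime p. The names `poly_hash` and `image_counts` are hypothetical, chosen only for this illustration.

```python
from itertools import product


def poly_hash(coeffs, x, p):
    """Evaluate a0 + a1*x + ... + a_{k-1}*x^(k-1) modulo the prime p (Horner's rule)."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % p
    return acc


def image_counts(points, k, p):
    """For every function in the family (one per coefficient tuple in Z_p^k),
    record the image of the given points and count how often each image occurs."""
    counts = {}
    for coeffs in product(range(p), repeat=k):
        image = tuple(poly_hash(coeffs, x, p) for x in points)
        counts[image] = counts.get(image, 0) + 1
    return counts


if __name__ == "__main__":
    # Assumed toy parameters: p = 5, k = 2, and two distinct points in [0, 4].
    counts = image_counts((1, 3), k=2, p=5)
    # Uniformity on [0, p - 1]^k means every image tuple is hit equally often.
    print(len(counts), set(counts.values()))
```

If every image tuple in [0, p - 1]^k occurs equally often over the whole family, then the image of the chosen points under a uniformly random member is uniformly distributed, which is exactly the condition in the definition. For this family that follows from polynomial interpolation: k distinct points and k target values determine exactly one polynomial of degree less than k.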

Original language: English (US)
Pages (from-to): 505-543
Number of pages: 39
Journal: SIAM Journal on Computing
Volume: 33
Issue number: 3
DOIs: 10.1137/S0097539701386216
State: Published - 2004

Keywords

  • Hash functions
  • Hashing
  • Limited independence
  • Optimal speedup
  • PRAM emulation
  • Storage-time tradeoff
  • Universal hash functions

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computational Theory and Mathematics
  • Applied Mathematics

Cite this

On universal classes of extremely random constant-time hash functions. / Siegel, Alan.

In: SIAM Journal on Computing, Vol. 33, No. 3, 2004, p. 505-543.

@article{fb7024b646b64c1197f276336a910e19,
title = "On universal classes of extremely random constant-time hash functions",
abstract = "A family of functions $F$ that map $[0, m-1]$ into $[0, n-1]$ is said to be $k$-wise independent if any tuple of $k$ distinct points in $[0, m-1]$ has a corresponding image, for a randomly selected $f \in F$, that is uniformly distributed in $[0, n-1]^k$. This paper shows that for suitably fixed $\epsilon < 1$ and any $k < m^{\epsilon}$, there are families of $k$-wise independent functions that can be evaluated in constant time for the standard random access model of computation. It is also proven that any such family requires a storage array of $m^{\delta}$ random seeds for a suitable $\delta < 1$. These seeds can be pseudorandom values precomputed from an initial $O(k)$ random seeds. A simple adaptation yields $n^{\epsilon}$-wise independent functions that require $n^{\delta}$ storage in many cases where $m \gg n$. Lower bounds are presented to show that neither storage requirement can be materially reduced. Previous constructions of random functions having constant evaluation time and sublinear storage exhibited only a constant degree of independence. Unfortunately, the explicit randomized constructions, while requiring a constant number of operations, are far too slow for any practical application. However, nonconstructive existence arguments are given, which suggest that this factor might be eliminated. The problem of eliminating this factor is shown to be equivalent to a fundamental open question in graph theory. As a consequence of these constructions, many probabilistic algorithms, from traditional hashing to Ranade's emulation of common PRAM algorithms, can for the first time be shown to achieve, up to constant factors, their expected asymptotic performance for a programmable, albeit formal and currently impractical, model of computation, and a research direction is now available that may eventually lead to implementations that are fast and provably sound.",
keywords = "Hash functions, Hashing, Limited independence, Optimal speedup, PRAM emulation, Storage-time tradeoff, Universal hash functions",
author = "Alan Siegel",
year = "2004",
doi = "10.1137/S0097539701386216",
language = "English (US)",
volume = "33",
pages = "505--543",
journal = "SIAM Journal on Computing",
issn = "0097-5397",
publisher = "Society for Industrial and Applied Mathematics Publications",
number = "3",

}

TY - JOUR

T1 - On universal classes of extremely random constant-time hash functions

AU - Siegel, Alan

PY - 2004

Y1 - 2004

N2 - A family of functions F that map [0, m - 1] into [0, n - 1] is said to be k-wise independent if any tuple of k distinct points in [0, m - 1] has a corresponding image, for a randomly selected f ∈ F, that is uniformly distributed in [0, n - 1]^k. This paper shows that for suitably fixed ε < 1 and any k < m^ε, there are families of k-wise independent functions that can be evaluated in constant time for the standard random access model of computation. It is also proven that any such family requires a storage array of m^δ random seeds for a suitable δ < 1. These seeds can be pseudorandom values precomputed from an initial O(k) random seeds. A simple adaptation yields n^ε-wise independent functions that require n^δ storage in many cases where m ≫ n. Lower bounds are presented to show that neither storage requirement can be materially reduced. Previous constructions of random functions having constant evaluation time and sublinear storage exhibited only a constant degree of independence. Unfortunately, the explicit randomized constructions, while requiring a constant number of operations, are far too slow for any practical application. However, nonconstructive existence arguments are given, which suggest that this factor might be eliminated. The problem of eliminating this factor is shown to be equivalent to a fundamental open question in graph theory. As a consequence of these constructions, many probabilistic algorithms, from traditional hashing to Ranade's emulation of common PRAM algorithms, can for the first time be shown to achieve, up to constant factors, their expected asymptotic performance for a programmable, albeit formal and currently impractical, model of computation, and a research direction is now available that may eventually lead to implementations that are fast and provably sound.

AB - A family of functions F that map [0, m - 1] into [0, n - 1] is said to be k-wise independent if any tuple of k distinct points in [0, m - 1] has a corresponding image, for a randomly selected f ∈ F, that is uniformly distributed in [0, n - 1]^k. This paper shows that for suitably fixed ε < 1 and any k < m^ε, there are families of k-wise independent functions that can be evaluated in constant time for the standard random access model of computation. It is also proven that any such family requires a storage array of m^δ random seeds for a suitable δ < 1. These seeds can be pseudorandom values precomputed from an initial O(k) random seeds. A simple adaptation yields n^ε-wise independent functions that require n^δ storage in many cases where m ≫ n. Lower bounds are presented to show that neither storage requirement can be materially reduced. Previous constructions of random functions having constant evaluation time and sublinear storage exhibited only a constant degree of independence. Unfortunately, the explicit randomized constructions, while requiring a constant number of operations, are far too slow for any practical application. However, nonconstructive existence arguments are given, which suggest that this factor might be eliminated. The problem of eliminating this factor is shown to be equivalent to a fundamental open question in graph theory. As a consequence of these constructions, many probabilistic algorithms, from traditional hashing to Ranade's emulation of common PRAM algorithms, can for the first time be shown to achieve, up to constant factors, their expected asymptotic performance for a programmable, albeit formal and currently impractical, model of computation, and a research direction is now available that may eventually lead to implementations that are fast and provably sound.

KW - Hash functions

KW - Hashing

KW - Limited independence

KW - Optimal speedup

KW - PRAM emulation

KW - Storage-time tradeoff

KW - Universal hash functions

UR - http://www.scopus.com/inward/record.url?scp=3142708934&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=3142708934&partnerID=8YFLogxK

U2 - 10.1137/S0097539701386216

DO - 10.1137/S0097539701386216

M3 - Article

AN - SCOPUS:3142708934

VL - 33

SP - 505

EP - 543

JO - SIAM Journal on Computing

JF - SIAM Journal on Computing

SN - 0097-5397

IS - 3

ER -