### Abstract

A family of functions F that map [0, m - 1] into [0, n - 1] is said to be k-wise independent if any tuple of k distinct points in [0, m - 1] has a corresponding image, for a randomly selected f ∈ F, that is uniformly distributed in [0, n - 1]^k. This paper shows that for suitably fixed ε < 1 and any k < m^ε, there are families of k-wise independent functions that can be evaluated in constant time for the standard random access model of computation. It is also proven that any such family requires a storage array of m^δ random seeds for a suitable δ < 1. These seeds can be pseudorandom values precomputed from an initial O(k) random seeds. A simple adaptation yields n^ε-wise independent functions that require n^δ storage in many cases where m ≫ n. Lower bounds are presented to show that neither storage requirement can be materially reduced. Previous constructions of random functions having constant evaluation time and sublinear storage exhibited only a constant degree of independence. Unfortunately, the explicit randomized constructions, while requiring a constant number of operations, are far too slow for any practical application. However, nonconstructive existence arguments are given, which suggest that this factor might be eliminated. The problem of eliminating this factor is shown to be equivalent to a fundamental open question in graph theory. As a consequence of these constructions, many probabilistic algorithms, from traditional hashing to Ranade's emulation of common PRAM algorithms, can for the first time be shown to achieve, up to constant factors, their expected asymptotic performance for a programmable, albeit formal and currently impractical, model of computation, and a research direction is now available that may eventually lead to implementations that are fast and provably sound.
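
The definition of k-wise independence above can be checked by brute force on a toy example. The sketch below is not the construction from the paper: it assumes the classical family of polynomials with k coefficients over a prime field Z_p (so m = n = p), which needs O(k) arithmetic operations per evaluation rather than constant time. The names `poly_eval` and `check_k_wise` are hypothetical helpers introduced only for this illustration.

```python
from collections import Counter
from itertools import combinations, product


def poly_eval(coeffs, x, p):
    """Evaluate coeffs[0] + coeffs[1]*x + ... at x modulo p (Horner's rule)."""
    acc = 0
    for c in reversed(coeffs):
        acc = (acc * x + c) % p
    return acc


def check_k_wise(p, num_coeffs, k):
    """Brute-force test of k-wise independence for the family of all
    polynomials with num_coeffs coefficients over Z_p (assumed prime),
    viewed as maps from [0, p - 1] into [0, p - 1].

    For every set of k distinct points, the images under all functions in
    the family must hit each tuple in [0, p - 1]^k equally often.
    """
    family = list(product(range(p), repeat=num_coeffs))  # one function per coefficient tuple
    expected = len(family) // p**k  # count per image tuple if the distribution is uniform
    for points in combinations(range(p), k):
        counts = Counter(
            tuple(poly_eval(coeffs, x, p) for x in points) for coeffs in family
        )
        if len(counts) != p**k or any(c != expected for c in counts.values()):
            return False
    return True


if __name__ == "__main__":
    print(check_k_wise(5, 2, 2))  # two coefficients, pairwise test
    print(check_k_wise(5, 2, 3))  # two coefficients, 3-wise test
```

With p = 5, the two-coefficient family passes the pairwise test but cannot pass the 3-wise test, since it contains only 25 functions while [0, 4]^3 has 125 tuples. The exhaustive enumeration is feasible only for very small p and k and is meant purely to make the definition concrete.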

Original language | English (US) |
---|---|
Pages (from-to) | 505-543 |
Number of pages | 39 |
Journal | SIAM Journal on Computing |
Volume | 33 |
Issue number | 3 |
DOIs | https://doi.org/10.1137/S0097539701386216 |
State | Published - 2004 |

### Keywords

- Hash functions
- Hashing
- Limited independence
- Optimal speedup
- PRAM emulation
- Storage-time tradeoff
- Universal hash functions

### ASJC Scopus subject areas

- Theoretical Computer Science
- Computational Theory and Mathematics
- Applied Mathematics

### Cite this

**On universal classes of extremely random constant-time hash functions.** / Siegel, Alan.

Research output: Contribution to journal › Article

*SIAM Journal on Computing*, vol. 33, no. 3, pp. 505-543. https://doi.org/10.1137/S0097539701386216

TY - JOUR

T1 - On universal classes of extremely random constant-time hash functions

AU - Siegel, Alan

PY - 2004

Y1 - 2004

KW - Hash functions

KW - Hashing

KW - Limited independence

KW - Optimal speedup

KW - PRAM emulation

KW - Storage-time tradeoff

KW - Universal hash functions

UR - http://www.scopus.com/inward/record.url?scp=3142708934&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=3142708934&partnerID=8YFLogxK

U2 - 10.1137/S0097539701386216

DO - 10.1137/S0097539701386216

M3 - Article

AN - SCOPUS:3142708934

VL - 33

SP - 505

EP - 543

JO - SIAM Journal on Computing

JF - SIAM Journal on Computing

SN - 0097-5397

IS - 3

ER -