Lower bounds on locality sensitive hashing

Rajeev Motwani, Assaf Naor, Rina Panigrahy

Research output: Contribution to journal › Article

Abstract

Given a metric space (X, d_X), c ≥ 1, r > 0, and p, q ∈ [0, 1], a distribution over mappings ℋ : X → ℕ is called an (r, cr, p, q)-sensitive hash family if any two points in X at distance at most r are mapped by ℋ to the same value with probability at least p, and any two points at distance greater than cr are mapped by ℋ to the same value with probability at most q. This notion was introduced by Indyk and Motwani in 1998 as the basis for an efficient approximate nearest neighbor search algorithm and has since been used extensively for this purpose. The performance of these algorithms is governed by the parameter ρ = log(1/p)/log(1/q), and constructing hash families with small ρ automatically yields improved nearest neighbor algorithms. Here we show that for X = ℓ₁ it is impossible to achieve ρ ≤ 1/(2c). This almost matches the construction of Indyk and Motwani, which achieves ρ ≤ 1/c.
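
To make the parameter ρ concrete, the following is a minimal sketch that evaluates ρ = log(1/p)/log(1/q) for one well-known family: bit sampling on the Hamming cube {0,1}^d (a subset of ℓ₁), where a hash function returns a uniformly random coordinate and two points at Hamming distance t collide with probability 1 - t/d. The function names `rho` and `bit_sampling_rho` and the values of d, r, and c are illustrative assumptions, not taken from the paper; the sketch only compares the resulting ρ with the two thresholds 1/(2c) and 1/c quoted in the abstract.

```python
import math


def rho(p: float, q: float) -> float:
    """Return rho = log(1/p) / log(1/q) for collision probabilities 0 < q < p < 1."""
    if not (0.0 < q < p < 1.0):
        raise ValueError("expected 0 < q < p < 1")
    return math.log(1.0 / p) / math.log(1.0 / q)


def bit_sampling_rho(d: int, r: float, c: float) -> float:
    """rho for bit sampling on {0,1}^d (assumed example family).

    h(x) = x_i for a uniformly random coordinate i, so two points at
    Hamming distance t collide with probability 1 - t/d.
    """
    p = 1.0 - r / d        # collision probability at distance r
    q = 1.0 - c * r / d    # collision probability at distance c*r
    return rho(p, q)


if __name__ == "__main__":
    d, r = 1000, 10  # illustrative dimension and radius
    for c in (1.5, 2.0, 4.0):
        value = bit_sampling_rho(d, r, c)
        print(f"c={c}: rho={value:.4f}, 1/(2c)={1 / (2 * c):.4f}, 1/c={1 / c:.4f}")
```

For these illustrative parameters the computed ρ lies just below 1/c and above 1/(2c), which is consistent with the gap between the upper and lower bounds described in the abstract.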

Original language: English (US)
Pages (from-to): 930-935
Number of pages: 6
Journal: SIAM Journal on Discrete Mathematics
Volume: 21
Issue number: 4
DOIs: https://doi.org/10.1137/050646858
State: Published - 2007

Fingerprint

Hashing
Locality
Lower bound
Nearest Neighbor Search
Search Algorithm
Metric space
Nearest Neighbor
Family

Keywords

  • Locality sensitive hashing
  • Lower bounds
  • Nearest neighbor search

ASJC Scopus subject areas

  • Mathematics(all)

Cite this

Lower bounds on locality sensitive hashing. / Motwani, Rajeev; Naor, Assaf; Panigrahy, Rina.

In: SIAM Journal on Discrete Mathematics, Vol. 21, No. 4, 2007, p. 930-935.

Motwani, R, Naor, A & Panigrahy, R 2007, 'Lower bounds on locality sensitive hashing', SIAM Journal on Discrete Mathematics, vol. 21, no. 4, pp. 930-935. https://doi.org/10.1137/050646858
Motwani, Rajeev ; Naor, Assaf ; Panigrahy, Rina. / Lower bounds on locality sensitive hashing. In: SIAM Journal on Discrete Mathematics. 2007 ; Vol. 21, No. 4. pp. 930-935.
@article{409c7352315446f2873dd7aa7073e36b,
title = "Lower bounds on locality sensitive hashing",
abstract = "Given a metric space (X, d_X), c ≥ 1, r > 0, and p, q ∈ [0, 1], a distribution over mappings ℋ : X → ℕ is called an (r, cr, p, q)-sensitive hash family if any two points in X at distance at most r are mapped by ℋ to the same value with probability at least p, and any two points at distance greater than cr are mapped by ℋ to the same value with probability at most q. This notion was introduced by Indyk and Motwani in 1998 as the basis for an efficient approximate nearest neighbor search algorithm and has since been used extensively for this purpose. The performance of these algorithms is governed by the parameter ρ = log(1/p)/log(1/q), and constructing hash families with small ρ automatically yields improved nearest neighbor algorithms. Here we show that for X = ℓ₁ it is impossible to achieve ρ ≤ 1/(2c). This almost matches the construction of Indyk and Motwani, which achieves ρ ≤ 1/c.",
keywords = "Locality sensitive hashing, Lower bounds, Nearest neighbor search",
author = "Rajeev Motwani and Assaf Naor and Rina Panigrahy",
year = "2007",
doi = "10.1137/050646858",
language = "English (US)",
volume = "21",
pages = "930--935",
journal = "SIAM Journal on Discrete Mathematics",
issn = "0895-4801",
publisher = "Society for Industrial and Applied Mathematics Publications",
number = "4",

}

TY - JOUR

T1 - Lower bounds on locality sensitive hashing

AU - Motwani, Rajeev

AU - Naor, Assaf

AU - Panigrahy, Rina

PY - 2007

Y1 - 2007

N2 - Given a metric space (X, d_X), c ≥ 1, r > 0, and p, q ∈ [0, 1], a distribution over mappings ℋ : X → ℕ is called an (r, cr, p, q)-sensitive hash family if any two points in X at distance at most r are mapped by ℋ to the same value with probability at least p, and any two points at distance greater than cr are mapped by ℋ to the same value with probability at most q. This notion was introduced by Indyk and Motwani in 1998 as the basis for an efficient approximate nearest neighbor search algorithm and has since been used extensively for this purpose. The performance of these algorithms is governed by the parameter ρ = log(1/p)/log(1/q), and constructing hash families with small ρ automatically yields improved nearest neighbor algorithms. Here we show that for X = ℓ₁ it is impossible to achieve ρ ≤ 1/(2c). This almost matches the construction of Indyk and Motwani, which achieves ρ ≤ 1/c.

AB - Given a metric space (X, d_X), c ≥ 1, r > 0, and p, q ∈ [0, 1], a distribution over mappings ℋ : X → ℕ is called an (r, cr, p, q)-sensitive hash family if any two points in X at distance at most r are mapped by ℋ to the same value with probability at least p, and any two points at distance greater than cr are mapped by ℋ to the same value with probability at most q. This notion was introduced by Indyk and Motwani in 1998 as the basis for an efficient approximate nearest neighbor search algorithm and has since been used extensively for this purpose. The performance of these algorithms is governed by the parameter ρ = log(1/p)/log(1/q), and constructing hash families with small ρ automatically yields improved nearest neighbor algorithms. Here we show that for X = ℓ₁ it is impossible to achieve ρ ≤ 1/(2c). This almost matches the construction of Indyk and Motwani, which achieves ρ ≤ 1/c.

KW - Locality sensitive hashing

KW - Lower bounds

KW - Nearest neighbor search

UR - http://www.scopus.com/inward/record.url?scp=56649117650&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=56649117650&partnerID=8YFLogxK

U2 - 10.1137/050646858

DO - 10.1137/050646858

M3 - Article

VL - 21

SP - 930

EP - 935

JO - SIAM Journal on Discrete Mathematics

JF - SIAM Journal on Discrete Mathematics

SN - 0895-4801

IS - 4

ER -