Learning binary hash codes for large-scale image search

Kristen Grauman, Robert Fergus

Research output: Chapter in Book/Report/Conference proceeding › Chapter

Abstract

Algorithms to rapidly search massive image or video collections are critical for many vision applications, including visual search, content-based retrieval, and non-parametric models for object recognition. Recent work shows that learned binary projections are a powerful way to index large collections according to their content. The basic idea is to formulate the projections so as to approximately preserve a given similarity function of interest. Having done so, one can then search the data efficiently using hash tables, or by exploring the Hamming ball volume around a novel query. Both enable sub-linear time retrieval with respect to the database size. Further, depending on the design of the projections, in some cases it is possible to bound the number of database examples that must be searched in order to achieve a given level of accuracy. This chapter overviews data structures for fast search with binary codes, and then describes several supervised and unsupervised strategies for generating the codes. In particular, we review supervised methods that integrate metric learning, boosting, and neural networks into the hash key construction, and unsupervised methods based on spectral analysis or kernelized random projections that compute affinity-preserving binary codes. Whether learning from explicit semantic supervision or exploiting the structure among unlabeled data, these methods make scalable retrieval possible for a variety of robust visual similarity measures. We focus on defining the algorithms, and illustrate the main points with results using millions of images.
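The two search strategies named in the abstract (hash-table lookup and exploring the Hamming ball around a query code) can be illustrated with a minimal sketch. It is not the chapter's method: the chapter covers learned projections, whereas this sketch substitutes plain random hyperplane projections as a stand-in. The function names (`make_encoder`, `build_table`, `hamming_ball`, `query`) and all parameter values are assumptions made for illustration only.

```python
import random
from collections import defaultdict
from itertools import combinations


def make_encoder(dim, n_bits, seed=0):
    """Return a function mapping a length-`dim` vector to an n_bits-bit integer code.

    Assumption: random Gaussian hyperplanes stand in for learned projections.
    """
    rng = random.Random(seed)
    planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

    def encode(x):
        code = 0
        for w in planes:
            # One bit per projection: the sign of the dot product.
            bit = sum(wi * xi for wi, xi in zip(w, x)) > 0
            code = (code << 1) | int(bit)
        return code

    return encode


def build_table(data, encode):
    """Index database items by binary code: code -> list of item indices."""
    table = defaultdict(list)
    for i, x in enumerate(data):
        table[encode(x)].append(i)
    return table


def hamming_ball(code, n_bits, radius):
    """Yield every n_bits-bit code within Hamming distance `radius` of `code`."""
    yield code
    for r in range(1, radius + 1):
        for positions in combinations(range(n_bits), r):
            probe = code
            for p in positions:
                probe ^= 1 << p  # flip bit p
            yield probe


def query(x, table, encode, n_bits, radius):
    """Collect indices of database items whose code lies in the query's Hamming ball."""
    hits = []
    for probe in hamming_ball(encode(x), n_bits, radius):
        hits.extend(table.get(probe, []))
    return hits


if __name__ == "__main__":
    rng = random.Random(1)
    data = [[rng.gauss(0.0, 1.0) for _ in range(16)] for _ in range(1000)]
    encode = make_encoder(dim=16, n_bits=12)
    table = build_table(data, encode)
    candidates = query(data[0], table, encode, n_bits=12, radius=1)
    print(len(candidates), "candidates out of", len(data))
```

Only the buckets inside the Hamming ball are visited, so the candidate set is typically much smaller than the database. The number of probed buckets grows combinatorially with the radius, which is why small radii and short codes are the usual operating point for this kind of lookup.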

Original language: English (US)
Title of host publication: Machine Learning for Computer Vision
Pages: 49-87
Number of pages: 39
Volume: 411
DOIs: https://doi.org/10.1007/978-3-642-28661-2-3
State: Published - 2013

Publication series

Name: Studies in Computational Intelligence
Volume: 411
ISSN (Print): 1860-949X

Fingerprint

Binary codes
Content based retrieval
Object recognition
Spectrum analysis
Data structures
Semantics
Neural networks

ASJC Scopus subject areas

  • Artificial Intelligence

Cite this

Grauman, K., & Fergus, R. (2013). Learning binary hash codes for large-scale image search. In Machine Learning for Computer Vision (Vol. 411, pp. 49-87). (Studies in Computational Intelligence; Vol. 411). https://doi.org/10.1007/978-3-642-28661-2-3

@inbook{e94b562e6bb64b718208a30cd60f6615,
title = "Learning binary hash codes for large-scale image search",
abstract = "Algorithms to rapidly search massive image or video collections are critical for many vision applications, including visual search, content-based retrieval, and non-parametric models for object recognition. Recent work shows that learned binary projections are a powerful way to index large collections according to their content. The basic idea is to formulate the projections so as to approximately preserve a given similarity function of interest. Having done so, one can then search the data efficiently using hash tables, or by exploring the Hamming ball volume around a novel query. Both enable sub-linear time retrieval with respect to the database size. Further, depending on the design of the projections, in some cases it is possible to bound the number of database examples that must be searched in order to achieve a given level of accuracy. This chapter overviews data structures for fast search with binary codes, and then describes several supervised and unsupervised strategies for generating the codes. In particular, we review supervised methods that integrate metric learning, boosting, and neural networks into the hash key construction, and unsupervised methods based on spectral analysis or kernelized random projections that compute affinity-preserving binary codes. Whether learning from explicit semantic supervision or exploiting the structure among unlabeled data, these methods make scalable retrieval possible for a variety of robust visual similarity measures. We focus on defining the algorithms, and illustrate the main points with results using millions of images.",
author = "Kristen Grauman and Robert Fergus",
year = "2013",
doi = "10.1007/978-3-642-28661-2-3",
language = "English (US)",
isbn = "9783642286605",
volume = "411",
series = "Studies in Computational Intelligence",
pages = "49--87",
booktitle = "Machine Learning for Computer Vision",

}

TY - CHAP

T1 - Learning binary hash codes for large-scale image search

AU - Grauman, Kristen

AU - Fergus, Robert

PY - 2013

Y1 - 2013

N2 - Algorithms to rapidly search massive image or video collections are critical for many vision applications, including visual search, content-based retrieval, and non-parametric models for object recognition. Recent work shows that learned binary projections are a powerful way to index large collections according to their content. The basic idea is to formulate the projections so as to approximately preserve a given similarity function of interest. Having done so, one can then search the data efficiently using hash tables, or by exploring the Hamming ball volume around a novel query. Both enable sub-linear time retrieval with respect to the database size. Further, depending on the design of the projections, in some cases it is possible to bound the number of database examples that must be searched in order to achieve a given level of accuracy. This chapter overviews data structures for fast search with binary codes, and then describes several supervised and unsupervised strategies for generating the codes. In particular, we review supervised methods that integrate metric learning, boosting, and neural networks into the hash key construction, and unsupervised methods based on spectral analysis or kernelized random projections that compute affinity-preserving binary codes. Whether learning from explicit semantic supervision or exploiting the structure among unlabeled data, these methods make scalable retrieval possible for a variety of robust visual similarity measures. We focus on defining the algorithms, and illustrate the main points with results using millions of images.

AB - Algorithms to rapidly search massive image or video collections are critical for many vision applications, including visual search, content-based retrieval, and non-parametric models for object recognition. Recent work shows that learned binary projections are a powerful way to index large collections according to their content. The basic idea is to formulate the projections so as to approximately preserve a given similarity function of interest. Having done so, one can then search the data efficiently using hash tables, or by exploring the Hamming ball volume around a novel query. Both enable sub-linear time retrieval with respect to the database size. Further, depending on the design of the projections, in some cases it is possible to bound the number of database examples that must be searched in order to achieve a given level of accuracy. This chapter overviews data structures for fast search with binary codes, and then describes several supervised and unsupervised strategies for generating the codes. In particular, we review supervised methods that integrate metric learning, boosting, and neural networks into the hash key construction, and unsupervised methods based on spectral analysis or kernelized random projections that compute affinity-preserving binary codes. Whether learning from explicit semantic supervision or exploiting the structure among unlabeled data, these methods make scalable retrieval possible for a variety of robust visual similarity measures. We focus on defining the algorithms, and illustrate the main points with results using millions of images.

UR - http://www.scopus.com/inward/record.url?scp=84867496527&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84867496527&partnerID=8YFLogxK

U2 - 10.1007/978-3-642-28661-2-3

DO - 10.1007/978-3-642-28661-2-3

M3 - Chapter

SN - 9783642286605

VL - 411

T3 - Studies in Computational Intelligence

SP - 49

EP - 87

BT - Machine Learning for Computer Vision

ER -