Web scale photo hash clustering on a single machine

Yunchao Gong, Marcin Pawlowski, Fei Yang, Louis Brandy, Lubomir Boundev, Robert Fergus

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

This paper addresses the problem of clustering a very large number of photos (i.e., hundreds of millions a day) in a stream into millions of clusters. This is particularly important given the popularity of photo-sharing websites such as Facebook, Google, and Instagram. Given the large number of photos available online, how to organize them efficiently is an open problem. To address this problem, we propose to cluster the binary hash codes of a large number of photos into binary cluster centers. We present a fast binary k-means algorithm that works directly on the similarity-preserving hashes of images and clusters them into binary centers, on which we can build hash indexes to speed up computation. The proposed method is capable of clustering millions of photos on a single machine in a few minutes. We show that this approach is usually several orders of magnitude faster than standard k-means and produces comparable clustering accuracy. In addition, we propose an online clustering method based on binary k-means that is capable of clustering a large photo stream on a single machine, and show applications to spam detection and trending photo discovery.
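The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea of k-means over binary codes, not the authors' implementation. It assumes hash codes are stored as Python integers, uses Hamming distance for assignment, and updates each center by a bitwise majority vote. The function names, the caller-supplied initial centers, and the brute-force assignment step (the paper instead builds hash indexes on the centers) are all assumptions made for illustration.

```python
from typing import List, Tuple


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two hash codes stored as ints."""
    return bin(a ^ b).count("1")


def majority_center(codes: List[int], n_bits: int) -> int:
    """Bitwise majority vote: bit i is set if more than half of the codes set it."""
    center = 0
    for i in range(n_bits):
        ones = sum((c >> i) & 1 for c in codes)
        if 2 * ones > len(codes):
            center |= 1 << i
    return center


def assign(codes: List[int], centers: List[int]) -> List[int]:
    """Label each code with the index of its nearest center (brute force)."""
    return [
        min(range(len(centers)), key=lambda j: hamming(c, centers[j]))
        for c in codes
    ]


def binary_kmeans(
    codes: List[int], init_centers: List[int], n_bits: int, n_iter: int = 10
) -> Tuple[List[int], List[int]]:
    """Hypothetical binary k-means: Hamming assignment, majority-vote update."""
    centers = list(init_centers)
    for _ in range(n_iter):
        labels = assign(codes, centers)
        new_centers = []
        for j, old in enumerate(centers):
            members = [c for c, lab in zip(codes, labels) if lab == j]
            # Keep the old center if a cluster ends up empty.
            new_centers.append(majority_center(members, n_bits) if members else old)
        if new_centers == centers:
            break  # centers stopped changing
        centers = new_centers
    # Final assignment so the labels match the returned centers.
    return centers, assign(codes, centers)


if __name__ == "__main__":
    toy_codes = [0b00000000, 0b00000001, 0b00000010,
                 0b11111111, 0b11111110, 0b11111101]
    centers, labels = binary_kmeans(toy_codes, [0b00000001, 0b11111110], n_bits=8)
    print([format(c, "08b") for c in centers], labels)
```

Because the centers stay binary, they can be stored and compared exactly like the photo hashes themselves, which is what makes indexing the centers possible. Like ordinary k-means, this sketch can settle in a poor local optimum when the initial centers are badly chosen.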

Original language: English (US)
Title of host publication: IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015
Publisher: IEEE Computer Society
Pages: 19-27
Number of pages: 9
Volume: 07-12-June-2015
ISBN (Print): 9781467369640
DOIs: https://doi.org/10.1109/CVPR.2015.7298596
State: Published - Oct 14 2015
Event: IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015 - Boston, United States
Duration: Jun 7 2015 → Jun 12 2015

Other

Other: IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015
Country: United States
City: Boston
Period: 6/7/15 → 6/12/15

ASJC Scopus subject areas

  • Software
  • Computer Vision and Pattern Recognition

Cite this

Gong, Y., Pawlowski, M., Yang, F., Brandy, L., Boundev, L., & Fergus, R. (2015). Web scale photo hash clustering on a single machine. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015 (Vol. 07-12-June-2015, pp. 19-27). [7298596] IEEE Computer Society. https://doi.org/10.1109/CVPR.2015.7298596

Web scale photo hash clustering on a single machine. / Gong, Yunchao; Pawlowski, Marcin; Yang, Fei; Brandy, Louis; Boundev, Lubomir; Fergus, Robert.

IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015. Vol. 07-12-June-2015 IEEE Computer Society, 2015. p. 19-27 7298596.

Gong, Y, Pawlowski, M, Yang, F, Brandy, L, Boundev, L & Fergus, R 2015, Web scale photo hash clustering on a single machine. in IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015. vol. 07-12-June-2015, 7298596, IEEE Computer Society, pp. 19-27, IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015, Boston, United States, 6/7/15. https://doi.org/10.1109/CVPR.2015.7298596
Gong Y, Pawlowski M, Yang F, Brandy L, Boundev L, Fergus R. Web scale photo hash clustering on a single machine. In IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015. Vol. 07-12-June-2015. IEEE Computer Society. 2015. p. 19-27. 7298596 https://doi.org/10.1109/CVPR.2015.7298596
Gong, Yunchao ; Pawlowski, Marcin ; Yang, Fei ; Brandy, Louis ; Boundev, Lubomir ; Fergus, Robert. / Web scale photo hash clustering on a single machine. IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015. Vol. 07-12-June-2015 IEEE Computer Society, 2015. pp. 19-27
@inproceedings{ec058773dfcb4a759a366e9e096078c2,
title = "Web scale photo hash clustering on a single machine",
abstract = "This paper addresses the problem of clustering a very large number of photos (i.e., hundreds of millions a day) in a stream into millions of clusters. This is particularly important given the popularity of photo-sharing websites such as Facebook, Google, and Instagram. Given the large number of photos available online, how to organize them efficiently is an open problem. To address this problem, we propose to cluster the binary hash codes of a large number of photos into binary cluster centers. We present a fast binary k-means algorithm that works directly on the similarity-preserving hashes of images and clusters them into binary centers, on which we can build hash indexes to speed up computation. The proposed method is capable of clustering millions of photos on a single machine in a few minutes. We show that this approach is usually several orders of magnitude faster than standard k-means and produces comparable clustering accuracy. In addition, we propose an online clustering method based on binary k-means that is capable of clustering a large photo stream on a single machine, and show applications to spam detection and trending photo discovery.",
author = "Yunchao Gong and Marcin Pawlowski and Fei Yang and Louis Brandy and Lubomir Boundev and Robert Fergus",
year = "2015",
month = "10",
day = "14",
doi = "10.1109/CVPR.2015.7298596",
language = "English (US)",
isbn = "9781467369640",
volume = "07-12-June-2015",
pages = "19--27",
booktitle = "IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015",
publisher = "IEEE Computer Society",

}

TY - GEN

T1 - Web scale photo hash clustering on a single machine

AU - Gong, Yunchao

AU - Pawlowski, Marcin

AU - Yang, Fei

AU - Brandy, Louis

AU - Boundev, Lubomir

AU - Fergus, Robert

PY - 2015/10/14

Y1 - 2015/10/14

AB - This paper addresses the problem of clustering a very large number of photos (i.e., hundreds of millions a day) in a stream into millions of clusters. This is particularly important given the popularity of photo-sharing websites such as Facebook, Google, and Instagram. Given the large number of photos available online, how to organize them efficiently is an open problem. To address this problem, we propose to cluster the binary hash codes of a large number of photos into binary cluster centers. We present a fast binary k-means algorithm that works directly on the similarity-preserving hashes of images and clusters them into binary centers, on which we can build hash indexes to speed up computation. The proposed method is capable of clustering millions of photos on a single machine in a few minutes. We show that this approach is usually several orders of magnitude faster than standard k-means and produces comparable clustering accuracy. In addition, we propose an online clustering method based on binary k-means that is capable of clustering a large photo stream on a single machine, and show applications to spam detection and trending photo discovery.

UR - http://www.scopus.com/inward/record.url?scp=84959231611&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84959231611&partnerID=8YFLogxK

U2 - 10.1109/CVPR.2015.7298596

DO - 10.1109/CVPR.2015.7298596

M3 - Conference contribution

AN - SCOPUS:84959231611

SN - 9781467369640

VL - 07-12-June-2015

SP - 19

EP - 27

BT - IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015

PB - IEEE Computer Society

ER -