Semi-supervised learning in gigantic image collections

Robert Fergus, Yair Weiss, Antonio Torralba

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

With the advent of the Internet it is now possible to collect hundreds of millions of images. These images come with varying degrees of label information. "Clean labels" can be manually obtained on a small fraction, "noisy labels" may be extracted automatically from surrounding text, while for most images there are no labels at all. Semi-supervised learning is a principled framework for combining these different label sources. However, it scales polynomially with the number of images, making it impractical for use on gigantic collections with hundreds of millions of images and thousands of classes. In this paper we show how to utilize recent results in machine learning to obtain highly efficient approximations for semi-supervised learning that are linear in the number of images. Specifically, we use the convergence of the eigenvectors of the normalized graph Laplacian to eigenfunctions of weighted Laplace-Beltrami operators. Our algorithm enables us to apply semi-supervised learning to a database of 80 million images gathered from the Internet.
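To make the abstract's starting point concrete, the following is a minimal sketch of graph-based semi-supervised learning solved in a small basis of eigenvectors of the normalized graph Laplacian. It is not the paper's linear-time method: it builds a dense n-by-n graph and calls a full eigensolver, which is exactly the cost that becomes impractical at scale. The Gaussian affinity, the quadratic objective f^T L f + (f - y)^T Λ (f - y), and every name used here (`normalized_laplacian`, `ssl_in_eigenbasis`, `eps`, `k`, `weights`) are assumptions chosen for illustration, not details taken from this record.

```python
import numpy as np


def normalized_laplacian(X, eps):
    """Dense normalized graph Laplacian L = I - D^{-1/2} W D^{-1/2}.

    W is a Gaussian affinity with bandwidth eps (an assumed kernel choice).
    """
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dists / (2.0 * eps ** 2))
    inv_sqrt_deg = 1.0 / np.sqrt(W.sum(axis=1))
    return np.eye(len(X)) - inv_sqrt_deg[:, None] * W * inv_sqrt_deg[None, :]


def ssl_in_eigenbasis(X, y, weights, k, eps):
    """Minimise f^T L f + (f - y)^T Lam (f - y) subject to f = U @ alpha.

    U holds the k eigenvectors of L with the smallest eigenvalues (sigma).
    Setting the gradient with respect to alpha to zero gives the k-by-k system
        (diag(sigma) + U^T Lam U) alpha = U^T Lam y.
    weights is the diagonal of Lam: positive on labelled points, zero elsewhere.
    """
    L = normalized_laplacian(X, eps)
    evals, evecs = np.linalg.eigh(L)      # eigenvalues in ascending order
    U, sigma = evecs[:, :k], evals[:k]
    Lam = np.diag(weights)
    A = np.diag(sigma) + U.T @ Lam @ U
    b = U.T @ Lam @ y
    alpha = np.linalg.solve(A, b)
    return U @ alpha


if __name__ == "__main__":
    # Toy data: two well-separated 1-D clusters, one labelled point in each.
    X = np.concatenate([np.linspace(0.0, 1.0, 10),
                        np.linspace(5.0, 6.0, 10)])[:, None]
    y = np.zeros(20)
    y[0], y[10] = 1.0, -1.0               # the only two labels
    weights = np.zeros(20)
    weights[[0, 10]] = 1.0                # trust only the labelled points
    f = ssl_in_eigenbasis(X, y, weights, k=2, eps=0.5)
    print(np.sign(f))                     # predicted class sign per point
```

Restricting f to the span of a few smooth eigenvectors shrinks the linear solve from n unknowns to k, but computing those eigenvectors from the dense graph still dominates the cost in this sketch. The abstract's contribution is to replace that step with an approximation based on eigenfunctions, so that the overall procedure is linear in the number of images.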

Original language: English (US)
Title of host publication: Advances in Neural Information Processing Systems 22 - Proceedings of the 2009 Conference
Pages: 522-530
Number of pages: 9
State: Published - 2009
Event: 23rd Annual Conference on Neural Information Processing Systems, NIPS 2009 - Vancouver, BC, Canada
Duration: Dec 7 2009 to Dec 10 2009

Other

Other: 23rd Annual Conference on Neural Information Processing Systems, NIPS 2009
Country: Canada
City: Vancouver, BC
Period: 12/7/09 to 12/10/09

Fingerprint

Supervised learning
Labels
Eigenvalues and eigenfunctions
Internet
Learning systems
Mathematical operators

ASJC Scopus subject areas

  • Information Systems

Cite this

Fergus, R., Weiss, Y., & Torralba, A. (2009). Semi-supervised learning in gigantic image collections. In Advances in Neural Information Processing Systems 22 - Proceedings of the 2009 Conference (pp. 522-530)

Semi-supervised learning in gigantic image collections. / Fergus, Robert; Weiss, Yair; Torralba, Antonio.

Advances in Neural Information Processing Systems 22 - Proceedings of the 2009 Conference. 2009. p. 522-530.

Fergus, R, Weiss, Y & Torralba, A 2009, Semi-supervised learning in gigantic image collections. in Advances in Neural Information Processing Systems 22 - Proceedings of the 2009 Conference. pp. 522-530, 23rd Annual Conference on Neural Information Processing Systems, NIPS 2009, Vancouver, BC, Canada, 12/7/09.
Fergus R, Weiss Y, Torralba A. Semi-supervised learning in gigantic image collections. In Advances in Neural Information Processing Systems 22 - Proceedings of the 2009 Conference. 2009. p. 522-530
Fergus, Robert ; Weiss, Yair ; Torralba, Antonio. / Semi-supervised learning in gigantic image collections. Advances in Neural Information Processing Systems 22 - Proceedings of the 2009 Conference. 2009. pp. 522-530
@inproceedings{59f4589739c349fa8426e3e3ec011e1a,
title = "Semi-supervised learning in gigantic image collections",
abstract = "With the advent of the Internet it is now possible to collect hundreds of millions of images. These images come with varying degrees of label information. {"}Clean labels{"} can be manually obtained on a small fraction, {"}noisy labels{"} may be extracted automatically from surrounding text, while for most images there are no labels at all. Semi-supervised learning is a principled framework for combining these different label sources. However, it scales polynomially with the number of images, making it impractical for use on gigantic collections with hundreds of millions of images and thousands of classes. In this paper we show how to utilize recent results in machine learning to obtain highly efficient approximations for semi-supervised learning that are linear in the number of images. Specifically, we use the convergence of the eigenvectors of the normalized graph Laplacian to eigenfunctions of weighted Laplace-Beltrami operators. Our algorithm enables us to apply semi-supervised learning to a database of 80 million images gathered from the Internet.",
author = "Robert Fergus and Yair Weiss and Antonio Torralba",
year = "2009",
language = "English (US)",
isbn = "9781615679119",
pages = "522--530",
booktitle = "Advances in Neural Information Processing Systems 22 - Proceedings of the 2009 Conference",

}

TY - GEN

T1 - Semi-supervised learning in gigantic image collections

AU - Fergus, Robert

AU - Weiss, Yair

AU - Torralba, Antonio

PY - 2009

Y1 - 2009

N2 - With the advent of the Internet it is now possible to collect hundreds of millions of images. These images come with varying degrees of label information. "Clean labels" can be manually obtained on a small fraction, "noisy labels" may be extracted automatically from surrounding text, while for most images there are no labels at all. Semi-supervised learning is a principled framework for combining these different label sources. However, it scales polynomially with the number of images, making it impractical for use on gigantic collections with hundreds of millions of images and thousands of classes. In this paper we show how to utilize recent results in machine learning to obtain highly efficient approximations for semi-supervised learning that are linear in the number of images. Specifically, we use the convergence of the eigenvectors of the normalized graph Laplacian to eigenfunctions of weighted Laplace-Beltrami operators. Our algorithm enables us to apply semi-supervised learning to a database of 80 million images gathered from the Internet.

UR - http://www.scopus.com/inward/record.url?scp=77955655063&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=77955655063&partnerID=8YFLogxK

M3 - Conference contribution

SN - 9781615679119

SP - 522

EP - 530

BT - Advances in Neural Information Processing Systems 22 - Proceedings of the 2009 Conference

ER -