Online clustering with experts

Anna Choromanska, Claire Monteleoni

Research output: Contribution to journal › Article

Abstract

Approximating the k-means clustering objective with an online learning algorithm is an open problem. We introduce a family of online clustering algorithms by extending algorithms for online supervised learning, with access to expert predictors, to the unsupervised learning setting. Instead of computing prediction errors in order to re-weight the experts, the algorithms compute an approximation to the current value of the k-means objective obtained by each expert. When the experts are batch clustering algorithms with b-approximation guarantees with respect to the k-means objective (for example, the k-means++ or k-means# algorithms), applied to a sliding window of the data stream, our algorithms obtain approximation guarantees with respect to the k-means objective. The form of these online clustering approximation guarantees is novel, and extends an evaluation framework proposed by Dasgupta as an analog to regret. Notably, our approximation bounds are with respect to the optimal k-means cost on the entire data stream seen so far, even though the algorithm is online. Our algorithm's empirical performance tracks that of the best clustering algorithm in its expert set.
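
The re-weighting idea in the abstract can be illustrated with a minimal sketch. It is not the authors' exact algorithm: the abstract does not state the update rule, so this sketch assumes a standard multiplicative-weights update, and every name in it (`nearest_center_loss`, `reweight`, `online_clustering_with_experts`, the `radius`, `window`, and `eta` parameters, and the two toy experts) is hypothetical. Each expert is a function that maps a sliding window of recent points to a list of centers; its loss on a new point is the scaled squared distance to its nearest center, a per-point term of the k-means cost.

```python
import math

def nearest_center_loss(x, centers, radius):
    # Squared distance from x to its closest center, scaled by (2 * radius)^2.
    # Assumption: points and centers lie in a ball of this radius, so the loss is in [0, 1].
    closest = min(sum((a - b) ** 2 for a, b in zip(x, c)) for c in centers)
    return closest / (2.0 * radius) ** 2

def reweight(weights, losses, eta=0.5):
    # Assumed multiplicative-weights update: shrink each weight by exp(-eta * loss),
    # then renormalize so the weights form a probability distribution.
    scaled = [w * math.exp(-eta * loss) for w, loss in zip(weights, losses)]
    total = sum(scaled)
    return [s / total for s in scaled]

def online_clustering_with_experts(stream, experts, radius, window=50, eta=0.5):
    # Start from uniform weights over the experts.
    weights = [1.0 / len(experts)] * len(experts)
    seen = []
    for x in stream:
        recent = seen[-window:]  # sliding window of the most recent points
        if recent:
            # Each expert clusters the window; score it on the newly arrived point.
            proposals = [expert(recent) for expert in experts]
            losses = [nearest_center_loss(x, centers, radius) for centers in proposals]
            weights = reweight(weights, losses, eta)
        seen.append(x)
    return weights

# Toy experts (hypothetical stand-ins for batch algorithms such as k-means++), with k = 1.
def mean_expert(points):
    dim = len(points[0])
    return [tuple(sum(p[i] for p in points) / len(points) for i in range(dim))]

def origin_expert(points):
    return [tuple(0.0 for _ in points[0])]

weights = online_clustering_with_experts(
    [(1.0, 1.0)] * 5, [mean_expert, origin_expert], radius=2.0
)
```

On this toy stream the expert whose center matches the data incurs zero loss, so its weight grows relative to the fixed-origin expert. A fuller version would replace the toy experts with batch clustering algorithms that carry approximation guarantees.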

Original language: English (US)
Pages (from-to): 227-235
Number of pages: 9
Journal: Journal of Machine Learning Research
Volume: 22
State: Published - 2012

Fingerprint

K-means
Clustering
Clustering algorithms
Online Learning
Online Algorithms
Approximation
Data Streams
Unsupervised learning
Regret
K-means Algorithm
Sliding Window
K-means Clustering
Supervised learning
Approximation algorithms
Prediction Error
Learning algorithms
Batch

ASJC Scopus subject areas

  • Control and Systems Engineering
  • Software
  • Statistics and Probability
  • Artificial Intelligence

Cite this

Online clustering with experts. / Choromanska, Anna; Monteleoni, Claire.

In: Journal of Machine Learning Research, Vol. 22, 2012, p. 227-235.

@article{c19ecbb826e9485682adff4c403a1dcb,
title = "Online clustering with experts",
author = "Anna Choromanska and Claire Monteleoni",
year = "2012",
language = "English (US)",
volume = "22",
pages = "227--235",
journal = "Journal of Machine Learning Research",
issn = "1532-4435",
publisher = "Microtome Publishing",
}

TY - JOUR

T1 - Online clustering with experts

AU - Choromanska, Anna

AU - Monteleoni, Claire

PY - 2012

Y1 - 2012

UR - http://www.scopus.com/inward/record.url?scp=84954201695&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84954201695&partnerID=8YFLogxK

M3 - Article

AN - SCOPUS:84954201695

VL - 22

SP - 227

EP - 235

JO - Journal of Machine Learning Research

JF - Journal of Machine Learning Research

SN - 1532-4435

ER -