Approximate kernel clustering

Subhash Khot, Assaf Naor

Research output: Contribution to journal, Article

Abstract

In the kernel clustering problem we are given a large $n \times n$ positive-semidefinite matrix $A = (a_{ij})$ with $\sum_{i,j=1}^{n} a_{ij} = 0$ and a small $k \times k$ positive-semidefinite matrix $B = (b_{ij})$. The goal is to find a partition $S_1, \ldots, S_k$ of $\{1, \ldots, n\}$ which maximizes the quantity $\sum_{i,j=1}^{k} \left( \sum_{(p,q) \in S_i \times S_j} a_{pq} \right) b_{ij}$. We study the computational complexity of this generic clustering problem which originates in the theory of machine learning. We design a constant factor polynomial time approximation algorithm for this problem, answering a question posed by Song et al. In some cases we manage to compute the sharp approximation threshold for this problem assuming the unique games conjecture (UGC). In particular, when $B$ is the $3 \times 3$ identity matrix the UGC hardness threshold of this problem is exactly $16\pi/27$. We present and study a geometric conjecture of independent interest which we show would imply that the UGC threshold when $B$ is the $k \times k$ identity matrix is $(8\pi/9)(1 - 1/k)$ for every $k \geq 3$.
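To make the objective in the abstract concrete, the following is a minimal sketch in Python with NumPy that evaluates the clustering value of a given partition and finds the best partition of a tiny instance by exhaustive search. It is not the approximation algorithm from the paper. The function names (`clustering_value`, `brute_force_best`, `conjectured_threshold`), the encoding of a partition as a label vector, and the toy matrices are assumptions chosen for illustration, and the search allows empty parts.

```python
import itertools
import math

import numpy as np


def clustering_value(A, B, labels):
    """Return sum_{i,j} (sum_{(p,q) in S_i x S_j} a_pq) * b_ij.

    labels[p] = i encodes that index p belongs to part S_i
    (an illustrative encoding, not notation from the paper).
    """
    A = np.asarray(A, dtype=float)
    B = np.asarray(B, dtype=float)
    labels = np.asarray(labels)
    # Entry (p, q) of the indexed matrix is b_{labels[p], labels[q]}.
    return float(np.sum(A * B[np.ix_(labels, labels)]))


def brute_force_best(A, B):
    """Try all k^n label vectors; only feasible for very small n."""
    n, k = len(A), len(B)
    best = max(
        itertools.product(range(k), repeat=n),
        key=lambda lab: clustering_value(A, B, lab),
    )
    return list(best), clustering_value(A, B, best)


def conjectured_threshold(k):
    """(8*pi/9) * (1 - 1/k), the value stated in the abstract for k >= 3."""
    return (8 * math.pi / 9) * (1 - 1 / k)


# Toy instance: A is the Gram matrix of centered points on a line,
# so A is positive semidefinite and its entries sum to zero.
x = np.array([-2.0, -1.0, 1.0, 2.0])
A = np.outer(x, x)
B = np.eye(2)  # k = 2 identity matrix

labels, value = brute_force_best(A, B)
print(labels, value)  # [0, 0, 1, 1] 18.0

# For k = 3 the general formula agrees with the 16*pi/27 threshold.
print(math.isclose(conjectured_threshold(3), 16 * math.pi / 27))  # True
```

With B equal to the identity, the value reduces to the sum over parts of the total of the entries of A inside each part, which is why grouping the two negative points and the two positive points is optimal in this toy instance.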

Original language: English (US)
Pages (from-to): 129-165
Number of pages: 37
Journal: Mathematika
Volume: 55
Issue number: 1-2
State: Published - Jan 2009

Fingerprint

Clustering
kernel
Positive Semidefinite Matrix
Unit matrix
Game
Hardness
Polynomial-time Algorithm
Approximation Algorithms
Machine Learning
Computational Complexity
Maximise
Partition
Imply
Approximation

ASJC Scopus subject areas

  • Mathematics (all)

Cite this

Khot, S., & Naor, A. (2009). Approximate kernel clustering. Mathematika, 55(1-2), 129-165.

@article{7164da4901654bd29397eb4ea5b8d8d1,
title = "Approximate kernel clustering",
abstract = "In the kernel clustering problem we are given a large $n \times n$ positive-semidefinite matrix $A = (a_{ij})$ with $\sum_{i,j=1}^{n} a_{ij} = 0$ and a small $k \times k$ positive-semidefinite matrix $B = (b_{ij})$. The goal is to find a partition $S_1, \ldots, S_k$ of $\{1, \ldots, n\}$ which maximizes the quantity $\sum_{i,j=1}^{k} \left( \sum_{(p,q) \in S_i \times S_j} a_{pq} \right) b_{ij}$. We study the computational complexity of this generic clustering problem which originates in the theory of machine learning. We design a constant factor polynomial time approximation algorithm for this problem, answering a question posed by Song et al. In some cases we manage to compute the sharp approximation threshold for this problem assuming the unique games conjecture (UGC). In particular, when $B$ is the $3 \times 3$ identity matrix the UGC hardness threshold of this problem is exactly $16\pi/27$. We present and study a geometric conjecture of independent interest which we show would imply that the UGC threshold when $B$ is the $k \times k$ identity matrix is $(8\pi/9)(1 - 1/k)$ for every $k \geq 3$.",
author = "Subhash Khot and Assaf Naor",
year = "2009",
month = "1",
language = "English (US)",
volume = "55",
pages = "129--165",
journal = "Mathematika",
issn = "0025-5793",
publisher = "University College London",
number = "1-2",

}

TY - JOUR

T1 - Approximate kernel clustering

AU - Khot, Subhash

AU - Naor, Assaf

PY - 2009/1

Y1 - 2009/1

AB - In the kernel clustering problem we are given a large $n \times n$ positive-semidefinite matrix $A = (a_{ij})$ with $\sum_{i,j=1}^{n} a_{ij} = 0$ and a small $k \times k$ positive-semidefinite matrix $B = (b_{ij})$. The goal is to find a partition $S_1, \ldots, S_k$ of $\{1, \ldots, n\}$ which maximizes the quantity $\sum_{i,j=1}^{k} \left( \sum_{(p,q) \in S_i \times S_j} a_{pq} \right) b_{ij}$. We study the computational complexity of this generic clustering problem which originates in the theory of machine learning. We design a constant factor polynomial time approximation algorithm for this problem, answering a question posed by Song et al. In some cases we manage to compute the sharp approximation threshold for this problem assuming the unique games conjecture (UGC). In particular, when $B$ is the $3 \times 3$ identity matrix the UGC hardness threshold of this problem is exactly $16\pi/27$. We present and study a geometric conjecture of independent interest which we show would imply that the UGC threshold when $B$ is the $k \times k$ identity matrix is $(8\pi/9)(1 - 1/k)$ for every $k \geq 3$.

UR - http://www.scopus.com/inward/record.url?scp=73949102599&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=73949102599&partnerID=8YFLogxK

M3 - Article

VL - 55

SP - 129

EP - 165

JO - Mathematika

JF - Mathematika

SN - 0025-5793

IS - 1-2

ER -