### Abstract

In the kernel clustering problem we are given a large $n \times n$ positive-semidefinite matrix $A = (a_{ij})$ with $\sum_{i,j=1}^{n} a_{ij} = 0$ and a small $k \times k$ positive-semidefinite matrix $B = (b_{ij})$. The goal is to find a partition $S_1, \ldots, S_k$ of $\{1, \ldots, n\}$ which maximizes the quantity $\sum_{i,j=1}^{k} \left( \sum_{(p,q) \in S_i \times S_j} a_{pq} \right) b_{ij}$. We study the computational complexity of this generic clustering problem which originates in the theory of machine learning. We design a constant factor polynomial time approximation algorithm for this problem, answering a question posed by Song et al. In some cases we manage to compute the sharp approximation threshold for this problem assuming the unique games conjecture (UGC). In particular, when $B$ is the $3 \times 3$ identity matrix the UGC hardness threshold of this problem is exactly $16\pi/27$. We present and study a geometric conjecture of independent interest which we show would imply that the UGC threshold when $B$ is the $k \times k$ identity matrix is $(8\pi/9)(1 - 1/k)$ for every $k \geq 3$.
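To make the objective concrete, here is a minimal Python sketch that evaluates the quantity being maximized for a given partition and finds the optimum by exhaustive search on a tiny instance. It is an illustration only, not the approximation algorithm from the paper. The function names (`clustering_value`, `brute_force_optimum`, `conjectured_threshold`), the label-vector encoding of a partition, the random test matrix, and the choice to allow empty parts are all assumptions made for this example.

```python
import itertools
import math

import numpy as np


def clustering_value(A, B, labels):
    """Return sum_{i,j} (sum_{p in S_i, q in S_j} a_pq) * b_ij.

    `labels[p] = i` encodes that element p belongs to part S_i
    (an encoding chosen for this sketch, not taken from the paper).
    """
    k = B.shape[0]
    # Z is the n x k indicator matrix of the partition: Z[p, i] = 1 iff p is in S_i.
    Z = np.eye(k)[np.asarray(labels)]
    # Z.T @ A @ Z is the k x k matrix of block sums of A over S_i x S_j.
    return float(np.sum((Z.T @ A @ Z) * B))


def brute_force_optimum(A, B):
    """Try all k^n label vectors; feasible only for very small n."""
    n, k = A.shape[0], B.shape[0]
    best = max(
        itertools.product(range(k), repeat=n),
        key=lambda lab: clustering_value(A, B, lab),
    )
    return np.array(best), clustering_value(A, B, best)


def conjectured_threshold(k):
    """The value (8*pi/9) * (1 - 1/k) stated in the abstract for B = identity."""
    return (8 * math.pi / 9) * (1 - 1 / k)


# Hypothetical test instance: a centered Gram matrix C G C is positive
# semidefinite and its entries sum to zero, as the problem requires.
rng = np.random.default_rng(0)
n = 6
X = rng.standard_normal((n, 2))
C = np.eye(n) - np.ones((n, n)) / n
A = C @ (X @ X.T) @ C
B = np.eye(3)

labels, value = brute_force_optimum(A, B)
print("best labels:", labels, "objective:", value)
print("k = 3 threshold:", conjectured_threshold(3), "vs 16*pi/27:", 16 * math.pi / 27)
```

Putting every element into a single part gives the value $b_{11} \sum_{p,q} a_{pq} = 0$, so the optimum is never negative. For $k = 3$ the general expression gives $(8\pi/9)(2/3) = 16\pi/27$, which agrees with the threshold quoted for the $3 \times 3$ identity matrix.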

| Original language | English (US) |
|---|---|
| Pages (from-to) | 129-165 |
| Number of pages | 37 |
| Journal | Mathematika |
| Volume | 55 |
| Issue number | 1-2 |
| State | Published - Jan 2009 |

### ASJC Scopus subject areas

- Mathematics(all)

### Cite this

**Approximate kernel clustering.** / Khot, Subhash; Naor, Assaf.

Research output: Contribution to journal › Article

*Mathematika*, vol. 55, no. 1-2, pp. 129-165.

TY - JOUR

T1 - Approximate kernel clustering

AU - Khot, Subhash

AU - Naor, Assaf

PY - 2009/1

Y1 - 2009/1

N2 - In the kernel clustering problem we are given a large $n \times n$ positive-semidefinite matrix $A = (a_{ij})$ with $\sum_{i,j=1}^{n} a_{ij} = 0$ and a small $k \times k$ positive-semidefinite matrix $B = (b_{ij})$. The goal is to find a partition $S_1, \ldots, S_k$ of $\{1, \ldots, n\}$ which maximizes the quantity $\sum_{i,j=1}^{k} \left( \sum_{(p,q) \in S_i \times S_j} a_{pq} \right) b_{ij}$. We study the computational complexity of this generic clustering problem which originates in the theory of machine learning. We design a constant factor polynomial time approximation algorithm for this problem, answering a question posed by Song et al. In some cases we manage to compute the sharp approximation threshold for this problem assuming the unique games conjecture (UGC). In particular, when $B$ is the $3 \times 3$ identity matrix the UGC hardness threshold of this problem is exactly $16\pi/27$. We present and study a geometric conjecture of independent interest which we show would imply that the UGC threshold when $B$ is the $k \times k$ identity matrix is $(8\pi/9)(1 - 1/k)$ for every $k \geq 3$.

UR - http://www.scopus.com/inward/record.url?scp=73949102599&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=73949102599&partnerID=8YFLogxK

M3 - Article

VL - 55

SP - 129

EP - 165

JO - Mathematika

JF - Mathematika

SN - 0025-5793

IS - 1-2

ER -