On learning mixtures of well-separated Gaussians

Oded Regev, Aravindan Vijayaraghavan

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

We consider the problem of efficiently learning mixtures of a large number of spherical Gaussians, when the components of the mixture are well separated. In the most basic form of this problem, we are given samples from a uniform mixture of k standard spherical Gaussians with means μ_1, …, μ_k in R^d, and the goal is to estimate the means up to accuracy Δ using poly(k, d, 1/Δ) samples.

In this work, we study the following question: what is the minimum separation needed between the means for solving this task? The best known algorithm, due to Vempala and Wang [JCSS 2004], requires a separation of roughly min{k, d}^{1/4}. On the other hand, Moitra and Valiant [FOCS 2010] showed that with separation o(1), exponentially many samples are required. We address the significant gap between these two bounds by showing the following results.

  • We show that with separation o(√log k), super-polynomially many samples are required. In fact, this holds even when the k means of the Gaussians are picked at random in d = O(log k) dimensions.
  • We show that with separation Ω(√log k), poly(k, d, 1/Δ) samples suffice. Notice that the bound on the separation is independent of Δ. This result is based on a new and efficient accuracy-boosting algorithm that takes as input coarse estimates of the true means and, in time (and samples) poly(k, d, 1/Δ), outputs estimates of the means up to arbitrarily good accuracy Δ, assuming the separation between the means is Ω(min{√log k, √d}) (independently of Δ). The idea of the algorithm is to iteratively solve a diagonally dominant system of non-linear equations.

We also (1) present a computationally efficient algorithm in d = O(1) dimensions with only Ω(√d) separation, and (2) extend our results to the case that components might have different weights and variances. These results together essentially characterize the optimal order of separation between components that is needed to learn a mixture of k spherical Gaussians with polynomial samples.
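The accuracy-boosting result rests on iteratively solving a diagonally dominant system of non-linear equations. As a toy sketch of why diagonal dominance helps (this is an illustrative example, not the paper's algorithm — the matrix A, the tanh non-linearity, and all names below are invented for illustration), the following solves a strictly diagonally dominant non-linear system A·tanh(x) = b by a Jacobi-style fixed-point iteration, refining a coarse initial estimate to high accuracy:

```python
import numpy as np

# Toy sketch (not the paper's algorithm): Jacobi-style iteration for a
# strictly diagonally dominant non-linear system A @ tanh(x) = b.
# Diagonal dominance makes each update a contraction, so a coarse
# initial estimate is refined to essentially arbitrary accuracy.

rng = np.random.default_rng(0)
k = 5

# Strictly diagonally dominant matrix: large diagonal, small off-diagonals.
A = rng.uniform(-0.1, 0.1, size=(k, k))
np.fill_diagonal(A, 2.0)

x_true = rng.standard_normal(k)
b = A @ np.tanh(x_true)

x = np.zeros(k)  # coarse initial estimate
for _ in range(100):
    # Solve equation i for x_i, holding the other coordinates fixed.
    off_diag = A @ np.tanh(x) - np.diag(A) * np.tanh(x)
    x = np.arctanh(np.clip((b - off_diag) / np.diag(A), -0.9999, 0.9999))

print(np.max(np.abs(x - x_true)))  # the error contracts to a tiny value
```

In the change of variables y = tanh(x) this is plain Jacobi iteration with contraction factor at most 0.2 here, which is the sense in which diagonal dominance lets coarse estimates be boosted independently of the target accuracy; the paper's actual system instead couples Gaussian density terms across the k means.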

Original language: English (US)
Title of host publication: Proceedings - 58th Annual IEEE Symposium on Foundations of Computer Science, FOCS 2017
Publisher: IEEE Computer Society
Pages: 85-96
Number of pages: 12
Volume: 2017-October
ISBN (Electronic): 9781538634646
DOI: 10.1109/FOCS.2017.17
State: Published - Nov 10, 2017
Event: 58th Annual IEEE Symposium on Foundations of Computer Science, FOCS 2017 - Berkeley, United States
Duration: Oct 15, 2017 - Oct 17, 2017

Other

Other: 58th Annual IEEE Symposium on Foundations of Computer Science, FOCS 2017
Country: United States
City: Berkeley
Period: 10/15/17 - 10/17/17

Keywords

  • clustering
  • iterative algorithms
  • learning
  • mixtures of Gaussians
  • parameter estimation
  • sample complexity

ASJC Scopus subject areas

  • Computer Science (all)

Cite this

Regev, O., & Vijayaraghavan, A. (2017). On learning mixtures of well-separated Gaussians. In Proceedings - 58th Annual IEEE Symposium on Foundations of Computer Science, FOCS 2017 (Vol. 2017-October, pp. 85-96). [8104049] IEEE Computer Society. https://doi.org/10.1109/FOCS.2017.17
