Fitting algebraic curves to noisy data

Sanjeev Arora, Subhash Khot

Research output: Contribution to journal (Article)

Abstract

We introduce the following problem, which is motivated by applications in vision and pattern detection: we are given pairs of datapoints (x_1, y_1), (x_2, y_2), ..., (x_m, y_m) ∈ [-1, 1] × [-1, 1], a noise parameter δ > 0, a degree bound d, and a threshold ρ > 0. We desire an algorithm that lists every degree-d polynomial h such that |h(x_i) - y_i| ≤ δ for at least a ρ fraction of the indices i. If δ = 0, this is just the list decoding problem that has been popular in complexity theory and for which Sudan gave a poly(m, d) time algorithm. However, for δ > 0, the problem as stated becomes ill-posed and one needs a careful reformulation (see the Introduction). We prove a few basic results about this (reformulated) problem. We show that the problem has no polynomial-time algorithm (our counterexample works for ρ = 0.5). This is shown by exhibiting an instance of the problem where the number of solutions is as large as exp(d^{0.5-ε}) and every pair of solutions is far from each other in ℓ∞ norm. On the algorithmic side, we give a rigorous analysis of a brute force algorithm that runs in exponential time. Also, in surprising contrast to our lower bound, we give a polynomial-time algorithm for learning the polynomials assuming the data is generated using a mixture model in which the mixing weights are "nondegenerate."
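The acceptance condition in the abstract can be made concrete with a small sketch. The Python below only checks whether one candidate polynomial h agrees with the data to within δ on at least a ρ fraction of the points. It is not the paper's algorithm, and it follows the original (pre-reformulation) problem statement. The function names, the coefficient-list representation, and the sample data are all assumptions made for illustration.

```python
from typing import Sequence, Tuple


def eval_poly(coeffs: Sequence[float], x: float) -> float:
    """Evaluate h(x) = coeffs[0] + coeffs[1]*x + ... using Horner's rule."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * x + c
    return result


def agreement_fraction(
    coeffs: Sequence[float],
    points: Sequence[Tuple[float, float]],
    delta: float,
) -> float:
    """Fraction of indices i with |h(x_i) - y_i| <= delta."""
    if not points:
        raise ValueError("need at least one datapoint")
    hits = sum(1 for x, y in points if abs(eval_poly(coeffs, x) - y) <= delta)
    return hits / len(points)


def is_solution(
    coeffs: Sequence[float],
    points: Sequence[Tuple[float, float]],
    delta: float,
    rho: float,
) -> bool:
    """True if h fits at least a rho fraction of the points within delta."""
    return agreement_fraction(coeffs, points, delta) >= rho


# Hypothetical data: four points on y = x^2 and one outlier at x = 1.
points = [(-1.0, 1.0), (-0.5, 0.25), (0.0, 0.0), (0.5, 0.25), (1.0, -1.0)]
h = [0.0, 0.0, 1.0]  # h(x) = x^2, lowest-degree coefficient first

print(agreement_fraction(h, points, delta=0.05))
print(is_solution(h, points, delta=0.05, rho=0.5))
```

With this sample data, h(x) = x^2 matches four of the five points, so it passes the check for ρ = 0.5. Listing every such h is the hard part that the abstract addresses, and this check does not attempt it.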

Original language: English (US)
Pages (from-to): 325-340
Number of pages: 16
Journal: Journal of Computer and System Sciences
Volume: 67
Issue number: 2
DOI: 10.1016/S0022-0000(03)00012-6
State: Published - Sep 2003

Fingerprint

Algebraic curve
Curve fitting
Noisy Data
Polynomials
Polynomial-time Algorithm
List Decoding
Polynomial
Complexity Theory
Exponential time
Number of Solutions
Decoding
Reformulation
Mixture Model
Counterexample
Norm

ASJC Scopus subject areas

  • Computational Theory and Mathematics

Cite this

Fitting algebraic curves to noisy data. / Arora, Sanjeev; Khot, Subhash.

In: Journal of Computer and System Sciences, Vol. 67, No. 2, 09.2003, p. 325-340.


@article{56bf829fd85a4c7b93386a76f60e5c7f,
title = "Fitting algebraic curves to noisy data",
abstract = "We introduce the following problem, which is motivated by applications in vision and pattern detection: we are given pairs of datapoints (x_1, y_1), (x_2, y_2), ..., (x_m, y_m) ∈ [-1, 1] × [-1, 1], a noise parameter δ > 0, a degree bound d, and a threshold ρ > 0. We desire an algorithm that lists every degree-d polynomial h such that |h(x_i) - y_i| ≤ δ for at least a ρ fraction of the indices i. If δ = 0, this is just the list decoding problem that has been popular in complexity theory and for which Sudan gave a poly(m, d) time algorithm. However, for δ > 0, the problem as stated becomes ill-posed and one needs a careful reformulation (see the Introduction). We prove a few basic results about this (reformulated) problem. We show that the problem has no polynomial-time algorithm (our counterexample works for ρ = 0.5). This is shown by exhibiting an instance of the problem where the number of solutions is as large as exp(d^{0.5-ε}) and every pair of solutions is far from each other in ℓ∞ norm. On the algorithmic side, we give a rigorous analysis of a brute force algorithm that runs in exponential time. Also, in surprising contrast to our lower bound, we give a polynomial-time algorithm for learning the polynomials assuming the data is generated using a mixture model in which the mixing weights are {"}nondegenerate.{"}",
author = "Sanjeev Arora and Subhash Khot",
year = "2003",
month = "9",
doi = "10.1016/S0022-0000(03)00012-6",
language = "English (US)",
volume = "67",
pages = "325--340",
journal = "Journal of Computer and System Sciences",
issn = "0022-0000",
publisher = "Academic Press Inc.",
number = "2",

}

TY - JOUR

T1 - Fitting algebraic curves to noisy data

AU - Arora, Sanjeev

AU - Khot, Subhash

PY - 2003/9

Y1 - 2003/9

AB - We introduce the following problem, which is motivated by applications in vision and pattern detection: we are given pairs of datapoints (x_1, y_1), (x_2, y_2), ..., (x_m, y_m) ∈ [-1, 1] × [-1, 1], a noise parameter δ > 0, a degree bound d, and a threshold ρ > 0. We desire an algorithm that lists every degree-d polynomial h such that |h(x_i) - y_i| ≤ δ for at least a ρ fraction of the indices i. If δ = 0, this is just the list decoding problem that has been popular in complexity theory and for which Sudan gave a poly(m, d) time algorithm. However, for δ > 0, the problem as stated becomes ill-posed and one needs a careful reformulation (see the Introduction). We prove a few basic results about this (reformulated) problem. We show that the problem has no polynomial-time algorithm (our counterexample works for ρ = 0.5). This is shown by exhibiting an instance of the problem where the number of solutions is as large as exp(d^{0.5-ε}) and every pair of solutions is far from each other in ℓ∞ norm. On the algorithmic side, we give a rigorous analysis of a brute force algorithm that runs in exponential time. Also, in surprising contrast to our lower bound, we give a polynomial-time algorithm for learning the polynomials assuming the data is generated using a mixture model in which the mixing weights are "nondegenerate."

UR - http://www.scopus.com/inward/record.url?scp=0142091455&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0142091455&partnerID=8YFLogxK

U2 - 10.1016/S0022-0000(03)00012-6

DO - 10.1016/S0022-0000(03)00012-6

M3 - Article

AN - SCOPUS:0142091455

VL - 67

SP - 325

EP - 340

JO - Journal of Computer and System Sciences

JF - Journal of Computer and System Sciences

SN - 0022-0000

IS - 2

ER -