### Abstract

We present and analyze a sampling algorithm for the basic linear-algebraic problem of $\ell_2$ regression. The $\ell_2$ regression (or least-squares fit) problem takes as input a matrix $A \in \mathbb{R}^{n \times d}$ (where we assume $n \gg d$) and a target vector $b \in \mathbb{R}^{n}$, and it returns as output $\mathcal{Z} = \min_{x \in \mathbb{R}^{d}} \|b - Ax\|_2$. Also of interest is $x_{\text{opt}} = A^{+}b$, where $A^{+}$ is the Moore-Penrose generalized inverse; $x_{\text{opt}}$ is the minimum-length vector achieving the minimum. Our algorithm randomly samples $r$ rows from the matrix $A$ and vector $b$ to construct an induced $\ell_2$ regression problem with many fewer rows, but with the same number of columns. A crucial feature of the algorithm is the nonuniform sampling probabilities. These probabilities depend in a sophisticated manner on the lengths, i.e., the Euclidean norms, of the rows of the left singular vectors of $A$ and the manner in which $b$ lies in the complement of the column space of $A$. Under appropriate assumptions, we show relative error approximations for both $\mathcal{Z}$ and $x_{\text{opt}}$. Applications of this sampling methodology are briefly discussed.
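The row-sampling idea in the abstract can be illustrated with a short NumPy sketch. It is not the paper's algorithm: the abstract does not give the exact sampling probabilities, so the 50/50 blend used here (squared row norms of the left singular vectors mixed with squared entries of the part of $b$ outside the column space of $A$) is an assumption chosen only for illustration. The function name `sampled_least_squares` and its parameters are likewise hypothetical, and $A$ is assumed to have full column rank.

```python
import numpy as np

def sampled_least_squares(A, b, r, rng=None):
    """Approximate argmin_x ||b - A x||_2 from r nonuniformly sampled rows.

    Illustrative sketch only; the probability blend below is an assumption,
    not the construction analyzed in the paper.
    """
    rng = np.random.default_rng(rng)
    n, d = A.shape

    # Thin SVD: row i of U is the i-th row of the left singular vectors of A.
    U, _, _ = np.linalg.svd(A, full_matrices=False)
    row_norms_sq = np.sum(U**2, axis=1)

    # Part of b lying in the complement of the column space of A.
    b_perp = b - U @ (U.T @ b)
    resid_sq = b_perp**2

    # Assumed blend of the two ingredients named in the abstract.
    p = row_norms_sq / row_norms_sq.sum()
    if resid_sq.sum() > 0:
        p = 0.5 * p + 0.5 * resid_sq / resid_sq.sum()
    p = p / p.sum()

    # Sample r rows with replacement and rescale each by 1/sqrt(r * p_i),
    # so the sampled squared residual is unbiased for the full one.
    idx = rng.choice(n, size=r, replace=True, p=p)
    scale = 1.0 / np.sqrt(r * p[idx])
    SA = A[idx] * scale[:, None]
    Sb = b[idx] * scale

    # Solve the induced problem: r rows, the same d columns.
    x_tilde, *_ = np.linalg.lstsq(SA, Sb, rcond=None)
    return x_tilde
```

Computing the SVD of the full matrix already costs as much as solving the original problem, so this sketch only shows what is sampled and how the induced problem is formed. It makes no claim about running time or about the error bounds proved in the paper.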

Original language | English (US) |
---|---|
Pages | 1127-1136 |
Number of pages | 10 |
DOIs | https://doi.org/10.1145/1109557.1109682 |
State | Published - Feb 28 2006 |
Event | Seventeenth Annual ACM-SIAM Symposium on Discrete Algorithms, Miami, FL, United States. Duration: Jan 22 2006 → Jan 24 2006 |

### Other

Other | Seventeenth Annual ACM-SIAM Symposium on Discrete Algorithms |
---|---|
Country | United States |
City | Miami, FL |
Period | 1/22/06 → 1/24/06 |

### ASJC Scopus subject areas

- Software
- Discrete Mathematics and Combinatorics
- Safety, Risk, Reliability and Quality
- Chemical Health and Safety

### Cite this

Drineas, P., Mahoney, M. W., & Muthukrishnan, S. (2006). *Sampling algorithms for ℓ₂ regression and applications*. 1127-1136. Paper presented at Seventeenth Annual ACM-SIAM Symposium on Discrete Algorithms, Miami, FL, United States. https://doi.org/10.1145/1109557.1109682

**Sampling algorithms for ℓ₂ regression and applications.** / Drineas, Petros; Mahoney, Michael W.; Muthukrishnan, Shanmugavelayutham.

Research output: Contribution to conference › Paper

TY - CONF

T1 - Sampling algorithms for ℓ₂ regression and applications

AU - Drineas, Petros

AU - Mahoney, Michael W.

AU - Muthukrishnan, Shanmugavelayutham

PY - 2006/2/28

Y1 - 2006/2/28

N2 - We present and analyze a sampling algorithm for the basic linear-algebraic problem of ℓ₂ regression. The ℓ₂ regression (or least-squares fit) problem takes as input a matrix A ∈ ℝ^{n×d} (where we assume n ≫ d) and a target vector b ∈ ℝ^n, and it returns as output 𝒵 = min_{x ∈ ℝ^d} ‖b − Ax‖₂. Also of interest is x_opt = A⁺b, where A⁺ is the Moore-Penrose generalized inverse; x_opt is the minimum-length vector achieving the minimum. Our algorithm randomly samples r rows from the matrix A and vector b to construct an induced ℓ₂ regression problem with many fewer rows, but with the same number of columns. A crucial feature of the algorithm is the nonuniform sampling probabilities. These probabilities depend in a sophisticated manner on the lengths, i.e., the Euclidean norms, of the rows of the left singular vectors of A and the manner in which b lies in the complement of the column space of A. Under appropriate assumptions, we show relative error approximations for both 𝒵 and x_opt. Applications of this sampling methodology are briefly discussed.

UR - http://www.scopus.com/inward/record.url?scp=33244493810&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=33244493810&partnerID=8YFLogxK

U2 - 10.1145/1109557.1109682

DO - 10.1145/1109557.1109682

M3 - Paper

AN - SCOPUS:33244493810

SP - 1127

EP - 1136

ER -