Sampling algorithms for ℓ2 regression and applications

Petros Drineas, Michael W. Mahoney, Shanmugavelayutham Muthukrishnan

    Research output: Contribution to conference › Paper

    Abstract

    We present and analyze a sampling algorithm for the basic linear-algebraic problem of ℓ2 regression. The ℓ2 regression (or least-squares fit) problem takes as input a matrix A ∈ ℝ^{n×d} (where we assume n ≫ d) and a target vector b ∈ ℝ^n, and it returns as output Z = min_{x ∈ ℝ^d} ‖b − Ax‖_2. Also of interest is x_opt = A⁺b, where A⁺ is the Moore-Penrose generalized inverse, which is the minimum-length vector achieving the minimum. Our algorithm randomly samples r rows from the matrix A and the vector b to construct an induced ℓ2 regression problem with many fewer rows, but with the same number of columns. A crucial feature of the algorithm is the nonuniform sampling probabilities. These probabilities depend in a sophisticated manner on the lengths, i.e., the Euclidean norms, of the rows of the left singular vectors of A and on the manner in which b lies in the complement of the column space of A. Under appropriate assumptions, we show relative-error approximations for both Z and x_opt. Applications of this sampling methodology are briefly discussed.
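
    To make the sampling idea concrete, below is a minimal Python/NumPy sketch (not the authors' implementation). It computes the exact solution x_opt = A⁺b and residual Z for reference, then draws r rows of (A, b) with probabilities proportional to the squared row norms of the left singular vectors of A (the leverage scores) and solves the small induced problem. This simplifies the probabilities described in the abstract, which also account for how b lies in the complement of the column space of A; all function names and the synthetic data are illustrative only.

    import numpy as np

    def exact_l2_regression(A, b):
        """Exact least squares: x_opt = A^+ b and the residual Z = ||b - A x_opt||_2."""
        x_opt = np.linalg.pinv(A) @ b        # Moore-Penrose generalized inverse solution
        Z = np.linalg.norm(b - A @ x_opt)    # minimum l2 residual
        return x_opt, Z

    def sampled_l2_regression(A, b, r, seed=None):
        """Approximate least squares from r sampled rows of (A, b).

        Rows are drawn with probability proportional to the squared Euclidean
        norm of the corresponding row of the left singular vectors U of A
        (the leverage scores), then rescaled by 1/sqrt(r * p_i) so the small
        induced problem mimics the original one.
        """
        rng = np.random.default_rng(seed)
        n, d = A.shape

        # Leverage scores from the thin SVD: p_i proportional to ||U_{i,:}||_2^2.
        U, _, _ = np.linalg.svd(A, full_matrices=False)
        p = np.sum(U ** 2, axis=1)
        p /= p.sum()

        # Sample r row indices with replacement and rescale.
        idx = rng.choice(n, size=r, replace=True, p=p)
        scale = 1.0 / np.sqrt(r * p[idx])
        SA = A[idx] * scale[:, None]         # r x d sampled, rescaled matrix
        Sb = b[idx] * scale                  # length-r sampled, rescaled target

        # Solve the much smaller induced l2 regression problem.
        x_tilde, *_ = np.linalg.lstsq(SA, Sb, rcond=None)
        return x_tilde

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        n, d, r = 50_000, 20, 2_000          # n >> d and r << n
        A = rng.standard_normal((n, d))
        b = A @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

        x_opt, Z = exact_l2_regression(A, b)
        x_tilde = sampled_l2_regression(A, b, r, seed=1)
        Z_tilde = np.linalg.norm(b - A @ x_tilde)
        print(f"exact residual Z        = {Z:.4f}")
        print(f"sampled residual        = {Z_tilde:.4f}")
        print(f"relative solution error = {np.linalg.norm(x_tilde - x_opt) / np.linalg.norm(x_opt):.3e}")

    Note that forming the thin SVD to obtain exact leverage scores costs roughly as much as solving the original least-squares problem, so a sketch like this is mainly illustrative; the interest of the analysis lies in how small the sample size r can be while still guaranteeing relative-error approximations to Z and x_opt.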

    Original language: English (US)
    Pages: 1127-1136
    Number of pages: 10
    DOIs: 10.1145/1109557.1109682
    State: Published - Feb 28, 2006
    Event: Seventeenth Annual ACM-SIAM Symposium on Discrete Algorithms - Miami, FL, United States
    Duration: Jan 22, 2006 - Jan 24, 2006

    Other

    Other: Seventeenth Annual ACM-SIAM Symposium on Discrete Algorithms
    Country: United States
    City: Miami, FL
    Period: 1/22/06 - 1/24/06

    Fingerprint

    Regression
    Sampling
    Column space
    Moore-Penrose Generalized Inverse
    Nonuniform Sampling
    Singular Vectors
    Euclidean norm
    Relative Error
    Least Squares
    Complement
    Target
    Methodology
    Output
    Approximation

    ASJC Scopus subject areas

    • Software
    • Discrete Mathematics and Combinatorics
    • Safety, Risk, Reliability and Quality
    • Chemical Health and Safety

    Cite this

    Drineas, P., Mahoney, M. W., & Muthukrishnan, S. (2006). Sampling algorithms for ℓ2 regression and applications. 1127-1136. Paper presented at Seventeenth Annual ACM-SIAM Symposium on Discrete Algorithms, Miami, FL, United States. https://doi.org/10.1145/1109557.1109682

    Drineas, Petros; Mahoney, Michael W.; Muthukrishnan, Shanmugavelayutham. Sampling algorithms for ℓ2 regression and applications. 2006. 1127-1136. Paper presented at Seventeenth Annual ACM-SIAM Symposium on Discrete Algorithms, Miami, FL, United States.

    Research output: Contribution to conference › Paper

    Drineas, P, Mahoney, MW & Muthukrishnan, S 2006, 'Sampling algorithms for ℓ2 regression and applications', paper presented at Seventeenth Annual ACM-SIAM Symposium on Discrete Algorithms, Miami, FL, United States, 1/22/06 - 1/24/06, pp. 1127-1136. https://doi.org/10.1145/1109557.1109682
    Drineas P, Mahoney MW, Muthukrishnan S. Sampling algorithms for ℓ2 regression and applications. 2006. Paper presented at Seventeenth Annual ACM-SIAM Symposium on Discrete Algorithms, Miami, FL, United States. https://doi.org/10.1145/1109557.1109682
    Drineas, Petros; Mahoney, Michael W.; Muthukrishnan, Shanmugavelayutham. Sampling algorithms for ℓ2 regression and applications. Paper presented at Seventeenth Annual ACM-SIAM Symposium on Discrete Algorithms, Miami, FL, United States. 10 p.
    @conference{fc9242cd6f9c465dab4810b72b934420,
    title = "Sampling algorithms for ℓ 2 regression and applications",
    abstract = "We present and analyze a sampling algorithm for the basic linear-algebraic problem of ℓ 2 regression. The ℓ 2 regression (or least-squares fit) problem takes as input a matrix A ∈ ℝ n×d (where we assume n ≫ d) and a target vector b ∈ ℝ n, and it returns as output cross Z sign = min x∈ℝd |b - Ax| 2. Also of interest is x opt = A +b, where A + is the Moore-Penrose generalized inverse, which is the minimum-length vector achieving the minimum. Our algorithm randomly samples r rows from the matrix A and vector b to construct an induced ℓ 2 regression problem with many fewer rows, but with the same number of columns. A crucial feature of the algorithm is the nonuniform sampling probabilities. These probabilities depend in a sophisticated manner on the lengths, i.e., the Euclidean norms, of the rows of the left singular vectors of A and the manner in which b lies in the complement of the column space of A. Under appropriate assumptions, we show relative error approximations for both cross Z sign and x opt. Applications of this sampling methodology are briefly discussed.",
    author = "Petros Drineas and Mahoney, {Michael W.} and Shanmugavelayutham Muthukrishnan",
    year = "2006",
    month = "2",
    day = "28",
    doi = "10.1145/1109557.1109682",
    language = "English (US)",
    pages = "1127--1136",
    note = "Seventeenth Annual ACM-SIAM Symposium on Discrete Algorithms ; Conference date: 22-01-2006 Through 24-01-2006",

    }

    TY - CONF

    T1 - Sampling algorithms for ℓ2 regression and applications

    AU - Drineas, Petros

    AU - Mahoney, Michael W.

    AU - Muthukrishnan, Shanmugavelayutham

    PY - 2006/2/28

    Y1 - 2006/2/28

    N2 - We present and analyze a sampling algorithm for the basic linear-algebraic problem of ℓ2 regression. The ℓ2 regression (or least-squares fit) problem takes as input a matrix A ∈ ℝ^{n×d} (where we assume n ≫ d) and a target vector b ∈ ℝ^n, and it returns as output Z = min_{x ∈ ℝ^d} ‖b − Ax‖_2. Also of interest is x_opt = A⁺b, where A⁺ is the Moore-Penrose generalized inverse, which is the minimum-length vector achieving the minimum. Our algorithm randomly samples r rows from the matrix A and the vector b to construct an induced ℓ2 regression problem with many fewer rows, but with the same number of columns. A crucial feature of the algorithm is the nonuniform sampling probabilities. These probabilities depend in a sophisticated manner on the lengths, i.e., the Euclidean norms, of the rows of the left singular vectors of A and on the manner in which b lies in the complement of the column space of A. Under appropriate assumptions, we show relative-error approximations for both Z and x_opt. Applications of this sampling methodology are briefly discussed.

    AB - We present and analyze a sampling algorithm for the basic linear-algebraic problem of ℓ2 regression. The ℓ2 regression (or least-squares fit) problem takes as input a matrix A ∈ ℝ^{n×d} (where we assume n ≫ d) and a target vector b ∈ ℝ^n, and it returns as output Z = min_{x ∈ ℝ^d} ‖b − Ax‖_2. Also of interest is x_opt = A⁺b, where A⁺ is the Moore-Penrose generalized inverse, which is the minimum-length vector achieving the minimum. Our algorithm randomly samples r rows from the matrix A and the vector b to construct an induced ℓ2 regression problem with many fewer rows, but with the same number of columns. A crucial feature of the algorithm is the nonuniform sampling probabilities. These probabilities depend in a sophisticated manner on the lengths, i.e., the Euclidean norms, of the rows of the left singular vectors of A and on the manner in which b lies in the complement of the column space of A. Under appropriate assumptions, we show relative-error approximations for both Z and x_opt. Applications of this sampling methodology are briefly discussed.

    UR - http://www.scopus.com/inward/record.url?scp=33244493810&partnerID=8YFLogxK

    UR - http://www.scopus.com/inward/citedby.url?scp=33244493810&partnerID=8YFLogxK

    U2 - 10.1145/1109557.1109682

    DO - 10.1145/1109557.1109682

    M3 - Paper

    SP - 1127

    EP - 1136

    ER -