Subspace sampling and relative-error matrix approximation: Column-based methods

Petros Drineas, Michael W. Mahoney, Shanmugavelayutham Muthukrishnan

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

    Abstract

    Given an m × n matrix A and an integer k less than the rank of A, the "best" rank-k approximation to A that minimizes the error with respect to the Frobenius norm is A_k, which is obtained by projecting A on the top k left singular vectors of A. While A_k is routinely used in data analysis, it is difficult to interpret and understand in terms of the original data, namely the columns and rows of A. For example, these columns and rows often come from some application domain, whereas the singular vectors are linear combinations of (up to all) the columns or rows of A. We address the problem of obtaining low-rank approximations that are directly interpretable in terms of the original columns or rows of A. Our main results are two polynomial-time randomized algorithms that take as input a matrix A and return as output a matrix C, consisting of a "small" (i.e., a low-degree polynomial in k, 1/ε, and log(1/δ)) number of actual columns of A, such that ||A - CC⁺A||_F ≤ (1 + ε) ||A - A_k||_F with probability at least 1 - δ. Our algorithms are simple, and they run in time of the order of the time needed to compute the top k right singular vectors of A. In addition, they sample the columns of A via the method of "subspace sampling," so named because the sampling probabilities depend on the lengths of the rows of the top singular vectors and because they ensure that we capture a certain subspace of interest in its entirety.
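
    As a concrete illustration of the "subspace sampling" idea, the following Python sketch samples columns with probabilities proportional to the squared row norms of the top k right singular vectors (the normalized leverage scores) and then compares ||A - CC⁺A||_F against ||A - A_k||_F on a synthetic matrix. The function name, the toy data, and the choice of c = 40 sampled columns are illustrative assumptions; the sketch omits the exact sample-size bounds and the column rescaling used in the paper's algorithms and analysis.

    import numpy as np

    def subspace_sample_columns(A, k, c, rng=None):
        """Sample c columns of A with probabilities proportional to the squared
        row norms of the top-k right singular vectors (a leverage-score /
        "subspace sampling" sketch, not the paper's exact algorithm)."""
        rng = np.random.default_rng() if rng is None else rng
        _, _, Vt = np.linalg.svd(A, full_matrices=False)
        Vk = Vt[:k].T                        # n x k: top-k right singular vectors
        lev = np.sum(Vk ** 2, axis=1)        # leverage scores; they sum to k
        p = lev / lev.sum()                  # column sampling probabilities
        idx = rng.choice(A.shape[1], size=c, replace=True, p=p)
        return A[:, idx], idx

    # Toy usage: a nearly rank-8 matrix plus small noise (illustrative data).
    rng = np.random.default_rng(0)
    A = rng.standard_normal((200, 8)) @ rng.standard_normal((8, 100))
    A += 0.01 * rng.standard_normal((200, 100))
    k = 8
    C, _ = subspace_sample_columns(A, k, c=40, rng=rng)

    # Compare the column-based error to the best rank-k error.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    Ak = U[:, :k] @ np.diag(s[:k]) @ Vt[:k]
    err_best = np.linalg.norm(A - Ak, "fro")
    err_cols = np.linalg.norm(A - C @ np.linalg.pinv(C) @ A, "fro")
    print(f"||A - CC+A||_F = {err_cols:.4f}, ||A - A_k||_F = {err_best:.4f}")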

    Original language: English (US)
    Title of host publication: Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques - 9th International Workshop on Approximation Algorithms for Combinatorial Optimization Problems, APPROX 2006 and 10th International Workshop on Randomization and Computation, RANDOM 2006
    Publisher: Springer-Verlag
    Pages: 316-326
    Number of pages: 11
    ISBN (Print): 3540380442, 9783540380443
    State: Published - Jan 1 2006
    Event: 9th International Workshop on Approximation Algorithms for Combinatorial Optimization Problems, APPROX 2006 and 10th International Workshop on Randomization and Computation, RANDOM 2006 - Barcelona, Spain
    Duration: Aug 28 2006 - Aug 30 2006

    Publication series

    Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
    Volume: 4110 LNCS
    ISSN (Print): 0302-9743
    ISSN (Electronic): 1611-3349

    Conference

    Conference: 9th International Workshop on Approximation Algorithms for Combinatorial Optimization Problems, APPROX 2006 and 10th International Workshop on Randomization and Computation, RANDOM 2006
    Country: Spain
    City: Barcelona
    Period: 8/28/06 - 8/30/06

    Fingerprint

    Matrix Approximation, Relative Error, Singular Vectors, Subspace, Sampling, Polynomials, Low-rank Approximation, Frobenius norm, Randomized Algorithms, Polynomial-time Algorithm, Linear Combination, Data analysis, Minimise, Polynomial, Integer, Output, Approximation

    ASJC Scopus subject areas

    • Theoretical Computer Science
    • Computer Science(all)

    Cite this

    Drineas, P., Mahoney, M. W., & Muthukrishnan, S. (2006). Subspace sampling and relative-error matrix approximation: Column-based methods. In Approximation, Randomization, and Combinatorial Optimization. Algorithms and Techniques - 9th International Workshop on Approximation Algorithms for Combinatorial Optimization Problems, APPROX 2006 and 10th International Workshop on Randomization and Computation, RANDOM 2006 (pp. 316-326). (Lecture Notes in Computer Science; Vol. 4110 LNCS). Springer-Verlag.
