Engineering the compression of massive tables: An experimental approach

Adam L. Buchsbaum, Donald F. Caldwell, Kenneth W. Church, Glenn S. Fowler, Shanmugavelayutham Muthukrishnan

    Research output: Contribution to conference > Paper

    Abstract

    We study the problem of compressing massive tables. We devise a novel compression paradigm - training for lossless compression - which assumes that the data exhibit dependencies that can be learned by examining a small amount of training material. We develop an experimental methodology to test the approach. Our result is a system, pzip, which outperforms gzip by factors of two in compression size and both compression and uncompression time for various tabular data. Pzip is now in production use in an AT&T network traffic data warehouse.
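The idea of training on a small sample can be illustrated with a minimal sketch. This is not the pzip algorithm, which the abstract does not describe in detail. It assumes fixed-width records, uses Python's `zlib` as a stand-in compressor, and "trains" only by choosing between a row-major and a column-major layout based on which compresses the sample better. The function names (`train`, `compress`, `decompress`) and the synthetic table are hypothetical.

```python
import zlib


def to_columns(rows: list[bytes], width: int) -> bytes:
    """Transpose fixed-width records into column-major byte order."""
    return b"".join(bytes(r[i] for r in rows) for i in range(width))


def from_columns(data: bytes, width: int) -> list[bytes]:
    """Invert to_columns: rebuild the fixed-width records."""
    n = len(data) // width  # number of records
    return [bytes(data[c * n + r] for c in range(width)) for r in range(n)]


def train(sample: list[bytes], width: int) -> str:
    """Pick the layout that compresses the training sample better (assumed criterion)."""
    row_size = len(zlib.compress(b"".join(sample)))
    col_size = len(zlib.compress(to_columns(sample, width)))
    return "columns" if col_size < row_size else "rows"


def compress(rows: list[bytes], width: int, layout: str) -> bytes:
    """Compress the whole table using the layout learned from the sample."""
    payload = to_columns(rows, width) if layout == "columns" else b"".join(rows)
    return zlib.compress(payload)


def decompress(blob: bytes, width: int, layout: str) -> list[bytes]:
    """Recover the original records exactly (lossless)."""
    payload = zlib.decompress(blob)
    if layout == "columns":
        return from_columns(payload, width)
    return [payload[i:i + width] for i in range(0, len(payload), width)]


# Synthetic table: a 6-digit counter followed by a 2-digit code, 8 bytes per record.
WIDTH = 8
rows = [f"{i:06d}{i % 7:02d}".encode() for i in range(1000)]

layout = train(rows[:50], WIDTH)       # learn from a small training sample
blob = compress(rows, WIDTH, layout)   # apply the learned layout to the full table
restored = decompress(blob, WIDTH, layout)
```

The layout decision is made once on 50 records and then reused for all 1000, which mirrors the assumption that dependencies seen in a small amount of training material carry over to the rest of the table.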

    Original language: English (US)
    Pages: 175-184
    Number of pages: 10
    State: Published - Jan 1 2000
    Event: 11th Annual ACM-SIAM Symposium on Discrete Algorithms - San Francisco, CA, USA
    Duration: Jan 9 2000 to Jan 11 2000

    Fingerprint

    Data warehouses
    Tables
    Compression
    Engineering
    Lossless Compression
    Data Warehouse
    Network Traffic
    Paradigm
    Methodology
    Training

    ASJC Scopus subject areas

    • Software
    • Mathematics (all)

    Cite this

    Buchsbaum, A. L., Caldwell, D. F., Church, K. W., Fowler, G. S., & Muthukrishnan, S. (2000). Engineering the compression of massive tables: An experimental approach. 175-184. Paper presented at 11th Annual ACM-SIAM Symposium on Discrete Algorithms, San Francisco, CA, USA.

    @conference{a88a6d84a6254230a9fba0d351983a21,
    title = "Engineering the compression of massive tables: An experimental approach",
    abstract = "We study the problem of compressing massive tables. We devise a novel compression paradigm - training for lossless compression - which assumes that the data exhibit dependencies that can be learned by examining a small amount of training material. We develop an experimental methodology to test the approach. Our result is a system, pzip, which outperforms gzip by factors of two in compression size and both compression and uncompression time for various tabular data. Pzip is now in production use in an AT&T network traffic data warehouse.",
    author = "Buchsbaum, {Adam L.} and Caldwell, {Donald F.} and Church, {Kenneth W.} and Fowler, {Glenn S.} and Shanmugavelayutham Muthukrishnan",
    year = "2000",
    month = "1",
    day = "1",
    language = "English (US)",
    pages = "175--184",
    note = "11th Annual ACM-SIAM Symposium on Discrete Algorithms ; Conference date: 09-01-2000 Through 11-01-2000",

    }

    TY - CONF

    T1 - Engineering the compression of massive tables

    T2 - An experimental approach

    AU - Buchsbaum, Adam L.

    AU - Caldwell, Donald F.

    AU - Church, Kenneth W.

    AU - Fowler, Glenn S.

    AU - Muthukrishnan, Shanmugavelayutham

    PY - 2000/1/1

    Y1 - 2000/1/1

    N2 - We study the problem of compressing massive tables. We devise a novel compression paradigm - training for lossless compression - which assumes that the data exhibit dependencies that can be learned by examining a small amount of training material. We develop an experimental methodology to test the approach. Our result is a system, pzip, which outperforms gzip by factors of two in compression size and both compression and uncompression time for various tabular data. Pzip is now in production use in an AT&T network traffic data warehouse.

    UR - http://www.scopus.com/inward/record.url?scp=0033906346&partnerID=8YFLogxK

    UR - http://www.scopus.com/inward/citedby.url?scp=0033906346&partnerID=8YFLogxK

    M3 - Paper

    SP - 175

    EP - 184

    ER -