Graphical model sketch

Branislav Kveton, Hung Bui, Mohammad Ghavamzadeh, Georgios Theocharous, Shanmugavelayutham Muthukrishnan, Siqi Sun

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

    Abstract

    Structured high-cardinality data arises in many domains and poses a major challenge for both modeling and inference. Graphical models are a popular approach to modeling structured data, but they are unsuitable for high-cardinality variables. The count-min (CM) sketch is a popular approach to estimating probabilities in high-cardinality data, but it does not scale well beyond a few variables. In this work, we bring together the ideas of graphical models and count sketches, and propose and analyze several approaches to estimating probabilities in structured high-cardinality streams of data. The key idea of our approximations is to use the structure of a graphical model and approximately estimate its factors by “sketches”, which hash high-cardinality variables using random projections. Our approximations are computationally efficient, and their space complexity is independent of the cardinality of variables. Our error bounds are multiplicative and significantly improve upon those of the CM sketch, a state-of-the-art approach to estimating probabilities in streams. We evaluate our approximations on synthetic and real-world problems and report order-of-magnitude improvements over the CM sketch.
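    The abstract combines two ingredients: a count-min sketch, which estimates counts of high-cardinality keys from a fixed table of depth × width counters, and a graphical model whose factors are estimated from such sketches. The Python below is a minimal illustration under stated assumptions, not the authors' algorithm: it assumes a Bayesian network with known parent sets, estimates each conditional factor P(x_i | x_pa(i)) as the ratio of two count-min estimates, and multiplies the factors. The class names `CountMinSketch` and `GraphicalModelSketch`, the `parents` encoding, and the salted built-in hash are hypothetical, illustrative choices. For reference, the standard count-min guarantee is additive: with width ⌈e/ε⌉ and depth ⌈ln(1/δ)⌉, each estimate exceeds the true count by at most εN with probability at least 1 − δ, where N is the stream length.

```python
import random
from collections import Counter


class CountMinSketch:
    """Count-min sketch: `depth` rows of `width` counters.

    Each key is hashed to one counter per row; the estimate is the minimum
    over rows, so it never underestimates the true count.
    """

    def __init__(self, width: int, depth: int, seed: int = 0) -> None:
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]
        rng = random.Random(seed)
        # Assumption: a salted built-in hash stands in for a
        # pairwise-independent hash family (the formal CM guarantee needs one).
        self.salts = [rng.getrandbits(64) for _ in range(depth)]
        self.total = 0

    def _index(self, row: int, key) -> int:
        return hash((self.salts[row], key)) % self.width

    def add(self, key, count: int = 1) -> None:
        self.total += count
        for r in range(self.depth):
            self.table[r][self._index(r, key)] += count

    def estimate(self, key) -> int:
        return min(self.table[r][self._index(r, key)] for r in range(self.depth))


class GraphicalModelSketch:
    """Hypothetical helper: P(x) = prod_i P(x_i | x_pa(i)) for a Bayesian
    network with known structure, with every factor estimated from sketches."""

    def __init__(self, parents, width: int, depth: int, seed: int = 0) -> None:
        self.parents = parents  # parents[i]: tuple of parent indices of variable i
        # Per variable: one sketch for (x_i, x_pa(i)) counts, one for x_pa(i) counts.
        self.joint = [CountMinSketch(width, depth, seed + 2 * i) for i in range(len(parents))]
        self.marg = [CountMinSketch(width, depth, seed + 2 * i + 1) for i in range(len(parents))]

    def update(self, x) -> None:
        for i, pa in enumerate(self.parents):
            pa_vals = tuple(x[j] for j in pa)
            self.joint[i].add((x[i], pa_vals))
            self.marg[i].add(pa_vals)  # pa_vals == () for roots, so this counts N

    def probability(self, x) -> float:
        p = 1.0
        for i, pa in enumerate(self.parents):
            pa_vals = tuple(x[j] for j in pa)
            den = self.marg[i].estimate(pa_vals)
            if den == 0:
                return 0.0  # parent configuration never observed
            # Plug-in ratio of two upward-biased estimates; it can err either way.
            p *= self.joint[i].estimate((x[i], pa_vals)) / den
        return p


if __name__ == "__main__":
    rng = random.Random(1)
    # Chain X0 -> X1 -> X2 over 1000 values; 4 x 512 counters per sketch.
    model = GraphicalModelSketch(parents=[(), (0,), (1,)], width=512, depth=4)
    exact = Counter()
    for _ in range(10_000):
        x0 = rng.randrange(1000)
        x1 = (x0 + rng.randrange(3)) % 1000
        x2 = (x1 + rng.randrange(3)) % 1000
        model.update((x0, x1, x2))
        exact[(x0, x1, x2)] += 1
    x = exact.most_common(1)[0][0]
    print("sketch estimate:", model.probability(x), "empirical:", exact[x] / 10_000)
```

    Each sketch holds depth × width counters no matter how many values a variable takes, which mirrors the abstract's point that space does not grow with cardinality. The printed sketch estimate and empirical frequency will generally differ, both because of hash collisions and because of sampling noise in the empirical count.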

    Original language: English (US)
    Title of host publication: Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2016, Proceedings
    Editors: Jilles Giuseppe, Niels Landwehr, Giuseppe Manco, Paolo Frasconi
    Publisher: Springer-Verlag
    Pages: 81-97
    Number of pages: 17
    ISBN (Print): 9783319461274
    DOI: https://doi.org/10.1007/978-3-319-46128-1_6
    State: Published - Jan 1 2016
    Event: 15th European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML PKDD 2016 - Riva del Garda, Italy
    Duration: Sep 19 2016 to Sep 23 2016

    Publication series

    Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
    Volume: 9851 LNAI
    ISSN (Print): 0302-9743
    ISSN (Electronic): 1611-3349

    Conference

    Conference: 15th European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML PKDD 2016
    Country: Italy
    City: Riva del Garda
    Period: 9/19/16 to 9/23/16

    Fingerprint

    Graphical Models
    Cardinality
    Count
    Random variables
    Approximation
    Random Projection
    Space Complexity
    Modeling
    Error Bounds
    Multiplicative
    Evaluate
    Estimate

    ASJC Scopus subject areas

    • Theoretical Computer Science
    • Computer Science(all)

    Cite this

    Kveton, B., Bui, H., Ghavamzadeh, M., Theocharous, G., Muthukrishnan, S., & Sun, S. (2016). Graphical model sketch. In J. Giuseppe, N. Landwehr, G. Manco, & P. Frasconi (Eds.), Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2016, Proceedings (pp. 81-97). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 9851 LNAI). Springer-Verlag. https://doi.org/10.1007/978-3-319-46128-1_6

    Graphical model sketch. / Kveton, Branislav; Bui, Hung; Ghavamzadeh, Mohammad; Theocharous, Georgios; Muthukrishnan, Shanmugavelayutham; Sun, Siqi.

    Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2016, Proceedings. ed. / Jilles Giuseppe; Niels Landwehr; Giuseppe Manco; Paolo Frasconi. Springer-Verlag, 2016. p. 81-97 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 9851 LNAI).


    Kveton, B, Bui, H, Ghavamzadeh, M, Theocharous, G, Muthukrishnan, S & Sun, S 2016, Graphical model sketch. in J Giuseppe, N Landwehr, G Manco & P Frasconi (eds), Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2016, Proceedings. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 9851 LNAI, Springer-Verlag, pp. 81-97, 15th European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECML PKDD 2016, Riva del Garda, Italy, 9/19/16. https://doi.org/10.1007/978-3-319-46128-1_6
    Kveton B, Bui H, Ghavamzadeh M, Theocharous G, Muthukrishnan S, Sun S. Graphical model sketch. In Giuseppe J, Landwehr N, Manco G, Frasconi P, editors, Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2016, Proceedings. Springer-Verlag. 2016. p. 81-97. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)). https://doi.org/10.1007/978-3-319-46128-1_6
    Kveton, Branislav ; Bui, Hung ; Ghavamzadeh, Mohammad ; Theocharous, Georgios ; Muthukrishnan, Shanmugavelayutham ; Sun, Siqi. / Graphical model sketch. Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2016, Proceedings. editor / Jilles Giuseppe ; Niels Landwehr ; Giuseppe Manco ; Paolo Frasconi. Springer-Verlag, 2016. pp. 81-97 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).
    @inproceedings{7f78a74f52044c4da888a8d9c56da097,
    title = "Graphical model sketch",
    abstract = "Structured high-cardinality data arises in many domains, and poses a major challenge for both modeling and inference. Graphical models are a popular approach to modeling structured data but they are unsuitable for high-cardinality variables. The count-min (CM) sketch is a popular approach to estimating probabilities in high-cardinality data but it does not scale well beyond a few variables. In this work, we bring together the ideas of graphical models and count sketches; and propose and analyze several approaches to estimating probabilities in structured high-cardinality streams of data. The key idea of our approximations is to use the structure of a graphical model and approximately estimate its factors by “sketches”, which hash high-cardinality variables using random projections. Our approximations are computationally efficient and their space complexity is independent of the cardinality of variables. Our error bounds are multiplicative and significantly improve upon those of the CM sketch, a state-of-the-art approach to estimating probabilities in streams. We evaluate our approximations on synthetic and real-world problems, and report an order of magnitude improvements over the CM sketch.",
    author = "Branislav Kveton and Hung Bui and Mohammad Ghavamzadeh and Georgios Theocharous and Shanmugavelayutham Muthukrishnan and Siqi Sun",
    year = "2016",
    month = "1",
    day = "1",
    doi = "10.1007/978-3-319-46128-1_6",
    language = "English (US)",
    isbn = "9783319461274",
    series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
    publisher = "Springer-Verlag",
    pages = "81--97",
    editor = "Jilles Giuseppe and Niels Landwehr and Giuseppe Manco and Paolo Frasconi",
    booktitle = "Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2016, Proceedings",

    }

    TY - GEN

    T1 - Graphical model sketch

    AU - Kveton, Branislav

    AU - Bui, Hung

    AU - Ghavamzadeh, Mohammad

    AU - Theocharous, Georgios

    AU - Muthukrishnan, Shanmugavelayutham

    AU - Sun, Siqi

    PY - 2016/1/1

    Y1 - 2016/1/1


    AB - Structured high-cardinality data arises in many domains, and poses a major challenge for both modeling and inference. Graphical models are a popular approach to modeling structured data but they are unsuitable for high-cardinality variables. The count-min (CM) sketch is a popular approach to estimating probabilities in high-cardinality data but it does not scale well beyond a few variables. In this work, we bring together the ideas of graphical models and count sketches; and propose and analyze several approaches to estimating probabilities in structured high-cardinality streams of data. The key idea of our approximations is to use the structure of a graphical model and approximately estimate its factors by “sketches”, which hash high-cardinality variables using random projections. Our approximations are computationally efficient and their space complexity is independent of the cardinality of variables. Our error bounds are multiplicative and significantly improve upon those of the CM sketch, a state-of-the-art approach to estimating probabilities in streams. We evaluate our approximations on synthetic and real-world problems, and report an order of magnitude improvements over the CM sketch.

    UR - http://www.scopus.com/inward/record.url?scp=84988584640&partnerID=8YFLogxK

    UR - http://www.scopus.com/inward/citedby.url?scp=84988584640&partnerID=8YFLogxK

    U2 - 10.1007/978-3-319-46128-1_6

    DO - 10.1007/978-3-319-46128-1_6

    M3 - Conference contribution

    AN - SCOPUS:84988584640

    SN - 9783319461274

    T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

    SP - 81

    EP - 97

    BT - Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2016, Proceedings

    A2 - Giuseppe, Jilles

    A2 - Landwehr, Niels

    A2 - Manco, Giuseppe

    A2 - Frasconi, Paolo

    PB - Springer-Verlag

    ER -