Trace complexity of network inference

Bruno Abrahao, Flavio Chierichetti, Robert Kleinberg, Alessandro Panconesi

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

    Abstract

    The network inference problem consists of reconstructing the edge set of a network given traces representing the chronology of infection times as epidemics spread through the network. This problem is a paradigmatic representative of prediction tasks in machine learning that require deducing a latent structure from observed patterns of activity in a network, which often require an unrealistically large number of resources (e.g., amount of available data, or computational time). A fundamental question is to understand which properties we can predict with a reasonable degree of accuracy with the available resources, and which we cannot. We define the trace complexity as the number of distinct traces required to achieve high fidelity in reconstructing the topology of the unobserved network or, more generally, some of its properties. We give algorithms that are competitive with, while being simpler and more efficient than, existing network inference approaches. Moreover, we prove that our algorithms are nearly optimal, by proving an information theoretic lower bound on the number of traces that an optimal inference algorithm requires for performing this task in the general case. Given these strong lower bounds, we turn our attention to special cases, such as trees and bounded-degree graphs, and to property recovery tasks, such as reconstructing the degree distribution without inferring the network. We show that these problems require a much smaller (and more realistic) number of traces, making them potentially solvable in practice.
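    The abstract does not spell out the algorithms, so the following is only a minimal sketch of the problem setup under stated assumptions: each cascade starts at a single source, every edge transmits independently with probability `p` after an exponentially distributed delay, and a trace is the list of nodes in order of infection. Under that assumed model, the second node in a trace can only have been infected by the source, so pairing the first two nodes of each trace never reports a false edge. The function names (`simulate_trace`, `first_two_nodes_inference`) and the ring-graph demo are hypothetical, chosen for illustration.

```python
import heapq
import random


def simulate_trace(adj, source, p, rng):
    """Simulate one cascade and return the nodes in order of infection.

    Assumed model (not taken from the abstract): each edge transmits
    independently with probability p, after an Exp(1) delay.
    """
    infected = set()
    order = []
    heap = [(0.0, source)]  # (infection time, node)
    while heap:
        t, node = heapq.heappop(heap)
        if node in infected:
            continue  # already infected earlier via another edge
        infected.add(node)
        order.append(node)
        for nbr in adj[node]:
            if nbr not in infected and rng.random() < p:
                heapq.heappush(heap, (t + rng.expovariate(1.0), nbr))
    return order


def first_two_nodes_inference(traces):
    """Report an edge between the first two infected nodes of each trace.

    The second infected node must be a neighbor of the source, so every
    reported edge is a true edge under the single-source model above.
    """
    edges = set()
    for trace in traces:
        if len(trace) >= 2:
            edges.add(frozenset(trace[:2]))
    return edges


if __name__ == "__main__":
    rng = random.Random(0)
    n = 8
    # Hypothetical test network: a ring on n nodes.
    adj = {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}
    true_edges = {frozenset((i, (i + 1) % n)) for i in range(n)}
    traces = [simulate_trace(adj, rng.randrange(n), 0.9, rng) for _ in range(200)]
    inferred = first_two_nodes_inference(traces)
    print("recovered", len(inferred & true_edges), "of", len(true_edges), "edges")
    print("false edges:", len(inferred - true_edges))
```

    The number of traces needed before every true edge has appeared as a first pair is the kind of quantity the trace complexity measures; this sketch only illustrates the setup and makes no claim about the bounds proved in the paper.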

    Original language: English (US)
    Title of host publication: KDD 2013 - 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
    Editors: Rajesh Parekh, Jingrui He, Dhillon S. Inderjit, Paul Bradley, Yehuda Koren, Rayid Ghani, Ted E. Senator, Robert L. Grossman, Ramasamy Uthurusamy
    Publisher: Association for Computing Machinery
    Pages: 491-499
    Number of pages: 9
    ISBN (Electronic): 9781450321747
    DOI: https://doi.org/10.1145/2487575.2487664
    State: Published - Aug 11 2013
    Event: 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2013 - Chicago, United States
    Duration: Aug 11 2013 to Aug 14 2013

    Publication series

    Name: Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining
    Volume: Part F128815

    Conference

    Conference: 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2013
    Country: United States
    City: Chicago
    Period: 8/11/13 to 8/14/13

    Fingerprint

    Learning systems
    Topology
    Recovery

    Keywords

    • Independent cascade model
    • Network epidemics
    • Network inference
    • Sampling complexity

    ASJC Scopus subject areas

    • Software
    • Information Systems

    Cite this

    Abrahao, B., Chierichetti, F., Kleinberg, R., & Panconesi, A. (2013). Trace complexity of network inference. In R. Parekh, J. He, D. S. Inderjit, P. Bradley, Y. Koren, R. Ghani, T. E. Senator, R. L. Grossman, ... R. Uthurusamy (Eds.), KDD 2013 - 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 491-499). [2487664] (Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; Vol. Part F128815). Association for Computing Machinery. https://doi.org/10.1145/2487575.2487664

    @inproceedings{9b8494b945da4dad9e69105743b0b723,
    title = "Trace complexity of network inference",
    abstract = "The network inference problem consists of reconstructing the edge set of a network given traces representing the chronology of infection times as epidemics spread through the network. This problem is a paradigmatic representative of prediction tasks in machine learning that require deducing a latent structure from observed patterns of activity in a network, which often require an unrealistically large number of resources (e.g., amount of available data, or computational time). A fundamental question is to understand which properties we can predict with a reasonable degree of accuracy with the available resources, and which we cannot. We define the trace complexity as the number of distinct traces required to achieve high fidelity in reconstructing the topology of the unobserved network or, more generally, some of its properties. We give algorithms that are competitive with, while being simpler and more efficient than, existing network inference approaches. Moreover, we prove that our algorithms are nearly optimal, by proving an information theoretic lower bound on the number of traces that an optimal inference algorithm requires for performing this task in the general case. Given these strong lower bounds, we turn our attention to special cases, such as trees and bounded-degree graphs, and to property recovery tasks, such as reconstructing the degree distribution without inferring the network. We show that these problems require a much smaller (and more realistic) number of traces, making them potentially solvable in practice.",
    keywords = "Independent cascade model, Network epidemics, Network inference, Sampling complexity",
    author = "Bruno Abrahao and Flavio Chierichetti and Robert Kleinberg and Alessandro Panconesi",
    year = "2013",
    month = "8",
    day = "11",
    doi = "10.1145/2487575.2487664",
    language = "English (US)",
    series = "Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining",
    publisher = "Association for Computing Machinery",
    pages = "491--499",
    editor = "Rajesh Parekh and Jingrui He and Inderjit, {Dhillon S.} and Paul Bradley and Yehuda Koren and Rayid Ghani and Senator, {Ted E.} and Grossman, {Robert L.} and Ramasamy Uthurusamy",
    booktitle = "KDD 2013 - 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining",

    }

    TY - GEN

    T1 - Trace complexity of network inference

    AU - Abrahao, Bruno

    AU - Chierichetti, Flavio

    AU - Kleinberg, Robert

    AU - Panconesi, Alessandro

    PY - 2013/8/11

    Y1 - 2013/8/11

    AB - The network inference problem consists of reconstructing the edge set of a network given traces representing the chronology of infection times as epidemics spread through the network. This problem is a paradigmatic representative of prediction tasks in machine learning that require deducing a latent structure from observed patterns of activity in a network, which often require an unrealistically large number of resources (e.g., amount of available data, or computational time). A fundamental question is to understand which properties we can predict with a reasonable degree of accuracy with the available resources, and which we cannot. We define the trace complexity as the number of distinct traces required to achieve high fidelity in reconstructing the topology of the unobserved network or, more generally, some of its properties. We give algorithms that are competitive with, while being simpler and more efficient than, existing network inference approaches. Moreover, we prove that our algorithms are nearly optimal, by proving an information theoretic lower bound on the number of traces that an optimal inference algorithm requires for performing this task in the general case. Given these strong lower bounds, we turn our attention to special cases, such as trees and bounded-degree graphs, and to property recovery tasks, such as reconstructing the degree distribution without inferring the network. We show that these problems require a much smaller (and more realistic) number of traces, making them potentially solvable in practice.

    KW - Independent cascade model

    KW - Network epidemics

    KW - Network inference

    KW - Sampling complexity

    UR - http://www.scopus.com/inward/record.url?scp=84962021366&partnerID=8YFLogxK

    UR - http://www.scopus.com/inward/citedby.url?scp=84962021366&partnerID=8YFLogxK

    U2 - 10.1145/2487575.2487664

    DO - 10.1145/2487575.2487664

    M3 - Conference contribution

    AN - SCOPUS:84962021366

    T3 - Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

    SP - 491

    EP - 499

    BT - KDD 2013 - 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining

    A2 - Parekh, Rajesh

    A2 - He, Jingrui

    A2 - Inderjit, Dhillon S.

    A2 - Bradley, Paul

    A2 - Koren, Yehuda

    A2 - Ghani, Rayid

    A2 - Senator, Ted E.

    A2 - Grossman, Robert L.

    A2 - Uthurusamy, Ramasamy

    PB - Association for Computing Machinery

    ER -