### Abstract

The network inference problem consists of reconstructing the edge set of a network given traces representing the chronology of infection times as epidemics spread through the network. This problem is a paradigmatic representative of prediction tasks in machine learning that require deducing a latent structure from observed patterns of activity in a network; such tasks often require an unrealistically large amount of resources (e.g., available data or computational time). A fundamental question is which properties we can predict with a reasonable degree of accuracy using the available resources, and which we cannot. We define the trace complexity as the number of distinct traces required to achieve high fidelity in reconstructing the topology of the unobserved network or, more generally, some of its properties. We give algorithms that are competitive with existing network inference approaches while being simpler and more efficient. Moreover, we prove that our algorithms are nearly optimal by establishing an information-theoretic lower bound on the number of traces that an optimal inference algorithm requires to perform this task in the general case. Given these strong lower bounds, we turn our attention to special cases, such as trees and bounded-degree graphs, and to property recovery tasks, such as reconstructing the degree distribution without inferring the network. We show that these problems require a much smaller (and more realistic) number of traces, making them potentially solvable in practice.
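To make the problem setup concrete, the following is a minimal sketch and not the authors' implementation. It assumes a hypothetical five-node undirected network (`ADJ`), transmission along every edge with an independent exponentially distributed delay, and a single random source per epidemic. The names `simulate_trace`, `first_pair_edges`, and `run_demo` are illustrative. The inference step is a simple heuristic chosen for this example: it records an edge between the first two nodes infected in each trace.

```python
import heapq
import random

# Hypothetical toy network (undirected), stored as adjacency lists.
ADJ = {
    0: [1, 2],
    1: [0, 2, 3],
    2: [0, 1, 4],
    3: [1, 4],
    4: [2, 3],
}
TRUE_EDGES = {frozenset((u, v)) for u, nbrs in ADJ.items() for v in nbrs}


def simulate_trace(adj, source, rng, rate=1.0):
    """Simulate one epidemic and return [(node, infection_time), ...] by time.

    Assumption for this sketch: each edge transmits with an independent
    exponential delay, so every node reachable from the source is infected.
    """
    infected = {}
    heap = [(0.0, source)]  # (candidate infection time, node)
    while heap:
        t, u = heapq.heappop(heap)
        if u in infected:
            continue  # an earlier infection already reached this node
        infected[u] = t
        for v in adj[u]:
            if v not in infected:
                heapq.heappush(heap, (t + rng.expovariate(rate), v))
    return sorted(infected.items(), key=lambda item: item[1])


def first_pair_edges(traces):
    """Infer edges by linking the first two infected nodes of each trace."""
    edges = set()
    for trace in traces:
        if len(trace) >= 2:
            (u, _), (v, _) = trace[0], trace[1]
            edges.add(frozenset((u, v)))
    return edges


def run_demo(num_traces=200, seed=0):
    """Generate traces from random sources and return the inferred edge set."""
    rng = random.Random(seed)
    nodes = sorted(ADJ)
    traces = [
        simulate_trace(ADJ, rng.choice(nodes), rng) for _ in range(num_traces)
    ]
    return first_pair_edges(traces)


if __name__ == "__main__":
    inferred = run_demo()
    print(f"recovered {len(inferred)} of {len(TRUE_EDGES)} edges")
    print("false positives:", len(inferred - TRUE_EDGES))
```

Under these assumptions the heuristic cannot produce a false edge: when the second node becomes infected, the source is the only node that could have transmitted to it, so the two must be adjacent. How many traces are needed before every edge has been observed this way is the kind of question the trace complexity measure addresses.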

Original language | English (US) |
---|---|
Title of host publication | KDD 2013 - 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining |
Editors | Rajesh Parekh, Jingrui He, Inderjit S. Dhillon, Paul Bradley, Yehuda Koren, Rayid Ghani, Ted E. Senator, Robert L. Grossman, Ramasamy Uthurusamy |
Publisher | Association for Computing Machinery |
Pages | 491-499 |
Number of pages | 9 |
ISBN (Electronic) | 9781450321747 |
DOIs | https://doi.org/10.1145/2487575.2487664 |
State | Published - Aug 11 2013 |
Event | 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2013 - Chicago, United States. Duration: Aug 11 2013 → Aug 14 2013 |

### Publication series

Name | Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining |
---|---|
Volume | Part F128815 |

### Conference

Conference | 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD 2013 |
---|---|
Country | United States |
City | Chicago |
Period | 8/11/13 → 8/14/13 |

### Keywords

- Independent cascade model
- Network epidemics
- Network inference
- Sampling complexity

### ASJC Scopus subject areas

- Software
- Information Systems

### Cite this

**Trace complexity of network inference.** / Abrahao, Bruno; Chierichetti, Flavio; Kleinberg, Robert; Panconesi, Alessandro.

*KDD 2013 - 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.* Association for Computing Machinery, 2013. pp. 491-499, 2487664 (Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; Vol. Part F128815). https://doi.org/10.1145/2487575.2487664

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution
