Deriving probabilistic databases with inference ensembles

Julia Stoyanovich, Susan Davidson, Tova Milo, Val Tannen

    Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

    Abstract

    Many real-world applications deal with uncertain or missing data, prompting a surge of activity in the area of probabilistic databases. A shortcoming of prior work is the assumption that an appropriate probabilistic model, along with the necessary probability distributions, is given. We address this shortcoming by presenting a framework for learning a set of inference ensembles, termed meta-rule semi-lattices, or MRSL, from the complete portion of the data. We use the MRSL to infer probability distributions for missing data, and demonstrate experimentally that high accuracy is achieved when a single attribute value is missing per tuple. We next propose an inference algorithm based on Gibbs sampling that accurately predicts the probability distribution for multiple missing values. We also develop an optimization that greatly improves performance of multi-attribute inference for collections of tuples, while maintaining high accuracy. Finally, we develop an experimental framework to evaluate the efficiency and accuracy of our approach.

    Original language: English (US)
    Title of host publication: 2011 IEEE 27th International Conference on Data Engineering, ICDE 2011
    Pages: 303-314
    Number of pages: 12
    DOIs: https://doi.org/10.1109/ICDE.2011.5767854
    State: Published - Jun 6 2011
    Event: 2011 IEEE 27th International Conference on Data Engineering, ICDE 2011 - Hannover, Germany
    Duration: Apr 11 2011 to Apr 16 2011

    Other

    Other: 2011 IEEE 27th International Conference on Data Engineering, ICDE 2011
    Country: Germany
    City: Hannover
    Period: 4/11/11 to 4/16/11

    Fingerprint

    Probability distributions
    Sampling
    Statistical Models

    ASJC Scopus subject areas

    • Software
    • Signal Processing
    • Information Systems

    Cite this

    Stoyanovich, J., Davidson, S., Milo, T., & Tannen, V. (2011). Deriving probabilistic databases with inference ensembles. In 2011 IEEE 27th International Conference on Data Engineering, ICDE 2011 (pp. 303-314). [5767854] https://doi.org/10.1109/ICDE.2011.5767854

    Deriving probabilistic databases with inference ensembles. / Stoyanovich, Julia; Davidson, Susan; Milo, Tova; Tannen, Val.

    2011 IEEE 27th International Conference on Data Engineering, ICDE 2011. 2011. p. 303-314 5767854.

    Stoyanovich, J, Davidson, S, Milo, T & Tannen, V 2011, Deriving probabilistic databases with inference ensembles. in 2011 IEEE 27th International Conference on Data Engineering, ICDE 2011., 5767854, pp. 303-314, 2011 IEEE 27th International Conference on Data Engineering, ICDE 2011, Hannover, Germany, 4/11/11. https://doi.org/10.1109/ICDE.2011.5767854
    Stoyanovich J, Davidson S, Milo T, Tannen V. Deriving probabilistic databases with inference ensembles. In 2011 IEEE 27th International Conference on Data Engineering, ICDE 2011. 2011. p. 303-314. 5767854 https://doi.org/10.1109/ICDE.2011.5767854
    Stoyanovich, Julia ; Davidson, Susan ; Milo, Tova ; Tannen, Val. / Deriving probabilistic databases with inference ensembles. 2011 IEEE 27th International Conference on Data Engineering, ICDE 2011. 2011. pp. 303-314
    @inproceedings{9876ad5251e54e62bb2d31dea4c0f9f8,
    title = "Deriving probabilistic databases with inference ensembles",
    author = "Julia Stoyanovich and Susan Davidson and Tova Milo and Val Tannen",
    year = "2011",
    month = "6",
    day = "6",
    doi = "10.1109/ICDE.2011.5767854",
    language = "English (US)",
    isbn = "9781424489589",
    pages = "303--314",
    booktitle = "2011 IEEE 27th International Conference on Data Engineering, ICDE 2011",

    }

    TY - GEN

    T1 - Deriving probabilistic databases with inference ensembles

    AU - Stoyanovich, Julia

    AU - Davidson, Susan

    AU - Milo, Tova

    AU - Tannen, Val

    PY - 2011/6/6

    Y1 - 2011/6/6

    AB - Many real-world applications deal with uncertain or missing data, prompting a surge of activity in the area of probabilistic databases. A shortcoming of prior work is the assumption that an appropriate probabilistic model, along with the necessary probability distributions, is given. We address this shortcoming by presenting a framework for learning a set of inference ensembles, termed meta-rule semi-lattices, or MRSL, from the complete portion of the data. We use the MRSL to infer probability distributions for missing data, and demonstrate experimentally that high accuracy is achieved when a single attribute value is missing per tuple. We next propose an inference algorithm based on Gibbs sampling that accurately predicts the probability distribution for multiple missing values. We also develop an optimization that greatly improves performance of multi-attribute inference for collections of tuples, while maintaining high accuracy. Finally, we develop an experimental framework to evaluate the efficiency and accuracy of our approach.

    UR - http://www.scopus.com/inward/record.url?scp=79957874172&partnerID=8YFLogxK

    UR - http://www.scopus.com/inward/citedby.url?scp=79957874172&partnerID=8YFLogxK

    U2 - 10.1109/ICDE.2011.5767854

    DO - 10.1109/ICDE.2011.5767854

    M3 - Conference contribution

    AN - SCOPUS:79957874172

    SN - 9781424489589

    SP - 303

    EP - 314

    BT - 2011 IEEE 27th International Conference on Data Engineering, ICDE 2011

    ER -