Putting Lipstick on Pig

Enabling database-style workflow provenance

Yael Amsterdamer, Susan B. Davidson, Daniel Deutch, Tova Milo, Julia Stoyanovich, Val Tannen

    Research output: Contribution to journal › Article

    Abstract

    Workflow provenance typically assumes that each module is a "black-box", so that each output depends on all inputs (coarse-grained dependencies). Furthermore, it does not model the internal state of a module, which can change between repeated executions. In practice, however, an output may depend on only a small subset of the inputs (fine-grained dependencies) as well as on the internal state of the module. We present a novel provenance framework that marries database-style and workflow-style provenance, by using Pig Latin to expose the functionality of modules, thus capturing internal state and fine-grained dependencies. A critical ingredient in our solution is the use of a novel form of provenance graph that models module invocations and yields a compact representation of fine-grained workflow provenance. It also enables a number of novel graph transformation operations, allowing to choose the desired level of granularity in provenance querying (ZoomIn and ZoomOut), and supporting "what-if" workflow analytic queries. We implemented our approach in the Lipstick system and developed a benchmark in support of a systematic performance evaluation. Our results demonstrate the feasibility of tracking and querying fine-grained workflow provenance.
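The difference between coarse-grained and fine-grained dependencies described in the abstract can be illustrated with a small sketch. It is not the Lipstick implementation: the module (a per-user sum, roughly what a Pig Latin GROUP followed by FOREACH ... GENERATE SUM would compute) and the helper names run_module and coarse_deps are hypothetical, chosen only to show how recording which input tuples feed each output narrows the dependency set compared with the black-box assumption.

```python
"""Coarse- vs fine-grained dependencies for one module invocation.

Illustrative sketch only: the module logic and all names are hypothetical
and are not taken from the Lipstick system.
"""
from collections import defaultdict


def run_module(inputs):
    """Group input tuples by user and sum their amounts.

    Returns (outputs, fine_deps), where fine_deps maps each output key to the
    ids of the input tuples that actually contributed to it.
    """
    totals = defaultdict(int)
    fine_deps = defaultdict(set)
    for tuple_id, user, amount in inputs:
        totals[user] += amount
        fine_deps[user].add(tuple_id)
    return dict(totals), {user: frozenset(ids) for user, ids in fine_deps.items()}


def coarse_deps(inputs, outputs):
    """Black-box assumption: every output depends on every input."""
    all_ids = frozenset(tuple_id for tuple_id, _, _ in inputs)
    return {key: all_ids for key in outputs}


inputs = [("t1", "alice", 10), ("t2", "bob", 5), ("t3", "alice", 7)]
outputs, fine = run_module(inputs)
coarse = coarse_deps(inputs, outputs)

print(outputs)                # {'alice': 17, 'bob': 5}
print(sorted(fine["bob"]))    # ['t2']
print(sorted(coarse["bob"]))  # ['t1', 't2', 't3']
```

Under the black-box view, the output for "bob" is linked to all three input tuples; tracking dependencies inside the module links it only to t2, which is the kind of precision the abstract attributes to exposing module functionality.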

    Original language: English (US)
    Pages (from-to): 346-357
    Number of pages: 12
    Journal: Proceedings of the VLDB Endowment
    Volume: 5
    Issue number: 4
    DOIs: https://doi.org/10.14778/2095686.2095693
    State: Published - Jan 1 2011

    ASJC Scopus subject areas

    • Computer Science (miscellaneous)
    • Computer Science (all)

    Cite this

    Amsterdamer, Y., Davidson, S. B., Deutch, D., Milo, T., Stoyanovich, J., & Tannen, V. (2011). Putting Lipstick on Pig: Enabling database-style workflow provenance. Proceedings of the VLDB Endowment, 5(4), 346-357. https://doi.org/10.14778/2095686.2095693

    Putting Lipstick on Pig : Enabling database-style workflow provenance. / Amsterdamer, Yael; Davidson, Susan B.; Deutch, Daniel; Milo, Tova; Stoyanovich, Julia; Tannen, Val.

    In: Proceedings of the VLDB Endowment, Vol. 5, No. 4, 01.01.2011, p. 346-357.

    Research output: Contribution to journal › Article

    Amsterdamer, Y, Davidson, SB, Deutch, D, Milo, T, Stoyanovich, J & Tannen, V 2011, 'Putting Lipstick on Pig: Enabling database-style workflow provenance', Proceedings of the VLDB Endowment, vol. 5, no. 4, pp. 346-357. https://doi.org/10.14778/2095686.2095693
    Amsterdamer Y, Davidson SB, Deutch D, Milo T, Stoyanovich J, Tannen V. Putting Lipstick on Pig: Enabling database-style workflow provenance. Proceedings of the VLDB Endowment. 2011 Jan 1;5(4):346-357. https://doi.org/10.14778/2095686.2095693
    Amsterdamer, Yael ; Davidson, Susan B. ; Deutch, Daniel ; Milo, Tova ; Stoyanovich, Julia ; Tannen, Val. / Putting Lipstick on Pig : Enabling database-style workflow provenance. In: Proceedings of the VLDB Endowment. 2011 ; Vol. 5, No. 4. pp. 346-357.
    @article{dc69f1efc9d5443b9a97905ae86ce8f5,
    title = "Putting Lipstick on Pig: Enabling database-style workflow provenance",
    abstract = "Workflow provenance typically assumes that each module is a {"}black-box{"}, so that each output depends on all inputs (coarse-grained dependencies). Furthermore, it does not model the internal state of a module, which can change between repeated executions. In practice, however, an output may depend on only a small subset of the inputs (fine-grained dependencies) as well as on the internal state of the module. We present a novel provenance framework that marries database-style and workflow-style provenance, by using Pig Latin to expose the functionality of modules, thus capturing internal state and fine-grained dependencies. A critical ingredient in our solution is the use of a novel form of provenance graph that models module invocations and yields a compact representation of fine-grained workflow provenance. It also enables a number of novel graph transformation operations, allowing to choose the desired level of granularity in provenance querying (ZoomIn and ZoomOut), and supporting {"}what-if{"} workflow analytic queries. We implemented our approach in the Lipstick system and developed a benchmark in support of a systematic performance evaluation. Our results demonstrate the feasibility of tracking and querying fine-grained workflow provenance.",
    author = "Yael Amsterdamer and Davidson, {Susan B.} and Daniel Deutch and Tova Milo and Julia Stoyanovich and Val Tannen",
    year = "2011",
    month = "1",
    day = "1",
    doi = "10.14778/2095686.2095693",
    language = "English (US)",
    volume = "5",
    pages = "346--357",
    journal = "Proceedings of the VLDB Endowment",
    issn = "2150-8097",
    publisher = "Very Large Data Base Endowment Inc.",
    number = "4",

    }

    TY  - JOUR
    T1  - Putting Lipstick on Pig
    T2  - Enabling database-style workflow provenance
    AU  - Amsterdamer, Yael
    AU  - Davidson, Susan B.
    AU  - Deutch, Daniel
    AU  - Milo, Tova
    AU  - Stoyanovich, Julia
    AU  - Tannen, Val
    PY  - 2011/1/1
    Y1  - 2011/1/1
    N2  - Workflow provenance typically assumes that each module is a "black-box", so that each output depends on all inputs (coarse-grained dependencies). Furthermore, it does not model the internal state of a module, which can change between repeated executions. In practice, however, an output may depend on only a small subset of the inputs (fine-grained dependencies) as well as on the internal state of the module. We present a novel provenance framework that marries database-style and workflow-style provenance, by using Pig Latin to expose the functionality of modules, thus capturing internal state and fine-grained dependencies. A critical ingredient in our solution is the use of a novel form of provenance graph that models module invocations and yields a compact representation of fine-grained workflow provenance. It also enables a number of novel graph transformation operations, allowing to choose the desired level of granularity in provenance querying (ZoomIn and ZoomOut), and supporting "what-if" workflow analytic queries. We implemented our approach in the Lipstick system and developed a benchmark in support of a systematic performance evaluation. Our results demonstrate the feasibility of tracking and querying fine-grained workflow provenance.
    AB  - Workflow provenance typically assumes that each module is a "black-box", so that each output depends on all inputs (coarse-grained dependencies). Furthermore, it does not model the internal state of a module, which can change between repeated executions. In practice, however, an output may depend on only a small subset of the inputs (fine-grained dependencies) as well as on the internal state of the module. We present a novel provenance framework that marries database-style and workflow-style provenance, by using Pig Latin to expose the functionality of modules, thus capturing internal state and fine-grained dependencies. A critical ingredient in our solution is the use of a novel form of provenance graph that models module invocations and yields a compact representation of fine-grained workflow provenance. It also enables a number of novel graph transformation operations, allowing to choose the desired level of granularity in provenance querying (ZoomIn and ZoomOut), and supporting "what-if" workflow analytic queries. We implemented our approach in the Lipstick system and developed a benchmark in support of a systematic performance evaluation. Our results demonstrate the feasibility of tracking and querying fine-grained workflow provenance.
    UR  - http://www.scopus.com/inward/record.url?scp=84863479950&partnerID=8YFLogxK
    UR  - http://www.scopus.com/inward/citedby.url?scp=84863479950&partnerID=8YFLogxK
    U2  - 10.14778/2095686.2095693
    DO  - 10.14778/2095686.2095693
    M3  - Article
    VL  - 5
    SP  - 346
    EP  - 357
    JO  - Proceedings of the VLDB Endowment
    JF  - Proceedings of the VLDB Endowment
    SN  - 2150-8097
    IS  - 4
    ER  -