A survey on collecting, managing, and analyzing provenance from scripts

João Felipe Pimentel, Juliana Freire, Leonardo Murta, Vanessa Braganholo

Research output: Contribution to journal, Review article

Abstract

Scripts are widely used to design and run scientific experiments. Scripting languages are easy to learn and use, and they allow complex tasks to be specified and executed in fewer steps than with traditional programming languages. However, they also have important limitations for reproducibility and data management. As experiments are iteratively refined, it is challenging to reason about each experiment run (or trial), to keep track of the association between trials and experiment instances as well as the differences across trials, and to connect results to specific input data and parameters. Approaches have been proposed that address these limitations by collecting, managing, and analyzing the provenance of scripts. In this article, we survey the state of the art in provenance for scripts. We have identified the approaches by following an exhaustive protocol of forward and backward literature snowballing. Based on a detailed study, we propose a taxonomy and classify the approaches using this taxonomy.

Original language: English (US)
Article number: 47
Journal: ACM Computing Surveys
Volume: 52
Issue number: 3
DOI: 10.1145/3311955
State: Published - Jun 1 2019

Keywords

  • Analyzing
  • Collecting
  • Managing
  • Provenance
  • Scripts
  • Survey

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)

Cite this

A survey on collecting, managing, and analyzing provenance from scripts. / Pimentel, João Felipe; Freire, Juliana; Murta, Leonardo; Braganholo, Vanessa.

In: ACM Computing Surveys, Vol. 52, No. 3, 47, 01.06.2019.

@article{a43e64090fa8496ea0fb5e51abb3a77e,
title = "A survey on collecting, managing, and analyzing provenance from scripts",
keywords = "Analyzing, Collecting, Managing, Provenance, Scripts, Survey",
author = "Pimentel, {Jo{\~a}o Felipe} and Juliana Freire and Leonardo Murta and Vanessa Braganholo",
year = "2019",
month = "6",
day = "1",
doi = "10.1145/3311955",
language = "English (US)",
volume = "52",
journal = "ACM Computing Surveys",
issn = "0360-0300",
publisher = "Association for Computing Machinery (ACM)",
number = "3",

}

TY - JOUR

T1 - A survey on collecting, managing, and analyzing provenance from scripts

AU - Pimentel, João Felipe

AU - Freire, Juliana

AU - Murta, Leonardo

AU - Braganholo, Vanessa

PY - 2019/6/1

Y1 - 2019/6/1

KW - Analyzing

KW - Collecting

KW - Managing

KW - Provenance

KW - Scripts

KW - Survey

UR - http://www.scopus.com/inward/record.url?scp=85068090647&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85068090647&partnerID=8YFLogxK

U2 - 10.1145/3311955

DO - 10.1145/3311955

M3 - Review article

AN - SCOPUS:85068090647

VL - 52

JO - ACM Computing Surveys

JF - ACM Computing Surveys

SN - 0360-0300

IS - 3

M1 - 47

ER -