A first study on clustering collections of workflow graphs

Emanuele Santos, Lauro Lins, James P. Ahrens, Juliana Freire, Cláudio T. Silva

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

As workflow systems get more widely used, the number of workflows and the volume of provenance they generate have grown considerably. New tools and infrastructure are needed to allow users to interact with, reason about, and re-use this information. In this paper, we explore the use of clustering techniques to organize large collections of workflow and provenance graphs. We propose two different representations for these graphs and present an experimental evaluation, using a collection of 1,700 workflow graphs, where we study the trade-offs of these representations and the effectiveness of alternative clustering techniques.

Original language: English (US)
Title of host publication: Provenance and Annotation of Data and Processes - 2nd International Provenance and Annotation Workshop, IPAW 2008, Revised Selected Papers
Publisher: Springer Verlag
Pages: 160-173
Number of pages: 14
Volume: 5272
ISBN (Print): 9783540899648
State: Published - 2008
Event: 2nd International Provenance and Annotation Workshop, IPAW 2008 - Salt Lake City, United States
Duration: Jun 17 2008 → Jun 18 2008

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 5272
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 2nd International Provenance and Annotation Workshop, IPAW 2008
Country: United States
City: Salt Lake City
Period: 6/17/08 → 6/18/08

Fingerprint

Information use
Work Flow
Clustering
Provenance
Graph in graph theory
Experimental Evaluation
Reuse
Infrastructure
Trade-offs
Alternatives

ASJC Scopus subject areas

  • Computer Science (all)
  • Theoretical Computer Science

Cite this

Santos, E., Lins, L., Ahrens, J. P., Freire, J., & Silva, C. T. (2008). A first study on clustering collections of workflow graphs. In Provenance and Annotation of Data and Processes - 2nd International Provenance and Annotation Workshop, IPAW 2008, Revised Selected Papers (Vol. 5272, pp. 160-173). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 5272). Springer Verlag.

Provenance and Annotation of Data and Processes - 2nd International Provenance and Annotation Workshop, IPAW 2008, Revised Selected Papers. Vol. 5272 Springer Verlag, 2008. p. 160-173 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 5272).

Santos, E, Lins, L, Ahrens, JP, Freire, J & Silva, CT 2008, A first study on clustering collections of workflow graphs. in Provenance and Annotation of Data and Processes - 2nd International Provenance and Annotation Workshop, IPAW 2008, Revised Selected Papers. vol. 5272, Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 5272, Springer Verlag, pp. 160-173, 2nd International Provenance and Annotation Workshop, IPAW 2008, Salt Lake City, United States, 6/17/08.
Santos E, Lins L, Ahrens JP, Freire J, Silva CT. A first study on clustering collections of workflow graphs. In Provenance and Annotation of Data and Processes - 2nd International Provenance and Annotation Workshop, IPAW 2008, Revised Selected Papers. Vol. 5272. Springer Verlag. 2008. p. 160-173. (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).
Santos, Emanuele ; Lins, Lauro ; Ahrens, James P. ; Freire, Juliana ; Silva, Cláudio T. / A first study on clustering collections of workflow graphs. Provenance and Annotation of Data and Processes - 2nd International Provenance and Annotation Workshop, IPAW 2008, Revised Selected Papers. Vol. 5272 Springer Verlag, 2008. pp. 160-173 (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)).
@inproceedings{3800c19ed6ae4e04b46b275110a7e969,
title = "A first study on clustering collections of workflow graphs",
abstract = "As workflow systems get more widely used, the number of workflows and the volume of provenance they generate has grown considerably. New tools and infrastructure are needed to allow users to interact with, reason about, and re-use this information. In this paper, we explore the use of clustering techniques to organize large collections of workflow and provenance graphs. We propose two different representations for these graphs and present an experimental evaluation, using a collection of 1,700 workflow graphs, where we study the trade-offs of these representations and the effectiveness of alternative clustering techniques.",
author = "Emanuele Santos and Lauro Lins and Ahrens, {James P.} and Juliana Freire and Silva, {Cl{\'a}udio T.}",
year = "2008",
language = "English (US)",
isbn = "9783540899648",
volume = "5272",
series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
publisher = "Springer Verlag",
pages = "160--173",
booktitle = "Provenance and Annotation of Data and Processes - 2nd International Provenance and Annotation Workshop, IPAW 2008, Revised Selected Papers",

}

TY - GEN

T1 - A first study on clustering collections of workflow graphs

AU - Santos, Emanuele

AU - Lins, Lauro

AU - Ahrens, James P.

AU - Freire, Juliana

AU - Silva, Cláudio T.

PY - 2008

Y1 - 2008

N2 - As workflow systems get more widely used, the number of workflows and the volume of provenance they generate has grown considerably. New tools and infrastructure are needed to allow users to interact with, reason about, and re-use this information. In this paper, we explore the use of clustering techniques to organize large collections of workflow and provenance graphs. We propose two different representations for these graphs and present an experimental evaluation, using a collection of 1,700 workflow graphs, where we study the trade-offs of these representations and the effectiveness of alternative clustering techniques.

UR - http://www.scopus.com/inward/record.url?scp=84961839654&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84961839654&partnerID=8YFLogxK

M3 - Conference contribution

SN - 9783540899648

VL - 5272

T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

SP - 160

EP - 173

BT - Provenance and Annotation of Data and Processes - 2nd International Provenance and Annotation Workshop, IPAW 2008, Revised Selected Papers

PB - Springer Verlag

ER -