The exception that improves the rule

Juliana Freire, Boris Glavic, Oliver Kennedy, Heiko Mueller

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

The database community has developed numerous tools and techniques for data curation and exploration, from declarative languages, to specialized techniques for data repair, and more. Yet, there is currently no consensus on how to best expose these powerful tools to an analyst in a simple, intuitive, and above all, flexible way. Thus, analysts continue to rely on tools such as spreadsheets, imperative languages, and notebook style programming environments like Jupyter for data curation. In this work, we explore the integration of spreadsheets, notebooks, and relational databases. We focus on a key advantage that both spreadsheets and imperative notebook environments have over classical relational databases: ease of exception. By relying on set-at-a-time operations, relational databases sacrifice the ability to easily define singleton operations, exceptions to a normal data processing workflow that affect query processing for a fixed set of explicitly targeted records. In comparison, a spreadsheet user can easily change the formula for just one cell, while a notebook user can add an imperative operation to her notebook that alters an output "view". We believe that enabling such idiosyncratic manual transformations in a classical relational database is critical for curation, as curation operations that are easy to declare for individual values can often be extremely challenging to generalize. We explore the challenges of enabling singletons in relational databases, propose a hybrid spreadsheet/relational notebook environment for data curation, and present our vision of Vizier, a system that exposes data curation through such an interface.
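
As a rough illustration of the distinction drawn in the abstract (this is not code from the paper or from Vizier), the following Python sketch uses the standard-library `sqlite3` module to contrast a set-at-a-time rule with singleton exceptions. The `readings` table, its columns, the Fahrenheit-to-Celsius rule, and the `curate` function are all hypothetical names chosen for the example.

```python
import sqlite3


def curate(rows, overrides):
    """Apply one set-at-a-time rule, then per-record exceptions.

    rows: list of (id, temp) pairs, assumed to be in Fahrenheit.
    overrides: dict mapping a record id to a manually chosen value.
    The schema and this function are hypothetical illustrations.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE readings (id INTEGER PRIMARY KEY, temp REAL)")
    conn.executemany("INSERT INTO readings VALUES (?, ?)", rows)

    # Set-at-a-time rule: a single declarative statement rewrites every record.
    conn.execute("UPDATE readings SET temp = (temp - 32) * 5.0 / 9.0")

    # Singleton operations: exceptions for explicitly targeted records only.
    for record_id, value in overrides.items():
        conn.execute(
            "UPDATE readings SET temp = ? WHERE id = ?", (value, record_id)
        )

    result = conn.execute("SELECT id, temp FROM readings ORDER BY id").fetchall()
    conn.close()
    return result


# Record 2 was already in Celsius, so the analyst pins its value by hand.
curated = curate([(1, 212.0), (2, 21.5), (3, 32.0)], {2: 21.5})
print(curated)
```

In this sketch the exception has to be written as a separate statement keyed on the record id and applied after the rule, whereas a spreadsheet user would simply overwrite the one cell. That gap is the kind of friction the abstract refers to as the lack of "ease of exception".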

Original language: English (US)
Title of host publication: HILDA 2016 - Proceedings of the Workshop on Human-In-the-Loop Data Analytics
Publisher: Association for Computing Machinery, Inc
ISBN (Electronic): 9781450342070
DOIs: https://doi.org/10.1145/2939502.2939509
State: Published - Jun 26 2016
Event: 1st Workshop on Human-in-the-Loop Data Analytics, HILDA 2016 - San Francisco, United States
Duration: Jun 26 2016 → …

Other

Other: 1st Workshop on Human-in-the-Loop Data Analytics, HILDA 2016
Country: United States
City: San Francisco
Period: 6/26/16 → …

Fingerprint

Spreadsheets
Query processing
Repair

ASJC Scopus subject areas

  • Computer Science Applications
  • Information Systems
  • Computational Theory and Mathematics

Cite this

Freire, J., Glavic, B., Kennedy, O., & Mueller, H. (2016). The exception that improves the rule. In HILDA 2016 - Proceedings of the Workshop on Human-In-the-Loop Data Analytics. Association for Computing Machinery, Inc. https://doi.org/10.1145/2939502.2939509

@inproceedings{0550b5a45c444fb3a4e2b37960fd8cab,
title = "The exception that improves the rule",
abstract = "The database community has developed numerous tools and techniques for data curation and exploration, from declarative languages, to specialized techniques for data repair, and more. Yet, there is currently no consensus on how to best expose these powerful tools to an analyst in a simple, intuitive, and above all, flexible way. Thus, analysts continue to rely on tools such as spreadsheets, imperative languages, and notebook style programming environments like Jupyter for data curation. In this work, we explore the integration of spreadsheets, notebooks, and relational databases. We focus on a key advantage that both spreadsheets and imperative notebook environments have over classical relational databases: ease of exception. By relying on set-at-a-time operations, relational databases sacrifice the ability to easily define singleton operations, exceptions to a normal data processing workflow that affect query processing for a fixed set of explicitly targeted records. In comparison, a spreadsheet user can easily change the formula for just one cell, while a notebook user can add an imperative operation to her notebook that alters an output {"}view{"}. We believe that enabling such idiosyncratic manual transformations in a classical relational database is critical for curation, as curation operations that are easy to declare for individual values can often be extremely challenging to generalize. We explore the challenges of enabling singletons in relational databases, propose a hybrid spreadsheet/relational notebook environment for data curation, and present our vision of Vizier, a system that exposes data curation through such an interface.",
author = "Juliana Freire and Boris Glavic and Oliver Kennedy and Heiko Mueller",
year = "2016",
month = "6",
day = "26",
doi = "10.1145/2939502.2939509",
language = "English (US)",
booktitle = "HILDA 2016 - Proceedings of the Workshop on Human-In-the-Loop Data Analytics",
publisher = "Association for Computing Machinery, Inc",

}

TY - GEN

T1 - The exception that improves the rule

AU - Freire, Juliana

AU - Glavic, Boris

AU - Kennedy, Oliver

AU - Mueller, Heiko

PY - 2016/6/26

Y1 - 2016/6/26

AB - The database community has developed numerous tools and techniques for data curation and exploration, from declarative languages, to specialized techniques for data repair, and more. Yet, there is currently no consensus on how to best expose these powerful tools to an analyst in a simple, intuitive, and above all, flexible way. Thus, analysts continue to rely on tools such as spreadsheets, imperative languages, and notebook style programming environments like Jupyter for data curation. In this work, we explore the integration of spreadsheets, notebooks, and relational databases. We focus on a key advantage that both spreadsheets and imperative notebook environments have over classical relational databases: ease of exception. By relying on set-at-a-time operations, relational databases sacrifice the ability to easily define singleton operations, exceptions to a normal data processing workflow that affect query processing for a fixed set of explicitly targeted records. In comparison, a spreadsheet user can easily change the formula for just one cell, while a notebook user can add an imperative operation to her notebook that alters an output "view". We believe that enabling such idiosyncratic manual transformations in a classical relational database is critical for curation, as curation operations that are easy to declare for individual values can often be extremely challenging to generalize. We explore the challenges of enabling singletons in relational databases, propose a hybrid spreadsheet/relational notebook environment for data curation, and present our vision of Vizier, a system that exposes data curation through such an interface.

UR - http://www.scopus.com/inward/record.url?scp=84979792064&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84979792064&partnerID=8YFLogxK

U2 - 10.1145/2939502.2939509

DO - 10.1145/2939502.2939509

M3 - Conference contribution

BT - HILDA 2016 - Proceedings of the Workshop on Human-In-the-Loop Data Analytics

PB - Association for Computing Machinery, Inc

ER -