Fides: Towards a platform for responsible data science

Julia Stoyanovich, Bill Howe, Serge Abiteboul, Gerome Miklau, Arnaud Sahuguet, Gerhard Weikum

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

    Abstract

    Issues of responsible data analysis and use are coming to the forefront of the discourse in data science research and practice, with most significant efforts to date on the part of the data mining, machine learning, and security and privacy communities. In these fields, the research has been focused on analyzing the fairness, accountability and transparency (FAT) properties of specific algorithms and their outputs. Although these issues are most apparent in the social sciences where fairness is interpreted in terms of the distribution of resources across protected groups, management of bias in source data affects a variety of fields. Consider climate change studies that require representative data from geographically diverse regions, or supply chain analyses that require data that represents the diversity of products and customers. Any domain that involves sparse or sampled data has exposure to potential bias. In this vision paper, we argue that FAT properties must be considered as database system issues, further upstream in the data science lifecycle: bias in source data goes unnoticed, and bias may be introduced during pre-processing (fairness), spurious correlations lead to reproducibility problems (accountability), and assumptions made during pre-processing have invisible but significant effects on decisions (transparency). As machine learning methods continue to be applied broadly by non-experts, the potential for misuse increases. We see a need for a data sharing and collaborative analytics platform with features to encourage (and in some cases, enforce) best practices at all stages of the data science lifecycle. We describe features of such a platform, which we term Fides, in the context of urban analytics, outlining a systems research agenda in responsible data science.

    Original language: English (US)
    Title of host publication: SSDBM 2017
    Subtitle of host publication: 29th International Conference on Scientific and Statistical Database Management
    Publisher: Association for Computing Machinery
    Volume: Part F128636
    ISBN (Electronic): 9781450352826
    DOI: https://doi.org/10.1145/3085504.3085530
    State: Published - Jun 27 2017
    Event: 29th International Conference on Scientific and Statistical Database Management, SSDBM 2017 - Chicago, United States
    Duration: Jun 27 2017 → Jun 29 2017

    Other

    Other: 29th International Conference on Scientific and Statistical Database Management, SSDBM 2017
    Country: United States
    City: Chicago
    Period: 6/27/17 → 6/29/17

    Fingerprint

    Transparency
    Learning systems
    Social sciences
    Processing
    Climate change
    Supply chains
    Data mining

    Keywords

    • Accountability
    • Data
    • Data ethics
    • Data science for social good
    • Fairness
    • Responsibly
    • Transparency

    ASJC Scopus subject areas

    • Human-Computer Interaction
    • Computer Networks and Communications
    • Computer Vision and Pattern Recognition
    • Software

    Cite this

    Stoyanovich, J., Howe, B., Abiteboul, S., Miklau, G., Sahuguet, A., & Weikum, G. (2017). Fides: Towards a platform for responsible data science. In SSDBM 2017: 29th International Conference on Scientific and Statistical Database Management (Vol. Part F128636). [a26] Association for Computing Machinery. https://doi.org/10.1145/3085504.3085530

    @inproceedings{948b8fa9df2d4f6fa78ee028d65ac5f4,
    title = "Fides: Towards a platform for responsible data science",
    keywords = "Accountability, Data, Data ethics, Data science for social good, Fairness, Responsibly, Transparency",
    author = "Julia Stoyanovich and Bill Howe and Serge Abiteboul and Gerome Miklau and Arnaud Sahuguet and Gerhard Weikum",
    year = "2017",
    month = "6",
    day = "27",
    doi = "10.1145/3085504.3085530",
    language = "English (US)",
    volume = "Part F128636",
    booktitle = "SSDBM 2017",
    publisher = "Association for Computing Machinery",

    }
