The exploratory labeling assistant: Mixed-initiative label curation with large document collections

Cristian Felix, Aritra Dasgupta, Enrico Bertini

    Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

    Abstract

    In this paper, we define the concept of exploratory labeling: the use of computational and interactive methods to help analysts categorize groups of documents into a set of unknown and evolving labels. While many computational methods exist to analyze data and build models once the data is organized around a set of predefined categories or labels, few methods address the problem of reliably discovering and curating such labels in the first place. As a first step toward bridging this gap, we propose an interactive visual data analysis method that integrates human-driven label ideation, specification, and refinement with machine-driven recommendations. The proposed method enables the user to progressively discover and ideate labels in an exploratory fashion and to specify rules that automatically match sets of documents to labels. To support the ideation, specification, and evaluation of labels, we use unsupervised machine learning methods that provide suggestions and data summaries. We evaluate our method by applying it to a real-world labeling problem and through controlled user studies, in order to identify and reflect on patterns of interaction that emerge from exploratory labeling activities.
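
The abstract mentions rules that automatically match sets of documents to labels but does not say what form those rules take. The sketch below is only an illustration of the general idea, assuming the simplest possible rule: a label applies to a document if the document contains any of the label's keywords. The function name `match_labels`, the keyword-set rule format, and the sample documents are all hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Set


def match_labels(documents: List[str], rules: Dict[str, Set[str]]) -> Dict[str, List[int]]:
    """For each label, return the indices of documents containing any of its keywords.

    Assumption: a "rule" is a plain set of lowercase keywords per label.
    The paper's actual rule language is not described in the abstract.
    """
    matches: Dict[str, List[int]] = {label: [] for label in rules}
    for index, text in enumerate(documents):
        # Naive whitespace tokenization, lowercased for case-insensitive matching.
        tokens = set(text.lower().split())
        for label, keywords in rules.items():
            if tokens & keywords:  # at least one keyword appears in the document
                matches[label].append(index)
    return matches


# Hypothetical sample data, for illustration only.
docs = [
    "Battery drains quickly after the update",
    "Screen flickers when brightness is low",
    "Great battery life and a bright screen",
]
rules = {
    "battery": {"battery", "drains"},
    "display": {"screen", "brightness"},
}
result = match_labels(docs, rules)
print(result)
```

A document can match several labels at once (the third sample document matches both), which is consistent with labels that are still evolving and may overlap before the analyst refines the rules.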

    Original language: English (US)
    Title of host publication: UIST 2018 - Proceedings of the 31st Annual ACM Symposium on User Interface Software and Technology
    Publisher: Association for Computing Machinery, Inc
    Pages: 153-164
    Number of pages: 12
    ISBN (Electronic): 9781450359481
    DOI: https://doi.org/10.1145/3242587.3242596
    State: Published - Oct 11 2018
    Event: 31st Annual ACM Symposium on User Interface Software and Technology, UIST 2018 - Berlin, Germany
    Duration: Oct 14 2018 to Oct 17 2018

    Fingerprint

    Labeling
    Labels
    Specifications
    Computational methods
    Learning systems

    Keywords

    • Document labeling
    • Exploratory labeling
    • Text analysis
    • Visualization

    ASJC Scopus subject areas

    • Human-Computer Interaction
    • Computer Graphics and Computer-Aided Design
    • Software

    Cite this

    Felix, C., Dasgupta, A., & Bertini, E. (2018). The exploratory labeling assistant: Mixed-initiative label curation with large document collections. In UIST 2018 - Proceedings of the 31st Annual ACM Symposium on User Interface Software and Technology (pp. 153-164). Association for Computing Machinery, Inc. https://doi.org/10.1145/3242587.3242596
