SEER

Auto-generating information extraction rules from user-specified examples

Maeda F. Hanafi, Azza Abouzied, Laura Chiticariu, Yunyao Li

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

    Abstract

    Time-consuming and complicated best describe the current state of the Information Extraction (IE) field. Machine learning approaches to IE require large collections of labeled datasets that are difficult to create and use obscure mathematical models, occasionally returning unwanted results that are unexplainable. Rule-based approaches, while resulting in easy-to-understand IE rules, are still time-consuming and labor-intensive. SEER combines the best of these two approaches: a learning model for IE rules based on a small number of user-specified examples. In this paper, we explain the design behind SEER and present a user study comparing our system against a commercially available tool in which users create IE rules manually. Our results show that SEER helps users complete text extraction tasks more quickly, as well as more accurately.

    Original language: English (US)
    Title of host publication: CHI 2017 - Proceedings of the 2017 ACM SIGCHI Conference on Human Factors in Computing Systems
    Subtitle of host publication: Explore, Innovate, Inspire
    Publisher: Association for Computing Machinery
    Pages: 6672-6682
    Number of pages: 11
    Volume: 2017-May
    ISBN (Electronic): 9781450346559
    DOIs: https://doi.org/10.1145/3025453.3025540
    State: Published - May 2 2017
    Event: 2017 ACM SIGCHI Conference on Human Factors in Computing Systems, CHI 2017 - Denver, United States
    Duration: May 6 2017 to May 11 2017

    Other

    Other: 2017 ACM SIGCHI Conference on Human Factors in Computing Systems, CHI 2017
    Country: United States
    City: Denver
    Period: 5/6/17 to 5/11/17

    Fingerprint

    Learning systems
    Personnel
    Mathematical models

    Keywords

    • Data extraction
    • Example-driven learning

    ASJC Scopus subject areas

    • Human-Computer Interaction
    • Computer Graphics and Computer-Aided Design
    • Software

    Cite this

    Hanafi, M. F., Abouzied, A., Chiticariu, L., & Li, Y. (2017). SEER: Auto-generating information extraction rules from user-specified examples. In CHI 2017 - Proceedings of the 2017 ACM SIGCHI Conference on Human Factors in Computing Systems: Explore, Innovate, Inspire (Vol. 2017-May, pp. 6672-6682). Association for Computing Machinery. https://doi.org/10.1145/3025453.3025540

    @inproceedings{2fd38057badc4b80a5a155ddc6faae63,
    title = "SEER: Auto-generating information extraction rules from user-specified examples",
    abstract = "Time-consuming and complicated best describe the current state of the Information Extraction (IE) field. Machine learning approaches to IE require large collections of labeled datasets that are difficult to create and use obscure mathematical models, occasionally returning unwanted results that are unexplainable. Rule-based approaches, while resulting in easy-to-understand IE rules, are still time-consuming and labor-intensive. SEER combines the best of these two approaches: a learning model for IE rules based on a small number of user-specified examples. In this paper, we explain the design behind SEER and present a user study comparing our system against a commercially available tool in which users create IE rules manually. Our results show that SEER helps users complete text extraction tasks more quickly, as well as more accurately.",
    keywords = "Data extraction, Example-driven learning",
    author = "Hanafi, {Maeda F.} and Azza Abouzied and Laura Chiticariu and Yunyao Li",
    year = "2017",
    month = "5",
    day = "2",
    doi = "10.1145/3025453.3025540",
    language = "English (US)",
    volume = "2017-May",
    pages = "6672--6682",
    booktitle = "CHI 2017 - Proceedings of the 2017 ACM SIGCHI Conference on Human Factors in Computing Systems",
    publisher = "Association for Computing Machinery",

    }

    TY - GEN

    T1 - SEER

    T2 - Auto-generating information extraction rules from user-specified examples

    AU - Hanafi, Maeda F.

    AU - Abouzied, Azza

    AU - Chiticariu, Laura

    AU - Li, Yunyao

    PY - 2017/5/2

    Y1 - 2017/5/2

    AB - Time-consuming and complicated best describe the current state of the Information Extraction (IE) field. Machine learning approaches to IE require large collections of labeled datasets that are difficult to create and use obscure mathematical models, occasionally returning unwanted results that are unexplainable. Rule-based approaches, while resulting in easy-to-understand IE rules, are still time-consuming and labor-intensive. SEER combines the best of these two approaches: a learning model for IE rules based on a small number of user-specified examples. In this paper, we explain the design behind SEER and present a user study comparing our system against a commercially available tool in which users create IE rules manually. Our results show that SEER helps users complete text extraction tasks more quickly, as well as more accurately.

    KW - Data extraction

    KW - Example-driven learning

    UR - http://www.scopus.com/inward/record.url?scp=85021225025&partnerID=8YFLogxK

    UR - http://www.scopus.com/inward/citedby.url?scp=85021225025&partnerID=8YFLogxK

    U2 - 10.1145/3025453.3025540

    DO - 10.1145/3025453.3025540

    M3 - Conference contribution

    VL - 2017-May

    SP - 6672

    EP - 6682

    BT - CHI 2017 - Proceedings of the 2017 ACM SIGCHI Conference on Human Factors in Computing Systems

    PB - Association for Computing Machinery

    ER -