Learning to extract quality discourse in online communities

Michael Brennan, Stacy Wrazien, Rachel Greenstadt

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

    Abstract

    Collaborative filtering systems have been developed to manage information overload and improve discourse in online communities. In such systems, users rank content provided by other users on its validity or usefulness within their particular context. The goal is that "good" content will rise to prominence and "bad" content will fade into obscurity. These filtering mechanisms are not well understood and have known weaknesses. For example, they depend on the presence of a large crowd to rate content, but such a crowd may not be present. Additionally, the community's decisions determine which voices will reach a large audience and which will be silenced, but it is not known if these decisions represent "the wisdom of crowds" or a "censoring mob." Our approach uses statistical machine learning to predict community ratings. By extracting features that replicate the community's verdict, we can better understand collaborative filtering, improve the way the community uses the ratings of its members, and design agents that augment community decision-making. Slashdot is an example of such a community, where peers rate each other's comments based on their relevance to the post. This work extracts a wide variety of features from the Slashdot metadata and posts' linguistic contents to identify features that can predict the community rating. We find that author reputation, use of pronouns, and author sentiment are salient. We achieve 76% accuracy predicting community ratings as good, neutral, or bad.
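    The abstract names the salient feature families (author reputation, pronoun use, author sentiment) without showing how they might be computed. A minimal sketch of such a feature extractor is below, assuming hypothetical helper names and toy word lists — this is an illustration of the general approach, not the authors' actual pipeline or feature set.

    ```python
    # Sketch of extracting the feature families the paper reports as salient.
    # PRONOUNS/POSITIVE/NEGATIVE are toy lexicons, not the paper's resources.
    import re

    PRONOUNS = {"i", "me", "my", "we", "us", "our", "you", "your",
                "he", "she", "it", "they", "them"}
    POSITIVE = {"good", "great", "useful", "insightful", "agree"}
    NEGATIVE = {"bad", "wrong", "stupid", "troll", "disagree"}

    def extract_features(comment_text, author_karma):
        """Turn one Slashdot-style comment into a small feature vector."""
        tokens = re.findall(r"[a-z']+", comment_text.lower())
        n = max(len(tokens), 1)
        pronoun_rate = sum(t in PRONOUNS for t in tokens) / n
        # Crude lexicon sentiment: positive hits minus negative hits, normalized.
        sentiment = (sum(t in POSITIVE for t in tokens)
                     - sum(t in NEGATIVE for t in tokens)) / n
        return {"author_karma": author_karma,
                "pronoun_rate": pronoun_rate,
                "sentiment": sentiment,
                "length": len(tokens)}

    feats = extract_features("I think this is a great and insightful post.", 50)
    ```

    Vectors like these would then be fed to a standard three-class classifier (good / neutral / bad) trained on the community's own moderation scores.
    
    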

    Original language: English (US)
    Title of host publication: Collaboratively-Built Knowledge Sources and Artificial Intelligence - Papers from the 2010 AAAI Workshop, Technical Report
    Pages: 2-7
    Number of pages: 6
    State: Published - Dec 1 2010
    Event: 2010 AAAI Workshop - Atlanta, GA, United States
    Duration: Jul 12 2010 – Jul 12 2010

    Publication series

    Name: AAAI Workshop - Technical Report
    Volume: WS-10-02

    Conference

    Conference: 2010 AAAI Workshop
    Country: United States
    City: Atlanta, GA
    Period: 7/12/10 – 7/12/10


    ASJC Scopus subject areas

    • Engineering (all)

    Cite this

    Brennan, M., Wrazien, S., & Greenstadt, R. (2010). Learning to extract quality discourse in online communities. In Collaboratively-Built Knowledge Sources and Artificial Intelligence - Papers from the 2010 AAAI Workshop, Technical Report (pp. 2-7). (AAAI Workshop - Technical Report; Vol. WS-10-02).

    @inproceedings{81729322c3e140b8bdd24749897f2ac5,
    title = "Learning to extract quality discourse in online communities",
    abstract = "Collaborative filtering systems have been developed to manage information overload and improve discourse in online communities. In such systems, users rank content provided by other users on the validity or usefulness within their particular context. The goal is that {"}good{"} content will rise to prominence and {"}bad{"} content will fade into obscurity. These filtering mechanisms are not well-understood and have known weaknesses. For example, they depend on the presence of a large crowd to rate content, but such a crowd may not be present. Additionally, the community's decisions determine which voices will reach a large audience and which will be silenced, but it is not known if these decisions represent {"}the wisdom of crowds{"} or a {"}censoring mob.{"} Our approach uses statistical machine learning to predict community ratings. By extracting features that replicate the community's verdict, we can better understand collaborative filtering, improve the way the community uses the ratings of their members, and design agents that augment community decision-making. Slashdot is an example of such a community where peers will rate each others' comments based on their relevance to the post. This work extracts a wide variety of features from the Slashdot metadata and posts' linguistic contents to identify features that can predict the community rating. We find that author reputation, use of pronouns, and author sentiment are salient. We achieve 76{\%} accuracy predicting community ratings as good, neutral, or bad.",
    author = "Michael Brennan and Stacy Wrazien and Rachel Greenstadt",
    year = "2010",
    month = "12",
    day = "1",
    language = "English (US)",
    isbn = "9781577354680",
    series = "AAAI Workshop - Technical Report",
    pages = "2--7",
    booktitle = "Collaboratively-Built Knowledge Sources and Artificial Intelligence - Papers from the 2010 AAAI Workshop, Technical Report",

    }

    TY - GEN

    T1 - Learning to extract quality discourse in online communities

    AU - Brennan, Michael

    AU - Wrazien, Stacy

    AU - Greenstadt, Rachel

    PY - 2010/12/1

    Y1 - 2010/12/1

    N2 - Collaborative filtering systems have been developed to manage information overload and improve discourse in online communities. In such systems, users rank content provided by other users on the validity or usefulness within their particular context. The goal is that "good" content will rise to prominence and "bad" content will fade into obscurity. These filtering mechanisms are not well-understood and have known weaknesses. For example, they depend on the presence of a large crowd to rate content, but such a crowd may not be present. Additionally, the community's decisions determine which voices will reach a large audience and which will be silenced, but it is not known if these decisions represent "the wisdom of crowds" or a "censoring mob." Our approach uses statistical machine learning to predict community ratings. By extracting features that replicate the community's verdict, we can better understand collaborative filtering, improve the way the community uses the ratings of their members, and design agents that augment community decision-making. Slashdot is an example of such a community where peers will rate each others' comments based on their relevance to the post. This work extracts a wide variety of features from the Slashdot metadata and posts' linguistic contents to identify features that can predict the community rating. We find that author reputation, use of pronouns, and author sentiment are salient. We achieve 76% accuracy predicting community ratings as good, neutral, or bad.

    UR - http://www.scopus.com/inward/record.url?scp=79959711196&partnerID=8YFLogxK

    UR - http://www.scopus.com/inward/citedby.url?scp=79959711196&partnerID=8YFLogxK

    M3 - Conference contribution

    AN - SCOPUS:79959711196

    SN - 9781577354680

    T3 - AAAI Workshop - Technical Report

    SP - 2

    EP - 7

    BT - Collaboratively-Built Knowledge Sources and Artificial Intelligence - Papers from the 2010 AAAI Workshop, Technical Report

    ER -