Crowd-sourced Text Analysis

Reproducible and Agile Production of Political Data

Kenneth Benoit, Drew Conway, Benjamin E. Lauderdale, Michael Laver, Slava Mikhaylov

    Research output: Contribution to journal › Article

    Abstract

    Empirical social science often relies on data that are not observed in the field, but are transformed into quantitative variables by expert researchers who analyze and interpret qualitative raw sources. While generally considered the most valid way to produce data, this expert-driven process is inherently difficult to replicate or to assess on grounds of reliability. Using crowd-sourcing to distribute text for reading and interpretation by massive numbers of nonexperts, we generate results comparable to those using experts to read and interpret the same texts, but do so far more quickly and flexibly. Crucially, the data we collect can be reproduced and extended transparently, making crowd-sourced datasets intrinsically reproducible. This focuses researchers' attention on the fundamental scientific objective of specifying reliable and replicable methods for collecting the data needed, rather than on the content of any particular dataset. We also show that our approach works straightforwardly with different types of political text, written in different languages. While findings reported here concern text analysis, they have far-reaching implications for expert-generated data in the social sciences.
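    The core idea — many nonexpert codings of the same text units, aggregated into document-level estimates — can be illustrated with a minimal sketch. This is not the paper's actual estimation model (the article uses a more sophisticated scaling approach); it simply shows the simplest form of aggregation, averaging crowd scores per sentence and then per document. All document names and scores below are hypothetical.

    ```python
    # Illustrative only: averaging many crowd workers' codings of the same
    # sentences, then averaging sentence means within each document.
    # Data are hypothetical; scores mimic a left-right policy scale (-2..+2).
    from collections import defaultdict
    from statistics import mean

    # Each tuple: (document id, sentence id, one worker's score).
    codings = [
        ("manifesto_A", 1, -1), ("manifesto_A", 1, -2), ("manifesto_A", 1, -1),
        ("manifesto_A", 2,  0), ("manifesto_A", 2, -1),
        ("manifesto_B", 1,  2), ("manifesto_B", 1,  1), ("manifesto_B", 1,  2),
    ]

    def aggregate(codings):
        """Mean score per sentence, then mean of sentence means per document."""
        per_sentence = defaultdict(list)
        for doc, sent, score in codings:
            per_sentence[(doc, sent)].append(score)
        sentence_means = {key: mean(scores) for key, scores in per_sentence.items()}
        per_doc = defaultdict(list)
        for (doc, _sent), m in sentence_means.items():
            per_doc[doc].append(m)
        return {doc: mean(ms) for doc, ms in per_doc.items()}

    print(aggregate(codings))
    ```

    Because each sentence is read by multiple independent coders, rerunning the collection with new workers reproduces the aggregate estimates — the property the abstract describes as making crowd-sourced datasets intrinsically reproducible.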

    Original language: English (US)
    Pages (from-to): 278-295
    Number of pages: 18
    Journal: American Political Science Review
    Volume: 110
    Issue number: 2
    DOI: https://doi.org/10.1017/S0003055416000058
    State: Published - May 1 2016

    ASJC Scopus subject areas

    • Sociology and Political Science

    Cite this

    Benoit, K., Conway, D., Lauderdale, B. E., Laver, M., & Mikhaylov, S. (2016). Crowd-sourced Text Analysis: Reproducible and Agile Production of Political Data. American Political Science Review, 110(2), 278-295. https://doi.org/10.1017/S0003055416000058

    @article{6c3f090422394321810039050b4d08a8,
    title = "Crowd-sourced Text Analysis: Reproducible and Agile Production of Political Data",
    author = "Benoit, Kenneth and Conway, Drew and Lauderdale, {Benjamin E.} and Laver, Michael and Mikhaylov, Slava",
    year = "2016",
    month = "5",
    day = "1",
    doi = "10.1017/S0003055416000058",
    language = "English (US)",
    volume = "110",
    pages = "278--295",
    journal = "American Political Science Review",
    issn = "0003-0554",
    publisher = "Cambridge University Press",
    number = "2",
    }

    TY - JOUR

    T1 - Crowd-sourced Text Analysis

    T2 - Reproducible and Agile Production of Political Data

    AU - Benoit, Kenneth

    AU - Conway, Drew

    AU - Lauderdale, Benjamin E.

    AU - Laver, Michael

    AU - Mikhaylov, Slava

    PY - 2016/5/1

    Y1 - 2016/5/1

    N2 - Empirical social science often relies on data that are not observed in the field, but are transformed into quantitative variables by expert researchers who analyze and interpret qualitative raw sources. While generally considered the most valid way to produce data, this expert-driven process is inherently difficult to replicate or to assess on grounds of reliability. Using crowd-sourcing to distribute text for reading and interpretation by massive numbers of nonexperts, we generate results comparable to those using experts to read and interpret the same texts, but do so far more quickly and flexibly. Crucially, the data we collect can be reproduced and extended transparently, making crowd-sourced datasets intrinsically reproducible. This focuses researchers' attention on the fundamental scientific objective of specifying reliable and replicable methods for collecting the data needed, rather than on the content of any particular dataset. We also show that our approach works straightforwardly with different types of political text, written in different languages. While findings reported here concern text analysis, they have far-reaching implications for expert-generated data in the social sciences.

    UR - http://www.scopus.com/inward/record.url?scp=84982918683&partnerID=8YFLogxK

    UR - http://www.scopus.com/inward/citedby.url?scp=84982918683&partnerID=8YFLogxK

    DO - 10.1017/S0003055416000058

    M3 - Article

    VL - 110

    SP - 278

    EP - 295

    JO - American Political Science Review

    JF - American Political Science Review

    SN - 0003-0554

    IS - 2

    ER -