Adversarial stylometry: Circumventing authorship recognition to preserve privacy and anonymity

Michael Brennan, Sadia Afroz, Rachel Greenstadt

    Research output: Contribution to journal › Article

    Abstract

    The use of stylometry, authorship recognition through purely linguistic means, has contributed to literary, historical, and criminal investigation breakthroughs. Existing stylometry research assumes that authors have not attempted to disguise their linguistic writing style. We challenge this basic assumption of existing stylometry methodologies and present a new area of research: adversarial stylometry. Adversaries have a devastating effect on the robustness of existing classification methods. Our work presents a framework for creating adversarial passages including obfuscation, where a subject attempts to hide her identity, and imitation, where a subject attempts to frame another subject by imitating his writing style, and translation, where original passages are obfuscated with machine translation services. This research demonstrates that manual circumvention methods work very well while automated translation methods are not effective. The obfuscation method reduces the techniques' effectiveness to the level of random guessing and the imitation attempts succeed up to 67% of the time depending on the stylometry technique used. These results are more significant given the fact that experimental subjects were unfamiliar with stylometry, were not professional writers, and spent little time on the attacks. This article also contributes to the field by using human subjects to empirically validate the claim of high accuracy for four current techniques (without adversaries). We have also compiled and released two corpora of adversarial stylometry texts to promote research in this field with a total of 57 unique authors. We argue that this field is important to a multidisciplinary approach to privacy, security, and anonymity.

    Original language: English (US)
    Article number: 12
    Journal: ACM Transactions on Information and System Security
    Volume: 15
    Issue number: 3
    DOI: 10.1145/2382448.2382450
    State: Published - Nov 1 2012

    Fingerprint

    Linguistics

    Keywords

    • Algorithms
    • Experimentation

    ASJC Scopus subject areas

    • Computer Science(all)
    • Safety, Risk, Reliability and Quality

    Cite this

    Adversarial stylometry: Circumventing authorship recognition to preserve privacy and anonymity. / Brennan, Michael; Afroz, Sadia; Greenstadt, Rachel.

    In: ACM Transactions on Information and System Security, Vol. 15, No. 3, 12, 01.11.2012.

    @article{c910a3ca86bc4441a0154f91b0b8185c,
    title = "Adversarial stylometry: Circumventing authorship recognition to preserve privacy and anonymity",
    abstract = "The use of stylometry, authorship recognition through purely linguistic means, has contributed to literary, historical, and criminal investigation breakthroughs. Existing stylometry research assumes that authors have not attempted to disguise their linguistic writing style. We challenge this basic assumption of existing stylometry methodologies and present a new area of research: adversarial stylometry. Adversaries have a devastating effect on the robustness of existing classification methods. Our work presents a framework for creating adversarial passages including obfuscation, where a subject attempts to hide her identity, and imitation, where a subject attempts to frame another subject by imitating his writing style, and translation, where original passages are obfuscated with machine translation services. This research demonstrates that manual circumvention methods work very well while automated translation methods are not effective. The obfuscation method reduces the techniques' effectiveness to the level of random guessing and the imitation attempts succeed up to 67{\%} of the time depending on the stylometry technique used. These results are more significant given the fact that experimental subjects were unfamiliar with stylometry, were not professional writers, and spent little time on the attacks. This article also contributes to the field by using human subjects to empirically validate the claim of high accuracy for four current techniques (without adversaries). We have also compiled and released two corpora of adversarial stylometry texts to promote research in this field with a total of 57 unique authors. We argue that this field is important to a multidisciplinary approach to privacy, security, and anonymity.",
    keywords = "Algorithms, Experimentation",
    author = "Michael Brennan and Sadia Afroz and Rachel Greenstadt",
    year = "2012",
    month = "11",
    day = "1",
    doi = "10.1145/2382448.2382450",
    language = "English (US)",
    volume = "15",
    journal = "ACM Transactions on Information and System Security",
    issn = "1094-9224",
    publisher = "Association for Computing Machinery (ACM)",
    number = "3",

    }

    TY - JOUR

    T1 - Adversarial stylometry

    T2 - Circumventing authorship recognition to preserve privacy and anonymity

    AU - Brennan, Michael

    AU - Afroz, Sadia

    AU - Greenstadt, Rachel

    PY - 2012/11/1

    Y1 - 2012/11/1

    N2 - The use of stylometry, authorship recognition through purely linguistic means, has contributed to literary, historical, and criminal investigation breakthroughs. Existing stylometry research assumes that authors have not attempted to disguise their linguistic writing style. We challenge this basic assumption of existing stylometry methodologies and present a new area of research: adversarial stylometry. Adversaries have a devastating effect on the robustness of existing classification methods. Our work presents a framework for creating adversarial passages including obfuscation, where a subject attempts to hide her identity, and imitation, where a subject attempts to frame another subject by imitating his writing style, and translation, where original passages are obfuscated with machine translation services. This research demonstrates that manual circumvention methods work very well while automated translation methods are not effective. The obfuscation method reduces the techniques' effectiveness to the level of random guessing and the imitation attempts succeed up to 67% of the time depending on the stylometry technique used. These results are more significant given the fact that experimental subjects were unfamiliar with stylometry, were not professional writers, and spent little time on the attacks. This article also contributes to the field by using human subjects to empirically validate the claim of high accuracy for four current techniques (without adversaries). We have also compiled and released two corpora of adversarial stylometry texts to promote research in this field with a total of 57 unique authors. We argue that this field is important to a multidisciplinary approach to privacy, security, and anonymity.

    AB - The use of stylometry, authorship recognition through purely linguistic means, has contributed to literary, historical, and criminal investigation breakthroughs. Existing stylometry research assumes that authors have not attempted to disguise their linguistic writing style. We challenge this basic assumption of existing stylometry methodologies and present a new area of research: adversarial stylometry. Adversaries have a devastating effect on the robustness of existing classification methods. Our work presents a framework for creating adversarial passages including obfuscation, where a subject attempts to hide her identity, and imitation, where a subject attempts to frame another subject by imitating his writing style, and translation, where original passages are obfuscated with machine translation services. This research demonstrates that manual circumvention methods work very well while automated translation methods are not effective. The obfuscation method reduces the techniques' effectiveness to the level of random guessing and the imitation attempts succeed up to 67% of the time depending on the stylometry technique used. These results are more significant given the fact that experimental subjects were unfamiliar with stylometry, were not professional writers, and spent little time on the attacks. This article also contributes to the field by using human subjects to empirically validate the claim of high accuracy for four current techniques (without adversaries). We have also compiled and released two corpora of adversarial stylometry texts to promote research in this field with a total of 57 unique authors. We argue that this field is important to a multidisciplinary approach to privacy, security, and anonymity.

    KW - Algorithms

    KW - Experimentation

    UR - http://www.scopus.com/inward/record.url?scp=84872026771&partnerID=8YFLogxK

    UR - http://www.scopus.com/inward/citedby.url?scp=84872026771&partnerID=8YFLogxK

    U2 - 10.1145/2382448.2382450

    DO - 10.1145/2382448.2382450

    M3 - Article

    AN - SCOPUS:84872026771

    VL - 15

    JO - ACM Transactions on Information and System Security

    JF - ACM Transactions on Information and System Security

    SN - 1094-9224

    IS - 3

    M1 - 12

    ER -