Translate once, translate twice, translate thrice and attribute: Identifying authors and machine translation tools in translated text

Aylin Caliskan, Rachel Greenstadt

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

    Abstract

    In this paper, we investigate the effects of machine translation tools on translated texts and the accuracy of authorship and translator attribution of translated texts. We show that the more translation performed on a text by a specific machine translation tool, the more effects unique to that translator are observed. We also propose a novel method to perform machine translator and authorship attribution of translated texts using a feature set that led to 91.13% and 91.54% accuracy on average, respectively. We claim that the features leading to the highest accuracy in translator attribution are translator-dependent features and that even though translator-effect-heavy features are present in translated text, we can still succeed in authorship attribution. These findings demonstrate that stylometric features of the original text are preserved at some level despite multiple consecutive translations and the introduction of translator-dependent features. The main contribution of our work is the discovery of a feature set used to accurately perform both translator and authorship attribution on a corpus of diverse topics from the twenty-first century, which has been consecutively translated multiple times using machine translation tools.

    Original language: English (US)
    Title of host publication: Proceedings - IEEE 6th International Conference on Semantic Computing, ICSC 2012
    Pages: 121-125
    Number of pages: 5
    DOIs: https://doi.org/10.1109/ICSC.2012.46
    State: Published - Dec 12 2012
    Event: 6th IEEE International Conference on Semantic Computing, ICSC 2012 - Palermo, Italy
    Duration: Sep 19 2012 - Sep 21 2012

    Publication series

    Name: Proceedings - IEEE 6th International Conference on Semantic Computing, ICSC 2012

    Conference

    Conference: 6th IEEE International Conference on Semantic Computing, ICSC 2012
    Country: Italy
    City: Palermo
    Period: 9/19/12 - 9/21/12

    Keywords

    • anonymity
    • authorship attribution
    • machine learning
    • machine translation
    • privacy

    ASJC Scopus subject areas

    • Software

    Cite this

    Caliskan, A., & Greenstadt, R. (2012). Translate once, translate twice, translate thrice and attribute: Identifying authors and machine translation tools in translated text. In Proceedings - IEEE 6th International Conference on Semantic Computing, ICSC 2012 (pp. 121-125). [6337093] (Proceedings - IEEE 6th International Conference on Semantic Computing, ICSC 2012). https://doi.org/10.1109/ICSC.2012.46

    @inproceedings{cc0ee265778146c48cf78d33c373529d,
    title = "Translate once, translate twice, translate thrice and attribute: Identifying authors and machine translation tools in translated text",
    abstract = "In this paper, we investigate the effects of machine translation tools on translated texts and the accuracy of authorship and translator attribution of translated texts. We show that the more translation performed on a text by a specific machine translation tool, the more effects unique to that translator are observed. We also propose a novel method to perform machine translator and authorship attribution of translated texts using a feature set that led to 91.13{\%} and 91.54{\%} accuracy on average, respectively. We claim that the features leading to the highest accuracy in translator attribution are translator-dependent features and that even though translator-effect-heavy features are present in translated text, we can still succeed in authorship attribution. These findings demonstrate that stylometric features of the original text are preserved at some level despite multiple consecutive translations and the introduction of translator-dependent features. The main contribution of our work is the discovery of a feature set used to accurately perform both translator and authorship attribution on a corpus of diverse topics from the twenty-first century, which has been consecutively translated multiple times using machine translation tools.",
    keywords = "anonymity, authorship attribution, machine learning, machine translation, privacy",
    author = "Aylin Caliskan and Rachel Greenstadt",
    year = "2012",
    month = "12",
    day = "12",
    doi = "10.1109/ICSC.2012.46",
    language = "English (US)",
    isbn = "9780769548593",
    series = "Proceedings - IEEE 6th International Conference on Semantic Computing, ICSC 2012",
    pages = "121--125",
    booktitle = "Proceedings - IEEE 6th International Conference on Semantic Computing, ICSC 2012",

    }

    TY - GEN

    T1 - Translate once, translate twice, translate thrice and attribute

    T2 - Identifying authors and machine translation tools in translated text

    AU - Caliskan, Aylin

    AU - Greenstadt, Rachel

    PY - 2012/12/12

    Y1 - 2012/12/12

    N2 - In this paper, we investigate the effects of machine translation tools on translated texts and the accuracy of authorship and translator attribution of translated texts. We show that the more translation performed on a text by a specific machine translation tool, the more effects unique to that translator are observed. We also propose a novel method to perform machine translator and authorship attribution of translated texts using a feature set that led to 91.13% and 91.54% accuracy on average, respectively. We claim that the features leading to the highest accuracy in translator attribution are translator-dependent features and that even though translator-effect-heavy features are present in translated text, we can still succeed in authorship attribution. These findings demonstrate that stylometric features of the original text are preserved at some level despite multiple consecutive translations and the introduction of translator-dependent features. The main contribution of our work is the discovery of a feature set used to accurately perform both translator and authorship attribution on a corpus of diverse topics from the twenty-first century, which has been consecutively translated multiple times using machine translation tools.

    KW - anonymity

    KW - authorship attribution

    KW - machine learning

    KW - machine translation

    KW - privacy

    UR - http://www.scopus.com/inward/record.url?scp=84870664825&partnerID=8YFLogxK

    UR - http://www.scopus.com/inward/citedby.url?scp=84870664825&partnerID=8YFLogxK

    U2 - 10.1109/ICSC.2012.46

    DO - 10.1109/ICSC.2012.46

    M3 - Conference contribution

    AN - SCOPUS:84870664825

    SN - 9780769548593

    T3 - Proceedings - IEEE 6th International Conference on Semantic Computing, ICSC 2012

    SP - 121

    EP - 125

    BT - Proceedings - IEEE 6th International Conference on Semantic Computing, ICSC 2012

    ER -