Poster: Git blame who?: Stylistic authorship attribution of small, incomplete source code fragments

Edwin Dauber, Aylin Caliskan, Richard Harang, Rachel Greenstadt

    Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

    Abstract

    Program authorship attribution has implications for the privacy of programmers who wish to contribute code anonymously. While previous work has shown that complete files that are individually authored can be attributed, these efforts have focused on ideal data sets such as the Google Code Jam data. We explore the problem of attribution "in the wild," examining source code obtained from open source version control systems, and investigate if and how such contributions can be attributed to their authors, either individually or on a per-account basis. In this work we show that accounts belonging to open source contributors containing short, incomplete, and typically uncompilable fragments can be effectively attributed.

    Original language: English (US)
    Title of host publication: Proceedings - International Conference on Software Engineering
    Publisher: IEEE Computer Society
    Pages: 356-357
    Number of pages: 2
    ISBN (Electronic): 9781450356633
    DOI: https://doi.org/10.1145/3183440.3195007
    State: Published - May 27 2018
    Event: 40th ACM/IEEE International Conference on Software Engineering, ICSE 2018 - Gothenburg, Sweden
    Duration: May 27 2018 to Jun 3 2018

    Publication series

    Name: Proceedings - International Conference on Software Engineering
    ISSN (Print): 0270-5257

    Conference

    Conference: 40th ACM/IEEE International Conference on Software Engineering, ICSE 2018
    Country: Sweden
    City: Gothenburg
    Period: 5/27/18 to 6/3/18

    Keywords

    • Machine learning
    • Source code authorship attribution
    • Stylometry

    ASJC Scopus subject areas

    • Software

    Cite this

    Dauber, E., Caliskan, A., Harang, R., & Greenstadt, R. (2018). Poster: Git blame who?: Stylistic authorship attribution of small, incomplete source code fragments. In Proceedings - International Conference on Software Engineering (pp. 356-357). (Proceedings - International Conference on Software Engineering). IEEE Computer Society. https://doi.org/10.1145/3183440.3195007
