Privacy detective: Detecting private information and collective privacy behavior in a large social network

Aylin Caliskan-Islam, Jonathan Walsh, Rachel Greenstadt

    Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

    Abstract

    Detecting the presence and amount of private information being shared in online media is the first step towards analyzing the information-revealing habits of users in social networks, and it gives researchers a useful method for studying aggregate privacy behavior. In this work, we aim to determine whether text contains private content by using our novel learning-based approach, 'privacy detective', which combines topic modeling, named entity recognition, a privacy ontology, sentiment analysis, and text normalization to represent privacy features. Privacy detective investigates a broader range of privacy concerns than previous approaches, which focus on keyword searching or profile-related properties. We collected 500,000 tweets from 100,000 Twitter users, along with other information such as tweet linkages and follower relationships. We reach 95.45% accuracy in a two-class task that separates Twitter users who reveal little private information from Twitter users who share sensitive information. We score timelines according to three privacy levels after having Amazon Mechanical Turk (AMT) workers annotate the collected tweets according to privacy categories. Supervised machine learning classification on these annotations reaches 69.63% accuracy on a three-class task. Inter-annotator agreement on timeline privacy scores between various AMT workers and our classifiers falls within the same positive agreement level. Additionally, we show that a user's privacy level is correlated with her friends' privacy scores and with the privacy scores of people mentioned in her text, but not with the number of her followers. Privacy in social networks therefore appears to be socially constructed, which can have significant implications for privacy-enhancing technologies and educational interventions.
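    The abstract names two quantitative steps without giving formulas: collapsing per-tweet privacy annotations into one of three timeline levels, and correlating a user's privacy score with her friends' scores. The Python sketch below illustrates one way such steps could be computed. It is not the authors' method: the scoring rule (share of tweets labelled private), the cut-offs `LOW_CUTOFF` and `HIGH_CUTOFF`, the function names, and the layout of the `friends` dictionary are all hypothetical, and Pearson's r is used as a standard stand-in because the abstract does not name the correlation measure.

```python
from math import sqrt
from statistics import mean

# Hypothetical cut-offs on the fraction of a user's tweets labelled private.
# These are illustrative assumptions, not values taken from the paper.
LOW_CUTOFF = 0.1
HIGH_CUTOFF = 0.3


def timeline_privacy_level(tweet_labels):
    """Map per-tweet binary labels (True = private) to level 0, 1, or 2."""
    if not tweet_labels:
        raise ValueError("timeline has no annotated tweets")
    share = sum(tweet_labels) / len(tweet_labels)
    if share < LOW_CUTOFF:
        return 0  # reveals little private information
    if share < HIGH_CUTOFF:
        return 1  # intermediate
    return 2      # frequently shares sensitive information


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


def friend_correlation(scores, friends):
    """Correlate each user's score with the mean score of her friends.

    scores:  dict mapping user id -> privacy score
    friends: dict mapping user id -> list of friend ids (hypothetical layout)
    Users with no scored friends are skipped.
    """
    own, neighbour = [], []
    for user, score in scores.items():
        known = [scores[f] for f in friends.get(user, []) if f in scores]
        if known:
            own.append(score)
            neighbour.append(mean(known))
    return pearson(own, neighbour)
```

    For example, `timeline_privacy_level([True, True, False, True])` returns 2 and `timeline_privacy_level([False] * 10)` returns 0, while two pairs of mutual friends scored 0.0/0.1 and 0.8/0.9 give r of about 0.97. Averaging friends' scores first keeps one data point per user, so heavily connected users do not dominate the coefficient.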

    Original language: English (US)
    Title of host publication: Proceedings of the ACM Conference on Computer and Communications Security
    Publisher: Association for Computing Machinery
    Pages: 35-46
    Number of pages: 12
    ISBN (Electronic): 9781450331487
    DOI: 10.1145/2665943.2665958
    State: Published - Nov 3 2014
    Event: 13th Workshop on Privacy in the Electronic Society, WPES 2014, in Conjunction with the ACM Conference on Computer and Communications Security, ACM CCS 2014 - Scottsdale, United States
    Duration: Nov 3 2014 → …

    Publication series

    Name: Proceedings of the ACM Conference on Computer and Communications Security
    ISSN (Print): 1543-7221

    Conference

    Conference: 13th Workshop on Privacy in the Electronic Society, WPES 2014, in Conjunction with the ACM Conference on Computer and Communications Security, ACM CCS 2014
    Country: United States
    City: Scottsdale
    Period: 11/3/14 → …

    Fingerprint

    Ontology
    Learning systems
    Classifiers

    Keywords

    • Detecting private information
    • Privacy
    • Privacy behavior
    • Sensitive information
    • Social network
    • Text classification

    ASJC Scopus subject areas

    • Software
    • Computer Networks and Communications

    Cite this

    Caliskan-Islam, A., Walsh, J., & Greenstadt, R. (2014). Privacy detective: Detecting private information and collective privacy behavior in a large social network. In Proceedings of the ACM Conference on Computer and Communications Security (pp. 35-46). (Proceedings of the ACM Conference on Computer and Communications Security). Association for Computing Machinery. https://doi.org/10.1145/2665943.2665958

    @inproceedings{1c4c672daa7a4747b99ab29c7a9b2eee,
    title = "Privacy detective: Detecting private information and collective privacy behavior in a large social network",
    keywords = "Detecting private information, Privacy, Privacy behavior, Sensitive information, Social network, Text classification",
    author = "Aylin Caliskan-Islam and Jonathan Walsh and Rachel Greenstadt",
    year = "2014",
    month = "11",
    day = "3",
    doi = "10.1145/2665943.2665958",
    language = "English (US)",
    series = "Proceedings of the ACM Conference on Computer and Communications Security",
    publisher = "Association for Computing Machinery",
    pages = "35--46",
    booktitle = "Proceedings of the ACM Conference on Computer and Communications Security",
    }

    TY - GEN

    T1 - Privacy detective

    T2 - Detecting private information and collective privacy behavior in a large social network

    AU - Caliskan-Islam, Aylin

    AU - Walsh, Jonathan

    AU - Greenstadt, Rachel

    PY - 2014/11/3

    Y1 - 2014/11/3

    KW - Detecting private information

    KW - Privacy

    KW - Privacy behavior

    KW - Sensitive information

    KW - Social network

    KW - Text classification

    UR - http://www.scopus.com/inward/record.url?scp=84910639684&partnerID=8YFLogxK

    UR - http://www.scopus.com/inward/citedby.url?scp=84910639684&partnerID=8YFLogxK

    U2 - 10.1145/2665943.2665958

    DO - 10.1145/2665943.2665958

    M3 - Conference contribution

    T3 - Proceedings of the ACM Conference on Computer and Communications Security

    SP - 35

    EP - 46

    BT - Proceedings of the ACM Conference on Computer and Communications Security

    PB - Association for Computing Machinery

    ER -