What’s in a name

A study of names, gender inference, and gender behavior in Facebook

Cong Tang, Keith Ross, Nitesh Saxena, Ruichuan Chen

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

    Abstract

    In this paper, by crawling Facebook public profile pages of a large and diverse user population in New York City, we create a comprehensive and contemporary first name list, in which each name is annotated with a popularity estimate and a gender probability. First, we use the name list as part of a novel and powerful technique for inferring Facebook users’ gender. Our name-centric approach to gender prediction partitions the users into two groups, A and B, and is able to accurately predict genders for users belonging to A. Applying our methodology to NYC users in Facebook, we are able to achieve an accuracy of 95.2% for group A consisting of 95.1% of the NYC users. This is a significant improvement over recent results of gender prediction [14], which achieved a maximum accuracy of 77.2% based on users’ group affiliations. Second, having inferred the gender of most users in our Facebook dataset, we learn several interesting gender characteristics and analyze how males and females behave in Facebook. We find, for example, that females and males exhibit contrasting behaviors while hiding their attributes, such as gender, age, and sexual preference, and that females are more conscious about their online privacy on Facebook.
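The abstract does not say how the name list is built or how groups A and B are drawn, so the following is only a minimal sketch of a name-centric predictor under stated assumptions. It assumes that profiles which disclose gender serve as labelled data, that a name's gender probability is the fraction of its labelled bearers who are female, and that group A is the set of users whose first name is listed with a sufficiently one-sided probability. The function names and the `threshold` value are hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict

LABELS = ("female", "male")

def build_name_list(profiles):
    """Build {name: {"popularity", "p_female"}} from (first_name, gender) pairs.

    Assumption: only profiles with a disclosed gender in LABELS are counted.
    """
    counts = defaultdict(Counter)
    for first_name, gender in profiles:
        if gender in LABELS:
            counts[first_name.strip().lower()][gender] += 1
    total = sum(sum(c.values()) for c in counts.values())
    name_list = {}
    for name, c in counts.items():
        n = c["female"] + c["male"]  # always >= 1 for names present in counts
        name_list[name] = {
            "popularity": n / total,       # share of labelled users with this name
            "p_female": c["female"] / n,   # empirical gender probability
        }
    return name_list

def predict_gender(first_name, name_list, threshold=0.9):
    """Return (group, gender); gender is None for group B.

    Assumption: group A = names whose probability is at least `threshold`
    for one gender; everything else (unknown or ambiguous) falls in group B.
    """
    entry = name_list.get(first_name.strip().lower())
    if entry is None:
        return "B", None
    p = entry["p_female"]
    if p >= threshold:
        return "A", "female"
    if p <= 1 - threshold:
        return "A", "male"
    return "B", None

if __name__ == "__main__":
    # Toy data, invented for illustration only.
    sample = (
        [("Maria", "female")] * 19 + [("Maria", "male")]
        + [("John", "male")] * 8
        + [("Alex", "female"), ("Alex", "male")]
    )
    names = build_name_list(sample)
    for candidate in ("Maria", "John", "Alex", "Zed"):
        print(candidate, predict_gender(candidate, names))
```

Taking the reported figures at face value, an accuracy of 95.2% on a group A that covers 95.1% of users corresponds to correct labels for roughly 0.952 × 0.951 ≈ 90.5% of all NYC users in the dataset, with group B left unlabelled.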

    Original language: English (US)
    Title of host publication: Database Systems for Advanced Applications - 16th International Conference, DASFAA 2011, International Workshops
    Subtitle of host publication: GDB, SIM3, FlashDB, SNSMW, DaMEN, DQIS, Proceedings
    Publisher: Springer Verlag
    Pages: 344-356
    Number of pages: 13
    Volume: 6637 LNCS
    ISBN (Print): 9783642202438
    State: Published - 2011
    Event: 16th International Conference on Database Systems for Advanced Applications, DASFAA 2011 - Hong Kong, China
    Duration: Apr 22 2011 → Apr 25 2011

    Publication series

    Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
    Volume: 6637 LNCS
    ISSN (Print): 0302-9743
    ISSN (Electronic): 1611-3349

    Other

    Other: 16th International Conference on Database Systems for Advanced Applications, DASFAA 2011
    Country: China
    City: Hong Kong
    Period: 4/22/11 → 4/25/11

    Fingerprint

    Gender
    Prediction
    Privacy
    Attribute
    Partition
    Predict
    Methodology
    Estimate
    Profile

    ASJC Scopus subject areas

    • Theoretical Computer Science
    • Computer Science(all)

    Cite this

    Tang, C., Ross, K., Saxena, N., & Chen, R. (2011). What’s in a name: A study of names, gender inference, and gender behavior in Facebook. In Database Systems for Advanced Applications - 16th International Conference, DASFAA 2011, International Workshops: GDB, SIM3, FlashDB, SNSMW, DaMEN, DQIS, Proceedings (Vol. 6637 LNCS, pp. 344-356). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 6637 LNCS). Springer Verlag.

    @inproceedings{34a8630653fe4376a687949c8a1b6087,
    title = "What’s in a name: A study of names, gender inference, and gender behavior in Facebook",
    abstract = "In this paper, by crawling Facebook public profile pages of a large and diverse user population in New York City, we create a comprehensive and contemporary first name list, in which each name is annotated with a popularity estimate and a gender probability. First, we use the name list as part of a novel and powerful technique for inferring Facebook users’ gender. Our name-centric approach to gender prediction partitions the users into two groups, A and B, and is able to accurately predict genders for users belonging to A. Applying our methodology to NYC users in Facebook, we are able to achieve an accuracy of 95.2{\%} for group A consisting of 95.1{\%} of the NYC users. This is a significant improvement over recent results of gender prediction [14], which achieved a maximum accuracy of 77.2{\%} based on users’ group affiliations. Second, having inferred the gender of most users in our Facebook dataset, we learn several interesting gender characteristics and analyze how males and females behave in Facebook. We find, for example, that females and males exhibit contrasting behaviors while hiding their attributes, such as gender, age, and sexual preference, and that females are more conscious about their online privacy on Facebook.",
    author = "Cong Tang and Keith Ross and Nitesh Saxena and Ruichuan Chen",
    year = "2011",
    language = "English (US)",
    isbn = "9783642202438",
    volume = "6637 LNCS",
    series = "Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)",
    publisher = "Springer Verlag",
    pages = "344--356",
    booktitle = "Database Systems for Advanced Applications - 16th International Conference, DASFAA 2011, International Workshops",
    address = "Germany",

    }

    TY - GEN

    T1 - What’s in a name

    T2 - A study of names, gender inference, and gender behavior in Facebook

    AU - Tang, Cong

    AU - Ross, Keith

    AU - Saxena, Nitesh

    AU - Chen, Ruichuan

    PY - 2011

    Y1 - 2011

    N2 - In this paper, by crawling Facebook public profile pages of a large and diverse user population in New York City, we create a comprehensive and contemporary first name list, in which each name is annotated with a popularity estimate and a gender probability. First, we use the name list as part of a novel and powerful technique for inferring Facebook users’ gender. Our name-centric approach to gender prediction partitions the users into two groups, A and B, and is able to accurately predict genders for users belonging to A. Applying our methodology to NYC users in Facebook, we are able to achieve an accuracy of 95.2% for group A consisting of 95.1% of the NYC users. This is a significant improvement over recent results of gender prediction [14], which achieved a maximum accuracy of 77.2% based on users’ group affiliations. Second, having inferred the gender of most users in our Facebook dataset, we learn several interesting gender characteristics and analyze how males and females behave in Facebook. We find, for example, that females and males exhibit contrasting behaviors while hiding their attributes, such as gender, age, and sexual preference, and that females are more conscious about their online privacy on Facebook.

    UR - http://www.scopus.com/inward/record.url?scp=85012302520&partnerID=8YFLogxK

    UR - http://www.scopus.com/inward/citedby.url?scp=85012302520&partnerID=8YFLogxK

    M3 - Conference contribution

    SN - 9783642202438

    VL - 6637 LNCS

    T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

    SP - 344

    EP - 356

    BT - Database Systems for Advanced Applications - 16th International Conference, DASFAA 2011, International Workshops

    PB - Springer Verlag

    ER -