Improving web spam classifiers using link structure

Qingqing Gan, Torsten Suel

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

    Abstract

    Web spam has been recognized as one of the top challenges in the search engine industry [14]. A lot of recent work has addressed the problem of detecting or demoting web spam, including both content spam [16, 12] and link spam [22, 13]. However, any time an anti-spam technique is developed, spammers will design new spamming techniques to confuse search engine ranking methods and spam detection mechanisms. Machine learning-based classification methods can quickly adapt to newly developed spam techniques. We describe a two-stage approach to improve the performance of common classifiers. We first implement a classifier to catch a large portion of spam in our data. Then we design several heuristics to decide if a node should be relabeled based on the preclassified result and knowledge about the neighborhood. Our experimental results show visible improvements with respect to precision and recall.
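
As an illustration only, the second stage described above can be sketched in Python as a neighbor-majority relabeling pass. The function name, the confidence and agreement thresholds, and the toy host graph are all assumptions made for this sketch; the abstract does not specify the authors' actual heuristics or features.

```python
from collections import Counter


def relabel_by_neighborhood(labels, confidence, neighbors,
                            min_conf=0.8, min_agree=0.75):
    """Hypothetical second stage: flip an uncertain first-stage label
    when a large share of the node's neighbors carry the other label.

    labels     -- node -> first-stage label ("spam" or "normal")
    confidence -- node -> first-stage classifier confidence in [0, 1]
    neighbors  -- node -> iterable of linked nodes
    """
    relabeled = dict(labels)  # copy, so the first-stage output is untouched
    for node, label in labels.items():
        if confidence[node] >= min_conf:
            continue  # keep confident first-stage predictions as they are
        nbrs = [n for n in neighbors.get(node, ()) if n in labels]
        if not nbrs:
            continue  # no labeled neighborhood to consult
        # Votes always use the first-stage labels, so the result does not
        # depend on the order in which nodes are visited.
        majority, votes = Counter(labels[n] for n in nbrs).most_common(1)[0]
        if majority != label and votes / len(nbrs) >= min_agree:
            relabeled[node] = majority
    return relabeled


# Toy first-stage output (made-up hosts and scores).
labels = {"a.com": "spam", "b.com": "spam", "c.com": "spam",
          "d.com": "normal", "e.com": "normal"}
confidence = {"a.com": 0.95, "b.com": 0.90, "c.com": 0.90,
              "d.com": 0.55, "e.com": 0.97}
neighbors = {"d.com": ["a.com", "b.com", "c.com"],
             "e.com": ["d.com"],
             "a.com": ["b.com"]}

result = relabel_by_neighborhood(labels, confidence, neighbors)
```

In this toy graph only `d.com` changes: its first-stage label is uncertain and all three of its neighbors are labeled spam, so it is relabeled as spam, while the confident predictions are left alone.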

    Original language: English (US)
    Title of host publication: AIRWeb 2007 - Proceedings of the 3rd International Workshop on Adversarial Information Retrieval on the Web
    Pages: 17-20
    Number of pages: 4
    Volume: 215
    DOIs: https://doi.org/10.1145/1244408.1244412
    State: Published - 2007
    Event: AIRWeb 2007 - 3rd International Workshop on Adversarial Information Retrieval on the Web - Banff, AB, Canada
    Duration: May 8 2007 → May 8 2007

    Other

    Other: AIRWeb 2007 - 3rd International Workshop on Adversarial Information Retrieval on the Web
    Country: Canada
    City: Banff, AB
    Period: 5/8/07 → 5/8/07

    Fingerprint

    Search engines
    Classifiers
    Spamming
    Industry

    Keywords

    • Classification
    • Link analysis
    • Machine learning
    • Search engines
    • Web mining
    • Web spam detection

    ASJC Scopus subject areas

    • Human-Computer Interaction

    Cite this

    Gan, Q., & Suel, T. (2007). Improving web spam classifiers using link structure. In AIRWeb 2007 - Proceedings of the 3rd International Workshop on Adversarial Information Retrieval on the Web (Vol. 215, pp. 17-20). https://doi.org/10.1145/1244408.1244412

    @inproceedings{3a32c384b9fb4ba6862a7774620ddff3,
    title = "Improving web spam classifiers using link structure",
    abstract = "Web spam has been recognized as one of the top challenges in the search engine industry [14]. A lot of recent work has addressed the problem of detecting or demoting web spam, including both content spam [16, 12] and link spam [22, 13]. However, any time an anti-spam technique is developed, spammers will design new spamming techniques to confuse search engine ranking methods and spam detection mechanisms. Machine learning-based classification methods can quickly adapt to newly developed spam techniques. We describe a two-stage approach to improve the performance of common classifiers. We first implement a classifier to catch a large portion of spam in our data. Then we design several heuristics to decide if a node should be relabeled based on the preclassified result and knowledge about the neighborhood. Our experimental results show visible improvements with respect to precision and recall.",
    keywords = "Classification, Link analysis, Machine learning, Search engines, Web mining, Web spam detection",
    author = "Qingqing Gan and Torsten Suel",
    year = "2007",
    doi = "10.1145/1244408.1244412",
    language = "English (US)",
    isbn = "1595937323",
    volume = "215",
    pages = "17--20",
    booktitle = "AIRWeb 2007 - Proceedings of the 3rd International Workshop on Adversarial Information Retrieval on the Web",

    }

    TY - GEN

    T1 - Improving web spam classifiers using link structure

    AU - Gan, Qingqing

    AU - Suel, Torsten

    PY - 2007

    Y1 - 2007

    N2 - Web spam has been recognized as one of the top challenges in the search engine industry [14]. A lot of recent work has addressed the problem of detecting or demoting web spam, including both content spam [16, 12] and link spam [22, 13]. However, any time an anti-spam technique is developed, spammers will design new spamming techniques to confuse search engine ranking methods and spam detection mechanisms. Machine learning-based classification methods can quickly adapt to newly developed spam techniques. We describe a two-stage approach to improve the performance of common classifiers. We first implement a classifier to catch a large portion of spam in our data. Then we design several heuristics to decide if a node should be relabeled based on the preclassified result and knowledge about the neighborhood. Our experimental results show visible improvements with respect to precision and recall.

    KW - Classification

    KW - Link analysis

    KW - Machine learning

    KW - Search engines

    KW - Web mining

    KW - Web spam detection

    UR - http://www.scopus.com/inward/record.url?scp=35549007196&partnerID=8YFLogxK

    UR - http://www.scopus.com/inward/citedby.url?scp=35549007196&partnerID=8YFLogxK

    U2 - 10.1145/1244408.1244412

    DO - 10.1145/1244408.1244412

    M3 - Conference contribution

    SN - 1595937323

    SN - 9781595937322

    VL - 215

    SP - 17

    EP - 20

    BT - AIRWeb 2007 - Proceedings of the 3rd International Workshop on Adversarial Information Retrieval on the Web

    ER -