Geographic web usage estimation by monitoring DNS caches

Hüseyin Akcan, Torsten Suel, Hervé Brönnimann

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

    Abstract

    DNS is one of the most actively used distributed databases on earth, accessed by millions of people every day to transparently convert host names into IP addresses and vice versa. In order to improve their performance, DNS servers also keep temporary records of all requested domain names in their cache. While most of the DNS servers are configured to be used by their local users only, there still exist many DNS servers that respond to public queries. Querying these DNS servers reveals the recently visited domains. Exploiting the geographically distributed nature of DNS, one can gather usage statistics ranging from a single DNS server to global scale. In particular, this enables collecting statistics about geographic differences in web browsing behavior between different regions of a country or the world. In this paper, we present methods to identify these public DNS servers, discuss how to effectively crawl them, and describe our algorithm to extract usage estimations from the crawl data. We also evaluate our estimation algorithm using extensive simulations, and finally use our algorithms to crawl 150 U.S. universities for various domains, and explore the effects of location and time on the access rate of these domains.
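    The abstract does not say how the servers are probed. One common way to ask a DNS server what it already holds, without making it fetch the record, is to send a query with the Recursion Desired (RD) flag cleared. The following is a minimal sketch under that assumption, not the authors' crawler: the function names `build_query`, `parse_header`, and `appears_cached` are hypothetical, and only the RFC 1035 wire format is taken as given.

```python
import random
import socket
import struct


def build_query(domain: str, query_id: int, recursion_desired: bool = False) -> bytes:
    """Build a DNS query for an A record in RFC 1035 wire format."""
    # RD is the 0x0100 bit of the 16-bit flags word; leaving it clear asks the
    # server to answer only from data it already has.
    flags = 0x0100 if recursion_desired else 0x0000
    # Header: ID, flags, QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0.
    header = struct.pack("!HHHHHH", query_id, flags, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in domain.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN
    return header + question


def parse_header(packet: bytes) -> dict:
    """Decode the fields of the 12-byte DNS header that this sketch needs."""
    ident, flags, qdcount, ancount, _ns, _ar = struct.unpack("!HHHHHH", packet[:12])
    return {
        "id": ident,
        "rd": bool(flags & 0x0100),
        "rcode": flags & 0x000F,
        "qdcount": qdcount,
        "ancount": ancount,
    }


def appears_cached(server_ip: str, domain: str, timeout: float = 2.0) -> bool:
    """Send one non-recursive query over UDP; True if the reply carries answers.

    Assumption: a NOERROR reply with at least one answer record to a
    non-recursive query means the name was in the server's cache.
    """
    query_id = random.randint(0, 0xFFFF)
    packet = build_query(domain, query_id, recursion_desired=False)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(packet, (server_ip, 53))
        try:
            reply, _addr = sock.recvfrom(512)
        except socket.timeout:
            return False
    if len(reply) < 12:
        return False
    header = parse_header(reply)
    return header["id"] == query_id and header["rcode"] == 0 and header["ancount"] > 0
```

    Calling `appears_cached(server_ip, "example.com")` repeatedly against one server, with `server_ip` being a server you are permitted to query, would give a time series of cache hits and misses. Turning such a series into a usage estimate is the part the paper's algorithm addresses and is not attempted here.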

    Original language: English (US)
    Title of host publication: LocWeb 2008 - Proceedings of the 1st International Workshop on Location and the Web, in Conjunction with the WWW 2008 Conference
    Pages: 85-92
    Number of pages: 8
    Volume: 300
    DOI: https://doi.org/10.1145/1367798.1367813
    State: Published - 2008
    Event: 1st International Workshop on Location and the Web, LocWeb 2008, in Conjunction with the WWW 2008 Conference - Beijing, China
    Duration: Apr 22 2008 to Apr 22 2008

    Other

    Other: 1st International Workshop on Location and the Web, LocWeb 2008, in Conjunction with the WWW 2008 Conference
    Country: China
    City: Beijing
    Period: 4/22/08 to 4/22/08

    Fingerprint

    Servers
    Monitoring
    Statistics
    Earth (planet)

    Keywords

    • DNS
    • web access monitoring
    • web site usage estimation

    ASJC Scopus subject areas

    • Human-Computer Interaction
    • Computer Networks and Communications
    • Computer Vision and Pattern Recognition
    • Software

    Cite this

    Akcan, H., Suel, T., & Brönnimann, H. (2008). Geographic web usage estimation by monitoring DNS caches. In LocWeb 2008 - Proceedings of the 1st International Workshop on Location and the Web, in Conjunction with the WWW 2008 Conference (Vol. 300, pp. 85-92). https://doi.org/10.1145/1367798.1367813

    @inproceedings{88d743e7e8134f35ac293dcc15e69fa8,
    title = "Geographic web usage estimation by monitoring DNS caches",
    keywords = "DNS, web access monitoring, web site usage estimation",
    author = "H{\"u}seyin Akcan and Torsten Suel and Herv{\'e} Br{\"o}nnimann",
    year = "2008",
    doi = "10.1145/1367798.1367813",
    language = "English (US)",
    isbn = "9781605581606",
    volume = "300",
    pages = "85--92",
    booktitle = "LocWeb 2008 - Proceedings of the 1st International Workshop on Location and the Web, in Conjunction with the WWW 2008 Conference",

    }

    TY - GEN

    T1 - Geographic web usage estimation by monitoring DNS caches

    AU - Akcan, Hüseyin

    AU - Suel, Torsten

    AU - Brönnimann, Hervé

    PY - 2008

    Y1 - 2008

    KW - DNS

    KW - web access monitoring

    KW - web site usage estimation

    UR - http://www.scopus.com/inward/record.url?scp=77954429438&partnerID=8YFLogxK

    UR - http://www.scopus.com/inward/citedby.url?scp=77954429438&partnerID=8YFLogxK

    U2 - 10.1145/1367798.1367813

    DO - 10.1145/1367798.1367813

    M3 - Conference contribution

    SN - 9781605581606

    VL - 300

    SP - 85

    EP - 92

    BT - LocWeb 2008 - Proceedings of the 1st International Workshop on Location and the Web, in Conjunction with the WWW 2008 Conference

    ER -