Local methods for estimating PageRank values

Yen Yu Chen, Qingqing Gan, Torsten Suel

    Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

    Abstract

    The Google search engine uses a method called PageRank, together with term-based and other ranking techniques, to order search results returned to the user. PageRank uses link analysis to assign a global importance score to each web page. The PageRank scores of all the pages are usually determined off-line in a large-scale computation on the entire hyperlink graph of the web, and several recent studies have focused on improving the efficiency of this computation, which may require multiple hours on a workstation. However, in some scenarios, such as online analysis of link evolution and mining of large web archives such as the Internet Archive, it may be desirable to quickly approximate or update the PageRanks of individual nodes without performing a large-scale computation on the entire graph. We address this problem by studying several methods for efficiently estimating the PageRank score of a particular web page using only a small subgraph of the entire web. In our model, we assume that the graph is accessible remotely via a link database (such as the AltaVista Connectivity Server) or is stored in a relational database that performs lookups on disks to retrieve node and connectivity information. We show that a reasonable estimate of the PageRank value of a node is possible in most cases by retrieving only a moderate number of nodes in the local neighborhood of the node.
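
The abstract does not spell out the estimation methods themselves, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It assumes a hypothetical link store exposed as two dictionaries (`in_links` and `out_degree`, standing in for lookups against a link database), expands backward from the target along in-links to a fixed depth, holds the boundary nodes at an assumed uniform score of 1/N, and iterates the standard PageRank equation on the fetched subgraph only. The function name, parameters, and the boundary assumption are all illustrative.

```python
from collections import deque


def local_pagerank_estimate(target, in_links, out_degree, n_total,
                            depth=2, damping=0.85, iters=50):
    """Estimate the PageRank of `target` from its in-link neighborhood.

    in_links:   dict node -> iterable of nodes linking to it (hypothetical
                stand-in for a link-database lookup).
    out_degree: dict node -> number of out-links of that node.
    n_total:    assumed total number of nodes in the full graph.
    """
    # Breadth-first expansion along in-links, up to `depth` hops.
    dist = {target: 0}
    queue = deque([target])
    while queue:
        v = queue.popleft()
        if dist[v] == depth:
            continue  # boundary node: do not fetch its in-links
        for u in in_links.get(v, ()):
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)

    base = 1.0 / n_total
    # Assumption: every node starts at the uniform score 1/N.
    rank = {v: base for v in dist}

    for _ in range(iters):
        new_rank = {}
        for v in dist:
            if dist[v] == depth:
                # Boundary nodes keep the assumed uniform score, since
                # their own in-links were never retrieved.
                new_rank[v] = rank[v]
                continue
            incoming = sum(rank[u] / out_degree[u]
                           for u in in_links.get(v, ()))
            new_rank[v] = (1.0 - damping) * base + damping * incoming
        rank = new_rank

    return rank[target]


# Toy graph (illustrative only): a -> c, b -> c, c -> a.
toy_in_links = {"c": ["a", "b"], "a": ["c"], "b": []}
toy_out_degree = {"a": 1, "b": 1, "c": 1}
estimate = local_pagerank_estimate("c", toy_in_links, toy_out_degree,
                                   n_total=3, depth=1)
```

In this sketch, a larger `depth` retrieves more of the neighborhood and reduces the influence of the uniform boundary assumption, at the cost of more lookups; how the trade-off behaves on real web graphs is what the paper evaluates.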

    Original language: English (US)
    Title of host publication: CIKM 2004: Proceedings of the Thirteenth ACM Conference on Information and Knowledge Management
    Editors: D.A. Evans, L. Gravano, O. Herzog, C. Zhai, M. Ronthaler
    Pages: 381-389
    Number of pages: 9
    State: Published - 2004
    Event: CIKM 2004: Proceedings of the Thirteenth ACM Conference on Information and Knowledge Management - Washington, DC, United States
    Duration: Nov 8 2004 to Nov 13 2004

    Other

    Other: CIKM 2004: Proceedings of the Thirteenth ACM Conference on Information and Knowledge Management
    Country: United States
    City: Washington, DC
    Period: 11/8/04 to 11/13/04

    Fingerprint

    World Wide Web
    PageRank
    Node
    Graph
    Connectivity
    Search engine
    Scenarios
    Link analysis
    Google
    Ranking
    Relational database
    Data base

    Keywords

    • External memory algorithms
    • Link database
    • Link-based ranking
    • Out-of-core
    • Pagerank
    • Search engines

    ASJC Scopus subject areas

    • Business, Management and Accounting (all)

    Cite this

    Chen, Y. Y., Gan, Q., & Suel, T. (2004). Local methods for estimating PageRank values. In D. A. Evans, L. Gravano, O. Herzog, C. Zhai, & M. Ronthaler (Eds.), CIKM 2004: Proceedings of the Thirteenth ACM Conference on Information and Knowledge Management (pp. 381-389)

    @inproceedings{a5891f35a5704fcf8e0a8c28626ec879,
    title = "Local methods for estimating PageRank values",
    keywords = "External memory algorithms, Link database, Link-based ranking, Out-of-core, Pagerank, Search engines",
    author = "Chen, {Yen Yu} and Qingqing Gan and Torsten Suel",
    year = "2004",
    language = "English (US)",
    pages = "381--389",
    editor = "D.A. Evans and L. Gravano and O. Herzog and C. Zhai and M. Ronthaler",
    booktitle = "CIKM 2004: Proceedings of the Thirteenth ACM Conference on Information and Knowledge Management",

    }

    TY - GEN

    T1 - Local methods for estimating PageRank values

    AU - Chen, Yen Yu

    AU - Gan, Qingqing

    AU - Suel, Torsten

    PY - 2004

    Y1 - 2004

    KW - External memory algorithms

    KW - Link database

    KW - Link-based ranking

    KW - Out-of-core

    KW - Pagerank

    KW - Search engines

    UR - http://www.scopus.com/inward/record.url?scp=18744409364&partnerID=8YFLogxK

    UR - http://www.scopus.com/inward/citedby.url?scp=18744409364&partnerID=8YFLogxK

    M3 - Conference contribution

    SP - 381

    EP - 389

    BT - CIKM 2004: Proceedings of the Thirteenth ACM Conference on Information and Knowledge Management

    A2 - Evans, D.A.

    A2 - Gravano, L.

    A2 - Herzog, O.

    A2 - Zhai, C.

    A2 - Ronthaler, M.

    ER -