On the feasibility of peer-to-peer web indexing and search

Jinyang Li, Boon Thau Loo, Joseph M. Hellerstein, M. Frans Kaashoek, David R. Karger, Robert Morris

Research output: Contribution to journal (Article)

Abstract

This paper discusses the feasibility of peer-to-peer full-text keyword search of the Web. Two classes of keyword search techniques are in use or have been proposed: flooding of queries over an overlay network (as in Gnutella), and intersection of index lists stored in a distributed hash table. We present a simple feasibility analysis based on the resource constraints and search workload. Our study suggests that the peer-to-peer network does not have enough capacity to make naive use of either search technique attractive for Web search. The paper presents a number of existing and novel optimizations for P2P search based on distributed hash tables, estimates their effects on performance, and concludes that in combination these optimizations would bring the problem to within an order of magnitude of feasibility. The paper suggests a number of compromises that might achieve the last order of magnitude.
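The second technique named in the abstract, intersecting index lists stored in a distributed hash table, can be illustrated with a small sketch. This is not the authors' implementation: `ToyDHT`, `index_document`, and `search` are hypothetical names, the "network" is a list of in-memory dictionaries, and the cost model (counting document IDs shipped between index nodes) is an assumption made only for illustration.

```python
import hashlib


class ToyDHT:
    """In-memory stand-in for a distributed hash table (hypothetical)."""

    def __init__(self, num_nodes):
        # Each "node" is just a dict mapping a term to its posting list.
        self.nodes = [dict() for _ in range(num_nodes)]

    def _node_for(self, term):
        # Hash the term to pick the node responsible for its posting list.
        digest = hashlib.sha1(term.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.nodes)

    def append(self, term, doc_id):
        self.nodes[self._node_for(term)].setdefault(term, set()).add(doc_id)

    def get(self, term):
        return self.nodes[self._node_for(term)].get(term, set())


def index_document(dht, doc_id, text):
    """Add doc_id to the posting list of every distinct term in text."""
    for term in set(text.lower().split()):
        dht.append(term, doc_id)


def search(dht, query):
    """Naive multi-term search: intersect posting lists in query order.

    Returns (matching doc IDs, number of doc IDs shipped between index
    nodes). The candidate set is assumed to be sent from each term's
    node to the next term's node before intersecting there.
    """
    terms = query.lower().split()
    if not terms:
        return set(), 0
    result = set(dht.get(terms[0]))
    shipped = 0
    for term in terms[1:]:
        shipped += len(result)   # candidates sent to the next term's node
        result &= dht.get(term)  # intersect with that node's posting list
    return result, shipped
```

A short usage example under the same assumptions:

```python
dht = ToyDHT(num_nodes=4)
index_document(dht, 1, "peer to peer web search")
index_document(dht, 2, "web search engines")
index_document(dht, 3, "peer review")
matches, shipped = search(dht, "web search")
```

In this toy model the shipped count grows with the length of the first term's posting list, which is the kind of communication cost a feasibility analysis of the naive scheme has to account for.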

Original language: English (US)
Pages (from-to): 207-215
Number of pages: 9
Journal: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 2735
State: Published - 2003

Fingerprint

Peer to Peer
Workload
Indexing
Keyword Search
Overlay networks
Peer to peer networks
Resource Constraints
Optimization
Web Search
Flooding
Tables
Intersection
Query
Estimate

ASJC Scopus subject areas

  • Computer Science (all)
  • Biochemistry, Genetics and Molecular Biology (all)
  • Theoretical Computer Science

Cite this

On the feasibility of peer-to-peer web indexing and search. / Li, Jinyang; Loo, Boon Thau; Hellerstein, Joseph M.; Kaashoek, M. Frans; Karger, David R.; Morris, Robert.

In: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), Vol. 2735, 2003, p. 207-215.

Li, Jinyang ; Loo, Boon Thau ; Hellerstein, Joseph M. ; Kaashoek, M. Frans ; Karger, David R. ; Morris, Robert. / On the feasibility of peer-to-peer web indexing and search. In: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). 2003 ; Vol. 2735. pp. 207-215.
@article{0dfe45be2a2f4e0dbdfdec44a2333ddd,
title = "On the feasibility of peer-to-peer web indexing and search",
abstract = "This paper discusses the feasibility of peer-to-peer full-text keyword search of the Web. Two classes of keyword search techniques are in use or have been proposed: flooding of queries over an overlay network (as in Gnutella), and intersection of index lists stored in a distributed hash table. We present a simple feasibility analysis based on the resource constraints and search workload. Our study suggests that the peer-to-peer network does not have enough capacity to make naive use of either search technique attractive for Web search. The paper presents a number of existing and novel optimizations for P2P search based on distributed hash tables, estimates their effects on performance, and concludes that in combination these optimizations would bring the problem to within an order of magnitude of feasibility. The paper suggests a number of compromises that might achieve the last order of magnitude.",
author = "Jinyang Li and Loo, {Boon Thau} and Hellerstein, {Joseph M.} and Kaashoek, {M. Frans} and Karger, {David R.} and Robert Morris",
year = "2003",
language = "English (US)",
volume = "2735",
pages = "207--215",
journal = "Lecture Notes in Computer Science",
issn = "0302-9743",
publisher = "Springer Verlag",

}

TY - JOUR

T1 - On the feasibility of peer-to-peer web indexing and search

AU - Li, Jinyang

AU - Loo, Boon Thau

AU - Hellerstein, Joseph M.

AU - Kaashoek, M. Frans

AU - Karger, David R.

AU - Morris, Robert

PY - 2003

Y1 - 2003

AB - This paper discusses the feasibility of peer-to-peer full-text keyword search of the Web. Two classes of keyword search techniques are in use or have been proposed: flooding of queries over an overlay network (as in Gnutella), and intersection of index lists stored in a distributed hash table. We present a simple feasibility analysis based on the resource constraints and search workload. Our study suggests that the peer-to-peer network does not have enough capacity to make naive use of either search technique attractive for Web search. The paper presents a number of existing and novel optimizations for P2P search based on distributed hash tables, estimates their effects on performance, and concludes that in combination these optimizations would bring the problem to within an order of magnitude of feasibility. The paper suggests a number of compromises that might achieve the last order of magnitude.

UR - http://www.scopus.com/inward/record.url?scp=35248880268&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=35248880268&partnerID=8YFLogxK

M3 - Article

AN - SCOPUS:35248880268

VL - 2735

SP - 207

EP - 215

JO - Lecture Notes in Computer Science

JF - Lecture Notes in Computer Science

SN - 0302-9743

ER -