Looking at both the present and the past to efficiently update replicas of Web content

Luciano Barbosa, Ana Carolina Salgado, Francisco De Carvalho, Jacques Robin, Juliana Freire

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Since Web sites are autonomous and independently updated, applications that keep replicas of Web data, such as Web warehouses and search engines, must periodically poll the sites and check for changes. Since this is a resource-intensive task, in order to keep the copies up-to-date, it is important to devise efficient update schedules that adapt to the change rate of the pages and avoid visiting pages not modified since the last visit. In this paper, we propose a new approach that learns to predict the change behavior of Web pages based both on the static features and change history of pages, and refreshes the copies accordingly. Experiments using real-world data show that our technique leads to substantial performance improvements compared to previously proposed approaches.
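As a rough illustration of a refresh schedule that adapts to a page's change rate, here is a minimal Python sketch. It is not the paper's method: the abstract does not describe the learning algorithm, so the smoothed-frequency estimate, the function names, and all parameters below are assumptions made for illustration. The `prior` argument is a hypothetical stand-in for what a model trained on a page's static features might predict before any change history is available, and visits are assumed to be one day apart.

```python
def estimated_change_rate(history, prior=0.5, prior_weight=2.0):
    """Smoothed fraction of past visits on which the page had changed.

    history: list of 0/1 flags, one per past visit (1 = page had changed).
    prior: assumed change probability when no history exists (hypothetical
           placeholder for a prediction based on static page features).
    prior_weight: how many pseudo-visits the prior counts for.
    """
    return (sum(history) + prior * prior_weight) / (len(history) + prior_weight)


def revisit_interval(history, prior=0.5, min_days=1.0, max_days=30.0):
    """Days to wait before the next visit, assuming daily past visits.

    The expected time between changes is approximated as 1 / rate and
    clamped to [min_days, max_days] so no page is polled too often or
    left unvisited indefinitely.
    """
    rate = estimated_change_rate(history, prior=prior)
    return max(min_days, min(max_days, 1.0 / rate))


if __name__ == "__main__":
    frequently_changing = [1, 1, 1, 1]
    rarely_changing = [0, 0, 0, 0, 0, 0, 0, 0]
    print(revisit_interval(frequently_changing))
    print(revisit_interval(rarely_changing))
    print(revisit_interval([]))  # no history: falls back to the prior
```

Under these assumptions, a page that changed on every past visit is revisited sooner than one that never changed, which is the adaptive behavior the abstract argues for; a real system would replace the frequency estimate with a learned predictor.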

Original language: English (US)
Title of host publication: WIDM 2005 - Proceedings of the 7th ACM International Workshop on Web Information and Data Management, Co-located with CIKM 2005
Pages: 75-80
Number of pages: 6
DOI: https://doi.org/10.1145/1097047.1097062
State: Published - 2005
Event: 7th ACM International Workshop on Web Information and Data Management, WIDM 2005, Held in Conjunction with the International Conference on Information and Knowledge Management, CIKM 2005 - Bremen, Germany
Duration: Nov 5 2005 to Nov 5 2005

Fingerprint

World Wide Web
Websites
Warehouses
Search engines
Experiments

Keywords

  • Indexing update
  • Machine learning
  • Update policy

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Information Systems
  • Software

Cite this

Barbosa, L., Salgado, A. C., De Carvalho, F., Robin, J., & Freire, J. (2005). Looking at both the present and the past to efficiently update replicas of Web content. In WIDM 2005 - Proceedings of the 7th ACM International Workshop on Web Information and Data Management, Co-located with CIKM 2005 (pp. 75-80) https://doi.org/10.1145/1097047.1097062

@inproceedings{71de7e295d1841fbbed138afa1d42048,
title = "Looking at both the present and the past to efficiently update replicas of Web content",
keywords = "Indexing update, Machine learning, Update policy",
author = "Luciano Barbosa and Salgado, {Ana Carolina} and {De Carvalho}, Francisco and Jacques Robin and Juliana Freire",
year = "2005",
doi = "10.1145/1097047.1097062",
language = "English (US)",
isbn = "1595931945",
pages = "75--80",
booktitle = "WIDM 2005 - Proceedings of the 7th ACM International Workshop on Web Information and Data Management, Co-located with CIKM 2005",

}

TY - GEN
T1 - Looking at both the present and the past to efficiently update replicas of Web content
AU - Barbosa, Luciano
AU - Salgado, Ana Carolina
AU - De Carvalho, Francisco
AU - Robin, Jacques
AU - Freire, Juliana
PY - 2005
Y1 - 2005
KW - Indexing update
KW - Machine learning
KW - Update policy
UR - http://www.scopus.com/inward/record.url?scp=63449098887&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=63449098887&partnerID=8YFLogxK
U2 - 10.1145/1097047.1097062
DO - 10.1145/1097047.1097062
M3 - Conference contribution
SN - 1595931945
SN - 9781595931948
SP - 75
EP - 80
BT - WIDM 2005 - Proceedings of the 7th ACM International Workshop on Web Information and Data Management, Co-located with CIKM 2005
ER -