Querying structured information sources on the Web

Sergio Mergen, Juliana Freire, Carlos Alberto Heuser

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

Abstract

To provide access to distributed and heterogeneous sources, information integration systems have traditionally relied on the availability of a mediated schema, along with mappings between this schema and the schema of the information sources. Queries posed to the mediated schema are then reformulated in terms of the source schemas. On the Web, where sources are plentiful, autonomous and extremely volatile, a system based on the existence of a pre-defined mediated schema and mapping information presents several drawbacks. Notably, the cost of keeping the mappings up to date as new sources are found or existing sources change can be prohibitively high. In this paper, we propose a new querying mechanism for integrating a large number of sources that requires neither a mediated schema nor source mappings. In the absence of a mediated schema, the user formulates queries based on what she expects to find. These queries are rewritten using a best-effort approach: the rewriting component compares a user query against the source schemas and produces a set of rewritings based on the matches found. We demonstrate the feasibility of this approach by providing a query interface for integrating hundreds of (real) structured Web information sources. We also discuss experimental results which indicate that our query rewriting algorithm can be effective.
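The abstract describes the rewriting step only at a high level: a user query is compared against the source schemas, and a set of rewritings is produced from the matches found. The following is a minimal, hypothetical Python sketch of one way such best-effort rewriting could work. It is not the authors' algorithm. It assumes that each source is described only by a flat list of attribute names and that matching is done with plain string similarity from the standard-library `difflib`. The names `Source`, `similarity`, `rewrite`, the 0.6 threshold, and the example sources are all illustrative assumptions.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Source:
    """A structured Web source described only by its attribute names (hypothetical model)."""
    name: str
    attributes: list[str]


def similarity(a: str, b: str) -> float:
    # Case-insensitive string similarity in [0, 1]; a stand-in for a real schema matcher.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def rewrite(query_attrs: list[str], sources: list[Source], threshold: float = 0.6):
    """Hypothetical best-effort rewriting: map each query attribute to the most similar
    attribute of each source, keep sources with at least one match, and rank them by
    how many query attributes they cover."""
    rewritings = []
    for src in sources:
        mapping = {}
        for q in query_attrs:
            best = max(src.attributes, key=lambda a: similarity(q, a), default=None)
            if best is not None and similarity(q, best) >= threshold:
                mapping[q] = best
        if mapping:  # partial rewritings are kept rather than discarded
            rewritings.append((src.name, mapping))
    # Sources covering more of the user's query come first (sorting is stable).
    return sorted(rewritings, key=lambda r: len(r[1]), reverse=True)


# Illustrative, made-up sources; the user queries with the attributes she expects to find.
sources = [
    Source("books1", ["Title", "Author", "Price"]),
    Source("books2", ["book_title", "writer"]),
    Source("cars", ["make", "model"]),
]
print(rewrite(["title", "author"], sources))
# [('books1', {'title': 'Title', 'author': 'Author'}), ('books2', {'title': 'book_title'})]
```

Keeping partial matches (as for `books2`) rather than requiring every query attribute to be covered is one way to read "best-effort"; a stricter variant would drop any source that cannot cover the whole query.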

Original language: English (US)
Title of host publication: Proceedings of the 10th International Conference on Information Integration and Web-based Applications and Services, iiWAS 2008
Pages: 470-476
Number of pages: 7
DOIs: https://doi.org/10.1145/1497308.1497394
State: Published - 2008
Event: 10th International Conference on Information Integration and Web-based Applications and Services, iiWAS 2008, Linz, Austria
Duration: Nov 24, 2008 to Nov 26, 2008



ASJC Scopus subject areas

  • Computer Networks and Communications
  • Computer Science Applications

Cite this

Mergen, S., Freire, J., & Heuser, C. A. (2008). Querying structured information sources on the Web. In Proceedings of the 10th International Conference on Information Integration and Web-based Applications and Services, iiWAS 2008 (pp. 470-476). https://doi.org/10.1145/1497308.1497394

Querying structured information sources on the Web. / Mergen, Sergio; Freire, Juliana; Heuser, Carlos Alberto.

Proceedings of the 10th International Conference on Information Integration and Web-based Applications and Services, iiWAS 2008. 2008. p. 470-476.

Mergen, S, Freire, J & Heuser, CA 2008, Querying structured information sources on the Web. in Proceedings of the 10th International Conference on Information Integration and Web-based Applications and Services, iiWAS 2008. pp. 470-476, 10th International Conference on Information Integration and Web-based Applications and Services, iiWAS 2008, Linz, Austria, 11/24/08. https://doi.org/10.1145/1497308.1497394
Mergen S, Freire J, Heuser CA. Querying structured information sources on the Web. In Proceedings of the 10th International Conference on Information Integration and Web-based Applications and Services, iiWAS 2008. 2008. p. 470-476. https://doi.org/10.1145/1497308.1497394
Mergen, Sergio ; Freire, Juliana ; Heuser, Carlos Alberto. / Querying structured information sources on the Web. Proceedings of the 10th International Conference on Information Integration and Web-based Applications and Services, iiWAS 2008. 2008. pp. 470-476
@inproceedings{42b7a42c3ab643d69981ae2d8738d273,
title = "Querying structured information sources on the Web",
abstract = "To provide access to distributed and heterogeneous sources, information integration systems have traditionally relied on the availability of a mediated schema, along with mappings between this schema and the schema of the information sources. Queries posed to the mediated schema are then reformulated in terms of the source schemas. On the Web, where sources are plentiful, autonomous and extremely volatile, a system based on the existence of a pre-defined mediated schema and mapping information presents several drawbacks. Notably, the cost of keeping the mappings up to date as new sources are found or existing sources change can be prohibitively high. In this paper, we propose a new querying mechanism for integrating a large number of sources that requires neither a mediated schema nor source mappings. In the absence of a mediated schema, the user formulates queries based on what she expects to find. These queries are rewritten using a best-effort approach: the rewriting component compares a user query against the source schemas and produces a set of rewritings based on the matches found. We demonstrate the feasibility of this approach by providing a query interface for integrating hundreds of (real) structured Web information sources. We also discuss experimental results which indicate that our query rewriting algorithm can be effective.",
author = "Sergio Mergen and Juliana Freire and Heuser, {Carlos Alberto}",
year = "2008",
doi = "10.1145/1497308.1497394",
language = "English (US)",
isbn = "9781605583495",
pages = "470--476",
booktitle = "Proceedings of the 10th International Conference on Information Integration and Web-based Applications and Services, iiWAS 2008",

}

TY  - GEN
T1  - Querying structured information sources on the Web
AU  - Mergen, Sergio
AU  - Freire, Juliana
AU  - Heuser, Carlos Alberto
PY  - 2008
Y1  - 2008
N2  - To provide access to distributed and heterogeneous sources, information integration systems have traditionally relied on the availability of a mediated schema, along with mappings between this schema and the schema of the information sources. Queries posed to the mediated schema are then reformulated in terms of the source schemas. On the Web, where sources are plentiful, autonomous and extremely volatile, a system based on the existence of a pre-defined mediated schema and mapping information presents several drawbacks. Notably, the cost of keeping the mappings up to date as new sources are found or existing sources change can be prohibitively high. In this paper, we propose a new querying mechanism for integrating a large number of sources that requires neither a mediated schema nor source mappings. In the absence of a mediated schema, the user formulates queries based on what she expects to find. These queries are rewritten using a best-effort approach: the rewriting component compares a user query against the source schemas and produces a set of rewritings based on the matches found. We demonstrate the feasibility of this approach by providing a query interface for integrating hundreds of (real) structured Web information sources. We also discuss experimental results which indicate that our query rewriting algorithm can be effective.
AB  - To provide access to distributed and heterogeneous sources, information integration systems have traditionally relied on the availability of a mediated schema, along with mappings between this schema and the schema of the information sources. Queries posed to the mediated schema are then reformulated in terms of the source schemas. On the Web, where sources are plentiful, autonomous and extremely volatile, a system based on the existence of a pre-defined mediated schema and mapping information presents several drawbacks. Notably, the cost of keeping the mappings up to date as new sources are found or existing sources change can be prohibitively high. In this paper, we propose a new querying mechanism for integrating a large number of sources that requires neither a mediated schema nor source mappings. In the absence of a mediated schema, the user formulates queries based on what she expects to find. These queries are rewritten using a best-effort approach: the rewriting component compares a user query against the source schemas and produces a set of rewritings based on the matches found. We demonstrate the feasibility of this approach by providing a query interface for integrating hundreds of (real) structured Web information sources. We also discuss experimental results which indicate that our query rewriting algorithm can be effective.
UR  - http://www.scopus.com/inward/record.url?scp=70349116794&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=70349116794&partnerID=8YFLogxK
U2  - 10.1145/1497308.1497394
DO  - 10.1145/1497308.1497394
M3  - Conference contribution
AN  - SCOPUS:70349116794
SN  - 9781605583495
SP  - 470
EP  - 476
BT  - Proceedings of the 10th International Conference on Information Integration and Web-based Applications and Services, iiWAS 2008
ER  - 