A Layered Architecture for Querying Dynamic Web Content

Hasan Davulcu, Juliana Freire, Michael Kifer, I. V. Ramakrishnan

Research output: Contribution to journal › Article

Abstract

The design of webbases, database systems for supporting Web-based applications, is currently an active area of research. In this paper, we propose a 3-layer architecture for designing and implementing webbases for querying dynamic Web content (i.e., data that can only be extracted by filling out multiple forms). The lowest layer, the virtual physical layer, provides navigation independence by shielding the user from the complexities associated with retrieving data from raw Web sources. Next, the traditional logical layer supports site independence. The top layer is analogous to the external schema layer in traditional databases. Within this architectural framework we address two problems unique to webbases: retrieving dynamic Web content in the virtual physical layer and querying of the external schema by the end user. The layered architecture makes it possible to automate data extraction to a much greater degree than in existing proposals. Wrappers for the virtual physical schema can be created semiautomatically by asking the webbase designer to navigate through the sites of interest; we call this approach mapping by example. Thus, the webbase designer need not have expertise in the language that maps the physical schema to the raw Web (this should be contrasted with other approaches, which require expertise in various Web-enabled flavors of SQL). For the external schema layer, we propose a semantic extension of the universal relation interface. This interface provides powerful, yet reasonably simple, ad hoc querying capabilities for the end user, compared with the currently prevailing "canned" form-based interfaces on the one hand or complex Web-enabling extensions of SQL on the other. Finally, we discuss the implementation of the proposed architecture.
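
The three layers described in the abstract can be pictured with a small sketch. This is only an illustration under stated assumptions, not the authors' implementation: the car-advertisement data, the two in-memory "sites", and the names `wrap_site_a`, `wrap_site_b`, `car_ads`, and `query` are all hypothetical, and the wrappers are written by hand here, whereas the paper proposes generating them semiautomatically through mapping by example.

```python
from typing import Dict, List, Optional, Sequence

Row = Dict[str, str]

# Hypothetical raw sources. A real site would sit behind several HTML forms;
# in-memory lists keep this sketch runnable without network access.
SITE_A: List[Row] = [{"make": "ford", "model": "escort", "price": "5000"}]
SITE_B: List[Row] = [
    {"make": "ford", "model": "escort", "cost": "4500"},
    {"make": "honda", "model": "civic", "cost": "7000"},
]


# Virtual physical layer (assumed shape): one wrapper per site hides
# the navigation needed to reach the data, i.e. navigation independence.
def wrap_site_a(make: str) -> List[Row]:
    """Stands in for filling out site A's forms and scraping the result."""
    return [dict(r) for r in SITE_A if r["make"] == make]


def wrap_site_b(make: str) -> List[Row]:
    """Same idea for site B, which names the price field differently."""
    return [
        {"make": r["make"], "model": r["model"], "price": r["cost"]}
        for r in SITE_B
        if r["make"] == make
    ]


# Logical layer: a single relation that no longer depends on which
# site a row came from, i.e. site independence.
def car_ads(make: str) -> List[Row]:
    rows: List[Row] = []
    for source, wrapper in (("A", wrap_site_a), ("B", wrap_site_b)):
        for r in wrapper(make):
            rows.append({**r, "source": source})
    return rows


# External layer: the user names attributes and conditions only, loosely in
# the spirit of a universal relation interface (a simplification).
def query(attributes: Sequence[str], make: str,
          max_price: Optional[int] = None) -> List[Row]:
    result: List[Row] = []
    for r in car_ads(make):
        if max_price is not None and int(r["price"]) > max_price:
            continue
        result.append({a: r[a] for a in attributes})
    return result


if __name__ == "__main__":
    print(query(["model", "price"], make="ford", max_price=4800))
```

In this sketch the end user calls only `query` and never sees site names, form fields, or the differing `price` and `cost` column names; those details stay inside the two lower layers.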

Original language: English (US)
Pages (from-to): 491-502
Number of pages: 12
Journal: SIGMOD Record (ACM Special Interest Group on Management of Data)
Volume: 28
Issue number: 2
State: Published - Jun 1999

ASJC Scopus subject areas

  • Computer Graphics and Computer-Aided Design
  • Information Systems
  • Software

Cite this

A Layered Architecture for Querying Dynamic Web Content. / Davulcu, Hasan; Freire, Juliana; Kifer, Michael; Ramakrishnan, I. V.

In: SIGMOD Record (ACM Special Interest Group on Management of Data), Vol. 28, No. 2, 06.1999, p. 491-502.

@article{1ae60352e2ab4334909f04036b3ced6e,
title = "A Layered Architecture for Querying Dynamic Web Content",
author = "Hasan Davulcu and Juliana Freire and Michael Kifer and Ramakrishnan, {I. V.}",
year = "1999",
month = "6",
language = "English (US)",
volume = "28",
pages = "491--502",
journal = "SIGMOD Record",
issn = "0163-5808",
publisher = "Association for Computing Machinery (ACM)",
number = "2",

}

TY - JOUR

T1 - A Layered Architecture for Querying Dynamic Web Content

AU - Davulcu, Hasan

AU - Freire, Juliana

AU - Kifer, Michael

AU - Ramakrishnan, I. V.

PY - 1999/6

Y1 - 1999/6

UR - http://www.scopus.com/inward/record.url?scp=0345870306&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0345870306&partnerID=8YFLogxK

M3 - Article

VL - 28

SP - 491

EP - 502

JO - SIGMOD Record

JF - SIGMOD Record

SN - 0163-5808

IS - 2

ER -