Capturing both Types and Constraints in Data Integration

Michael Benedikt, Chee Yong Chan, Wenfei Fan, Juliana Freire, Rajeev Rastogi

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution

Abstract

We propose a framework for integrating data from multiple relational sources into an XML document that both conforms to a given DTD and satisfies predefined XML constraints. The framework is based on a specification language, AIG, that extends a DTD by (1) associating element types with semantic attributes (inherited and synthesized, inspired by the corresponding notions from Attribute Grammars), (2) computing these attributes via parameterized SQL queries over multiple data sources, and (3) incorporating XML keys and inclusion constraints. The novelty of AIG consists in semantic attributes and their dependency relations for controlling context-dependent, DTD-directed construction of XML documents, as well as for checking XML constraints in parallel with document-generation. We also present cost-based optimization techniques for efficiently evaluating AIGs, including algorithms for merging queries and for scheduling queries on multiple data sources. This provides a new grammar-based approach for data integration under both syntactic and semantic constraints.
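The abstract describes semantic attributes, parameterized SQL queries, and constraint checking during generation. The following is a minimal, hypothetical Python sketch of that general idea; it is not the paper's AIG syntax or evaluation algorithm. It assumes an invented relational source (tables `course`, `student`, `enrolled`) and an invented target structure `db -> course*`, `course -> title, student*`. Each generator function plays the role of one production. The inherited attribute, a tuple handed down from the parent, supplies the parameter of a parameterized SQL query. The returned value is a synthesized attribute passed back up. The key on `course/@cno` is checked while the document is being built rather than afterwards. The names `make_source`, `gen_course`, `gen_db`, and `ConstraintViolation` are illustrative only.

```python
import sqlite3
import xml.etree.ElementTree as ET


class ConstraintViolation(Exception):
    """Raised as soon as an XML key is violated during generation."""


def make_source() -> sqlite3.Connection:
    # Hypothetical relational source; schema and rows are illustrative only.
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE course(cno TEXT, title TEXT);
        CREATE TABLE student(sid TEXT, name TEXT);
        CREATE TABLE enrolled(sid TEXT, cno TEXT);
        INSERT INTO course VALUES ('c1', 'Databases'), ('c2', 'Compilers');
        INSERT INTO student VALUES ('s1', 'Ann'), ('s2', 'Bob');
        INSERT INTO enrolled VALUES ('s1', 'c1'), ('s2', 'c1'), ('s1', 'c2');
        """
    )
    return conn


def gen_course(conn: sqlite3.Connection, parent: ET.Element, inh: tuple) -> str:
    """Production course -> title, student*; inherited attribute inh = (cno, title)."""
    cno, title = inh
    course = ET.SubElement(parent, "course", cno=cno)
    ET.SubElement(course, "title").text = title
    # Parameterized SQL query: its parameter is taken from the inherited attribute.
    rows = conn.execute(
        "SELECT s.sid, s.name FROM enrolled e JOIN student s ON e.sid = s.sid "
        "WHERE e.cno = ? ORDER BY s.sid",
        (cno,),
    ).fetchall()
    for sid, name in rows:
        ET.SubElement(course, "student", sid=sid).text = name
    # Synthesized attribute passed back to the parent: this subtree's key value.
    return cno


def gen_db(conn: sqlite3.Connection) -> ET.Element:
    """Production db -> course*; enforces the key course/@cno while generating."""
    root = ET.Element("db")
    seen: set[str] = set()
    for inh in conn.execute("SELECT cno, title FROM course ORDER BY cno").fetchall():
        key = gen_course(conn, root, inh)
        # Check the key incrementally instead of validating the finished document.
        if key in seen:
            raise ConstraintViolation(f"key course/@cno violated by {key!r}")
        seen.add(key)
    return root
```

`ET.tostring(gen_db(make_source()), encoding="unicode")` serializes the generated document. After inserting a second `('c1', ...)` row into `course`, `gen_db` raises `ConstraintViolation` as soon as the duplicate subtree has been produced. It does not wait until the whole document is built. A full AIG evaluator would also merge per-element queries and schedule them across multiple sources, which this sketch does not attempt.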

Original language: English (US)
Title of host publication: Proceedings of the ACM SIGMOD International Conference on Management of Data
Editors: A.Y. Halevy, Z.G. Ives, A.H. Doan
Pages: 277-288
Number of pages: 12
State: Published - 2003
Event: 2003 ACM SIGMOD International Conference on Management of Data, San Diego, CA, United States
Duration: Jun 9 2003 to Jun 12 2003


Fingerprint

Data integration
XML
Semantics
Specification languages
Syntactics
Merging
Scheduling
Costs

ASJC Scopus subject areas

  • Computer Science (all)

Cite this

Benedikt, M., Chan, C. Y., Fan, W., Freire, J., & Rastogi, R. (2003). Capturing both Types and Constraints in Data Integration. In A. Y. Halevy, Z. G. Ives, & A. H. Doan (Eds.), Proceedings of the ACM SIGMOD International Conference on Management of Data (pp. 277-288)

Capturing both Types and Constraints in Data Integration. / Benedikt, Michael; Chan, Chee Yong; Fan, Wenfei; Freire, Juliana; Rastogi, Rajeev.

Proceedings of the ACM SIGMOD International Conference on Management of Data. ed. / A.Y. Halevy; Z.G. Ives; A.H. Doan. 2003. p. 277-288.


Benedikt, M, Chan, CY, Fan, W, Freire, J & Rastogi, R 2003, Capturing both Types and Constraints in Data Integration. in AY Halevy, ZG Ives & AH Doan (eds), Proceedings of the ACM SIGMOD International Conference on Management of Data. pp. 277-288, 2003 ACM SIGMOD International Conference on Management of Data, San Diego, CA, United States, 6/9/03.
Benedikt M, Chan CY, Fan W, Freire J, Rastogi R. Capturing both Types and Constraints in Data Integration. In Halevy AY, Ives ZG, Doan AH, editors, Proceedings of the ACM SIGMOD International Conference on Management of Data. 2003. p. 277-288
Benedikt, Michael ; Chan, Chee Yong ; Fan, Wenfei ; Freire, Juliana ; Rastogi, Rajeev. / Capturing both Types and Constraints in Data Integration. Proceedings of the ACM SIGMOD International Conference on Management of Data. editor / A.Y. Halevy ; Z.G. Ives ; A.H. Doan. 2003. pp. 277-288
@inproceedings{3f056ce2aef3479090a22f208387ef2e,
title = "Capturing both Types and Constraints in Data Integration",
abstract = "We propose a framework for integrating data from multiple relational sources into an XML document that both conforms to a given DTD and satisfies predefined XML constraints. The framework is based on a specification language, AIG, that extends a DTD by (1) associating element types with semantic attributes (inherited and synthesized, inspired by the corresponding notions from Attribute Grammars), (2) computing these attributes via parameterized SQL queries over multiple data sources, and (3) incorporating XML keys and inclusion constraints. The novelty of AIG consists in semantic attributes and their dependency relations for controlling context-dependent, DTD-directed construction of XML documents, as well as for checking XML constraints in parallel with document-generation. We also present cost-based optimization techniques for efficiently evaluating AIGs, including algorithms for merging queries and for scheduling queries on multiple data sources. This provides a new grammar-based approach for data integration under both syntactic and semantic constraints.",
author = "Michael Benedikt and Chan, {Chee Yong} and Wenfei Fan and Juliana Freire and Rajeev Rastogi",
year = "2003",
language = "English (US)",
pages = "277--288",
editor = "A. Y. Halevy and Z. G. Ives and A. H. Doan",
booktitle = "Proceedings of the ACM SIGMOD International Conference on Management of Data",

}

TY  - GEN
T1  - Capturing both Types and Constraints in Data Integration
AU  - Benedikt, Michael
AU  - Chan, Chee Yong
AU  - Fan, Wenfei
AU  - Freire, Juliana
AU  - Rastogi, Rajeev
PY  - 2003
Y1  - 2003
N2  - We propose a framework for integrating data from multiple relational sources into an XML document that both conforms to a given DTD and satisfies predefined XML constraints. The framework is based on a specification language, AIG, that extends a DTD by (1) associating element types with semantic attributes (inherited and synthesized, inspired by the corresponding notions from Attribute Grammars), (2) computing these attributes via parameterized SQL queries over multiple data sources, and (3) incorporating XML keys and inclusion constraints. The novelty of AIG consists in semantic attributes and their dependency relations for controlling context-dependent, DTD-directed construction of XML documents, as well as for checking XML constraints in parallel with document-generation. We also present cost-based optimization techniques for efficiently evaluating AIGs, including algorithms for merging queries and for scheduling queries on multiple data sources. This provides a new grammar-based approach for data integration under both syntactic and semantic constraints.
AB  - We propose a framework for integrating data from multiple relational sources into an XML document that both conforms to a given DTD and satisfies predefined XML constraints. The framework is based on a specification language, AIG, that extends a DTD by (1) associating element types with semantic attributes (inherited and synthesized, inspired by the corresponding notions from Attribute Grammars), (2) computing these attributes via parameterized SQL queries over multiple data sources, and (3) incorporating XML keys and inclusion constraints. The novelty of AIG consists in semantic attributes and their dependency relations for controlling context-dependent, DTD-directed construction of XML documents, as well as for checking XML constraints in parallel with document-generation. We also present cost-based optimization techniques for efficiently evaluating AIGs, including algorithms for merging queries and for scheduling queries on multiple data sources. This provides a new grammar-based approach for data integration under both syntactic and semantic constraints.
UR  - http://www.scopus.com/inward/record.url?scp=1142303689&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=1142303689&partnerID=8YFLogxK
M3  - Conference contribution
AN  - SCOPUS:1142303689
SP  - 277
EP  - 288
BT  - Proceedings of the ACM SIGMOD International Conference on Management of Data
A2  - Halevy, A.Y.
A2  - Ives, Z.G.
A2  - Doan, A.H.
ER  - 