Extensible framework for data cleaning

Helena Galhardas, Daniela Florescu, Dennis Shasha, Eric Simon

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

Abstract

Data quality concerns arise when one wants to correct anomalies in a single data source, or to integrate data from multiple sources into a single new data source. The main quality problem is that the same real-world object may be modeled by several different data records. This is called the Object Identity Problem, and it may result from several factors. Software solutions known as data cleaning tools are designed to correct it. A new tool, called AJAX, is proposed whose main goal is to facilitate the specification and execution of data cleaning programs, either for a single source or for the integration of multiple data sources.
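To make the Object Identity Problem concrete, here is a minimal sketch that groups records which probably describe the same real-world object. It is not AJAX and not the authors' method. The normalization rules, the use of Python's standard-library `difflib.SequenceMatcher` as the similarity measure, the 0.85 threshold, and the names `normalize`, `similarity`, and `cluster_duplicates` are all illustrative assumptions.

```python
from difflib import SequenceMatcher
from itertools import combinations
import re


def normalize(name: str) -> str:
    """Lowercase, replace punctuation with spaces, collapse whitespace (assumed rules)."""
    name = re.sub(r"[^\w\s]", " ", name.lower())
    return " ".join(name.split())


def similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] computed by difflib's SequenceMatcher."""
    return SequenceMatcher(None, a, b).ratio()


def cluster_duplicates(records, threshold=0.85):
    """Group record indices whose normalized names are at least `threshold` similar.

    Uses union-find so matches are transitive: if a~b and b~c,
    then a, b and c end up in the same cluster.
    """
    parent = list(range(len(records)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    keys = [normalize(r["name"]) for r in records]
    # Compare every pair of records (quadratic, fine for a sketch).
    for i, j in combinations(range(len(records)), 2):
        if similarity(keys[i], keys[j]) >= threshold:
            parent[find(i)] = find(j)

    # Collect indices by their root; each cluster is one presumed real object.
    clusters = {}
    for i in range(len(records)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())


if __name__ == "__main__":
    # Hypothetical records from two sources; three of them denote the same person.
    records = [
        {"source": "A", "name": "John Smith"},
        {"source": "B", "name": "john smith."},
        {"source": "B", "name": "John Smithh"},
        {"source": "A", "name": "Mary Jones"},
    ]
    for cluster in cluster_duplicates(records):
        print([records[i]["name"] for i in cluster])
```

Running the script prints two clusters: one holding the three spellings of John Smith and one holding Mary Jones. Union-find keeps the sketch short, but transitivity can also chain dissimilar records together through intermediate matches, and the all-pairs comparison grows quadratically, so a larger cleaning program would likely need a cheaper candidate-selection step before matching.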

Original language: English (US)
Title of host publication: Proceedings - International Conference on Data Engineering
Publisher: IEEE
Pages: 312
Number of pages: 1
State: Published - 2000
Event: 2000 IEEE 16th International Conference on Data Engineering (ICDE'00) - San Diego, CA, USA
Duration: Feb 29 2000 to Mar 3 2000

ASJC Scopus subject areas

  • Software
  • Engineering (all)
  • Engineering (miscellaneous)

Cite this

Galhardas, H., Florescu, D., Shasha, D., & Simon, E. (2000). Extensible framework for data cleaning. In Proceedings - International Conference on Data Engineering (p. 312). IEEE.

@inproceedings{5600be2cf17c443dbf833c6e082a78cf,
title = "Extensible framework for data cleaning",
abstract = "Data quality concerns arise when one wants to correct anomalies in a single data source, or when one wants to integrate data coming from multiple sources into a single new data source. The main quality problem that arises is that the same real object is modeled by different data records. This is called the Object Identity Problem and may result from several factors. Correcting the Object Identity Problem is ensured by a set of software solutions called data cleaning tools. A new tool, called AJAX, is proposed whose main goal is to facilitate the specification and execution of data cleaning programs either for a single source or for integrating multiple data sources.",
author = "Helena Galhardas and Daniela Florescu and Dennis Shasha and Eric Simon",
year = "2000",
language = "English (US)",
pages = "312",
booktitle = "Proceedings - International Conference on Data Engineering",
publisher = "IEEE",
}

TY  - GEN
T1  - Extensible framework for data cleaning
AU  - Galhardas, Helena
AU  - Florescu, Daniela
AU  - Shasha, Dennis
AU  - Simon, Eric
PY  - 2000
Y1  - 2000
AB  - Data quality concerns arise when one wants to correct anomalies in a single data source, or when one wants to integrate data coming from multiple sources into a single new data source. The main quality problem that arises is that the same real object is modeled by different data records. This is called the Object Identity Problem and may result from several factors. Correcting the Object Identity Problem is ensured by a set of software solutions called data cleaning tools. A new tool, called AJAX, is proposed whose main goal is to facilitate the specification and execution of data cleaning programs either for a single source or for integrating multiple data sources.
UR  - http://www.scopus.com/inward/record.url?scp=0033891155&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=0033891155&partnerID=8YFLogxK
M3  - Conference contribution
SP  - 312
BT  - Proceedings - International Conference on Data Engineering
PB  - IEEE
ER  -