Automatically extracting form labels

Hoa Nguyen, Eun Yong Kang, Juliana Freire

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

We describe a machine-learning-based approach for extracting attribute labels from Web form interfaces. Having these labels is a requirement for several techniques that attempt to retrieve and integrate data that reside in online databases and that are hidden behind form interfaces, including schema matching and clustering, and hidden-Web crawlers. Whereas previous approaches to this problem have relied on heuristics and manually specified extraction rules, our technique makes use of learning classifiers to identify form labels. Our preliminary experiments show this approach is promising and has high accuracy.

Original language: English (US)
Title of host publication: Proceedings of the 2008 IEEE 24th International Conference on Data Engineering, ICDE'08
Pages: 1498-1500
Number of pages: 3
DOI: https://doi.org/10.1109/ICDE.2008.4497602
State: Published - 2008
Event: 2008 IEEE 24th International Conference on Data Engineering, ICDE'08 - Cancun, Mexico
Duration: Apr 7 2008 → Apr 12 2008

Other

Other: 2008 IEEE 24th International Conference on Data Engineering, ICDE'08
Country: Mexico
City: Cancun
Period: 4/7/08 → 4/12/08

Fingerprint

Labels
Learning systems
Classifiers
Experiments

ASJC Scopus subject areas

  • Information Systems
  • Signal Processing
  • Software

Cite this

Nguyen, H., Kang, E. Y., & Freire, J. (2008). Automatically extracting form labels. In Proceedings of the 2008 IEEE 24th International Conference on Data Engineering, ICDE'08 (pp. 1498-1500). [4497602] https://doi.org/10.1109/ICDE.2008.4497602

@inproceedings{95db47aee86340cea7eeba1540f003e3,
title = "Automatically extracting form labels",
abstract = "We describe a machine-learning-based approach for extracting attribute labels from Web form interfaces. Having these labels is a requirement for several techniques that attempt to retrieve and integrate data that reside in online databases and that are hidden behind form interfaces, including schema matching and clustering, and hidden-Web crawlers. Whereas previous approaches to this problem have relied on heuristics and manually specified extraction rules, our technique makes use of learning classifiers to identify form labels. Our preliminary experiments show this approach is promising and has high accuracy.",
author = "Hoa Nguyen and Kang, {Eun Yong} and Juliana Freire",
year = "2008",
doi = "10.1109/ICDE.2008.4497602",
language = "English (US)",
isbn = "9781424418374",
pages = "1498--1500",
booktitle = "Proceedings of the 2008 IEEE 24th International Conference on Data Engineering, ICDE'08",
}

TY - GEN

T1 - Automatically extracting form labels

AU - Nguyen, Hoa

AU - Kang, Eun Yong

AU - Freire, Juliana

PY - 2008

Y1 - 2008

N2 - We describe a machine-learning-based approach for extracting attribute labels from Web form interfaces. Having these labels is a requirement for several techniques that attempt to retrieve and integrate data that reside in online databases and that are hidden behind form interfaces, including schema matching and clustering, and hidden-Web crawlers. Whereas previous approaches to this problem have relied on heuristics and manually specified extraction rules, our technique makes use of learning classifiers to identify form labels. Our preliminary experiments show this approach is promising and has high accuracy.

UR - http://www.scopus.com/inward/record.url?scp=52649109075&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=52649109075&partnerID=8YFLogxK

U2 - 10.1109/ICDE.2008.4497602

DO - 10.1109/ICDE.2008.4497602

M3 - Conference contribution

SN - 9781424418374

SP - 1498

EP - 1500

BT - Proceedings of the 2008 IEEE 24th International Conference on Data Engineering, ICDE'08

ER -