Creating and exploring web form repositories

Luciano Barbosa, Hoa Nguyen, Thanh Nguyen, Ramesh Pinnamaneni, Juliana Freire

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

We present DeepPeep (http://www.deeppeep.org), a new system for discovering, organizing and analyzing Web forms. DeepPeep allows users to explore the entry points to hidden-Web sites whose contents are out of reach for traditional search engines. Besides demonstrating important features of DeepPeep and describing the infrastructure we used to build the system, we will show how this infrastructure can be used to create form collections and form search engines for different domains. We also present the analysis component of DeepPeep which allows users to explore and visualize information in form repositories, helping them not only to better search and understand forms in different domains, but also to refine the form gathering process.

Original language: English (US)
Title of host publication: Proceedings of the 2010 International Conference on Management of Data, SIGMOD '10
Pages: 1175-1177
Number of pages: 3
DOIs: https://doi.org/10.1145/1807167.1807311
State: Published - 2010
Event: 2010 International Conference on Management of Data, SIGMOD '10 - Indianapolis, IN, United States
Duration: Jun 6 2010 to Jun 11 2010

Other

Other: 2010 International Conference on Management of Data, SIGMOD '10
Country: United States
City: Indianapolis, IN
Period: 6/6/10 to 6/11/10

Fingerprint

Search engines
Websites

Keywords

  • focused crawling
  • hidden web
  • learning classifiers
  • search engines
  • web forms

ASJC Scopus subject areas

  • Software
  • Information Systems

Cite this

Barbosa, L., Nguyen, H., Nguyen, T., Pinnamaneni, R., & Freire, J. (2010). Creating and exploring web form repositories. In Proceedings of the 2010 International Conference on Management of Data, SIGMOD '10 (pp. 1175-1177). https://doi.org/10.1145/1807167.1807311

Creating and exploring web form repositories. / Barbosa, Luciano; Nguyen, Hoa; Nguyen, Thanh; Pinnamaneni, Ramesh; Freire, Juliana.

Proceedings of the 2010 International Conference on Management of Data, SIGMOD '10. 2010. p. 1175-1177.

Barbosa, L, Nguyen, H, Nguyen, T, Pinnamaneni, R & Freire, J 2010, Creating and exploring web form repositories. in Proceedings of the 2010 International Conference on Management of Data, SIGMOD '10. pp. 1175-1177, 2010 International Conference on Management of Data, SIGMOD '10, Indianapolis, IN, United States, 6/6/10. https://doi.org/10.1145/1807167.1807311
Barbosa L, Nguyen H, Nguyen T, Pinnamaneni R, Freire J. Creating and exploring web form repositories. In Proceedings of the 2010 International Conference on Management of Data, SIGMOD '10. 2010. p. 1175-1177. https://doi.org/10.1145/1807167.1807311
Barbosa, Luciano ; Nguyen, Hoa ; Nguyen, Thanh ; Pinnamaneni, Ramesh ; Freire, Juliana. / Creating and exploring web form repositories. Proceedings of the 2010 International Conference on Management of Data, SIGMOD '10. 2010. pp. 1175-1177.
@inproceedings{2aecd0f7deda4d10b557d9b0205a0416,
title = "Creating and exploring web form repositories",
abstract = "We present DeepPeep (http://www.deeppeep.org), a new system for discovering, organizing and analyzing Web forms. DeepPeep allows users to explore the entry points to hidden-Web sites whose contents are out of reach for traditional search engines. Besides demonstrating important features of DeepPeep and describing the infrastructure we used to build the system, we will show how this infrastructure can be used to create form collections and form search engines for different domains. We also present the analysis component of DeepPeep which allows users to explore and visualize information in form repositories, helping them not only to better search and understand forms in different domains, but also to refine the form gathering process.",
keywords = "focused crawling, hidden web, learning classifiers, search engines, web forms",
author = "Luciano Barbosa and Hoa Nguyen and Thanh Nguyen and Ramesh Pinnamaneni and Juliana Freire",
year = "2010",
doi = "10.1145/1807167.1807311",
language = "English (US)",
isbn = "9781450300322",
pages = "1175--1177",
booktitle = "Proceedings of the 2010 International Conference on Management of Data, SIGMOD '10"
}

TY - GEN
T1 - Creating and exploring web form repositories
AU - Barbosa, Luciano
AU - Nguyen, Hoa
AU - Nguyen, Thanh
AU - Pinnamaneni, Ramesh
AU - Freire, Juliana
PY - 2010
Y1 - 2010
N2 - We present DeepPeep (http://www.deeppeep.org), a new system for discovering, organizing and analyzing Web forms. DeepPeep allows users to explore the entry points to hidden-Web sites whose contents are out of reach for traditional search engines. Besides demonstrating important features of DeepPeep and describing the infrastructure we used to build the system, we will show how this infrastructure can be used to create form collections and form search engines for different domains. We also present the analysis component of DeepPeep which allows users to explore and visualize information in form repositories, helping them not only to better search and understand forms in different domains, but also to refine the form gathering process.
AB - We present DeepPeep (http://www.deeppeep.org), a new system for discovering, organizing and analyzing Web forms. DeepPeep allows users to explore the entry points to hidden-Web sites whose contents are out of reach for traditional search engines. Besides demonstrating important features of DeepPeep and describing the infrastructure we used to build the system, we will show how this infrastructure can be used to create form collections and form search engines for different domains. We also present the analysis component of DeepPeep which allows users to explore and visualize information in form repositories, helping them not only to better search and understand forms in different domains, but also to refine the form gathering process.
KW - focused crawling
KW - hidden web
KW - learning classifiers
KW - search engines
KW - web forms
UR - http://www.scopus.com/inward/record.url?scp=77954712919&partnerID=8YFLogxK
UR - http://www.scopus.com/inward/citedby.url?scp=77954712919&partnerID=8YFLogxK
U2 - 10.1145/1807167.1807311
DO - 10.1145/1807167.1807311
M3 - Conference contribution
SN - 9781450300322
SP - 1175
EP - 1177
BT - Proceedings of the 2010 International Conference on Management of Data, SIGMOD '10
ER -