Information extraction for enhanced access to disease outbreak reports

Ralph Grishman, Silja Huttunen, Roman Yangarber

Research output: Contribution to journal (Article)

Abstract

Document search is generally based on individual terms in the document. However, for collections within limited domains it is possible to provide more powerful access tools. This paper describes a system designed for collections of reports of infectious disease outbreaks. The system, Proteus-BIO, automatically creates a table of outbreaks, with each table entry linked to the document describing that outbreak; this makes it possible to use database operations such as selection and sorting to find relevant documents. Proteus-BIO consists of a Web crawler which gathers relevant documents; an information extraction engine which converts the individual outbreak events to a tabular database; and a database browser which provides access to the events and, through them, to the documents. The information extraction engine uses sets of patterns and word classes to extract the information about each event. Preparing these patterns and word classes has been a time-consuming manual operation in the past, but automated discovery tools now make this task significantly easier. A small study comparing the effectiveness of the tabular index with conventional Web search tools demonstrated that users can find substantially more documents in a given time period with Proteus-BIO.
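The extraction-then-browse idea in the abstract can be illustrated with a minimal sketch. It is not the Proteus-BIO implementation: the word classes, the single regular-expression pattern, the `Outbreak` record, and the sample documents are all hypothetical and stand in for the much richer pattern sets the paper describes. The sketch only shows how pattern matches become table rows that stay linked to their source document, so that selection and sorting can replace term search.

```python
import re
from dataclasses import dataclass

# Hypothetical word classes (illustrative only, not taken from the paper).
DISEASES = ["cholera", "ebola", "measles", "dengue fever"]
LOCATIONS = ["Uganda", "Peru", "India", "Brazil"]


def alternation(words):
    """Build a regex alternation, longest entries first."""
    ordered = sorted(words, key=len, reverse=True)
    return "|".join(re.escape(w) for w in ordered)


# Hypothetical pattern: "<count> [new] cases of <disease> ... in <location>".
PATTERN = re.compile(
    rf"(?P<count>\d+)\s+(?:new\s+)?cases\s+of\s+"
    rf"(?P<disease>{alternation(DISEASES)})\b.*?"
    rf"\bin\s+(?P<location>{alternation(LOCATIONS)})\b",
    re.IGNORECASE,
)


@dataclass(frozen=True)
class Outbreak:
    """One table row; doc_id links the row back to its source document."""
    disease: str
    location: str
    cases: int
    doc_id: str


def extract(doc_id, text):
    """Turn every pattern match in one document into a table row."""
    return [
        Outbreak(m["disease"].lower(), m["location"], int(m["count"]), doc_id)
        for m in PATTERN.finditer(text)
    ]


def build_table(docs):
    """Run extraction over a collection of {doc_id: text}."""
    table = []
    for doc_id, text in docs.items():
        table.extend(extract(doc_id, text))
    return table


def select(table, disease):
    """Database-style selection by disease, sorted by case count (descending)."""
    rows = [r for r in table if r.disease == disease]
    return sorted(rows, key=lambda r: r.cases, reverse=True)


# Made-up sample documents, for illustration only.
DOCS = {
    "doc1": "Officials reported 120 cases of cholera in Peru last week.",
    "doc2": "There were 45 new cases of Ebola confirmed in Uganda.",
    "doc3": "Health workers counted 300 cases of cholera in India.",
}

if __name__ == "__main__":
    for row in select(build_table(DOCS), "cholera"):
        print(row.doc_id, row.location, row.cases)
```

Calling `select(build_table(DOCS), "cholera")` returns the matching rows ordered by case count, and each row's `doc_id` points back to the report it came from, which is the access path the abstract attributes to the database browser.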

Original language: English (US)
Pages (from-to): 236-246
Number of pages: 11
Journal: Journal of Biomedical Informatics
Volume: 35
Issue number: 4
DOI: 10.1016/S1532-0464(03)00013-3
State: Published - Aug 2002

Fingerprint

Information Storage and Retrieval
Proteus
Disease Outbreaks
Databases
Engines
Sorting

ASJC Scopus subject areas

  • Computer Science Applications
  • Health Informatics

Cite this

Information extraction for enhanced access to disease outbreak reports. / Grishman, Ralph; Huttunen, Silja; Yangarber, Roman.

In: Journal of Biomedical Informatics, Vol. 35, No. 4, 08.2002, p. 236-246.

Grishman, Ralph ; Huttunen, Silja ; Yangarber, Roman. / Information extraction for enhanced access to disease outbreak reports. In: Journal of Biomedical Informatics. 2002 ; Vol. 35, No. 4. pp. 236-246.
@article{9399c1a754ea45ec9ae6d49562448003,
title = "Information extraction for enhanced access to disease outbreak reports",
abstract = "Document search is generally based on individual terms in the document. However, for collections within limited domains it is possible to provide more powerful access tools. This paper describes a system designed for collections of reports of infectious disease outbreaks. The system, Proteus-BIO, automatically creates a table of outbreaks, with each table entry linked to the document describing that outbreak; this makes it possible to use database operations such as selection and sorting to find relevant documents. Proteus-BIO consists of a Web crawler which gathers relevant documents; an information extraction engine which converts the individual outbreak events to a tabular database; and a database browser which provides access to the events and, through them, to the documents. The information extraction engine uses sets of patterns and word classes to extract the information about each event. Preparing these patterns and word classes has been a time-consuming manual operation in the past, but automated discovery tools now make this task significantly easier. A small study comparing the effectiveness of the tabular index with conventional Web search tools demonstrated that users can find substantially more documents in a given time period with Proteus-BIO.",
author = "Ralph Grishman and Silja Huttunen and Roman Yangarber",
year = "2002",
month = "8",
doi = "10.1016/S1532-0464(03)00013-3",
language = "English (US)",
volume = "35",
pages = "236--246",
journal = "Journal of Biomedical Informatics",
issn = "1532-0464",
publisher = "Academic Press Inc.",
number = "4",
}

TY - JOUR

T1 - Information extraction for enhanced access to disease outbreak reports

AU - Grishman, Ralph

AU - Huttunen, Silja

AU - Yangarber, Roman

PY - 2002/8

Y1 - 2002/8

N2 - Document search is generally based on individual terms in the document. However, for collections within limited domains it is possible to provide more powerful access tools. This paper describes a system designed for collections of reports of infectious disease outbreaks. The system, Proteus-BIO, automatically creates a table of outbreaks, with each table entry linked to the document describing that outbreak; this makes it possible to use database operations such as selection and sorting to find relevant documents. Proteus-BIO consists of a Web crawler which gathers relevant documents; an information extraction engine which converts the individual outbreak events to a tabular database; and a database browser which provides access to the events and, through them, to the documents. The information extraction engine uses sets of patterns and word classes to extract the information about each event. Preparing these patterns and word classes has been a time-consuming manual operation in the past, but automated discovery tools now make this task significantly easier. A small study comparing the effectiveness of the tabular index with conventional Web search tools demonstrated that users can find substantially more documents in a given time period with Proteus-BIO.

AB - Document search is generally based on individual terms in the document. However, for collections within limited domains it is possible to provide more powerful access tools. This paper describes a system designed for collections of reports of infectious disease outbreaks. The system, Proteus-BIO, automatically creates a table of outbreaks, with each table entry linked to the document describing that outbreak; this makes it possible to use database operations such as selection and sorting to find relevant documents. Proteus-BIO consists of a Web crawler which gathers relevant documents; an information extraction engine which converts the individual outbreak events to a tabular database; and a database browser which provides access to the events and, through them, to the documents. The information extraction engine uses sets of patterns and word classes to extract the information about each event. Preparing these patterns and word classes has been a time-consuming manual operation in the past, but automated discovery tools now make this task significantly easier. A small study comparing the effectiveness of the tabular index with conventional Web search tools demonstrated that users can find substantially more documents in a given time period with Proteus-BIO.

UR - http://www.scopus.com/inward/record.url?scp=0036706221&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=0036706221&partnerID=8YFLogxK

U2 - 10.1016/S1532-0464(03)00013-3

DO - 10.1016/S1532-0464(03)00013-3

M3 - Article

C2 - 12755518

AN - SCOPUS:0036706221

VL - 35

SP - 236

EP - 246

JO - Journal of Biomedical Informatics

JF - Journal of Biomedical Informatics

SN - 1532-0464

IS - 4

ER -