Punctuating speech for information extraction

Benoit Favre, Ralph Grishman, Dustin Hillard, Heng Ji, Dilek Hakkani-Tür, Mari Ostendorf

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

This paper studies the effect of automatic sentence boundary detection and comma prediction on entity and relation extraction in speech. We show that punctuating the machine-generated transcript to maximize the F-measure of period and comma annotation results in suboptimal information extraction (IE). Specifically, the period and comma decision thresholds can be chosen to improve both the entity value score and the relation value score by 4% relative. Error analysis shows that preventing noun-phrase splitting by generating longer sentences and fewer commas can be harmful to IE performance. Indeed, it seems that missed punctuation allows syntactic parsers to merge noun phrases, which prevents the extraction of correct information.
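The abstract contrasts two ways of setting the period and comma decision thresholds: tuning them for punctuation F-measure, or tuning them for downstream extraction. Both can be framed as a threshold sweep that differs only in the objective being maximized. Below is a minimal sketch in Python, assuming a punctuation model that outputs a period posterior and a comma posterior for the gap after each word. The function names, the rule that a period takes priority over a comma, the averaged-F objective, and the toy data are all illustrative assumptions, not the paper's implementation. To tune for IE instead, you would pass a different `objective`, for example a hypothetical function that runs entity and relation extraction on the rendered transcript and returns its value score.

```python
from itertools import product
from typing import Callable, List, Sequence, Tuple

Label = str  # "." (period), "," (comma) or "" (no punctuation) after a word


def predict_labels(period_probs: Sequence[float], comma_probs: Sequence[float],
                   t_period: float, t_comma: float) -> List[Label]:
    """Threshold per-word posteriors into punctuation decisions.

    Assumption: a period wins over a comma when both posteriors clear
    their thresholds.
    """
    labels: List[Label] = []
    for p_period, p_comma in zip(period_probs, comma_probs):
        if p_period >= t_period:
            labels.append(".")
        elif p_comma >= t_comma:
            labels.append(",")
        else:
            labels.append("")
    return labels


def render(words: Sequence[str], labels: Sequence[Label]) -> str:
    """Attach each predicted mark to the word it follows."""
    return " ".join(w + mark for w, mark in zip(words, labels))


def f_measure(pred: Sequence[Label], ref: Sequence[Label], mark: Label) -> float:
    """F-measure of one punctuation mark against a reference annotation."""
    tp = sum(p == mark and r == mark for p, r in zip(pred, ref))
    fp = sum(p == mark and r != mark for p, r in zip(pred, ref))
    fn = sum(p != mark and r == mark for p, r in zip(pred, ref))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def sweep(period_probs: Sequence[float], comma_probs: Sequence[float],
          objective: Callable[[List[Label]], float],
          grid: Sequence[float]) -> Tuple[float, float, float]:
    """Grid-search (t_period, t_comma) for the pair that maximizes `objective`.

    Returns (best_score, t_period, t_comma); ties keep the first pair found.
    """
    best = (float("-inf"), grid[0], grid[0])
    for t_period, t_comma in product(grid, grid):
        score = objective(predict_labels(period_probs, comma_probs, t_period, t_comma))
        if score > best[0]:
            best = (score, t_period, t_comma)
    return best


# Toy data (invented for illustration): reference marks and model posteriors.
ref = [".", "", ",", "", "."]
period_probs = [0.9, 0.1, 0.2, 0.3, 0.8]
comma_probs = [0.1, 0.2, 0.6, 0.4, 0.1]
grid = [0.1, 0.3, 0.5, 0.7, 0.9]


def punctuation_f(labels: List[Label]) -> float:
    # Assumed F-based objective: mean of the period and comma F-measures.
    return (f_measure(labels, ref, ".") + f_measure(labels, ref, ",")) / 2


if __name__ == "__main__":
    print(sweep(period_probs, comma_probs, punctuation_f, grid))  # (1.0, 0.5, 0.5)
```

Because `sweep` takes the objective as a parameter, the F-tuned and IE-tuned operating points come from the same search. Only the scoring function changes, which is the comparison the abstract describes.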

Original language: English (US)
Title of host publication: 2008 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP
Pages: 5013-5016
Number of pages: 4
DOI: 10.1109/ICASSP.2008.4518784
State: Published - 2008
Event: 2008 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP - Las Vegas, NV, United States
Duration: Mar 31 2008 to Apr 4 2008

Keywords

  • Information extraction
  • Punctuation prediction
  • Speech

ASJC Scopus subject areas

  • Electrical and Electronic Engineering
  • Signal Processing
  • Acoustics and Ultrasonics

Cite this

Favre, B., Grishman, R., Hillard, D., Ji, H., Hakkani-Tür, D., & Ostendorf, M. (2008). Punctuating speech for information extraction. In 2008 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP (pp. 5013-5016). [4518784] https://doi.org/10.1109/ICASSP.2008.4518784

@inproceedings{c74de78b24f7464e902b8a0658e12ff8,
title = "Punctuating speech for information extraction",
abstract = "This paper studies the effect of automatic sentence boundary detection and comma prediction on entity and relation extraction in speech. We show that punctuating the machine generated transcript according to maximum F-measure of period and comma annotation results in suboptimal information extraction. Precisely, period and comma decision thresholds can be chosen in order to improve the entity value score and the relation value score by 4{\%} relative. Error analysis shows that preventing noun-phrase splitting by generating longer sentences and fewer commas can be harmful for IE performance. Indeed, it seems that missed punctuation allows syntactic parsers to merge noun-phrases and prevent the extraction of correct information.",
keywords = "Information extraction, Punctuation prediction, Speech",
author = "Benoit Favre and Ralph Grishman and Dustin Hillard and Heng Ji and Dilek Hakkani-T{\"u}r and Mari Ostendorf",
year = "2008",
doi = "10.1109/ICASSP.2008.4518784",
language = "English (US)",
isbn = "1424414849",
pages = "5013--5016",
booktitle = "2008 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP",

}

TY  - GEN
T1  - Punctuating speech for information extraction
AU  - Favre, Benoit
AU  - Grishman, Ralph
AU  - Hillard, Dustin
AU  - Ji, Heng
AU  - Hakkani-Tür, Dilek
AU  - Ostendorf, Mari
PY  - 2008
Y1  - 2008
N2  - This paper studies the effect of automatic sentence boundary detection and comma prediction on entity and relation extraction in speech. We show that punctuating the machine generated transcript according to maximum F-measure of period and comma annotation results in suboptimal information extraction. Precisely, period and comma decision thresholds can be chosen in order to improve the entity value score and the relation value score by 4% relative. Error analysis shows that preventing noun-phrase splitting by generating longer sentences and fewer commas can be harmful for IE performance. Indeed, it seems that missed punctuation allows syntactic parsers to merge noun-phrases and prevent the extraction of correct information.
AB  - This paper studies the effect of automatic sentence boundary detection and comma prediction on entity and relation extraction in speech. We show that punctuating the machine generated transcript according to maximum F-measure of period and comma annotation results in suboptimal information extraction. Precisely, period and comma decision thresholds can be chosen in order to improve the entity value score and the relation value score by 4% relative. Error analysis shows that preventing noun-phrase splitting by generating longer sentences and fewer commas can be harmful for IE performance. Indeed, it seems that missed punctuation allows syntactic parsers to merge noun-phrases and prevent the extraction of correct information.
KW  - Information extraction
KW  - Punctuation prediction
KW  - Speech
UR  - http://www.scopus.com/inward/record.url?scp=51449122781&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=51449122781&partnerID=8YFLogxK
U2  - 10.1109/ICASSP.2008.4518784
DO  - 10.1109/ICASSP.2008.4518784
M3  - Conference contribution
SN  - 1424414849
SN  - 9781424414840
SP  - 5013
EP  - 5016
BT  - 2008 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP
ER  - 