Identification of naturally occurring numerical expressions in Arabic

Nizar Habash, Ryan Roth

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

In this paper, we define the task of Number Identification in natural context. We present and validate a language-independent semiautomatic approach to quickly building a gold standard for evaluating number identification systems by exploiting hand-aligned parallel data. We also present and extensively evaluate a robust rule-based system for number identification in natural context for Arabic for a variety of number formats and types. The system is shown to have strong performance, achieving, on a blind test, a 94.8% F-score for the task of correctly identifying number expression spans in natural text, and a 92.1% F-score for the task of correctly determining the core numerical value.
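The F-scores quoted above are the harmonic mean of precision and recall over identified number spans. The following is a minimal sketch of how such a span-level score is commonly computed. It assumes exact-match scoring on token offsets, which this record does not confirm as the paper's criterion; the function name `span_f_score` and the example spans are hypothetical.

```python
from typing import Set, Tuple

# A span is (start_token, end_token) with an exclusive end; this
# representation is an assumption made for illustration only.
Span = Tuple[int, int]


def span_f_score(gold: Set[Span], predicted: Set[Span]) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) under exact span matching."""
    # A predicted span counts as correct only if it matches a gold span exactly.
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical data: three gold spans, one predicted span has a wrong boundary.
gold = {(0, 2), (5, 6), (9, 12)}
predicted = {(0, 2), (5, 7), (9, 12)}
precision, recall, f1 = span_f_score(gold, predicted)
print(f"P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```

Scoring the core numerical value, the second figure in the abstract, would additionally require comparing the value assigned to each span, which this sketch does not attempt.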

Original language: English (US)
Title of host publication: Proceedings of the 6th International Conference on Language Resources and Evaluation, LREC 2008
Publisher: European Language Resources Association (ELRA)
Pages: 3330-3336
Number of pages: 7
ISBN (Electronic): 2951740840, 9782951740846
State: Published - Jan 1 2008
Event: 6th International Conference on Language Resources and Evaluation, LREC 2008 - Marrakech, Morocco
Duration: May 28 2008 to May 30 2008

Other

Other: 6th International Conference on Language Resources and Evaluation, LREC 2008
Country: Morocco
City: Marrakech
Period: 5/28/08 to 5/30/08

Fingerprint

gold standard
language
performance
rule-based systems
blind test

ASJC Scopus subject areas

  • Library and Information Sciences
  • Linguistics and Language
  • Language and Linguistics
  • Education

Cite this

Habash, N., & Roth, R. (2008). Identification of naturally occurring numerical expressions in Arabic. In Proceedings of the 6th International Conference on Language Resources and Evaluation, LREC 2008 (pp. 3330-3336). European Language Resources Association (ELRA).


@inproceedings{a42b64cad60945ae9917cb7fad3451e9,
title = "Identification of naturally occurring numerical expressions in Arabic",
abstract = "In this paper, we define the task of Number Identification in natural context. We present and validate a language-independent semiautomatic approach to quickly building a gold standard for evaluating number identification systems by exploiting hand-aligned parallel data. We also present and extensively evaluate a robust rule-based system for number identification in natural context for Arabic for a variety of number formats and types. The system is shown to have strong performance, achieving, on a blind test, a 94.8{\%} F-score for the task of correctly identifying number expression spans in natural text, and a 92.1{\%} F-score for the task of correctly determining the core numerical value.",
author = "Nizar Habash and Ryan Roth",
year = "2008",
month = "1",
day = "1",
language = "English (US)",
pages = "3330--3336",
booktitle = "Proceedings of the 6th International Conference on Language Resources and Evaluation, LREC 2008",
publisher = "European Language Resources Association (ELRA)",
}

TY - GEN

T1 - Identification of naturally occurring numerical expressions in Arabic

AU - Habash, Nizar

AU - Roth, Ryan

PY - 2008/1/1

Y1 - 2008/1/1

N2 - In this paper, we define the task of Number Identification in natural context. We present and validate a language-independent semiautomatic approach to quickly building a gold standard for evaluating number identification systems by exploiting hand-aligned parallel data. We also present and extensively evaluate a robust rule-based system for number identification in natural context for Arabic for a variety of number formats and types. The system is shown to have strong performance, achieving, on a blind test, a 94.8% F-score for the task of correctly identifying number expression spans in natural text, and a 92.1% F-score for the task of correctly determining the core numerical value.

UR - http://www.scopus.com/inward/record.url?scp=84255201790&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84255201790&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:84255201790

SP - 3330

EP - 3336

BT - Proceedings of the 6th International Conference on Language Resources and Evaluation, LREC 2008

PB - European Language Resources Association (ELRA)

ER -