Reranking with linguistic and semantic features for Arabic optical character recognition

Nadi Tomeh, Nizar Habash, Ryan Roth, Noura Farra, Pradeep Dasigi, Mona Diab

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Optical Character Recognition (OCR) systems for Arabic rely on information contained in the scanned images to recognize sequences of characters and on language models to emphasize fluency. In this paper we incorporate linguistically and semantically motivated features into an existing OCR system. To do so, we follow an n-best list reranking approach that exploits recent advances in learning-to-rank techniques. We achieve 10.1% and 11.4% reductions in recognition word error rate (WER) relative to a standard baseline system on typewritten and handwritten Arabic, respectively.
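
As a rough illustration of the ideas named in the abstract (n-best list reranking and relative WER reduction), the following Python sketch may help. It is not the authors' system: the linear scoring function, the feature names `ocr_score` and `lm_score`, the weights, and the toy English hypotheses are all assumptions made for this example, and the paper's actual learning-to-rank models and features are not reproduced here.

```python
from typing import Dict, List, Sequence


def word_error_rate(reference: Sequence[str], hypothesis: Sequence[str]) -> float:
    """Word-level edit distance divided by the number of reference words."""
    if not reference:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the reference prefix seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_word in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hypothesis, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(reference)


def rerank(nbest: List[Dict], weights: Dict[str, float]) -> Dict:
    """Return the hypothesis with the highest weighted feature sum.

    A plain linear scorer is assumed here purely for illustration.
    """
    def score(hyp: Dict) -> float:
        return sum(weights.get(name, 0.0) * value
                   for name, value in hyp["features"].items())
    return max(nbest, key=score)


def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction by which new_wer improves on baseline_wer."""
    return (baseline_wer - new_wer) / baseline_wer


# Toy data (hypothetical): two hypotheses for one line of text.
reference = "the cat sat".split()
nbest = [
    {"words": "the cat sit".split(),
     "features": {"ocr_score": -1.0, "lm_score": -4.0}},
    {"words": "the cat sat".split(),
     "features": {"ocr_score": -1.2, "lm_score": -2.0}},
]
weights = {"ocr_score": 1.0, "lm_score": 0.5}  # assumed, not learned

baseline = max(nbest, key=lambda h: h["features"]["ocr_score"])
reranked = rerank(nbest, weights)

if __name__ == "__main__":
    print("baseline WER:", word_error_rate(reference, baseline["words"]))
    print("reranked WER:", word_error_rate(reference, reranked["words"]))
```

In this toy case the extra feature lets the reranker prefer the second hypothesis over the one the OCR score alone would pick. A relative reduction such as those reported in the abstract is computed as `relative_reduction(baseline_wer, new_wer)`, so a drop from a WER of 0.20 to 0.18 would be a 10% relative reduction.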

Original language: English (US)
Title of host publication: Short Papers
Publisher: Association for Computational Linguistics (ACL)
Pages: 549-555
Number of pages: 7
ISBN (Print): 9781937284510
State: Published - Jan 1 2013
Event: 51st Annual Meeting of the Association for Computational Linguistics, ACL 2013 - Sofia, Bulgaria
Duration: Aug 4 2013 to Aug 9 2013

Publication series

Name: ACL 2013 - 51st Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference
Volume: 2

Other

Other: 51st Annual Meeting of the Association for Computational Linguistics, ACL 2013
Country: Bulgaria
City: Sofia
Period: 8/4/13 to 8/9/13

ASJC Scopus subject areas

  • Language and Linguistics
  • Linguistics and Language

Cite this

Tomeh, N., Habash, N., Roth, R., Farra, N., Dasigi, P., & Diab, M. (2013). Reranking with linguistic and semantic features for Arabic optical character recognition. In Short Papers (pp. 549-555). (ACL 2013 - 51st Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference; Vol. 2). Association for Computational Linguistics (ACL).