Exploiting syntactic and distributional information for spelling correction with web-scale N-gram models

Wei Xu, Joel Tetreault, Martin Chodorow, Ralph Grishman, Le Zhao

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution

Abstract

We propose a novel way of incorporating dependency parse and word co-occurrence information into a state-of-the-art web-scale n-gram model for spelling correction. The syntactic and distributional information provides extra evidence in addition to that provided by a web-scale n-gram corpus and especially helps with data sparsity problems. Experimental results show that introducing syntactic features into n-gram based models significantly reduces errors by up to 12.4% over the current state-of-the-art. The word co-occurrence information shows potential but only improves overall accuracy slightly.
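
The general idea of adding syntactic evidence to n-gram evidence can be illustrated with a minimal sketch. It is not the authors' model: the count tables, the add-one smoothed log-count scoring, the `dep_weight` parameter, and the function names are all assumptions made for illustration, and the counts are invented toy values rather than figures from any corpus.

```python
import math

# Toy counts, invented purely for illustration (not taken from the paper
# or from any real web-scale corpus).
NGRAM_COUNTS = {
    ("a", "piece", "of"): 900,
    ("a", "peace", "of"): 40,
}
DEP_COUNTS = {
    # (head word, dependency relation, dependent word)
    ("eat", "dobj", "piece"): 120,
    ("eat", "dobj", "peace"): 1,
}


def log_count(table, key):
    """Add-one smoothed log count, so unseen keys score 0.0 instead of failing."""
    return math.log(table.get(key, 0) + 1)


def score(candidate, left, right, head, relation, dep_weight=1.0):
    """Sum n-gram evidence and (weighted) dependency evidence for one candidate."""
    ngram_evidence = log_count(NGRAM_COUNTS, (left, candidate, right))
    dep_evidence = log_count(DEP_COUNTS, (head, relation, candidate))
    return ngram_evidence + dep_weight * dep_evidence


def correct(candidates, left, right, head, relation, dep_weight=1.0):
    """Return the candidate from the confusion set with the highest combined score."""
    return max(
        candidates,
        key=lambda c: score(c, left, right, head, relation, dep_weight),
    )


# Context covered by the n-gram table: both sources of evidence agree.
seen_context = correct(["peace", "piece"], "a", "of", "eat", "dobj")

# Context missing from the n-gram table (data sparsity): only the
# dependency counts separate the two candidates.
sparse_context = correct(["peace", "piece"], "tasty", "of", "eat", "dobj")
```

In the second call the trigram is unseen for both candidates, so the dependency counts alone decide the outcome, which is the kind of sparsity case the abstract says the extra information helps with.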

Original language: English (US)
Title of host publication: EMNLP 2011 - Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference
Pages: 1291-1300
Number of pages: 10
State: Published - 2011
Event: Conference on Empirical Methods in Natural Language Processing, EMNLP 2011 - Edinburgh, United Kingdom
Duration: Jul 27 2011 to Jul 31 2011

Other

Other: Conference on Empirical Methods in Natural Language Processing, EMNLP 2011
Country: United Kingdom
City: Edinburgh
Period: 7/27/11 to 7/31/11

ASJC Scopus subject areas

  • Computational Theory and Mathematics
  • Computer Science Applications
  • Information Systems

Cite this

Xu, W., Tetreault, J., Chodorow, M., Grishman, R., & Zhao, L. (2011). Exploiting syntactic and distributional information for spelling correction with web-scale N-gram models. In EMNLP 2011 - Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference (pp. 1291-1300).

@inproceedings{312ed9771a124cd6b7b12e2d98b44262,
title = "Exploiting syntactic and distributional information for spelling correction with web-scale N-gram models",
abstract = "We propose a novel way of incorporating dependency parse and word co-occurrence information into a state-of-the-art web-scale n-gram model for spelling correction. The syntactic and distributional information provides extra evidence in addition to that provided by a web-scale n-gram corpus and especially helps with data sparsity problems. Experimental results show that introducing syntactic features into n-gram based models significantly reduces errors by up to 12.4{\%} over the current state-of-the-art. The word co-occurrence information shows potential but only improves overall accuracy slightly.",
author = "Wei Xu and Joel Tetreault and Martin Chodorow and Ralph Grishman and Le Zhao",
year = "2011",
language = "English (US)",
isbn = "1937284115",
pages = "1291--1300",
booktitle = "EMNLP 2011 - Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference",

}

TY  - GEN
T1  - Exploiting syntactic and distributional information for spelling correction with web-scale N-gram models
AU  - Xu, Wei
AU  - Tetreault, Joel
AU  - Chodorow, Martin
AU  - Grishman, Ralph
AU  - Zhao, Le
PY  - 2011
Y1  - 2011
AB  - We propose a novel way of incorporating dependency parse and word co-occurrence information into a state-of-the-art web-scale n-gram model for spelling correction. The syntactic and distributional information provides extra evidence in addition to that provided by a web-scale n-gram corpus and especially helps with data sparsity problems. Experimental results show that introducing syntactic features into n-gram based models significantly reduces errors by up to 12.4% over the current state-of-the-art. The word co-occurrence information shows potential but only improves overall accuracy slightly.
UR  - http://www.scopus.com/inward/record.url?scp=80053249348&partnerID=8YFLogxK
UR  - http://www.scopus.com/inward/citedby.url?scp=80053249348&partnerID=8YFLogxK
M3  - Conference contribution
AN  - SCOPUS:80053249348
SN  - 1937284115
SN  - 9781937284114
SP  - 1291
EP  - 1300
BT  - EMNLP 2011 - Conference on Empirical Methods in Natural Language Processing, Proceedings of the Conference
ER  -