Probabilistic context-free grammar induction based on structural zeros

Mehryar Mohri, Brian Roark

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

We present a method for induction of concise and accurate probabilistic context-free grammars for efficient use in early stages of a multi-stage parsing technique. The method is based on the use of statistical tests to determine if a non-terminal combination is unobserved due to sparse data or hard syntactic constraints. Experimental results show that, using this method, high accuracies can be achieved with a non-terminal set that is orders of magnitude smaller than in typically induced probabilistic context-free grammars, leading to substantial speed-ups in parsing. The approach is further used in combination with an existing reranker to provide competitive WSJ parsing results.
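The abstract does not say which statistical test is used, so the following is only a minimal sketch of the general idea, under an assumed independence null hypothesis. It is not the authors' procedure. The function names, the count arguments, and the `alpha` threshold are all hypothetical. If two non-terminals are each frequent, yet never occur together, chance alone is an unlikely explanation and the pair can be treated as a structural zero. If either is rare, the zero is more plausibly due to sparse data.

```python
import math


def zero_cooccurrence_pvalue(count_a: int, count_b: int, total: int) -> float:
    """Probability of seeing zero (A, B) pairs if A and B combined independently.

    Assumption (not from the paper): each of `total` observed pair slots
    holds (A, B) with probability (count_a / total) * (count_b / total).
    """
    if total <= 0:
        raise ValueError("total must be positive")
    p_pair = (count_a / total) * (count_b / total)
    if p_pair >= 1.0:
        return 0.0
    # (1 - p_pair) ** total, computed in log space for numerical stability.
    return math.exp(total * math.log1p(-p_pair))


def is_structural_zero(count_a: int, count_b: int, total: int,
                       alpha: float = 0.01) -> bool:
    """Treat an unobserved pair as a hard constraint if chance is implausible."""
    return zero_cooccurrence_pvalue(count_a, count_b, total) < alpha


if __name__ == "__main__":
    # Two frequent categories that never combine: likely a hard constraint.
    print(is_structural_zero(500, 400, 10_000))
    # Two rare categories that never combine: likely just sparse data.
    print(is_structural_zero(5, 4, 10_000))
```

In a grammar-induction setting, pairs flagged this way could be assigned zero probability outright, while the remaining unobserved pairs would still receive smoothed probability mass.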

Original language: English (US)
Title of host publication: HLT-NAACL 2006 - Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics, Proceedings of the Main Conference
Pages: 312-319
Number of pages: 8
State: Published - 2006
Event: 2006 Human Language Technology Conference - North American Chapter of the Association for Computational Linguistics Annual Meeting, HLT-NAACL 2006 - New York, NY, United States
Duration: Jun 4 2006 → Jun 9 2006

Other

Other: 2006 Human Language Technology Conference - North American Chapter of the Association for Computational Linguistics Annual Meeting, HLT-NAACL 2006
Country: United States
City: New York, NY
Period: 6/4/06 → 6/9/06

Fingerprint

induction
grammar
statistical test
Parsing

ASJC Scopus subject areas

  • Language and Linguistics
  • Linguistics and Language

Cite this

Mohri, M., & Roark, B. (2006). Probabilistic context-free grammar induction based on structural zeros. In HLT-NAACL 2006 - Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics, Proceedings of the Main Conference (pp. 312-319).

@inproceedings{5c2c9bab5d6d49fcb0e182c3be280297,
title = "Probabilistic context-free grammar induction based on structural zeros",
abstract = "We present a method for induction of concise and accurate probabilistic context-free grammars for efficient use in early stages of a multi-stage parsing technique. The method is based on the use of statistical tests to determine if a non-terminal combination is unobserved due to sparse data or hard syntactic constraints. Experimental results show that, using this method, high accuracies can be achieved with a non-terminal set that is orders of magnitude smaller than in typically induced probabilistic context-free grammars, leading to substantial speed-ups in parsing. The approach is further used in combination with an existing reranker to provide competitive WSJ parsing results.",
author = "Mehryar Mohri and Brian Roark",
year = "2006",
language = "English (US)",
pages = "312--319",
booktitle = "HLT-NAACL 2006 - Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics, Proceedings of the Main Conference",

}

TY - GEN

T1 - Probabilistic context-free grammar induction based on structural zeros

AU - Mohri, Mehryar

AU - Roark, Brian

PY - 2006

Y1 - 2006

N2 - We present a method for induction of concise and accurate probabilistic context-free grammars for efficient use in early stages of a multi-stage parsing technique. The method is based on the use of statistical tests to determine if a non-terminal combination is unobserved due to sparse data or hard syntactic constraints. Experimental results show that, using this method, high accuracies can be achieved with a non-terminal set that is orders of magnitude smaller than in typically induced probabilistic context-free grammars, leading to substantial speed-ups in parsing. The approach is further used in combination with an existing reranker to provide competitive WSJ parsing results.

UR - http://www.scopus.com/inward/record.url?scp=84858379959&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84858379959&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:84858379959

SP - 312

EP - 319

BT - HLT-NAACL 2006 - Human Language Technology Conference of the North American Chapter of the Association for Computational Linguistics, Proceedings of the Main Conference

ER -