Dependency parsing of Modern Standard Arabic with lexical and inflectional features

Yuval Marton, Nizar Habash, Owen Rambow

    Research output: Contribution to journal › Article

    Abstract

    We explore the contribution of lexical and inflectional morphology features to dependency parsing of Arabic, a morphologically rich language with complex agreement patterns. Using controlled experiments, we contrast the contribution of different part-of-speech (POS) tag sets and morphological features in two input conditions: machine-predicted condition (in which POS tags and morphological feature values are automatically assigned), and gold condition (in which their true values are known). We find that more informative (fine-grained) tag sets are useful in the gold condition, but may be detrimental in the predicted condition, where they are outperformed by simpler but more accurately predicted tag sets. We identify a set of features (definiteness, person, number, gender, and undiacritized lemma) that improve parsing quality in the predicted condition, whereas other features are more useful in gold. We are the first to show that functional features for gender and number (e.g., "broken plurals"), and optionally the related rationality ("humanness") feature, are more helpful for parsing than form-based gender and number. We finally show that parsing quality in the predicted condition can dramatically improve by training in a combined gold+predicted condition. We experimented with two transition-based parsers, MaltParser and Easy-First Parser. Our findings are robust across parsers, models, and input conditions. This suggests that the contribution of the linguistic knowledge in the tag sets and features we identified goes beyond particular experimental settings, and may be informative for other parsers and morphologically rich languages.
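
The combined gold+predicted training condition mentioned in the abstract can be pictured with a minimal Python sketch. It assumes that "combined" means concatenating the gold-annotated training sentences with a copy whose POS tags and morphological features come from an automatic tagger, while heads and relation labels stay gold in both copies. The abstract does not spell out the procedure, so this reading is an assumption, and `Token`, `to_predicted`, `combined_training_set`, and `toy_tagger` are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass, replace
from typing import Callable, Dict, List, Tuple


@dataclass
class Token:
    form: str               # surface word form
    pos: str                # POS tag (gold or predicted)
    feats: Dict[str, str]   # morphological features, e.g. {"gen": "f", "num": "p"}
    head: int               # 1-based index of the head token, 0 for the root
    deprel: str             # dependency relation label


Sentence = List[Token]
# Hypothetical tagger interface: word form -> (POS tag, feature dict)
Tagger = Callable[[str], Tuple[str, Dict[str, str]]]


def to_predicted(sentence: Sentence, tagger: Tagger) -> Sentence:
    """Copy a gold sentence, swapping POS and features for tagger output.

    The tree annotation (head, deprel) is left untouched.
    """
    predicted = []
    for tok in sentence:
        pos, feats = tagger(tok.form)
        predicted.append(replace(tok, pos=pos, feats=dict(feats)))
    return predicted


def combined_training_set(gold: List[Sentence], tagger: Tagger) -> List[Sentence]:
    """Assumed combination scheme: gold copy followed by predicted copy."""
    return list(gold) + [to_predicted(s, tagger) for s in gold]


def toy_tagger(form: str) -> Tuple[str, Dict[str, str]]:
    # Stand-in for a real tagger: one coarse tag, no features.
    return ("N", {})


gold_sentences = [[Token("w1", "NN", {"gen": "m", "num": "p"}, 0, "ROOT")]]
train = combined_training_set(gold_sentences, toy_tagger)
```

Under this assumption the parser sees every training tree twice, once with true tags and features and once with the kind of noisy input it will meet at test time.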

    Original language: English (US)
    Pages (from-to): 161-194
    Number of pages: 34
    Journal: Computational Linguistics
    Volume: 39
    Issue number: 1
    DOI: 10.1162/COLI_a_00138
    State: Published - Mar 1 2013

    ASJC Scopus subject areas

    • Language and Linguistics
    • Linguistics and Language
    • Computer Science Applications
    • Artificial Intelligence

    Cite this

    Marton, Yuval; Habash, Nizar; Rambow, Owen. Dependency parsing of Modern Standard Arabic with lexical and inflectional features. In: Computational Linguistics, Vol. 39, No. 1, 01.03.2013, p. 161-194.

    @article{a77af757f8244b9db3e1f94d4d6275bc,
    title = "Dependency parsing of {M}odern {S}tandard {A}rabic with lexical and inflectional features",
    abstract = "We explore the contribution of lexical and inflectional morphology features to dependency parsing of Arabic, a morphologically rich language with complex agreement patterns. Using controlled experiments, we contrast the contribution of different part-of-speech (POS) tag sets and morphological features in two input conditions: machine-predicted condition (in which POS tags and morphological feature values are automatically assigned), and gold condition (in which their true values are known). We find that more informative (fine-grained) tag sets are useful in the gold condition, but may be detrimental in the predicted condition, where they are outperformed by simpler but more accurately predicted tag sets. We identify a set of features (definiteness, person, number, gender, and undiacritized lemma) that improve parsing quality in the predicted condition, whereas other features are more useful in gold. We are the first to show that functional features for gender and number (e.g., {"}broken plurals{"}), and optionally the related rationality ({"}humanness{"}) feature, are more helpful for parsing than form-based gender and number. We finally show that parsing quality in the predicted condition can dramatically improve by training in a combined gold+predicted condition. We experimented with two transition-based parsers, MaltParser and Easy-First Parser. Our findings are robust across parsers, models, and input conditions. This suggests that the contribution of the linguistic knowledge in the tag sets and features we identified goes beyond particular experimental settings, and may be informative for other parsers and morphologically rich languages.",
    author = "Yuval Marton and Nizar Habash and Owen Rambow",
    year = "2013",
    month = "3",
    day = "1",
    doi = "10.1162/COLI_a_00138",
    language = "English (US)",
    volume = "39",
    pages = "161--194",
    journal = "Computational Linguistics",
    issn = "0891-2017",
    publisher = "MIT Press Journals",
    number = "1",

    }

    TY - JOUR

    T1 - Dependency parsing of Modern Standard Arabic with lexical and inflectional features

    AU - Marton, Yuval

    AU - Habash, Nizar

    AU - Rambow, Owen

    PY - 2013/3/1

    Y1 - 2013/3/1

    AB - We explore the contribution of lexical and inflectional morphology features to dependency parsing of Arabic, a morphologically rich language with complex agreement patterns. Using controlled experiments, we contrast the contribution of different part-of-speech (POS) tag sets and morphological features in two input conditions: machine-predicted condition (in which POS tags and morphological feature values are automatically assigned), and gold condition (in which their true values are known). We find that more informative (fine-grained) tag sets are useful in the gold condition, but may be detrimental in the predicted condition, where they are outperformed by simpler but more accurately predicted tag sets. We identify a set of features (definiteness, person, number, gender, and undiacritized lemma) that improve parsing quality in the predicted condition, whereas other features are more useful in gold. We are the first to show that functional features for gender and number (e.g., "broken plurals"), and optionally the related rationality ("humanness") feature, are more helpful for parsing than form-based gender and number. We finally show that parsing quality in the predicted condition can dramatically improve by training in a combined gold+predicted condition. We experimented with two transition-based parsers, MaltParser and Easy-First Parser. Our findings are robust across parsers, models, and input conditions. This suggests that the contribution of the linguistic knowledge in the tag sets and features we identified goes beyond particular experimental settings, and may be informative for other parsers and morphologically rich languages.

    UR - http://www.scopus.com/inward/record.url?scp=84874622627&partnerID=8YFLogxK

    UR - http://www.scopus.com/inward/citedby.url?scp=84874622627&partnerID=8YFLogxK

    U2 - 10.1162/COLI_a_00138

    DO - 10.1162/COLI_a_00138

    M3 - Article

    VL - 39

    SP - 161

    EP - 194

    JO - Computational Linguistics

    JF - Computational Linguistics

    SN - 0891-2017

    IS - 1

    ER -