Multi-align: Combining linguistic and statistical techniques to improve alignments for adaptable MT

Necip Fazil Ayan, Bonnie J. Dorr, Nizar Habash

Research output: Contribution to journal › Article

Abstract

An adaptable statistical or hybrid MT system relies heavily on the quality of word-level alignments of real-world data. Statistical alignment approaches provide a reasonable initial estimate for word alignment. However, they cannot handle certain types of linguistic phenomena such as long-distance dependencies and structural differences between languages. We address this issue in Multi-Align, a new framework for incremental testing of different alignment algorithms and their combinations. Our design allows users to tune their systems to the properties of a particular genre/domain while still benefiting from general linguistic knowledge associated with a language pair. We demonstrate that a combination of statistical and linguistically-informed alignments can resolve translation divergences during the alignment process.

Original language: English (US)
Pages (from-to): 17-26
Number of pages: 10
Journal: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 3265
State: Published - Dec 1 2004

Fingerprint

Linguistics
Alignment
Hybrid systems
Resolve
Divergence
Testing
Estimate
Demonstrate
Language

ASJC Scopus subject areas

  • Theoretical Computer Science
  • Computer Science (all)

Cite this

@article{643114d278674ec29432a6b1cc405480,
title = "Multi-align: Combining linguistic and statistical techniques to improve alignments for adaptable MT",
abstract = "An adaptable statistical or hybrid MT system relies heavily on the quality of word-level alignments of real-world data. Statistical alignment approaches provide a reasonable initial estimate for word alignment. However, they cannot handle certain types of linguistic phenomena such as long-distance dependencies and structural differences between languages. We address this issue in Multi-Align, a new framework for incremental testing of different alignment algorithms and their combinations. Our design allows users to tune their systems to the properties of a particular genre/domain while still benefiting from general linguistic knowledge associated with a language pair. We demonstrate that a combination of statistical and linguistically-informed alignments can resolve translation divergences during the alignment process.",
author = "Ayan, {Necip Fazil} and Dorr, {Bonnie J.} and Habash, Nizar",
year = "2004",
month = "12",
day = "1",
language = "English (US)",
volume = "3265",
pages = "17--26",
journal = "Lecture Notes in Computer Science",
issn = "0302-9743",
publisher = "Springer Verlag",
}

TY - JOUR

T1 - Multi-align: Combining linguistic and statistical techniques to improve alignments for adaptable MT

AU - Ayan, Necip Fazil

AU - Dorr, Bonnie J.

AU - Habash, Nizar

PY - 2004/12/1

Y1 - 2004/12/1

N2 - An adaptable statistical or hybrid MT system relies heavily on the quality of word-level alignments of real-world data. Statistical alignment approaches provide a reasonable initial estimate for word alignment. However, they cannot handle certain types of linguistic phenomena such as long-distance dependencies and structural differences between languages. We address this issue in Multi-Align, a new framework for incremental testing of different alignment algorithms and their combinations. Our design allows users to tune their systems to the properties of a particular genre/domain while still benefiting from general linguistic knowledge associated with a language pair. We demonstrate that a combination of statistical and linguistically-informed alignments can resolve translation divergences during the alignment process.

UR - http://www.scopus.com/inward/record.url?scp=35048842336&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=35048842336&partnerID=8YFLogxK

M3 - Article

AN - SCOPUS:35048842336

VL - 3265

SP - 17

EP - 26

JO - Lecture Notes in Computer Science

JF - Lecture Notes in Computer Science

SN - 0302-9743

ER -