
Bitext word alignment, or simply word alignment, is the natural language processing task of identifying translation relationships among the words (or, more rarely, multiword units) in a
bitext, resulting in a bipartite graph between the two sides of the bitext, with an arc between two words if and only if they are translations of one another. Word alignment is typically done after
sentence alignment has already identified pairs of sentences that are translations of one another.
Bitext word alignment is an important supporting task for most methods of
statistical machine translation. The parameters of statistical machine translation models are typically estimated by observing word-aligned bitexts, and conversely automatic word alignment is typically done by choosing that alignment which best fits a statistical machine translation model. Circular application of these two ideas results in an instance of the
expectation-maximization algorithm.
This approach to training is an instance of
unsupervised learning, in that the system is not given examples of the kind of output desired, but is trying to find values for the unobserved model and alignments which best explain the observed bitext. Recent work has begun to explore supervised methods which rely on presenting the system with a (usually small) number of manually aligned sentences. In addition to the benefit of the additional information provided by supervision, these models are typically also able to more easily take advantage of combining many features of the data, such as context,
syntactic structure, part-of-speech, or translation lexicon information, which are difficult to integrate into the generative statistical models traditionally used.
Besides the training of machine translation systems, other applications of word alignment include
translation lexicon induction,
word sense discovery, word sense disambiguation, and the cross-lingual projection of linguistic information.
Training
IBM Models
The IBM models are used in statistical machine translation to train a translation model and an alignment model. They are an instance of the expectation–maximization algorithm: in the expectation step, the translation probabilities within each sentence are computed; in the maximization step, they are accumulated into global translation probabilities.
Features:
* IBM Model 1: lexical alignment probabilities
* IBM Model 2: absolute positions
* IBM Model 3: fertilities (supports insertions)
* IBM Model 4: relative positions
* IBM Model 5: fixes deficiencies (ensures that no two words can be aligned to the same position)
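The expectation–maximization loop described above can be sketched for the simplest case, IBM Model 1, which uses only lexical alignment probabilities. This is a minimal illustration, not a production implementation (real toolkits such as GIZA++ add a NULL source token, smoothing, and many refinements); the function name and toy bitext are invented for the example.

```python
from collections import defaultdict

def train_ibm_model1(bitext, iterations=5):
    """EM training of IBM Model 1 lexical translation probabilities t(f | e).

    bitext: list of (source_tokens, target_tokens) sentence pairs.
    """
    # Uniform (unnormalized) initialization; normalization happens in the E-step.
    t = defaultdict(lambda: 1.0)
    for _ in range(iterations):
        count = defaultdict(float)   # expected counts c(f, e)
        total = defaultdict(float)   # expected counts c(e)
        for src, tgt in bitext:
            for f in tgt:
                # E-step: per-sentence posterior of aligning f to each source word.
                z = sum(t[(f, e)] for e in src)
                for e in src:
                    c = t[(f, e)] / z
                    count[(f, e)] += c
                    total[e] += c
        # M-step: re-estimate global translation probabilities from the counts.
        t = defaultdict(float)
        for (f, e), c in count.items():
            t[(f, e)] = c / total[e]
    return t

# Classic toy bitext: co-occurrence statistics disambiguate "das" vs. "haus".
bitext = [(["das", "haus"], ["the", "house"]),
          (["das", "buch"], ["the", "book"]),
          (["ein", "buch"], ["a", "book"])]
t = train_ibm_model1(bitext)
```

After a few iterations the probability mass concentrates on the correct pairs, e.g. t("the" | "das") dominates t("house" | "das"), even though both words co-occur with "das".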
HMM
Vogel et al. (S. Vogel, H. Ney and C. Tillmann. 1996. HMM-based Word Alignment in Statistical Translation. In COLING '96: The 16th International Conference on Computational Linguistics, pp. 836–841, Copenhagen, Denmark) developed an approach featuring lexical translation probabilities and relative alignment by mapping the problem to a
hidden Markov model. The states and observations represent the source and target words, respectively. The transition probabilities model the alignment probabilities. In training, the translation and alignment probabilities can be obtained with the forward–backward algorithm.
Software
* GIZA++ (free software under GPL)
** The most widely used alignment toolkit, implementing the famous IBM models with a variety of improvements
* The Berkeley Word Aligner (free software under GPL)
** Another widely used aligner implementing alignment by agreement, and discriminative models for alignment
* Nile (free software under GPL)
** A supervised word aligner that is able to use syntactic information on the source and target side
* pialign (free software under the Common Public License)
** An aligner that aligns both words and phrases using Bayesian learning and inversion transduction grammars
* Natura Alignment Tools (NATools, free software under GPL)
* UNL aligner (free software under Creative Commons Attribution 3.0 Unported License)
* Geometric Mapping and Alignment (GMA) (free software under GPL)
* HunAlign (free software under LGPL-2.1)
* Anymalign (free software under GPL)