Example-based machine translation (EBMT) is a method of machine translation often characterized by its use of a bilingual corpus with parallel texts as its main knowledge base at run-time. It is essentially a translation by analogy and can be viewed as an implementation of a case-based reasoning approach to machine learning.
Translation by analogy
At the foundation of example-based machine translation is the idea of translation by analogy. Applied to human translation, this idea rejects the view that people translate sentences by doing deep linguistic analysis. Instead, it holds that people translate by first decomposing a sentence into phrases, then translating these phrases, and finally composing the translated fragments into one long sentence. Each phrase is translated by analogy to previous translations. The principle of translation by analogy is encoded in example-based machine translation through the example translations that are used to train such a system.
Other approaches to machine translation, including statistical machine translation, also use bilingual corpora to learn the process of translation.
History
Example-based machine translation was first suggested by Makoto Nagao in 1984. He pointed out that it is especially well suited to translation between two very different languages, such as English and Japanese. In this case, one sentence can be translated into several well-structured sentences in the other language, so the deep linguistic analysis characteristic of rule-based machine translation is of little use.
Example
Example-based machine translation systems are trained from bilingual parallel corpora containing sentence pairs like the example in the table above, which pairs ''How much is that red umbrella?'' with ''Ano akai kasa wa ikura desu ka.'' and ''How much is that small camera?'' with ''Ano chiisai kamera wa ikura desu ka.'' Each sentence pair contains a sentence in one language and its translation into another. The example is a ''minimal pair'', meaning that the sentences vary by just one element. Such sentences make it simple to learn translations of portions of a sentence. For example, an example-based machine translation system would learn three units of translation from this example:
# ''How much is that'' X ''?'' corresponds to ''Ano X wa ikura desu ka.''
# ''red umbrella'' corresponds to ''akai kasa''
# ''small camera'' corresponds to ''chiisai kamera''
These units can be composed to produce novel translations. For example, a system trained on text containing the sentences ''President Kennedy was shot dead during the parade.'' and ''The convict escaped on July 15th.'' could translate the sentence ''The convict was shot dead during the parade.'' by substituting the appropriate parts of the sentences.
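The learning and substitution steps described above can be sketched in a few lines of Python. This is a minimal illustration, not a description of any particular EBMT system. The function names (`split_minimal_pair`, `learn`, `translate`) are hypothetical, the two sentence pairs are reconstructed from the three translation units listed above, and the code assumes whitespace-separated tokens and a minimal pair that differs in exactly one contiguous span.

```python
def split_minimal_pair(a, b, slot="X"):
    """Return (template, span_a, span_b) for two token lists that are
    assumed to differ in exactly one contiguous span."""
    limit = min(len(a), len(b))
    p = 0  # length of the shared prefix
    while p < limit and a[p] == b[p]:
        p += 1
    s = 0  # length of the shared suffix, not overlapping the prefix
    while s < limit - p and a[len(a) - 1 - s] == b[len(b) - 1 - s]:
        s += 1
    template = a[:p] + [slot] + a[len(a) - s:]
    return template, a[p:len(a) - s], b[p:len(b) - s]


def learn(pair_a, pair_b):
    """Derive a source template, a target template, and a fragment
    dictionary from two (source, target) sentence pairs."""
    src_t, src_a, src_b = split_minimal_pair(pair_a[0].split(), pair_b[0].split())
    tgt_t, tgt_a, tgt_b = split_minimal_pair(pair_a[1].split(), pair_b[1].split())
    fragments = {" ".join(src_a): " ".join(tgt_a),
                 " ".join(src_b): " ".join(tgt_b)}
    return src_t, tgt_t, fragments


def translate(sentence, src_t, tgt_t, fragments, slot="X"):
    """Match the sentence against the source template, translate the slot
    filler by lookup, and fill the target template. Returns None on failure."""
    tokens = sentence.split()
    i = src_t.index(slot)
    head, tail = src_t[:i], src_t[i + 1:]
    if len(tokens) < len(head) + len(tail):
        return None
    end = len(tokens) - len(tail)
    if tokens[:len(head)] != head or tokens[end:] != tail:
        return None
    filler = fragments.get(" ".join(tokens[len(head):end]))
    if filler is None:
        return None
    return " ".join(filler if t == slot else t for t in tgt_t)


# Sentence pairs reconstructed from the translation units in the text.
PAIR_A = ("How much is that red umbrella ?", "Ano akai kasa wa ikura desu ka .")
PAIR_B = ("How much is that small camera ?", "Ano chiisai kamera wa ikura desu ka .")

src_t, tgt_t, fragments = learn(PAIR_A, PAIR_B)
print(" ".join(src_t))  # How much is that X ?
print(" ".join(tgt_t))  # Ano X wa ikura desu ka .
print(fragments)
print(translate("How much is that small camera ?", src_t, tgt_t, fragments))
```

Any further fragment pair added to `fragments` can be slotted into the same template, which is the substitution step the paragraph above describes. A realistic system would also need word alignment and fuzzy matching, which this sketch omits.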
Phrasal verbs
Example-based machine translation is best suited for sub-language phenomena like phrasal verbs. Phrasal verbs have highly context-dependent meanings. They are common in English, where they comprise a verb followed by an adverb and/or a preposition, called the particle of the verb. Phrasal verbs produce specialized, context-specific meanings that may not be derivable from the meanings of their constituents. Word-for-word translation from the source to the target language is therefore almost always ambiguous.
As an example, consider the phrasal verb "put on" and its Hindustani translation. It may be used in any of the following ways:
*Ram put on the lights. (Switched on) (Hindustani translation: ''Jalana'')
*Ram put on a cap. (Wear) (Hindustani translation: ''Pahenna'')
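One way an example base could resolve this ambiguity is to pick the translation attached to the stored example that most resembles the input. The sketch below assumes token-set (Jaccard) overlap as the similarity measure, which is a simplification chosen for illustration and not a method attributed to any specific system. The names `EXAMPLES`, `jaccard`, and `translate_put_on` are hypothetical, and the only stored examples are the two sentences given above.

```python
# Example base: the two "put on" sentences from the text, each paired with
# the Hindustani verb given for that sense.
EXAMPLES = [
    ("Ram put on the lights", "Jalana"),
    ("Ram put on a cap", "Pahenna"),
]


def jaccard(a, b):
    """Token-set overlap between two token lists (0.0 when both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def translate_put_on(sentence, examples=EXAMPLES):
    """Return the verb translation of the stored example closest to the input."""
    tokens = sentence.lower().split()
    best = max(examples, key=lambda ex: jaccard(tokens, ex[0].lower().split()))
    return best[1]


# "Sita put on a cap" is a hypothetical input, not taken from the source.
print(translate_put_on("Sita put on a cap"))     # Pahenna
print(translate_put_on("He put on the lights"))  # Jalana
```

The surrounding words, rather than the phrasal verb itself, decide which sense is chosen, which is why a word-to-word dictionary lookup of "put" and "on" would not be enough.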
See also
* Programming by example
* Translation memory
* Natural language processing
External links
* Cunei: an open source platform for data-driven machine translation that grew out of research in EBMT but also includes recent advances from the SMT field