Interlingual machine translation is one of the classic approaches to machine translation. In this approach, the source language text, i.e. the text to be translated, is transformed into an interlingua, i.e. an abstract language-independent representation. The target language is then generated from the interlingua. Within the rule-based machine translation paradigm, the interlingual approach is an alternative to the direct approach and the transfer approach. In the direct approach, words are translated directly without passing through an additional representation. In the transfer approach, the source language is transformed into an abstract, less language-specific representation. Linguistic rules specific to the language pair then transform the source language representation into an abstract target language representation, and from this the target sentence is generated.

The interlingual approach to machine translation has advantages and disadvantages. The advantages are that it requires fewer components to relate each source language to each target language, it takes fewer components to add a new language, it supports paraphrases of the input in the original language, it allows both the analysers and the generators to be written by monolingual system developers, and it handles languages that are very different from each other (e.g. English and Arabic). The obvious disadvantage is that defining an interlingua is difficult, and perhaps even impossible, for a wider domain. The ideal context for interlingual machine translation is thus multilingual machine translation in a very specific domain.


History

The first ideas about interlingual machine translation appeared in the 17th century with Descartes and Leibniz, who came up with theories of how to create dictionaries using universal numerical codes. Others, such as Cave Beck, Athanasius Kircher and Johann Joachim Becher, worked on developing an unambiguous universal language based on the principles of logic and iconographs. In 1668, John Wilkins described his interlingua in his "Essay towards a Real Character and a Philosophical Language". In the 18th and 19th centuries many proposals for "universal" international languages were developed, the best known being Esperanto.

That said, the idea of a universal language did not appear in any of the first significant approaches to machine translation. Instead, work started on pairs of languages. However, during the 1950s and 60s, researchers in Cambridge headed by Margaret Masterman, in Leningrad headed by Nikolai Andreev, and in Milan by Silvio Ceccato started work in this area. The idea was discussed extensively by the Israeli philosopher Yehoshua Bar-Hillel in 1969.

During the 1970s, noteworthy research was done in Grenoble by researchers attempting to translate physics and mathematical texts from Russian to French, and in Texas a similar project (METAL) was ongoing for Russian to English. Early interlingual MT systems were also built at Stanford in the 1970s by Roger Schank and Yorick Wilks; the former became the basis of a commercial system for the transfer of funds, and the latter's code is preserved at The Computer Museum at Boston as the first interlingual machine translation system.

In the 1980s, interlingua-based approaches, and knowledge-based approaches to machine translation in general, gained renewed relevance, with much research going on in the field. The uniting idea in this research was that high-quality translation required abandoning the goal of total comprehension of the text. Instead, the translation should be based on linguistic knowledge and on the specific domain in which the system would be used. The most important research of this era was done in distributed language translation (DLT) in Utrecht, which worked with a modified version of Esperanto, and the Fujitsu system in Japan.


Outline

In this method of translation, the interlingua can be thought of as a way of describing the analysis of a text written in a source language such that it is possible to convert its morphological, syntactic, semantic (and even pragmatic) characteristics, that is, its "meaning", into a target language. This interlingua is able to describe all of the characteristics of all of the languages which are to be translated, instead of simply translating from one language to another.

Sometimes two interlinguas are used in translation. One of the two may cover more of the characteristics of the source language, while the other possesses more of the characteristics of the target language. The translation then proceeds in two stages, converting sentences from the first language into sentences progressively closer to the target language. The system may also be set up so that the second interlingua uses a more specific vocabulary that is closer to, or more aligned with, the target language, which could improve the translation quality.

This two-interlingua setup is based on the idea of using linguistic proximity to improve the quality of translation from a text in one original language into many other structurally similar languages, starting from only one original analysis. The same principle is used in pivot machine translation, where a natural language is used as a "bridge" between two more distant languages, for example when translating to English from Ukrainian using Russian as an intermediate language.

Bogdan Babych, Anthony Hartley, and Serge Sharoff (2007). "Translating from under-resourced languages: comparing direct transfer against pivot translation". Proceedings of MT Summit XI, 10–14 September 2007, Copenhagen, Denmark. pp. 29–35.

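The pivot idea described in this section can be illustrated with a minimal sketch. It assumes two toy word-for-word translators built from lookup tables; the names `make_word_translator` and `pivot` and the tiny lexicons are hypothetical, chosen only for illustration, and are not taken from any system mentioned in this article.

```python
from typing import Callable, Dict

Translator = Callable[[str], str]


def make_word_translator(table: Dict[str, str]) -> Translator:
    """Build a toy word-for-word translator from a lookup table (illustrative only)."""
    def translate(text: str) -> str:
        # Words missing from the table are passed through unchanged.
        return " ".join(table.get(word, word) for word in text.split())
    return translate


def pivot(source_to_pivot: Translator, pivot_to_target: Translator) -> Translator:
    """Compose two translators so that the pivot language acts as a bridge."""
    return lambda text: pivot_to_target(source_to_pivot(text))


# Hypothetical toy lexicons: Ukrainian -> Russian and Russian -> English.
uk_to_ru = make_word_translator({"кіт": "кот", "спить": "спит"})
ru_to_en = make_word_translator({"кот": "cat", "спит": "sleeps"})

# Ukrainian -> English is obtained without any direct Ukrainian-English table.
uk_to_en = pivot(uk_to_ru, ru_to_en)
print(uk_to_en("кіт спить"))  # cat sleeps
```

Real pivot systems chain full translation engines rather than word tables, so errors made in the first stage carry over into the second; the sketch only shows the composition itself.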

Translation process

In interlingual machine translation systems, there are two monolingual components: the analysis of the source language into the interlingua, and the generation of the target language from the interlingua. It is however necessary to distinguish between interlingual systems using only syntactic methods (for example the systems developed in the 1970s at the universities of Grenoble and Texas) and those based on artificial intelligence (from 1987 in Japan and the research at the universities of Southern California and Carnegie Mellon). The first type of system corresponds to that outlined in Figure 1, while the other type would be approximated by the diagram in Figure 4.

The following resources are necessary to an interlingual machine translation system:
* Dictionaries (or lexicons) for analysis and generation (specific to the domain and the languages involved).
* A conceptual lexicon (specific to the domain), which is the knowledge base about events and entities known in the domain.
* A set of projection rules (specific to the domain and the languages).
* Grammars for the analysis and generation of the languages involved.

One of the problems of knowledge-based machine translation systems is that it becomes infeasible to create databases for domains larger than very specific areas. Another is that processing these databases is computationally very expensive.
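
How these resources fit together can be sketched with a deliberately tiny example. Everything in it is an assumption made for illustration: the `Frame` structure standing in for the interlingua, the three-concept conceptual lexicon, the English and Spanish word lists, and the one-pattern "grammar" (article, noun, verb). It does not reproduce any of the systems named in this article.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    """Toy interlingua: a language-independent event with one agent."""
    event: str
    agent: str


# Conceptual lexicon (toy knowledge base): the concepts known in the domain.
CONCEPTS = {"SLEEP": "event", "CAT": "entity", "DOG": "entity"}

# Analysis lexicons: surface word -> concept, one per language.
ANALYSIS = {
    "en": {"cat": "CAT", "dog": "DOG", "sleeps": "SLEEP"},
    "es": {"gato": "CAT", "perro": "DOG", "duerme": "SLEEP"},
}

# Generation lexicons: concept -> surface word, derived by inverting the above.
GENERATION = {
    lang: {concept: word for word, concept in lexicon.items()}
    for lang, lexicon in ANALYSIS.items()
}

# Minimal "grammar": each language uses the pattern <article> <noun> <verb>.
ARTICLE = {"en": "the", "es": "el"}


def analyse(sentence: str, lang: str) -> Frame:
    """Analysis: source sentence -> interlingua frame (raises KeyError on unknown words)."""
    words = [w for w in sentence.lower().split() if w != ARTICLE[lang]]
    concepts = [ANALYSIS[lang][w] for w in words]
    # Projection rule: the entity fills the agent slot, the event fills the event slot.
    agent = next(c for c in concepts if CONCEPTS[c] == "entity")
    event = next(c for c in concepts if CONCEPTS[c] == "event")
    return Frame(event=event, agent=agent)


def generate(frame: Frame, lang: str) -> str:
    """Generation: interlingua frame -> target sentence."""
    lexicon = GENERATION[lang]
    return f"{ARTICLE[lang]} {lexicon[frame.agent]} {lexicon[frame.event]}"


def translate(sentence: str, source: str, target: str) -> str:
    return generate(analyse(sentence, source), target)


print(translate("the cat sleeps", "en", "es"))  # el gato duerme
```

Note that `analyse` only needs to know the source language and `generate` only the target language, which is the property that lets each component be written by monolingual developers.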


Efficacy

One of the main advantages of this strategy is that it provides an economical way to build multilingual translation systems. With an interlingua, it becomes unnecessary to build a translation component for each pair of languages in the system. So instead of creating n(n − 1) language pairs, where n is the number of languages in the system, it is only necessary to make 2n pairs between the n languages and the interlingua (one into and one out of the interlingua for each language).

The main disadvantage of this strategy is the difficulty of creating an adequate interlingua. It should be both abstract and independent of the source and target languages. The more languages are added to the translation system, and the more different they are, the more powerful the interlingua must be to cover all possible translation directions. Another problem is that it is difficult to extract meaning from texts in the original languages to create the intermediate representation.
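
The component counts above can be checked with a few lines of arithmetic. The function names below are hypothetical, introduced only for this illustration.

```python
def direct_pairs(n: int) -> int:
    """Directed language pairs needed without an interlingua: n(n - 1)."""
    return n * (n - 1)


def interlingua_pairs(n: int) -> int:
    """Pairs needed with an interlingua: one analyser and one generator per language."""
    return 2 * n


for n in (2, 3, 5, 10, 20):
    print(f"n={n:2d}  direct={direct_pairs(n):3d}  interlingua={interlingua_pairs(n):2d}")
```

The two counts are equal at n = 3 (six each); for any larger number of languages the interlingual design needs fewer components, and the gap grows quadratically (90 against 20 for ten languages).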


Existing interlingual machine translation systems


* Calliope-Aero
* Carabao Linguistic Virtual Machine
* Grammatical Framework
* Number Translator
* Google Translate uses English internally as a pivot language for some language pairs, such as Chinese and Japanese, and more generally for pairs that have "higher quality" neural-network translators with English but not with each other.


See also

* Intermediate representation
* Pivot language
* Universal Networking Language
* Knowledge representation and reasoning


External links

* Interlingua Methods
* Slides
* tp://ftp.umiacs.umd.edu/pub/bonnie/Interlingual-MT-Dorr-Hovy-Levin.pdf (Paper)