
Machine translation is use of computational techniques to
translate text or speech from one
language
Language is a structured system of communication that consists of grammar and vocabulary. It is the primary means by which humans convey meaning, both in spoken and signed language, signed forms, and may also be conveyed through writing syste ...
to another, including the contextual, idiomatic and pragmatic nuances of both languages.
Early approaches were mostly
rule-based or
statistical
Statistics (from German language, German: ', "description of a State (polity), state, a country") is the discipline that concerns the collection, organization, analysis, interpretation, and presentation of data. In applying statistics to a s ...
. These methods have since been superseded by
neural machine translation and
large language models
A large language model (LLM) is a language model trained with Self-supervised learning, self-supervised machine learning on a vast amount of text, designed for natural language processing tasks, especially Natural language generation, language g ...
.
History
Origins
The origins of machine translation can be traced back to the work of
Al-Kindi, a ninth-century Arabic
cryptographer
Cryptography, or cryptology (from "hidden, secret"; and ''graphein'', "to write", or '' -logia'', "study", respectively), is the practice and study of techniques for secure communication in the presence of adversarial behavior. More gen ...
who developed techniques for systemic language translation, including
cryptanalysis
Cryptanalysis (from the Greek ''kryptós'', "hidden", and ''analýein'', "to analyze") refers to the process of analyzing information systems in order to understand hidden aspects of the systems. Cryptanalysis is used to breach cryptographic se ...
,
frequency analysis, and
probability
Probability is a branch of mathematics and statistics concerning events and numerical descriptions of how likely they are to occur. The probability of an event is a number between 0 and 1; the larger the probability, the more likely an e ...
and
statistics
Statistics (from German language, German: ', "description of a State (polity), state, a country") is the discipline that concerns the collection, organization, analysis, interpretation, and presentation of data. In applying statistics to a s ...
, which are used in modern machine translation. The idea of machine translation later appeared in the 17th century. In 1629,
René Descartes
René Descartes ( , ; ; 31 March 1596 – 11 February 1650) was a French philosopher, scientist, and mathematician, widely considered a seminal figure in the emergence of modern philosophy and Modern science, science. Mathematics was paramou ...
proposed a universal language, with equivalent ideas in different tongues sharing one symbol.
The idea of using digital computers for translation of natural languages was proposed as early as 1947 by England's
A. D. Booth and
Warren Weaver at
Rockefeller Foundation
The Rockefeller Foundation is an American private foundation and philanthropic medical research and arts funding organization based at 420 Fifth Avenue, New York City. The foundation was created by Standard Oil magnate John D. Rockefeller (" ...
in the same year. "The memorandum written by
Warren Weaver in 1949 is perhaps the single most influential publication in the earliest days of machine translation." Others followed. A demonstration was made in 1954 on the
APEXC machine at
Birkbeck College (
University of London
The University of London (UoL; abbreviated as Lond or more rarely Londin in Post-nominal letters, post-nominals) is a collegiate university, federal Public university, public research university located in London, England, United Kingdom. The ...
) of a rudimentary translation of English into French. Several papers on the topic were published at the time, and even articles in popular journals (for example an article by Cleave and Zacharov in the September 1955 issue of ''
Wireless World
''Electronics World'' (''Wireless World'', founded in 1913, and in October 1983 renamed ''Electronics & Wireless World'') is a technical magazine published by Datateam Business Media Ltd that covers electronics and RF engineering and is aimed at ...
''). A similar application, also pioneered at Birkbeck College at the time, was reading and composing
Braille
Braille ( , ) is a Tactile alphabet, tactile writing system used by blindness, blind or visually impaired people. It can be read either on embossed paper or by using refreshable braille displays that connect to computers and smartphone device ...
texts by computer.
1950s
The first researcher in the field,
Yehoshua Bar-Hillel, began his research at MIT (1951). A
Georgetown University
Georgetown University is a private university, private Jesuit research university in Washington, D.C., United States. Founded by Bishop John Carroll (archbishop of Baltimore), John Carroll in 1789, it is the oldest Catholic higher education, Ca ...
MT research team, led by Professor Michael Zarechnak, followed (1951) with a public demonstration of its
Georgetown-IBM experiment system in 1954. MT research programs popped up in Japan and Russia (1955), and the first MT conference was held in London (1956).
David G. Hays "wrote about computer-assisted language processing as early as 1957" and "was project leader on computational linguistics
at
Rand from 1955 to 1968."
1960–1975
Researchers continued to join the field as the Association for Machine Translation and Computational Linguistics was formed in the U.S. (1962) and the National Academy of Sciences formed the Automatic Language Processing Advisory Committee (ALPAC) to study MT (1964). Real progress was much slower, however, and after the
ALPAC report (1966), which found that the ten-year-long research had failed to fulfill expectations, funding was greatly reduced.
According to a 1972 report by the Director of Defense Research and Engineering (DDR&E), the feasibility of large-scale MT was reestablished by the success of the Logos MT system in translating military manuals into Vietnamese during that conflict.
The French Textile Institute also used MT to translate abstracts from and into French, English, German and Spanish (1970); Brigham Young University started a project to translate Mormon texts by automated translation (1971).
1975 and beyond
SYSTRAN, which "pioneered the field under contracts from the U.S. government"
in the 1960s, was used by Xerox to translate technical manuals (1978). Beginning in the late 1980s, as
computation
A computation is any type of arithmetic or non-arithmetic calculation that is well-defined. Common examples of computation are mathematical equation solving and the execution of computer algorithms.
Mechanical or electronic devices (or, hist ...
al power increased and became less expensive, more interest was shown in
statistical models for machine translation. MT became more popular after the advent of computers. SYSTRAN's first implementation system was implemented in 1988 by the online service of the
French Postal Service called Minitel. Various computer based translation companies were also launched, including Trados (1984), which was the first to develop and market Translation Memory technology (1989), though this is not the same as MT. The first commercial MT system for Russian / English / German-Ukrainian was developed at Kharkov State University (1991).
By 1998, "for as little as $29.95" one could "buy a program for translating in one direction between English and a major European language of
your choice" to run on a PC.
[
MT on the web started with SYSTRAN offering free translation of small texts (1996) and then providing this via AltaVista Babelfish,][ which racked up 500,000 requests a day (1997). The second free translation service on the web was Lernout & Hauspie's GlobaLink.][ ''Atlantic Magazine'' wrote in 1998 that "Systran's Babelfish and GlobaLink's Comprende" handled
"Don't bank on it" with a "competent performance."
Franz Josef Och (the future head of Translation Development AT Google) won DARPA's speed MT competition (2003). More innovations during this time included MOSES, the open-source statistical MT engine (2007), a text/SMS translation service for mobiles in Japan (2008), and a mobile phone with built-in speech-to-speech translation functionality for English, Japanese and Chinese (2009). In 2012, Google announced that ]Google Translate
Google Translate is a multilingualism, multilingual neural machine translation, neural machine translation service developed by Google to translation, translate text, documents and websites from one language into another. It offers a web applic ...
translates roughly enough text to fill 1 million books in one day.
Approaches
Before the advent of deep learning
Deep learning is a subset of machine learning that focuses on utilizing multilayered neural networks to perform tasks such as classification, regression, and representation learning. The field takes inspiration from biological neuroscience a ...
methods, statistical methods required a lot of rules accompanied by morphological, syntactic, and semantic
Semantics is the study of linguistic Meaning (philosophy), meaning. It examines what meaning is, how words get their meaning, and how the meaning of a complex expression depends on its parts. Part of this process involves the distinction betwee ...
annotations.
Rule-based
The rule-based machine translation approach was used mostly in the creation of dictionaries
A dictionary is a listing of lexemes from the lexicon of one or more specific languages, often arranged Alphabetical order, alphabetically (or by Semitic root, consonantal root for Semitic languages or radical-and-stroke sorting, radical an ...
and grammar programs. Its biggest downfall was that everything had to be made explicit: orthographical variation and erroneous input must be made part of the source language analyser in order to cope with it, and lexical selection rules must be written for all instances of ambiguity.
Transfer-based machine translation
Transfer-based machine translation was similar to interlingual machine translation in that it created a translation from an intermediate representation that simulated the meaning of the original sentence. Unlike interlingual MT, it depended partially on the language pair involved in the translation.
Interlingual
Interlingual machine translation was one instance of rule-based machine-translation approaches. In this approach, the source language, i.e. the text to be translated, was transformed into an interlingual language, i.e. a "language neutral" representation that is independent of any language. The target language was then generated out of the interlingua
Interlingua (, ) is an international auxiliary language (IAL) developed between 1937 and 1951 by the American International Auxiliary Language Association (IALA). It is a constructed language of the "naturalistic" variety, whose vocabulary, ...
. The only interlingual machine translation system that was made operational at the commercial level was the KANT system (Nyberg and Mitamura, 1992), which was designed to translate Caterpillar Technical English (CTE) into other languages.
Dictionary-based
Machine translation used a method based on dictionary
A dictionary is a listing of lexemes from the lexicon of one or more specific languages, often arranged Alphabetical order, alphabetically (or by Semitic root, consonantal root for Semitic languages or radical-and-stroke sorting, radical an ...
entries, which means that the words were translated as they are by a dictionary.
Statistical
Statistical machine translation tried to generate translations using statistical methods based on bilingual text corpora, such as the Canadian Hansard corpus, the English-French record of the Canadian parliament and EUROPARL, the record of the European Parliament
The European Parliament (EP) is one of the two legislative bodies of the European Union and one of its seven institutions. Together with the Council of the European Union (known as the Council and informally as the Council of Ministers), it ...
. Where such corpora were available, good results were achieved translating similar texts, but such corpora were rare for many language pairs. The first statistical machine translation software was CANDIDE
( , ) is a French satire written by Voltaire, a philosopher of the Age of Enlightenment, first published in 1759. The novella has been widely translated, with English versions titled ''Candide: or, All for the Best'' (1759); ''Candide: or, The ...
from IBM
International Business Machines Corporation (using the trademark IBM), nicknamed Big Blue, is an American Multinational corporation, multinational technology company headquartered in Armonk, New York, and present in over 175 countries. It is ...
. In 2005, Google improved its internal translation capabilities by using approximately 200 billion words from United Nations materials to train their system; translation accuracy improved.
SMT's biggest downfall included it being dependent upon huge amounts of parallel texts, its problems with morphology-rich languages (especially with translating ''into'' such languages), and its inability to correct singleton errors.
Some work has been done in the utilization of multiparallel corpora, that is a body of text that has been translated into 3 or more languages. Using these methods, a text that has been translated into 2 or more languages may be utilized in combination to provide a more accurate translation into a third language compared with if just one of those source languages were used alone.
Neural MT
A deep learning
Deep learning is a subset of machine learning that focuses on utilizing multilayered neural networks to perform tasks such as classification, regression, and representation learning. The field takes inspiration from biological neuroscience a ...
-based approach to MT, neural machine translation has made rapid progress in recent years. However, the current consensus is that the so-called human parity achieved is not real, being based wholly on limited domains, language pairs, and certain test benchmarks i.e., it lacks statistical significance power.
Translations by neural MT tools like DeepL Translator, which is thought to usually deliver the best machine translation results as of 2022, typically still need post-editing by a human.
Instead of training specialized translation models on parallel datasets, one can also directly prompt generative large language models like GPT to translate a text. This approach is considered promising, but is still more resource-intensive than specialized translation models.
Issues
Studies using human evaluation (e.g. by professional literary translators or human readers) have systematically identified various issues with the latest advanced MT outputs. Common issues include the translation of ambiguous parts whose correct translation requires common sense-like semantic language processing or context. There can also be errors in the source texts, missing high-quality training data and the severity of frequency of several types of problems may not get reduced with techniques used to date, requiring some level of human active participation.
Disambiguation
Word-sense disambiguation concerns finding a suitable translation when a word can have more than one meaning. The problem was first raised in the 1950s by Yehoshua Bar-Hillel. He pointed out that without a "universal encyclopedia", a machine would never be able to distinguish between the two meanings of a word. Today there are numerous approaches designed to overcome this problem. They can be approximately divided into "shallow" approaches and "deep" approaches.
Shallow approaches assume no knowledge of the text. They simply apply statistical methods to the words surrounding the ambiguous word. Deep approaches presume a comprehensive knowledge of the word. So far, shallow approaches have been more successful.
Claude Piron, a long-time translator for the United Nations and the World Health Organization
The World Health Organization (WHO) is a list of specialized agencies of the United Nations, specialized agency of the United Nations which coordinates responses to international public health issues and emergencies. It is headquartered in Gen ...
, wrote that machine translation, at its best, automates the easier part of a translator's job; the harder and more time-consuming part usually involves doing extensive research to resolve ambiguities in the source text, which the grammatical and lexical exigencies of the target language require to be resolved:
The ideal deep approach would require the translation software to do all the research necessary for this kind of disambiguation on its own; but this would require a higher degree of AI than has yet been attained. A shallow approach which simply guessed at the sense of the ambiguous English phrase that Piron mentions (based, perhaps, on which kind of prisoner-of-war camp is more often mentioned in a given corpus) would have a reasonable chance of guessing wrong fairly often. A shallow approach that involves "ask the user about each ambiguity" would, by Piron's estimate, only automate about 25% of a professional translator's job, leaving the harder 75% still to be done by a human.
Non-standard speech
One of the major pitfalls of MT is its inability to translate non-standard language with the same accuracy as standard language. Heuristic or statistical based MT takes input from various sources in standard form of a language. Rule-based translation, by nature, does not include common non-standard usages. This causes errors in translation from a vernacular source or into colloquial language. Limitations on translation from casual speech present issues in the use of machine translation in mobile devices.
Named entities
In information extraction, named entities, in a narrow sense, refer to concrete or abstract entities in the real world such as people, organizations, companies, and places that have a proper name: George Washington, Chicago, Microsoft. It also refers to expressions of time, space and quantity such as 1 July 2011, $500.
In the sentence "Smith is the president of Fabrionix" both ''Smith'' and ''Fabrionix'' are named entities, and can be further qualified via first name or other information; "president" is not, since Smith could have earlier held another position at Fabrionix, e.g. Vice President.
The term rigid designator is what defines these usages for analysis in statistical machine translation.
Named entities must first be identified in the text; if not, they may be erroneously translated as common nouns, which would most likely not affect the BLEU rating of the translation but would change the text's human readability. They may be omitted from the output translation, which would also have implications for the text's readability and message.
Transliteration
Transliteration is a type of conversion of a text from one script to another that involves swapping letters (thus '' trans-'' + '' liter-'') in predictable ways, such as Greek → and → the digraph , Cyrillic → , Armenian → or L ...
includes finding the letters in the target language that most closely correspond to the name in the source language. This, however, has been cited as sometimes worsening the quality of translation. For "Southern California" the first word should be translated directly, while the second word should be transliterated. Machines often transliterate both because they treated them as one entity. Words like these are hard for machine translators, even those with a transliteration component, to process.
Use of a "do-not-translate" list, which has the same end goal – transliteration as opposed to translation. still relies on correct identification of named entities.
A third approach is a class-based model. Named entities are replaced with a token to represent their "class"; "Ted" and "Erica" would both be replaced with "person" class token. Then the statistical distribution and use of person names, in general, can be analyzed instead of looking at the distributions of "Ted" and "Erica" individually, so that the probability of a given name in a specific language will not affect the assigned probability of a translation. A study by Stanford on improving this area of translation gives the examples that different probabilities will be assigned to "David is going for a walk" and "Ankit is going for a walk" for English as a target language due to the different number of occurrences for each name in the training data. A frustrating outcome of the same study by Stanford (and other attempts to improve named recognition translation) is that many times, a decrease in the BLEU scores for translation will result from the inclusion of methods for named entity translation.
Applications
While no system provides the ideal of fully automatic high-quality machine translation of unrestricted text, many fully automated systems produce reasonable output. The quality of machine translation is substantially improved if the domain is restricted and controlled. This enables using machine translation as a tool to speed up and simplify translations, as well as producing flawed but useful low-cost or ad-hoc translations.
Travel
Machine translation applications have also been released for most mobile devices, including mobile telephones, pocket PCs, PDAs, etc. Due to their portability, such instruments have come to be designated as mobile translation tools enabling mobile business networking between partners speaking different languages, or facilitating both foreign language learning and unaccompanied traveling to foreign countries without the need of the intermediation of a human translator.
For example, the Google Translate app allows foreigners to quickly translate text in their surrounding via augmented reality
Augmented reality (AR), also known as mixed reality (MR), is a technology that overlays real-time 3D computer graphics, 3D-rendered computer graphics onto a portion of the real world through a display, such as a handheld device or head-mounted ...
using the smartphone camera that overlays the translated text onto the text. It can also recognize speech and then translate it.
Public administration
Despite their inherent limitations, MT programs are used around the world. Probably the largest institutional user is the European Commission
The European Commission (EC) is the primary Executive (government), executive arm of the European Union (EU). It operates as a cabinet government, with a number of European Commissioner, members of the Commission (directorial system, informall ...
. In 2012, with an aim to replace a rule-based MT by newer, statistical-based MT@EC, The European Commission contributed 3.072 million euros (via its ISA programme).
Wikipedia
Machine translation has also been used for translating Wikipedia
Wikipedia is a free content, free Online content, online encyclopedia that is written and maintained by a community of volunteers, known as Wikipedians, through open collaboration and the wiki software MediaWiki. Founded by Jimmy Wales and La ...
articles and could play a larger role in creating, updating, expanding, and generally improving articles in the future, especially as the MT capabilities may improve. There is a "content translation tool" which allows editors to more easily translate articles across several select languages. English-language articles are thought to usually be more comprehensive and less biased than their non-translated equivalents in other languages. As of 2022, English Wikipedia
The English Wikipedia is the primary English-language edition of Wikipedia, an online encyclopedia. It was created by Jimmy Wales and Larry Sanger on 15 January 2001, as Wikipedia's first edition.
English Wikipedia is hosted alongside o ...
has over 6.5 million articles while, for example, the German and Swedish Wikipedias each only have over 2.5 million articles, each often far less comprehensive.
Surveillance and military
Following terrorist attacks in Western countries, including 9-11, the U.S. and its allies have been most interested in developing Arabic machine translation programs, but also in translating Pashto
Pashto ( , ; , ) is an eastern Iranian language in the Indo-European language family, natively spoken in northwestern Pakistan and southern and eastern Afghanistan. It has official status in Afghanistan and the Pakistani province of Khyb ...
and Dari languages. Within these languages, the focus is on key phrases and quick communication between military members and civilians through the use of mobile phone apps. The Information Processing Technology Office in DARPA
The Defense Advanced Research Projects Agency (DARPA) is a research and development agency of the United States Department of Defense responsible for the development of emerging technologies for use by the military. Originally known as the Adva ...
hosted programs like TIDES
Tides are the rise and fall of sea levels caused by the combined effects of the gravitational forces exerted by the Moon (and to a much lesser extent, the Sun) and are also caused by the Earth and Moon orbiting one another.
Tide tables ...
and Babylon translator. US Air Force has awarded a $1 million contract to develop a language translation technology.
Social media
The notable rise of social networking on the web in recent years has created yet another niche for the application of machine translation software – in utilities such as Facebook
Facebook is a social media and social networking service owned by the American technology conglomerate Meta Platforms, Meta. Created in 2004 by Mark Zuckerberg with four other Harvard College students and roommates, Eduardo Saverin, Andre ...
, or instant messaging
Instant messaging (IM) technology is a type of synchronous computer-mediated communication involving the immediate ( real-time) transmission of messages between two or more parties over the Internet or another computer network. Originally involv ...
clients such as Skype
Skype () was a proprietary telecommunications application operated by Skype Technologies, a division of Microsoft, best known for IP-based videotelephony, videoconferencing and voice calls. It also had instant messaging, file transfer, ...
, Google Talk, MSN Messenger
MSN Messenger (also known colloquially simply as MSN), later rebranded as Windows Live Messenger, was a Cross-platform software, cross-platform instant messaging client, instant-messaging client developed by Microsoft. It connected to the now-di ...
, etc. – allowing users speaking different languages to communicate with each other.
Online games
Lineage W gained popularity in Japan because of its machine translation features allowing players from different countries to communicate.
Medicine
Despite being labelled as an unworthy competitor to human translation in 1966 by the Automated Language Processing Advisory Committee put together by the United States government, the quality of machine translation has now been improved to such levels that its application in online collaboration and in the medical field are being investigated. The application of this technology in medical settings where human translators are absent is another topic of research, but difficulties arise due to the importance of accurate translations in medical diagnoses.
Researchers caution that the use of machine translation in medicine could risk mistranslations that can be dangerous in critical situations. Machine translation can make it easier for doctors to communicate with their patients in day to day activities, but it is recommended to only use machine translation when there is no other alternative, and that translated medical texts should be reviewed by human translators for accuracy.
Law
Legal language poses a significant challenge to machine translation tools due to its precise nature and atypical use of normal words. For this reason, specialized algorithms have been developed for use in legal contexts. Due to the risk of mistranslations arising from machine translators, researchers recommend that machine translations should be reviewed by human translators for accuracy, and some courts prohibit its use in formal proceedings.
The use of machine translation in law has raised concerns about translation errors and client confidentiality. Lawyers who use free translation tools such as Google Translate may accidentally violate client confidentiality by exposing private information to the providers of the translation tools. In addition, there have been arguments that consent for a police search that is obtained with machine translation is invalid, with different courts issuing different verdicts over whether or not these arguments are valid.
Ancient languages
The advancements in convolutional neural networks in recent years and in low resource machine translation (when only a very limited amount of data and examples are available for training) enabled machine translation for ancient languages, such as Akkadian and its dialects Babylonian and Assyrian.
Evaluation
There are many factors that affect how machine translation systems are evaluated. These factors include the intended use of the translation, the nature of the machine translation software, and the nature of the translation process.
Different programs may work well for different purposes. For example, statistical machine translation (SMT) typically outperforms example-based machine translation (EBMT), but researchers found that when evaluating English to French translation, EBMT performs better. The same concept applies for technical documents, which can be more easily translated by SMT because of their formal language.
In certain applications, however, e.g., product descriptions written in a controlled language, a dictionary-based machine-translation system has produced satisfactory translations that require no human intervention save for quality inspection.
There are various means for evaluating the output quality of machine translation systems. The oldest is the use of human judges to assess a translation's quality. Even though human evaluation is time-consuming, it is still the most reliable method to compare different systems such as rule-based and statistical systems. Automated means of evaluation include BLEU, NIST
The National Institute of Standards and Technology (NIST) is an agency of the United States Department of Commerce whose mission is to promote American innovation and industrial competitiveness. NIST's activities are organized into physical s ...
, METEOR
A meteor, known colloquially as a shooting star, is a glowing streak of a small body (usually meteoroid) going through Earth's atmosphere, after being heated to incandescence by collisions with air molecules in the upper atmosphere,
creating a ...
, and LEPOR.
Relying exclusively on unedited machine translation ignores the fact that communication in human language
Language is a structured system of communication that consists of grammar and vocabulary. It is the primary means by which humans convey meaning, both in spoken and signed language, signed forms, and may also be conveyed through writing syste ...
is context-embedded and that it takes a person to comprehend the context of the original text with a reasonable degree of probability. It is certainly true that even purely human-generated translations are prone to error. Therefore, to ensure that a machine-generated translation will be useful to a human being and that publishable-quality translation is achieved, such translations must be reviewed and edited by a human. The late Claude Piron wrote that machine translation, at its best, automates the easier part of a translator's job; the harder and more time-consuming part usually involves doing extensive research to resolve ambiguities in the source text, which the grammatical and lexical exigencies of the target language require to be resolved. Such research is a necessary prelude to the pre-editing necessary in order to provide input for machine-translation software such that the output will not be meaningless.[See th]
annually performed NIST tests since 2001
and Bilingual Evaluation Understudy
In addition to disambiguation problems, decreased accuracy can occur due to varying levels of training data for machine translating programs. Both example-based and statistical machine translation rely on a vast array of real example sentences as a base for translation, and when too many or too few sentences are analyzed accuracy is jeopardized. Researchers found that when a program is trained on 203,529 sentence pairings, accuracy actually decreases. The optimal level of training data seems to be just over 100,000 sentences, possibly because as training data increases, the number of possible sentences increases, making it harder to find an exact translation match.
Flaws in machine translation have been noted for their entertainment value. Two videos uploaded to YouTube
YouTube is an American social media and online video sharing platform owned by Google. YouTube was founded on February 14, 2005, by Steve Chen, Chad Hurley, and Jawed Karim who were three former employees of PayPal. Headquartered in ...
in April 2017 involve two Japanese hiragana
is a Japanese language, Japanese syllabary, part of the Japanese writing system, along with ''katakana'' as well as ''kanji''.
It is a phonetic lettering system. The word ''hiragana'' means "common" or "plain" kana (originally also "easy", ...
characters えぐ ('' e'' and '' gu'') being repeatedly pasted into Google Translate, with the resulting translations quickly degrading into nonsensical phrases such as "DECEARING EGG" and "Deep-sea squeeze trees", which are then read in increasingly absurd voices; the full-length version of the video currently has 6.9 million views
Machine translation and signed languages
In the early 2000s, options for machine translation between spoken and signed languages were severely limited. It was a common belief that deaf individuals could use traditional translators. However, stress, intonation, pitch, and timing are conveyed much differently in spoken languages compared to signed languages. Therefore, a deaf individual may misinterpret or become confused about the meaning of written text that is based on a spoken language.[Zhao, L., Kipper, K., Schuler, W., Vogler, C., & Palmer, M. (2000)]
A Machine Translation System from English to American Sign Language
. Lecture Notes in Computer Science, 1934: 54–67.
Researchers Zhao, et al. (2000), developed a prototype called TEAM (translation from English to ASL by machine) that completed English to American Sign Language
American Sign Language (ASL) is a natural language that serves as the predominant sign language of Deaf communities in the United States and most of Anglophone Canadians, Anglophone Canada. ASL is a complete and organized visual language that i ...
(ASL) translations. The program would first analyze the syntactic, grammatical, and morphological aspects of the English text. Following this step, the program accessed a sign synthesizer, which acted as a dictionary for ASL. This synthesizer housed the process one must follow to complete ASL signs, as well as the meanings of these signs. Once the entire text is analyzed and the signs necessary to complete the translation are located in the synthesizer, a computer generated human appeared and would use ASL to sign the English text to the user.
Copyright
Only works that are original are subject to copyright
A copyright is a type of intellectual property that gives its owner the exclusive legal right to copy, distribute, adapt, display, and perform a creative work, usually for a limited time. The creative work may be in a literary, artistic, ...
protection, so some scholars claim that machine translation results are not entitled to copyright protection because MT does not involve creativity
Creativity is the ability to form novel and valuable Idea, ideas or works using one's imagination. Products of creativity may be intangible (e.g. an idea, scientific theory, Literature, literary work, musical composition, or joke), or a physica ...
. The copyright at issue is for a derivative work
In copyright law, a derivative work is an expressive creation that includes major copyrightable elements of a first, previously created original work (the underlying work). The derivative work becomes a second, separate work independent from ...
; the author of the original work in the original language does not lose his rights
Rights are law, legal, social, or ethics, ethical principles of freedom or Entitlement (fair division), entitlement; that is, rights are the fundamental normative rules about what is allowed of people or owed to people according to some legal sy ...
when a work is translated: a translator must have permission to publish a translation.
See also
*AI-complete
In the field of artificial intelligence (AI), tasks that are hypothesized to require artificial general intelligence to solve are informally known as AI-complete or AI-hard.Shapiro, Stuart C. (1992)Artificial Intelligence In Stuart C. Shapiro (Ed. ...
* Cache language model
* Comparison of machine translation applications
* Comparison of different machine translation approaches
*Computational linguistics
Computational linguistics is an interdisciplinary field concerned with the computational modelling of natural language, as well as the study of appropriate computational approaches to linguistic questions. In general, computational linguistics ...
* Computer-assisted translation and Translation memory
* Controlled language in machine translation
*Controlled natural language
Controlled natural languages (CNLs) are subsets of natural languages that are obtained by restricting the grammar and vocabulary in order to reduce or eliminate ambiguity and complexity. Traditionally, controlled languages fall into two major types ...
* Foreign language writing aid
* Fuzzy matching (computer-assisted translation)
* History of machine translation
* Human language technology
* Humour in translation ("howlers")
* Language and Communication Technologies
* Language barrier
*List of emerging technologies
This is a list of emerging technologies, which are emerging technologies, in-development technical innovations that have significant potential in their applications. The criteria for this list is that the technology must:
# Exist in some way; ...
* List of research laboratories for machine translation
* Mobile translation
* Neural machine translation
* OpenLogos
* Phraselator
* Postediting
* Pseudo-translation
* Round-trip translation
* Statistical machine translation
*
* Translation memory
* ULTRA (machine translation system)
* Universal Networking Language
* Universal translator
Notes
Further reading
*
*
*
*
External links
The Advantages and Disadvantages of Machine Translation
International Association for Machine Translation (IAMT)
Machine Translation Archive
by John Hutchins. An electronic repository (and bibliography) of articles, books and papers in the field of machine translation and computer-based translation technology
Machine translation (computer-based translation)
– Publications by John Hutchins (includes PDFs of several books on machine translation)
Machine Translation and Minority Languages
Slator News & analysis of the latest developments in machine translation
From Classroom to Real World: How Machine Translation is Changing the Landscape of Foreign Language Learning
{{Authority control
Applications of artificial intelligence
Computational linguistics
Computer-assisted translation
Tasks of natural language processing
Automation software