


Romanization or romanisation, in linguistics, is the conversion of writing from a different writing system to the Roman (Latin) script, or a system for doing so. Methods of romanization include transliteration, for representing written text, and transcription, for representing the spoken word, and combinations of both. Transcription methods can be subdivided into ''phonemic transcription'', which records the phonemes (the units of sound that distinguish meaning) in speech, and stricter ''phonetic transcription'', which records speech sounds with precision.


Methods

There are many consistent or standardized romanization systems. They can be classified by their characteristics. A particular system's characteristics may make it better suited for various, sometimes contradictory, applications, including document retrieval, linguistic analysis, easy readability, and faithful representation of pronunciation.
* Source, or donor language – A system may be tailored to romanize text from a particular language or a series of languages, or from any language written in a particular writing system. A language-specific system typically preserves language features like pronunciation, while a general one may be better for cataloguing international texts.
* Target, or receiver language – Most systems are intended for an audience that speaks or reads a particular language. (So-called ''international'' romanization systems for Cyrillic text are based on central-European alphabets like the Czech and Croatian alphabets.)
* Simplicity – Since the basic Latin alphabet has fewer letters than many other writing systems, digraphs, diacritics, or special characters must be used to represent all of their letters in Latin script. This affects the ease of creation, digital storage and transmission, reproduction, and reading of the romanized text.
* Reversibility – Whether or not the original can be restored from the converted text. Some reversible systems allow for an irreversible simplified version.
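The tension between simplicity and reversibility can be made concrete. The following is a minimal sketch, assuming a plain character-by-character substitution table: if every Latin output is distinct and no output is a prefix of another (a prefix-free set), the romanized string can be decoded unambiguously back to the source. The three-letter Cyrillic tables below are hypothetical toy examples, not any published standard; they show how a digraph such as ''sh'' can break reversibility where a single diacritic letter such as ''š'' does not.

```python
# Hypothetical toy tables for three Cyrillic letters (с, х, ш); not a real standard.
DIGRAPH = {"с": "s", "х": "h", "ш": "sh"}       # "sh" also arises from с + х
DIACRITIC = {"с": "s", "х": "h", "ш": "\u0161"}  # precomposed š (U+0161)


def romanize(text: str, table: dict[str, str]) -> str:
    """Replace each source character by its table entry; pass others through."""
    return "".join(table.get(ch, ch) for ch in text)


def is_prefix_free(codes) -> bool:
    """True if the codes are distinct and none is a prefix of another.

    After sorting, a code that is a prefix of some other code is also a
    prefix of its immediate successor, so checking neighbours suffices.
    """
    ordered = sorted(codes)
    return all(not b.startswith(a) for a, b in zip(ordered, ordered[1:]))


def restore(latin: str, table: dict[str, str]) -> str:
    """Invert a prefix-free table: at most one code can match at each position."""
    inverse = {code: src for src, code in table.items()}
    out, i = [], 0
    while i < len(latin):
        for code, src in inverse.items():
            if code and latin.startswith(code, i):
                out.append(src)
                i += len(code)
                break
        else:
            out.append(latin[i])  # character not produced by the table: copy as-is
            i += 1
    return "".join(out)


print(is_prefix_free(DIGRAPH.values()))                 # False: "s" is a prefix of "sh"
print(romanize("ш", DIGRAPH), romanize("сх", DIGRAPH))  # sh sh
print(is_prefix_free(DIACRITIC.values()))               # True
print(restore(romanize("сшх", DIACRITIC), DIACRITIC))   # сшх
```

The prefix-free test is a sufficient condition for unambiguous decoding, not a necessary one; real systems sometimes use separators (such as an apostrophe or hyphen) instead to break up ambiguous sequences.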


Transliteration

If the romanization attempts to transliterate the original script, the guiding principle is a one-to-one mapping of characters in the source language into the target script, with less emphasis on how the result sounds when pronounced according to the reader's language. For example, the Nihon-shiki romanization of Japanese allows the informed reader to reconstruct the original Japanese kana syllables with 100% accuracy, but requires additional knowledge for correct pronunciation.
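The difference between one-to-one transliteration and pronunciation-oriented transcription shows up in just a handful of kana. The sketch below uses deliberately partial tables of seven kana with their Nihon-shiki and modern Hepburn spellings: Nihon-shiki gives every kana its own spelling, so the table can be inverted, while Hepburn spells じ and ぢ both as ''ji'' and ず and づ both as ''zu'', so the original kana cannot always be recovered from the romanized form.

```python
# Partial tables (seven kana only), for illustration.
NIHON_SHIKI = {"し": "si", "ち": "ti", "つ": "tu",
               "じ": "zi", "ぢ": "di", "ず": "zu", "づ": "du"}
HEPBURN = {"し": "shi", "ち": "chi", "つ": "tsu",
           "じ": "ji", "ぢ": "ji", "ず": "zu", "づ": "zu"}


def collisions(table: dict[str, str]) -> dict[str, list[str]]:
    """Group kana by their romanization and keep only groups with a clash."""
    by_latin: dict[str, list[str]] = {}
    for kana, latin in table.items():
        by_latin.setdefault(latin, []).append(kana)
    return {latin: kana for latin, kana in by_latin.items() if len(kana) > 1}


print(collisions(NIHON_SHIKI))  # {}: every kana can be reconstructed
print(collisions(HEPBURN))      # {'ji': ['じ', 'ぢ'], 'zu': ['ず', 'づ']}
```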


Transcription


Phonemic

Most romanizations are intended to enable the casual reader who is unfamiliar with the original script to pronounce the source language reasonably accurately. Such romanizations follow the principle of phonemic transcription and attempt to render the significant sounds (phonemes) of the original as faithfully as possible in the target language. The popular Hepburn romanization of Japanese is an example of a transcriptive romanization designed for English speakers.


Phonetic

A phonetic conversion goes one step further and attempts to depict all phones in the source language, sacrificing legibility if necessary by using characters or conventions not found in the target script. In practice, such a representation almost never tries to represent ''every'' possible allophone, especially those that occur naturally due to coarticulation effects, and instead limits itself to the most significant allophonic distinctions. The International Phonetic Alphabet is the most common system of phonetic transcription.


Trade-offs

For most language pairs, building a usable romanization involves trade-offs between the two extremes. Pure transcriptions are generally not possible, as the source language usually contains sounds and distinctions that are not found in the target language but must be shown for the romanized form to be comprehensible. Furthermore, because of diachronic and synchronic variance, no written language represents any spoken language with perfect accuracy, and the vocal interpretation of a script may vary greatly among languages. In modern times the chain of transcription is usually: spoken foreign language, written foreign language, written native language, spoken (read) native language. Reducing the number of those steps, i.e. removing one or both of the writing steps, usually leads to more accurate oral articulation. In general, outside a limited audience of scholars, romanizations tend to lean more towards transcription. As an example, consider the Japanese martial art 柔術: the Nihon-shiki romanization ''zyûzyutu'' may allow someone who knows Japanese to reconstruct the kana syllables じゅうじゅつ, but most native English speakers, or rather readers, would find it easier to guess the pronunciation from the Hepburn version, ''jūjutsu''.
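The 柔術 example can be reproduced with a small sketch. It assumes a tokenizer that tries the longest kana sequence first (so that じゅ is read as one syllable) and a single, simplified long-vowel rule covering only this case: a う following a syllable that ends in ''u'' lengthens that vowel, written ''û'' in Nihon-shiki and ''ū'' in Hepburn. The tables cover only the kana in じゅうじゅつ and are not complete implementations of either system.

```python
# Partial tables covering only the kana in じゅうじゅつ (柔術).
NIHON_SHIKI = {"じゅ": "zyu", "つ": "tu", "う": "u"}
HEPBURN = {"じゅ": "ju", "つ": "tsu", "う": "u"}
LONG_U = {"nihon-shiki": "\u00fb", "hepburn": "\u016b"}  # û, ū


def tokenize(kana: str, table: dict[str, str]) -> list[str]:
    """Split kana into table entries, trying longer entries first."""
    keys = sorted(table, key=len, reverse=True)
    tokens, i = [], 0
    while i < len(kana):
        match = next((k for k in keys if kana.startswith(k, i)), None)
        if match is None:
            raise ValueError(f"no table entry for {kana[i]!r}")
        tokens.append(match)
        i += len(match)
    return tokens


def romanize(kana: str, table: dict[str, str], long_u: str) -> str:
    out: list[str] = []
    for token in tokenize(kana, table):
        if token == "う" and out and out[-1].endswith("u"):
            out[-1] = out[-1][:-1] + long_u  # simplified long-vowel rule
        else:
            out.append(table[token])
    return "".join(out)


word = "じゅうじゅつ"
print(romanize(word, NIHON_SHIKI, LONG_U["nihon-shiki"]))  # zyûzyutu
print(romanize(word, HEPBURN, LONG_U["hepburn"]))          # jūjutsu
```

The same kana sequence yields both spellings; only the tables differ. Nihon-shiki keeps the kana structure visible (''zy-'', ''-tu''), while Hepburn chooses letters that suggest the English-reader pronunciation (''j-'', ''-tsu'').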


Romanization of specific writing systems



Arabic

The Arabic alphabet is used to write Arabic, Persian, Urdu and Pashto, as well as numerous other languages in the Muslim world, particularly African and Asian languages without alphabets of their own. Romanization standards include the following:
* Deutsche Morgenländische Gesellschaft (1936): Adopted by the International Convention of Orientalist Scholars in Rome. It is the basis for the very influential Hans Wehr dictionary.
* BS 4280 (1968): Developed by the British Standards Institution.
* SATTS (1970s): A one-for-one substitution system, a legacy from the Morse code era.
* UNGEGN (1972)
* DIN 31635 (1982): Developed by the Deutsches Institut für Normung (German Institute for Standardization).
* ISO 233 (1984): Transliteration.
* Qalam (1985): A system that focuses on preserving the spelling rather than the pronunciation, and uses mixed case.
* ISO 233-2 (1993): Simplified transliteration.
* Buckwalter transliteration (1990s): Developed at Xerox by Tim Buckwalter; does not require unusual diacritics.
* ALA-LC (1997)
* Arabic chat alphabet
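Buckwalter transliteration illustrates the one-for-one approach: each Arabic letter is replaced by a single ASCII character, using capitals and punctuation where the plain Latin letters run out, so the mapping can be reversed exactly. The sketch below is a minimal illustration under stated assumptions: the table is partial (the consonants plus alif, taa marbuta and alif maqsura only; hamza forms, vowel marks and other signs are left out), and the values follow the scheme as commonly documented, so check them against the full published table before relying on them.

```python
# Partial Buckwalter table (sketch): Arabic letters U+0627..U+063A and
# U+0641..U+064A, i.e. the consonants plus alif (A), taa marbuta (p) and
# alif maqsura (Y). Hamza forms, vowel marks and other signs are omitted.
LETTERS = [chr(c) for c in range(0x0627, 0x063B)] + [chr(c) for c in range(0x0641, 0x064B)]
CODES = "AbptvjHxd*rzs$SDTZEg" + "fqklmnhwYy"
assert len(LETTERS) == len(CODES) == 30

TO_LATIN = dict(zip(LETTERS, CODES))
TO_ARABIC = {latin: arabic for arabic, latin in TO_LATIN.items()}


def to_buckwalter(text: str) -> str:
    """One ASCII character per Arabic letter; other characters pass through."""
    return "".join(TO_LATIN.get(ch, ch) for ch in text)


def from_buckwalter(text: str) -> str:
    """Exact inverse, for text that uses only the table's letters."""
    return "".join(TO_ARABIC.get(ch, ch) for ch in text)


word = "\u0643\u062a\u0627\u0628"  # كتاب ("book")
print(to_buckwalter(word))                           # ktAb
print(from_buckwalter(to_buckwalter(word)) == word)  # True
```

Because every code is a single distinct character, decoding needs no lookahead; the cost is that capital letters and symbols such as ''$'' and ''*'' carry meaning, so the output is less readable than a diacritic-based scheme.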


Persian


Armenian


Georgian


Greek

There are romanization systems for both Modern and Ancient Greek.
* ALA-LC
* Beta Code
* Greeklish
* ISO 843 (1997)


Hebrew

The Hebrew alphabet is romanized using several standards:
* ANSI Z39.25 (1975)
* UNGEGN (1977)
* ISO 259 (1984): Transliteration.
* ISO 259-2 (1994): Simplified transliteration.
* ISO/DIS 259-3: Phonemic transcription.
* ALA-LC


Indic (Brahmic) scripts

The Brahmic family of abugidas is used for languages of the Indian subcontinent and Southeast Asia. There is a long tradition in the West of studying Sanskrit and other Indic texts in Latin transliteration. Various transliteration conventions have been used for Indic scripts since the time of Sir William Jones.
* ISO 15919 (2001): A standard transliteration convention that uses diacritics to map the much larger set of Brahmic consonants and vowels to the Latin script. See also Transliteration of Indic scripts: how to use ISO 15919. The Devanagari-specific portion is very similar to the academic standard, IAST ("International Alphabet of Sanskrit Transliteration"), and to the United States Library of Congress standard, ALA-LC, although there are a few differences.
* The National Library at Kolkata romanization, intended for the romanization of all Indic scripts, is an extension of IAST.
* Harvard-Kyoto: Uses upper and lower case and doubling of letters to avoid the use of diacritics and to restrict the range to 7-bit ASCII.
* ITRANS: A transliteration scheme into 7-bit ASCII created by Avinash Chopde that used to be prevalent on Usenet.
* ISCII (1988)
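Harvard-Kyoto's way of avoiding diacritics can be shown as a direct substitution from IAST. The following is a minimal sketch, assuming IAST input and the commonly used Harvard-Kyoto values (long vowels and retroflex consonants as capitals, ''ś'' as ''z'', ''ṝ'' as the doubled ''RR''); it converts only the letters that carry diacritics and does not handle the reverse direction. Because capital letters carry meaning in Harvard-Kyoto (''D'' is ''ḍ'', not ''d''), the input is lower-cased first, and it is normalized to precomposed Unicode (NFC) so that letters typed with combining marks still match the table.

```python
import unicodedata

# IAST letters with diacritics and their Harvard-Kyoto (HK) spellings.
# Letters without diacritics (a, k, t, ...) are written the same in both.
IAST_TO_HK = {
    "ā": "A", "ī": "I", "ū": "U",
    "ṛ": "R", "ṝ": "RR", "ḷ": "lR", "ḹ": "lRR",
    "ṃ": "M", "ḥ": "H",
    "ṅ": "G", "ñ": "J",
    "ṭ": "T", "ḍ": "D", "ṇ": "N",
    "ś": "z", "ṣ": "S",
}
# Store keys in precomposed (NFC) form so lookups match NFC-normalized input.
IAST_TO_HK = {unicodedata.normalize("NFC", k): v for k, v in IAST_TO_HK.items()}


def iast_to_hk(text: str) -> str:
    """Convert IAST to Harvard-Kyoto, one precomposed character at a time."""
    # HK capitals are distinct letters, so lower-case the IAST input first.
    text = unicodedata.normalize("NFC", text.lower())
    return "".join(IAST_TO_HK.get(ch, ch) for ch in text)


print(iast_to_hk("saṃskṛtam"))  # saMskRtam
print(iast_to_hk("Rāmāyaṇa"))   # rAmAyaNa
```

Doubling, as in ''RR'' for ''ṝ'', keeps the scheme within 7-bit ASCII but means the reverse conversion has to read ahead to decide between ''ṝ'' and two consecutive ''ṛ''.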


Devanagari–nastaʿlīq (Hindustani)

Hindustani is an Indo-Aryan language with extreme digraphia and diglossia resulting from the Hindi–Urdu controversy, which began in the 1800s. Technically, Hindustani itself is recognized by neither the language community nor any government. Two standardized registers, Standard Hindi and Standard Urdu, are recognized as official languages in India and Pakistan. In practice, however, the situation is as follows:
* In Pakistan: Standard (Saaf or Khaalis) Urdu is the "high" variety, whereas Hindustani is the "low" variety used by the masses (called Urdu and written in nastaʿlīq script).
* In India: both Standard (Shuddh) Hindi and Standard (Saaf or Khaalis) Urdu are "high" varieties (written in Devanagari and nastaʿlīq respectively), whereas Hindustani is the "low" variety used by the masses and written in either Devanagari or nastaʿlīq (and called 'Hindi' or 'Urdu' respectively).
The digraphia renders any work in either script largely inaccessible to users of the other script, even though the spoken language is otherwise mutually intelligible, effectively ruling out text-based open-source collaboration between Devanagari and nastaʿlīq readers. Initiated in 2011, the Hamari Boli Initiative is a full-scale open-source language planning initiative aimed at reforming and modernizing Hindustani script, style, status and lexicon. One of the primary stated objectives of Hamari Boli is to relieve Hindustani of the crippling Devanagari–nastaʿlīq digraphia by way of romanization.


Chinese

Romanization of the Sinitic languages, particularly Mandarin, has proved a very difficult problem, and the issue is further complicated by political considerations. Because of this, many romanization tables contain Chinese characters plus one or more romanizations or Zhuyin.


Mandarin

* ALA-LC: Formerly similar to Wade–Giles, but converted to Hanyu Pinyin in 2000.
* EFEO: Developed by the École française d'Extrême-Orient in the 19th century; used mainly in France.
* Latinxua Sin Wenz (1926): Did not mark tones. Used mainly in the Soviet Union and Xinjiang in the 1930s. Predecessor of Hanyu Pinyin.
* Lessing-Othmer: Used mainly in Germany.
* Postal romanization (1906): Early standard for international addresses.
* Wade–Giles (1892): Transliteration. Very popular from the 19th century until recently, and continues to be used by some Western academics.
* Yale (1942): Created by the U.S. for battlefield communication and used in the influential Yale textbooks.
* Legge romanization: Created by James Legge, a Scottish missionary.


=Mainland China=
* Hanyu Pinyin (1958): In mainland China, Hanyu Pinyin has been used officially to romanize Mandarin for decades, primarily as a linguistic tool for teaching the standardized language. The system is also used in other Chinese-speaking areas such as Singapore and parts of Taiwan, and much of the international community has adopted it as the standard for writing Chinese words and names in the Latin script. Hanyu Pinyin is valuable in Chinese education because China, like any other region of comparable size and population, has numerous distinct dialects but only one common written language and one standardized spoken form. (This reasoning applies to romanization in general.)
* ISO 7098 (1991): Based on Hanyu Pinyin.
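Hanyu Pinyin writes Mandarin's four tones as diacritics over a vowel (ā, á, ǎ, à), whereas Latinxua Sin Wenz did not mark tones at all. Where those letters are hard to type, pinyin is commonly entered with tone numbers instead (''zhong1guo2''). The Python sketch below converts numbered pinyin into the diacritic form using the usual placement rule: the mark goes on ''a'' or ''e'' if either is present, on the ''o'' of ''ou'', and otherwise on the last vowel. The function names are hypothetical, ''v'' and ''u:'' are assumed as ASCII stand-ins for ''ü'', and the sketch does not check that a syllable actually exists in Mandarin.

```python
import re
import unicodedata

# Combining diacritics for tones 1-4: macron, acute, caron, grave.
# Tone 5 (or 0) is the neutral tone and receives no mark.
TONE_MARKS = {"1": "\u0304", "2": "\u0301", "3": "\u030c", "4": "\u0300"}
VOWELS = "aeiouü"


def mark_syllable(syllable: str, tone: str) -> str:
    """Attach a tone diacritic to one numbered-pinyin syllable, e.g. ("hao", "3") -> "hǎo"."""
    # Assumed ASCII stand-ins for ü.
    syllable = syllable.replace("u:", "ü").replace("v", "ü")
    mark = TONE_MARKS.get(tone)
    lower = syllable.lower()
    vowel_positions = [i for i, ch in enumerate(lower) if ch in VOWELS]
    if mark is None or not vowel_positions:
        return syllable
    if "a" in lower:
        idx = lower.index("a")          # a always takes the mark
    elif "e" in lower:
        idx = lower.index("e")          # otherwise e does
    elif "ou" in lower:
        idx = lower.index("ou")         # in "ou" the o takes it
    else:
        idx = vowel_positions[-1]       # otherwise the last vowel (guì, liù)
    marked = syllable[: idx + 1] + mark + syllable[idx + 1 :]
    # NFC composes base letter + combining mark into one code point, e.g. "ō".
    return unicodedata.normalize("NFC", marked)


def numbered_to_marked(text: str) -> str:
    """Convert numbered pinyin such as "Zhong1guo2" to "Zhōngguó"."""
    return re.sub(
        r"([A-Za-zü:]+?)([0-5])",
        lambda m: mark_syllable(m.group(1), m.group(2)),
        text,
    )


print(numbered_to_marked("Zhong1guo2"))  # Zhōngguó
print(numbered_to_marked("lv4 se4"))     # lǜ sè
```

Composing to NFC keeps the output comparable with text typed on ordinary pinyin keyboards, which produce precomposed letters.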


=Taiwan=
# Gwoyeu Romatzyh (GR, 1928–1986; in Taiwan 1945–1986; Taiwan used Japanese rōmaji before 1945)
# Mandarin Phonetic Symbols II (MPS II, 1986–2002)
# Tongyong Pinyin (2002–2008)
# Hanyu Pinyin (since January 1, 2009)


=Singapore=


Cantonese

* Barnett–Chao
* Guangdong (1960)
* Hong Kong Government
* Jyutping
* Meyer–Wempe
* Sidney Lau
* Yale (1942)
* Cantonese Pinyin


Min Nan or Hokkien

* Pe̍h-ōe-jī (POJ), used since the late 19th century and once the ''de facto'' official script of the Presbyterian Church in Taiwan. Technically, it is a largely phonemic transcription system, since Min Nan was not commonly written in Chinese characters.
** ''See also Comparison of Hokkien writing systems.''


=Teochew=
* Guangdong (1960), for the distinct Teochew variety.


Min Dong

* Foochow Romanized


Min Bei

* Kienning Colloquial Romanized


Japanese

Romanization (or, more generally, Roman letters) is called ''rōmaji'' in Japanese. The most common systems are:
* Hepburn (1867): phonetic transcription following Anglo-American conventions; used in geographical names
* Nihon-shiki (1885): transliteration. Also adopted as ISO 3602 Strict in 1989.
* Kunrei-shiki (1937): phonemic transcription. Also adopted as ISO 3602.
* JSL (1987): phonemic transcription. Named after the book ''Japanese: The Spoken Language'' by Eleanor Jorden.
* ALA-LC: similar to Modified Hepburn
* Wāpuro ("word processor romanization"): transliteration. Not strictly a system, but a collection of common practices that enables the input of Japanese text.
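The difference between Hepburn's transcription and the grid-based Kunrei-shiki and Nihon-shiki shows up in a handful of kana. Hepburn writes し, ち, つ, ふ and じ as ''shi'', ''chi'', ''tsu'', ''fu'' and ''ji'' to match English-speakers' pronunciation. Kunrei-shiki writes ''si'', ''ti'', ''tu'', ''hu'' and ''zi''. Nihon-shiki additionally keeps ぢ, づ and を apart as ''di'', ''du'' and ''wo''. The Python sketch below romanizes kana one by one under each system. It is an illustration only: the tables are a small hypothetical subset, and long vowels, small kana (っ, ゃ), particles and ん before vowels are not handled.

```python
# Kana spelled the same way in Hepburn, Kunrei-shiki and Nihon-shiki
# (a small illustrative subset, not a full syllabary).
COMMON = {
    "あ": "a", "い": "i", "う": "u", "か": "ka", "き": "ki", "こ": "ko",
    "さ": "sa", "す": "su", "ず": "zu", "た": "ta", "と": "to", "な": "na",
    "は": "ha", "ま": "ma", "み": "mi", "ん": "n",
}

# Kana whose spelling depends on the system.
DIFFERING = {
    "し": {"hepburn": "shi", "kunrei": "si", "nihon": "si"},
    "ち": {"hepburn": "chi", "kunrei": "ti", "nihon": "ti"},
    "つ": {"hepburn": "tsu", "kunrei": "tu", "nihon": "tu"},
    "ふ": {"hepburn": "fu", "kunrei": "hu", "nihon": "hu"},
    "じ": {"hepburn": "ji", "kunrei": "zi", "nihon": "zi"},
    "ぢ": {"hepburn": "ji", "kunrei": "zi", "nihon": "di"},
    "づ": {"hepburn": "zu", "kunrei": "zu", "nihon": "du"},
    "を": {"hepburn": "o", "kunrei": "o", "nihon": "wo"},
}


def romanize(kana: str, system: str) -> str:
    """Romanize hiragana kana by kana in 'hepburn', 'kunrei' or 'nihon' style."""
    if system not in ("hepburn", "kunrei", "nihon"):
        raise ValueError(f"unknown system: {system}")
    parts = []
    for ch in kana:
        if ch in DIFFERING:
            parts.append(DIFFERING[ch][system])
        elif ch in COMMON:
            parts.append(COMMON[ch])
        else:
            raise KeyError(f"kana not in this sketch's table: {ch}")
    return "".join(parts)


for word in ("ふじさん", "つなみ", "はなぢ"):
    print(word, [romanize(word, s) for s in ("hepburn", "kunrei", "nihon")])
```

Run as written, this prints ''fujisan'' / ''huzisan'' / ''huzisan'', ''tsunami'' / ''tunami'' / ''tunami'' and ''hanaji'' / ''hanazi'' / ''hanadi''. Only はなぢ separates Kunrei-shiki from Nihon-shiki.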


Korean

While romanization has taken various and at times seemingly unstructured forms, some sets of rules do exist:
* McCune–Reischauer (MR; 1937): the first transcription to gain some acceptance. A slightly modified version of MR was the official system for Korean in South Korea from 1984 to 2000, and another modification is still the official system in North Korea. MR uses breves, apostrophes and diaereses, the latter two indicating orthographic syllable boundaries where they would otherwise be ambiguous.
What is called MR may in many cases be any of a number of systems that differ from each other and from the original MR mostly in whether word endings are separated from the stem by a space, by a hyphen or, as in McCune and Reischauer's system, not at all; and, if a hyphen or space is used, whether sound change is reflected in the stem's last consonant letter and the ending's first (e.g. ''pur-i'' vs. ''pul-i''). Although mostly irrelevant when transcribing uninflected words, these variations are so widespread that a mention of "McCune–Reischauer romanization" does not necessarily refer to the original system as published in the 1930s.
** One example is the ALA-LC / U.S. Library of Congress system, which is based on MR but deviates from it in several ways. Word division is addressed in detail, with a generous use of spaces to separate word endings from stems that is not seen in MR. Syllables of given names are always separated with a hyphen, which MR expressly never does. Sound changes are ignored more often than in MR. The system distinguishes between ‘ and ’.
Several problems with MR led to the development of newer systems:
* Yale (1942): This system has become the established standard romanization of Korean among linguists. Vowel length in old or dialectal pronunciation is indicated by a macron. Where they would otherwise be ambiguous, orthographic syllable boundaries are indicated with a period. The system also indicates consonants that have disappeared from a word's South Korean orthography and standard pronunciation.
* Revised Romanization of Korean (RR; 2000): Includes rules for both transcription and transliteration. South Korea officially adopted this system in 2000. Road signs and textbooks were required to follow the rules as soon as possible, at a cost the government estimated at no less than US$20 million. All road signs and the names of railway and subway stations on line maps and signs have since been changed. In some cases, notably in the romanization of personal names and of existing companies, the change has been ignored or grandfathered. RR is generally similar to MR, but it uses no diacritics or apostrophes, and it uses distinct letters for ㅌ/ㄷ (t/d), ㅋ/ㄱ (k/g), ㅊ/ㅈ (ch/j) and ㅍ/ㅂ (p/b). Orthographic syllable boundaries were meant to be marked with a hyphen where ambiguous, but this is applied inconsistently in practice.
* ISO/TR 11941 (1996): This is actually two different standards under one name: one for North Korea (DPRK) and one for South Korea (ROK). The initial submission to the ISO was based heavily on Yale and was a joint effort between the two states, but they could not agree on the final draft.
* Lukoff romanization: developed by Fred Lukoff in 1945–47 for his ''Spoken Korean'' coursebooks.
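RR assigns fixed Latin letters to each Hangul letter (jamo), so a romanizer mainly needs to split every syllable block into its jamo. Unicode makes this arithmetic: the 11,172 precomposed syllables are stored in order, with code point U+AC00 + (initial × 21 + vowel) × 28 + final. The Python sketch below uses that formula with deliberately simplified RR tables. Initial ㄹ is ''r'' and final ㄹ is ''l''. Final consonants are reduced to a single representative sound. The sound-change rules that apply across syllables are ignored entirely, so 신라 comes out as ''sinra'' rather than the official ''Silla''. All names are hypothetical.

```python
# Revised Romanization letters listed in Unicode jamo order.
# Simplified sketch: no sound-change rules between syllables.
INITIALS = ["g", "kk", "n", "d", "tt", "r", "m", "b", "pp", "s", "ss",
            "", "j", "jj", "ch", "k", "t", "p", "h"]
MEDIALS = ["a", "ae", "ya", "yae", "eo", "e", "yeo", "ye", "o", "wa", "wae",
           "oe", "yo", "u", "wo", "we", "wi", "yu", "eu", "ui", "i"]
# Final consonants reduced to a representative sound; index 0 means "no final".
FINALS = ["", "k", "k", "k", "n", "n", "n", "t", "l", "k", "m", "l", "l", "l",
          "p", "l", "m", "p", "p", "t", "t", "ng", "t", "t", "k", "t", "p", "t"]

S_BASE = 0xAC00
V_COUNT, T_COUNT = 21, 28
S_COUNT = len(INITIALS) * V_COUNT * T_COUNT  # 11,172 precomposed syllables


def romanize_hangul(text: str) -> str:
    """Romanize precomposed Hangul syllables; other characters pass through."""
    out = []
    for ch in text:
        s_index = ord(ch) - S_BASE
        if 0 <= s_index < S_COUNT:
            l_index = s_index // (V_COUNT * T_COUNT)              # initial consonant
            v_index = (s_index % (V_COUNT * T_COUNT)) // T_COUNT  # vowel
            t_index = s_index % T_COUNT                           # final consonant
            out.append(INITIALS[l_index] + MEDIALS[v_index] + FINALS[t_index])
        else:
            out.append(ch)
    return "".join(out)


print(romanize_hangul("한국 서울"))  # hanguk seoul
print(romanize_hangul("신라"))       # sinra (official RR: Silla)
```

The separate initial and final tables reflect RR's choice to spell a consonant by how it sounds in its position, for example ㄱ as ''g'' before a vowel and ''k'' at the end of a syllable.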


Philippine languages

Almost all languages of the Philippines (including Tagalog, Ilokano, the Bicol languages, Cebuano and other Visayan languages, Kapampangan, and the Spanish-based creole Chavacano) use the Modern Filipino Alphabet. When Spain colonised the Philippines in the late 16th century, the numerous languages of the Philippines were written in various scripts, such as Baybayin. These scripts were initially promoted by the colonists but were later replaced by Spanish transcriptions, which are still evident in place names and surnames. Letters such as C, Ll and Ñ were regarded as Hispanic additions and were removed in the ''Abakada'', an attempt at a more indigenous alphabet devised by Lope K. Santos in 1940. The Abakada was eventually superseded by the Pilipino Alphabet and then by the 28-letter Modern Filipino Alphabet, which adds Ñ and the native Ng to the standard 26-letter Latin alphabet. While Spanish itself has a highly phonemic spelling, the romanized spelling created for Philippine languages is even more so. For example, Spanish ''caballo'' ([kaˈβa.ʎo], "horse") is written ''kabayo'' in Tagalog, a spelling that also reflects yeísmo in the pronunciation of the Spanish ''ll'' digraph.


Thai

Thai, spoken in Thailand and some areas of Laos, Burma and China, is written in its own script, which belongs to the Brahmic family and probably descends from a mixture of Tai–Laotian and Old Khmer.
* Royal Thai General System of Transcription
* ISO 11940 (1998): transliteration
* ISO 11940-2 (2007): transcription
* ALA-LC


Cyrillic

In English-language library catalogues, bibliographies and most academic publications, the Library of Congress transliteration method is used worldwide. In linguistics, scientific transliteration is used for both the Cyrillic and Glagolitic alphabets. This applies to Old Church Slavonic as well as to the modern Slavic languages that use these alphabets.


Belarusian

* BGN/PCGN romanization of Belarusian, 1979 (United States Board on Geographic Names and Permanent Committee on Geographical Names for British Official Use)
* Scientific transliteration, or the ''International Scholarly System'' for linguistics
* ALA-LC romanization, 1997 (American Library Association and Library of Congress)
* ISO 9:1995
* ''Instruction on transliteration of Belarusian geographical names with letters of Latin script'', 2000


Bulgarian

A system based on scientific transliteration and ISO/R 9:1968 was regarded as official in Bulgaria from the 1970s. Since the late 1990s, Bulgarian authorities have switched to the so-called Streamlined System, which avoids diacritics and is optimized for compatibility with English. A law passed in 2009 made this system mandatory for public use. Where the old system uses <č, š, ž, št, c, j, ă>, the new system uses <ch, sh, zh, sht, ts, y, a>. The new Bulgarian system was also endorsed for official use by the UN in 2012, and by BGN and PCGN in 2013.


Kyrgyz


Macedonian


Russian

There is no single universally accepted system for writing Russian in the Latin script; in fact, there are a great many such systems. Some are adapted to a particular target language (e.g. German or French), some are designed as librarians' transliterations, some are prescribed for Russian travellers' passports, and for some names the transcription is purely traditional. As a result, many names have numerous competing spellings. For example, the name of the Russian composer Tchaikovsky may also be written as ''Tchaykovsky'', ''Tchajkovskij'', ''Tchaikowski'', ''Tschaikowski'', ''Czajkowski'', ''Čajkovskij'', ''Čajkovski'', ''Chajkovskij'', ''Çaykovski'', ''Chaykovsky'', ''Chaykovskiy'', ''Chaikovski'', ''Tshaikovski'', ''Tšaikovski'', ''Tsjajkovskij'', etc. Systems include:
* BGN/PCGN (1947): Transliteration system (United States Board on Geographic Names & Permanent Committee on Geographical Names for British Official Use).
* GOST 16876-71 (1971): A now defunct Soviet transliteration standard, replaced by GOST 7.79, which is equivalent to ISO 9.
* United Nations romanization system for geographical names (1987): Based on GOST 16876-71.
* ISO 9 (1995): Transliteration. From the International Organization for Standardization.
* ALA-LC (1997)
* "Volapuk" encoding (1990s): A slang term (the result is not actually Volapük) for a writing method that is not truly a transliteration but serves similar goals (see the main article).
* Conventional English transliteration: based on BGN/PCGN, but does not follow any particular standard. Described in detail at Romanization of Russian.
* Streamlined System for the romanization of Russian.
* Comparative transliteration of Russian in different languages (Western European, Arabic, Georgian, Braille, Morse)
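Several of the Tchaikovsky spellings above are simply the outputs of different letter tables applied to the same Cyrillic word. The Python sketch below applies two of the listed systems character by character: ISO 9 and a simplified BGN/PCGN. The BGN/PCGN table is a labelled simplification. It ignores the rule that е and ё become ''ye'' and ''yë'' at the start of a word and after vowels, й, ъ or ь, and it omits the optional separator dot. The function and table names are hypothetical.

```python
# Russian alphabet (lowercase) in standard order: 33 letters.
CYRILLIC = "абвгдеёжзийклмнопрстуфхцчшщъыьэюя"
ISO9 = ["a", "b", "v", "g", "d", "e", "ë", "ž", "z", "i", "j", "k", "l", "m", "n", "o",
        "p", "r", "s", "t", "u", "f", "h", "c", "č", "š", "ŝ", "ʺ", "y", "ʹ", "è", "û", "â"]
# Simplified BGN/PCGN: position-dependent ye/yë rules are NOT implemented.
BGN_PCGN = ["a", "b", "v", "g", "d", "e", "ë", "zh", "z", "i", "y", "k", "l", "m", "n", "o",
            "p", "r", "s", "t", "u", "f", "kh", "ts", "ch", "sh", "shch", "”", "y", "’", "e",
            "yu", "ya"]
assert len(CYRILLIC) == len(ISO9) == len(BGN_PCGN) == 33

TABLES = {
    "ISO 9": dict(zip(CYRILLIC, ISO9)),
    "BGN/PCGN": dict(zip(CYRILLIC, BGN_PCGN)),
}


def transliterate(text: str, table: dict) -> str:
    """Replace each Russian letter using `table`; keep an initial capital."""
    out = []
    for ch in text:
        latin = table.get(ch.lower())
        if latin is None:
            out.append(ch)                                # not a Russian letter
        elif ch.isupper():
            out.append(latin[:1].upper() + latin[1:])     # Ч -> Č or Ch
        else:
            out.append(latin)
    return "".join(out)


print(transliterate("Чайковский", TABLES["ISO 9"]))     # Čajkovskij
print(transliterate("Чайковский", TABLES["BGN/PCGN"]))  # Chaykovskiy
```

The two tables reflect different goals. ISO 9 maps every Cyrillic letter to exactly one Latin character, which keeps it reversible. BGN/PCGN prefers digraphs such as ''zh'' and ''shch'' that English readers can pronounce without diacritics.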


Syriac

The Latin script for Syriac was developed in the 1930s under the Soviet Union's state policy for minority languages, and some material was published in it.


Ukrainian

The 2010 Ukrainian National system was adopted by UNGEGN in 2012 and by BGN/PCGN in 2020. It is also very close to the modified (simplified) ALA-LC system, which has remained unchanged since 1941.
* ALA-LC
* ISO 9
* Ukrainian National transliteration
* Ukrainian National and BGN/PCGN systems, at the UN Working Group on Romanization Systems
* Thomas T. Pedersen's comparison of five systems
''See also:'' Ukrainian Latin alphabet
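Unlike a purely letter-for-letter table, the 2010 Ukrainian National system depends on position. Є, ї, й, ю and я are written ''ye'', ''yi'', ''y'', ''yu'' and ''ya'' at the start of a word, and ''ie'', ''i'', ''i'', ''iu'' and ''ia'' elsewhere. Г is ''h'', except in the cluster зг, which is written ''zgh'' to keep it distinct from ж (''zh''). The soft sign and the apostrophe are not transliterated. The Python sketch below applies those rules. The names are hypothetical, and capitalization handling covers only an initial capital, not all-caps words.

```python
# 2010 Ukrainian National system, lowercase letters (context rules handled below).
BASE = {
    "а": "a", "б": "b", "в": "v", "г": "h", "ґ": "g", "д": "d", "е": "e",
    "є": "ie", "ж": "zh", "з": "z", "и": "y", "і": "i", "ї": "i", "й": "i",
    "к": "k", "л": "l", "м": "m", "н": "n", "о": "o", "п": "p", "р": "r",
    "с": "s", "т": "t", "у": "u", "ф": "f", "х": "kh", "ц": "ts", "ч": "ch",
    "ш": "sh", "щ": "shch", "ь": "", "ю": "iu", "я": "ia",
}
# Spellings used only at the beginning of a word.
WORD_INITIAL = {"є": "ye", "ї": "yi", "й": "y", "ю": "yu", "я": "ya"}
APOSTROPHES = "'’ʼ"  # apostrophe variants; none of them is transliterated


def transliterate_uk(text: str) -> str:
    out = []
    prev = ""  # previous source character ("" at the start of the text)
    for ch in text:
        low = ch.lower()
        if ch in APOSTROPHES:
            out.append("")  # dropped, but still counts as "inside a word"
        elif low in BASE:
            inside_word = prev != "" and (prev.isalpha() or prev in APOSTROPHES)
            if not inside_word and low in WORD_INITIAL:
                latin = WORD_INITIAL[low]
            elif low == "г" and prev.lower() == "з":
                latin = "gh"  # зг -> zgh
            else:
                latin = BASE[low]
            if ch.isupper():
                latin = latin[:1].upper() + latin[1:]
            out.append(latin)
        else:
            out.append(ch)  # spaces, digits, punctuation
        prev = ch
    return "".join(out)


print(transliterate_uk("Київ"))       # Kyiv
print(transliterate_uk("Запоріжжя"))  # Zaporizhzhia
print(transliterate_uk("Юрій"))       # Yurii
```

Tracking only the previous character is enough for these rules, because both the word-initial forms and the зг cluster depend on a single preceding letter.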


Overview and summary

The chart below shows the most common phonemic-transcription romanization for several alphabets. While it is sufficient for many casual users, each alphabet has several alternative systems and many exceptions; for details, consult the language sections above. (Hangul characters are broken down into their jamo components.)


See also

* Anglicisation
* Cyrillization, expression of a language in Cyrillic letters
* Francization
* Gairaigo
* Latinisation of names
* Spread of the Latin script


References


External links

About Romanization:
* IPA for Urdu and Roman Urdu for Mobile and Internet Users (Download)
* Microsoft Transliteration Utility – A tool for creating, debugging and using transliteration modules from any script to any other script.
* Randall Barry (ed.) ''ALA-LC Romanization Tables'', U.S. Library of Congress, 1997, . (One of the few printed books with lists of romanizations)
in PDF format
* UNGEGN Working Group on Romanization Systems

Romanization Online:
* Chinese Phonetic Conversion Tool – Converts between Pinyin and other formats
* Cyrillic Transliteration and Transcription ONLINE (Cyrillic -> Latin)
* eiktub – An Arabic Transliteration Pad
* Lingua::Translit – Perl module covering a variety of writing systems, e.g. Cyrillic or Greek. Provides many standards as well as common transliteration schemes.
* Arabeasy – Arabic Transliteration (a free Chrome extension exists; also works for Persian and Urdu)
* Russian Transliteration (a free Chrome extension exists)