
In linguistics, romanization is the conversion of text from a different writing system to the Roman (Latin) script, or a system for doing so. Methods of romanization include transliteration, for representing written text, and transcription, for representing the spoken word, and combinations of both. Transcription methods can be subdivided into ''phonemic transcription'', which records the phonemes or units of semantic meaning in speech, and more strict ''phonetic transcription'', which records speech sounds with precision.
Methods
There are many consistent or standardized romanization systems. They can be classified by their characteristics. A particular system's characteristics may make it better suited for various, sometimes contradictory applications, including document retrieval, linguistic analysis, easy readability, and faithful representation of pronunciation.
* Source, or donor language – A system may be tailored to romanize text from a particular language, or a series of languages, or for any language in a particular writing system. A language-specific system typically preserves language features like pronunciation, while the general one may be better for cataloguing international texts.
* Target, or receiver language – Most systems are intended for an audience that speaks or reads a particular language. (So-called ''international'' romanization systems for Cyrillic text are based on central-European alphabets like the Czech and Croatian alphabets.)
* Simplicity – Since the basic Latin alphabet has a smaller number of letters than many other writing systems, digraphs, diacritics, or special characters must be used to represent them all in Latin script. This affects the ease of creation, digital storage and transmission, reproduction, and reading of the romanized text.
* Reversibility – Whether or not the original can be restored from the converted text. Some reversible systems allow for an irreversible simplified version.
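The tension between simplicity and reversibility can be illustrated with a minimal Python sketch. It assumes the romanized text marks its distinctions with diacritics that Unicode can decompose into combining marks; the function name `strip_diacritics` is hypothetical and not part of any romanization standard.

```python
import unicodedata

def strip_diacritics(text: str) -> str:
    """Return a simplified form of romanized text by dropping combining marks."""
    # NFD splits a letter such as "ū" into "u" plus a combining macron.
    decomposed = unicodedata.normalize("NFD", text)
    kept = [ch for ch in decomposed if not unicodedata.combining(ch)]
    return unicodedata.normalize("NFC", "".join(kept))

# Two different romanized spellings collapse to the same simplified form,
# so the original cannot be restored from the simplified text alone.
print(strip_diacritics("jūjutsu"))
print(strip_diacritics("jūjutsu") == strip_diacritics("jujutsu"))
```

The simplified output is easier to type and store, but the long-vowel distinction is lost, which is the sense in which a simplified version is irreversible.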
Transliteration
If the romanization attempts to transliterate the original script, the guiding principle is a one-to-one mapping of characters in the source language into the target script, with less emphasis on how the result sounds when pronounced according to the reader's language. For example, the Nihon-shiki romanization of Japanese allows the informed reader to reconstruct the original Japanese kana syllables with 100% accuracy, but requires additional knowledge for correct pronunciation.
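As a rough illustration of the one-to-one principle, the following Python sketch compares a small subset of kana under Nihon-shiki and Hepburn spellings. The tables cover only seven kana and the helper names `is_one_to_one` and `invert` are hypothetical; the check works at the level of individual symbols and does not address how to segment a longer romanized string.

```python
# Small illustrative subset of kana and their spellings in two systems.
NIHON_SHIKI = {"し": "si", "ち": "ti", "つ": "tu",
               "じ": "zi", "ぢ": "di", "ず": "zu", "づ": "du"}
HEPBURN = {"し": "shi", "ち": "chi", "つ": "tsu",
           "じ": "ji", "ぢ": "ji", "ず": "zu", "づ": "zu"}

def is_one_to_one(table: dict) -> bool:
    """True if no two source symbols share the same romanized spelling."""
    return len(set(table.values())) == len(table)

def invert(table: dict) -> dict:
    """Build the reverse table; only possible for a one-to-one mapping."""
    if not is_one_to_one(table):
        raise ValueError("mapping is not one-to-one, so it cannot be inverted")
    return {latin: kana for kana, latin in table.items()}

print(is_one_to_one(NIHON_SHIKI))  # distinct spelling for every kana
print(is_one_to_one(HEPBURN))      # "ji" and "zu" each stand for two kana
print(invert(NIHON_SHIKI)["di"])
```

In this subset Hepburn merges kana that are pronounced alike, which helps the reader's pronunciation but loses the information needed to recover the original spelling.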
Transcription
Phonemic
Most romanizations are intended to enable the casual reader who is unfamiliar with the original script to pronounce the source language reasonably accurately. Such romanizations follow the principle of phonemic transcription and attempt to render the significant sounds (phonemes) of the original as faithfully as possible in the target language. The popular Hepburn romanization of Japanese is an example of a transcriptive romanization designed for English speakers.
Phonetic
A phonetic conversion goes one step further and attempts to depict all phones in the source language, sacrificing legibility if necessary by using characters or conventions not found in the target script. In practice such a representation almost never tries to represent ''every'' possible allophone, especially those that occur naturally due to coarticulation effects, and instead limits itself to the most significant allophonic distinctions. The International Phonetic Alphabet is the most common system of phonetic transcription.
Compromise
For most language pairs, building a usable romanization involves a trade-off between the two extremes. Pure transcriptions are generally not possible, as the source language usually contains sounds and distinctions not found in the target language, but which must be shown for the romanized form to be comprehensible. Furthermore, due to diachronic and synchronic variance, no written language represents any spoken language with perfect accuracy, and the vocal interpretation of a script may vary greatly among languages. In modern times the chain of transcription is usually spoken foreign language, written foreign language, written native language, spoken (read) native language. Reducing the number of those processes, i.e. removing one or both steps of writing, usually leads to more accurate oral articulations. In general, outside a limited audience of scholars, romanizations tend to lean more towards transcription. As an example, consider the Japanese martial art 柔術: the Nihon-shiki romanization ''zyûzyutu'' may allow someone who knows Japanese to reconstruct the kana syllables じゅうじゅつ, but most native English speakers, or rather readers, would find it easier to guess the pronunciation from the Hepburn version, ''jūjutsu''.
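The 柔術 example can be reproduced with a short Python sketch. It is a toy under stated assumptions: the tables contain only the three kana units needed for this one word, the function name `romanize` is hypothetical, and collapsing every "uu" into a single marked vowel is a simplification that real romanization rules do not apply blindly.

```python
# Only the kana units needed for じゅうじゅつ; not a full romanization table.
NIHON_SHIKI_SUBSET = {"じゅ": "zyu", "う": "u", "つ": "tu"}
HEPBURN_SUBSET = {"じゅ": "ju", "う": "u", "つ": "tsu"}

def romanize(kana: str, table: dict, long_u: str) -> str:
    """Greedy longest-match lookup, then mark long u with the given letter."""
    out = []
    i = 0
    while i < len(kana):
        for size in (2, 1):  # try two-character units such as じゅ first
            chunk = kana[i:i + size]
            if chunk in table:
                out.append(table[chunk])
                i += len(chunk)
                break
        else:
            raise KeyError(f"no romanization for {kana[i]!r} in this toy table")
    # Simplification: treat every doubled u as a long vowel.
    return "".join(out).replace("uu", long_u)

print(romanize("じゅうじゅつ", NIHON_SHIKI_SUBSET, "û"))  # zyûzyutu
print(romanize("じゅうじゅつ", HEPBURN_SUBSET, "ū"))      # jūjutsu
```

Both outputs come from the same kana input; only the table and the long-vowel mark differ, which is where the transliteration and transcription priorities diverge.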
Romanization of specific writing systems
Arabic
The Arabic script is used to write Arabic, Persian, Urdu, Pashto and Sindhi as well as numerous other languages in the Muslim world, particularly African and Asian languages without alphabets of their own. Romanization standards include the following:
Arabic
* DMG (1936): Adopted by the International Convention of Orientalist Scholars in Rome. It is the basis for the very influential Hans Wehr dictionary.
* BS 4280 (1968): Developed by the British Standards Institution
* SATTS (1970s): A one-for-one substitution system, a legacy from the Morse code era
* UNGEGN (1972)
* DIN 31635 (1982): Developed by the Deutsches Institut für Normung (German Institute for Standardization)
* ISO 233 (1984): Transliteration.
* Qalam (1985): A system that focuses upon preserving the spelling, rather than the pronunciation, and uses mixed case
* ISO 233-2 (1993): Simplified transliteration.
* Buckwalter transliteration (1990s): Developed at Xerox by Tim Buckwalter; does not require unusual diacritics
* ALA-LC (1997)
* Arabic chat alphabet
Persian
Armenian
Georgian
Greek
There are romanization systems for both Modern and Ancient Greek.
* ALA-LC
* Beta Code
* Greeklish
* ISO 843 (1997)
Hebrew
The Hebrew alphabet is romanized using several standards:
* ANSI Z39.25 (1975)
* UNGEGN (1977)
* ISO 259 (1984): Transliteration.
* ISO 259-2 (1994): Simplified transliteration.
* ISO/DIS 259-3: Phonemic transcription.
* ALA-LC
Indic (Brahmic) scripts
The Brahmic family of abugidas is used for languages of the Indian subcontinent and south-east Asia. There is a long tradition in the West of studying Sanskrit and other Indic texts in Latin transliteration. Various transliteration conventions have been used for Indic scripts since the time of Sir William Jones.
* ISO 15919 (2001): A standard transliteration convention was codified in the ISO 15919 standard. It uses diacritics to map the much larger set of Brahmic consonants and vowels to the Latin script. The Devanagari-specific portion is very similar to the academic standard, IAST ("International Alphabet of Sanskrit Transliteration"), and to the United States Library of Congress standard, ALA-LC, although there are a few differences.
* The National Library at Kolkata romanization, intended for the romanization of all Indic scripts, is an extension of IAST.
* Harvard-Kyoto: Uses upper and lower case and doubling of letters to avoid the use of diacritics and to restrict the range to 7-bit ASCII.
* ITRANS: A transliteration scheme into 7-bit ASCII created by Avinash Chopde that used to be prevalent on Usenet.
* ISCII (1988)
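The difference between a diacritic-based scheme and a 7-bit ASCII scheme in the list above can be sketched in Python. The table below is a partial IAST to Harvard-Kyoto mapping covering only single letters that differ between the two; the function name `iast_to_harvard_kyoto` is hypothetical, and the sketch assumes lowercase IAST input, since Harvard-Kyoto gives upper and lower case different meanings.

```python
import unicodedata

# Partial table: IAST letters with diacritics and their Harvard-Kyoto
# ASCII counterparts. Letters not listed are passed through unchanged.
IAST_TO_HK = {
    "ā": "A", "ī": "I", "ū": "U", "ṛ": "R",
    "ṅ": "G", "ñ": "J", "ṭ": "T", "ḍ": "D", "ṇ": "N",
    "ś": "z", "ṣ": "S", "ṃ": "M", "ḥ": "H",
}
# Normalize the keys so lookups do not depend on how this file was encoded.
_TABLE = {unicodedata.normalize("NFC", k): v for k, v in IAST_TO_HK.items()}

def iast_to_harvard_kyoto(text: str) -> str:
    """Replace each diacritic-bearing IAST letter with its ASCII equivalent."""
    text = unicodedata.normalize("NFC", text)
    return "".join(_TABLE.get(ch, ch) for ch in text)

result = iast_to_harvard_kyoto("saṃskṛta")
print(result, result.isascii())
```

Because case carries the distinction that diacritics carry in IAST, the ASCII form stays reversible for this subset as long as the input contains no capital letters of its own.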
Devanagari–nastaʿlīq (Hindustani)
Hindustani is an Indo-Aryan language with extreme digraphia and diglossia resulting from the Hindi–Urdu controversy starting in the 1800s. Technically, Hindustani itself is recognized by neither the language community nor any governments. Two standardized registers, Standard Hindi and Standard Urdu, are recognized as official languages in India and Pakistan. However, in practice the situation is as follows:
* In Pakistan: Standard (Saaf or Khaalis) Urdu is the "high" (H) variety, whereas Hindustani is the "low" (L) variety used by the masses (called Urdu, written in nastaʿlīq script).
* In India: Both Standard (Shuddh) Hindi and Standard (Saaf or Khaalis) Urdu are the "H" varieties (written in devanagari and nastaʿlīq respectively), whereas Hindustani is the "L" variety used by the masses and written in either devanagari or nastaʿlīq (and called 'Hindi' or 'Urdu' respectively).
The digraphia renders any work in either script largely inaccessible to users of the other script, even though Hindustani is otherwise a perfectly mutually intelligible language. In effect, any kind of text-based open source collaboration between devanagari and nastaʿlīq readers is impossible.
Initiated in 2011, the Hamari Boli Initiative is a full-scale open-source language planning initiative aimed at Hindustani script, style, status and lexical reform and modernization. One of the primary stated objectives of Hamari Boli is to relieve Hindustani of the crippling devanagari–nastaʿlīq digraphia by way of romanization.
Chinese
Romanization of the Sinitic languages, particularly Mandarin, has proved a very difficult problem, and the issue is further complicated by political considerations. Because of this, many romanization tables contain Chinese characters plus one or more romanizations or Zhuyin.
Mandarin
* ALA-LC: Used to be similar to Wade–Giles, but converted to Hanyu Pinyin in 2000
* EFEO: Developed by the École française d'Extrême-Orient in the 19th century, used mainly in France.
* Latinxua Sin Wenz (1926): Did not indicate tones. Used mainly in the Soviet Union and Xinjiang in the 1930s. Predecessor of Hanyu Pinyin.
* Lessing-Othmer: Used mainly in Germany.
* Postal romanization (1906): Early standard for international addresses
* Wade–Giles (1892): Transliteration. Very popular from the 19th century until recently and continues to be used by some Western academics.
* Yale (1942): Created by the U.S. for battlefield communication and used in the influential Yale textbooks.
* Legge romanization: Created by James Legge, a Scottish missionary.
Mainland China
* Hanyu Pinyin (1958): In mainland China, Hanyu Pinyin has been used officially to romanize Mandarin for decades, primarily as a linguistic tool for teaching the standardized language. The system is also used in other Chinese-speaking areas such as Singapore and parts of Taiwan, and has been adopted by much of the international community as a standard for writing Chinese words and names in the Latin script. The value of Hanyu Pinyin in education in China lies in the fact that China, like any other region of comparable area and population, has numerous distinct dialects, though there is just one common written language and one common standardized spoken form. (These comments apply to romanization in general.)
* ISO 7098 (1991): Based on Hanyu Pinyin.
Taiwan
#
Gwoyeu Romatzyh
Gwoyeu Romatzyh ( ; GR) is a system for writing Standard Chinese using the Latin alphabet. It was primarily conceived by Yuen Ren Chao (1892–1982), who led a group of linguists on the National Languages Committee in refining the system betwe ...
(GR, 1928–1986, in Taiwan 1945–1986; Taiwan used Japanese Romaji before 1945),
#
Mandarin Phonetic Symbols II (MPS II, 1986–2002),
#
Tongyong Pinyin
Tongyong Pinyin was the official romanization of Taiwanese Mandarin, Mandarin in Taiwan between 2002 and 2008. The system was unofficially used between 2000 and 2002, when a new romanization system for Taiwan was being evaluated for adoption. ...
(2002–2008),
and
#
Hanyu Pinyin
(since January 1, 2009).
Singapore
Cantonese
*
Barnett–Chao
*
Guangdong
(1960)
*
Hong Kong Government
*
Jyutping
*
Macau Government
*
Meyer–Wempe
*
Sidney Lau
*
Yale
(1942)
*
ILE romanization of Cantonese
Wu
Min Nan or Hokkien
*
Pe̍h-ōe-jī (POJ), once the ''de facto'' official script of the Presbyterian Church in Taiwan (since the late 19th century). Technically this represented a largely phonemic transcription system, as Min Nan was not commonly written in Chinese.
*
Tâi-uân Lô-má-jī Phing-im Hong-àn
Teochew
*
Guangdong (1960), for the distinct Teochew variety.
Min Dong
*
Foochow Romanized
Min Bei
*
Kienning Colloquial Romanized
Japanese
Romanization (or, more generally, Roman letters) is called "rōmaji" in Japanese. The most common systems are:
*
Hepburn (1867): phonetic transcription following Anglo-American practices, used in geographical names
*
Nihon-shiki (1885): transliteration. Also adopted as ISO 3602 Strict in 1989.
*
Kunrei-shiki (1937): phonemic transcription. Also adopted as ISO 3602.
*
JSL (1987): phonemic transcription. Named after the book ''Japanese: The Spoken Language'' by Eleanor Jorden.
*
ALA-LC: Similar to Modified Hepburn
*
Wāpuro: ("word processor romanization") transliteration. Not strictly a system, but a collection of common practices that enables input of Japanese text.
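The contrast between Hepburn and Kunrei-shiki in the list above can be illustrated with a small lookup-table sketch. The kana values below are a hand-picked subset chosen because the two systems are known to differ on them; the tables and the function name `romanize_kana` are illustrative assumptions, not complete implementations of either standard.

```python
# Illustrative subset of hiragana only; neither table is a complete standard.
HEPBURN = {
    "し": "shi", "ち": "chi", "つ": "tsu", "ふ": "fu", "じ": "ji",
    "す": "su", "か": "ka", "な": "na",
}

# Kunrei-shiki shares most values with Hepburn; only the first five
# kana in this subset are spelled differently.
KUNREI = {**HEPBURN, "し": "si", "ち": "ti", "つ": "tu", "ふ": "hu", "じ": "zi"}


def romanize_kana(text: str, table: dict[str, str]) -> str:
    """Replace each kana found in the table; leave other characters as-is."""
    return "".join(table.get(kana, kana) for kana in text)


for word in ("ふじ", "つち", "すし"):
    print(word, romanize_kana(word, HEPBURN), romanize_kana(word, KUNREI))
```

A real converter would also need to handle long vowels, small kana and the syllabic n, which this sketch deliberately leaves out.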
Korean
The following systems are currently the most widely used:
*
McCune–Reischauer ("MR"; 1939): Basis for various romanization systems. Almost universally used by international academic journals on Korean studies.
**
Romanization of Korean
(1992): The official romanization in North Korea, with some differences from the original MR.
** The ALA-LC system is based on but deviates from MR.
** South Korea formerly used yet another modified version of MR as its official system from 1984 to 2000.
*
Revised Romanization of Korean
(2000): South Korea's official romanization system.
*
Yale romanization of Korean (1942): A standard used almost exclusively by linguists internationally.
Thai
Thai, spoken in Thailand and some areas of Laos, Burma and China, is written with its own script, probably descended from a mixture of Tai–Laotian and Old Khmer, in the Brahmic family.
*
Royal Thai General System of Transcription
*
ISO 11940 1998 Transliteration
*
ISO 11940-2 2007 Transcription
*
ALA-LC
Nuosu
The Nuosu language, spoken in southern China, is written with its own script, the Yi script. The only existing romanisation system is YYPY (Yi Yu Pin Yin), which represents tone with letters attached to the end of syllables, as Nuosu forbids codas. It does not use diacritics, so the large phonemic inventory of Nuosu requires frequent use of digraphs, including for monophthong vowels.
Tibetan
The Tibetan script has two official romanization systems: Tibetan Pinyin (for Lhasa Tibetan) and Roman Dzongkha (for Dzongkha).
Cyrillic
In English language library catalogues, bibliographies, and most academic publications, the Library of Congress transliteration method is used worldwide.
In linguistics, scientific transliteration is used for both Cyrillic and Glagolitic alphabets. This applies to Old Church Slavonic, as well as modern Slavic languages that use these alphabets.
Belarusian
*
BGN/PCGN romanization of Belarusian, 1979 (United States Board on Geographic Names and Permanent Committee on Geographical Names for British Official Use)
*
Scientific transliteration, or the ''International Scholarly System'' for linguistics
*
ALA-LC romanization, 1997 (American Library Association and Library of Congress)
*
ISO 9:1995
* ''
'', 2000
Bulgarian
A system based on scientific transliteration and ISO/R 9:1968 was considered official in Bulgaria from the 1970s. Since the late 1990s, Bulgarian authorities have switched to the so-called Streamlined System, which avoids diacritics and is optimized for compatibility with English. This system became mandatory for public use with a law passed in 2009. Where the old system uses ⟨č, š, ž, št, c, j, ă⟩, the new system uses ⟨ch, sh, zh, sht, ts, y, a⟩.
The new Bulgarian system was also endorsed for official use by the UN in 2012, and by BGN and PCGN in 2013.
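The letter correspondences between the two Bulgarian systems can be sketched as a single-pass substitution. The mapping below pairs the old-system letters č, š, ž, št, c, j, ă with the Streamlined System's usual equivalents ch, sh, zh, sht, ts, y, a; the function name `streamline` is hypothetical, and the sketch ignores the Streamlined System's special-case rules and all-caps input.

```python
import re

# Old-system letters mapped to their diacritic-free equivalents.
OLD_TO_NEW = {
    "št": "sht", "č": "ch", "š": "sh", "ž": "zh",
    "c": "ts", "j": "y", "ă": "a",
}

# One pattern, longest keys first, so every letter is converted exactly once.
# Chained str.replace calls would be wrong here: č -> ch followed by c -> ts
# would corrupt the "c" that the first step had just produced.
_PATTERN = re.compile(
    "|".join(sorted(map(re.escape, OLD_TO_NEW), key=len, reverse=True)),
    re.IGNORECASE,
)


def _swap(match: re.Match) -> str:
    old = match.group(0)
    new = OLD_TO_NEW[old.lower()]
    # Preserve an initial capital (e.g. "Š" becomes "Sh").
    return new.capitalize() if old[0].isupper() else new


def streamline(text: str) -> str:
    """Convert an old-system Bulgarian romanization to diacritic-free letters."""
    return _PATTERN.sub(_swap, text)


for name in ("Šumen", "Svištov", "Carevo", "Tărnovo"):
    print(name, "->", streamline(name))
```

Working from the old romanization rather than from Cyrillic keeps the example short; a production converter would normally start from the Cyrillic text.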
Kyrgyz
Macedonian
Russian
There is no single universally accepted system of writing Russian using the Latin script; in fact, there are a huge number of such systems. Some are adjusted for a particular target language (e.g. German or French), some are designed as a librarian's transliteration, some are prescribed for Russian travellers' passports, and the transcription of some names is purely traditional. All this has resulted in many competing spellings of the same name. For example, the name of the Russian composer Tchaikovsky may also be written as ''Tchaykovsky'', ''Tchajkovskij'', ''Tchaikowski'', ''Tschaikowski'', ''Czajkowski'', ''Čajkovskij'', ''Čajkovski'', ''Chajkovskij'', ''Çaykovski'', ''Chaykovsky'', ''Chaykovskiy'', ''Chaikovski'', ''Tshaikovski'', ''Tšaikovski'', ''Tsjajkovskij'' etc. Systems include:
* BGN/PCGN (1947): Transliteration system (United States Board on Geographic Names & Permanent Committee on Geographical Names for British Official Use).
* GOST 16876-71 (1971): A now defunct Soviet transliteration standard. Replaced by GOST 7.79, which is an ISO 9 equivalent.
* United Nations romanization system for geographical names (1987): Based on GOST 16876-71.
* ISO 9 (1995): Transliteration. From the International Organization for Standardization.
* ALA-LC (1997)
* "Volapuk" encoding (1990s): Slang term (it is not really Volapük) for a writing method that is not truly a transliteration, but used for similar goals (see article).
* Conventional English transliteration is based on BGN/PCGN, but does not follow a particular standard. Described in detail at Romanization of Russian.
* Streamlined System for the romanization of Russian.
* Comparative transliteration of Russian in different languages (Western European, Arabic, Georgian, Braille, Morse)
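How one Cyrillic name ends up with several Latin spellings can be shown by applying two different letter tables to the same input. The tables below cover only the letters needed for the composer's surname and are a minimal sketch: the values follow scientific transliteration and BGN/PCGN as commonly tabulated, and the names `SCHOLARLY`, `BGN_PCGN` and `transliterate` are illustrative assumptions.

```python
# Partial tables: only the letters that occur in "Чайковский".
SCHOLARLY = {
    "ч": "č", "а": "a", "й": "j", "к": "k",
    "о": "o", "в": "v", "с": "s", "и": "i",
}
# BGN/PCGN differs from the scholarly table on two of these letters.
BGN_PCGN = {**SCHOLARLY, "ч": "ch", "й": "y"}


def transliterate(text: str, table: dict[str, str]) -> str:
    """Map each Cyrillic letter through the table, keeping initial capitals."""
    out = []
    for ch in text:
        latin = table.get(ch.lower(), ch)  # unknown characters pass through
        if ch.isupper():
            latin = latin.capitalize()
        out.append(latin)
    return "".join(out)


name = "Чайковский"
print(transliterate(name, SCHOLARLY))  # Čajkovskij
print(transliterate(name, BGN_PCGN))   # Chaykovskiy
```

Both outputs appear among the variant spellings listed for the name, which is the point: each is correct under its own table.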
Syriac
The Latin script for Syriac was developed in the 1930s, following the state policy for minority languages of the Soviet Union, with some material published.
Ukrainian
The 2010 Ukrainian National system was adopted by the UNGEGN in 2012 and by the BGN/PCGN in 2020. It is also very close to the modified (simplified) ALA-LC system, which has remained unchanged since 1941.
* ALA-LC
* ISO 9
* Ukrainian National transliteration
* Ukrainian National and BGN/PCGN systems, at the UN Working Group on Romanization Systems
* Thomas T. Pedersen's comparison of five systems
Overview and summary
The chart below shows the most common phonemic transcription romanization used for several different alphabets. While it is sufficient for many casual users, there are multiple alternatives used for each alphabet, and many exceptions. For details, consult each of the language sections above. (Hangul characters are broken down into jamo components.)
See also
* Anglicisation
* Cyrillization, expression of a language in Cyrillic letters
* Francization
* Gairaigo
* Transcription into Chinese, though standards vary by polity.
* Sinicization
, specifically adoption of Chinese literary culture
* Latinisation of names
* Semitic romanization
* Spread of the Latin script
References
External links
; About romanization
IPA for Urdu and Roman Urdu for Mobile and Internet Users (Download)
Microsoft Transliteration Utility – A tool for creating, debugging and using transliteration modules from any script to any other script.
* Randall Barry (ed.) ''ALA-LC Romanization Tables'' U.S. Library of Congress, 1997, . (One of the few printed books with lists of romanizations)
in PDF format
UNGEGN Working Group on Romanization Systems
; Romanization online
Chinese Phonetic Conversion Tool – Converts between Pinyin and other formats
Cyrillic Transliteration and Transcription ONLINE (Cyrillic -> Latin)
eiktub – An Arabic Transliteration Pad
Lingua::Translit – Perl module covering a variety of writing systems, e.g. Cyrillic or Greek. Provides a lot of standards as well as common transliteration schemes.
Arabeasy – Arabic Transliteration (free chrome extension exists, also works for Persian, Urdu)
– Russian Transliteration (free chrome extension exists)
For Persian Romanization