ISO 11940 is an ISO standard for the transliteration of Thai characters, published in 1998, updated in September 2003, and confirmed in 2008. An extension to this standard, ISO 11940-2, defines a simplified transcription based on it.


Consonants

The transliteration of the pure consonants is derived from their usual pronunciation as an initial consonant. An unmarked ''h'' is used to form digraphs denoting aspirated consonants. High and low pairs of consonants are systematically differentiated by applying a macron to the high class consonant. Further differentiation of consonants with identical phonetic function is obtained by leaving the most frequent unmarked, marking the second commonest by a dot below, marking the third commonest by a horn, and marking the fourth commonest by underlining. The use of a dot below has a similar effect to the Indological practice of distinguishing retroflex consonants by a dot below, but there are subtle differences: it is the transliterations of ธ ''tho thong'' and ศ ''so sala'' that are dotted below, not those of the corresponding retroflex consonants. The transliterations of consonants should be entered in the order base letter, macron if any, and then dot below, horn or "macron below". Only three consonants have the horn in their transliteration, ฅ ''kho khon'', ฒ ''tho phuthao'' and ษ ''so ruesi'', and only one consonant has an underline (the "macron below"), ฑ ''tho nang montho''.
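The entry order described above can be sketched in Python. This is a minimal illustration under stated assumptions, not part of the standard: the helper name `compose_consonant` and its parameters are hypothetical, and the base letters passed to it are arbitrary examples rather than entries from the standard's tables. Only the Unicode combining code points are taken as given.

```python
# Unicode combining marks corresponding to the accents named in the text.
MACRON = "\u0304"  # COMBINING MACRON, applied to the high class consonant

# The single distinguishing mark that may follow the optional macron.
DISTINGUISHING_MARKS = {
    "dot_below": "\u0323",     # COMBINING DOT BELOW
    "horn": "\u031b",          # COMBINING HORN
    "macron_below": "\u0331",  # COMBINING MACRON BELOW (the "underline")
}


def compose_consonant(base, macron=False, mark=None):
    """Hypothetical helper: base letter, then macron if any, then one further mark."""
    out = base
    if macron:
        out += MACRON
    if mark is not None:
        out += DISTINGUISHING_MARKS[mark]
    return out


# An arbitrary base letter with a macron and a dot below, in entry order.
example = compose_consonant("t", macron=True, mark="dot_below")
print([f"U+{ord(ch):04X}" for ch in example])
```

Keeping the marks as separate combining characters mirrors the order in which they are to be entered; whether the stored text preserves that order depends on the input system and on any normalisation applied afterwards.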


Vowels

The letter ''å'' is the only precomposed character specified in the output of transliteration. ''Lakkhangyao'' (ๅ) has been shown only in combination with the vowel letters ฤ and ฦ. The standard simply lists ฤ and ฦ with the consonants and ''lakkhangyao'' with the vowels. An isolated ''lakkhangyao'' would also be transliterated by a small letter "i" with stroke (''ɨ''), but this should not occur in Thai, Pāli, or Sanskrit. The transliterations of ว ''wo waen'' and อ ''o ang'' have been included here because of their use as complete vowel symbols, but their transliteration does not depend on how they are being used, and the standard simply lists them with the consonants. Compound vowel symbols are transliterated in accordance with their constituents.


Other combining marks

Note that ''yamakkan'' (–๎) is represented by a spacing tilde, not a superscript tilde.


Punctuation and Digits

ISO 11940:1998 distinguishes the abbreviation symbol ''paiyannoi'' (ฯ) from the sentence terminator ''angkhandiao'' (ฯ), even though neither the national character standard TIS 620-2533 nor Unicode Version 5.0 distinguishes them. ''Paiyannoi'' is transliterated as ''ǂ'' and ''angkhandiao'' is transliterated as ''ǀ''. Note that ''paiyannoi'', ''angkhandiao'' and ''angkhankhu'' (๚) are transliterated by the letters used for click consonants, not by double dagger, vertical bars or ''dandas''.
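The distinction between the click letters and their look-alikes is easiest to see at the code point level. The following Python sketch is illustrative only: the function `transliterate_u0e2f` and its `is_sentence_terminator` flag are hypothetical, and the flag stands in for a decision that has to come from outside the text, since both symbols share one code point.

```python
import unicodedata

PAIYANNOI = "\u0e2f"  # ฯ; the same code point also serves as angkhandiao

# Click letters used in the transliteration, as stated in the text.
CLICK_FOR_PAIYANNOI = "\u01c2"    # ǂ
CLICK_FOR_ANGKHANDIAO = "\u01c0"  # ǀ

# Visually similar characters that are NOT the ones used.
LOOKALIKES = {"double dagger": "\u2021", "vertical bar": "\u007c"}


def transliterate_u0e2f(is_sentence_terminator):
    """Hypothetical helper: the caller must say which role the symbol plays."""
    if is_sentence_terminator:
        return CLICK_FOR_ANGKHANDIAO
    return CLICK_FOR_PAIYANNOI


for ch in (CLICK_FOR_PAIYANNOI, CLICK_FOR_ANGKHANDIAO, *LOOKALIKES.values()):
    # Print code point, Unicode name and general category for comparison.
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))
```

The click letters are classified as letters, while the double dagger and vertical bar are punctuation and symbol characters, so the two groups behave differently in word-boundary and search operations even where they look alike on screen.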


Character sequencing

In general, characters are transliterated from left to right and, where characters have the same horizontal position, from top to bottom. The vertical sequencing is in fact simply specified as tone marks and ''thanthakhat'' (–์) preceding any other marks above or below the consonant. The standard denies at the end of Section 4.2 that the combination of ''sara u'' (◌ุ, ◌ู) and ''nikkhahit'' (◌ํ) can occur and then gives an example of it when specifying the transliteration of ''nikkhahit'', but does not show the transliteration of the combination. The effect of these rules is that, except for ''nikkhahit'', all the non-vowel marks attached to a consonant in Thai are attached to the consonant in the Roman transliteration. The standard concedes that ''attempting'' to transpose preposed vowels and consonants may be comforting to those used to the Roman alphabet, but recommends that preposed vowels not be transposed. For example, () should be transliterated to and () to .
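The vertical sequencing rule can be sketched as a stable sort over the marks attached to one consonant. This is a minimal illustration, assuming the marks have already been separated from their base consonant; the function name `order_marks` is hypothetical, and the sketch does not model the special handling of ''nikkhahit''.

```python
# Thai tone marks U+0E48..U+0E4B (mai ek, mai tho, mai tri, mai chattawa).
TONE_MARKS = {chr(cp) for cp in range(0x0E48, 0x0E4C)}
THANTHAKHAT = "\u0e4c"

# Marks that are sequenced before any other mark above or below the consonant.
LEADING_MARKS = TONE_MARKS | {THANTHAKHAT}


def order_marks(marks):
    """Hypothetical helper: put tone marks and thanthakhat first, keep the rest in order."""
    # sorted() is stable, so marks within each group keep their relative order.
    return "".join(sorted(marks, key=lambda m: m not in LEADING_MARKS))


# sara ii (U+0E35) stored before mai ek (U+0E48), as most input methods do.
stored = "\u0e35\u0e48"
print([f"U+{ord(ch):04X}" for ch in order_marks(stored)])
```

Using a stable sort means the sketch only moves the leading group forward and never reorders the remaining marks among themselves.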


Variations


Causes

The standard specifies the order in which the accents should be typed, but not all input systems will record accents in the order in which they are typed. Unicode specifies two normalised forms for letters with multiple accents, and transliterated text is highly likely to be stored in one of these forms. This complicates automatic back-transliteration. As Unicode-compliant processes must handle such variations correctly, the transliterations on this page have been chosen for ease of display, since present-day rendering systems may display equivalent forms differently. Many fonts display novel combinations of consonants and accents badly. For example, the Institute of the Estonian Language publishes an explanation of the application of the standard to Thai on the web, and with one exception this seems to comply with the standard. The exception is that, except for the macron, accents over consonants are actually offset to the right, giving the impression that they have been entered as the corresponding non-combining characters. The standard specifies the transliterations in code points, but someone working from this free explanation could easily deduce that the spacing forms of the tone accents should be used.
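The effect of normalisation on accent order can be observed with Python's standard `unicodedata` module. The sketch below uses an arbitrary base letter with a macron and a dot below; the variable names and the helper `same_text` are illustrative, not part of any standard.

```python
import unicodedata

# Base letter, macron, dot below: the order in which the accents are entered.
typed = "t\u0304\u0323"

# NFD reorders combining marks by canonical combining class,
# which places the dot below before the macron.
nfd = unicodedata.normalize("NFD", typed)

# NFC additionally composes the base letter with the dot below
# into a single precomposed character, leaving the macron combining.
nfc = unicodedata.normalize("NFC", typed)


def same_text(a, b):
    """Compare two strings up to canonical equivalence."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


for label, s in (("typed", typed), ("NFD", nfd), ("NFC", nfc)):
    print(label, [f"U+{ord(ch):04X}" for ch in s])
```

A back-transliterator that matches on raw code point sequences would treat these three strings as different; normalising both the lookup table and the input to the same form before matching avoids that.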


ICU (CLDR 1.4.1)

The ICU implementation, recorded in Version 1.4.1 of the Common Locale Data Repository sponsored by Unicode (http://unicode.org/Public/cldr/1.4.1/core.zip, files transforms/ThaiLogical-Latin.xml and transforms/Thai-ThaiLogical.xml, used by ICU's transliterators "Thai-Latin" and "Latin-Thai"), uses a prime instead of a horn in the transliteration of consonants. This affects the transliteration of ฅ ''kho khon'', ฒ ''tho phuthao'' and ษ ''so bo ruesi''. ฏ ''to patak'' is also transliterated differently, as ''t̩'' rather than ''ṭ''. This implementation transliterates ำ as ''ả'' instead of ''å'' to avoid ambiguity with the hypothetical Thai script sequence ะํ (''sara a'', ''nikkhahit''). The ICU implementation transliterates ฺ ''phinthu'' as instead of to avoid problems with Unicode normalisation. This has the side effect of improving legibility when applied to an underdotted consonant. The ICU implementation transliterates ฯ ''paiyannoi'' as ''‡'' (double dagger) and ''angkhankhu'' as ''||'' (two ASCII vertical bars). As the ICU implementation uses Unicode, it cannot reliably distinguish ''angkhandiao'' from ''paiyannoi'' without a semantic analysis, and makes no such attempt.

The character sequencing of the ICU implementation is also different. It transposes preposed vowels with the following consonant, and processes the marks on a consonant in the order in which they are stored in memory. (Most Thai input methods ensure that the marks are stored in bottom to top order.) It does not transpose preposed vowels with complete consonant clusters; consonant clusters cannot be identified with complete accuracy, and transposing vowels with clusters would require an additional symbol to permit reliable conversion back to the Thai script. For example, under this implementation transliterates to and to . Finally, this implementation generates transliterations in Unicode Normalisation Form C (NFC).
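The single-consonant transposition described above can be sketched in plain Python. This is not ICU's actual rule set: the function `to_logical_order` is a hypothetical stand-in that only swaps a preposed vowel with the one consonant that follows it, which is the behaviour the text attributes to the implementation.

```python
# The five Thai preposed vowels: เ แ โ ใ ไ (U+0E40..U+0E44).
PREPOSED_VOWELS = set("\u0e40\u0e41\u0e42\u0e43\u0e44")


def is_consonant(ch):
    """True for the Thai consonant block U+0E01 (ก) .. U+0E2E (ฮ)."""
    return "\u0e01" <= ch <= "\u0e2e"


def to_logical_order(text):
    """Hypothetical sketch: swap each preposed vowel with the next consonant only."""
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if ch in PREPOSED_VOWELS and nxt and is_consonant(nxt):
            # Emit the consonant first, then the vowel; never look past one consonant.
            out.append(nxt)
            out.append(ch)
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)


print(to_logical_order("\u0e44\u0e17\u0e22"))  # the word ไทย with its first two characters swapped
```

Because exactly one consonant is moved, the operation can be undone by swapping the pair back, which is the reversibility concern the text raises about transposing whole clusters.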


See also

* List of ISO transliterations
* Romanization of Lao
* Romanization of Thai
* Royal Thai General System of Transcription


External links


* Official ISO site
* Transliteration rules on Unicode.org
* Romanisation of Thai placenames in the KNAB Database of the Institute of the Estonian Language