HOME

TheInfoList



OR:

Thai Industrial Standard 620-2533, commonly referred to as TIS-620, is the most common
character set Character encoding is the process of assigning numbers to graphical characters, especially the written characters of human language, allowing them to be stored, transmitted, and transformed using digital computers. The numerical values that ...
and
character encoding Character encoding is the process of assigning numbers to graphical characters, especially the written characters of human language, allowing them to be stored, transmitted, and transformed using digital computers. The numerical values that ...
for the Thai language. The standard is published by the Thai Industrial Standards Institute (TISI), an organ of the Ministry of Industry under the Royal Thai Government, and is the sole official standard for encoding Thai in
Thailand Thailand ( ), historically known as Siam () and officially the Kingdom of Thailand, is a country in Southeast Asia, located at the centre of the Indochinese Peninsula, spanning , with a population of almost 70 million. The country is b ...
. The descriptive name of the standard is "Standard for Thai Character Codes for Computers" (Thai: รหัสสำหรับอักขระไทยที่ใช้กับคอมพิวเตอร์). "2533" refers to year 2533 of the Buddhist Era (1990), the year the present version of the standard was published; a previous revision, TIS 620-2529 (1986), is now obsolete. The code page layout is the same between the two editions. TIS-620 is the
IANA The Internet Assigned Numbers Authority (IANA) is a standards organization that oversees global IP address allocation, autonomous system number allocation, root zone management in the Domain Name System (DNS), media types, and other Interne ...
preferred charset name for TIS-620, and that charset name is used also for
ISO/IEC 8859-11 ISO/IEC 8859-11:2001, ''Information technology — 8-bit single-byte coded graphic character sets — Part 11: Latin/Thai alphabet'', is part of the ISO/IEC 8859 series of ASCII-based standard character encodings, first edition published in 2001. I ...
(which adds a no-break space character at 0xA0, which is unassigned in TIS-620). When the IANA name is used the codes are supplemented with the
C0 and C1 control codes The C0 and C1 control code or control character sets define control codes for use in text by computer systems that use ASCII and derivatives of ASCII. The codes represent additional information about the text, such as the position of a cursor ...
from
ISO/IEC 6429 ISO/IEC JTC 1, entitled "Information technology", is a joint technical committee (JTC) of the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC). Its purpose is to develop, maintain and pr ...
.


Structure

TIS-620 is a conventionally structured
Extended ASCII Extended ASCII is a repertoire of character encodings that include (most of) the original 96 ASCII character set, plus up to 128 additional characters. There is no formal definition of "extended ASCII", and even use of the term is sometimes criti ...
national character set that retains full compatibility with 7-bit
ASCII ASCII ( ), abbreviated from American Standard Code for Information Interchange, is a character encoding standard for electronic communication. ASCII codes represent text in computers, telecommunications equipment, and other devices. Because of ...
and uses the 8-bit range hex A1 to FB for encoding the
Thai alphabet The Thai script ( th, อักษรไทย, ) is the abugida used to write Thai, Southern Thai and many other languages spoken in Thailand. The Thai alphabet itself (as used to write Thai) has 44 consonant symbols ( th, พยัญชน� ...
. Due to the complex combining nature of Thai vowels and diacritics, TIS-620 is intended for information interchange only, and an additional display engine is required to compose characters correctly.


Variants

A nearly identical version of TIS-620 has been adopted as
ISO/IEC 8859-11 ISO/IEC 8859-11:2001, ''Information technology — 8-bit single-byte coded graphic character sets — Part 11: Latin/Thai alphabet'', is part of the ISO/IEC 8859 series of ASCII-based standard character encodings, first edition published in 2001. I ...
in 2001, the sole difference being that ISO/IEC 8859-11 defines hex A0 as a
non-breaking space In word processing and digital typesetting, a non-breaking space, , also called NBSP, required space, hard space, or fixed space (though it is not of fixed width), is a space character that prevents an automatic line break at its position. In ...
, while TIS-620 leaves it undefined but reserved. (In practice, this small distinction is usually ignored.) The ISO/IEC 8859-11 set has also been registered as ISO-IR-166 by
Ecma International Ecma International () is a nonprofit standards organization for information and communication systems. It acquired its current name in 1994, when the European Computer Manufacturers Association (ECMA) changed its name to reflect the organization ...
, but this variation adds explicit escape codes for signaling the beginning and end of Thai character sequences. The TIS-620 character set ordering has been used essentially as is within
Unicode Unicode, formally The Unicode Standard,The formal version reference is is an information technology standard for the consistent encoding, representation, and handling of text expressed in most of the world's writing systems. The standard, whic ...
(
ISO/IEC 10646 ISO/IEC JTC 1, entitled "Information technology", is a joint technical committee (JTC) of the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC). Its purpose is to develop, maintain and pr ...
) as well. Unicode's Thai block is U+0E01 through U+0E7F, and TIS-620 Thai characters can be converted to
UTF-16 UTF-16 (16-bit Unicode Transformation Format) is a character encoding capable of encoding all 1,112,064 valid code points of Unicode (in fact this number of code points is dictated by the design of UTF-16). The encoding is variable-length, as cod ...
simply by prefixing each byte with 0E and subtracting hex A0 from the value.


Character set

In the table above, 20 is the regular SPACE character. Code values 00-1F, 7F, 80-9F, A0, DB-DE and FC-FF are not assigned to characters by TIS-620. Code values D1, D4-DA, E7-EE are
combining character In digital typography, combining characters are characters that are intended to modify other characters. The most common combining characters in the Latin script are the combining diacritical marks (including combining accents). Unicode also ...
s.


Further reading

*


References


External links


Official reference
(in Thai) * Announcement in Royal Gazette o
TIS 620-2533
an
TIS 620-2529
* {{Character encoding Encodings of Thai Thai Industrial Standards Computer-related introductions in 1986