The CNS 11643 character set (Chinese National Standard 11643), also officially known as the Chinese Standard Interchange Code or CSIC (Chinese: 中文標準交換碼), is the official standard character set of Taiwan (Republic of China). In practice, variants of the related Big5 character set are the de facto standard.

CNS 11643 is designed to conform to ISO 2022. It contains 16 planes, each a 94×94 grid of code points, so the maximum possible number of encodable characters is 16×94×94 = 141376. Planes 1 through 7 are defined by the standard; since 2007, planes 10 through 15 have also been defined by the standard. Before then, planes 12 to 15 (4×94×94 = 35344 code points) were designated specifically for user-defined characters. Unlike CCCII, CNS 11643 does not encode variant forms of a character at related code points. EUC-TW is an encoded representation of CNS 11643 and ASCII in Extended Unix Code (EUC) form. Other encodings capable of representing certain CSIC planes include ISO-2022-CN (planes 1 and 2) and ISO-2022-CN-EXT (planes 1 through 7).
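The EUC-TW form can be illustrated with a short Python sketch. The byte layout is assumed here from common descriptions of EUC-TW rather than taken from the standard text. Under that layout, ASCII occupies single bytes below 0x80, plane 1 uses two bytes (row + 0xA0, cell + 0xA0), and any plane can be written as four bytes: the SS2 byte 0x8E, then 0xA0 + plane, row + 0xA0 and cell + 0xA0. The function names `encode_cns` and `decode_euc_tw` are hypothetical. The sketch also checks the 16×94×94 capacity figure given above.

```python
# Sketch of EUC-TW for single CNS 11643 code points.
# ASSUMED layout (common descriptions of EUC-TW, not quoted from the standard):
#   ASCII 0x00-0x7F as itself; plane 1 as two bytes (row+0xA0, cell+0xA0);
#   any plane 1-16 as four bytes: 0x8E (SS2), plane+0xA0, row+0xA0, cell+0xA0.
ROWS = CELLS = 94
PLANES = 16
SS2 = 0x8E

def encode_cns(plane: int, row: int, cell: int) -> bytes:
    """Encode one (plane, row, cell) triple; plane 1 uses the two-byte form."""
    if not (1 <= plane <= PLANES and 1 <= row <= ROWS and 1 <= cell <= CELLS):
        raise ValueError(f"out of range: {plane}-{row}-{cell}")
    if plane == 1:
        return bytes([row + 0xA0, cell + 0xA0])
    return bytes([SS2, plane + 0xA0, row + 0xA0, cell + 0xA0])

def decode_euc_tw(data: bytes) -> list:
    """Split EUC-TW bytes into ASCII characters and (plane, row, cell) triples."""
    out, i = [], 0
    while i < len(data):
        b = data[i]
        if b < 0x80:                      # ASCII passes through unchanged
            out.append(chr(b))
            i += 1
        elif b == SS2:                    # four-byte form: SS2, plane, row, cell
            if i + 4 > len(data):
                raise ValueError("truncated four-byte sequence")
            plane, row, cell = (x - 0xA0 for x in data[i + 1:i + 4])
            out.append((plane, row, cell))
            i += 4
        elif 0xA1 <= b <= 0xFE:           # two-byte form: always plane 1
            if i + 2 > len(data):
                raise ValueError("truncated two-byte sequence")
            out.append((1, b - 0xA0, data[i + 1] - 0xA0))
            i += 2
        else:
            raise ValueError(f"unexpected byte 0x{b:02X} at offset {i}")
    return out

assert PLANES * ROWS * CELLS == 141376    # total capacity of 16 planes

sample = b"A" + encode_cns(1, 1, 1) + encode_cns(3, 66, 38)
print(sample.hex(" "))          # 41 a1 a1 8e a3 e2 c6
print(decode_euc_tw(sample))    # ['A', (1, 1, 1), (3, 66, 38)]
```

Plane 1 could also be written in the four-byte form (0x8E 0xA1 ...). The encoder above always chooses the shorter two-byte form for plane 1, while the decoder accepts both.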

History

The first edition of the standard was published in 1986 and included planes 1 and 2, derived from levels 1 and 2 of Big5. Compared with Big5, some characters were re-ordered because of corrected stroke counts, two duplicate characters were omitted, and 213 classical radicals were added. Extensions to the standard were subsequently published in 1988 (6319 characters, occupying plane 14) and 1990 (7169 characters, occupying plane 15).

Unicode 1.0.0 did not yet include hanzi, but it did include characters for compatibility with CNS 11643: the CJK Compatibility Forms block was titled "CNS 11643 Compatibility" in Unicode 1.0.0. When the Unicode CJK Unified Ideographs set was being compiled for Unicode 1.0.1, the national bodies submitted character sets to the CJK Joint Research Group for inclusion. The version of CNS 11643 that was submitted included the plane 14 extension. It also included further desired characters appended to plane 14 after 68-21, the last code point used in the standard version of the extension.

The second edition of the standard, published in 1992, defined a much larger collection of hanzi across seven planes. A subset of the 1988 plane 14 extension, the 6148 code points 01-01 through 66-38, became plane 3. The remaining 171 characters, at code points 66-39 through 68-21, were instead distributed across plane 4. The plane 15 extension was not included, although 338 of its characters appeared among planes 4 through 7.

The third edition of the standard, published in 2007, added the Euro sign, an ideographic zero, kana, and extensions to the existing bopomofo and Roman alphabet support to plane 1. It introduced planes 10 through 14, containing additional hanzi, and incorporated the existing plane 15 extension into the standard itself, leaving gaps where its characters already existed in planes 4 through 7. It also added 128 further hanzi to plane 3, starting at code point 68-40.

As of this writing, several thousand CNS 11643 characters, mostly in planes 10 through 14, have no corresponding Unicode character; these are mapped to the Unicode Supplementary Private Use Area.
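Code points such as 66-38 are row-cell pairs within a 94×94 plane. The counts quoted above can therefore be checked by converting each pair to a linear position within its plane. The following is a minimal Python sketch with hypothetical helper names. It reproduces the 6148/171 split of the 6319-character plane 14 extension and confirms that the 6319th position in a plane is 68-21, the last code point used by that extension.

```python
# Row-cell arithmetic for one 94x94 plane (rows and cells numbered 1-94).
# Helper names are hypothetical, chosen for this illustration.
CELLS_PER_ROW = 94

def ordinal(row: int, cell: int) -> int:
    """1-based linear position of a row-cell code point within its plane."""
    return (row - 1) * CELLS_PER_ROW + cell

def row_cell(n: int) -> tuple:
    """Inverse of ordinal(): the (row, cell) of the n-th position in a plane."""
    row, cell = divmod(n - 1, CELLS_PER_ROW)
    return row + 1, cell + 1

def span_size(start: tuple, end: tuple) -> int:
    """Number of code points from start to end, inclusive."""
    return ordinal(*end) - ordinal(*start) + 1

def fmt(rc: tuple) -> str:
    """Format a (row, cell) pair in the usual RR-CC notation."""
    return f"{rc[0]:02d}-{rc[1]:02d}"

print(span_size((1, 1), (66, 38)))    # 6148: the part that became plane 3
print(span_size((66, 39), (68, 21)))  # 171: the part moved into plane 4
print(fmt(row_cell(6319)))            # 68-21: last code point of the extension
```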

Relationship to Big5

Levels 1 and 2 of the Big5 encoding correspond mostly to CNS 11643 planes 1 and 2, respectively. The differences are occasional changes in order and two duplicate hanzi that exist in Big5 but not in CNS 11643. As a result, the two can be mapped to each other using a list of ranges. However, the 213 classical radicals in CNS 11643 plane 1 are not among the characters available in Big5, and further characters were added to CNS 11643 plane 1 in 2007. The Big5-2003 variant of Big5 is defined as a partial encoding of CNS 11643.

Within the Big5 hanzi repertoire, only one character is conventionally mapped to Unicode differently from its counterpart in the first two CNS 11643 planes. The Big5 character is mapped to U+5F5D (彝), whereas its CNS plane 1 counterpart is mapped to a related variant at U+5F5E (彞). However, some variant mappings for Big5, such as some defined by IBM, use U+5F5E rather than U+5F5D.
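Because of this one-character discrepancy, the same character can come out differently in Unicode depending on whether it was converted through a Big5 table or a CNS 11643 table. The following is a minimal Python sketch of one way an application might reconcile the two before comparing strings. It folds U+5F5E onto U+5F5D. The direction of the fold and the helper names are assumptions made for illustration; neither standard prescribes them. A system that follows IBM-style Big5 mappings might fold the other way.

```python
# Hypothetical folding policy for the one Big5/CNS 11643 Unicode mapping difference.
BIG5_CONVENTIONAL = "\u5F5D"  # 彝: conventional Unicode mapping of the Big5 character
CNS_PLANE1 = "\u5F5E"         # 彞: Unicode mapping of the CNS 11643 plane 1 counterpart

# Translation table that rewrites the CNS-style code point as the Big5-style one.
_FOLD = str.maketrans({CNS_PLANE1: BIG5_CONVENTIONAL})

def fold_yi(text: str) -> str:
    """Replace every U+5F5E in text with U+5F5D; all other characters are kept."""
    return text.translate(_FOLD)

def equivalent(a: str, b: str) -> bool:
    """Compare two strings after folding the U+5F5D/U+5F5E pair together."""
    return fold_yi(a) == fold_yi(b)

print(equivalent("\u5F5D器", "\u5F5E器"))  # True
print(equivalent("\u5F5D", "\u4E00"))        # False
```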

References

* This page is based on the information on the CNS official web site.

External links

* CNS 11643 official web site
* Current CNS 11643 open data, including mapping data
* Unicode Consortium mappings for CNS 11643-1986 planes 1 and 2, plus the 1988 plane 14 (not the 2007 plane 14) with extensions. Uses a single prefixed hex digit to indicate the plane.
* CNS 11643 mappings from International Components for Unicode (ICU):
** "CNS-11643-1992" (original version, current version). The original version of the mapping includes standard planes 1–7 but includes the plane 15 layout as plane 9; the current version includes only planes 1 and 2. Uses prefixed bytes 0x81 through 0x89 to indicate the plane.
* "EUC-TW-2014": standard assignments for planes 1 through 7 and 15, and IBM corporate assignments in planes 12 and 13. CNS codes in EUC format with two-byte plane 1.
* ISO-IR registered CNS-11643 code charts: plane 1, plane 2, plane 3, plane 4, plane 5, plane 6, plane 7