CNS11643

The CNS 11643 character set (Chinese National Standard 11643), also officially known as the Chinese Standard Interchange Code or CSIC (中文標準交換碼), is officially the standard character set of Taiwan (Republic of China). Published and draft editions of CNS 11643 remain the source standards for Unicode reference glyphs for CJK Unified Ideographs submitted for use in Taiwan, and the character repertoire of CNS 11643 continues to be updated and used for administrative purposes in Taiwan. EUC-TW is an encoded representation of CNS 11643 and ASCII in Extended Unix Code (EUC) form. In practice, variants of the Big5 character set, which is closely related to the first two planes of CNS 11643, served as the de facto standard encoding for Traditional Chinese before the introduction of Unicode. Other encodings capable of representing certain CSIC planes include ISO-2022-CN (planes 1 and 2) and ISO-2022-CN-EXT (planes 1 through 7).
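As an illustration of what an "encoded representation in EUC form" looks like, the following minimal sketch builds the EUC-TW byte sequence for a CNS 11643 plane, row and cell. The byte layout used here (two bytes for plane 1, and a four-byte form starting with 0x8E for other planes) follows the way EUC-TW is commonly described and is not stated in this article; the function name `euc_tw_encode` is hypothetical.

```python
def euc_tw_encode(plane: int, row: int, cell: int) -> bytes:
    """Return EUC-TW bytes for a CNS 11643 (plane, row, cell) position.

    Assumed layout (from common descriptions of EUC-TW, not from this article):
      plane 1      -> 2 bytes: 0xA0+row, 0xA0+cell
      planes 2..16 -> 4 bytes: 0x8E, 0xA0+plane, 0xA0+row, 0xA0+cell
    """
    if not 1 <= plane <= 16:
        raise ValueError("this sketch only handles planes 1 through 16")
    if not (1 <= row <= 94 and 1 <= cell <= 94):
        raise ValueError("row and cell must each be in 1..94")
    row_cell = bytes([0xA0 + row, 0xA0 + cell])
    if plane == 1:
        return row_cell
    return bytes([0x8E, 0xA0 + plane]) + row_cell


# Plane 1, row 1, cell 1 uses the short two-byte form.
print(euc_tw_encode(1, 1, 1).hex())
# Plane 3, row 68, cell 40 uses the four-byte form.
print(euc_tw_encode(3, 68, 40).hex())
```

ASCII characters would be passed through as single bytes below 0x80, which is why the row and cell bytes are offset into the 0xA1 to 0xFE range.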


Structure

CNS 11643 is designed to conform to ISO 2022, although only the first seven 94×94-character planes have ISO-IR registrations. The total number of planes has varied with successive revisions of the standard; the most recent pending drafts have 19 planes, so the maximum possible number of encodable characters across all planes is 19×94×94 = 167884. Planes 1 through 7 are defined by the standard, and since 2007 so are planes 10 through 15. Prior to this, planes 12 to 15 (4×94×94 = 35344 code points) were specifically designated for user-defined characters. Unlike in CCCII, the code points of variant characters in CNS 11643 are not related to one another.
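The capacity figures above follow directly from the 94×94 plane layout. A minimal sketch of the arithmetic, with hypothetical names `PLANE_SIZE` and `capacity`:

```python
ROWS_PER_PLANE = 94
CELLS_PER_ROW = 94
PLANE_SIZE = ROWS_PER_PLANE * CELLS_PER_ROW  # code points in one plane


def capacity(plane_count: int) -> int:
    """Maximum number of code points available in the given number of planes."""
    if plane_count < 0:
        raise ValueError("plane_count must be non-negative")
    return plane_count * PLANE_SIZE


print(capacity(19))  # all 19 planes of the pending drafts
print(capacity(4))   # planes 12 to 15, formerly user-defined
```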


History

The first edition of the standard was published in 1986, and included planes 1 and 2, deriving from levels 1 and 2 of Big5, with some re-ordering due to corrected stroke counts, two duplicate characters being omitted, and the addition of 213 classical radicals in plane 1 (out of the 214 Kangxi radicals; of the 213, 210 are effectively duplicates of existing Big5 characters and the remaining three of HKSCS characters; see also Kangxi Radicals (Unicode block)). Extensions to the standard were subsequently published in 1988 (6319 characters, occupying plane 14) and 1990 (7169 characters, occupying plane 15).

Unicode 1.0.0, although it did not yet include hanzi, included characters for compatibility with CNS 11643: the CJK Compatibility Forms block was titled "CNS 11643 Compatibility" in Unicode 1.0.0. When the Unicode CJK Unified Ideographs set was being compiled for Unicode 1.0.1, the national bodies submitted character sets to the CJK Joint Research Group for inclusion. The version of CNS 11643 submitted included the plane 14 extension, in addition to further desired characters appended to plane 14 (after 68-21, the last used code point in the standard version of the extension).

In the second edition of the standard, published in 1992, a much larger collection of hanzi was defined across seven planes. The majority of the 1988 plane 14 extension, comprising the 6148 code points 01-01 through 66-38, was adopted as plane 3 (with the remaining 171 characters, code points 66-39 through 68-21, being instead distributed amongst plane 4). The plane 15 extension was not included, although 338 of its characters were included amongst planes 4 through 7.

The third edition of the standard, published in 2007, added the Euro sign, ideographic zero, kana and extensions to the existing bopomofo and Roman alphabet support to plane 1. It introduced planes 10 through 14, containing additional hanzi, and incorporated the existing plane 15 extension into the standard itself (with gaps left where the characters already existed in planes 4 through 7). It also added 128 further hanzi to plane 3, starting at code point 68-40, based on the additions to the version of the 1988 plane 14 which had been submitted for inclusion in Unicode.
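The code point counts quoted for the 1988 plane 14 extension can be checked from the row-cell notation, in which each of the 94 rows holds 94 cells. A minimal sketch, assuming 1-based (row, cell) pairs and a hypothetical helper `cells_between`:

```python
CELLS_PER_ROW = 94


def linear_position(row: int, cell: int) -> int:
    """1-based position of a row-cell code point within its plane."""
    if not (1 <= row <= 94 and 1 <= cell <= 94):
        raise ValueError("row and cell must each be in 1..94")
    return (row - 1) * CELLS_PER_ROW + cell


def cells_between(first: tuple, last: tuple) -> int:
    """Number of code points from first to last, inclusive."""
    return linear_position(*last) - linear_position(*first) + 1


adopted_as_plane_3 = cells_between((1, 1), (66, 38))     # 01-01 through 66-38
moved_into_plane_4 = cells_between((66, 39), (68, 21))   # 66-39 through 68-21

print(adopted_as_plane_3, moved_into_plane_4)
print(adopted_as_plane_3 + moved_into_plane_4)  # size of the whole 1988 extension
```

The two parts sum to the 6319 characters of the 1988 extension, which is consistent with 68-21 being its last used code point.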


Plane numbering


Current purpose and relationship to Unicode

The CNS 11643 repertoire includes characters used for administrative purposes in Taiwan, including household registration and ID cards, in addition to characters used in education. In particular, characters in planes 1 and 2 are used in education. Only the characters used in education are subjected to glyph-form normalisation in CNS 11643. The repertoire continues to be expanded, with additional planes numbered up to 19 having been drafted, but not yet published as part of a CNS 11643 edition. A 2022 amendment to the 2007 edition made an addition at the end of plane 2 and corrected several glyph forms in planes 1 and 2.

The 1992 and 2007 editions of CNS 11643, in addition to more recent working drafts, serve as the Unihan sources for reference glyphs for CJK Unified Ideographs submitted for use in Taiwan. Even so, there remain several thousand CNS 11643 characters with no corresponding Unicode character, or which do not round-trip through Unicode, mostly in planes 10 through 14. These are mapped to the Unicode Supplementary Private Use Area. In some cases, two or more CNS 11643 characters correspond to a single Unicode CJK Unified Ideograph. These cases are (except where covered by the CJK Compatibility Ideographs Supplement block) currently mapped to Unicode Supplementary Private Use Area code points, but the Taipei Computer Association, participating in the Ideographic Research Group, has been evaluating the feasibility of registering them as Ideographic Variation Sequences at some point in the future.
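When working with CNS 11643 mapping data, it can be useful to detect which entries point into the Supplementary Private Use Area rather than at a standard Unicode character. A minimal sketch, using the Unicode ranges for Supplementary Private Use Area-A and -B; the function name `is_supplementary_pua` is hypothetical and no actual CNS mapping data is included:

```python
# Supplementary Private Use Area-A: U+F0000..U+FFFFD
# Supplementary Private Use Area-B: U+100000..U+10FFFD
SUPPLEMENTARY_PUA_RANGES = (
    range(0xF0000, 0xFFFFD + 1),
    range(0x100000, 0x10FFFD + 1),
)


def is_supplementary_pua(code_point: int) -> bool:
    """True if the code point lies in a supplementary private use plane."""
    return any(code_point in r for r in SUPPLEMENTARY_PUA_RANGES)


for cp in (0x5F5D, 0xE000, 0xF0000, 0x10FFFD):
    print(f"U+{cp:04X}", is_supplementary_pua(cp))
```

Note that U+E000, which belongs to the private use area of the Basic Multilingual Plane, is deliberately not matched by this check.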


Relationship to Big5

Levels 1 and 2 of the Big5 encoding correspond mostly to CNS 11643 planes 1 and 2, respectively, with occasional differences in order, and with two duplicate hanzi existing in Big5 but not in CNS 11643. They can be mapped using a list of ranges. However, the 213 classical radicals in CNS 11643 plane 1 are additional to the characters available in Big5 (although they can be lossily mapped to the corresponding hanzi characters in Big5 or HKSCS), and further additional characters were added to CNS 11643 plane 1 in 2007. The Big5-2003 variant of Big5 is defined as a partial encoding of CNS 11643.

Within the Big5 hanzi repertoire, only one plane 1 character is conventionally mapped to Unicode differently from the corresponding character from the first two CNS 11643 planes: the Big5 character is mapped to U+5F5D (彝), whereas its CNS plane 1 counterpart is mapped to a related variant at U+5F5E (彞); U+5F5D is separately included in CNS 11643 plane 3. However, some variant mappings for Big5, such as some defined by IBM, use U+5F5E rather than U+5F5D. Similarly, a single character from Big5 level 2 (including its IBM variant) is mapped to a different Unicode code point than its CNS 11643 plane 2 counterpart: the Big5 character is mapped to U+5284 (劄), while the Unihan database currently maps the CNS 11643 character to U+7B9A (箚); U+5284 appears in CNS 11643 plane 14.
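Mapping "using a list of ranges" can be sketched as a sorted table of contiguous source runs, each paired with the start of the matching destination run, searched with binary search. The table below is a made-up toy over small integers purely to show the technique; it does not contain real Big5 or CNS 11643 values, and the names `TOY_RANGES` and `map_by_ranges` are hypothetical.

```python
from bisect import bisect_right
from typing import Optional

# Each entry: (source_start, source_end_inclusive, destination_start).
# TOY VALUES ONLY: these are not actual Big5 or CNS 11643 code points.
TOY_RANGES = [
    (0, 9, 100),    # sources 0..9   map to 100..109
    (10, 19, 200),  # sources 10..19 map to 200..209 (a re-ordered run)
]
_STARTS = [start for start, _, _ in TOY_RANGES]


def map_by_ranges(source: int) -> Optional[int]:
    """Map a source index via the range table, or None if it is uncovered."""
    i = bisect_right(_STARTS, source) - 1  # last range starting at or before source
    if i < 0:
        return None
    start, end, dest_start = TOY_RANGES[i]
    if source > end:
        return None  # falls in a gap, e.g. a character absent from the target
    return dest_start + (source - start)


print(map_by_ranges(0), map_by_ranges(12), map_by_ranges(25))
```

Gaps between ranges model characters with no counterpart, such as the two Big5 duplicates that are absent from CNS 11643; a real table would need such cases handled explicitly.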


References

* This page is based on the information on the CNS official web site.


External links


* CNS 11643 official web site
* Current CNS 11643 open data, including mapping data
* Unicode Consortium mappings for CNS 11643-1986: planes 1 and 2, plus the 1988 plane 14 (not the 2007 plane 14) with extensions. Uses a single prefixed hex digit to indicate plane.
* CNS 11643 mappings from International Components for Unicode (ICU):
** "CNS-11643-1992" (original version, current version): the original version of the mapping includes standard planes 1 through 7 but includes the plane 15 layout as plane 9; the current version includes only planes 1 and 2. Uses prefixed 0x81 through 0x89 to indicate plane.
** "EUC-TW-2014": standard assignments for planes 1 through 7 and 15, and IBM corporate assignments in planes 12 and 13. CNS codes in EUC format with two-byte plane 1.
* ISO-IR registered CNS-11643 code charts: plane 1, plane 2, plane 3, plane 4, plane 5, plane 6, plane 7