The Chinese Character Code for Information Interchange, or CCCII, is a character set developed by the Chinese Character Analysis Group in Taiwan. It was first published in 1980, and significantly expanded in 1982 and 1987.
It is used mostly by library systems.
It is one of the earliest established and most sophisticated encodings for traditional Chinese (predating the establishment of Big5 in 1984 and CNS 11643 in 1986).
It is distinguished by its unique system for encoding simplified versions and other variants of its main set of hanzi characters.
A variant of an earlier version of CCCII is used by the Library of Congress as part of MARC-8, under the name East Asian Character Code (EACC, ANSI/NISO Z39.64), where it comprises part of MARC 21's JACKPHY support. However, EACC contains fewer characters than the most recent versions of CCCII.
Work at Apple based on the Research Libraries Group's CJK Thesaurus, which was used to maintain EACC, was one of the direct predecessors of Unicode's Unihan set.
Design

Byte ranges
CCCII is designed as a 94^n set, as defined by ISO/IEC 2022.
Each Chinese character is represented by a 3-byte code in which each byte is a 7-bit value between 0x21 and 0x7E inclusive. Thus, the maximum number of Chinese characters representable in CCCII is 94×94×94 = 830584. The number of distinct characters that can actually be encoded is smaller, because variant characters are encoded in related ISO 2022 planes under CCCII, so most of the code points have to be reserved for variants.
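The byte arithmetic described above can be illustrated with a minimal Python sketch. The function name `split_cccii` and the convention of numbering planes, rows and cells from 1 (byte value minus 0x20) are assumptions made for illustration; the numbering does match the plane and row labels used in the chart headings of this article.

```python
# Largest possible repertoire of a three-byte 94^n set.
MAX_CODES = 94 ** 3  # 830584


def split_cccii(code: int) -> tuple[int, int, int]:
    """Split a 3-byte code into 1-based (plane, row, cell) numbers.

    Hypothetical helper: it only accepts strict 94^n codes, i.e. every
    byte must lie in 0x21..0x7E.
    """
    if not 0 <= code <= 0xFFFFFF:
        raise ValueError(f"not a 3-byte code: {code:#x}")
    parts = []
    for shift in (16, 8, 0):
        byte = (code >> shift) & 0xFF
        if not 0x21 <= byte <= 0x7E:
            raise ValueError(f"byte {byte:#04x} is outside 0x21-0x7E")
        parts.append(byte - 0x20)
    return parts[0], parts[1], parts[2]
```

Under these assumptions, `split_cccii(0x212A46)` yields plane 1, row 10, cell 38, while a code containing a 0x20 or 0x7F byte is rejected.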
In practice, however, bytes outside of these ranges are sometimes used. The code 0x212320 is used by some implementations as an ideographic space. A CCCII specification used by libraries in Hong Kong uses codes starting with 0x2120 for punctuation and symbols.
The first byte 0x7F is used by some variants to encode some otherwise unavailable Unified Repertoire and Ordering or CJK Unified Ideographs Extension A hanzi (e.g. 0x7F3449 for U+3449 or 0x7F796E for U+796E; the two trailing bytes match the UCS-2BE code). This may involve bytes outside of the 0x21–0x7E or even the 0x20–0x7F range, e.g. 0x7F551C for U+551C, 0x7F5AA4 for U+5AA4 or 0x7F8EDA for U+8EDA.
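As a rough illustration of the 0x7F convention, the following Python sketch recovers the Unicode character from such a code by reading the two trailing bytes as a big-endian UCS-2 value. The function name is hypothetical, and the surrogate check is an added safeguard rather than something taken from any CCCII specification.

```python
def unicode_from_7f_code(code: int) -> str:
    """Return the character for a 0x7F-prefixed code such as 0x7F3449.

    Sketch only: assumes the two trailing bytes are the UCS-2BE code unit.
    """
    lead = (code >> 16) & 0xFF
    if not 0 <= code <= 0xFFFFFF or lead != 0x7F:
        raise ValueError(f"not a 0x7F-prefixed 3-byte code: {code:#x}")
    ucs2 = code & 0xFFFF
    if 0xD800 <= ucs2 <= 0xDFFF:
        # Surrogate code units do not stand for characters on their own.
        raise ValueError(f"surrogate code unit: {ucs2:#06x}")
    return chr(ucs2)
```

For instance, `unicode_from_7f_code(0x7F3449)` returns the character U+3449, and the same function accepts 0x7F8EDA even though its trailing bytes fall outside the 0x21–0x7E range.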
Interaction with ISO 2022
CCCII/EACC is not registered in the International Registry of Coded Character Sets to be Used with Escape Sequences, and as such does not have a standard designation escape for use with ISO 2022. MARC-8 assigns EACC the private-use final byte (F-byte) 0x31 ("1") in its implementation of ANSI X3.41 (ISO 2022).
Layers and variant characters
The 94 ISO 2022 planes are grouped into 16 layers of 6 planes each (except for layer 16, which contains the four planes 91–94).
Layer 1 contains both non-hanzi and hanzi characters, with the non-hanzi and most frequently used hanzi being placed in plane 1, and with the remaining five planes consisting of less common hanzi. Layer 2 contains simplified Chinese characters, with their row and cell numbers being the same as their traditional Chinese equivalents in layer 1. Layers 3 through 12 contain further variant forms, at row and cell numbers homologous to the first two layers.
The last four layers are used for other purposes. Specifically, layer 13 contains additional characters for Japanese language support (kana and Japanese kokuji), and layer 14 contains additional characters for Korean language support (hangul). Layer 15 is unused (reserved), while layer 16 is used for other characters.
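The layer structure lends itself to simple arithmetic. The sketch below, in Python, shows one way to compute the layer of a plane and the homologous position of a layer-1 code in a variant layer. The names `layer_of_plane` and `variant_position` are hypothetical, and the code only computes positions: whether a variant is actually assigned at a given position depends on the CCCII tables themselves.

```python
PLANES_PER_LAYER = 6


def layer_of_plane(plane: int) -> int:
    """Map a plane number (1-94) to its layer number (1-16)."""
    if not 1 <= plane <= 94:
        raise ValueError(f"plane out of range: {plane}")
    return (plane - 1) // PLANES_PER_LAYER + 1


def variant_position(root_code: int, layer: int) -> int:
    """Return the code at the same row and cell as a layer-1 code,
    moved to the corresponding plane of a variant layer (2-12).

    Assumes the first byte is 0x20 plus the plane number.
    """
    plane = (root_code >> 16) - 0x20
    if layer_of_plane(plane) != 1:
        raise ValueError("root code must lie in layer 1")
    if not 2 <= layer <= 12:
        raise ValueError("variant layers are 2 through 12")
    plane_offset = (layer - 1) * PLANES_PER_LAYER
    return root_code + (plane_offset << 16)
```

With these definitions, plane 73 falls in layer 13 and plane 91 in layer 16, consistent with the description above, and the layer-2 position corresponding to a plane-1 code 0x21xxxx is 0x27xxxx.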
This distinctive design has been criticized by Christian Wittern of the International Research Institute for Zen Buddhism at Hanazono University, who asserts that the relationship of character variants "is very complex and can not be expressed in a fixed, one-dimensional, hard-wired codetable". Ken Lunde describes it as "one of the most well thought-out character set standards from Taiwan" and its structure as "to be truly admired", but concludes that OpenType variant form substitution can provide the same level of functionality.
CCCII defines roughly 53940 code points as of its 1987 edition, although a more recent draft from 1989 extends this to 75684 code points (comprising 44167 unique characters and 31517 variants). EACC, the variant used by the Library of Congress, includes only a smaller set of 15686 characters.
Adoption
As of 1995, CCCII or EACC was used mostly in libraries in the United States, Hong Kong and Taiwan. Although CCCII promised pan-CJK coverage, its support was limited to specialized hardware. Difficulty ascertaining when the root versus variant character should be used, exacerbated by a lack of firmly established reference glyphs, further limited its adoption. As a result, Big5 was more commonly used for Chinese in those territories outside of library use (Unicode had yet to become widely adopted at the time).
EACC is still in extensive use for specialized bibliographic purposes. It was also an important precursor to Unicode: work at Apple on a CJK character cross-reference database based on the Research Libraries Group's CJK Thesaurus, used to maintain EACC, was directly incorporated into the development of Unicode's Unihan set.
Unicode hanzi characters are referenced to their corresponding CCCII and EACC codes in the Unihan database, in the keys kCCCII and kEACC; however, since Unicode's character unification criteria (based on those used by the Japanese JIS X 0208 and on those developed by the Association for a Common Chinese Code in China) differ from those used by CCCII, not all variant characters are individually mapped.
Mapping tables for hanzi, hangul, kana and punctuation between EACC and Unicode are available from the Library of Congress.
Punctuation, symbol, kana and jamo charts
Following are charts for punctuation, symbols, kana and Hangul jamo, showing the characters and giving possible Unicode mappings. Where possible, these are referenced against published mapping data.
Unicode mappings for Hangul syllables are omitted below for brevity, but are documented by the Library of Congress. CCCII hanzi number in the tens of thousands and are not shown below (except where they are also included in the non-hanzi range, as radicals or numerals), but mappings to Unicode are available from the Unihan database and from elsewhere.
Character set 0x2120 (plane 1, row 0: Hong Kong punctuation)
Although CCCII is usually a 94^n set, and therefore does not usually use codes starting with 0x2120, the following layout is used by a variant in use at libraries in Hong Kong:
Character set 0x2121 (plane 1, row 1: reserved for controls)
No characters are assigned in plane 1 row 1, which is reserved for control codes.
Character set 0x2122 (plane 1, row 2: mathematical operators)
This row contains mathematical operators. EACC leaves this row empty.
The following table is referenced against sources from Taiwan.
The following table is referenced against CCCII data provided by the Hong Kong Innovative Users Group, a group of libraries in Hong Kong, and hosted by the University of Hong Kong. It uses an entirely different layout in this row:
Character set 0x2123 (plane 1, row 3: Roman and punctuation)
This row includes punctuation, western Arabic numerals and Roman letters. Compare row 3 of Wansung code and row 3 of GB 2312.
Different variants variously encode the ideographic space (U+3000) at 0x212320 (which the MARC specification acknowledges), 0x212321 (which is listed in the ANSI standard, and is also acknowledged by MARC), or 0x21635F.
EACC includes only the hyphen-minus, parentheses and ideographic space in this set.
Character set 0x212A (plane 1, row 10: internal IME characters and geta mark)
In EACC, this row includes several Private Use Area mapped characters used internally to represent character components by the RLIN input method, which is used by the Library of Congress for non-Roman cataloging. These component characters should only be used internally by an IME and, if encountered elsewhere, may be replaced with the geta mark (U+3013), which this row also includes at 0x212A46. This row is unassigned in CCCII, but the geta mark is also listed at that location in some mappings for CCCII.
Character set 0x212B (plane 1, row 11: punctuation)
This row contains various punctuation marks used in Chinese, in addition to other symbols. CCCII includes a set of 35 punctuation marks in this row. EACC includes only 13 characters in this row (shown boxed below).
Character sets 0x212C–0x212E (plane 1, rows 12–14: radicals and ordinals)
These rows contain Chinese radicals, Roman numerals, celestial stems and terrestrial branches.
Character set 0x212F (plane 1, row 15: Chinese numerals and bopomofo)
This row includes Chinese numerals and bopomofo characters. EACC includes only the ideographic zero (〇).
Character set 0x272B (plane 7, row 11: reference mark)
This row contains the reference mark (kome jirushi).
Character sets 0x272E–0x272F (plane 7, rows 14–15: alternative bopomofo)
A variant used by libraries in Hong Kong does not include bopomofo characters in plane 1 row 15, but includes them in a different layout in plane 7.
Character set 0x6921 (plane 73, row 1: Japanese punctuation)
This row is in plane 73, the first plane of layer 13, which contains characters included for Japanese language support. It contains punctuation. Compare row 1 of JIS X 0208, whose layout this row tends to follow for the characters it includes.
Character set 0x6924 (plane 73, row 4: hiragana)
This row contains hiragana. Compare row 4 of JIS X 0208.
Character set 0x6925 (plane 73, row 5: katakana)
This row contains katakana. Compare row 5 of JIS X 0208, which this row corresponds to, besides the addition of the separate dakuten and handakuten.
Character sets 0x6F24–0x6F25 (plane 79, rows 4–5: jamo)
These rows contain Korean jamo.
Character set 0x6F76 (plane 79, row 86: archaic Hangul)
This row contains several historic Hangul characters no longer in regular use. Several of these are mapped to the Private Use Area.
Character set 0x7B25 (plane 91, row 5: supplementary Katakana)
This row contains additional katakana used to write foreign phonemes.
See also
* Chinese character IT
* Chinese characters
References
* Some information on this page is based on information from the CNS official website.
External links
CNS 11643 official web site (English version of pages available) has information about the CCCII character set in the "Chinese Information Code" section
Full mapping of EACC to Unicode, from Library of Congress