Sources
The Ideographic Research Group (IRG) is responsible for developing extensions to the encoded repertoires of CJK unified ideographs. The IRG processes proposals for new CJK unified ideographs submitted by its member bodies and, after several rounds of expert review, submits a consolidated set of characters to
UTC sources
The majority of characters submitted by the UTC to the IRG are derived from Unicode Technical Committee (UTC) documents. Other sources include:
CJK Unified Ideographs blocks
CJK Unified Ideographs
The basic block, named ''CJK Unified Ideographs'' (4E00–9FFF), contains 20,992 basic Chinese characters in the range U+4E00 through U+9FFF. The block includes not only characters used in the Chinese writing system but also kanji used in the Japanese writing system and hanja, whose use is diminishing in Korea. Many characters in this block are used in all three writing systems, while others are used in only one or two of the three.
Charts
4E00–62FF, 6300–77FF, 7800–8CFF, 8D00–9FFF.
Sources
Note: Most characters appear in multiple sources, so the sum of individual character counts (102,794) is far greater than the number of encoded characters (20,992). In Unicode 4.1, 14 HKSCS-2004 characters and 8 GB 18030 characters were assigned to code points U+9FA6 through U+9FBB. Further characters have since been added to this block for various reasons, all summarized in the version history section below.
CJK Unified Ideographs Extension A
The block named ''CJK Unified Ideographs Extension A'' (3400–4DBF) contains 6,592 characters.
Charts
3400–4DBF.
Sources
Note: Most characters appear in more than one source, so the sum of individual character counts (18,832) is far greater than the number of encoded characters (6,592).
CJK Unified Ideographs Extension B
The block named ''CJK Unified Ideographs Extension B'' (20000–2A6DF) contains 42,720 characters in the range U+20000 through U+2A6DF. These include most of the characters used in the Kangxi Dictionary that are not in the basic CJK Unified Ideographs block, as well as many Hán-Nôm characters that were formerly used to write Vietnamese.
Charts
20000–215FF, 21600–230FF, 23100–245FF, 24600–260FF, 26100–275FF, 27600–290FF, 29100–2A6DF.
Sources
Note: Many characters appear in more than one source, so the sum of individual character counts (74,204) is far greater than the number of encoded characters (42,720).
CJK Unified Ideographs Extension C
The block named ''CJK Unified Ideographs Extension C'' (2A700–2B73F) contains 4,154 characters.
Charts
2A700–2B73F.
Sources
Note: Some characters appear in more than one source, so the sum of individual character counts (4,570) is greater than the number of encoded characters (4,154).
CJK Unified Ideographs Extension D
The block named ''CJK Unified Ideographs Extension D'' (2B740–2B81F) contains 222 characters in the range U+2B740 through U+2B81D that were added in Unicode 6.0 (2010).
Charts
2B740–2B81F.
Sources
Note: Some characters appear in more than one source, so the sum of individual character counts (229) is greater than the number of encoded characters (222).
CJK Unified Ideographs Extension E
The block named ''CJK Unified Ideographs Extension E'' (2B820–2CEAF) contains 5,762 characters in the range U+2B820 through U+2CEA1 that were added in Unicode 8.0 (2015).
Charts
2B820–2CEAF.
Sources
Note: Some characters appear in more than one source, so the sum of individual character counts (5,828) is greater than the number of encoded characters (5,762).
CJK Unified Ideographs Extension F
The block named ''CJK Unified Ideographs Extension F'' (2CEB0–2EBEF) contains 7,473 characters in the range U+2CEB0 through U+2EBE0 that were added in Unicode 10.0 (2017). It includes more than 1,000 Sawndip characters for Zhuang.
Charts
2CEB0–2EBEF.
Sources
Note: Some characters appear in more than one source, so the sum of individual character counts (7,774) is greater than the number of encoded characters (7,473).
CJK Unified Ideographs Extension G
A block named ''CJK Unified Ideographs Extension G'' (30000–3134F) contains 4,939 characters.
Charts
30000–3134F.
Sources
Note: Some characters appear in more than one source, so the sum of individual character counts (5,081) is greater than the number of encoded characters (4,939).
CJK Unified Ideographs Extension H
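Where this article gives both a first and a last assigned code point for a block, the quoted character count can be cross-checked: a contiguous run from first to last holds last − first + 1 code points. The following is a minimal Python sketch of that arithmetic. The table `STATED` and the helper names are illustrative only, and the figures are copied from the article text rather than read from the Unicode Character Database.

```python
# (first assigned, last assigned, stated count), as quoted in this article.
STATED = {
    "CJK Unified Ideographs": (0x4E00, 0x9FFF, 20992),
    "Extension B": (0x20000, 0x2A6DF, 42720),
    "Extension D": (0x2B740, 0x2B81D, 222),
    "Extension E": (0x2B820, 0x2CEA1, 5762),
    "Extension F": (0x2CEB0, 0x2EBE0, 7473),
    "Extension H": (0x31350, 0x323AF, 4192),
}


def span_size(first, last):
    """Number of code points in the inclusive range first..last."""
    return last - first + 1


def check_counts(stated=STATED):
    """Map each block name to True if its stated count matches its range."""
    return {
        name: span_size(first, last) == count
        for name, (first, last, count) in stated.items()
    }
```

Calling `check_counts()` returns `True` for every entry above, which only confirms that the stated ranges and counts agree with each other. It says nothing about whether every code point in a range is actually assigned.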
A block named ''CJK Unified Ideographs Extension H'' was added as part of Unicode 15.0 to the Tertiary Ideographic Plane in the range U+31350 through U+323AF, containing 4,192 characters.
Charts
31350–323AF.
Sources
Note: Some characters appear in more than one source, so the sum of individual character counts (4,305) is greater than the number of encoded characters (4,192).
CJK Compatibility Ideographs
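Within F900–FAFF, only the twelve code points named in this section carry the "Unified Ideograph" property. The Python sketch below keeps them in a lookup set and, as an added illustration, asks the standard library's `unicodedata` module for each character's decomposition mapping. The names `UNIFIED_IN_COMPAT_BLOCK`, `is_unified_in_compat_block` and `decomposition_of` are hypothetical, and the results depend on the Unicode version bundled with the interpreter.

```python
import unicodedata

# The twelve code points this article lists as unified ideographs
# inside the CJK Compatibility Ideographs block (F900-FAFF).
UNIFIED_IN_COMPAT_BLOCK = frozenset({
    0xFA0E, 0xFA0F, 0xFA11, 0xFA13, 0xFA14, 0xFA1F,
    0xFA21, 0xFA23, 0xFA24, 0xFA27, 0xFA28, 0xFA29,
})


def is_unified_in_compat_block(cp):
    """True if cp is one of the twelve unified ideographs in F900-FAFF."""
    return cp in UNIFIED_IN_COMPAT_BLOCK


def decomposition_of(cp):
    """Decomposition mapping from Python's Unicode database ('' if none)."""
    return unicodedata.decomposition(chr(cp))


if __name__ == "__main__":
    for cp in (0xF900, 0xFA0E, 0xFA23):
        print(f"U+{cp:04X}", is_unified_in_compat_block(cp),
              repr(decomposition_of(cp)))
```

In the standard library's database, U+F900 reports a decomposition to U+8C48, whereas the twelve listed code points report none.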
The block named ''CJK Compatibility Ideographs'' (F900–FAFF) was created to retain round-trip compatibility with other standards. Only twelve of its characters have the "Unified Ideograph" property: U+FA0E, U+FA0F, U+FA11, U+FA13, U+FA14, U+FA1F, U+FA21, U+FA23, U+FA24, U+FA27, U+FA28 and U+FA29. None of the other characters in this and other "Compatibility" blocks relate to CJK Unification.
Charts
F900–FAFF.
Sources
Note: All characters appear in more than one source, so the sum of individual character counts (36) is greater than the number of encoded characters (12).
Known issues
Disunification
U+4039
The character U+4039 (䀹) was a unification of two different characters (one with the jiā 夾 phonetic and one with the shǎn 㚒 phonetic) through Unicode 5.0. However, they are lexically different characters that should not have been unified; they have different pronunciations and different meanings. The proposal to disunify U+4039 was accepted, and the new character was encoded at U+9FC3 (鿃) in Unicode 5.1.
Other 3 glyphs in Extension B
In CJK Unified Ideographs Extension B, some characters are incorrectly unified with others. These characters include U+2017B (𠅻), U+204AF (𠒯) and U+24CB2 (𤲲). The first two wrongly unify glyphs from Mainland Chinese and Vietnamese sources, while the last wrongly unifies glyphs from Mainland Chinese and Taiwanese sources.
Unifiable variants and exact duplicates in Extension B
Also in CJK Unified Ideographs Extension B, hundreds of glyph variants were encoded. In addition to the deliberate encoding of close glyph variants, six exact duplicates (where the same character has inadvertently been encoded twice) and two semi-duplicates (where the CJK-B character represents a ''de facto'' disunification of two glyph forms unified in the corresponding BMP character) were encoded by mistake:
* U+34A8 㒨 = U+20457 𠑗 : U+20457 is the same as the China-source glyph for U+34A8, but it is significantly different from the Taiwan-source glyph for U+34A8
* U+3DB7 㶷 = U+2420E 𤈎 : same glyph shapes
* U+8641 虁 = U+27144 𧅄 : U+27144 is the same as the Korean-source glyph for U+8641, but it is significantly different from the Chinese Mainland-, Taiwan- and Japan-source glyphs for U+8641
* U+204F2 𠓲 = U+23515 𣔕 : same glyph shapes, but ordered under different radicals
* U+249BC 𤦼 = U+249E9 𤧩 : same glyph shapes
* U+24BD2 𤯒 = U+2A415 𪐕 : same glyph shapes, but ordered under different radicals
* U+26842 𦡂 = U+26866 𦡦 : same glyph shapes
* U+FA23 﨣 = U+27EAF 𧺯 : same glyph shapes (U+FA23 﨣 is a unified CJK ideograph, despite its name "CJK COMPATIBILITY IDEOGRAPH-FA23")
Other CJK ideographs in Unicode, not Unified
Apart from the nine blocks of "Unified Ideographs", Unicode has about a dozen more blocks containing CJK characters that are not unified. These are mainly CJK radicals, strokes, punctuation, marks, symbols and compatibility characters. Although some characters have (decomposable) counterparts in other blocks, their usage can differ. An example of a not-unified CJK character is in the CJK Symbols and Punctuation block. Although it is not covered under "CJK Unified Ideographs", it is treated as a CJK character for all other intents and purposes.
Four blocks of compatibility characters are included for compatibility with legacy text handling systems and older character sets:
* CJK Compatibility (3300–33FF)
* CJK Compatibility Forms (FE30–FE4F)
* CJK Compatibility Ideographs (F900–FAFF)
* CJK Compatibility Ideographs Supplement (2F800–2FA1F)
They include forms of characters for vertical text layout and rich text characters that Unicode recommends handling through other means. Therefore, their use is discouraged.
Font support
The blocks CJK Unified Ideographs and CJK Unified Ideographs Extension A, being part of the Basic Multilingual Plane, are supported by the majority of CJK fonts. However, Japanese and Korean fonts usually contain fewer characters (about 13,000 and 8,000, respectively) than Chinese fonts. Extensions B, C and D are supported by the additional fonts MingLiU-ExtB, MingLiU_HKSCS-ExtB, PMingLiU-ExtB and SimSun-ExtB, which have been included in Microsoft Windows since Vista.
Unicode version history
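The block ranges given throughout this article, together with the Unicode versions it states for individual blocks, can be gathered into a small lookup table. The Python sketch below is illustrative only: `Block`, `BLOCKS` and `find_block` are hypothetical names, and `added_in` is left as `None` wherever the article text does not give a version, rather than being filled in from other sources.

```python
from typing import NamedTuple, Optional


class Block(NamedTuple):
    name: str
    first: int               # first code point of the block
    last: int                # last code point of the block
    added_in: Optional[str]  # Unicode version, only where this article states it


BLOCKS = (
    Block("CJK Unified Ideographs", 0x4E00, 0x9FFF, None),
    Block("CJK Unified Ideographs Extension A", 0x3400, 0x4DBF, None),
    Block("CJK Unified Ideographs Extension B", 0x20000, 0x2A6DF, None),
    Block("CJK Unified Ideographs Extension C", 0x2A700, 0x2B73F, None),
    Block("CJK Unified Ideographs Extension D", 0x2B740, 0x2B81F, "6.0"),
    Block("CJK Unified Ideographs Extension E", 0x2B820, 0x2CEAF, "8.0"),
    Block("CJK Unified Ideographs Extension F", 0x2CEB0, 0x2EBEF, "10.0"),
    Block("CJK Unified Ideographs Extension G", 0x30000, 0x3134F, None),
    Block("CJK Unified Ideographs Extension H", 0x31350, 0x323AF, "15.0"),
)


def find_block(cp: int) -> Optional[Block]:
    """Return the Unified Ideographs block containing cp, or None."""
    for block in BLOCKS:
        if block.first <= cp <= block.last:
            return block
    return None
```

For example, `find_block(0x9FC3)` returns the basic block and `find_block(0x2B740)` returns Extension D with `added_in` set to `"6.0"`. A code point such as U+FA23 returns `None`, because the table covers only the nine Unified Ideographs blocks and not the compatibility blocks. A linear scan is used because the table has only nine rows; a sorted list with `bisect` would be an alternative for larger tables.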
See also
* Han Unification
* List of Unicode characters
* List of CJK fonts
* Ideographic Research Group
* Chinese cultural sphere
Notes
External links