Chinese Character Encoding

	Chinese Character Encoding In computing, Chinese character encodings can be used to represent text written in the CJK languages (Chinese, Japanese, Korean) and (rarely) obsolete Vietnamese, all of which use Chinese characters. Several general-purpose character encodings accommodate Chinese characters, and some of them were developed specifically for Chinese. In addition to Unicode (with the set of CJK Unified Ideographs), local encoding systems exist. The two primary "legacy" local encoding systems are Guobiao (or GB, "national standard"), used in mainland China and Singapore, and the (mainly) Taiwanese Big5, used in Taiwan, Hong Kong and Macau. Guobiao is usually displayed using simplified characters and Big5 is usually displayed using traditional characters. There is however no mandated connection between the encoding sy ...
	CJK Characters In internationalization, CJK characters is a collective term for graphemes used in the Chinese, Japanese, and Korean writing systems, which each include Chinese characters. It can also go by CJKV to include Chữ Nôm, the Chinese-origin logographic script formerly used for the Vietnamese language, or CJKVZ to also include Sawndip, used to write the Zhuang languages. Character repertoire Standard Mandarin Chinese and Standard Cantonese are written almost exclusively in Chinese characters. Over 3,000 characters are required for general literacy, with up to 40,000 characters for reasonably complete coverage. Japanese uses fewer characters; general literacy in Japanese can be expected with 2,136 characters. The use of Chinese characters in Korea is increasingly rare, although idiosyncratic use of Chinese characters in proper names requires knowledge (and therefore availability) of many more characters. Even today, however, some South Korean students learn 1,800 character ...
	Traditional Chinese Characters Traditional Chinese characters are a standard set of Chinese character forms used to write Chinese languages. In Taiwan, the set of traditional characters is regulated by the Ministry of Education and standardized in the "Standard Form of National Characters". These forms were predominant in written Chinese until the middle of the 20th century, when various countries that use Chinese characters began standardizing simplified sets of characters, often with characters that existed before as well-known variants of the predominant forms. Simplified characters as codified by the People's Republic of China are predominantly used in mainland China, Malaysia, and Singapore. "Traditional" as such is a retronym applied to non-simplified character sets in the wake of widespread use of simplified characters. Traditional characters are commonly used in Taiwan, Hong Kong, and Macau, as ...
	PostScript Fonts PostScript fonts are font files encoded in outline font specifications developed by Adobe Systems for professional digital typesetting. This system uses PostScript file format to encode font information. "PostScript fonts" may also separately be used to refer to a basic set of fonts included as standards in the PostScript system, such as Times New Roman, Helvetica, and Avant Garde. History Type 1 and Type 3 fonts, though introduced by Adobe in 1984 as part of the PostScript page description language, did not see widespread use until March 1985 when the first laser printer to use the PostScript language, the Apple LaserWriter, was introduced. Even then, in 1985, the outline fonts were resident only in the printer, and the screen used bitmap fonts as substitutes for outline fonts. Although originally part of PostScript, Type 1 fonts used a simplified set of drawing operations compared to ordinary PostScript (programmatic elements such as loops and variables were removed, ...
	Ethnic Minorities In China Ethnic minorities in China are the non-Han population in the People's Republic of China (PRC). The PRC officially recognizes 55 ethnic minority groups within China in addition to the Han majority. The combined population of officially-recognized minority groups comprised 8.89% of the population of Mainland China. In addition to these officially-recognized ethnic minority groups, there are Chinese nationals who privately classify themselves as members of unrecognized ethnic groups, such as the very small Chinese Jewish, Tuvan, and Ili Turk communities, as well as the much larger Oirat and Japanese communities. The Chinese term for 'ethnic minority' contains a word meaning 'nationality' or 'nation' (as in ethnic group), in line with the Soviet concept of ethni ...
	GB 18030 GB 18030 is a Chinese government standard, titled "Information Technology — Chinese coded character set", that defines the language and character support required of software in China. GB18030 is the registered Internet name for the official character set of the People's Republic of China (PRC), superseding GB2312. As a Unicode Transformation Format (i.e. an encoding of all Unicode code points), GB18030 supports both simplified and traditional Chinese characters. It is also compatible with legacy encodings including GB/T 2312, CP936, and GBK 1.0. The Unicode Consortium has warned implementers that the latest version of this Chinese standard, GB 18030-2022, introduces what they describe as "disruptive changes" from the previous version GB 18030-2005 "involving 33 different characters and 55 code positions". GB 18030-2022 was enforced from 1 August 2023. It has been implemented in ICU 73.2; and in Java 21, and backported to older ...
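To make the "encoding of all Unicode code points" claim concrete, here is a minimal sketch using Python's built-in `gb18030` codec. The sample string and the helper name `byte_lengths` are illustrative assumptions, and the sketch does not check which revision of the standard (2005 or 2022) a given Python build follows.

```python
def byte_lengths(text: str, codec: str = "gb18030") -> list:
    """Return the encoded length in bytes of each character in text."""
    return [len(ch.encode(codec)) for ch in text]


# Hypothetical sample: an ASCII letter, a BMP hanzi, and a
# supplementary-plane character (U+1F600).
sample = "A中\U0001F600"

encoded = sample.encode("gb18030")
round_tripped = encoded.decode("gb18030")

print(byte_lengths(sample))      # one entry per character
print(round_tripped == sample)   # the round trip should be lossless
```

The variable width (one, two, or four bytes per character) is what lets the encoding stay compatible with the older two-byte sets while still reaching code points outside them.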
	Code Page 1386 Windows code page 936 (abbreviated MS936, Windows-936 or, ambiguously, CP936) is Microsoft's legacy (pre-Unicode) character encoding for representing simplified Chinese text on computers. It is one of the four Windows DBCSs for East Asian languages, accompanying code pages 932 (Japanese), 949 (Korean) and 950 (Traditional Chinese). It is a variant of the Mainland Chinese Guójiā Biāozhǔn Kuòzhǎn (GBK) encoding, and roughly corresponds to IBM code page 1386 (CP1386 or IBM-1386). History Originally, Windows-936 covered GB 2312 (in its EUC-CN form), but it was expanded to cover most of GBK with the release of Windows 95. The Euro sign (€), not defined in GBK, is encoded as 0x80 in Windows-936 and IBM-1386. On the other hand, 95 characters defined in GBK 1.0 ...
	GBK (character Encoding) GBK is an extension of the GB 2312 character set for Simplified Chinese characters, used in the People's Republic of China. It includes all unified CJK characters found in ISO/IEC 10646:1993 (Unicode 1.1). Since its initial release in 1993, GBK has been extended by Microsoft in Code page 936/1386, which was then extended into GBK 1.0. GBK is also the IANA-registered internet name for the Microsoft mapping, which differs from other implementations primarily by the single-byte euro sign at 0x80. "GB" abbreviates Guójiā Biāozhǔn, which means "national standard" in Chinese, while "K" stands for "Extension" (扩展, kuòzhǎn). GBK extended the old standard not only with Traditional Chinese characters, but also with Chinese characters that were simplified after the establishment of GB 2312 in 1981. With the arrival of GBK, certain names with characters formerly unrepresentable, like the 镕 (róng) character in former Chinese Premier Zhu Rongji's name, are now ...
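The 镕 example can be checked directly with Python's built-in `gbk` and `gb2312` codecs. This is a minimal sketch; the helper name `encodable` is an assumption made for illustration, and Python's `gbk` codec is one implementation of the mapping, not necessarily identical to every vendor's variant.

```python
def encodable(text: str, codec: str) -> bool:
    """Return True if every character of text exists in the given codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


name_char = "镕"  # the character from Zhu Rongji's name cited above

print(encodable(name_char, "gb2312"))  # the older, smaller set
print(encodable(name_char, "gbk"))     # the extended set
```

A check like this is a cheap way to decide whether text can be safely written out in a legacy encoding before committing to it.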
	GB/T 12345 GB 12345, entitled "Code of Chinese ideogram set for information interchange supplementary set" (信息交換用漢字編碼字符集　輔助集), is a Traditional Chinese character set standard established by China, and can be thought of as the traditional counterpart of GB 2312. It is used as an encoding of traditional Chinese characters, although it is not as commonly used as Big5. It has 6,866 characters, and has no relationship or compatibility with Big5 and CNS 11643. Characters Characters in GB 12345 are arranged in a 94×94 grid (as in ISO/IEC 2022), and the two-byte code point of each character is expressed in the qu-wei form, which specifies a row (qu 区) and the position of the character within the row (cell, wei 位). The rows (numbered from 1 to 94) contain characters as follows: * 01–09: identical to GB 2312, except that 29 vertical punctuation forms are added in row 06, positions 57–85, and 6 pinyin characters are added in row 08, positions 27–32 ...
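The qu-wei (row, cell) addressing of a 94×94 grid can be sketched in a few lines. Python ships no GB 12345 codec, so this illustration uses GB 2312 instead, which shares the same grid layout; it also assumes the EUC packing in which each of the two bytes is the row or cell number plus 0xA0. The function names are hypothetical.

```python
def quwei_to_euc(row: int, cell: int) -> bytes:
    """Pack a qu-wei (row, cell) pair into two EUC-style bytes."""
    if not (1 <= row <= 94 and 1 <= cell <= 94):
        raise ValueError("row and cell must both be in 1..94")
    return bytes([0xA0 + row, 0xA0 + cell])


def euc_to_quwei(pair: bytes) -> tuple:
    """Recover the (row, cell) pair from two EUC-style bytes."""
    if len(pair) != 2:
        raise ValueError("expected exactly two bytes")
    return pair[0] - 0xA0, pair[1] - 0xA0


# Row 16, cell 1 is the first hanzi position in GB 2312.
first_hanzi = quwei_to_euc(16, 1)
print(first_hanzi.hex(), first_hanzi.decode("gb2312"))
print(euc_to_quwei("中".encode("gb2312")))
```

Adding 0xA0 moves both bytes into the high half of the byte range, which is why such text can be mixed with ASCII without ambiguity.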
	HZ (character Encoding) The HZ character encoding is an encoding of GB 2312 that was formerly commonly used in email and USENET postings. It was designed in 1989 by Fung Fung Lee of Stanford University, and subsequently codified in 1995 into RFC 1843. The HZ encoding, short for Hanzi, was invented to facilitate the use of Chinese characters through e-mail, which at that time only allowed 7-bit characters. Therefore, in lieu of standard ISO 2022 escape sequences (as in the case of ISO-2022-JP) or 8-bit characters (as in the case of EUC), the HZ code uses only printable, 7-bit characters to represent Chinese characters. It was also popular in USENET networks, which in the late 1980s and early 1990s generally did not allow transmission of 8-bit characters or escape characters. History HZ superseded the earlier "zW" encoding, which marked entire lines as being GB 2312 text by beginning them with the characters zW. Structure and use In the HZ encoding system, the character sequences "~{" and "~}" act as e ...
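The 7-bit property described above can be observed with Python's built-in `hz` codec. This is a minimal sketch; the helper names `to_hz` and `is_seven_bit` are assumptions for illustration, and the `~{` and `~}` delimiters mentioned in the comments are the ones defined in RFC 1843.

```python
def to_hz(text: str) -> bytes:
    """Encode text with Python's built-in HZ codec."""
    return text.encode("hz")


def is_seven_bit(data: bytes) -> bool:
    """Return True if no byte has its high bit set."""
    return all(b < 0x80 for b in data)


greeting = "你好"
wire = to_hz(greeting)

print(wire)                 # GB 2312 bytes wrapped in ~{ ... ~}
print(is_seven_bit(wire))   # safe for a 7-bit mail transport
print(wire.decode("hz"))    # decodes back to the original text
```

Plain ASCII input passes through unchanged, so only the Chinese runs pay the cost of the shift sequences.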
	EUC-CN Extended Unix Code (EUC) is a multibyte character encoding system used primarily for Japanese, Korean, and simplified Chinese (characters). The most commonly used EUC codes are variable-length encodings in which a character belonging to an ISO/IEC 646 compliant coded character set (such as ASCII) takes one byte, and a character belonging to a 94×94 coded character set (such as GB 2312) is represented in two bytes. The EUC-CN form of GB 2312 and EUC-KR are examples of such two-byte EUC codes. EUC-JP includes characters represented by up to three bytes, including an initial shift code, whereas a single character in EUC-TW can take up to four bytes. Modern applications are more likely to use UTF-8, which supports all of the glyphs of the EUC codes, and more, and is generally more portable with fewer vendor deviations and errors. EUC is however still very popular, especially EUC-KR for South Korea. Encoding structure The structure of EUC is based on the ISO/IEC 2022 standard, which specifies a system of graphical charact ...
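The one-byte and two-byte cases described for EUC-CN can be sketched as a small splitter that decides each character's width from its lead byte. This is an illustrative sketch only: the function name `split_euc_cn` is hypothetical, the byte ranges assumed are 0x00–0x7F for ASCII and 0xA1–0xFE for the lead byte of a two-byte character, and trail bytes are not validated.

```python
def split_euc_cn(data: bytes) -> list:
    """Split EUC-CN bytes into per-character byte strings."""
    chars = []
    i = 0
    while i < len(data):
        lead = data[i]
        if lead < 0x80:
            width = 1          # ASCII range: single byte
        elif 0xA1 <= lead <= 0xFE:
            width = 2          # 94x94 set: lead byte plus trail byte
        else:
            raise ValueError(f"unexpected lead byte 0x{lead:02X} at offset {i}")
        if i + width > len(data):
            raise ValueError("input ends in the middle of a character")
        chars.append(data[i:i + width])
        i += width
    return chars


mixed = "A中b".encode("gb2312")  # Python's gb2312 codec produces EUC-CN bytes
for piece in split_euc_cn(mixed):
    print(piece.hex(), piece.decode("gb2312"))
```

Because the lead byte alone determines the width, a decoder never needs to look back in the stream, which is part of what made this family of encodings simple to implement.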
	GB 2312 is a key official character set of the People's Republic of China, used for Simplified Chinese characters. GB2312 is the registered internet name for EUC-CN, which is its usual encoded form. "GB" refers to the Guobiao standards (国家标准), whereas the "T" suffix (推荐, tuījiàn, "recommendation") denotes a non-mandatory standard. GB 2312 was originally a mandatory national standard. However, following a National Standard Bulletin of the People's Republic of China in 2017, GB 2312 is no longer mandatory, and its standard code is modified to GB/T 2312. GB 2312 has been superseded by GBK and GB 18030, which include additional characters, but remains in widespread use as a subset of those encodings. GB2312 is the second-most popular encoding served from China and territories (after UTF-8), with 5.5% of web servers serving a page declaring it. Globally, GB2312 is declared on 0.1% of all web pages. However, all major web browsers decode GB2312-marked document ...
	Simplified Chinese Simplification, Simplify, or Simplified may refer to: Mathematics Simplification is the process of replacing a mathematical expression by an equivalent one that is simpler (usually shorter), according to a well-founded ordering. Examples include: * Simplification of algebraic expressions, in computer algebra * Simplification of boolean expressions i.e. logic optimization * Simplification by conjunction elimination in inference in logic yields a simpler, but generally non-equivalent formula * Simplification of fractions Science * Approximations simplify a more detailed or difficult to use process or model Linguistics * Simplification of Chinese characters * Simplified English (other) * Text simplification Music * "Simplify", a 1999 album by Ryan Shupe & the RubberBand * Simplified (band), a 2002 rock band from Charlotte, North Carolina * "Simplified" (album), a 2005 album by Simply Red * "Simplify", a 2008 song by Sanguine * "Simplify", a 2018 song by Yo ...