
In internationalization, CJK characters is a collective term for graphemes used in the Chinese, Japanese, and Korean writing systems, which each include Chinese characters. The term is sometimes extended to CJKV to include Chữ Nôm, the Chinese-origin logographic script formerly used for the Vietnamese language, or to CJKVZ to also include Sawndip, used to write the Zhuang languages.
Character repertoire
Standard Mandarin Chinese and Standard Cantonese are written almost exclusively in Chinese characters. Over 3,000 characters are required for general literacy, with up to 40,000 characters for reasonably complete coverage. Japanese uses fewer characters; general literacy in Japanese can be expected with 2,136 characters. The use of Chinese characters in Korea is increasingly rare, although idiosyncratic use of Chinese characters in proper names requires knowledge (and therefore availability) of many more characters. Even today, however, some South Korean students learn 1,800 characters.
Other scripts used for these languages, such as bopomofo and the Latin-based pinyin for Chinese, hiragana and katakana for Japanese, and hangul for Korean, are not strictly "CJK characters", although CJK character sets almost invariably include them as necessary for full coverage of the target languages.
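The distinction between Han characters and the accompanying phonetic scripts can be sketched in Python using the character names in the standard `unicodedata` module. This is only a rough heuristic for illustration: the `SCRIPT_PREFIXES` table and the `classify` helper are assumptions made here, not part of any standard, and matching on name prefixes misses some forms (for example, halfwidth katakana).

```python
import unicodedata

# Hypothetical lookup table: Unicode character-name prefix -> script label.
SCRIPT_PREFIXES = {
    "CJK UNIFIED IDEOGRAPH": "han",
    "CJK COMPATIBILITY IDEOGRAPH": "han",
    "HIRAGANA": "hiragana",
    "KATAKANA": "katakana",
    "HANGUL": "hangul",
    "BOPOMOFO": "bopomofo",
    "LATIN": "latin",
}

def classify(ch):
    """Return a coarse script label for one character, based on its Unicode name."""
    name = unicodedata.name(ch, "")
    for prefix, label in SCRIPT_PREFIXES.items():
        if name.startswith(prefix):
            return label
    return "other"

# Han, hiragana, katakana, hangul, bopomofo, and pinyin with a tone mark.
sample = "漢字かなカナ한글ㄅpīnyīn"
for ch in sample:
    print(f"U+{ord(ch):04X}", classify(ch))
```

Only the characters labeled `han` are "CJK characters" in the strict sense used above; the other labels correspond to the scripts that CJK character sets include for full language coverage.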
The sinologist Carl Leban (1971) produced an early survey of CJK encoding systems.
Until the early 20th century, Classical Chinese was the written language of government and scholarship in Vietnam. Popular literature in Vietnamese was written in the chữ Nôm script, consisting of Chinese characters with many characters created locally. Since the 1920s, the script used for recording literature has been the Latin-based Vietnamese alphabet.
Encoding
The number of characters required for complete coverage of all these languages' needs cannot fit in the 256-character code space of 8-bit character encodings, requiring at least a 16-bit fixed-width encoding or multi-byte variable-length encodings. The 16-bit fixed-width encodings, such as those from Unicode up to and including version 2.0, are now deprecated due to the requirement to encode more characters than a 16-bit encoding can accommodate (Unicode 5.0 has some 70,000 Han characters) and the requirement by the Chinese government that software in China support the GB 18030 character set.
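The size constraints described above can be checked with a short Python sketch. The sample characters and the `encoded_sizes` helper are arbitrary choices made for this illustration; Latin-1 stands in for a typical 8-bit encoding, and UTF-8 and UTF-16 stand in for variable-length encodings.

```python
# Sample characters are arbitrary choices made for this illustration.
SAMPLES = {
    "A": "Latin letter",
    "\u4e2d": "Han character inside the 16-bit range (U+4E2D)",
    "\U00020000": "Han character beyond the 16-bit range (U+20000)",
}

def encoded_sizes(ch):
    """Return storage facts for a single character."""
    try:
        ch.encode("latin-1")  # stands in for a typical 8-bit encoding
        fits_8bit = True
    except UnicodeEncodeError:
        fits_8bit = False
    return {
        "code_point": f"U+{ord(ch):04X}",
        "fits_8bit": fits_8bit,
        "fits_16bit": ord(ch) <= 0xFFFF,
        "utf8_bytes": len(ch.encode("utf-8")),
        "utf16_bytes": len(ch.encode("utf-16-be")),
    }

for ch, description in SAMPLES.items():
    print(description, encoded_sizes(ch))
```

The Han character at U+20000 does not fit in a single 16-bit unit, so UTF-16 stores it as a pair of 16-bit units (four bytes), which is why a fixed-width 16-bit encoding is no longer sufficient.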
Although CJK encodings have common character sets, the encodings often used to represent them have been developed separately by different East Asian governments and software companies, and are mutually incompatible. Unicode has attempted, with some controversy, to unify the character sets in a process known as Han unification.
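As a rough illustration of this incompatibility, the following Python sketch encodes one Han character with several codecs from Python's standard library and then decodes the Big5 bytes as if they were GB 2312. The choice of character and of codecs is an assumption made for the example.

```python
text = "\u4e2d"  # U+4E2D, a Han character present in all of the sets below

# Codec names as registered in Python's standard library.
codecs_to_try = ["big5", "gb2312", "gb18030", "shift_jis", "euc_jp", "euc_kr", "utf-8"]

# The same character maps to a different byte sequence in most encodings.
encoded = {name: text.encode(name) for name in codecs_to_try}
for name, raw in encoded.items():
    print(f"{name:10} {raw.hex()}")

# Reading bytes with the wrong legacy codec silently yields different text.
misread = encoded["big5"].decode("gb2312", errors="replace")
print("Big5 bytes read as GB 2312:", [f"U+{ord(c):04X}" for c in misread])
```

Because the byte sequences overlap between encodings, software generally needs to know which encoding a text uses; the bytes alone do not reliably identify it.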
CJK character encodings should consist minimally of Han characters plus language-specific phonetic scripts such as pinyin, bopomofo, hiragana, katakana and hangul.
CJK character encodings include:
* Big5 (the most prevalent encoding before Unicode was implemented)
* CCCII
* CNS 11643 (official standard of the Republic of China)
* EUC-JP
* EUC-KR
* GB 2312 (subset and predecessor of GB 18030)
* GB 18030 (mandated standard in the People's Republic of China)
* Giga Character Set (GCS)
* ISO-2022-JP
* ISO-2022-KR
* KS X 1001
* KPS 9566
* Shift-JIS
* TRON
* Unicode
The CJK character sets take up the bulk of the assigned Unicode code space. There is much controversy among Japanese experts of Chinese characters about the desirability and technical merit of the Han unification process used to map multiple Chinese and Japanese character sets into a single set of unified characters.
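The claim about the share of the code space can be approximated with Python's `unicodedata` module. This is a sketch under stated assumptions: the `cjk_share` helper is hypothetical, it counts only unified Han ideographs and precomposed hangul syllables as "CJK", and the totals depend on the Unicode version bundled with the Python interpreter in use.

```python
import unicodedata

def cjk_share():
    """Count assigned code points, unified Han ideographs and hangul syllables."""
    assigned = han = hangul = 0
    for cp in range(0x110000):
        ch = chr(cp)
        # Skip unassigned (Cn), private-use (Co) and surrogate (Cs) code points.
        if unicodedata.category(ch) in ("Cn", "Co", "Cs"):
            continue
        assigned += 1
        name = unicodedata.name(ch, "")
        if name.startswith("CJK UNIFIED IDEOGRAPH"):
            han += 1
        elif name.startswith("HANGUL SYLLABLE"):
            hangul += 1
    return assigned, han, hangul

assigned, han, hangul = cjk_share()
print(f"Unicode version:   {unicodedata.unidata_version}")
print(f"assigned:          {assigned}")
print(f"Han ideographs:    {han}")
print(f"hangul syllables:  {hangul}")
print(f"share of assigned: {(han + hangul) / assigned:.1%}")
```

Kana, bopomofo, compatibility ideographs and radicals are left out of the count, so the printed share is a lower bound on the space taken by CJK-related characters.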
All three languages can be written both left-to-right and top-to-bottom (right-to-left and top-to-bottom in ancient documents), but are usually considered left-to-right scripts when discussing encoding issues.
Legal status
Libraries cooperated on encoding standards for JACKPHY characters in the early 1980s. According to Ken Lunde, the abbreviation "CJK" was a registered trademark of Research Libraries Group [Ken Lunde, 1996] (which merged with OCLC in 2006). The trademark owned by OCLC between 1987 and 2009 has now expired [Justia listing].
See also
* Chinese character description languages
* Chinese character encoding
* Chinese input methods for computers
* CJK Compatibility Ideographs
* Chinese character strokes
* CJK Unified Ideographs
* Complex Text Layout languages (CTL)
* Input method editor
* Japanese language and computers
* Korean language and computers
* List of CJK fonts
* Sinoxenic
* Variable-width encoding
* Vietnamese language and computers
References
Sources
* DeFrancis, John. ''The Chinese Language: Fact and Fantasy''. Honolulu: University of Hawaii Press, 1990.
* Hannas, William C. ''Asia's Orthographic Dilemma''. Honolulu: University of Hawaii Press, 1997.
* Lemberg, Werner: The CJK package for LATEX2ε—Multilingual support beyond babel. TUGboat, Volume 18 (1997), No. 3—Proceedings of the 1997 Annual Meeting.
* Leban, Carl. ''Automated Orthographic Systems for East Asian Languages (Chinese, Japanese, Korean)'', State-of-the-art Report, Prepared for the Board of Directors, Association for Asian Studies. 1971.
* Lunde, Ken. ''CJKV Information Processing''. Sebastopol, Calif.: O'Reilly & Associates, 1998.
External links
* CJKV: A Brief Introduction
* Lemberg CJK article from above, TUGboat18-3
* On "CJK Unified Ideograph" from Wenlin.com