Chinese character description languages are several proposed languages designed to describe Chinese (or CJK) characters as accurately and completely as possible, including information such as a character's list of components, its list of strokes (basic and complex), the order of those strokes, and the location of each of them on an empty background square. They are designed to overcome the inherent lack of information within a bitmap description. This enriched information can be used to identify variants of characters that are unified into one code point by Unicode and ISO/IEC 10646, as well as to provide an alternative form of representation for rare characters that do not yet have a standardized encoding in Unicode or ISO/IEC 10646. Many aim to work for both Kaishu style and Song style, and to expose a character's internal structure, which makes look-up easier by indexing a character's internal make-up and cross-referencing similar characters.
CDL

Character Description Language (CDL) is a font technology, based on XML, co-created by Tom Bishop and Richard Cook for the Wenlin Institute, Inc. It was designed for describing any CJK character, but is suitable for describing any glyph.
This XML-based declarative language defines the stroke order of each component (a subunit of the glyph similar to a radical, but not necessarily bearing the semantic significance of a true radical), as well as the assembly of previously defined components to build up ever more complex characters. Many of these components are characters in their own right, in addition to serving as building-block components.
The background is a square of 128 pixels on each side. On this background:
# Each of about 50 strokes can be drawn in SVG.
# A basic component is composed by calling several strokes. Within the component, each stroke is placed by its bottom-left and top-right corners. Transformations (reduction, enlargement, etc.) are possible. There are more than 1,000 basic components.
# A character is composed by calling several components. Within the character, each component is placed by its bottom-left and top-right corners. So that a component fits into its proper portion of the character's rectangular block, it may be transformed (e.g., by horizontal or vertical reduction or enlargement) when it is embedded as a building block in a more complex character.
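The three-level composition described in the list can be sketched in a few lines of Python. This is only an illustration of the idea, not CDL's XML schema: the class `Component`, the helpers `fit` and `flatten`, the stroke names `"h"` and `"s"`, and the sample components are all hypothetical. Each child is placed by two corner points given on the full 128-unit square and is rescaled when its parent is itself embedded in a smaller box.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union

GRID = 128  # side of the background square, as stated in the text

# (x0, y0, x1, y1): the two corner points that place a stroke or component
Box = Tuple[float, float, float, float]


@dataclass
class Component:
    """A named group of strokes and/or sub-components (hypothetical model)."""
    name: str
    # Each part is (child, box); a child is a stroke name or another Component.
    parts: List[Tuple[Union[str, "Component"], Box]] = field(default_factory=list)


def fit(inner: Box, outer: Box) -> Box:
    """Rescale a box given on the full GRID square so it lies inside `outer`."""
    ox0, oy0, ox1, oy1 = outer
    sx = (ox1 - ox0) / GRID  # horizontal reduction or enlargement factor
    sy = (oy1 - oy0) / GRID  # vertical reduction or enlargement factor
    x0, y0, x1, y1 = inner
    return (ox0 + x0 * sx, oy0 + y0 * sy, ox0 + x1 * sx, oy0 + y1 * sy)


def flatten(comp: Component, outer: Box = (0, 0, GRID, GRID)) -> List[Tuple[str, Box]]:
    """Expand a component into (stroke name, box in character space) pairs."""
    strokes: List[Tuple[str, Box]] = []
    for child, box in comp.parts:
        placed = fit(box, outer)
        if isinstance(child, Component):
            strokes.extend(flatten(child, placed))  # recurse into sub-component
        else:
            strokes.append((child, placed))
    return strokes


# A made-up cross-shaped component: one horizontal and one vertical stroke.
ten = Component("ten", [("h", (0, 64, 128, 64)), ("s", (64, 0, 64, 128))])
# A made-up character that embeds the component twice, side by side.
pair = Component("pair", [(ten, (0, 0, 64, 128)), (ten, (64, 0, 128, 128))])

for name, box in flatten(pair):
    print(name, box)
```

Because `pair` holds references to `ten` rather than copies of its strokes, editing `ten.parts` changes what `flatten(pair)` returns, which is the kind of reuse the component model relies on.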
Accordingly, a set of fewer than 50 strokes [Bishop & Cook 2013-12-31:p2] allows one to construct a set of about 1,000 components [Bishop & Cook 2013-12-31:p9], which may in turn be embedded within the descriptions of tens of thousands of characters.
A change in the shape of one of the 50 basic strokes is implicitly applied within each character that embeds that stroke. Likewise, a change to a component is implicitly applied within every character whose assembly uses that component.
T. Bishop and R. Cook explain this as follows:
Nearly 100,000 Chinese characters have been described via CDL. [Wenlin Institute webpage for CDL]
HanGlyph
HanGlyph is a character description language intended for supplying missing rare characters in documents (addressing the Chinese equivalent of the gaiji problem). Documents can contain markup for missing characters, which automatically triggers the generation of small fonts to provide the characters. The language itself is a simple postfix notation describing strokes and ways to combine them. The prototype software uses MetaPost to render the characters and embed them in LaTeX documents. The language was presented by Wai Wong in 1997, and papers about its implementation in MetaPost and LaTeX appeared at TeX user group conferences in 2003.
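To illustrate what a postfix notation for combining strokes involves, here is a minimal stack-based reader in Python. It does not use HanGlyph's actual vocabulary: the operator names `beside` and `above`, the stroke names, and the function `parse_postfix` are hypothetical stand-ins chosen for this sketch. Operands are pushed on a stack, and each operator pops the two most recent items and pushes their combination.

```python
from typing import List, Tuple, Union

# A node is a stroke name or (operator, left operand, right operand).
Node = Union[str, Tuple[str, "Node", "Node"]]

# Hypothetical binary combining operators; not HanGlyph's real operator set.
OPERATORS = {"beside", "above"}


def parse_postfix(expr: str) -> Node:
    """Build a composition tree from a whitespace-separated postfix expression."""
    stack: List[Node] = []
    for token in expr.split():
        if token in OPERATORS:
            if len(stack) < 2:
                raise ValueError(f"operator {token!r} needs two operands")
            right = stack.pop()
            left = stack.pop()
            stack.append((token, left, right))
        else:
            stack.append(token)  # anything else is treated as a stroke name
    if len(stack) != 1:
        raise ValueError("expression does not reduce to a single glyph")
    return stack[0]


# Two strokes stacked, then placed over two strokes set side by side.
tree = parse_postfix("h s above p n beside above")
print(tree)
```

Postfix input needs no parentheses or precedence rules, which keeps such a reader short; a renderer would then walk the resulting tree to lay out the strokes.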
Ideographic Description Sequences
Chapter 12 of the Unicode specification [https://www.unicode.org/versions/Unicode6.0.0/ch12.pdf] defines a syntax for "Ideographic Description Sequences" (IDSes) intended for use in describing characters not included in the standard in terms of combinations of components that do have code points. Twelve special characters in the range U+2FF0 to U+2FFB act as prefix operators to combine other characters or sequences to form larger characters.
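Because the operators are prefix operators with fixed operand counts, a sequence can be read with a short recursive routine. The Python sketch below covers only the twelve operators in the range named above, of which U+2FF2 and U+2FF3 take three operands and the rest take two. The function name `parse_ids` and the tuple-based tree shape are choices made for this example, and no length or validity limits from the standard are enforced.

```python
from typing import List, Tuple, Union

# Operand count for each operator in U+2FF0..U+2FFB:
# U+2FF2 and U+2FF3 combine three parts, the other ten combine two.
ARITY = {
    chr(cp): (3 if cp in (0x2FF2, 0x2FF3) else 2)
    for cp in range(0x2FF0, 0x2FFC)
}

# A tree is a single character or (operator, [operand trees]).
Tree = Union[str, Tuple[str, List["Tree"]]]


def parse_ids(seq: str) -> Tree:
    """Parse a prefix-notation description sequence into a nested tree."""
    pos = 0

    def node() -> Tree:
        nonlocal pos
        if pos >= len(seq):
            raise ValueError("sequence ended before all operands were supplied")
        ch = seq[pos]
        pos += 1
        if ch in ARITY:
            # An operator is followed by exactly ARITY[ch] operand trees.
            return (ch, [node() for _ in range(ARITY[ch])])
        return ch  # any other character is a leaf component

    tree = node()
    if pos != len(seq):
        raise ValueError("trailing characters after a complete sequence")
    return tree


print(parse_ids("⿰書史"))
print(parse_ids("⿱木⿰木木"))
```

A renderer or search tool can walk the resulting tree; nested operators simply appear as nested tuples.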
These sequences are useful in describing to the reader a character that is not directly printable, either because it is absent in a given font, or is absent from the Unicode standard altogether. For example, the Sawndip character "𭨡" (encoded in CJK Unified Ideographs Extension F as U+2DA21) can be described as "⿰書史". Another use is for dictionary lookup purposes, as a sort of rough input method for queries.
These sequences can be rendered either by keeping the individual characters separate or by parsing the Ideographic Description Sequence and drawing the ideograph so described. They do not, by themselves, provide unambiguous rendering for all characters. For instance, the sequence ⿱十一 represents both 土 ("soil", the upper bar being narrower) and 士 ("bachelor", the upper bar being wider).
Unicode's specification for these sequences is based on the characters and syntax of the earlier GBK standard.
The IDSgrep free software package by Matthew Skala extends Unicode's IDS syntax to include additional features for dictionary lookup; it is capable of converting KanjiVG's database to its own extended IDS format, or of searching EIDS files generated by the related Tsukurimashou font family.
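The general idea of component-based lookup can be shown without any external tool. The Python sketch below is not IDSgrep's algorithm or its extended syntax; it builds a plain inverted index from components to the characters whose description sequences mention them. The small `DECOMP` table is illustrative data written for this example, and `build_index` and `lookup` are hypothetical helper names.

```python
from collections import defaultdict
from typing import Dict, Set

# The twelve description operators U+2FF0..U+2FFB; they are not components.
IDC = {chr(cp) for cp in range(0x2FF0, 0x2FFC)}

# Illustrative decompositions only; a real table would come from a database.
DECOMP: Dict[str, str] = {
    "好": "⿰女子",
    "明": "⿰日月",
    "林": "⿰木木",
    "森": "⿱木林",
    "休": "⿰亻木",
}


def build_index(decomp: Dict[str, str]) -> Dict[str, Set[str]]:
    """Map each component to the set of characters whose sequence contains it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for char, ids in decomp.items():
        for part in ids:
            if part not in IDC:
                index[part].add(char)
    return index


def lookup(index: Dict[str, Set[str]], *parts: str) -> Set[str]:
    """Return the characters whose sequences contain every requested part."""
    if not parts:
        return set()
    found = [index.get(part, set()) for part in parts]
    return set.intersection(*found)


index = build_index(DECOMP)
print(sorted(lookup(index, "木")))
print(sorted(lookup(index, "木", "亻")))
```

This index only sees the components written directly in each sequence; matching on position or on nested structure would require parsing the sequences into trees first.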
KanjiVG
KanjiVG (Kanji Vector Graphics) is a free, Creative Commons-licensed Japanese character description language (intended to eventually expand to Chinese as well) based on the SVG vector graphics format.
SCML
In 2007, Structural Character Modeling Language (SCML) was proposed as a different kind of XML-based Chinese character description language, one whose positioning, unlike that of CDL and HanGlyph, is not based on a numerical grid. The known database of characters whose strokes and components are encoded in SCML serves only as a demonstration of principle; there is no known effort to encode, say, all of Unicode's CJK characters in SCML.
See also
* Unicode
* List of Shuowen Jiezi radicals, a system of 540 components used by Xu Shen (d. ≈147 AD) in his Shuowen Jiezi
* List of Kangxi radicals, a system of 214 components used by the Kangxi dictionary (1716), made under the leadership of the Kangxi Emperor
* List of Unicode radicals, a modern, ongoing computer-based attempt, led by Unicode, to create a complete and accurate list of CJK components
* Cangjie input method
* Radical
* Stroke
* Stroke order
Notes
External links
;CDL language from Wenlin Institute
* Digital Humanities Start-up Grant from the U.S. National Endowment for the Humanities
;HanGlyph
* HanGlyph – a Chinese Character Description Language - Reference Manual, 13 September 2003, 31 pages. http://www.hanglyph.com/en/hanglyph/reference.pdf (dead link; archived 4 March 2016 at https://web.archive.org/web/20160304185736/http://www.hanglyph.com/en/hanglyph/reference.pdf; accessed 11 December 2007)