The combining grapheme joiner (CGJ), U+034F, is a
Unicode
character that has no visible glyph and is "default ignorable" by applications. Its name is a
misnomer and does not describe its function: the character does not join graphemes.
Its purpose is to semantically ''separate'' characters that should ''not'' be considered
digraphs as well as to block canonical reordering of
combining marks during
normalization.
For example, in a
Hungarian language
context, adjoining letters ''c'' and ''s'' would normally be considered equivalent to the
cs digraph. If they are separated by the CGJ, they will be considered as two separate graphemes. However, in contrast to the
zero-width joiner and similar characters, the CGJ does not affect whether the two letters are ''rendered'' separately or as a
ligature or cursively joined—the default behavior for this is determined by the font.
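
The Hungarian example can be made concrete by looking at the code points involved. The following is a minimal sketch using Python's standard `unicodedata` module; the variable names and the `codepoints` helper are illustrative assumptions, and the snippet only shows how the sequences are encoded. Actually sorting "c + CGJ + s" differently from the digraph would require a collator tailored for Hungarian, which is not shown here.

```python
import unicodedata

CGJ = "\u034f"  # U+034F COMBINING GRAPHEME JOINER

digraph = "cs"               # normally treated as the Hungarian digraph
separated = "c" + CGJ + "s"  # marked as two separate letters


def codepoints(text):
    """Return the code points of text as U+XXXX strings (illustrative helper)."""
    return [f"U+{ord(ch):04X}" for ch in text]


print(codepoints(digraph))    # ['U+0063', 'U+0073']
print(codepoints(separated))  # ['U+0063', 'U+034F', 'U+0073']

# The CGJ has no decomposition, so every normalization form keeps it in place.
for form in ("NFC", "NFD", "NFKC", "NFKD"):
    assert unicodedata.normalize(form, separated) == separated

# Removing the CGJ gives back the plain two-letter sequence.
assert separated.replace(CGJ, "") == digraph
```

Because the CGJ is invisible, the two strings usually look identical on screen; only the encoded sequence differs.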
The CGJ is also needed for
complex scripts. For example, in most cases the
Hebrew cantillation accent
metheg is supposed to appear to the left of the
vowel point and by default most display systems will render it like this even if it is typed before the vowel. But in some words in
Biblical Hebrew the metheg appears to the right of the vowel, and to tell the display engine to render it properly on the right, CGJ must be typed between the metheg and the vowel.
In the case of several consecutive
combining diacritics, an intervening CGJ indicates that they should not be subject to canonical reordering.
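
The blocking effect on canonical reordering can be observed with Python's standard `unicodedata` module. This is a sketch under stated assumptions: the particular marks (acute, dot below, qamats) and the base letters are chosen here only for illustration, and the helper names `nfd` and `show` are hypothetical. It demonstrates the normalization side only; how a font or display engine positions the marks is a separate matter.

```python
import unicodedata

CGJ = "\u034f"  # U+034F COMBINING GRAPHEME JOINER, combining class 0


def nfd(text):
    """Normalize to NFD, which applies canonical reordering of combining marks."""
    return unicodedata.normalize("NFD", text)


def show(text):
    """Format a string as a space-separated list of code points."""
    return " ".join(f"U+{ord(ch):04X}" for ch in text)


# Latin: acute (above) typed before dot below, which has a lower combining class.
ACUTE, DOT_BELOW = "\u0301", "\u0323"
latin_plain = "a" + ACUTE + DOT_BELOW
latin_cgj = "a" + ACUTE + CGJ + DOT_BELOW

# Hebrew: lamed with meteg typed before the vowel point qamats.
LAMED, METEG, QAMATS = "\u05dc", "\u05bd", "\u05b8"
hebrew_plain = LAMED + METEG + QAMATS
hebrew_cgj = LAMED + METEG + CGJ + QAMATS

for label, text in [
    ("latin, no CGJ", latin_plain),
    ("latin, CGJ", latin_cgj),
    ("hebrew, no CGJ", hebrew_plain),
    ("hebrew, CGJ", hebrew_cgj),
]:
    print(f"{label}: {show(text)} -> {show(nfd(text))}")

# Without the CGJ, normalization swaps the marks into combining-class order.
assert nfd(latin_plain) == "a" + DOT_BELOW + ACUTE
assert nfd(hebrew_plain) == LAMED + QAMATS + METEG

# With the CGJ in between, the typed order survives normalization.
assert nfd(latin_cgj) == latin_cgj
assert nfd(hebrew_cgj) == hebrew_cgj
```

The CGJ works here because its canonical combining class is 0, so the reordering step treats it as a boundary and never moves marks across it.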
In contrast, the "zero-width non-joiner" at U+200C in the
General Punctuation range prevents two adjacent characters from turning into a ligature.
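
The difference between these characters is also visible in their Unicode properties. The sketch below, again using Python's standard `unicodedata` module, lists the name, general category, and canonical combining class of the CGJ, the zero-width non-joiner, and the zero-width joiner; the `info` dictionary is an illustrative assumption, not part of any API.

```python
import unicodedata

# Collect (name, general category, combining class) for each character.
info = {}
for ch in ("\u034f", "\u200c", "\u200d"):
    info[f"U+{ord(ch):04X}"] = (
        unicodedata.name(ch),
        unicodedata.category(ch),
        unicodedata.combining(ch),
    )

for codepoint, (name, category, ccc) in info.items():
    print(codepoint, name, category, ccc)
# U+034F COMBINING GRAPHEME JOINER Mn 0
# U+200C ZERO WIDTH NON-JOINER Cf 0
# U+200D ZERO WIDTH JOINER Cf 0
```

The CGJ is classified as a nonspacing mark (`Mn`), while the zero-width joiner and non-joiner are format characters (`Cf`), which matches their different roles: the former affects how text is segmented and normalized, the latter how it is rendered.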
External links
Unicode FAQ - Characters and Combining Marks