The combining grapheme joiner (CGJ), U+034F, is a
Unicode
character that has no visible glyph and is "default ignorable" by applications. Its name is a
misnomer
and does not describe its function: the character does not join graphemes. Its purpose is to semantically ''separate'' characters that should ''not'' be considered digraphs as well as to block canonical reordering of combining marks during normalization. For example, in a
Hungarian language
context, adjoining letters ''c'' and ''s'' would normally be considered equivalent to the cs digraph. If they are separated by the CGJ, they will be considered two separate graphemes. However, in contrast to the zero-width joiner and similar characters, the CGJ does not affect whether the two letters are ''rendered'' separately, as a ligature, or cursively joined; that default behavior is determined by the font.

The CGJ is also needed for complex scripts. For example, in most cases the
Hebrew cantillation
accent metheg is supposed to appear to the left of the vowel point and by default most display systems will render it like this even if it is typed before the vowel. But in some words in
Biblical Hebrew
the metheg appears to the right of the vowel, and to tell the display engine to render it properly on the right, CGJ must be typed between the metheg and the vowel.

In the case of several consecutive combining diacritics, an intervening CGJ indicates that they should not be subject to canonical reordering.

In contrast, the "zero-width non-joiner" (at U+200C in the General Punctuation range) prevents two adjacent characters from turning into a ligature.
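
The effect on canonical reordering can be checked with a short script. The following is a minimal sketch using Python's standard `unicodedata` module. The choice of base letter (bet) and vowel point (patah) is an illustrative assumption, not taken from any particular Biblical word. It relies on the CGJ having canonical combining class 0, which means the normalizer cannot move combining marks across it.

```python
import unicodedata

CGJ = "\u034F"    # COMBINING GRAPHEME JOINER
BASE = "\u05D1"   # HEBREW LETTER BET (arbitrary base letter for illustration)
PATAH = "\u05B7"  # HEBREW POINT PATAH, a vowel point
METEG = "\u05BD"  # HEBREW POINT METEG (the metheg accent)

# The vowel point has a lower combining class than the metheg, so
# normalization sorts the vowel first when the two marks are adjacent.
assert unicodedata.combining(PATAH) < unicodedata.combining(METEG)

# Metheg typed before the vowel, without and with an intervening CGJ.
without_cgj = BASE + METEG + PATAH
with_cgj = BASE + METEG + CGJ + PATAH

reordered = unicodedata.normalize("NFD", without_cgj)
preserved = unicodedata.normalize("NFD", with_cgj)

def codepoints(text):
    """Return the string as a list of U+XXXX labels for easy inspection."""
    return [f"U+{ord(ch):04X}" for ch in text]

print(codepoints(reordered))  # vowel now precedes the metheg
print(codepoints(preserved))  # typed order is kept; CGJ stays in place
```

Without the CGJ, normalization swaps the two marks, so the typed order is lost. With the CGJ, the sequence survives normalization unchanged, which is what lets a display engine distinguish the right-hand metheg from the default placement.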


External links


Unicode FAQ - Characters and Combining Marks

