
The zero-width joiner (ZWJ; HTML entity: &zwj; or &#8205;) is a non-printing character used in the computerized typesetting of writing systems in which the shape or positioning of a grapheme depends on its relation to other graphemes (complex scripts), such as the Arabic script or any Indic script. Sometimes the Roman script is also counted as complex, e.g. when using a Fraktur typeface. When placed between two characters that would otherwise not be connected, a ZWJ causes them to be printed in their connected forms.
The exact behaviour of the ZWJ varies depending on whether the use of a conjunct consonant or ligature (where multiple characters are shown with a single glyph) is expected by default. For instance, it suppresses the use of conjuncts in Devanagari (whilst still allowing the use of the individual joining form of a dead consonant, as opposed to the halant form that the zero-width non-joiner would require), but induces the use of conjuncts in Sinhala (which does not use them by default). Similarly to Sinhala, when a ZWJ is placed between two emoji characters (or interspersed between several), it can result in a single glyph being shown, such as the family emoji, made up of two adult emoji and one or two child emoji.
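As a minimal sketch of the emoji case, the following Python snippet joins three emoji with ZWJ characters. The particular choice of MAN, WOMAN and GIRL is an illustrative assumption, and whether the result is drawn as one family glyph or as three separate emoji depends on the font and platform.

```python
ZWJ = "\u200d"  # ZERO WIDTH JOINER

# Illustrative choice of component emoji (not taken from the text above).
MAN = "\U0001F468"
WOMAN = "\U0001F469"
GIRL = "\U0001F467"

# Intersperse a ZWJ between the component emoji.
family = ZWJ.join([MAN, WOMAN, GIRL])

# The string still holds five code points, even if it is drawn as one glyph.
code_points = [f"U+{ord(ch):04X}" for ch in family]
print(family, len(family), code_points)
```

Removing the ZWJ characters with `family.replace(ZWJ, "")` gives back the three separate emoji.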
In some cases, such as the second Devanagari example below, the ZWJ can be used to display a joining form in isolation, when included after the character and combining halant code.
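The Devanagari sequences described above can be sketched as plain code point strings. The letters KA (U+0915) and SSA (U+0937) are illustrative choices rather than examples taken from the text, and the labels only state the rendering that is expected; the actual display depends on the font and shaping engine.

```python
import unicodedata

ZWJ = "\u200d"   # ZERO WIDTH JOINER
ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER

# Illustrative letters: KA, the virama (halant) sign, and SSA.
KA, VIRAMA, SSA = "\u0915", "\u094d", "\u0937"

sequences = {
    "default (conjunct expected)": KA + VIRAMA + SSA,
    "ZWJ (joining form of KA expected)": KA + VIRAMA + ZWJ + SSA,
    "ZWNJ (explicit halant expected)": KA + VIRAMA + ZWNJ + SSA,
    "joining form in isolation": KA + VIRAMA + ZWJ,
}

# Print each sequence with the Unicode names of its code points.
for label, text in sequences.items():
    names = [unicodedata.name(ch) for ch in text]
    print(f"{label}: {text!r} -> {names}")
```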
The character's code point is U+200D ZERO WIDTH JOINER. In the InScript keyboard layout for Indian languages, it is typed by a key combination; however, many layouts use the position of QWERTY's ']' key for this character.
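The character's basic properties can be checked with the Python standard library. This is only a sketch that uses `unicodedata` and `html.unescape`; the variable names are arbitrary.

```python
import html
import unicodedata

ZWJ = "\u200d"

# Unicode name and general category (Cf = format character).
name = unicodedata.name(ZWJ)
category = unicodedata.category(ZWJ)

# UTF-8 encoding of the code point.
utf8_bytes = ZWJ.encode("utf-8")

# Both HTML character references decode to the same character.
from_named = html.unescape("&zwj;")
from_numeric = html.unescape("&#8205;")

print(name, category, utf8_bytes, from_named == from_numeric == ZWJ)
```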
Examples
See also
* Word joiner
* Zero-width non-joiner
References
External links
Proposal on Clarification and Consolidation of the Function of ZERO WIDTH JOINER in Indic Scripts