ZWNJ

The zero-width non-joiner (ZWNJ; HTML entity: &#8204; or &zwnj;) is a non-printing character used in the computerization of writing systems that make use of ligatures. For example, in writing systems that feature initial, medial and final letter-forms, such as the Persian alphabet, a ZWNJ placed between two characters that would otherwise be joined into a ligature prevents the ligature and causes them to be printed in their final and initial forms, respectively. A space character has the same effect, but a ZWNJ is used when it is desirable to keep the characters closer together or to connect a word with its morpheme. The ZWNJ is encoded in Unicode as U+200C ZERO WIDTH NON-JOINER.
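The properties described above can be inspected programmatically. The following is a minimal sketch using only the Python standard library; the helper name describe is a hypothetical choice for this illustration, not part of any standard API.

```python
import html
import unicodedata

ZWNJ = "\u200c"  # U+200C ZERO WIDTH NON-JOINER


def describe(ch: str) -> dict:
    """Return basic Unicode properties of a single character (illustrative helper)."""
    return {
        "code_point": f"U+{ord(ch):04X}",
        "name": unicodedata.name(ch),
        "category": unicodedata.category(ch),  # "Cf" is the format-character category
        "utf8": ch.encode("utf-8").hex(),
    }


info = describe(ZWNJ)
print(info)

# Both HTML spellings decode to the same character.
print(html.unescape("&zwnj;") == ZWNJ, html.unescape("&#8204;") == ZWNJ)

# Unlike a space, the ZWNJ is not classified as whitespace.
print(ZWNJ.isspace(), " ".isspace())
```

Because the character is a format character rather than whitespace, text-processing code that splits on whitespace treats a word containing a ZWNJ as a single token.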


Use of ZWNJ for correct typography

In certain languages, the ZWNJ is necessary for unambiguously specifying the correct typographic form of a character sequence. The picture shows how the code looks when it is rendered correctly, and in every row the correct and incorrect pictures should be different. On a system which is not configured to display the Unicode correctly, the correct display and the incorrect one may look the same, or either of them may be significantly different from the corresponding picture.

In the Biblical Hebrew example, the placement of the meteg to the left of the segol is correct; the segol is accompanied by a shva sign, written as two vertical dots, to denote a short vowel. If a meteg were placed to the left of the shva, it would be erroneous. In Modern Hebrew, there is no reason to use the meteg for spoken language, so it is rarely used in Modern Hebrew typesetting.

In German typography, ligatures may not cross the constituent boundaries within compounds. Thus, in the first German example, the prefix is separated from the rest of the word to prohibit the ligature fl. Similarly, in English, some argue that ligatures should not cross morpheme boundaries. For example, fly and fish are morphemes in some words but not in others; by this reasoning, a word in which the letters fl or fi straddle a morpheme boundary should not have the ligature, while dayfly and catfish should have it.

Persian uses this character extensively for certain prefixes, suffixes and compound words. It is necessary for disambiguating compounds from non-compound words, which use a full space.

In the Jawi script of Malay, the ZWNJ is used whenever more than one consonant is written at the end of a phrase, as in the Malay word for 'science' (sains in Latin script, pronounced /ˈsa.ɪns/). It signifies that there is no vowel (specifically 'a' or 'ə') between the two consonant letters; without it, the word would be pronounced either /ˈsa.ɪnas/ or /ˈsa.ɪnəs/. A space would instead separate the phrase into different words, so that it would mean 'to sign the Arabic letter sin'.
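The difference between a ZWNJ and a full space at a morpheme boundary can be sketched in a few lines of Python. The helper names join_with_zwnj and strip_zwnj are hypothetical, and the German and Persian words are illustrative choices made for this sketch rather than examples taken from the text above.

```python
ZWNJ = "\u200c"  # U+200C ZERO WIDTH NON-JOINER


def join_with_zwnj(first: str, second: str) -> str:
    """Join two morphemes so that no ligature or cursive join crosses the boundary."""
    return first + ZWNJ + second


def strip_zwnj(text: str) -> str:
    """Remove every ZWNJ, e.g. before a plain-text comparison."""
    return text.replace(ZWNJ, "")


# German compound: the prefix "Auf" is kept apart from "lage",
# so a font should not form an fl ligature across the boundary.
german = join_with_zwnj("Auf", "lage")

# Persian: the prefix and the stem stay one word but are not cursively joined.
persian = join_with_zwnj("می", "خواهم")
persian_spaced = "می" + " " + "خواهم"

print(len(german), len(strip_zwnj(german)))
# Whitespace splitting sees one word with a ZWNJ, two words with a space.
print(len(persian.split()), len(persian_spaced.split()))
```

The ZWNJ adds one code point to the string without adding a word boundary, which is why it can distinguish a compound from two separate words.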


Use of ZWNJ to display alternative forms

In Indic scripts, insertion of a ZWNJ after a consonant with a halant, or before a dependent vowel, prevents the characters from being joined properly.

In Devanagari, the characters क् and ष typically combine to form क्ष, but when a ZWNJ is inserted between them, क्‌ष is seen instead.

In Kannada, the characters ನ್ and ನ combine to form ನ್ನ, but when a ZWNJ is inserted between them, ನ್‌ನ is displayed. That style is typically used to write foreign words in Kannada script: "Facebook" is written as ಫೇಸ್‌ಬುಕ್, though it can be written as ಫೇಸ್ಬುಕ್. ರಾಜ್‌ಕುಮಾರ್ and ರಾಮ್‌ಗೊಪಾಲ್ are examples of other proper nouns that need a ZWNJ. To insert a ZWNJ in Kannada, use Shift-V on Linux (iBus, InScript). On Windows (InScript), a ZWNJ can be produced with Ctrl+Shift+2 or Alt+0157. With LipikaIME on Mac, the caret key produces a ZWNJ.

In Bengali, when the letter য occurs at the end of a consonant cluster, i.e., when য is preceded by a ◌্ (hôsôntô), it appears in a special shape known as the য-ফলা (ja-phala), as in ক্য (ক ্ য). As a result, উদ্‌যাপন (the correct Bengali spelling of 'celebration') becomes উদ্যাপন, which is incorrect, if it is typed without a ZWNJ. To obtain the proper rendering and the correct spelling, a ZWNJ must be placed after the hôsôntô (code: উদ্‌যাপন). Also see the Unicode Standard, chapter 12, Bengali (Bangla), pages 475 to 479.

The hôsôntô is used for making conjuncts and falas (such as ra-fala and ba-fala). Where the hôsôntô needs to be displayed explicitly, a ZWNJ must be inserted after it.

Also in Bengali, when the letter র occurs at the beginning of a consonant cluster, i.e., when র is followed by a hôsôntô, it appears in a special shape known as the রেফ (reph). Thus, the sequence র ্ য is rendered by default as র্য. When the য-ফলা shape needs to be retained rather than the রেফ shape, the ZWJ (zero-width joiner, not the ZWNJ) is inserted right after র, i.e., র‍্য. This form is commonly used for loanwords from English such as র‍্যাম (RAM) and র‍্যান্ডম (random).
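Because the joiners are invisible, the sequences discussed in this section are easiest to compare code point by code point. The sketch below builds them from explicit escapes using the Python standard library; code_points and show_virama_explicitly are hypothetical helper names, and the second helper is only one simple way to apply the "ZWNJ after the halant" rule.

```python
import unicodedata

ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER
ZWJ = "\u200d"   # ZERO WIDTH JOINER


def code_points(text: str) -> list:
    """List each character as 'U+XXXX NAME' so invisible characters become visible."""
    return [f"U+{ord(c):04X} {unicodedata.name(c)}" for c in text]


def show_virama_explicitly(text: str, virama: str) -> str:
    """Insert a ZWNJ after every non-final virama that is not already followed by one."""
    out = []
    for i, ch in enumerate(text):
        out.append(ch)
        nxt = text[i + 1] if i + 1 < len(text) else ""
        if ch == virama and nxt and nxt != ZWNJ:
            out.append(ZWNJ)
    return "".join(out)


# Devanagari: KA + VIRAMA + SSA forms a conjunct; a ZWNJ after the virama prevents it.
KA, VIRAMA_DEVA, SSA = "\u0915", "\u094d", "\u0937"
conjunct = KA + VIRAMA_DEVA + SSA
explicit = KA + VIRAMA_DEVA + ZWNJ + SSA

# Bengali: RA + HASANTA + YA gives the reph form by default;
# a ZWJ right after RA requests the ya-phala form instead.
RA, HASANTA, YA = "\u09b0", "\u09cd", "\u09af"
reph_form = RA + HASANTA + YA
ya_phala_form = RA + ZWJ + HASANTA + YA

for label, seq in [("conjunct", conjunct), ("explicit", explicit),
                   ("reph", reph_form), ("ya-phala", ya_phala_form)]:
    print(label, code_points(seq))

print(show_virama_explicitly(conjunct, VIRAMA_DEVA) == explicit)
```

How each sequence is actually drawn depends on the font and the text-shaping engine; the code only shows which code points are stored.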


Symbol

The symbol to be used on keyboards that enable direct input of the ZWNJ is standardized in Amendment 1 (2012) of ISO/IEC 9995-7:2009, "Information technology – Keyboard layouts for text and office systems – Symbols used to represent functions", as symbol number 81, and in IEC 60417, "Graphical Symbols for use on Equipment", as symbol no. IEC 60417-6177-2.


See also

* Zero-width joiner
* Zero-width space
* Word divider


External links


Using the ZWNJ in Persian

