Zero-width space

The zero-width space, abbreviated ZWSP, is a non-printing character used in computerized typesetting to indicate word boundaries to text-processing systems in scripts that do not use explicit spacing, or after characters (such as the slash) that are not followed by a visible space but after which there may nevertheless be a line break. It is also used with languages without visible space between words, for example, Japanese. Normally, it is not a visible separation, but it may expand in passages that are fully justified.
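
As a minimal sketch of the word-boundary use described above, the Python snippet below joins words that have already been segmented with zero-width spaces and then recovers them again. The helper names and the sample segmentation are hypothetical and chosen only for illustration; real text-processing systems obtain word boundaries in their own ways.

```python
ZWSP = "\u200b"  # U+200B ZERO WIDTH SPACE


def mark_word_boundaries(words):
    """Join pre-segmented words, marking each boundary with a ZWSP."""
    return ZWSP.join(words)


def split_on_boundaries(text):
    """Recover the words from text whose boundaries are marked with ZWSP."""
    return text.split(ZWSP)


# Hypothetical segmentation of a short Japanese phrase.
words = ["日本語", "の", "文章"]
marked = mark_word_boundaries(words)

# The marked string looks the same as the unmarked one when displayed,
# but it carries two extra, invisible code points.
print(len("".join(words)), len(marked))
```

Because the boundaries are stored in the text itself, a later processing step can find them with a plain split and no dictionary lookup.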


Usage

In HTML pages, the zero-width space can be used to mark a potential line break without hyphenation, as can the HTML element <wbr>; for hyphenated line breaks, a soft hyphen is used. The zero-width space was not supported in some older web browsers. In a browser that supports it, a run of words separated only by zero-width spaces is re-broken at word boundaries when the window is resized, whereas the same run with no such spaces is not broken at all.
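
One way to apply this to text containing slashes is sketched below in Python: a zero-width space is inserted after each slash that is directly followed by another visible character, so that a renderer may break the line there. The function name is hypothetical, and the rule for where to insert is an assumption made for illustration, not a documented convention.

```python
import re

ZWSP = "\u200b"  # U+200B ZERO WIDTH SPACE


def allow_breaks_after_slashes(text):
    """Insert a ZWSP after every slash that is followed by a character
    other than whitespace or an existing ZWSP."""
    return re.sub(r"/(?=[^\s\u200b])", "/" + ZWSP, text)


path = "usr/local/share/doc"
breakable = allow_breaks_after_slashes(path)

# The two strings display identically, but only the second one
# offers line-break opportunities after the slashes.
print(len(path), len(breakable))
```

The lookahead leaves a trailing slash and an already-marked slash untouched, so running the function a second time does not change the result.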


Prohibited in URLs

ICANN rules prohibit domain names from including non-displayed characters such as the zero-width space, and most browsers prohibit their use within domain names because they can be used to create a homograph attack, where a malicious URL is visually indistinguishable from a legitimate one.
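
The risk can be illustrated with a short Python sketch that reports invisible format characters in a host name. It is not the check that ICANN or any browser actually performs: treating Unicode general category "Cf" as a stand-in for "non-displayed" is an assumption, and the function name and sample host names are hypothetical.

```python
import unicodedata


def invisible_characters(hostname):
    """Return (index, Unicode name) for each format character
    (general category Cf) found in the host name."""
    return [
        (i, unicodedata.name(ch, "UNNAMED"))
        for i, ch in enumerate(hostname)
        if unicodedata.category(ch) == "Cf"
    ]


legitimate = "example.com"
spoofed = "exam\u200bple.com"  # same letters plus a hidden ZWSP

# The strings look alike on screen but compare as different.
print(legitimate == spoofed)          # False
print(invisible_characters(spoofed))  # [(4, 'ZERO WIDTH SPACE')]
```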


Encoding

The zero-width space character is encoded in Unicode as U+200B ZERO WIDTH SPACE, and input in HTML as &#8203;, &#x200B; or &ZeroWidthSpace;. Contrary to what their names suggest, the character entities &NegativeThickSpace;, &NegativeMediumSpace;, &NegativeThinSpace;, and &NegativeVeryThinSpace; also refer to the zero-width space. The TeX representation is \hskip 0pt; the LaTeX representation is \hspace{0pt}; and the groff representation is \:. Its semantics and HTML implementation are similar to the soft hyphen, except that soft hyphens display a hyphen character at the point where the line is broken.
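
The following Python sketch, using only the standard library, prints the character's code point, name and UTF-8 bytes, and checks that several HTML character references decode to it. The variable names are illustrative. The numeric references are simply the decimal and hexadecimal forms of the code point.

```python
import html
import unicodedata

ZWSP = "\u200b"

# Code point and official character name.
print(f"U+{ord(ZWSP):04X}", unicodedata.name(ZWSP))  # U+200B ZERO WIDTH SPACE

# UTF-8 encoding of the character, as hexadecimal bytes.
print(ZWSP.encode("utf-8").hex())  # e2808b

# HTML character references that are expected to decode to U+200B.
references = [
    "&#8203;",
    "&#x200B;",
    "&ZeroWidthSpace;",
    "&NegativeThickSpace;",
    "&NegativeMediumSpace;",
    "&NegativeThinSpace;",
    "&NegativeVeryThinSpace;",
]
decoded = {ref: html.unescape(ref) for ref in references}
print(all(value == ZWSP for value in decoded.values()))
```

Note that Python's `str.isspace()` returns `False` for this character, so code that strips or splits on whitespace will leave it in place.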


See also

* Hair space
* Whitespace character, including a table comparing various space-like characters
* Word divider
* Word wrapping
* Word joiner (U+2060), as well as zero-width no-break space (U+FEFF)
* Zero-width joiner (U+200D)
* Zero-width non-joiner (U+200C)

