Text wrapping, also known as line wrapping, word wrapping or line breaking, is breaking a section of text into lines so that it will fit into the available width of a page, window or other display area. In text display, line wrap is continuing on a new line when a line is full, so that each line fits into the viewable area without overflowing, allowing text to be read from top to bottom without any horizontal scrolling. Word wrap is an additional feature of most text editors, word processors, and web browsers: breaking lines between words rather than within words, where possible. Word wrap makes it unnecessary to hard-code newline delimiters within paragraphs, and allows the display of text to adapt flexibly and dynamically to displays of varying sizes.
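As an illustration, Python's standard-library `textwrap` module wraps text at word boundaries for a requested width. The sample sentence and the two widths below are arbitrary; the point is that the same text, containing no stored newlines, is laid out differently for each width.

```python
import textwrap

sentence = "The quick brown fox jumps over the lazy dog"

# The same newline-free text, laid out for two different display widths.
narrow = textwrap.wrap(sentence, width=10)
wide = textwrap.wrap(sentence, width=20)

print(narrow)  # ['The quick', 'brown fox', 'jumps over', 'the lazy', 'dog']
print(wide)    # ['The quick brown fox', 'jumps over the lazy', 'dog']
```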
Examples
Soft and hard returns
A soft return or soft wrap is the break resulting from line wrap or word wrap (whether automatic or manual), whereas a hard return or hard wrap is an intentional break, creating a new paragraph. With a hard return, paragraph-break formatting can (and should) be applied (either indenting or vertical whitespace). Soft wrapping allows line lengths to adjust automatically with adjustments to the width of the user's window or margin settings, and is a standard feature of all modern text editors, word processors, and email clients. Manual soft breaks are unnecessary when word wrap is done automatically, so hitting the "Enter" key usually produces a hard return.
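A minimal sketch of the distinction, assuming a plain-text convention in which every stored newline character is a hard return and soft wraps are never stored at all: only the hard returns live in the text, and the soft wraps are recomputed from the current width each time the text is displayed. The function name `display_lines` is hypothetical.

```python
import textwrap

def display_lines(text: str, width: int) -> list[str]:
    """Lay out stored text for a given display width.

    Assumption: each newline character in the stored text is a hard return
    (paragraph break). Soft returns are not stored; they are produced here.
    """
    lines: list[str] = []
    for paragraph in text.split("\n"):
        # textwrap.wrap returns [] for an empty paragraph; keep it as a blank line.
        lines.extend(textwrap.wrap(paragraph, width) or [""])
    return lines

stored = "Soft wraps move.\nHard returns stay."
print(display_lines(stored, 10))  # ['Soft wraps', 'move.', 'Hard', 'returns', 'stay.']
print(display_lines(stored, 40))  # ['Soft wraps move.', 'Hard returns stay.']
```

Narrowing the window changes where the soft wraps fall, but the paragraph boundary between "move." and "Hard" is preserved at every width.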
Alternatively, "soft return" can mean an intentional, stored line break that is not a paragraph break. For example, it is common to print postal addresses in a multiple-line format, but the several lines are understood to be a single paragraph. Line breaks are needed to divide the words of the address into lines of the appropriate length.
In the contemporary graphical word processors Microsoft Word and LibreOffice Writer, users are expected to type a carriage return (Enter) between paragraphs. Formatting settings, such as first-line indentation or spacing between paragraphs, take effect where the carriage return marks the break. A non-paragraph line break, which is a soft return, is inserted using Shift+Enter or via the menus, and is provided for cases when the text should start on a new line but none of the other side effects of starting a new paragraph are desired.
In text-oriented markup languages, a soft return is typically offered as a markup tag. For example, in HTML there is a <br> tag that has the same purpose as the soft return in word processors described above.
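For instance, a multi-line postal address like the one mentioned earlier can be emitted as a single HTML paragraph whose lines are separated by `<br>` tags. The helper `address_to_html` is hypothetical; it relies only on the standard library's `html.escape`.

```python
from html import escape

def address_to_html(lines: list[str]) -> str:
    """Render address lines as one paragraph, using <br> as the soft return."""
    # Escape each line so characters such as & or < cannot break the markup.
    return "<p>" + "<br>".join(escape(line) for line in lines) + "</p>"

print(address_to_html(["Jane Doe", "1 Main St", "Springfield"]))
# <p>Jane Doe<br>1 Main St<br>Springfield</p>
```

Using a single `<p>` keeps the address one paragraph for styling purposes, while `<br>` forces each part onto its own line.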
Unicode
The Unicode Line Breaking Algorithm determines a set of positions, known as ''break opportunities'', that are appropriate places in which to begin a new line. The actual line break positions are picked from among the break opportunities by the higher-level software that calls the algorithm, not by the algorithm itself, because only the higher-level software knows the width of the area in which the text is displayed and the widths of the glyphs that make up the text.
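This division of responsibilities can be sketched as two functions. `break_opportunities` below uses a deliberately crude stand-in rule (a new line may begin right after a space) rather than the actual Unicode rules, and `lay_out` plays the role of the higher-level software: only it knows the width limit and how to measure text (here `len`, one unit per character, by assumption). Both names are hypothetical.

```python
from typing import Callable

def break_opportunities(text: str) -> list[int]:
    """Indices where a new line may begin. Toy rule, NOT the Unicode algorithm."""
    return [i + 1 for i, ch in enumerate(text) if ch == " "]

def lay_out(text: str, opportunities: list[int], max_width: int,
            measure: Callable[[str], int] = len) -> list[str]:
    """Pick actual breaks from the given opportunities, greedily, by width."""
    bounds = [0] + opportunities + [len(text)]
    segments = [text[a:b] for a, b in zip(bounds, bounds[1:]) if b > a]
    lines: list[str] = []
    current = ""
    for seg in segments:
        # Trailing spaces may hang past the margin, so they are not measured.
        if current and measure((current + seg).rstrip(" ")) > max_width:
            lines.append(current.rstrip(" "))
            current = ""
        current += seg
    if current:
        lines.append(current.rstrip(" "))
    return lines

text = "Break opportunities are candidates only"
ops = break_opportunities(text)
print(lay_out(text, ops, 16))  # ['Break', 'opportunities', 'are candidates', 'only']
print(lay_out(text, ops, 20))  # ['Break opportunities', 'are candidates only']
```

The opportunities are computed once; only `lay_out` has to be re-run when the width (or the font, via `measure`) changes.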
The Unicode character set provides a line separator character as well as a paragraph separator to represent the semantics of the soft return and hard return.
* U+2028 LINE SEPARATOR may be used to represent the soft-return semantic unambiguously.
* U+2029 PARAGRAPH SEPARATOR may be used to represent the hard-return semantic unambiguously.
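A short Python sketch of how the two separators keep the semantics apart: splitting on U+2029 yields paragraphs, and splitting each paragraph on U+2028 yields its forced lines. The function name `paragraphs_and_lines` is hypothetical. Python's built-in `str.splitlines()` treats both characters, like ordinary newlines, as generic line boundaries, so it discards exactly the distinction they encode.

```python
import unicodedata

LINE_SEPARATOR = "\u2028"       # soft return
PARAGRAPH_SEPARATOR = "\u2029"  # hard return

def paragraphs_and_lines(text: str) -> list[list[str]]:
    """Split text into paragraphs, each a list of its explicitly separated lines."""
    return [para.split(LINE_SEPARATOR) for para in text.split(PARAGRAPH_SEPARATOR)]

doc = "Jane Doe" + LINE_SEPARATOR + "1 Main St" + PARAGRAPH_SEPARATOR + "Next paragraph."
print(paragraphs_and_lines(doc))  # [['Jane Doe', '1 Main St'], ['Next paragraph.']]
print(unicodedata.name(LINE_SEPARATOR), unicodedata.name(PARAGRAPH_SEPARATOR))
# LINE SEPARATOR PARAGRAPH SEPARATOR
print(doc.splitlines())  # ['Jane Doe', '1 Main St', 'Next paragraph.']
```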
Word boundaries, hyphenation, and hard spaces
The soft returns are usually placed after the ends of complete words, or after the punctuation that follows complete words. However, word wrap may also occur following a hyphen inside of a word. This is sometimes not desired, and can be blocked by using a non-breaking hyphen, or hard hyphen, instead of a regular hyphen.
A word without hyphens can be made wrappable by having soft hyphens in it. When the word isn't wrapped (i.e., isn't broken across lines), the soft hyphen isn't visible. But if the word is wrapped across lines, this is done at the soft hyphen, at which point it is shown as a visible hyphen on the top line where the word is broken. (In the rare case of a word that is meant to be wrappable by breaking it across lines but ''without'' making a hyphen ever appear, a zero-width space is put at the permitted breaking point(s) in the word.)
Sometimes word wrap is undesirable between adjacent words. In such cases, word wrap can usually be blocked by using a ''hard space'' or non-breaking space between the words, instead of regular spaces.
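The effect of these characters can be sketched with a small greedy wrapper. The rules are a simplified assumption, not the full Unicode algorithm: a break may follow an ordinary space, a hyphen-minus, a soft hyphen (U+00AD) or a zero-width space (U+200B), while a no-break space (U+00A0) and a non-breaking hyphen (U+2011) never allow one. Widths are counted as one unit per visible character, and the function names are hypothetical.

```python
SOFT_HYPHEN = "\u00ad"
ZERO_WIDTH_SPACE = "\u200b"
NO_BREAK_SPACE = "\u00a0"       # deliberately absent from BREAK_AFTER
NON_BREAKING_HYPHEN = "\u2011"  # deliberately absent from BREAK_AFTER

BREAK_AFTER = {" ", "-", SOFT_HYPHEN, ZERO_WIDTH_SPACE}

def render(raw: str) -> str:
    """Return the visible form of one line of raw text."""
    raw = raw.rstrip(" ")
    # A soft hyphen is shown only when the line actually ends at it.
    hyphen = "-" if raw.endswith(SOFT_HYPHEN) else ""
    return raw.replace(SOFT_HYPHEN, "").replace(ZERO_WIDTH_SPACE, "") + hyphen

def wrap_special(text: str, width: int) -> list[str]:
    """Greedy wrap that honours the special characters above."""
    ends = [i + 1 for i, ch in enumerate(text) if ch in BREAK_AFTER] + [len(text)]
    lines: list[str] = []
    start = prev = 0
    for end in ends:
        if end <= prev:
            continue  # duplicate opportunity at the very end of the text
        # Measure the line as it would be displayed if it ended at `end`.
        if prev > start and len(render(text[start:end])) > width:
            lines.append(render(text[start:prev]))
            start = prev
        prev = end
    if start < len(text):
        lines.append(render(text[start:]))
    return lines

sample = "see ultra" + SOFT_HYPHEN + "modern New" + NO_BREAK_SPACE + "York"
print(wrap_special(sample, 10))  # ['see ultra-', 'modern', 'New\xa0York']
print(wrap_special(sample, 20))  # ['see ultramodern', 'New\xa0York']
```

At width 20 the soft hyphen stays invisible, at width 10 it becomes a visible hyphen, and at both widths the no-break space keeps "New York" on one line.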
Word wrapping in text containing Chinese, Japanese, and Korean
In Chinese, Japanese, and Korean, word wrapping can usually occur before and after any Han character, but certain punctuation characters are not allowed to begin a new line. Japanese kana are treated the same way as Han characters (kanji) by extension, meaning words can be, and tend to be, broken without any explicit indication that a word continues on the next line.
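A character-by-character sketch of this behavior, assuming every character occupies one cell of a fixed-width grid. `NO_LINE_START` is an illustrative subset of the punctuation that must not begin a line, not a complete list, and `cjk_wrap` is a hypothetical name. When such a mark would otherwise start a new line, this sketch lets it hang past the margin on the previous line, which is only one of several strategies used in practice.

```python
# Illustrative subset only: closing punctuation that may not begin a line.
NO_LINE_START = set("、。，．！？）」』】")

def cjk_wrap(text: str, width: int) -> list[str]:
    """Break between any two characters, except before NO_LINE_START marks."""
    lines: list[str] = []
    line = ""
    for ch in text:
        if len(line) >= width and ch not in NO_LINE_START:
            lines.append(line)
            line = ""
        line += ch  # a forbidden mark stays on the full line, overhanging it
    if line:
        lines.append(line)
    return lines

print(cjk_wrap("今日は晴れ。明日は雨。", 5))  # ['今日は晴れ。', '明日は雨。']
print(cjk_wrap("今日は晴れ。明日は雨。", 3))  # ['今日は', '晴れ。', '明日は', '雨。']
```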
Under certain circumstances, however, word wrapping is not desired. For instance,
* word wrapping might not be desired within personal names, and
* word wrapping might not be desired within any compound words (when the text is flush left but only in some styles).
Most existing word processors and typesetting software cannot handle either of the above scenarios.
CJK punctuation may or may not follow rules similar to the above-mentioned special circumstances. Whether it does is governed by the line breaking rules in CJK.
Algorithm
Word wrapping is an optimization problem. Depending on what needs to be optimized for, different algorithms are used.
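To make the choice of objective concrete, the sketch below scores two layouts of the same four words at width 6 under two criteria: the number of lines, and the sum of the squares of the unused space at the end of each line. The helper name `scores` and the one-unit-per-character width are assumptions for illustration.

```python
def scores(lines: list[str], width: int) -> tuple[int, int]:
    """Return (number of lines, sum of squared unused space at line ends)."""
    return len(lines), sum((width - len(line)) ** 2 for line in lines)

first_fit = ["AAA BB", "CC", "DDDDD"]
balanced = ["AAA", "BB CC", "DDDDD"]
print(scores(first_fit, 6))  # (3, 17)
print(scores(balanced, 6))   # (3, 11)
```

Both layouts use three lines, so an objective that only counts lines cannot tell them apart, while the squared-space objective prefers the second.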
Minimum number of lines
A simple way to do word wrapping is to use a greedy algorithm that puts as many words on a line as possible, then moves on to the next line to do the same until there are no more words left to place. This method is used by many modern word processors, such as LibreOffice Writer and Microsoft Word. This algorithm always uses the minimum possible number of lines but may lead to lines of widely varying lengths. The following pseudocode implements this algorithm:
SpaceLeft := LineWidth + SpaceWidth    // the first word on the first line needs no preceding space
for each Word in Text
    if (Width(Word) + SpaceWidth) > SpaceLeft
        insert line break before Word in Text
        SpaceLeft := LineWidth - Width(Word)
    else
        SpaceLeft := SpaceLeft - (Width(Word) + SpaceWidth)
Where LineWidth is the width of a line, SpaceLeft is the remaining width of space on the line to fill, SpaceWidth is the width of a single space character, Text is the input text to iterate over, and Word is a word in this text.
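A runnable Python version of the greedy algorithm above, assuming every character (including the space) is one unit wide unless a different `width` function is passed; `greedy_wrap` is a hypothetical name. As in the pseudocode, the first word on a line is not charged for a preceding space, and a word wider than the line is placed on a line of its own and allowed to overflow.

```python
from typing import Callable

def greedy_wrap(text: str, line_width: int, space_width: int = 1,
                width: Callable[[str], int] = len) -> list[str]:
    """First-fit line breaking: fill each line with as many words as fit."""
    lines: list[str] = []
    current: list[str] = []
    space_left = line_width
    for word in text.split():
        # The first word on a line needs no preceding space.
        needed = width(word) + (space_width if current else 0)
        if current and needed > space_left:
            lines.append(" ".join(current))  # insert line break before word
            current = []
            space_left = line_width
            needed = width(word)
        current.append(word)
        space_left -= needed
    if current:
        lines.append(" ".join(current))
    return lines

print(greedy_wrap("AAA BB CC DDDDD", 6))  # ['AAA BB', 'CC', 'DDDDD']
```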
Minimum raggedness
A different algorithm, used in TeX, minimizes the sum of the squares of the lengths of the spaces at the end of lines to produce a more aesthetically pleasing result than the greedy algorithm, which does not always minimize squared space.
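The squared-space objective can be minimized exactly with dynamic programming. The sketch below is a textbook formulation under simplifying assumptions (one unit per character, every word fits on a line, and the last line is penalized like any other; some variants exempt it); it is not TeX's actual line-breaking algorithm, which works with stretchable and shrinkable spaces, penalties and other refinements. The function name `min_raggedness` is hypothetical.

```python
import math

def min_raggedness(words: list[str], width: int) -> list[str]:
    """Choose breaks minimizing the sum of squared unused space on every line."""
    if any(len(w) > width for w in words):
        raise ValueError("every word must fit within the line width")
    n = len(words)
    best = [0.0] * (n + 1)      # best[i]: minimum cost of laying out words[i:]
    next_start = [n] * (n + 1)  # index of the first word on the following line
    for i in range(n - 1, -1, -1):
        best[i] = math.inf
        line_len = -1  # cancels the space counted before the first word
        for j in range(i, n):
            line_len += len(words[j]) + 1
            if line_len > width:
                break  # words[i..j] no longer fit; adding more cannot help
            cost = (width - line_len) ** 2 + best[j + 1]
            if cost < best[i]:
                best[i], next_start[i] = cost, j + 1
    lines, i = [], 0
    while i < n:
        lines.append(" ".join(words[i:next_start[i]]))
        i = next_start[i]
    return lines

print(min_raggedness("AAA BB CC DDDDD".split(), 6))  # ['AAA', 'BB CC', 'DDDDD']
```

On this input a first-fit greedy wrap produces ['AAA BB', 'CC', 'DDDDD'] with a squared-space cost of 17, while this version finds a layout with cost 11 and the same number of lines.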
History
A primitive line-breaking feature was used in 1955 in a "page printer control unit" developed by Western Union. This system used relays rather than programmable digital computers, and therefore needed a simple algorithm that could be implemented without data buffers. In the Western Union system, each line was broken at the first space character to appear after the 58th character, or at the 70th character if no space character was found.
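One way to read the Western Union rule, sketched in Python: it needs only a running column count and never a buffer of upcoming text. The interpretation (a space becomes a line break once at least 58 characters are on the line, and a break is forced after the 70th character) and the function name `western_union_stream` are assumptions.

```python
from typing import Iterable, Iterator

def western_union_stream(chars: Iterable[str], soft: int = 58,
                         hard: int = 70) -> Iterator[str]:
    """Emit characters one at a time, inserting line breaks on the fly."""
    column = 0  # the only state kept: characters printed on the current line
    for ch in chars:
        if ch == " " and column >= soft:
            yield "\n"  # this space is replaced by the line break
            column = 0
            continue
        yield ch
        column += 1
        if column == hard:
            yield "\n"  # no space arrived in time: force a break
            column = 0

# Small thresholds make the behavior visible on a short string.
print(repr("".join(western_union_stream("aaaa bb cccccccc d", soft=3, hard=6))))
# 'aaaa\nbb ccc\nccccc\nd'
```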
The greedy algorithm for line-breaking predates the dynamic programming method outlined by Donald Knuth in an unpublished 1977 memo describing his TeX typesetting system and later published in more detail by Knuth & Plass (1981).
External links
Unicode Line Breaking Algorithm