In punctuation, a word divider is a form of glyph which separates written words. In languages which use the Latin, Cyrillic, and Arabic alphabets, as well as other scripts of Europe and West Asia, the word divider is a blank space, or ''whitespace''. This convention is spreading, along with other aspects of European punctuation, to Asia and Africa, where words are usually written without word separation.
In character encoding, word segmentation depends on which characters are defined as word dividers.
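A minimal sketch of that dependence: the same text yields different words depending on which characters are treated as dividers. The function name `split_words` and both divider sets are illustrative assumptions, not definitions taken from any encoding standard.

```python
def split_words(text, dividers):
    """Split text wherever a character from `dividers` occurs.

    `dividers` is a hypothetical, caller-supplied set of characters;
    runs of consecutive dividers never produce empty words.
    """
    words, current = [], []
    for ch in text:
        if ch in dividers:
            if current:
                words.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        words.append("".join(current))
    return words


sample = "ARMA·VIRVMQVE·CANO TROIAE"

# Only the blank space counts as a divider.
space_only = split_words(sample, {" "})

# The interpunct (U+00B7) is also defined as a divider.
space_and_interpunct = split_words(sample, {" ", "\u00b7"})

print(space_only)
print(space_and_interpunct)
```

With the space alone as divider the interpunct stays inside the first word; adding the interpunct to the set splits the sample into four words.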
History
In Ancient Egyptian, determinatives may have been used as much to demarcate word boundaries as to disambiguate the semantics of words. Rarely in Assyrian cuneiform, but commonly in the later cuneiform Ugaritic alphabet, a vertical stroke 𒑰 was used to separate words. In Old Persian cuneiform, a diagonally sloping wedge 𐏐 was used.
As the alphabet spread throughout the ancient world, words were often run together without division, and this practice remains or remained until recently in much of South and Southeast Asia. However, inscriptions not infrequently used a vertical line to divide words, and manuscripts a single (·), double (:), or triple (⁝) interpunct (dot). This practice was found in Phoenician, Aramaic, Hebrew, Greek, and Latin, and continues today with Ethiopic, though there whitespace is gaining ground.
Scriptio continua
The early alphabetic writing systems, such as the Phoenician alphabet, had only signs for consonants (although some signs for consonants could also stand for a vowel, so-called ''matres lectionis''). Without some form of visible word dividers, parsing a text into its separate words would have been a puzzle. With the introduction of letters representing vowels in the Greek alphabet, the need for inter-word separation lessened. The earliest Greek inscriptions used interpuncts, as was common in the writing systems which preceded the Greek alphabet, but soon the practice of ''scriptio continua'', continuous writing in which all words ran together without separation, became common.
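The parsing puzzle posed by text without dividers can be sketched as a search over every way of cutting an unbroken string into known words. The example below uses a modern English string and a tiny hypothetical lexicon purely for illustration; it does not model any ancient script, and the name `segmentations` is an assumption.

```python
def segmentations(text, lexicon):
    """Return every way to split `text` into words drawn from `lexicon`.

    Exhaustive recursive search, intended only for short strings.
    """
    if not text:
        return [[]]  # one way to segment the empty string: no words
    results = []
    for end in range(1, len(text) + 1):
        prefix = text[:end]
        if prefix in lexicon:
            for rest in segmentations(text[end:], lexicon):
                results.append([prefix] + rest)
    return results


# Hypothetical lexicon, chosen so that the string is ambiguous.
lexicon = {"god", "is", "now", "here", "nowhere"}
readings = segmentations("godisnowhere", lexicon)

for reading in readings:
    print(" ".join(reading))
```

The unbroken string has two valid readings under this lexicon, which is the kind of ambiguity a visible divider removes.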
Types
None
Alphabetic writing without inter-word separation, known as ''scriptio continua'', was used in Ancient Egyptian. It appeared in Post-classical Latin after several centuries of the use of the interpunct.
Traditionally, ''scriptio continua'' was used for the Indic alphabets of South and Southeast Asia and hangul of Korea, but spacing is now used with hangul and increasingly with the Indic alphabets.
Today Chinese and Japanese are the most widely used scripts consistently written without punctuation to separate words, though other scripts such as Thai and Lao also follow this writing convention. In Classical Chinese, a word and a character were almost the same thing, so that word dividers would have been superfluous. Although Modern Mandarin has numerous polysyllabic words, and each syllable is written with a distinct character, the conceptual link between character and word or at least morpheme remains strong, and no need is felt for word separation apart from what characters already provide. This link is also found in the Vietnamese language; however, in the Vietnamese alphabet, virtually all syllables are separated by spaces, whether or not they form word boundaries.
Space
Space is the most common word divider, especially in Latin script.
Vertical lines
Ancient inscribed and cuneiform scripts such as Anatolian hieroglyphs frequently used short vertical lines to separate words, as did Linear B. In manuscripts, vertical lines were more commonly used for larger breaks, equivalent to the Latin comma and period. This continues with many Indic scripts today (the danda).
Interpunct, multiple dots, and hypodiastole
As noted above, the single and double interpunct were used in manuscripts (on paper) throughout the ancient world. For example, Ethiopic inscriptions used a vertical line, whereas manuscripts used double dots (፡) resembling a colon. The latter practice continues today, though the space is making inroads. Classical Latin used the interpunct in both paper manuscripts and stone inscriptions (Wingo 1972:16). Ancient Greek orthography used between two and five dots as word separators, as well as the hypodiastole.
Different letter forms
In the modern Hebrew and Arabic alphabets, some letters have distinct forms at the ends and/or beginnings of words. This demarcation is used in addition to spacing.
Vertical arrangement

The Nastaʿlīq form of Islamic calligraphy uses vertical arrangement to separate words. The beginning of each word is written higher than the end of the preceding word, so that a line of text takes on a sawtooth appearance. Nastaʿlīq spread from Persia and today is used for Persian, Uyghur, Pashto, and Urdu.
Pause
In finger spelling and in Morse code, words are separated by a pause.
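For Morse code, a pause acting as a word divider can be sketched as a threshold on gap length. The timing used here (a dot of 1 unit, a dash and the gap between letters of 3 units, the gap between words of 7 units) is the commonly cited convention and is an assumption not stated in the text above; the `MORSE` table is deliberately partial, and `decode` is a hypothetical helper.

```python
# Partial table, enough for the example below (assumption: not the full code).
MORSE = {".": "E", "-": "T", "..": "I", "...": "S", "---": "O", "....": "H"}


def decode(events):
    """Decode a list of (is_on, units) pairs into text.

    Assumed timing: an "on" shorter than 3 units is a dot, otherwise a dash;
    an "off" of 3 or more units ends a letter, 7 or more units ends a word.
    """
    words = [[]]   # each word is a list of decoded letters
    symbol = ""    # dots and dashes of the letter being received
    # A trailing long pause flushes the final letter and word.
    for is_on, units in list(events) + [(False, 7)]:
        if is_on:
            symbol += "." if units < 3 else "-"
            continue
        if units >= 3 and symbol:
            words[-1].append(MORSE.get(symbol, "?"))
            symbol = ""
        if units >= 7 and words[-1]:
            words.append([])
    return " ".join("".join(w) for w in words if w)


example = [
    (True, 1), (False, 1), (True, 1), (False, 1), (True, 1), (False, 1), (True, 1),  # H
    (False, 3),                                                                      # letter gap
    (True, 1), (False, 1), (True, 1),                                                # I
    (False, 7),                                                                      # word gap
    (True, 1), (False, 1), (True, 1), (False, 1), (True, 1),                         # S
    (False, 3),                                                                      # letter gap
    (True, 3), (False, 1), (True, 3), (False, 1), (True, 3),                         # O
]
print(decode(example))
```

Only the length of the silence distinguishes a letter boundary from a word boundary here; no separate divider symbol is transmitted.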
Unicode
For use with computers, these marks have code points in Unicode, including code points specific to the Linear B script.
[''Punctuation'' § 5. Papyrological Punctuation]
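As a rough illustration, the code points and character names of some of the dividers quoted earlier in this article can be looked up with Python's standard `unicodedata` module. The helper name `describe` and the selection of marks are assumptions made for this sketch; the names returned depend on the Unicode version bundled with the interpreter.

```python
import unicodedata


def describe(ch):
    """Format a character as 'U+XXXX NAME' using the interpreter's Unicode data."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}"


# Marks that appear in the text above, written as escapes for clarity.
marks = [
    "\u00b7",      # single interpunct
    "\u205d",      # triple interpunct
    "\u1361",      # Ethiopic double dot
    "\U000103D0",  # Old Persian sloping wedge
]

for mark in marks:
    print(describe(mark))
```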
See also
* Whitespace
* Sentence spacing
* Speech segmentation
* Zero-width non-joiner
* Zero-width space
* Substitute blank
* Underscore