Interword spacing

In punctuation, a word divider is a glyph that separates written words. In languages which use the Latin, Cyrillic, and Arabic alphabets, as well as other scripts of Europe and West Asia, the word divider is a blank space, or ''whitespace''. This convention is spreading, along with other aspects of European punctuation, to Asia and Africa, where words are usually written without word separation. In computing, the term ''word delimiter'' refers to a character that separates two words. In character encoding, word segmentation depends on which characters are defined as word dividers.
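
The dependence of word segmentation on the chosen set of word dividers can be illustrated with a minimal Python sketch. The function name `split_words` and its `dividers` parameter are hypothetical, introduced only for this illustration; the sample text is an interpunct-separated Latin phrase.

```python
def split_words(text, dividers=" "):
    """Split text at any character listed in `dividers`, dropping empty pieces."""
    words, current = [], []
    for ch in text:
        if ch in dividers:
            # A divider ends the current word, if one has been started.
            if current:
                words.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        words.append("".join(current))
    return words


sample = "ARMA\u00b7VIRVMQVE\u00b7CANO"  # words divided by the interpunct (U+00B7)

# With only the blank space defined as a divider, the text is a single "word".
print(split_words(sample))
# With the interpunct defined as a divider, three words are found.
print(split_words(sample, dividers="\u00b7"))
```

The same input therefore yields different segmentations depending solely on which characters are treated as dividers.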


History

In Ancient Egyptian, determinatives may have been used as much to demarcate word boundaries as to disambiguate the semantics of words. Rarely in Assyrian cuneiform, but commonly in the later cuneiform Ugaritic alphabet, a vertical stroke 𒑰 was used to separate words. In Old Persian cuneiform, a diagonally sloping wedge 𐏐 was used. As the alphabet spread throughout the ancient world, words were often run together without division, and this practice remains or remained until recently in much of South and Southeast Asia. However, not infrequently in inscriptions a vertical line, and in manuscripts a single (·), double (:), or triple (⫶) interpunct (dot) was used to divide words. This practice was found in Phoenician, Aramaic, Hebrew, Greek, and Latin, and continues today with Ethiopic, though there whitespace is gaining ground.


Scriptio continua

The early alphabetic writing systems, such as the Phoenician alphabet, had only signs for consonants (although some signs for consonants could also stand for a vowel, so-called ''matres lectionis''). Without some form of visible word dividers, parsing a text into its separate words would have been a puzzle. With the introduction of letters representing vowels in the Greek alphabet, the need for inter-word separation lessened. The earliest Greek inscriptions used interpuncts, as was common in the writing systems that preceded the Greek alphabet, but soon the practice of ''scriptio continua'', continuous writing in which all words ran together without separation, became common.


Types


None

Alphabetic writing without inter-word separation, known as ''scriptio continua'', was used in Ancient Egyptian. It appeared in Post-classical Latin after several centuries of the use of the interpunct. Traditionally, ''scriptio continua'' was used for the Indic alphabets of South and Southeast Asia and hangul of Korea, but spacing is now used with hangul and increasingly with the Indic alphabets. Today Chinese and Japanese are the most widely used scripts consistently written without punctuation to separate words, though other scripts such as Thai and Lao also follow this writing convention. In Classical Chinese, a word and a character were almost the same thing, so that word dividers would have been superfluous. Although Modern Mandarin has numerous polysyllabic words, and each syllable is written with a distinct character, the conceptual link between character and word, or at least morpheme, remains strong, and no need is felt for word separation apart from what characters already provide. This link is also found in the Vietnamese language; however, in the Vietnamese alphabet, virtually all syllables are separated by spaces, whether or not they form word boundaries.


Space

Space is the most common word divider, especially in Latin script.


Vertical lines

Ancient inscribed and cuneiform scripts such as Anatolian hieroglyphs frequently used short vertical lines to separate words, as did Linear B. In manuscripts, vertical lines were more commonly used for larger breaks, equivalent to the Latin comma and period. This was the case for Biblical Hebrew (the paseq) and continues with many Indic scripts today (the danda).


Interpunct, multiple dots, and hypodiastole

As noted above, the single and double interpunct were used in manuscripts (on paper) throughout the ancient world. For example, Ethiopic inscriptions used a vertical line, whereas manuscripts used double dots (፡) resembling a colon. The latter practice continues today, though the space is making inroads. Classical Latin used the interpunct in both paper manuscripts and stone inscriptions (Wingo 1972:16). Ancient Greek orthography used between two and five dots as word separators, as well as the hypodiastole.


Different letter forms

In the modern Hebrew and Arabic alphabets, some letters have distinct forms at the ends and/or beginnings of words. This demarcation is used in addition to spacing.


Vertical arrangement

The Nastaʿlīq form of Islamic calligraphy uses vertical arrangement to separate words. The beginning of each word is written higher than the end of the preceding word, so that a line of text takes on a sawtooth appearance. Nastaʿlīq spread from Persia and today is used for Persian, Uyghur, Pashto, and Urdu.


Pause

In finger spelling and in Morse code, words are separated by a pause.
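
How a pause can act as a word divider is sketched below in Python for Morse code. The sketch assumes the conventional International Morse timing, which the text above does not state: a dot lasts one unit, a dash three units, with one unit of silence between the elements of a letter, three between letters, and seven between words. The signal is modelled as a string of `1` (tone) and `0` (silence) units, and the names `encode` and `count_words` are hypothetical.

```python
MORSE = {
    "A": ".-", "B": "-...", "C": "-.-.", "D": "-..", "E": ".", "F": "..-.",
    "G": "--.", "H": "....", "I": "..", "J": ".---", "K": "-.-", "L": ".-..",
    "M": "--", "N": "-.", "O": "---", "P": ".--.", "Q": "--.-", "R": ".-.",
    "S": "...", "T": "-", "U": "..-", "V": "...-", "W": ".--", "X": "-..-",
    "Y": "-.--", "Z": "--..",
}

DOT, DASH = "1", "111"              # tone lasting one or three units
ELEMENT_GAP = "0"                   # silence inside a letter
LETTER_GAP = "000"                  # silence between letters
WORD_GAP = "0000000"                # the pause that divides words


def encode_letter(ch):
    """Encode one letter (A-Z only in this sketch) as tone/silence units."""
    symbols = MORSE[ch.upper()]
    return ELEMENT_GAP.join(DOT if s == "." else DASH for s in symbols)


def encode(text):
    """Encode space-separated words; the written space becomes a long pause."""
    return WORD_GAP.join(
        LETTER_GAP.join(encode_letter(ch) for ch in word)
        for word in text.split()
    )


def count_words(signal):
    """Count words by splitting the signal at the seven-unit pause."""
    return len(signal.split(WORD_GAP)) if signal else 0


signal = encode("sos sos")
print(signal)
print(count_words(signal))
```

Because no silence inside a word is longer than three units, a run of seven silent units can only be a word boundary, which is why splitting on it recovers the words.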


Unicode

For use with computers, these marks have code points in Unicode.
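
As a rough illustration, the code points of some marks quoted elsewhere in this article can be looked up with Python's standard `unicodedata` module. The selection in `MARKS` is an assumption made for this sketch and is not the article's own list; the helper name `describe` is hypothetical, and the names printed depend on the Unicode version bundled with the Python interpreter.

```python
import unicodedata

# Marks copied from the article text (interpunct, Ethiopic double dot,
# Old Persian wedge, cuneiform vertical stroke, triple dot).
MARKS = "·፡𐏐𒑰⫶"


def describe(marks):
    """Return (character, code point, Unicode name) for each mark."""
    return [
        (ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for ch in marks
    ]


for ch, code_point, name in describe(MARKS):
    print(ch, code_point, name)
```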


See also

* Whitespace
* Sentence spacing
* Speech segmentation
* Zero-width non-joiner
* Zero-width space
* Substitute blank
* Underscore

