Word space

In punctuation, a word divider is a glyph that separates written words. In languages which use the Latin, Cyrillic, and Arabic alphabets, as well as other scripts of Europe and West Asia, the word divider is a blank space, or ''whitespace''. This convention is spreading, along with other aspects of European punctuation, to Asia and Africa, where words are usually written without word separation. In computing, the term ''delimiter'' refers to a character that separates two words. In character encoding, word segmentation depends on which characters are defined as word dividers.
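
The dependence of word segmentation on the chosen set of dividers can be illustrated with a minimal Python sketch. The function name `split_words` and the sample string are assumptions made for illustration only; the point is simply that the same text yields different words under different divider sets.

```python
def split_words(text, dividers):
    """Split text wherever a character from `dividers` occurs.

    Runs of consecutive dividers are treated as a single boundary.
    """
    words, current = [], []
    for ch in text:
        if ch in dividers:
            # A divider ends the word collected so far, if any.
            if current:
                words.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        words.append("".join(current))
    return words


sample = "SENATVS\u00b7POPVLVSQVE\u00b7ROMANVS"  # words divided by interpuncts

# Treating only the space as a divider finds a single "word".
print(split_words(sample, " "))
# Treating the interpunct as a divider as well finds three words.
print(split_words(sample, " \u00b7"))
```

Passing the divider set as a parameter, rather than hard-coding the space, keeps the sketch usable for scripts whose word divider is a visible mark.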


History

In Ancient Egyptian, determinatives may have been used as much to demarcate word boundaries as to disambiguate the semantics of words. Rarely in Assyrian cuneiform, but commonly in the later cuneiform Ugaritic alphabet, a vertical stroke 𒑰 was used to separate words. In Old Persian cuneiform, a diagonally sloping wedge 𐏐 was used. As the alphabet spread throughout the ancient world, words were often run together without division, and this practice remains or remained until recently in much of South and Southeast Asia. However, not infrequently in inscriptions a vertical line, and in manuscripts a single (·), double (:), or triple (⫶) interpunct (dot) was used to divide words. This practice was found in Phoenician, Aramaic, Hebrew, Greek, and Latin, and continues today with Ethiopic, though there whitespace is gaining ground.


Scriptio continua

The early alphabetic writing systems, such as the Phoenician alphabet, had only signs for consonants (although some signs for consonants could also stand for a vowel, so-called ''matres lectionis''). Without some form of visible word dividers, parsing a text into its separate words would have been a puzzle. With the introduction of letters representing vowels in the Greek alphabet, the need for inter-word separation lessened. The earliest Greek inscriptions used interpuncts, as was common in the writing systems which preceded it, but soon the practice of ''scriptio continua'', continuous writing in which all words ran together without separation, became common.
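
Why undivided text can be a puzzle to parse may be shown with a small Python sketch that lists every way of cutting a run of letters into known words. The function `segmentations`, the toy lexicon, and the English sample string are hypothetical and chosen only for legibility; this is not a reconstruction of how ancient readers worked.

```python
def segmentations(text, lexicon):
    """Return every way to split `text` into words found in `lexicon`."""
    if not text:
        return [[]]  # one way to segment nothing: the empty list of words
    results = []
    for end in range(1, len(text) + 1):
        head = text[:end]
        if head in lexicon:
            # Keep this word and segment whatever follows it.
            for rest in segmentations(text[end:], lexicon):
                results.append([head] + rest)
    return results


toy_lexicon = {"god", "is", "now", "here", "nowhere"}
for words in segmentations("godisnowhere", toy_lexicon):
    print(" ".join(words))
```

Even with a five-word lexicon the sample has two valid readings, which a word divider would settle at once.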


Types


None

Alphabetic writing without inter-word separation, known as ''scriptio continua'', was used in Ancient Egyptian. It appeared in Post-classical Latin after several centuries of the use of the interpunct. Traditionally, ''scriptio continua'' was used for the Indic alphabets of South and Southeast Asia and hangul of Korea, but spacing is now used with hangul and increasingly with the Indic alphabets. Today Chinese and Japanese are the most widely used scripts consistently written without punctuation to separate words, though other scripts such as Thai and Lao also follow this writing convention. In Classical Chinese, a word and a character were almost the same thing, so that word dividers would have been superfluous. Although Modern Mandarin has numerous polysyllabic words, and each syllable is written with a distinct character, the conceptual link between character and word or at least morpheme remains strong, and no need is felt for word separation apart from what characters already provide. This link is also found in the Vietnamese language; however, in the Vietnamese alphabet, virtually all syllables are separated by spaces, whether or not they form word boundaries.


Space

Space is the most common word divider, especially in Latin script.


Vertical lines

Ancient inscribed and cuneiform scripts such as Anatolian hieroglyphs frequently used short vertical lines to separate words, as did Linear B. In manuscripts, vertical lines were more commonly used for larger breaks, equivalent to the Latin comma and period. This was the case for Biblical Hebrew (the paseq) and continues with many Indic scripts today (the danda).


Interpunct, multiple dots, and hypodiastole

As noted above, the single and double interpunct were used in manuscripts (on paper) throughout the ancient world. For example, Ethiopic inscriptions used a vertical line, whereas manuscripts used double dots (፡) resembling a colon. The latter practice continues today, though the space is making inroads. Classical Latin used the interpunct in both paper manuscripts and stone inscriptions (Wingo 1972:16). Ancient Greek orthography used between two and five dots as word separators, as well as the hypodiastole.


Different letter forms

In the modern Hebrew and Arabic alphabets, some letters have distinct forms at the ends and/or beginnings of words. This demarcation is used in addition to spacing.
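
For Hebrew, the word-final letter forms are encoded as separate characters whose Unicode names contain the word FINAL, so a program can use them as an additional hint that a word has ended. The following Python sketch relies on the standard `unicodedata` module; the helper name `ends_with_final_form` is an assumption for illustration, and the sketch does not cover Arabic, where positional forms are normally chosen at rendering time.

```python
import unicodedata


def ends_with_final_form(word):
    """Return True if the last character is a Hebrew final-form letter."""
    if not word:
        return False
    # Unknown characters get an empty name instead of raising ValueError.
    name = unicodedata.name(word[-1], "")
    return name.startswith("HEBREW LETTER FINAL ")


shalom = "\u05e9\u05dc\u05d5\u05dd"  # shin, lamed, vav, final mem
print(ends_with_final_form(shalom))      # True: ends in final mem
print(ends_with_final_form(shalom[:3]))  # False: ends in vav
```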


Vertical arrangement

The Nastaʿlīq form of Islamic calligraphy uses vertical arrangement to separate words. The beginning of each word is written higher than the end of the preceding word, so that a line of text takes on a sawtooth appearance. Nastaliq spread from Persia and today is used for Persian, Uyghur, Pashto, and Urdu.


Pause

In finger spelling and in Morse code, words are separated by a pause.


Unicode

For use with computers, these marks have codepoints in Unicode.
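
The code points of the marks quoted in this article can be looked up with Python's standard `unicodedata` module. The following is a minimal sketch; the helper name `describe` is an assumption for illustration, and the names printed depend on the Unicode version bundled with the interpreter.

```python
import unicodedata


def describe(mark):
    """Return (code point in U+XXXX notation, Unicode character name)."""
    code_point = f"U+{ord(mark):04X}"
    # Fall back to a placeholder if this Unicode version lacks the character.
    name = unicodedata.name(mark, "<unnamed>")
    return code_point, name


# Space, interpunct, Ethiopic double dot, cuneiform stroke, Old Persian wedge,
# copied from the text above.
for mark in [" ", "·", "፡", "𒑰", "𐏐"]:
    print(*describe(mark))
```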


See also

* Whitespace
* Sentence spacing
* Speech segmentation
* Zero-width non-joiner
* Zero-width space
* Substitute blank
* Underscore


References

