Syllabification or syllabication, also known as hyphenation, is the separation of a word into syllables, whether spoken, written or signed.
Overview
The written separation into syllables is usually marked by a hyphen when using English orthography (e.g., syl-la-ble) and with a period when transcribing the actually spoken syllables in the International Phonetic Alphabet. For presentation purposes, typographers may use an interpunct (Unicode character U+00B7, e.g., syl·la·ble), a special-purpose "hyphenation point" (U+2027, e.g., syl‧la‧ble), or a space (e.g., syl la ble).
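The marking conventions listed above can be illustrated with a short Python sketch. The names `SEPARATORS` and `mark_syllables` are hypothetical and chosen only for this illustration; the code points are the ones given in the text.

```python
# Separators named in the text, keyed by an illustrative style name.
SEPARATORS = {
    "hyphen": "-",                   # ordinary English orthography
    "interpunct": "\u00b7",          # U+00B7 MIDDLE DOT
    "hyphenation point": "\u2027",   # U+2027 HYPHENATION POINT
    "space": " ",
}


def mark_syllables(parts, style="hyphen"):
    """Join the written parts of a word with the separator for `style`."""
    return SEPARATORS[style].join(parts)


for style in SEPARATORS:
    print(style, "->", mark_syllables(["syl", "la", "ble"], style))
```

The division of the word into parts is supplied by the caller here; the sketch only covers how an already known division is displayed.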
At the end of a line, a word is separated in writing into parts, conventionally called "syllables", if it does not fit the line and if moving it to the next line would make the first line much shorter than the others. This can be a particular problem with very long words, and with narrow columns in newspapers.
Word processing has automated the process of justification, making syllabification of shorter words often unnecessary.
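The end-of-line decision described above can be sketched in Python as follows. This is a minimal illustration, not the method of any particular word processor: the function name `split_to_fit` is hypothetical, the permitted break points are assumed to be known already, and the width is counted in characters.

```python
def split_to_fit(parts, room):
    """Return (head, tail) for a word that must fit in `room` columns.

    `parts` are the pieces between permitted break points,
    e.g. ["syl", "la", "ble"]. `head` stays on the current line and
    `tail` moves to the next one. `head` is "" when no break fits,
    which means the whole word moves to the next line.
    """
    word = "".join(parts)
    if len(word) <= room:
        return word, ""              # the word fits, no break needed
    best = ""
    prefix = ""
    for part in parts[:-1]:          # a break after the last part is no break
        prefix += part
        if len(prefix) + 1 <= room:  # +1 leaves space for the hyphen
            best = prefix
    if not best:
        return "", word
    return best + "-", word[len(best):]


print(split_to_fit(["syl", "la", "ble"], 6))
```

The sketch always keeps as much of the word as possible on the current line. Whether breaking is preferable to leaving a short line is a separate judgement that it does not model.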
In some languages, the spoken syllables are also the basis of syllabification in writing. However, possibly due to the weak correspondence between sounds and letters in the spelling of modern English, written syllabification in English is based mostly on etymological or morphological, rather than phonetic, principles. For example, "learning" cannot be divided as ''lear-ning'', which would follow the syllables of the spoken word; it is divided morphologically as ''learn-ing''. Seeing only ''lear-'' at the end of a line might mislead the reader into pronouncing the word incorrectly, because the digraph ''ea'' can represent many different sounds. The history of English orthography accounts for such phenomena.
English written syllabification therefore deals with a concept of "syllable" that does not correspond to the linguistic concept of a phonological (as opposed to morphological) unit.
As a result, even most native English speakers are unable to syllabify words according to established rules without consulting a dictionary or using a word processor. Schools usually do not provide much more advice on the topic than to consult a dictionary. In addition, there are differences between British and US syllabification and even between dictionaries of the same English variety.
In Finnish, Italian, Portuguese and other nearly phonemically spelled languages, writers can in principle correctly syllabify any existing or newly created word using only general rules. In Finland, children are first taught to hyphenate every word until they produce the correct syllabification reliably, after which the hyphens can be omitted.
Algorithm
A hyphenation algorithm is a set of rules, especially one codified for implementation in a computer program, that decides at which points a word can be broken over two lines with a hyphen. For example, a hyphenation algorithm might decide that ''impeachment'' can be broken as ''impeach-ment'' or ''im-peachment'' but not ''impe-achment''.
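The simplest way to picture such a decision is a lookup in a table of permitted break points. The Python sketch below assumes a hypothetical hand-written table `PERMITTED` holding only the example word; real systems derive the break points from rules or patterns rather than listing every word.

```python
# Hypothetical table: word -> the same word with every permitted break marked by "-".
PERMITTED = {"impeachment": "im-peach-ment"}


def allowed_breaks(word, table=PERMITTED):
    """List every way `word` may be split over two lines, or [] if unknown."""
    marked = table.get(word)
    if marked is None:
        return []
    parts = marked.split("-")
    results = []
    for i in range(1, len(parts)):
        # Break once, after the first i parts.
        results.append("".join(parts[:i]) + "-" + "".join(parts[i:]))
    return results


print(allowed_breaks("impeachment"))
```

With this table, ''im-peachment'' and ''impeach-ment'' are offered, while ''impe-achment'' is not, because no break is marked at that position.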
One of the reasons for the complexity of the rules of word-breaking is that different dialects of English tend to differ on hyphenation: American English tends to work on sound, but British English tends to look to the origins of the word and then to sound. There are also a large number of exceptions, which further complicates matters.
Some rules of thumb can be found in Major Keary's "On Hyphenation – Anarchy of Pedantry." Among the algorithmic approaches to hyphenation, the one implemented in the TeX typesetting system is widely used. It is thoroughly documented in the first two volumes of ''Computers and Typesetting'' by Donald Knuth and in Franklin Mark Liang's dissertation. The aim of Liang's work was to get the algorithm as accurate as he practically could and to keep any exception dictionary small.
In TeX's original hyphenation patterns for American English, the exception list contains only 14 words.
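The pattern-based approach can be sketched in a few lines of Python. In Liang's method, short letter patterns carry digits between their letters; every pattern that matches part of the word contributes its digits, the highest digit at each position wins, and an odd result permits a break while an even one forbids it. The sketch below is a simplified illustration, not TeX's implementation: the function names are hypothetical, the nine patterns are only those needed for the word "hyphenation", the single exception entry is illustrative, and the minimum fragment lengths of 2 and 3 letters are assumed defaults.

```python
def parse_pattern(pattern):
    """Split a pattern such as 'hy3ph' into its letters and per-gap digits."""
    letters = "".join(ch for ch in pattern if not ch.isdigit())
    weights = [0] * (len(letters) + 1)
    pos = 0                           # number of letters consumed so far
    for ch in pattern:
        if ch.isdigit():
            weights[pos] = int(ch)    # digit sits in the gap before letter `pos`
        else:
            pos += 1
    return letters, weights


def hyphenate(word, patterns, exceptions=None, left_min=2, right_min=3):
    """Return `word` with '-' at every break the patterns permit."""
    exceptions = exceptions or {}
    if word.lower() in exceptions:    # the exception list overrides the patterns
        return exceptions[word.lower()]
    dotted = "." + word.lower() + "."  # dots mark the word boundaries
    scores = [0] * (len(dotted) + 1)   # scores[k]: gap before dotted[k]
    for letters, weights in map(parse_pattern, patterns):
        start = dotted.find(letters)
        while start != -1:
            for offset, weight in enumerate(weights):
                scores[start + offset] = max(scores[start + offset], weight)
            start = dotted.find(letters, start + 1)
    pieces, last = [], 0
    for i in range(left_min, len(word) - right_min + 1):
        if scores[i + 1] % 2 == 1:    # +1 skips the leading dot; odd allows a break
            pieces.append(word[last:i])
            last = i
    pieces.append(word[last:])
    return "-".join(pieces)


# Illustrative data only: a tiny pattern set and a one-word exception list.
PATTERNS = ["hy3ph", "he2n", "hena4", "hen5at", "1na", "n2at", "1tio", "2io", "o2n"]
EXCEPTIONS = {"table": "ta-ble"}

print(hyphenate("hyphenation", PATTERNS, EXCEPTIONS))
```

A production pattern file contains thousands of patterns, so that most words are handled without an entry in the exception list.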
In TeX
Ports of the TeX hyphenation algorithm are available as libraries for several programming languages, including Haskell, JavaScript, Perl, PostScript, Python, Ruby, and C#. TeX can be made to show the hyphens it would insert in the log with the command
\showhyphens
.
In LaTeX, users can add hyphenation corrections with:
\hyphenation{words}
The
\hyphenation
command declares allowed hyphenation points, where ''words'' is a list of words, separated by spaces, in which each hyphenation point is indicated by a
-
character. For example,
\hyphenation{fortran er-go-no-mic}
declares that in the current job "fortran" should not be hyphenated and that if "ergonomic" must be hyphenated, it will be at one of the indicated points.
However, there are several limits. For example, the stock
\hyphenation
command accepts only ASCII letters by default and so it cannot be used to correct hyphenation for words with non-ASCII characters (like ''ä'', ''é'', ''ç''), which are very common in almost all languages except English. Simple workarounds exist, however.
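The word-list format and the ASCII restriction described above can be modelled with a small Python sketch. It is not part of LaTeX: the function name `parse_hyphenation_list` is hypothetical, and the sample arguments are illustrative. Each word is mapped to the letter offsets at which a break is allowed, and any word containing a character other than an ASCII letter or the break marker is rejected, mirroring the default behaviour of the stock command.

```python
def parse_hyphenation_list(words):
    """Map each word in a space-separated list to its permitted break offsets.

    A "-" inside a word marks a permitted break. A word without "-"
    gets an empty list, meaning it must not be hyphenated at all.
    """
    table = {}
    for marked in words.split():
        plain = marked.replace("-", "")
        if not (plain.isascii() and plain.isalpha()):
            raise ValueError(f"only ASCII letters are accepted: {marked!r}")
        offsets, seen = [], 0
        for piece in marked.split("-")[:-1]:
            seen += len(piece)        # letters before this break
            offsets.append(seen)
        table[plain.lower()] = offsets
    return table


print(parse_hyphenation_list("fortran er-go-no-mic"))
```

Passing a word such as "né-ces-saire" raises `ValueError` in this sketch, which corresponds to the limitation that makes a workaround necessary for most languages other than English.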
See also
* Phonotactics
* Tautosyllabic, heterosyllabic and ambisyllabic phones
* Syllable structure in English phonology
External links
* Online Lyric Hyphenator: hyphenates English text into syllables
* Online hyphenation tool: hyphenation algorithms for several languages
* Hyphenation tool for the French Language: hyphenates French words with explanation