
Alphabetical order is a system whereby character strings are placed in order based on the position of the characters in the conventional ordering of an alphabet. It is one of the methods of collation. In mathematics, a lexicographical order is the generalization of the alphabetical order to other data types, such as sequences of numbers or other ordered mathematical objects.
When applied to strings or sequences that may contain digits, numbers or more elaborate types of elements, in addition to alphabetical characters, the alphabetical order is generally called a lexicographical order.
To determine which of two strings of characters comes first when arranging in alphabetical order, their first letters are compared. If they differ, then the string whose first letter comes earlier in the alphabet comes before the other string. If the first letters are the same, then the second letters are compared, and so on. If a position is reached where one string has no more letters to compare while the other does, then the first (shorter) string is deemed to come first in alphabetical order.
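The comparison just described can be sketched in a few lines of Python. This is only an illustration: the function name `compare_alphabetical` is hypothetical, and it assumes both strings are already in a single case and contain only basic Latin letters, so that the built-in character comparison matches the order of the alphabet.

```python
def compare_alphabetical(a: str, b: str) -> int:
    """Return -1 if a comes first, 1 if b comes first, 0 if they are equal.

    Assumes single-case basic Latin letters (an assumption of this sketch).
    """
    # Walk both strings position by position until the letters differ.
    for letter_a, letter_b in zip(a, b):
        if letter_a != letter_b:
            return -1 if letter_a < letter_b else 1
    # Every shared position matched: the string that ran out of letters
    # (the shorter one) comes first; equal lengths mean equal strings.
    return (len(a) > len(b)) - (len(a) < len(b))
```

For instance, `compare_alphabetical("as", "aster")` returns `-1`, because "as" runs out of letters first.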
Capital or upper case letters are generally considered to be identical to their corresponding lower case letters for the purposes of alphabetical ordering, although conventions may be adopted to handle situations where two strings differ only in capitalization. Various conventions also exist for the handling of strings containing spaces, modified letters, such as those with diacritics, and non-letter characters such as marks of punctuation.
The result of placing a set of words or strings in alphabetical order is that all of the strings beginning with the same letter are grouped together; within that grouping all words beginning with the same two-letter sequence are grouped together; and so on. The system thus tends to maximize the number of common initial letters between adjacent words.
History
The order of the letters of the alphabet is attested from the 14th century BC in the town of Ugarit on Syria's northern coast. Tablets found there bear over one thousand cuneiform signs, but these signs are not Babylonian and there are only thirty distinct characters. About twelve of the tablets have the signs set out in alphabetic order. There are two orders found, one of which is nearly identical to the order used for Hebrew, Greek and Latin, and a second order very similar to that used for Geʽez.
It is not known how many letters the Proto-Sinaitic alphabet had nor what their alphabetic order was. Among its descendants, the Ugaritic alphabet had 27 consonants, the South Arabian alphabets had 29, and the Phoenician alphabet 22. These scripts were arranged in two orders, an ''ABGDE'' order in Phoenician and an ''HMĦLQ'' order in the south; Ugaritic preserved both orders. Both sequences proved remarkably stable among the descendants of these scripts.
As applied to words, alphabetical order was first used in the 1st millennium BCE by Northwest Semitic scribes using the abjad system. However, a range of other methods of classifying and ordering material, including geographical, chronological, hierarchical and by category, were preferred over alphabetical order for centuries.
Parts of the Bible are dated to the 7th–6th centuries BCE. In the Book of Jeremiah, the prophet utilizes the Atbash substitution cipher, based on alphabetical order. Similarly, biblical authors used acrostics based on the (ordered) Hebrew alphabet.
The first effective use of alphabetical order as a cataloging device among scholars may have been in ancient Alexandria, in the Great Library of Alexandria, which was founded around 300 BCE. The poet and scholar Callimachus, who worked there, is thought to have created the world's first library catalog, known as the ''Pinakes'', with scrolls shelved in alphabetical order of the first letter of authors' names.
In the 1st century BC, Roman writer Varro compiled alphabetic lists of authors and titles. In the 2nd century CE, Sextus Pompeius Festus wrote an encyclopedic epitome of the works of Verrius Flaccus, ''De verborum significatu'', with entries in alphabetic order. In the 3rd century CE, Harpocration wrote a Homeric lexicon alphabetized by all letters.
The 10th century saw major alphabetical lexicons of Greek (the ''Suda''), Arabic (Ibn Faris's ''al-Mujmal fī al-Lugha''), and Biblical Hebrew (Menahem ben Saruq's ''Mahberet''). Alphabetical order as an aid to consultation flourished in 11th-century Italy, which contributed works on Latin (Papias's ''Elementarium'') and Talmudic Aramaic (Nathan ben Jehiel's ''Arukh'').
In the second half of the 12th century, Christian preachers adopted alphabetical tools to analyse biblical vocabulary. This led to the compilation of alphabetical concordances of the Bible by the Dominican friars in Paris in the 13th century, under Hugh of Saint Cher. Older reference works such as St. Jerome's ''Interpretations of Hebrew Names'' were alphabetized for ease of consultation. The use of alphabetical order was initially resisted by scholars, who expected their students to master their area of study according to its own rational structures; its success was driven by such tools as Robert Kilwardby's index to the works of St. Augustine, which helped readers access the full original text instead of depending on the compilations of excerpts which had become prominent in 12th-century scholasticism. The adoption of alphabetical order was part of the transition from the primacy of memory to that of written works. The idea of ordering information by the order of the alphabet also met resistance from the compilers of encyclopaedias in the 12th and 13th centuries, who were all devout churchmen. They preferred to organise their material theologically – in the order of God's creation, starting with ''Deus'' (meaning God).
In 1604 Robert Cawdrey had to explain in ''Table Alphabeticall'', the first monolingual English dictionary, "Nowe if the word, which thou art desirous to finde, begin with (a) then looke in the beginning of this Table, but if with (v) looke towards the end". Although as late as 1803 Samuel Taylor Coleridge condemned encyclopedias with "an arrangement determined by the accident of initial letters", many lists are today based on this principle.
Ordering in the Latin script
Basic order and examples
The standard order of the modern ISO basic Latin alphabet is:
:A-B-C-D-E-F-G-H-I-J-K-L-M-N-O-P-Q-R-S-T-U-V-W-X-Y-Z
An example of straightforward alphabetical ordering follows:
*''As; Aster; Astrolabe; Astronomy; Astrophysics; At; Ataman; Attack; Baa''
Another example:
*''Barnacle; Be; Been; Benefit; Bent''
The above words are ordered alphabetically. ''As'' comes before ''Aster'' because they begin with the same two letters and ''As'' has no more letters after that whereas ''Aster'' does. The next three words come after ''Aster'' because their fourth letter (the first one that differs) is ''r'', which comes after ''e'' (the fourth letter of ''Aster'') in the alphabet. Those words themselves are ordered based on their sixth letters (''l'', ''n'' and ''p'' respectively). Then comes ''At'', which differs from the preceding words in the second letter (''t'' comes after ''s''). ''Ataman'' comes after ''At'' for the same reason that ''Aster'' came after ''As''. ''Attack'' follows ''Ataman'' based on comparison of their third letters, and ''Baa'' comes after all of the others because it has a different first letter.
Treatment of multiword strings
When some of the strings being ordered consist of more than one word, i.e., they contain spaces or other separators such as hyphens, then two basic approaches may be taken. In the first approach, all strings are ordered initially according to their first word, as in the sequence:
*''Oak; Oak Hill; Oak Ridge; Oakley Park; Oakley River''
*:where all strings beginning with the separate word ''Oak'' precede all those beginning with ''Oakley'', because ''Oak'' precedes ''Oakley'' in alphabetical order.
In the second approach, strings are alphabetized as if they had no spaces or hyphens, giving the sequence:
*''Oak; Oak Hill; Oakley Park; Oakley River; Oak Ridge''
*:where ''Oak Ridge'' now comes after the ''Oakley'' strings, as it would if it were written "Oakridge".
The second approach is the one usually taken in dictionaries, and it is thus often called ''dictionary order'' by publishers. The first approach has often been used in book indexes, although each publisher traditionally set its own standards for which approach to use therein; there was no ISO standard for book indexes (ISO 999) before 1975.
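The two approaches can be expressed as sort keys. The following is a minimal sketch in Python, not a standard implementation; the function names `word_by_word_key` and `letter_by_letter_key` are hypothetical, and only spaces and hyphens are treated as separators.

```python
def word_by_word_key(s: str) -> list[str]:
    # First approach: compare word by word, so every string starting with
    # the separate word "Oak" precedes those starting with "Oakley".
    return s.casefold().replace("-", " ").split()


def letter_by_letter_key(s: str) -> str:
    # Second approach ("dictionary order"): ignore spaces and hyphens
    # and compare the remaining letters as one unbroken string.
    return "".join(ch for ch in s.casefold() if ch not in " -")


names = ["Oakley River", "Oak Ridge", "Oak", "Oakley Park", "Oak Hill"]
first_approach = sorted(names, key=word_by_word_key)
second_approach = sorted(names, key=letter_by_letter_key)
```

Here `first_approach` reproduces the sequence ''Oak; Oak Hill; Oak Ridge; Oakley Park; Oakley River'', while `second_approach` places ''Oak Ridge'' after the ''Oakley'' strings.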
Special cases
Modified letters
In French, modified letters (such as those with diacritics) are treated the same as the base letter for alphabetical ordering purposes. For example, ''rôle'' comes between ''rock'' and ''rose'', as if it were written ''role''. However, languages that use such letters systematically generally have their own ordering rules. See below.
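One way to treat a modified letter like its base letter is to decompose it with Unicode normalization and discard the combining marks. The sketch below uses Python's standard `unicodedata` module; the helper name `base_letter_key` is hypothetical, and the approach deliberately ignores any language-specific tie-breaking rules.

```python
import unicodedata


def base_letter_key(s: str) -> str:
    # NFD splits "ô" into "o" plus a combining circumflex accent.
    decomposed = unicodedata.normalize("NFD", s.casefold())
    # combining() is non-zero for combining marks, which are dropped here.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


words = sorted(["rose", "rôle", "rock"], key=base_letter_key)
```

With this key, `words` is `["rock", "rôle", "rose"]`, matching the example above.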
Ordering by surname
In most cultures where family names are written after given names, it is still desired to sort lists of names (as in telephone directories) by family name first. In this case, names need to be reordered to be sorted correctly. For example, Juan Hernandes and Brian O'Leary should be sorted as "Hernandes, Juan" and "O'Leary, Brian" even if they are not written this way. Capturing this rule in a computer collation algorithm is complex, and simple attempts will fail. For example, unless the algorithm has at its disposal an extensive list of family names, there is no way to decide if "Gillian Lucille van der Waal" is "van der Waal, Gillian Lucille", "Waal, Gillian Lucille van der", or even "Lucille van der Waal, Gillian".
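Because the split between given and family name cannot be recovered reliably from the written form, a common workaround is to store the two parts separately. The sketch below assumes exactly that: each person is a hypothetical `(given, family)` pair supplied by the data source, so no parsing of the full name is attempted.

```python
# Each entry is (given name, family name); the split is assumed to be
# known in advance rather than guessed from the full name.
people = [
    ("Brian", "O'Leary"),
    ("Juan", "Hernandes"),
    ("Gillian Lucille", "van der Waal"),
]


def surname_key(person: tuple[str, str]) -> tuple[str, str]:
    given, family = person
    # Sort by family name first, then by given name to break ties.
    return (family.casefold(), given.casefold())


directory = [f"{family}, {given}" for given, family in sorted(people, key=surname_key)]
```

The resulting `directory` lists "Hernandes, Juan" first, then "O'Leary, Brian", then "van der Waal, Gillian Lucille". Whether "van der Waal" should instead file under "W" is a convention this sketch does not decide.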
Ordering by surname is frequently encountered in academic contexts. Within a single multi-author paper, ordering the authors alphabetically by surname, rather than by other methods such as reverse seniority or subjective degree of contribution to the paper, is seen as a way of "acknowledg[ing] similar contributions" or "avoid[ing] disharmony in collaborating groups". The practice in certain fields of ordering citations in bibliographies by the surnames of their authors has been found to create bias in favour of authors with surnames which appear earlier in the alphabet, while this effect does not appear in fields in which bibliographies are ordered chronologically.
''The'' and other common words
If a phrase begins with a very common word (such as "the", "a" or "an", called articles in grammar), that word is sometimes ignored or moved to the end of the phrase, but this is not always the case. For example, the book "The Shining" might be treated as "Shining", or "Shining, The" and therefore before the book title "Summer of Sam". However, it may also be treated as simply "The Shining" and after "Summer of Sam". Similarly, "A Wrinkle in Time" might be treated as "Wrinkle in Time", "Wrinkle in Time, A", or "A Wrinkle in Time". All three alphabetization methods are fairly easy to create by algorithm, but many programs rely on simple lexicographic ordering instead.
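The "move the article to the end" convention can be sketched as follows. The names `ARTICLES`, `move_article` and `title_key` are hypothetical, and only the three English articles mentioned above are handled.

```python
ARTICLES = ("the", "a", "an")


def move_article(title: str) -> str:
    # Split off the first word; if it is an article, move it to the end.
    first, _, rest = title.partition(" ")
    if rest and first.casefold() in ARTICLES:
        return f"{rest}, {first}"
    return title


def title_key(title: str) -> str:
    return move_article(title).casefold()


titles = ["The Shining", "Summer of Sam", "A Wrinkle in Time"]
with_articles_moved = sorted(titles, key=title_key)
plain_lexicographic = sorted(titles)
```

`with_articles_moved` files "The Shining" before "Summer of Sam", whereas `plain_lexicographic` puts "A Wrinkle in Time" first and "The Shining" last.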
''Mac'' prefixes
The prefixes ''M''' and ''Mc'' in Irish and Scottish surnames are abbreviations for ''Mac'' and are sometimes alphabetized as if the spelling is ''Mac'' in full. Thus ''McKinley'' might be listed before ''Mackintosh'' (as it would be if it had been spelled out as "MacKinley"). Since the advent of computer-sorted lists, this type of alphabetization is less frequently encountered, though it is still used in British telephone directories.
''St'' prefix
The prefix ''St'' or ''St.'' is an abbreviation of "Saint", and is traditionally alphabetized as if the spelling is ''Saint'' in full. Thus in a gazetteer ''St John's'' might be listed before ''Salem'' (as it would be if it had been spelled out as "Saint John's"). Since the advent of computer-sorted lists, this type of alphabetization is less frequently encountered, though it is still sometimes used.
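Both prefix conventions amount to expanding an abbreviation before comparing. The sketch below is a simplified illustration using Python's `re` module; the function names are hypothetical, and it only recognizes ''Mc'' followed by a capital letter and ''St''/''St.'' followed by a space.

```python
import re


def expand_prefixes(name: str) -> str:
    # "McKinley" -> "MacKinley"; names such as "Mackintosh" are untouched.
    name = re.sub(r"^Mc(?=[A-Z])", "Mac", name)
    # "St John's" or "St. John's" -> "Saint John's"; "Stanley" is untouched.
    name = re.sub(r"^St\.? ", "Saint ", name)
    return name


def prefix_key(name: str) -> str:
    return expand_prefixes(name).casefold()


surnames = sorted(["Mackintosh", "McKinley"], key=prefix_key)
places = sorted(["Salem", "St John's"], key=prefix_key)
```

With this key, ''McKinley'' precedes ''Mackintosh'' and ''St John's'' precedes ''Salem''; a plain sort gives the opposite order in both cases.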
Ligatures
Ligatures (two or more letters merged into one symbol) which are not considered distinct letters, such as Æ and Œ in English, are typically collated as if the letters were separate: "æther" and "aether" would be ordered the same relative to all other words. This is true even when the ligature is not purely stylistic, such as in loanwords and brand names.
Special rules may need to be adopted to sort strings which vary only by whether two letters are joined by a ligature.
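A minimal sketch of this convention in Python follows. The mapping `LIGATURES` covers only the two ligatures named above, and the tie-breaking rule (the spelled-out form first) is an arbitrary choice made for this illustration, not an established standard.

```python
# Translation table expanding the two English ligatures into letter pairs.
LIGATURES = str.maketrans({"æ": "ae", "œ": "oe"})


def ligature_key(word: str) -> tuple[str, str]:
    folded = word.casefold()
    # Primary key: the word with ligatures expanded.
    # Secondary key: the folded original, used only to break exact ties.
    return (folded.translate(LIGATURES), folded)


words = sorted(["afar", "æther", "aeon", "aether"], key=ligature_key)
```

Here "æther" and "aether" occupy adjacent positions between "aeon" and "afar", and the secondary key makes their relative order deterministic.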
Treatment of numerals
When some of the strings contain numerals (or other non-letter characters), various approaches are possible. Sometimes such characters are treated as if they came before or after all the letters of the alphabet. Another method is for numbers to be sorted alphabetically as they would be spelled: for example ''1776'' would be sorted as if spelled out "seventeen seventy-six", and a French title beginning with ''24'' as if spelled "vingt-quatre..." (French for "twenty-four"). When numerals or other symbols are used as special graphical forms of letters, as ''1337'' for leet or the movie ''Seven'' (which was stylised as ''Se7en''), they may be sorted as if they were those letters. Natural sort order orders strings alphabetically, except that multi-digit numbers are treated as a single character and ordered by the value of the number encoded by the digits.
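Natural sort order can be sketched by splitting each string into runs of digits and runs of other characters. The key function below, `natural_key`, is a hypothetical name; it handles only ASCII digits and makes the arbitrary choice that a number sorts before text at the same position.

```python
import re


def natural_key(s: str) -> list[tuple[int, int, str]]:
    # Splitting on a capturing group keeps the digit runs; they always
    # land at the odd indices of the resulting list.
    parts = re.split(r"([0-9]+)", s.casefold())
    # Tag each part so numbers and text are never compared directly:
    # (0, value, "") for a digit run, (1, 0, text) for anything else.
    return [(0, int(p), "") if i % 2 else (1, 0, p) for i, p in enumerate(parts)]


files = ["file10", "file2", "file1"]
natural = sorted(files, key=natural_key)
lexicographic = sorted(files)
```

`natural` orders the names as file1, file2, file10, whereas `lexicographic` gives file1, file10, file2 because it compares "1" with "2" character by character.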
In the case of monarchs and popes, although their numbers are in Roman numerals and resemble letters, they are normally arranged in numerical order: so, for example, even though V comes after I, the Danish king Christian IX comes after his predecessor Christian VIII.
Language-specific conventions
Languages which use an extended Latin alphabet generally have their own conventions for treatment of the extra letters. Also in some languages certain digraphs are treated as single letters for collation purposes. For example, the Spanish alphabet treats ''ñ'' as a basic letter following ''n'', and formerly treated the digraphs ''ch'' and ''ll'' as basic letters following ''c'' and ''l'', respectively. Now ''ch'' and ''ll'' are alphabetized as two-letter combinations. The new alphabetization rule was issued by the Royal Spanish Academy in 1994. These digraphs were still formally designated as letters until 2010, but no longer are. On the other hand, the digraph ''rr'' follows ''rqu'' as expected (and did so even before the 1994 alphabetization rule), while vowels with acute accents (''á, é, í, ó, ú'') have always been ordered in parallel with their base letters, as has the letter ''ü''.
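The Spanish rules can be sketched by ranking letters against an explicit alphabet rather than relying on character codes. The code below is an illustration only: the names `spanish_key` and `traditional_key` are hypothetical, characters outside the alphabet are simply ranked first, and production software would normally use a locale-aware collation library instead.

```python
import re
import unicodedata

MODERN = list("abcdefghijklmnñopqrstuvwxyz")
# Pre-1994 convention: "ch" and "ll" count as letters after "c" and "l".
TRADITIONAL = list("abc") + ["ch"] + list("defghijkl") + ["ll"] + list("mnñopqrstuvwxyz")
MODERN_RANK = {letter: i for i, letter in enumerate(MODERN)}
TRADITIONAL_RANK = {letter: i for i, letter in enumerate(TRADITIONAL)}


def fold_accents(word: str) -> str:
    # Strip accents from vowels (á -> a, ü -> u) but keep ñ as its own letter.
    folded = []
    for ch in unicodedata.normalize("NFC", word.casefold()):
        if ch == "ñ":
            folded.append(ch)
        else:
            decomposed = unicodedata.normalize("NFD", ch)
            folded.append("".join(c for c in decomposed if not unicodedata.combining(c)))
    return "".join(folded)


def spanish_key(word: str) -> list[int]:
    # Current rule: every character is ranked individually.
    return [MODERN_RANK.get(ch, -1) for ch in fold_accents(word)]


def traditional_key(word: str) -> list[int]:
    # Older rule: match "ch" and "ll" first, then single characters.
    tokens = re.findall(r"ch|ll|.", fold_accents(word))
    return [TRADITIONAL_RANK.get(token, -1) for token in tokens]


sample = ["llama", "dado", "chico", "luz", "cuna"]
modern_order = sorted(sample, key=spanish_key)
traditional_order = sorted(sample, key=traditional_key)
```

Under `spanish_key`, "chico" precedes "cuna" and "llama" precedes "luz"; under `traditional_key` both pairs are reversed. In either case a word beginning with ''ñ'' sorts after every word beginning with ''n'' and before those beginning with ''o''.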
In a few cases, such as Arabic and Kiowa, the alphabet has been completely reordered.
Alphabetization rules applied in various languages are listed below.
* In Arabic, there are two main orders of the 28-letter alphabet used today. The standard and most commonly used is the ''hijāʾī'' order, which was created by the early Arab linguist Nasr ibn 'Asim al-Laythi and features a visual ordering method where letters are ordered based on their shapes. For example, ''bāʾ'' (ب), ''tāʾ'' (ت), ''thāʾ'' (ث) are grouped as they have the same base shape or ''rasm'' (ٮ) and are differentiated only by consonant pointing known as ''iʻjām''. The original ''ʾabjadī'' order, which phonetically resembles that of other Semitic languages as well as Latin, is still in use today, usually limited to ordering lists in a document, analogous to Roman numerals. When the ''ʾabjadī'' order is used in numbering, letters are written in a modified form to distinguish them from letters used in words and from numerals. For example, ''ʾalif'' (ا), which looks identical to the Eastern Arabic numeral one (١), is written with a small oval loop extending clockwise from the letter's bottom, followed by a short tail (𞺀). Although these characters are rarely used digitally, they are encoded in Unicode under Arabic Mathematical Alphabetic Symbols. A less common order, collated phonetically, was created by al-Khalil ibn Ahmad al-Farahidi.
* In Azerbaijani, there are eight additional letters to the standard Latin alphabet. Five of them are vowels: i, ı, ö, ü, ə and three are consonants: ç, ş, ğ. The alphabet is the same as the Turkish, with the same sounds written with the same letters, except for three additional letters: q, x and ə for sounds that do not exist in Turkish. Although all the "Turkish letters" are collated in their "normal" alphabetical order like in Turkish, the three extra letters are collated arbitrarily after letters whose sounds approach theirs. So, q is collated just after k, x (pronounced like a German ''ch'') is collated just after h and ə (pronounced roughly like an English short ''a'') is collated just after e.
* In Breton, there is no "c", "q", "x" but there are the digraphs "ch" and "c'h", which are collated between "b" and "d". For example: « buzhugenn, chug, c'hoar, daeraouenn » (earthworm, juice, sister, teardrop).
* In Czech and Slovak, accented vowels have secondary collating weight: compared to other letters, they are treated as their unaccented forms (in Czech, A-Á, E-É-Ě, I-Í, O-Ó, U-Ú-Ů, Y-Ý, and in Slovak, A-Á-Ä, E-É, I-Í, O-Ó-Ô, U-Ú, Y-Ý), but then they are sorted after the unaccented letters (for example, the correct lexicographic order is baa, baá, báa, báá, bab, báb, bac, bác, bač, báč in Czech, and baa, baá, baä, báa, báá, báä, bäa, bäá, bää, bab, báb, bäb, bac, bác, bäc, bač, báč, bäč in Slovak). Accented consonants have primary collating weight and are collated immediately after their unaccented counterparts, with the exception of Ď, Ň and Ť (in Czech) and Ď, Ĺ, Ľ, Ň, Ŕ and Ť (in Slovak), which again have secondary weight. CH is considered to be a separate letter and goes between H and I. In Slovak, DZ and DŽ are also considered separate letters and are positioned between Ď and E.
* In the Danish and Norwegian alphabets, the same extra vowels as in Swedish (see below) are also present but in a different order and with different glyphs (..., X, Y, Z, Æ, Ø, Å). Also, "Aa" collates as an equivalent to "Å". The Danish alphabet has traditionally seen "W" as a variant of "V", but today "W" is considered a separate letter.
* In Dutch, the combination IJ (representing the ligature Ĳ) was formerly collated as Y (or sometimes as a separate letter: Y < IJ < Z), but is currently mostly collated as two letters (II < IJ < IK). Phone directories are an exception: there IJ is always collated as Y, because in many Dutch family names Y is used where modern spelling would require IJ. Note that a word starting with ij that is written with a capital I is also written with a capital J, for example, the town IJmuiden, the river IJssel and the country IJsland (Iceland).
* In Esperanto, consonants with circumflex accents (ĉ, ĝ, ĥ, ĵ, ŝ), as well as ŭ (u with breve), are counted as separate letters and collated separately (c, ĉ, d, e, f, g, ĝ, h, ĥ, i, j, ĵ ... s, ŝ, t, u, ŭ, v, z).
* In Estonian, õ, ä, ö and ü are considered separate letters and collate after w. Letters š, z and ž appear in loanwords and foreign proper names only and follow the letter s in the Estonian alphabet, which otherwise does not differ from the basic Latin alphabet.
* The Faroese alphabet also has some of the Danish, Norwegian, and Swedish extra letters, namely Æ and Ø. Furthermore, the Faroese alphabet uses the Icelandic eth, which follows the D. Five of the six vowels (A, I, O, U and Y) can take accents and are then considered separate letters. The consonants C, Q, X, W and Z are not found. Therefore, the first five letters are A, Á, B, D and Ð, and the last five are V, Y, Ý, Æ, Ø.
* In Filipino (Tagalog) and other Philippine languages, the letter Ng is treated as a separate letter. It is pronounced as in ''sing'', ''ping-pong'', etc. By itself, it is pronounced ''nang'', but in general Filipino orthography, it is spelled as if it were two separate letters (n and g). Also, letter derivatives (such as Ñ) immediately follow the base letter. Filipino is also written with diacritics, but their use is very rare (except the tilde).
* The Finnish alphabet and collating rules are the same as those of Swedish.
* For French, the ''last'' accent in a given word determines the order. For example, in French, the following four words would be sorted this way: cote < côte < coté < côté. The letter e is ordered as e é è ê ë (œ considered as oe), same thing for o as ô ö.
* In German, letters with umlauts (Ä, Ö, Ü) are generally treated just like their non-umlauted versions; ß is always sorted as ss. This makes the alphabetic order Arbeit, Arg, Ärgerlich, Argument, Arm, Assistant, Aßlar, Assoziation. For phone directories and similar lists of names, the umlauts are to be collated like the letter combinations "ae", "oe", "ue" because a number of German surnames appear both with umlaut and in the non-umlauted form with "e" (Müller/Mueller). This makes the alphabetic order Udet, Übelacker, Uell, Ülle, Ueve, Üxküll, Uffenbach.
* The Hungarian vowels have accents, umlauts, and double accents, while consonants are written with single, double (digraph) or triple (trigraph) characters. In collating, accented vowels are equivalent to their non-accented counterparts, and double and triple characters follow their single originals. Hungarian alphabetic order is: A=Á, B, C, Cs, D, Dz, Dzs, E=É, F, G, Gy, H, I=Í, J, K, L, Ly, M, N, Ny, O=Ó, Ö=Ő, P, Q, R, S, Sz, T, Ty, U=Ú, Ü=Ű, V, W, X, Y, Z, Zs. (Before 1984, ''dz'' and ''dzs'' were not considered single letters for collation, but two letters each, d+z and d+zs instead.) This means that, for example, ''nádcukor'' should precede ''nádcsomó'' (even though ''s'' normally precedes ''u''), since ''c'' precedes ''cs'' in the collation. Difference in vowel length should only be taken into consideration if the two words are otherwise identical (e.g. ''egér, éger''). Spaces and hyphens within phrases are ignored in collation. ''Ch'' also occurs as a digraph in certain words but it is not considered a grapheme in its own right in terms of collation.
*:A particular feature of Hungarian collation is that contracted forms of double di- and trigraphs (such as ''ggy'' from ''gy + gy'' or ''ddzs'' from ''dzs + dzs'') should be collated as if they were written in full (independently of the fact of the contraction and the elements of the di- or trigraphs). For example, ''kaszinó'' should precede ''kassza'' (even though the fourth character ''z'' would normally come after ''s'' in the alphabet), because the fourth "character" (grapheme) of the word ''kassza'' is considered a second ''sz'' (decomposing ''ssz'' into ''sz + sz''), which does follow ''i'' (in ''kaszinó'').
* In Icelandic, Þ is added, and D is followed by Ð. Each vowel (A, E, I, O, U, Y) is followed by its correspondent with acute: Á, É, Í, Ó, Ú, Ý. There is no Z, so the alphabet ends: ... X, Y, Ý, Þ, Æ, Ö.
** Both letters were also used by Anglo-Saxon scribes, who also used the Runic letter Wynn to represent /w/.
** Þ (called thorn; lowercase þ) is also a Runic letter.
** Ð (called eth; lowercase ð) is the letter D with an added stroke.
* Kiowa is ordered on phonetic principles, like the Brahmic scripts, rather than on the historical Latin order. Vowels come first, then stop consonants ordered from the front to the back of the mouth, and from negative to positive voice-onset time, then the affricates, fricatives, liquids, and nasals:
:: A, AU, E, I, O, U, B, F, P, V, D, J, T, TH, G, C, K, Q, CH, X, S, Z, L, Y, W, H, M, N
* In Lithuanian, specifically Lithuanian letters go after their Latin originals. Another change is that Y comes just before J: ... G, H, I, Į, Y, J, K...
* In the Maltese alphabet, the digraphs GĦ and IE are treated as single letters, and each is listed after the first character of the pair. The dotted letters (Ċ Ġ Ż) are collated before their originals, while Ħ is after H. Accents, apostrophes and hyphens are ignored. However, when two words sort identically these diacritics are taken into consideration, such that accented letters follow non-accented.
* In Polish, specifically Polish letters derived from the Latin alphabet are collated after their originals: A, Ą, B, C, Ć, D, E, Ę, ..., L, Ł, M, N, Ń, O, Ó, P, ..., S, Ś, T, ..., Z, Ź, Ż. The digraphs for collation purposes are treated as if they were two separate letters.
* In Pinyin alphabetical order, where words have the same basic letters in pinyin and differ only in modifying diacritics, the unmodified letter comes before the modified letter. For example, e comes before ê (額 (''è'') before 欸 (''ê̄'')), and u comes before ü (路 (''lù'') before 驢 (''lǘ'') and 努 (''nǔ'') before 女 (''nǚ'')). Characters with the same pinyin letters (including the modified letters ê and ü) are arranged according to their tones in the order first tone (the "flat tone"), second tone (rising tone), third tone (falling-rising tone), fourth tone (falling tone), fifth tone (neutral tone), for example 媽 (''mā''), 麻 (''má''), 馬 (''mǎ''), 罵 (''mà''), 嗎 (''ma'').
* In Portuguese, the collating order is just like in English: A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T, U, V, W, X, Y, Z. Digraphs and letters with diacritics are not included in the alphabet.
* In Romanian, special characters derived from the Latin alphabet are collated after their originals: A, Ă, Â, ..., I, Î, ..., S, Ș, T, Ț, ..., Z.
* In Serbo-Croatian and other related South Slavic languages, the five accented characters and three conjoined characters are sorted after the originals: ..., C, Č, Ć, D, DŽ, Đ, E, ..., L, LJ, M, N, NJ, O, ..., S, Š, T, ..., Z, Ž.
* Spanish treated (until 1994) "CH" and "LL" as single letters. This is no longer the case: in 1994 the RAE adopted the more conventional usage, and now LL is collated between LK and LM, and CH between CG and CI. The six characters with diacritics Á, É, Í, Ó, Ú, Ü are treated as the original letters A, E, I, O, U. The only Spanish-specific collating question is Ñ, a different letter collated after N.
* In the Swedish alphabet, there are three extra vowels placed at its end (..., X, Y, Z, Å, Ä, Ö), similar to the Danish and Norwegian alphabet, but with different glyphs and a different collating order. The letter "W" has been treated as a variant of "V", but in the 13th edition of ''Svenska Akademiens ordlista'' (2006) "W" was considered a separate letter.
* In the Turkish alphabet there are six additional letters: ç, ğ, ı, ö, ş, and ü (but no q, w, and x). They are collated with ç after c, ğ after g, ı ''before'' i, ö after o, ş after s, and ü after u. Originally, when the alphabet was introduced in 1928, ı was collated after i, but the order was changed later so that letters having shapes containing dots, cedillas or other adorning marks always follow the letters with corresponding bare shapes. Note that in Turkish orthography the letter I is the majuscule of dotless ı, whereas İ is the majuscule of dotted i.
* In many Turkic languages (such as Azeri or the Jaꞑalif orthography for Tatar), there used to be the letter Gha (Ƣƣ), which came between G and H. It is now in disuse.
* In Vietnamese, there are seven additional letters: ă, â, đ, ê, ô, ơ, ư, while f, j, w, z are absent, even though they are still in some use (for example in Internet addresses and foreign loanwords). "f" is replaced by the combination "ph", and similarly "w" by "qu".
* In Volapük ä, ö and ü are counted as separate letters and collated separately (a, ä, b ... o, ö, p ... u, ü, v) while q and w are absent.
* In Welsh the digraphs CH, DD, FF, NG, LL, PH, RH, and TH are treated as single letters, and each is listed after the first character of the pair (except for NG which is listed after G), producing the order A, B, C, CH, D, DD, E, F, FF, G, NG, H, and so on. It can sometimes happen, however, that word compounding results in the juxtaposition of two letters which do ''not'' form a digraph. An example is the word LLONGYFARCH (composed from LLON + GYFARCH). This results in such an ordering as, for example, LAWR, LWCUS, LLONG, LLOM, LLONGYFARCH (NG is a digraph in LLONG, but not in LLONGYFARCH). The letter combination R+H (as distinct from the digraph RH) may similarly arise by juxtaposition in compounds, although this tends not to produce any pairs in which misidentification could affect the ordering. For the other potentially confusing letter combinations that may occur – namely, D+D and L+L – a hyphen is used in the spelling (e.g. AD-DAL, CHWIL-LYS).
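Several of the conventions listed above treat a digraph as a single letter. A minimal sketch of how such an ordering might be computed, using Welsh as the example, is given here. The names `WELSH_ALPHABET`, `split_letters` and `welsh_key` are hypothetical, the letters after H are filled in from the standard 29-letter Welsh alphabet rather than from the list above, and the greedy splitting is an assumption that cannot tell a true digraph from a chance juxtaposition such as N+G in LLONGYFARCH.

```python
# Welsh alphabet in collation order; each digraph counts as one letter.
WELSH_ALPHABET = [
    "a", "b", "c", "ch", "d", "dd", "e", "f", "ff", "g", "ng", "h",
    "i", "j", "l", "ll", "m", "n", "o", "p", "ph", "r", "rh", "s",
    "t", "th", "u", "w", "y",
]
RANK = {letter: index for index, letter in enumerate(WELSH_ALPHABET)}
DIGRAPHS = {letter for letter in WELSH_ALPHABET if len(letter) == 2}


def split_letters(word):
    """Greedily split a word into letters, preferring digraphs.

    Limitation: compounds in which two adjacent letters do not form a
    digraph (for example N+G in LLONGYFARCH) are split incorrectly.
    """
    word = word.lower()
    letters = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair in DIGRAPHS:
            letters.append(pair)
            i += 2
        else:
            letters.append(word[i])
            i += 1
    return letters


def welsh_key(word):
    """Sort key: the rank of each letter; unknown characters sort last."""
    return [RANK.get(letter, len(WELSH_ALPHABET)) for letter in split_letters(word)]


words = ["LLOM", "LWCUS", "LLONG", "LAWR"]
print(sorted(words, key=welsh_key))
```

A production implementation would need a word list or morphological information to handle the compound cases, which is one reason tailored collation tables are normally used instead of ad hoc code.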
Automation
Collation algorithms (in combination with sorting algorithms) are used in computer programming to place strings in alphabetical order. A standard example is the Unicode Collation Algorithm, which can be used to put strings containing any Unicode symbols into (an extension of) alphabetical order. It can be made to conform to most of the language-specific conventions described above by tailoring its default collation table. Several such tailorings are collected in the Common Locale Data Repository.
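The idea of giving base letters more weight than accents (the "secondary collating weight" described for Czech and Slovak vowels above) can be illustrated with a small sort key. This is only a simplified sketch and not an implementation of the Unicode Collation Algorithm: the function name `multilevel_key` is hypothetical, it relies on Unicode decomposition from Python's standard `unicodedata` module, and it applies no language tailoring, so letters such as Czech č, which need a primary weight of their own, are not handled correctly.

```python
import unicodedata


def multilevel_key(text):
    """Build a (primary, secondary, tertiary) sort key.

    primary   - base letters only, case-folded
    secondary - the combining marks attached to each base letter
    tertiary  - the original string, as a final tie-breaker
    """
    primary = []
    secondary = []
    for ch in unicodedata.normalize("NFD", text):
        if unicodedata.combining(ch):
            # Attach the mark to the preceding base letter, if there is one.
            if secondary:
                secondary[-1] += ch
        else:
            primary.append(ch.casefold())
            secondary.append("")
    return ("".join(primary), tuple(secondary), text)


words = ["bab", "báá", "baa", "báa", "baá"]
print(sorted(words, key=multilevel_key))
```

Because the accents are compared only after the base letters have tied, every accented variant of "baa" sorts before "bab", which matches the vowel ordering given in the Czech example.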
Similar orderings
The principle behind alphabetical ordering can still be applied in languages that do not strictly speaking use an alphabet (for example, they may be written using a syllabary or abugida), provided the symbols used have an established ordering.
For logographic writing systems, such as Chinese hanzi or Japanese kanji, the method of radical-and-stroke sorting is frequently used as a way of defining an ordering on the symbols. Japanese sometimes uses pronunciation order, most commonly with the Gojūon order but sometimes with the older Iroha ordering.
In mathematics, lexicographical order is a means of ordering sequences in a manner analogous to that used to produce alphabetical order.
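As a brief illustration, Python compares tuples lexicographically, so sorting sequences of numbers behaves like sorting words letter by letter. The sample data is arbitrary.

```python
# Tuples are compared element by element; a proper prefix sorts first.
sequences = [(1, 2, 3), (1, 2), (0, 9), (1, 10)]
ordered = sorted(sequences)
print(ordered)  # [(0, 9), (1, 2), (1, 2, 3), (1, 10)]
```

Note that (1, 2, 3) precedes (1, 10) because the comparison is decided at the second element, where 2 is less than 10.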
Some computer applications use a version of alphabetical order that can be achieved using a very simple algorithm, based purely on the ASCII or Unicode codes for characters. This may have non-standard effects such as placing all capital letters before lower-case ones. See ASCIIbetical order.
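The effect can be seen in a short sketch. Python's default string comparison works on code points, and a case-folding key is one common (though still simplistic) workaround; the word list is arbitrary.

```python
words = ["banana", "Apple", "apple", "Zebra"]

# Default comparison uses code points, so every capital letter precedes
# every lower-case letter.
by_code_point = sorted(words)
print(by_code_point)  # ['Apple', 'Zebra', 'apple', 'banana']

# Case-folding the key ignores the upper/lower-case distinction.
case_insensitive = sorted(words, key=str.casefold)
print(case_insensitive)  # ['Apple', 'apple', 'banana', 'Zebra']
```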
A rhyming dictionary is based on sorting words in alphabetical order starting from the last to the first letter of the word.
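A minimal sketch of this reverse ordering sorts each word by its reversed spelling; the helper name `rhyme_key` and the word list are illustrative only.

```python
def rhyme_key(word):
    """Sort key that compares words from the last letter to the first."""
    return word[::-1]


words = ["nation", "cat", "station", "bat", "motion"]
print(sorted(words, key=rhyme_key))
# ['nation', 'station', 'motion', 'bat', 'cat']
```

Words sharing an ending end up adjacent, which is what makes the arrangement useful for finding rhymes.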
See also
*Collation
*Sorting
*Ugaritic alphabet, giving the first example of such an ordering
Further reading
* Chauvin, Yvonne. ''Pratique du classement alphabétique''. 4th ed. Paris: Bordas, 1977.
* Flanders, Judith. ''A Place for Everything: The Curious History of Alphabetical Order''. New York: Basic Books / Hachette Books, 2020.