The European ordering rules (EOR / EN 13710) define an ordering for strings written in languages that use the Latin, Greek and Cyrillic alphabets. The standard covers languages used by the European Union, the European Free Trade Association, and parts of the former Soviet Union. It is a tailoring of the ''Common Tailorable Template'' of ISO/IEC 14651. EOR can in turn be tailored for different (European) languages, but in inter-European contexts it can be used without further tailoring.
Method
Like ISO/IEC 14651, upon which it is based, EOR has four levels of weights.
Level 1
The first level orders the letters. The Latin letters are ordered at this level as follows:
:a b c d ð e f g h i j k l m n o p q r s t u v w x y z þ
The Greek alphabet has the following order:
:α β γ δ ε ϝ ϛ ζ η θ ι κ λ μ ν ξ ο π ϟ ρ σ τ υ φ χ ψ ω ϡ
The Cyrillic script has the following order:
:а б в г ґ д ђ ѓ е ё є ж з з́ ѕ и і ї й ј к л љ м н њ о п р с с́ т ћ ќ у ў ф х ц ч џ ш щ ъ ы ь ѣ э ю я
The order for the three alphabets is:
# Latin alphabet
# Greek alphabet
# Cyrillic alphabet
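The level-1 comparison can be illustrated with a minimal Python sketch. It is not taken from the standard: the names `RANK` and `level1_key` are hypothetical, the letter lists are copied from the orders given above, and two Cyrillic entries that consist of a letter plus a combining accent are left out to keep the lookup one code point per letter. Characters that are not in the tables are simply skipped.

```python
import unicodedata

# Letter orders copied from the lists above (illustrative only).
LATIN = "a b c d ð e f g h i j k l m n o p q r s t u v w x y z þ".split()
GREEK = "α β γ δ ε ϝ ϛ ζ η θ ι κ λ μ ν ξ ο π ϟ ρ σ τ υ φ χ ψ ω ϡ".split()
# Simplification: the two accented sequences from the Cyrillic list are
# omitted here because they are not single code points.
CYRILLIC = ("а б в г ґ д ђ ѓ е ё є ж з ѕ и і ї й ј к л љ м н њ о п р с т ћ ќ "
            "у ў ф х ц ч џ ш щ ъ ы ь ѣ э ю я").split()

# Latin first, then Greek, then Cyrillic, as stated in the text.
ALPHABET = [unicodedata.normalize("NFC", ch) for ch in LATIN + GREEK + CYRILLIC]
RANK = {ch: i for i, ch in enumerate(ALPHABET)}


def level1_key(text):
    """Return the list of level-1 ranks for the letters of `text`."""
    normalized = unicodedata.normalize("NFC", text).lower()
    # Assumption of this sketch: characters without a rank are ignored.
    return [RANK[ch] for ch in normalized if ch in RANK]


words = ["яблоко", "zebra", "αλφα", "apple", "þorn", "dog", "ðe"]
print(sorted(words, key=level1_key))
# ['apple', 'dog', 'ðe', 'zebra', 'þorn', 'αλφα', 'яблоко']
```

Note how `ðe` sorts between `dog` and words starting with `e`, and `þorn` after `zebra`, following the Latin order above rather than code point order.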
The Georgian and Armenian alphabets had not been included in ENV 13710:2000. However, they were covered in CR 14400:2001 "European ordering rules – Ordering for Latin, Greek, Cyrillic, Georgian and Armenian scripts". Both have been incorporated in and replaced by EN 13710:2011 (CEN/CENELEC: EN 13710:2011-09 ''European Ordering Rules - Ordering of characters from Latin, Greek, Cyrillic, Georgian and Armenian scripts'').
All scripts encoded in ISO/IEC 10646 (Unicode) are covered by ISO/IEC 14651 (and its datafile CTT) as well as by the Unicode collation algorithm (UCA and the associated DUCET), both of which are available at no charge.
Level 2
The second level orders additions to the letters, such as diacritics and other variations. Letters with diacritical marks are ordered as variants of the base letter. Certain other letters are likewise ordered as modifications of other letters, and similar cases are treated the same way.
Level 2 defines the following order of diacritics and other modifications:
# Acute accent (á)
# Grave accent (à)
# Breve (ă)
# Circumflex (â)
# Caron (š)
# Ring (å)
# Diaeresis (ä)
# Double acute accent (ő)
# Tilde (ã)
# Dot (ż)
# Cedilla (ç)
# Ogonek (ą)
# Macron (ā)
# With stroke through (ø)
# Modified letter(s) (æ)
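The diacritic order above can be sketched in Python by decomposing each character (Unicode normalization form D) and giving every combining mark a weight that follows the list. This is only an illustration under stated assumptions: `MARK_WEIGHT`, `split_levels` and `level2_key` are hypothetical names, base letters are compared by code point rather than by the full level-1 order, only the last mark on a letter is counted, and the last two list entries (stroke and modified letters) are left out because they do not decompose into a base letter plus a combining mark.

```python
import unicodedata

# Combining marks in the level-2 order listed above.
MARK_ORDER = [
    "\u0301",  # acute accent
    "\u0300",  # grave accent
    "\u0306",  # breve
    "\u0302",  # circumflex
    "\u030C",  # caron
    "\u030A",  # ring above
    "\u0308",  # diaeresis
    "\u030B",  # double acute accent
    "\u0303",  # tilde
    "\u0307",  # dot above
    "\u0327",  # cedilla
    "\u0328",  # ogonek
    "\u0304",  # macron
]
# Weight 0 is reserved for "no diacritic".
MARK_WEIGHT = {mark: i + 1 for i, mark in enumerate(MARK_ORDER)}
# Assumption of this sketch: marks not in the list sort after all listed ones.
UNKNOWN_MARK = len(MARK_ORDER) + 1


def split_levels(text):
    """Split text into base letters (level 1) and diacritic weights (level 2)."""
    base, weights = [], []
    for ch in unicodedata.normalize("NFD", text):
        if unicodedata.combining(ch):
            if weights:  # attach the mark to the preceding base letter
                weights[-1] = MARK_WEIGHT.get(ch, UNKNOWN_MARK)
        else:
            base.append(ch)
            weights.append(0)
    return "".join(base), tuple(weights)


def level2_key(text):
    # Base letters are compared first; diacritics only break ties.
    return split_levels(text.lower())


variants = ["ā", "ä", "a", "à", "á", "å", "ã", "â", "ă", "ą"]
print(sorted(variants, key=level2_key))
# ['a', 'á', 'à', 'ă', 'â', 'å', 'ä', 'ã', 'ą', 'ā']
```

Because the base letters form the first element of the key, a diacritic never outweighs a difference in letters: `áa` sorts before `az`.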
Level 3
The third level distinguishes between capital and small letters, as in "Polish" and "polish".
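As a rough illustration, case can be treated as a tie-breaker that is consulted only after the letters themselves compare equal. The sketch below is hypothetical (`case_key` is not a name from the standard), compares letters by code point after case folding, and assumes that small letters sort before capitals; the text above does not state the direction.

```python
def case_key(text):
    """Compare case-insensitively first, then use case as a tie-breaker."""
    folded = text.casefold()
    # Assumed direction: 0 for small letters, 1 for capitals.
    case_flags = tuple(1 if ch.isupper() else 0 for ch in text)
    return (folded, case_flags)


words = ["Polish", "polka", "polish", "Pole"]
print(sorted(words, key=case_key))
# ['Pole', 'polish', 'Polish', 'polka']
```

A plain code point sort would instead place every capitalized word before every lowercase one, separating "Polish" from "polish".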
Level 4
The fourth level concerns punctuation and whitespace characters. This level makes the distinction between "MacDonald" and "Mac Donald", and between "its" and "it's".
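One way to picture this level is to ignore punctuation and whitespace during the main comparison and use them only to separate strings that are otherwise equal. The following Python sketch does that; `is_ignorable` and `level4_key` are hypothetical names, the test for "punctuation and whitespace" uses Unicode general categories as an approximation, and placing the string without such characters first is an assumption of the sketch.

```python
import unicodedata


def is_ignorable(ch):
    """True for whitespace and punctuation (Unicode categories Z* and P*)."""
    category = unicodedata.category(ch)
    return ch.isspace() or category.startswith(("Z", "P"))


def level4_key(text):
    # Main comparison: the letters only, case-folded.
    letters = "".join(ch for ch in text if not is_ignorable(ch)).casefold()
    # Tie-breaker: where the ignored characters occur and what they are.
    ignored = tuple((i, ch) for i, ch in enumerate(text) if is_ignorable(ch))
    return (letters, ignored)


names = ["Mac Donald", "MacDougal", "MacDonald", "it's", "its", "item"]
print(sorted(names, key=level4_key))
# ['item', 'its', "it's", 'MacDonald', 'Mac Donald', 'MacDougal']
```

With this key, "Mac Donald" stays next to "MacDonald" and "it's" next to "its", instead of being moved elsewhere by the space or apostrophe.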
Level 5
An optional, and usually omitted, fifth level can distinguish typographical differences, including whether the text is ''italic'', normal or bold.
See also
* Collation
* Common Locale Data Repository (CLDR)
* Unicode
* Universal Character Set
** DIN 91379 – a European Unicode subset (also includes Greek and Cyrillic for Bulgarian), uses UTF-8 at interfaces, normalization form C (NFC) – a German 2022 standard; will be mandatory for German authorities and organizations in the exchange of data from 1 November 2024
* UTF-8
References
;Notes
* Hansson, Roger; Lindgren, Carl Göran; Ljung, Heléne; Lundén, Thomas. ''Språk och skrift i Europa''. SNS Förlag. (2004)
* Küster, Marc Wilhelm: ''Geordnetes Weltbild. Die Tradition des alphabetischen Sortierens von der Keilschrift bis zur EDV. Eine Kulturgeschichte.'' Niemeyer (2006). Written by the editor of ENV 13710, it discusses the genesis and the contents of the EOR in chapter 17.4.
External links
* European Ordering Rules
* ENV 13710 – a "European Pre-Standard"