The European ordering rules (EOR / EN 13710) define an ordering for strings written in languages that use the Latin, Greek and Cyrillic alphabets. The standard covers languages used by the European Union, the European Free Trade Association, and parts of the former Soviet Union. It is a tailoring of the ''Common Tailorable Template'' of ISO/IEC 14651. EOR can in turn be tailored for different (European) languages, but in inter-European contexts it can be used without further tailoring.

Method

Just as for ISO/IEC 14651, upon which EOR is based, EOR has four levels of weights.
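The idea of comparing strings by several levels of weights can be illustrated with a minimal sketch. It is not taken from the standard: the helper `multilevel_key` and the two toy levels are hypothetical, and the choice to place small letters before capitals on a tie is an assumption made only for this example. A later level is consulted only when all earlier levels compare equal.

```python
from typing import Callable, List, Sequence, Tuple

# A "level" maps a string to a list of integer weights (hypothetical type).
Level = Callable[[str], List[int]]


def multilevel_key(text: str, levels: Sequence[Level]) -> Tuple[Tuple[int, ...], ...]:
    """Build one sort key made of the weight lists of every level, in order."""
    return tuple(tuple(level(text)) for level in levels)


def toy_letters(text: str) -> List[int]:
    # Toy first level: code points of the lowercased text (not the EOR tables).
    return [ord(ch) for ch in text.lower()]


def toy_case(text: str) -> List[int]:
    # Toy later level: 0 for small letters, 1 for capitals (assumed direction).
    return [1 if ch.isupper() else 0 for ch in text]


LEVELS = [toy_letters, toy_case]
words = ["polish", "Pole", "Polish", "pole"]
ordered = sorted(words, key=lambda w: multilevel_key(w, LEVELS))
print(ordered)
```

Because Python compares tuples element by element, the case weights only decide the order of "pole"/"Pole" and "polish"/"Polish", whose letter weights are identical.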

Level 1

The first level sorts the letters. The following Latin letters are ordered at this level, in sequence:
:a b c d ð e f g h i j k l m n o p q r s t u v w x y z þ
The Greek alphabet has the following order:
:α β γ δ ε ϝ ϛ ζ η θ ι κ λ μ ν ξ ο π ϟ ρ σ τ υ φ χ ψ ω ϡ
The Cyrillic script has the following order:
:а б в г ґ д ђ ѓ е ё є ж з з́ ѕ и і ї й ј к л љ м н њ о п р с с́ т ћ ќ у ў ф х ц ч џ ш щ ъ ы ь ѣ э ю я
The order for the three alphabets is:
# Latin alphabet
# Greek alphabet
# Cyrillic alphabet

The Georgian and Armenian alphabets were not included in ENV 13710:2000. However, they were covered in CR 14400:2001 "European ordering rules – Ordering for Latin, Greek, Cyrillic, Georgian and Armenian scripts". Both documents have been incorporated in and replaced by EN 13710:2011 (CEN/CENELEC: EN 13710:2011-09, ''European Ordering Rules - Ordering of characters from Latin, Greek, Cyrillic, Georgian and Armenian scripts'').

All scripts encoded in ISO/IEC 10646 (Unicode) are covered by ISO/IEC 14651 (and its datafile CTT) as well as the Unicode collation algorithm (UCA and the associated DUCET), both of which are available at no charge.
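The first-level ordering above can be sketched as a lookup table built from the three letter sequences, with Latin before Greek before Cyrillic. This is only an illustration under stated assumptions: the function `level1_key` is hypothetical, the text is decomposed with NFD so that diacritics not listed as separate letters are skipped at this level, and any character outside the table (digits, punctuation, the Greek final sigma, Georgian and Armenian letters) is simply ignored.

```python
import unicodedata

LATIN = "a b c d ð e f g h i j k l m n o p q r s t u v w x y z þ".split()
GREEK = "α β γ δ ε ϝ ϛ ζ η θ ι κ λ μ ν ξ ο π ϟ ρ σ τ υ φ χ ψ ω ϡ".split()
CYRILLIC = ("а б в г ґ д ђ ѓ е ё є ж з з́ ѕ и і ї й ј к л љ м н њ о п р с с́ "
            "т ћ ќ у ў ф х ц ч џ ш щ ъ ы ь ѣ э ю я").split()

# Script order from the text: Latin, then Greek, then Cyrillic.
# Tokens are stored in NFD, so letters such as й or з́ become two code points.
ORDER = [unicodedata.normalize("NFD", tok) for tok in LATIN + GREEK + CYRILLIC]
WEIGHT = {tok: i for i, tok in enumerate(ORDER)}
MAXLEN = max(len(tok) for tok in ORDER)


def level1_key(text: str) -> list:
    """Return first-level weights, matching the longest table entry first."""
    s = unicodedata.normalize("NFD", text.lower())
    weights, i = [], 0
    while i < len(s):
        for n in range(MAXLEN, 0, -1):
            tok = s[i:i + n]
            if tok in WEIGHT:
                weights.append(WEIGHT[tok])
                i += n
                break
        else:
            i += 1  # not in the table: ignored in this sketch
    return weights


print(sorted(["e", "ð", "d"], key=level1_key))
print(sorted(["яблоко", "zebra", "αλφα", "apple"], key=level1_key))
```

With this table, ð sorts between d and e rather than after z as it would by raw code point, and "café" and "cafe" receive the same first-level key.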

Level 2

The second level orders additions to the letters, such as diacritics and other variations. Letters with diacritical marks are ordered as variants of the base letter. Certain other letters are ordered as modifications of a base letter, and similar cases are treated the same way. Level 2 defines the following order of diacritics and other modifications:
# Acute accent (á)
# Grave accent (à)
# Breve (ă)
# Circumflex (â)
# Caron (š)
# Ring (å)
# Diaeresis (ä)
# Double acute accent (ő)
# Tilde (ã)
# Dot (ż)
# Cedilla (ç)
# Ogonek (ą)
# Macron (ā)
# With stroke through (ø)
# Modified letter(s) (æ)
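The ranking of diacritics listed above can be sketched with Unicode decomposition. The following is a simplified illustration, not the standard's weight tables: the names `MARKS` and `level2_key` are hypothetical, an unmarked letter is assumed to sort before every marked variant, only the last ranked mark on a letter is kept, and the final two list entries (stroke and modified letters such as ø and æ) are left out because they do not decompose into a base letter plus a combining mark.

```python
import unicodedata

# Combining marks in the order given by the list above (entries 1 to 13).
MARKS = [
    "\u0301",  # acute
    "\u0300",  # grave
    "\u0306",  # breve
    "\u0302",  # circumflex
    "\u030C",  # caron
    "\u030A",  # ring above
    "\u0308",  # diaeresis
    "\u030B",  # double acute
    "\u0303",  # tilde
    "\u0307",  # dot above
    "\u0327",  # cedilla
    "\u0328",  # ogonek
    "\u0304",  # macron
]
RANK = {mark: i + 1 for i, mark in enumerate(MARKS)}


def level2_key(text: str) -> tuple:
    """Return (base letters, diacritic ranks); rank 0 means no ranked mark."""
    base, ranks = [], []
    for ch in unicodedata.normalize("NFD", text.lower()):
        if unicodedata.combining(ch):
            if ranks and ch in RANK:
                ranks[-1] = RANK[ch]  # simplification: last ranked mark wins
        else:
            base.append(ch)
            ranks.append(0)
    return (base, ranks)


variants = ["ā", "ã", "a", "à", "á", "â", "ä", "å", "ă", "ą"]
print(sorted(variants, key=level2_key))
print(sorted(["az", "äb", "ab"], key=level2_key))
```

The second call shows that the diacritic ranks act only as a tiebreaker: "äb" sorts before "az" because the base letters are compared first.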

Level 3

The third level distinguishes between capital and small letters, as in "Polish" and "polish".

Level 4

The fourth level concerns punctuation and whitespace characters. This level makes the distinction between "MacDonald" and "Mac Donald", and between "its" and "it's".
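A minimal sketch of this tiebreak follows, assuming that punctuation and whitespace carry no weight at the earlier levels and that, on a tie, the string without such characters comes first. Both points are assumptions made for illustration, the function `level4_key` is hypothetical, and the case distinction of the third level is skipped to keep the example short.

```python
def level4_key(text: str) -> tuple:
    """Return (letters, specials): specials are consulted only on a tie."""
    letters = []
    specials = []
    for ch in text:
        if ch.isalnum():
            letters.append(ch.lower())
        else:
            # Record where the punctuation or space occurs among the letters.
            specials.append((len(letters), ch))
    return (letters, specials)


names = ["it's", "itself", "its", "it", "Mac Donald", "MacDonald"]
print(sorted(names, key=level4_key))
```

A plain code-point sort would place "it's" before "its" because the apostrophe precedes every letter. In this sketch the two strings tie on their letters and are separated only by the recorded punctuation.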

Level 5

An optional, and usually omitted, fifth level can distinguish typographical differences, including whether the text is ''italic'', normal or bold.

References

;Notes
* Hansson, Roger; Lindgren, Carl Göran; Ljung, Heléne; Lundén, Thomas. ''Språk och skrift i Europa''. SNS Förlag. (2004)
* Küster, Marc Wilhelm: ''Geordnetes Weltbild. Die Tradition des alphabetischen Sortierens von der Keilschrift bis zur EDV. Eine Kulturgeschichte.'' Niemeyer (2006). Written by the editor of ENV 13710, it discusses in chapter 17.4 the genesis and the contents of the EOR.

External links

European Ordering Rules
ENV 13710 – a "European Pre-Standard"