
The European ordering rules (EOR / EN 13710) define an ordering for strings written in languages that use the Latin, Greek and Cyrillic alphabets. The standard covers languages used by the European Union, the European Free Trade Association, and parts of the former Soviet Union. It is a tailoring of the ''Common Tailorable Template'' of ISO/IEC 14651. EOR can in turn be tailored for different (European) languages, but in inter-European contexts it can be used without further tailoring.
Method
Like ISO/IEC 14651, on which it is based, EOR has four levels of weights.
Level 1 sorts the letters. The following Latin letters are distinguished at this level, in order:
:a b c d ð e ə ɛ f g h i j k l m n o ɔ p q r s ɯ t u v w x y z þ æ
The Greek alphabet has the following order:
:α β γ δ ε Ϝ Ϛ ζ η θ ι κ λ μ ν ξ ο π Ϟ ρ σ τ υ φ χ ψ ω Ϡ
The Cyrillic script has the following order:
:а ӑ ӓ ә ӛ ӕ б в г ғ ҕ ґ д ђ ҙ е ӗ ё є є̈ ж ӝ җ ӂ з ӟ з́ ѕ ӡ и ӥ і ї й ј к қ ӄ ҡ ҟ ҝ л љ ꙥ м ꙧ н ң ӊ ҥ њ ӈ о ӧ ŏ ө ӫ ө̆ ѡ ꙍ ҩ п ҧ р с ҫ с́ т ҭ ћ у ў ӱ ӳ ү ұ ф х ҳ ӽ ѯ һ ц ҵ ч ӵ ҷ ӌ ҹ ҽ ҿ џ ш щ ъ ы ӹ ь ѣ э ю ю̆ я я̆ Ӏ ѫ ѭ ѧ ѩ ѱ ѳ ѵ ѷ ҁ ꙟ
The order for the three alphabets is:
# Latin alphabet
# Greek alphabet
# Cyrillic alphabet
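The level 1 ordering above can be illustrated with a small Python sketch. This is a simplified illustration, not the weight tables of the standard: the names `LATIN`, `GREEK`, `CYRILLIC_EXCERPT`, `RANK` and `level1_key` are hypothetical, the Cyrillic list is only an excerpt of the full order given above, and the input is assumed to be lowercase and free of diacritics.

```python
# Letter orders copied from the lists above; the Cyrillic list is an excerpt only.
LATIN = "a b c d ð e ə ɛ f g h i j k l m n o ɔ p q r s ɯ t u v w x y z þ æ".split()
GREEK = "α β γ δ ε Ϝ Ϛ ζ η θ ι κ λ μ ν ξ ο π Ϟ ρ σ τ υ φ χ ψ ω Ϡ".split()
CYRILLIC_EXCERPT = "а б в г д е ж з и к л м н о п р с т у ф х ц ч ш щ".split()

# Concatenating in script order (Latin, Greek, Cyrillic) gives every Latin
# letter a lower rank than every Greek letter, and so on.
RANK = {letter: i for i, letter in enumerate(LATIN + GREEK + CYRILLIC_EXCERPT)}

def level1_key(word):
    """Return the list of level 1 ranks for a lowercase word.

    Characters that are not in RANK are skipped in this sketch.
    """
    return [RANK[ch] for ch in word if ch in RANK]

words = ["да", "zebra", "αβγ", "þorn", "ðe"]
print(sorted(words, key=level1_key))
```

Comparing the rank lists element by element places ð between d and e and þ after z, and sorts every Latin word before any Greek or Cyrillic word.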
The Georgian and Armenian alphabets have not been included in ENV 13710. However, they are covered in CR 14400:2001 "European ordering rules – Ordering for Latin, Greek, Cyrillic, Georgian and Armenian scripts". All scripts encoded in ISO/IEC 10646 and Unicode are covered by ISO/IEC 14651 (and its datafile CTT) as well as the Unicode collation algorithm (UCA and the associated DUCET), both of which are available at no charge.
Level 2 orders the additions to the letters, such as diacritics and other variations. Letters with diacritical marks are ordered as variants of the base letter; letters that are modifications of a base letter are treated in the same way, and similarly for similar cases.
Level 2 defines the following order of diacritics and other modifications:
# Acute accent (á)
# Grave accent (à)
# Breve (ă)
# Circumflex (â)
# Caron (š)
# Ring (å)
# Diaeresis (ä)
# Double acute accent (ő)
# Tilde (ã)
# Dot (ż)
# Cedilla (ş)
# Ogonek (ą)
# Macron (ā)
# With stroke through (ø)
# Modified letter(s) (æ)
Level 3 distinguishes between capital and small letters, as in "Polish" and "polish".
Level 4 concerns punctuation and whitespace characters. This level distinguishes between "MacDonald" and "Mac Donald", and between "its" and "it's".
An optional, and usually omitted, fifth level can distinguish typographical differences, including whether the text is ''italic'', normal or bold.
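The way the four levels interact can be sketched as a multi-level sort key in Python. This is a minimal illustration under stated assumptions, not the EOR weight tables: `MARK_ORDER`, `MARK_RANK` and `sort_key` are hypothetical names, level 1 simply compares lowercased base letters by code point, small letters are assumed to sort before capitals, and letters such as ø and æ (which have no Unicode decomposition) are not treated as modified letters here.

```python
import unicodedata

# Combining marks in the level 2 order listed above: acute, grave, breve,
# circumflex, caron, ring, diaeresis, double acute, tilde, dot, cedilla,
# ogonek, macron.
MARK_ORDER = ["\u0301", "\u0300", "\u0306", "\u0302", "\u030C", "\u030A",
              "\u0308", "\u030B", "\u0303", "\u0307", "\u0327", "\u0328",
              "\u0304"]
MARK_RANK = {mark: i + 1 for i, mark in enumerate(MARK_ORDER)}  # 0 = no mark

def sort_key(text):
    """Build a (level1, level2, level3, level4) key for a string."""
    level1, level2, level3, level4 = [], [], [], []
    # NFD splits a precomposed letter such as "á" into "a" + combining acute.
    for ch in unicodedata.normalize("NFD", text):
        if ch in MARK_RANK:
            if level2:
                level2[-1] = MARK_RANK[ch]        # mark on the preceding letter
        elif unicodedata.combining(ch):
            continue                              # other marks: ignored in this sketch
        elif ch.isalpha():
            level1.append(ch.lower())             # level 1: base letter
            level2.append(0)                      # level 2: no diacritic yet
            level3.append(1 if ch.isupper() else 0)  # level 3: case (assumed small first)
        else:
            level4.append((len(level1), ch))      # level 4: punctuation, whitespace
    return (level1, level2, level3, level4)

print(sorted(["ā", "à", "á", "a"], key=sort_key))
print(sorted(["Polish", "polish", "Mac Donald", "MacDonald"], key=sort_key))
```

Because Python compares tuples left to right, a difference at a later level is consulted only when all earlier levels are equal, which is the behaviour the level descriptions imply: "áb" sorts before "az" because level 1 already decides the comparison.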
See also
* Collation
* Common Locale Data Repository (CLDR)
* Unicode
* Universal Character Set
** DIN 91379: a European Unicode subset (also includes Greek, and Cyrillic for Bulgarian) that uses UTF-8 at interfaces and normalization form C (NFC); a German 2022 standard that will be mandatory for German authorities and organizations in the exchange of data from 1 November 2024
* UTF-8
References
;Notes
* Hansson, Roger; Lindgren, Carl Göran; Ljung, Heléne; Lundén, Thomas. ''Språk och skrift i Europa''. SNS Förlag (2004).
* Küster, Marc Wilhelm. ''Geordnetes Weltbild. Die Tradition des alphabetischen Sortierens von der Keilschrift bis zur EDV. Eine Kulturgeschichte.'' Niemeyer (2006). Written by the editor of ENV 13710, it discusses in chapter 17.4 the genesis and the contents of the EOR.
External links
European Ordering Rules ENV 13710 – a "European Pre-Standard"