DIN 91379

The DIN standard DIN 91379 "Characters and defined character sequences in Unicode for the electronic processing of names and data exchange in Europe, with CD-ROM" defines a normative subset of Unicode Latin characters, sequences of base characters and diacritical marks, and special characters for use in names of persons, legal entities, products, addresses, etc. The standard defines a normative mapping of Latin letters to the base letters A–Z as an extension of the recommendations of the International Civil Aviation Organization (ICAO). In the informative part of the standard, a set of extended characters is defined, which includes Greek and Cyrillic letters as well as other special characters for names of legal entities and product names.


Languages and scripts supported

The subset supports all official languages of the European Union countries as well as the official languages of Iceland, Liechtenstein, Norway and Switzerland, and also the minority languages of Germany. To support languages that do not use the Latin script, the set of normative letters contains all combinations of Latin letters with diacritical marks that are necessary for the transliteration of names into the Latin script according to the ISO standards relevant at the time of publication.

The standard supports the characters needed for entries in civil status registers. According to the ''Law on the Convention of September 13, 1973 on the recording of surnames and forenames in civil status registers'', information in Latin characters must be taken over exactly as written, including all diacritical marks, and information in other scripts must be reproduced by transliteration, in accordance with ISO standards where possible.

Support for non-European languages that use the Latin script is incomplete. Vietnamese, for example, is supported, but the following are not: the Togolese national languages Ewe (ɖ, ɛ, ƒ, ɣ, ɔ, ʋ are missing) and Kabiye (ɖ, ɛ, ɣ, ɩ, ɔ, ʊ are missing), the South African official language Tshivenda (ḓ, ḽ, ṋ, ṱ are missing), the Namibian national language Khoekhoegowab (the click letters ǀ, ǁ, ǃ, ǂ are missing), and Tongan (the fakauʻa is missing). Although the characters in parentheses appear in personal names in the respective countries, the standard does not specify any transliteration or mapping rules for writing such names in basic Latin letters.

In addition to the normative characters, the standard defines subsets of extended characters that contain modern Greek letters for Greece and Cyprus, Cyrillic letters for Bulgaria, and special characters for names of products and legal entities. Conforming applications may support additional characters; however, for interface agreements or registers it may be appropriate to support only a fixed subset of characters and sequences based on this standard.

The text of the predecessor, DIN SPEC 91379, together with explanations and lists of characters and sequences as Excel and XML files, is published by the ''Koordinierungsstelle für IT-Standards'' (KoSIT). This publication also contains an XML schema file with patterns for checking whether text conforms to the subsets defined in the standard. Lists of the characters and sequences of DIN SPEC 91379 and DIN 91379 as plain text files are available on GitHub under ''DIN 91379 Characters and Sequences''. Compared with DIN SPEC 91379, DIN 91379 contains a few additional characters and sequences.


Application of the standard

From 1 November 2024, all IT procedures used for data exchange within and between the German federal and state governments, or for data exchange with citizens and companies, must comply with DIN 91379. The architecture guideline for German federal IT requires the use of the predecessor, DIN SPEC 91379, in the version of July 2022. Continuous text and historical letters are outside the scope of the standard.


Structure of the standard

The DIN standard consists of a normative and an informative part. The requirements in the normative part are binding for all compliant systems. The normative part specifies the letters for processing names with basic Latin letters and diacritics; all compliant systems must support these letters. Furthermore, it defines a mapping of the normative letters to the basic Latin letters A–Z. A compliant system may support letters beyond the normative ones. The recommendations in the informative part are not binding for compliant systems. The informative part specifies a Unicode subset of extended letters, e.g. for legal entities, product names and data exchange in the EU. In addition, the informative part defines data types that can be used for checking data fields.


Normative part


Compliance

To be compliant with the standard, a system must
* support all normative letters and sequences at all processing stages,
* use the UTF-8 encoding at interfaces, and
* normalize characters to Unicode Normalization Form C (NFC).
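The three compliance requirements can be illustrated with a minimal Python sketch that uses only the standard library's `unicodedata` module. The function names `prepare_for_interface` and `receive_from_interface` are hypothetical, and the policy of normalizing outgoing names while rejecting incoming text that is not valid UTF-8 or not in NFC is an assumption made for illustration, not a rule taken from the standard.

```python
import unicodedata


def prepare_for_interface(name: str) -> bytes:
    """Normalize a name to NFC and encode it as UTF-8 for transmission."""
    return unicodedata.normalize("NFC", name).encode("utf-8")


def receive_from_interface(data: bytes) -> str:
    """Decode UTF-8 input and reject text that is not already in NFC."""
    text = data.decode("utf-8")  # raises UnicodeDecodeError for invalid UTF-8
    if not unicodedata.is_normalized("NFC", text):  # available since Python 3.8
        raise ValueError("text is not in Unicode normalization form C")
    return text


# "u" + COMBINING DIAERESIS is composed into the single code point U+00FC ...
assert prepare_for_interface("Mu\u0308ller") == b"M\xc3\xbcller"
# ... but "n" + COMBINING DIAERESIS has no precomposed form and remains a
# two-code-point sequence, which is why the standard also lists sequences.
assert unicodedata.normalize("NFC", "n\u0308") == "n\u0308"
```

The second assertion shows why NFC alone does not reduce every name to single code points: some letter combinations required for transliteration exist in Unicode only as a base letter followed by combining marks.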


Normative letters

Any conforming IT system must be able to process the normative letters in all name fields. This includes the collection, storage, transmission, display, and printout. The normative character groups are given below. The associated characters can also be found in ''DIN 91379 Characters and Sequences'' for machine processing. The following tables of characters were generated from the XML file ''chars.xml'' in the DIN appendix.


Latin letters (bll)

These letters must be supported to represent names, especially personal names.


Non-letters N1 (bnlreq)

These characters must be supported to represent names, especially personal names.


Non-letters N2 (bnl)

These characters must be supported to represent names in a broader sense, e.g. place names, street names, house numbers, legal entity names, and product names. They are not suitable for personal names.


Non-letters N3 (bnlopt)

These characters are included for backwards compatibility with the standard ''Latin characters in Unicode. Version 1.1.1''. They are not relevant for personal names or other names, only for legal entity names and product names.


Non-letters N4 (bnlnot)

These whitespace characters are unsuitable for representing names, but they must be processed. The character ''NO-BREAK SPACE'' is needed to prevent a line break in special names where a break could change the meaning. The other characters are included for backwards compatibility with the standard ''Latin characters in Unicode. Version 1.1.1''.
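The NFC requirement is what keeps NO-BREAK SPACE intact: compatibility normalization (NFKC) would replace U+00A0 by an ordinary SPACE and thus remove the protection against a line break, whereas NFC leaves it unchanged. The following short Python check uses the standard `unicodedata` module; the sample string is hypothetical.

```python
import unicodedata

NBSP = "\u00a0"  # NO-BREAK SPACE
sample = "Name" + NBSP + "Suffix"  # hypothetical name containing a no-break space

nfc = unicodedata.normalize("NFC", sample)    # normalization form required by DIN 91379
nfkc = unicodedata.normalize("NFKC", sample)  # compatibility form, shown only for contrast

print(NBSP in nfc)   # True: NFC keeps U+00A0
print(NBSP in nfkc)  # False: NFKC turns it into U+0020
```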


Deprecated letters

Existing documents and register entries contain deprecated letters that are no longer used today. These letters must be supported by compliant IT systems. When creating new entries, deprecated letters should not be used.
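One way to honour both rules is to check for deprecated letters only when a new entry is created, while existing records continue to be stored and processed unchanged. In the Python sketch below, the contents of `DEPRECATED_LETTERS` are a hypothetical placeholder: U+0149 (LATIN SMALL LETTER N PRECEDED BY APOSTROPHE) is used only because Unicode itself deprecates it, and the real list must be taken from the standard. Rejecting such entries outright is likewise an assumed policy; the standard only says that deprecated letters should not be used.

```python
import unicodedata

# Hypothetical placeholder: replace with the deprecated letters listed in
# DIN 91379 (e.g. taken from chars.xml). U+0149 is deprecated by Unicode itself.
DEPRECATED_LETTERS = frozenset({"\u0149"})


def deprecated_letters_in(name: str) -> list[str]:
    """Return the deprecated letters occurring in the NFC form of a name."""
    nfc = unicodedata.normalize("NFC", name)  # NFC leaves U+0149 unchanged
    return [ch for ch in nfc if ch in DEPRECATED_LETTERS]


def check_new_entry(name: str) -> None:
    """Reject deprecated letters in new entries (policy assumed for illustration)."""
    found = deprecated_letters_in(name)
    if found:
        codes = ", ".join(f"U+{ord(ch):04X}" for ch in found)
        raise ValueError(f"deprecated letters in new entry: {codes}")
```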


Normative mapping of Latin letters to basic letters (search form)

A normative mapping of all normative letters to the basic Latin letters A–Z is given below. This mapping is required, for example, for the machine-readable zone of passports. Another application is the creation of search forms, so that names can be found even if they are spelled differently or without specifying the diacritics. The following table is based on table 9 of DIN 91379 and chapter 6, table A of the ICAO specifications for machine-readable travel documents. The table was created with the information from the XML file ''chars.xml'' in the DIN 91379 appendix. Entries that appear in the ICAO specification and in table 9 of DIN are marked with ''ICAO'' in the ''Mapping'' column, additional entries in table 9 of the DIN are marked with ''EXT''. In the ''Type'' column, ''ID'' is specified for entries that describe an identity mapping, and ''MAP'' for other mappings.
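How such a mapping can serve as a search form is sketched below in Python. `SEARCH_FORM_EXCERPT` is a hypothetical excerpt written in the spirit of table 9 and the ICAO specification, not a copy of either; the authoritative entries must be read from ''chars.xml''. The fallback for letters missing from the excerpt (canonical decomposition followed by dropping combining marks) is an assumption of this sketch and not the standard's rule.

```python
import unicodedata

# Hypothetical excerpt in the spirit of DIN 91379 table 9 / ICAO Doc 9303;
# the authoritative mapping must be taken from chars.xml.
SEARCH_FORM_EXCERPT = {
    "\u00c4": "AE",  # Ä
    "\u00d6": "OE",  # Ö
    "\u00dc": "UE",  # Ü
    "\u1e9e": "SS",  # ẞ (capital sharp s)
    "\u00c6": "AE",  # Æ
    "\u00d8": "OE",  # Ø
    "\u00c5": "AA",  # Å
    "\u00de": "TH",  # Þ
}


def search_form(name: str) -> str:
    """Map a name to an upper-case search form; non-letters pass through."""
    # str.upper() applies full case mapping, so "ß" already becomes "SS" here.
    upper = unicodedata.normalize("NFC", name).upper()
    result = []
    for ch in upper:
        if ch in SEARCH_FORM_EXCERPT:
            result.append(SEARCH_FORM_EXCERPT[ch])
        else:
            # Assumed fallback, NOT the standard's rule: decompose canonically
            # and drop combining marks (e.g. Ř -> R). Letters without a
            # canonical decomposition, such as Đ, pass through unchanged.
            decomposed = unicodedata.normalize("NFD", ch)
            result.append("".join(c for c in decomposed if not unicodedata.combining(c)))
    return "".join(result)


# Two spellings of the same name produce the same search form.
print(search_form("Müller"), search_form("Mueller"))  # MUELLER MUELLER
```

Comparing search forms rather than the original strings lets a search for "Mueller" find a record stored as "Müller", which is the search-form use case described above.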


Informative part


Extended letters

Each conforming IT system should be able to handle the extended letters for all name fields. This includes the collection, storage, transmission, display, and printout.


Greek letters (gl)

For cross-border data exchange, every IT system should support Greek letters in name fields.


Cyrillic letters (cl)

For cross-border data exchange, every IT system should support Cyrillic letters in name fields for Bulgarian names.
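Because Greek (gl) and Cyrillic (cl) letters form separate informative subsets, an application may want to know which script a name draws on, for example to decide whether it fits a field restricted to the normative Latin letters. The Python sketch below classifies letters by the prefix of their Unicode character names. This is a rough heuristic assumed for illustration; the standard's own subset lists remain authoritative, and `scripts_used` is a hypothetical name.

```python
import unicodedata

SCRIPTS = ("LATIN", "GREEK", "CYRILLIC")


def scripts_used(name: str) -> set[str]:
    """Roughly classify the letters of a name by the script prefix of their Unicode names."""
    found = set()
    for ch in unicodedata.normalize("NFC", name):
        if not ch.isalpha():
            continue  # skip spaces, hyphens, apostrophes and combining marks
        char_name = unicodedata.name(ch, "")
        # Heuristic: e.g. "GREEK SMALL LETTER ALPHA" starts with "GREEK ".
        script = next((s for s in SCRIPTS if char_name.startswith(s + " ")), "OTHER")
        found.add(script)
    return found
```

A result such as `{"LATIN", "CYRILLIC"}` flags a mixed-script entry, which an application might route to manual review.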


Non-letters E1 (enl)

These characters should be supported for legal entity names and product names.


Technical data types (informative)

For information, technical data types are defined as subsets of the letters defined in the standard. These can be used for interface agreements, for technical checks or as a basis for creating your own data types. An implementation as an XML schema type is included in the ''din-91379-datatypes.xsd'' file attached to the standard. This implementation is also freely available under the CC BY-ND license as part of the XOEV library.
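The XML schema types check text against patterns; the same idea can be sketched in Python. `ALLOWED_SEQUENCES` below is a hypothetical toy subset, not one of the standard's data types, and `conforms` is a hypothetical function name. The sketch normalizes the text to NFC and then consumes it greedily, always trying the longest allowed character or sequence first, and rejects the text as soon as a position cannot be matched.

```python
import unicodedata

# Hypothetical, tiny stand-in for a DIN 91379 data type. The real subsets are
# defined in din-91379-datatypes.xsd and chars.xml and are far larger.
ALLOWED_SEQUENCES = {
    *"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz",
    "\u00c4", "\u00d6", "\u00dc", "\u00e4", "\u00f6", "\u00fc", "\u00df",  # ÄÖÜäöüß
    " ", "-", "'",
    "n\u0308",  # n + COMBINING DIAERESIS: a sequence with no precomposed form
}
MAX_LEN = max(len(s) for s in ALLOWED_SEQUENCES)


def conforms(text: str) -> bool:
    """Check that the NFC text splits entirely into allowed characters and sequences."""
    s = unicodedata.normalize("NFC", text)
    i = 0
    while i < len(s):
        # Try the longest candidate first so base letter + marks count as one unit.
        for length in range(min(MAX_LEN, len(s) - i), 0, -1):
            if s[i:i + length] in ALLOWED_SEQUENCES:
                i += length
                break
        else:
            return False  # no allowed character or sequence starts at position i
    return True
```

Longest-match-first matters because a lone combining mark is not an allowed unit: "n" followed by U+0308 is accepted only when both code points are consumed together as one sequence.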


Added letters

Compared with DIN SPEC 91379, some letters have been added; only two of them are not deprecated.


Current state

Results of the standardization process so far include the specification DIN SPEC 91379, published in March 2019, and the final DIN standard, published in August 2022. The CEN/TC 224/WG 19 working group is developing this standard further into the European standard EN 00224284 in project 04301181. According to AFNOR Norminfo, the project started in December 2024 with a design phase; a public inquiry is scheduled to start in April 2026, and publication of the standard is planned for November 2027.


Open-source software supporting DIN 91379

* Free Java libraries for creating and editing PDF supporting DIN 91379:
** OpenPDF (project inactive since November 2024)
** OpenPDFSaucer, which contains a maintained fork of OpenPDF
* Free converter from XSL formatting objects to PDF:
** Apache FOP
* Free fonts for DIN 91379:
** Arimo
** Noto Latin, Greek, Cyrillic; see also the issue "Combining comma above right" at wrong position
** Sudo coding font


Related standards


Keyboard standard DIN 2137

The German keyboard layouts E1 and E2 standardized in DIN 2137-1 make it possible to enter all characters listed in DIN 91379, except the Cyrillic letters, without recourse to their Unicode value or decimal code. Achieving this was one of the main reasons for revising these keyboard layouts compared with the previous version, DIN 2137-1:2018-12 (DIN 2137-1:2023-08, section “Vorwort” [Foreword], subsection “Änderungen” [Changes], page 4; full text subject to charge).


Character naming and spelling standard DIN 5009

The version DIN 5009:2022-06 “Word and information processing for office applications — Announcing and dictating of text and characters”, published in May 2022 together with its supplement "Announcing, naming and keyboard input of special letters and characters", contains German-language names, spelling rules and spelling announcement words for all characters listed in DIN 91379, except some outdated characters and the Greek and Cyrillic letters. This ensures that the characters can be reproduced correctly in oral communication (e.g. on the telephone).


External links

* Adobe Glyph List