Use of ISO 639 codes
The language codes defined in the several sections of ISO 639 are used for bibliographic purposes and, in computing and internet environments, as a key element of locale data. The codes also find use in various applications, such as Wikipedia URLs for its different language editions.
Current and historical parts of the standard
Each part of the standard is maintained by a maintenance agency, which adds codes and changes the status of codes when needed. ISO 639-6 was withdrawn in 2014.
Characteristics of individual codes
Scopes:
* Individual languages
* Macrolanguages (Part 3)
* Collections of languages (Parts 1, 2, 5). Part 1 contains only one collection (bh), some collections were already in Part 2, and others were added only in Part 5:
** Remainder groups: 36 collections in both Parts 2 and 5 are of this kind (including one that was also coded in Part 1). For compatibility with Part 2, which was published before Part 5, the remainder groups do not contain any language or collection that was already coded in Part 2. However, new applications compatible with Part 5 may treat these groups inclusively, as long as they respect the containment hierarchy published in Part 5 and use the most specific collection when grouping languages;
** Regular groups: 29 collections in both Parts 2 and 5 are of this kind; for compatibility with Part 2, they cannot contain other groups;
** Families: 50 new collections coded only in Part 5 (including one containing a regular group already coded in Part 2); for compatibility with Part 2, they may contain other collections except remainder groups.
* … mis is not suitable), or an alpha-3 code for collections like standard codes in Part 5.
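Part 5 advises using the most specific collection. In practice, that means walking up a containment tree from an individual language. The following is a minimal sketch, not Part 5 data. The hierarchy excerpt (gmw inside gem inside ine) and the placement of deu and eng under gmw are illustrative assumptions. The names `PARENT`, `MOST_SPECIFIC` and `collections_for` are hypothetical.

```python
from typing import Dict, List

# Illustrative excerpt of a Part 5 containment hierarchy (child -> parent).
# Assumed for this sketch; real applications should load the published tables.
PARENT: Dict[str, str] = {
    "gmw": "gem",  # West Germanic languages within Germanic languages
    "gem": "ine",  # Germanic languages within Indo-European languages
}

# Hypothetical assignment of individual languages to their most specific collection.
MOST_SPECIFIC: Dict[str, str] = {
    "deu": "gmw",
    "eng": "gmw",
}


def collections_for(language: str) -> List[str]:
    """Return the collections containing `language`, most specific first."""
    chain: List[str] = []
    group = MOST_SPECIFIC.get(language)
    while group is not None:
        chain.append(group)
        group = PARENT.get(group)  # None once the top of the tree is reached
    return chain


print(collections_for("deu"))  # ['gmw', 'gem', 'ine']
```

The chain is returned most-specific-first. A caller can therefore stop at the first collection it supports, which keeps the grouping as specific as the data allows.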
Types (for individual languages):
* Living languages (Parts 1, 2, 3) (all macrolanguages are living languages)
* Extinct languages (Parts 2, 3) (those in Part 2 include chb, chg, cop, lui and sam; none are in Part 1)
* Ancient languages (Parts 1, 2, 3) (124, 19 of them are in Part 2; and 5 of them, namely ave, chu, lat, pli and san, also have a code in Part 1: ae, cu, la, pi, sa)
* Historical languages (Parts 2, 3) (83, 16 of them are in Part 2; none are in Part 1)
* Constructed languages (Parts 1, 2, 3) (23, 9 of them in Part 2: afh, epo, ido, ile, ina, jbo, tlh, vol, zbl; 5 of them in Part 1: eo, ia, ie, io, vo)
Individual languages and macrolanguages with two distinct alpha-3 codes in Part 2:
* Bibliographic (some of them were deprecated, none were defined in Part 3): these are legacy codes (based on language names in English).
* Terminologic (also defined in Part 3): these are the preferred codes (based on native language names, romanized if needed).
* All others (including collections of languages and special/reserved codes) only have a single alpha-3 code for both uses.
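T codes are the preferred form. An application that accepts both kinds of Part 2 code can therefore normalize B codes on input. The sketch below is minimal. It assumes a hypothetical lookup table, `B_TO_T`, that holds only the B/T pairs named in this article; a real implementation would load the full Part 2 list. The function name `to_terminology_code` is also hypothetical.

```python
# Hypothetical excerpt: bibliographic (B) -> terminologic (T) codes,
# limited to the pairs mentioned in this article.
B_TO_T = {
    "ger": "deu",  # German
    "per": "fas",  # Persian
    "may": "msa",  # Malay
    "alb": "sqi",  # Albanian
    "chi": "zho",  # Chinese
}


def to_terminology_code(code: str) -> str:
    """Return the T form of an ISO 639-2 alpha-3 code.

    Codes without a separate B form serve both uses, so they are
    returned unchanged (apart from lower-casing).
    """
    code = code.lower()
    return B_TO_T.get(code, code)


print(to_terminology_code("ger"))  # deu
print(to_terminology_code("eng"))  # eng
```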
Relations between the parts
The different parts of ISO 639 are designed to work together so that no code means one thing in one part and something else in another. However, not all languages are in all parts, and specific languages and other elements are treated in a variety of ways in the different parts. The treatment depends, for example, on whether a language is listed in Part 1 or 2, whether it has separate B/T codes in Part 2, or whether it is classified as a macrolanguage in Part 3. These various treatments are detailed in the following chart. In each group of rows (one for each scope of ISO 639-3), the last four columns contain codes for a representative language that exemplifies a specific type of relation between the parts of ISO 639. The second column explains the relationship, and the first column gives the number of elements that have that type of relationship. For example, four elements have a code in Part 1, have B/T codes in Part 2, and are classified as macrolanguages in Part 3. One of these four is Persian (fa/per/fas).
These differences are due to the following factors.
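In Part 2, some languages have two codes. For example, German (Part 1: de) has two codes in Part 2: ger (B code) and deu (T code). English, by contrast, has only one code in Part 2: eng.
Individual languages coded in Part 2 keep the same code in Part 3 (the T code where separate B/T codes exist), for example:
* Part 3 eng corresponds to Part 2 eng and Part 1 en
* Part 3 ast corresponds to Part 2 ast but lacks a code in Part 1.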
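Converting between parts is a table lookup in which a missing entry is meaningful: Asturian simply has no Part 1 code. The minimal sketch below assumes a hypothetical excerpt built only from codes that appear in this article. The table and function names (`ALPHA2_TO_ALPHA3`, `part1_to_part3`, `part3_to_part1`) are illustrative, not an existing library API.

```python
from typing import Optional

# Hypothetical excerpt: Part 1 alpha-2 code -> Part 2/3 alpha-3 (T) code,
# built only from examples that appear in this article.
ALPHA2_TO_ALPHA3 = {
    "de": "deu",
    "en": "eng",
    "fa": "fas",
    "ms": "msa",
    "sq": "sqi",
    "zh": "zho",
}
# Reverse table; languages such as Asturian (ast) have no Part 1 code.
ALPHA3_TO_ALPHA2 = {v: k for k, v in ALPHA2_TO_ALPHA3.items()}


def part1_to_part3(alpha2: str) -> Optional[str]:
    """Map a Part 1 code to its alpha-3 (T) code, or None if not in the excerpt."""
    return ALPHA2_TO_ALPHA3.get(alpha2.lower())


def part3_to_part1(alpha3: str) -> Optional[str]:
    """Map an alpha-3 (T) code to its Part 1 code, or None if there is none here."""
    return ALPHA3_TO_ALPHA2.get(alpha3.lower())


print(part1_to_part3("de"))   # deu
print(part3_to_part1("eng"))  # en
print(part3_to_part1("ast"))  # None
```

Part 1 codes map to T codes. A B code such as ger should therefore be normalized to deu before the reverse lookup.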
Of the codes in Part 3, 62 are macrolanguages. These are groups of multiple individual languages that have a good mutual understanding and are commonly mixed or confused. Some macrolanguages have a default standard form based on one of their individual languages. For example, Mandarin is implied by default for the Chinese macrolanguage; other individual languages may still be distinguished if needed, but the specific code cmn for Mandarin is rarely used. The 62 macrolanguages break down as follows:
* 1 macrolanguage has a Part 2 code and a Part 1 code, and its member individual languages also have codes in Parts 1 and 2: nor/no contains nno/nn and nob/nb;
* 4 macrolanguages have two Part 2 codes (B/T) and a Part 1 code: per/fas/fa, may/msa/ms, alb/sqi/sq, and chi/zho/zh;
* 28 macrolanguages have a Part 2 code but no Part 1 code;
* 29 other macrolanguages only have codes in Part 3.
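Content is often tagged with the macrolanguage code (for example zho rather than cmn). A lookup that fails on an individual language can therefore retry with its macrolanguage. The sketch below shows that fallback. It assumes a hypothetical membership table listing only a few members: Norwegian Nynorsk (nno), Norwegian Bokmål (nob) and Mandarin (cmn). `fallback_chain` is an illustrative name, not part of any standard API.

```python
from typing import Dict, List

# Hypothetical excerpt of Part 3 macrolanguage membership; the real
# tables list many more members (e.g. for zho).
MACROLANGUAGE_MEMBERS: Dict[str, List[str]] = {
    "nor": ["nno", "nob"],  # Norwegian: Nynorsk, Bokmål
    "zho": ["cmn"],         # Chinese: Mandarin (other members omitted)
}

# Invert the table so each individual language points to its macrolanguage.
MEMBER_TO_MACRO: Dict[str, str] = {
    member: macro
    for macro, members in MACROLANGUAGE_MEMBERS.items()
    for member in members
}


def fallback_chain(code: str) -> List[str]:
    """Return the code followed by its macrolanguage, if it has one."""
    chain = [code]
    macro = MEMBER_TO_MACRO.get(code)
    if macro is not None:
        chain.append(macro)
    return chain


print(fallback_chain("cmn"))  # ['cmn', 'zho']
print(fallback_chain("nob"))  # ['nob', 'nor']
print(fallback_chain("ast"))  # ['ast']
```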
Collective codes in Part 2 also have a code in Part 5: e.g. aus in Parts 2 and 5 stands for Australian languages.
* One collective code in Part 2 also has a code in Part 1: bih/bh.
* Some collective codes were added in Part 5 without a corresponding code in Part 2: e.g. sqj (Albanian languages).
Parts 2 and 3 also have a reserved range and four special codes:
* Codes qaa through qtz are reserved for local use.
* There are four special codes: mis for languages that have no code yet assigned, mul for "multiple languages", und for "undetermined", and zxx for "no linguistic content, not applicable".
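The reserved range and the special codes can be checked mechanically. The minimal sketch below assumes codes are compared case-insensitively. The function name `classify` and its return labels are hypothetical, and `und` is labeled with its ISO 639-2 name, "undetermined". The label "other" only means a code lies outside these ranges; the sketch does not check whether the code is actually assigned.

```python
import re

# The four special codes and their meanings.
SPECIAL_CODES = {
    "mis": "languages that have no code yet assigned",
    "mul": "multiple languages",
    "und": "undetermined",
    "zxx": "no linguistic content, not applicable",
}

_ALPHA3 = re.compile(r"[a-z]{3}")


def classify(code: str) -> str:
    """Classify an alpha-3 code as 'special', 'local use' or 'other'."""
    code = code.lower()
    if not _ALPHA3.fullmatch(code):
        raise ValueError(f"not an alpha-3 code: {code!r}")
    if code in SPECIAL_CODES:
        return "special"
    # Reserved range qaa-qtz: 'q', then a..t, then any letter a..z.
    if "qaa" <= code <= "qtz":
        return "local use"
    return "other"


for c in ("deu", "und", "qab", "qua"):
    print(c, classify(c))
```

Plain string comparison works here because every valid input is exactly three lowercase ASCII letters.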
Code space
Alpha-2 code space
"Alpha-2" codes (codes composed of 2 letters of the ISO basic Latin alphabet) are used in ISO 639-1. Codes were needed for more languages than 2-letter combinations can cover (a maximum of 26² = 676), so alpha-3 codes were introduced in ISO 639-2.
Alpha-3 code space
"Alpha-3" codes (codes composed of 3 letters of the ISO basic Latin alphabet) are used in ISO 639-2, ISO 639-3 and ISO 639-5, giving a maximum of 26³ = 17,576 codes. Part 2 defines four special codes (mis, mul, und, zxx) and a reserved range qaa–qtz (20 × 26 = 520 codes). It also has 20 double entries (the B/T codes), plus 2 entries with deprecated B codes. This sums to 520 + 22 + 4 = 546 codes that cannot be used in Part 3 to represent languages or in Part 5 to represent language families or groups. The remainder is 17,576 − 546 = 17,030.
There are roughly six or seven thousand languages on Earth today, so the remaining 17,030 codes are enough to assign a unique code to each language. However, some languages may end up with arbitrary codes that sound nothing like their traditional names.
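The arithmetic of this section can be checked directly in a short sketch. The variable names are illustrative. The counts of B/T and deprecated entries are taken from the text rather than derived from the code lists.

```python
ALPHABET = 26  # letters of the ISO basic Latin alphabet

alpha2_space = ALPHABET ** 2  # all possible alpha-2 codes
alpha3_space = ALPHABET ** 3  # all possible alpha-3 codes

# Alpha-3 codes unavailable to Parts 3 and 5 (figures from the text).
local_use = 20 * ALPHABET  # qaa-qtz: second letter a..t, third letter a..z
extra_bt = 20 + 2          # B/T double entries plus deprecated B codes
special = 4                # mis, mul, und, zxx

unavailable = local_use + extra_bt + special
remaining = alpha3_space - unavailable

print(alpha2_space, alpha3_space, unavailable, remaining)
# 676 17576 546 17030
```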
Alpha-4 code space (withdrawn)
"Alpha-4" codes (codes composed of 4 letters of the ISO basic Latin alphabet) were proposed to be used in ISO 639-6, which was withdrawn in 2014.
See also
* IETF language tags (based on ISO 639)
* ISO 3166 (codes for countries)
* ISO 15924 (codes for writing systems)
* Codes for constructed languages
* Language code