ISO 639-3:2007, ''Codes for the representation of names of languages – Part 3: Alpha-3 code for comprehensive coverage of languages'', is an international standard for language codes in the ISO 639 series. It defines three-letter codes for identifying languages. The standard was published by the International Organization for Standardization (ISO) on 1 February 2007. As of 2023, this edition of the standard has been officially withdrawn and replaced by ISO 639:2023.

ISO 639-3 extends the ISO 639-2 alpha-3 codes with an aim to cover all known natural languages. The extended language coverage was based primarily on the language codes used in the ''Ethnologue'' (volumes 10–14) published by SIL International, which is now the registration authority for ISO 639-3. It provides an enumeration of languages as complete as possible, including living and extinct, ancient and constructed, major and minor, written and unwritten. However, it does not include reconstructed languages such as Proto-Indo-European.

ISO 639-3 is intended for use as metadata codes in a wide range of applications. It is widely used in computer and information systems, such as the Internet, in which many languages need to be supported. In archives and other information storage, it is used in cataloging systems, indicating what language a resource is in or about. The codes are also frequently used in the linguistic literature and elsewhere to compensate for the fact that language names may be obscure or ambiguous.

Language codes

ISO 639-3 includes all languages in ISO 639-1 and all individual languages in ISO 639-2. ISO 639-1 and ISO 639-2 focused on major languages, most frequently represented in the total body of the world's literature. Since ISO 639-2 also includes language collections and Part 3 does not, ISO 639-3 is not a superset of ISO 639-2. Where B and T codes exist in ISO 639-2, ISO 639-3 uses the T-codes.

The standard contains 7,916 entries. The inventory of languages is based on a number of sources including: the individual languages contained in 639-2, modern languages from the ''Ethnologue'', historic varieties, ancient languages and artificial languages from the Linguist List, as well as languages recommended within the annual public commenting period.

Machine-readable data files are provided by the registration authority. Mappings from ISO 639-1 or ISO 639-2 to ISO 639-3 can be done using these data files.

ISO 639-3 draws distinctions between languages on the basis of criteria that are not entirely objective. It is not intended to document or provide identifiers for dialects or other sub-language variations. Nevertheless, judgments regarding distinctions between languages may be subjective, particularly in the case of language varieties without established literary traditions, usage in education or media, or other factors that contribute to language conventionalization. Therefore, the standard should not be regarded as an authoritative statement of what distinct languages exist in the world (about which there may be substantial disagreement in some cases), but rather simply as one useful way of identifying different language varieties precisely.
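
The mapping from ISO 639-1 or ISO 639-2 codes to ISO 639-3 described above can be sketched as a simple table lookup. The example below is a minimal illustration, not the registration authority's tooling: the column names (`Id`, `Part2B`, `Part2T`, `Part1`, `Ref_Name`) and the four inline sample rows are assumptions made for the sketch, and the layout of the real data files should be checked against the registration authority's own documentation.

```python
import csv
import io

# Hypothetical tab-separated sample; a real data file would be read from disk.
SAMPLE_TABLE = (
    "Id\tPart2B\tPart2T\tPart1\tRef_Name\n"
    "fra\tfre\tfra\tfr\tFrench\n"
    "deu\tger\tdeu\tde\tGerman\n"
    "nob\tnob\tnob\tnb\tNorwegian Bokmål\n"
    "arb\t\t\t\tStandard Arabic\n"
)


def build_lookup(table_text):
    """Map every ISO 639-1 and ISO 639-2 (B or T) code to its ISO 639-3 code."""
    lookup = {}
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    for row in reader:
        for column in ("Part1", "Part2B", "Part2T"):
            code = row[column]
            if code:  # empty cell: no code in that part of ISO 639
                lookup[code] = row["Id"]
    return lookup


lookup = build_lookup(SAMPLE_TABLE)
print(lookup["fr"], lookup["fre"], lookup["ger"])  # fra fra deu
```

Because ISO 639-3 uses the T-code wherever ISO 639-2 has both a B and a T code, the B-codes `fre` and `ger` resolve to `fra` and `deu` in this sketch.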

Code space

Since the code is three-letter alphabetic, one upper bound for the number of languages that can be represented is 26 × 26 × 26 = 17,576. Since ISO 639-2 defines special codes (4), a reserved range (520) and B-only codes (22), 546 codes cannot be used in part 3. Therefore, a stricter upper bound is 17,576 − 546 = 17,030. The upper bound gets even stricter if one subtracts the language collections defined in 639-2 and the ones yet to be defined in ISO 639-5.
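
The arithmetic above can be checked with a few lines of code. This is only a worked restatement of the figures given in the text; the variable names are illustrative.

```python
import string

# Every three-letter combination over the 26-letter alphabet.
total_codes = len(string.ascii_lowercase) ** 3

# Codes unavailable to Part 3, using the counts stated in the text.
special_codes = 4
reserved_range = 520
b_only_codes = 22
unavailable = special_codes + reserved_range + b_only_codes

upper_bound = total_codes - unavailable
print(total_codes, unavailable, upper_bound)  # 17576 546 17030
```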

Reference names

Reference names may only use a subset of ASCII characters. According to section 6.4.1 of the standard, they may only contain the 26 letters of the ISO basic Latin alphabet and diacritics from a restricted set. Reference names may also include any of five ASCII punctuation marks (space, hyphen-minus, apostrophe and the two parentheses) and, within parentheses, the full stop and the ten digits. Alternative names may use the language's own orthography, if Latin, or a transliteration into Latin script that contains additional characters.
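
The character rules above lend themselves to a small validator. The sketch below covers only the ASCII part of the rules as stated in the text; the permitted diacritics are not modelled, so a name containing any accented letter is rejected here. The requirement that parentheses be balanced is an additional assumption, and the function name is hypothetical.

```python
import string

LETTERS = set(string.ascii_letters)
PUNCTUATION = set(" -'")                       # space, hyphen-minus, apostrophe
PARENTHESES_ONLY = set(string.digits + ".")    # allowed only inside parentheses


def is_ascii_reference_name(name):
    """Check a name against the ASCII-only subset of the reference-name rules."""
    depth = 0
    for ch in name:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:          # closing parenthesis without an opening one
                return False
        elif ch in LETTERS or ch in PUNCTUATION:
            continue
        elif ch in PARENTHESES_ONLY and depth > 0:
            continue
        else:
            return False
    # Assumption: an empty name or an unclosed parenthesis is invalid.
    return bool(name) and depth == 0


print(is_ascii_reference_name("Standard Arabic"))
print(is_ascii_reference_name("Old English (ca. 450-1100)"))
print(is_ascii_reference_name("Old English 450"))
```

The first two calls succeed, while the third fails because digits appear outside parentheses.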

Macrolanguages

There are 58 languages in ISO 639-2 which are considered, for the purposes of the standard, to be "macrolanguages" in ISO 639-3.

Some of these macrolanguages had no individual language as defined by ISO 639-3 in the code set of ISO 639-2, e.g. ara (Generic Arabic). Others like nor (Norwegian) had their two individual parts (nno (Nynorsk), nob (Bokmål)) already in ISO 639-2.

That means some languages (e.g. arb, Standard Arabic) that were considered by ISO 639-2 to be dialects of one language (ara) are now in ISO 639-3 in certain contexts considered to be individual languages themselves. This is an attempt to deal with varieties that may be linguistically distinct from each other, but are treated by their speakers as two forms of the same language, e.g. in cases of diglossia. For example:
* ara (Generic Arabic, 639-2)
* arb (Standard Arabic, 639-3)

A complete list is available on the ISO 639-3 registrar's website.
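
The relationship between a macrolanguage and its individual languages can be modelled as a one-to-many mapping. The sketch below uses only the codes named in the text (`nor` with `nno` and `nob`, `ara` with `arb`); the table is deliberately incomplete, and the function names are hypothetical.

```python
# Partial table built from the examples in the text; a real table would be
# loaded from the registration authority's data.
MACROLANGUAGE_MEMBERS = {
    "nor": ("nno", "nob"),
    "ara": ("arb",),
}

# Reverse index: individual language -> its macrolanguage.
MEMBER_TO_MACRO = {
    member: macro
    for macro, members in MACROLANGUAGE_MEMBERS.items()
    for member in members
}


def macrolanguage_of(code):
    """Return the macrolanguage covering `code`, or None if there is none."""
    return MEMBER_TO_MACRO.get(code)


def matches(requested, actual):
    """True if `actual` is `requested` itself or falls under it as a macrolanguage."""
    return requested == actual or macrolanguage_of(actual) == requested


print(macrolanguage_of("nob"))   # nor
print(matches("nor", "nno"))     # True
print(matches("nob", "nno"))     # False
```

A lookup of this kind lets an application that was tagged with the broader ISO 639-2 code (for example `ara`) still match resources tagged with the narrower ISO 639-3 code (`arb`).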

Collective languages

"A collective language code element is an identifier that represents a group of individual languages that are not deemed to be one language in any usage context." These codes do not precisely represent a particular language or macrolanguage. While ISO 639-2 includes three-letter identifiers for collective languages, these codes are excluded from ISO 639-3. Hence ISO 639-3 is not a superset of ISO 639-2.

ISO 639-5 defines 3-letter collective codes for language families and groups, including the collective language codes from ISO 639-2.

Special codes

Four codes are set aside in ISO 639-2 and ISO 639-3 for cases where none of the specific codes are appropriate. These are intended primarily for applications like databases where an ISO code is required regardless of whether one exists.
* mis (uncoded languages, originally an abbreviation for 'miscellaneous') is intended for languages which have not (yet) been included in the ISO standard.
* mul (multiple languages) is intended for cases where the data includes more than one language, and (for example) the database requires a single ISO code.
* und (undetermined) is intended for cases where the language in the data has not been identified, such as when it is mislabeled or has never been labeled. It is not intended for cases such as Trojan where an unattested language has been given a name.
* zxx (no linguistic content / not applicable) is intended for data which is not a language at all, such as animal calls.

In addition, 520 codes in the range qaa–qtz are 'reserved for local use'. For example, Rebecca Bettencourt assigns codes to constructed languages, and new assignments are made upon request. The Linguist List once used them for extinct languages. Linguist List assigned one of them a generic value, unnamed proto-language. This was used for proposed intermediate nodes in a family tree that had no name.
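
A system that stores language codes often needs to tell these categories apart. The sketch below is a minimal classifier, assuming the four special identifiers are `mis`, `mul`, `und` and `zxx` and that the local-use range runs from `qaa` to `qtz` (20 × 26 = 520 codes, matching the count given above). The function name and the category labels are hypothetical.

```python
import itertools
import string

SPECIAL_CODES = {
    "mis": "uncoded languages",
    "mul": "multiple languages",
    "und": "undetermined",
    "zxx": "no linguistic content / not applicable",
}


def classify(code):
    """Return 'special', 'local use' or 'regular' for a three-letter code."""
    if len(code) != 3 or any(ch not in string.ascii_lowercase for ch in code):
        raise ValueError(f"not a three-letter lowercase code: {code!r}")
    if code in SPECIAL_CODES:
        return "special"
    # Assumed local-use range qaa..qtz: 'q', then a..t, then a..z.
    if code[0] == "q" and "a" <= code[1] <= "t":
        return "local use"
    return "regular"  # may or may not actually be assigned in the code table


# Sanity check: the assumed range contains exactly 520 codes.
local_use_count = sum(
    classify("".join(letters)) == "local use"
    for letters in itertools.product(string.ascii_lowercase, repeat=3)
)
print(classify("und"), classify("qaa"), classify("arb"), local_use_count)
```

Note that "regular" here only means the code lies outside the special and local-use sets; whether it is actually assigned to a language has to be checked against the code table.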

Maintenance processes

The code table for ISO 639-3 is open to changes. In order to protect stability of existing usage, the changes permitted are limited to:
* modifications to the reference information for an entry (including names or categorizations for type and scope),
* addition of new entries,
* deprecation of entries that are duplicates or spurious,
* merging one or more entries into another entry, and
* splitting an existing language entry into multiple new language entries.

The code assigned to a language is not changed unless there is also a change in denotation.

Changes are made on an annual cycle. Every request is given a minimum period of three months for public review.

The ISO 639-3 Web site has pages that describe "scopes of denotation" (languoid types) and types of languages, which explain what concepts are in scope for encoding and certain criteria that need to be met. For example, constructed languages can be encoded, but only if they are designed for human communication and have a body of literature, preventing requests for idiosyncratic inventions.

The registration authority documents on its Web site instructions made in the text of the ISO 639-3 standard regarding how the code tables are to be maintained. It also documents the processes used for receiving and processing change requests.

A change request form is provided, and there is a second form for collecting information about proposed additions. Any party can submit change requests. When submitted, requests are initially reviewed by the registration authority for completeness.

When a fully documented request is received, it is added to a published Change Request Index. Also, announcements are sent to the general LINGUIST discussion list at Linguist List and other lists the registration authority may consider relevant, inviting public review and input on the requested change. Any list owner or individual is able to request notifications of change requests for particular regions or language families. Comments that are received are published for other parties to review. Based on consensus in comments received, a change request may be withdrawn or promoted to "candidate status".

Three months prior to the end of an annual review cycle (typically in September), an announcement is sent to the LINGUIST discussion list and other lists regarding Candidate Status Change Requests. All requests remain open for review and comment through the end of the annual review cycle.

Decisions are announced at the end of the annual review cycle (typically in January). At that time, requests may be adopted in whole or in part, amended and carried forward into the next review cycle, or rejected. Rejections often include suggestions on how to modify proposals for resubmission. A public archive of every change request is maintained along with the decisions taken and the rationale for the decisions.

Criticism

Linguists Morey, Post and Friedman raise various criticisms of ISO 639, and in particular ISO 639-3:
* The three-letter codes themselves are problematic, because while officially arbitrary technical labels, they are often derived from mnemonic abbreviations for language names, some of which are pejorative. For example, Yemsa was assigned the code jnj, from pejorative "Janejero". These codes may thus be considered offensive by native speakers.
* The administration of the standard is problematic because SIL is a missionary organization with inadequate transparency and accountability. Decisions as to what deserves to be encoded as a language are made internally. While outside input may or may not be welcomed, the decisions themselves are opaque, and many linguists have given up trying to improve the standard.
* Permanent identification of a language is incompatible with language change.
* Languages and dialects often cannot be rigorously distinguished, and dialect continua may be subdivided in many ways, whereas the standard privileges one choice. Such distinctions are often based instead on social and political factors.
* ISO 639-3 may be misunderstood and misused by authorities that make decisions about people's identity and language, abolishing the right of speakers to identify or not to identify with their speech variety. Though SIL is sensitive to such issues, this problem is inherent in the nature of an established standard, which may be used (or mis-used) in ways that ISO and SIL do not intend.

Martin Haspelmath agrees with four of these points, but not the point about language change. He disagrees because any account of a language requires identifying it, and we can easily identify different stages of a language. He suggests that linguists may prefer to use a codification that is made at the languoid level since "it rarely matters to linguists whether what they are talking about is a language, a dialect or a close-knit family of languages." He also questions whether an ISO standard for language identification is appropriate since ISO is an industrial organization, while he views language documentation and nomenclature as a scientific endeavor. He cites the original need for standardized language identifiers as having been "the economic significance of translation and software localization", for which purposes the ISO 639-1 and 639-2 standards were established. But he raises doubts about industry need for the comprehensive coverage provided by ISO 639-3, including as it does "little-known languages of small communities that are never or hardly used in writing and that are often in danger of extinction".

Usage

* OLAC: the Open Language Archives Community
* Microsoft Windows 8: Supports all codes in ISO 639-3 at the time of release.
* Wikimedia Foundation: New language-based projects (e.g. Wikipedias in new languages) must have an identifier from ISO 639-1, -2, or -3.
* Other standards that rely on ISO 639-3:
** Language tags as defined by the Internet Engineering Task Force (IETF), as documented in:
*** BCP 47: ''Best Current Practice 47'', which includes
*** RFC 5646, which superseded RFC 4646, which superseded RFC 3066. (Therefore, all standards which depend on any of these 3 IETF standards now use ISO 639-3.)
** The ePub 3.0 standard for language metadata uses Dublin Core Metadata elements. These language metadata elements in ePubs must contain valid codes for languages. RFC 5646 points to ISO 639-3 for languages without shorter IANA codes.
** Dublin Core Metadata Initiative: DCMI Metadata Term for language, via an earlier IETF language tag RFC (now superseded by RFC 5646).
** Internet Assigned Numbers Authority (IANA): The W3C's internationalization effort recommends the use of the IANA Language Subtag Registry for selecting codes for languages. The IANA Language Subtag Registry depends on ISO 639-3 codes for languages which did not previously have codes in other parts of the ISO 639 standard.
** HTML5: via IETF's BCP 47.
** XML: via IETF's BCP 47.
** SVG: via IETF's BCP 47.
** MODS library codes: Incorporates an earlier IETF language tag RFC (now superseded by RFC 5646).
** Text Encoding Initiative (TEI): via IETF's BCP 47.
** Lexical Markup Framework: ISO specification for representation of machine-readable dictionaries.
** Unicode Common Locale Data Repository: Uses several hundred codes from ISO 639-3 not included in ISO 639-2.
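
Several of the standards listed above reach ISO 639-3 through IETF language tags, where the three-letter code serves as the language subtag only when no shorter code exists. The sketch below illustrates that fallback; the two-letter table is a small hypothetical sample rather than the IANA registry, and the function name is an assumption.

```python
# Hypothetical, incomplete sample of three-letter codes that have a shorter
# two-letter equivalent; a real application would consult the IANA Language
# Subtag Registry instead.
ALPHA2_BY_ALPHA3 = {
    "fra": "fr",
    "deu": "de",
    "nob": "nb",
}


def primary_language_subtag(iso639_3_code):
    """Prefer the two-letter code when one exists, else keep the three-letter code."""
    code = iso639_3_code.lower()
    return ALPHA2_BY_ALPHA3.get(code, code)


print(primary_language_subtag("fra"))  # fr
print(primary_language_subtag("arb"))  # arb
```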

External links

ISO 639-3 Registration Authority

at the United States Library of Congress website
Pending ISO 639-3 applications
