KOI8-R

KOI8-R (RFC 1489) is an 8-bit character encoding derived from the KOI-8 encoding by the programmer Andrei Chernov in 1993 and designed to cover Russian, which uses a Cyrillic alphabet. KOI8-R was based on Russian Morse code, which was created from a phonetic version of Latin Morse code. As a result, Russian Cyrillic letters are in pseudo-Roman order rather than the normal Cyrillic alphabetical order. Although this may seem unnatural, it means that if the 8th bit is stripped, the text remains partially readable in ASCII and may convert to syntactically correct KOI-7. For example, "Русский Текст" in KOI8-R becomes "rUSSKIJ tEKST" ("Russian Text").

KOI8 stands for Kod Obmena Informatsiey, 8 bit (Russian: Код Обмена Информацией, 8 бит), which means "Code for Information Exchange, 8 bit". In Microsoft Windows, KOI8-R is assigned the code page number 20866. In IBM, KOI8-R is assigned code page 878. KOI8-R also happens to cover Bulgarian, but has not been used for that purpose since CP1251 was accepted.

The use of these older code pages is being replaced with Unicode as a more common way to represent Cyrillic together with other languages. In modern applications, especially on the Internet, Unicode is preferred to KOI-8 and its variants and to other Cyrillic encodings, making UTF-8 the dominant encoding for web pages. KOI8-R, the most popular variant, is used by less than 0.004% of websites, mainly Russian ones; Russian and Bulgarian sites alike now prefer other encodings. (For further discussion of Unicode's complete coverage of 436 Cyrillic letters/code points, including Old Cyrillic, and of how single-byte character encodings such as Windows-1251 and KOI8 variants cannot provide this, see Cyrillic script in Unicode.)
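The effect of stripping the 8th bit, described above, can be reproduced in a few lines. The following is a minimal sketch that assumes Python's built-in `koi8_r` codec; the helper name `strip_eighth_bit` is hypothetical and chosen only for illustration.

```python
def strip_eighth_bit(text: str) -> str:
    """Encode text as KOI8-R, clear the high bit of every byte, decode as ASCII."""
    koi8_bytes = text.encode("koi8_r")
    # Masking with 0x7F clears bit 8, leaving a 7-bit value in every byte.
    stripped = bytes(b & 0x7F for b in koi8_bytes)
    return stripped.decode("ascii")


print(strip_eighth_bit("Русский Текст"))  # rUSSKIJ tEKST
```

The letter case comes out inverted because KOI8-R places lowercase Cyrillic letters above the uppercase Latin range and uppercase Cyrillic letters above the lowercase Latin range.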


Character set

The following table shows the KOI8-R encoding. Each character is shown with its equivalent Unicode code point.
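Individual byte-to-code-point mappings of this kind can be looked up programmatically. This is a rough sketch that again assumes Python's built-in `koi8_r` codec; `koi8r_code_point` is a hypothetical helper name, and the four bytes shown are only a sample, not the full table.

```python
def koi8r_code_point(byte_value: int) -> str:
    """Return the Unicode code point (as 'U+XXXX') for one KOI8-R byte."""
    char = bytes([byte_value]).decode("koi8_r")
    return f"U+{ord(char):04X}"


# Print a small sample of the upper half of the encoding.
for b in (0xC0, 0xC1, 0xE1, 0xFF):
    ch = bytes([b]).decode("koi8_r")
    print(f"0x{b:02X}  {ch}  {koi8r_code_point(b)}")
```

Bytes 0xC1 and 0xE1 differ only in one bit and map to "а" and "А", which mirrors how ASCII 0x41 and 0x61 relate "A" and "a".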


See also

* KOI8-B, a derivation of KOI8-R with only the letter subset implemented
* KOI8-U, another derivative encoding which adds Ukrainian characters
* KOI character encodings
* RELCOM
* Windows-1251, another common Cyrillic character encoding




External links


* Universal Cyrillic decoder, an online program that may help recover Cyrillic texts with broken KOI8-R or other character encodings.