




KOI-8
KOI-8 (КОИ-8) is an 8-bit character set standardized in GOST 19768-74 (Маркелова Л. Н., Эксплуатация программоуправляемой вычислительной машины «Искра 226», М.: Машиностроение, 1987, pp. 41–42). It is an extension of KOI-7 that allows the Latin alphabet to be used alongside the Russian alphabet, in both upper and lower case; however, the letter Ёё and the uppercase Ъ are missing, the latter to avoid a conflict with the delete character (both are added in most extensions, see KOI8-B). The first 128 code points (0–127) are identical to ASCII, except that the dollar sign $ (code point 0x24) is replaced by the universal currency sign ¤. Rows x8_ and x9_ (code points 128–159) may be filled with the additional control characters from EBCDIC (code points 32–63). This standard became the basis for later Internet standards such as KOI8-RU. Unicode is pref ...


KOI Character Encodings
KOI (КОИ) is a family of several code pages for the Cyrillic script. The name stands for Kod obmena informatsiey, which means "Code for Information Interchange". A particular feature of the KOI code pages is that text remains human-readable when the leftmost (most significant) bit is stripped, should it inadvertently pass through equipment or software that can only handle 7-bit characters. This works because each Cyrillic letter is placed 128 code points away from the Latin letter it sounds most similar to. That order, however, does not correspond to the alphabetical order of any language written in Cyrillic, so sorting requires lookup tables. These encodings are derived from ASCII on the basis of a (nearly phonetic) correspondence between Latin and Cyrillic letters that was already used in the Russian dialect of Morse code and in the MTK-2 telegraph code. The first 26 characters from А (0xE1) in KOI8-R are ...
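The sorting point can be illustrated with a small sketch. It uses Python's built-in `koi8_r` codec as a stand-in for the KOI family; the letter sample and the `ALPHABET` lookup table are illustrative assumptions, not part of any standard.

```python
# Lowercase Russian letters in alphabetical order (ё left out for brevity).
# This lookup table is an illustrative assumption.
ALPHABET = "абвгдежзийклмнопрстуфхцчшщъыьэюя"
RANK = {ch: i for i, ch in enumerate(ALPHABET)}

letters = ["в", "а", "д", "б", "г"]

# Sorting by the raw KOI8-R byte follows the pseudo-Latin order (a, b, d, g, w).
by_bytes = sorted(letters, key=lambda ch: ch.encode("koi8_r"))

# Sorting through the lookup table restores alphabetical order.
by_table = sorted(letters, key=lambda ch: RANK[ch])

print(by_bytes)  # ['а', 'б', 'д', 'г', 'в']
print(by_table)  # ['а', 'б', 'в', 'г', 'д']
```

A real collation routine would also need to handle uppercase letters and ё; the table here only covers the lowercase sample.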


ISO-IR-153
ISO-IR-153 (ST SEV 358-88) is an 8-bit character set that covers the Russian and Bulgarian alphabets. Unlike the KOI encodings, this encoding lists the Cyrillic letters in their correct traditional order. This has become the basis for ISO/IEC 8859-5 and the Cyrillic Unicode block. Standards and naming: the name ISO-IR-153 refers to this set's number in the ISO-IR registry, and marks it as a set which may be used within ISO/IEC 2022. ISO-IR-153 is a subset of ISO/IEC 8859-5 (synchronised with ECMA-113 since 1988). The ISO-IR-153 documentation cites ST SEV 358-88 as the source standard. While it also cites the earlier GOST 19768-74 (which defines KOI-8 and was conformed to by the first version of ECMA-113, i.e. ISO-IR-111), it does not follow the KOI-8 layout (rather using a close modification of the letter layout from the Main code page), so this appears to be in error. The ISO-IR-153 encoding was intended to replace GOST 19768-74, and is sometimes referred to as GOST-19768-87. ...


KOI8-B
KOI8-B is the informal name for an 8-bit Roman/Cyrillic character set constituting the common subset of the major KOI-8 variants (KOI8-R, KOI8-U, KOI8-RU, KOI8-E, KOI8-F). Accordingly, it is closely related to KOI8-R, but defines only the letter subset in the upper half. As such, it was implemented by some font vendors for PC Unixes such as Xenix in the late 1980s. Character set: the following table shows the KOI8-B encoding. Each character is shown with its equivalent Unicode code point. See also: KOI character encodings. External links: http://czyborra.com/charsets/koi8-b.txt.gz, http://czyborra.com/charsets/koi8-b.bdf.gz ...




KOI8-R
KOI8-R (RFC 1489) is an 8-bit character encoding derived from the KOI-8 encoding by the programmer Andrei Chernov in 1993 and designed to cover Russian, which uses the Russian subset of the Cyrillic script. KOI-8, in turn, is an 8-bit extension of the KOI-7 encoding, which inherited a phonetic correspondence between Russian and Latin letters from the MTK-2 teletype code. As a result, Russian Cyrillic letters in KOI8-R are in pseudo-Latin alphabetical order rather than the normal Cyrillic order used in ISO 8859-5. Although this may seem unnatural, it has the useful effect that if the 8th bit is stripped, the text remains partially readable in any ASCII-based encoding (including KOI8-R itself) as a case-reversed transliteration. For example, "Код для обмена и обработки информации" (the Russian meaning of the "KOI" acronym) becomes "kOD DLQ OBMENA I OBRABOTKI INFORMACII". KOI-8 stands for 8-bitnyy kod dlya obmena i obrabotki informatsii ( ...
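The bit-stripping effect can be reproduced in a few lines. This is a minimal sketch using Python's built-in `koi8_r` codec; the helper name `strip_high_bit` is hypothetical and chosen only for this illustration.

```python
def strip_high_bit(text: str, encoding: str = "koi8_r") -> str:
    """Encode text, clear the 8th bit of every byte, and read the result as ASCII."""
    raw = text.encode(encoding)
    stripped = bytes(b & 0x7F for b in raw)  # simulate a 7-bit-only channel
    return stripped.decode("ascii")


phrase = "Код для обмена и обработки информации"
print(strip_high_bit(phrase))  # kOD DLQ OBMENA I OBRABOTKI INFORMACII
```

Uppercase Cyrillic letters land on lowercase Latin letters and vice versa, which is why the result reads as a case-reversed transliteration.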


KOI-7
KOI-7 (КОИ-7) is a 7-bit character encoding designed to cover Russian, which uses the Cyrillic alphabet. In Russian, KOI-7 stands for Kod Obmena Informatsiey, 7 bit (Код Обмена Информацией, 7 бит), which means "Code for Information Exchange, 7 bit". It was first standardized in GOST 13052-67 (with the 2nd revision GOST 13052-74 / ST SEV 356-76) and GOST 27463-87 / ST SEV 356-86. Shift Out (SO) and Shift In (SI) control characters are used in KOI-7, where SO starts printing Russian letters (KOI-7 N1) and SI starts printing Latin letters again (KOI-7 N0), or for lowercase and uppercase switching. This version is also known as KOI7-switched or csKOI7switched. On ...
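The SO/SI switching described above could be decoded roughly as follows. This is a sketch under stated assumptions, not a reference implementation: it assumes the stream starts in the Latin set (N0), and that an N1 letter at code `b` (0x40 to 0x7E) is the same letter that Python's `koi8_r` codec places at `b | 0x80`. The function name `decode_koi7_switched` is hypothetical.

```python
SO, SI = 0x0E, 0x0F  # Shift Out / Shift In control characters


def decode_koi7_switched(data: bytes) -> str:
    """Decode a 7-bit stream that switches between Latin (N0) and Russian (N1).

    Assumption: an N1 letter at code b (0x40..0x7E) equals the KOI8-R
    letter at b | 0x80. The stream is assumed to start in N0.
    """
    out = []
    russian = False  # assumed initial state: Latin set (N0)
    for b in data:
        if b == SO:
            russian = True
        elif b == SI:
            russian = False
        elif russian and 0x40 <= b <= 0x7E:
            out.append(bytes([b | 0x80]).decode("koi8_r"))
        else:
            # Digits, punctuation, and controls are passed through unchanged.
            out.append(chr(b))
    return "".join(out)


sample = bytes([SO]) + b"kOD" + bytes([SI]) + b" KOI-7"
print(decode_koi7_switched(sample))  # Код KOI-7
```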


ISO-IR-111
ISO-IR-111 or KOI8-E is an 8-bit character set. It is a multinational extension of KOI-8 for Belarusian, Macedonian, Serbian, and Ukrainian (except Ґґ, which is added in KOI8-F). The name "ISO-IR-111" refers to its registration number in the ISO-IR registry, and denotes it as a set usable with ISO/IEC 2022. It was defined by the first (1986) edition of ECMA-113, which is the Ecma International standard corresponding to ISO-8859-5, and as such also corresponds to a 1987 draft version of ISO-8859-5. The published editions of ISO-8859-5 instead correspond to subsequent editions of ECMA-113, which define a different encoding. Naming confusion: ISO-IR-111, the 1985 edition of ECMA-113 (also called "ECMA-Cyrillic" or "KOI8-E"), was based on the 1974 edition of GOST 19768 (i.e. KOI-8). In 1987 ECMA-113 was redesigned. These newer editions of ECMA-113 are equivalent to ISO-8859-5, and do not follow the KOI layout. This confusion has led to a common misconception that ISO-8859-5 was defined in or ...



Extended ASCII
Extended ASCII is a repertoire of character encodings that include (most of) the original 96-character ASCII set, plus up to 128 additional characters. There is no formal definition of "extended ASCII", and even use of the term is sometimes criticized, because it can be mistakenly interpreted to mean that the American National Standards Institute (ANSI) had updated its standard to include more characters, or that the term identifies a single unambiguous encoding, neither of which is the case. ISO 8859 was the first international standard to formalise a (limited) expansion of the ASCII character set: of the many language variants it encoded, ISO 8859-1 ("ISO Latin 1"), which supports most Western European languages, is best known in the West. There are many other extended ASCII encodings (more than 220 DOS and Windows codepages). EBCDIC ("the other" major character code) likewise developed many extended variants (more than 186 EBCDIC codepages) over the decades. All ...


INIS-8
INIS-8 is an 8-bit character encoding developed by the International Nuclear Information System (INIS). It is an 8-bit extension of the 7-bit INIS character set (itself a subset of ASCII), adding a G1 set, and has MIB 52. It is also known as iso-ir-50 (after the ISO 2022 registration of its G1 set) and csISO50INIS8. ISO-IR-51, "INIS Cyrillic Extension", is an alternative G1 set for 8-bit INIS, supporting KOI-8 encoded Russian alphabet letters at the expense of the superscript and subscript digits. See also: INIS character set ...



ASCII
ASCII, an acronym for American Standard Code for Information Interchange, is a character encoding standard for representing a particular set of 95 (English-language focused) printable characters and 33 control characters, a total of 128 code points. The set of available punctuation had significant impact on the syntax of computer languages and text markup. ASCII hugely influenced the design of character sets used by modern computers; for example, the first 128 code points of Unicode are the same as ASCII. ASCII encodes each code point as a value from 0 to 127, storable as a seven-bit integer. Ninety-five code points are printable, including digits 0 to 9, lowercase letters a to z, uppercase letters A to Z, and commonly used punctuation symbols. For example, the lowercase letter i is represented as 105 (decimal). ASCII also specifies 33 non-printing control codes, most of which are now obsolete. The control cha ...
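The counts quoted above can be checked with a short sketch. It relies only on Python built-ins; the variable names are illustrative.

```python
# Printable characters occupy 0x20 (space) through 0x7E (tilde).
printable = [cp for cp in range(128) if 0x20 <= cp <= 0x7E]

# Control characters are 0x00..0x1F plus 0x7F (delete).
control = [cp for cp in range(128) if cp < 0x20 or cp == 0x7F]

print(len(printable), len(control))  # 95 33
print(chr(105), ord("i"))            # i 105

# Every ASCII value fits in seven bits.
assert all(cp < 2**7 for cp in printable + control)

# The first 128 Unicode code points decode identically from ASCII bytes.
assert bytes(range(128)).decode("ascii") == "".join(map(chr, range(128)))
```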



ISO 646
ISO/IEC 646, "Information technology – ISO 7-bit coded character set for information interchange", is an ISO/IEC standard in the field of character encoding. It is equivalent to the ECMA standard ECMA-6 and developed in cooperation with ASCII at least since 1964. The first version of ECMA-6 had been published in 1965, based on work the ECMA's Technical Committee TC1 had carried out since December 1960. The first edition of ISO/IEC 646 was published in 1973, and the most recent, third, edition in 1991. ISO/IEC 646 specifies a 7-bit character code from which several national standards are derived. It allocates a set of 82 unique graphic characters to 7-bit code points, known as the invariant (INV) or basic character set, including letters of the ISO basic Latin alphabet, digits, and some common English pun ...



UTF-8
UTF-8 is a character encoding standard used for electronic communication. Defined by the Unicode Standard, the name is derived from Unicode Transformation Format 8-bit. Almost every webpage is transmitted as UTF-8. UTF-8 supports all 1,112,064 valid Unicode code points using a variable-width encoding of one to four one-byte (8-bit) code units. Code points with lower numerical values, which tend to occur more frequently, are encoded using fewer bytes. It was designed for backward compatibility with ASCII: the first 128 characters of Unicode, which correspond one-to-one with ASCII, are encoded using a single byte with the same binary value as ASCII, so that a UTF-8-encoded file using only those characters is identical to an ASCII file. Most software designed for any extended ASCII can read and write UTF-8, and this results in fewer internationalization issues than any alternative text encoding. UTF-8 is dominant for all countries/languages on the internet, with 99% global ...
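The variable-width behaviour and the ASCII compatibility can be seen in a short sketch using Python's built-in `utf-8` codec. The sample characters are arbitrary choices for illustration.

```python
# One sample character for each encoded length (1 to 4 bytes).
for ch in ["A", "Я", "€", "😀"]:
    encoded = ch.encode("utf-8")
    print(f"U+{ord(ch):04X} -> {len(encoded)} byte(s): {encoded.hex()}")

# ASCII-only text produces the same bytes under both encodings.
ascii_text = "KOI8-R"
assert ascii_text.encode("utf-8") == ascii_text.encode("ascii")

# Valid code points: the full range 0..0x10FFFF minus the surrogate block.
valid = 0x110000 - (0xDFFF - 0xD800 + 1)
print(valid)  # 1112064
```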



Old Cyrillic
The Early Cyrillic alphabet, also called classical Cyrillic or paleo-Cyrillic, is an alphabetic writing system that was developed in Medieval Bulgaria in the Preslav Literary School during the late 9th century. It is used to write the Church Slavonic language, and was historically used for its ancestor, Old Church Slavonic. It was also used for other languages, but between the 18th and 20th centuries was mostly replaced by the modern Cyrillic script, which is used for some Slavic languages (such as Russian), and for East European and Asian languages that have experienced a great amount of Russian cultural influence. History: the earliest form of manuscript Cyrillic, known as ustav, was based on Greek uncial script, augmented by ligatures and by letters from the Glagolitic alphabet for phonemes not found in Greek. The Glagolitic script was created by the Byzantine monk Saint Cyril, possibly with the aid of his brother Saint Methodius, around 863. Most scholars agr ...