HOME

TheInfoList



OR:

Several 8-bit
character sets Character encoding is the process of assigning numbers to graphical character (computing), characters, especially the written characters of human language, allowing them to be stored, transmitted, and transformed using computers. The numerical v ...
(encodings) were designed for binary representation of common Western European languages ( Italian, Spanish, Portuguese, French, German, Dutch, English, Danish, Swedish, Norwegian, and Icelandic), which use the
Latin alphabet The Latin alphabet, also known as the Roman alphabet, is the collection of letters originally used by the Ancient Rome, ancient Romans to write the Latin language. Largely unaltered except several letters splitting—i.e. from , and from � ...
, a few additional letters and ones with precomposed
diacritic A diacritic (also diacritical mark, diacritical point, diacritical sign, or accent) is a glyph added to a letter or to a basic glyph. The term derives from the Ancient Greek (, "distinguishing"), from (, "to distinguish"). The word ''diacrit ...
s, some
punctuation Punctuation marks are marks indicating how a piece of writing, written text should be read (silently or aloud) and, consequently, understood. The oldest known examples of punctuation marks were found in the Mesha Stele from the 9th century BC, c ...
, and various
symbol A symbol is a mark, Sign (semiotics), sign, or word that indicates, signifies, or is understood as representing an idea, physical object, object, or wikt:relationship, relationship. Symbols allow people to go beyond what is known or seen by cr ...
s (including some Greek letters). These character sets also happen to support many other languages such as Malay, Swahili, and
Classical Latin Classical Latin is the form of Literary Latin recognized as a Literary language, literary standard language, standard by writers of the late Roman Republic and early Roman Empire. It formed parallel to Vulgar Latin around 75 BC out of Old Latin ...
. ''This material is technically obsolete, having been functionally replaced by
Unicode Unicode or ''The Unicode Standard'' or TUS is a character encoding standard maintained by the Unicode Consortium designed to support the use of text in all of the world's writing systems that can be digitized. Version 16.0 defines 154,998 Char ...
. However it continues to have historical interest.''


Summary

The ISO-8859 series of 8-bit
character sets Character encoding is the process of assigning numbers to graphical character (computing), characters, especially the written characters of human language, allowing them to be stored, transmitted, and transformed using computers. The numerical v ...
encodes all
Latin Latin ( or ) is a classical language belonging to the Italic languages, Italic branch of the Indo-European languages. Latin was originally spoken by the Latins (Italic tribe), Latins in Latium (now known as Lazio), the lower Tiber area aroun ...
character sets used in
Europe Europe is a continent located entirely in the Northern Hemisphere and mostly in the Eastern Hemisphere. It is bordered by the Arctic Ocean to the north, the Atlantic Ocean to the west, the Mediterranean Sea to the south, and Asia to the east ...
, albeit that the same
code point A code point, codepoint or code position is a particular position in a Table (database), table, where the position has been assigned a meaning. The table may be one dimensional (a column), two dimensional (like cells in a spreadsheet), three dime ...
s have multiple uses that caused some difficulty (including
mojibake Mojibake (; , 'character transformation') is the garbled or gibberish text that is the result of text being decoded using an unintended character encoding. The result is a systematic replacement of symbols with completely unrelated ones, often ...
, or garbled characters, and communication issues). The arrival of
Unicode Unicode or ''The Unicode Standard'' or TUS is a character encoding standard maintained by the Unicode Consortium designed to support the use of text in all of the world's writing systems that can be digitized. Version 16.0 defines 154,998 Char ...
, with a unique code point for every
glyph A glyph ( ) is any kind of purposeful mark. In typography, a glyph is "the specific shape, design, or representation of a character". It is a particular graphical representation, in a particular typeface, of an element of written language. A ...
, resolved these issues. * ISO/IEC 8859-1 or Latin-1 is the most used and also defines the first 256 codepoints in
Unicode Unicode or ''The Unicode Standard'' or TUS is a character encoding standard maintained by the Unicode Consortium designed to support the use of text in all of the world's writing systems that can be digitized. Version 16.0 defines 154,998 Char ...
. * ISO/IEC 8859-15 modifies
ISO-8859-1 ISO/IEC 8859-1:1998, ''Information technology—8-bit computing, 8-bit single-byte coded graphic character (computing), character sets—Part 1: Latin alphabet No. 1'', is part of the ISO/IEC 8859 series of ASCII-based standard character enc ...
to fully support Estonian, Finnish and French and add the
euro sign The euro sign () is the currency sign used for the euro, the official currency of the eurozone. The design was presented to the public by the European Commission on 12 December 1996. It consists of a stylized letter E (or epsilon), crossed by ...
. * Windows-1252 is a superset of
ISO-8859-1 ISO/IEC 8859-1:1998, ''Information technology—8-bit computing, 8-bit single-byte coded graphic character (computing), character sets—Part 1: Latin alphabet No. 1'', is part of the ISO/IEC 8859 series of ASCII-based standard character enc ...
that includes the printable characters from ISO/IEC 8859-15 and popular
punctuation Punctuation marks are marks indicating how a piece of writing, written text should be read (silently or aloud) and, consequently, understood. The oldest known examples of punctuation marks were found in the Mesha Stele from the 9th century BC, c ...
such as curved
quotation mark Quotation marks are punctuation marks used in pairs in various writing systems to identify direct speech, a quotation, or a phrase. The pair consists of an opening quotation mark and a closing quotation mark, which may or may not be the sam ...
s (also known as smart quotes, such as in
Microsoft Word Microsoft Word is a word processor program, word processing program developed by Microsoft. It was first released on October 25, 1983, under the name Multi-Tool Word for Xenix systems. Subsequent versions were later written for several other platf ...
settings and similar programs). It is common that web page tools for
Windows Windows is a Product lining, product line of Proprietary software, proprietary graphical user interface, graphical operating systems developed and marketed by Microsoft. It is grouped into families and subfamilies that cater to particular sec ...
use Windows-1252 but label the
web page A web page (or webpage) is a World Wide Web, Web document that is accessed in a web browser. A website typically consists of many web pages hyperlink, linked together under a common domain name. The term "web page" is therefore a metaphor of pap ...
as using ISO-8859-1, this has been addressed in
HTML5 HTML5 (Hypertext Markup Language 5) is a markup language used for structuring and presenting hypertext documents on the World Wide Web. It was the fifth and final major HTML version that is now a retired World Wide Web Consortium (W3C) recommend ...
, which mandates that pages labeled as ISO-8859-1 must be interpreted as Windows-1252. * IBM CP437, being intended for English only, has very little in the way of accented letters (particularly
uppercase Letter case is the distinction between the letters that are in larger uppercase or capitals (more formally ''#Majuscule, majuscule'') and smaller lowercase (more formally ''#Minuscule, minuscule'') in the written representation of certain langua ...
) but has far more graphics characters than the other IBM
code page In computing, a code page is a character encoding and as such it is a specific association of a set of printable character (computing), characters and control characters with unique numbers. Typically each number represents the binary value in a s ...
s listed here and also some
mathematical Mathematics is a field of study that discovers and organizes methods, Mathematical theory, theories and theorems that are developed and Mathematical proof, proved for the needs of empirical sciences and mathematics itself. There are many ar ...
and Greek characters that are useful as technical
symbol A symbol is a mark, Sign (semiotics), sign, or word that indicates, signifies, or is understood as representing an idea, physical object, object, or wikt:relationship, relationship. Symbols allow people to go beyond what is known or seen by cr ...
s. * IBM CP850 has all the printable characters that
ISO-8859-1 ISO/IEC 8859-1:1998, ''Information technology—8-bit computing, 8-bit single-byte coded graphic character (computing), character sets—Part 1: Latin alphabet No. 1'', is part of the ISO/IEC 8859 series of ASCII-based standard character enc ...
has (albeit arranged differently) and still manages to have enough graphics characters to build a usable text-mode
user interface In the industrial design field of human–computer interaction, a user interface (UI) is the space where interactions between humans and machines occur. The goal of this interaction is to allow effective operation and control of the machine fro ...
. * IBM CP858 differs from CP850 only by one character — a ''dotless i'' ( ı), rarely used outside Turkey and with no
uppercase Letter case is the distinction between the letters that are in larger uppercase or capitals (more formally ''#Majuscule, majuscule'') and smaller lowercase (more formally ''#Minuscule, minuscule'') in the written representation of certain langua ...
equivalent provided, was replaced by ''euro currency sign'' ( ). * IBM CP859 contains all the printable characters that ISO/IEC 8859-15 has, so unlike CP850 it supports the
euro sign The euro sign () is the currency sign used for the euro, the official currency of the eurozone. The design was presented to the public by the European Commission on 12 December 1996. It consists of a stylized letter E (or epsilon), crossed by ...
, Estonian, Finnish and French. * IBM code pages 037, 500, and 1047 are
EBCDIC Extended Binary Coded Decimal Interchange Code (EBCDIC; ) is an eight- bit character encoding used mainly on IBM mainframe and IBM midrange computer operating systems. It descended from the code used with punched cards and the corresponding si ...
encodings that include all of the
ISO-8859-1 ISO/IEC 8859-1:1998, ''Information technology—8-bit computing, 8-bit single-byte coded graphic character (computing), character sets—Part 1: Latin alphabet No. 1'', is part of the ISO/IEC 8859 series of ASCII-based standard character enc ...
characters. * The Mac OS Roman character set (often referred to as MacRoman and known by the IANA as simply MACINTOSH) has most, but not all, of the same characters as ISO/IEC 8859-1 but in a very different arrangement; and it also adds many technical and mathematical characters (though it lacks the important
multiplication sign The multiplication sign (), also known as the times sign or the dimension sign, is a mathematical symbol used to denote the operation of multiplication, which results in a product. The symbol is also used in botany, in botanical hybrid nam ...
, ) and more
diacritic A diacritic (also diacritical mark, diacritical point, diacritical sign, or accent) is a glyph added to a letter or to a basic glyph. The term derives from the Ancient Greek (, "distinguishing"), from (, "to distinguish"). The word ''diacrit ...
s. Older
Macintosh Mac is a brand of personal computers designed and marketed by Apple Inc., Apple since 1984. The name is short for Macintosh (its official name until 1999), a reference to the McIntosh (apple), McIntosh apple. The current product lineup inclu ...
web browser A web browser, often shortened to browser, is an application for accessing websites. When a user requests a web page from a particular website, the browser retrieves its files from a web server and then displays the page on the user's scr ...
s were known to munge the few characters that were in ISO/IEC 8859-1 but not their native
Macintosh Mac is a brand of personal computers designed and marketed by Apple Inc., Apple since 1984. The name is short for Macintosh (its official name until 1999), a reference to the McIntosh (apple), McIntosh apple. The current product lineup inclu ...
character set Character encoding is the process of assigning numbers to graphical characters, especially the written characters of human language, allowing them to be stored, transmitted, and transformed using computers. The numerical values that make up a c ...
when editing text from Web sites. Conversely, in Web material prepared on an older Macintosh, many characters were displayed incorrectly when read by other
operating system An operating system (OS) is system software that manages computer hardware and software resources, and provides common daemon (computing), services for computer programs. Time-sharing operating systems scheduler (computing), schedule tasks for ...
s. The Macintosh Latin encoding, a modification of Mac OS Roman to support ISO/IEC 8859-1, was created by the creators of Kermit (protocol) to solve this problem.


History

The earlier seven- bit U.S. American Standard Code for Information Interchange ('ASCII') encoding has characters sufficient to properly represent only a few languages such as English, Latin, Malay and Swahili. It is missing some letters and letter-diacritic combinations used in other Latin-alphabet languages. However, since there was no other choice on most US-supplied computer platforms, use of ASCII was unavoidable except where there was a strong national computing industry. There was the
ISO 646 ISO/IEC 646 ''Information technology — ISO 7-bit coded character set for information interchange'', is an International Organization for Standardization, ISO/International Electrotechnical Commission, IEC standard in the ...
group of encodings which replaced some of the symbols in ASCII with local characters, but space was very limited, and some of the symbols replaced were quite common in things like programming languages. Most computers internally used eight-bit bytes but communication (seen as inherently unreliable) used seven data bits plus one parity bit. In time, it became common to use all eight bits for data, creating space for another 128 characters. In the early days most of these were system specific, but gradually the
ISO/IEC 8859 ISO/IEC 8859 is a joint International Organization for Standardization, ISO and International Electrotechnical Commission, IEC series of standards for 8-bit character encodings. The series of standards consists of numbered parts, such as ISO/IEC ...
standards emerged to provide some cross-platform similarity to enable information interchange. Towards the end of the 20th century, as storage and memory costs fell, the issues associated with multiple meanings of a given eight-bit code (there are seven ISO-Latin code sets alone) have ceased to be justified. All major operating systems have moved to
Unicode Unicode or ''The Unicode Standard'' or TUS is a character encoding standard maintained by the Unicode Consortium designed to support the use of text in all of the world's writing systems that can be digitized. Version 16.0 defines 154,998 Char ...
as their main internal representation. However, as Windows did not support the
UTF-8 UTF-8 is a character encoding standard used for electronic communication. Defined by the Unicode Standard, the name is derived from ''Unicode Transformation Format 8-bit''. Almost every webpage is transmitted as UTF-8. UTF-8 supports all 1,112,0 ...
method of encoding Unicode (preferring UTF-16), many applications continued to be restricted to these legacy character sets.


The euro sign

The introduction of the euro and its associated
euro sign The euro sign () is the currency sign used for the euro, the official currency of the eurozone. The design was presented to the public by the European Commission on 12 December 1996. It consists of a stylized letter E (or epsilon), crossed by ...
() introduced significant pressure on computer systems developers to support this new symbol, and most 8-bit character sets had to be adapted in some way. * Apple with MacRoman and
Sun Microsystems Sun Microsystems, Inc., often known as Sun for short, was an American technology company that existed from 1982 to 2010 which developed and sold computers, computer components, software, and information technology services. Sun contributed sig ...
with Solaris OS simply replaced the generic currency sign (). This caused difficulty in some places because organisations had found other uses for its
code point A code point, codepoint or code position is a particular position in a Table (database), table, where the position has been assigned a meaning. The table may be one dimensional (a column), two dimensional (like cells in a spreadsheet), three dime ...
, such as the company logo. * ISO introduced a further variant of ISO 8859, ISO 8859-15, which replaced the generic currency sign with the euro sign as well as making some other replacements of symbols with letters with diacritics. ISO 8859-15 never received widespread adoption. * With Windows-1252, Microsoft placed the euro sign in a gap (position 80hex) in the existing C1 control codes, a decision that other vendors considered counter-architectural. Whilst these decisions had limited effect for documents that were only used within a single computer (or at least within a single vendor's " digital ecosystem"), it meant that documents containing a euro sign would fail to render as expected when interchanged between ecosystems. All of these issues have been resolved as operating systems have been upgraded to support
Unicode Unicode or ''The Unicode Standard'' or TUS is a character encoding standard maintained by the Unicode Consortium designed to support the use of text in all of the world's writing systems that can be digitized. Version 16.0 defines 154,998 Char ...
as standard, which encodes the euro sign at U+20AC (decimal 8364).


Comparison table

Code points to U+007F are not shown in this table currently, as they are directly mapped in all character sets listed here. The
ASCII ASCII ( ), an acronym for American Standard Code for Information Interchange, is a character encoding standard for representing a particular set of 95 (English language focused) printable character, printable and 33 control character, control c ...
coding standard defines the original specification for the mapping of the first 0-127 characters. The table is arranged by
Unicode Unicode or ''The Unicode Standard'' or TUS is a character encoding standard maintained by the Unicode Consortium designed to support the use of text in all of the world's writing systems that can be digitized. Version 16.0 defines 154,998 Char ...
code point. Character sets are referred to here by their IANA names in
upper case Letter case is the distinction between the letters that are in larger uppercase or capitals (more formally ''#Majuscule, majuscule'') and smaller lowercase (more formally ''#Minuscule, minuscule'') in the written representation of certain langua ...
. * The mappings for the IBM code pages are from the
Unicode Unicode or ''The Unicode Standard'' or TUS is a character encoding standard maintained by the Unicode Consortium designed to support the use of text in all of the world's writing systems that can be digitized. Version 16.0 defines 154,998 Char ...
site supplied by
Microsoft Microsoft Corporation is an American multinational corporation and technology company, technology conglomerate headquartered in Redmond, Washington. Founded in 1975, the company became influential in the History of personal computers#The ear ...
. The Unicode Consortium's document has links to sources giving the differences between IBM's and Microsoft's mappings for these code pages. * IBM437 and IBM850 defined printable characters for the control code ranges. While these could not be used when printing text through DOS, as they would be trapped before reaching the screen, they could be used by applications that used screen memory directly. * Macintosh has an Apple logo at 0xF0, and translates it to U+F8FF in the Private Use Area for Unicode.


Notes


References

{{DEFAULTSORT:Western Latin Character Sets (Computing) Character sets Articles with unsupported Private Use Area characters History of computing