Windows-1252 or CP-1252 (Windows code page 1252) is a legacy single-byte character encoding that is used by default (as the "ANSI code page") in Microsoft Windows throughout the Americas, Western Europe, Oceania, and much of Africa. Initially the same as ISO 8859-1, it began to diverge starting in Windows 2.0 by adding characters in the 0x80 to 0x9F (hex) range, which the ISO standards reserve for C1 control codes. Notable additions include curly quotation marks and all printable characters from ISO 8859-15. It is the most-used single-byte character encoding in the world. Although almost all websites now use the multi-byte character encoding UTF-8, 1.1% of websites declared ISO 8859-1, which is treated as Windows-1252 by all modern browsers (as required by the HTML5 standard), and a further 0.3% declared Windows-1252 directly, for a total of 1.4%. Some countries and languages show higher usage than the global average: in 2025, the share of websites was 2.9% in Brazil and 2.4% in Germany (each figure is the sum of ISO-8859-1 and CP-1252 declarations).


Name

It is known to Windows by the code page number 1252, and by the IANA-approved name "windows-1252". Historically, the phrase "ANSI Code Page" was used in Windows to refer to non-DOS encodings; the intention was that most of these would be ANSI standards such as ISO-8859-1. Even though Windows-1252 was the first and by far the most popular code page named so in Microsoft Windows parlance, the code page has never been an ANSI standard. Microsoft explains, "The term ANSI as used to signify Windows code pages is a historical reference, but is nowadays a misnomer that continues to persist in the Windows community."

LaTeX can input Windows-1252 by using ''inputenc.sty'' with parameter ''ansinew'' (and more recently ''cp1252''). IBM uses code page 1252 (CCSID 1252 and euro sign extended CCSID 5348) for Windows-1252. It is called "WE8MSWIN1252" by Oracle Database.
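As a small illustration of how one encoding travels under several names, the sketch below asks Python's codec registry to resolve a few labels. It covers only Python's own alias table, which is an assumption of this example; the LaTeX, IBM, and Oracle names above belong to those systems and are not tested here.

```python
import codecs

# Labels that Python's codec registry resolves to the same codec.
# Lookup is case-insensitive and treats hyphens like underscores.
labels = ("windows-1252", "Windows-1252", "cp1252", "1252")

resolved = {label: codecs.lookup(label).name for label in labels}
for label, name in resolved.items():
    print(f"{label!r} -> {name!r}")
```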


History

* The first version of the codepage was used in Microsoft Windows 1.0. It matched the ISO-8859-1 standard (including leaving code points 0xD7 and 0xF7 undefined, as they were not in the standard at that time).
* The second version of the codepage was introduced in Microsoft Windows 2.0. In this version, code points 0xD7, 0xF7, 0x91, and 0x92 are defined.
* The third version of the codepage was introduced in Microsoft Windows 3.1. It defined all code points used in the final version except the euro sign and the Z with caron character pair.
* The final version (shown below) was introduced in Microsoft Windows 98.

Starting in the 1990s, many Microsoft products that could produce HTML included Windows-1252-exclusive characters, but marked the encoding as ISO-8859-1 or ASCII, or left it undeclared. Characters exclusive to Windows-1252 would render incorrectly on non-Windows operating systems (often as question marks). In particular, typographers' quotes (curly variants of the standard straight apostrophes and quotation marks in US-ASCII) were commonly used in files produced in Windows applications such as Microsoft Word due to the smart quotes feature, which can automatically convert straight apostrophes and quotation marks to the curly variants. To fix this, by 2000 most web browsers and e-mail clients treated the charsets ISO-8859-1 and US-ASCII as Windows-1252; this behavior is now required by the HTML5 specification. Undeclared charsets in HTML are also assumed to be Windows-1252.

Although Windows NT supported Unicode and attempted to encourage programs to use it, it only provided the 16-bit code units of UCS-2/UTF-16, despite the existing support for other multibyte character encodings such as Shift-JIS. As many applications preferred to use 8-bit strings, Windows-1252 remained the most popular encoding on Windows. UTF-8 has been supported since Windows 10, so this is gradually changing.


Codepage layout

The following table shows Windows-1252. Differences from ISO-8859-1 have the Unicode code point number below the character, based on the Unicode.org mapping of Windows-1252 with "best fit". A tooltip, generally available only when one points to the immediate right of the character, shows the Unicode code point name and the decimal Alt code. According to the information on Microsoft's and the Unicode Consortium's websites, positions 81, 8D, 8F, 90, and 9D are unused; however, the Windows API MultiByteToWideChar maps these to the corresponding C1 control codes. The "best fit" mapping documents this behavior, too.
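The two treatments of the five unused positions can be compared in Python. This is a sketch under the assumption that Python's `cp1252` codec follows the strict mapping (it rejects the unused bytes); `decode_bestfit` is a hypothetical helper that imitates the C1 fallback described above and is not the Windows API itself.

```python
UNUSED = (0x81, 0x8D, 0x8F, 0x90, 0x9D)

def strict_unassigned() -> list[int]:
    """Return the byte values that Python's strict cp1252 codec rejects."""
    rejected = []
    for value in range(256):
        try:
            bytes([value]).decode("cp1252")
        except UnicodeDecodeError:
            rejected.append(value)
    return rejected

def decode_bestfit(data: bytes) -> str:
    """Decode as Windows-1252, mapping unused bytes to same-numbered C1 controls."""
    chars = []
    for value in data:
        if value in UNUSED:
            chars.append(chr(value))  # e.g. 0x81 -> U+0081
        else:
            chars.append(bytes([value]).decode("cp1252"))
    return "".join(chars)

print([hex(v) for v in strict_unassigned()])
print(repr(decode_bestfit(b"\x80\x81")))  # euro sign followed by U+0081
```

A decoder that must never fail on arbitrary input may prefer the fallback behaviour, while a validator that wants to flag suspicious bytes may prefer the strict one.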


Related encodings


OS/2 extensions

The OS/2 operating system supports an encoding by the name of Code page 1004 (CCSID 1004) or "Windows Extended". This mostly matches code page 1252, with the exception of certain C0 control characters being replaced by diacritic characters.


MS-DOS extensions (rare)

There is a rarely used but useful graphics-extended variant of code page 1252 in which codes 0x00 to 0x1F provide box-drawing characters, as used in applications such as MS-DOS Edit and CodeView. One of the applications to use this code page was an Intel Corporation Install/Recovery disk image utility from mid to late 1995. These programs were written for Intel's P6 User Test Program machines (US example). It was used exclusively in Intel's then EMEA region (Europe, Middle East and Africa). In time the programs were changed to use code page 850.


See also

* Latin script in Unicode
* Unicode
* Universal Coded Character Set
** European Unicode subset (DIN 91379)
* UTF-8
* Western Latin character sets (computing)
* Windows-1250
* Windows code pages
* ISO/IEC JTC 1/SC 2
* Extended ASCII


External links


* Microsoft's code charts for Windows-1252 ("Code Page 1252 Windows Latin 1 (ANSI)")
* Unicode mapping table and code page definition with best fit mappings for Windows-1252