"Bush hid the facts" is a common name for a
bug present in Microsoft Windows which causes text encoded in ASCII to be interpreted as if it were UTF-16LE, resulting in garbled text. When the string "Bush hid the facts", without quotes, was put in a Notepad document and saved, closed, and reopened, the nonsensical sequence of Chinese characters "畂桳栠摩琠敨映捡獴" would appear instead.
While "Bush hid the facts" is the sentence most commonly presented to induce the error, the bug can also be triggered by other strings.
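The misinterpretation itself can be reproduced outside Notepad. The following Python sketch is for illustration only and does not use any Windows API; the function name `misread_as_utf16le` is hypothetical. It encodes the sentence as ASCII and then decodes the same bytes as UTF-16LE, so that each pair of adjacent bytes is combined into one 16-bit code unit.

```python
def misread_as_utf16le(text: str) -> str:
    """Encode text as ASCII, then decode the same bytes as UTF-16LE."""
    raw = text.encode("ascii")        # one byte per character
    return raw.decode("utf-16-le")    # two bytes per code unit, low byte first


sentence = "Bush hid the facts"       # 18 bytes, an even number
garbled = misread_as_utf16le(sentence)

print(garbled)
# Show the code point produced by each byte pair, e.g. "Bu" = 42 75 -> U+7542.
print([f"U+{ord(ch):04X}" for ch in garbled])

# The bytes are unchanged, so the original text can be recovered.
recovered = garbled.encode("utf-16-le").decode("ascii")
print(recovered == sentence)
```

Because only the interpretation of the bytes changes, no data is lost, and reading the file with the correct encoding restores the original sentence.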
Cause
When a text file is opened in Notepad, Windows checks whether the text is encoded in UTF-16 using the Win32 charset detection function IsTextUnicode. IsTextUnicode guesses that the text is Unicode if the total change in the "low bytes" (the even indexes, starting at 0) is more than three times the total change in the "high bytes" (the odd indexes). If so, it returns true, causing the application to incorrectly interpret the text as UTF-16LE. As a result, Notepad renders the text as Chinese characters. It is commonly believed that spaces at even indexes trigger the bug; this is because the space (32) is farther from the lowercase letters (97 to 122) than the letters are from each other.
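The statistic described above can be approximated in a few lines of Python. This is a simplified sketch, not the Windows implementation: the function names, the choice of the absolute difference between consecutive bytes as the measure of "change", and the `weight` parameter are all assumptions made for illustration.

```python
def byte_change_totals(data: bytes) -> tuple[int, int]:
    """Return the total change in the low (even-index) and high (odd-index) bytes."""
    low = data[0::2]    # bytes at indexes 0, 2, 4, ...
    high = data[1::2]   # bytes at indexes 1, 3, 5, ...
    low_total = sum(abs(a - b) for a, b in zip(low, low[1:]))
    high_total = sum(abs(a - b) for a, b in zip(high, high[1:]))
    return low_total, high_total


def guesses_utf16le(data: bytes, weight: int = 3) -> bool:
    """Guess UTF-16LE when the low bytes vary more than `weight` times the high bytes."""
    low_total, high_total = byte_change_totals(data)
    return low_total > weight * high_total


for sample in (b"Bush hid the facts", b"Hello, world"):
    print(sample, byte_change_totals(sample), guesses_utf16le(sample))
```

In "Bush hid the facts", all three spaces fall on even indexes, so the low bytes repeatedly jump between 32 and values near 100 while the high bytes stay within the lowercase range. Under this sketch the low-byte total is 506 against a high-byte total of 68, which passes the threshold.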
The bug had existed since IsTextUnicode was introduced in 1994, but was not discovered until early 2004. Many text editors and tools exhibit this behavior on Windows because they use IsTextUnicode to determine the encoding of text files. In Windows Vista, Notepad was modified to use a different detection algorithm that does not exhibit the bug, but IsTextUnicode remains unchanged, so any other tools that use it are still affected. Modern documentation states "These tests are not foolproof."
Workarounds
Several workarounds exist for this bug:
*Add a character so the string is an odd number of bytes long.
*Save the file as "UTF-8" (before 2018) or "UTF-8 with BOM" (after 2018) rather than "ANSI". This prepends a UTF-8 byte order mark, which avoids the bug. UTF-8 ''without'' the byte order mark would still trigger the bug, as it is identical to the "ANSI" file.
*Save the file as "Unicode", which in Microsoft Windows means UTF-16LE. When loading this text, IsTextUnicode should (and does) return true, and the text is displayed correctly.
*To retrieve the original text using Notepad, bring up the "Open a file" dialog box, select the file, select "ANSI" or "UTF-8" in the "Encoding" list box, and click Open. Under Windows 2000, Notepad lacks the "Encoding" list box.
WordPad appears to load the text correctly without choosing the encoding, since it uses its own encoding detection.
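The byte order mark and odd-length workarounds listed above can be sketched in Python. The helper names `save_with_bom`, `has_utf8_bom`, and `pad_to_odd_length` are hypothetical, and the sketch only shows what the saved bytes look like; how a given editor reacts to them is not modelled here.

```python
import codecs


def save_with_bom(text: str) -> bytes:
    """Encode text as UTF-8 with a leading byte order mark (EF BB BF)."""
    return text.encode("utf-8-sig")


def has_utf8_bom(data: bytes) -> bool:
    """Report whether the data starts with the UTF-8 byte order mark."""
    return data.startswith(codecs.BOM_UTF8)


def pad_to_odd_length(data: bytes, filler: bytes = b"\n") -> bytes:
    """Append one filler byte if the length is even, so the data cannot be whole 16-bit units."""
    return data if len(data) % 2 else data + filler


sample = "Bush hid the facts"
plain = sample.encode("ascii")       # the "ANSI" bytes, 18 of them
marked = save_with_bom(sample)       # the same bytes preceded by the 3-byte mark
padded = pad_to_odd_length(plain)    # 19 bytes

print(has_utf8_bom(plain), has_utf8_bom(marked), len(padded))
```

For text containing only ASCII characters, the UTF-8 encoding without the mark is byte-for-byte the same as the "ANSI" encoding, which is why only the variant with the mark makes a difference.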
External links
*The Notepad file encoding problem, redux – Raymond Chen
*IsTextUnicode – Microsoft Docs
*Censor oracle – A tool to identify strings that might trigger the bug (source code on GitHub)