HOME

TheInfoList



OR:

In
computer programming Computer programming is the process of performing a particular computation (or more generally, accomplishing a specific computing result), usually by designing and building an executable computer program. Programming involves tasks such as anal ...
, a magic number is any of the following: * A unique value with unexplained meaning or multiple occurrences which could (preferably) be replaced with a named constant * A constant numerical or text value used to identify a
file format A file format is a standard way that information is encoded for storage in a computer file. It specifies how bits are used to encode information in a digital storage medium. File formats may be either proprietary or free. Some file format ...
or protocol; for files, see
List of file signatures This is a list of file signatures, data used to identify or verify the content of a file. Such signatures are also known as magic numbers or Magic Bytes. Many file formats are not intended to be read as text. If such a file is accidentally view ...
* A distinctive unique value that is unlikely to be mistaken for other meanings (e.g.,
Globally Unique Identifier A universally unique identifier (UUID) is a 128-bit label used for information in computer systems. The term globally unique identifier (GUID) is also used. When generated according to the standard methods, UUIDs are, for practical purposes, u ...
s)


Unnamed numerical constants

The term ''magic number'' or ''magic constant'' refers to the anti-pattern of using numbers directly in source code. This has been referred to as breaking one of the oldest rules of programming, dating back to the
COBOL COBOL (; an acronym for "common business-oriented language") is a compiled English-like computer programming language designed for business use. It is an imperative, procedural and, since 2002, object-oriented language. COBOL is primarily u ...
, FORTRAN and PL/1 manuals of the 1960s. The use of unnamed magic numbers in code obscures the developers' intent in choosing that number, increases opportunities for subtle errors (e.g. is every digit correct in 3.14159265358979323846 and is this equal to 3.14159?) and makes it more difficult for the program to be adapted and extended in the future. Replacing all significant magic numbers with named constants (also called explanatory variables) makes programs easier to read, understand and maintain. Names chosen to be meaningful in the context of the program can result in code that is more easily understood by a maintainer who is not the original author (or even by the original author after a period of time). An example of an uninformatively-named constant is int SIXTEEN = 16, while int NUMBER_OF_BITS = 16 is more descriptive. The problems associated with magic 'numbers' described above are not limited to numerical types and the term is also applied to other data types where declaring a named constant would be more flexible and communicative. Thus, declaring const string testUserName = "John" is better than several occurrences of the 'magic value' "John" in a test suite. For example, if it is required to randomly shuffle the values in an array representing a standard pack of playing cards, this
pseudocode In computer science, pseudocode is a plain language description of the steps in an algorithm or another system. Pseudocode often uses structural conventions of a normal programming language, but is intended for human reading rather than machine re ...
does the job using the Fisher–Yates shuffle algorithm: for i from 1 to 52 j := i + randomInt(53 - i) - 1 a.swapEntries(i, j) where a is an array object, the function randomInt(x) chooses a random integer between 1 and ''x'', inclusive, and swapEntries(i, j) swaps the ''i''th and ''j''th entries in the array. In the preceding example, 52 is a magic number. It is considered better programming style to write the following: constant ''int'' deckSize := 52 for i from 1 to deckSize j := i + randomInt(deckSize + 1 - i) - 1 a.swapEntries(i, j) This is preferable for several reasons: * It is easier to read and understand. A programmer reading the first example might wonder, ''What does the number 52 mean here? Why 52?'' The programmer might infer the meaning after reading the code carefully, but it is not obvious. Magic numbers become particularly confusing when the same number is used for different purposes in one section of code. * It is easier to alter the value of the number, as it is not duplicated. Changing the value of a magic number is error-prone, because the same value is often used several times in different places within a program. Also, when two semantically distinct variables or numbers have the same value they may be accidentally both edited together. To modify the first example to shuffle a
Tarot The tarot (, first known as '' trionfi'' and later as ''tarocchi'' or ''tarocks'') is a pack of playing cards, used from at least the mid-15th century in various parts of Europe to play card games such as Tarocchini. From their Italian roots ...
deck, which has 78 cards, a programmer might naively replace every instance of 52 in the program with 78. This would cause two problems. First, it would miss the value 53 on the second line of the example, which would cause the algorithm to fail in a subtle way. Second, it would likely replace the characters "52" everywhere, regardless of whether they refer to the deck size or to something else entirely, such as the number of weeks in a Gregorian calendar year, or more insidiously, are part of a number like "1523", all of which would introduce bugs. By contrast, changing the value of the deckSize variable in the second example would be a simple, one-line change. * It encourages and facilitates documentation. The single place where the named variable is declared makes a good place to document what the value means and why it has the value it does. Having the same value in a plethora of places either leads to duplicate comments (and attendant problems when updating some but missing some) or leaves no ''one'' place where it's both natural for the author to explain the value and likely the reader shall look for an explanation. * The declarations of "magic number" variables are placed together, usually at the top of a function or file, facilitating their review and change. * It helps detect typos. Using a variable (instead of a literal) takes advantage of a compiler's checking. Accidentally typing "62" instead of "52" would go undetected, whereas typing "dekSize" instead of "deckSize" would result in the compiler's warning that dekSize is undeclared. * It can reduce typing in some IDEs. If an IDE supports
code completion Autocomplete, or word completion, is a feature in which an application predicts the rest of a word a user is typing. In Android and iOS smartphones, this is called predictive text. In graphical user interfaces, users can typically press the t ...
, it will fill in most of the variable's name from the first few letters. * It facilitates parameterization. For example, to generalize the above example into a procedure that shuffles a deck of any number of cards, it would be sufficient to turn deckSize into a parameter of that procedure, whereas the first example would require several changes. function shuffle (int deckSize) for i from 1 to deckSize j := i + randomInt(deckSize + 1 - i) - 1 a.swapEntries(i, j) Disadvantages are: * When the named constant is not defined near its use, it hurts the locality, and thus comprehensibility, of the code. Putting the 52 in a possibly distant place means that, to understand the workings of the “for” loop completely (for example to estimate the run-time of the loop), one must track down the definition and verify that it is the expected number. This is easy to avoid (by relocating the declaration) when the constant is only used in one portion of the code. When the named constant is used in disparate portions, on the other hand, the remote location is a clue to the reader that the same value appears in other places in the code, which may also be worth looking into. * It may make the code more verbose. The declaration of the constant adds a line. When the constant's name is longer than the value's, particularly if several such constants appear in one line, it may make it necessary to split one logical statement of the code across several lines. An increase in verbosity may be justified when there is some likelihood of confusion about the constant, or when there is a likelihood the constant may need to be changed, such as
reuse Reuse is the action or practice of using an item, whether for its original purpose (conventional reuse) or to fulfill a different function ( creative reuse or repurposing). It should be distinguished from recycling, which is the breaking down of u ...
of a shuffling routine for other card games. It may equally be justified as an increase in expressiveness. * It may be slower to process the expression deckSize + 1 at run-time than the value "53", although most modern compilers and interpreters will notice that deckSize has been declared as a constant and pre-calculate the value 53 in the compiled code. Even when that's not an option, loop optimization will move the addition so that it is performed before the loop. There is therefore usually no (or negligible) speed penalty compared to using magic numbers in code. Especially the cost of debugging and the time needed trying to understand non-explanatory code must be held against the tiny calculation cost.


Accepted uses

In some contexts, the use of unnamed numerical constants is generally accepted (and arguably "not magic"). While such acceptance is subjective, and often depends on individual coding habits, the following are common examples: * the use of 0 and 1 as initial or incremental values in a for loop, such as * the use of 2 to check whether a number is even or odd, as in isEven = (x % 2

0)
, where % is the
modulo In computing, the modulo operation returns the remainder or signed remainder of a division, after one number is divided by another (called the '' modulus'' of the operation). Given two positive numbers and , modulo (often abbreviated as ) is ...
operator * the use of simple arithmetic constants, e.g., in expressions such as circumference = 2 * Math.PI * radius, or for calculating the
discriminant In mathematics, the discriminant of a polynomial is a quantity that depends on the coefficients and allows deducing some properties of the roots without computing them. More precisely, it is a polynomial function of the coefficients of the orig ...
of a
quadratic equation In algebra, a quadratic equation () is any equation that can be rearranged in standard form as ax^2 + bx + c = 0\,, where represents an unknown value, and , , and represent known numbers, where . (If and then the equation is linear, not qu ...
as d = b^2 − 4*a*c * the use of powers of 10 to convert metric values (e.g. between grams and kilograms) or to calculate percentage and
per mille Per mille (from Latin , "in each thousand") is an expression that means parts per thousand. Other recognised spellings include per mil, per mill, permil, permill, or permille. The associated sign is written , which looks like a percent ...
values * exponents in expressions such as (f(x) ** 2 + f(y) ** 2) ** 0.5 for \sqrt The constants 1 and 0 are sometimes used to represent the boolean values True and False in programming languages without a boolean type, such as older versions of C. Most modern programming languages provide a boolean or bool primitive type and so the use of 0 and 1 is ill-advised. This can be more confusing since 0 sometimes means programmatic success (when -1 means failure) and failure in other cases (when 1 means success). In C and C++, 0 represents the
null pointer In computing, a null pointer or null reference is a value saved for indicating that the pointer or reference does not refer to a valid object. Programs routinely use null pointers to represent conditions such as the end of a list of unknown leng ...
. As with boolean values, the C standard library includes a macro definition NULL whose use is encouraged. Other languages provide a specific null or nil value and when this is the case no alternative should be used. The typed pointer constant nullptr has been introduced with C++11.


Format indicators


Origin

Format indicators were first used in early Version 7 Unix source code.
Unix Unix (; trademarked as UNIX) is a family of multitasking, multiuser computer operating systems that derive from the original AT&T Unix, whose development started in 1969 at the Bell Labs research center by Ken Thompson, Dennis Ritchie, ...
was ported to one of the first DEC
PDP-11 The PDP-11 is a series of 16-bit minicomputers sold by Digital Equipment Corporation (DEC) from 1970 into the 1990s, one of a set of products in the Programmed Data Processor (PDP) series. In total, around 600,000 PDP-11s of all models were sol ...
/20s, which did not have memory protection. So early versions of Unix used the relocatable memory reference model. Pre- Sixth Edition Unix versions read an executable file into
memory Memory is the faculty of the mind by which data or information is encoded, stored, and retrieved when needed. It is the retention of information over time for the purpose of influencing future action. If past events could not be remember ...
and jumped to the first low memory address of the program, relative address zero. With the development of paged versions of Unix, a header was created to describe the executable image components. Also, a
branch instruction A branch is an instruction in a computer program that can cause a computer to begin executing a different instruction sequence and thus deviate from its default behavior of executing instructions in order. ''Branch'' (or ''branching'', ''branc ...
was inserted as the first word of the header to skip the header and start the program. In this way a program could be run in the older relocatable memory reference (regular) mode or in paged mode. As more executable formats were developed, new constants were added by incrementing the branch
offset Offset or Off-Set may refer to: Arts, entertainment, and media * "Off-Set", a song by T.I. and Young Thug from the '' Furious 7: Original Motion Picture Soundtrack'' * ''Offset'' (EP), a 2018 EP by singer Kim Chung-ha * ''Offset'' (film), a 200 ...
. In the Sixth Edition
source code In computing, source code, or simply code, is any collection of code, with or without comments, written using a human-readable programming language, usually as plain text. The source code of a program is specially designed to facilitate the ...
of the Unix program loader, the exec() function read the executable ( binary) image from the file system. The first 8
byte The byte is a unit of digital information that most commonly consists of eight bits. Historically, the byte was the number of bits used to encode a single character of text in a computer and for this reason it is the smallest addressable uni ...
s of the file was a header containing the sizes of the program (text) and initialized (global) data areas. Also, the first 16-bit word of the header was compared to two constants to determine if the executable image contained relocatable memory references (normal), the newly implemented paged read-only executable image, or the separated instruction and data paged image. There was no mention of the dual role of the header constant, but the high order byte of the constant was, in fact, the operation code for the PDP-11 branch instruction (
octal The octal numeral system, or oct for short, is the radix, base-8 number system, and uses the Numerical digit, digits 0 to 7. This is to say that 10octal represents eight and 100octal represents sixty-four. However, English, like most languages, ...
000407 or hex 0107). Adding seven to the program counter showed that if this constant was executed, it would branch the Unix exec() service over the executable image eight byte header and start the program. Since the Sixth and Seventh Editions of Unix employed paging code, the dual role of the header constant was hidden. That is, the exec() service read the executable file header (
meta Meta (from the Greek μετά, '' meta'', meaning "after" or "beyond") is a prefix meaning "more comprehensive" or "transcending". In modern nomenclature, ''meta''- can also serve as a prefix meaning self-referential, as a field of study or end ...
) data into a
kernel space A modern computer operating system usually segregates virtual memory into user space and kernel space. Primarily, this separation serves to provide memory protection and hardware protection from malicious or errant software behaviour. Kernel ...
buffer, but read the executable image into user space, thereby not using the constant's branching feature. Magic number creation was implemented in the Unix linker and
loader Loader can refer to: * Loader (equipment) * Loader (computing) ** LOADER.EXE, an auto-start program loader optionally used in the startup process of Microsoft Windows ME * Loader (surname) * Fast loader * Speedloader * Boot loader ** LOADER.COM ...
and magic number branching was probably still used in the suite of stand-alone
diagnostic program A diagnostic program (also known as a Test Mode) is an automatic computer program sequence that determines the operational status within the software, hardware, or any combination thereof in a component, a system, or a network of systems. Diagno ...
s that came with the Sixth and Seventh Editions. Thus, the header constant did provide an illusion and met the criteria for magic. In Version Seven Unix, the header constant was not tested directly, but assigned to a variable labeled ux_mag and subsequently referred to as the magic number. Probably because of its uniqueness, the term magic number came to mean executable format type, then expanded to mean file system type, and expanded again to mean any type of file.


In files

Magic numbers are common in programs across many operating systems. Magic numbers implement
strongly typed In computer programming, one of the many ways that programming languages are colloquially classified is whether the language's type system makes it strongly typed or weakly typed (loosely typed). However, there is no precise technical definition ...
data and are a form of
in-band signaling In telecommunications, in-band signaling is the sending of control information within the same band or channel used for data such as voice or video. This is in contrast to out-of-band signaling which is sent over a different channel, or even o ...
to the controlling program that reads the data type(s) at program run-time. Many files have such constants that identify the contained data. Detecting such constants in files is a simple and effective way of distinguishing between many
file format A file format is a standard way that information is encoded for storage in a computer file. It specifies how bits are used to encode information in a digital storage medium. File formats may be either proprietary or free. Some file format ...
s and can yield further run-time
information Information is an abstract concept that refers to that which has the power to inform. At the most fundamental level information pertains to the interpretation of that which may be sensed. Any natural process that is not completely random, ...
. ;Examples *
Compiled In computing, a compiler is a computer program that translates computer code written in one programming language (the ''source'' language) into another language (the ''target'' language). The name "compiler" is primarily used for programs tha ...
Java class file A Java class file is a file (with the filename extension) containing Java bytecode that can be executed on the Java Virtual Machine (JVM). A Java class file is usually produced by a Java compiler from Java programming language source files ...
s (
bytecode Bytecode (also called portable code or p-code) is a form of instruction set designed for efficient execution by a software interpreter. Unlike human-readable source code, bytecodes are compact numeric codes, constants, and references (norma ...
) and
Mach-O Mach-O, short for Mach object file format, is a file format for executables, object code, shared libraries, dynamically-loaded code, and core dumps. It was developed to replace the a.out format. Mach-O is used by some systems based on the Mac ...
binaries start with hex CAFEBABE. When compressed with Pack200 the bytes are changed to CAFED00D. * GIF image files have the
ASCII ASCII ( ), abbreviated from American Standard Code for Information Interchange, is a character encoding standard for electronic communication. ASCII codes represent text in computers, telecommunications equipment, and other devices. Because ...
code for "GIF89a" (47 49 46 38 39 61) or "GIF87a" (47 49 46 38 37 61) *
JPEG JPEG ( ) is a commonly used method of lossy compression for digital images, particularly for those images produced by digital photography. The degree of compression can be adjusted, allowing a selectable tradeoff between storage size and imag ...
image files begin with FF D8 and end with FF D9. JPEG/ JFIF files contain the
ASCII ASCII ( ), abbreviated from American Standard Code for Information Interchange, is a character encoding standard for electronic communication. ASCII codes represent text in computers, telecommunications equipment, and other devices. Because ...
code for "JFIF" (4A 46 49 46) as a null terminated string. JPEG/ Exif files contain the
ASCII ASCII ( ), abbreviated from American Standard Code for Information Interchange, is a character encoding standard for electronic communication. ASCII codes represent text in computers, telecommunications equipment, and other devices. Because ...
code for "Exif" (45 78 69 66) also as a null terminated string, followed by more
metadata Metadata is "data that provides information about other data", but not the content of the data, such as the text of a message or the image itself. There are many distinct types of metadata, including: * Descriptive metadata – the descriptive ...
about the file. * PNG image files begin with an 8-
byte The byte is a unit of digital information that most commonly consists of eight bits. Historically, the byte was the number of bits used to encode a single character of text in a computer and for this reason it is the smallest addressable uni ...
signature which identifies the file as a PNG file and allows detection of common file transfer problems: \211 P N G \r \n \032 \n (89 50 4E 47 0D 0A 1A 0A). That signature contains various
newline Newline (frequently called line ending, end of line (EOL), next line (NEL) or line break) is a control character or sequence of control characters in character encoding specifications such as ASCII, EBCDIC, Unicode, etc. This character, or ...
characters to permit detecting unwarranted automated newline conversions, such as transferring the file using FTP with the ''ASCII'' transfer mode instead of the ''binary'' mode. * Standard
MIDI MIDI (; Musical Instrument Digital Interface) is a technical standard that describes a communications protocol, digital interface, and electrical connectors that connect a wide variety of electronic musical instruments, computers, and ...
audio files have the
ASCII ASCII ( ), abbreviated from American Standard Code for Information Interchange, is a character encoding standard for electronic communication. ASCII codes represent text in computers, telecommunications equipment, and other devices. Because ...
code for "MThd" (MIDI Track header, 4D 54 68 64) followed by more metadata. *
Unix Unix (; trademarked as UNIX) is a family of multitasking, multiuser computer operating systems that derive from the original AT&T Unix, whose development started in 1969 at the Bell Labs research center by Ken Thompson, Dennis Ritchie, ...
or
Linux Linux ( or ) is a family of open-source Unix-like operating systems based on the Linux kernel, an operating system kernel first released on September 17, 1991, by Linus Torvalds. Linux is typically packaged as a Linux distribution, whi ...
scripts may start with a "shebang" (#!, 23 21) followed by the path to an interpreter, if the interpreter is likely to be different from the one from which the script was invoked. * ELF executables start with 7F E L F *
PostScript PostScript (PS) is a page description language in the electronic publishing and desktop publishing realm. It is a dynamically typed, concatenative programming language. It was created at Adobe Systems by John Warnock, Charles Geschke, Do ...
files and programs start with "%!" (25 21). *
PDF Portable Document Format (PDF), standardized as ISO 32000, is a file format developed by Adobe in 1992 to present documents, including text formatting and images, in a manner independent of application software, hardware, and operating systems. ...
files start with "%PDF" (hex 25 50 44 46). *
DOS MZ executable The DOS MZ executable format is the executable file format used for .EXE files in DOS. The file can be identified by the ASCII string "MZ" ( hexadecimal: 4D 5A) at the beginning of the file (the " magic number"). "MZ" are the initials of Mar ...
files and the EXE stub of the
Microsoft Windows Windows is a group of several proprietary graphical operating system families developed and marketed by Microsoft. Each family caters to a certain sector of the computing industry. For example, Windows NT for consumers, Windows Server for ...
PE (Portable Executable) files start with the characters "MZ" (4D 5A), the initials of the designer of the file format,
Mark Zbikowski Mark "Zibo" Joseph Zbikowski (born March 21, 1956) is a former Microsoft Architect and an early computer hacker. He started working at the company only a few years after its inception, leading efforts in MS-DOS, OS/2, Cairo and Windows NT. In 200 ...
. The definition allows the uncommon "ZM" (5A 4D) as well for dosZMXP, a non-PE EXE. * The Berkeley Fast File System superblock format is identified as either 19 54 01 19 or 01 19 54 depending on version; both represent the birthday of the author,
Marshall Kirk McKusick Marshall Kirk McKusick (born January 19, 1954) is a computer scientist, known for his extensive work on BSD UNIX, from the 1980s to FreeBSD in the present day. He was president of the USENIX Association from 1990 to 1992 and again from 2002 to ...
. * The
Master Boot Record A master boot record (MBR) is a special type of boot sector at the very beginning of partitioned computer mass storage devices like fixed disks or removable drives intended for use with IBM PC-compatible systems and beyond. The concept of MB ...
of bootable storage devices on almost all
IA-32 IA-32 (short for "Intel Architecture, 32-bit", commonly called i386) is the 32-bit version of the x86 instruction set architecture, designed by Intel and first implemented in the 80386 microprocessor in 1985. IA-32 is the first incarnatio ...
IBM PC compatible IBM PC compatible computers are similar to the original IBM PC, XT, and AT, all from computer giant IBM, that are able to use the same software and expansion cards. Such computers were referred to as PC clones, IBM clones or IBM PC clones ...
s has a code of 55 AA as its last two bytes. * Executables for the
Game Boy The is an 8-bit fourth generation handheld game console developed and manufactured by Nintendo. It was first released in Japan on April 21, 1989, in North America later the same year, and in Europe in late 1990. It was designed by the same t ...
and Game Boy Advance handheld video game systems have a 48-byte or 156-byte magic number, respectively, at a fixed spot in the header. This magic number encodes a bitmap of the
Nintendo is a Japanese multinational video game company headquartered in Kyoto, Japan. It develops video games and video game consoles. Nintendo was founded in 1889 as by craftsman Fusajiro Yamauchi and originally produced handmade playing cards ...
logo. *
Amiga Amiga is a family of personal computers introduced by Commodore International, Commodore in 1985. The original model is one of a number of mid-1980s computers with 16- or 32-bit processors, 256 KB or more of RAM, mouse-based GUIs, and sign ...
software executable Hunk files running on Amiga classic 68000 machines all started with the hexadecimal number $000003f3, nicknamed the "Magic Cookie." * In the Amiga, the only absolute address in the system is hex $0000 0004 (memory location 4), which contains the start location called SysBase, a pointer to exec.library, the so-called kernel of Amiga. * PEF files, used by the
classic Mac OS Mac OS (originally System Software; retronym: Classic Mac OS) is the series of operating systems developed for the Macintosh family of personal computers by Apple Computer from 1984 to 2001, starting with System 1 and ending with Mac OS 9. ...
and
BeOS BeOS is an operating system for personal computers first developed by Be Inc. in 1990. It was first written to run on BeBox hardware. BeOS was positioned as a multimedia platform that could be used by a substantial population of desktop users an ...
for
PowerPC PowerPC (with the backronym Performance Optimization With Enhanced RISC – Performance Computing, sometimes abbreviated as PPC) is a reduced instruction set computer (RISC) instruction set architecture (ISA) created by the 1991 Apple– IBM– ...
executables, contain the
ASCII ASCII ( ), abbreviated from American Standard Code for Information Interchange, is a character encoding standard for electronic communication. ASCII codes represent text in computers, telecommunications equipment, and other devices. Because ...
code for "Joy!" (4A 6F 79 21) as a prefix. *
TIFF Tag Image File Format, abbreviated TIFF or TIF, is an image file format for storing raster graphics images, popular among graphic artists, the publishing industry, and photographers. TIFF is widely supported by scanning, faxing, word process ...
files begin with either II or MM followed by 42 as a two-byte integer in little or big endian byte ordering. II is for Intel, which uses little endian byte ordering, so the magic number is 49 49 2A 00. MM is for Motorola, which uses
big endian In computing, endianness, also known as byte sex, is the order or sequence of bytes of a word of digital data in computer memory. Endianness is primarily expressed as big-endian (BE) or little-endian (LE). A big-endian system stores the most s ...
byte ordering, so the magic number is 4D 4D 00 2A. *
Unicode Unicode, formally The Unicode Standard,The formal version reference is is an information technology standard for the consistent encoding, representation, and handling of text expressed in most of the world's writing systems. The standard, ...
text files encoded in UTF-16 often start with the
Byte Order Mark The byte order mark (BOM) is a particular usage of the special Unicode character, , whose appearance as a magic number at the start of a text stream can signal several things to a program reading the text: * The byte order, or endianness, of t ...
to detect
endianness In computing, endianness, also known as byte sex, is the order or sequence of bytes of a word of digital data in computer memory. Endianness is primarily expressed as big-endian (BE) or little-endian (LE). A big-endian system stores the mos ...
(FE FF for big endian and FF FE for little endian). And on
Microsoft Windows Windows is a group of several proprietary graphical operating system families developed and marketed by Microsoft. Each family caters to a certain sector of the computing industry. For example, Windows NT for consumers, Windows Server for ...
,
UTF-8 UTF-8 is a variable-length character encoding used for electronic communication. Defined by the Unicode Standard, the name is derived from ''Unicode'' (or ''Universal Coded Character Set'') ''Transformation Format 8-bit''. UTF-8 is capable of e ...
text files often start with the UTF-8 encoding of the same character, EF BB BF. *
LLVM LLVM is a set of compiler and toolchain technologies that can be used to develop a front end for any programming language and a back end for any instruction set architecture. LLVM is designed around a language-independent intermediate repre ...
Bitcode files start with BC (0x42, 0x43) *
WAD Wad is an old mining term for any black manganese oxide or hydroxide mineral-rich rock in the oxidized zone of various ore deposits. Typically closely associated with various iron oxides. Specific mineral varieties include pyrolusite, lithiophorit ...
files start with IWAD or PWAD (for ''
Doom Doom is another name for damnation. Doom may also refer to: People * Doom (professional wrestling), the tag team of Ron Simmons and Butch Reed * Daniel Doom (born 1934), Belgian cyclist * Debbie Doom (born 1963), American softball pitcher * ...
''), WAD2 (for '' Quake'') and WAD3 (for ''
Half-Life Half-life (symbol ) is the time required for a quantity (of substance) to reduce to half of its initial value. The term is commonly used in nuclear physics to describe how quickly unstable atoms undergo radioactive decay or how long stable ...
''). * Microsoft
Compound File Binary Format Compound File Binary Format (CFBF), also called Compound File, Compound Document format, or Composite Document File V2 (CDF), is a compound document file format for storing numerous files and streams within a single file on a disk. CFBF is develope ...
(mostly known as one of the older formats of
Microsoft Office Microsoft Office, or simply Office, is the former name of a family of client software, server software, and services developed by Microsoft. It was first announced by Bill Gates on August 1, 1988, at COMDEX in Las Vegas. Initially a marketin ...
documents) files start with D0 CF 11 E0, which is visually suggestive of the word "DOCFILE0". * Headers in ZIP files begin with "PK" (50 4B), the initials of
Phil Katz Phillip Walter Katz (November 3, 1962 – April 14, 2000) was a computer programmer best known as the co-creator of the Zip file format for data compression, and the author of PKZIP, a program for creating zip files that ran under DOS. A ...
, author of
DOS DOS is shorthand for the MS-DOS and IBM PC DOS family of operating systems. DOS may also refer to: Computing * Data over signalling (DoS), multiplexing data onto a signalling channel * Denial-of-service attack (DoS), an attack on a communicat ...
compression utility
PKZIP PKZIP is a file archiving computer program, notable for introducing the popular ZIP file format. PKZIP was first introduced for MS-DOS on the IBM-PC compatible platform in 1989. Since then versions have been released for a number of other ...
. * Headers in 7z files begin with "7z" (full magic number: 37 7A BC AF 27 1C). ;Detection The Unix utility program file can read and interpret magic numbers from files, and the file which is used to parse the information is called ''magic''. The Windows utility TrID has a similar purpose.


In protocols

;Examples * The
OSCAR protocol OSCAR (''Open System for CommunicAtion in Realtime'') is AOL's proprietary instant messaging and presence information protocol. It was used by AOL's AIM instant messaging system and ICQ. Despite its name, the specifications for the protocol rem ...
, used in AIM/ ICQ, prefixes requests with 2A. * In the
RFB protocol RFB ("remote framebuffer") is an open simple protocol for remote access to graphical user interfaces. Because it works at the framebuffer level it is applicable to all windowing systems and applications, including Microsoft Windows, macOS and t ...
used by VNC, a client starts its conversation with a server by sending "RFB" (52 46 42, for "Remote Frame Buffer") followed by the client's protocol version number. * In the SMB protocol used by Microsoft Windows, each SMB request or server reply begins with 'FF 53 4D 42', or "\xFFSMB" at the start of the SMB request. * In the MSRPC protocol used by Microsoft Windows, each TCP-based request begins with 05 at the start of the request (representing Microsoft DCE/RPC Version 5), followed immediately by a 00 or 01 for the minor version. In UDP-based MSRPC requests the first byte is always 04. * In
COM Com or COM may refer to: Computing * COM (hardware interface), a serial port interface on IBM PC-compatible computers * COM file, or .com file, short for "command", a file extension for an executable file in MS-DOS * .com, an Internet top-level d ...
and DCOM marshalled interfaces, called OBJREFs, always start with the byte sequence "MEOW" (4D 45 4F 57). Debugging extensions (used for DCOM channel hooking) are prefaced with the byte sequence "MARB" (4D 41 52 42). * Unencrypted
BitTorrent tracker A BitTorrent tracker is a special type of server that assists in the communication between peers using the BitTorrent protocol. In peer-to-peer file sharing, a software client on an end-user PC requests a file, and portions of the requested fi ...
requests begin with a single byte containing the value 19 representing the header length, followed immediately by the phrase "BitTorrent protocol" at byte position 1. * eDonkey2000/
eMule eMule is a free peer-to-peer file sharing application for Microsoft Windows. Started in May 2002 as an alternative to eDonkey2000, eMule now connects to both the eDonkey network and the Kad network. The distinguishing features of eMule are th ...
traffic begins with a single byte representing the client version. Currently E3 represents an eDonkey client, C5 represents eMule, and D4 represents compressed eMule. * The first 04 bytes of a block in the
Bitcoin Bitcoin (abbreviation: BTC; sign: ₿) is a decentralized digital currency that can be transferred on the peer-to-peer bitcoin network. Bitcoin transactions are verified by network nodes through cryptography and recorded in a public distr ...
Blockchain contains a magic number which serves as the network identifier. The value is a constant 0xD9B4BEF9, which indicates the main network, while the constant 0xDAB5BFFA indicates the testnet. *
SSL SSL may refer to: Entertainment * RoboCup Small Size League, robotics football competition * ''Sesame Street Live'', a touring version of the children's television show * StarCraft II StarLeague, a Korean league in the video game Natural language ...
transactions always begin with a "client hello" message. The record encapsulation scheme used to prefix all SSL packets consists of two- and three- byte header forms. Typically an SSL version 2 client hello message is prefixed with a 80 and an SSLv3 server response to a client hello begins with 16 (though this may vary). * DHCP packets use a "magic cookie" value of '0x63 0x82 0x53 0x63' at the start of the options section of the packet. This value is included in all DHCP packet types. * HTTP/2 connections are opened with the preface '0x505249202a20485454502f322e300d0a0d0a534d0d0a0d0a', or "PRI * HTTP/2.0\r\n\r\nSM\r\n\r\n". The preface is designed to avoid the processing of frames by servers and intermediaries which support earlier versions of HTTP but not 2.0.


In interfaces

Magic numbers are common in
API function An application programming interface (API) is a way for two or more computer programs to communicate with each other. It is a type of software Interface (computing), interface, offering a service to other pieces of software. A document or standa ...
s and interfaces across many
operating system An operating system (OS) is system software that manages computer hardware, software resources, and provides common daemon (computing), services for computer programs. Time-sharing operating systems scheduler (computing), schedule tasks for ef ...
s, including
DOS DOS is shorthand for the MS-DOS and IBM PC DOS family of operating systems. DOS may also refer to: Computing * Data over signalling (DoS), multiplexing data onto a signalling channel * Denial-of-service attack (DoS), an attack on a communicat ...
,
Windows Windows is a group of several proprietary graphical operating system families developed and marketed by Microsoft. Each family caters to a certain sector of the computing industry. For example, Windows NT for consumers, Windows Server for se ...
and
NetWare NetWare is a discontinued computer network operating system developed by Novell, Inc. It initially used cooperative multitasking to run various services on a personal computer, using the IPX network protocol. The original NetWare product in ...
: ;Examples *
IBM PC The IBM Personal Computer (model 5150, commonly known as the IBM PC) is the first microcomputer released in the IBM PC model line and the basis for the IBM PC compatible de facto standard. Released on August 12, 1981, it was created by a team ...
-compatible
BIOS In computing, BIOS (, ; Basic Input/Output System, also known as the System BIOS, ROM BIOS, BIOS ROM or PC BIOS) is firmware used to provide runtime services for operating systems and programs and to perform hardware initialization during the b ...
es use magic values 0000 and 1234 to decide if the system should count up memory or not on reboot, thereby performing a cold or a warm boot. Theses values are also used by
EMM386 EMM386 is the expanded memory manager of Microsoft's MS-DOS, IBM's PC DOS, Digital Research's DR-DOS, and Datalight's ROM-DOS which is used to create expanded memory using extended memory on Intel 80386 CPUs. There also is an EMM386.EXE availabl ...
memory managers intercepting boot requests. BIOSes also use magic values 55 AA to determine if a disk is bootable. * The
MS-DOS MS-DOS ( ; acronym for Microsoft Disk Operating System, also known as Microsoft DOS) is an operating system for x86-based personal computers mostly developed by Microsoft. Collectively, MS-DOS, its rebranding as IBM PC DOS, and a few o ...
disk cache SMARTDRV (codenamed "Bambi") uses magic values BABE and EBAB in API functions. * Many DR DOS, Novell DOS and
OpenDOS DR-DOS (written as DR DOS, without a hyphen, in versions up to and including 6.0) is a disk operating system for IBM PC compatibles. Upon its introduction in 1988, it was the first DOS attempting to be compatible with IBM PC DOS and MS- ...
drivers developed in the former ''European Development Centre'' in the UK use the value 0EDC as magic token when invoking or providing additional functionality sitting on top of the (emulated) standard DOS functions, NWCACHE being one example.


Other uses

;Examples * The default
MAC address A media access control address (MAC address) is a unique identifier assigned to a network interface controller (NIC) for use as a network address in communications within a network segment. This use is common in most IEEE 802 networking te ...
on Texas Instruments
SOCs SOCS (suppressor of cytokine signaling proteins) refers to a family of genes involved in inhibiting the JAK-STAT signaling pathway. Genes * CISH * SOCS1 * SOCS2 * SOCS3 * SOCS4 * SOCS5 * SOCS6 * SOCS7 Suppressor of cytokine signaling 7 is a ...
is DE:AD:BE:EF:00:00.


Data type limits

This is a list of limits of data storage types:


GUIDs

It is possible to create or alter
globally unique identifier A universally unique identifier (UUID) is a 128-bit label used for information in computer systems. The term globally unique identifier (GUID) is also used. When generated according to the standard methods, UUIDs are, for practical purposes, u ...
s (GUIDs) so that they are memorable, but this is highly discouraged as it compromises their strength as near-unique identifiers. The specifications for generating GUIDs and UUIDs are quite complex, which is what leads to them being virtually unique, if properly implemented. . Microsoft Windows product ID numbers for
Microsoft Office Microsoft Office, or simply Office, is the former name of a family of client software, server software, and services developed by Microsoft. It was first announced by Bill Gates on August 1, 1988, at COMDEX in Las Vegas. Initially a marketin ...
products sometimes end with 0000-0000-0000000FF1CE ("OFFICE"), such as , the product ID for the "Office 16 Click-to-Run Extensibility Component". Java uses several GUIDs starting with CAFEEFAC. In the
GUID Partition Table The GUID Partition Table (GPT) is a standard for the layout of partition tables of a physical computer storage device, such as a hard disk drive or solid-state drive, using universally unique identifiers, which are also known as globally uniqu ...
of the GPT partitioning scheme, BIOS Boot partitions use the special GUID which does not follow the GUID definition; instead, it is formed by using the
ASCII ASCII ( ), abbreviated from American Standard Code for Information Interchange, is a character encoding standard for electronic communication. ASCII codes represent text in computers, telecommunications equipment, and other devices. Because ...
codes for the string "Hah!IdontNeedEFI" partially in little endian order.


Debug values

Magic debug values are specific values written to
memory Memory is the faculty of the mind by which data or information is encoded, stored, and retrieved when needed. It is the retention of information over time for the purpose of influencing future action. If past events could not be remember ...
during allocation or deallocation, so that it will later be possible to tell whether or not they have become corrupted, and to make it obvious when values taken from uninitialized memory are being used. Memory is usually viewed in hexadecimal, so memorable repeating or hexspeak values are common. Numerically odd values may be preferred so that processors without byte addressing will fault when attempting to use them as pointers (which must fall at even addresses). Values should be chosen that are away from likely addresses (the program code, static data, heap data, or the stack). Similarly, they may be chosen so that they are not valid codes in the instruction set for the given architecture. Since it is very unlikely, although possible, that a 32-bit integer would take this specific value, the appearance of such a number in a debugger or memory dump most likely indicates an error such as a buffer overflow or an uninitialized variable. Famous and common examples include: Most of these are each 32 bits longthe
word size In computing, a word is the natural unit of data used by a particular processor design. A word is a fixed-sized datum handled as a unit by the instruction set or the hardware of the processor. The number of bits or digits in a word (the ''word s ...
of most 32-bit architecture computers. The prevalence of these values in Microsoft technology is no coincidence; they are discussed in detail in Steve Maguire's book ''Writing Solid Code'' from
Microsoft Press Microsoft Press is the publishing arm of Microsoft, usually releasing books dealing with various current Microsoft technologies. Microsoft Press' first introduced books were ''The Apple Macintosh Book'' by Cary Lu and ''Exploring the IBM PCjr H ...
. He gives a variety of criteria for these values, such as: * They should not be useful; that is, most algorithms that operate on them should be expected to do something unusual. Numbers like zero don't fit this criterion. * They should be easily recognized by the programmer as invalid values in the debugger. * On machines that don't have
byte alignment Data structure alignment is the way data is arranged and accessed in computer memory. It consists of three separate but related issues: data alignment, data structure padding, and packing. The CPU in modern computer hardware performs reads and ...
, they should be
odd number In mathematics, parity is the property of an integer of whether it is even or odd. An integer is even if it is a multiple of two, and odd if it is not.. For example, −4, 0, 82 are even because \begin -2 \cdot 2 &= -4 \\ 0 \cdot 2 &= 0 \\ 41 ...
s, so that dereferencing them as addresses causes an exception. * They should cause an exception, or perhaps even a debugger break, if executed as code. Since they were often used to mark areas of memory that were essentially empty, some of these terms came to be used in phrases meaning "gone, aborted, flushed from memory"; e.g. "Your program is DEADBEEF".


See also

* Magic string * *
List of file signatures This is a list of file signatures, data used to identify or verify the content of a file. Such signatures are also known as magic numbers or Magic Bytes. Many file formats are not intended to be read as text. If such a file is accidentally view ...
* FourCC * Hard coding *
Magic (programming) In the context of computer programming, magic is an informal term for abstraction; it is used to describe code that handles complex tasks while hiding that complexity to present a simple interface. The term is somewhat tongue-in-cheek, and ofte ...
* NaN (Not a Number) *
Enumerated type In computer programming, an enumerated type (also called enumeration, enum, or factor in the R programming language, and a categorical variable in statistics) is a data type consisting of a set of named values called ''elements'', ''members'', '' ...
* Hexspeak, for another set of magic values * Nothing up my sleeve number about magic constants in
cryptographic Cryptography, or cryptology (from grc, , translit=kryptós "hidden, secret"; and ''graphein'', "to write", or '' -logia'', "study", respectively), is the practice and study of techniques for secure communication in the presence of adv ...
algorithms * Time formatting and storage bugs, for problems that can be caused by magics * Sentinel value (aka flag value, trip value, rogue value, signal value, dummy data) *
Canary value Canary originally referred to the island of Gran Canaria on the west coast of Africa, and the group of surrounding islands (the Canary Islands). It may also refer to: Animals Birds * Canaries, birds in the genera '' Serinus'' and '' Crithagra'' ...
, special value to detect buffer overflows * XYZZY (magic word) * Fast inverse square root, using the constant 0x5F3759DF


References

{{DEFAULTSORT:Magic number (programming) Anti-patterns Debugging Computer programming folklore Software engineering folklore