Big-endian
   HOME

TheInfoList



OR:

In
computing Computing is any goal-oriented activity requiring, benefiting from, or creating computing machinery. It includes the study and experimentation of algorithmic processes, and development of both hardware and software. Computing has scientific, ...
, endianness, also known as byte sex, is the order or sequence of
byte The byte is a unit of digital information that most commonly consists of eight bits. Historically, the byte was the number of bits used to encode a single character of text in a computer and for this reason it is the smallest addressable uni ...
s of a
word A word is a basic element of language that carries an objective or practical meaning, can be used on its own, and is uninterruptible. Despite the fact that language speakers often have an intuitive grasp of what a word is, there is no conse ...
of digital data in
computer memory In computing, memory is a device or system that is used to store information for immediate use in a computer or related computer hardware and digital electronic devices. The term ''memory'' is often synonymous with the term '' primary storag ...
. Endianness is primarily expressed as big-endian (BE) or little-endian (LE). A big-endian system stores the
most significant byte In computing, bit numbering is the convention used to identify the bit positions in a binary number. Bit significance and indexing In computing, the least significant bit (LSB) is the bit position in a binary integer representing the binary ...
of a word at the smallest
memory address In computing, a memory address is a reference to a specific memory location used at various levels by software and hardware. Memory addresses are fixed-length sequences of digits conventionally displayed and manipulated as unsigned integers. ...
and the least significant byte at the largest. A little-endian system, in contrast, stores the least-significant byte at the smallest address. Bi-endianness is a feature supported by numerous computer architectures that feature switchable endianness in data fetches and stores or for instruction fetches. Other orderings are generically called middle-endian or mixed-endian. Endianness may also be used to describe the order in which the
bit The bit is the most basic unit of information in computing and digital communications. The name is a portmanteau of binary digit. The bit represents a logical state with one of two possible values. These values are most commonly represente ...
s are transmitted over a communication channel, e.g., big-endian in a communications channel transmits the most significant bits first. Bit-endianness is seldom used in other contexts.


Etymology

Danny Cohen introduced the terms ''big-endian'' and ''little-endian'' into computer science for data ordering in an
Internet Experiment Note An Internet Experiment Note (IEN) is a sequentially numbered document in a series of technical publications issued by the participants of the early development work groups that created the precursors of the modern Internet. After DARPA began the ...
published in 1980. Also published at '' IEEE Computer''
October 1981 issue
The adjective ''endian'' has its origin in the writings of 18th century Anglo-Irish writer
Jonathan Swift Jonathan Swift (30 November 1667 – 19 October 1745) was an Anglo-Irish satirist, author, essayist, political pamphleteer (first for the Whigs, then for the Tories), poet, and Anglican cleric who became Dean of St Patrick's Cathedral, Dubl ...
. In the 1726 novel '' Gulliver's Travels'', he portrays the conflict between sects of Lilliputians divided into those breaking the shell of a
boiled egg Boiled eggs are eggs, typically from a chicken, cooked with their shells unbroken, usually by immersion in boiling water. Hard-boiled eggs are cooked so that the egg white and egg yolk both solidify, while soft-boiled eggs may leave the yolk, ...
from the big end or from the little end. He called them the ''Big-Endians'' and the ''Little-Endians''. Cohen makes the connection to ''Gulliver's Travels'' explicit in the appendix to his 1980 note.


Overview

Computers store information in various-sized groups of binary bits. Each group is assigned a number, called its ''address'', that the computer uses to access that data. On most modern computers, the smallest data group with an address is eight bits long and is called a byte. Larger groups comprise two or more bytes, for example, a 32-bit word contains four bytes. There are two possible ways a computer could number the individual bytes in a larger group, starting at either end. Both types of endianness are in widespread use in digital electronic engineering. The initial choice of endianness of a new design is often arbitrary, but later technology revisions and updates perpetuate the existing endianness to maintain
backward compatibility Backward compatibility (sometimes known as backwards compatibility) is a property of an operating system, product, or technology that allows for interoperability with an older legacy system, or with input designed for such a system, especiall ...
. Internally, any given computer will work equally well regardless of what endianness it uses since its hardware will consistently use the same endianness to both store and load its data. For this reason, programmers and computer users normally ignore the endianness of the computer they are working with. However, endianness can become an issue when moving data external to the computer – as when transmitting data between different computers, or a programmer investigating internal computer bytes of data from a
memory dump In computing, a core dump, memory dump, crash dump, storage dump, system dump, or ABEND dump consists of the recorded state of the working memory of a computer program at a specific time, generally when the program has crashed or otherwise termina ...
– and the endianness used differs from expectation. In these cases, the endianness of the data must be understood and accounted for. These two diagrams show how two computers using different endianness store a 32-bit (four byte) integer with the value of . In both cases, the integer is broken into four bytes, , , , and , and the bytes are stored in four sequential byte locations in memory, starting with the memory location with address ''a'', then ''a + 1'', ''a + 2'', and ''a + 3''. The difference between big and little endian is the order of the four bytes of the integer being stored. The left-side diagram shows a computer using big-endian. This starts the storing of the integer with the ''most''-significant byte, , at address ''a'', and ends with the ''least''-significant byte, , at address ''a + 3''. The right-side diagram shows a computer using little-endian. This starts the storing of the integer with the ''least''-significant byte, , at address ''a'', and ends with the ''most''-significant byte, , at address ''a + 3''. Since each computer uses its same endianness to both store and retrieve the integer, the results will be the same for both computers. Issues may arise when memory is addressed by bytes instead of integers, or when memory contents are transmitted between computers with different endianness. Big-endianness is the dominant ordering in networking protocols, such as in the
internet protocol suite The Internet protocol suite, commonly known as TCP/IP, is a framework for organizing the set of communication protocols used in the Internet and similar computer networks according to functional criteria. The foundational protocols in the sui ...
, where it is referred to as network order, transmitting the most significant byte first. Conversely, little-endianness is the dominant ordering for processor architectures (
x86 x86 (also known as 80x86 or the 8086 family) is a family of complex instruction set computer (CISC) instruction set architectures initially developed by Intel based on the Intel 8086 microprocessor and its 8088 variant. The 8086 was intr ...
, most
ARM In human anatomy, the arm refers to the upper limb in common usage, although academically the term specifically means the upper arm between the glenohumeral joint (shoulder joint) and the elbow joint. The distal part of the upper limb between th ...
implementations, base
RISC-V RISC-V (pronounced "risk-five" where five refers to the number of generations of RISC architecture that were developed at the University of California, Berkeley since 1981) is an open standard instruction set architecture (ISA) based on estab ...
implementations) and their associated memory. File formats can use either ordering; some formats use a mixture of both or contain an indicator of which ordering is used throughout the file. The styles of little- and big-endian may also be used more generally to characterize the ordering of any representation, e.g. the digits in a
numeral system A numeral system (or system of numeration) is a writing system for expressing numbers; that is, a mathematical notation for representing numbers of a given set, using digits or other symbols in a consistent manner. The same sequence of symbo ...
or the sections of a
date Date or dates may refer to: *Date (fruit), the fruit of the date palm (''Phoenix dactylifera'') Social activity *Dating, a form of courtship involving social activity, with the aim of assessing a potential partner ** Group dating *Play date, a ...
. Numbers in positional notation are generally written with their digits in left-to-right big-endian order, even in right-to-left scripts. Similarly, programming languages use big-endian digit ordering for numeric literals.


Basics

Computer memory In computing, memory is a device or system that is used to store information for immediate use in a computer or related computer hardware and digital electronic devices. The term ''memory'' is often synonymous with the term '' primary storag ...
consists of a sequence of storage cells (smallest addressable units); in machines that support
byte addressing Byte addressing in hardware architectures supports accessing individual bytes. Computers with byte addressing are sometimes called ''byte machines,'' in contrast to '' word-addressable'' architectures, ''word machines'', that access data by word ...
, those units are called ''
byte The byte is a unit of digital information that most commonly consists of eight bits. Historically, the byte was the number of bits used to encode a single character of text in a computer and for this reason it is the smallest addressable uni ...
s''. Each byte is identified and accessed in hardware and software by its
memory address In computing, a memory address is a reference to a specific memory location used at various levels by software and hardware. Memory addresses are fixed-length sequences of digits conventionally displayed and manipulated as unsigned integers. ...
. If the total number of bytes in memory is ''n'', then addresses are enumerated from 0 to ''n'' − 1. Computer programs often use data structures or
fields Fields may refer to: Music * Fields (band), an indie rock band formed in 2006 * Fields (progressive rock band), a progressive rock band formed in 1971 * ''Fields'' (album), an LP by Swedish-based indie rock band Junip (2010) * "Fields", a song b ...
that may consist of more data than can be stored in one byte. In the context of this article where its type cannot be arbitrarily complicated, a "field" consists of a consecutive sequence of bytes and represents a "simple data value" which – at least potentially – can be manipulated by ''one'' single hardware instruction. On most systems, the address of a multi-byte simple data value is the address of its first byte (the byte with the lowest address). Another important attribute of a byte being part of a "field" is its "significance". These attributes of the parts of a field play an important role in the sequence the bytes are accessed by the computer hardware, more precisely: by the low-level algorithms contributing to the results of a computer instruction.


Numbers

Positional number systems (mostly base 2, or less often base 10) are the predominant way of representing and particularly of manipulating integer data by computers. In pure form this is valid for moderate sized non-negative integers, e.g. of C data type unsigned. In such a number system, the ''value'' of a digit which it contributes to the whole number is determined not only by its value as a single digit, but also by the position it holds in the complete number, called its significance. These positions can be mapped to memory mainly in two ways: * decreasing numeric significance with increasing memory addresses (or increasing time), known as ''big-endian'' and * increasing numeric significance with increasing memory addresses (or increasing time), known as ''little-endian''. The integer data that are directly supported by the computer hardware have a fixed width of a low power of 2, e.g. 8 bits ≙ 1 byte, 16 bits ≙ 2 bytes, 32 bits ≙ 4 bytes, 64 bits ≙ 8 bytes, 128 bits ≙ 16 bytes. The low-level access sequence to the bytes of such a field depends on the operation to be performed. The least-significant byte is accessed first for addition, subtraction and multiplication. The most-significant byte is accessed first for
division Division or divider may refer to: Mathematics *Division (mathematics), the inverse of multiplication *Division algorithm, a method for computing the result of mathematical division Military *Division (military), a formation typically consisting ...
and comparison. See . For floating-point numbers, see .


Text

When character (text) strings are to be compared with one another, e.g. in order to support some mechanism like sorting, this is very frequently done
lexicographically In mathematics, the lexicographic or lexicographical order (also known as lexical order, or dictionary order) is a generalization of the alphabetical order of the dictionaries to sequences of ordered symbols or, more generally, of elements of a ...
where a single positional element (character) also has a positional value. Lexicographical comparison means almost everywhere: first character ranks highest – as in the telephone book. Integer numbers written as text are always represented most significant digit first in memory, which is similar to big-endian, independently of
text direction A writing system is a method of visually representing verbal communication, based on a script and a set of rules regulating its use. While both writing and speech are useful in conveying messages, writing differs in also being a reliable form ...
.


Hardware

Many historical and extant processors use a big-endian memory representation, either exclusively or as a design option. Other processor types use little-endian memory representation; others use yet another scheme called '' middle-endian'', ''mixed-endian'' or '' PDP-11-endian''. Some instruction sets feature a setting which allows for switchable endianness in data fetches and stores, instruction fetches, or both. This feature can improve performance or simplify the logic of networking devices and software. The word ''bi-endian'', when said of hardware, denotes the capability of the machine to compute or pass data in either endian format. Dealing with data of different endianness is sometimes termed the ''NUXI problem''. This terminology alludes to the byte order conflicts encountered while adapting
UNIX Unix (; trademarked as UNIX) is a family of multitasking, multiuser computer operating systems that derive from the original AT&T Unix, whose development started in 1969 at the Bell Labs research center by Ken Thompson, Dennis Ritchie, an ...
, which ran on the mixed-endian PDP-11, to a big-endian IBM Series/1 computer. Unix was one of the first systems to allow the same code to be compiled for platforms with different internal representations. One of the first programs converted was supposed to print out , but on the Series/1 it printed instead. The IBM System/360 uses big-endian byte order, as do its successors
System/370 The IBM System/370 (S/370) is a model range of IBM mainframe computers announced on June 30, 1970, as the successors to the System/360 family. The series mostly maintains backward compatibility with the S/360, allowing an easy migration path ...
, ESA/390, and z/Architecture. The
PDP-10 Digital Equipment Corporation (DEC)'s PDP-10, later marketed as the DECsystem-10, is a mainframe computer family manufactured beginning in 1966 and discontinued in 1983. 1970s models and beyond were marketed under the DECsystem-10 name, espec ...
uses big-endian addressing for byte-oriented instructions. The IBM Series/1 minicomputer uses big-endian byte order. The
Datapoint 2200 The Datapoint 2200 was a mass-produced programmable computer terminal usable as a computer, designed by Computer Terminal Corporation (CTC) founders Phil Ray and Gus Roche and announced by CTC in June 1970 (with units shipping in 1971). It was ...
used simple bit-serial logic with little-endian to facilitate carry propagation. When Intel developed the
8008 The Intel 8008 ("''eight-thousand-eight''" or "''eighty-oh-eight''") is an early byte-oriented microprocessor designed by Computer Terminal Corporation (CTC), implemented and manufactured by Intel, and introduced in April 1972. It is an 8-bit ...
microprocessor for Datapoint, they used little-endian for compatibility. However, as Intel was unable to deliver the 8008 in time, Datapoint used a
medium-scale integration An integrated circuit or monolithic integrated circuit (also referred to as an IC, a chip, or a microchip) is a set of electronic circuits on one small flat piece (or "chip") of semiconductor material, usually silicon. Transistor count, Large ...
equivalent, but the little-endianness was retained in most Intel designs, including the MCS-48 and the
8086 The 8086 (also called iAPX 86) is a 16-bit microprocessor chip designed by Intel between early 1976 and June 8, 1978, when it was released. The Intel 8088, released July 1, 1979, is a slightly modified chip with an external 8-bit data bus (allowi ...
and its
x86 x86 (also known as 80x86 or the 8086 family) is a family of complex instruction set computer (CISC) instruction set architectures initially developed by Intel based on the Intel 8086 microprocessor and its 8088 variant. The 8086 was intr ...
successors. The
DEC Alpha Alpha (original name Alpha AXP) is a 64-bit reduced instruction set computer (RISC) instruction set architecture (ISA) developed by Digital Equipment Corporation (DEC). Alpha was designed to replace 32-bit VAX complex instruction set compute ...
, Atmel AVR,
VAX VAX (an acronym for Virtual Address eXtension) is a series of computers featuring a 32-bit instruction set architecture (ISA) and virtual memory that was developed and sold by Digital Equipment Corporation (DEC) in the late 20th century. The V ...
, the
MOS Technology 6502 The MOS Technology 6502 (typically pronounced "sixty-five-oh-two" or "six-five-oh-two") William Mensch and the moderator both pronounce the 6502 microprocessor as ''"sixty-five-oh-two"''. is an 8-bit microprocessor that was designed by a small te ...
family (including
Western Design Center The Western Design Center (WDC), located in Mesa, Arizona, is a company which develops intellectual property for, and licenses manufacture of, MOS Technology 65xx based microprocessors, microcontrollers (µCs), and related support devices. W ...
65802 and 65C816), the Zilog
Z80 The Z80 is an 8-bit microprocessor introduced by Zilog as the startup company's first product. The Z80 was conceived by Federico Faggin in late 1974 and developed by him and his 11 employees starting in early 1975. The first working samples were ...
(including Z180 and
eZ80 The Zilog eZ80 is an 8-bit microprocessor from Zilog, introduced in 2001. eZ80 is an updated version of the company's first product, the Z80 microprocessor. Design The eZ80 (like the Z380) is binary compatible with the Z80 and Z180, but a ...
), the
Altera Altera Corporation was a manufacturer of programmable logic devices (PLDs) headquartered in San Jose, California. It was founded in 1983 and acquired by Intel in 2015. The main product lines from Altera were the flagship Stratix series, mid-ran ...
Nios II Nios II is a 32-bit embedded processor architecture designed specifically for the Altera family of field-programmable gate array (FPGA) integrated circuits. Nios II incorporates many enhancements over the original Nios architecture, making it mo ...
, and many other processors and processor families are also little-endian. The Motorola 6800 / 6801, the 6809 and the 68000 series of processors used the big-endian format. The Intel 8051, unlike other Intel processors, expects 16-bit addresses for LJMP and LCALL in big-endian format; however, xCALL instructions store the return address onto the stack in little-endian format.
SPARC SPARC (Scalable Processor Architecture) is a reduced instruction set computer (RISC) instruction set architecture originally developed by Sun Microsystems. Its design was strongly influenced by the experimental Berkeley RISC system develope ...
historically used big-endian until version 9, which is
bi-endian In computing, endianness, also known as byte sex, is the order or sequence of bytes of a word of digital data in computer memory. Endianness is primarily expressed as big-endian (BE) or little-endian (LE). A big-endian system stores the most si ...
. Similarly early IBM POWER processors were big-endian, but the PowerPC and Power ISA descendants are now bi-endian. The ARM architecture was little-endian before version 3 when it became bi-endian.


Newer architectures

The
IA-32 IA-32 (short for "Intel Architecture, 32-bit", commonly called i386) is the 32-bit version of the x86 instruction set architecture, designed by Intel and first implemented in the 80386 microprocessor in 1985. IA-32 is the first incarnation o ...
and
x86-64 x86-64 (also known as x64, x86_64, AMD64, and Intel 64) is a 64-bit version of the x86 instruction set, first released in 1999. It introduced two new modes of operation, 64-bit mode and compatibility mode, along with a new 4-level paging ...
instruction set architectures use the little-endian format. Other instruction set architectures that follow this convention, allowing only little-endian mode, include
Nios II Nios II is a 32-bit embedded processor architecture designed specifically for the Altera family of field-programmable gate array (FPGA) integrated circuits. Nios II incorporates many enhancements over the original Nios architecture, making it mo ...
, Andes Technology NDS32, and
Qualcomm Hexagon Hexagon is the brand name for a family of digital signal processor (DSP) products by Qualcomm. Hexagon is also known as QDSP6, standing for “sixth generation digital signal processor.” According to Qualcomm, the Hexagon architecture is design ...
. Solely big-endian architectures include the IBM z/Architecture and
OpenRISC OpenRISC is a project to develop a series of open-source hardware based central processing units (CPUs) on established reduced instruction set computer (RISC) principles. It includes an instruction set architecture (ISA) using an open-source lic ...
. Some instruction set architectures are "bi-endian" and allow running software of either endianness; these include Power ISA,
SPARC SPARC (Scalable Processor Architecture) is a reduced instruction set computer (RISC) instruction set architecture originally developed by Sun Microsystems. Its design was strongly influenced by the experimental Berkeley RISC system develope ...
, ARM
AArch64 AArch64 or ARM64 is the 64-bit extension of the ARM architecture family. It was first introduced with the Armv8-A architecture. Arm releases a new extension every year. ARMv8.x and ARMv9.x extensions and features Announced in October 2011, AR ...
, C-Sky, and
RISC-V RISC-V (pronounced "risk-five" where five refers to the number of generations of RISC architecture that were developed at the University of California, Berkeley since 1981) is an open standard instruction set architecture (ISA) based on estab ...
.
IBM AIX AIX (Advanced Interactive eXecutive, pronounced , "ay-eye-ex") is a series of proprietary Unix operating systems developed and sold by IBM for several of its computer platforms. Background Originally released for the IBM RT PC RISC ...
and IBM i run in big-endian mode on bi-endian Power ISA;
Linux Linux ( or ) is a family of open-source Unix-like operating systems based on the Linux kernel, an operating system kernel first released on September 17, 1991, by Linus Torvalds. Linux is typically packaged as a Linux distribution, w ...
originally ran in big-endian mode, but by 2019, IBM had transitioned to little-endian mode for Linux to ease the porting of Linux software from x86 to Power. SPARC has no relevant little-endian deployment, as both
Oracle Solaris Solaris is a proprietary Unix operating system originally developed by Sun Microsystems. After the Sun acquisition by Oracle in 2010, it was renamed Oracle Solaris. Solaris superseded the company's earlier SunOS in 1993, and became known for i ...
and Linux run in big-endian mode on bi-endian SPARC systems, and can be considered big-endian in practice. ARM, C-Sky, and RISC-V have no relevant big-endian deployments, and can be considered little-endian in practice.


Bi-endianness

Some architectures (including
ARM In human anatomy, the arm refers to the upper limb in common usage, although academically the term specifically means the upper arm between the glenohumeral joint (shoulder joint) and the elbow joint. The distal part of the upper limb between th ...
versions 3 and above, PowerPC, Alpha,
SPARC SPARC (Scalable Processor Architecture) is a reduced instruction set computer (RISC) instruction set architecture originally developed by Sun Microsystems. Its design was strongly influenced by the experimental Berkeley RISC system develope ...
V9, MIPS,
Intel i860 The Intel i860 (also known as 80860) is a RISC microprocessor design introduced by Intel in 1989. It is one of Intel's first attempts at an entirely new, high-end instruction set architecture since the failed Intel iAPX 432 from the beginning of ...
,
PA-RISC PA-RISC is an instruction set architecture (ISA) developed by Hewlett-Packard. As the name implies, it is a reduced instruction set computer (RISC) architecture, where the PA stands for Precision Architecture. The design is also referred to as ...
, SuperH SH-4 and
IA-64 IA-64 (Intel Itanium architecture) is the instruction set architecture (ISA) of the Itanium family of 64-bit Intel microprocessors. The basic ISA specification originated at Hewlett-Packard (HP), and was subsequently implemented by Intel in col ...
) feature a setting which allows for switchable endianness in data fetches and stores, instruction fetches, or both. This feature can improve performance or simplify the logic of networking devices and software. The word ''bi-endian'', when said of hardware, denotes the capability of the machine to compute or pass data in either endian format. Many of these architectures can be switched via software to default to a specific endian format (usually done when the computer starts up); however, on some systems, the default endianness is selected by hardware on the motherboard and cannot be changed via software (e.g. the Alpha, which runs only in big-endian mode on the Cray T3E). Note that the term ''bi-endian'' refers primarily to how a processor treats data accesses. Instruction accesses (fetches of instruction words) on a given processor may still assume a fixed endianness, even if data accesses are fully bi-endian, though this is not always the case, such as on Intel's
IA-64 IA-64 (Intel Itanium architecture) is the instruction set architecture (ISA) of the Itanium family of 64-bit Intel microprocessors. The basic ISA specification originated at Hewlett-Packard (HP), and was subsequently implemented by Intel in col ...
-based Itanium CPU, which allows both. Note, too, that some nominally bi-endian CPUs require motherboard help to fully switch endianness. For instance, the 32-bit desktop-oriented PowerPC processors in little-endian mode act as little-endian from the point of view of the executing programs, but they require the motherboard to perform a 64-bit swap across all 8 byte lanes to ensure that the little-endian view of things will apply to I/O devices. In the absence of this unusual motherboard hardware, device driver software must write to different addresses to undo the incomplete transformation and also must perform a normal byte swap. Some CPUs, such as many PowerPC processors intended for embedded use and almost all SPARC processors, allow per-page choice of endianness. SPARC processors since the late 1990s (SPARC v9 compliant processors) allow data endianness to be chosen with each individual instruction that loads from or stores to memory. The ARM architecture supports two big-endian modes, called ''BE-8'' and ''BE-32''. CPUs up to ARMv5 only support BE-32 or word-invariant mode. Here any naturally aligned 32-bit access works like in little-endian mode, but access to a byte or 16-bit word is redirected to the corresponding address and unaligned access is not allowed. ARMv6 introduces BE-8 or byte-invariant mode, where access to a single byte works as in little-endian mode, but accessing a 16-bit, 32-bit or (starting with ARMv8) 64-bit word results in a byte swap of the data. This simplifies unaligned memory access as well as memory-mapped access to registers other than 32 bit. Many processors have instructions to convert a word in a register to the opposite endianness, that is, they swap the order of the bytes in a 16-, 32- or 64-bit word. All the individual bits are not reversed though. Recent Intel x86 and x86-64 architecture CPUs have a MOVBE instruction (
Intel Core Intel Core is a line of streamlined midrange consumer, workstation and enthusiast computer central processing units (CPUs) marketed by Intel Corporation. These processors displaced the existing mid- to high-end Pentium processors at the time o ...
since generation 4, after
Atom Every atom is composed of a nucleus and one or more electrons bound to the nucleus. The nucleus is made of one or more protons and a number of neutrons. Only the most common variety of hydrogen has no neutrons. Every solid, liquid, gas, ...
), which fetches a big-endian format word from memory or writes a word into memory in big-endian format. These processors are otherwise thoroughly little-endian. There are also devices which use different formats in different places. For instance, the BQ27421
Texas Instruments Texas Instruments Incorporated (TI) is an American technology company headquartered in Dallas, Texas, that designs and manufactures semiconductors and various integrated circuits, which it sells to electronics designers and manufacturers globa ...
battery gauge uses the little-endian format for its registers and the big-endian format for its
random-access memory Random-access memory (RAM; ) is a form of computer memory that can be read and changed in any order, typically used to store working data and machine code. A random-access memory device allows data items to be read or written in almost the ...
. This behavior does not seem to be modifiable.


Floating point

Although many processors use little-endian storage for all types of data (integer, floating point), there are a number of hardware architectures where floating-point numbers are represented in big-endian form while integers are represented in little-endian form. There are
ARM In human anatomy, the arm refers to the upper limb in common usage, although academically the term specifically means the upper arm between the glenohumeral joint (shoulder joint) and the elbow joint. The distal part of the upper limb between th ...
processors that have mixed-endian floating-point representation for double-precision numbers: each of the two 32-bit words is stored as little-endian, but the most significant word is stored first.
VAX VAX (an acronym for Virtual Address eXtension) is a series of computers featuring a 32-bit instruction set architecture (ISA) and virtual memory that was developed and sold by Digital Equipment Corporation (DEC) in the late 20th century. The V ...
floating point stores little-endian 16-bit words in big-endian order. Because there have been many floating-point formats with no network standard representation for them, the XDR standard uses big-endian IEEE 754 as its representation. It may therefore appear strange that the widespread IEEE 754 floating-point standard does not specify endianness. Theoretically, this means that even standard IEEE floating-point data written by one machine might not be readable by another. However, on modern standard computers (i.e., implementing IEEE 754), one may safely assume that the endianness is the same for floating-point numbers as for integers, making the conversion straightforward regardless of data type. Small
embedded system An embedded system is a computer system—a combination of a computer processor, computer memory, and input/output peripheral devices—that has a dedicated function within a larger mechanical or electronic system. It is ''embedded'' ...
s using special floating-point formats may be another matter, however.


Variable-length data

Most instructions considered so far contain the size (lengths) of their
operand In mathematics, an operand is the object of a mathematical operation, i.e., it is the object or quantity that is operated on. Example The following arithmetic expression shows an example of operators and operands: :3 + 6 = 9 In the above exam ...
s within the operation code. Frequently available operand lengths are 1, 2, 4, 8, or 16 bytes. But there are also architectures where the length of an operand may be held in a separate field of the instruction or with the operand itself, e.g. by means of a word mark. Such an approach allows operand lengths up to 256 bytes or larger. The data types of such operands are character strings or BCD. Machines able to manipulate such data with one instruction (e.g. compare, add) include the
IBM 1401 The IBM 1401 is a variable-wordlength decimal computer that was announced by IBM on October 5, 1959. The first member of the highly successful IBM 1400 series, it was aimed at replacing unit record equipment for processing data stored on pu ...
,
1410 Year 1410 ( MCDX) was a common year starting on Wednesday (link will display the full calendar) of the Julian calendar. Events January–December * March 25 – The first of the Yongle Emperor's campaigns against the Mongols is ...
, 1620, System/360,
System/370 The IBM System/370 (S/370) is a model range of IBM mainframe computers announced on June 30, 1970, as the successors to the System/360 family. The series mostly maintains backward compatibility with the S/360, allowing an easy migration path ...
, ESA/390, and z/Architecture, all of them of type big-endian.


Simplified access to part of a field

On most systems, the address of a multi-byte value is the address of its first byte (the byte with the lowest address); little-endian systems of that type have the property that, for sufficiently low data values, the same value can be read from memory at different lengths without using different addresses (even when alignment restrictions are imposed). For example, a 32-bit memory location with content can be read at the same address as either 8-bit (value = 4A),
16-bit 16-bit microcomputers are microcomputers that use 16-bit microprocessors. A 16-bit register can store 216 different values. The range of integer values that can be stored in 16 bits depends on the integer representation used. With the two mo ...
(004A), 24-bit (00004A), or 32-bit (0000004A), all of which retain the same numeric value. Although this little-endian property is rarely used directly by high-level programmers, it is occasionally employed by code optimizers as well as by assembly language programmers. In more concrete terms, identities like this are the equivalent of the following C code returning ''true'' on most little-endian systems: union u = ; puts(u.u8

u.u16 && u.u8

u.u32 && u.u8

u.u64 ? "true" : "false");
While not allowed by C++, such type punning code is allowed as "implementation-defined" by the C11 standard and commonly used in code interacting with hardware. On the other hand, in some situations it may be useful to obtain an approximation of a multi-byte or multi-word value by reading only its most significant portion instead of the complete representation; a big-endian processor may read such an approximation using the same base address that would be used for the full value. Simplifications of this kind are of course not portable across systems of different endianness.


Calculation order

Some operations in
positional number system Positional notation (or place-value notation, or positional numeral system) usually denotes the extension to any base of the Hindu–Arabic numeral system (or decimal system). More generally, a positional system is a numeral system in which the ...
s have a natural or preferred order in which the elementary steps are to be executed. This order may affect their performance on small-scale byte-addressable processors and microcontrollers. However, high-performance processors usually fetch multi-byte operands from memory in the same amount of time they would have fetched a single byte, so the complexity of the hardware is not affected by the byte ordering. Addition, subtraction, and multiplication start at the least significant digit position and propagate the carry to the subsequent more significant position. On most systems, the address of a multi-byte value is the address of its first byte (the byte with the lowest address). The implementation of these operations is marginally simpler using little-endian machines where this first byte contains the least significant digit. Comparison and division start at the most significant digit and propagate a possible carry to the subsequent less significant digits. For fixed-length numerical values (typically of length 1,2,4,8,16), the implementation of these operations is marginally simpler on big-endian machines. Some big-endian processors (e.g. the IBM System/360 and its successors) contain hardware instructions for lexicographically comparing varying length character strings. The normal data transport by an
assignment Assignment, assign or The Assignment may refer to: * Homework * Sex assignment * The process of sending National Basketball Association players to its development league; see Computing * Assignment (computer science), a type of modification to ...
statement is in principle independent of the endianness of the processor.


Middle-endian

Numerous other orderings, generically called ''middle-endian'' or ''mixed-endian'', are possible. The PDP-11 is in principle a 16-bit little-endian system. The instructions to convert between floating-point and integer values in the optional floating-point processor of the PDP-11/45, PDP-11/70, and in some later processors, stored 32-bit "double precision integer long" values with the 16-bit halves swapped from the expected little-endian order. The
UNIX Unix (; trademarked as UNIX) is a family of multitasking, multiuser computer operating systems that derive from the original AT&T Unix, whose development started in 1969 at the Bell Labs research center by Ken Thompson, Dennis Ritchie, an ...
C compiler used the same format for 32-bit long integers. This ordering is known as ''PDP-endian''. A way to interpret this endianness is that it stores a 32-bit integer as two little-endian 16-bit words, with a big-endian word ordering: The 16-bit values here refer to their numerical values, not their actual layout.
Segment descriptors In memory addressing for Intel x86 computer architectures, segment descriptors are a part of the segmentation unit, used for translating a logical address to a linear address. Segment descriptors describe the memory segment referred to in the lo ...
of
IA-32 IA-32 (short for "Intel Architecture, 32-bit", commonly called i386) is the 32-bit version of the x86 instruction set architecture, designed by Intel and first implemented in the 80386 microprocessor in 1985. IA-32 is the first incarnation o ...
and compatible processors keep a 32-bit base address of the segment stored in little-endian order, but in four nonconsecutive bytes, at relative positions 2, 3, 4 and 7 of the descriptor start.


Endian dates

Dates can be represented with different endianness by the ordering of the year, month and day. For example, September 13, 2002 can be represented as: * little-endian date (day, month, year), * middle-endian dates (month, day, year), * big-endian date (year, month, day), as with
ISO 8601 ISO 8601 is an international standard covering the worldwide exchange and communication of date and time-related data. It is maintained by the Geneva-based International Organization for Standardization (ISO) and was first published in 1988, w ...
In date and time notation in the United States, dates are middle-endian and differ from date formats worldwide.


Byte addressing

When memory bytes are printed sequentially from left to right (e.g. in a hex dump), little-endian representation of integers has the significance increasing from left to right. In other words, it appears backwards when visualized, which can be counter-intuitive. This behavior arises, for example, in
FourCC A FourCC ("four-character code") is a sequence of four bytes (typically ASCII) used to uniquely identify data formats. It originated from the OSType or ResType metadata system used in classic Mac OS and was adopted for the Amiga/Electronic Arts I ...
or similar techniques that involve packing characters into an integer, so that it becomes a sequences of specific characters in memory. Let's define the notation as simply the result of writing the characters in hexadecimal
ASCII ASCII ( ), abbreviated from American Standard Code for Information Interchange, is a character encoding standard for electronic communication. ASCII codes represent text in computers, telecommunications equipment, and other devices. Because ...
and appending to the front, and analogously for shorter sequences (a C multicharacter literal, in Unix/MacOS style): ' J o h n ' hex 4A 6F 68 6E ---------------- -> 0x4A6F686E On big-endian machines, the value appears left-to-right, coinciding with the correct string order for reading the result: But on a little-endian machine, one would see: Middle-endian machines complicate this even further; for example, on the PDP-11, the 32-bit value is stored as two 16-bit words in big-endian, with the characters in the 16-bit words being stored in little-endian:


Byte swapping

Byte-swapping consists of rearranging bytes to change endianness. Many compilers provide built-ins that are likely to be compiled into native processor instructions (/), such as . Software interfaces for swapping include: * Standard network endianness functions (from/to BE, up to 32-bit). Windows has a 64-bit extension in . * BSD and Glibc functions (from/to BE and LE, up to 64-bit). *
macOS macOS (; previously OS X and originally Mac OS X) is a Unix operating system developed and marketed by Apple Inc. since 2001. It is the primary operating system for Apple's Mac computers. Within the market of desktop and lapt ...
macros (from/to BE and LE, up to 64-bit). Some CPU instruction sets provide native support for endian byte swapping, such as (
x86 x86 (also known as 80x86 or the 8086 family) is a family of complex instruction set computer (CISC) instruction set architectures initially developed by Intel based on the Intel 8086 microprocessor and its 8088 variant. The 8086 was intr ...
- 486 and later), and ( ARMv6 and later). Some
compiler In computing, a compiler is a computer program that translates computer code written in one programming language (the ''source'' language) into another language (the ''target'' language). The name "compiler" is primarily used for programs tha ...
s have built-in facilities for byte swapping. For example, the
Intel Intel Corporation is an American multinational corporation and technology company headquartered in Santa Clara, California. It is the world's largest semiconductor chip manufacturer by revenue, and is one of the developers of the x86 seri ...
Fortran compiler supports the non-standard specifier when opening a file, e.g.: . Other compilers have options for generating code that globally enables the conversion for all file IO operations. This permits the reuse of code on a system with the opposite endianness without code modification.


Logic design

Hardware description language In computer engineering, a hardware description language (HDL) is a specialized computer language used to describe the structure and behavior of electronic circuits, and most commonly, digital logic circuits. A hardware description language en ...
s (HDLs) used to express digital logic often support arbitrary endianness, with arbitrary granularity. For example, in
SystemVerilog SystemVerilog, standardized as IEEE 1800, is a hardware description and hardware verification language used to model, design, simulate, test and implement electronic systems. SystemVerilog is based on Verilog and some extensions, and since ...
, a word can be defined as little endian or big endian: logic 1:0little_endian; // bit 0 is the least significant bit logic :31big_endian; // bit 31 is the least significant bit logic :37:0] mixed; // each byte is little-endian, but bytes are packed in big-endian order.


Files and filesystems

The recognition of endianness is important when reading a file or filesystem created on a computer with different endianness. Fortran sequential unformatted files created with one endianness usually cannot be read on a system using the other endianness because Fortran usually implements a storage record, record (defined as the data written by a single Fortran statement) as data preceded and succeeded by count fields, which are integers equal to the number of bytes in the data. An attempt to read such a file using Fortran on a system of the other endianness results in a run-time error, because the count fields are incorrect.
Unicode Unicode, formally The Unicode Standard,The formal version reference is is an information technology standard for the consistent encoding, representation, and handling of text expressed in most of the world's writing systems. The standard, wh ...
text can optionally start with a
byte order mark The byte order mark (BOM) is a particular usage of the special Unicode character, , whose appearance as a magic number at the start of a text stream can signal several things to a program reading the text: * The byte order, or endianness, of t ...
(BOM) to signal the endianness of the file or stream. Its code point is U+FEFF. In
UTF-32 UTF-32 (32- bit Unicode Transformation Format) is a fixed-length encoding used to encode Unicode code points that uses exactly 32 bits (four bytes) per code point (but a number of leading bits must be zero as there are far fewer than 232 Unicode ...
for example, a big-endian file should start with ; a little-endian should start with . Application binary data formats, such as
MATLAB MATLAB (an abbreviation of "MATrix LABoratory") is a proprietary multi-paradigm programming language and numeric computing environment developed by MathWorks. MATLAB allows matrix manipulations, plotting of functions and data, implementa ...
''.mat'' files, or the ''.bil'' data format, used in topography, are usually endianness-independent. This is achieved by storing the data always in one fixed endianness or carrying with the data a switch to indicate the endianness. An example of the former is the binary XLS file format that is portable between Windows and Mac systems and always little-endian, requiring the Mac application to swap the bytes on load and save when running on a big-endian Motorola 68K or PowerPC processor.
TIFF Tag Image File Format, abbreviated TIFF or TIF, is an image file format for storing raster graphics images, popular among graphic artists, the publishing industry, and photographers. TIFF is widely supported by scanning, faxing, word process ...
image files are an example of the second strategy, whose header instructs the application about endianness of their internal binary integers. If a file starts with the signature it means that integers are represented as big-endian, while means little-endian. Those signatures need a single 16-bit word each, and they are palindromes (that is, they read the same forwards and backwards), so they are endianness independent. stands for
Intel Intel Corporation is an American multinational corporation and technology company headquartered in Santa Clara, California. It is the world's largest semiconductor chip manufacturer by revenue, and is one of the developers of the x86 seri ...
and stands for
Motorola Motorola, Inc. () was an American multinational telecommunications company based in Schaumburg, Illinois, United States. After having lost $4.3 billion from 2007 to 2009, the company split into two independent public companies, Motorol ...
, the respective CPU providers of the IBM PC compatibles (Intel) and
Apple Macintosh The Mac (known as Macintosh until 1999) is a family of personal computers designed and marketed by Apple Inc., Apple Inc. Macs are known for their ease of use and minimalist designs, and are popular among students, creative professionals, and ...
platforms (Motorola) in the 1980s. Intel CPUs are little-endian, while Motorola 680x0 CPUs are big-endian. This explicit signature allows a TIFF reader program to swap bytes if necessary when a given file was generated by a TIFF writer program running on a computer with a different endianness. As a consequence of its original implementation on the Intel 8080 platform, the operating system-independent File Allocation Table (FAT) file system is defined with little-endian byte ordering, even on platforms using another endianness natively, necessitating byte-swap operations for maintaining the FAT. ZFS, which combines a
filesystem In computing, file system or filesystem (often abbreviated to fs) is a method and data structure that the operating system uses to control how data is Computer data storage, stored and retrieved. Without a file system, data placed in a storage me ...
and a logical volume manager, is known to provide adaptive endianness and to work with both big-endian and little-endian systems.


Networking

Many IETF RFCs use the term ''network order'', meaning the order of transmission for bits and bytes ''over the wire'' in
network protocols A communication protocol is a system of rules that allows two or more entities of a communications system to transmit information via any kind of variation of a physical quantity. The protocol defines the rules, syntax, semantics and synchroniza ...
. Among others, the historic RFC 1700 (also known as
Internet standard In computer network engineering, an Internet Standard is a normative specification of a technology or methodology applicable to the Internet. Internet Standards are created and published by the Internet Engineering Task Force (IETF). They allow ...
STD 2) has defined the network order for protocols in the
Internet protocol suite The Internet protocol suite, commonly known as TCP/IP, is a framework for organizing the set of communication protocols used in the Internet and similar computer networks according to functional criteria. The foundational protocols in the sui ...
to be big-endian, hence the use of the term for big-endian byte order. However, not all protocols use big-endian byte order as the network order. The
Server Message Block Server Message Block (SMB) is a communication protocol originally developed in 1983 by Barry A. Feigenbaum at IBM and intended to provide shared access to files and printers across nodes on a network of systems running IBM's OS/2. It also provide ...
(SMB) protocol uses little-endian byte order. In
CANopen CANopen is a communication protocol and device profile specification for embedded systems used in automation. In terms of the OSI model, CANopen implements the layers above and including the network layer. The CANopen standard consists of an address ...
, multi-byte parameters are always sent least significant byte first (little-endian). The same is true for Ethernet Powerlink. The
Berkeley sockets Berkeley sockets is an application programming interface (API) for Internet sockets and Unix domain sockets, used for inter-process communication (IPC). It is commonly implemented as a library of linkable modules. It originated with the 4.2BSD ...
API An application programming interface (API) is a way for two or more computer programs to communicate with each other. It is a type of software interface, offering a service to other pieces of software. A document or standard that describes how ...
defines a set of functions to convert 16-bit and 32-bit integers to and from network byte order: the (host-to-network-short) and (host-to-network-long) functions convert 16-bit and 32-bit values respectively from machine (''host'') to network order; the and functions convert from network to host order. These functions may be a no-op on a big-endian system. While the high-level network protocols usually consider the byte (mostly meant as ''
octet Octet may refer to: Music * Octet (music), ensemble consisting of eight instruments or voices, or composition written for such an ensemble ** String octet, a piece of music written for eight string instruments *** Octet (Mendelssohn), 1825 compos ...
'') as their atomic unit, the lowest network protocols may deal with ordering of bits within a byte.


Bit endianness

Bit numbering In computing, bit numbering is the convention used to identify the bit positions in a binary number. Bit significance and indexing In computing, the least significant bit (LSB) is the bit position in a binary integer representing the binar ...
is a concept similar to endianness, but on a level of bits, not bytes. Bit endianness or bit-level endianness refers to the transmission order of bits over a serial medium. The bit-level analogue of little-endian (least significant bit goes first) is used in
RS-232 In telecommunications, RS-232 or Recommended Standard 232 is a standard originally introduced in 1960 for serial communication transmission of data. It formally defines signals connecting between a ''DTE'' (''data terminal equipment'') such ...
, HDLC,
Ethernet Ethernet () is a family of wired computer networking technologies commonly used in local area networks (LAN), metropolitan area networks (MAN) and wide area networks (WAN). It was commercially introduced in 1980 and first standardized in 1 ...
, and USB. Some protocols use the opposite ordering (e.g.
Teletext A British Ceefax football index page from October 2009, showing the three-digit page numbers for a variety of football news stories Teletext, or broadcast teletext, is a standard for displaying text and rudimentary graphics on suitably equipp ...
, I2C,
SMBus The System Management Bus (abbreviated to SMBus or SMB) is a single-ended simple two-wire bus for the purpose of lightweight communication. Most commonly it is found in computer motherboards for communication with the power source for ON/OFF instru ...
,
PMBus The Power Management Bus (PMBus) is a variant of the System Management Bus (SMBus) which is targeted at digital management of power supplies. Like SMBus, it is a relatively slow speed two wire communications protocol based on I²C. Unlike either ...
, and SONET and SDHCf. Sec. 2.1 Bit Transmission o
draft-ietf-pppext-sonet-as-00 "Applicability Statement for PPP over SONET/SDH"
/ref>), and
ARINC 429 ARINC 429, "Mark33 Digital Information Transfer System (DITS)," is also known as the Aeronautical Radio INC. (ARINC) technical standard for the predominant avionics data bus used on most higher-end commercial and transport aircraft. It defines the ...
uses one ordering for its label field and the other ordering for the remainder of the frame. Usually, there exists a consistent view to the bits irrespective of their order in the byte, such that the latter becomes relevant only on a very low level. One exception is caused by the feature of some cyclic redundancy checks to detect ''all''
burst error In telecommunication, a burst error or error burst is a contiguous sequence of symbols, received over a communication channel, such that the first and last symbols are in error and there exists no contiguous subsequence of ''m'' correctly receive ...
s up to a known length, which would be spoiled if the bit order is different from the byte order on serial transmission. Apart from serialization, the terms ''bit endianness'' and ''bit-level endianness'' are seldom used, as computer architectures where each individual bit has a unique address are rare. Individual bits or
bit field A bit field is a data structure that consists of one or more adjacent bits which have been allocated for specific purposes, so that any single bit or group of bits within the structure can be set or inspected. A bit field is most commonly used to ...
s are accessed via their numerical value or, in high-level programming languages, assigned names, the effects of which, however, may be machine dependent or lack
software portability A computer program is said to be portable if there is very low effort required to make it run on different platforms. The pre-requirement for portability is the generalized abstraction between the application logic and system interfaces. When ...
.


Notes


References

{{Reflist Computer memory Data transmission Metaphors Software wars