LEB128
   HOME

TheInfoList



OR:

LEB128 or Little Endian Base 128 is a
variable-length code In coding theory, a variable-length code is a code which maps source symbols to a ''variable'' number of bits. The equivalent concept in computer science is '' bit string''. Variable-length codes can allow sources to be compressed and decompr ...
compression used to store arbitrarily large integers in a small number of bytes. LEB128 is used in the
DWARF Dwarf, dwarfs or dwarves may refer to: Common uses *Dwarf (folklore), a supernatural being from Germanic folklore * Dwarf, a human or animal with dwarfism Arts, entertainment, and media Fictional entities * Dwarf (''Dungeons & Dragons''), a sh ...
debug file format and the
WebAssembly WebAssembly (Wasm) defines a portable binary-code format and a corresponding text format for executable programs as well as software interfaces for facilitating communication between such programs and their host environment. The main goal of ...
binary encoding for all integer literals.


Encoding format

LEB128 format is very similar to
variable-length quantity A variable-length quantity (VLQ) is a universal code that uses an arbitrary number of binary octets (eight- bit bytes) to represent an arbitrarily large integer. A VLQ is essentially a base-128 representation of an unsigned integer with the add ...
(VLQ) format; the primary difference is that LEB128 is
little-endian '' Jonathan_Swift.html" ;"title="Gulliver's Travels'' by Jonathan Swift">Gulliver's Travels'' by Jonathan Swift, the novel from which the term was coined In computing, endianness is the order in which bytes within a word (data type), word of d ...
whereas variable-length quantities are
big-endian '' Jonathan_Swift.html" ;"title="Gulliver's Travels'' by Jonathan Swift">Gulliver's Travels'' by Jonathan Swift, the novel from which the term was coined In computing, endianness is the order in which bytes within a word (data type), word of d ...
. Both allow small numbers to be stored in a single byte, while also allowing encoding of arbitrarily long numbers. There are two versions of LEB128: unsigned LEB128 and signed LEB128. The decoder must know whether the encoded value is unsigned LEB128 or signed LEB128.


Unsigned LEB128

To encode an unsigned number using unsigned LEB128 (ULEB128) first represent the number in binary. Then zero extend the number up to a multiple of 7 bits (such that if the number is non-zero, the most significant 7 bits are not all 0). Break the number up into groups of 7 bits. Output one encoded byte for each 7 bit group, from least significant to most significant group. Each byte will have the group in its 7 least significant bits. Set the most significant bit on each byte except the last byte. The number zero is usually encoded as a single byte 0x00. WebAssembly allows alternate encodings of zero (0x80 0x00, 0x80 0x80 0x00, ...). As an example, here is how the unsigned number 624485 gets encoded: MSB ------------------ LSB 10011000011101100101 In raw binary 010011000011101100101 Padded to a multiple of 7 bits 0100110 0001110 1100101 Split into 7-bit groups 00100110 10001110 11100101 Add high 1 bits on all but last (most significant) group to form bytes 0x26 0x8E 0xE5 In hexadecimal → 0xE5 0x8E 0x26 Output stream (LSB to MSB) Unsigned LEB128 and VLQ (
variable-length quantity A variable-length quantity (VLQ) is a universal code that uses an arbitrary number of binary octets (eight- bit bytes) to represent an arbitrarily large integer. A VLQ is essentially a base-128 representation of an unsigned integer with the add ...
) both compress any given integer into not only the same number of bits, but exactly the same bits—the two formats differ only in exactly how those bits are arranged.


Signed LEB128

A signed number is represented similarly: Starting with an N-bit
two's complement Two's complement is the most common method of representing signed (positive, negative, and zero) integers on computers, and more generally, fixed point binary values. Two's complement uses the binary digit with the ''greatest'' value as the ''s ...
representation, where N is a multiple of 7, the number is broken into groups as for the unsigned encoding. For example, the signed number -123456 is encoded as 0xC0 0xBB 0x78: MSB ------------------ LSB 11110001001000000 Binary encoding of 123456 000011110001001000000 As a 21-bit number 111100001110110111111 Negating all bits (
ones' complement The ones' complement of a binary number is the value obtained by inverting (flipping) all the bits in the Binary number, binary representation of the number. The name "ones' complement" refers to the fact that such an inverted value, if added t ...
) 111100001110111000000 Adding one (two's complement) 1111000 0111011 1000000 Split into 7-bit groups 01111000 10111011 11000000 Add high 1 bits on all but last (most significant) group to form bytes 0x78 0xBB 0xC0 In hexadecimal → 0xC0 0xBB 0x78 Output stream (LSB to MSB)


Fast decoding

A straightforward scalar implementation of LEB128 decoding is fairly slow, even more so on modern hardware where branch misprediction is relatively expensive. A series of papers presents SIMD techniques for accelerating decoding (it is called VByte in these papers, but is another name for the same encoding). The "Vectorized VByte Decoding" paper presented "Masked VByte", which demonstrated speeds of 650–2700 million integers per second on commodity Haswell hardware, depending on encoding density. A followup paper presented a variant encoding, "Stream VByte: Faster Byte Oriented Integer Compression", which increased speeds to over 4 billion integers per second. This stream encoding separates the control stream from the encoded data, so is not binary compatible with LEB128.


C-like pseudocode


Encode unsigned integer

do while (value != 0);


Encode signed integer

more = 1; negative = (value < 0); /* the size in bits of the variable value, e.g., 64 if value's type is int64_t */ size = no. of bits in signed integer; while (more)


Decode unsigned integer

result = 0; shift = 0; while (true)


Decode signed integer

result = 0; shift = 0; /* the size in bits of the result variable, e.g., 64 if result's type is int64_t */ size = number of bits in signed integer; do while (high-order bit of byte != 0); /* sign bit of byte is second high-order bit (0x40) */ if ((shift


JavaScript code


Encode signed 32-bit integer

const encodeSignedLeb128FromInt32 = (value) => ;


Decode signed 32-bit integer

const decodeSignedLeb128 = (input) => ;


Uses

*The Android project uses LEB128 in its Dalvik Executable Format (.dex) file format. *Compressing tables in Hewlett-Packard IA-64 exception handling. *The
DWARF Dwarf, dwarfs or dwarves may refer to: Common uses *Dwarf (folklore), a supernatural being from Germanic folklore * Dwarf, a human or animal with dwarfism Arts, entertainment, and media Fictional entities * Dwarf (''Dungeons & Dragons''), a sh ...
file format uses both unsigned and signed LEB128 encoding for various fields. *
LLVM LLVM, also called LLVM Core, is a target-independent optimizer and code generator. It can be used to develop a Compiler#Front end, frontend for any programming language and a Compiler#Back end, backend for any instruction set architecture. LLVM i ...
, in its Coverage Mapping Format LLVM's implementation of LEB128 encoding and decoding is useful alongside the pseudocode above. *
.NET The .NET platform (pronounced as "''dot net"'') is a free and open-source, managed code, managed computer software framework for Microsoft Windows, Windows, Linux, and macOS operating systems. The project is mainly developed by Microsoft emplo ...
supports a "7-bit encoded int" format in the BinaryReader and BinaryWriter classes. When writing a string to a BinaryWriter, the string length is encoded with this method. * ''
Minecraft ''Minecraft'' is a 2011 sandbox game developed and published by the Swedish video game developer Mojang Studios. Originally created by Markus Persson, Markus "Notch" Persson using the Java (programming language), Java programming language, the ...
'' uses LEB128 in its protocol for measuring lengths of data within packets. *The mpatrol debugging tool uses LEB128 in its tracing file format. *''
osu! ''Osu!'' (stylized as ''osu!'') is a freeware rhythm game originally created and self-published by Australian developer Dean Herbert. It was released for Microsoft Windows on 16 September 2007, with later ports to macOS, Linux, Android (oper ...
'' uses LEB128 in its osu! replay (.osr) format. * W3C Efficient XML Interchange (EXI) represents unsigned integers using LEB128, in exactly the same way as described here. *
WebAssembly WebAssembly (Wasm) defines a portable binary-code format and a corresponding text format for executable programs as well as software interfaces for facilitating communication between such programs and their host environment. The main goal of ...
, in its portable binary encoding of the modules *In the xz file format


Related encodings


Dlugosz' Variable-Length Integer Encoding
uses multiples of 7 bits for the first three size breaks, but after that the increments vary. It also puts all the prefix bits at the beginning of the word, instead of at the beginning of each byte. *
Human interface device A human interface device (HID) is a type of computer device usually used by humans that takes input from or provides output to humans. The term "HID" most commonly refers to the USB HID specification. The term was coined by Mike Van Flandern ...
report descriptor bytes use a byte-count bitfield of 2 bits to encode the size of the following integer of zero, one, two, or four bytes, always little endian. Signedness, i.e. whether to expand the shortened integer with sign or not, depends on the descriptor type. * The
LLVM LLVM, also called LLVM Core, is a target-independent optimizer and code generator. It can be used to develop a Compiler#Front end, frontend for any programming language and a Compiler#Back end, backend for any instruction set architecture. LLVM i ...
bitcode file format uses a similar technique except that the value is broken into groups of bits of context-dependent size, with the highest bit indicating a continuation, instead of a fixed 7 bits. *
Protocol Buffers Protocol Buffers (Protobuf) is a free and open-source cross-platform data format used to serialize structured data. It is useful in developing programs that communicate with each other over a network or for storing data. The method involves an ...
(Protobuf) uses the same encoding for unsigned integers, but encode signed integers by prepending the sign as the least significant bit of the first byte.
ASN.1 BER, DER
Encode values of each ASN.1 type as a string of eight-bit octets


References

{{Reflist


See also

* The DWARF debugging file format * UTF-7 Lossless compression algorithms