Asymmetric numeral systems (ANS) [J. Duda, K. Tahboub, N. J. Gadil, E. J. Delp, ''The use of asymmetric numeral systems as an accurate replacement for Huffman coding'', Picture Coding Symposium, 2015; J. Duda, ''Asymmetric numeral systems: entropy coding combining speed of Huffman coding with compression rate of arithmetic coding'', arXiv:1311.2540, 2013] is a family of entropy encoding methods introduced by Jarosław (Jarek) Duda from Jagiellonian University, used in data compression since 2014 due to improved performance compared to previous methods. ANS combines the compression ratio of arithmetic coding (which uses a nearly accurate probability distribution) with a processing cost similar to that of Huffman coding. In the tabled ANS (tANS) variant, this is achieved by constructing a finite-state machine that operates on a large alphabet without using multiplication.

Among others, ANS is used in the Facebook Zstandard compressor [''Smaller and faster data compression with Zstandard'', Facebook, August 2016; ''5 ways Facebook improved compression at scale with Zstandard'', Facebook, December 2018] (also used e.g. in the Linux kernel [''Zstd Compression For Btrfs & Squashfs Set For Linux 4.14, Already Used Within Facebook'', Phoronix, September 2017], the Google Chrome browser [''New in Chrome 123 (Content-Encoding)'', Google, March 2024] and the Android operating system, and published as RFC 8478 [''Zstandard Compression and The application/zstd Media Type''] for MIME and HTTP [''Hypertext Transfer Protocol (HTTP) Parameters'', IANA]), the Apple LZFSE compressor [''Apple Open-Sources its New Compression Algorithm LZFSE'', InfoQ, July 2016], the Google Draco 3D compressor [''Google Draco 3D compression library''] (used e.g. in the Pixar Universal Scene Description format [''Google and Pixar add Draco Compression to Universal Scene Description (USD) Format'']) and PIK image compressor [''Google PIK: new lossy image format for the internet''], the CRAM DNA compressor [''CRAM format specification (version 3.0)''] from the SAMtools utilities, the NVIDIA nvCOMP high-speed compression library [''High Speed Data Compression Using NVIDIA GPUs''], the Dropbox DivANS compressor [''Building better compression together with DivANS''], the Microsoft DirectStorage BCPack texture compressor [''Microsoft DirectStorage overview''], and the JPEG XL
image compressor.

The basic idea is to encode information into a single natural number x. In the standard binary number system, we can add a bit s \in \{0,1\} of information to x by appending s at the end of x, which gives us x' = 2x + s. For an entropy coder, this is optimal if \Pr(0) = \Pr(1) = 1/2. ANS generalizes this process for an arbitrary set of symbols s \in S with an accompanying probability distribution (p_s)_{s \in S}. In ANS, if the information from s is appended to x to result in x', then x' \approx x \cdot p_s^{-1}. Equivalently, \log_2(x') \approx \log_2(x) + \log_2(1/p_s), where \log_2(x) is the number of bits of information stored in the number x, and \log_2(1/p_s) is the number of bits contained in the symbol s.

For the encoding rule, the set of natural numbers is split into disjoint subsets corresponding to the different symbols (like into even and odd numbers), but with densities corresponding to the probability distribution of the symbols to encode. Then, to add the information from symbol s to the information already stored in the current number x, we go to the number x' = C(x, s) \approx x/p_s, the position of the x-th appearance in the s-th subset. There are alternative ways to apply this in practice: direct mathematical formulas for the encoding and decoding steps (the uABS and rANS variants), or the entire behavior can be put into a table (the tANS variant). Renormalization is used to prevent x from growing to infinity, by transferring accumulated bits to or from the bitstream.


Entropy coding

Suppose a sequence of 1,000 zeros and ones is to be encoded; storing it directly would take 1000 bits. However, if it is somehow known that it contains only 1 zero and 999 ones, it is sufficient to encode the zero's position, which requires only \lceil \log_2(1000)\rceil \approx 10 bits instead of the original 1000. Generally, such sequences of length n containing pn zeros and (1-p)n ones, for some probability p\in(0,1), are called combinations. Using Stirling's approximation, we get their asymptotic number

: \binom{n}{pn} \approx 2^{n h(p)} for large n, where h(p) = -p \log_2(p)-(1-p)\log_2(1-p)

is called the Shannon entropy. Hence, to choose one such sequence we need approximately n h(p) bits. This is still n bits if p=1/2, but it can also be much smaller: for example, only \approx n/2 bits are needed for p = 0.11.

An entropy coder allows the encoding of a sequence of symbols using approximately the Shannon entropy bits per symbol. For example, ANS could be used directly to enumerate combinations: to assign a different natural number to every sequence of symbols having fixed proportions, in a nearly optimal way. In contrast to enumerating combinations, the probability distribution usually varies in data compressors. For this purpose, Shannon entropy can be seen as a weighted average: a symbol of probability p contains \log_2(1/p) bits of information. ANS encodes information into a single natural number x, interpreted as containing \log_2(x) bits of information. Adding information from a symbol of probability p increases this informational content to \log_2(x)+\log_2(1/p)=\log_2(x/p). Hence, the new number containing both pieces of information should be x'\approx x/p.
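As a quick sanity check on the numbers above (a minimal sketch of our own, not part of the original article), the entropy formula can be evaluated directly:

```python
from math import log2, comb

def h(p):
    """Shannon entropy of a binary source with Pr(zero) = p, in bits per symbol."""
    return -p * log2(p) - (1 - p) * log2(1 - p)

print(h(0.5))    # 1.0 -> the full n bits are needed when p = 1/2
print(h(0.11))   # ~0.4999 -> only about n/2 bits are needed for p = 0.11

# the exact count of length-n sequences with pn zeros grows like 2^(n*h(p))
n, p = 1000, 0.11
print(log2(comb(n, round(p * n))), n * h(p))  # both close to 500
```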


Motivating examples

Consider a source with 3 letters A, B, C, with probabilities 1/2, 1/4, 1/4. It is simple to construct the optimal prefix code in binary: A = 0, B = 10, C = 11. A message is then encoded as ABC -> 01011.

We see that an equivalent method for performing the encoding is as follows:
* Start with the number 1, and perform an operation on the number for each input letter.
* A: multiply by 2; B: multiply by 4, add 2; C: multiply by 4, add 3.
* Express the number in binary, then remove the first digit 1.

Consider a more general source with k letters, with rational probabilities n_1/N, ..., n_k/N. Then performing arithmetic coding on the source requires only exact arithmetic with integers. In general, ANS is an approximation of arithmetic coding that approximates the real probabilities r_1, ..., r_k by rational numbers n_1/N, ..., n_k/N with a small denominator N.
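The three-step procedure above can be written out directly (a minimal illustration of our own; the function names are ours):

```python
# encode a message over the alphabet {A: 1/2, B: 1/4, C: 1/4}
# by the number-building procedure described above
OPS = {
    'A': lambda x: 2 * x,       # append the 1-bit codeword 0
    'B': lambda x: 4 * x + 2,   # append the 2-bit codeword 10
    'C': lambda x: 4 * x + 3,   # append the 2-bit codeword 11
}

def encode(message):
    x = 1                       # start with 1 so leading zeros are preserved
    for letter in message:
        x = OPS[letter](x)
    return bin(x)[3:]           # binary expansion minus '0b' and the leading 1

print(encode("ABC"))   # 01011, matching the prefix code A=0, B=10, C=11
```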


Basic concepts of ANS

Imagine there is some information stored in a natural number x, for example as the bit sequence of its binary expansion. To add the information from a binary variable s, we can use the coding function x' = C(x,s) = 2x + s, which shifts all bits one position up and places the new bit in the least significant position. Now the decoding function D(x') = (\lfloor x'/2 \rfloor, x' \bmod 2) allows one to retrieve the previous x and this added bit: D(C(x,s)) = (x,s), \ C(D(x')) = x'. We can start with the initial state x = 1, then use the function C on the successive bits of a finite bit sequence to obtain a final number x storing this entire sequence. Then, applying the function D multiple times until x = 1 allows one to retrieve the bit sequence in reversed order.

The above procedure is optimal for the uniform (symmetric) probability distribution of symbols \Pr(0)=\Pr(1)=1/2. ANS generalizes it to make it optimal for any chosen (asymmetric) probability distribution of symbols: \Pr(s)=p_s. While s in the above example was choosing between even and odd C(x,s), in ANS this even/odd division of natural numbers is replaced with a division into subsets having densities corresponding to the assumed probability distribution \{p_s\}_s: up to position x, there are approximately x p_s occurrences of symbol s. The coding function C(x,s) returns the x-th appearance from the subset corresponding to symbol s.

The density assumption is equivalent to the condition x' = C(x,s) \approx x / p_s. Assuming that a natural number x contains \log_2(x) bits of information, \log_2(C(x,s)) \approx \log_2(x) + \log_2(1/p_s). Hence the symbol of probability p_s is encoded as containing \approx \log_2(1/p_s) bits of information, as is required from entropy coders.
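In the symmetric case, the pair C, D behaves as a last-in-first-out container of bits. A minimal sketch of our own, using exactly the definitions above:

```python
def C(x, s):
    # append bit s to the state: x' = 2x + s
    return 2 * x + s

def D(x):
    # retrieve the previous state and the most recently appended bit
    return x // 2, x % 2

x = 1                      # initial state
for s in [1, 0, 1, 1]:     # push the bits of a message
    x = C(x, s)
print(x)                   # 27 = 0b11011: the message bits after the leading 1

bits = []
while x != 1:              # pop until the initial state reappears
    x, s = D(x)
    bits.append(s)
print(bits)                # [1, 1, 0, 1] -- retrieved in reversed order
```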


Variants


Uniform binary variant (uABS)

Let us start with the binary alphabet and a probability distribution \Pr(1) = p, \Pr(0) = 1-p. Up to position x we want approximately p \cdot x analogues of odd numbers (for s = 1). We can choose this number of appearances as \lceil x \cdot p \rceil, getting s = \lceil (x+1)\cdot p \rceil - \lceil x \cdot p \rceil. This variant is called ''uABS'' and leads to the following decoding and encoding functions (Matt Mahoney, ''Data Compression Explained''):

Decoding:

 s = ceil((x+1)*p) - ceil(x*p)  // 0 if fract(x*p) < 1-p, else 1
 if s = 0 then new_x = x - ceil(x*p)  // D(x) = (new_x, 0), the same as new_x = floor(x*(1-p))
 if s = 1 then new_x = ceil(x*p)  // D(x) = (new_x, 1)

Encoding:

 if s = 0 then new_x = ceil((x+1)/(1-p)) - 1  // C(x,0) = new_x
 if s = 1 then new_x = floor(x/p)  // C(x,1) = new_x

For p = 1/2 this amounts to the standard binary system (with 0 and 1 inverted); for a different p it becomes optimal for this given probability distribution. For example, for p = 0.3 these formulas lead to a table for small values of x:

 x'   0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
 s=0     0  1     2  3     4  5  6     7  8     9 10
 s=1  0        1        2           3        4        5

The symbol s = 1 corresponds to a subset of natural numbers with density p = 0.3, which in this case are the positions \{0, 3, 6, 10, 13, 16, 20, 23, 26, 30, \ldots\}. As 1/4 < 0.3 < 1/3, these positions increase by 3 or 4. Because p = 3/10 here, the pattern of symbols repeats every 10 positions.

The coding C(x,s) can be found by taking the row corresponding to a given symbol s and choosing the given x in this row; the top row then provides C(x,s). For example, C(7,0) = 11 from the middle to the top row.

Imagine we would like to encode the sequence '0100' starting from x = 1. First s = 0 takes us to x = 2, then s = 1 to x = 6, then s = 0 to x = 9, then s = 0 to x = 14. By using the decoding function D(x') on this final x, we can retrieve the symbol sequence. Using the table for this purpose, x in the first row determines the column; then the non-empty row and the written value determine the corresponding s and x.
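The uABS formulas can be checked against the worked example above. The sketch below is our own; it uses exact rational arithmetic (Python's `Fraction`) so that the ceilings and floors are computed without floating-point error:

```python
from fractions import Fraction
from math import ceil, floor

def uabs_encode(x, s, p):
    # C(x,1) = floor(x/p);  C(x,0) = ceil((x+1)/(1-p)) - 1
    if s == 1:
        return floor(Fraction(x, 1) / p)
    return ceil(Fraction(x + 1, 1) / (1 - p)) - 1

def uabs_decode(x, p):
    # s = ceil((x+1)*p) - ceil(x*p); then invert the matching branch
    s = ceil((x + 1) * p) - ceil(x * p)
    new_x = ceil(x * p) if s == 1 else x - ceil(x * p)
    return new_x, s

p = Fraction(3, 10)       # Pr(1) = 0.3, as in the example above
x = 1
for s in [0, 1, 0, 0]:    # the worked sequence '0100'
    x = uabs_encode(x, s, p)
print(x)                  # 14, matching the text: 1 -> 2 -> 6 -> 9 -> 14

decoded = []
while x != 1:             # decoding retrieves the symbols in reverse order
    x, s = uabs_decode(x, p)
    decoded.append(s)
print(decoded[::-1])      # [0, 1, 0, 0]
```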


Range variants (rANS) and streaming

The range variant also uses arithmetic formulas, but allows operation on a large alphabet. Intuitively, it divides the set of natural numbers into ranges of size 2^n, and splits each of them in an identical way into subranges of proportions given by the assumed probability distribution.

We start with a quantization of the probability distribution to a 2^n denominator, where n is chosen (usually 8-12 bits): p_s \approx f[s]/2^n for some natural numbers f[s] (the sizes of the subranges). Denote mask = 2^n - 1, and the cumulative distribution function:

 CDF[s] = \sum_{i < s} f[i] = f[0] + \cdots + f[s-1]

Note here that the CDF[s] function is not a true CDF in that the current symbol's probability is not included in the expression's value. Instead, CDF[s] represents the total probability of all previous symbols. Example: instead of the normal definition CDF[0] = f[0], it is evaluated as CDF[0] = 0, since there are no previous symbols.

For y \in [0, 2^n - 1], denote the function (usually tabled)

 symbol(y) = s such that CDF[s] <= y < CDF[s+1]

Now the coding function is:

 C(x,s) = (floor(x / f[s]) << n) + (x % f[s]) + CDF[s]

Decoding:

 s = symbol(x & mask)
 D(x) = (f[s] * (x >> n) + (x & mask) - CDF[s], s)

This way we can encode a sequence of symbols into a large natural number x. To avoid using large-number arithmetic, in practice stream variants are used, which enforce x \in [L, b \cdot L - 1] by renormalization: sending the least significant bits of x to or from the bitstream (usually L and b are powers of 2).

In the rANS variant, x is for example 32 bit. For 16-bit renormalization, x \in [2^{16}, 2^{32} - 1], and the decoder refills the least significant bits from the bitstream when needed:

 if (x < (1 << 16)) x = (x << 16) + read16bits()
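A toy streaming rANS codec along these lines (our own sketch, not taken from any production implementation; it assumes n = 8, a 32-bit state, and 16-bit renormalization as in the text):

```python
PROB_BITS = 8
PROB_SCALE = 1 << PROB_BITS       # frequencies f[s] must sum to this
MASK = PROB_SCALE - 1
L = 1 << 16                       # normalized interval is [L, 2^32 - 1]

def tables(freqs):
    cdf = [0]
    for f in freqs:
        cdf.append(cdf[-1] + f)
    assert cdf[-1] == PROB_SCALE
    sym = [s for s, f in enumerate(freqs) for _ in range(f)]  # symbol(y) table
    return cdf, sym

def rans_encode(symbols, freqs):
    cdf, _ = tables(freqs)
    x, stream = L, []
    for s in reversed(symbols):   # encode backwards so decoding goes forward
        f = freqs[s]
        while x >= f << (32 - PROB_BITS):   # renormalize: keep x below 2^32
            stream.append(x & 0xFFFF)
            x >>= 16
        x = ((x // f) << PROB_BITS) + (x % f) + cdf[s]   # C(x, s)
    return x, stream

def rans_decode(x, stream, freqs, count):
    cdf, sym = tables(freqs)
    out = []
    for _ in range(count):
        s = sym[x & MASK]
        x = freqs[s] * (x >> PROB_BITS) + (x & MASK) - cdf[s]   # D(x)
        while x < L and stream:   # refill 16 bits at a time when needed
            x = (x << 16) + stream.pop()
        out.append(s)
    return out

msg = [0, 1, 2, 0, 0, 1] * 10
freqs = [128, 64, 64]             # Pr = 1/2, 1/4, 1/4 quantized to /256
x, stream = rans_encode(msg, freqs)
print(rans_decode(x, stream, freqs, len(msg)) == msg)   # True
```

Note the encoder runs over the symbols in reverse, so the decoder can emit them in forward order; this is the stack behavior discussed in the Remarks section.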


Tabled variant (tANS)

The tANS variant puts the entire behavior (including renormalization) for x \in [L, 2L-1] into a table, which yields a finite-state machine avoiding the need for multiplication. Finally, a step of the decoding loop can be written as:

 t = decodingTable(x);
 x = t.newX + readBits(t.nbBits);  // state transition
 writeSymbol(t.symbol);            // decoded symbol

A step of the encoding loop:

 s = ReadSymbol();
 nbBits = (x + ns[s]) >> r;        // number of bits for renormalization
 writeBits(x, nbBits);             // send the least significant bits to bitstream
 x = encodingTable[start[s] + (x >> nbBits)];

A specific tANS coding is determined by assigning a symbol to every position in [L, 2L-1]; the numbers of appearances should be proportional to the assumed probabilities. For example, one could choose the assignment "abdacdac" for the probability distribution Pr(a) = 3/8, Pr(b) = 1/8, Pr(c) = 2/8, Pr(d) = 2/8. If symbols are assigned in ranges of lengths being powers of 2, we would get Huffman coding. For example, the prefix code a->0, b->100, c->101, d->11 would be obtained for tANS with the symbol assignment "aaaabcdd".
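A from-scratch table construction for the "abdacdac" example might look as follows. This is our own simplified sketch: it builds encoding and decoding maps directly from the symbol spread, rather than the packed nbBits/newX table layout used by real implementations such as FSE:

```python
def build_tables(spread):
    # states are x in [L, 2L-1]; spread[i] is the symbol assigned to state L+i
    L = len(spread)
    occ = {}                       # symbol -> its states, in increasing order
    for i, s in enumerate(spread):
        occ.setdefault(s, []).append(L + i)
    enc, dec = {}, {}
    for s, states in occ.items():
        Ls = len(states)           # number of appearances, ~ L * Pr(s)
        for i, st in enumerate(states):
            enc[(s, Ls + i)] = st  # C maps (s, x) for x in [Ls, 2Ls-1] to a state
            dec[st] = (s, Ls + i)  # D recovers the symbol and the previous x
    return L, occ, enc, dec

def tans_encode(message, spread):
    L, occ, enc, _ = build_tables(spread)
    x, bits = L, []
    for s in message:
        Ls = len(occ[s])
        nb = 0                     # renormalize: emit bits until x >> nb fits
        while (x >> nb) >= 2 * Ls:
            nb += 1
        bits.extend((x >> k) & 1 for k in range(nb))
        x = enc[(s, x >> nb)]
    return x, bits

def tans_decode(x, bits, spread, count):
    L, _, _, dec = build_tables(spread)
    out = []
    for _ in range(count):         # decoding runs opposite to encoding
        s, x = dec[x]
        out.append(s)
        while x < L:               # refill bits until the state is back in range
            x = (x << 1) | bits.pop()
    return ''.join(reversed(out))

x, bits = tans_encode("acabd", "abdacdac")
print(tans_decode(x, bits, "abdacdac", 5))   # acabd
```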


Remarks

As with Huffman coding, modifying the probability distribution of tANS is relatively costly, hence it is mainly used in static situations, usually with some Lempel–Ziv scheme (e.g. ZSTD, LZFSE). In this case, the file is divided into blocks; for each of them, the symbol frequencies are counted independently, then after approximation (quantization) written in the block header and used as the static probability distribution for tANS. In contrast, rANS is usually used as a faster replacement for range coding (e.g. CRAM, LZNA, Draco). It requires multiplication, but is more memory efficient and is appropriate for dynamically adapting probability distributions.

Encoding and decoding of ANS are performed in opposite directions, making it a stack (last in, first out) for symbols. This inconvenience is usually resolved by encoding in the backward direction, after which decoding can be done forward. For context-dependence, like a Markov model, the encoder needs to use context from the perspective of later decoding. For adaptivity, the encoder should first go forward to find the probabilities which will be used (predicted) by the decoder and store them in a buffer, then encode in the backward direction using the buffered probabilities.

The final state of encoding is required to start decoding, hence it needs to be stored in the compressed file. This cost can be compensated by storing some information in the initial state of the encoder. For example, instead of starting with the "10000" state, start with the "1****" state, where "*" are some additional stored bits, which can be retrieved at the end of the decoding. Alternatively, this state can be used as a checksum by starting encoding with a fixed state and testing whether the final state of decoding is the expected one.


Patent controversy

The author of the novel ANS algorithm and its variants tANS and rANS specifically intended his work to be available freely in the public domain, for altruistic reasons. He has not sought to profit from it and took steps to ensure it would not become a "legal minefield", or be restricted by, or profited from by, others.

In 2015, Google published a US and then worldwide patent application for "Mixed boolean-token ans coefficient coding". At the time, Duda had been asked by Google to help it with video compression, so he was intimately aware of this domain, and Google had the original author assisting it. Duda was not pleased to (accidentally) discover Google's patent intentions, given that he had been clear he wanted the work in the public domain and had assisted Google specifically on that basis. Duda subsequently filed a third-party submission to the US Patent Office seeking a rejection. The USPTO rejected the application in 2018, and Google subsequently abandoned the patent.

In June 2019, Microsoft lodged a patent application called "Features of range asymmetric number system encoding and decoding". The USPTO issued a final rejection of the application on October 27, 2020. Yet on March 2, 2021, Microsoft gave a USPTO explanatory filing stating "The Applicant respectfully disagrees with the rejections.", seeking to overturn the final rejection under the "After Final Consideration Pilot 2.0" program. After reconsideration, the USPTO granted the application on January 25, 2022.


See also

* Entropy encoding
* Huffman coding
* Arithmetic coding
* Range encoding
* Zstandard (Facebook compressor)
* LZFSE (Apple compressor)


References


External links


* ''High throughput hardware architectures for asymmetric numeral systems entropy coding'', S. M. Najmabadi, Z. Wang, Y. Baroud, S. Simon, ISPA 2015
* New Generation Entropy coders: Finite State Entropy (FSE) implementation of tANS by Yann Collet
* rygorous/ryg_rans: implementation of rANS by Fabian Giesen
* jkbonfield/rans_static: fast implementation of rANS and arithmetic coding by James K. Bonfield
* facebook/zstd: Facebook Zstandard compressor by Yann Collet (author of LZ4)
* LZFSE: LZFSE compressor (LZ+FSE) of Apple Inc.
* CRAM 3.0 DNA compressor (order-1 rANS), part of SAMtools, by the European Bioinformatics Institute
* implementation for Google VP10
* implementation for Google WebP
* aom_dsp: implementation of the Alliance for Open Media
* ''Data Compression Using Asymmetric Numeral Systems'', Wolfram Demonstrations Project
* GST: GPU-decodable Supercompressed Textures
* ''Understanding Compression'', book by A. Haecky, C. McAnlis