The evolution of entropy coding: from Huffman coding to ANS

May 1, 2026 - By Stefano Tommesani

When a compression format becomes well known, the entropy coder is rarely the part people remember: ZIP is usually described as “LZ77 plus Huffman”, but the LZ77 part gets most of the attention because it is easy to explain: find a repeated string and replace it with a reference to an earlier copy. JPEG is usually explained through DCT blocks and quantization tables. Video codecs are explained through motion compensation, prediction modes, transforms, and residuals.

The entropy coder sits after all of that. By the time it runs, the compressor has already made most of the interesting modelling decisions. It has decided that a run of bytes should become a length/distance pair, that an image block should become a set of coefficients, or that a prediction error should be represented as a small signed value. What remains is a more mechanical but very important problem: the compressor has a stream of symbols, some more likely than others, and it needs to write them using as few bits as possible.

This is where Huffman coding, arithmetic coding, and ANS belong. They are not alternatives to LZ77, image prediction, or transform coding. They are the final stage that turns the symbols produced by those models into a bitstream.

That distinction is worth making because it avoids a common misunderstanding: a weak model will not become good simply because it uses a sophisticated entropy coder. If the symbols are close to uniformly distributed, there is not much for any entropy coder to exploit. On the other hand, once a model has produced a skewed distribution of symbols, the choice of entropy coder determines how much of that statistical advantage is actually captured in the file.

Huffman coding was a very good answer for a long time because it is simple, fast, and easy to decode. Arithmetic coding improved compression by getting around the whole-bit limitation of prefix codes, but it came with more complicated arithmetic and less attractive implementation properties. ANS is interesting because it keeps much of the compression density of arithmetic coding while moving the implementation closer to the kind of table-driven integer work that modern CPUs handle well.

The evolution is not a clean sequence where each algorithm makes the previous one obsolete. Huffman coding is still used because it is often the right engineering choice. Arithmetic coding is still used where adaptive probability models and compression ratio matter more than decoder simplicity. ANS has become important because it occupies a useful middle ground: high compression density, fast decoding, and good fit for block-based compressors.

The rest of this article looks at that progression from the point of view of someone choosing an entropy coder.

The job of the entropy coder

A compressor usually has two quite different layers. The first layer tries to make the data predictable. The second layer exploits that predictability when writing the final stream. For example, an LZ77 compressor transforms repeated text into literals and references to previous data. After that stage, the original bytes are no longer the only thing being encoded, as the compressor now has different classes of symbols: literal values, match lengths, match distances, and sometimes extra bits attached to those symbols.

An image codec goes through a different path, but ends up with the same kind of problem. Prediction and transforms produce residuals or coefficients, and many of them are zero or small, while some are more likely than others. The entropy coder does not need to know why those symbols have that distribution; it only needs to encode them efficiently.

This separation is useful because it explains why entropy coders can be reused across very different domains. Huffman coding appears in DEFLATE and baseline JPEG. Arithmetic coding appears in image and video codecs. ANS appears in modern compressors and image formats. The data being compressed is different, but the final statistical problem is similar.

Shannon’s limit

The theoretical target comes from information theory. If a symbol has probability p, its ideal information content is:

-log2(p)

This gives the familiar values when probabilities are powers of two.

Probability	Ideal cost
1/2	1 bit
1/4	2 bits
1/8	3 bits
1/16	4 bits

Those cases are convenient. They fit naturally into binary code lengths. But real distributions are rarely that tidy: e.g. a symbol with probability 0.30 has an ideal cost of:

-log2(0.30) = 1.737 bits

That number is the source of much of the complexity in entropy coding. A bitstream is made of whole bits, but the ideal cost of a symbol is often fractional. No encoder can literally write 1.737 bits for one symbol in isolation. The way around this is to make the average work over a longer sequence. If a symbol should cost 1.737 bits on average, the coder needs a representation where some combination of many symbols approaches that cost, even though the physical output is still a sequence of zeros and ones. Huffman coding, arithmetic coding, and ANS differ mainly in how they deal with that mismatch between fractional information and whole bits.

Huffman coding

Huffman coding is one of the few compression techniques that can be explained quickly without hiding too much. Given a set of symbols and their frequencies, Huffman coding builds a binary tree. Common symbols are placed closer to the root and receive shorter bit strings. Rare symbols are placed deeper and receive longer bit strings.

Take this frequency table:

Symbol	Frequency
A	45
B	13
C	12
D	16
E	9
F	5

The algorithm repeatedly combines the two least frequent entries:

F:5  + E:9  -> 14
C:12 + B:13 -> 25
14   + D:16 -> 30
25   + 30   -> 55
A:45 + 55   -> 100

The final tree might produce code lengths like these:

Symbol	Example code
A	`0`
C	`100`
B	`101`
F	`1100`
E	`1101`
D	`111`

The exact bit patterns are not the important part. Different implementations can choose left and right branches differently, especially when frequencies tie. What matters is the length assigned to each symbol.

Huffman coding is optimal among prefix codes: this does not mean it reaches Shannon’s theoretical limit! Instead it means that if every symbol must be represented by a whole number of bits, and if no codeword can be the prefix of another, then Huffman coding gives the best possible average length. This is a narrower claim than “optimal compression”, but it is still a very powerful one.

Why prefix codes are such a good engineering fit

A prefix code is easy to decode because no codeword can be the beginning of another codeword.

For example:

A = 0
B = 10
C = 110
D = 111

If the decoder sees 0, it can immediately emit A. It does not need to wait and see whether a longer code is coming, because no longer code starts with 0. This maps nicely to simple decoders. The most direct implementation walks a binary tree one bit at a time. Faster implementations use tables to decode several bits at once. Hardware implementations are also practical because the required operations are simple and predictable.

This matters more than it may seem. Compression formats live for decades if decoders are easy to implement and hard to get wrong. A slightly worse compression ratio can be a reasonable price for simple, fast, portable decoding. This is one of the reasons Huffman coding has lasted so long.

Canonical Huffman coding

Textbook explanations tend to show Huffman trees. File formats usually do something more practical: they store code lengths and reconstruct a canonical Huffman code. A canonical Huffman code assigns bit patterns deterministically from the sorted list of code lengths. If the encoder and decoder agree on the lengths, they do not need to store the shape of the tree.

This has several advantages. The header is smaller, the decoder can build lookup tables directly, and the bitstream specification becomes less ambiguous. It also avoids wasting space describing a tree that is only a means to derive code lengths.

DEFLATE is a good example. The format used by gzip, zlib, ZIP, and PNG combines LZ77-style matches with Huffman coding. In dynamic Huffman blocks, the stream describes the code lengths for literals, match lengths, and distances. The decoder reconstructs the canonical codes and then decodes the block.

This design is effective, because LZ77 removes repeated byte sequences. Huffman coding then represents the resulting symbols with a compact and fast decoder. The result is not state of the art by modern compression ratio standards, but it remains a very successful engineering compromise.

The whole-bit problem

The weakness of Huffman coding is that each symbol gets a code made of a whole number of bits. That works well when symbol probabilities are near powers of two. It works less well when they are not. A simple binary source shows the problem clearly:

Symbol	Probability
A	90%
B	10%

A one-symbol Huffman code has no useful choice:

A = 0
B = 1

Both symbols cost one bit. But the entropy of this source is much lower:

-(0.9 * log2(0.9) + 0.1 * log2(0.1)) ≈ 0.469 bits per symbol

Huffman coding spends one bit per symbol where the theoretical limit is less than half a bit. This is a consequence of assigning one complete codeword to each symbol.

There are ways to reduce the gap while still using Huffman coding. One option is to encode groups of symbols instead of individual symbols: if you encode pairs, triples, or larger blocks, the average cost per original symbol can become fractional. The problem is that the alphabet grows quickly. If the original alphabet has n symbols and groups have length k, there can be up to n^k grouped symbols, and then the codebook is too large or too sparse.

Arithmetic coding: encode the interval, not the symbol

Arithmetic coding takes a different route. Instead of assigning one codeword to each symbol, it assigns a representation to the whole message. It is easier to understand if we temporarily ignore implementation details.

Start with the interval:

[0, 1)

Split it according to the probabilities of the symbols:

Symbol	Probability	Interval
A	0.50	`[0.00, 0.50)`
B	0.30	`[0.50, 0.80)`
C	0.15	`[0.80, 0.95)`
D	0.05	`[0.95, 1.00)`

To encode the first symbol, choose its subinterval. To encode the next symbol, subdivide the current interval using the same probability model and choose again. Each symbol narrows the interval.

If the message is:

A C B

then the interval is first narrowed to the range for A [0.00, 0.50), then to the range for C inside that range, then to the range for B inside that smaller range. At the end, any number inside the final interval identifies the entire sequence. So arithmetic coding does not need to assign a whole-bit codeword to A, another whole-bit codeword to B, and so on. The sequence as a whole gets represented by a number. Fractional bit costs appear naturally in the average length of that representation.

For the 90% / 10% binary source, arithmetic coding can approach the entropy of about 0.469 bits per symbol over a long stream. As we have seen before, Huffman coding cannot do that when encoding one symbol at a time.

That is why arithmetic coding became important in domains where a few percent of compression ratio is worth extra complexity. Image and video codecs are the obvious examples: once a codec has invested heavily in prediction and transforms, wasting bits in the entropy coder becomes harder to justify.

Arithmetic coding does not use real numbers

The interval explanation is useful, but it is not how a production arithmetic coder works. A real implementation uses integers. It keeps a range, narrows that range according to cumulative frequencies, and emits bits or bytes when enough of the range becomes known. The core update looks roughly like this:

range = high - low

new_low  = low + range * cumulative_low  / total
new_high = low + range * cumulative_high / total

The decoder performs the corresponding inverse operation to determine which symbol contains the current encoded value. The hard parts are around the edges: finite precision, renormalization, underflow, carry handling, and exact agreement between encoder and decoder. These details are manageable, but they are less forgiving than canonical Huffman coding.

Range coding is a closely related approach. Instead of thinking in terms of two explicit interval endpoints, range coders often keep a base and a range. The conceptual model is still the same: narrow a numerical range according to symbol probabilities, and emit output as the range becomes determined. From the point of view of engineering trade-offs, arithmetic coding and range coding belong in the same family. They solve the fractional-bit problem well, but they ask more from the implementation.

Why arithmetic coding did not replace Huffman coding everywhere

If arithmetic coding is more efficient, thern why Huffman coding survived? Because compression ratio is only one requirement. Huffman decoders are small, fast, and relatively easy to make interoperable. Arithmetic coders are more stateful. They involve more arithmetic per symbol and more subtle edge cases.

There was also the patent history. Arithmetic coding was affected for years by patent concerns, which made some format designers cautious. That did not prevent arithmetic coding from being used, but it made Huffman coding a safer default for widely deployed formats.

This helps explain the split that emerged. General-purpose and compatibility-sensitive formats often stayed with Huffman coding. High-efficiency media codecs were more willing to pay for arithmetic coding or context-adaptive variants because the compression gains directly translated into lower bandwidth or better quality at the same bitrate.

JPEG is a good example of the tension. Baseline JPEG uses Huffman coding and became universally supported. Arithmetic-coded JPEG exists, but the Huffman-coded baseline is what most software and hardware support.

Video went further toward arithmetic-style coding because the payoff was larger. H.264/AVC and HEVC use CABAC in their higher-efficiency profiles. AV1 uses an arithmetic/range-style entropy coder. In those codecs, the entropy coder is tightly coupled with context models and syntax probabilities, and the extra complexity pays for itself.

ANS: a different compromise

By the time ANS became practical, the old compromise was familiar. Huffman coding was fast and simple, but achieving lesser compression ratios. Arithmetic and range coding were more efficient, but more complex and often harder to optimize for speed.

Asymmetric Numeral Systems, introduced by Jarek Duda, offered a different point in the design space. The short version is often stated as:

compression density close to arithmetic coding, with speed closer to Huffman coding.

Huffman coding represents symbols as prefix codewords. Arithmetic coding represents the message as a narrowing interval. ANS represents the message as an evolving integer state.

The encoder maintains a number: encoding a symbol transforms that number into another number. Common symbols grow the state less, while rare symbols grow it more. The average growth in the number of bits needed to represent the state corresponds to the information content of the symbol. When the state grows too large, the encoder emits some low bits and shifts the state down. The decoder later reads those bits back when the state becomes too small.

A mental model for ANS

Suppose the encoder has an integer state x. Encoding a symbol produces a new state:

x' = encode(x, symbol)

The transformation is designed so that the decoder can recover both the symbol and the previous state from x'.

(symbol, x) = decode(x')

As symbols are encoded, the state tends to grow, and the amount of growth depends on symbol probability. A common symbol should add fewer bits to the state, while a rare symbol should add more. Because the state cannot grow without bound, the coder renormalizes it. Encoding looks roughly like this:

encode symbol:
    while state is too large for this symbol:
        output low bits
        shift state down

    update state using symbol frequency

Decoding does the inverse:

decode symbol:
    recover symbol from state
    update state to previous state

    while state is too small:
        read bits
        shift state up

The exact order depends on the ANS variant, but the idea is the same. The output bits are pieces of the numerical state, not independent prefix codewords. This is why ANS can achieve fractional average costs without using an arithmetic-coding interval.

rANS and tANS

Most practical implementations of ANS eventually split into two families.

rANS, or range ANS, uses arithmetic formulas with symbol frequencies and cumulative counts. It is usually the easier variant to explain mathematically and is widely used in research code and codecs.

tANS, or tabled ANS, uses precomputed state transition tables. For a given frequency distribution, the encoder and decoder build tables that describe how states map to symbols and next states. Decoding can then be very fast, often dominated by table lookups, shifts, masks, and additions.

Finite State Entropy, or FSE, is a well-known tabled ANS implementation by Yann Collet. Zstandard uses FSE for several internal symbol streams. It also uses Huffman coding for literals, which is a good reminder that real compressors mix techniques where appropriate. This is important: a modern compressor does not need to choose one entropy coder for everything, as literals, lengths, distances, flags, and other symbol streams may have different distributions and different performance constraints. Using Huffman for one stream and ANS for another is a solid approach.

Why ANS works well on modern CPUs

The appeal of ANS comes from compressing well and having good implementations that run very fast. Modern CPUs like predictable integer work, table lookups that fit in cache, and independent instruction streams. Entropy decoding is naturally serial. The current state determines the next symbol, and the next state depends on that symbol, so no entropy coder can remove that dependency completely.

ANS helps because it can be interleaved: a decoder can keep multiple ANS states and alternate between them, giving the CPU more independent work. Fabian Giesen’s writing on interleaved entropy coders is useful background here, because it explains why this matters for actual throughput rather than just theoretical operation counts. This is why ANS became attractive in production compressors: it is not simply an alternative to arithmetic coding, but it also gives implementers a way to build entropy coders that are dense, table-friendly, and more compatible with modern CPU execution.

Comparing the three approaches

The three coders can be compared by the kind of object they use to represent information.

Huffman coding builds a prefix tree.
Arithmetic coding narrows an interval.
ANS evolves an integer state.

That difference leads to different engineering properties.

Property	Huffman coding	Arithmetic/range coding	ANS
Representation	Prefix codewords	Numerical interval/range	Integer state
Per-symbol cost	Whole bits	Fractional on average	Fractional on average
Compression density	Good	Excellent	Excellent
Decoder complexity	Low	Higher	Medium
Decoder speed	Very high	Often good, but more stateful	Very high in good implementations
Table-driven implementation	Natural	Less central	Natural, especially tANS
Adaptive probabilities	Possible, but not the best fit	Very natural	Possible, design-dependent
Good use cases	Simple and fast formats	High-efficiency adaptive codecs	Block compressors and modern codecs needing speed and density

Huffman coding remains a good choice when decoder simplicity and speed are more important than the last few percent of compression. Arithmetic coding remains a strong choice when adaptive probability models and compression ratio are central. ANS is attractive when you want arithmetic-like density while keeping the implementation closer to table-driven integer decoding.

A simple educational compressor should probably start with canonical Huffman coding. A codec with sophisticated adaptive contexts might naturally use range coding or arithmetic coding. A modern block compressor that cares about decompression speed should seriously consider ANS. The right answer depends on the data, the model, the decoder, and the amount of implementation complexity the format can afford.

Where the algorithms show up

DEFLATE combines LZ77 and Huffman coding. It is the basis for gzip, zlib, ZIP, and PNG. It has been surpassed in compression ratio by newer formats, but its design remains instructive: a good model for repeated strings, a simple entropy coder, and a decoder that can be implemented almost anywhere.
Brotli still uses Huffman coding, but with stronger modelling than DEFLATE. A good probability distribution fed into Huffman coding can beat a poor distribution fed into a more sophisticated coder.
Baseline JPEG uses Huffman coding, which helped make it easy to implement broadly in software and hardware. JPEG 2000 moved to arithmetic coding, reflecting a different set of priorities.
H.264/AVC and HEVC use CABAC for high-efficiency entropy coding. AV1 uses an arithmetic/range-style entropy coder. These codecs justify more complex entropy coding because bitrate matters enormously and the syntax is already highly modelled.
Zstandard is a good example of modern general-purpose compression. It combines LZ-style matching, advanced parsing, Huffman coding for literals, and FSE for other symbol streams.
JPEG XL also uses modern entropy coding tools, including ANS and Huffman coding depending on context. Like Zstandard, it reflects the more recent view that entropy coding should be chosen per stream rather than treated as a single global decision.

The model still dominates

It is easy to overstate the importance of the entropy coder because the algorithms are interesting. But in a real compressor, the model often matters more. If your model produces symbols that are nearly uniformly distributed, Huffman coding, arithmetic coding, and ANS will all have little to exploit. The entropy coder cannot manufacture predictability that the model failed to expose. If your model produces a sharply skewed distribution, then the entropy coder matters a lot.

This is why modern compressors spend so much effort on parsing, context selection, transforms, prediction, and probability estimation. Entropy coding is the last stage, but it is not a substitute for modelling.

Why this evolution matters

Huffman coding fits a world where small, fast, robust decoders are the dominant requirement. Arithmetic coding fits a world where compression density is worth more complex arithmetic and careful implementation. ANS fits modern CPUs and block-based compressors particularly well because it can combine high density with fast table-driven decoding and interleaving. This is why all three still matter. The older algorithms did not disappear because their trade-offs are still valid in many contexts.

References

Claude E. Shannon, A Mathematical Theory of Communication, Bell System Technical Journal, 1948.
https://people.math.harvard.edu/~ctm/home/text/others/shannon/entropy/entropy.pdf
David A. Huffman, A Method for the Construction of Minimum-Redundancy Codes, Proceedings of the IRE, 1952.
https://pzs.dstu.dp.ua/ComputerGraphics/ic/bibl/huffman.pdf
Ian H. Witten, Radford M. Neal, and John G. Cleary, Arithmetic Coding for Data Compression, Communications of the ACM, 1987.
https://dl.acm.org/doi/10.1145/214762.214771
Jarek Duda, Asymmetric Numeral Systems: Entropy Coding Combining Speed of Huffman Coding with Compression Rate of Arithmetic Coding, 2013.
https://arxiv.org/abs/1311.2540
RFC 1951, DEFLATE Compressed Data Format Specification version 1.3.
https://www.ietf.org/rfc/rfc1951.txt
RFC 7932, Brotli Compressed Data Format.
https://datatracker.ietf.org/doc/html/rfc7932
RFC 8478, Zstandard Compression and the application/zstd Media Type.
https://datatracker.ietf.org/doc/html/rfc8478
Yann Collet, Finite State Entropy.
https://github.com/Cyan4973/FiniteStateEntropy
Fabian Giesen, Interleaved Entropy Coders, 2014.
https://arxiv.org/abs/1402.3392
JPEG Committee, JPEG XL Image Coding System — White Paper.
https://ds.jpeg.org/whitepapers/jpeg-xl-whitepaper.pdf
Alliance for Open Media, AV1 Bitstream & Decoding Process Specification.
https://aomediacodec.github.io/av1-spec/