Monday, December 23, 2013

Zhuff v0.9, or a first FSE application

As a follow-up to last week's release of FSE, I wanted to produce a quick demo showing the kind of gains to expect from this new entropy algorithm.

And a simple candidate for this is Zhuff.

Zhuff is, basically, LZ4 and Huff0 stuffed together. It's very fast in spite of this 2-pass process. Its major limitation so far is its 64KB window size, inherited from LZ4, but that's something we'll come back to in a later post.

For the time being, my only intention is to swap Huff0 for FSE and look at the results.
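To picture the swap, here is a minimal sketch (not Zhuff's actual code) of a two-pass compressor in C: an LZ4 stage followed by a pluggable entropy stage. LZ4_compress_default() is the real LZ4 API; the entropy_encode_f interface and zhuff_like_compress() are hypothetical names used only for illustration, and Huff0 or FSE would each plug in behind the same interface.

/* Two-pass sketch: LZ stage first, then a swappable entropy stage.
 * This only shows the shape of the idea, not Zhuff's internals. */
#include <stddef.h>
#include "lz4.h"   /* provides LZ4_compress_default() */

/* Hypothetical entropy-stage interface: returns the compressed size, or 0 on failure.
 * Huff0 or FSE would each provide an implementation with this shape. */
typedef size_t (*entropy_encode_f)(void* dst, size_t dstCapacity,
                                   const void* src, size_t srcSize);

/* Compress src with an LZ4 pass into tmp, then entropy-code tmp into dst. */
static size_t zhuff_like_compress(char* dst, size_t dstCapacity,
                                  const char* src, int srcSize,
                                  char* tmp, int tmpCapacity,
                                  entropy_encode_f entropy_encode)
{
    int lzSize = LZ4_compress_default(src, tmp, srcSize, tmpCapacity);
    if (lzSize <= 0) return 0;   /* LZ stage failed or tmp too small */
    return entropy_encode(dst, dstCapacity, tmp, (size_t)lzSize);
}

With such an interface, moving from Huff0 to FSE is just a matter of passing a different function to the entropy stage, which is essentially what this demo does.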

And the results are positive.
Here are some benchmark numbers, measured on a Core i5-3340M @ 2.7 GHz, Windows 7 64-bit, single-threaded, using the fast algorithm:

Filename Compressor   Ratio Decoding speed
silesia  zhuff v0.8   2.783    275 MB/s
silesia  zhuff v0.9   2.805    311 MB/s

enwik8   zhuff v0.8   2.440    200 MB/s
enwik8   zhuff v0.9   2.460    230 MB/s

calgary  zhuff v0.8   2.672    220 MB/s
calgary  zhuff v0.9   2.693    250 MB/s

(Note: I haven't mentioned compression-speed improvements; they are measurable but unrelated to FSE, so they're left out of this report.)

So, in a nutshell, we get a moderate (>10%) increase in decoding speed and a small improvement in compression ratio. Nothing earth-shattering, but still worth grabbing.
The situation would have been even more favorable for FSE if higher-probability symbols had to be compressed, but Zhuff is a simple, fast LZ algorithm, so its output doesn't exactly qualify as "high probability".
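To give an idea of what's at stake: a Huffman prefix code cannot spend less than 1 bit per symbol, whereas the theoretical cost of a symbol of probability p is -log2(p). The small C program below (just an illustration, not part of Zhuff) works out the gap for a dominant symbol with p = 0.9.

/* Illustration: cost per symbol for a high-probability symbol.
 * Huffman's floor is 1 bit; the Shannon cost is -log2(p). */
#include <math.h>
#include <stdio.h>

int main(void)
{
    double p = 0.9;                 /* probability of the dominant symbol */
    double shannon = -log2(p);      /* optimal cost: ~0.152 bit */
    double huffman = 1.0;           /* Huffman's minimum code length */
    printf("Shannon cost: %.3f bits, Huffman cost: %.3f bits, waste: %.3f bits\n",
           shannon, huffman, huffman - shannon);
    return 0;
}

FSE can get close to the 0.152-bit figure, while Huffman is stuck at 1 bit, which is why highly skewed distributions are where FSE shines most.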

The purpose of this demo was just to show that FSE can easily replace Huffman and improve the results. As a side effect, it makes Zhuff an interesting playground for testing future FSE improvements.

With this part settled, the next articles will describe FSE's principles, especially how and why it works.