Thursday, April 21, 2011

Zhuff + MMC = Zhuff HC

What do you get by marrying MMC with Zhuff? Well, a stronger version, called ZhuffHC.

MMC is an improved implementation of a full match search function. It always provides the best possible match length within a given search window. In the case of Zhuff, this window is 64K.
MMC is much faster than a simple Hash Chain algorithm, and needs no "early end" trick to keep pace on badly crafted files. The result it provides is guaranteed to be the best possible, so it is quite the opposite of a fast scan strategy.
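
To make "best possible match length in a given search window" concrete, here is a minimal brute-force sketch in Python. It is not MMC and not the Zhuff matchfinder: it simply checks every candidate position in the window, which is far slower but returns the same kind of answer a full search guarantees. The names `longest_match`, `WINDOW_SIZE` and `MIN_MATCH` are made up for illustration, and the minimum match length of 4 is an assumption, not something stated in this post.

```python
WINDOW_SIZE = 64 * 1024  # Zhuff's search window, as stated above
MIN_MATCH = 4            # assumed minimum useful match length (hypothetical)


def longest_match(data: bytes, pos: int, window: int = WINDOW_SIZE):
    """Return (length, distance) of the longest match for data[pos:]
    starting within the previous `window` bytes, or (0, 0) if no match
    reaches MIN_MATCH. Exhaustive search: every candidate is tested."""
    best_len, best_dist = 0, 0
    start = max(0, pos - window)
    # Walk candidates from nearest to farthest, so that on equal length
    # the smallest distance is kept.
    for cand in range(pos - 1, start - 1, -1):
        length = 0
        # Matches may overlap the current position, as in any LZ77 scheme.
        while (pos + length < len(data)
               and data[cand + length] == data[pos + length]):
            length += 1
        if length > best_len:
            best_len, best_dist = length, pos - cand
    if best_len < MIN_MATCH:
        return 0, 0
    return best_len, best_dist
```

A fast scan strategy would typically test only one or a few candidates and accept the first decent match. The exhaustive loop above never settles early, which is the property MMC preserves while being much faster than this naive version.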

Integrating MMC with Zhuff has been refreshingly simple. It only required removing the fast scan matchfinder and replacing it with a single call to the MMC matchfinder. In the process, I decided to slightly simplify the MMC interface, a newer version of which has been uploaded to Google Code. Differences are very minor, but they should make the code easier to read.

ZhuffHC inherits all the benefits of regular Zhuff, including multi-threading, core-count detection, drag'n'drop support, benchmark mode, and so on. It improves compression ratio by a fair amount (+8%). That's not as large an improvement as it was for LZ4HC, which means that both the Huffman encoding stage and the improved sampling were already providing a nice compression boost, eating into the potential gains of a full search.
All this comes at a steep price in compression speed though. At just 28 MB/s per core, ZhuffHC runs at only a fraction of the speed of its older brother.

An important point is that both versions, Zhuff and ZhuffHC, are fully compatible: this is in fact exactly the same format, only the match search differs.

As an interesting side effect, decoding speed is improved on compressed files produced by ZhuffHC (which Zhuff can decode too). This is because the more compressed version consists of fewer but longer matches. With fewer copy operations to handle, the decoder works 13% faster.
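
The effect can be illustrated with a toy decoder. The sketch below assumes a made-up stream of `(literals, match_len, distance)` tuples; this is not the actual Zhuff format, and `decode` is a hypothetical name. It only shows that the same data, described with fewer but longer matches, costs the decoder fewer copy operations.

```python
def decode(sequences):
    """Decode a list of (literals, match_len, distance) tuples.
    Returns (decoded_bytes, number_of_match_copies)."""
    out = bytearray()
    copies = 0
    for literals, match_len, distance in sequences:
        out += literals
        if match_len:
            start = len(out) - distance
            # Copy byte by byte so that overlapping matches work.
            for i in range(match_len):
                out.append(out[start + i])
            copies += 1
    return bytes(out), copies


# Same data, two different parsings (both hypothetical):
short_matches = [(b"abc", 3, 3), (b"", 3, 3), (b"", 3, 3)]
long_match = [(b"abc", 9, 3)]
```

Both parsings decode to the same twelve bytes, but the first needs three copy operations and the second only one. Each copy carries a fixed overhead (reading the sequence, computing the source position), so fewer sequences means less work per decoded byte.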




version       threads   Compression Ratio   Speed       Decoding
Zhuff 0.7     -t1       2.584               147 MB/s    278 MB/s
Zhuff 0.7     -t2       2.584               285 MB/s    550 MB/s
ZhuffHC 0.1   -t1       2.781               28.3 MB/s   312 MB/s
ZhuffHC 0.1   -t2       2.781               55.6 MB/s   618 MB/s


You can grab the new version here.