Thursday, March 3, 2011

LZ4 : confirmed as world's fastest compressor

 After Stephan's SqueezeChart statement on LZ4 being the fastest compressor seen on his public benchmark, LZ4 was once again put to the test, this time by Sami Runsas 's Compression Ratings.

This benchmark is very interesting, since its policy is very clear, its benchmark corpus is intentionally heterogeneous and carefully selected (so much that i use it regularly for calibration), and its emphasis on speed and precision makes it a reliable comparison tool, especially for fast compressors.

It is also heavily multi-thread oriented. Therefore, in order to sustain the comparison, LZ4 needed to be upgraded with multi-threading capabilities, since it would stand no chance when running on a single core against the best-of-breed multi-threaded. That's where enters version 0.9....

And it fares pretty well on its first try.
LZ4 gets top position on both compression and decompression speed, while also providing better ratio than previous king of the hill, the excellent 4x4 from Bulat Ziganshin.
And the difference is pretty large : +50% for compression speed, +25% for decompression speed using less than a full thread to max out, and +9% compression ratio.

As a complement, it's interesting to observe that the decoding speed of LZ4 does not vary by increasing the number of threads. That's because decoding speed is so fast that half a CPU usage is enough to exhaust RAMdisk steam.

With such a speed available, compressing/decompressing data costs about the same time as just copying it ... from a main memory RAM Drive. It opens the way to almost free compression for, well, any usage, even on-the-fly compression of temporary data for instance.

LZ4 is available on its homepage.

Tuesday, March 1, 2011

Multi-threading compression available : LZ4 v0.9

 Previous versions of LZ4 were displaying promising results using multi-threading, but only in benchmark mode. Now this feature is made fully accessible, with real "file i/o" interface.

You will need a very fast disk drive to experiment with it. In effect, only a RAM Drive can expect to keep fast enough steam to feed LZ4, especially when using multi-threading modes.

Block subdivision method was selected, due to its design simplicity and scalability. Compared with previous versions, even single thread mode greatly benefits, since multi-threading requires asynchronous data loading, which means that reading and writing to the disk is done in parallel with compression. 

I can't really test beyond 2 threads, since i only have dual-core systems within reach, but the code should prove pretty scalable with quad-core systems and more (up to 32 threads).

You can download and test this new version on the LZ4 homepage.

Sunday, February 27, 2011

LZ4 on steroids : version 0.8



Back to playground. It's always nice to work with LZ4, since its layout is very easy to test new ideas.
In this case, i wanted to check the principle of selective sampling, but this time purely in order to improve speed. This is a bit different from Zhuff, which used this idea to improve compression.

As stated in an earlier post, increasing speed is achieved by decreasing the number of samples. However, since LZ4 is a pure LZ77 compressor (using offset to identify repeated patterns), it also means that reducing samples can only decrease compression. Therefore, this is a matter of trade-off.

The end result achieved with LZ4 version 0.8 seems really interesting. On average, we end up with a 0,05% compression loss, and achieve a more than 10% speed boost. This is almost an unfair trade-off.

Don't be fooled by this "10%" average value, since it hides some very vast differences. Effectively, some files will gain a lot, while others will not get any benefit. For instance, an almost incompressible file can get a speed improvement of more than 500%, while another file with no possibility to "discard" elements, such as enwik8, will not see any performance improvement. But this is still an interesting bet "on average".

While at it, i also modified the multi-threading code branch. The new behavior is now much closer to an I/O algorithm, which makes it more representative of real usage. It also results in a slightly faster operation.
And finally, there is a new checksum algorithm, which ensures compression results are correct.

The resulting binary is rightly available here, and can be tested on your system.


versionCompression RatioSpeed Decoding
LZ4 "Ultra Fast"0.72.062232 MB/s805 MB/s
LZ4 "Ultra Fast"0.82.061260 MB/s810 MB/s
LZ4 "Ultra Fast" - 2 threads0.72.062430 MB/s1510 MB/s
LZ4 "Ultra Fast" - 2 threads0.82.061500 MB/s1510 MB/s