Welcome to the ‘Practical Compressor Test’. Unlike some other compressor comparison sites, I won’t be looking for the compressor that squeezes out the last bit of compression. Instead, I’ll try to find the most practical compressor out there. This means compression and decompression times are taken into account, so PAQAR and the like, which achieve very good compression at the expense of insanely long run times (several hours on this benchmark!), are not considered.
Instead, I’ll be focusing on well-known, established compressors that are easily obtained (I only use precompiled packages and won’t build from source) and have reasonable run times. Also, I won’t try every combination of compression options but limit the test to one general option (-1 to -9 for gzip and bzip2, -m1 to -m5 for RAR, …).
- 2007/04/25: Updated 7-zip to version 4.45: small improvements in speed and compression ratio
- 2006/06/08: Reran all tests on a Pentium D 830 (dual-core, 3 GHz, 2x 1MiB L2) and 2 GiB RAM
- 2006/02/13: Updated 7-zip to version 4.33: both faster compression (up to 20%) and better compression ratios (a few %)
- 2005/11/27: Updated 7-zip to version 4.30 beta: high compression (-mx=7 … -mx=9) is now up to 40% faster
- 2005/11/03: Updated 7-zip to version 4.29 beta: no changes
- 2005/04/12: Updated 7-zip to version 4.16: the ratio is the same, but compression and decompression are now 30% to 40% faster!
- 2004/11/10: Added the 7-zip compressor
- 2004/09/30: Added the lzop compression algorithm
Just like Johan De Bock’s excellent GIMP Source Compression Test, I’ll be using the GIMP 2.0 source tar file:
- The test file: GIMP 2.0.0 Sources as one TAR
- MD5-Hash: d2a1c33317fb57bbed3641671b2da163
- Total size: 78,745,600 Bytes
Each program is used to compress and decompress the file with each of the selected command line switches. The time required for both compression and decompression is measured, as is the size of the resulting archive. To have an idea of the accuracy of the timing measurements, the test is repeated three times and the minimum runtime (= the one with the least disturbances) is reported. All this is done on a Pentium D 830 (3.0 GHz) processor with 2 GiB RAM, running Fedora Core 5 Linux. Only one CPU is used for the tests (using -mmt=off for 7-zip, default for other compressors).
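The “minimum of three runs” idea can be sketched as a small timing helper. This is an illustrative sketch, not the script actually used for this benchmark; the gzip command in the comment is a hypothetical example:

```python
import time

def min_runtime(run, repeats=3):
    """Time run() several times and return the fastest wall-clock duration.
    The minimum is the run least disturbed by background activity."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        run()
        best = min(best, time.perf_counter() - start)
    return best

# Hypothetical usage with a real compressor (file name assumed):
# import subprocess
# min_runtime(lambda: subprocess.run(["gzip", "-9", "gimp-2.0.0.tar"], check=True))
```

Taking the minimum rather than the average is deliberate: external disturbances (disk cache misses, other processes) can only add time, never remove it.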
|Compressor|Version|Options|
|---|---|---|
|gzip|1.3.5|-1 … -9|
|bzip2|1.0.3|-1 … -9|
|RAR|3.60b2|-m1 … -m5|
|Zip|2.31|-1 … -9|
|7-zip|4.42|-mx=1 … -mx=9|
|lzop|1.02rc1|-1 … -9|
|Zoo|2.1|(none) and -h|
The versions aren’t always the latest and the greatest, but they are – again in the spirit of being ‘practical’ – the most recent ones installed by the Fedora Core 5 distribution.
Results – graphs
These graphs show the results for the various compressors and their switches. Compression and decompression time, respectively, are on the horizontal axis; compression efficiency (compressed size / original size, so smaller is better) is on the vertical axis.
- The best compressor can be found near the lower (efficient) left (fast) corner. You’re right: there isn’t any! All real compressors are either inefficient (gzip), slow (lzma/7-zip), or moderate on both counts (zip). This means your ideal compressor will depend on how you value speed against efficiency.
- gzip is consistently faster but less efficient than bzip2. RAR is better than bzip2 on all counts, so that’s probably why you have to pay for it 😉
- The simple switches used give you a decent choice between speed and efficiency; however, their effects are usually smaller than the differences between compression algorithms. gzip and zip use the same algorithm, so their compression ratios are similar, but the timing depends on the optimizations in the actual implementation.
- Among the free compressors, the sequence lzop – gzip – bzip2 – 7zip gives you a wide range of speed / ratio trade-offs. compress and zoo are obviously outdated: they are at the same time slower and less efficient than gzip.
As we have already seen, your ideal compressor depends on what you use it for, so this ranking can be customized to your use case. It computes the time required for one compression, a number of downloads over a network, and the decompression following each download:
total time = compression time + n * (compressed file size / network speed + decompression time)
For instance, if you compress a file to send it over a network once, n equals one and compression time will have a big influence. If you want to post a file to be downloaded many times, n is big so long compression times will weigh less in the final decision. Finally, slow networks will do best with a slow but efficient algorithm, while for fast networks a speedy, possibly less efficient algorithm is needed.
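The formula above can be sketched as a small helper. The numbers in the example are purely illustrative, not measured values from this benchmark:

```python
def total_time(comp_time_s, comp_size_bytes, decomp_time_s, n_downloads, net_bytes_per_s):
    """total = compression + n * (transfer + decompression), as in the formula above."""
    transfer_s = comp_size_bytes / net_bytes_per_s
    return comp_time_s + n_downloads * (transfer_s + decomp_time_s)

# Illustrative numbers: 10 s compression, a 20 MB archive, 2 s decompression,
# downloaded 100 times over a 1 MB/s link:
print(total_time(10, 20e6, 2, 100, 1e6))  # → 2210.0 seconds
```

Note how the single compression (10 s) is dwarfed by the 100 transfers and decompressions, exactly the large-n case described above.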
(Chart: total time broken down into compression time, transmission time, and decompression time.)
For any bandwidth / #downloads combination we can now determine the optimal compressor.
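Picking the optimum for a given bandwidth / #downloads point then reduces to taking the candidate with the smallest total time. The per-compressor figures below are made-up placeholders for illustration, not results from this test:

```python
# Placeholder figures (seconds / bytes), NOT measured results from this benchmark.
candidates = {
    "gzip -9":  {"comp": 12.0, "size": 21_000_000, "decomp": 1.5},
    "bzip2 -9": {"comp": 40.0, "size": 18_000_000, "decomp": 9.0},
}

def total(c, n_downloads, net_bytes_per_s):
    # total time = compression + n * (transfer + decompression)
    return c["comp"] + n_downloads * (c["size"] / net_bytes_per_s + c["decomp"])

# On a fast 2 MB/s link with 50 downloads, the faster-but-larger option wins here:
best = min(candidates, key=lambda name: total(candidates[name], 50, 2e6))
print(best)  # → gzip -9
```

With a much slower link, the transfer term dominates and the smaller archive would win instead, matching the slow-network observation above.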
- Ultimate Command Line Compressors, comparing almost all existing algorithms on efficiency, by Johan De Bock.
- Compression Links Info, a good all-round site about everything related to compression.
- The Wikipedia article on data compression
Results – table
|Algorithm|Effort|Compression time (s)|Decompression time (s)|Compression ratio|
|---|---|---|---|---|