
Re: TO COMPRESS OR NOT TO CMPRS (please reply)



>   The test file was the University of Calgary Text Compression Corpus
>   [Calgary].  The length of the file prior to compression was 3,278,000
>   bytes.  When the entire file was compressed as a single payload, a
>   compression ratio of 2.34 resulted.
>
>    Datagram size,|  64   128   256   512  1024  2048  4096  8192 16384 
>    bytes         |
>    --------------|----------------------------------------------------
>    Compression   |1.18  1.28  1.43  1.58  1.74  1.91  2.04  2.11  2.14
>    ratio         |
>
>   [Calgary] Text Compression Corpus, University of Calgary, available at
>         ftp://ftp.cpsc.ucalgary.ca/pub/projects/text.compression.corpus

I dug up this file, specifically the archive

text.compression.corpus.tar.Z

and tried a few quick experiments. As distributed (compressed with
"compress"), the ratio was 2.4:1.  When I decompressed the file and
recompressed it with gzip, the ratio improved to 3.05:1.  Using the
strongest (most CPU-intensive) gzip level, 9, improved it slightly
further, to 3.077:1. By default, the ssh compression option uses gzip
level 6.
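
For anyone who wants to repeat this kind of measurement, here is a
rough sketch in Python. It is not the procedure used above: it calls
the zlib module (the same DEFLATE algorithm gzip uses, but with a
different header), so the numbers will be close to, but not identical
to, what the gzip command reports. The function name
compression_ratio and the choice of levels to print are my own
assumptions for illustration.

```python
import zlib


def compression_ratio(data: bytes, level: int = 6) -> float:
    """Return original size / compressed size (3.05 means 3.05:1)."""
    if not data:
        raise ValueError("data must be non-empty")
    # zlib levels run from 1 (fastest) to 9 (strongest); 6 is the usual default.
    compressed = zlib.compress(data, level)
    return len(data) / len(compressed)


if __name__ == "__main__":
    import sys

    # Usage: python ratio.py <uncompressed-file>
    with open(sys.argv[1], "rb") as f:
        data = f.read()
    for level in (1, 6, 9):
        print(f"level {level}: {compression_ratio(data, level):.3f}:1")
```

Run it against the uncompressed tar file to compare levels on the
same input.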

Interesting that a university group interested in compression wouldn't
use the most popular and effective compression algorithm to distribute
their work! :-)

I think this discussion shows that compression at the packet layer is
better than nothing, but the best performance is attained by using a
really good stream compression algorithm above the transport layer.
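
To make the packet-versus-stream comparison concrete, here is a
minimal sketch, again in Python with zlib. It will not reproduce the
quoted table, since I am substituting zlib for whatever compressor
produced those figures. The function names and the flush-per-chunk
policy are assumptions chosen for illustration. The first function
compresses every datagram on its own, with no history carried over;
the second pushes the same chunks through a single compressor and
does a sync flush after each one, so each chunk could still be sent
immediately.

```python
import zlib


def per_datagram_ratio(data: bytes, size: int, level: int = 6) -> float:
    """Compress each size-byte chunk independently (no shared history)."""
    total = 0
    for i in range(0, len(data), size):
        total += len(zlib.compress(data[i:i + size], level))
    return len(data) / total


def stream_ratio(data: bytes, size: int, level: int = 6) -> float:
    """Compress the same chunks through one compressor, flushing per chunk."""
    comp = zlib.compressobj(level)
    total = 0
    for i in range(0, len(data), size):
        total += len(comp.compress(data[i:i + size]))
        # Z_SYNC_FLUSH emits all pending output but keeps the history window.
        total += len(comp.flush(zlib.Z_SYNC_FLUSH))
    total += len(comp.flush())  # finish the stream
    return len(data) / total


if __name__ == "__main__":
    import sys

    # Usage: python chunks.py <uncompressed-file>
    with open(sys.argv[1], "rb") as f:
        data = f.read()
    for size in (64, 256, 1024, 4096, 16384):
        print(f"{size:6d} bytes: per-datagram {per_datagram_ratio(data, size):.2f}"
              f"  stream {stream_ratio(data, size):.2f}")
```

The stream version can do better on small chunks because the
compressor can refer back to data from earlier chunks, which the
per-datagram version throws away each time.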

Phil



