Re: TO COMPRESS OR NOT TO CMPRS (please reply)
> The test file was the University of Calgary Text Compression Corpus
> [Calgary]. The length of the file prior to compression was 3,278,000
> bytes. When the entire file was compressed as a single payload, a
> compression ratio of 2.34 resulted.
>
> Datagram size (bytes) |    64   128   256   512  1024  2048  4096  8192 16384
> ----------------------|------------------------------------------------------
> Compression ratio     |  1.18  1.28  1.43  1.58  1.74  1.91  2.04  2.11  2.14
>
> [Calgary] Text Compression Corpus, University of Calgary, available at
> ftp://ftp.cpsc.ucalgary.ca/pub/projects/text.compression.corpus
I dug up this file, specifically the archive
text.compression.corpus.tar.Z
and tried a few quick experiments. As distributed, compressed with
"compress", the ratio was 2.4:1. When I decompressed the file and
recompressed it with gzip, the ratio improved to 3.05:1. Using the
strongest (most CPU-intensive) gzip level, 9, the ratio improved
slightly more, to 3.077:1. By default, the ssh compression option uses
gzip level 6.
Interesting that a university group interested in compression wouldn't
use the most popular and effective compression algorithm to distribute
their work! :-)
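For anyone who wants to repeat this kind of comparison without the
corpus at hand, here is a minimal Python sketch. It uses zlib (the
deflate algorithm that gzip is built on) rather than the gzip program
itself, and the sample_text() helper is a made-up stand-in for the
corpus, so the ratios it prints will not match the figures above.

```python
import random
import zlib

# Hypothetical stand-in for the corpus: pseudo-random "text" built from a
# small vocabulary. Real text compresses differently.
WORDS = [b"packet", b"stream", b"layer", b"transport", b"compress",
         b"ratio", b"datagram", b"payload", b"the", b"of", b"and",
         b"network", b"bytes", b"window", b"history", b"algorithm"]


def sample_text(n_bytes, seed=0):
    """Return n_bytes of deterministic pseudo-text."""
    rng = random.Random(seed)
    out = bytearray()
    while len(out) < n_bytes:
        out += rng.choice(WORDS) + b" "
    return bytes(out[:n_bytes])


def ratio(data, level):
    """Compression ratio (original size / compressed size) at a zlib level."""
    return len(data) / len(zlib.compress(data, level))


if __name__ == "__main__":
    data = sample_text(200_000)
    for level in (1, 6, 9):
        print(f"level {level}: {ratio(data, level):.2f}:1")
```

To try it on real data, replace sample_text() with the bytes of the
unpacked corpus.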
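I think this discussion shows that compression at the packet layer is
better than nothing, but the best performance is attained by using a
really good stream compression algorithm above the transport layer,
where the compressor can carry its history across packet boundaries
instead of starting fresh on every datagram.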
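A rough Python sketch of that comparison, under some loud assumptions:
zlib stands in for whatever compressor a real packet-layer scheme
would use, each datagram is compressed independently with no shared
history, and sample_text() is a hypothetical stand-in for the corpus.
The absolute ratios mean nothing; only the trend with datagram size
is of interest.

```python
import random
import zlib

# Hypothetical test data: pseudo-random "text" from a small vocabulary.
WORDS = [b"packet", b"stream", b"layer", b"transport", b"compress",
         b"ratio", b"datagram", b"payload", b"the", b"of", b"and",
         b"network", b"bytes", b"window", b"history", b"algorithm"]


def sample_text(n_bytes, seed=0):
    """Return n_bytes of deterministic pseudo-text."""
    rng = random.Random(seed)
    out = bytearray()
    while len(out) < n_bytes:
        out += rng.choice(WORDS) + b" "
    return bytes(out[:n_bytes])


def per_datagram_ratio(data, size, level=6):
    """Compress each size-byte chunk on its own, with no shared history."""
    compressed = 0
    for i in range(0, len(data), size):
        compressed += len(zlib.compress(data[i:i + size], level))
    return len(data) / compressed


def stream_ratio(data, level=6):
    """Compress the whole input as one stream."""
    return len(data) / len(zlib.compress(data, level))


if __name__ == "__main__":
    data = sample_text(200_000)
    for size in (64, 256, 1024, 4096, 16384):
        print(f"{size:6d}-byte datagrams: {per_datagram_ratio(data, size):.2f}:1")
    print(f"  single stream     : {stream_ratio(data):.2f}:1")
```

Small datagrams pay the fixed per-chunk overhead every time and never
build up a useful dictionary, which is the same pattern as in the
quoted table.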
Phil