>I'd like to insert a few words of caution about timings of crypto routines.
>The speed of eating an apple, orange, or banana differs depending on
>whether or not you start with peeled fruit, partially digested, or juiced.
>Similarly, the following factors must be normalized when timing software:
Glad you brought them up.
> 1. Blocksize and number of blocks. Working with the same
> short piece of data several hundred thousand times
> can be misleading due to data cache effects.
> Blocks that are too large can cause swapping and TLB miss
> rates that might cause overly pessimistic timings.
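One way to act on point 1 is to hold the total amount of data constant and vary only the buffer size, so the working set is the only thing that changes between runs. The following is a minimal C sketch under that assumption. `transform_block()` is a hypothetical stand-in for the cipher (a toy byte mixer, not DES), and ISO C `clock()` measures processor time rather than wall-clock time:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Volatile sink so the compiler cannot discard the work being timed. */
static volatile uint8_t bench_sink;

/* Hypothetical stand-in for the routine under test: it touches every
   byte of one 8-byte block. Replace with the real cipher call. */
static void transform_block(uint8_t block[8])
{
    for (int i = 0; i < 8; i++)
        block[i] = (uint8_t)(block[i] * 31u + 7u);
}

/* Sweep a buffer of buf_bytes repeatedly until about total_bytes have
   been processed (whole passes only). Keeping total_bytes fixed while
   varying buf_bytes isolates working-set effects (cache, TLB, paging).
   Returns elapsed CPU seconds, or -1.0 if the buffer can't be allocated. */
static double time_sweep(size_t buf_bytes, size_t total_bytes)
{
    assert(buf_bytes >= 8 && buf_bytes % 8 == 0);
    uint8_t *buf = calloc(buf_bytes, 1);
    if (buf == NULL)
        return -1.0;

    size_t passes = total_bytes / buf_bytes;
    clock_t start = clock();
    for (size_t p = 0; p < passes; p++)
        for (size_t off = 0; off < buf_bytes; off += 8)
            transform_block(buf + off);
    clock_t end = clock();

    /* Fold the whole buffer into the sink so no store is dead. */
    uint8_t acc = 0;
    for (size_t i = 0; i < buf_bytes; i++)
        acc ^= buf[i];
    bench_sink = acc;

    free(buf);
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling `time_sweep()` with buffer sizes of, say, 4 KB, 256 KB and 16 MB and the same `total_bytes` (a multiple of each size), then comparing `total_bytes / seconds` across runs, should show where cache or TLB effects set in. The sizes are arbitrary examples.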
If data memory speed is significant, then you have a *really* good
encryption algorithm. My DES code is about as tight as I can make it,
but the nearly proportional speedup in going from a clock-doubled to a
clock-tripled 486 chip shows that it's still limited by the internal
CPU speed and not by the relatively slow memory bus bandwidth. (Some
of the other CPU-intensive benchmarks I ran at the same time *are*
memory bus limited. E.g. my Viterbi decoder, which improved only 15%
because of its heavy memory write traffic occasioned by the 486's
> 2. Data dependencies. Some algorithms have different data
> usage patterns depending on the input. Encrypting a
> block of all 0's, for example, obscures this effect.
This is unlikely to be a problem with most DES implementations given
the scrambling effect of the 16 rounds. Even if you repeatedly encrypt
the same data, I suspect that the entire 2K SP table quickly lands in
the on-chip cache. Nevertheless, I run my tests in OFB mode just to be
sure. Easy enough to do.
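Output feedback mode is what makes an all-zeros test buffer harmless. The block function's input is its own previous output, so even a constant plaintext does not present the same block to the cipher over and over (barring a short keystream cycle). Here is a minimal C sketch. `toy_block_encrypt()` is a hypothetical 64-bit mixing function standing in for DES and is not itself a cipher:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy 64-bit block function standing in for DES.
   It is NOT a cipher; it only mixes bits so the sketch runs. */
static uint64_t toy_block_encrypt(uint64_t key, uint64_t block)
{
    uint64_t x = block ^ key;
    for (int r = 0; r < 4; r++) {
        x *= 0x9E3779B97F4A7C15ULL;
        x ^= x >> 29;
        x += key;
    }
    return x;
}

/* OFB mode: the block function is applied to the previous keystream
   block, never to the data, and each keystream block is XORed into
   the buffer. The same call encrypts and decrypts. *iv is updated so
   a later call continues the same keystream. */
static void ofb_crypt(uint64_t key, uint64_t *iv, uint64_t *data, size_t nblocks)
{
    assert(iv != NULL && (data != NULL || nblocks == 0));
    uint64_t feedback = *iv;
    for (size_t i = 0; i < nblocks; i++) {
        feedback = toy_block_encrypt(key, feedback);
        data[i] ^= feedback;
    }
    *iv = feedback;
}
```

Because the same routine decrypts, running it twice with the same key and starting IV restores the original buffer. That also gives a timing run a cheap correctness check.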
> 3. Endianicity. For protocols, the time to rearrange the
> data to/from network byte order should be considered.
> This transformation is sometimes embedded into the algorithm
True, but this is usually taken into account when you specify the CPU
you're running on.
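If byte-order conversion is part of the protocol cost, the timed loop should include it. Below is a minimal C sketch of the portable, shift-based conversion between an 8-byte block in network (big-endian) order and the two 32-bit halves DES works on. The function names are illustrative only and do not come from any particular library:

```c
#include <stdint.h>

/* Read a 32-bit big-endian (network order) word from 4 bytes.
   Written with shifts, so the result is the same on any host; on a
   little-endian CPU a compiler may or may not turn it into a single
   byte-swap instruction. */
static uint32_t load_be32(const uint8_t p[4])
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Write a 32-bit word as 4 bytes in big-endian order. */
static void store_be32(uint8_t p[4], uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

/* Split an 8-byte block into left and right 32-bit halves... */
static void block_to_halves(const uint8_t in[8], uint32_t *left, uint32_t *right)
{
    *left = load_be32(in);
    *right = load_be32(in + 4);
}

/* ...and join them back into 8 bytes in network order. */
static void halves_to_block(uint8_t out[8], uint32_t left, uint32_t right)
{
    store_be32(out, left);
    store_be32(out + 4, right);
}
```

On a big-endian host a compiler can often reduce the same source to plain loads and stores, which is one reason the conversion cost depends on the CPU.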
> 4. The compiler and switches. Try all the compilers that are
> available for the machine, and try all the optimization levels.
> Make sure the routine still gets correct results, choose the
> fastest result.
True -- for C code. My code is hand-optimized assembler, so I doubt the
compiler will make much of a difference...
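For point 4, here is a minimal sketch of a self-check that each build (each compiler and optimization level) could run before its timings are trusted. It compares the routine under test against a plain reference on the same pseudo-random inputs in every build. The byte-swap pair and all names here are hypothetical examples chosen for brevity:

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t (*word_fn)(uint32_t);

/* Reference: straightforward and easy to check by eye. */
static uint32_t bswap_ref(uint32_t x)
{
    return (x >> 24) | ((x >> 8) & 0x0000FF00u) |
           ((x << 8) & 0x00FF0000u) | (x << 24);
}

/* Candidate "optimized" form: swap the bytes within each 16-bit
   half, then rotate the word by 16 bits. */
static uint32_t bswap_fast(uint32_t x)
{
    x = ((x << 8) & 0xFF00FF00u) | ((x >> 8) & 0x00FF00FFu);
    return (x << 16) | (x >> 16);
}

/* Compare candidate against reference on n inputs from a xorshift32
   generator with a fixed seed, so every build sees the same inputs.
   Returns the number of mismatches; 0 means the build may be timed. */
static size_t count_mismatches(word_fn candidate, word_fn reference, size_t n)
{
    uint32_t s = 2463534242u; /* any nonzero seed works */
    size_t bad = 0;
    for (size_t i = 0; i < n; i++) {
        s ^= s << 13;
        s ^= s >> 17;
        s ^= s << 5;
        if (candidate(s) != reference(s))
            bad++;
    }
    return bad;
}
```

A build whose `count_mismatches()` result is nonzero is discarded, however fast it is.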
- From: Hilarie Orman <firstname.lastname@example.org>