Re: Timings

>I'd like to insert a few words of caution about timings of crypto routines.
>The speed of eating an apple, orange, or banana differs depending on
>whether or not you start with peeled fruit, partially digested, or juiced.
>Similarly, the following factors must be normalized when timing software:

Glad you brought them up.

>	1. Blocksize and number of blocks.  Working with the same
>	   short piece of data several hundred thousand times 
>	   can be misleading due to data cache effects.
>	   Blocks that are too large can cause swapping and TLB miss
>	   rates that might cause overly pessimistic timings.

If data memory speed is significant, then you have a *really* good
encryption algorithm. My DES code is about as tight as I can make it,
but the nearly proportional speedup in going from a clock-doubled to a
clock-tripled 486 chip shows that it's still limited by the internal
CPU speed and not by the relatively slow memory bus bandwidth.  (Some
of the other CPU-intensive benchmarks I ran at the same time *are*
memory bus limited, e.g. my Viterbi decoder, which improved only 15%
because of the heavy memory write traffic caused by the 486's
write-through cache.)
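
A rough way to read those two numbers is sketched below. It assumes a
simple two-part (Amdahl-style) model in which a fraction f of the run
time scales with the internal clock and the rest is fixed by the memory
bus, and it assumes both chips sit on the same bus clock, so the internal
clock ratio is 3/2. The function name core_bound_fraction is made up for
illustration; nothing here is measured.

```c
/*
 * Assumed model (not a measurement):
 *   T_new / T_old = (1 - f) + f / clock_ratio = 1 / speedup
 * Solving for f, the fraction of run time bound by the internal clock:
 *   f = (1 - 1/speedup) / (1 - 1/clock_ratio)
 */
double core_bound_fraction(double speedup, double clock_ratio)
{
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / clock_ratio);
}
```

Under those assumptions, a 15% improvement (speedup 1.15 at clock ratio
1.5) works out to f of roughly 0.39, while a fully proportional 1.5x
speedup gives f = 1.0.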

>	2. Data dependencies.  Some algorithms have different data
>	   usage patterns depending on the input.  Encrypting a
>	   block of all 0's, for example, obscures this effect.

This is unlikely to be a problem with most DES implementations given
the scrambling effect of the 16 rounds. Even if you repeatedly encrypt
the same data, I suspect that the entire 2K SP table quickly lands in
the on-chip cache. Nevertheless, I run my tests in OFB mode just to be
sure.  Easy enough to do.
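
For anyone who wants to do the same, here is a minimal sketch of an OFB
loop for a 64-bit block cipher. It is not my actual test code. The
block_encrypt_fn signature and the toy_encrypt stand-in are hypothetical;
toy_encrypt is NOT a cipher and is there only so the sketch compiles. A
real timing run would pass in the DES block routine instead.

```c
#include <stddef.h>
#include <stdint.h>

#define BLOCK_BYTES 8   /* DES block size in bytes */

/* Hypothetical interface: encrypt one 8-byte block in place. */
typedef void (*block_encrypt_fn)(uint8_t block[BLOCK_BYTES], const void *key);

/*
 * OFB mode: the cipher repeatedly encrypts its own previous output
 * (seeded with the IV in `feedback`), and that keystream is XORed into
 * the data. The cipher input therefore changes on every call, even if
 * the data buffer is all zeros. Running it again from the same IV undoes it.
 */
void ofb_crypt(block_encrypt_fn encrypt, const void *key,
               uint8_t feedback[BLOCK_BYTES], uint8_t *data, size_t len)
{
    size_t i, j;

    for (i = 0; i < len; i += BLOCK_BYTES) {
        size_t n = (len - i < BLOCK_BYTES) ? len - i : BLOCK_BYTES;

        encrypt(feedback, key);           /* next keystream block */
        for (j = 0; j < n; j++)
            data[i + j] ^= feedback[j];
    }
}

/* Placeholder only: a trivial byte scramble standing in for DES. */
void toy_encrypt(uint8_t block[BLOCK_BYTES], const void *key)
{
    size_t j;

    (void)key;                            /* unused in the stand-in */
    for (j = 0; j < BLOCK_BYTES; j++)
        block[j] = (uint8_t)(block[j] * 5u + 17u + j);
}
```

Since the XOR is outside the cipher, the time per block is still
dominated by the single encrypt call, so the figures stay comparable to
a plain block-at-a-time loop.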

>	3. Endianicity.  For protocols, the time to rearrange the
>	   data to/from network byte order should be considered.
>	   This transformation is sometimes embedded into the algorithm
>	   details.

True, but this is usually taken into account when you specify the CPU
you're running on.
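
To make the cost concrete, here is a sketch of the conversion in
question, assuming the cipher works on 32-bit words in big-endian
(network) order. The helper names load_be32 and store_be32 are made up
for illustration. On a big-endian CPU these reduce to plain word loads
and stores; on a little-endian one like the 486 they are real work per
block, which is why the CPU named in a benchmark already implies whether
that cost is in the figure.

```c
#include <stdint.h>

/* Read a 32-bit big-endian word from a byte buffer, on any host. */
uint32_t load_be32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Write a 32-bit word to a byte buffer in big-endian order. */
void store_be32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}
```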

>	4. The compiler and switches.  Try all the compilers that are
>	   available for the machine, and try all the optimization levels.
>	   Make sure the routine still gets correct results, choose the
>	   fastest result.

True -- for C code. My code is hand-optimized assembler, so I doubt the
compiler will make much of a difference...