Re: MAC speeds
>
> >>>>> "Bart" == Bart Preneel <Bart.Preneel@esat.kuleuven.ac.be> writes:
>
> Bart> Yes it can be made faster. See
> Bart> http://www.esat.kuleuven.ac.be/~bosselae/fast.html
>
> Bart> --Bart
>
> Bart> On Thu, 9 Mar 2000, Bill Manning wrote:
>
> >> % % Tero Kivinen reports a time of 154000 cycles (308
> >> microseconds) for % HMAC-MD5 on 2048 bytes on a 500MHz Alpha. % %
> >> That's slower than today's fast ciphers, notably the AES
> >> candidates, % which are below 40 cycles/byte. Some people seem to
> >> be making decisions % on this basis. % % But there are faster
> >> implementations of MD5. Furthermore, there are % much faster MACs.
> >> ... % ---Dan
> >>
> >> I'm not sure that MD5 can be made too much faster. see:
> >>
> >> http://www.isi.edu/~touch/pubs/sigcomm95.html
>
> Antoon's numbers are quite impressive. On the other hand, the number
> quoted from Tero is very bad even for C code run through a poor
> compiler. I make it to be about 4800 cycles per hash block. I get
> about 1600 cycles per block on a MIPS processor, which is a single
> issue machine, for C code run through gcc. Neither has Rotate as a
> primitive instruction, so they are at a bit of a disadvantage compared
> to the x86, but it seems reasonable that you should be able to get
> down to about 1k cycles per block or so.
>
> paul
>
Regarding the gcc-generated code for MD5: were any of the MIPS IV
instructions listed below emitted to partially make up for the
lack of a generic rotate instruction on the MIPS processor?
If you send me a pointer to your code, I will compile it with the
SGI compilers on processors that support the MIPS III and MIPS IV
instruction sets. I didn't find a pointer to the code on Antoon's
web site containing the Pentium numbers.
The R10K/R12K series are out-of-order execution processors
with two arithmetic units, which might reduce the
1600 cycles per block count. You can get absolute clock counts
on the R10K/R12K series by reading the appropriate hardware
register, which avoids having to estimate the true amount of overlap.
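
As a cross-check on the cycle counts quoted earlier in the thread, here is a
minimal sketch in C of the arithmetic. The function names are hypothetical and
chosen only for illustration, and the per-block figure counts just the 64-byte
data blocks of the message, ignoring the extra blocks that HMAC and MD5
padding add.

```c
/* Cycles spent per byte of message. */
static double cycles_per_byte(double total_cycles, double message_bytes)
{
    return total_cycles / message_bytes;
}

/* Cycles spent per hash block, counting only whole data blocks
 * (MD5 uses 64-byte blocks). */
static double cycles_per_block(double total_cycles, double message_bytes,
                               double block_bytes)
{
    return total_cycles / (message_bytes / block_bytes);
}

/* Elapsed time in microseconds for a given cycle count and clock rate. */
static double elapsed_us(double total_cycles, double clock_hz)
{
    return total_cycles / clock_hz * 1e6;
}
```

With the quoted figures (154000 cycles, 2048 bytes, 500 MHz) this gives roughly
75 cycles per byte, about 4800 cycles per block, and 308 microseconds.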
From the MIPS IV Instruction Set Manual, Version 3.1, dated
January 1995, pages A-140 through A-143, the supported
instructions include:
1) SRA => Shift Word Right Arithmetic SRA rd,rt,sa
Arithmetic right shift a word by a fixed number of bits.
On 32-Bit Processors:
The contents of the low-order 32-bit word of GPR 'rt' are
shifted right, duplicating the sign bit (bit 31) in the emptied
bits. The word result is placed in GPR 'rd'. The bit shift
count is specified by 'sa'. If 'rd' is a 64-bit register,
the result word is sign-extended.
On 64-Bit Processors:
If GPR 'rt' does not contain a
sign-extended 32-bit value (bits 63..31 equal), then the result
of the operation is undefined.
2) SRAV => Shift Word Right Arithmetic Variable SRAV rd,rt,rs
Arithmetic right shift a word by a variable number of bits.
Similar to SRA, but the right shift count is contained
in the low-order 5 bits of GPR 'rs'.
3) SRL => Shift Word Right Logical SRL rd,rt,sa
Logical right shift a word by a fixed number of bits.
On 32-Bit Processors:
The contents of the low-order 32-bit word of GPR 'rt' are
shifted right, inserting zeros into the emptied bits.
The word result is placed in GPR 'rd'. The bit shift
count is specified by 'sa'. If 'rd' is a 64-bit register,
the result word is sign-extended.
On 64-Bit Processors:
If GPR 'rt' does not contain a
sign-extended 32-bit value (bits 63..31 equal), then the result
of the operation is undefined.
4) SRLV => Shift Word Right Logical Variable SRLV rd,rt,rs
Logical right shift a word by a variable number of bits.
Similar to SRL, but the right shift count is contained
in the low-order 5 bits of GPR 'rs'.
5) SWL => Store Word Left SWL rt,offset(base)
Store the most-significant part of a word to an unaligned
memory address.
The 16-bit signed 'offset' is added to the contents of
GPR 'base' to form an effective address (EffAddr). EffAddr
is the address of the most-significant of four consecutive
bytes forming a word in memory (W) starting at an arbitrary
byte boundary. A part of W, the most-significant one to
four bytes, is in the aligned word containing EffAddr. The same
number of the most-significant (left) bytes from the word
in GPR 'rt' are stored into these bytes of W.
If GPR 'rt' is a 64-bit register, the source word is the low
word of the register. The gory details in pictures can
be found in the reference manual.
6) SWR => Store Word Right SWR rt,offset(base)
Store the least-significant part of a word to an unaligned
memory address.
Similar to SWL, except that the least-significant (right) bytes
from the word in GPR 'rt' are stored into these bytes of W.
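
For reference, the rotate that MD5 needs can be synthesized from a logical
right shift (such as SRL above), a left shift, and an OR. The following is a
minimal sketch in C; the helper name `rotl32` is hypothetical and is not taken
from any of the MD5 implementations discussed in this thread. On a target
without a rotate instruction, a compiler has to emit something along these
lines for each rotate.

```c
#include <stdint.h>

/* Hypothetical helper: rotate a 32-bit word left by n bits (n taken mod 32).
 * Built from one left shift, one logical right shift and an OR, which is
 * roughly what must be emitted on a CPU with no rotate instruction. */
static uint32_t rotl32(uint32_t x, unsigned n)
{
    n &= 31;                                  /* keep the count in 0..31 */
    return (x << n) | (x >> ((32 - n) & 31)); /* n == 0 gives x | x == x */
}
```

When the rotate count is a compile-time constant, as it is in every MD5 step,
both shift counts are immediates, so the rotate costs three simple
instructions instead of one.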
-- Bill