
Re: MAC speeds

To: pkoning@xedia.com (Paul Koning)
Subject: Re: MAC speeds
From: fisher@hollywood.engr.sgi.com (William Fisher)
Date: Thu, 10 Mar 100 16:03:01 -0800 (PST)
Cc: Bart.Preneel@esat.kuleuven.ac.be, bmanning@ISI.EDU, djb@cr.yp.to, ipsec@lists.tislabs.com, fisher@hollywood.engr.sgi.com (William Fisher)
In-Reply-To: <200003091641.LAA20425@tonga.xedia.com> from "Paul Koning" at Mar 10, 0 11:41:03 am
Reply-To: fisher@sgi.com
Sender: owner-ipsec@lists.tislabs.com

> 
> >>>>> "Bart" == Bart Preneel <Bart.Preneel@esat.kuleuven.ac.be> writes:
> 
>  Bart> Yes it can be made faster.  See
>  Bart> http://www.esat.kuleuven.ac.be/~bosselae/fast.html
> 
>  Bart> --Bart
> 
>  Bart> On Thu, 9 Mar 2000, Bill Manning wrote:
> 
>  >> % % Tero Kivinen reports a time of 154000 cycles (308
>  >> microseconds) for % HMAC-MD5 on 2048 bytes on a 500MHz Alpha.  % %
>  >> That's slower than today's fast ciphers, notably the AES
>  >> candidates, % which are below 40 cycles/byte. Some people seem to
>  >> be making decisions % on this basis.  % % But there are faster
>  >> implementations of MD5. Furthermore, there are % much faster MACs.
>  >> ...  % ---Dan
>  >> 
>  >> I'm not sure that MD5 can be made too much faster.  see:
>  >> 
>  >> http://www.isi.edu/~touch/pubs/sigcomm95.html
> 
> Antoon's numbers are quite impressive.  On the other hand, the number
> quoted from Tero is very bad even for C code run through a poor
> compiler.  I make it to be about 4800 cycles per hash block.  I get
> about 1600 cycles per block on a MIPS processor, which is a single
> issue machine, for C code run through gcc.  Neither has Rotate as a
> primitive instruction, so they are at a bit of a disadvantage compared 
> to the x86, but it seems reasonable that you should be able to get
> down to about 1k cycles per block or so.
> 
> 	paul
> 
	In the gcc code generated for the MD5 case, were any of the
	MIPS IV instructions listed below emitted to partially overcome
	the lack of a generic rotate instruction on the MIPS processor?
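
	For context, a minimal sketch of the rotate that MD5 needs, written
	in portable C. The name rotl32 is my own, not taken from Paul's code.
	On a target without a rotate instruction I would expect a compiler to
	lower this to a shift left, a logical shift right (SRL rather than SRA,
	since sign fill would corrupt the result) and an OR, but that is an
	assumption about the generated code, not something I have checked.

```c
#include <stdint.h>

/* Rotate a 32-bit word left by n bits.
 * Hypothetical helper for illustration; n is reduced modulo 32 so that
 * neither shift count ever reaches 32 (which would be undefined in C). */
static inline uint32_t rotl32(uint32_t x, unsigned n)
{
    n &= 31;
    /* Bits shifted out on the left re-enter on the right.  The right
     * shift is on an unsigned type, so it is a logical (zero-fill) shift. */
    return (x << n) | (x >> ((32 - n) & 31));
}
```

	Each MD5 step performs one such rotate by a constant, so three
	instructions instead of one adds up over the 64 steps of a block.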

	If you send me a pointer to your code, I will compile it using
	SGI compilers on processors that support the MIPS III and MIPS IV
	instruction sets. I didn't find a pointer to the code on Antoon's
	web site containing the Pentium numbers.

	The R10K/R12K series are out-of-order execution processors
	with two arithmetic units, which might reduce the 1600 cycles
	per block count. You can get absolute clock counts on the
	R10K/R12K series by reading the appropriate hardware register,
	which avoids having to estimate the true amount of overlap.
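
	The per-block figures quoted above follow from simple division.
	A small sketch, assuming 64-byte MD5 blocks and ignoring the few
	extra compression calls that HMAC and padding add (as the quoted
	estimate appears to do). The function names are mine.

```c
/* Cycles spent per hash block, given a total cycle count for a message.
 * Assumes the message length is a whole number of blocks. */
static double cycles_per_block(double total_cycles, double message_bytes,
                               double block_bytes)
{
    return total_cycles / (message_bytes / block_bytes);
}

/* Convert a cycle count to microseconds for a clock given in MHz. */
static double cycles_to_microseconds(double cycles, double clock_mhz)
{
    return cycles / clock_mhz;
}
```

	With Tero's numbers, cycles_per_block(154000, 2048, 64) gives 4812.5,
	which is the "about 4800" above, and cycles_to_microseconds(154000, 500)
	gives the 308 microseconds quoted for the 500MHz Alpha.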

	From the MIPS IV Instruction Set Manual, Version 3.1, dated
	January 1995, pages A-140 through A-143 the supported
	instructions include:

	1) SRA => Shift Word Right Arithmetic	SRA	rd,rt,sa

		Arithmetic right shift a word by a fixed number of bits

		On 32-Bit Processors:

		The contents of the low-order 32-bit word of GPR 'rt' are
		shifted right, duplicating the sign bit (bit 31) in the emptied
		bits; the word result is placed in GPR 'rd'. The bit shift
		count is specified by 'sa'. If 'rd' is a 64-bit register,
		the result word is sign-extended.

		On 64-Bit Processors:

		On 64-bit processors, if GPR 'rt' does not contain a
		sign-extended 32-bit value (bits 63..31 equal), then the result
		of the operation is undefined.

	2) SRAV => Shift Word Right Arithmetic Variable	     SRAV   rd,rt,rs

		Arithmetic right shift a word by a variable number of bits
		Similar to SRA but where the right shift count is contained
		in the low-order 5 bits of the GPR 'rs'

	3) SRL => Shift Word Right Logical	SRL	rd,rt,sa
		Logical right shift a word by a fixed number of bits.

		On 32-Bit Processors:

		The contents of the low-order 32-bit word of GPR 'rt' are
		shifted right, inserting zeros into the emptied bits;
		the word result is placed in GPR 'rd'. The bit shift
		count is specified by 'sa'. If 'rd' is a 64-bit register,
		the result word is sign-extended.

		On 64-Bit Processors:

		On 64-bit processors, if GPR 'rt' does not contain a
		sign-extended 32-bit value (bits 63..31 equal), then the result
		of the operation is undefined.

	4) SRLV => Shift Word Right Logical Variable	SRLV	rd,rt,rs

		Logical right shift a word by a variable number of bits.
		Similar to SRL but where the right shift count is contained
		in the low-order 5 bits of the GPR 'rs'

	5) SWL => Store Word Left	SWL	rt,offset(base)

		Store the most-significant part of the word to an unaligned
		memory address.

		The 16-bit signed 'offset' is added to the contents of
		GPR 'base' to form an effective address (EffAddr). EffAddr
		is the address of the most-significant of four consecutive
		bytes forming a word in memory (W) starting at an arbitrary
		byte boundary. A part of W, the most-significant one to
		four bytes, is in the aligned word containing EffAddr. The same
		number of the most-significant (left) bytes from the word
		in GPR 'rt' are stored into these bytes of W.

		If GPR 'rt' is a 64-bit register, the source word is the low
		word of the register. The gory details in pictures can
		be found in the reference manual.
	
	6) SWR => Store Word Right	SWR	rt,offset(base)
	
		Store the least-significant part of the word to an unaligned
		memory address.

		Similar to SWL, except the least-significant (right) bytes
		from the word in GPR 'rt' are stored into these bytes in W.
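
	To make the SWL/SWR pair concrete, here is a rough C model of how
	the two instructions combine into one unaligned word store. It
	assumes big-endian byte order and models memory as a plain byte
	array; the function names are hypothetical, and the manual's
	pictures remain the authority on the exact byte lanes.

```c
#include <stddef.h>
#include <stdint.h>

/* Model of SWL (big-endian assumed): store the most-significant bytes of
 * rt from addr up to the end of the aligned word that contains addr. */
static void swl_be(uint8_t *mem, size_t addr, uint32_t rt)
{
    size_t n = 4 - (addr & 3);              /* bytes left in aligned word */
    for (size_t i = 0; i < n; i++)
        mem[addr + i] = (uint8_t)(rt >> (24 - 8 * i));
}

/* Model of SWR (big-endian assumed): store the least-significant bytes of
 * rt from the start of the aligned word that contains addr up to addr. */
static void swr_be(uint8_t *mem, size_t addr, uint32_t rt)
{
    size_t n = (addr & 3) + 1;              /* bytes up to and incl. addr */
    for (size_t i = 0; i < n; i++)
        mem[addr - i] = (uint8_t)(rt >> (8 * i));
}

/* An unaligned store is then SWL at addr followed by SWR at addr + 3. */
static void store_word_unaligned_be(uint8_t *mem, size_t addr, uint32_t rt)
{
    swl_be(mem, addr, rt);
    swr_be(mem, addr + 3, rt);
}
```

	Storing 0x11223344 at byte address 5 this way writes 11 22 33 via
	the SWL half and 44 via the SWR half, leaving the neighbouring
	bytes untouched.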

-- Bill

References:

Re: MAC speeds

From: Paul Koning <pkoning@xedia.com>
