[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

MACs based on hash




Some comments on the proposals for MAC constructions:

> *  The "prepend key and length" scheme, PKL(msg) = Hash(key,length,msg)
> *  The "envelope" scheme, ENV(msg) = Hash(key,pad,msg,key)
> *  The HMAC scheme, HMAC(msg)  = Hash(key,pad1, Hash(key,pad2,msg))
> *  The NMAC scheme, NMAC(msg) =  Hash_key1 (Hash_key2(msg))

For the first two of these schemes, we know that there is a reduction
to the pseudo-randomness of the underlying compression function (PKL ENV);
For the NMAC:  "we know that if the hash function is collision-free 
(for a secret IV) and the compression function is a good MAC on 512-bit 
messages then NMAC is a good MAC".
For HMAC, we need an additional mixing property of the compression function.

You forgot to mention MDx-MAC (see the Crypto'95 paper of Paul van Oorschot
and myself), for which there exist a security proof based on the 
pseudo-randomness of the underlying compression function which is keyed
in both the IV and the message input. 


All these reductions are very nice from a theoretical standpoint 
(really a good piece of work). However, as a practical cryptanalyst, 
I have the following remarks:

1. very little cryptanalysis has been done on MD4 and MD5; nevertheless,
   some quite good results have been obtained already (from the 
   cryptanalyst's point of view).

2. no work at all has been done on properties of MD5 when part of the input 
   is a secret key.

3. it is even not obvious at all where to put this key: which part of 
   the input is the "natural" place to put it? 
   (in my humble opinion, it should be put in every part of the input,
    but this is based on my intuition). 

4. for all of these there exists a forgery attack which is unrealistic if 
   the underlying function (with the same parameters) is really strong; 
   for the first two schemes, this (unrealistic) attack allows for key 
   recovery rather than forgery.  However, no one has yet analyzed how much 
   this can be improved if MD5 is used as underlying function.

In view of all of this, for all of you who want to use any of these 
constructions based on MD5, I recommend to be very careful and take the 
most conservative one you can afford. 

I wonder how many of you would be prepared to use any of these schemes
if the US government had proposed them... (sorry, couldn't resist).

The good news is that you can make many mistakes, since there are 
only very few cryptanalysts, and their time is very scarce (there are 
many algorithms). That means it can take many years before they find the 
time to try to break any of these schemes. 
It's up to you to decide whether you can ignore their warnings.

Good luck.   

Bart Preneel


PS Below some more details (for those of you who want to know):

---------------------------------------------------------------------------

1.  Up till now very little cryptanalysis has been performed on MD4 and MD5 
(compared to for example the amount of work on DES). I believe the total 
effort is less than 1 personyear.
MD4 was badly broken (research effort of about 5 manmonths; collisions in 
1 second on a PC), and it can be anticipated based on this that collisions 
will be found for MD5 in the next 12 to 24 months.  The main issue is just
to find a few competent people who want to spend a sufficient amount of time.

2. Up till now NO work has been done on the evaluation of the compression 
function of MD5 as a pseudo-random function. 

Some people claim that the fact that the compression function of MD5 is 
a MAC or a pseudorandom function is a design criterion for MD5. 
In my opinion it is at best a by-product of the design, and a by-product 
which _never_ has been checked for.  The reason for this is that usually 
MD5 is not used with secret input, but only with known/chosen inputs.
The only property which has been evaluated and which approximates this
is preimage resistance: if you are given the complete output, find a preimage.

What is required for a MAC are properties of the compression function for 
which _part_ of the input is fixed and secret; one then is allowed to
look at many outputs for chosen values of the remaining inputs.
As far as I know, there is only one class of concrete primitives which
have been checked for such a property, namely  ...block ciphers.
(unfortunately they are slower and cannot be exported).
The properties required are totally different from collision resistance 
or one-wayness: for example, the existence of probabilistic linear 
relations between part of the input (where you have put the key) and 
the output could lead to linear cryptanalysis of MD5; such an attack would
not change anything to the security of the hash function as a collision
resistant or (second) preimage resistant function.

3. it is even not obvious at all where to put this key: which part of 
   the input is the "natural" place to put it? 

If one uses MD5 as a MAC or as a pseudorandom function, you have to insert
the key somewhere. In my humble opinion, there is no obvious place to do this;
if you look at MD5 as a block cipher, the message input would be the key, 
and the IV would be the plaintext. 
NMAC enters the key in the IV, and the [BCK] paper contains
the following claim: "As it turns out, the latter approach
has some significant analytical advantages".

Moreover, if you one would design a hash function according to the
Merkle-Damgaard principle, i.e., derive the collision resistance of the 
hash function from that of the compression function (such as Snefru of Merkle), 
there would be no distinction between the IV and the message input. So it is 
quite confusing to me that putting the key in the IV input can have 
analytical advantages.

The fact that there is no obvious place to put the key, is the reason why
in MDx-MAC the key is put both in the IV and in the message input.
The idea behind this is that since the compression function is not designed
to hide any of its inputs in the way which a pseudo-random function should,
the key should be involved in _every_ input.
I agree that this is not a security proof, but in my opinion it is
a sound design principle.

4. for all of these schemes there exists a forgery attack which is unrealistic if 
   the underlying function (with the same parameters) is really strong; 
   for the first two schemes, this (unrealistic) attack allows for key 
   recovery rather than forgery.  However, no one has yet analyzed how much 
   this can be improved if MD5 is used as underlying function.

Since the time of cryptanalysts is scarce, and our knowledge of 
cryptographic algorithms is limited, we often have to rely on 
certificational weaknesses. For example MD5 has as certificational 
weakness that its compression function is not collision resistant,
and the construction of RFC 1828 has as certificational weakness that 
an unrealistic forgery attack can be extended to an unrealistic key 
recovery attack.  For me as a cryptographer this would be sufficient 
to never use RFC1828 based on MD5.

And the fact that the attack for any compression function takes $2^{66}$ 
chosen texts does not mean that this could not be improved for MD5..


----------------------------------------------------------------------------