[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

keyed-MD5



I have sumbitted a new version
of my internet-draft on keyed MD5 (draft-krawczyk-keyed-md5-01.txt).

You may recall that the previous version of this draft (00)
presented the construction that was adopted into RFC-1828
(keyed MD5 for IP AH).
The reason for the new updated draft is that I am proposing a different
construction which keeps all the advantages of the previous one
(simplicity, efficiency, short key, no change to MD5 code, etc), but
in addition has a significantly better security analysis behind it.

To the best of my knowledge, it is the best mode of keyed-MD5 proposed
so far among those that enjoy the above properties and have a well understood
security analysis behind it.   (I personally believe it will make it until the
basic MD5 function will be dropped by cryptanalysis or performance,
since a practical breaking of this message authentication algorithm would
imply severe weaknesses of the underlying hash function.)

My previous draft was a compromise between improving on
the by-then WG official keyed-MD5 proposal and  what seemed to be
acceptable to the RFC editors. It was not, in particular,
the best solution available at the time.
In the meantime, we have even further improved in some aspects of this
construction. In particular, it enjoys now very clear analysis advantages
relative to other proposals (including my previous one).

I hope that you agree with me that it is not too late to move to
this new proposal. It involves no difficulty for existing implementations
to change the function, and it offers a significantly better understood
security. On the other hand, it is better to do it now before these
standards are  widely deployed in products.

As an "appendix" to this note I am attaching some more information about
this construction that does not appear in the draft. People interested to
better understand the issues involved here are encouraged to read it.

Hugo

*************************************************************************

The following is intended to complement the information provided in the
draft (draft-krawczyk-keyed-md5-01.txt) itself about the considerations
and rationale behind the proposed function. I recommend you first read
the draft.

Background references: the reference [BCK1] in the paper has not yet been
published, but I will make it available on a web site this week
(I will send a message about that).
That paper deals with the general theory of building pseudorandom  functions
(from which MACs are a particular case) out of iterated hash functions a la
MD5. It does not present the specific construction I described in
the draft; it develops the mathematical tools to analyze these constructions
and presents some (other) examples. It is not easy to read (unfortunately,
the analysis there is quite involved and directed more for the "math types").

The "nested" construction (as I call the function that I am proposing in the
new internet draft) is tailored in particular to message authentication,
and as such has a simpler and tighter security analysis. Its detailed
analysis will be presented in a paper [BCK2] that I plan to complete before
the RSA conference in January, where I will present it.

Here I present some more details than those appearing in the internet-draft
as for the rationale behind this construction.  Except for the actual
mathematical details and formalism of the analysis the information
given in the draft and here is quite complete.

In what follows you will need some "patience" before I get to the actual
construction as proposed in the draft.

The basis of the methodology introduced in [BCK1] is to consider hash
functions that are keyed through their "initial variable" (IV) (sometimes
called 'chaining variable'), that is, the usually fixed IV is replaced
by a value K which serves as the (secret) key to the function.

Let's denote by MD5_K(x) the result of applying MD5 to a string x, when the
IV of MD5 is initialized to K (here K is a random string of length 128).
The basic nested construction for keyed-MD5 would be

MAC(K1,K2,x) = MD5_K2(MD5_K1(x))

where K1 and K2 are two *different* keys each of length 128 bits,
and x is the information being authenticated.
(The secret keys K1 and K2 are shared by the communicating parties).

We can prove (using much simpler tools than [BCK1]) that
for this construction to be secure it is enough that

1) an adversary that *does not know* K1 cannot find x and x' such that
   MD5_K1(x)=MD5_K1(x'); and

2) an adversary that *does not know* K2 and sees the values of MD5_K2
   on a sequence of 512-bits  strings y1,y2,...  cannot compute by itself
   the full result of MD5_K2 on any other string y.

Both conditions can be further weakened but to present that we need more
definitions and notation. But even the conditions as stated above are already
weaker than other assumptions on MD5 used for this kind of analysis
(including those needed to analyze the construction in RFC1828).

The regular (and basic) assumption on MD5 is that collisions cannot be found
even when the IV is fixed and known. Here (condition (1)) we are requiring
a much weaker assumption, namely, collision resistance (of MD5_K1) when
the IV is random and *secret*.
Indeed, a far much harder task for the attacker.
(This assumption can be further weakened if you notice that the adversary
never gets the value of MD5_K1 on any string, but just the result
after the additional MD5_K2 is applied).

Another approach commonly used in the analysis of MD5-like functions is to
consider these functions (ot their compression function) as
"random functions".
In such cases, one is assuming that the result of this (compression)
function cannot be predicted even on a single-bit basis (i.e.,
if I ask you what will be the value of the 15th bit of the result of the
function on a given input, your ability to predict it would be no better than
1/2; if I ask you to predict a whole byte your expected success probability
is only 1/256, and so on).
Instead (condition (2)), we are only asking that predicting the *whole*
128 bit output (of MD5_K2) is hard, e.g. can succeed with probability
at most 2^{-64}. This is a very significant difference. For example,
while the discovery of any statistical bias in the output of MD5 could
rend the function useless (or very weak) for uses where full randomness
is required, it could still make a very good MAC under the above construction.

(Again, our actual requirement is weaker; just notice that although
the adversary sees the function MD5_K2 computed on some strings,
it never gets to entirely see these strings since they contain the unknown
substring MD5_K1(x)).

Now, back to the proposed

   MAC(K,x)= MD5(K,pad2,MD5(K,pad1,x)).

This may seem quite different than the function described above.
Especially  that it uses only one 128 bit key K.
However, if you know how MD5 works you will notice that this
is *exactly*:

   MD5_K2(MD5_K1(x))

where

K1=MD5'(K,pad1)
K2=MD5'(K,pad2)

(MD5' stands for a single iteration of MD5 applied to one 512 bit block
without length padding).

The idea is that K1 and K2 in the original construction are derived from one
128-bit key using the compression function of MD5 and with *different*
pads pad1 and pad2.

We do this to avoid defining a 256 bit key, and to avoid mandating the
keying of MD5 IVs. The latter would prevent the use of MD5 as a "pure"
black-box (althought the implementation of a keyed IV is quite trivial and
explained in the draft for the sake of increased efficiency).

The additional assumption we need for this derivation of K1 and K2 from a
single K is that these two keys are "computationally independent", i.e.
that one cannot infer on the value of K1 from knowing K2, and viceversa.
This requires a relatively weak pseudorandomness assumption on the
compression function of MD5, e.g., reasonable mixing properties
(which is related to the basic design of the compression function
of MD5 as a block cipher).

As for the choice of values for pad1 and pad2, I chose them arbitrarily to
have a very simple and "portable" description. The only "structure"
behind these particular values is that the two bytes x'36' and x'5C'
have half of the bits in common and half of them different making the Hamming
distance between the pads a big one; this is intended to exploit as much
as possible the  mixing properties of MD5.

As for the best attack known on this proposal it requires knowing the result
of keyed-MD5 on about 2^64 known (but of the same length) messages, and
all of which use the *same* key. As explained in the draft, this is totally
impractical.




Follow-Ups: