[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

*To*: IPSEC@ans.net*Subject*: Re: key-ed MD5 again*From*: Hilarie Orman <ho@cs.arizona.edu>*Date*: Wed, 29 Mar 1995 19:38:19 -0700

An additional note on this subject from Richard Schroeppel (rcs@cs.arizona.edu) follows. As it is a rather detailed missive, let me summarize the bottom line as a contribution to rough concensus in favor of MD5(key,data,key). Date: Tue, 28 Mar 95 14:26:29 GMT From: "William Allen Simpson" <Bill.Simpson@um.cc.umich.edu> To: IPSEC@ans.net Subject: Re: key-ed MD5 again Status: R > From: Masataka Ohta <mohta@necom830.cc.titech.ac.jp> > > Some significant analysts think that a single key at the beginning of > > MD5 does not provide enough key material when the text is long. The > > MD5(key,MD5(text)) was suggested to improve the effect of the key in the > > final hash. > > As MD5 is a chain of addition, a transitive 1-to-1 mapping, of > hashed values, it is unlikely that the initial scrambling effect > by the added hased key is weakened later. > An excellent question? The conclusion was passed to me word of mouth. But, looking at the algorithm, it seems to me that up to 4 bits of influence can be lost from the high end of the sum on each block from lost carry in the four registers. As the text grows longer, less of the initial key effect is seen. It is that addition was used instead of xor that has this result. Bill.Simpson@um.cc.umich.edu The final few lines of the reference inplementation of MD5 are buf[0] += a; buf[1] += b; buf[2] += c; buf[3] += d; Here buf[] is an array of 4 32bit words that represent the state of MD5 before eating the present block (64 bytes) of information; and a,b,c,d are 4 32bit words that are a very-complicated function of both buf[] and the 64byte data block. The fact that the combining operator used here is + rather than xor doesn't cause any information loss. The missing carries don't matter in the least: If we know the contents of buf[] after this code has been executed, and the values of a,b,c,d, then we can perfectly well determine the contents of buf[] prior to the code; in fact, we could do buf[0] -= a; buf[1] -= b; buf[2] -= c; buf[3] -=d; to recover the original four 32bit numbers in buf[]. Using an xor operator would be equivalent, from the perspective of (no) information loss in these 4 lines of code. The information loss in MD5(key, VeryLongText) is more subtle, and arguably inconsequential. Suppose we model the MD5 block-eating operation as a random map of 128bits -> 128bits, with the particular details of the map determined by the 64 byte data block that is consumed. The range of such a random map will typically include only 63% of the possible values; each range value has a 1/e chance of not appearing. About 37% of the range values have one inverse, 18% have two inverses, etc. Eating one data block has, in effect, erased .66 bits of information from the original 128bit state, since the range contains only 2^127.34 values. This is no big deal, but clearly will be a problem if multiple rounds of the MD5 operation continue to erase .66 bits on each round. But this doesn't happen. On the next round, since the first-round range has diminished by a factor of .63, the percentage of colliding values also drops: After two random maps, the expected range size is 47% of the initial range, which is larger than .63^2. The recursion for the expected range size after N applications of (independent) random maps is - F(N) F(0) = 1; F(N+1) = 1 - e The first few values after F(0) = 1 are F(1) = .632, F(2) = .469, F(3) = .374, F(4) = .312, F(5) = .268 ... It's easy to check that F(N) does approach 0 for large N, but extremely slowly. In fact, F(N) ~= 2 / (N + log terms). In the case at hand, suppose the VeryLongText above is 1 MB. This is 15,625 blocks of 64bytes; if we feed those to MD5 one block at a time, the expected size of the range is diminished by a factor of 15625/2, or about 8000. In other words, we've lost 13 bits of the original 128 bits in the initial state of MD5. If VeryLongText is 1 GB, the same calculation shows we'll lose 23 bits. Now suppose Shamir comes up with some discovery that, say, the high bit of each word of the MD5 state is lost during the mapping process, so that 16 initial MD5 states map to one range state. It almost doesn't matter! The recursion above becomes - 16 F(N) F(0) = 1; F(N+1) = 1 - e Although the information loss in the first round is dramatic, the asymptotic formula for this revised F is just F(N) ~= 2 / (16 N + log terms). So the amount of extra information loss from the initial state is 4 bits. The formula MD5(Key, VeryLongText) means that Key is a 128bit initial value put into the MD5 state. We can achieve almost the same effect by prefixing Key to VeryLongText and using regular MD5 with the standard initial state. If we also append Key, getting MD5( Key .conc. VeryLongText .conc. Key ) then we cancel out all of the intermediate information loss discussed above, and also protect against some appending attacks. Rich Schroeppel rcs@cs.arizona.edu

**Re: key-ed MD5 again***From*: Masataka Ohta <mohta@necom830.cc.titech.ac.jp>

- Prev by Date:
**Re: IPv6 Security Last Call** - Next by Date:
**IBM's Commercial Data Masking Facility** - Prev by thread:
**Re: key-ed MD5 again** - Next by thread:
**Re: key-ed MD5 again** - Index(es):