
Rationale for the definitions of SKEYID (long msg)




Recent discussions on the list have made clear the need for some 
clarification of the rationale behind the definition of SKEYID for each of 
the authentication modes in IKE.  The following note is intended to provide 
an intuitive account of the basic cryptographic rationale behind 
these specifications without getting too technical.  As with everything 
regarding cryptographic design, intuition alone is dangerous.  So 
take this as a high-level (and simplistic) overview of the considerations.  
A full analysis of these modes would require far more elaboration.

Note that this is not a discussion on the merits and need for the different 
authentication modes but an account of how, after the different 
authentication modes were selected, we designed the SKEYID derivation.

(For reference I am appending at the end of this message the definitions
of SKEYID and related elements from the IKE rfc)

GUIDING PRINCIPLES

(1) The main principle guiding this design was, of course, to make 
each of the authentication modes secure.

(2) Second, to reduce the complexity of specification and implementation 
of the different modes by defining a mechanism for key derivation and key
usage which is common to all modes. 

NOTE: IKE achieves this to a maximal degree: only the definition of SKEYID
differs between modes. These differences are unavoidable given the
essential differences in the type of authentication credentials that each
mode assumes. Hopefully the presentation below will help clarify these
issues.

(3) Third, to use a single cryptographic function, "prf", for the 
derivation of all keys in the protocol. This simplifies the design, 
analysis, and implementation: if you choose a weak prf all fails, 
but if it is good then you are safe. This approach also makes sure 
that you are not using the same key to key two different algorithms 
("key separation" principle).

NOTE: PRFs, for pseudo-random functions, are powerful cryptographic tools: 
they provide for derivation of multiple keys that are indistinguishable 
from randomness and cryptographically independent from each other, 
and they provide mixing properties as we'll see later. They can be 
built out of any secure block cipher or keyed hash function, and 
do not require any idealized properties such as "random oracles".
CBC-MAC and HMAC are common realizations.

In addition to the above basic principles I have always adhered to the 
following rule:

(4) Avoid using the result of a DH exchange, denoted by g^xy in IKE, 
as a direct key to cryptographic algorithms. Whenever possible first hash 
this value.  Then use the hashed value as a key to a prf for deriving 
further keys.  This has the effect of not relying on the security 
of each independent bit in g^xy but rather on the overall "cryptographic 
entropy" present in g^xy. 

NOTE: I won't get here into a deep cryptographic discussion of things 
like the "decisional DH assumption", etc., but let me just say that 
assuming that all bits of g^xy are equally and simultaneously pseudorandom 
is, in my view, too optimistic and unnecessary an assumption, given 
that a simple and performance-insignificant hashing of the key can 
achieve a much better mixing. IKE uses its prf also to achieve this 
mixing. Moreover, in some cases (pre-shared and PK encryption modes) 
this prf-based mixing makes the key material dependent on a secret 
quantity in addition to g^xy; this means that even if the specific 
DH group in use is broken 10 years from now, the attacker that recorded 
the communication will still be unable to read it since it does not 
have the secret prf key which is independent from g^xy! 
(In the case of PK encryption mode the attacker will need to be able 
to break the PK encryption in addition to the DH group.) 
I personally prefer these added lines of defense to going to 5000-bit 
DH keys, as some people have been suggesting recently (and at what cost!).

NOTE: Those interested in reading more about these design principles and 
the compact multi-mode approach can take a look at my 1995 paper: 
``SKEME: A Versatile Secure Key Exchange Mechanism for Internet'', 
Proceedings of the 1996 Internet Society Symposium on Network and 
Distributed System Security, Feb. 1996.
(Available from www.research.ibm.com/security/skeme.ps)
In particular, you can see there how much simpler IKE could have been;
but, for better or worse, IKE is the result of multi-year work by a huge
committee and of "rough consensus", and this shows in its final design.

SKEYID FUNCTIONALITY

All phase 1 modes in IKE establish a key via a Diffie-Hellman (DH) exchange.
The modes differ mainly in the way this exchange is authenticated.
The key SKEYID exists in all three modes and has as its most essential 
role to provide the DH authentication.
The definition of SKEYID is thus strongly tied to the type of keying 
material and cryptographic functions available in each of the modes.
However, once we define SKEYID, its use in the authentication process
is common to all modes and is defined via the expressions HASH_I and HASH_R.

In addition, SKEYID is used for deriving further keying material out of
the DH key g^xy. This keying material is used during phase 1 itself 
(mainly, for identity protection), and in phase 2 for phase 2 authentication 
and for derivation of the key material needed by ipsec for securing 
IP data. These three derived keys are called SKEYID_e, SKEYID_a, 
and SKEYID_d, respectively, and their derivation is identical for all modes.
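
The chained derivation of the three keys can be sketched in a few lines of
Python. This is only an illustration of the formulas quoted in the appendix:
HMAC-SHA256 is used as a stand-in prf (an assumption; the actual prf is
whatever the parties negotiate), and the function name derive_phase1_keys
and all input values are hypothetical placeholders.

```python
import hashlib
import hmac


def prf(key: bytes, data: bytes) -> bytes:
    """Stand-in prf (assumption: HMAC-SHA256; IKE negotiates the real one)."""
    return hmac.new(key, data, hashlib.sha256).digest()


def derive_phase1_keys(skeyid: bytes, g_xy: bytes, cky_i: bytes, cky_r: bytes):
    """Derive SKEYID_d, SKEYID_a, SKEYID_e; each is chained on the previous one."""
    tail = g_xy + cky_i + cky_r
    skeyid_d = prf(skeyid, tail + b"\x00")             # ... | 0 (single octet)
    skeyid_a = prf(skeyid, skeyid_d + tail + b"\x01")  # ... | 1
    skeyid_e = prf(skeyid, skeyid_a + tail + b"\x02")  # ... | 2
    return skeyid_d, skeyid_a, skeyid_e


# Placeholder inputs, for illustration only.
skeyid = b"\x11" * 32
g_xy = b"\x22" * 128
cky_i, cky_r = b"\x33" * 8, b"\x44" * 8

skeyid_d, skeyid_a, skeyid_e = derive_phase1_keys(skeyid, g_xy, cky_i, cky_r)
```

Note that SKEYID appears only as the prf key, and that the three outputs are
distinct keys for three distinct purposes.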

All these key derivations, and the definition of HASH_I/R, use a prf and 
SKEYID as the prf key. Moreover, SKEYID is never used (and should never be
used) as a key for anything other than keying the prf.

NOTE: As I noted many times in the past, there is an unfortunate typo in the
definition of HASH_R in the IKE rfc (where SAi_b appears instead of SAr_b).

Next I explain the SKEYID derivation for each of the three authentication 
modes. Signature mode is presented last because it is the most complex to 
explain.

PRESHARED MODE

In this mode the two parties to the exchange are assumed to share a STRONG
secret key (the mode is not concerned with the way this sharing happened;
it assumes the sharing is secure: i.e., the key belongs to, and is known
only to, the exchange parties).

The underlying idea is simple: use a MAC function keyed with the 
preshared key in order to authenticate the DH exponents, g^x and g^y, 
as exchanged during the protocol.  Since a prf is, in particular, a 
secure MAC function, the HASH_I and HASH_R computations provide 
this functionality. There is no need to authenticate g^xy directly since 
this authentication is implied by the authentication of g^x and g^y 
by each of the parties.

In this case one could have simply defined SKEYID = pre-shared-key.
Instead we defined: SKEYID = prf(pre-shared-key, Ni | Nr)
in order to force SKEYID to be different and fresh with each exchange
(as the nonces Ni and Nr are); this is just a safer and more robust 
use of the long-lived pre-shared key.
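
As a rough sketch of the pre-shared mode in Python: SKEYID is derived from
the pre-shared key and the nonces, and HASH_I then acts as a MAC over the
exchanged DH exponents. HMAC-SHA256 stands in for the prf (an assumption),
the function names are hypothetical, and all inputs are placeholders rather
than real ISAKMP payloads.

```python
import hashlib
import hmac
import os


def prf(key: bytes, data: bytes) -> bytes:
    """Stand-in prf (assumption: HMAC-SHA256; IKE negotiates the real one)."""
    return hmac.new(key, data, hashlib.sha256).digest()


def skeyid_preshared(psk: bytes, ni: bytes, nr: bytes) -> bytes:
    """SKEYID = prf(pre-shared-key, Ni | Nr)."""
    return prf(psk, ni + nr)


def hash_i(skeyid, g_xi, g_xr, cky_i, cky_r, sai_b, idii_b) -> bytes:
    """HASH_I = prf(SKEYID, g^xi | g^xr | CKY-I | CKY-R | SAi_b | IDii_b)."""
    return prf(skeyid, g_xi + g_xr + cky_i + cky_r + sai_b + idii_b)


# Placeholder inputs; real values come from the exchanged payloads.
psk = b"placeholder for a strong pre-shared key"
ni, nr = os.urandom(16), os.urandom(16)
g_xi, g_xr = b"\x02" * 128, b"\x03" * 128
cky_i, cky_r = os.urandom(8), os.urandom(8)
sai_b, idii_b = b"SA-body", b"initiator-id"

skeyid = skeyid_preshared(psk, ni, nr)
tag = hash_i(skeyid, g_xi, g_xr, cky_i, cky_r, sai_b, idii_b)

# The responder recomputes HASH_I from what it actually received.
ok = hmac.compare_digest(
    tag, hash_i(skeyid, g_xi, g_xr, cky_i, cky_r, sai_b, idii_b))
# A substituted exponent (man-in-the-middle) no longer verifies.
forged = hmac.compare_digest(
    tag, hash_i(skeyid, b"\x04" * 128, g_xr, cky_i, cky_r, sai_b, idii_b))
```

Because fresh nonces enter the derivation, two exchanges under the same
pre-shared key end up with different values of SKEYID.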


PUBLIC KEY ENCRYPTION MODE

Here the authentication of the protocol happens by each party "proving"
possession of its own private decryption key. More precisely, the 
initiator sends a nonce Ni encrypted under the responder's public 
key. Now the responder has to prove it was able to decrypt Ni. 
The same happens in the other direction with the responder's nonce Nr.
How do they prove knowledge of these nonces? By their ability to compute
hash(Ni | Nr). This knowledge and ability is then bound to the
authentication process: the parties do not reveal hash(Ni | Nr) but use 
this value to authenticate the exchanged information (including the 
DH exponents g^x and g^y). 

In this case one could have simply defined SKEYID = hash(Ni | Nr)

Instead we defined: SKEYID = prf(hash(Ni | Nr), CKY-I | CKY-R)

Why CKY-I | CKY-R? This is not strictly necessary since Ni and Nr 
are already fresh material. However, in the above way we get the
key material out of the prf (as in all other key derivations) and also 
we tie the key to a specific key-exchange session which in ISAKMP 
is identified by the pair of cookies.
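
A small Python sketch of this derivation, under the assumptions that SHA-256
plays the role of hash and HMAC-SHA256 the role of the prf (both are
stand-ins for whatever is negotiated), and with a hypothetical function name
and placeholder nonces and cookies:

```python
import hashlib
import hmac


def prf(key: bytes, data: bytes) -> bytes:
    """Stand-in prf (assumption: HMAC-SHA256; IKE negotiates the real one)."""
    return hmac.new(key, data, hashlib.sha256).digest()


def skeyid_pk_encryption(ni: bytes, nr: bytes, cky_i: bytes, cky_r: bytes) -> bytes:
    """SKEYID = prf(hash(Ni | Nr), CKY-I | CKY-R)."""
    nonce_key = hashlib.sha256(ni + nr).digest()  # known only to whoever decrypted both nonces
    return prf(nonce_key, cky_i + cky_r)          # tied to this session's cookie pair


# Placeholder nonces and cookies, for illustration only.
ni, nr = b"\xa1" * 16, b"\xb2" * 16
session_1 = skeyid_pk_encryption(ni, nr, b"\x01" * 8, b"\x02" * 8)
session_2 = skeyid_pk_encryption(ni, nr, b"\x03" * 8, b"\x04" * 8)
```

Even with the same pair of nonces, a different cookie pair gives a different
SKEYID, which is the session binding described above.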

NOTE: In a recent paper with Canetti (Eurocrypt'2001) we prove the 
security of this mechanism but for the case where authentication is 
performed uni-directionally by each party (using the other party's 
nonce as the key rather than hashing the nonces together). In this 
uni-directional case, tying the nonce to the session identifier is 
fundamental for security. Without it the protocol is open to key 
reuse/replay attacks.  The same principle is used above.

SIGNATURE MODE

Here things are not as intuitive as above, and much care is required.
Some background is needed and I'll try to sketch it here.

The signature mode of IKE was a development out of Photuris signature-based 
authentication. In turn, Photuris was based on the STS protocol by 
Diffie, van Oorschot, and Wiener (described in their excellent paper
``Authentication and Authenticated Key Exchanges'', Designs, Codes and 
Cryptography, V. 2, 1992, whose reading I strongly recommend to anyone 
interested in the design of key exchange protocols -- btw, does anyone 
know of an electronic version on the web?)

One great contribution of the above paper is in pointing out a 
subtle attack against the authenticity of key exchange protocols,
in particular those based on signature authentication (the attack 
is sometimes referred to as the "unknown key-share attack"). 
In this attack a man-in-the-middle does not get to learn the exchanged 
key but succeeds in "confusing" the parties as to whom they are 
talking to (party A believes she exchanged the key with B, while 
B is convinced that the key was exchanged with Eve).

This problem is somewhat counter-intuitive and shows that plain signature 
authentication is not sufficient and careful BINDING of the exchanged 
key and the party's identity is required. To make things more 
understandable, one can think that what is needed is a proof by the 
signer that it knows the key g^xy. The solution proposed by STS was 
to sign the exponents g^x and g^y and then encrypt the resultant 
signature with the exchanged key g^xy.  Photuris adopted this technique 
since the encryption was needed anyway to hide identities, a requirement 
set by the ipsec WG.

Unfortunately, the STS solution is not as strong as we would like.
The source of the problem is in using an encryption function for solving
an authentication problem (encryption is especially bad at binding things
together, just think of stream ciphers). One explicit weakness of the STS
protocol is that if a party is using a certificate for which the 
CA does not require a proof of possession then the "unknown 
key-share attack" is possible even after encrypting the signature.  
Other attacks and weaknesses may be possible. Another potential problem 
with the use of encryption as an authentication function is the 
possibility that implementations will use weaker encryption than 
authentication (either for reasons such as export regulations or 
just because the encryption may be regarded, as many did with Photuris, 
as necessary only for identity protection and then its strength may 
be set well below the "authenticity level").
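
To make the stream cipher remark concrete, here is a toy Python
illustration of why encryption alone does not bind data together. It is not
an attack on STS itself, only a demonstration of malleability: the keystream
construction (SHA-256 in counter mode) is an ad hoc assumption for the
example and not a vetted cipher, and the message layout is invented.

```python
import hashlib


def keystream(key: bytes, length: int) -> bytes:
    """Toy keystream: SHA-256 in counter mode (illustration only)."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return out[:length]


def xor_stream(key: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt by XORing the data with the keystream."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))


key = b"key derived from g^xy (placeholder)"
plaintext = b"id=Bob;sig=0123456789"
ciphertext = xor_stream(key, plaintext)

# The attacker knows (or guesses) the plaintext bytes at offsets 3..5 and
# rewrites them without ever learning the key.
delta = bytes(a ^ b for a, b in zip(b"Bob", b"Eve"))
tampered = bytearray(ciphertext)
for i, d in enumerate(delta):
    tampered[3 + i] ^= d

recovered = xor_stream(key, bytes(tampered))
```

The receiver decrypts the tampered ciphertext to a message naming Eve, with
the rest of the plaintext untouched. A MAC over the same data would have
rejected the modification.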

The right and simple solution is not to use encryption but a MAC function
to bind together the signature, the party identities and the key g^xy.
(Encryption is still there but only for identity protection.)
There were two options: to apply a MAC keyed with g^xy on top of the
signature, or to first apply a MAC keyed with g^xy on the identity 
(of the sender) and then apply the signature on top of the MAC.
IKE implemented the latter option (this was particularly convenient
for uniformity with other modes). That is, each party to the exchange,
let's call it A, produces a signature of the form 

SIGN_A(MAC(g^xy, id_A | g^x | g^y | cookies | SA))

(where g^xy acts as the key to the MAC).

In IKE the MAC is implemented via the prf and instead of using g^xy 
directly as a key we first hash it (see rule (4) above on always 
hashing g^xy before using it as a key). This is how we obtain the usage
of HASH_I/R in signature mode (which is nothing but the above MAC()
expression). Now note that HASH_I/R are defined to use SKEYID as the key. 
This is why in signature mode we should have defined SKEYID=g^xy, 
or more precisely SKEYID=hash(g^xy).
And this is exactly what we did except that instead of applying hash(g^xy)
we defined: SKEYID = prf(Ni | Nr, g^xy)
This use of a prf, with a random but known key (this is what Ni|Nr is),
is equivalent to the use of a mixing hash function as needed to 
extract the "cryptographic entropy" from g^xy.
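
The two steps just described (extract a key from g^xy using the nonces as a
known prf key, then MAC the identity and exchange data under that key) can be
sketched in Python as follows. HMAC-SHA256 stands in for the prf (an
assumption), the function names are hypothetical, all inputs are
placeholders, and the signature operation itself is left out: the sketch
only computes the value that would be handed to the signer.

```python
import hashlib
import hmac


def prf(key: bytes, data: bytes) -> bytes:
    """Stand-in prf (assumption: HMAC-SHA256; IKE negotiates the real one)."""
    return hmac.new(key, data, hashlib.sha256).digest()


def skeyid_signature(ni: bytes, nr: bytes, g_xy: bytes) -> bytes:
    """SKEYID = prf(Ni | Nr, g^xy): the nonces are the (public) prf key."""
    return prf(ni + nr, g_xy)


def value_to_sign(skeyid, g_xi, g_xr, cky_i, cky_r, sai_b, idii_b) -> bytes:
    """HASH_I, i.e. the MAC over the identity and exchange data that gets signed."""
    return prf(skeyid, g_xi + g_xr + cky_i + cky_r + sai_b + idii_b)


# Placeholder inputs, for illustration only.
ni, nr = b"\xa1" * 16, b"\xb2" * 16
g_xi, g_xr, g_xy = b"\x02" * 128, b"\x03" * 128, b"\x05" * 128
cky_i, cky_r = b"\x01" * 8, b"\x02" * 8
sai_b = b"SA-body"

skeyid = skeyid_signature(ni, nr, g_xy)
for_alice = value_to_sign(skeyid, g_xi, g_xr, cky_i, cky_r, sai_b, b"alice")
for_eve = value_to_sign(skeyid, g_xi, g_xr, cky_i, cky_r, sai_b, b"eve")
```

Only someone who knows g^xy can compute the value to be signed, and that
value differs for each claimed identity, which is the binding between key
and identity that plain signatures on g^x and g^y lack.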
 
NOTE: one consequence of the above design is that the signature mode 
cannot be guaranteed, in general, to provide non-repudiation of the 
signed information since one is signing the result of a prf, and 
prf's are not required to be collision resistant.
This was a conscious decision: there is no requirement for IKE to provide
non-repudiation. If at all, such a property should be regarded as a
privacy drawback. (Still, anyone who wants to provide this property can 
use a prf which is assumed to be also collision resistant, e.g. HMAC.)

NOTE: had identity protection not been a requirement, a simpler
signature mode would have been possible (one that dispenses altogether with 
encryption or a MAC in addition to the signature). ISO 9796 has such 
a simple protocol. However its implementation in IKE would have required 
giving up identity protection or adding more round trips to the protocol.

Hope this helps clarify things.

Hugo

Appendix: SKEYID definitions from the RFC


   For signatures:            SKEYID = prf(Ni_b | Nr_b, g^xy)
   For public key encryption: SKEYID = prf(hash(Ni_b | Nr_b), CKY-I | CKY-R)
   For pre-shared keys:       SKEYID = prf(pre-shared-key, Ni_b | Nr_b)

   The result of either Main Mode or Aggressive Mode is three groups of
   authenticated keying material:

      SKEYID_d = prf(SKEYID, g^xy | CKY-I | CKY-R | 0)
      SKEYID_a = prf(SKEYID, SKEYID_d | g^xy | CKY-I | CKY-R | 1)
      SKEYID_e = prf(SKEYID, SKEYID_a | g^xy | CKY-I | CKY-R | 2)

   and agreed upon policy to protect further communications. The values
   of 0, 1, and 2 above are represented by a single octet. The key used
   for encryption is derived from SKEYID_e in an algorithm-specific
   manner (see appendix B).

   To authenticate either exchange the initiator of the protocol
   generates HASH_I and the responder generates HASH_R where:

    HASH_I = prf(SKEYID, g^xi | g^xr | CKY-I | CKY-R | SAi_b | IDii_b )
    HASH_R = prf(SKEYID, g^xr | g^xi | CKY-R | CKY-I | SAi_b | IDir_b )