RE: Peer liveliness

I believe that the current IKEv2 spec addresses this issue in a way that
puts minimal requirements on implementations, guarantees interoperability
(though with less than ideal convergence time), and allows implementations
to do better.

But it's quite possible that I don't understand all of the things that
could go wrong, or have inadequately expressed what implementations MUST
do, or just plain screwed up.

The implementation requirements for robust interoperability are:

(1) An IKE SA and all of its associated child SAs fail together. You aren't
allowed a "partial crash" where some of the state is lost but some is kept.
This will fall out naturally in most implementations, but may require some
modular designs to have different modules poll one another for liveness.

(2) A node may not send on a set of SAs associated with a single IKE SA
indefinitely without hearing something back. If it hears nothing for long
enough, it should send an IKE message requiring a reply, and if no reply
comes it must declare all of the SAs dead.

(3) A node that has packets to send according to its SPD and no SA to send
them on must periodically attempt to open an SA for them.

I believe these three requirements alone guarantee that the right thing
will happen eventually. But they don't prescribe what the timers should be.
So it's possible it will take unacceptably long for things to converge. (If
network delays are long enough and timeouts short enough, the system could
fail to work at all, but I believe that problem is unavoidable.)
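
To make the three requirements concrete, here is a minimal Python sketch of
how one implementation might track liveness. Nothing in it comes from the
spec: the IkeSa class, the tick() and need_new_sa() helpers, and both timer
values are hypothetical, and a real implementation would retransmit the
probe several times rather than give up after one timeout.

```python
from dataclasses import dataclass, field
from typing import Optional

IDLE_LIMIT = 30.0     # assumed: seconds of silence before sending a probe
PROBE_TIMEOUT = 10.0  # assumed: seconds to wait for the probe reply


@dataclass
class IkeSa:
    """One IKE SA plus the child SAs negotiated under it (hypothetical model)."""
    peer: str
    child_spis: set = field(default_factory=set)
    last_heard: float = 0.0                # last time anything authentic arrived
    probe_sent_at: Optional[float] = None  # set while a probe is outstanding
    alive: bool = True


def on_receive(sa, now):
    """Any authenticated traffic on the IKE SA or its children proves liveness."""
    sa.last_heard = now
    sa.probe_sent_at = None


def tick(sa, now, send_probe):
    """Requirement (2): probe after long silence, declare dead if unanswered."""
    if not sa.alive:
        return "dead"
    if sa.probe_sent_at is not None:
        if now - sa.probe_sent_at >= PROBE_TIMEOUT:
            # Requirement (1): the IKE SA and all child SAs fail together.
            sa.alive = False
            sa.child_spis.clear()
            return "dead"
        return "probing"
    if now - sa.last_heard >= IDLE_LIMIT:
        send_probe(sa)  # an IKE message that requires a reply
        sa.probe_sent_at = now
        return "probing"
    return "ok"


def need_new_sa(sa_table, peer):
    """Requirement (3): True when traffic for `peer` has no live SA to use."""
    sa = sa_table.get(peer)
    return sa is None or not sa.alive
```

A caller would run tick() from a periodic timer and, whenever need_new_sa()
is true for a peer that the SPD says should be protected, periodically
retry the IKE exchange. For simplicity this sketch probes after any long
silence, whether or not the node is currently sending.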

The problem with more sophisticated strategies is that they may be
exploitable for denial of service attacks. Anyone can forge an INVALID_SPI
notification message from an IP address of their choice (since such a
message is not cryptographically protected). If such a message were
sufficient to cause its recipient to shut down and restart the SA, it would
be a very effective attack. So the spec says that such a message may be
used only as a hint to a problem - for example to trigger a
cryptographically protected liveness test. This will cause the failure to
be detected more quickly, but will never cause one to be detected falsely.
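
A small sketch of that rule, in Python. The Action enum and the
react_to_invalid_spi() function are made up for illustration; the
"arrived_protected" flag stands for "this notification arrived inside an
authenticated IKE SA with the same peer", which is my assumption about how
an implementation would classify it.

```python
from enum import Enum


class Action(Enum):
    IGNORE = "ignore"   # SPI is not one of ours; nothing to do
    PROBE = "probe"     # run a cryptographically protected liveness test
    DELETE = "delete"   # trustworthy report; abandon the SA


def react_to_invalid_spi(spi, known_spis, arrived_protected):
    """Decide how to react to an INVALID_SPI notification (hypothetical helper)."""
    if spi not in known_spis:
        return Action.IGNORE
    if arrived_protected:
        # Authenticated by the peer, so it can be believed.
        return Action.DELETE
    # Unprotected and therefore forgeable: treat it only as a hint.
    return Action.PROBE
```

The point of the design is that a forged notification can at worst cost
one liveness exchange; it can never tear down a working SA by itself.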

Similarly, the INITIAL_CONTACT notification can be used when setting up an
SA to assure the other end that it should abandon any SAs it has open to
the same identity. This is useful in - for example - the firewall case
where an identity is tied to a single box and it would be an error for that
box to bring up two connections at once. It would not be useful in the case
of a user who is allowed to remotely log in from multiple workstations at
the same time. Again, this makes convergence happen faster while never
making the wrong thing happen.
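
Roughly, the responder-side bookkeeping could look like the following
Python sketch. The function name, the identity strings, and the table
layout (identity mapped to a list of SA ids) are all hypothetical, and the
sketch assumes the new IKE SA has already been authenticated.

```python
def accept_new_ike_sa(sas_by_identity, identity, new_sa_id, initial_contact):
    """Register an authenticated new IKE SA; return the older SA ids to abandon.

    With INITIAL_CONTACT, every earlier SA for the same identity is stale.
    Without it, earlier SAs are kept, since the peer may legitimately hold
    several at once (for example a user logged in from two workstations).
    """
    existing = sas_by_identity.setdefault(identity, [])
    stale = list(existing) if initial_contact else []
    if initial_contact:
        existing.clear()
    existing.append(new_sa_id)
    return stale
```

For example, starting from {"fw.example.net": ["sa1"]}, accepting "sa2"
with initial_contact=True returns ["sa1"] and leaves only "sa2" in the
table, while the same call with initial_contact=False returns an empty
list and keeps both.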

Responding to the individual comments below...

Gregory Lebovitz <Gregory@netscreen.com> wrote on 04/29/2003:
>
> > [WE] won't achieve interoperability unless it's mandated that
> > [IMPLEMENTORS] must reply INVALID_SPI (in clear or initiate IKE back to
> > the sender) whenever it receives bad spi packets.  Current IKEv2 draft
> > doesn't address this issue (only states you MAY reply a clear notify
> > message).
> >
> > IKEv1 vendors have implemented many ways to solve it which leave poor
> > interoperability.  We should just pick a method and clarify it in IKEv2.
> > ===============
> > Michael Shieh
> >
I think we did, but if you don't think it works, explain why.

>
> We have been having quite a debate in the ICSA IPsec consortium mail list
> recently trying to figure out how to handle this in IKEv1 (YES, STILL!!!)
>
> Here is what we know for sure of this problem statement:
> (a) detecting liveness/deadness of peer is a good thing, but does not solve
> all the failure cases in and of itself
Which ones does it not solve?

>  (b) the behavior of a recently rebooted device when it receives an
> encrypted packet for an SPI or IKE-SA not in its SADB MUST be mandated, or
> else implementations will not interoperate (as is the case in IKEv1, 5 years
> later).
Can you give an example of how two implementations following IKEv2 could
fail to interoperate?

>  (c) the behavior of a peer that receives a new IKE from a peer that it has
> an existing IKE-SA with (i.e. the rebooted peer that is trying to initiate a
> new connection) MUST be mandated, or else implementations will not
> interoperate (as is the case in IKEv1, 5 years later).
I believe it is mandated that the new IKE-SA must be accepted, and the old
one either closed immediately or closed after a timeout, though perhaps
that's just what I was thinking and not what I wrote. Is there anything
specific you would recommend?

>
> Darren Dukes wrote:
> > I believe INVALID_SPI does what you are looking for.  If I receive an
> > INVALID_SPI notify via an IKE SA I know to delete the SA and traffic will
> > bring up a new one.
>
> I don't believe this will work, since it assumes that an IKE SA is
> established. In the scenario, the IKE-SA would have been lost along with the
> SPI of the CHILD-SA by the rebooted peer.
>
Until a new IKE-SA is established, any INVALID_SPI message would be
cryptographically unprotected and therefore to be taken as no more than
a hint. If a new IKE-SA is established, the INVALID_SPI could
be taken as trustworthy and used to abandon the old SA. Without the
INVALID_SPI message, abandonment would still happen but it would take
longer.

> Recommendations to solve the solution:
> - the empty notify as an aliveness check is a good idea. It accomplishes
> what the DPD draft did. Keep using this.
>
Generating them is not mandated, but the ability to respond to them is.

> - do what you can to use empty notify to detect dead peer ASAP. The faster
> the persisting peer can delete the old SPI and IKE-SA, the better. The best
> case is for Persisting Peer to detect death and initiate new IKE to rebooted
> peer before rebooted peer gets packets with old SPI, IKE-SA.
>
If the rebooted peer knows that the SA is needed, it can do that. If it
sets them up based on traffic, it has to wait until a packet comes in from
one side or the other.

> - On the Rebooted peer side: If an implementation receives a protected
> packet from an unknown SPI,
>  - simply relying on sending back an unprotected INVALID_SPI is not a good
> idea. It is too easy to DoS the persisting peer by simply spoofing the
> rebooted peer's address.
>  - initiate IKE to the persisting peer.
This is allowed, although sending what looks like protected messages from
randomly chosen IP addresses to cause the node to attempt lots of IKE
connections is also a plausible DoS attack. Sending the INVALID_SPI message
will tell the other end to probe this end for liveness and initiate its own
new IKE connection if that liveness test fails. That's the path guaranteed
to work. Other approaches will speed things up if implementations choose to
use them.
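
On the rebooted side, the guaranteed path only needs a cheap, stateless
reply. Here is a Python sketch of one way to send it without becoming an
amplifier for spoofed traffic. The InvalidSpiResponder class, the message
dictionary, and the per-source rate limit are all my own assumptions for
illustration, not something the draft prescribes.

```python
class InvalidSpiResponder:
    """Reply to packets with unknown SPIs, at most once per interval per source."""

    def __init__(self, min_interval=5.0):
        self.min_interval = min_interval  # assumed value, in seconds
        # Grows with the number of distinct sources; a real implementation
        # would bound or age out this table.
        self._last_sent = {}

    def on_unknown_spi(self, src_addr, spi, now):
        """Return an unprotected INVALID_SPI notify to send, or None if suppressed."""
        last = self._last_sent.get(src_addr)
        if last is not None and now - last < self.min_interval:
            return None
        self._last_sent[src_addr] = now
        return {
            "to": src_addr,
            "notify": "INVALID_SPI",
            "spi": spi,
            "protected": False,  # no IKE SA exists yet, so only a hint
        }
```

The receiver treats this message purely as a hint to start its own
liveness probe, so losing or suppressing a few of them only delays
recovery.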

>
> - On the Persisting Peer:
>  - If you get a new IKE request from a peer already in your SADB, respond
> with the under-attack, 6 message method. This will mitigate the DoS attack.
> If you get all the way through SA and TS negotiation successfully, you are
> assured (unless I'm missing something) that this really is your peer, and
> that he re-initiated because he lost the original IKE-SA. Start using the
> new IKE-SA and the new CHILD-SA and delete the previous ones after some wait
> period.
>
Only if there is an INITIAL_CONTACT notification message. Otherwise it's
possible that the peer is opening multiple IKE SAs, perhaps because he is
replicated. In some configurations this might be acceptable. In
firewall-to-firewall tunnels it would not, and an implementation might
reasonably treat any new IKE-SA as if it carried INITIAL_CONTACT.

> Would this proposal explicitly solve things?
>
> Gregory.

      --Charlie