
RE: Peer liveliness



Moving the discussion back to IKEv2, for the moment...

Several of us have spent a lot of time discussing this issue in the past few
weeks. A main problem we are trying to solve (though not the only one) is
rapid recovery from a rebooted peer.

If you look at the current DPD draft for IKEv1, it calls for sending
INITIAL-CONTACT whenever a peer thinks this is its first contact, i.e. it has
no established SAs with the remote peer. This is done even when DPD is
running on both peers, to let the other peer (the persisted peer, as opposed
to the rebooted peer) know to delete the old SAs as soon as possible, because
the DPD timers might not catch the reboot fast enough.

This is a very good idea because both sending the empty notify (DPD) and
the timer setting for how often to send it are entirely optional. Therefore,
depending on settings, and *without* the INITIAL-CONTACT, it could be quite
some time before the persisted peer relinquishes its current SAs.

Charlie, I can't remember: is sending INITIAL-CONTACT a MUST in the
latest IKEv2 draft? Would it be a good idea to make the INITIAL-CONTACT
notification a MUST, if it is not already? Doing so would help shorten the
tunnel black hole in most cases, regardless of DPD settings.

The next question is: what is the best behavior for a (rebooted) peer that
receives an invalid SPI? Today the mandate is to drop silently. But if two
rules are checked first, it can be fine (I think) to respond. Those rules
are:
  - Do I have an active SA with the sender of the invalid SPI? If yes, drop
    silently. If no, go to the next rule check...
  - Do I have the source IP of the sender in my SPD, i.e. is the sender a
    valid peer? If no, drop silently. If yes...
  - Initiate IKE per the SPD definition.
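The two checks and the resulting action can be sketched in Python as follows. This is a minimal illustration, assuming the SADB and SPD can be reduced to sets of peer IP addresses; `LocalState`, `on_invalid_spi`, and the `Action` values are hypothetical names, not part of any IKE implementation.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Set


class Action(Enum):
    DROP_SILENTLY = auto()
    INITIATE_IKE = auto()


@dataclass
class LocalState:
    # Hypothetical stand-ins for the SADB and SPD: peer addresses with an
    # active SA, and peer addresses that appear in the local SPD.
    active_sa_peers: Set[str] = field(default_factory=set)
    spd_peers: Set[str] = field(default_factory=set)


def on_invalid_spi(state: LocalState, src_ip: str) -> Action:
    """Decide how to react to a protected packet carrying an unknown SPI."""
    # Rule 1: we already have an active SA with this sender, so the bad
    # SPI is not explained by our own reboot. Drop it.
    if src_ip in state.active_sa_peers:
        return Action.DROP_SILENTLY
    # Rule 2: the sender is not a configured peer. Drop it.
    if src_ip not in state.spd_peers:
        return Action.DROP_SILENTLY
    # Both checks passed: initiate IKE as the SPD entry prescribes.
    return Action.INITIATE_IKE


if __name__ == "__main__":
    state = LocalState(spd_peers={"192.0.2.1", "192.0.2.2"},
                       active_sa_peers={"192.0.2.2"})
    print(on_invalid_spi(state, "192.0.2.1"))     # Action.INITIATE_IKE
    print(on_invalid_spi(state, "192.0.2.2"))     # Action.DROP_SILENTLY
    print(on_invalid_spi(state, "198.51.100.7"))  # Action.DROP_SILENTLY
```

Ordering the checks this way means a spoofed packet from an unknown address never costs more than a set lookup.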

If these two rules are followed, the only threat I see to responding with
IKE initiation is that an attacker who knew all of my valid peers' IPs
could, at the moment of recovery from reboot (or power-up), cause me to
establish IKE with all the peers listed in my SPD, even though I might not
otherwise have made those establishments. The attacker would do so by
sending me invalid SPIs spoofed with the source address of each of my peers.
I see this as a pretty tough attack to pull off in the real world (given the
spoof checking used on most ISP routers these days), and the pay-off likely
doesn't merit the difficulty of execution. Does the value merit the risk?

Summary: IKEv2 liveness checking doesn't ensure fast recovery. It provides
a mechanism that MAY be used for fast detection and recovery, but doesn't
guarantee it. However, combining the initiate-IKE response behavior,
INITIAL-CONTACT, and liveness detection would ensure VERY fast
re-establishment for valid peers after one of them reboots (and covers all
other cases too). If the liveness checking doesn't catch the failure fast
enough, the initiate-IKE response with INITIAL-CONTACT will.

Thoughts?

Gregory.

> -----Original Message-----
> From: Ravi [mailto:ravivsn@roc.co.in]
> Sent: Wednesday, May 14, 2003 9:59 PM
> To: Charlie_Kaufman@notesdev.ibm.com
> Cc: Gregory Lebovitz; 'ddukes@cisco.com'; ipsec@lists.tislabs.com;
> Michael Choung Shieh; owner-ipsec@lists.tislabs.com
> Subject: Re: Peer liveliness
> 
> 
> Hi,
>   In IKEv2, IKE SAs are bound to their IPsec SAs, and the IPsec SAs (Child
>   SAs) are deleted whenever the IKE SA is dead. Because of this, I don't
>   see any problem with the approach described in the IKEv2 specification.
>   But in IKEv1, this binding is not mandated, and an IPsec SA can exist
>   without a corresponding IKE SA. This is where I see a problem, and the
>   current DPD specification does not seem to consider it. I proposed
>   before that Dead Tunnel detection is needed on the remote SGs. I plan to
>   come out with a draft on this in 1 to 2 weeks. It is only applicable to
>   IKEv1 implementations.
> 
> Regards
> Ravi
> 
> Charlie_Kaufman@notesdev.ibm.com wrote:
> > 
> > 
> > 
> > I believe that the current IKEv2 spec addresses this issue in a way that
> > puts minimal requirements on implementations, guarantees interoperability
> > (though with less than ideal convergence time), and allows implementations
> > to do better.
> > 
> > But it's quite possible that I don't understand all of the things that
> > could go wrong, or have inadequately expressed what implementations MUST
> > do, or just plain screwed up.
> > 
> > The implementation requirements for robust interoperability are:
> > 
> > (1) An IKE SA and all of its associated child SAs fail together. You aren't
> > allowed a "partial crash" where some of the state is lost but some is kept.
> > This will fall out naturally in most implementations, but may require some
> > modular designs to have different modules poll one another for liveness.
> > 
> > (2) A node may not send on a set of SAs associated with a single IKE SA
> > indefinitely without hearing something back. If it hears nothing for long
> > enough, it should send an IKE message requiring a reply, and if no reply
> > comes it must declare all of the SAs dead.
> > 
> > (3) A node that has packets to send according to its SPD and no SA to send
> > them on must periodically attempt to open an SA for them.
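Requirements (1) through (3) can be illustrated with a small Python sketch. It assumes a single IKE SA object that owns its child SPIs, a clock value `now` supplied by the caller, and a caller-provided `send_probe` callback. All names and timer parameters (`idle_limit`, `reply_timeout`, `retry_interval`) are hypothetical, since the spec deliberately leaves timer values unspecified.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class IkeSa:
    peer: str
    child_spis: List[int] = field(default_factory=list)
    last_heard: float = 0.0                # time anything was last received from the peer
    probe_sent_at: Optional[float] = None  # set while a liveness probe awaits a reply
    alive: bool = True

    def declare_dead(self) -> None:
        # Requirement (1): the IKE SA and all of its child SAs fail together.
        self.child_spis.clear()
        self.alive = False


def on_receive(sa: IkeSa, now: float) -> None:
    """Any authenticated message from the peer (including a probe reply) counts."""
    sa.last_heard = now
    sa.probe_sent_at = None


def on_send(sa: IkeSa, now: float, idle_limit: float, reply_timeout: float,
            send_probe: Callable[[IkeSa], None]) -> None:
    """Requirement (2): don't keep sending indefinitely without hearing back."""
    if not sa.alive:
        return
    if sa.probe_sent_at is not None:
        # A probe is outstanding; if its reply is overdue, everything dies.
        if now - sa.probe_sent_at >= reply_timeout:
            sa.declare_dead()
        return
    if now - sa.last_heard >= idle_limit:
        send_probe(sa)  # an IKE message that requires a reply
        sa.probe_sent_at = now


def should_attempt_setup(have_sa: bool, have_traffic: bool,
                         last_attempt: Optional[float], now: float,
                         retry_interval: float) -> bool:
    """Requirement (3): periodically retry SA setup while SPD traffic is pending."""
    if have_sa or not have_traffic:
        return False
    return last_attempt is None or now - last_attempt >= retry_interval
```

The sketch only shows *when* a node acts. As the next paragraph notes, how quickly the system converges depends entirely on the values chosen for these timers.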
> > 
> > I believe these three requirements alone guarantee that the right thing
> > will happen eventually. But the spec doesn't prescribe what the timers
> > should be, so it's possible it will take unacceptably long for things to
> > converge. (If network delays are long enough and timeouts short enough,
> > the system could fail to work at all, but I believe that problem is
> > unavoidable.)
> > 
> > The problem with more sophisticated strategies is that they may be
> > exploitable for denial of service attacks. Anyone can forge an INVALID_SPI
> > notification message from an IP address of their choice (since such a
> > message is not cryptographically protected). If such a message were
> > sufficient to cause its recipient to shut down and restart the SA, it would
> > be a very effective attack. So the spec says that such a message may be
> > used only as a hint of a problem, for example to trigger a
> > cryptographically protected liveness test. This will cause the failure to
> > be detected more quickly, but will never cause one to be detected falsely.
> > 
> > Similarly, the INITIAL_CONTACT notification can be used when setting up an
> > SA to assure the other end that it should abandon any SAs it has open to
> > the same identity. This is useful in, for example, the firewall case,
> > where an identity is tied to a single box and it would be an error for that
> > box to bring up two connections at once. It would not be useful in the case
> > of a user who is allowed to remotely log in from multiple workstations at
> > the same time. Again, this makes convergence happen faster while never
> > making the wrong thing happen.
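The INITIAL_CONTACT behavior can be sketched as a table keyed by authenticated identity. The sketch also covers the firewall-to-firewall policy of treating every new IKE SA as if it carried INITIAL_CONTACT. This is a minimal illustration: `IkeSaTable` and the `single_box_identity` flag are hypothetical, and the caller decides whether the returned SAs are deleted immediately or after a wait period.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class IkeSaTable:
    # Authenticated peer identity -> SPIs of the IKE SAs currently open to it.
    by_identity: Dict[str, List[int]] = field(default_factory=dict)

    def add(self, identity: str, spi: int, initial_contact: bool,
            single_box_identity: bool = False) -> List[int]:
        """Record a newly authenticated IKE SA and return the SPIs to abandon.

        single_box_identity is a hypothetical per-peer policy flag for the
        firewall-to-firewall case, where any new IKE SA is treated as if it
        carried INITIAL_CONTACT.
        """
        existing = self.by_identity.get(identity, [])
        if initial_contact or single_box_identity:
            # The peer has no other state, so any older SAs are stale.
            self.by_identity[identity] = [spi]
            return existing
        # Otherwise parallel SAs are legitimate (e.g. one user, many hosts).
        self.by_identity[identity] = existing + [spi]
        return []
```

With `initial_contact=False` and no single-box policy, a user logging in from two workstations keeps both SAs. The same identity reconnecting with INITIAL_CONTACT causes the older SAs to be returned for deletion.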
> > 
> > Responding to the individual comments below...
> > 
> > Gregory Lebovitz <Gregory@netscreen.com> wrote on 04/29/2003:
> > 
> >> [WE] won't achieve interoperability unless it's mandated that
> >>[IMPLEMENTORS] must
> >>
> >>>reply INVALID_SPI (in clear or initiate IKE back to the sender) whenever
> >>>they receive bad SPI packets. The current IKEv2 draft doesn't address
> >>>this issue (it only states you MAY reply with a clear notify message).
> >>>
> >>>IKEv1 vendors have implemented many ways to solve it, which leaves poor
> >>>interoperability. We should just pick a method and clarify it in IKEv2.
> >>>===============
> >>>Michael Shieh
> >>>
> >>
> > I think we did, but if you don't think it works, explain why.
> > 
> > 
> >>We have been having quite a debate on the ICSA IPsec consortium mailing
> >>list recently, trying to figure out how to handle this in IKEv1 (YES,
> >>STILL!!!)
> >>
> >>Here is what we know for sure about this problem statement:
> >>(a) detecting liveness/deadness of a peer is a good thing, but does not
> >>solve all the failure cases in and of itself
> > 
> > Which ones does it not solve?
> > 
> > 
> >> (b) the behavior of a recently rebooted device when it receives an
> >>encrypted packet for an SPI or IKE-SA not in its SADB MUST be mandated, or
> >>else implementations will not interoperate (as is the case in IKEv1, 5
> >>years later).
> > 
> > Can you give an example of how two implementations following IKEv2 could
> > fail to interoperate?
> > 
> > 
> >> (c) the behavior of a peer that receives a new IKE from a peer that it
> >>has an existing IKE-SA with (i.e. the rebooted peer that is trying to
> >>initiate a new connection) MUST be mandated, or else implementations will
> >>not interoperate (as is the case in IKEv1, 5 years later).
> > 
> > I believe it is mandated that the new IKE-SA must be accepted, and the old
> > one either closed immediately or closed after a timeout, though perhaps
> > that's just what I was thinking and not what I wrote. Is there anything
> > specific you would recommend?
> > 
> > 
> >>Darren Dukes wrote:
> >>
> >>>I believe INVALID_SPI does what you are looking for. If I receive an
> >>>INVALID_SPI notify via an IKE SA, I know to delete the SA, and traffic
> >>>will bring up a new one.
> >>
> >>I don't believe this will work, since it assumes that an IKE SA is
> >>established. In the scenario, the IKE-SA would have been lost along with
> >>the SPI of the CHILD-SA by the rebooted peer.
> >>
> > 
> > Until a new IKE-SA is established, any INVALID_SPI message would be
> > cryptographically unprotected and therefore not to be taken as other
> > than a hint. If a new IKE-SA is established, the INVALID_SPI could
> > be taken as trustworthy and used to abandon the old SA. Without the
> > INVALID_SPI message, abandonment would still happen, but it would take
> > longer.
> > 
> > 
> >>Recommendations to solve the problem:
> >>- the empty notify as an aliveness check is a good idea. It accomplishes
> >>what the DPD draft did. Keep using this.
> >>
> > 
> > Generating them is not mandated, but the ability to respond to them is.
> > 
> > 
> >>- do what you can to use empty notify to detect a dead peer ASAP. The
> >>faster the persisting peer can delete the old SPI and IKE-SA, the better.
> >>The best case is for the Persisting Peer to detect death and initiate new
> >>IKE to the rebooted peer before the rebooted peer gets packets with the
> >>old SPI, IKE-SA.
> >>
> > 
> > If the rebooted peer knows that the SA is needed, it can do that. If it
> > sets them up based on traffic, it has to wait until a packet comes in from
> > one side or the other.
> > 
> > 
> >>- On the Rebooted peer side: if an implementation receives a protected
> >>packet with an unknown SPI,
> >> - simply relying on sending back an unprotected INVALID_SPI is not a good
> >>idea. It is too easy to DoS the persisting peer by simply spoofing the
> >>rebooted peer's address.
> >> - initiate IKE to the persisting peer.
> > 
> > This is allowed, although sending what look like protected messages from
> > randomly chosen IP addresses to cause the node to attempt lots of IKE
> > connections is also a plausible DOS attack. Sending the INVALID_SPI message
> > will tell the other end to probe this end for liveness and initiate its own
> > new IKE connection if that liveness test fails. That's the path guaranteed
> > to work. Others will speed things up if implementations choose to do them.
> > 
> > 
> >>- On the Persisting Peer:
> >> - If you get a new IKE request from a peer already in your SADB, respond
> >>with the under-attack, 6-message method. This will mitigate the DoS
> >>attack. If you get all the way through SA and TS negotiation
> >>successfully, you are assured (unless I'm missing something) that this
> >>really is your peer, and that he re-initiated because he lost the
> >>original IKE-SA. Start using the new IKE-SA and the new CHILD-SA, and
> >>delete the previous ones after some wait period.
> >>
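The "under-attack, 6 message method" presumably refers to IKEv2's stateless cookie exchange: the responder answers the first IKE_SA_INIT with a COOKIE notification and keeps no state until the initiator repeats its request with that cookie, giving four IKE_SA_INIT messages plus the two IKE_AUTH messages. The sketch below follows the cookie construction later published in RFC 4306 and RFC 7296 (section 2.6), Cookie = <VersionIDofSecret> | Hash(Ni | IPi | SPIi | <secret>). SHA-256, the 4-byte version field, the two-secret rotation policy, and the class and method names are assumptions for illustration.

```python
import hashlib
import hmac
import os

VERSION_LEN = 4                             # assumed size of the secret-version field
DIGEST_LEN = hashlib.sha256().digest_size   # 32 bytes


class CookieIssuer:
    """Stateless cookies: Cookie = VersionID | SHA-256(Ni | IPi | SPIi | secret)."""

    def __init__(self) -> None:
        self.version = 0
        self.secrets = {0: os.urandom(32)}

    def _digest(self, version: int, ni: bytes, ip_i: bytes, spi_i: bytes) -> bytes:
        return hashlib.sha256(ni + ip_i + spi_i + self.secrets[version]).digest()

    def issue(self, ni: bytes, ip_i: bytes, spi_i: bytes) -> bytes:
        """Cookie for a first IKE_SA_INIT; nothing about the initiator is stored."""
        return (self.version.to_bytes(VERSION_LEN, "big")
                + self._digest(self.version, ni, ip_i, spi_i))

    def verify(self, cookie: bytes, ni: bytes, ip_i: bytes, spi_i: bytes) -> bool:
        """Recompute the cookie from the retried request and compare."""
        if len(cookie) != VERSION_LEN + DIGEST_LEN:
            return False
        version = int.from_bytes(cookie[:VERSION_LEN], "big")
        if version not in self.secrets:
            return False
        expected = self._digest(version, ni, ip_i, spi_i)
        return hmac.compare_digest(cookie[VERSION_LEN:], expected)

    def rotate(self) -> None:
        """Start a new secret; keep only the previous one for in-flight cookies."""
        self.version += 1
        self.secrets[self.version] = os.urandom(32)
        self.secrets = {v: s for v, s in self.secrets.items()
                        if v >= self.version - 1}
```

Because the cookie binds the initiator's IP address, a spoofing attacker never sees the cookie and cannot complete the exchange. This is the property Gregory relies on to mitigate the DoS.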
> > 
> > Only if there is an INITIAL_CONTACT notification message. Otherwise it's
> > possible that the peer is opening multiple IKE SAs, perhaps because he is
> > replicated. In some configurations this might be acceptable. In firewall to
> > firewall tunnels, it would not, and an implementation might reasonably
> > treat any IKE-SA as an INITIAL_CONTACT.
> > 
> > 
> >>Would this proposal explicitly solve things?
> >>
> >>Gregory.
> > 
> > 
> >       --Charlie
> > 
> 
> 
> -- 
> 
> 
> The views presented in this mail are completely mine. The company is not
> responsible for them whatsoever.
> ------------------------------------------------------------------------
> Ravi Kumar CH
> Rendezvous On Chip (i) Pvt Ltd
> Hyderabad, India
> Ph: +91-40-2335 1214 / 1175 / 1184
> 
> ROC home page <http://www.roc.co.in>