
RE: Heartbeats Straw Poll

Yes. All good points, Tero.

Beauty without truth is insubstantial.
Truth without beauty is unbearable.

> -----Original Message-----
> From: owner-ipsec@lists.tislabs.com
> [mailto:owner-ipsec@lists.tislabs.com]On Behalf Of Tero Kivinen
> Sent: Wednesday, August 09, 2000 9:17 AM
> To: Theodore Ts'o
> Cc: ebooth@cisco.com; smb@research.att.com; sommerfeld@East.Sun.COM;
> sfanning@cisco.com; warlord@mit.edu; skelly@redcreek.com;
> paul.hoffman@vpnc.org; ipsec@lists.tislabs.com
> Subject: Re: Heartbeats Straw Poll
> Theodore Ts'o writes:
> > Neither of these (accounting and returning IP addresses to a DHCP pool)
> > are IPSEC issues.  This is stuff you have to deal with even if you're
> > not using IPSEC.  Hence, solving it with an IPSEC-specific solution
> > seems like we're barking up the wrong tree.
> Most of the NAT traversal proposals that encapsulate IPsec inside UDP
> packets need some kind of keepalive protocol to keep the NAT from
> deleting the UDP "connection".
> In that case it doesn't matter whether it is a phase 1 or phase 2 "ping".
> Also a comment for those who complain about keepalives being
> make-deads: that is NOT TRUE for IPsec traffic. Your TCP/IP
> session is not dead even if the IPsec SA is removed. The SA will be
> recreated immediately when you send your next packet to that
> connection. On the other hand, if you do not detect black holes, then
> your TCP/IP session will be dead, because it will be sending packets
> to a deleted SA and the other end will just ignore them. After the
> TCP/IP timeout expires, the connection is dropped.
> I think heartbeats are needed to tell us when to start
> renegotiating the SAs with the other end and when to use existing
> ones. And I would really like to get that information as soon as
> possible.
> It does not mean that when the heartbeats stop coming in, I immediately
> have to delete all SAs with that host. It means that I will mark that
> SA as being in an uncertain state, and if I run out of resources I will
> delete that SA.
> If there was only a temporary loss of network connectivity, I will start
> seeing the heartbeats again later, so I can move the SA back to the
> active SA list.
> Also, if there is traffic coming to an SA that is in the uncertain state,
> I will first try to create a new SA for it, just in case the other end
> really was rebooted. If the other end was rebooted, it will send me an
> INITIAL-CONTACT notification, so I know that I can remove the old SA.
> If it wasn't rebooted, but there was a temporary network problem that
> happened to get fixed just in time to get the SA renegotiated, but
> before the other end's heartbeats started coming in, then I now have two
> valid SAs with the other end and I can delete one of them to save
> resources, or I can just leave them there.
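
The bookkeeping described above can be sketched roughly as follows.
This is only an illustration under stated assumptions, not code from
any IPsec implementation: the names SATable, SAState, tick, reclaim
and on_initial_contact, and the 30-second timeout, are all hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class SAState(Enum):
    ACTIVE = "active"
    UNCERTAIN = "uncertain"


@dataclass
class SA:
    peer: str          # remote gateway/host (hypothetical identifier)
    spi: int
    state: SAState = SAState.ACTIVE


class SATable:
    """Hypothetical SA table that tracks heartbeats per peer."""

    def __init__(self, heartbeat_timeout=30.0):
        # The timeout value is an assumption chosen for the example.
        self.heartbeat_timeout = heartbeat_timeout
        self.sas = []
        self.last_heartbeat = {}   # peer -> time of last heartbeat

    def add(self, sa, now):
        self.sas.append(sa)
        self.last_heartbeat.setdefault(sa.peer, now)

    def on_heartbeat(self, peer, now):
        # Heartbeats are back: move the peer's SAs to the active list.
        self.last_heartbeat[peer] = now
        for sa in self.sas:
            if sa.peer == peer:
                sa.state = SAState.ACTIVE

    def tick(self, now):
        # Missing heartbeats only mark SAs uncertain; nothing is deleted.
        for sa in self.sas:
            if now - self.last_heartbeat[sa.peer] > self.heartbeat_timeout:
                sa.state = SAState.UNCERTAIN

    def reclaim(self, max_sas):
        # Under resource pressure, uncertain SAs are the ones dropped.
        excess = len(self.sas) - max_sas
        if excess <= 0:
            return []
        victims = [sa for sa in self.sas
                   if sa.state is SAState.UNCERTAIN][:excess]
        victim_ids = {id(sa) for sa in victims}
        self.sas = [sa for sa in self.sas if id(sa) not in victim_ids]
        return victims

    def on_initial_contact(self, peer, new_sa, now):
        # The peer rebooted: its old SAs are gone, keep only the new one.
        self.sas = [sa for sa in self.sas if sa.peer != peer]
        self.last_heartbeat[peer] = now
        self.sas.append(new_sa)
```

If a renegotiation succeeds without INITIAL-CONTACT, the sketch simply
ends up holding both SAs, which matches the "leave them there" option.
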
> It is very annoying to know that the remote gw was rebooted while your
> local gw still thinks that it has SAs with the remote gw and sends stuff
> into a black hole. There are currently three easy ways to recover from
> the problem:
> 	1) Reboot the local gw (== remove all SAs). I think this is
> 	   the option that most people will select: it is easy and
> 	   fast, and after that everything works fine for them. Of
> 	   course, if you had multiple VPN connections, the rest of
> 	   the connections are now in the same situation (the remote
> 	   gw was rebooted), and if you are not sending data to
> 	   them, they have to do the same thing :-)
> 	2) Manually delete all SAs from the local gw to the remote gw
> 	3) Call somebody in the remote office and ask them to send
> 	   some packet to you, so that when the remote gw starts
> 	   negotiation with your local gw, it will include an
> 	   INITIAL-CONTACT notification, which will then clear out
> 	   the SAs in your local gw.
> Most of the products have some kind of crash-recovery algorithm that
> does some heuristic stuff based on unknown SPIs etc. to get over
> this problem. That code is quite complicated and usually it doesn't
> work that well...
> How often have you heard this in the IPsec interop meetings: "Let's
> clear all SAs and try again, I think there were some old SAs left from
> the previous test still lying around...".
> So, I think we do need some kind of black hole detection, and probably
> also some kind of keepalive for the NAT traversal case.
> A heartbeat in phase 1 is fine for both cases. Pings in phase 2 are fine
> also, but not as simple as people are trying to claim (also, we
> don't want one ping per SA; we need one ping per GW/HOST pair, since
> there might be thousands of SAs between two machines). A birthday
> certificate with an INVALID-SPI notification would be excellent for
> black hole detection, but we still probably need something else for
> the NAT traversal case.
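
To illustrate the "one ping per GW/HOST pair" point, here is a minimal
sketch. The function name heartbeat_targets and the (local, remote, spi)
tuple layout are assumptions made for the example, not part of any
proposal discussed in this thread.

```python
def heartbeat_targets(sas):
    """Collapse SAs into the distinct (local, remote) pairs to ping.

    `sas` is an iterable of (local, remote, spi) tuples (hypothetical
    layout). However many SAs two machines share, the pair shows up
    only once in the result.
    """
    return sorted({(local, remote) for local, remote, _spi in sas})


# Thousands of SAs between the same two gateways still need one heartbeat.
many_sas = [("gw-a", "gw-b", spi) for spi in range(5000)]
many_sas.append(("gw-a", "gw-c", 1))
targets = heartbeat_targets(many_sas)
```

Here targets holds two entries, one for each remote gateway, rather
than 5001.
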
> -- 
> kivinen@ssh.fi                               Work : +358 303 9870
> SSH Communications Security                  http://www.ssh.fi/
> SSH IPSEC Toolkit                            http://www.ssh.fi/ipsec/