
Re: Heartbeats Straw Poll



Theodore Ts'o writes:
> Neither of these (accounting and returning IP addresses to a DHCP pool)
> are IPSEC issues.  This is stuff you have to deal with even if you're
> not using IPSEC.  Hence, solving it with an IPSEC-specific solution
> seems like we're barking up the wrong tree.

Most of the NAT traversal proposals that encapsulate IPsec inside UDP
packets need some kind of keepalive protocol to keep the NAT from
deleting the UDP "connection".

In that case it doesn't matter whether it is a phase 1 or a phase 2
"ping".
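
As a rough illustration of the NAT keepalive timing, here is a minimal
Python sketch. The names keepalive_due, nat_timeout and safety_factor are
hypothetical and not taken from any proposal; it assumes that any
outbound packet refreshes the NAT mapping, and that the endpoint only
has a guess at the NAT's idle timeout, so it sends well before that
guess expires.

```python
def keepalive_due(last_outbound: float, now: float,
                  nat_timeout: float, safety_factor: float = 0.5) -> bool:
    """Return True when a keepalive should be sent to refresh the NAT mapping.

    last_outbound: time (seconds) the last packet of any kind was sent
    nat_timeout:   assumed idle timeout of the NAT's UDP mapping (a guess)
    safety_factor: fraction of nat_timeout after which we send (assumed)
    """
    idle = now - last_outbound
    return idle >= nat_timeout * safety_factor
```

With an assumed 60 second timeout and the default factor, a keepalive
becomes due after 30 seconds without outbound traffic, regardless of
whether it is carried as a phase 1 or a phase 2 message.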

Also a comment for those who complain about keepalives being
make-deads: that is NOT TRUE for IPsec traffic. Your TCP/IP
session is not dead even if the IPsec SA is removed. The SA will be
recreated immediately when you send your next packet on that
connection. On the other hand, if you do not detect black holes then
your TCP/IP session will die, because it will keep sending packets
to a deleted SA and the other end will just ignore them. After the
TCP/IP timeout the connection is dropped.

I think heartbeats are needed to tell when to start renegotiating
the SAs with the other end and when to keep using the existing
ones. And I would really like to get that information as soon as
possible.

It does not mean that when the heartbeats stop coming in, I immediately
have to delete all SAs with that host. It means that I will mark that
SA as being in an uncertain state, and if I run out of resources I will
delete that SA.

If there was only a temporary loss of network connectivity, I will start
seeing the heartbeats again later, so I can move the SA back to the
active SA list.
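
The bookkeeping described above (mark as uncertain when heartbeats stop,
delete only under resource pressure, move back to active when heartbeats
resume) could be sketched roughly as follows. This is only an
illustration in Python, not code from any product: SaTable, Sa,
heartbeat_timeout and max_sas are all hypothetical names and knobs.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List


class SaState(Enum):
    ACTIVE = "active"
    UNCERTAIN = "uncertain"


@dataclass
class Sa:
    spi: int
    peer: str
    state: SaState = SaState.ACTIVE


class SaTable:
    """Hypothetical SA table; timeout and size limit are assumed knobs."""

    def __init__(self, heartbeat_timeout: float, max_sas: int) -> None:
        self.heartbeat_timeout = heartbeat_timeout
        self.max_sas = max_sas
        self.sas: Dict[int, Sa] = {}
        self.last_heartbeat: Dict[str, float] = {}

    def on_heartbeat(self, peer: str, now: float) -> None:
        # Heartbeats are back: move this peer's SAs to the active list.
        self.last_heartbeat[peer] = now
        for sa in self.sas.values():
            if sa.peer == peer:
                sa.state = SaState.ACTIVE

    def tick(self, now: float) -> None:
        # Missing heartbeats only mark SAs as uncertain; nothing is deleted.
        for sa in self.sas.values():
            last = self.last_heartbeat.get(sa.peer, now)
            if now - last > self.heartbeat_timeout:
                sa.state = SaState.UNCERTAIN

    def evict_uncertain(self) -> List[int]:
        # Called only when we run out of resources.
        victims = [spi for spi, sa in self.sas.items()
                   if sa.state is SaState.UNCERTAIN]
        for spi in victims:
            del self.sas[spi]
        return victims

    def add(self, sa: Sa, now: float) -> None:
        if len(self.sas) >= self.max_sas:
            self.evict_uncertain()
        if len(self.sas) >= self.max_sas:
            raise RuntimeError("SA table full and no uncertain SA to delete")
        self.sas[sa.spi] = sa
        self.last_heartbeat.setdefault(sa.peer, now)
```

Note that tick() never deletes anything; deletion happens only in add(),
when the table is full, which matches the idea that a missing heartbeat
is a hint and not a death sentence.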

Also, if there is traffic coming to an SA that is in the uncertain state,
I will first try to create a new SA for it, just in case the other end
really was rebooted. If the other end was rebooted, it will send me an
INITIAL-CONTACT notification so I know that I can remove the old SA.
If it wasn't rebooted, but there was a temporary network problem that
happened to get fixed just in time to get the SA renegotiated, but
before the other end's heartbeats started coming in, then I now have two
valid SAs with the other end and I can delete one of them to save
resources, or I can just leave them there.
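
The decision after such a renegotiation can be written down as a small
Python function. This is a sketch under assumptions: surviving_sas and
keep_duplicates are hypothetical names, and the choice to drop the older
SA (rather than the new one) when saving resources is my assumption for
the example; the text above does not say which of the two to delete.

```python
from typing import List


def surviving_sas(old_spis: List[int], new_spi: int,
                  initial_contact: bool,
                  keep_duplicates: bool = True) -> List[int]:
    """Return the SPIs to keep for one peer after a renegotiation."""
    if initial_contact:
        # The peer rebooted and lost its state: the old SAs lead to a
        # black hole, so only the freshly negotiated SA survives.
        return [new_spi]
    if keep_duplicates:
        # The peer never rebooted: old and new SAs are all valid.
        return list(old_spis) + [new_spi]
    # Assumed policy: save resources by keeping only the newest SA.
    return [new_spi]
```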

It is very annoying to know that the remote gw was rebooted while your
local gw still thinks that it has SAs with the remote gw and sends stuff
into a black hole. There are currently three easy ways to recover from
the problem:

	1) Reboot the local gw (== remove all SAs). I think this is
	   the option that most people will select: it is easy and
	   fast, and after that everything works fine for them. Of
	   course, if you had multiple VPN connections, the rest of
	   the connections are now in the same situation: from their
	   point of view the remote gw was rebooted, and if you are
	   not sending data to them, they have to do the same thing :-)
	2) Manually delete all SAs from the local gw to the remote gw
	3) Call somebody in the remote office and ask them to send
	   some packet to you, so that when the remote gw starts
	   negotiation with your local gw, it will include an
	   INITIAL-CONTACT notification, which will then clear out
	   the SAs in your local gw.

Most products have some kind of crash-recovery algorithm that
does some heuristic stuff based on unknown SPIs etc. to get over
this problem. That code is quite complicated and usually doesn't
work that well...

How often have you heard this in the IPsec interop meetings: "Let's
clear all SAs and try again, I think there were some old SAs left from
the previous test still lying around...".

So, I think we do need some kind of black hole detection, and we
probably also need some kind of keepalive for the NAT traversal case.

A heartbeat in phase 1 is fine for both cases. Pings in phase 2 are
fine too, but they are not as simple as people are trying to claim
(also, we don't want one ping per SA; we need one ping per GW/HOST pair,
since there might be thousands of SAs between two machines). A birthday
certificate with an INVALID-SPI notification would be excellent for
black hole detection, but we still probably need something else for the
NAT traversal case.
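
To make the "one ping per GW/HOST pair" point concrete, here is a small
Python sketch that collapses an SA list into the set of peer pairs that
actually need a heartbeat. The function name and the (spi, local,
remote) tuple layout are assumptions made for this example only.

```python
from typing import Iterable, Set, Tuple

# One SA is represented here as (spi, local_address, remote_address).
SaEntry = Tuple[int, str, str]


def heartbeat_targets(sas: Iterable[SaEntry]) -> Set[Tuple[str, str]]:
    """Return one (local, remote) pair per peer, however many SAs exist."""
    return {(local, remote) for _spi, local, remote in sas}
```

A thousand SAs between the same two machines then produce a single
heartbeat target instead of a thousand pings.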
-- 
kivinen@ssh.fi                               Work : +358 303 9870
SSH Communications Security                  http://www.ssh.fi/
SSH IPSEC Toolkit                            http://www.ssh.fi/ipsec/

