
Re: I-D ACTION:draft-ietf-ipsec-udp-encaps-00.txt



David_Mason@nai.com ("Mason, David") writes:
> > The 5 minute interval of keep-alives (keep-alives still sent
> > every 20 (or X) seconds) after all SAs are down is to allow
> > for the case for the peer external to the NAT to potentially
> > decide that it needs to rekey the MM, and still have the NAT
> > hole poked so that the IKE traffic could traverse the NAT and
> > reach the internal machine.  This is a tradeoff between
> > maintaining lots of state and sending keep-alives vs.
> > potentially breaking upper layer connections.
> The paragraph as worded implies that a single keepalive packet
> might be sent after an elapsed period of up to N (5) minutes.
> I would suggest moving the paragraph to after the paragraph
> below it and re-wording it to something like:
> 
> "A peer MAY continue to send NAT-keepalive packets every M seconds
> for up to N minutes after the last Phase I and Phase II SA between
> the peers has been deleted.  This will keep the NAT mappings alive
> and allow the peer external to the NAT to initiate a rekey if it
> decides it needs to.  N is a locally ..."
> 
> Also I would think that the 5 minute default is too long.  If VPN
> wasn't involved the "server" wouldn't normally be able to connect
> back to the "client" after 5 minutes so I'm not sure why this
> capability should be provided for VPN.  One or two keepalive
> packets after the last SA has been deleted should be more than
> sufficient as the default.  Does the long 5 minute default have
> something to do with the fact that the keepalive mechanism
> happens at the IP layer, and for some implementations it might
> be problematic to know when the last Phase 1 SA has been
> deleted?  I guess this is a problem of dangling Phase 1 SAs :-)

I did not write this draft, but I assume the long default is mainly
usability-oriented (i.e., to tolerate P1/P2 rekey delays).
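As I read it, the policy under discussion is roughly the following (a minimal sketch; the constant names and the idea of modelling it as a single predicate are mine, not the draft's):

```python
# Sketch of the NAT-keepalive policy being discussed: send a keepalive
# every M seconds while SAs are up, and keep sending for up to N minutes
# after the last Phase 1/Phase 2 SA is deleted, so the peer outside the
# NAT can still initiate a rekey through the existing NAT mapping.
# M=20 and N=300 are the draft's example values; names are mine.
KEEPALIVE_INTERVAL = 20      # M: seconds between keepalives
LINGER_AFTER_LAST_SA = 300   # N: seconds to keep the NAT hole open

def should_send_keepalive(now, last_sent, sas_up, last_sa_deleted_at):
    """Return True if a NAT-keepalive should go out at time `now`."""
    if now - last_sent < KEEPALIVE_INTERVAL:
        return False                 # rate-limit to one per interval
    if sas_up:
        return True                  # SAs alive: always keep the hole open
    # All SAs gone: linger for N seconds so a rekey can still reach us.
    return now - last_sa_deleted_at < LINGER_AFTER_LAST_SA
```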

_If_ IPsec SAs and P1 SAs are not tied together, the P1 SA may be down
while IPsec SAs are still active.

Let's walk through a hypothetical scenario:

 - IPsec SA expires; we're dumb and have only a hard limit, so the SA is
 really zapped _when_ we start the rekey
 - the P1 SA was already down - oops
 - negotiate a new P1 SA, then do P2
 - negotiate P1 SA, do P2

Example:

 We are using a lossy GPRS backbone (I have enjoyed one for remote access):
 we can expect, say, ~1 kB/s bandwidth, 1 s roundtrip and ~30% UDP packet loss.

Assuming either 

 [a] jumbo content (I've seen an IKE Phase I+II exchange that took about a
 hundred kB), or

 [b] bad luck with packet loss (with relaxed retry timers, a reasonable
 10-20 s initial retry interval, and exponential growth between retries,
 it takes only a few resends to push the next resend into the minutes
 range)

a) indicates a user/admin with an attitude problem, but b) is, I think, a
fairly common occurrence.
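Point (b) is easy to sanity-check with back-of-envelope arithmetic (the concrete numbers below are illustrative assumptions, not from the draft):

```python
# How quickly exponential backoff pushes an IKE resend past the 300 s
# keepalive window. Assumed values: 10 s initial retry interval,
# doubling on each retry.
def retry_schedule(initial=10, factor=2, retries=5):
    """Cumulative elapsed time (seconds) at each resend attempt."""
    waits, elapsed, interval = [], 0, initial
    for _ in range(retries):
        elapsed += interval
        waits.append(elapsed)
        interval *= factor
    return waits

print(retry_schedule())  # [10, 30, 70, 150, 310]
```

So after only five lost transmissions, the next resend lands past the 300 s mark; with ~30% per-packet loss on the GPRS link above, losing a few in a row is not at all exotic.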

Admittedly, I think that for _most_ applications, say, 10 s would be
enough, but I am not sure that's a good design criterion for a
general-purpose protocol. Too bad IPsec isn't more connection-oriented,
so we could just say "I'm done with you, sod off."

I won't even go into using this to fight NASA's NAT for a probe going
around the Sun, because I assume that's a bit out of context and the
delay's too small ;-)

Summary:
 _If_ you assume the network isn't lossy, supporting a 66 s roundtrip is
 stupid, but packet losses combined with reasonable IKE retry behavior
 can make 300 s quite a short time.

> To really be reliable the IPsec system behind the NAT device will
> need to keep TCP connection state and generate keepalives while
> there are open TCP connections (no need to keep the IPsec/IKE SAs
> active though).  The NAT device used to keep the TCP state for
> them but since TCP is now encapsulated in ESP/UDP the NAT mappings
> no longer have state and just have the UDP timeout.  The IPsec
> system must now assume that burden if it wants to provide the
> same functionality that would exist without using VPN.  There is
> probably a better solution to this than sending a packet every
> 20 seconds for TCP connections that are idle for a very long time.
> But I don't think just doing it for five minutes is a solution either.

Doing keepalives _only_ while TCP sessions are open is a bad idea; when
the NAT mapping dies along with the TCP sessions, new sessions become
problematic:

 - send (some) messages, along with keepalives
 - no reply at all? (I guess you'd need DPD for this) => zap the P1/P2 SAs
 - negotiate a new P1 SA
 - negotiate a new P2 SA

Which would mean quite a major delay as far as QoS is concerned.
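A rough estimate of that delay, using the 1 s RTT GPRS link from the earlier example and ignoring packet loss (the DPD probe count and interval are my assumptions; the message counts are standard IKEv1 Main Mode and Quick Mode):

```python
# Back-of-envelope delay for the recovery sequence above on a 1 s RTT link.
RTT = 1.0
dpd_detect = 3 * 10          # assume 3 missed DPD probes at a 10 s interval
main_mode  = 3 * RTT         # Main Mode: 6 messages = 3 round trips
quick_mode = 1.5 * RTT       # Quick Mode: 3 messages = 1.5 round trips
total = dpd_detect + main_mode + quick_mode
print(total)  # 34.5 s of dead air before traffic flows again
```

And that is the lossless best case; with ~30% packet loss and retry backoff, each exchange stretches further.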

The second choice, simply making P2 SAs last as long as there is TCP
activity (i.e., just do a P1 with the INITIAL-CONTACT flag set once we
want to restart traffic), works somewhat better, although it breaks (at
least the spirit of) the RFCs _and_ causes major CPU hogging due to
unnecessary P1/P2 negotiations.

> -dave

-Markus

-- 
Markus Stenberg <stenberg@ssh.com> of SSH Communications Security (www.ssh.com)