
Re: Path MTU Discovery



> Path-mtu discovery breaks in the presence of multiple IPsec
> encapsulation(*) (it might even break in the presence of ONE
> intermediate encapsulating entity). 

Are you sure it totally breaks?  It doesn't work as well, for sure, but I
don't see total breakage.

An encrypting router often has a "tunnel interface" that'll have properties
like:

	tun0:	10.69.0.0/16  --> 10.9.1.25

Interfaces have an MTU associated with them.  Combining Path MTU discovery
from the router to the tunnel endpoint with knowledge of the algorithms in use
(and hence the bytes they add to each packet) can give this tunnel interface a
reasonable MTU estimate, reasonable enough that an ICMP toobig message can be
sent for datagrams bound for 10.69.0.0/16.
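As a rough illustration of that estimate, here is a minimal Python sketch that turns a Path MTU to the tunnel endpoint into a tunnel-interface MTU. It assumes tunnel-mode ESP in the packet layout later standardized in RFC 2406, an 8-byte cipher block and IV, and a 12-byte integrity check value. Those figures and the function names are assumptions for illustration, not properties of any particular router.

```python
# Assumed overheads for tunnel-mode ESP (RFC 2406 layout) with an 8-byte-block
# cipher and a 12-byte integrity check value; adjust for the real algorithms.
OUTER_IPV4_HEADER = 20
ESP_HEADER = 8        # SPI + sequence number
IV_LEN = 8
CIPHER_BLOCK = 8
ESP_TRAILER = 2       # pad length + next header
ICV_LEN = 12


def esp_tunnel_size(inner_len: int) -> int:
    """Size on the wire of one inner datagram after tunnel-mode ESP."""
    # Inner datagram plus trailer is padded up to a whole number of cipher blocks.
    encrypted = inner_len + ESP_TRAILER
    encrypted += -encrypted % CIPHER_BLOCK
    return OUTER_IPV4_HEADER + ESP_HEADER + IV_LEN + encrypted + ICV_LEN


def tunnel_interface_mtu(path_mtu_to_endpoint: int) -> int:
    """Largest inner datagram whose encapsulated form fits the endpoint's Path MTU."""
    room = path_mtu_to_endpoint - OUTER_IPV4_HEADER - ESP_HEADER - IV_LEN - ICV_LEN
    # Round down to whole cipher blocks, then leave room for the ESP trailer.
    return (room // CIPHER_BLOCK) * CIPHER_BLOCK - ESP_TRAILER
```

With a 1500-byte Path MTU to 10.9.1.25, tunnel_interface_mtu(1500) gives 1446: a 1446-byte inner datagram encapsulates to 1496 bytes, while a 1447-byte one pads out to 1504 and would not fit.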

BTW, by Path MTU discovery to the endpoint, I mean that the router (because it
is originating packets now, from its address to the tunnel endpoint) has a
cache-entry/host-route/whatever for the tunnel endpoint.  That entry can be a
repository for intermediate Path MTU information.

	Incoming IP data                           Outgoing forward result
	-- src 10.8.20.69  -->  ROUTER (ifaddr) -> src 10.10.20.20
	   dst 10.69.21.12    <-(10.8.20.2)        dst 10.9.1.25
	   proto=TCP            (10.10.20.20)->    proto=ESP (with IP inside)

		Figure 1:  Demonstration of "originating packets".
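The cache entry described above can be modelled as a small per-destination table. This Python sketch is hypothetical (the class and method names are made up), but it follows the RFC 1191 rules that a toobig report only ever lowers an estimate, and never below the 68-byte IPv4 minimum.

```python
class PathMtuCache:
    """Per-destination Path MTU estimates, like a host-route entry on the router."""

    MIN_IPV4_MTU = 68  # RFC 791: every IPv4 module must handle 68-octet datagrams

    def __init__(self, interface_mtu: int):
        self.interface_mtu = interface_mtu
        self._entries: dict[str, int] = {}

    def get(self, dst: str) -> int:
        # With no entry yet, fall back to the outgoing interface's MTU.
        return self._entries.get(dst, self.interface_mtu)

    def on_too_big(self, dst: str, next_hop_mtu: int) -> None:
        # RFC 1191: a toobig report may only lower the estimate, never raise it,
        # and the estimate is clamped at the IPv4 minimum.
        new = max(next_hop_mtu, self.MIN_IPV4_MTU)
        if new < self.get(dst):
            self._entries[dst] = new
```

For Figure 1, a toobig naming 10.9.1.25 would update the entry for "10.9.1.25", and the tunnel interface's MTU would then be derived from get("10.9.1.25").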

Now let's say there's ANOTHER layer of encryption between the router above
and its tunnel endpoint of 10.9.1.25.  THAT router may send an ICMP toobig to
OUR router (10.10.20.20), saying the path to 10.9.1.25 has a smaller MTU than
OUR router thinks.  Because of the multiple nestings of IP, OUR router can't
percolate that ICMP toobig all the way back to the originator, but the
information will eventually percolate back.

It will take N dropped messages for N layers of tunnelling.  So if there's
one intermediate node's worth of encapsulation, the first message will be
dropped; then, when the next message hits the cranked-down tunnel interface,
a toobig can be sent back.
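To make the percolation concrete, here is a toy Python simulation under loudly simplified assumptions: every layer adds the same hypothetical 56 bytes, every datagram has DF set, the source always sends datagrams as large as its current Path MTU estimate, and each encapsulator initially accounts only for its own overhead (which is right only for the innermost one).

```python
def probe_nested_tunnels(n_layers: int, link_mtu: int = 1500, overhead: int = 56):
    """Toy model of a source sending DF datagrams through n_layers nested encapsulators.

    Returns (silent_drops, learned_path_mtu). A silent drop is a datagram rejected
    by an inner layer whose toobig went to the next layer out, not to the source.
    """
    # est[i]: encapsulator i's belief about the largest inner datagram it can
    # carry; layer 0 is nearest the source.
    est = [link_mtu - overhead] * n_layers
    source_pmtu = link_mtu
    silent_drops = 0
    while True:
        size = source_pmtu
        for i in range(n_layers):
            if size > est[i]:
                if i == 0:
                    source_pmtu = est[0]  # the toobig reaches the real source
                else:
                    # Layer i's toobig goes to layer i-1, which lowers its own
                    # estimate but cannot relay it to the original source.
                    est[i - 1] = min(est[i - 1], est[i] - overhead)
                    silent_drops += 1
                break
            size += overhead  # encapsulate and pass inward
        else:
            return silent_drops, source_pmtu  # fit through every layer
```

With one intermediate layer, probe_nested_tunnels(2) reports one silent drop before the source learns a Path MTU of 1388. For deeper nesting, the count in this model depends on how stale each layer's starting estimate is.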

Using Figure 1 above as an example, if a router says the path MTU to
10.9.1.25 is smaller, then the router's tunnel interface will ratchet down its
MTU.  The original packet from the source host 10.8.20.69 will just be
dropped, because the router doesn't really want to go digging deep for the
originator.

The next IP datagram from 10.8.20.69 will generate an ICMP toobig from the
router, because the router now has the ratcheted-down MTU on its tunnel
interface.  So, at the cost of one silently dropped datagram, node 10.8.20.69
learns the whole path MTU.
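From the encrypting router's point of view in Figure 1, the two steps above are separate event handlers. In this hypothetical Python sketch, the class names, the 56-byte overhead, and the return values are all invented for illustration. The toobig handler only has the outer header to work with. An ICMP error quotes the offending IP header plus at least the first 64 bits of its data (RFC 792). For an ESP packet, those bytes are the outer header and the start of the ESP header, so the inner source address is not available.

```python
from dataclasses import dataclass


@dataclass
class Datagram:
    src: str
    dst: str
    length: int  # DF is assumed set, as it is during Path MTU discovery


class TunnelRouter:
    """Toy model of the encrypting router in Figure 1 (hypothetical names)."""

    def __init__(self, endpoint: str, path_mtu: int, overhead: int):
        self.endpoint = endpoint  # tunnel endpoint, e.g. 10.9.1.25
        self.path_mtu = path_mtu  # cached Path MTU to that endpoint
        self.overhead = overhead  # assumed fixed encapsulation cost per packet

    @property
    def tunnel_mtu(self) -> int:
        return self.path_mtu - self.overhead

    def on_icmp_too_big(self, quoted_dst: str, next_hop_mtu: int) -> None:
        # The quote names only the outer destination; lower the estimate and
        # let the original datagram stay dropped.
        if quoted_dst == self.endpoint and next_hop_mtu < self.path_mtu:
            self.path_mtu = next_hop_mtu

    def forward(self, dgram: Datagram):
        """Return ("tunnel", outer_size), or ("too_big", icmp_dst, mtu) to report."""
        if dgram.length > self.tunnel_mtu:
            return ("too_big", dgram.src, self.tunnel_mtu)
        return ("tunnel", dgram.length + self.overhead)
```

Using Figure 1's addresses, a 1444-byte datagram from 10.8.20.69 is first tunnelled as a 1500-byte ESP packet and dropped further along. After the toobig for 10.9.1.25 arrives, the same datagram yields ("too_big", "10.8.20.69", 1388).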

If messages drop occasionally, that's fine.  This is IP.  Sure, it's a
performance hit, but security and performance are sometimes (note my choice
of words) opposites in a tradeoff.

I don't think your solution about keeping SPIs helps a whole lot here.  It
seems to be unnecessary implementation cruft.  If there are any flaws in what
I've said, however, I'd certainly like to know.

> This still doesn't address the problem of the original TCP mtu (the
> mtu of the outgoing interface could be less than that reported on the
> kernel structure, depending on whether a packet will be IPsec'ed or
> not). But i doubt we can mandate a solution for that.

As for the original TCP MSS, which needs to be set correctly, IP must be able
to pass a hint to the particular TCP session indicating that IP security will
lower the effective MSS for that connection.  I say it must alter only a
single TCP session because IP security should use per-endpoint security
properties where possible.  See Bellovin's USENIX Security '96 conference
paper for details on why.  See draft-mcdonald-simple-ipsec-api-00.txt for how
an application may exploit this.
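Here is a minimal sketch of such a per-session hint. It assumes a hypothetical ip_security_hint() hook and a fixed 40-byte IPsec overhead. Real ESP overhead varies with padding, so treat 40 as a placeholder.

```python
from dataclasses import dataclass

IPV4_HEADER = 20  # no IP options
TCP_HEADER = 20   # no TCP options


@dataclass
class TcpSession:
    remote: str
    local_port: int
    path_mtu: int
    ipsec_overhead: int = 0  # set only through the IP-layer hint below

    def mss(self) -> int:
        return self.path_mtu - self.ipsec_overhead - IPV4_HEADER - TCP_HEADER


def ip_security_hint(session: TcpSession, overhead: int) -> None:
    """Hypothetical IP-to-TCP hint: IPsec adds `overhead` bytes to this session only."""
    session.ipsec_overhead = overhead
```

Two connections to the same host can end up with different MSS values. Only the one that IP security will protect is reduced (1420 vs. 1460 with the placeholder overhead).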

> Also, there's the case of whether we accept as valid ICMPs from anyone
> in between (which means anyone) or just two encapsulating entities
> (e.g. two tunneling firewalls). The network-correct approach is
> anyone; the security correct is next enc entity.

Good point, and it applies to ICMP messages of all shapes, sizes, and
flavors.  An intermediate router could send its ICMP with AH on it; that way
I have reasonable assurance it came from a router with that IP address.
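The policy choice can be written as a single predicate. This Python sketch is hypothetical. The enum names are invented, and the AH flag assumes the IP layer has already verified the AH on the ICMP before it asks.

```python
from enum import Enum


class IcmpPolicy(Enum):
    ANYONE = "anyone"               # network-correct: accept from any router
    NEXT_ENCAPSULATOR = "next-enc"  # security-correct: only the next tunnel peer
    AH_AUTHENTICATED = "ah"         # only ICMPs that arrived with a valid AH


def accept_icmp(policy: IcmpPolicy, icmp_src: str, has_valid_ah: bool,
                next_encapsulators: set[str]) -> bool:
    if policy is IcmpPolicy.ANYONE:
        return True
    if policy is IcmpPolicy.NEXT_ENCAPSULATOR:
        # Without AH the source address is only a claim and could be spoofed.
        return icmp_src in next_encapsulators
    return has_valid_ah
```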

> (*) Steve Kent replied that it shouldn't break for an end host;

He is right.

> however, the 4.4BSD TCP code checks the outgoing interface MTU
> directly to determine the size of the packets, if the route entry does not
> have an mtu (check tcp_input.c, tcp_mss()). This means that either TCP
> is patched, or fragmentation will happen.

Stock 4.4BSD is broken with respect to Path MTU discovery.  FreeBSD has one
solution for doing this with IPv4.  The NRL IPv6 code has another solution,
which it implements on the IPv6 side of things.
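The following is a simplified Python model of the MSS choice described in the quoted text. It is not the tcp_mss() source, only the decision as described. The route entry's MTU wins when present; otherwise the outgoing interface MTU is used, and the stock path never subtracts IPsec overhead. The 40-byte overhead in the usage below is a placeholder.

```python
IP_TCP_HEADERS = 40  # IPv4 + TCP headers without options


def choose_mss(route_mtu, interface_mtu: int, ipsec_overhead: int = 0) -> int:
    """Simplified model of the MSS choice as described for 4.4BSD tcp_mss().

    route_mtu is the MTU cached on the route entry, or None if unset. The stock
    behaviour corresponds to ipsec_overhead=0.
    """
    mtu = route_mtu if route_mtu is not None else interface_mtu
    return mtu - ipsec_overhead - IP_TCP_HEADERS
```

choose_mss(None, 1500) gives 1460, so once 40 bytes of IPsec are added the packets grow past 1500 bytes and fragment. A route entry holding a discovered MTU (choose_mss(1388, 1500) gives 1348) or an IPsec-aware patch (choose_mss(None, 1500, ipsec_overhead=40) gives 1420) avoids that.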

Dan

