
linux-ipsec: cornered: MTU and fragmentation bugs

[ NOTICE!  This list will be hosted at lists.tislabs.com as of March 26.
There is no need to resubscribe, if you are on the list, you will remain
on it.  Just begin sending posts, and any administrative requests to
lists.tislabs.com as of now.  List mail to tis.com will cease to be
delivered as of March 26, 1999.  ]

Our initial operational experience with IPSEC is that inserting it
into transmission paths provokes operational Path MTU Discovery
problems that were not previously apparent.  I'm forwarding a pretty
succinct note from John Denker of AT&T that describes the problems.

Besides circumventing this problem in Linux IPSEC, it should be
brought up in a few more general fora.  I'm amazed that the original
Path MTU Discovery RFC (1191) never considered the failure mode that
happens if ICMP messages don't get back to the sender.  (The fix, to
terminate MTU discovery after a few unsuccessful retransmissions,
would have been simple had it been thought of.)
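Gilmore's suggested fix can be expressed as a small sender-side rule. This is only a minimal sketch under stated assumptions, not code from RFC 1191 or from any kernel. The `Connection` type, the two hook names, and the threshold of three retransmissions are all hypothetical.

```python
from dataclasses import dataclass

# Assumed threshold (hypothetical): unanswered DF retransmissions tolerated
# before concluding that ICMP frag-needed messages are not getting back.
BLACKHOLE_RETRIES = 3


@dataclass
class Connection:
    df_enabled: bool = True        # Path MTU Discovery active: DF bit set
    unacked_retransmits: int = 0   # retransmissions since the last ACK


def on_retransmit_timeout(conn: Connection) -> None:
    """Hypothetical hook, called each time a segment is retransmitted."""
    conn.unacked_retransmits += 1
    if conn.df_enabled and conn.unacked_retransmits >= BLACKHOLE_RETRIES:
        # Neither an ACK nor a frag-needed came back: stop Path MTU
        # discovery so routers are allowed to fragment the retransmission.
        conn.df_enabled = False


def on_ack(conn: Connection) -> None:
    """Hypothetical hook, called when new data is acknowledged."""
    conn.unacked_retransmits = 0
```

Three consecutive timeouts on a fresh `Connection()` leave `df_enabled` false; an ACK in between resets the count, so ordinary loss does not disable discovery.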

I'm surprised that the Linux kernel (2.0) is not sending ICMP
"fragmentation required and DF set" responses.  I hope this is fixed
in 2.1; RFC 1191 requires it.  I've cc'd Alan Cox (Linux networking
maintainer) and Keith Owens (who has posted several clear notes to
linux-kernel about some aspects of the issue).
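For reference, the message in question is ICMP type 3 (destination unreachable), code 4 (fragmentation needed and DF set). RFC 1191 section 4 puts the next-hop MTU in the low-order 16 bits of the header word that the original ICMP specification left unused. The sketch below builds only the ICMP portion, quoting the original IP header plus the first 64 bits of its data. The function names are hypothetical, and the IP header that carries the message back to the sender is omitted.

```python
import struct

ICMP_DEST_UNREACH = 3   # ICMP type: destination unreachable
ICMP_FRAG_NEEDED = 4    # code: fragmentation needed and DF set


def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum used by IP and ICMP."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:                       # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def frag_needed(original: bytes, next_hop_mtu: int) -> bytes:
    """Build the ICMP message a router returns for an oversized DF packet."""
    ihl = (original[0] & 0x0F) * 4           # original IP header length
    quoted = original[: ihl + 8]             # IP header + first 64 bits of data
    # type, code, checksum (0 while summing), unused, next-hop MTU
    body = struct.pack("!BBHHH", ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED,
                       0, 0, next_hop_mtu)
    csum = internet_checksum(body + quoted)
    return struct.pack("!BBHHH", ICMP_DEST_UNREACH, ICMP_FRAG_NEEDED,
                       csum, 0, next_hop_mtu) + quoted
```

A sender that receives this message reads bytes 6 and 7 to learn the size it should drop to, instead of having to guess.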

I would have been shocked, shocked! had there not been an RFC or
Internet-Draft about Path MTU Discovery failures.  But indeed,
the IETF "TCP Implementation" working group is working on it:


I've cc'd the author of this draft, Kevin Lahey, on this message.  The
draft should definitely add mentions of the interaction of Path MTU
problems, IP tunnelling, and IPSEC, including problems getting ICMP
messages out of tunnels, and MTUs that are reduced by the size of
tunnel and IPSEC headers.  It should also cross-reference the Path MTU
and Tunnel MTU discussions in RFC 2003 (IP-in-IP).  It sounds like we
need some cross-fertilization between the TCPIMPL and IPSEC working
groups, and with other groups using IP-in-IP encapsulation (RFCs 2003
and 1853) such as MOBILEIP.
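To make the header-size point concrete: in ESP tunnel mode each inner packet gains an outer IPv4 header, the ESP header (SPI and sequence number), a cipher IV, padding to the cipher block size, a two-byte trailer, and an authentication value. The arithmetic below assumes a 1500-byte Ethernet MTU, an 8-byte block cipher with an 8-byte IV (as with 3DES-CBC), and a 12-byte ICV (as with HMAC-SHA1-96); other algorithm choices change the numbers.

```python
OUTER_IPV4 = 20   # outer IPv4 header, no options
ESP_HEADER = 8    # SPI (4 bytes) + sequence number (4 bytes)
ESP_TRAILER = 2   # pad length (1 byte) + next header (1 byte)


def esp_tunnel_inner_mtu(link_mtu: int, block: int, iv: int, icv: int) -> int:
    """Largest inner IP packet that fits in one ESP tunnel-mode envelope."""
    room = link_mtu - OUTER_IPV4 - ESP_HEADER - iv - icv  # for padded payload
    room -= room % block          # encrypted part must be whole cipher blocks
    return room - ESP_TRAILER     # payload + padding + trailer fill the blocks


inner_mtu = esp_tunnel_inner_mtu(1500, block=8, iv=8, icv=12)
inner_mss = inner_mtu - 20 - 20   # inner IPv4 + TCP headers, no options
print(inner_mtu, inner_mss)       # 1446 1406
```

So a TCP peer that advertises the usual Ethernet MSS of 1460 sends full-size 1500-byte packets that are 54 bytes too large for this tunnel, which is exactly the case where frag-needed messages have to get back to the sender.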


Date: Wed, 24 Mar 1999 12:10:19 -0500
To: linux-ipsec@clinet.fi
From: John Denker <jsd@research.att.com>
Subject: linux-ipsec: cornered: MTU and fragmentation bugs

Hi --

At the risk of being forever banished from the hacker community, and having
my wizardly pointy hat confiscated, let me say this:  The MTU/fragmentation
bug is *not* microsoft's fault!  Eeeck!  

Here's the deal:

0) Path-MTU discovery is a good thing.  Typically this is done by initially
sending large packets with the DF bit set, and seeing if they get through. 

1) The microsoft TCP clients negotiate for a large initial MSS.  This is
perfectly legal, and should result in efficiency if other players do their
part.  This is necessarily done with no knowledge of the actual path-MTU.

2) This makes it likely that packets will be sent that exceed the MTU of
some router along the path -- especially when there is encapsulation going
on at some point, such as the ipsec tunnel.

3) The RFCs say that when an oversized packet (with the DF bit set)
arrives, a router MAY return an ICMP destination-unreachable message
(code 4) explaining that fragmentation is needed and suggesting a new packet size.
In practice, path-MTU discovery without these frag-needed messages is
somewhat inefficient.

4) Heretofore linux has not generated these frag-needed messages.  I
consider this a weakness in linux.  I have a patch for this, as mentioned
in previous notes.

5) What's worse, there are some firewalls (the Firewall-One brand in
particular, and quite likely others) that in their usual configuration do
not pass these ICMP frag-needed datagrams.  I consider this a weakness in
the firewalls.  This is a pain in the neck to fix.

6) What's *much* worse is that practically all the web servers in the world
improperly assume that the routers MUST return a frag-needed message.  As
much as you might enjoy bashing microsoft, their web site is the only one
I've been able to discover that is both efficient and robust... efficient
in that it starts out by sending large packets, and robust in that it will
(even in the absence of frag-needed messages) back off if they don't get
through.
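The "efficient and robust" behavior described in point 6 can be sketched as a per-destination path MTU estimate that starts at the first-hop MTU, trusts frag-needed messages when they arrive, and otherwise steps down on its own. The plateau values are the table from RFC 1191 section 7, which that RFC uses for old-style frag-needed messages that carry no MTU. Applying the table to silent loss, and the class and method names, are assumptions of this sketch.

```python
# Plateau MTUs from the RFC 1191 section 7 table, largest first.
PLATEAUS = [65535, 32000, 17914, 8166, 4352, 2002, 1492, 1006, 508, 296, 68]
MIN_MTU = 68      # minimum IPv4 MTU


class PmtuEstimate:
    """Hypothetical per-destination path MTU estimate."""

    def __init__(self, first_hop_mtu: int) -> None:
        self.pmtu = first_hop_mtu    # efficient: start big, with DF set

    def on_frag_needed(self, next_hop_mtu: int) -> None:
        # A router reported its next-hop MTU: adopt it if it is smaller.
        if MIN_MTU <= next_hop_mtu < self.pmtu:
            self.pmtu = next_hop_mtu

    def on_silent_loss(self) -> None:
        # Robust: repeated loss with no ICMP, so guess the next lower plateau.
        for plateau in PLATEAUS:
            if plateau < self.pmtu:
                self.pmtu = plateau
                return
```

Starting from 1500, two silent losses drop the estimate to 1492 and then 1006; a frag-needed message reporting 1446 would instead move it straight to the tunnel's real limit.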

Here is a partial list of servers I've checked:
www.ibm.com                 inefficient: always requests a small MSS
www.snap.com                inefficient: always requests a small MSS
www.toad.com                inefficient: never sets DF, always sends small packets
www.sandelman.ottawa.on.ca  inefficient: never sets DF, always sends small packets
www.hotbot.com              chokes
www.aol.com                 chokes
www.netscape.com            chokes
www.altavista.com           chokes
www.yahoo.com               chokes
www.clinet.fi               chokes
www.sgi.com                 chokes
www.intel.com               chokes
www.compaq.com              chokes
www.psi.net                 chokes
www.cygnus.com              chokes
www.quintillion.com         chokes
www.research.att.com        chokes

Except for the first four, these servers are grossly noncompliant with the
RFCs.


Solution #1 (ideal):  Fix linux and fix the firewalls so that ICMP
frag-needed messages are returned to servers that depend on them.  This
results in maximum efficiency.

Solution #2 (for users who can't easily fix their firewalls): ipsec must
(at least optionally) support a "virtually-enormous tunnel" mode.  In that
mode, as I have previously discussed, when a packet arrives that is too big
to be transported in a single envelope, it should be fragmented
(*regardless* of whether the DF bit was set) and transported in multiple
envelopes.  If the DF bit was set on the raw packet (and perhaps not
otherwise) the packet should be reassembled by the other security gateway
before being sent on its way.  This behavior, while perhaps very slightly
inefficient, is much more robust in the face of all those real-world
ill-behaved web servers.
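The "virtually-enormous tunnel" can be illustrated with a toy model: one security gateway splits an inner packet across several envelopes, ignoring its DF bit, and the peer gateway reassembles it before forwarding. `Envelope`, `encapsulate`, and `Reassembler` are hypothetical names, not part of any IPSEC implementation. The sketch ignores encryption, reassembly timeouts, and memory limits, and it always reassembles rather than modeling the "only if DF was set" variant.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Envelope:
    packet_id: int    # identifies the inner packet being carried
    offset: int       # byte offset of this piece within the inner packet
    more: bool        # True if further pieces follow
    data: bytes


def encapsulate(packet_id: int, packet: bytes, max_payload: int) -> List[Envelope]:
    """Split an inner packet across envelopes, whatever its DF bit says."""
    chunks = [packet[i:i + max_payload]
              for i in range(0, len(packet), max_payload)] or [b""]
    return [Envelope(packet_id, n * max_payload, n < len(chunks) - 1, chunk)
            for n, chunk in enumerate(chunks)]


class Reassembler:
    """Far-end gateway: rebuild each inner packet before forwarding it."""

    def __init__(self) -> None:
        self.pending: Dict[int, Dict[int, Envelope]] = {}

    def receive(self, env: Envelope) -> Optional[bytes]:
        pieces = self.pending.setdefault(env.packet_id, {})
        pieces[env.offset] = env
        last = [p for p in pieces.values() if not p.more]
        if not last:
            return None                       # final piece not seen yet
        total = last[0].offset + len(last[0].data)
        if sum(len(p.data) for p in pieces.values()) != total:
            return None                       # a middle piece is still missing
        del self.pending[env.packet_id]
        return b"".join(pieces[o].data for o in sorted(pieces))
```

Because reassembly happens at the far gateway, the end hosts never see the split; the tunnel behaves like a single hop with a very large MTU.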

A tunnel with virtually-infinite MTU doesn't offend me in the least.  I
consider it consistent with the fact that the tunnel shows up as virtually
a single hop, no matter how many real-ethernet hops are used to transport
the envelopes.

IMHO this solution #2 is a required feature, necessary for version 1.00.
It should be at least a compile-time option.  Making it a run-time option
would be even nicer, with (I would think) hardly any extra work.

Cheers --- jsd

[John Gilmore here again:]

John Denker, when you say a Web server "chokes", you appear to mean that:

	*  It sends big packets with "DF".
	*  It doesn't recover if it never sees ICMP frag-neededs.

This is, in fact, compatible with the current RFCs.  The problem is
a protocol design bug, not an implementation bug.  See
draft-ietf-tcpimpl-pmtud-00.txt for more details.

RFC 1191 does require routers to return ICMP messages when they can't
fragment a datagram, though it doesn't use those all-important capital
letters in exactly the right place (it says "is required to" rather
than "MUST" in section 4), and I haven't found an RFC that
specifically says MUST about this.  The RFC 1191 requirement was added
shortly after the "Gateway Requirements" RFC 1009 collected all the
little requirements into one place.  I don't think a newer collection
of router requirements has ever been issued, so it's easy for router
manufacturers to miss this one.

Here are a few explanations of the general Path MTU problem.  I've
cc'd these folks too, so they can add this info to their explanations.