<!doctype linuxdoc system>

<article>

<title>Path MTU discovery in the presence of security gateways
<author>Michael Richardson <tt/mcr@sandelman.ottawa.on.ca/
<date>v1.1, 30 June 1997
<abstract>                   
This document describes the problem of getting accurate Path MTU
information in the presence of untrusted routers. Typical Path MTU
discovery is done by sending packets with the don't fragment bit set,
and listening for ICMP messages from routers that want to fragment the
packets. Unfortunately, these messages could be forged, and
IPsec-based security systems cannot make direct use of these
messages. An alternative, backwards-compatible algorithm is
suggested.
</abstract>

<toc>

<sect>Introduction
<p>
<sect1>Definition of terminology

<p>
  Here is a network of two security gateways, a client node and a
server node.

<tscreen><verb>
	C---{G1}--{R1}--{R2}...{R3}--{R4}...{Rn}--{G2}---S

  C is the TCP initiator.
  G1/G2 are security gateways.
  Rx are routers.
  ... is a link with a restricted MTU.
  S is the TCP listener.
</verb></tscreen>

<p>
There are both TCP endpoints and security association endpoints. They
are distinguished by the following terms:

<descrip>
<tag/C/ is the transport layer originator. <tt/TLO/
<tag/S/ is the transport layer target.     <tt/TLT/
<tag>C/G1</tag>  is a network layer originator/target
pair. <tt>NLO/NLT</tt>
<tag>G1/G2</tag> is a network layer originator/target pair.
<tag>G2/S</tag>  is a network layer originator/target
pair.
</descrip>

<sect>Introduction to the problem

<p>
[RFC-1191] describes a mechanism for finding the maximum transmission
unit of an arbitrary internet path. It says:

<quote>
The basic idea is that a source host initially assumes that the PMTU
of a path is the (known) MTU of its first hop, and sends all
datagrams on that path with the DF bit set.  If any of the datagrams
are too large to be forwarded without fragmentation by some router
along the path, that router will discard them and return ICMP
Destination Unreachable messages with a code meaning "fragmentation
needed and DF set" [7].  Upon receipt of such a message (henceforth
called a "Datagram Too Big" message), the source host reduces its
assumed PMTU for the path.
</quote>

<p>
There are several problems:
<enum>
<item> the ICMP "Datagram Too Big" messages are sent from an intermediate
router (Rx in the diagram) to the gateway machine. They are not
authenticated in any way, nor does it appear that there is any
reasonable way for the routers to prove they are legitimate
members of the routing path.

<p>
An attacker could influence the MTU used, possibly reducing the MTU of
the route to an unacceptably low value. This may constitute
unacceptably bad service. This is an issue for the Internet Metrics WG.

<p>
Too high an MTU would result in excessive fragmentation, which on a
lossy link may result in very high retransmission rates. IPsec
tunnels do not retransmit encrypted packets; rather, they depend on the
TLO node to retransmit, so retransmitted packets result in higher
encryption loads as well. A gateway with limited CPU may start
discarding more datagram fragments as it spends more time encrypting.

<item> the PMTU information in the ICMP messages is difficult to relay
back to the TCP/UDP (or other) stacks of the sending node. So, nodes
<tt>C</tt> and <tt>S</tt> continue to send using whatever MTU they
started with. This defeats the point of doing PMTU discovery in
conjunction with IPsec.

<item> IPsec gateways would prefer that TLO nodes have PMTU
information available. This allows the IPsec gateway to ask the TLO
node to reduce its PMTU by the amount of overhead that ESP adds.
Otherwise, the resulting ESP datagram has to be fragmented.
</enum>

<p>
There are two path MTUs:
<enum>
<item> the TLO/TLT PMTU
<item> the NLO/NLT PMTU
</enum>

<p>
The ideal transport layer PMTU is the NLx PMTU minus the overhead
of the ESP header and transform. For RFC 1829 ESP this overhead is 36
bytes; for the KSM draft ESP [rfcXXXX] it is 52 bytes (for DES,
DES/HMAC-MD5-96).
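
<p>
The subtraction above can be written out as a minimal sketch. The
overhead figures are the two quoted in the text; the dictionary keys
and the function name <tt>transport_pmtu</tt> are hypothetical labels
chosen for this illustration only.

```python
# Overhead in bytes, as quoted in the text. The key names are
# illustrative labels, not identifiers from any specification.
ESP_OVERHEAD = {
    "rfc1829-esp": 36,
    "ksm-draft-esp-des-hmac-md5-96": 52,
}


def transport_pmtu(network_pmtu, transform):
    """Return the ideal TLO/TLT PMTU for a given NLO/NLT PMTU."""
    return network_pmtu - ESP_OVERHEAD[transform]


print(transport_pmtu(1500, "rfc1829-esp"))                    # 1464
print(transport_pmtu(1500, "ksm-draft-esp-des-hmac-md5-96"))  # 1448
```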

<sect1>Requirement for PMTU information

<p>
The information must be authenticated. This implies that none of the
routers <tt>Rx</tt> may provide this information. It must come from
either nodes/routers on the trusted side, or from the gateways
themselves. 

<p>
Only the two gateway nodes know the effective number of bytes of
overhead.

<p>
Only the decrypting node can observe the fragmentation resulting from
the sequence of routers, R1..Rn. 

<p>
IPv6 does not allow for intermediate routers to fragment packets. Only
the originating node may do so. Intermediate routers MUST send ICMP
Datagram Too Big messages, and drop the packet. It should be noted,
again, that there are two originators: <tt>C</tt> and <tt>G1</tt>.

<sect>Authenticated PMTU information

<p>
Both proposal one and two must be adapted slightly for IPv6. This is
discussed later.

<sect1>Proposal one

<p>
Gateway <tt>G1</tt> MUST drop all non-local ICMP Destination Unreachable
datagrams (including "Datagram Too Big") which arrive on its
unprotected interface. The gateway MAY accept ICMP packets that are
addressed to itself.

<p>
ICMP datagrams arriving at <tt>G1</tt> via an authenticated tunnel
(whether encrypted or not, depending only on policy) SHOULD be passed
to their destination node as normal.

<p>
Gateway <tt>G2</tt>, upon receiving an ESP or AH packet that needs to
be reassembled, MUST take note of the size of the largest fragment
received. This value is compared to the previous largest fragment
size. If the size has changed by more than 10%, or more than 2*MSL
time (i.e. 2 minutes) has passed since the previous ICMP message, then
an ICMP Datagram Too Big message is generated. The largest
fragment size is initialized to 576 bytes.
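
<p>
One possible reading of this rule is sketched below. It is an
illustration under stated assumptions, not a normative algorithm: the
class and method names are hypothetical, the stored size is updated on
every reassembled packet, and the time limit is treated as already
expired when no ICMP has been generated yet. The value returned is the
raw fragment size, before any IPsec overhead is subtracted.

```python
class FragmentTracker:
    """Per-tunnel state kept by G2 (a sketch, names are hypothetical)."""

    INITIAL_SIZE = 576        # bytes, the initial value given in the text
    CHANGE_THRESHOLD = 0.10   # "changed by more than 10%"

    def __init__(self, report_interval=120.0):
        # report_interval stands in for 2*MSL, in seconds
        # (the text gives 2 minutes).
        self.report_interval = report_interval
        self.largest = self.INITIAL_SIZE
        self.last_icmp = None   # time the previous ICMP was generated

    def observe(self, fragment_sizes, now):
        """Record one reassembled packet.

        Returns the fragment size to report in an ICMP Datagram Too Big
        message, or None if no message should be generated.
        """
        size = max(fragment_sizes)
        changed = abs(size - self.largest) > self.CHANGE_THRESHOLD * self.largest
        # Assumption: with no previous ICMP, the interval counts as expired.
        expired = (self.last_icmp is None
                   or now - self.last_icmp > self.report_interval)
        self.largest = size
        if changed or expired:
            self.last_icmp = now
            return size
        return None
```

<p>
With this sketch, a first packet reassembled from fragments of 1000 and
400 bytes triggers a report of 1000, an identical packet ten seconds
later triggers nothing, and a later packet whose largest fragment is
800 bytes triggers a new report because the size moved by more than 10%.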

<p>
The ICMP datagram is addressed from gateway <tt>G2</tt> to the
originating node <tt>C</tt>, and gives a size that is based on the
maximum fragment size (above), minus the IPsec overhead. The ICMP
datagram is sent via the tunnel on which the IPsec packet
arrived; that is, the ICMP is encapsulated.

<p>
A packet arriving at <tt>G1</tt> with the DF bit set does not
cause the DF bit to be set on the encapsulating datagram.

<sect1>Proposal two

<p>
Gateway <tt>G1</tt> MUST drop all non-local ICMP Destination Unreachable
datagrams (including "Datagram Too Big") which arrive on its
unprotected interface. The gateway MAY accept ICMP packets that are
addressed to itself.

<p>
ICMP datagrams arriving at <tt>G1</tt> via an authenticated tunnel
(whether encrypted or not, depending only on policy) SHOULD be passed
to their destination node as normal.

<p>
Gateway <tt>G1</tt> MUST maintain a PMTU value with its SPI/Security
Association state. Packets arriving from node <tt>C</tt> with the DF
bit set, and that are bigger than the PMTU value, MUST be discarded,
and an ICMP Datagram Too Big message sent. In other words, the
security gateway acts as a router would if the IPsec tunnel were
in fact a physical interface. The PMTU value is initialized to
the MTU of the interface on which outgoing ESP packets would
travel, minus the ESP overhead.
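
<p>
The behaviour described for <tt>G1</tt> can be sketched as follows.
This is only an illustration: the names <tt>SecurityAssociation</tt>,
<tt>new_sa</tt> and <tt>handle_outbound</tt> are hypothetical, and
reporting the SA's PMTU value in the ICMP message is an assumption
based on the statement that the gateway acts as a router would.

```python
from dataclasses import dataclass


@dataclass
class SecurityAssociation:
    """Only the field needed here; a real SA carries far more state."""
    pmtu: int


def new_sa(interface_mtu, esp_overhead):
    # Initial value: outgoing interface MTU minus the ESP overhead.
    return SecurityAssociation(pmtu=interface_mtu - esp_overhead)


def handle_outbound(sa, packet_len, df_set):
    """Decide what G1 does with a packet arriving from node C.

    Returns a pair (action, mtu_to_report). The second element is the
    size placed in the ICMP Datagram Too Big message, or None.
    """
    if df_set and packet_len > sa.pmtu:
        # Behave like a router whose outgoing link has MTU sa.pmtu.
        return ("drop-and-icmp", sa.pmtu)
    # Packets that fit, and packets without DF, are encapsulated.
    return ("encapsulate", None)
```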

<p>
Gateway <tt>G2</tt>, upon receiving an ESP or AH packet that needs to
be reassembled, MUST take note of the size of the largest fragment
received. This value is compared to the previous largest fragment
size. If the size has changed by more than 10%, or more than 2*MSL
time (i.e. 2 minutes) has passed since the previous ICMP message, then
an ICMP Datagram Too Big message is generated. The largest
fragment size is initialized to 576 bytes.

<p>
The ICMP datagram is addressed from gateway <tt>G2</tt> to gateway
<tt>G1</tt>, and gives a size that is based on the maximum fragment
size (above), minus the IPsec overhead. The ICMP datagram is sent via
the tunnel on which the IPsec packet arrived; that is, the ICMP is
encapsulated and encrypted.

<p>
A packet arriving at <tt>G1</tt> with the DF bit set (but fitting in
the MTU of the SA) does not cause the DF bit to be set on the
encapsulating datagram. If the DF bit were copied and a routing change
reduced the PMTU, the datagram would be dropped and never reach
<tt>G2</tt>, so news of the PMTU change would not be relayed.

<sect1> Differences

<p>
This section is still under construction. Input is requested:

<enum>
<item> in proposal two, the ICMP seen by <tt>C</tt> is generated by the
near gateway, <tt>G1</tt>.
<item> the ICMP in the tunnel potentially carries addresses which
would not satisfy filtering rules.
</enum>

<sect>Limits to this solution: IPv6

<p>
The major problem in the IPv6 case is that the far end gateway
<tt>G2</tt> will see no packets if the PMTU estimate is too
big. An ICMP will only be received by <tt>G1</tt> if the PMTU estimate
is small enough to transit all routers.

<p>
In order to grow the PMTU, either initially or to take advantage of a
routing change, gateway <tt>G1</tt> must therefore send probe
packets of a larger size, knowing that the packet will be lost if the
probe is too big. The packet or the response may also be lost for
other reasons, so the probe must be repeated in any case.

<p>
Further, the path may suddenly experience a drop in PMTU due to a
routing change. In that case, no packets will be received at
<tt>G2</tt>, so <tt>G1</tt> must also occasionally send probes of a
smaller size if it has not received an ICMP message in 2*MSL
time. (Note: this interval is probably too long.)

<p>
Making smaller packets is easy: the gateway can use the fragmentation
facilities of IPv6 to split up an encrypted packet. A larger packet
can be produced by adding more padding before encryption.
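
<p>
The two ways of reaching a probe size can be sketched as below. This is
a simplified illustration with hypothetical names: lengths are payload
bytes only and all header sizes are ignored. The one detail taken from
IPv6 itself is that every fragment except the last carries a multiple
of 8 bytes.

```python
def shape_probe(payload_len, target_len):
    """Return how to turn a payload into a probe of about target_len bytes.

    Result is ("pad", n) with n bytes of extra padding to add before
    encryption, or ("fragment", sizes) with the payload size of each
    IPv6 fragment. Header overhead is deliberately not modelled.
    """
    if payload_len <= target_len:
        # Growing a probe: pad the packet up to the target size.
        return ("pad", target_len - payload_len)

    # Shrinking a probe: split the encrypted packet. All fragments but
    # the last must carry a multiple of 8 bytes.
    chunk = target_len - target_len % 8
    if chunk == 0:
        raise ValueError("target_len is too small to fragment to")
    sizes = []
    remaining = payload_len
    while remaining > chunk:
        sizes.append(chunk)
        remaining -= chunk
    sizes.append(remaining)
    return ("fragment", sizes)
```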

<sect>Security Considerations:

<p>
	This entire document discusses a security protocol.

<sect>References:

<p>
<descrip>
<tag/[RFC-1825]/
R. Atkinson, "Security Architecture for the Internet Protocol",
RFC-1825, August 1995.
<tag/[RFC-1191]/
J. Mogul, S. Deering, "Path MTU Discovery", RFC-1191, November 1990.
<tag/[KSM-AH]/   
New AH draft.
<tag/[metrics]/
I. M. ISP, "How fast can it go?", draft-ietf-metrics-00.txt, work in
progress: Jan. 20, 1997
<tag/[Gupta97-1]/
V. Gupta, S. Glass, "Firewall Traversal for Mobile IP:
Goals and Requirements", draft-ietf-mobileip-ft-req-00.txt, work in
progress: Jan. 20, 1997
<tag/[Gupta97-2]/
V. Gupta, S. Glass, "Firewall Traversal for Mobile
IP: Guidelines for Firewalls and Mobile IP entities",
draft-ietf-mobileip-firewall-trav-00.txt, work in progress: March 17, 1997
</descrip>

<sect1> Author's Address

<p>
<tscreen><verb>
   Michael C. Richardson
   Sandelman Software Works Corp.
   152 Rochester Street
   Ottawa, ON K1R 7M4
   Canada

   Telephone:   +1 613 233-6809
   EMail:       mcr@sandelman.ottawa.on.ca
</verb></tscreen>

<sect1> Expiration and File Name

<p>
   This draft expires January 9, 1997

<p>
   Its file name is draft-richardson-ipsec-pmtu-disocovery-00.txt

</article>

