PMTU/DF issues
Folks,
The text below discusses some IPSEC/PMTU/DF issues and the corresponding
proposed changes to the IPSEC Architecture document. The initial
section is the text we propose to include in the IPSEC Architecture
document. That is followed by more detailed discussion/analysis which
will be in an Appendix. The term "communication association" refers to
the "connection" defined by source and destination addresses, transport
protocol, source and destination ports, and user id. ICMP PMTU is used
to refer to an ICMP message for:
IPv4:
- Type = 3 (Destination Unreachable)
- Code = 4 (Fragmentation needed and DF set)
- Next-Hop MTU in the low-order 16 bits of the second
word of the ICMP header (labelled unused in RFC 792),
with high-order 16 bits set to zero
IPv6 (RFC 1885):
- Type = 2 (Packet Too Big)
- Code = 0
- Next-Hop MTU in the 32 bit MTU field of the ICMP6 message
Thank you,
IPSEC Document Editing Team
=========================================================================
We propose to add the following text to the IPSEC Architecture document
re: Fragmentation and PMTU -- The analysis/discussion of these topics
will be included in Appendix X to the Architecture document.
1. DF bit:
In cases where a system (host or gateway) adds an encapsulating
header (ESP or AH tunnel), it MUST support the option of copying the
DF bit from the original packet to the encapsulating header (and
processing ICMP PMTU messages). This means that it MUST be possible
to configure the system's treatment of the DF bit (set, clear, copy
from encapsulated header) for each interface. (See Appendix X for
rationale.)
2. Fragmentation:
Fragmentation MUST be done after outbound IPSEC processing and
reassembly MUST be done before inbound IPSEC processing. (See
Appendix X for analysis of how this is impacted by the location
(where in the stack) of the IPSEC implementation.)
(Footnote) Any IPSEC implementation that is not integrated into an IP
implementation MUST support constructing any encapsulating IP headers
and doing any necessary fragmentation and re-assembly. (See Appendix
X for further discussion.)
3. Path MTU Discovery:
The amount of information returned with the ICMP PMTU message (IPv4
or IPv6) is limited and this affects what selectors are available for
use in further propagating the PMTU information. (See Appendix X
for more detailed discussion of this topic.)
A. If the ICMP PMTU message contains only 64 bits of the IPSEC
header (minimum for IPv4), then a security gateway MUST
support the following options on a per SPI/SA basis:
a. if the originating host(s) can be determined, send
the PMTU information to all the possible originating
hosts.
b. if the originating host(s) cannot be determined,
store the PMTU with the SPI/etc and wait until the
next packet(s) arrive from the originating host(s)
for the relevant security association. If it/they
are bigger than the PMTU, drop the packet(s), and
compose ICMP PMTU message(s) with the new packet(s)
and the updated PMTU, and send the ICMP message(s)
about the problem to the originating host(s).
B. If the ICMP message contains more information from the
original packet, e.g., the 576 byte minimum for IPv6, then
there MAY be enough information to immediately determine to
which host to propagate the ICMP/PMTU message and to provide
that system with a 5-selector pointer for storing/updating
the PMTU. Under such circumstances, a security gateway MUST
generate an ICMP PMTU message immediately upon receipt of an
ICMP PMTU from further down the path.
The calculation of PMTU from an ICMP PMTU MUST take into account the
addition of any IPSEC header -- ESP or AH transport, or ESP or AH
tunnel. (See Appendix X for discussion of implementation issues.)
In hosts, the granularity with which PMTU ICMP processing can be done
differs depending on the implementation situation. Looking at a
host, there are 3 situations that are of interest with respect to
PMTU issues (See Appendix X for detailed discussion of this issue):
a. Integration of IPSEC into the native IP implementation
b. Bump-in-the-stack implementations, where IPSEC is implemented
"underneath" an existing implementation of a TCP/IP protocol
stack, between the native IP and the local network drivers
c. No IPSEC implementation -- This case is included because it
is relevant in cases where a security gateway is sending PMTU
information back to a host.
Only in case (a) can the PMTU data be maintained at the same
granularity as communication associations. In (b) and (c), the IP
layer will only be able to maintain PMTU data at the granularity of
source and destination IP addresses (and optionally ToS), as
described in RFC 1191. This is an important difference, because more
than one communication association may map to the same source and
destination IP addresses, and each communication association may have
a different amount of IPSEC header overhead (e.g., due to use of
different transforms or different algorithms).
Implementation of the calculation of PMTU and support for PMTUs at
the granularity of individual communication associations is a local
matter. However, a socket-based implementation of IPSEC in a host
SHOULD maintain the information on a per socket basis. Bump in the
stack systems MUST pass an ICMP PMTU to the host IP implementation,
after adjusting it for any IPSEC header overhead added by these
systems. The overhead SHOULD be determined by analysis of the SPI
and any other selector information present in a returned ICMP PMTU
message.
The host mechanism for getting the updated PMTU to the transport
layer is unchanged, as specified in RFC 1191 (Path MTU Discovery).
In all systems (host or gateway) implementing IPSEC and maintaining
PMTU information, the PMTU associated with a security association
(transport or tunnel) MUST be "aged" and some mechanism put in place
for updating the PMTU in a timely manner, especially for discovering
if the PMTU is smaller than it needs to be. A given PMTU has to
remain in place long enough for a packet to get from the source end
of the security association to the system at the other end of the
security association and propagate back an ICMP error message if the
current PMTU is too big. Systems SHOULD use the approach described
in the Path MTU Discovery document (RFC 1191, Section 6.3), which
suggests periodically resetting the PMTU to the first-hop data-link
MTU and then letting the normal PMTU Discovery processes update the
PMTU as necessary. The period SHOULD be configurable.
=========================================================================
Appendix X -- Analysis/Discussion of PMTU/DF/Fragmentation Issues
The legend for the diagrams is:
==== = security association (AH or ESP, transport or tunnel)
---- = connectivity (or if so labelled, administrative boundary)
.... = ICMP message (hereafter referred to as ICMP PMTU) for
IPv4:
- Type = 3 (Destination Unreachable)
- Code = 4 (Fragmentation needed and DF set)
- Next-Hop MTU in the low-order 16 bits of the second
word of the ICMP header (labelled unused in RFC 792),
with high-order 16 bits set to zero
IPv6 (RFC 1885):
- Type = 2 (Packet Too Big)
- Code = 0
- Next-Hop MTU in the 32 bit MTU field of the ICMP6 message
Hx = host x
Sx = socket x
SGx = security gateway x
X* = X supports IPSEC
1. DF bit -- In cases where a system (host or gateway) adds an
encapsulating header (e.g., ESP tunnel), should/must the DF bit in
the original packet be copied to the encapsulating header?
Fragmenting seems correct for some situations, e.g., it might be
appropriate to fragment packets over a network with a very small MTU,
e.g., a packet radio network, or a cellular phone hop to mobile node,
rather than propagate back a very small PMTU for use over the rest of
the path. In other situations, it might be appropriate to set the DF
bit in order to get feedback from later routers about PMTU
constraints which require fragmentation. The existence of both of
these situations argues for enabling a system to decide whether or
not to fragment over a particular network "link", i.e., for requiring
an implementation to be able to copy the DF bit (and to process ICMP
PMTU messages), but making it an option to be selected on a per
interface basis. In other words, an administrator should be able to
configure the router's treatment of the DF bit (set, clear, copy from
encapsulated header) for each interface.
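The per-interface choice described above can be sketched as follows. This is a minimal illustration only; the names DfPolicy, outer_df and INTERFACE_POLICY, and the interface names, are hypothetical and do not come from any IPSEC implementation.

```python
from enum import Enum


class DfPolicy(Enum):
    SET = "set"      # always set DF in the encapsulating header
    CLEAR = "clear"  # always clear DF, so the tunnel packet may be fragmented
    COPY = "copy"    # copy DF from the encapsulated (inner) header


def outer_df(policy: DfPolicy, inner_df: bool) -> bool:
    """Return the DF bit to place in the encapsulating IP header."""
    if policy is DfPolicy.SET:
        return True
    if policy is DfPolicy.CLEAR:
        return False
    return inner_df


# Hypothetical per-interface configuration chosen by an administrator.
INTERFACE_POLICY = {"eth0": DfPolicy.COPY, "radio0": DfPolicy.CLEAR}

for name, policy in INTERFACE_POLICY.items():
    print(name, outer_df(policy, inner_df=True))
```

With this shape, a low-MTU link such as the packet radio example can be set to "clear" while other interfaces keep "copy" and so continue to receive ICMP PMTU feedback.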
2. Fragmentation -- Fragmentation MUST be done after outbound IPSEC
processing. Reassembly MUST be done before inbound IPSEC processing.
The general reasoning is shown below (delimited by the *******'s).
NOTE: IPSEC always has to figure out what the encapsulating IP header
fields are. This is independent of where you insert IPSEC and is
intrinsic to the definition of IPSEC. Therefore any IPSEC
implementation that is not integrated into an IP implementation must
include code to construct the necessary IP headers (IP2):
o AH-tunnel --> IP2-AH-IP1-Transport-Data
o ESP-tunnel --> IP2-ESP_hdr-IP1-Transport-Data-ESP_trailer
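As a rough illustration of "fragment after outbound IPSEC processing", the sketch below plans IPv4 fragments for a payload that is assumed to be the complete post-IPSEC payload (for example, ESP header, protected data and ESP trailer). The function name fragment_plan and the 20-byte option-free IP header are assumptions made for the example.

```python
def fragment_plan(payload_len: int, mtu: int, ip_header_len: int = 20):
    """Return (offset, length, more_fragments) tuples for an IPv4 payload.

    payload_len is the size of everything after the outer IP header,
    i.e. the result of IPSEC processing. Every fragment except the last
    carries a multiple of 8 bytes, as IPv4 fragment offsets require.
    """
    max_chunk = (mtu - ip_header_len) // 8 * 8
    if max_chunk <= 0:
        raise ValueError("MTU too small to carry any payload")
    plan = []
    offset = 0
    while offset < payload_len:
        length = min(max_chunk, payload_len - offset)
        more = offset + length < payload_len
        plan.append((offset, length, more))
        offset += length
    return plan


for frag in fragment_plan(3000, 1500):
    print(frag)
```

The receiver reverses the order: it reassembles the fragments first and only then hands the whole packet to inbound IPSEC processing.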
****************************************************************************
Overall, the fragmentation/reassembly approach described above works
for all cases examined.
                                AH Xport   AH Tunnel  ESP Xport  ESP Tunnel
Implementation approach         IPv4 IPv6  IPv4 IPv6  IPv4 IPv6  IPv4 IPv6
-----------------------         ---- ----  ---- ----  ---- ----  ---- ----
Hosts (integr w/ IP stack)       Y    Y     Y    Y     Y    Y     Y    Y
Hosts (betw/ IP and drivers)     Y    Y     Y    Y     Y    Y     Y    Y
S. Gwy (integr w/ IP stack)                 Y    Y                Y    Y
Outboard crypto processor *
* If the crypto processor system has its own IP address, then it
is covered by the security gateway case. This box receives
the packet from the host and performs IPSEC processing. It
has to be able to handle the same AH, ESP, and related
IPv4/IPv6 tunnel processing that a security gateway would have
to handle. If it doesn't have its own address, then it is
similar to the bump-in-the-stack implementation between IP and
the network drivers.
The following analysis assumes that:
1. There is only one IPSEC module in a given system's stack.
There isn't an IPSEC module A (adding ESP/encryption and
thus) hiding the transport protocol, SRC port, and DEST port
from IPSEC module B.
2. There are several places where IPSEC could be implemented
(as shown in the table above).
a. Hosts with integration of IPSEC into the native IP
implementation. Implementer has access to the source
for the stack.
b. Hosts with bump-in-the-stack implementations, where
IPSEC is implemented between IP and the local network
drivers. Source access for stack is not available;
but there are well-defined interfaces that allow the
IPSEC code to be incorporated into the system.
c. Security gateways and outboard crypto processors with
integration of IPSEC into the stack.
3. Not all of the above approaches are feasible in all hosts.
But it was assumed that for each approach, there are some
hosts for whom the approach is feasible.
For each of the above 3 categories, there are IPv4 and IPv6, AH
transport and tunnel modes, and ESP transport and tunnel modes -- for
a total of 24 cases (3 x 2 x 4).
Some header fields and interface fields are listed here for ease of
reference -- they're not in the header order, but instead listed to
allow comparison between the columns. (* = not covered by AH
authentication. ESP authentication doesn't cover any headers that
precede it.)
                                IP/Transport Interface
IPv4            IPv6            (RFC 1122 -- Sec 3.4)
----            ----            ----------------------
Version = 4     Version = 6
Header Len
*TOS            Prty,Flow Lbl   TOS
Packet Len      Payload Len     Len
ID                              ID (optional)
*Flags                          DF
*Offset
*TTL            *Hop Limit      TTL
Protocol        Next Header
*Checksum
Src Address     Src Address     Src Address
Dst Address     Dst Address     Dst Address
Opt?ions        Opt?ions        Opt
? = AH covers Option-Type and Option-Length, but
not Option-Data.
The results for each of the 24 cases are shown below ("works" = will
work if system fragments after outbound IPSEC processing, reassembles
before inbound IPSEC processing). Notes indicate implementation
issues.
a. Hosts (integrated into IP stack)
o AH-transport --> (IP1-AH-Transport-Data)
- IPv4 -- works
- IPv6 -- works
o AH-tunnel --> (IP2-AH-IP1-Transport-Data)
- IPv4 -- works
- IPv6 -- works
o ESP-transport --> (IP1-ESP_hdr-Transport-Data-ESP_trailer)
- IPv4 -- works
- IPv6 -- works
o ESP-tunnel --> (IP2-ESP_hdr-IP1-Transport-Data-ESP_trailer)
- IPv4 -- works
- IPv6 -- works
b. Hosts (Bump-in-the-stack) -- put IPSEC between IP layer and
network drivers. In this case, the IPSEC module would have to
do something like one of the following for fragmentation and
reassembly.
- do the fragmentation/reassembly work itself and
send/receive the packet directly to/from the network
layer. In AH or ESP transport mode, this is fine. In
AH or ESP tunnel mode where the tunnel is to the
ultimate destination, this is fine. But in AH or ESP
tunnel modes where the tunnel end is different from
the ultimate destination and where the source host is
multi-homed, this approach could result in sub-optimal
routing because the IPSEC module may be unable to
obtain the information needed (LAN interface and
next-hop gateway) to direct the packet to the
appropriate network interface. This is not a problem
if the interface and next-hop gateway are the same for
the ultimate destination and for the tunnel end. But
if they are different, then IPSEC would need to know
the LAN interface and the next-hop gateway for the
tunnel end.
OR
- pass the IPSEC'd packet back to the IP layer where an
extra IP header would end up being pre-pended and the
IPSEC module would have to check and let IPSEC'd
fragments go by.
OR
- pass the packet contents to the IP layer in a form
such that the IP layer recreates an appropriate IP
header
At the network layer, the IPSEC module will have access to
the following selectors from the packet -- SRC address, DST
address, TOS, Next Protocol, and if there's a transport layer
header --> SRC port and DST port. One cannot assume IPSEC
has access to the User ID. It is assumed that the available
selector information is sufficient to figure out the relevant
Security Association(s).
o AH-transport --> (IP1-AH-Transport-Data)
- IPv4 -- works
- IPv6 -- works
o AH-tunnel --> (IP2-AH-IP1-Transport-Data)
- IPv4 -- works
- IPv6 -- works
o ESP-transport --> (IP1-ESP_hdr-Transport-Data-ESP_trailer)
- IPv4 -- works
- IPv6 -- works
o ESP-tunnel --> (IP2-ESP_hdr-IP1-Transport-Data-ESP_trailer)
- IPv4 -- works
- IPv6 -- works
c. Security gateways -- integrate IPSEC into the IP stack
NOTE: The IPSEC module will have access to the following
selectors from the packet -- SRC address, DST address, TOS,
Next Protocol, and if there's a transport layer header -->
SRC port and DST port. It won't have access to the User ID
(only Hosts have access to User ID information.) It also
won't have access to the transport layer information if there
is an ESP header, or if it's not the first fragment of a
fragmented message. It is assumed that the available
selector information is sufficient to figure out the relevant
Security Association(s).
o AH-tunnel --> (IP2-AH-IP1-Transport-Data)
- IPv4 -- works
- IPv6 -- works
o ESP-tunnel --> (IP2-ESP_hdr-IP1-Transport-Data-ESP_trailer)
- IPv4 -- works
- IPv6 -- works
****************************************************************************
3. Path MTU Discovery -- As mentioned earlier, "ICMP PMTU" refers to an
ICMP message used for Path MTU Discovery.
A. The amount of information returned with the ICMP message is limited
and this affects what selectors are available to identify security
associations, originating hosts, etc. for use in further propagating
the PMTU information.
In brief, an ICMP message must contain the following from the
"offending" packet:
- IPv4 (RFC 792) -- IP header plus a minimum of 64 bits
- IPv6 (RFC 1885) -- IP header plus a minimum of 576 bytes
Accordingly, in the IPv4 context, an ICMP PMTU may identify only the
first (outermost) security association. This is because the ICMP
PMTU may contain only 64 bits of the "offending" packet beyond the IP
header, which would capture only the first SPI from AH or ESP. In
the IPv6 context, an ICMP PMTU will probably provide all the SPIs and
the selectors in the IP header, but maybe not the SRC/DST ports (in
the transport header) or the encapsulated (TCP, UDP, etc.) protocol.
Moreover, if ESP is used, the transport ports and protocol selectors
may be encrypted.
Looking at the diagram below of a security gateway tunnel (as
mentioned elsewhere, security gateways do not use transport mode)...
         H1   ===================           H3
           \  |                 |          /
   H0 -- SG1* ---- R1 ---- SG2* ---- R2 -- H5
           /  ^        |                   \
         H2   |........|                    H4
Suppose H0 sends a data packet to H5 via SG1 and SG2 and there is a
security association between SG1 and SG2. SG1 maps the IPSEC
selectors (source address, destination address, next protocol, source
port, destination port) into a security association that defines the
SPI, source and destination gateways, and how to process the packet.
It determines that this packet should go through the IPSEC ESP (or
AH) tunnel between SG1 and SG2. It then does the IPSEC processing
needed for the SG1/SG2 hop and sends the packet: it adds an
encapsulating IP header with SRC = SG1, DEST = SG2, an ESP header
before the data packet, and an ESP trailer after it. Now suppose R1
sends an ICMP PMTU to SG1. How does SG1 determine the PMTU selectors
to use to return the ICMP message to the originating host (H0)?
original        after IPSEC     ICMP
packet          processing      packet
--------        -----------     ------
                                IP-3 header (S = R1, D = SG1)
                                ICMP header (includes PMTU)
                IP-2 header     IP-2 header (S = SG1, D = SG2)
                ESP header      minimum of 64 bits of ESP hdr (*)
IP-1 header     IP-1 header
TCP header      TCP header
TCP data        TCP data
                ESP trailer
(*) The 64 bits will include enough of the ESP (or AH) header to
include the SPI.
- ESP -- SPI (32 bits), unknown (32 bits) -- could be
the optional Replay counter but one can't be sure.
- AH -- Next header (8 bits), Payload Len (8 bits),
Reserved (16 bits), SPI (32 bits)
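To make the layout above concrete, here is a minimal sketch of how a gateway might pull the Next-Hop MTU, the tunnel endpoints and the SPI out of an IPv4 ICMP PMTU that quotes only 64 bits beyond the IP header. The function name parse_icmp4_pmtu is hypothetical; checksum verification and IP options in the ICMP carrier packet are ignored for brevity.

```python
import ipaddress
import struct

PROTO_ESP, PROTO_AH = 50, 51  # IP protocol numbers for ESP and AH


def parse_icmp4_pmtu(icmp: bytes):
    """Return (next_hop_mtu, outer_src, outer_dst, spi) from an IPv4 ICMP PMTU.

    icmp starts at the ICMP header (the IP-3 header is already stripped).
    """
    icmp_type, code, _checksum, _zero, mtu = struct.unpack("!BBHHH", icmp[:8])
    if (icmp_type, code) != (3, 4):
        raise ValueError("not an ICMP PMTU message")
    quoted = icmp[8:]                    # IP-2 header plus at least 64 bits
    ihl = (quoted[0] & 0x0F) * 4         # IP-2 header length in bytes
    proto = quoted[9]
    src = ipaddress.IPv4Address(quoted[12:16])
    dst = ipaddress.IPv4Address(quoted[16:20])
    first64 = quoted[ihl:ihl + 8]
    if len(first64) < 8:
        raise ValueError("quoted packet is truncated")
    if proto == PROTO_ESP:
        spi = struct.unpack("!I", first64[0:4])[0]  # SPI is the first word
    elif proto == PROTO_AH:
        spi = struct.unpack("!I", first64[4:8])[0]  # SPI is the second word
    else:
        spi = None                                  # not an IPSEC packet
    return mtu, src, dst, spi
```

Note that nothing from IP-1 or the transport header is available here, which is exactly the limitation discussed next.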
This limitation on the amount of information returned with an ICMP
message creates a problem in identifying the originating hosts for
the packet (so as to know where to further propagate the ICMP PMTU
information). If the ICMP message contains only 64 bits of the IPSEC
header (minimum for IPv4), then the 5 original IPSEC selectors will
have been lost -- Source and Destination addresses, Next Protocol,
Source and Destination ports. But the ICMP error message will still
provide SG1 with the SPI, the PMTU information and the source and
destination gateways for the relevant security association.
The destination security gateway and SPI uniquely define a security
association which in turn defines a set of possible originating
hosts. At this point, SG1 could:
a. send the PMTU information to all the possible originating hosts.
This would not work well if the host list is a wild card or if
many/most of the hosts weren't sending to SG1; but it might work
if the SPI/destination/etc mapped to just one host.
b. store the PMTU with the SPI/etc and wait until the next packet(s)
arrive from the originating host(s) for the relevant security
association. If it/they are bigger than the PMTU, drop the
packet(s), and compose ICMP PMTU message(s) with the new
packet(s) and the updated PMTU, and send the originating host(s)
the ICMP message(s) about the problem. This involves a delay in
notifying the originating host(s), but avoids the problems of (a).
Since only the latter approach is feasible in all instances, a
security gateway MUST provide such support, as an option. However,
if the ICMP message contains more information from the original
packet, e.g., the 576 byte minimum for IPv6, then there MAY be enough
information to immediately determine to which host to propagate the
ICMP/PMTU message and to provide that system with a 5-selector
pointer for storing/updating the PMTU. Under such circumstances, a
security gateway MUST generate an ICMP PMTU message immediately upon
receipt of an ICMP PMTU from further down the path. NOTE: The Next
Protocol field might not be contained in the 576 bytes and the use of
ESP encryption might hide the selector fields that have been encrypted.
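Option (b) above can be sketched as a small state machine in the gateway. This is an illustration under stated assumptions, not a definitive design: the class and field names are hypothetical, the per-SA header overhead is passed in by the caller, and every inner packet is assumed to have DF set.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

SaKey = Tuple[str, int]  # (destination gateway, SPI) identifies the SA


@dataclass
class IcmpPmtuReply:
    to_host: str        # originating host that must lower its PMTU
    next_hop_mtu: int   # MTU the host may use for its own packets


class Gateway:
    def __init__(self) -> None:
        self.sa_pmtu: Dict[SaKey, int] = {}

    def on_icmp_pmtu(self, dst_gateway: str, spi: int, mtu: int) -> None:
        """Store the PMTU with the SA; no originating host is known yet."""
        key = (dst_gateway, spi)
        self.sa_pmtu[key] = min(mtu, self.sa_pmtu.get(key, mtu))

    def on_outbound(self, src_host: str, packet_len: int, dst_gateway: str,
                    spi: int, overhead: int) -> Optional[IcmpPmtuReply]:
        """Return None to forward, or the ICMP PMTU to send after dropping."""
        pmtu = self.sa_pmtu.get((dst_gateway, spi))
        if pmtu is None or packet_len + overhead <= pmtu:
            return None
        return IcmpPmtuReply(src_host, pmtu - overhead)


gw = Gateway()
gw.on_icmp_pmtu("SG2", 0x100, 1400)            # ICMP PMTU from R1
print(gw.on_outbound("H0", 1400, "SG2", 0x100, overhead=40))
```

The reply carries the stored PMTU reduced by the IPSEC overhead, so that the originating host's next packet fits once the gateway has added its headers.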
B. The calculation of PMTU from an ICMP PMTU has to take into account
the addition of any IPSEC header by H1 -- ESP or AH transport, or ESP
or AH tunnel. Within a single host, multiple applications may share
an SPI and nesting of security associations may occur. The diagram
below illustrates several possible combinations of security
associations between a pair of hosts (as viewed from the perspective
of one of the hosts.) (ESPt or AHt = tunnel mode; ESPx or AHx =
transport mode)
Socket 1 -----------------------------------------------           I
                                                       |           n
Socket 2 (ESPt/SPI-A) -------------------------------  |           t
                                                     \ |           e
Socket 3 (AHx/SPI-B, ESPt/SPI-C) --- AHx (SPI-D) --- ESPt (SPI-E)--r
                                                     /             n
Socket 4 (ESPx/SPI-F, ESPt/SPI-G) -- ESPx (SPI-H) ---              e
                                                                   t
In order to figure out the PMTU for each socket that maps to SPI-E,
it will be necessary to have backpointers from SPI-E to each of the 4
paths that lead to it -- Socket 1, SPI-A, SPI-D, and SPI-H.
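One way to picture the backpointers is a table keyed by the outermost SPI that lists, for each path, the inner security associations applied before it. The sketch below is one reading of the diagram, and the overhead figures are hypothetical placeholders rather than real header sizes.

```python
# Hypothetical header overhead, in bytes, added by each security association.
SA_OVERHEAD = {"SPI-A": 40, "SPI-D": 24, "SPI-H": 30, "SPI-E": 40}

# Backpointers from the outermost SA to the paths that lead to it; each
# path lists the inner SAs applied before the outermost one.
BACKPOINTERS = {
    "SPI-E": {
        "Socket 1": [],
        "Socket 2": ["SPI-A"],
        "Socket 3": ["SPI-D"],
        "Socket 4": ["SPI-H"],
    }
}


def socket_pmtus(outer_spi: str, icmp_pmtu: int) -> dict:
    """Return the PMTU each socket may use after an ICMP PMTU for outer_spi."""
    result = {}
    for socket, inner_chain in BACKPOINTERS[outer_spi].items():
        total = SA_OVERHEAD[outer_spi] + sum(SA_OVERHEAD[s] for s in inner_chain)
        result[socket] = icmp_pmtu - total
    return result


print(socket_pmtus("SPI-E", 1500))
```

Because an IPv4 ICMP PMTU may identify only the outermost SPI, the table has to be walked from SPI-E inward to find every socket whose PMTU changes.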
C. In hosts, the granularity with which PMTU ICMP processing can be done
differs depending on the implementation situation. Looking at a
host, there are 3 situations that are of interest with respect to
PMTU issues:
a. Integration of IPSEC into the native IP implementation
b. Bump-in-the-stack implementations, where IPSEC is implemented
"underneath" an existing implementation of a TCP/IP protocol
stack, between the native IP and the local network drivers
c. No IPSEC implementation -- This case is included because it is
relevant in cases where a security gateway is sending PMTU
information back to a host.
Only in case (a) can the PMTU data be maintained at the same
granularity as communication associations. In the other cases, the
IP layer will maintain PMTU data at the granularity of Source and
Destination IP addresses (and optionally ToS), as described in RFC
1191. This is an important difference, because more than one
communication association may map to the same source and destination
IP addresses, and each communication association may have a different
amount of IPSEC header overhead (e.g., due to use of different
transforms or different algorithms). The examples below illustrate
this.
In cases (a) and (b)... Suppose you have the following situation.
H1 is sending to H2 and the packet to be sent from R1 to R2 exceeds
the PMTU of the network hop between them.
        ==================================
        |                                |
       H1* --- R1 ----- R2 ---- R3 ---- H2*
                ^       |
                |.......|
If R1 is configured to not fragment subscriber traffic, then R1 sends
an ICMP PMTU message with the appropriate PMTU to H1. H1's
processing would vary with the nature of the implementation. In case
(a) (native IP), the security services are bound to sockets or the
equivalent. Here the IP/IPSEC implementation in H1 can store/update
the PMTU for the associated socket. In case (b), the IP layer in H1
can store/update the PMTU but only at the granularity of Source and
Destination addresses and possibly ToS, as noted above. So the
result may be sub-optimal, since the PMTU for a given SRC/DST/ToS
will be the path MTU less the largest amount of IPSEC header
overhead used by any communication association between a given
source and destination.
In case (c), there has to be a security gateway to have any IPSEC
processing. So suppose you have the following situation. H1 is
sending to H2 and the packet to be sent from SG1 to R exceeds the
PMTU of the network hop between them.
                ================
                |              |
       H1 ---- SG1* --- R --- SG2* ---- H2
                ^       |
                |.......|
As described above for case (b), the IP layer in H1 can store/update
the PMTU but only at the granularity of Source and Destination
addresses, and possibly ToS. So the result may be sub-optimal, since
the PMTU for a given SRC/DST/ToS will be the path MTU less the
largest amount of IPSEC header overhead used by any communication
association between a given source and destination.
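The cost of the coarser granularity can be shown with a little arithmetic. In the sketch below the association names and overhead values are hypothetical; only the comparison between the two functions matters.

```python
def per_association_pmtu(path_pmtu: int, overhead: dict) -> dict:
    """Case (a): each communication association gets its own PMTU."""
    return {assoc: path_pmtu - extra for assoc, extra in overhead.items()}


def per_address_pair_pmtu(path_pmtu: int, overhead: dict) -> int:
    """Cases (b) and (c): one PMTU per SRC/DST pair, sized for the worst case."""
    return path_pmtu - max(overhead.values())


# Hypothetical IPSEC header overhead, in bytes, for two associations
# that share the same source and destination addresses.
ASSOC_OVERHEAD = {"assoc-1 (AH transport)": 24, "assoc-2 (ESP tunnel)": 56}

print(per_association_pmtu(1500, ASSOC_OVERHEAD))
print(per_address_pair_pmtu(1500, ASSOC_OVERHEAD))
```

Under these figures the first association could send 1476-byte packets, but a host that tracks PMTU only per address pair limits it to 1444 bytes.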
D. Implementation of the calculation of PMTU (B) and support for PMTUs
at the granularity of individual "communication associations" (C) is
a local matter. However, a socket-based implementation of IPSEC in a
host SHOULD maintain the information on a per socket basis. Bump in
the stack systems MUST pass an ICMP PMTU to the host IP
implementation, after adjusting it for any IPSEC header overhead
added by these systems. The overhead SHOULD be determined by
analysis of the SPI and any other selector information present in a
returned ICMP PMTU message.
E. The host mechanism for getting the updated PMTU to the transport
layer is unchanged, as specified in RFC 1191 (Path MTU Discovery).
F. In all systems (host or gateway) implementing IPSEC and maintaining
PMTU information, the PMTU associated with a security association
(transport or tunnel) has to be "aged" and some mechanism put in
place for updating the PMTU in a timely manner, especially for
discovering if the PMTU is smaller than it needs to be. A given PMTU
has to remain in place long enough for a packet to get from the
source end of the security association to the system at the other end
of the security association and propagate back an ICMP error message
if the current PMTU is too big.
Systems SHOULD use the approach described in the Path MTU Discovery
document (RFC 1191, Section 6.3), which suggests periodically
resetting the PMTU to the first-hop data-link MTU and then letting
the normal PMTU Discovery processes update the PMTU as necessary.
The period SHOULD be configurable.
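A minimal sketch of such aging is shown below, assuming one PMTU value per security association. The class name AgedPmtu, the 600 second default and the injectable clock are choices made for the example, not requirements of this text.

```python
import time


class AgedPmtu:
    """PMTU for one security association, reset after a configurable period."""

    def __init__(self, first_hop_mtu: int, period: float = 600.0,
                 clock=time.monotonic) -> None:
        self.first_hop_mtu = first_hop_mtu
        self.period = period          # configurable aging period, in seconds
        self.clock = clock            # injectable so the logic can be tested
        self.value = first_hop_mtu
        self.reduced_at = None        # time of the last reduction, if any

    def on_icmp_pmtu(self, mtu: int) -> None:
        """Lower the PMTU when an ICMP PMTU reports a smaller value."""
        if mtu < self.value:
            self.value = mtu
            self.reduced_at = self.clock()

    def current(self) -> int:
        """Return the PMTU, resetting it to the first-hop MTU once aged."""
        if (self.reduced_at is not None
                and self.clock() - self.reduced_at >= self.period):
            self.value = self.first_hop_mtu
            self.reduced_at = None
        return self.value
```

After a reset, the next oversized packet simply triggers a fresh ICMP PMTU, so a path whose MTU has grown is rediscovered without any extra protocol machinery.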