-- BEGIN included message
- To: ipsec@ns.ncsa.com
- Subject: IPSec error monitoring
- From: Jesse Walker <jwalker@shiva.com>
- Date: Fri, 21 Aug 1998 15:09:14 -0400
- Organization: Shiva Corporation
The attached note discusses a topic that seems relevant to IPSec. I'd like to know whether people think this is worth pursuing further.

Jesse Walker
Shiva Corporation
28 Crosby Drive
Bedford, MA 01730-1437
voice: 781-687-1719
fax: 781-687-1828
internet: jwalker@shiva.com

1. Motivation

IPSec provides no mechanism to expeditiously release resources when the media carrying its security associations fails, or when one end of a set of security associations abruptly drops out of the dialog without notifying the peer. This problem is particularly common when one end of a security association is a remote access client, under the control of an unsophisticated user, who will as often as not switch off a laptop computer rather than follow "proper procedure" to disconnect. The discussion from a few months ago regarding what to do when one endpoint of a set of security associations crashes and then reboots provides another illustration of problems encountered due to this lack.

Another basic problem is that, to date, IPSec implementations have provided rather weak diagnostics, and the security features of the protocol suite magnify the difficulty in applying traditional network trouble-shooting mechanisms. When IPSec security associations experience problems, it is often very difficult to isolate and correct the root cause. IPSec diagnostics is still something of a black art.

To address problems like these, it might be useful to try to measure the quality of a path between two security association endpoints. One way we might implement these measurements is to mimic aspects of the PPP LQM protocol (RFC 1989). "LQM" packets could be sent over the "link" between the SA endpoints to establish the loss characteristics of the path between them.
The reported statistics (or their lack) could then be used for a number of purposes, such as providing a basis for some classes of quality of service decisions (when to shift some traffic from one tunnel to another), diagnostics (traffic works in only one direction), security alerts (the number of errors on one SA suddenly skyrockets), or keepalives (we failed to receive too many consecutive messages).

For want of better terminology, we will call the set of security associations between two endpoints a "path". Given this terminology, we will call the protocol the Path Quality Monitoring protocol, or PQM. The point of using "path" rather than the ancestral term "link" is to emphasize that we have to address the differences from the PPP environment. In the IPSec environment, packets can generally be reordered, and the available bandwidth can be much more dynamic, as routers inside the cloud between the path endpoints shift load and topology. However, like LQM, to permit flexible future policies, the PQM protocol should measure data loss in units of packets and octets. It should measure each security association separately, and communicate all measurements to both IKE peers, so that each end can implement its own policies governing inadequate quality.

Each PQM implementation should maintain a set of counters of packets and kilobytes transmitted and successfully received, and convey this information to its peer in a PQM message at regular intervals. By comparing the values reported in successive PQM messages, a receiver should be able to form a fairly accurate picture of a path's quality. The intent of the counters is to provide an indication of the dynamics of the information passing over the path between the PQM peers, not to measure the total bandwidth used.

One possible PQM design would encapsulate PQM messages directly under IPSec security associations.
Indeed, it seems almost self-evident that monitoring packets should be exposed to exactly the same environment as the "real" packets. This seems to imply that monitoring packets should be sent via the same security association as the "real" ones. At least three apparent problems exist with this approach, however.

First, the messages would not in general comply with the SPD for the security associations, because they must circulate between the security association end-points. When a security gateway is involved, datagrams addressed to or from the IP address of one of the path endpoints (the security gateway) usually will not be authorized to traverse any of the security associations used to transport "real" traffic. This implies that PQM may often require its own security association. We could construct a new pair of IPSec security associations just to accommodate this function. However, it is worth noting that an IKE security association typically already exists between the path endpoints. It might be better to exploit this one rather than negotiating yet one more.

Second, providing the "same" environment for PQM as for the operational packets is more problematic in the general IPSec environment, because of the lack of a physical link. It is in principle impossible to guarantee the same environment for even two successive packets sent via the same security association, as they may be routed differently and hence ultimately subjected to different filtering or fragmentation. This suggests that the inherited wisdom of using the "link" itself as the transport vehicle for a quality monitoring protocol is of less importance here than in the PPP case.

A final problem is that multiple pairs of security associations can exist between two IKE peers instead of a single half-duplex pair.
A PQM protocol based directly on the IPSec security associations would entail either running PQM messages through each of the pairs, thereby wasting bandwidth, or relying on the selection of an arbitrary security association to convey the information, or depending upon configuration, etc. Having to make any such choice at all again suggests this is not the right design.

If you buy this reasoning, something like PQM seems like a good addition to IPSec, and an IKE extension leaps out as a reasonable mechanism for supporting PQM. The remainder of this note is a speculative first attempt to define such an extension. It is not intended as the last word on the subject and without a doubt can be improved significantly by others. Its single greatest merit is concreteness, affording us an opportunity to pick over its bones until an acceptable protocol emerges from it.

2. Protocol data units

The PQM protocol will use IKE as its transport vehicle. PQM messages will constitute a new Informational Exchange type, which means they are half-duplex. To this end, we introduce a new IKE attribute, used to negotiate PQM as an IKE SA characteristic, and three new payloads, to convey the necessary statistics. This section describes these new facilities.

2.1. PQM attribute

This is a new attribute of an IKE SA, so it can be proposed during negotiations for the IKE SA. It offers a maximum PQM reporting period in seconds. When accepted, this is the base reporting period for PQM over the path. When not accepted or not present, PQM is not used.

The PQM attribute is identified by IKE attribute id TBD. Its value is a nonnegative 16-bit number, with a default value of TBD seconds. The PQM attribute value MUST be configurable. The value 0 is not meaningful in the protocol.

2.2. PQM Payload

The PQM Payload is used to convey PQM statistics to the IKE security association peer. Figure 1 shows the format of the PQM Payload.

                     1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Next Payload  |   RESERVED    |        Payload Length         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Number SPIs  |                   RESERVED                    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                        Sequence Number                        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                       Reporting Period                        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           PQMs Sent                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                         PQMs Received                         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                        Peer PQMs Sent                         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                      Peer PQMs Received                       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                     Figure 1. PQM Payload

The PQM Payload fields are defined as follows:

o Next Payload (1 octet) - Identifier for the payload type of the next payload in the message. If the current payload is the last in the message, then this field will be 0. This field MUST NOT contain the values for the Send Statistics or Receive Statistics Payloads, as they are considered part of the PQM information reported by the payload.

o RESERVED (1 octet) - Unused, set to 0.

o Payload Length (2 octets) - The length of the entire PQM payload, including Send Statistics and Receive Statistics payloads, in octets.

o Number SPIs (1 octet) - The combined number of Send Statistics and Receive Statistics payloads conveyed by this PQM payload. This MUST be an even number, as a PQM payload MUST always include reports for each of the paired security associations negotiated by a single Quick Mode proposal.

o RESERVED (3 octets) - Unused, set to 0.

o Sequence Number (4 octets) - The number of PQM messages sent thus far. This is a monotonically increasing number counting from zero. This allows the receiver to detect out-of-order PQM messages.

o Reporting Period (4 octets) - The maximum number of seconds before the local system expects to generate another PQM message. The local system MAY generate another PQM message sooner than this, but it MUST NOT generate the next PQM message later than what the Reporting Period field advertises.

o PQMs Sent (4 octets) - The number of PQM payloads the local system has sent so far over this path. Since the path consists of the current IKE SA as well as all the IPSec SAs between the path endpoints, this definition is general enough to accommodate roll-over, when one SA replaces another.

o PQMs Received (4 octets) - The number of PQM payloads the local system has received over this path.

o Peer PQMs Sent (4 octets) - The number of PQM payloads the path peer has reported to have sent thus far. This information allows the local implementation to monitor the efficacy of its own PQMs.

o Peer PQMs Received (4 octets) - The number of PQM payloads the path peer has reported to have received thus far. This information also allows the local implementation to determine the efficacy of its own PQMs.

The payload type for the PQM payload is TBD.

2.3. Send Statistics Payload

The Send Statistics Payload is used to convey PQM statistics to the path peer. These statistics indicate the level of traffic generated by the local system to the peer over a particular security association, as well as pertinent performance information the peer has reported for that security association. This information allows the peer to gauge the loss characteristics of the security association.

Figure 2 shows the format of the Send Statistics payload:

                     1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Next Payload  |   RESERVED    |        Payload Length         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                              SPI                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                         Packets Sent                          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                        Kilobytes Sent                         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                  Last Peer Received Packets                   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                 Last Peer Received Kilobytes                  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Last Peer Packet Errors                    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Last Peer Packet Drops                     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                Figure 2. Send Statistics Payload

The Send Statistics Payload fields are defined as follows:

o Next Payload (1 octet) - Identifier for the payload type of the next payload in the message. This MUST only be 0, indicating the end of the PQM payload, TBD, indicating the next Send Statistics Payload, or TBD, indicating the next Receive Statistics payload.

o RESERVED (1 octet) - Unused, set to 0.

o Payload Length (2 octets) - The length of the Send Statistics payload in octets. This always has the value 32.

o SPI (4 octets) - The IPSec SPI of the security association used to transmit packets to the peer. The IP address of the PQM receiver and the SPI uniquely identify the security association. All the counters in the payload refer to this security association.

   Remark: Obviously the local implementation knows the IP address of the peer, since it negotiated a security association with it. Therefore we don't include it in the Send Statistics. This usage means that several paths can exist between multi-homed hosts, and that we will monitor each separately.
   This is an intended consequence of the usage.

o Packets Sent (4 octets) - The number of packets the local system has transmitted over the indicated security association.

o Kilobytes Sent (4 octets) - The number of kilobytes of data the local system has transmitted over the indicated security association. This value is computed prior to encapsulation.

   Issue: we are trying to avoid 64 bit counters. Clearly we have not accomplished this, because most media deployed today use packets of at least 1.5 K bytes and the counters allow up to 2^32 packets. Does it really matter if the counters wrap and so give only relative measurements?

o Last Peer Received Packets (4 octets) - The last value the peer has reported for the number of received packets for the indicated security association.

o Last Peer Received Kilobytes (4 octets) - The last value the peer has reported for the number of kilobytes successfully decapsulated for the indicated security association.

o Last Peer Packet Errors (4 octets) - The last value the peer has reported for the number of packets for the indicated security association discarded because of errors. Errors include such things as replay, out-of-window detection, digest error, decryption error, padding error, and the like.

o Last Peer Packet Drops (4 octets) - The last value the peer reported for the number of packets for the indicated security association dropped for reasons other than errors (e.g., lack of resources).

The payload type for the Send Statistics payload is TBD.

2.4. Receive Statistics Payload

The Receive Statistics Payload is used to convey PQM statistics to the IKE security association peer. These statistics indicate the level of traffic received by the local system from the peer over a particular security association, as well as error statistics pertinent to the operation of that security association. This information allows the peer to gauge the loss characteristics of the security association.

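As a rough illustration of how a peer might gauge loss from these counters, the Python sketch below compares two successive readings of a wrapping 4-octet counter. The function names and the use of modulo-2^32 subtraction are assumptions made for this example; the note itself leaves the wrap question open and does not prescribe any calculation.

```python
COUNTER_MOD = 2 ** 32  # every PQM counter is 4 octets wide


def counter_delta(previous, current):
    """Change between two readings of a 32-bit counter that may have wrapped once."""
    return (current - previous) % COUNTER_MOD


def packet_loss_ratio(sent_prev, sent_now, recv_prev, recv_now):
    """Estimate the fraction of packets lost on one SA between two reports.

    sent_* are local Packets Sent readings; recv_* are the values the peer
    reported as received for the same SA (hypothetical pairing of samples).
    Returns None when nothing was sent during the interval.
    """
    sent = counter_delta(sent_prev, sent_now)
    received = counter_delta(recv_prev, recv_now)
    if sent == 0:
        return None
    # Samples are taken at different instants, so the peer may briefly
    # appear to have received more than was sent; clamp at zero.
    lost = max(sent - received, 0)
    return lost / sent
```

Because only differences between readings are used, a single wrap between two reports does not disturb the estimate, which suggests relative measurements may be good enough for this purpose. The figure is approximate in any case, since the two ends sample their counters at different times.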
Figure 3 shows the format of a Receive Statistics Payload:

                     1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Next Payload  |   RESERVED    |        Payload Length         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                              SPI                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                       Packets Received                        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                      Kilobytes Received                       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                         Packet Errors                         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                         Packet Drops                          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

               Figure 3. Receive Statistics Payload

The Receive Statistics Payload fields are defined as follows:

o Next Payload (1 octet) - Identifier for the payload type of the next payload in the message. This MUST only be 0, indicating the end of the PQM payload, TBD, indicating the next Send Statistics Payload, or TBD, indicating the next Receive Statistics payload.

o RESERVED (1 octet) - Unused, set to 0.

o Payload Length (2 octets) - The length of the Receive Statistics payload in octets. This always has the value 24.

o SPI (4 octets) - The IPSec SPI of the security association used to demultiplex packets received from the peer. The IP address of the sender and this SPI uniquely identify the security association. All the counters in the payload refer to this security association.

   Remark: See the corresponding item in section 2.3.

o Packets Received (4 octets) - The number of packets successfully decapsulated for the indicated security association.

o Kilobytes Received (4 octets) - The number of kilobytes successfully decapsulated for the indicated security association.

   Issue: see the corresponding item in section 2.3.

o Packet Errors (4 octets) - The number of packets for the indicated security association discarded because of errors.
Errors include such things as replay, out-of-window detection, digest error, decryption error, padding error, and the like.

o Packet Drops (4 octets) - The number of packets for the indicated security association dropped for reasons other than errors (e.g., lack of resources).

The payload type for the Receive Statistics payload is TBD.

3. Protocol

PQM has two components. First it must be negotiated. Then the protocol must be implemented.

3.1. PQM negotiations

If an implementation desires to use PQM, it MUST negotiate this during the negotiation of the IKE SA. No support for PQM exists for manually configured IPSec security associations. Use of PQM MUST be optional, as its operation may be counterproductive in many environments, e.g., dial-on-demand.

An IKE implementation accomplishes the negotiation by including the PQM attribute with a value in its IKE SA proposal. The peer IKE implementation MAY accept such a proposal in the usual way. A PQM attribute value exceeding the life of the IKE SA may be acceptable, as the value only gives the maximum delay between subsequent reports.

In the case where a proposal conveying the PQM attribute is accepted, both peer IKE implementations agree to

a. keep an IKE security association established as long as any IPSec security associations negotiated by the IKE security association remain. This means they will negotiate to extend the IKE session before expiry as long as an IPSec session negotiated by the IKE session exists.

b. run the PQM protocol with the peer over IKE, with subsequent PQM packets being sent at least as often as the negotiated reporting period.

If an implementation wishes to stop using PQM, it can renegotiate the IKE session and omit PQM from its proposal.

3.2. Implementation

Each IPSec security association will be associated with the IKE session that negotiated it, and an IKE session with a path. To perform PQM, an implementation MUST remember these associations.

A PQM message is a new species of IKE Informational exchange.
It consists of

a. an IKE header;

b. a Hash Payload, to guarantee liveness and to provide data integrity of the message. The Hash payload must immediately follow the IKE header;

c. a PQM Payload (PQM), which includes

d. one Send Statistics and one Receive Statistics payload for each pair of IPSec security associations associated with the IKE session. Each of these pairs is considered part of the PQM payload.

Note that if the implementation has not negotiated any IPSec security associations, the PQM Payload will include no Send and Receive Statistics Payloads.

Because a PQM message is an Informational Exchange, the Hash Payload is the output of the IKE SA pseudo-random function, using the IKE SA SKEYID_a as the key, applied to a unique message id (M-ID) concatenated with the entire PQM payload:

   HASH = prf(SKEYID_a, M-ID | PQM)

and the Hash, PQM Payload, and Send and Receive Statistics Payloads are CBC encrypted under SKEYID_e for the IKE session. Since PQM is an Informational Exchange, each such message is considered a new exchange rather than a continuation of an old one, so the CBC encryption mode initialization vector is computed from scratch each time by using the negotiated pseudorandom function, the hash of the last phase 1 CBC output block, and the randomly selected message id M-ID, which is unique to this message.

Thus, in the notation from the IKE specification, the PQM message may be denoted by something like

   Hdr* HASH PQM

The PQM, Send, and Receive Statistics payloads MUST NOT be sent during IKE phase 1; they may only be sent during IKE phase 2, after the IKE SA has been established. If PQM has been negotiated, an IKE implementation SHOULD begin sending these immediately after entering phase 2; it MAY defer the first message until after completing quick mode negotiations.

An IKE implementation transmits at least one PQM message during each Reporting Period; it MAY transmit a PQM message before the reporting period lapses, however. The PQM peers transmit the PQM messages independently.
However, like its PPP LQM counterpart, an IKE implementation also transmits a PQM message if it receives two consecutive PQM messages from its peer with the same Peer Receive parameter values. This can indicate that one of its own prior PQM messages failed to be delivered.

This document does not specify the uses to which the PQM data may be applied, as this is an implementation matter.
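
To make the payload layouts of section 2 concrete, the following Python sketch assembles the PQM Payload together with its trailing Send and Receive Statistics payloads, before the Hash Payload is added and the whole is encrypted. It is only an illustration under stated assumptions: the function name, argument order, and tuple shapes are invented here, and because the payload type values are still TBD they are passed in by the caller rather than fixed.

```python
import struct

# Network byte order throughout; sizes follow Figures 1-3.
PQM_FMT = "!BBHB3x6I"   # generic header, Number SPIs, 3 reserved octets, 6 words
SEND_FMT = "!BBH7I"     # generic header, SPI, 6 counters -> 32 octets
RECV_FMT = "!BBH5I"     # generic header, SPI, 4 counters -> 24 octets


def pack_pqm(next_payload, seq, period, pqm_counts, sa_pairs,
             send_type, recv_type):
    """Build a PQM Payload plus its statistics payloads (hypothetical helper).

    pqm_counts: (PQMs Sent, PQMs Received, Peer PQMs Sent, Peer PQMs Received)
    sa_pairs:   list of (send_spi, send_counters, recv_spi, recv_counters),
                with 6 send counters and 4 receive counters per Figures 2 and 3
    send_type, recv_type: payload type values, still TBD in the proposal
    """
    stats = b""
    for index, (s_spi, s_ctrs, r_spi, r_ctrs) in enumerate(sa_pairs):
        is_last = index == len(sa_pairs) - 1
        # A Send Statistics payload is always followed by its Receive partner.
        stats += struct.pack(SEND_FMT, recv_type, 0,
                             struct.calcsize(SEND_FMT), s_spi, *s_ctrs)
        # Next Payload 0 marks the end of the PQM payload.
        stats += struct.pack(RECV_FMT, 0 if is_last else send_type, 0,
                             struct.calcsize(RECV_FMT), r_spi, *r_ctrs)
    # Payload Length covers the PQM fields and all statistics payloads.
    length = struct.calcsize(PQM_FMT) + len(stats)
    head = struct.pack(PQM_FMT, next_payload, 0, length,
                       2 * len(sa_pairs), seq, period, *pqm_counts)
    return head + stats
```

With one SA pair the result is 32 + 32 + 24 = 88 octets, and Number SPIs is 2, consistent with the requirement that it be even. With no IPSec SAs negotiated, only the 32-octet PQM Payload is produced.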
-- END included message