
[Fwd: IPSec error monitoring]




-- BEGIN included message

The attached note discusses a topic that seems relevant to IPSec. I'd
like to know whether people think this is worth pursuing further.

Jesse Walker
Shiva Corporation
28 Crosby Drive
Bedford, MA  01730-1437
voice: 781-687-1719
fax: 781-687-1828
internet: jwalker@shiva.com

1. Motivation

IPSec provides no mechanism to expeditiously release resources when
the media carrying its security associations fails, or when one end of
a set of security associations abruptly drops out of the dialog without
notifying the peer. This problem is particularly common when one end
of a security association is a remote access client, under the control
of an unsophisticated user, who will as often as not switch off a
laptop computer rather than follow "proper procedure" to disconnect.
The discussion from a few months ago regarding what to do when one
endpoint of a set of security associations crashes and then reboots
provides another illustration of problems encountered due to this
lack.

Another basic problem is that, to date, IPSec implementations have
provided rather weak diagnostics, and the security features of the
protocol suite magnify the difficulty in applying traditional network
trouble-shooting mechanisms. When IPSec security associations
experience problems, it is often very difficult to isolate and correct
the root cause. IPSec diagnostics is still something of a black art.

To address problems like these, it might be useful to try to measure
the quality of a path between two security association endpoints. One
way we might implement these measurements is to mimic aspects of the
PPP LQM protocol (RFC 1989). "LQM" packets could be sent over the
"link" between the SA endpoints to establish the loss characteristics
of the path between them. The reported statistics (or their lack)
could then be used for a number of purposes, such as providing a basis
for some classes of quality of service decisions (when to shift some
traffic from one tunnel to another), diagnostics (traffic works in
only one direction), security alerts (the number of errors on one SA
suddenly skyrockets), or keepalives (we failed to receive too many
consecutive messages).

For want of better terminology, we will call the set of security
associations between two endpoints a "path". Given this terminology,
we will call the protocol the Path Quality Monitoring protocol, or
PQM. The point of using "path" rather than the ancestral term "link"
is to emphasize that we have to address the differences from the PPP
environment. In the IPSec environment, packets can generally be
reordered, and the bandwidth available can be much more dynamic, as
routers inside the cloud between the path endpoints shift load and
topology. However, like LQM, to effect flexible future policies, the
PQM protocol should measure data loss in units of packets and
octets. It should measure each security association separately, and
communicate all measurements to both IKE peers, so that each end can
implement its own policies governing inadequate quality.

Each PQM implementation should maintain a set of counters of packets
and kilobytes transmitted and successfully received, and convey this
information to its peer in a PQM message at regular intervals. By
comparing the values reported in successive PQM messages, a receiver
should be able to form a fairly accurate picture of a path's
quality. The intent of the counters is to provide an indication of the
dynamics of the information passing over the path between the PQM
peers, not to measure the total bandwidth used.

One possible PQM design would encapsulate PQM messages directly under
IPSec security associations. Indeed, it seems almost self-evident that
monitoring packets should be exposed to exactly the same environment
as the "real" packets. This seems to imply that monitoring packets
should be sent via the same security association as the "real" ones.

At least three apparent problems exist with this approach, however.

First, the messages would not in general comply with the SPD for the
security associations, because they must circulate between the
security association end-points. When a security gateway is involved,
datagrams addressed to or from the IP address of one of the path
endpoints (the security gateway) usually will not be authorized to
traverse any of the security associations used to transport "real"
traffic. This implies that PQM may often require its own security
association. We could construct a new pair of IPSec security
associations just to accommodate this function. However, it is worth
noting that an IKE security association typically already exists
between the path endpoints. It might be better to exploit this one
rather than negotiating yet one more.

Second, providing the "same" environment for PQM as for the
operational packets is more problematic in the general IPSec
environment, because of the lack of a physical link. It is in
principle impossible to guarantee the same environment for even two
successive packets sent via the same security association, as they may
be routed differently and hence ultimately subjected to different
filtering or fragmentation. This suggests that the inherited wisdom of
using the "link" itself as the transport vehicle for a quality
monitoring protocol is of less importance here than in the PPP case.

A final problem is that multiple pairs of security associations can
exist between two IKE peers instead of a single half-duplex pair. A
PQM protocol based directly on the IPSec security associations would
entail either running PQM messages through each of the pairs, thereby
wasting bandwidth, or relying on the selection of an arbitrary
security association to convey the information, or depending upon
configuration, etc. Having to make any choice at all again suggests it
is not the right design choice.

If you buy this reasoning, something like PQM seems like a good
addition to IPSec, and an IKE extension leaps out as a reasonable
mechanism for supporting PQM. The remainder of this note is a
speculative first attempt to define such an extension. It is not
intended as the last word on the subject and without a doubt can be
improved significantly by others. Its single greatest merit is
concreteness, affording us an opportunity to pick over its bones until
an acceptable protocol emerges from it.

2. Protocol data units

The PQM protocol will use IKE as its transport vehicle. PQM messages
will constitute a new Informational Exchange type, which means they
are half-duplex. To this end, we introduce a new IKE attribute, used
to negotiate PQM as an IKE SA characteristic, and three new payloads,
to convey the necessary statistics. This section describes these new
facilities.

2.1. PQM attribute

This is a new attribute of an IKE SA, so it can be proposed during
negotiations for the IKE SA. It offers a maximum PQM reporting period
in seconds. When accepted, this is the base reporting period for PQM
over the path. When not accepted or not present, PQM is not used.

The PQM attribute is identified by IKE attribute id TBD. Its value is
a nonnegative 16-bit number, with a default value of TBD
seconds. The PQM attribute value MUST be configurable. The value 0 is
not meaningful in the protocol.
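
For illustration only, the attribute could be carried in the
fixed-length (type/value) form that ISAKMP uses for 16-bit attribute
values, in which the high bit of the type field is set. The attribute
type below is an arbitrary placeholder, since the real id is TBD, and
rejecting 0 is one possible reading of "not meaningful".

```python
import struct

PQM_ATTRIBUTE_TYPE = 16384  # placeholder only: the real attribute id is TBD


def encode_pqm_attribute(period_seconds: int) -> bytes:
    """Encode the reporting period as a 4-octet type/value attribute."""
    if not 1 <= period_seconds <= 0xFFFF:
        raise ValueError("reporting period must be nonzero and fit in 16 bits")
    # The high bit of the first 16-bit word marks the fixed-length form.
    return struct.pack("!HH", 0x8000 | PQM_ATTRIBUTE_TYPE, period_seconds)
```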

2.2. PQM Payload

The PQM Payload is used to convey PQM statistics to the IKE security
association peer. Figure 1 shows the format of the PQM Payload.

                           1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    | Next Payload  |   RESERVED    |         Payload Length        |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |  Number SPIs  |                    RESERVED                   |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                       Sequence Number                         |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                       Reporting Period                        |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                          PQMs Sent                            |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                        PQMs Received                          |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                       Peer PQMs Sent                          |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                     Peer PQMs Received                        |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                        Figure 1.  PQM Payload


The PQM Payload fields are defined as follows:

 o  Next Payload (1 octet) - Identifier for the payload type of the
    next payload in the message. If the current payload is the last in
    the message, then this field will be 0. This field MUST NOT contain
    the values for the Send Statistics or Receive Statistics Payloads,
    as they are considered part of the PQM information reported by
    the payload.

 o  RESERVED (1 octet) - Unused, set to 0

 o  Payload Length (2 octets) - The length of the entire PQM payload,
    including Send Statistics and Receive Statistics payloads, in octets.

 o  Number SPIs (1 octet) - The combined number of Send Statistics and
    Receive Statistics payloads conveyed by this PQM payload. This MUST
    be an even number, as a PQM payload MUST always include reports for
    each of the paired security associations negotiated by a single
    Quick Mode proposal.

 o  RESERVED (3 octets) - Unused, set to 0

 o  Sequence Number (4 octets) - the number of PQM messages sent thus
    far. This is a monotonically increasing number counting from zero.
    This allows the receiver to detect out-of-order PQM messages.

 o  Reporting Period (4 octets) - the maximum number of seconds before
    the local system expects to generate another PQM message. The local
    system MAY generate another PQM message sooner than this, but it
    MUST NOT generate the next PQM message later than what the Reporting
    Period field advertises.

 o  PQMs Sent (4 octets) - The number of PQM payloads the local system
    has sent so far over this path. Since the path consists of the
    current IKE SA as well as all the IPSec SAs between the path
    endpoints, this definition is general enough to accommodate
    rollover, when one SA replaces another.

 o  PQMs Received (4 octets) - The number of PQM payloads the local
    system has received over this path.

 o  Peer PQMs Sent (4 octets) - The number of PQM payloads the path
    peer has reported to have sent thus far. This information allows
    the local implementation to monitor the efficacy of its own PQMs.

 o  Peer PQMs Received (4 octets) - The number of PQM payloads the
    path peer has reported to have received thus far. This
    information also allows the local implementation to determine the
    efficacy of its own PQMs.

The payload type for the PQM payload is TBD.
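
The layout in Figure 1 can be sketched with Python's struct module as
follows. This is only an illustration of the field widths: the helper
names are hypothetical, and the Payload Length calculation assumes one
32-octet Send Statistics payload and one 24-octet Receive Statistics
payload per pair of security associations, as sections 2.3 and 2.4
describe.

```python
import struct

# Next Payload, RESERVED, Payload Length, Number SPIs, 3 x RESERVED,
# then six 32-bit fields, all in network byte order (32 octets total).
PQM_FIXED = struct.Struct("!BxHB3xIIIIII")
SEND_STATS_LEN = 32     # fixed length given in section 2.3
RECEIVE_STATS_LEN = 24  # fixed length given in section 2.4


def pack_pqm_fixed(next_payload, sa_pairs, sequence, period,
                   sent, received, peer_sent, peer_received):
    """Build the fixed part of a PQM payload for 'sa_pairs' SA pairs."""
    number_spis = 2 * sa_pairs  # one Send and one Receive payload per pair
    if number_spis > 0xFF:
        raise ValueError("too many SA pairs for a 1-octet Number SPIs field")
    length = PQM_FIXED.size + sa_pairs * (SEND_STATS_LEN + RECEIVE_STATS_LEN)
    return PQM_FIXED.pack(next_payload, length, number_spis, sequence,
                          period, sent, received, peer_sent, peer_received)


def unpack_pqm_fixed(data):
    """Return the nine fields of the fixed part, checking Number SPIs."""
    fields = PQM_FIXED.unpack_from(data)
    if fields[2] % 2:
        raise ValueError("Number SPIs MUST be even")
    return fields
```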

2.3. Send Statistics Payload

The Send Statistics Payload is used to convey PQM statistics to the
path peer. These statistics indicate the level of traffic generated by
the local system to the peer over a particular security association,
as well as pertinent performance information the peer has reported for
that security association. This information allows the peer to gauge
the loss characteristics of the security association. Figure 2 shows
the format of the Send Statistics payload:

                           1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    | Next Payload  |   RESERVED    |         Payload Length        |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                              SPI                              |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                          Packets Sent                         |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                         Kilobytes Sent                        |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                   Last Peer Received Packets                  |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                  Last Peer Received Kilobytes                 |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                     Last Peer Packet Errors                   |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                      Last Peer Packet Drops                   |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                   Figure 2.  Send Statistics Payload


The Send Statistics Payload fields are defined as follows:

 o  Next Payload (1 octet) - Identifier for the payload type of the
    next payload in the message. This MUST only be 0, indicating the end
    of the PQM payload, TBD, indicating the next Send Statistics
    Payload, or TBD, indicating the next Receive Statistics payload.

 o  RESERVED (1 octet) - Unused, set to 0

 o  Payload Length (2 octets) - The length of the Send Statistics payload
    in octets. This always has the value 32.

 o  SPI (4 octets) - The IPSec SPI of the security association used to
    transmit packets to the peer. The IP address of the PQM receiver
    and the SPI uniquely identify the security association. All the
    counters in the payload refer to this security association.

      Remark: Obviously the local implementation knows the IP address
      of the peer, since it negotiated a security association with it.
      Therefore we don't include it in the Send Statistics. This usage
      means that several paths can exist between multi-homed hosts,
      and that we will monitor each separately. This is an intended
      consequence of the usage.

 o  Packets Sent (4 octets) - The number of packets the local system has
    transmitted over the indicated security association.

 o  Kilobytes Sent (4 octets) - The number of kilobytes of data the local
    system has transmitted over the indicated security association.
    This value is computed prior to encapsulation.

      Issue: we are trying to avoid 64 bit counters. Clearly we have not
      accomplished this, because most media deployed today use packets
      of at least 1.5 K bytes and the counters allow up to 2^32
      packets. Does it really matter if the counters wrap and so give
      only relative measurements?

 o  Last Peer Received Packets (4 octets) - The last value the peer has
    reported for the number of received packets for the indicated
    security association.

 o  Last Peer Received Kilobytes (4 octets) - The last value the peer has
    reported for the number of kilobytes successfully decapsulated for
    the indicated security association.

 o  Last Peer Packet Errors (4 octets) - The last value the peer has
    reported for the number of packets for the indicated security
    association discarded because of errors. Errors include such
    things as replay, out-of-window detection, digest error,
    decryption error, padding error, and the like.

 o  Last Peer Packet Drops (4 octets) - The last value the peer reported
    for the number of packets for the indicated security association
    dropped for reasons other than errors (e.g., lack of resources).

The payload type for the Send Statistics payload is TBD.
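
A Send Statistics payload as drawn in Figure 2 might be serialized as
in the sketch below. The helper names are hypothetical. The note does
not define a kilobyte, so the sketch assumes 1024 octets, rounded down,
and lets both counters wrap at 2^32, in line with the open issue above.

```python
import struct

# Next Payload, RESERVED, Payload Length, SPI, then six 32-bit counters.
SEND_STATS = struct.Struct("!BxHIIIIIII")
MOD32 = 1 << 32


def kilobytes(octets: int) -> int:
    """Assumption: 1 kilobyte = 1024 octets, rounded down, wrapping."""
    return (octets // 1024) % MOD32


def pack_send_stats(next_payload, spi, packets_sent, octets_sent,
                    peer_rx_packets, peer_rx_kilobytes,
                    peer_errors, peer_drops):
    """Serialize one Send Statistics payload (always 32 octets)."""
    return SEND_STATS.pack(next_payload, SEND_STATS.size, spi,
                           packets_sent % MOD32, kilobytes(octets_sent),
                           peer_rx_packets, peer_rx_kilobytes,
                           peer_errors, peer_drops)
```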


2.4. Receive Statistics Payload

The Receive Statistics Payload is used to convey PQM statistics to
the IKE security association peer. These statistics indicate the level of
traffic received by the local system from the peer over a particular
security association, as well as error statistics pertinent to the
operation of that security association. This information allows
the peer to gauge the loss characteristics of the security association.
Figure 3 shows the format of a Receive Statistics Payload:

                           1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    | Next Payload  |   RESERVED    |         Payload Length        |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                              SPI                              |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                        Packets Received                       |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                       Kilobytes Received                      |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                         Packet Errors                         |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                          Packet Drops                         |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

                  Figure 3.  Receive Statistics Payload


The Receive Statistics Payload fields are defined as follows:

 o  Next Payload (1 octet) - Identifier for the payload type of the
    next payload in the message. This MUST only be 0, indicating the end
    of the PQM payload, TBD, indicating the next Send Statistics
    Payload, or TBD, indicating the next Receive Statistics payload.

 o  RESERVED (1 octet) - Unused, set to 0

 o  Payload Length (2 octets) - The length of the Receive Statistics
    payload in octets. This always has the value 24.

 o  SPI (4 octets) - The IPSec SPI of the security association used to
    demultiplex packets received from the peer. The IP address of the
    sender and this SPI uniquely identify the security association.
    All the counters in the payload refer to this security association.

      Remark: See the corresponding item in section 2.3.

 o  Packets Received (4 octets) - The number of packets successfully
    decapsulated for the indicated security association.

 o  Kilobytes Received (4 octets) - The number of kilobytes successfully
    decapsulated for the indicated security association.

      Issue: see the corresponding item in section 2.3.

 o  Packet Errors (4 octets) - The number of packets for the indicated
    security association discarded because of errors. Errors include such
    things as replay, out-of-window detection, digest error,
    decryption error, padding error, and the like.

 o  Packet Drops (4 octets) - The number of packets for the indicated
    security association dropped for reasons other than errors (e.g.,
    lack of resources).

The payload type for the Receive Statistics payload is TBD.
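
As a sketch of how a receiver might split the statistics that follow
the fixed part of a PQM payload: because the payload type values are
still TBD, this illustration does not inspect Next Payload at all.
Instead it relies on the Number SPIs count and on the two fixed lengths
(32 and 24 octets) to tell the payloads apart. That shortcut, and the
function name, are assumptions of the sketch rather than part of the
proposal.

```python
import struct

GENERIC_HEADER = struct.Struct("!BxH")        # Next Payload, RESERVED, Length
SEND_STATS = struct.Struct("!BxHIIIIIII")     # 32 octets (Figure 2)
RECEIVE_STATS = struct.Struct("!BxHIIIII")    # 24 octets (Figure 3)


def split_statistics(data: bytes, number_spis: int):
    """Return a list of (kind, fields) for each statistics payload."""
    parsed, offset = [], 0
    for _ in range(number_spis):
        _next_type, length = GENERIC_HEADER.unpack_from(data, offset)
        if length == SEND_STATS.size:
            kind, layout = "send", SEND_STATS
        elif length == RECEIVE_STATS.size:
            kind, layout = "receive", RECEIVE_STATS
        else:
            raise ValueError("unexpected statistics payload length")
        fields = layout.unpack_from(data, offset)
        parsed.append((kind, fields[2:]))  # drop Next Payload and Length
        offset += length
    if offset != len(data):
        raise ValueError("octets left over after the last statistics payload")
    return parsed
```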

3. Protocol

PQM has two components. First it must be negotiated. Then the protocol
must be implemented.

3.1. PQM negotiations

If an implementation desires to use PQM, it MUST negotiate this during
the negotiation of the IKE SA. No support for PQM exists for
manually configured IPSec security associations. Use of PQM MUST be
optional, as its operation may be counterproductive in many
environments, e.g., dial-on-demand.

An IKE implementation proposes PQM by including the PQM attribute
with a value in its IKE SA proposal. The peer IKE implementation MAY
accept such a proposal in the usual way. A PQM attribute value
exceeding the life of the IKE SA may be acceptable, as the value only
gives the maximum delay between subsequent reports.

In the case where a proposal conveying the PQM attribute is accepted,
both peer IKE implementations agree to

a. keep an IKE security association established as long as any IPSec
security associations negotiated by the IKE security association
remain. This means they will negotiate to extend the IKE session
before expiry as long as an IPSec session negotiated by the IKE session
exists.

b. run the PQM protocol with the peer over IKE, with subsequent PQM
packets being sent at least as often as the negotiated reporting period.

If an implementation wishes to stop using PQM, it can renegotiate the
IKE session and omit PQM from its proposal.

3.2. Implementation

Each IPSec security association will be associated with the IKE
session that negotiated it, and an IKE session with a path. To perform
PQM, an implementation MUST remember these associations.

A PQM message is a new species of IKE Informational exchange. It
consists of

a. An IKE header;

b. a Hash Payload, to guarantee liveness and to provide data integrity
of the message. The Hash payload must immediately follow the IKE
header;

c. a PQM Payload (PQM), which includes

d. one Send and Receive Statistics payload representing each pair of
IPSec security associations associated with the IKE session. Each of
these pairs is considered part of the PQM payload. Note that if the
implementation has not negotiated any IPSec security associations, the
PQM Payload will include no Send and Receive Payloads.

Because a PQM message is an Informational Exchange, the Hash Payload
carries the output of the IKE SA pseudo-random function, keyed with the
IKE SA SKEYID_a, over a unique message id (M-ID) concatenated with the
entire PQM payload:

		HASH = prf(SKEYID_a, M-ID | PQM)

and the Hash, PQM Payload, and Send and Receive Statistics Payloads
are CBC encrypted under SKEYID_e for the IKE session.
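
The HASH computation above can be sketched as follows. HMAC-SHA1 stands
in for the negotiated prf purely as an assumption for the example, the
M-ID is taken to be the 4-octet message id from the IKE header, and the
function names are hypothetical.

```python
import hashlib
import hmac
import struct


def pqm_hash(skeyid_a: bytes, message_id: int, pqm_payload: bytes) -> bytes:
    """HASH = prf(SKEYID_a, M-ID | PQM), with HMAC-SHA1 standing in for prf."""
    data = struct.pack("!I", message_id) + pqm_payload
    return hmac.new(skeyid_a, data, hashlib.sha1).digest()


def verify_pqm_hash(skeyid_a: bytes, message_id: int,
                    pqm_payload: bytes, received_hash: bytes) -> bool:
    """Recompute HASH and compare it in constant time."""
    expected = pqm_hash(skeyid_a, message_id, pqm_payload)
    return hmac.compare_digest(expected, received_hash)
```

Here pqm_payload is meant to be the whole PQM payload, including any
Send and Receive Statistics payloads it carries.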

Since PQM is an Informational Exchange, each such message is
considered as a new exchange rather than a continuation of an old, so
the CBC encryption mode initialization vector is computed from scratch
each time by using the negotiated pseudorandom function, the hash of
the last phase 1 CBC output block, and the randomly selected message id
M-ID which is unique to this message.
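
One reading of this derivation, sketched below, hashes the last phase 1
CBC output block concatenated with the M-ID and truncates the result to
the cipher block size. SHA-1 and an 8-octet block (as for DES or 3DES)
are assumptions made for the example, as is the function name; the
negotiated algorithms would take their place in practice.

```python
import hashlib
import struct


def informational_iv(last_phase1_block: bytes, message_id: int,
                     block_size: int = 8) -> bytes:
    """Derive a fresh CBC IV for one Informational Exchange message."""
    material = last_phase1_block + struct.pack("!I", message_id)
    digest = hashlib.sha1(material).digest()  # assumption: SHA-1 negotiated
    return digest[:block_size]  # truncate to the cipher's block size
```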

Thus, in the notation from the IKE specification, the PQM message may
be denoted by something like

		Hdr* HASH PQM

The PQM, Send, and Receive Statistics payloads MUST NOT be sent during
IKE phase 1; they may only be sent during IKE phase 2, after the IKE
SA has been established. If PQM has been negotiated, an IKE
implementation SHOULD begin sending these immediately after entering
phase 2; it MAY defer the first message until after completing quick
mode negotiations.

An IKE implementation transmits at least one PQM message during each
Reporting Period; it MAY transmit a PQM message before the reporting
period lapses, however.

The PQM peers transmit the PQM messages independently. However, like
its PPP LQM counterpart, an IKE implementation also transmits a PQM
message if it receives two consecutive PQM messages from its peer with
the same Peer Receive parameter values. This can indicate that one of
its own prior PQM messages failed to be delivered.
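
The two transmission triggers described above (the reporting period
timer and the repeated-value rule) could be tracked with a small amount
of state, as in this sketch. The class and method names are
hypothetical, and the sketch assumes the "Peer Receive" value is the
peer's reported count of PQM payloads it has received from the local
system.

```python
from typing import Optional


class PqmSender:
    """Decides when the local system should transmit a PQM message."""

    def __init__(self, reporting_period: float, now: float = 0.0):
        self.reporting_period = reporting_period
        self.last_sent = now
        self.last_peer_receive: Optional[int] = None

    def on_timer(self, now: float) -> bool:
        """True once a full reporting period has passed since the last send."""
        return now - self.last_sent >= self.reporting_period

    def on_receive(self, peer_receive_value: int) -> bool:
        """True if two consecutive peer messages carry the same value."""
        repeated = self.last_peer_receive == peer_receive_value
        self.last_peer_receive = peer_receive_value
        return repeated

    def mark_sent(self, now: float) -> None:
        """Record that a PQM message was transmitted at time 'now'."""
        self.last_sent = now
```

A caller would transmit a PQM message, and then call mark_sent,
whenever either on_timer or on_receive returns True.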

This document does not specify the uses to which the PQM data may be
applied, as this is an implementation matter.

-- END included message