
monitoring anti-replay detection in AH and ESP



I'm working with Tim Jenkins on the IPSec MIB, and the issue of what to
do about monitoring the anti-replay detection system on receive in AH
and ESP has come up.

The reason I'm interested in properly instrumenting it is that
"problems" with IPSec over the public internet will necessarily be
hard to diagnose.  For instance, if "performance stinks", why?

Well, the anti-replay mechanism can detect a lot of things that the
underlying network can do to screw up performance.  You can get pretty
effective measurements of both packet loss and packet reordering.

For instance, if you receive a packet that isn't the "next" one you
expect to receive, then you know that the network may be reordering
packets.  If you're shifting lots of zeros out of the anti-replay
bitmask, then there's probably high packet loss.

So, there are a number of different counters we could propose.  Some
are easier to count than others.  Others provide more information.
Let me propose a few:

1. Packets received with sequence number > highest received + 1.

This can indicate either a dropped packet, or reordering.  This counts
the number of times that the bit map was rotated more than one bit
when a packet was received.

2. Packets received with sequence number < highest received, but in
window.

This can only indicate reordering.

3. Unused sequence numbers removed from window.

This is essentially a count of the number of 0 (not seen) bits that
you shift out of the bitmask when event 1 (above) happens.  (Well, it
also would increment if you shifted out a 0 on a normal "next
expected" receive.)  These are most likely lost packets, although
there is a possibility that it is caused by large-scale reordering.

This is the hardest one to implement.  But it really is the best count
of lost packets, since event 1 fires for both loss and reordering and
cannot tell them apart.

This counter becomes more useful the longer the receive window for
anti-replay is, since a longer window leaves less reordering to be
miscounted as loss.  (We'll also put the receive window in the MIB, to
allow properly scaled interpretation of these.)

This counter also helps you know if a high count of replay errors
really represents an attack, or just high reordering that needs a
larger window.
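
As a rough sketch of how counters 1 through 3 might be maintained,
here is a small bitmap window in Python.  The class, field, and return
value names are made up for illustration and do not come from the
AH/ESP drafts or the MIB; it also assumes sequence numbers start at 1
and ignores wraparound.

```python
class ReplayWindow:
    """Illustrative receive-side anti-replay window (hypothetical names)."""

    def __init__(self, size=32):
        self.size = size    # window length in packets
        self.top = 0        # highest sequence number accepted so far
        self.bitmap = 0     # bit i set => sequence number (top - i) was seen
        self.gaps = 0       # counter 1: seq > highest received + 1
        self.late = 0       # counter 2: seq < highest received, in window
        self.unused = 0     # counter 3: unseen numbers shifted out of window

    def receive(self, seq):
        if seq > self.top:
            shift = seq - self.top
            # Sequence numbers that leave the window on this advance.
            first_out = max(1, self.top - self.size + 1)
            last_out = seq - self.size
            for n in range(first_out, last_out + 1):
                # Numbers above the old top were never seen at all.
                if n > self.top or not (self.bitmap >> (self.top - n)) & 1:
                    self.unused += 1
            if shift >= self.size:
                self.bitmap = 1     # everything older fell out of the window
            else:
                mask = (1 << self.size) - 1
                self.bitmap = ((self.bitmap << shift) | 1) & mask
            self.top = seq
            if shift > 1:
                self.gaps += 1
                return "gap"
            return "next"
        offset = self.top - seq
        if offset >= self.size:
            return "too_old"        # left of the window, rejected
        if (self.bitmap >> offset) & 1:
            return "replay"         # already seen, rejected
        self.bitmap |= 1 << offset  # late but new: accept and mark as seen
        self.late += 1
        return "late"
```

With a window of 4 and arrivals 1, 2, 4, 3, 6, 11, this sketch records
three gap events, one late arrival, and two unused sequence numbers
(5 and 7).  Numbers 8 through 10 are still inside the window at that
point, so they are not yet counted as lost.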


Obviously, there will also be a count of true replay errors, which I
presume is completely acceptable to all.

However, even this could be made more specific.  We could split it
into:

4. Replay in window.

This is either a real replay attack, or packet duplication by the network.

5. Replay out of window.

This could be either a replay attack, or massive packet reordering.

The code logic has to detect cases 4 and 5 separately, so it's not a
great burden to count both ways.

I suppose that the exception would be hardware that might not report
the difference.  Anybody done that, or doing that?  If so, these two
(4 & 5) could be optional counters, with a mandatory "replay error"
counter.
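
To illustrate the mandatory/optional split, here is a sketch in Python
of a replay check that always bumps an aggregate counter and only
bumps the in-window and out-of-window counters when the implementation
can tell them apart.  All names here (ReplayCounters, count_replay,
the split flag) are hypothetical, and the window is assumed to be a
bitmap where bit i stands for sequence number (top - i).

```python
from dataclasses import dataclass


@dataclass
class ReplayCounters:
    replay_errors: int = 0          # mandatory aggregate counter
    replay_in_window: int = 0       # optional, event 4
    replay_out_of_window: int = 0   # optional, event 5


def count_replay(counters, seq, top, bitmap, size, split=True):
    """Return True if seq is rejected as a replay, updating counters.

    split=False models an implementation (for example hardware) that
    reports a replay error without saying which kind it was.
    """
    if seq > top:
        return False                # ahead of the window, never a replay
    offset = top - seq
    if offset >= size:
        counters.replay_errors += 1
        if split:
            counters.replay_out_of_window += 1
        return True
    if (bitmap >> offset) & 1:
        counters.replay_errors += 1
        if split:
            counters.replay_in_window += 1
        return True
    return False                    # in window and not yet seen
```

The two branches that reject a packet are already distinct in the
check itself, which is why counting them separately costs little in
software.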

