
RE: problems with draft-jenkins-ipsec-rekeying-06.txt



> | Replacing unbounded state creation with unbounded rekeying frequency
> | is no solution. Fine, so this problem is kind of academic, but you
> | are endangering the ability of a host to be fully self-stabilizing
> | (and that is of academic interest to me).
>
> Can you expand on this?

Sure, I guess (although I suspect many people are getting bored with this
topic). As I said, this is mainly of academic interest to me. I believe that
a networking protocol should allow compliant implementations to be robust.

If you want to be fully self-stabilizing then you need to be able to predict
and control your behaviour, even while under heavy load or attack. Relying
on dynamic memory allocation under low memory conditions can make a system
very unpredictable.

The proof of self-stabilization is akin to mathematical induction. Instead
of merely stating that abnormal situation S_t shouldn't happen, you assume
S_(t-1) is stable and show that S_t = S(t, S_(t-1)) is also stable. [Note
that I am distinguishing between "abnormal" and "unstable" states here. (An
abnormal situation should not cause unstable behaviour in a robust
implementation.)]
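Written out as an induction schema (my notation, just restating the
paragraph above; stable(S) means the implementation behaves predictably
in state S):

```latex
% Base case: the initial state is stable.
\mathrm{stable}(S_0)
% Inductive step: a stable predecessor yields a stable successor,
% even when S_t is an abnormal state.
\mathrm{stable}(S_{t-1}) \implies \mathrm{stable}(S_t),
  \qquad S_t = S(t,\, S_{t-1})
```

That is, rather than asserting that an abnormal S_t "shouldn't happen",
you show that every reachable state, abnormal or not, is stable.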

One important key to ensuring the predictability of your behaviour is that
YOU (rather than the peer) get to decide how to bound your CPU consumption
and memory usage. That's why it's best to avoid statements like "the host
MUST store each X" or "the host MUST reply to (unsolicited) messages of type
X".
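As a minimal sketch of that principle (the names and structure here are
mine, not from any actual IKE implementation): a receiver that bounds its
own memory can use a fixed-capacity inbound queue and drop excess
messages, instead of obeying a blanket "MUST store each X".

```python
from collections import deque

class BoundedInbox:
    """Fixed-capacity message queue: the receiver, not the peer,
    decides the memory bound. Excess messages are simply dropped."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.queue = deque()
        self.dropped = 0

    def offer(self, msg):
        """Accept msg if there is room; otherwise drop it and return
        False. Memory use can never exceed the chosen capacity."""
        if len(self.queue) >= self.capacity:
            self.dropped += 1
            return False
        self.queue.append(msg)
        return True

inbox = BoundedInbox(capacity=2)
inbox.offer("QM-1")
inbox.offer("QM-2")
inbox.offer("QM-3")   # dropped: the bound is chosen by us, not the peer
```

The point is that the worst-case memory cost is fixed at configuration
time, regardless of how fast the peer sends.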

As I'm sure you're aware, the danger of an unstable system under heavy load
is that after it corrects its behaviour (usually by crashing and rebooting),
the situation which created the heavy load still exists (and is often
worsened by the increased demand which is generated by the temporary
outage). This may cause the instability to re-manifest itself. Witness the
famous AT&T crash of 1990 or the recent DDoS attacks on websites or the Bill
Clinton chatroom bug.

Or consider the well-known logging bug where an application has a log
message for "log is full" but forgets to reserve space in the log for
this abnormal condition. A similar bug often manifests itself on Windows
systems when a careless implementer stores the "Out of Memory" message in
the stringtable resource.
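A hedged sketch of the fix (my own illustration, not from any particular
logging library): reserve the overflow record while resources are still
plentiful, so reporting the abnormal condition can never itself fail.

```python
class BoundedLog:
    """Log with one slot pre-reserved for the 'log is full' record,
    so the overflow condition can always be reported."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = []
        # Reserve the abnormal-condition message up front, at creation
        # time -- not when the log is already full.
        self._overflow_entry = "log is full"
        self._overflow_logged = False

    def append(self, entry):
        if len(self.entries) < self.capacity:
            self.entries.append(entry)
            return True
        if not self._overflow_logged:
            # Uses the reserved slot; happens at most once.
            self.entries.append(self._overflow_entry)
            self._overflow_logged = True
        return False

log = BoundedLog(capacity=1)
log.append("normal entry")
log.append("another entry")   # no room: triggers the reserved message
log.append("a third entry")   # silently dropped; overflow already noted
```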

It's easy for a system to be stable if it has very limited functionality.
The danger here is that most of us want to make the most of the resources we
have.

Right now, we can't determine ahead of time exactly how much memory will be
required to negotiate and store an SA of either type (although the SA
footprints have much less size variance than the dynamic memory required
during negotiation). However, once an SA has been created, its size never
needs to change.

That's the beauty of bounded connection state. Even if you run out of
dynamic memory, your existing SAs are not affected.

You are also able to ignore unsolicited messages from the peer. If the peer
sends a QM and you don't have the memory or the CPU power to handle the
message, you can simply ignore it with no bad consequences.
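A minimal sketch of these two properties together (names are mine, not
from any real IKE implementation): a fixed-size SA table where a failed
allocation refuses cleanly, and where dropping an unsolicited message
changes no state at all.

```python
class SATable:
    """Fixed-size SA table: once created, an SA's footprint never
    grows, so memory pressure cannot disturb existing SAs."""

    def __init__(self, max_sas):
        self.max_sas = max_sas
        self.sas = {}

    def create(self, spi, keys):
        """Install a new SA; fail cleanly if the table is full."""
        if len(self.sas) >= self.max_sas:
            return False          # out of room: refuse, never evict
        self.sas[spi] = keys
        return True

    def handle_unsolicited(self, spi, msg, overloaded):
        """Under load we may simply drop the message; either way, no
        existing SA state is touched."""
        if overloaded or spi not in self.sas:
            return None           # ignored, with no bad consequences
        return ("processed", msg)

table = SATable(max_sas=1)
table.create(0x1001, "keys-a")
table.create(0x1002, "keys-b")                         # table full; refused
table.handle_unsolicited(0x1001, "QM", overloaded=True)  # dropped
```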

Now let's consider what happens if you need to process a QM under heavy load
conditions:

1) A QM comes in and you don't have the CPU resources to parse it. You drop
the message but the peer assumes you have received it, so the message id
lists for the ISAKMP SA get out of sync.

or

2) A QM comes in and you don't have enough memory to store the message id.
Therefore, you delete the ISAKMP SA and renegotiate. However, the
negotiation fails because you don't have enough memory to complete it. Now
you have lost an SA that used to work, and the peer will probably continue
to retry the negotiation, which will continue to fail, wasting additional
CPU resources.

or

3) As above, except that the renegotiation of the ISAKMP SA succeeds. Then
the peer resends the QM and you don't have enough memory to store the
message id, so you delete the ISAKMP SA, attempt to renegotiate, and end up
thrashing endlessly (this is particularly bad if it happens to multiple SAs
simultaneously).

And, of course, if you are susceptible to this problem under heavy load,
then this behaviour can also be exploited by a malicious intermediate router
(particularly if you store the message ids of info modes in addition to
QMs).

On the other hand, a counter (which was also proposed as a solution to the
replay attack) does not require any additional memory allocation, nor does
it require you to process every packet from the peer. Therefore, there is no
potential for the counter to create unstable behaviour.
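A sketch of the counter approach (my own illustration of the idea, not a
quote from the draft): one integer per SA, allocated when the SA is
created, with no per-message allocation ever.

```python
class MessageIdCounter:
    """Per-SA monotonically increasing message-id check. Requires a
    single integer of state, fixed at SA creation time -- no dynamic
    allocation per message."""

    def __init__(self):
        self.highest_seen = 0

    def accept(self, msg_id):
        """Accept only ids greater than any seen so far; replays and
        stale ids are rejected with no state change."""
        if msg_id <= self.highest_seen:
            return False
        self.highest_seen = msg_id
        return True

ctr = MessageIdCounter()
ctr.accept(1)
ctr.accept(5)    # gaps are fine: dropped messages cost us nothing
ctr.accept(5)    # replay rejected
ctr.accept(3)    # stale id rejected
```

Note that dropping a message under load merely leaves a gap in the
sequence, which the counter tolerates for free, so ignoring packets never
destabilizes the SA.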

Andrew
--------------------------------------
Beauty without truth is insubstantial.
Truth without beauty is unbearable.


