
RE: Heartbeats (was RE: keepalives)



To get this discussion going, I'll suggest a format and you (the list) can
tell me whether it seems appropriate. I can put this in a draft if there is
interest in standardization.

I think that different types of heartbeats (phase 1/phase 2,
SA-referenced/host-referenced) provide different services, and we need to
support all kinds.

I use the term 'heartbeat' throughout. If you prefer 'keep-alive' or some
other term, just search and replace with your favorite. When I do refer to
keep-alives at the end of the document, I use Tim's definition (a mechanism
for disabling the peer's inactivity timeout).

----------------------------------------------------------------------------

As I see it, there are two types of heartbeats: phase 1 heartbeats and phase
2 heartbeats.


PHASE 1 HEARTBEATS:

Phase 1 heartbeats tell you whether the phase 1 SA is still up, and
therefore also that the peer is still there. However, receiving them is a
sufficient but not a necessary condition for the peer being alive. The peer
may be a dangling implementation (one that keeps its phase 2 SAs after the
phase 1 SA is gone), in which case it may not send phase 1 heartbeats even
though it is still running.

However, phase 1 heartbeats are still useful because they ensure that the
peers always agree on whether a phase 1 SA exists. This avoids the
clumsiness of one peer sending a message on the phase 1 SA, receiving
NOTIFY_INVALID_COOKIES and then timing out, before realizing that the phase
1 SA is down and needs to be rekeyed.
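To make the benefit concrete, here is a minimal receive-side sketch,
assuming heartbeats arrive every `interval` seconds and the SA is declared
down after `missed_allowed` silent intervals. The class and parameter names
are hypothetical; the point is only that the peer notices the loss on its
own schedule and rekeys, rather than learning about it from
NOTIFY_INVALID_COOKIES plus a timeout.

```python
from dataclasses import dataclass


@dataclass
class Phase1Liveness:
    """Hypothetical tracker for heartbeats received on one phase 1 SA."""
    interval: float          # expected seconds between heartbeats (assumed)
    missed_allowed: int      # silent intervals tolerated before declaring it down
    last_heard: float = 0.0  # time of the last authenticated heartbeat
    up: bool = True

    def on_heartbeat(self, now: float) -> None:
        # Any authenticated heartbeat on the phase 1 SA proves it is still up.
        self.last_heard = now
        self.up = True

    def poll(self, now: float) -> bool:
        """Return True exactly once, when a proactive rekey should start."""
        deadline = self.last_heard + self.interval * self.missed_allowed
        if self.up and now > deadline:
            # Mark phase 1 down locally and rekey now, instead of discovering
            # the loss later via NOTIFY_INVALID_COOKIES and a timeout.
            self.up = False
            return True
        return False
```

For example, with `interval=10` and `missed_allowed=3`, a heartbeat at t=0
keeps the SA up through t=30, and the first poll after that triggers the
rekey.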

Also, as was discussed in an earlier thread, using NOTIFY_INVALID_COOKIES as
a means of determining when an SA has gone down is vulnerable to DoS
attacks, whereas heartbeats are not.

I'm not going to discuss a format for phase 1 heartbeats in this post
because IMHO it's not a technically difficult issue. Any one of a million
packet formats would suffice (info mode, config mode, acknowledged info
mode, some new exchange) and the only real issue is getting a group of
people to settle on any particular one.


PHASE 2 HEARTBEATS:

There are two types of phase 2 heartbeats: host-referenced and
SA-referenced. A host-referenced heartbeat is a protocol that runs across a
dedicated phase 2 SA between the two peers. An SA-referenced heartbeat is a
protocol that runs across an existing (user) SA.

Host-referenced heartbeats can only detect whether the peer is still up and
running, so they are of limited use. (However, because they don't carry any
sensitive information, they would never need to be deleted before their
natural lifetime expires. That makes them the most reliable means of
detecting whether the peer is still alive, since there is no possibility of
a phase 2 delete being lost.)

SA-referenced heartbeats detect whether a specific phase 2 SA is still
working. They probably also tell you when the peer is not there, since you
wouldn't expect a phase 2 SA to disappear without receiving a delete
(although I've been hearing some discussion of this recently on the list).
They are probably the most useful type of heartbeat, which is why I am going
to discuss them here.

SA-referenced Phase 2 heartbeats are more technically complicated than other
heartbeats because:

1) They must not interfere with the peer's inactivity timeouts.
2) They must not disturb any accounting services that may be running.
3) They must not result in any packets ending up on the peer's red network.
4) They must not assume that a phase 1 SA exists between the two peers.
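As an illustration of constraints 1 to 3, here is a sketch of how a
receiving peer might treat a decrypted inner packet. It assumes (as in the
format proposed below) that a heartbeat is an ICMP packet between the two
hosts' black IPs; all class, field and function names are hypothetical.
Constraint 4 is met simply because everything here happens on the phase 2
SA itself, with no reference to any phase 1 SA.

```python
from dataclasses import dataclass
from ipaddress import IPv4Address

ICMP = 1  # IP protocol number for ICMP


@dataclass(frozen=True)
class InnerPacket:
    src: IPv4Address
    dst: IPv4Address
    proto: int
    length: int  # bytes


@dataclass
class SAState:
    local_black: IPv4Address
    peer_black: IPv4Address
    last_user_traffic: float = 0.0  # drives the inactivity timeout
    billed_bytes: int = 0           # what the accounting service charges for


def is_heartbeat(sa: SAState, pkt: InnerPacket) -> bool:
    # Inbound heartbeat: ICMP from the peer's black IP to our own black IP.
    return pkt.proto == ICMP and pkt.src == sa.peer_black and pkt.dst == sa.local_black


def handle_inbound(sa: SAState, pkt: InnerPacket, now: float) -> str:
    if is_heartbeat(sa, pkt):
        # Consumed locally: no idle-timer reset (1), no billing (2),
        # never forwarded to the red network (3).
        return "heartbeat"
    sa.last_user_traffic = now      # only user traffic counts as activity
    sa.billed_bytes += pkt.length   # only user traffic is billed
    return "forward"                # deliver to the red network
```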

It is not, in general, possible to satisfy all of these constraints without
some degree of cooperation. Therefore, both peers must be aware of the
heartbeat scheme that is being used (i.e. it must be negotiated).

In light of these constraints, I propose the following format:

Every X seconds, peer 1 (the initiator) sends an encrypted ping to peer 2
and peer 2 replies. In order to distinguish these pings from user traffic,
the source and destination addresses are set to the hosts' black IPs. If
either side fails to receive a heartbeat within N*X seconds then they can
assume that the SA has gone down (and they should send a delete for it). (If
they fail to receive a ping but they receive other traffic on the SA then
something has gone wrong and they should log the event). Replay protection
is not required, as IPSec automatically provides it.
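The timing rules above can be sketched as a small per-SA monitor. This is a
sketch under stated assumptions, not a reference implementation: the names
are hypothetical, and it reads the parenthetical case (no pings, but other
traffic still arriving) as "log it" rather than "delete the SA", though an
implementation could reasonably do both.

```python
from dataclasses import dataclass, field
from typing import List, Optional


def ping_due(now: float, last_sent: float, x: float) -> bool:
    """Initiator (peer 1): send an encrypted ping every X seconds."""
    return now - last_sent >= x


@dataclass
class HeartbeatMonitor:
    """Per-SA receive-side bookkeeping, run by both peers."""
    x: float                                   # heartbeat interval X, in seconds
    n: int                                     # N: intervals tolerated without a heartbeat
    last_heartbeat: float = 0.0                # when the last ping or reply arrived
    last_user_traffic: Optional[float] = None  # when the last non-heartbeat packet arrived
    log: List[str] = field(default_factory=list)

    def on_heartbeat(self, now: float) -> None:
        self.last_heartbeat = now

    def on_user_traffic(self, now: float) -> None:
        self.last_user_traffic = now

    def check(self, now: float) -> str:
        """Return 'ok', 'anomaly' (event logged) or 'delete' (send a delete)."""
        if now - self.last_heartbeat <= self.n * self.x:
            return "ok"
        if self.last_user_traffic is not None and self.last_user_traffic > self.last_heartbeat:
            # The SA still carries traffic, so missing pings point to a bug.
            self.log.append(f"t={now}: no heartbeat for {now - self.last_heartbeat}s, "
                            "but user traffic is still arriving")
            return "anomaly"
        # Nothing heard for N*X seconds: presume the SA is down.
        return "delete"
```

With X=10 and N=3, a heartbeat at t=0 keeps the SA "ok" through t=30; at
t=31 the result is "delete", or "anomaly" if user traffic arrived after the
last heartbeat.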

It is not necessary for peer 2 to ever initiate the pings. However, to
increase reliability, if peer 2 does not receive a ping during the normal
window [X, X*3/2], he may force the issue by initiating a ping in the
opposite direction.
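Peer 2's optional fallback might look like the following sketch. The names
are hypothetical, and the rule of forcing only one ping per missed window
(rather than one on every poll) is an assumption the post does not spell
out.

```python
from dataclasses import dataclass


@dataclass
class Responder:
    """Peer 2's side of the scheme (hypothetical names)."""
    x: float                # heartbeat interval X, in seconds
    last_ping: float = 0.0  # when the last ping from peer 1 arrived
    forced: bool = False    # whether we already initiated a ping for this gap

    def on_ping(self, now: float) -> None:
        self.last_ping = now
        self.forced = False

    def poll(self, now: float) -> bool:
        """Return True if peer 2 should initiate a ping toward peer 1 now."""
        window_end = self.last_ping + 1.5 * self.x  # end of the window [X, 3X/2]
        if not self.forced and now > window_end:
            self.forced = True  # force the issue once, then keep waiting
            return True
        return False
```

With X=10, a ping at t=0 means peer 2 stays quiet until t=15 and sends its
own ping on the first poll after that.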

This technique has the following advantages:

1) It satisfies all of the above constraints.
2) It does not require the host to have any knowledge of the peer's red IP
or red subnet.
3) Ping has universal brand-name recognition as a heartbeat protocol.
Therefore, no special payload format is required.

and the following disadvantages:

1) The SPD must make a specific exception for ping packets between the black
IPs.
2) The accounting service should know not to bill the user for this traffic.

However, I believe these disadvantages will be inherent in any SA-referenced
heartbeat scheme.
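The SPD exception (disadvantage 1) can be pictured as one extra selector on
a tunnel-mode SA whose normal selectors cover only the red subnets. The
sketch below checks outbound packets; the inbound check would be the mirror
image. The policy layout and function name are hypothetical, not taken from
any particular IPsec implementation.

```python
from dataclasses import dataclass
from ipaddress import IPv4Address, IPv4Network, ip_address, ip_network

ICMP = 1  # IP protocol number for ICMP


@dataclass(frozen=True)
class TunnelPolicy:
    local_red: IPv4Network
    peer_red: IPv4Network
    local_black: IPv4Address
    peer_black: IPv4Address


def permitted_on_sa(policy: TunnelPolicy, src: IPv4Address,
                    dst: IPv4Address, proto: int) -> bool:
    # Normal selector: user traffic between the two red subnets.
    if src in policy.local_red and dst in policy.peer_red:
        return True
    # Exception: heartbeat pings between the two black IPs.
    if proto == ICMP and src == policy.local_black and dst == policy.peer_black:
        return True
    return False


policy = TunnelPolicy(ip_network("10.1.0.0/16"), ip_network("10.2.0.0/16"),
                      ip_address("192.0.2.1"), ip_address("198.51.100.7"))
```

Only ICMP gets the exception: other traffic between the black IPs still
fails the check and is handled by whatever other policy applies.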

Note that a Host-referenced heartbeat scheme could be constructed in the
same way as an SA-referenced scheme, simply by negotiating a dedicated SA
using the black IPs as the endpoints. This could be done in tunnel mode
(presumably using the same policy exception that is used for SA-referenced
heartbeats) or it could simply be done in transport mode.


FUTURE CONSIDERATIONS:

One potential limitation of this scheme is that it does not generalize well
to keep-alives. The use of ping as a packet format is simple, but it doesn't
allow us to specify any additional information (all it says is
STILL_CONNECTED). It may be desirable to send extra information in the
packet. For example, a simple keep-alive (in the literal sense) scheme would
be to take the heartbeat scheme and add one extra bit of information (e.g.
STILL_CONNECTED, IDLE_TIMEOUT=disabled). On the other hand, I would prefer
that idle timeouts be disabled via a negotiated attribute of the SA (if
feature negotiation ever gets standardized).
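If a heartbeat did carry that one extra bit, the payload could be as small
as two bytes. The encoding below is purely hypothetical (no such format has
been agreed; the type code and flag value are made up for illustration) and
only shows what "heartbeat plus one bit" might amount to.

```python
import struct

STILL_CONNECTED = 1                # hypothetical message type code
FLAG_IDLE_TIMEOUT_DISABLED = 0x01  # hypothetical flag bit


def encode_keepalive(idle_timeout_disabled: bool) -> bytes:
    """Pack (message type, flags) as two unsigned bytes in network order."""
    flags = FLAG_IDLE_TIMEOUT_DISABLED if idle_timeout_disabled else 0
    return struct.pack("!BB", STILL_CONNECTED, flags)


def decode_keepalive(data: bytes) -> bool:
    """Return True if the sender asks us to disable our idle timeout."""
    msg_type, flags = struct.unpack("!BB", data[:2])
    if msg_type != STILL_CONNECTED:
        raise ValueError(f"unexpected message type {msg_type}")
    return bool(flags & FLAG_IDLE_TIMEOUT_DISABLED)
```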

There is no particular reason to use ping as transport, except for the fact
that it is already a universally accepted packet format and requires no
approval from IANA.
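For reference, the ping itself is an ordinary ICMP echo request (type 8,
code 0, checksum, identifier, sequence number, optional data, per RFC 792)
using the standard Internet checksum. This sketch builds only the ICMP
message; the IP header carrying the black IPs and the IPsec encapsulation
are assumed to be added by the stack. The function names are illustrative.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """One's-complement sum of 16-bit words, as used by ICMP."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def icmp_echo_request(identifier: int, sequence: int, payload: bytes = b"") -> bytes:
    """Build an ICMP echo request (type 8, code 0) with a valid checksum."""
    header = struct.pack("!BBHHH", 8, 0, 0, identifier, sequence)
    csum = internet_checksum(header + payload)
    return struct.pack("!BBHHH", 8, 0, csum, identifier, sequence) + payload
```

A correctly built message re-checksums to zero, which is also how the
receiver verifies it.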

Andrew
_______________________________________________
 Beauty without truth is insubstantial.
 Truth without beauty is unbearable.


