
heartbeats (summary of responses)



Hi all. It's been a few days since my original post on this subject and
there have been a fair number of replies. Thanks to everyone who offered
comments. This message is a summary of all replies plus some comments from
me. 

I'm attaching the original message for context:  <<RE: Heartbeats (was RE:
keepalives)>> 

Most people expressed an interest in host-referenced heartbeats. However,
they disagreed on what mechanism should be used to transport the heartbeats.
Some people preferred a phase 1 solution; others preferred a phase 2
solution; a couple of people preferred clear pings. Only a couple of people
(Tero among them) agreed that it is desirable to support more than one kind
of heartbeat protocol.


Clear Pings:

I don't believe that clear pings are an acceptable solution. We can't
possibly prevent every kind of DoS attack, but we have an obligation to not
make them *easy*.


Phase 1 SA-Referenced Heartbeats:

In the absence of phase 1 host-referenced heartbeats, these basically tell
you if the SA is still up. Since you have a reliable indication of when the
SA is up, an attacker can't induce you to renegotiate (a DoS threat) by
sending unauthenticated Notify Invalid Cookies (or spoofing a message with
invalid cookies). Also, it speeds up negotiation of phase 2s, since you know
ahead of time whether you need to negotiate a phase 1 first. No one showed
much interest in this type of heartbeat.


Phase 1 Host-Referenced Heartbeats:

The biggest problem with this idea is that phase 1 heartbeats seem to be
incompatible with dangling SA awareness. We have already established that
many vendors are not willing to use the continuous channel model, therefore
any phase 1 heartbeat scheme must be dangling SA aware.

I know Jan has disputed this, but I think that most of us will want to send
heartbeat packets at regular intervals. If you set your heartbeat rate to
once every 30 seconds, how many people would want to renegotiate their phase
1 at 30 second intervals (under low memory conditions)?

On the other hand, I believe that we could get around this limitation by not
permitting dangling SAs, but allowing 'pseudo-dangling' SAs instead. In this
scenario, implementations would not be permitted to delete their phase 1s at
will. However, they would be allowed to convert them into a skeletal form
(pseudo-phase 1), which contains just enough information to receive
heartbeats (and probably, by extension, info modes), but not enough to
negotiate QMs or use advanced features.

This would kill two birds with one stone. It would allow implementations to
save memory by discarding unused phase 1 info, but it would still allow them
to send phase 2 deletes without renegotiating the phase 1. It would also
allow applications to send host-referenced heartbeat packets in phase 1,
which IMHO is the right place to send management-type packets.

I know Dan said he would look into using "inline ISAKMP" messages to
accomplish this, which is a form of pseudo-phase 1 SA as well, since you
still have to keep state in order to send the message. Instead, why not have
the pseudo-phase 1 store just the encryption and authentication objects,
plus the IV for info modes? I guess you'd need to track the phase 1 lifetime
params as well, plus the heartbeat info, of course. Anything else? Wouldn't
this be easier (and less computationally intensive) than
generating/verifying an RSA signature on every heartbeat? It also wouldn't
expose known-plaintext RSA signature pairs to a passive attacker. (P.S. I
have to give Slava credit for this idea, since I just noticed he proposed it
already.)
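
As a rough illustration of what such a skeletal SA might hold, here is a
minimal Python sketch. All class and field names are hypothetical and are
not taken from any IKE implementation. The cookie fields are an added
assumption (something has to identify the SA); the remaining fields follow
the list in the paragraph above.

```python
from dataclasses import dataclass


@dataclass
class FullPhase1:
    """Hypothetical full phase 1 state; field names are illustrative only."""
    initiator_cookie: bytes   # assumption: needed to identify the SA
    responder_cookie: bytes   # assumption: needed to identify the SA
    enc_key: bytes            # encryption object
    auth_key: bytes           # authentication object
    info_mode_iv: bytes       # IV for info modes
    lifetime_seconds: int     # phase 1 lifetime params
    heartbeat_interval: int   # heartbeat info
    negotiation_state: dict   # stands in for everything QMs would need


@dataclass
class PseudoPhase1:
    """Skeletal form: enough to receive heartbeats and info modes only."""
    initiator_cookie: bytes
    responder_cookie: bytes
    enc_key: bytes
    auth_key: bytes
    info_mode_iv: bytes
    lifetime_seconds: int
    heartbeat_interval: int


def to_skeleton(sa: FullPhase1) -> PseudoPhase1:
    """Drop the negotiation state and keep only the skeletal fields."""
    return PseudoPhase1(
        initiator_cookie=sa.initiator_cookie,
        responder_cookie=sa.responder_cookie,
        enc_key=sa.enc_key,
        auth_key=sa.auth_key,
        info_mode_iv=sa.info_mode_iv,
        lifetime_seconds=sa.lifetime_seconds,
        heartbeat_interval=sa.heartbeat_interval,
    )
```

The point of the sketch is only that the skeleton is a strict subset of the
full state, so the conversion needs no message exchange with the peer.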

The only other concern that I have heard about phase 1 heartbeats is that
some people are worried that the IKE daemon may crash independently of the
IPSEC task and they don't want a failure in IKE to cause the phase 2 SAs to
go away. This may be an issue in load-sharing situations where the two
processes may be running on different boxes.

Jan suggested that the phase 1 heartbeat inform the peer which phase 2s are
still up. I don't think this would be much of an issue if we had phase 1
heartbeats AND acknowledged info modes. (This also assumes that the IKE
daemon 'knows' when a phase 2 goes down unpredictably.)

I was thinking that probably we would want to use a periodic heartbeat
message with a replay counter (i.e. a synchronous uni-directional
heartbeat). Derrell commented that some vendors have already implemented a
query-type heartbeat protocol. This seems less useful to me than a periodic
uni-directional protocol. (How do you know when to query? If you wait until
you need to send a packet then you have high latency. If you query on a
regular basis then you get the same results as the synchronous protocol, but
use twice the bandwidth.)
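
To make the replay-counter idea concrete, here is a minimal Python sketch of
the receiving side of a periodic uni-directional heartbeat. It assumes the
heartbeat has already been authenticated and decrypted before the counter is
checked; the class and method names are hypothetical.

```python
class HeartbeatReceiver:
    """Accepts heartbeats that carry a strictly increasing replay counter."""

    def __init__(self) -> None:
        self.highest_counter = 0  # highest counter accepted so far

    def accept(self, counter: int) -> bool:
        # A heartbeat is fresh only if its counter is strictly greater than
        # every counter accepted before; otherwise it is a replay or stale.
        if counter <= self.highest_counter:
            return False
        self.highest_counter = counter
        return True
```

Because the check is "strictly greater", a lost heartbeat is harmless: the
next one that arrives simply moves the counter forward.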


Phase 2 Host-Referenced Heartbeats:

This is an alternative to host-referenced heartbeats of the phase 1 variety.
The advantage is that they do not rely on the existence of the phase 1 SA,
so they are automatically compatible with dangling implementations.

Still, they don't solve every possible scenario. In the same way that a
phase 1 heartbeat only tells you that the ISAKMP daemon is running and not
necessarily that the IPsec layer is working, there is the possibility that
one IPsec SA may crash but another may continue working. In particular, I'm
thinking of load-sharing scenarios where the tunnel endpoint is the same but
the SAs are running on different host processors.

I'm assuming that we would send them across a black-to-black tunnel. This
would normally be used for management traffic, but Jan tells me that L2TP
uses it as well. However, it would be easy to tell the L2TP traffic from the
heartbeat traffic since the next protocol in the header would be L2TP, not
ICMP.

Someone thought we should define a completely new protocol for ipsec
heartbeats. This doesn't seem realistic to me. I suppose if ping was
unacceptable we could define a new Isakmp message type and allow it to be
interpreted as a heartbeat if it is received on an ipsec tunnel. But asking
IANA for a new protocol number just for the heartbeats seems presumptuous.

Jan also suggested the possibility of a new DOI for heartbeat messages. That
wouldn't be such a bad idea for host-referenced heartbeats. After all, they
don't need to 'hijack' existing SAs, so there is no reason why they couldn't
be negotiated within a completely separate domain. It might make this into a
slightly bigger project than I intended, though. This also provides a
potential fix for the load-sharing scenario. Since there is no requirement
to 

There is also some potential for legacy support. If an existing host allows
negotiation of black-to-black SAs and the SPD allows pings then the legacy
host can be monitored for liveness without any negotiation. I do think we
need to support negotiation in the long run, though.

A couple of people suggested sending a ping to the red (internal IP). I
don't agree with this, since it would then require the peer to have
knowledge of the internal IP, which is unnecessary (although I suppose the
parameters for this could be negotiated).


Phase 2 SA-Referenced Heartbeats:

A couple of people expressed interest in these, mostly as a
resource-friendly alternative to host-referenced heartbeats. In remote
access scenarios, there is typically only one phase 2 SA (or only a few)
between the peers; adding a host-referenced heartbeat SA would double the
resource requirements (assuming you are dropping the phase 1s). By hijacking
an existing phase 2 you can save memory and negotiation time.

I also foresaw two other uses. One was a simple solution to the load-sharing
scenario (sending the heartbeat on a hijacked SA guarantees that the
physical host sending the ping is the same one that is transmitting on the
SA). The other possibility is that some phase 2 SAs (probably in the
VPN backbone) will be deemed 'critical' (in that they guarantee less than
0.01% downtime or such). In order to ensure that these SAs are not only up,
but also WORKING (i.e. the state on the peers is synchronized), it might be
desirable to send a regular heartbeat message.

Jan expressed concern that this would slow down IPsec processing too much.
That depends. How many people verify the source/dest addresses for the
tunneled packet against the ones in the SPD? If you are already doing this
for every packet then you lose no speed; if you currently check only the SPI
then adding the check for the IPs would slow you down.
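
For illustration, the per-packet check in question might look something like
the following Python sketch. It assumes the inner header has already been
parsed after decryption; the InnerHeader type and is_heartbeat function are
hypothetical names. The ICMP protocol number (1) and the echo request/reply
types (8 and 0) are the standard values.

```python
from dataclasses import dataclass

IPPROTO_ICMP = 1        # IP protocol number for ICMP
ICMP_ECHO_REQUEST = 8   # ICMP type of a ping request
ICMP_ECHO_REPLY = 0     # ICMP type of a ping reply


@dataclass(frozen=True)
class InnerHeader:
    """Hypothetical view of the tunneled packet's header after decryption."""
    src: str
    dst: str
    protocol: int
    icmp_type: int = -1  # -1 when the packet is not ICMP


def is_heartbeat(pkt: InnerHeader, local_black_ip: str,
                 peer_black_ip: str) -> bool:
    """Return True if the packet is a ping between the two black IPs."""
    if pkt.protocol != IPPROTO_ICMP:
        return False
    if pkt.icmp_type not in (ICMP_ECHO_REQUEST, ICMP_ECHO_REPLY):
        return False
    # The addresses must be exactly the two tunnel endpoints, in either order.
    return {pkt.src, pkt.dst} == {local_black_ip, peer_black_ip}
```

An implementation that already compares the inner addresses against the SPD
for every packet would only be adding the protocol and type comparisons.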

But I was thinking, maybe this is another case where using a separate DOI
would be useful. I wouldn't want to negotiate separate SAs for the new DOI
(that would mostly defeat the point), but maybe a DOI of IPSEC_MANAGEMENT
could be used to identify management traffic that is flowing on an existing
IPsec SA.

Jan expressed concern about not interfering with customer billing
information; however, all the proposed solutions have already taken this
into account. Ditto with not allowing heartbeats to interfere with
inactivity timers.

Tero commented that phase 2 host-referenced heartbeats could be implemented
as a special case of phase 2 SA-referenced heartbeats, which is what I
originally intended.


Heartbeat Negotiation Protocols:

I didn't really bring this up before. I assume that we will need to have
some kind of general negotiation framework in place (a subject that was
discussed briefly and then dropped). In the meantime, I will probably label
this an 'experimental feature' and use a vendor id; however, since there will
undoubtedly be parameters (e.g. heartbeat type, heartbeat frequency,
recovery action, IP address to ping, etc.) regardless of which heartbeat
format we decide on, maybe we will need a config exchange as well.
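
As a sketch only, the parameter set listed above could be modelled along
these lines in Python. Nothing here is a proposed wire format. The four
heartbeat types are the ones discussed in this summary; the recovery actions
and the rule that phase 2 heartbeats need a ping address are assumptions made
for the example, and all names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class HeartbeatType(Enum):
    PHASE1_SA_REFERENCED = 1
    PHASE1_HOST_REFERENCED = 2
    PHASE2_HOST_REFERENCED = 3
    PHASE2_SA_REFERENCED = 4


class RecoveryAction(Enum):
    # Hypothetical actions; the thread does not define a list.
    LOG_ONLY = 1
    DELETE_SA = 2
    RENEGOTIATE = 3


@dataclass(frozen=True)
class HeartbeatParams:
    heartbeat_type: HeartbeatType
    interval_seconds: int
    recovery_action: RecoveryAction
    ping_address: Optional[str] = None  # IP address to ping, if any

    def validate(self) -> None:
        """Raise ValueError if the parameter set is inconsistent."""
        if self.interval_seconds <= 0:
            raise ValueError("heartbeat interval must be positive")
        ping_based = self.heartbeat_type in (
            HeartbeatType.PHASE2_HOST_REFERENCED,
            HeartbeatType.PHASE2_SA_REFERENCED,
        )
        # Assumption: the ping-based (phase 2) types need an address to ping.
        if ping_based and self.ping_address is None:
            raise ValueError("phase 2 heartbeats need an address to ping")
```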


Andrew
_______________________________________________
 Beauty without truth is insubstantial.
 Truth without beauty is unbearable.

-- BEGIN included message

How about, to get this discussion going, I suggest a format and you (the
list) tell me if it seems appropriate. I can put this in a draft if there is
interest in standardization.

I think that different types of heartbeats (phase 1/phase 2,
SA-referenced/host-referenced) provide different services, and we need to
support all kinds.

I use the term 'heartbeat' throughout. If you prefer keep-alive, etc. then
search & replace with your favorite term. When I do refer to keep-alives at
the end of the document, I use Tim's definition (a mechanism for disabling
the peer's inactivity timeout).

----------------------------------------------------------------------------

As I see it, there are two types of heartbeats: phase 1 heartbeats and phase
2 heartbeats.


PHASE 1 HEARTBEATS:

Phase 1 heartbeats tell you if the phase 1 SA is still up. Therefore, they
also tell you that the peer is still there. However, this is a sufficient
but not necessary condition. The peer may be a dangling implementation, in
which case they may not send phase 1 heartbeats even though they are still
running.

However, phase 1 heartbeats still have a use because they ensure that the
peers will always agree on whether a phase 1 exists. It avoids the
clumsiness of one peer trying to send a message on the phase 1, receiving
NOTIFY_INVALID_COOKIES and then timing out, before realizing that the phase
1 is down and needs to be rekeyed.

Also, as was discussed in an earlier thread, using NOTIFY_INVALID_COOKIES as
a means of determining when an SA has gone down is vulnerable to DoS
attacks, whereas heartbeats are not.

I'm not going to discuss a format for phase 1 heartbeats in this post
because IMHO it's not a technically difficult issue. Any one of a million
packet formats would suffice (info mode, config mode, acknowledged info
mode, some new exchange) and the only real issue is getting a group of
people to settle on any particular one.


PHASE 2 HEARTBEATS:

There are two types of phase 2 heartbeats: host-referenced and
SA-referenced. A host-referenced heartbeat is a protocol that runs across a
dedicated phase 2 SA between the two peers. An SA-referenced heartbeat is a
protocol that runs across an existing (user) SA.

Host-referenced heartbeats can only be used to detect if the peer is still
up and running. Therefore, they are of limited use. (However, the fact that
they don't carry any sensitive information means that they would never
need to be deleted before their natural lifetime. Therefore, they would be
the most reliable means of detecting if the peer is still alive since there
is no possibility of a phase 2 delete being lost.)

SA-referenced heartbeats detect if a specific phase 2 SA is still working.
They also probably tell you when the peer is not there, since you wouldn't
expect a phase 2 SA to disappear without receiving a delete (although I've
been hearing some discussion of this recently on the list). However, they
are probably the most useful type of heartbeat, which is why I am going to
discuss them here.

SA-referenced Phase 2 heartbeats are more technically complicated than other
heartbeats because:

1) They must not interfere with the peer's inactivity timeouts.
2) They must not disturb any accounting services that may be running.
3) They must not result in any packets ending up on the peer's red network.
4) They must not assume that a phase 1 SA exists between the two peers.

It is not, in general, possible to satisfy all of these constraints without
some degree of cooperation. Therefore, both peers must be aware of the
heartbeat scheme that is being used (i.e. it must be negotiated).

In light of these constraints, I propose the following format:

Every X seconds, peer 1 (the initiator) sends an encrypted ping to peer 2
and peer 2 replies. In order to distinguish these pings from user traffic,
the source and destination addresses are set to the hosts' black IPs. If
either side fails to receive a heartbeat within N*X seconds then they can
assume that the SA has gone down (and they should send a delete for it). (If
they fail to receive a ping but they receive other traffic on the SA then
something has gone wrong and they should log the event). Replay protection
is not required, as IPSec automatically provides it.

It is not necessary for peer 2 to ever initiate the pings. However, to
increase reliability, if peer 2 does not receive a ping during the normal
window [X, X*3/2], he may force the issue by initiating a ping in the
opposite direction.
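
The timing rules above can be sketched in a few lines of Python. This is an
illustration under stated assumptions, not a definitive implementation: the
HeartbeatMonitor name and its methods are hypothetical, and timestamps are
passed in explicitly so that the logic is independent of any particular
clock or packet I/O.

```python
from dataclasses import dataclass


@dataclass
class HeartbeatMonitor:
    """Tracks liveness of one SA from the arrival times of heartbeats."""
    interval: float         # X: seconds between pings
    misses_allowed: int     # N: SA is presumed down after N*X seconds
    last_seen: float = 0.0  # time the most recent heartbeat arrived

    def on_heartbeat(self, now: float) -> None:
        self.last_seen = now

    def sa_presumed_down(self, now: float) -> bool:
        # No heartbeat within N*X seconds: assume the SA has gone down.
        return now - self.last_seen > self.misses_allowed * self.interval

    def responder_should_ping(self, now: float) -> bool:
        # Peer 2 may initiate a ping in the opposite direction once the
        # normal window [X, X*3/2] has passed without a heartbeat.
        return now - self.last_seen > 1.5 * self.interval
```

With X = 30 and N = 3, for example, peer 2 would start pinging in the
opposite direction after 45 seconds of silence, and either side would give
up on the SA after 90 seconds.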

This technique has the following advantages:

1) It satisfies all of the above constraints.
2) It does not require the host to have any knowledge of the peer's red IP
or red subnet.
3) Ping has universal brand-name recognition as a heartbeat protocol.
Therefore, no special payload format is required.

and the following disadvantages:

1) The SPD must make a specific exception for ping packets between the black
IPs.
2) The accounting service should know not to bill the user for this traffic.

However, I believe these disadvantages will be inherent in any SA-referenced
heartbeat scheme.

Note that a Host-referenced heartbeat scheme could be constructed in the
same way as an SA-referenced scheme, simply by negotiating a dedicated SA
using the black IPs as the endpoints. This could be done in tunnel mode
(presumably using the same policy exception that is used for SA-referenced
heartbeats) or it could simply be done in transport mode.


FUTURE CONSIDERATIONS:

One potential limitation of this scheme is that it does not generalize well
to keep-alives. The use of ping as a packet format is simple, but it doesn't
allow us to specify any additional information (all it says is
STILL_CONNECTED). It may be desirable to send extra information in the
packet. For example, a simple keep-alive (in the literal sense) scheme would
be to take the heartbeat scheme and add one extra bit of information (E.g.
STILL_CONNECTED, IDLE_TIMEOUT=disabled). On the other hand, I would prefer
that idle timeouts be disabled via a negotiated attribute of the SA (if
feature negotiation ever gets standardized).

There is no particular reason to use ping as transport, except for the fact
that it is already a universally accepted packet format and requires no
approval from IANA.

Andrew
_______________________________________________
 Beauty without truth is insubstantial.
 Truth without beauty is unbearable.

-- END included message

