
Re: Heartbeats Straw Poll



On Mon, 7 Aug 2000, Dan Harkins wrote:

>   The only implementation of keepalives that I'm aware of is Cisco's
> and it doesn't scale. This was discussed in Adelaide by Chinna 
> Narasimha Reddy Pellacuru. That implementation may have problems-- I
> wrote it after all-- but a solution to its problems has not been
> discussed and both drafts which address keepalives/heartbeats are
> essentially what I implemented in IOS: a simple request/response,
> "are you there?"/"yes, I'm here". Has Cisco's version been fixed to 
> scale?

Not to my knowledge. I've rewritten it quite extensively in an attempt to
make it 'better', but fundamentally it's still what you wrote, and it still
doesn't scale all that well.

> If so, can you explain how so that that information can be 
> incorporated into the drafts? If not, then perhaps we should take a 
> step back and look at what it is we're doing since we have evidence 
> that what is being proposed doesn't work right.
> 
I guess if you have a good fast hardware adapter, and more importantly a
fast PATH to the HW and back, then keepalives could work better. Our
implementation in some extreme cases bogs down trying to catch up with the
thousands of keepalives that are being sent to it. Never mind that the
defaults (10 2) you put in place don't help ;) If a peer didn't respond in 10
seconds (possibly because he's swamped with other keepalives), sending them
at a 2 second interval doesn't improve that situation. Changing this to
exponential backoff may help, and treating heartbeats as a 'prioritized'
packet MAY help (although I can come up with some interesting race conditions
that way), but ultimately you have LOTS of tiny encrypted packets that you
need to decrypt and encrypt (on reply), which bogs down the machine.
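
To make the backoff idea concrete, here is a rough sketch in Python. It is
not the IOS code. It assumes "(10 2)" means "wait 10 seconds for traffic,
then re-probe every 2 seconds", which is only my reading of the defaults
above. The function name, the doubling factor and the 60 second cap are all
made up for illustration.

```python
def retry_schedule(first_wait=10.0, base_retry=2.0, retries=5,
                   factor=2.0, cap=60.0):
    """Return the waits (in seconds) before each keepalive probe.

    first_wait and base_retry mirror the assumed "(10 2)" defaults;
    factor and cap are hypothetical tuning knobs.
    """
    waits = [first_wait]
    delay = base_retry
    for _ in range(retries):
        # Each unanswered probe multiplies the next wait, up to the cap.
        waits.append(min(delay, cap))
        delay *= factor
    return waits


# Fixed 2 second retries versus exponential backoff, five retries each.
fixed = [10.0] + [2.0] * 5
backoff = retry_schedule()

print("fixed   probes sent within", sum(fixed), "seconds")
print("backoff probes sent within", sum(backoff), "seconds")
```

The same number of probes is spread over a much longer window, so a peer
that is already swamped gets fewer packets per second from each neighbour.
It does nothing about the per-packet crypto cost.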

My 2c..
jan


>   It's an interesting question about addressing keepalives in the
> "IKE rework". But doesn't this change IKE? Come to think of it, aren't
> keepalives officially prohibited until the "IKE rework" is finished?
> 
>   Dan.
> 
> On Mon, 07 Aug 2000 15:55:21 PDT you wrote
> > Derek Atkins wrote:
> > > 
> > > The problem with dead-peer detection is that you have no way to know
> > > how or why a peer lost contact.  You don't know if they got rebooted,
> > > lost power, lost connectivity, or some other reason.  Perhaps your
> > > peer's upstream provider fell off the net?  Or maybe it's a laptop and
> > > the laptop got unplugged and moved.  The problem is you can't tell.
> > > Worse, in the case of an intermittent lossage, you really don't want
> > > to reap the SAs, because the peer might come back.
> > 
> > I do not really care WHY it is dead. The end result is some clean up of
> > the SADB and a... Maybe we need a "death certificate" CA :-) Intermittent
> > lossage can be solved by having a window of missed "heartbeats". Idle
> > timers could also help, but you would need to know the traffic type to
> > configure those correctly (kind of a layer violation).
> > 
> > > 
> > > If you're really that worried about the resources for "dead" SAs,
> > > negotiate relatively short rekey times, at which point if your peer
> > > disappears, the SA will timeout in a relatively short time.
> > > 
> > > As has been mentioned, 'keep-alives' really equate to 'make-dead'.  If
> > > you want to see if something is still there, why not just keep track
> > > of the last time a packet has been received on a particular SA and
> > > just send an ICMP ping_request?  The ping_response will tell you your
> > > peer is alive, and you can update your SA time-of-last-packet to the
> > > > current time.  But what's the point of that?  Short key lifetimes
> > > > solve this problem.
> > 
> > Short Key lifetimes do solve some of the problem, but can be expensive.
> > 
> > During the IKE rework, maybe some of these issues can be addressed. This
> > is quite an interesting problem to solve.
> > 
> > > 
> > > -derek
> > > 
> > > Scott Fanning <sfanning@cisco.com> writes:
> > > 
> > > > Bill
> > > >
> > > > > That's a good idea, but from an implementation point of view, I am not
> > > > sure if I like the idea of maintaining a timestamp for every packet (SA
> > > > Used) through a tunnel.
> > > >
> > > > I guess the problem I want to address with the heartbeats is dead-peer
> > > > detection, and as a result do action foo. INITIAL-CONTACT does help in
> > > > SADB sync'ing but is not authenticated and there is no assured delivery.
> > > > > I think that Scott's point of auditing is a good side-effect of dead-peer
> > > > detection, and could also be tied to accounting, but I agree this is
> > > > outside the scope of the problem.
> > > >
> > > > Scott
> > > >
> > > >
> > > > Bill Sommerfeld wrote:
> > > > >
> > > > > > Yes, it was pointed out at the ipsra meeting that accounting is not a
> > > > > > requirement. However, what about auditing? For purposes of security
> > > > > > auditing, it is necessary to know when a remote access client
> > > > > > disconnects. Is this a valid requirement?
> > > > >
> > > > > Wouldn't keeping track of the last time an SA was used, and logging it
> > > > > into your audit trail when the SA expires or is deleted, be sufficient
> > > > > for auditing purposes?
> > > > >
> > > > >                                         - Bill
> > > 
> > > --
> > >        Derek Atkins, SB '93 MIT EE, SM '95 MIT Media Laboratory
> > >        Member, MIT Student Information Processing Board  (SIPB)
> > >        URL: http://web.mit.edu/warlord/      PP-ASEL      N1NWH
> > >        warlord@MIT.EDU                        PGP key available
> 

 --
Jan Vilhuber                                            vilhuber@cisco.com
Cisco Systems, San Jose                                     (408) 527-0847


