
Re: my presentation on heartbeats



Howdy,
	So I just realized that my 'slides' will be terribly vague to people
who did not hear me speak at the WG meeting. So I will intersperse
speaker's notes into the presentation.



Ricky Charlet wrote:
> 
> Howdy,
>     Below is the text of the heartbeat presentation I made at the ipsec
> WG meeting. Is this the real problem? If so, is this the right way to
> rank candidate solutions?
> 
> --
> Ricky Charlet        rcharlet@redcreek.com     usa 510-795-6903
> 
> ===========================================
> slide 1
>                Ricky Charlet
> 
>                Redcreek Communications
>                rcharlet@redcreek.com
> 
> ============================
> slide 2.          the problem
> 
>  black hole detection
>    for redundancy/error messaging
>    for resource recovery
>    for time based accounting

	Here I claim there is only one problem that we are trying to solve with
heartbeats: 'black hole detection'. But we all have differing
motivations for wanting black hole detection. These motivations are at
least the three bullets above. If anyone has more reasons for wanting
to know about a black hole, then please provide feedback. If anyone
thinks that our problem space is bigger than just black hole detection,
then please provide feedback.
	Note that if your only motivation for black hole detection is resource
recovery, that is, you don't care about time accounting or taking quick
recovery actions, then your time granularity for detection can be in
minutes. Otherwise your time granularity for detection needs to be in
seconds. For example, a gateway terminating many thousands of clients
may not care about contacting secondary clients if the primary is down.
So a simple inactivity timeout of about 10 to 15 minutes, without active
heartbeats, would suffice there.
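
	The granularity rule above can be sketched roughly as follows. The
names and the two numeric values (10 seconds, 10 minutes) are
illustrative assumptions on my part, not taken from any proposal on the
list.

```python
from enum import Enum


class Motivation(Enum):
    """The three motivations from slide 2 (names are illustrative)."""
    REDUNDANCY = "redundancy/error messaging"
    RESOURCE_RECOVERY = "resource recovery"
    TIME_ACCOUNTING = "time based accounting"


# Assumed example values; a real deployment would make these configurable.
SECONDS_GRANULARITY = 10        # detect within seconds
MINUTES_GRANULARITY = 10 * 60   # detect within minutes


def detection_timeout(motivations):
    """Return a detection timeout in seconds for a set of motivations.

    Only when resource recovery is the sole motivation can detection
    be as coarse as minutes; anything else needs seconds.
    """
    motivations = set(motivations)
    if not motivations:
        raise ValueError("at least one motivation is required")
    if motivations == {Motivation.RESOURCE_RECOVERY}:
        return MINUTES_GRANULARITY
    return SECONDS_GRANULARITY
```

	A gateway that only wants to reclaim resources would get the coarse
timeout and could rely on a plain inactivity timer; adding either of the
other two motivations drops it to the fine one.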



> 
> ==============================
> slide 3.      problem reduction
> 
> If you trust your own list of SPIs,
>   then you only need to know about peer reachability.
> 
>  o current authenticated conversation on any  phase 1 or 2 SA  proves
> peer is still there.
> 
>  o on a silent but good connection an authenticated  hello exchange over
> any single  phase 1 or 2 SA  proves the peer is still there.


	Here I claim that we do not need black hole detection per SA, only per
peer. Current traffic on any SA (P1 or P2) from a peer proves that peer
is still reachable. And a hello exchange on any P1 or P2 SA proves the
peer is still reachable.
	Although a large motivator for these heartbeat threads was to help solve
SAD desynchronization, I claim that is not really in our problem space.
IKE can be, and is being, amended to improve the integrity of SAD
synchronization.
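
	To make the per-peer claim concrete, here is a minimal sketch of
the bookkeeping, assuming a hypothetical tracker that is told about every
authenticated packet. The class, method names, and thresholds are my own
illustration, not part of IKE or of any proposal. The point is only that
state is keyed by peer, and that a hello is needed only when a peer has
gone silent.

```python
class PeerLiveness:
    """Track reachability per peer rather than per SA (hypothetical)."""

    def __init__(self, hello_after, dead_after):
        if hello_after >= dead_after:
            raise ValueError("hello_after must be less than dead_after")
        self.hello_after = hello_after  # seconds of silence before a hello
        self.dead_after = dead_after    # seconds of silence before giving up
        self.last_heard = {}            # peer id -> time of last authenticated packet

    def authenticated_packet(self, peer, spi, now):
        """Record authenticated traffic from a peer.

        The SPI is deliberately ignored: traffic on any phase 1 or
        phase 2 SA proves the peer is still there.
        """
        self.last_heard[peer] = now

    def poll(self, now):
        """Return (peers that need a hello, peers presumed black-holed)."""
        need_hello, presumed_dead = [], []
        for peer, heard in self.last_heard.items():
            silent = now - heard
            if silent >= self.dead_after:
                presumed_dead.append(peer)
            elif silent >= self.hello_after:
                need_hello.append(peer)
        return need_hello, presumed_dead
```

	An authenticated reply to a hello would be fed back through
authenticated_packet() like any other traffic, so a busy peer never
triggers a hello at all.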

> 
> ===============================
> slide 4.    criteria
> 
>  o variable granularity to detect within seconds, or detect within
> minutes
> 
>  o scales to thousands of connections
>    ie. does not take a lot of work
> 
>  o low cost to implement (simple)


	Here I hope to propose criteria for judging which solution to pick. If
you caught the last bit of the last slide, you noticed that there are
at least two ways to prove a peer is still reachable: through phase 1
and through phase 2. At least 4 proposals I have noticed on the list
offer candidate mechanisms to do heartbeats. On the next slide I
score the proposals. Notice that my idea of using a ping inside of
'hijacked' P2 SAs scores very low.  bummer :-(


> 
> ===============================
> slide 5.   score board
> 
>  o P2 conditional pings inband:
>      - moderate scaling, high cost of implementation
>  o P1 tell your peer to send hellos and keep sliding windows:
>     -  poor scaling, high cost of implementation
>        (perhaps scaling properties are fixable)
>  o P1 conditional  send hellos
>     - good scaling, low cost of implementation
>       (new 'hello' notify packet, hello process)
>  o P2 new transport SA to carry pings
>    - poor  scaling, low cost of implementation
>       (ping process extra cost of config work)
> 
> ===================================
> slide 6 Darts?
> 
>  Any challenges to my claims?
> 
> ==================================

