Forum Discussion

Keith_Richards_
Nimbostratus
Jul 05, 2007

LB_FAILED event

Can someone please tell me how long the LTM will wait, given that the selected server hasn't responded, before triggering the LB_FAILED event?

I have noticed that when people use LB_FAILED for passive monitoring they also put in an LB::detach and LB::down. I don't understand why you use the LB::detach - is it just good housekeeping on the LTM? When we used LB::down on our system, pool members were getting removed due to slow responses even though they weren't really down/offline.

Oh, and sorry, but another thing. I'm using universal persistence as below:


when HTTP_REQUEST {
    set uri [HTTP::uri]
    set jsess [findstr $uri "jsessionid" 11 "?"]
    log local0. "Entering REQUEST, jsess is: $jsess"
    if { $jsess ne "" } {
        persist uie $jsess
    }
}
when HTTP_RESPONSE {
    # $uri was saved during HTTP_REQUEST above
    if { $uri contains "jsessionid" } {
        set jsess1 [findstr $uri "jsessionid" 11 "?"]
        log local0. "jsessionid found, jsess is: $jsess1"
        persist add uie $jsess1
    }
}

The virtual server uses a second iRule for LB_FAILED events. I have seen other examples in the forum that delete persistence using a 'persist delete' statement - should I use this? I assumed that if a connection to a member fails with an LB_FAILED event, the connection will be reselected (as below) and a new persistence record created?


when LB_FAILED {
    set selected_server [LB::server addr]
    if { $selected_server eq "" } {
        log local0. "No mdex node available"
    } else {
        log local0. "Node: ${selected_server} not responding."
        # Select another node
        LB::reselect
    }
}

Thanks, Keith

8 Replies

  • Deb_Allen_18
    Historic F5 Account
    Not sure what the timeout is before LB_FAILED is triggered, anybody else know?

     

     

    The connection and persistence table relationships are not cleared if a selected node that looks UP fails to respond (thus triggering LB_FAILED). Only a monitor marking the node DOWN clears the related server-side table entries.

     

     

    So with OneConnect, LB::detach is necessary before a re-select to clear the connection table relationship between the client and the backend server; otherwise the same node will be re-selected. (That might not be specific to OneConnect, but I think I'm remembering it correctly; somebody will straighten me out, I'm sure.)

    Same idea re: removing the persistence record in LB_FAILED when re-selecting. Not sure it's as necessary, but it would ensure the old persistence table relationship is cleared, and a new one would be created when the connection is re-load-balanced.
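
    Untested, but putting those two suggestions together in the LB_FAILED rule would look something like this (assuming the OneConnect profile is on the virtual and that $jsess was set by the HTTP_REQUEST rule above):

    when LB_FAILED {
        # Clear the serverside connection binding so the next pick
        # isn't glued back to the failed member (relevant with OneConnect)
        LB::detach

        # Drop the stale persistence record if we have a key for it;
        # a fresh one gets added when the connection is re-load-balanced
        if { [info exists jsess] && $jsess ne "" } {
            persist delete uie $jsess
        }

        # Pick another pool member
        LB::reselect
    }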

     

     

    (LB::down isn't well integrated with monitors yet, so I would recommend against using it in this situation, preferring instead good monitoring with appropriate intervals configured.)

     

     

    HTH

     

    /deb
  • An oldie but goodie from Dr. Teeth:

     

     

     

    LB_FAILED is triggered in a variety of circumstances. I'll list a few off the top of my head:

    a) no pool selected
    b) no available pool members in selected pool
    c) no route to pool member
    d) failed to connect to pool member

    With regard to d), the "max retrans syn" option in the TCP profile will affect the timeout.

     

  • Deb_Allen_18
    Historic F5 Account
    Ah, nice work! Didn't think of that, & didn't realize that was in the profile.

    So the first retransmission when there's no response is typically at 3 seconds, and the typical backoff algorithm doubles the wait time after each failed attempt. I verified at the LTM command line, and it looks like max syn retrans is set to 5 there, producing the following progression and timing out after 93 seconds:
    Trying 172.24.2.200...
    11:26:24.455153 172.24.2.41.55507 > 172.24.2.200.http: S ...
    11:26:27.452465 172.24.2.41.55507 > 172.24.2.200.http: S ...
    11:26:33.452465 172.24.2.41.55507 > 172.24.2.200.http: S ...
    11:26:45.452465 172.24.2.41.55507 > 172.24.2.200.http: S ...
    11:27:09.452481 172.24.2.41.55507 > 172.24.2.200.http: S ...
    11:27:57.452466 172.24.2.41.55507 > 172.24.2.200.http: S ...
    telnet: connect to address 172.24.2.200: Connection timed out

    Looks like "Maximum Syn Retransmissions" is set to 4 in LTM's default tcp profile though, so LB_FAILED would be triggered if server didn't respond in 45 seconds:
      1st SYN:  0
      2nd SYN: +3 seconds
      3rd SYN: +6 seconds
      4th SYN: +12 seconds
      5th SYN: +24 seconds
     ======================
     LB_FAILED: 45 seconds
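
    Put another way (assuming the standard 3-second initial retransmit timer with doubling back-off), the wait before LB_FAILED works out to roughly 3 * (2^N - 1) seconds for N SYN retransmissions, which matches the 93s (N=5) and 45s (N=4) figures above. A quick Tcl sanity check:

    set base 3   ;# initial retransmit timer in seconds (assumed)
    foreach max_retrans {4 5} {
        set timeout 0
        for {set i 0} {$i < $max_retrans} {incr i} {
            # each retransmission waits twice as long as the previous one
            incr timeout [expr {$base * (1 << $i)}]
        }
        puts "max syn retrans $max_retrans -> LB_FAILED after ~${timeout}s"
    }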

    I will update the LB_FAILED wiki page with details to that effect.

    /deb
  • Hey Deb,

     

     

    How did you do the test mentioned in your last post?

     

     

    We are having issues with a black-box solution we are load balancing with our F5s. We have an iRule in place for LB_FAILED, and it is being hit a lot more than I would expect. We don't see the members of the pool drop, just the log entry saying "LB failed" (which we told the iRule to log so we could see what the users saw).

     

     

    Any thoughts?
  • Hi Andrew,

     

     

    You can check the LB_FAILED wiki page for details:

     

     

    http://devcentral.f5.com/wiki/default.aspx/iRules/lb_failed

     

    LB_FAILED is triggered when LTM is ready to send the request to a pool member and one hasn’t been chosen (the system failed to select a pool or a pool member), is unreachable (when no route to the target exists), or is non-responsive (fails to respond to a connection request).
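
    If it helps narrow down which of those cases you're hitting, you could log a few basics from the event itself. A rough sketch (in the "nothing selected" cases, LB::server addr will simply come back empty):

    when LB_FAILED {
        # log the virtual, the client, and whichever member (if any) was picked
        log local0. "LB_FAILED on [virtual name]: client [IP::client_addr], selected member '[LB::server addr]'"
    }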

     

     

    How often do you see LB_FAILED triggered? If it's fairly consistent you could try running a tcpdump to capture the communication. I'd try capturing the client and serverside connection info in the trace so you can see exactly what's happening.

     

     

    Aaron
  • Just want to note that the code segment above that tests whether a member was selected (set selected_server [LB::server addr]) and then falls through to the else and LB::reselect can/will send this thing into a reselection loop. I just did that, and since I don't know how to get it to stop (maybe someone knows a way to stop an iRule gracefully?), I failed the box over to the HA partner and restarted TMM.

     

     

    Anyway, my question is how the health monitor can show the member as good while load balancing is failing; apparently it is failing to connect to the pool member. How can that occur? Any insight would be helpful.

     

    I'm on version 11, and another article indicated there is a new feature, [event info], that will return the reason, but I'm getting nothing in return, so help with that would be appreciated too!
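
    For reference, the sort of usage that article describes would be roughly this (assuming event info is available in LB_FAILED on 11.x):

    when LB_FAILED {
        # event info should describe why load balancing failed
        log local0. "LB_FAILED reason: [event info]"
    }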

     

     

    Thanks for any help here. We have a pool that is available, but connections still end up hitting LB_FAILED.
  • Not sure this helps but are you SNATting? If so, perhaps the server is accepting connections from the F5 Self-IP address but not the SNAT address?

    On the subject of the loop, I'd recommend employing some simple counting to ensure the looping is short-lived:

    
    when CLIENT_ACCEPTED {
        set loopcounter 0
    }
    when LB_FAILED {
        set selected_server [LB::server addr]
        if { $selected_server eq "" } {
            log local0. "No mdex node available"
        } elseif { $loopcounter <= 4 } {
            log local0. "Node: ${selected_server} not responding."
            # Select another node
            incr loopcounter
            LB::reselect
        }
    }
    
  • Can I create a virtual server (VS_Primary) with an iRule like the following:

     

     

    when LB_FAILED {
        virtual VS_Backup
    }