Forum Discussion

JCMATTOS_41723
Nimbostratus
Nov 15, 2007

Health Monitor Retries?

We are planning to use some of the monitors on our LTM 8400 running 9.4x. I noticed that only two settings can be changed: the interval and the timeout. I have two basic questions. First, what is the default retry threshold upon a failure? Second, is there a way to change the retry threshold so that a monitor has to fail a certain number of times before the device is taken out of service? Please help, thanks!

5 Replies

  • The interval and timeout for a monitor are loosely defined as:

    interval: how often, in seconds, to send a request

    timeout: how long to wait for a successful response before marking the member down

    By default, the interval is 5 seconds and the timeout is 16 seconds (timeout = 3 x interval + 1). The monitoring daemon starts a timer equal to the timeout and sends a request every interval, here every five seconds. Each successful response resets the countdown to the timeout value. In this scenario, the node effectively has three chances to respond before being marked down. Requests are still sent every interval while the node is marked down, which lets the node return to service automatically once it responds correctly to a request. If you want the node to stay marked down even after it responds again, you can enable 'manual resume'.

    If you want to give the node more chances to respond before it is marked down, extend the timeout. With an interval of 5 and a timeout of 31 (6 x 5 + 1), the node would be sent six requests before being marked down.
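To put numbers on that rule of thumb, here is a rough Python sketch of the arithmetic only. It is not how bigd is implemented; the function names are made up for illustration, and the counting convention (5/16 gives three chances) is simply the one described above.

```python
def recommended_timeout(interval: int, attempts: int) -> int:
    """Timeout in seconds using the rule of thumb: attempts x interval + 1."""
    if interval <= 0 or attempts <= 0:
        raise ValueError("interval and attempts must be positive")
    return attempts * interval + 1


def attempts_before_down(interval: int, timeout: int) -> int:
    """Number of full intervals that fit inside the timeout window.

    Assumption: this mirrors the counting used in this thread,
    where interval=5 and timeout=16 gives three chances.
    """
    if interval <= 0 or timeout <= 0:
        raise ValueError("interval and timeout must be positive")
    return (timeout - 1) // interval


if __name__ == "__main__":
    # The default pair and the extended pair mentioned above.
    for interval, timeout in [(5, 16), (5, 31)]:
        chances = attempts_before_down(interval, timeout)
        print(f"interval={interval}s timeout={timeout}s -> {chances} chances")
```

Going the other way, recommended_timeout(5, 6) gives 31, which matches the 5/31 example.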

    Does this make sense?

    Aaron
  • However, it doesn't appear that the timeout determines the number of polls. Or perhaps there is a set minimum of 3? If I set the interval to 5 and the timeout to 11, it still takes 16 seconds before the health status changes.
  • Neither should be the case. If you enable debug logging, you should see bigd mark the node down once the timeout expires if the pool member hasn't responded to any of the requests.

    You can enable bigd debug by running 'b db bigd.debug enable' from the command line. The output is written to /var/log/bigdlog.

    Aaron
  • I've found that setting an interval/timeout lower than the 5/16 defaults will often result in a node flapping up and down. It depends on the app, but in general I don't recommend lowering these.

    Denny