Forum Discussion

Yehuda_Pinchas
Sep 30, 2015

Pool members go up/down for no apparent reason

Dear F5 guys,

We have a serious problem: our pools keep flapping up and down. When all pool members are marked down, the pool goes down and so does the virtual server (VS). Users are complaining because they can't access the systems. This happens randomly, at different times, and is not related to working hours. Server performance is not degraded.

Thank you,
Yehuda

8 Replies

  • Either you have a network issue that is causing your monitors to fail, or your servers are having problems you aren't aware of, or the monitors themselves need to be tuned to suit your specific servers.

    A monitor will fail if:

    • the server does not respond
    • the server does not respond in time
    • the server does not respond with the expected response

    You need to perform a tcpdump to determine which one (or more) of these is causing your issue. The following displays the hex/ASCII contents of the health-check traffic on screen:

    tcpdump -i0.0:nnn -s0 -XXX -vv host x.x.x.x and port yy
    

    where x.x.x.x is the non-floating self IP of your LTM, and yy is the port used by your pool members.

    Or, if that produces too much output, you can write the capture to a file and view it in Wireshark:

    tcpdump -i0.0:nnn -s0 -w /var/tmp/health.cap host x.x.x.x and port yy
    

    What kind of monitors are you using?
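
    If it is an HTTP monitor, the three failure modes listed above can also be reproduced by hand from a host on the same network as the pool members. Below is a minimal sketch under that assumption; the function names `classify_probe` and `probe_member`, the URL, the expected string and the 5-second timeout are all hypothetical, not BIG-IP defaults. It relies on curl's documented exit codes: 28 means the operation timed out, and other non-zero codes such as 7 (failed to connect) are treated here as no response.

    ```shell
    #!/bin/sh
    # Hypothetical helper: map one probe result to the failure modes above.
    # $1 = curl exit status, $2 = response body, $3 = string the monitor expects.
    classify_probe() {
        status=$1
        body=$2
        expect=$3
        case "$status" in
            0)
                # The server answered; check that the reply is the expected one.
                case "$body" in
                    *"$expect"*) echo "up" ;;
                    *) echo "wrong_response" ;;
                esac
                ;;
            28)
                # curl exit code 28: the server did not answer in time.
                echo "timeout"
                ;;
            *)
                # Any other failure (e.g. 7: could not connect to the server).
                echo "no_response"
                ;;
        esac
    }

    # Probe one member roughly the way an HTTP monitor would.
    # $1 = URL of the member (hypothetical), $2 = expected receive string.
    probe_member() {
        body=$(curl -s --max-time 5 "$1")
        classify_probe "$?" "$body" "$2"
    }
    ```

    For example, running `probe_member http://10.1.1.10/ "Welcome"` repeatedly while the pool is flapping shows whether a member stops answering, answers too slowly, or answers with unexpected content.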

  • Hi, this is a known issue with servers (Windows, Unix, etc.). I suspect the server agents (services) are failing. Please restart the servers one by one, and the problem will be resolved.

    Regards, Samir

  • We've also seen this happen because of garbage-collection issues on the servers, although in those cases there was an impact on server performance at the same time.

  • It happens due to a memory buffer overflow on the server. Just reboot the servers on a rolling basis (the only option).

    • Yehuda_Pinchas
      I tested it; it's not a performance problem. The monitor fails because it loses ping. The problem persists, while other applications work properly.
  • To start with, you can try changing the monitor from L7 (assuming you are using HTTP/HTTPS) to L3 (ICMP), so that you can determine whether the problem lies in network connectivity or in the service itself. You can also use both the ICMP and HTTP monitors at the same time: just select the "at least one" option to get a logical OR between the two monitors.
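
    The same "at least one" rule can also be set from the command line. This is a minimal sketch assuming a pool named `app_pool` (hypothetical) and the built-in `gateway_icmp` and `http` monitors; adjust the names to your configuration.

    ```shell
    # Mark a member up if at least 1 of the 2 monitors succeeds (logical OR).
    tmsh modify ltm pool app_pool monitor min 1 of { gateway_icmp http }
    # Watch per-member availability while the problem is occurring.
    tmsh show ltm pool app_pool members
    ```

    Keep in mind that with a logical OR, a member whose HTTP service is down but which still answers ping stays marked up, so this setup is better suited to diagnosis than to permanent use.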

     

    Hey,

    This problem arises if you are using ICMP as the health monitor. Sometimes the echo request/reply round-trip time (RTT) causes this problem. Disable the node-level health check and enable a custom health monitor (e.g. HTTP/HTTPS/TCP/UDP) at the pool level instead. It will work.
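
    A rough tmsh equivalent of that change is sketched below, assuming a node named `10.1.1.10` and a pool named `app_pool` (both hypothetical) and using the built-in `tcp` monitor as the pool-level check.

    ```shell
    # Remove the node-level monitor (e.g. ICMP) from this node.
    tmsh modify ltm node 10.1.1.10 monitor none
    # Monitor the service itself at the pool level instead.
    tmsh modify ltm pool app_pool monitor tcp
    # Persist the running configuration.
    tmsh save sys config
    ```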

     

    Let me know if the above solution works for you.

    -Jinshu

  • I just ran into something similar during an LTM deployment: one of the self IPs had a duplicate on the network.
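
    One way to check for a duplicate self IP is to ARP for it from another Linux host on the same VLAN and count how many different MAC addresses answer. The sketch below assumes iputils `arping`, which prints each reply as `Unicast reply from <ip> [<MAC>]`, and a `grep` that supports `-o` (GNU or BSD); the helper name `count_arp_responders`, the interface `eth0` and the address are hypothetical.

    ```shell
    #!/bin/sh
    # Read arping output on stdin and print the number of distinct MAC
    # addresses that replied. More than one suggests a duplicate IP.
    count_arp_responders() {
        # Extract the bracketed MAC from each reply line, then de-duplicate.
        grep -o '\[[0-9A-Fa-f:]*]' | sort -u | wc -l | tr -d ' '
    }
    ```

    Usage: `arping -c 3 -I eth0 10.1.1.5 | count_arp_responders`. A result of 2 or more means more than one device is answering for that address.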