Forum Discussion

Brad_10289
Nimbostratus
Apr 05, 2012

PROBLEM: Pool Member Won't Work Through BIG-IP LTM

Hi all, I was wondering if anyone might have some insight into a strange issue I'm seeing in our environment. I have had zero success finding a solution or any related post about it.

PROBLEM: Client can seemingly connect to Pool Member on HTTP (Port 80) via Virtual Server, but Pool Member will not honor GET request. The other Pool Member behaves perfectly, as do all other Pool Members in other Pools/Virtual Servers. However, clients are able to connect directly to the actual IP of the offending Pool Member over HTTP.

BACKGROUND: Offending Pool Member is running on Windows 2003 Server as a guest OS under VMware ESX. HTTP server software/version is unknown to me at this point. BIG-IP version is 10.x.x.

TROUBLESHOOTING: Created a new Virtual Server with both Pool Members for isolated testing so as not to affect the production environment. Confirmed using WGET on a Windows client that it is unable to connect to the offending Member (eventually produces a Read Error, Server Reset Connection error). Confirmed WGET to the other Member works flawlessly. Confirmed that using the direct IP of the offending Member works. All tests were 100% consistent in result.
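The WGET comparison above could also be scripted so that every path is tested the same way each time. The following is only a minimal Python sketch, not something from the original post: `probe` and `classify` are hypothetical helper names, and the host values you pass in would be your own virtual server and member addresses.

```python
import http.client
import socket


def classify(exc: BaseException) -> str:
    """Translate a low-level failure into a short symptom description."""
    # RemoteDisconnected is a subclass of ConnectionResetError, so test it first.
    if isinstance(exc, http.client.RemoteDisconnected):
        return "server closed the connection without responding"
    if isinstance(exc, ConnectionResetError):
        return "connection reset by server"
    if isinstance(exc, (socket.timeout, TimeoutError)):
        return "timed out waiting for a response"
    if isinstance(exc, ConnectionRefusedError):
        return "connection refused"
    return f"other error: {exc!r}"


def probe(host: str, port: int = 80, path: str = "/", timeout: float = 10.0) -> str:
    """Send one GET request and summarize what came back."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        resp = conn.getresponse()
        body = resp.read()
        return f"HTTP {resp.status}, {len(body)} bytes"
    except OSError as exc:
        # Covers resets, timeouts, refusals and RemoteDisconnected.
        return classify(exc)
    except http.client.HTTPException as exc:
        return f"bad HTTP response: {exc!r}"
    finally:
        conn.close()
```

Calling `probe(...)` once with the virtual server address and once with the member's real IP should show the same difference described above, for example a status and byte count on the direct path and a reset or timeout through the virtual server.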


Logged in to the console via SSH and confirmed I was able to ping the real IP of the offending member. Likewise, had the network administrator ping the F5 and traceroute to the F5 from the offending member.

Confirmed able to TELNET to port 80 on the real IP of the offending member and on the working Member, simulate a GET request, and receive HTML in response. When attempting to connect to the offending member via TELNET through the Virtual Server, the connection is made but there is no response to the GET request and the connection eventually closes on its own.
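The manual TELNET test above can be approximated with a raw socket, which makes the "connected but silent" case easy to spot. This is a hedged sketch only: `build_get`, `exchange`, and `raw_get` are hypothetical names, and HTTP/1.0 is chosen here simply so that a healthy server closes the connection after replying.

```python
import socket


def build_get(host: str, path: str = "/") -> bytes:
    """Build the same minimal request one would type into a TELNET session."""
    return f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n".encode("ascii")


def exchange(sock: socket.socket, request: bytes, timeout: float = 10.0) -> bytes:
    """Send the request and read until the peer closes or goes silent."""
    sock.settimeout(timeout)
    sock.sendall(request)
    chunks = []
    try:
        while True:
            data = sock.recv(4096)
            if not data:
                break  # peer closed the connection
            chunks.append(data)
    except socket.timeout:
        pass  # no more data within the timeout; return what arrived so far
    return b"".join(chunks)


def raw_get(host: str, port: int = 80, path: str = "/", timeout: float = 10.0) -> bytes:
    """Connect, send one GET, and return the raw bytes received (possibly empty)."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        return exchange(sock, build_get(host, path), timeout)
```

If `raw_get` connects successfully but returns an empty byte string, that corresponds to the symptom described above: the TCP connection is made, yet nothing answers the GET request. A reset from the far end is not caught here and would surface as a `ConnectionResetError`.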


From the F5 shell via SSH, was able to make successful GET requests via TELNET to port 80 of the offending member through the Virtual Server IP (as well as to other servers/IPs).

This suggests to me that something must be going on between the F5 and the client for this one specific pool member, since the F5 can seemingly connect to the offending member via the virtual IP. NAT/SNAT are enabled.

Also tried deleting and re-adding the pool member, to no avail.

Any input/advice greatly appreciated. Thank you.

ETA: Although this should go without saying given the information in bold above, I neglected to mention that the built-in default HTTP monitor does recognize the offending member as up and active. I may go ahead and try a modified monitor that looks for a specific response, but as mentioned, I can communicate with the server via TELNET from an SSH session on the F5, so I don't believe there's an issue between the F5 and the member.
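The idea of a monitor that looks for a specific response can be illustrated outside the BIG-IP. This is not F5's monitor implementation, just a minimal Python sketch of the matching logic under stated assumptions: `response_matches` and its `receive_string` parameter are illustrative names, and the input is the raw bytes of one HTTP response.

```python
def response_matches(raw: bytes, receive_string: bytes, expected_status: int = 200) -> bool:
    """Return True only if the status code and the body content both match."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line = head.split(b"\r\n", 1)[0].split()
    # A valid status line looks like: HTTP/1.1 200 OK
    if len(status_line) < 2 or not status_line[0].startswith(b"HTTP/"):
        return False
    try:
        status = int(status_line[1])
    except ValueError:
        return False
    return status == expected_status and receive_string in body
```

A check like this is stricter than one that only confirms the port accepts a connection: an empty or truncated reply fails it, while a full page containing the expected text passes.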

13 Replies

  • I don't even want to begin to explain what we went through trying to isolate this ridiculous issue (went as far as to rebuild the standby box) and it turned out a simple server restart resolved it.
    • Hi Brad, thanks a lot for the detailed analysis and the "solution". One question, please: which server was restarted? Just the Windows server, or the ESX host as well? Thanks in advance, Stephan
  • It is long after 2012, but I just came across this thread and, after some analysis, decided to answer with the problems I identified in this scenario:

    1.  SYN packets from the F5 self IP (server side) are not answered:
    

    09:59:05.152362 arp who-has XXX.XXX.XXX.153 tell XXX.XXX.XXX.142 out slot1/tmm0 lis=

    09:59:05.152940 arp reply XXX.XXX.XXX.153 is-at 00:50:56:84:27:44 in slot1/tmm1 lis=

    09:59:08.151357 IP XXX.XXX.XXX.144.51298 > XXX.XXX.XXX.153.http: S 2929947028:2929947028(0) win 4380 out slot1/tmm0 lis=NOS_Test

    09:59:11.351549 IP XXX.XXX.XXX.144.51298 > XXX.XXX.XXX.153.http: S 2929947028:2929947028(0) win 4380 out slot1/tmm0 lis=NOS_Test

    09:59:14.551543 IP XXX.XXX.XXX.144.51298 > XXX.XXX.XXX.153.http: S 2929947028:2929947028(0) win 4380 out slot1/tmm0 lis=NOS_Test

    09:59:17.751434 IP XXX.XXX.XXX.144.28570 > XXX.XXX.XXX.153.http: S 1018696290:1018696290(0) win 4380 out slot1/tmm0 lis=NOS_Test

    09:59:20.751355 IP XXX.XXX.XXX.144.28570 > XXX.XXX.XXX.153.http: S 1018696290:1018696290(0) win 4380 out slot1/tmm0 lis=NOS_Test
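    The retransmissions in the capture above can be picked out mechanically. The following is a rough Python sketch, not an F5 tool: `unanswered_syns` is a hypothetical helper, and it assumes the older tcpdump text format shown here, where an initial SYN appears as flag `S` with no `ack` field. It counts SYNs per flow for which no packet ever comes back in the opposite direction.

```python
import re
from collections import Counter

# Matches lines such as:
#   09:59:08.151357 IP A.51298 > B.http: S 2929947028:2929947028(0) win 4380 ...
PACKET = re.compile(
    r"^\s*\S+ IP (?P<src>\S+) > (?P<dst>\S+): (?P<flags>\S+) (?P<rest>.*)$"
)

SAMPLE = """\
09:59:05.152362 arp who-has XXX.XXX.XXX.153 tell XXX.XXX.XXX.142 out slot1/tmm0 lis=
09:59:05.152940 arp reply XXX.XXX.XXX.153 is-at 00:50:56:84:27:44 in slot1/tmm1 lis=
09:59:08.151357 IP XXX.XXX.XXX.144.51298 > XXX.XXX.XXX.153.http: S 2929947028:2929947028(0) win 4380 out slot1/tmm0 lis=NOS_Test
09:59:11.351549 IP XXX.XXX.XXX.144.51298 > XXX.XXX.XXX.153.http: S 2929947028:2929947028(0) win 4380 out slot1/tmm0 lis=NOS_Test
09:59:14.551543 IP XXX.XXX.XXX.144.51298 > XXX.XXX.XXX.153.http: S 2929947028:2929947028(0) win 4380 out slot1/tmm0 lis=NOS_Test
09:59:17.751434 IP XXX.XXX.XXX.144.28570 > XXX.XXX.XXX.153.http: S 1018696290:1018696290(0) win 4380 out slot1/tmm0 lis=NOS_Test
09:59:20.751355 IP XXX.XXX.XXX.144.28570 > XXX.XXX.XXX.153.http: S 1018696290:1018696290(0) win 4380 out slot1/tmm0 lis=NOS_Test
"""


def unanswered_syns(lines):
    """Return {(src, dst): syn_count} for flows that never saw a reply."""
    syn_counts = Counter()
    seen = set()  # every (src, dst) direction observed in the capture
    for line in lines:
        match = PACKET.match(line)
        if not match:
            continue  # ARP and other non-IP lines are skipped
        src, dst = match["src"], match["dst"]
        seen.add((src, dst))
        # An initial SYN has flag "S" and no "ack" field; a SYN-ACK has both.
        if match["flags"] == "S" and "ack" not in match["rest"].split():
            syn_counts[(src, dst)] += 1
    return {
        flow: count
        for flow, count in syn_counts.items()
        if (flow[1], flow[0]) not in seen
    }
```

    Run against `SAMPLE.splitlines()`, this reports three unanswered SYNs from source port 51298 and two from source port 28570, which is the retry pattern visible in the capture.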

    We can see the “BAD/offending” server is not responding to SYN packets from the F5 with source address “XXX.XXX.XXX.144”. Possible reasons:

    •   A firewall ACL on the “BAD/offending” server is dropping the packets (I am assuming there is none, but I would check this to be sure)
    •   There is a routing problem on the “BAD/offending” server:
        o   Either it is responding to the SYN packet through another interface/gateway, which is definitely not the F5
        o   Or it lacks a default route and, if the F5 source address in the SYN packet is in another subnet, the “BAD/offending” server is unable to respond to the SYN packet
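    The missing-default-route case in the list above can be illustrated with Python's `ipaddress` module. The addresses in this thread are masked, so the values in the usage note below are made-up documentation addresses, and the model is deliberately simplified: it ignores static routes and only asks whether the F5 source address is on-link for the server or whether a default gateway exists. `is_on_link` and `can_reply` are hypothetical helper names.

```python
import ipaddress
from typing import Optional


def is_on_link(server_interface: str, peer: str) -> bool:
    """True if the peer address lies in the server's directly connected subnet."""
    network = ipaddress.ip_interface(server_interface).network
    return ipaddress.ip_address(peer) in network


def can_reply(server_interface: str, peer: str, default_gateway: Optional[str]) -> bool:
    """Simplified model: a reply is possible if the peer is on-link
    or the server has some default gateway configured."""
    return is_on_link(server_interface, peer) or default_gateway is not None
```

    For example, with a server interface of `192.0.2.153/24`, a peer of `192.0.2.144` is on-link and can be answered without any gateway, whereas a peer of `198.51.100.144` can only be answered if a default gateway (or a more specific route, which this sketch does not model) is present.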
    

    So I would probably review the routing configuration on the offending server in order to solve this issue.

    KR, Francisco