Forum Discussion

Mat_126116
Nov 07, 2014

F5 is not flushing expired TCP connections

Hi All,

 

Over the last few days I have noticed that, for some unknown reason, the BIG-IP F5 load balancer is not flushing expired TCP connections on one VIP. Connections get stuck in CLOSE-WAIT/LAST-ACK and FIN-WAIT/CLOSING state. As a result the connection table grows to the point where there are no free ports left to handle legitimate traffic from new client connections. The customer started experiencing the issue when the number of connections stuck in CLOSE-WAIT/LAST-ACK and FIN-WAIT/CLOSING reached around 90k.

This is strange in my opinion, because we have two pool members and also use a SNAT pool on this VIP, and the SNAT pool consists of two IP addresses. In my opinion the customer should only be hit by port exhaustion once the number of stuck connections reaches around 258,000. Here is my calculation: TCP allows 2^16 - 1 = 65535 ports; subtracting the reserved ports leaves 64511 usable port numbers. One SNAT IP address is used towards two pool members, so the theoretical number of free ports from one IP address is 2 x 64511 = 129022. Because we have two addresses in the SNAT pool, the total should be 2 x 129022 = 258044.

Therefore I have two major questions to resolve: 1) Why do connections stay stuck in CLOSE-WAIT/LAST-ACK and FIN-WAIT/CLOSING state for a long time without being cleaned up? 2) Why did the customer hit the issue at around 90k?
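For reference, the per-SNAT-address entry counts in the connection table can be pulled roughly like this from the BIG-IP bash prompt (the SNAT addresses below are placeholders, not our real ones):

# Count connection-table entries whose server-side (SNAT) source address
# matches each SNAT pool member. The output also contains a summary line,
# so the counts are approximate.
tmsh show sys connection ss-client-addr 10.0.0.1 | wc -l
tmsh show sys connection ss-client-addr 10.0.0.2 | wc -l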

 

Moreover, for the same unknown reason, the issue is not limited to one customer but also affects other customers (different context, same load balancer). On the affected VIP we use the standard tcp profile as the Protocol Profile (Client) and no Protocol Profile (Server).

 

Some current statistics from the affected VIP:

 

tmsh show ltm virtual ... details

 

Ltm::TCP Profile: tcp

 

Connections
  Open                               194
  Current in CLOSE-WAIT/LAST-ACK   37.6K
  Current in FIN-WAIT/CLOSING      37.6K
  Current in TIME-WAIT                 2
  Accepted                          8.4M
  Not Accepted                         0
  Established                       7.7M
  Failed                             377
  Expired                         458.2K
  Abandoned                          132

 

If anyone has run into something similar and is able to help, I will be very grateful!

 

6 Replies

  • Hi Mat,

     

    Do you have any custom tcp or udp profile on this VS? What have you set the connection timeout to on those profiles? If you have not set any custom profile, then I believe you should get it checked via F5 Support and share your findings. A quick way to check this from tmsh is sketched at the end of this reply.

     

    Regards,
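    A minimal way to check from the command line, using a placeholder virtual server name:

    # List the profiles applied to the virtual server, then check the
    # idle timeout of the default tcp profile (names are placeholders).
    tmsh list ltm virtual my_vip profiles
    tmsh list ltm profile tcp tcp idle-timeout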

     

  • Hi,

     

    We are using the default tcp profile (no custom profiles). We've just opened a case with F5 Support and are waiting for their findings.

     

    regards, Mat

     

  • Yes, we finally found the root cause together with F5 Support, but it was not entirely related to the F5 configuration.

     

    We took packet captures and found that the pool member was retransmitting a FIN/ACK every 120 seconds on the same TCP connection stream.
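    For anyone wanting to take the same kind of capture, something along these lines on the BIG-IP should do it (pool member IP and port are placeholders):

    # Capture traffic to/from the pool member on all VLANs (interface 0.0)
    # and write it to a file for later analysis.
    tcpdump -ni 0.0 -w /var/tmp/poolmember.pcap host 10.10.10.20 and port 8080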

     

    It seems the F5 was in FIN-WAIT-2; that state is governed by the Idle Timeout setting (300 secs). The F5 would have sent its FIN/ACK to the client and then gone into the FIN-WAIT-2 state.

     

    Each FIN/ACK from the pool member reset the F5's TCP idle timeout counter, so the connection was never removed from the F5 connection table.

     

    There is no need for the pool member to retransmit the FIN/ACK every 120 secs, since the pool member has already received the ACK from the F5 for its FIN/ACK.

     

    Resolution: We created a new tcp profile with an idle timeout of 110 secs (below the pool member's 120-sec retransmission interval). That seems to have fixed the connection table growth; roughly what the change looks like is sketched below.
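    In tmsh terms the change would look roughly like this (profile and virtual server names are placeholders, and this assumes the VIP only carries the tcp profile described in the original post; adjust if other profiles are attached):

    # Create a tcp profile based on the default one with a 110-second idle
    # timeout, then apply it to the virtual server (names are placeholders).
    tmsh create ltm profile tcp tcp-110-idle defaults-from tcp idle-timeout 110
    tmsh modify ltm virtual my_vip profiles replace-all-with { tcp-110-idle }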

     

    • melcaniac:
      We had a similar problem, where the application was not properly closing TCP connections due to memory issues. In response to the FIN packets sent by the F5, the application server sent back TCP Window Full responses (the window size was already down to 34) and was holding the connection open with ZeroWindowProbes every 60 seconds which would the ACKed by the F5. Eventually this filled up the available ephemeral ports on the server and the F5 was blamed. Even though we proved this was a memory issue with the application, we were able to remediate the issue by using a TCP profile with an Idle Timeout of 55 seconds.