Forum Discussion

Gustavo_Randich
Feb 04, 2013

long-lived TCP connections being RST

 

Hello,

 

 

We can't get permanent, long-lived AMQP TCP connections to a RabbitMQ pool to work smoothly through LTM. The LTM keeps dropping connections with a TCP RST whose payload is "TCP 3WHS rejected", leaving the application in an unrecoverable socket state that requires a restart.

 

 

We've thoroughly read this article: http://support.f5.com/kb/en-us/solutions/public/9000/800/sol9812.html, and experimented with different TCP profile values, including Idle Timeout, Reset on Timeout, Keep Alive Interval, Max Syn/Segment Retransmission, etc., without success. We cannot change the LTM 'Global Settings', but we have ruled out virtual server connection limits, pool member availability or limits, iRules, and SNAT problems as possible causes.

 

 

We also can't keep SSH connections open for more than 1 hour, regardless of user activity, although in this case we have not observed the TCP RST payload.

 

 

 

We are looking for a recipe (a TCP profile?) to make a virtual server "untouchable", so that its connections are never dropped, or are dropped, say, only once a day. Has anyone ever used LTM as an SSH / RabbitMQ balancer with, e.g., a 6-hour inactivity tolerance?

 

 

Thanks in advance,

 

 

 

 

3 Replies

  • OK, so...

     

    1) The RST is sent from the F5 to the real server, not to the client, right?

     

    2) Is SNAT involved in any way?

     

    3) What's the highest Idle Time you've tried?

     

    4) How long will a connection stay up for?

     

    5) Are monitors failing at all?

     

    6) Doesn't the client use a keepalive?

     

    7) Have you used the same profile settings client and server-side?

     

    8) Are there any firewalls in path?

     

     

    That article doesn't actually list all the possible reasons for an RST. There are others (such as unmirrored connections during a failover), but I'm still working on a complete list.

     

     

    I've had applications using long-lived connections without any issues, but I'm pretty sure most of them had at least an hourly keepalive of one kind or another.
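One "kind" of keepalive mentioned above is the operating system's own TCP keepalive. The following is a minimal Python sketch of how a client could enable it on its socket. The function name `enable_keepalive` and the timer values are illustrative assumptions, not settings taken from this thread, and the thread does not confirm that TCP-level probes alone keep both sides of an LTM flow from idling out.

```python
import socket


def enable_keepalive(sock, idle=300, interval=60, probes=5):
    """Turn on TCP keepalive and tune the timers the platform exposes.

    The default values are illustrative assumptions, not recommendations.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # These option names are not available on every platform,
    # so any that the socket module does not define are skipped.
    for name, value in (
        ("TCP_KEEPIDLE", idle),       # idle seconds before the first probe
        ("TCP_KEEPINTVL", interval),  # seconds between probes
        ("TCP_KEEPCNT", probes),      # unanswered probes before giving up
    ):
        opt = getattr(socket, name, None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)
    return sock


# Demonstration on an unconnected socket: no network access is needed.
sock = enable_keepalive(socket.socket(socket.AF_INET, socket.SOCK_STREAM))
keepalive_on = sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0
sock.close()
```

In a real client the options would be set on the socket used for the AMQP or SSH connection, before or right after connecting.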
  • >> 1) The RST is sent from the F5 to the real server, not to the client, right?

     

     

    The RST is sent from the F5 to the client; the real server never receives an RST from the F5. (If we bypass the F5, the client never receives an RST from the real server.)

     

     

    >> 2) Is SNAT involved in any way?

     

     

    We are using a dedicated SNAT Pool for this virtual server.

     

     

    >> 3) What's the highest Idle Time you've tried?

     

     

    We've tried 86400 seconds on both the client and server profiles.

     

     

    >> 4) How long will a connection stay up for?

     

     

    Each of our 16 clients receives an RST (TCP 3WHS rejected) after somewhere between 20 and 120 minutes.

     

     

    >> 5) Are monitors failing at all?

     

     

    The default TCP monitors are all OK.

     

     

    >> 6) Doesn't the client use a keepalive?

     

     

    The client communicates every 5 minutes using AMQP.

     

     

    >> 7) Have you used the same profile settings client and server-side?

     

     

    We've tried the same profile settings on both sides, and at times slightly different keepalive intervals for server and client, without success.

     

     

    >> 8) Are there any firewalls in path?

     

     

    There are no firewalls.

     

     

     

    The AMQP client loses its link with the queue server and logs RPC read timeouts, failing to consume messages, even though the socket connection is still open and the heartbeat is still working.

     

     

    In the case of SSH, the connection simply gets dropped.

     

     

    Thank you again
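To narrow down where in the 20 to 120 minute range a drop happens, independent of the AMQP or SSH client, one option is a bare probe that holds an idle socket open and reports how and when it ends. The sketch below is a hypothetical helper (the name `watch_idle_connection` is an assumption), demonstrated on a local socket pair so that it runs without a network.

```python
import select
import socket
import time


def watch_idle_connection(sock, max_wait):
    """Wait on an idle socket; report how it ended and after how long.

    Returns (outcome, elapsed_seconds), where outcome is one of
    "still-open", "reset", "closed" or "data".
    """
    start = time.monotonic()
    readable, _, _ = select.select([sock], [], [], max_wait)
    elapsed = time.monotonic() - start
    if not readable:
        return "still-open", elapsed  # nothing happened within max_wait
    try:
        data = sock.recv(1)
    except ConnectionResetError:
        return "reset", elapsed       # the peer (or a device in the path) sent RST
    return ("closed" if data == b"" else "data"), elapsed


# Local demonstration: the peer closes cleanly, so the outcome is "closed".
a, b = socket.socketpair()
b.close()
outcome, seconds = watch_idle_connection(a, max_wait=1.0)
a.close()
```

Against a real virtual server, the socket would come from `socket.create_connection` with the virtual server's address and a `max_wait` of several hours; running one probe through the LTM and one directly to a pool member would show whether only the proxied path sees a reset.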

     

  • Hmmm. Everything looks good; the profile settings are fine and should be honoured. I wonder, did you also configure the SNAT Idle Timeout to match the TCP profile Idle Timeout?
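The reasoning behind the SNAT question can be written as a small check, under the assumption that the shortest idle timeout anywhere in the path is the one that decides when an idle flow is dropped. The two 86400-second values and the 5-minute traffic interval come from this thread; the SNAT value of 300 seconds is purely hypothetical, chosen to show a mismatch.

```python
def shortest_idle_timeout(timeouts):
    """Return (name, seconds) of the smallest idle timeout in the path."""
    name = min(timeouts, key=timeouts.get)
    return name, timeouts[name]


def traffic_keeps_flow_alive(traffic_interval, timeouts):
    """True if traffic recurs strictly before the shortest timeout expires."""
    _, limit = shortest_idle_timeout(timeouts)
    return traffic_interval < limit


timeouts = {
    "client_tcp_profile": 86400,  # value reported in the thread
    "server_tcp_profile": 86400,  # value reported in the thread
    "snat_idle_timeout": 300,     # hypothetical value, not from the thread
}
weakest = shortest_idle_timeout(timeouts)
ok = traffic_keeps_flow_alive(300, timeouts)  # AMQP traffic every 5 minutes
```

With these numbers, `weakest` is the SNAT entry and `ok` is `False`: a 300-second traffic interval does not stay strictly below a 300-second timeout, so a long TCP profile timeout would not help if some other timer in the path were shorter.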

     

     

    Have you tried turning on RST logging (use it with caution)? Details are here: http://support.f5.com/kb/en-us/solutions/public/13000/200/sol13223.html

     

     

    Also, can you run a tcpdump on the client side and see whether the client does anything unusual before the RST is sent?