Forum Discussion

LeeG_118759
May 21, 2013

Take Pool Member offline for service upgrade

Disclaimer: I may not be using correct terminology. (Pool member environment: Windows Server 2008 R2)

 

We have a pool named LF_Pool, with two members: 10.16.20.83:9080 and 10.16.20.84:9080

 

I want to regularly upgrade the service running on port 9080 on each pool member. I don't want to haphazardly stop the service, perform the upgrade and restart the service... I want to do it elegantly, without causing any downtime or service interruption.

 

So we found the iControl PowerShell cmdlets and spent the entire day testing them to see if we could achieve our goal, but we can't figure out how to get this working correctly. Sure, we had no trouble running the PowerShell cmdlets (although the iControlSnapInReadme.txt file is disappointingly incorrect in some areas).

 

We successfully ran "Set-F5.LTMPoolMemberState -Pool LF_Pool -Member 10.16.20.83:9080 -State Offline"... and it appears to be offline... but I am guessing my understanding of Offline is not correct. When I attempt to call the service on that Pool member, it readily and happily continues to work... but I don't want any new requests to be allowed into that pool member. (My service is a very simple HTTP REST service; my client application is just a web browser (Chrome) that displays the JSON result).

 

How can we get the pool member to be "offline"... meaning it can still return the results of an in-progress transaction, but accepts absolutely no new incoming connections, period, until I permit them? What's the secret?

 

P.S. We already tried the PsServerControl script to see if that would work... but it still doesn't do what I want. At least we are pretty certain it doesn't work the way we want it to.

 

Thanks, Lee

 

6 Replies

  • Are you using any persistence? When you test a service call are you sure a new connection is established?

     

     

    If the Pool is configured with the default Action on Service Down setting of None, then established connections remain (as long as data is sent and received), and new ones can possibly be established because of persistence.

     

     

    If you don't use persistence then changing the setting to Reject is your best bet as TCP will recover quickly and connections to the offline server are removed quickly.

     

     

    If you are using persistence: a) do you really need to? b) Using OneConnect helps connections move more quickly. c) Reject is still an option, as TCP recovery mechanisms may still mean zero data loss and seamless recovery.
  • Thanks Steve, I think those two ideas are tremendously helpful. I'm not explicitly using persistence in my real application, but when testing from a web browser, the browser itself is... so if I test from Fiddler instead, the request will be purer and not include the "KeepAlive" header. That is what I'll try next. Your suggestion of changing the "Service Down" setting to 'Reject' is also a very good idea. I am not the admin of the F5 balancer, so I have to schedule some time with the Admin to try these ideas. I'll let you know how things work out.

     

    --

     

    Lee
  • Well, we were mostly successful, but two things puzzle me. First, I tested my web service using Fiddler (a simple HTTP GET query) without altering the "Action On Service Down" setting (i.e. left it as "None"). When we set one of the nodes in the pool Offline, Fiddler still continued to communicate with that offline node. I don't believe Fiddler is sending any "Keep-Alive" header or any other header to keep persistence... so I don't know what is keeping the connection alive.

     

     

    Then... when we changed the "Action On Service Down" in the BIG-IP settings window to "Reject"... my tests started working as expected. Steve, do you feel this is the recommended way to achieve taking a pool member down for maintenance? Do you recommend an alternative way? I'm not the Admin of the BIG-IP... just a developer, so I need a little hand-holding. Thanks.

     

    -Lee
  • OK, we're still missing a complete picture here. Is persistence configured on the F5 itself? If so, that may explain the continued use of the offline node.

     

    Is Fiddler establishing a new TCP connection when you test after marking the node offline? It's not a valid test if it's using an existing connection. This only applies if there is no persistence in play on the F5.

     

     

    I can't really determine the best approach without understanding the above two points and knowing a bit more about the application itself. The reject setting is certainly valid if the application recovers/deals with this well. Happy to help.
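One way to rule out connection reuse while testing is to script the request so that every call opens its own TCP connection and asks the server to close it afterwards. The following is a minimal sketch, not something from this thread: the function name `fetch_on_new_connection` is hypothetical, and the host and port you pass in (for example the virtual server address) are your own values. It returns the client-side source port so you can see that each call used a separate connection.

```python
import http.client


def fetch_on_new_connection(host, port, path="/", timeout=5.0):
    """GET `path` over a brand-new TCP connection.

    Returns (status, local_port, body). A different local_port on each call
    indicates that no earlier connection was reused by this client.
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        # Open the socket explicitly so the source port can be recorded
        # before the request is sent.
        conn.connect()
        local_port = conn.sock.getsockname()[1]
        # "Connection: close" asks the server not to keep the connection
        # alive after this response.
        conn.request("GET", path, headers={"Connection": "close"})
        resp = conn.getresponse()
        body = resp.read()
        return resp.status, local_port, body
    finally:
        conn.close()
```

This only shows that the client side opened a fresh connection; which pool member answered still has to be read from the response itself or from the server logs.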
  • Brent_West_7733
    Historic F5 Account
    What protocol is the server using? Keep-alive (persistent) connections are the default in HTTP/1.1. If you wish to close a connection after each response, you can change this behavior on the server or have the BIG-IP include a "Connection: close" header via an iRule.

     

     

    With "Action on Service Down" set to "None", the BIG-IP will continue to use a down member for existing connections as long as the TCP connections on both sides of the proxy are still valid, i.e. a reset hasn't been issued by the server or client.

     

    Setting the pool action to "Reject" will cause a reset packet to be sent to the client, forcing a new TCP connection to be established, and a new load-balancing decision to be made (unless there is a persistence record.)

     

     

    Additionally, there is a difference between "Disabled" and "Offline":

    Offline still services active connections, but will not honor a persistence record.

    Disabled will honor a persistence record as well as active TCP connections.
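
The Disabled/Offline distinction described above can be restated as a small decision function. This is only an illustrative model of the rules as stated in this reply, not BIG-IP code: the names `MemberState` and `member_may_receive` are hypothetical, and the ENABLED case is an assumption added for contrast.

```python
from enum import Enum


class MemberState(Enum):
    ENABLED = "enabled"    # assumption: added here only for contrast
    DISABLED = "disabled"
    OFFLINE = "offline"


def member_may_receive(state, has_active_connection, has_persistence_record):
    """Return True if traffic may still be sent to the member under the rules above."""
    if state is MemberState.ENABLED:
        return True
    if has_active_connection:
        # Both Disabled and Offline keep serving connections that already exist.
        return True
    if state is MemberState.DISABLED:
        # Disabled additionally honors an existing persistence record.
        return has_persistence_record
    # Offline: no active connection means no traffic, persistence record or not.
    return False
```

Under this model, a client whose test keeps hitting an Offline member is most likely still riding an existing connection rather than being matched by a persistence record.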
  • Thanks Steve and Brent. Sorry for the late reply... I'm not the admin on the BIG-IP, so I have to coordinate the timing to investigate.

     

     

    Steve, I am told we do not have a persistence profile in place, but when I next get a chance to coordinate with my Ops team I will take a close and thorough look. I am not certain how Fiddler works under the hood; my knee-jerk assumption is that it doesn't hold a connection open, but again I'm not sure. Ultimately, my "real" client will be a web page making the call via JavaScript, so that client should have fine-grained control (another guess). When we changed the "Action on Service Down" setting to "Reject", the entire system did behave the way we expected. But because I have so many gaps in my knowledge of all the parts of the system (the HTTP server, the BIG-IP balancer, the client code, and the HTTP and TCP protocols in general), I am ignorant of a great many things. I greatly appreciate your help.

     

     

    Brent, yes, it's HTTP/1.1 (an ASP.NET Web API self-hosted service on .NET 4.0/4.5; it's not hosted in IIS). I don't currently know how, or even /if/, I can modify the self-hosting service properties. This is my first foray into many of these technologies, so my knowledge is very limited. I have read about the iRule features, although I have no first-hand experience. I will make a note of that possibility and discuss it with my Operations guy (he's overworked and under-staffed at the moment). Also, thank you for the extra explanation of the differences between "Disabled" and "Offline". It seems that perhaps we might indeed have a "Persistence Profile" and just don't know it. I will ask about that again and look more thoroughly.

     

     

    Thank you to both of you, and I will report back my findings as they come in.

     

    Lee

     

     

    The server protocol is indeed HTTP/1.1, but I'm not sure I fully understand how Keep-Alives work. I thought they were initiated solely by the client... but perhaps I'm completely wrong. As for the "Action on Service Down" setting