F5 Friday: Gracefully Scaling Down

What goes up, must come down. The question is how much it hurts (the user).

An oft ignored side of elasticity is scaling down. Everyone associates scaling out/up with elasticity of cloud computing but the other side of the coin is just as important, maybe more so. After all, what goes up must come down. The trick is to scale down gracefully, i.e. to do it in such a way as to prevent the disruption of service to existing users while simultaneously trying to scale back down after a spike in demand.

The ramifications of not scaling down are real in terms of utilization and therefore cost. Scaling up with the means to scale back down means higher costs, and simply shutting down an instance that is currently in use can result in angry users as service is disrupted. What’s necessary is to be able to gracefully scale down; to indicate somehow to the load balancing solution that a particular instance is no longer necessary and begin preparation for eventually shutting it down. Doing so gracefully requires that you are somehow able to quiesce or bleed off the connections. You want to continue to service those users who are currently connected to the instance while not accepting any new connections.

This is one of the benefits of leveraging an application-aware application delivery controller versus a simple Load balancer: the ability to receive instruction in-process to begin preparation for shut down without interrupting existing connections.


BIG-IP users have always had the ability to specify whether disabling a particular “node” or “member” results in the rejection of all connections (including existing ones) or if it results in refusing new connections while allowing old ones to continue to completion. The latter technique is often used in preparation for maintenance on a particular server for applications (and businesses) that are sensitive to downtime. This method maintains availability while accommodating necessary maintenance.

In version 10.2 of the core BIG-IP platform a new option was introduced that more easily enables the process of draining a server/application’s connections in preparation for being taken offline. Whether the purpose is maintenance or simply the scaling down side of elastic scalability is really irrelevant; the process is much the same.

Being able to direct a load balancing service in the way in which connections are handled from the application is an increasingly important capability, especially in a public cloud computing environment because you are unlikely to have the direct access to the load balancing system necessary to manually engage this process. By providing the means by which an application can not only report but direct the load balancing service, some measure of customer control over the deployment environment is re-established without introducing the complexity of requiring the provider to manage the thousands (or more) credentials that would otherwise be required to allow this level of control over the load balancer’s behavior.


For specific types of monitors in LTM (Local Traffic Manager) – HTTP, HTTPS, TCP, and UDP – there is a new option called “Receive Disable String.” This “string” is just that, a string that is found within the content returned from the application as a result of the health check.

In phase one we have three instances of an application (physical or virtual, doesn’t matter) that are all active. They all have active connections and are all receiving new connections. In phase two a health check on one server returns a response that includes the string “DISABLE ME.” BIG-IP sees this and, because of its configuration, knows that this means the instance of the application needs to gracefully go offline. LTM therefore continues to direct existing connections (sessions) with that instance to the right application (phase 3), but subsequently directs all new connection requests to the other instances in the pool (farm, cluster).

When there are no more existing connections the instance can be taken offline or shut down with zero impact to users.

The combination of “receive string” and “receive disable string” impacts the way in which BIG-IP interprets the instruction. A “receive string” typically describes the content received that indicates an available and properly executing application. This can be as simple as “HTTP 200 OK” or as complex as looking for a specific string in the response. Similarly the “receive disable” string indicates a particular string of text that indicates a desire to disable the node and begin the process of bleeding off connections. This could be as simple as “DISABLE” as indicated in the above diagram or it could just as easily be based solely on HTTP status codes. If an application instance starts returning 50x errors because it’s at capacity, the load balancing policy might include a live disable of the instance to allow it time to cool down – maintaining existing connections while not allowing new ones. Because action is based on matching a specific string, the possibilities are pretty much wide open. The following table describes the possible interactions between the two receive string types:



One of the ways in which a provider could leverage this functionality to provide differentiated value-added cloud services (as Randy Bias calls them) would be to define an application health monitoring API of sorts that allows customers to add to their application a specific set of URIs that are used solely for monitoring and can thus control the behavior of the load balancer without requiring per-customer access to the infrastructure itself. That’s a win-win, by the way. The customer gets control but so does the provider. 

Consider an health monitoring API that is a single URI: http://$APPLICATION_INSTANCE_HOSTNAME/health/check.

Now provide a set of three options for customers to return (these are likely oversimplified for illustration purposes, but not by much):


For all application instances the BIG-IP will automatically use an HTTP-derived monitor that calls $APP_INSTANCE/health/check and examines the result. The monitor would use “ENABLE” as the “receive string” and “QUIESCE” as the “receive disable” string. Based on the string returned by the application, the BIG-IP takes the appropriate action (as defined by the table above). Of course this can also easily be accomplished by providing a button on the cloud management interface to do the same via iControl, but this option is more able to be programmatically defined by customers and thus is more dynamic and allows for automation. And of course such an implementation isn’t relegated only to service providers; IT organizations in any environment can take advantage of such an implementation, especially if they’re working toward an automated data center and/or self-service provisioning/management of IT services.

That is infrastructure as a service.

Yes, this means modification to the application being deployed. No, I don’t think that’s a problem – cloud and Infrastructure as a Service (IaaS), at least real IaaS is going to necessarily require modifications to existing applications and new applications will need to include this type of integration in the future if we are to take advantage of the benefits afforded by a more application aware infrastructure and, conversely, a more infrastructure-aware application architecture.

Related Posts

Published Aug 06, 2010
Version 1.0

Was this article helpful?

1 Comment

  • FYI. I wrote a small tcp daemon, that will be used as additional health check. The daemon provides nice server side setup for system admins to announce state of the service manually, or by using other software.






    The code is as production ready as any open source, i.e., if it does not work I can try to fix the issue(s), and will definitely welcome feedback and patches.