HTTP Pipelining: A security risk without real performance benefits
Everyone wants web sites and applications to load faster, and there’s no shortage of folks out there looking for ways to do just that. But all that glitters is not gold, and not all acceleration techniques actually do all that much to accelerate the delivery of web sites and applications. Worse, some actually incur risk in the form of leaving servers open to exploitation.

A BRIEF HISTORY

Back in the day when HTTP was still evolving, someone came up with the concept of persistent connections. See, in ancient times – when administrators still wore togas in the data center – HTTP 1.0 required one TCP connection for every object on a page. That was okay, until pages started comprising ten, twenty, and more objects. So someone added an HTTP header, Keep-Alive, which basically told the server not to close the TCP connection until (a) the browser told it to or (b) it didn’t hear from the browser for X number of seconds (a time out). This eventually became the default behavior when HTTP 1.1 was written and became a standard. I told you it was a brief history.

This capability is known as a persistent connection, because the connection persists across multiple requests. This is not the same as pipelining, though the two are closely related. Pipelining takes the concept of persistent connections and then throws the traditional request – reply relationship inherent in HTTP out the window. The general line of thought goes like this: “Whoa. What if we just shoved all the requests from a page at the server and then waited for them all to come back rather than doing it one at a time? We could make things even faster!” Tada! HTTP pipelining.

In technical terms, HTTP pipelining is initiated by the browser, which opens a connection to the server and then sends multiple requests without waiting for a response. Once the requests are all sent, the browser starts listening for responses.
The reason this is considered an acceleration technique is that by shoving all the requests at the server at once you essentially save the RTT (Round Trip Time) on the connection waiting for a response after each request is sent.

WHY IT JUST DOESN’T MATTER ANYMORE (AND MAYBE NEVER DID)

Unfortunately, pipelining was conceived of and implemented before broadband connections were widely utilized as a method of accessing the Internet. Back then, the RTT was significant enough to have a negative impact on application and web site performance and the overall user-experience was improved by the use of pipelining. Today, however, most folks have a comfortable speed at which they access the Internet and the RTT impact on most web applications’ performance, despite the increasing number of objects per page, is relatively low.

There is no arguing, however, that some reduction in time to load is better than none. Too, anyone who’s had to access the Internet via high latency links can tell you anything that makes that experience faster has got to be a Good Thing. So what’s the problem?

The problem is that pipelining isn’t actually treated any differently on the server than regular old persistent connections. In fact, the HTTP 1.1 specification requires that a “server MUST send its responses to those requests in the same order that the requests were received.” In other words, the responses are returned serially, despite the fact that some web servers may actually process those requests in parallel. Because the server MUST return responses in the order the requests were received, it has to do some extra processing to ensure compliance with this part of the HTTP 1.1 specification. It has to queue up the responses and make certain responses are returned properly, which essentially negates the performance gained by reducing the number of round trips using pipelining.
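To make the ordering rule concrete, here is a minimal sketch in Python of when each pipelined response can leave the server. It is a toy model, not how any particular web server is implemented: the function name `delivery_times` and the millisecond figures are assumptions made up for illustration.

```python
from itertools import accumulate


def delivery_times(processing_ms, parallel=True):
    """Return the earliest time (ms) each pipelined response can be sent.

    processing_ms: hypothetical per-request processing times, in the order
                   the requests arrived on the connection.
    parallel:      True if the server works on all requests at once,
                   False if it works on them one after another.
    """
    if parallel:
        # Each response is ready as soon as its own work is done.
        ready = list(processing_ms)
    else:
        # Each response is ready only after all earlier work is done.
        ready = list(accumulate(processing_ms))
    # In-order rule: a response cannot leave before every response ahead
    # of it, so the send time is the running maximum of the ready times.
    return list(accumulate(ready, max))


if __name__ == "__main__":
    # One slow request (400 ms) sent second, surrounded by fast ones.
    print(delivery_times([5, 400, 5, 5]))
    print(delivery_times([5, 400, 5, 5], parallel=False))
```

Even with parallel processing, the two fast responses queued behind the slow one are held until it completes, which is the queueing cost described above.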
Depending on the order in which requests are sent, if a request requiring particularly lengthy processing – say a database query – were sent relatively early in the pipeline, this could actually cause a degradation in performance because all the other responses have to wait for the lengthy one to finish before they can be sent back.

Application intermediaries such as proxies, application delivery controllers, and general load-balancers can and do support pipelining, but they, too, will adhere to the protocol specification and return responses in the proper order according to how the requests were received. This limitation on the server side actually inhibits a potentially significant boost in performance because we know that processing dynamic requests takes longer than processing a request for static content. If this limitation were removed it is possible that the server would become more efficient and the user would experience non-trivial improvements in performance. Or, if intermediaries were smart enough to rearrange requests such that their execution was optimized (I seem to recall I was required to design and implement a solution to a similar example in graduate school) then we’d maintain the performance benefits gained by pipelining. But that would require an understanding of the application that goes far beyond what even today’s most intelligent application delivery controllers are capable of providing.

THE SILVER LINING

At this point it may be fairly disappointing to learn that HTTP pipelining today does not result in as significant a performance gain as it might at first seem to offer (except over high latency links like satellite or dial-up, which are rapidly dwindling in usage). But that may very well be a good thing.
As miscreants have become smarter and more intelligent about exploiting protocols and not just application code, they’ve learned to take advantage of the protocol to “trick” servers into believing their requests are legitimate, even though the desired result is usually malicious. In the case of pipelining, it would be a simple thing to exploit the capability to enact a layer 7 DoS attack on the server in question. Because pipelining assumes that requests will be sent one after the other and that the client is not waiting for the response until the end, it would have a difficult time distinguishing between someone attempting to consume resources and a legitimate request. Consider that the server has no understanding of a “page”. It understands individual requests. It has no way of knowing that a “page” consists of only 50 objects, and therefore a client pipelining requests for the maximum allowed – by default 100 for Apache – may not be seen as out of the ordinary. Several clients opening connections and pipelining hundreds or thousands of requests every second without caring if they receive any of the responses could quickly consume the server’s resources or available bandwidth and result in a denial of service to legitimate users. So perhaps the fact that pipelining is not really all that useful to most folks is a good thing, as server administrators can disable the feature without too much concern and thereby mitigate the risk of the feature being leveraged as an attack method against them. Pipelining as it is specified and implemented today is more of a security risk than it is a performance enhancement. There are, however, tweaks to the specification that could be made in the future that might make it more useful. 
Those tweaks do not address the potential security risk, however. Given that there are so many other optimizations and acceleration techniques that can be used to improve performance without incurring any measurable security risk, perhaps we should simply let sleeping dogs lie.

Load Balancing on the Inside
Business critical internal processing systems often require high-availability and fault tolerance, too. Load balancing and application delivery is almost always associated with scaling out interactive, web-based applications. Rarely does anyone think about load balancing and application delivery in batch processing systems even when those systems might be critical to the business they are supporting. But scaling out non-interactive processing systems and providing high-availability to such critical systems is just as easily accomplished for an application delivery controller (ADC) as it is to scale out an interactive web-based application. Maybe easier. When that system also requires a bit more intelligence than just simple load balancing, it makes a lot of sense to look closer at a context-aware system that can support all the requirements in a single solution. THE SCENARIO A batch document processing system uses a document ID to match all related documents to the same “case.” The first time a document ID is encountered, it creates a new “case” and subsequent documents bearing that ID are attached to the original case. To ensure processing around the clock, a redundant set of application servers is configured to process the documents, and the vendor’s application server clustering solution is used to load balance documents (in simple round-robin fashion) across the two instances. A load test is conducted, ramping up to 2500 documents per hour (41 per minute, fewer than 1 per second). During the test it is discovered that in some situations two documents with the same ID will arrive at the clustering solution in order. They will each be load balanced to separate instances. There is no existing “case” for this document id. Because of processing times and load on the servers, both documents result in the creation of separate “cases.” The test is considered a failure. 
Because the system, while managing the load fine from a network perspective, executed incorrectly under load from a process perspective. The solution? Reconfigure the clustering solution to an active-standby configuration, thus introducing the process latency needed to ensure that the scenario does not occur. Retest. Success. The result? The investment in the second instance of the application server – hardware, software licenses, management, maintenance – is wasted. It is a “failover” node only and reduces the overall capacity – and ultimately performance at higher load levels – of the system. WHEN CONTEXT MATTERS This scenario is real; it was described to me by a program manager at a Fortune 500 with a great deal of frustration as it seemed, to her anyway, that the architects could not come up with a working solution other than wasting a perfectly good set of resources. Instinctively she described a solution that leveraged persistence to force all documents with the same ID to the same server as it had been proven repeatedly that if all documents with the same ID were processed by the same application server that the system processed them correctly and associated them with the right “case” in all situations. But the application server clustering solution, which can provide server affinity (persistence) based on a few variables, was for some reason not able to support affinity (persistence) based on the document ID. After a few questions regarding the overall system and processing times it became clear that a context-aware application delivery controller could indeed solve this problem. The solution is fairly simple, actually, and based on existing persistence-based load balancing solutions. It is a given that documents with the same ID are batch processed within minutes of each other. 
Thus, a persistence table with a life of an hour or even thirty minutes would provide the proper context in which documents could be processed and directed to the “right” web application server. This requires context; it requires that the load balancing solution, the application delivery controller, be aware of not only what it is processing but what it has processed already, and where it’s been sent.

Document ID Based Persistence Logic

1. Extract the document ID from the document
2. Check the persistence table for the document ID
3. If the document ID already exists, route the document to the same server as the previous document(s) with that ID
4. If the document ID does not exist, decide which server the document will be sent to for processing and create an entry in the persistence table

Wash. Rinse. Repeat.

This problem is really about process level execution; about enforcing a business requirement on the technological implementation. In order to achieve compliance with the business process expectations it is necessary to be able to view each request in the context of that process rather than as an individual request that needs to be executed. Thus each touch point in the architecture that needs to manipulate, transform, or perform some task with or on or to the request needs to be able to take into consideration the process; it needs to be context-aware so that its decisions are made within the context of the entire process and not just the individual request.

Layer 7 switching, application load balancing, application delivery. Whatever you want to call it, it is the way in which load balancing becomes context-aware and becomes collaborative. It enables the business requirements to be not only taken into consideration but enforced while ensuring that CapEx and OpEx investments in additional systems are not left to sit idle; wasted. It improves capacity essentially by introducing process latency into the equation.
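The document ID persistence logic can be sketched in a few lines of Python. This is an illustration only, not the configuration of any real ADC: the class name `PersistenceTable`, the round-robin choice for new IDs, and the decision to refresh an entry's lifetime on every hit are all assumptions, and extracting the ID from the document is left to the caller.

```python
import time


class PersistenceTable:
    """Hypothetical persistence table: document ID -> server, with a lifetime."""

    def __init__(self, servers, ttl_seconds=3600, clock=time.monotonic):
        self.servers = list(servers)
        self.ttl = ttl_seconds
        self.clock = clock      # injectable so the table can be tested
        self.entries = {}       # doc_id -> (server, expiry time)
        self.next_index = 0     # round-robin position for unseen IDs

    def route(self, doc_id):
        """Return the server that should process a document with this ID."""
        now = self.clock()
        entry = self.entries.get(doc_id)
        if entry is not None and entry[1] > now:
            # ID seen recently: stick to the server that has the case.
            server = entry[0]
        else:
            # New (or expired) ID: pick a server in round-robin fashion.
            server = self.servers[self.next_index % len(self.servers)]
            self.next_index += 1
        # Assumption: every document with this ID restarts the lifetime.
        self.entries[doc_id] = (server, now + self.ttl)
        return server


if __name__ == "__main__":
    table = PersistenceTable(["app1", "app2"], ttl_seconds=1800)
    for doc_id in ["D-100", "D-200", "D-100", "D-100"]:
        print(doc_id, "->", table.route(doc_id))
```

Both instances stay active, and two documents with the same ID arriving back to back always land on the same one, so only one case is created.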
By forcing the process to follow a particular path the application delivery controller assists in the technological implementation meeting the goals of the business. In other words, it aligns IT with the business. Sometimes the marketing fluff is more solid than it appears.

To Boldly Go Where No Production Application Has Gone Before
WILS: Network Load Balancing versus Application Load Balancing
Sessions and Cookies and Persistence, oh my!
Persistent and Persistence, What's the Difference?
If Load Balancers Are Dead Why Do We Keep Talking About Them?
A new era in application delivery
Infrastructure 2.0: The Diseconomy of Scale Virus
The Politics of Load Balancing
Business-Layer Load Balancing
Not all application requests are created equal

LineRate scripting and reducing large application latency
Can LineRate’s Node.js Scripting help reduce response time latency for large applications? Recently, a talk at Velocity 2014 by Jeff Dean gave me some ideas for a quick script to do just that. The idea is simple; any time a response is not received quickly, just replicate the request to another server. I put together a very simple and admittedly naïve script and config snippet to do just that. A more advanced script providing more of Jeff Dean’s ideas is certainly possible. Optimizations such as aborting duplicate requests and server coordination to cancel batched jobs are possible. Check out the summary and video of Jeff Dean’s talk for many more details behind Google’s implementation and other ideas for enhancing this script. Keep your eye on this script on my github for future updates: https://github.com/ldm5180/lr-lowlatency

Cloud: Impact of DNS on Performance
#webperf #devops Developers and operations must work together to mitigate the impact of hybrid architectures on application performance

One of the ramifications of relying on off-premise cloud infrastructure is that you're necessarily stuck with some of the idiosyncrasies that come with it. For example, it's not your network, and thus topologically-related identifiers such as host names and IP addresses are not within your purview. But you certainly aren't going to ask your customers to visit "host111-east-virginia-zone3-subnet5.cloudprovider.com". At least not if you want them to visit, you won't.

Luckily, you control your own DNS destiny, so you'll just CNAME that crazy long host name provided by the provider to be something more catchy and in line with your branding, say, "coolappz.com". While certainly more appealing to everyone (easy to remember, fits better on a bumper sticker and on branded swag) it does have a downside: double the latency. You see, CNAME lookups require two distinct DNS queries to resolve - the first retrieves the ultra-ugly-long host name, the second resolves the ultra-ugly-long host name into an IP address that can actually be used by the browser to connect. So that's double the lookup, double the roundtrips, double the latency.

Of course, no web page comprises just one host. That would be so 90s and this, this is the 21st century! This is Web 2.0, the age of integration and interconnection and inter-everything. And if the services upon which you rely to build that web app are using CNAMEs, too, well... I hope you like math 'cause you're going to be adding up some roundtrips and latency for a while.

The point here is not to scare you off of hybrid architectures due to the potential impact on performance, but rather to remind you to keep the impact in the fore. It is important to remember the impact of topology, proximity, and the technology in general on the overall performance of your web applications.
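For those who do like math, the roundtrip arithmetic can be sketched in Python. This is a toy model that follows the simplification used above, where every uncached link in a CNAME chain costs one round trip; real resolvers may return a whole chain in a single answer or serve part of it from cache. The record table, the stand-in host name `ugly-long-host.cloud.example`, the address, and the 40 ms round-trip figure are all made up for illustration.

```python
# Hypothetical zone data: name -> (record type, value).
RECORDS = {
    "coolappz.com": ("CNAME", "ugly-long-host.cloud.example"),
    "ugly-long-host.cloud.example": ("A", "203.0.113.10"),
}


def resolve(name, records, cached=frozenset(), max_hops=8):
    """Follow CNAME records to an address.

    Returns (ip, queries), where queries counts the lookups that could not
    be answered from the set of already cached names.
    """
    queries = 0
    for _ in range(max_hops):
        if name not in cached:
            queries += 1
        rtype, value = records[name]
        if rtype == "A":
            return value, queries
        name = value  # CNAME: keep following the alias
    raise RuntimeError("CNAME chain longer than max_hops")


def lookup_latency_ms(name, records, rtt_ms, cached=frozenset()):
    """Estimated DNS latency: one round trip per uncached lookup."""
    _, queries = resolve(name, records, cached)
    return queries * rtt_ms


if __name__ == "__main__":
    print(lookup_latency_ms("coolappz.com", RECORDS, rtt_ms=40))
    print(lookup_latency_ms("ugly-long-host.cloud.example", RECORDS, rtt_ms=40))
    print(lookup_latency_ms("coolappz.com", RECORDS, rtt_ms=40,
                            cached={"coolappz.com"}))
```

Under these assumptions the branded name costs two lookups against one for the provider's name, and the gap closes once the alias is cached.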
A Google Developers article nails down where DNS latency comes from quite well:

There are two components to DNS latency:

- Latency between the client (user) and DNS resolving server. In most cases this is largely due to the usual round-trip time (RTT) constraints in networked systems: geographical distance between client and server machines; network congestion; packet loss and long retransmit delays (one second on average); overloaded servers, denial-of-service attacks and so on.
- Latency between resolving servers and other nameservers. This source of latency is caused primarily by the following factors:
  - Cache misses. If a response cannot be served from a resolver's cache, but requires recursively querying other nameservers, the added network latency is considerable, especially if the authoritative servers are geographically remote.
  - Underprovisioning. If DNS resolvers are overloaded, they must queue DNS resolution requests and responses, and may begin dropping and retransmitting packets.
  - Malicious traffic. Even if a DNS service is overprovisioned, DoS traffic can place undue load on the servers. Similarly, Kaminsky-style attacks can involve flooding resolvers with queries that are guaranteed to bypass the cache and require outgoing requests for resolution.

-- Introduction: causes and mitigations of DNS latency

Interestingly, Google is arguing for public DNS services, even though this may in fact contribute to location-induced DNS latency, particularly for custom domains for which the authoritative zone is served by relatively few DNS servers, most of which are geographically located far from the majority of users. Intercontinental latency is still very much problematic.
Catchpoint, a web performance monitoring service, mentions this in its exhaustive list of the ways in which DNS impacts performance:

Exotic Domains: be careful with the exotic domain names, .ly, .tv… these domains have authoritative servers that are often far away from you end user ISPs. The records will have almost always 2 day TTL, however you never know when someone will be impacted because the query has to go to the authoritative servers and they fail. Example “.ly”, 2 authoritative servers are in Libya, 2 in the US, and 1 in the Netherlands.

So when we go connecting clouds and data centers, we need to be concerned with where and how domains are being disseminated, sharded, and resolved. We need to more carefully consider how we are referencing content and whether or not the performance boosts we get from some techniques (such as domain sharding) are being offset by the impact of double the latency from the need to resolve those extra hosts.

We need to examine that in the context of other contributing factors, such as TTL (time to live). If the time to live is long enough, then perhaps the initial hit from the extra lookup required to resolve a CNAME isn't going to matter over the life of the session. If we're looking at supporting a stateless API in which sessions don't really exist, then the second lookup may indeed be problematic, but only if the calls are generally spread out over a time interval that is greater than the TTL.

It's a balancing act, where understanding how application network services contribute to the performance of applications is critical to pushing the right buttons and twisting the right knobs to alleviate performance issues that can damage adoption and growth of the web applications that are key to business.

You're Not Off The Hook, Developers

So often it's the case that applications are written with a specific behavior in mind and it is left to devops to figure out how to mitigate these kinds of potential performance issues.
But it is just as important for developers to understand how the application network services contribute to performance because sometimes, all it takes is for the application to be "tweaked" with respect to an update interval or use of a different host name to generate a significant improvement in performance. It is increasingly difficult, and sometimes even impossible, for operations to make adjustments in the infrastructure, particularly in hybrid environments where infrastructure services are black-box and off-limits. Thus, it is of growing importance that developers and operations work together to map the interaction of applications with application network services such that each group can make appropriate modifications and configuration changes that serve to improve the overall performance of the application, no matter where it might be deployed. As more and more organizations adopt hybrid, distributed applications that span geographies in addition to environments, this level of cooperation and collaboration will be key to managing web application performance issues.

Bare Metal Blog: Throughput Sometimes Has Meaning
#BareMetalBlog Knowing what to test is half the battle. Knowing how it was tested the other. Knowing what that means is the third. That’s some testing, real clear numbers. In most countries, top speed is no longer the thing that auto manufacturers want to talk about. Top speed is great if you need it, but for the vast bulk of us, we’ll never need it. Since the flow of traffic dictates that too much speed is hazardous on the vast bulk of roads, automobile manufacturers have correctly moved the conversation to other things – cup holders (did you know there is a magic number of them for female purchasers? Did you know people actually debate not the existence of such a number, but what it is?), USB/bluetooth connectivity, backup cameras, etc. Safety and convenience features have supplanted top speed as the things to discuss. The same is true of networking gear. While I was at Network Computing, focus was shifting from “speeds and feeds” as the industry called it, to overall performance in a real enterprise environment. Not only was it getting increasingly difficult and expensive to push ever-larger switches until they could no longer handle the throughput, enterprise IT staff was more interested in what the capabilities of the box were than how fast it could go. Capabilities is a vague term that I used on purpose. The definition is a moving target across both time and market, with a far different set of criteria for, say, an ADC versus a WAP. There are times, however, where you really do want to know about the straight-up throughput, even if you know it is the equivalent of a professional driver on a closed course, and your network will never see the level of performance that is claimed for the device. There are actually several cases where you will want to know about the maximum performance of an ADC, using the tools I pay the most attention to at the moment as an example. WAN optimization is a good one. 
In WANOpt, the goal is to shrink the amount of data being transferred between two dedicated points to try and maximize the amount of throughput. When “maximize the amount of throughput” is in the description, speeds and feeds matter. WANOpt is a pretty interesting example too, because there’s more than just “how much data did I send over the wire in that fifteen minute window”. It’s more complex than that (isn’t it always?). The best testing I’ve seen for WANOpt starts with “how many bytes were sent by the originating machine”, then measures that the same number of bytes were received by the WANOpt device, then measures how much is going out the Internet port of the WANOpt device – to measure compression levels and bandwidth usage – then measures the number of bytes the receiving machine at the remote location receives to make sure it matches the originating machine. So even though I say “speeds and feeds matter”, there is a caveat. You want to measure latency introduced with compression and dedupe, and possibly with encryption since WANOpt is almost always over the public Internet these days, throughput, and bandwidth usage. All technically “speeds and feeds” numbers, but taken together giving you an overall picture of what good the WANOpt device is doing. There are scenarios where the “good” is astounding. I’ve seen the numbers that range as high as 95x the performance. If you’re sending a ton of data over WANOpt connections, even 4x or 5x is a huge savings in connection upgrades, anything higher than that is astounding. This is an (older) diagram of WAN Optimization I’ve marked up to show where the testing took place, because sometimes a picture is indeed worth a thousand words. And yeah, I used F5 gear for the example image… That really should not surprise you . 
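The measurement points described above reduce to a little arithmetic, sketched here in Python. The function name `wanopt_report`, its parameter names, and the sample byte counts are assumptions made up for illustration; none of them come from a real WANOpt product or test tool.

```python
def wanopt_report(server_sent, wanopt_lan_in, wanopt_wan_out,
                  remote_received, seconds):
    """Summarize one WANOpt test run from four byte counters and a duration.

    server_sent:     bytes the originating machine sent
    wanopt_lan_in:   bytes the local WANOpt device received from it
    wanopt_wan_out:  bytes the WANOpt device put on its Internet port
    remote_received: bytes the receiving machine at the remote site got
    seconds:         wall-clock time for the whole transfer
    """
    if min(server_sent, wanopt_wan_out) <= 0 or seconds <= 0:
        raise ValueError("byte counts and duration must be positive")
    return {
        # Nothing lost between the server and the local WANOpt device.
        "lan_intact": wanopt_lan_in == server_sent,
        # The far end reconstructed exactly what was sent.
        "end_to_end_intact": remote_received == server_sent,
        # How many times smaller the WAN traffic is (4.0 means "4x").
        "reduction_ratio": server_sent / wanopt_wan_out,
        # Share of WAN bandwidth saved by compression and dedupe.
        "wan_savings_pct": 100 * (1 - wanopt_wan_out / server_sent),
        # Application-level throughput as seen by the receiver.
        "effective_bits_per_second": remote_received * 8 / seconds,
    }


if __name__ == "__main__":
    report = wanopt_report(server_sent=1_000_000, wanopt_lan_in=1_000_000,
                           wanopt_wan_out=250_000, remote_received=1_000_000,
                           seconds=10)
    for key, value in report.items():
        print(f"{key}: {value}")
```

Latency added by compression, dedupe, and encryption is not captured here; it would need timestamps at the same measurement points rather than byte counters.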
So basically, you count the bytes the server sends, the bytes the WANOpt device sends (which will be less for 99.99% of loads if compression and de-dupe are used), and the total number of bytes received by the target server. Then you know what percentage improvement you got out of the WANOpt device (by comparing server out bytes to WANOpt out bytes), that the WANOpt devices functioned as expected (server received bytes == server sent bytes), and what the overall throughput improvement was (server received bytes/time to transfer).

There are other scenarios where simple speeds and feeds matter, but fewer of them than there used to be, and the trend is continuing. When a device designed to improve application traffic is introduced, there are certainly few. The ability to handle a gazillion connections per second I’ve mentioned before is a good guardian against DDoS attacks, but what those connections can do is a different question. Lots of devices in many networking market spaces show little or even no latency introduction on their glossy sales hand-outs, but make those devices do the job they’re purchased for and see what the latency numbers look like. It can be ugly, or you could be pleasantly surprised, but you need to know. Because you’re not going to use it in a pristine lab with perfect conditions, you’re going to slap it into a network where all sorts of things are happening and it is expected to carry its load.

So again, I’ll wrap with acknowledgement that you all are smart puppies and know where speeds and feeds matter; make sure you have realistic performance numbers for those cases too.

The Whole Bare Metal Blog series:
Bare Metal Blog: Introduction to FPGAs | F5 DevCentral
Bare Metal Blog: Testing for Numbers or Performance? | F5 ...
Bare Metal Blog: Test for reality. | F5 DevCentral
Bare Metal Blog: FPGAs The Benefits and Risks | F5 DevCentral
Bare Metal Blog: FPGAs: Reaping the Benefits | F5 DevCentral
Bare Metal Blog: Introduction | F5 DevCentral

Mainframes are dead, Right?
Funny thing about the hype cycle in high tech: things rarely turn out the way cheerleaders proclaim they will. Mainframes did not magically disappear in any of the waves that predicted their demise. The reason is simple – there is a lot of code running on mainframes that works, and has worked well for a long time. Rewriting all of that code would be a monumental undertaking that, even today, twenty years after the first predictions of their demise, many organizations – particularly in financials – are not undertaking. Don’t get me wrong: there are a variety of reasons why mainframes in their current incarnation are doomed to a small vertical market at best in the very very long run, but the cost of recreating systems just to remove mainframes is going to continue to hold them in a lot of datacenters in the near future.

But they do need to be able to communicate with newer systems if they’re going to hang around, and the last five years or so have seen a whole lot of projects to make them play more friendly with the distributed datacenter. While many of these projects have come off without a hitch, I saw an interesting case in financial services the other day involving a mainframe and a slow communications channel. Thought it was a slick solution, and thought I’d write it up in case any of you all run into similar problems.

This company had, over time, become very distributed geographically, but still had some systems running on their mainframe back in corporate HQ. The systems needed to communicate back to the mainframe, but some of them were on horrible networks that would sometimes suffer latency and line quality issues, yet the requirement to run apps on the mainframe persisted. The mainframe has limited I/O capabilities without expensive upgrades that the organization would like to avoid, so they are considering alternate solutions to resolve the problem of one branch tying up resources with retransmits and long latency lags while the others back-fill the queue.
It’s more complex than that, but I’m keeping it simple for this blog, in hopes that if it applies to you directly, you’ll be better able to adapt the scenario to your situation. No highly distributed architecture with mainframe interconnects is simple, and they’re rarely exactly the same as another installation that fits the same description, but this will (hopefully) give you ideas. This is the source problem – when site 1 (or site N) had connection problems, it locks up some of the mainframe’s I/O resources, slowing everything down. If multiple sites have problems with communications links, it could slow the entire “network” of sites communicating with the mainframe as things backed up and make even good connections come out slow. In the following slide, of course I’m using F5 as an example – mainly because it is the solution I know to be tested. If you use a different ADC vendor than F5, call your sales or support reps and ask them if you could do this with their product. Of course you won’t get all the excellent other features of the market-leading ADC, but you’ll solve this problem, which is the point of this blog. The financial services organization in question could simply place BIG-IP devices at the connecting points where the systems entered the Internet. This gives the ability to configure the F5 devices to terminate the connections, and buffer responses. The result is that the mainframe only holds open a connection long enough for it to transfer over the LAN, and eliminate the latency and retransmit problems posed by the poor incoming connections. Again, it is never that simple, these are highly complex systems, but it should give you some ideas if you run into similar issues. Here is what the above diagram would look like with the solution pieces in place. 
Note that the F5 boxes in the branches would not be strictly necessary within the confines of the problem statement – you only care about alleviating problems on the mainframe side – but offer the ability to do some bi-directional optimizations that can improve communications between the sites. It also opens the possibility of an encrypted tunnel in the future if needed, which in the case of financial services, is highly attractive.

I thought it was cool that someone was thinking about it as a network problem with mainframe symptoms rather than the other way around, and it is a relatively simple fix to implement. Mainframes aren’t going away, but we’ll see more of this as they’re pushed harder and harder and put behind more and more applications. Inventive solutions to this type of problem will become more and more common. Which is pretty cool.

Related Articles and Blogs:
FNZ Teams with F5 to Deliver the Financial Services Industry's Most ...
FishNet Security Ensures Application Performance and Availability ...
Cloud Changes Cost of Attacks
Network Optimization Won't Fix Application Performance in the Cloud
Forget Performance IN the Cloud, What About Performance TO the ...
Data Center Feng Shui: Architecting for Predictable Performance
F5 Networks Expands Application Ready Network™ to Include ...
F5 Networks Announces Application Ready Network™ for Oracle
New F5 Application Ready Network (ARN) for Oracle Siebel

Performance in the Cloud: Business Jitter is Bad
#fasterapp #ccevent While web applications aren’t sensitive to jitter, business processes are.

One of the benefits of web applications is that they are generally transported via TCP, which is a connection-oriented protocol designed to assure delivery. TCP has a variety of native mechanisms through which delivery issues can be addressed – from window sizes to selective acks to idle time specification to ramp up parameters. All these technical knobs and buttons serve as a way for operators and administrators to tweak the protocol, often at run time, to ensure the exchange of requests and responses upon which web applications rely. This is unlike UDP, which is more of a “fire and forget” protocol in which the sender doesn’t really care if you receive the data or not.

Now, voice and streaming video and audio over the web have typically leveraged UDP and thus have always been highly sensitive to jitter. Jitter is, without getting into layer one (physical) jargon, an undesirable variation in the otherwise consistent delivery of packets. It causes the delay of and sometimes outright loss of packets, which users experience as pauses, skips, or jumps in multi-media content.

While the same root causes of delay – network congestion, routing changes, time out intervals – have an impact on TCP, they generally only delay the communication and, other than an uncomfortable wait for the user, do not negatively impact the content itself. The content is eventually delivered because TCP guarantees delivery; UDP does not.

However, this does not mean that there are no negative impacts (other than trying the patience of users) from the performance issues that may plague web applications, particularly those that are more and more often out there in the nebulous “cloud”. Delays are effectively business jitter and have a real impact on the ability of the business to perform its critical functions – and that includes generating revenue.
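To make the idea of jitter concrete, here is a minimal sketch of one simple way to quantify it: the mean absolute difference between consecutive delay samples. This is an illustration only; the function name and the sample values are hypothetical and do not come from any measurement in this post, and other definitions of jitter (for example, smoothed estimates) are also in common use.

```python
def mean_jitter_ms(delays_ms):
    """Return the mean absolute difference between consecutive delay samples.

    This is one simple jitter measure, assumed here for illustration.
    """
    if len(delays_ms) < 2:
        return 0.0
    diffs = [abs(later - earlier)
             for earlier, later in zip(delays_ms, delays_ms[1:])]
    return sum(diffs) / len(diffs)


# Hypothetical response times (ms) for the same request, sampled five times.
steady = [100, 101, 99, 100, 100]
jittery = [100, 180, 90, 240, 110]

if __name__ == "__main__":
    print("steady jitter (ms): ", mean_jitter_ms(steady))
    print("jittery jitter (ms):", mean_jitter_ms(jittery))
```

Both series describe a service that often answers in about 100 ms, but the second one swings widely from sample to sample. That variation, rather than the average delay alone, is what the term jitter refers to.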
BUSINESS JITTER and the CLOUD

David Linthicum summed up the issue with performance of cloud-based applications well and actually used the terminology “jitter” to describe the unpredictable pattern of delay:

Are cloud services slow? Or fast? Both, it turns out -- and that reality could cause unexpected problems if you rely on public clouds for part of your IT services and infrastructure. When I log performance on cloud-based processes -- some that are I/O intensive, some that are not -- I get results that vary randomly throughout the day. In fact, they appear to have the pattern of a very jittery process. Clearly, the program or system is struggling to obtain virtual resources that, in turn, struggle to obtain physical resources. Also, I suspect this "jitter" is not at all random, but based on the number of other processes or users sharing the same resources at that time.

-- David Linthicum, “Face the facts: Cloud performance isn't always stable”

But what the multitude of articles coming out over the past year or so with respect to performance of cloud services has largely ignored is the very real and often measurable impact on business processes. The jitter that occurs at the protocol and application layers trickles up to become jitter in the business process; a process that may be critical to servicing customers (and thus impacts satisfaction and brand) as well as to the bottom line.

Unhappy customers forced to wait for “slow computers”, as the problem is so often described by the less technically adept customer service representatives employed by many organizations, may take to the social media airwaves to express displeasure, or cancel an order, or simply refuse to do business with the organization in the future based on delays experienced because of unpredictable cloud performance. Business jitter can also manifest as decreased business productivity measures, which it turns out can be measured mathematically if you put your mind to it.
Understanding the variability of cloud performance is important for two reasons:

First, you need to understand the impact on the business and quantify it before embarking on any cloud initiative so it can be factored in to the overall cost-benefit analysis. It may be that the cost savings from public cloud are much greater than the potential loss of revenue and/or productivity, and thus the benefits of a cloud-based solution outweigh the risks.

Second, understanding the variability and where it comes from will help guide you in choosing not only the right provider, but the right solutions that may be able to normalize or mitigate the variability. If the primary source of business jitter is your WAN, for example, then choosing a provider that supports your ability to deploy WAN optimization solutions may be an appropriate strategy. Similarly, if the variability in performance stems from capacity issues, then choosing a provider that allows greater latitude in load balancing algorithms or the deployment of a virtual (soft) ADC would likely be the best strategy.

It seems clear from testing and empirical (as well as anecdotal) evidence that cloud performance is highly variable and, as David puts it, unstable. This should not necessarily be seen as a deterrent to adopting cloud services – unless your business is so highly sensitive to latency that even milliseconds can be financially damaging – but rather it should be a reality that factors into your decision making process with respect to your choice of provider and the architecture of the solution you’ll be deploying (or subscribing to, in the case of SaaS) in the cloud.

Knowing is half the battle to leveraging cloud successfully. The other half is strategy and architecture. I’ll be at CloudConnect 2012 and we’ll discuss the subject of cloud and performance a whole lot more at the show!

Sessions
Is Features vs. Performance the New Cloud Battle Line?
On the performance of clouds
Face the facts: Cloud performance isn't always stable
Data Center Feng Shui: Architecting for Predictable Performance
A Formula for Quantifying Productivity of Web Applications
Enterprise Apps are Not Written for Speed
The Three Axioms of Application Delivery
Virtualization and Cloud Computing: A Technological El Niño

RTT (Round Trip Time): Aka – Why bandwidth doesn’t matter
A great post over on ajaxian got me to thinking today. Why is it whenever you hear people talking about speed on the internet, they use a single metric? Whether they’re discussing the connection in the datacenter, their residential DSL, or the wireless connection via their mobile device, everyone references the bandwidth of their connection when talking about speed. “Oh I just got a 20 Mb/s connection at home, it’s blazing fast!” That’s all well and good, and 20 Mb/s is indeed a lot of throughput for a residential connection. Unfortunately for Joe Average, about 98% of the population wouldn’t know what the heck to DO with 20 Mb/s of download speed, and even worse than that…they would likely see absolutely zero increase in performance while doing the one thing most people use their connection for the most, browsing the web.

No one seems to ever bother mentioning the true culprit for slow (or fast) web browsing performance: latency, measured in RTT (Round Trip Time). This is the measure of time it takes for your system to make a request to the server and receive a response back, i.e., one complete request loop. This is where the battle for speed when browsing the web is won and lost.

A round trip is measured in milliseconds (ms). This represents how much time it will take, regardless of file size (this is important later in the discussion), to make the trip from you to the server and back. This means each additional request you make to the server must take at least this long. You add in the time it takes to download each file after accounting for RTT.

“Impossible!” you say, “Clearly going from a 10 Mb/s connection to the new, fast, fancy (expensive?) 20 Mb/s connection my provider is proud to now be offering will double my speed on the web! 20 is twice as much as 10 you dullard!” you assert? Oh how wrong you are, dear uneducated internet user.
Allow me to illuminate the situation via a brief discussion of what actually occurs when you are browsing the web. We’ll skip some of the fine grained details and all the DNS bits, but here’s the general idea:

Whenever you make a request for a web page on the net, you send out a request to a server. That server, assuming you’re an allowed user, then sends a response. Assuming you don’t get redirected and are actually served a page, the server will send you a generally simple HTML page. This is a single, small file that contains the HTML code that tells your browser how to render the site. Your computer receives the file, and your browser goes to work doing exactly that, rendering the HTML. Up to this point people tend to understand the process, at least in broad strokes. What happens next is what catches people out, I think.

Now that your browser is rendering the HTML, it is not done loading the page or making requests to the server, not by a long shot. You still haven’t downloaded any of the images or scripts. The references to all of that are contained in the HTML. So as your browser renders the HTML for the given site, it will begin sending requests out to the server asking for those bits of content. It makes a new request for each and every image on the page, as well as any other file it needs (script files, CSS files, included HTML files, etc.).

Here are the two main points that need to be understood when discussing Bandwidth vs. RTT in regards to page load times:

1.) The average web page has over 50 objects that will need to be downloaded (reference: http://www.websiteoptimization.com/speed/tweak/average-web-page/) to complete page rendering of a single page.

2.) Browsers cannot (generally speaking) request all 50 objects at once. They will request between 2 and 6 (again, generally speaking) objects at a time, depending on browser configuration.
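As a rough sketch of what those two points imply, the snippet below estimates the time a page load spends purely waiting on round trips. It is a deliberately simplified model, not a browser simulation: it ignores DNS, connection setup, transfer time, and caching. The function name is hypothetical, and the 150 ms and 50 ms RTT values and the concurrency of 2 are assumptions chosen to match the worked figures in the discussion that follows.

```python
import math


def latency_floor_seconds(objects, parallel, rtt_ms):
    """Estimate seconds spent waiting on round trips alone.

    Simplified model (an assumption): objects are fetched in batches of
    `parallel`, and each batch costs exactly one round trip.
    """
    round_trips = math.ceil(objects / parallel)
    return round_trips * rtt_ms / 1000.0


if __name__ == "__main__":
    # 50 objects, 2 at a time, 150 ms RTT
    print(latency_floor_seconds(50, 2, 150))
    # Same page with the RTT cut to 50 ms
    print(latency_floor_seconds(50, 2, 50))
    # Same page, 150 ms RTT, but 6 requests at a time
    print(latency_floor_seconds(50, 6, 150))
```

Note that bandwidth does not appear anywhere in this estimate. In this model, only the round trip time and the number of objects requested at once change the result.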
This means that to receive the objects necessary for an average web page you will have to wait for around 25 Round Trips to occur, maybe even more. Assuming a reasonably low 150ms average RTT, that’s a full 3.75 seconds of page loading time not counting the time to download a single file. That’s just the time it takes for the network communication to happen to and from the server.

Here’s where the bandwidth vs. RTT discussion takes a turn decidedly in the favor of RTT. See, the file size of most of the files necessary when browsing the web is so minute that bandwidth really isn’t an issue. You’re talking about downloading 30-60 tiny files (60k-ish on average). Even on a 2 Mb/s connection, which would be considered extremely slow by today’s standards, each of these files would be downloaded in a fraction of a second. Since you can’t download more than a few at a time, you couldn’t even make use of a full 2 Mb/s connection in most situations.

So how do you expect going from 10 Mb/s to 20 Mb/s to actually increase the speed of browsing the web when you couldn’t even make use of a 2 Mb/s connection? The answer is: You shouldn’t. Sure, if you were downloading huge files then bandwidth would be king, but for many small files in series, it does almost nothing. You still have to make 50 separate requests, a few at a time, and every round trip carries a built-in 150ms of latency that can’t be avoided before the file even begins to download. However, if you could lower your latency, the RTT from you to the server, from 150ms down to 50ms, suddenly you’re shaving a full 2.5 seconds (25 round trips at 100ms saved each) off of the inherent delay you’re dealing with for each page load. Talk about snappier page loads…that’s a huge improvement.

Now of course I realize that there are lots of things in place to make latency and RTT less of an issue. Advanced caching, pre-rendering of content where applicable so browsers don’t have to wait for ALL the content to finish downloading before the page starts rendering, etc.
Those are all great and they help alleviate the pain of higher latency connections, but the reality is that in today’s internet-using world bandwidth is very rarely a concern when simply browsing the web. Adding more bandwidth will not, in almost all cases, increase the speed with which you can load websites.

Bandwidth is king, of course, for multi-tasking on the web. If you’re the type to stream a video while downloading audio while uploading pictures while browsing the web while playing internet based games while running a fully functioning (and legal) torrent server out of your house…well then…you might want to stock up on bandwidth. But don’t let yourself be fooled into thinking that paying for more bandwidth in and of itself will speed up internet browsing in general when only performing that one task.

#Colin

Would you risk $31,000 for milliseconds of application response time?
Keep in mind that the time it takes a human being to blink is an average of 300 – 400 milliseconds.

I just got back from Houston where I helped present on F5’s integration with web application security vendor WhiteHat, a.k.a. virtual patching. As almost always happens whenever anyone mentions the term web application firewall, the question of performance degradation was raised. To be precise: How much will a web application firewall degrade performance? Not will it, but how much will it, degrade performance.

My question back to those of you with the same question is, “How much are you willing to accept to mitigate the risk?” Or perhaps more precisely, how much are your users and customers – and therefore your business – willing to accept to mitigate the risk, because in most cases today that’s really who is the target and thus bearing the risk of today’s web application attacks. As Jeremiah Grossman often points out, mass SQL injection and XSS attacks are not designed to expose your data; they’re designed today to exploit your customers and users by infecting them with malware designed to steal their personal data. So the people who are really bearing the burden of risk when browsing your site are your customers and users. It’s their risk we’re playing with more than our own.

So the question has to be asked with them in mind: how much latency are your users and customers willing to accept in order to mitigate the risk of being infected and the potential for becoming the next statistic in one of the many fraud-oriented organizations tracking identity theft?

SIX OF ONE, HALF-DOZEN OF THE OTHER

No matter where you implement a security strategy that involves the deep inspection of application data you are going to incur latency. If you implement it in code, you’re increasing the amount of time it takes to execute on the server – which increases response time.
If you implement it in a web application firewall, you’re increasing the amount of time it takes to get to the server, which will undoubtedly increase response time. The interesting thing is that this time is generally measured in milliseconds, and is barely noticeable to the user. It literally happens in the blink of an eye and is only obvious to someone tasked with reporting on application performance, who is used to dealing with network response times that are almost always sub-second. The difference between 2ms and 5ms is not noticeable to the human brain. The impact of this level of latency is almost unnoticeable to the end-user and does not radically affect his or her experience one way or another. Even 10ms – or 100 ms – is still sub-second latency and is not noticeable unless it appears on a detailed application performance report.

But let’s say that a web application firewall did increase latency to a noticeable degree. Let’s say it added 2 seconds to the overall response time. Would the user notice? Perhaps. The question then becomes, are they willing to accept that in exchange for better protection against malicious code? Are they willing to accept that in exchange for not becoming the next victim of identity theft due to malicious code that was inserted into your database via an SQL injection attack and delivered to them the next time they visited your site? Are they willing to accept 5 seconds? 10 seconds? Probably not (I wouldn’t either), but what did they say?

If you can’t answer it’s probably because you haven’t asked. That’s okay, because no one has to my knowledge. It’s not a subject we freely discuss with customers because we assume they are, for the most part, ignorant of the very risks associated with just visiting our sites.

THE MYTH OF SUB-SECOND LATENCY

Too, we often forget that sub-second latency does not really matter in a world served up by the Internet.
We’re hard-wired to the application via the LAN and expect it to instantly appear on our screens the moment we try to access it or hit “submit”. We forget that in the variable, crazy world of the Internet the user is often subjected to myriad events of which they are blissfully unaware that affect the performance of our sites and applications. They do not expect sub-second response times because experience tells them it’s going to vary from day to day and hour to hour and many of the reasons for it are out of their – and our – control.

Do we want the absolute best performance for our customers and users? Yes. But not necessarily at the risk of leaving them – and our data – exposed. If we were really worried about performance we’d get rid of all the firewalls and content scanners and A/V gateways and IPS and IDS and just deliver applications raw across the Internet, the way nature intended them to be delivered: naked and bereft of all protection. But they’d be damned fast, wouldn’t they?

We don’t do that, of course, because we aren’t fruitcakes. We’ve weighed the benefits of the protection afforded by such systems against the inherent latency incurred by the solutions and decided that the benefits outweighed the cost. We need to do the same thing with web application firewalls and really any security solution that needs to sit between the application and the user; we need to weigh the risk against the benefit. We first need to really understand what the risks are for us and our customers, and then make a decision how to address that risk – either by ignoring it or mitigating it. But we need to stop fooling ourselves into discarding possible solutions for what is almost always a non-issue.

We may think our customers or users will raise hell if the response time of their favorite site or application increases by 5 or 50 or even 500 ms, but will they really? Will they really even notice it?
And if we asked them, would they accept it in exchange for better protection against identity theft? Against viruses and worms? Against key-loggers and the cost of a trip to the hospital when their mother has a heart-attack because she happened to look over as three hundred porn-filled pop-ups covered their screen after your site infected them with malware?

We need to start considering not only the risk to our own organizations and the customer data we must protect, but to our customers’ and users’ environments, and then evaluate solutions that are going to effectively address that risk in a way that satisfies everyone. To do that, we need to involve the customer and the business more in that decision making process and stop focusing only on the technical aspects of how much latency might be involved or whether we like the technology or not.

Go ahead. Ask your customers and users if they’re willing to risk $31,000 – the estimated cost of identity theft today to an individual – to save 500 milliseconds of response time. And when they ask how long that is, tell them the truth, “about the time it takes to blink an eye”. As potentially one of your customers or visitors, I’ll start out your data set by saying, “No. No, I’m not.”

Dear Slashdot: You get what you pay for
Open Source SSL Accelerator solution not as cost effective or well-performing as you think

o3 Magazine has a write up on building an SSL accelerator out of Open Source components. It’s a compelling piece, to be sure, that was picked up by Slashdot and discussed extensively. If o3 had stuck to its original goal – building an SSL accelerator on the cheap – it might have had better luck making its arguments. But it wanted to compare an Open Source solution to a commercial solution. That makes sense; the author was trying to show value in Open Source and that you don’t need to shell out big bucks to achieve similar functionality.

The problem is that there are very few – if any – commercial SSL accelerators on the market today. SSL acceleration has long been subsumed by load balancers/application delivery controllers and therefore a direct comparison between o3’s Open Source solution and any commercially available solution would have been irrelevant; comparing apples to chicken is a pretty useless thing to do. To the author’s credit, he recognized this and therefore offered a complete Open Source solution that would more fairly be compared to existing commercial load balancers/application delivery controllers; specifically, he chose BIG-IP 6900. The hardware platform was chosen, I assume, based on the SSL TPS rates to ensure a more fair comparison.

Here’s the author’s description of the “full” Open Source solution:

“The Open Source SSL Accelerator requires a dedicated server running Linux. Which Linux distribution does not matter, Ubuntu Server works just as well as CentOS or Fedora Core. A multi-core or multi-processor system is highly recommended, with an emphasis on processing power and to a lesser degree RAM. This would be a good opportunity to leverage new hardware options such as Solid State Drives for added performance. The only software requirement is Nginx (Engine-X) which is an Open Source web server project. Nginx is designed to handle a large number of transactions per second, and has very well designed I/O subsystem code, which is what gives it a serious advantage over other options such as Lighttpd and Apache. The solution can be extended by combining a balancer such as HAproxy and a cache solution such as Varnish. These could be placed on the Accelerator in the path between the Nginx and the back-end web servers.”

o3 specs out this solution as running around $5000, which is less than 10% of the listed cost of a BIG-IP 6900. On the surface, this seems to be quite the deal. Why would you ever purchase a BIG-IP, or any other commercial load balancer/application delivery controller based on the features/price comparison offered? Turns out there are quite a few reasons; reasons that were completely ignored by the author.

CHAINING PROXIES vs INTEGRATED SOLUTIONS

While the moving parts cited by the author (Nginx, Apache, HAproxy, Varnish) are all individually fine solutions, he suggests combining them to assemble a more complete application delivery solution that provides caching, Layer 7 inspection and transformation, and other advanced functionality. Indeed, combining these solutions does provide a deployment that is closer to the features offered by a commercial application delivery controller such as BIG-IP.

Unfortunately, none of these Open Source components are integrated. This necessitates an architecture based on chaining of proxies, regardless of their deployment on the same hardware (as suggested by the author) or on separate devices; in path, of course, but physically separated. Chaining proxies incurs latency at every point in the process.
If you chain proxies, you are going to incur latency and overhead in the form of:

TCP connection setup and teardown processing
Inspection of application data (layer 7 inspection is rarely computationally inexpensive)
Execution of functionality (caching, security, acceleration, etc…)
Transfer of data between proxies (when deployed on the same device this is minimized)
Multiple log files

This network sprawl degrades response time by adding latency at every hop and actually defeats the purposes for which the proxies were deployed. The gains in performance achieved by offloading SSL to Nginx are almost immediately lost when multiple proxies are chained in order to provide the functionality required to match a commercial application delivery controller.

A chained proxy solution adds complexity, obscures visibility (which impacts the ability to troubleshoot) and makes audit paths more difficult to follow. Aggregated logging is never mentioned, but this is a serious consideration, especially where regulatory compliance enters the picture. The issue of multiple log files is one that has long plagued IT departments everywhere, as they often require manual aggregation and correlation – which incurs time and costs. A third party solution is often required to support troubleshooting and transactional monitoring, which incurs additional costs in the form of acquisition, maintenance, and management not considered by the author.

Soft costs, too, are ignored by the author. Configuring the multiple Open Source intermediaries required to match a commercial solution often requires manual editing of configuration files, and each intermediary must be configured individually. Commercial solutions – and specifically BIG-IP – reduce the time and effort required to configure such solutions by offering myriad options for management – standards-based API, scripting, command line, GUI, application templates and wizards, central management system, and integration as part of other standard data center management systems.
COMPRESSION SHOULD NEVER BE A BINARY CONFIGURATION

The author correctly identifies that offloading compression duties from back-end servers to an intermediary can result in improved performance of the application and greater efficiencies of the servers. Nginx supports industry-standard gzip compression. The problem with this – and there is a problem – is that it is not always beneficial to apply compression.

Years of extensive experience and testing show that, in some situations, the use of compression can actually degrade performance. Factors such as size of application payload, type of content, and the speed of the network on which the application data will be transferred should all be considered when making the decision to compress or not compress. This intelligence, this context-awareness, is not offered by this Open Source solution. o3’s solution is on or off, with nothing in between. In situations where images are being delivered over a LAN, for example, this will not provide any significant performance benefit and in fact will likely degrade performance. Certainly Nginx could be configured to ignore images, but this does not solve the problem of the inherent uselessness of trying to compress content traversing a LAN and/or under a specific length.

SECURITY

Another overlooked item is security. Not just application security, but full TCP/IP stack security. The Open Source solution could easily add mod_security to the list to achieve parity with the application security features available in commercial solutions. That does not address the underlying stack security.

The author suggests running on any standard Linux platform. To be sure, anyone building such a solution for deployment in a production environment will harden the base OS; potentially using SELinux to further lock down the system. No need to argue about this; it’s assumed good administrators will harden such solutions.
But what will not be done – and can’t be done – is securing the system against network and application attacks. Simple DoS, ARP poisoning, SYN floods, cookie tampering. The list of potential attacks against a system designed to sit in front of web and application servers is far longer than this, but even these commonly seen attacks will not be addressed by o3’s Open Source solution. By comparison, protection against these types of attacks is part and parcel of BIG-IP; no additional modules or functionality necessary.

Furthermore, the performance numbers provided by o3 for their solution seem to indicate that testing was accomplished using 512-bit key certificates. A single Opteron core can only process around 1500 1024-bit RSA operations per second. This means an 8-core CPU could only perform approximately 12,000 (8 × 1500) 1024-bit RSA ops per second – assuming that’s all it was doing. 512-bit keys run around five times faster than 1024-bit. The author states: “The system had no problems handling over 26,590 TPS”, which is more than double that 12,000 ceiling and so seems to indicate the test was not using the industry standard 1024-bit key.

In fact, 512-bit key certificates are no longer supported by most CAs due to their weak key strength. Needless to say, if the testing used to determine the SSL TPS for BIG-IP were to use 512-bit keys, you’d see a marked increase in the number of SSL TPS in the data sheet.

YOU GET WHAT YOU PAY FOR

Look, o3 has put together a fairly cool and cheap solution that accomplishes many of the same tasks as a commercial application delivery controller. That’s not the point. The point is that trying to compare a robust, integrated application delivery solution with a cobbled together set of components designed to mimic similar functionality is silly. Not only that, the logic that claims it is more cost efficient is flawed. Is the o3 solution cheaper? Sure, as long as we look only at acquisition.
If we look at cost to application performance, to maintain the solution, to troubleshoot, and to manage it then no, no it isn’t. You’re trading in immediate CAPEX cost savings for long-term OPEX cost outlays.

And as is always the case, in every market, you get what you pay for. A $5000 car isn’t going to last as long or perform as well as the $50,000 car, and it isn’t going to come with warranties and support, either. It will do what you want, at least for a while, but you’re on your own when you take the cheap route.

That said, you are welcome to do so. It is your data center, after all. Just be aware of what you’re sacrificing and the potential issues with choosing the road less expensive.

Application Acceleration: To compress or not compress
Open Source SSL Accelerator
IT @ AnandTech: Intel Woodcrest, AMD’s Opteron and Sun’s UltraSparc T1: Server CPU Shoot-out