The Open Performance Testing Initiative

Performance. Everybody wants to know how things perform, whether it be cars, laptops, XML gateways, or application delivery controllers. It's one of the few quantifiable metrics used to compare products when IT goes shopping, especially in the world of networking.

Back at Network Computing I did a lot of testing, and a large portion of that testing revolved around performance. How fast, how much data, how many transactions, how many users. Whenever possible, I tested the performance of any product that came through our lab in Green Bay. You haven't had fun until you've actually melted an SSL accelerator, and you haven't known true pain until you've tried to load test Enterprise Service Buses (ESBs). In the six years I spent at Network Computing I actually ran three separate Spirent Communications load testing products into the ground. That's how hard we tested, and how seriously we took this task.

One of our core beliefs was that every product should be compared equally: using the same methodology, in the same environment, under the same conditions, and using the same definitions. If an industry standard performance test existed, we tried to use it. You've probably heard of SPEC, which provides industry standard benchmarks for a number of products such as mail servers, CPUs, web servers, and even SIP, and perhaps, if you're in the software world, you know about the TPC-C benchmark, which measures the performance of databases.

But have you heard of the industry standard performance test for application delivery controllers? No?

That's because there isn't one.

Knowing how products perform is important, especially for those IT folks tasked with maintaining the environments in which products are deployed. Every organization has specific performance requirements that range from end-user response times to total capacity to latency introduced by specific feature sets.

Yet every vendor and organization defines performance metrics and test methodology just a bit differently, making cross-vendor and cross-organization comparisons nearly impossible without designing and running tests to determine what the truth might be. Even though two vendors both say "Transactions Per Second," they don't necessarily mean the same thing, and it's likely that the underlying test configuration and data used to determine that metric were completely different, making a comparison of the values meaningless.
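To make that concrete, here's a small, purely hypothetical sketch in Python (the numbers and definitions are invented for illustration, not drawn from any vendor's actual results) showing how the very same load-test run can be reported as wildly different "Transactions Per Second" figures depending on what counts as a transaction and over what window it's averaged:

    # Hypothetical numbers from a single imaginary load-test run.
    http_requests_completed = 1_200_000   # individual HTTP request/response pairs
    tcp_connections_completed = 150_000   # TCP connections, each carrying several requests
    test_duration_seconds = 300           # full run, including ramp-up and ramp-down
    steady_state_seconds = 240            # portion of the run at sustained peak load

    # Definition A: every HTTP request is a "transaction",
    # averaged over the entire test run.
    tps_a = http_requests_completed / test_duration_seconds

    # Definition B: only completed TCP connections are "transactions",
    # measured during the steady-state window alone.
    tps_b = tcp_connections_completed / steady_state_seconds

    print(f"Definition A: {tps_a:,.0f} TPS")   # 4,000 TPS
    print(f"Definition B: {tps_b:,.0f} TPS")   # 625 TPS

Both vendors could legitimately print "TPS" on the datasheet, yet the numbers differ by more than a factor of six, and neither tells you anything useful about the other's test.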

F5 wants to change that. Whether you're interested in uncovering the subtle differences in terminology that can skew performance results, developing your own killer performance methodologies, understanding how our latest performance test results were obtained, or downloading all the configuration files from the test, the new performance testing section of DevCentral is the place for you.

Got questions? Suggestions? Comments? We've also set up a forum where you can discuss the results or the wider topic of performance testing with our very own performance guru Mike Lowell. And of course I'll be out there too, so don't be shy.

This is more than just publishing the results of our latest performance tests; it's about being open and transparent in our testing so you know exactly what we tested, how, and what the results really mean. It's about giving you tools to help you build your own tests, or recreate ours. It's about leveling the playing field and being honest about how we measure the performance of BIG-IP.

Imbibing: Coffee

Published Sep 10, 2007
Version 1.0

1 Comment

  • Hi Alan,

    Thanks for the feedback, and yes, there is always room for improvement. The first step is to put something out that can be used as a foundation and to encourage the idea that there should be a formalized, industry standard benchmark.

    Independent testing is, of course, the goal: testing that is open, fair, and that provides relevant results based on common criteria and terminology, such that customers can make an apples-to-apples comparison.

    Mike Lowell has addressed the price-per-transaction data in the forums (http://devcentral.f5.com/Default.aspx?tabid=53&forumid=40&tpage=1&view=topic&postid=1678617088). Essentially, that data was outside the scope of this report.

    As is often the case with applications, testing with a full suite would introduce a number of additional variables that are difficult to address. We do a lot of internal testing with products like SAP, Oracle, and Microsoft for just the reasons you state, but in a pure performance test of BIG-IP we were looking to establish the high end of its processing capabilities.

    While we were of course pleased with the results of our own testing, part of the goal is to give customers and prospective customers the ability to definitively nail down specific terminology and test parameters so that they can more accurately define their own testing and be better informed when interpreting any performance results they are handed. Even if we did extensive testing using applications and full configurations, it would likely not be applicable for a given customer.

    Each environment is unique in terms of network capabilities, the hardware platforms upon which applications are deployed, and the specific configuration and needs for which an application delivery controller is being evaluated. We're hoping to enable those customers to better evaluate products in their own environment while offering up the "best case" performance of our products discovered through our own testing.

    Unfortunately, it's up to the industry to generate a vendor-neutral testing group to pick up and run with such tests. Obviously if we, or any other vendor, sponsored or assisted, it would immediately be seen as biasing such an organization, whether the claim of bias was warranted or not.

    Regards,

    Lori