Forum Discussion

Logan_Ramirez_5
Nimbostratus
Apr 01, 2010

iControl web app to generate data for Cacti graphing

Ok, so I found this link the other day talking about how to use 'wget' instead of SNMP for Cacti graphing.

 

 

http://penguinman-techtalk.blogspot.com/2009/03/cacti-graphing-remote-service-without.html

 

 

and I just happened to be working on a couple of other apps with iControl...anyway, I ended up writing an app that takes in variables through Request.QueryString in the URL and it posts the meaningful Cacti data - in this case, 'current connections.'

 

 

In other words, I send in this:

 

http://server/myIControlApp.aspx?virtualdata_sa1=virtual_server_name

 

 

and I post back:

 

virtual_server_address:current connections

 

 

or, as an example:

 

web_https:44

 

 

which is the current conns on that virtual server.

 

 

OR, the specific reason I wrote it, I pass in a poolname and get the active connections on each member:

 

 

http://certificate/ltmquery/default.aspx?pooldata_sa1=pool_name

 

 

and get back stuff like

 

member1:12 member2:25 member3:12

 

 

then, anyone who understands Cacti a little and reads the article above can graph it.
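As a rough illustration of the request/response shape described above, here is a minimal Python sketch (the original app is VB, so this is not the author's code). It parses the two query parameters shown in the example URLs and returns space-separated name:value pairs on one line, which is the form a Cacti script data input method reads for multiple output fields. The `fetch_stats` function is a hypothetical stand-in that returns canned numbers; a real version would call iControl at that point.

```python
from urllib.parse import urlsplit, parse_qs

# Query parameter names as they appear in the example URLs above.
# Mapping them to a "kind" is an assumption made for this sketch.
KNOWN_PARAMS = {"virtualdata_sa1": "virtual", "pooldata_sa1": "pool"}


def parse_request(url):
    """Return (kind, object_name) for the first recognised query parameter."""
    query = parse_qs(urlsplit(url).query)
    for param, kind in KNOWN_PARAMS.items():
        values = query.get(param)
        if values:
            return kind, values[0]
    raise ValueError("no recognised parameter in query string")


def fetch_stats(kind, name):
    """Hypothetical stand-in for the iControl lookup; returns canned values."""
    if kind == "virtual":
        return {name: 44}
    return {"member1": 12, "member2": 25, "member3": 12}


def format_for_cacti(stats):
    """Render a dict as space-separated name:value pairs on one line."""
    return " ".join(f"{field}:{value}" for field, value in stats.items())


def handle(url):
    """Parse the request URL, look up the stats, and format the reply."""
    kind, name = parse_request(url)
    return format_for_cacti(fetch_stats(kind, name))


if __name__ == "__main__":
    print(handle("http://server/myIControlApp.aspx?virtualdata_sa1=web_https"))
    print(handle("http://certificate/ltmquery/default.aspx?pooldata_sa1=pool_name"))
```

With the canned values, the two calls print `web_https:44` and `member1:12 member2:25 member3:12`, matching the examples in the post.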

 

 

it's written in VB but I imagine some folks out there could convert it to other languages...

 

 

so if you're interested, I suppose let me know.

 

or, moderators, maybe you can guide me to what I should do next - code share I suppose!

 

 

anyway, it's great, though, because now I can graph detail that I couldn't get with SNMP and cross-reference graphing points (such as the active connections for all pool members in virtual servers in both of our data centers - a more 'complete' picture of our enterprise web use).

10 Replies

  • Hamish
    Cirrocumulus
    Hi.

     

     

    I don't understand why you say you can now graph data you couldn't get via SNMP... All that data is available via SNMP, and should be much lighter to gather... (Although it's a matter of taste whether iControl or SNMP is 'easier' to use... I like both, and they both have advantages & disadvantages).

     

     

    Current connections per VS is in the cacti template that Aaron started too... What we're missing really is poolmember stats like you've found, but that's mainly (At the moment) because cacti isn't really designed for complex/arbitrary structures.

     

     

    H
  • My SNMP point wasn't about these specific examples (like virtual servers and pools) but about 'the almost endless possibilities of using iControl and this output method.' We use several of the SNMP templates in existence for virtual servers, total connections, etc, as well, but this method can output anything obtainable from iControl.

     

     

    For clarity, are you saying that SNMP can get just as much data as iControl can? If so, I was unaware of that and my statement is clearly wrong (ignorant, doh!), and I would re-word it as 'this method gives you access to start graphing statistics, such as the connections to each pool member, in Cacti - since Cacti isn't designed for complex/arbitrary structures.' And thank you for the clarification.

     

     

    But is that true - that SNMP can get as much data as iControl?

     

     

    The other benefit I like, of this method (that I probably just do not know how to accomplish with SNMP!) is one data input method to gather various pieces of data from multiple sites. For example, all pool stats from X virtual servers in data center A, all pool stats from virtual server Y in data center B and all virtual stats from data center C - which is meaningful for us since those particular applications might all be related, giving us one snapshot of all the data.

     

     

    Thoughts?

     

  • Hamish
    Cirrocumulus
    Hi.

     

     

    SNMP used to get some data that iControl couldn't get. Some of the multi-CPU stuff, for example, you certainly couldn't get in the past. That's changed (but you may need the v10 iControl docs, not the v9 ones).

     

     

    (I'm talking about statistics here BTW, not necessarily things like iRules. I've never tried to get iRules via SNMP, but from a quick look at the MIB there's nothing in the LOCAL MIB for them. Sorry, thought I'd better qualify that now.)

     

     

    I'm not sure that I've explained myself very well about the complex data structures in Cacti. It will get data structures no problem (In SNMP they're really just tables). It's just not automatic. But then neither is iControl. The difference is that you're writing the software for iControl so you have complete control, whereas cacti is a framework that will pull stats for you. Of course you could write an app to pull the stats via SNMP instead of iControl too (And in theory it SHOULD be a lot more resource friendly. iControl isn't exactly light weight).

     

     

    If you want to get poolmembership, etc... That is in SNMP too... But it's found on a table basis, rather than query and get a response. For example, to get poolmembers for a pool you simply query a table... The OID is built from poolName, addrType, poolMemberAddr and poolMemberPort (It's using the 4 entries as an index in the table).
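To make the "OID is built from the 4 index entries" point concrete, here is a small Python sketch of how such an instance OID could be assembled. It assumes the standard SMIv2 index encoding (variable-length strings and addresses are length-prefixed, integers appear as a single sub-identifier), an IPv4 member address, and no IMPLIED index; whether the F5 MIB follows exactly this layout is worth confirming against a real snmpwalk. The column OID in the example is made up and is not a real F5 OID.

```python
import ipaddress

INET_ADDRESS_TYPE_IPV4 = 1  # InetAddressType value for ipv4 (INET-ADDRESS-MIB)


def encode_string_index(text):
    """Length-prefixed encoding of a variable-length string index."""
    data = text.encode("ascii")
    return [len(data)] + list(data)


def pool_member_index(pool_name, member_addr, member_port):
    """Build the index sub-identifiers: poolName, addrType, addr, port."""
    addr_octets = list(ipaddress.IPv4Address(member_addr).packed)
    return (
        encode_string_index(pool_name)
        + [INET_ADDRESS_TYPE_IPV4]
        + [len(addr_octets)] + addr_octets
        + [member_port]
    )


def instance_oid(column_oid, pool_name, member_addr, member_port):
    """Append the encoded index to a column OID given as a dotted string."""
    index = pool_member_index(pool_name, member_addr, member_port)
    return column_oid.rstrip(".") + "." + ".".join(str(n) for n in index)


if __name__ == "__main__":
    # "1.3.6.1.4.1.99999.1" is a placeholder column OID for illustration only.
    print(instance_oid("1.3.6.1.4.1.99999.1", "web", "10.0.0.1", 80))
    # prints 1.3.6.1.4.1.99999.1.3.119.101.98.1.4.10.0.0.1.80
```

The long numeric tail is why SNMP tables look complex at first: the pool name itself is spelled out in ASCII codes inside the OID.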

     

     

    That's the price of using SNMP of course... The S stands for Simple... And it is, but looks complex because you have to change the way you think to use it and understand it.

     

     

    The data available for poolmembers, for example, is:

     

     

        
        LtmPoolMemberEntry ::=     
        SEQUENCE {    
                ltmPoolMemberPoolName                                  LongDisplayString,    
                ltmPoolMemberAddrType                                  InetAddressType,    
                ltmPoolMemberAddr                                      InetAddress,    
                ltmPoolMemberPort                                      INTEGER,    
                ltmPoolMemberConnLimit                                 Integer32,    
                ltmPoolMemberRatio                                     Integer32,    
                ltmPoolMemberWeight                                    INTEGER,    
                ltmPoolMemberPriority                                  INTEGER,    
                ltmPoolMemberDynamicRatio                              INTEGER,    
                ltmPoolMemberMonitorState                              INTEGER,    
                ltmPoolMemberMonitorStatus                             INTEGER,    
                ltmPoolMemberNewSessionEnable                          INTEGER,    
                ltmPoolMemberSessionStatus                             INTEGER,    
                ltmPoolMemberMonitorRule                               LongDisplayString,    
                ltmPoolMemberAvailabilityState                         INTEGER,    
                ltmPoolMemberEnabledState                              INTEGER,    
                ltmPoolMemberDisabledParentType                        Integer32,    
                ltmPoolMemberStatusReason                              LongDisplayString    
        }    
        

     

     

    Plus the stats in a different table...

     

     

    Having said all that... when I built my stats system I used mainly iControl, because it 'feels' better for getting the configs. However, some stats (e.g. multi-CPU) weren't available in iControl, so they were gathered via SNMP... Maybe I should brush some of that code off and publish it... Now I have vLTM that might be a bit easier.

     

     

    It should also be possible to write an app that pulls information from an F5 and populates the cacti DB for gathering stats automatically too... It's something I've thought about in the past but never had the time

     

     

    Also, if we're considering scalability, SNMP would be the best method of gathering data in Cacti... Being simple, it's also lightweight. Which is an important consideration when you're thinking about thousands of statistics. Not so much when you've only got 1 F5 and 2 VS's... But IMO important nevertheless. When things grow it can mean the difference between working or not, especially since cacti degrades rather badly when the load gets too high.

     

     

    H

     

     

     

  • This is a good discussion. Very generally speaking, iControl isn't an ideal tool for heavy monitoring tasks and SNMP should win the day. However, there are always exceptions and each environment is different so your approach may be just the ticket and may not introduce any problems at all. Also, I've seen poorly configured SNMP setups kill systems as well, so care must be taken no matter what.

     

     

    Fwiw, here are a couple of pros/cons that stand out for each. I'm sure there's more...

     

     

    SNMP:

     

    -- Pro: Optimized monitoring subsystem. Be sure and enumerate OIDs in order if possible, as we've optimized the SNMP system for this type of access.

     

    -- Pro: Good for lots of data, 'pre canned'. Just point cacti or whatever at it and off you go.

     

    -- Con: can be a bear for complex tasks or ad-hoc reports.

     

    -- Con: it can be expensive (like, very very expensive) if you do your SNMP polling incorrectly or too aggressively.

     

     

    iControl:

     

    -- Pro: great for custom reporting/monitoring that may go beyond what's possible with SNMP.

     

    -- Pro: it can be 'light enough' for some monitoring tasks.

     

    -- Con: it can be expensive for certain monitoring tasks.

     

     

    So for me, the mantra is: design your specific solution to your specific needs. If your needs change, you may need to re-evaluate your solution.

     

     

    -Matt
  • Interesting discussion. So Matt, just how much "heavier" is iControl, and in which examples? If you use get_all methods instead of trolling through one by one, it seems pretty lightweight to me. I've been collecting about 10,000 data points every 5 minutes using iControl and I haven't run into anything that suggests either the LTM cpus are being adversely impacted or that it's adversely impacting the collection system resources. In fact, given this massive amount of data I'm collecting, the collection run typically completes in less than 3 seconds, which actually is pretty amazing.

     

     

    I'm very curious in what cases iControl is not to be used for monitoring, so that I am aware of such limitations and can implement other methods (SNMP) where appropriate.

     

     

    Thanks,

     

    -Derek
  • indeed! has someone 'measured' this? that is, in fact, my feeling on this, generally speaking - that SNMP is 'the most light weight' solution available, but iControl (if written well) can still be light weight.

     

     

    quick example of poor iControl design - in my first app I was calling get_all_statistics() and then running through the data to find the 2 or 3 values I was after, but then I refactored it to run 2 or 3 separate queries using the get_statistics(virtual_name) method, and that was faster (less overhead).

     

     

    regardless, I had the same observation here: even running get_all_stats (which was poor design for my app) didn't cause a change in the CPU (which we graph with SNMP).
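The trade-off between the two call styles can be sketched without a BIG-IP at hand. The class below is a hypothetical stand-in, not pycontrol or any F5 SDK; only the two method names echo the ones mentioned in this thread, and their signatures here are assumptions. It counts how many entries each approach pulls back, as a crude proxy for response size.

```python
class FakeStatsClient:
    """Hypothetical stand-in for an iControl client; not a real library class."""

    def __init__(self, table):
        self._table = table
        self.entries_returned = 0  # crude proxy for how much data came back

    def get_all_statistics(self):
        """Return every entry, as a 'get everything' call would."""
        self.entries_returned += len(self._table)
        return dict(self._table)

    def get_statistics(self, names):
        """Return only the named entries."""
        result = {name: self._table[name] for name in names}
        self.entries_returned += len(result)
        return result


def fetch_then_filter(client, wanted):
    """Pull everything, then keep the few entries of interest."""
    everything = client.get_all_statistics()
    return {name: everything[name] for name in wanted}


def fetch_targeted(client, wanted):
    """Ask only for the entries of interest."""
    return client.get_statistics(wanted)


if __name__ == "__main__":
    table = {f"vs_{i}": i for i in range(500)}  # made-up virtual server stats
    wanted = ["vs_3", "vs_42"]
    a, b = FakeStatsClient(table), FakeStatsClient(table)
    assert fetch_then_filter(a, wanted) == fetch_targeted(b, wanted)
    print(a.entries_returned, b.entries_returned)  # prints: 500 2
```

Both approaches give the same answer; the difference is how much data is serialized and parsed to get there. When most of the entries are actually needed, a single get-all call is likely the better fit.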

     

     

    i would love to hear other examples and yours, with 10,000 data points, is very encouraging!

     

  • Hamish
    Cirrocumulus
    Hmm... I don't have numbers, but let's compare the two protocols... On the one hand we have SNMP... The query is a simple OID 'string' of numbers... e.g. .1.3.6.... And is a few bytes long... The query will be carried in a single UDP packet... The response is going to be in 1 or more UDP packets 'streamed' from agent (F5) to manager... And is of the form OID & data... There's no wasted data or packets. At the agent it should be as simple as grabbing data from a table, and sending the OID and the data... Quick & easy. You can implement this in a few kB or less (Yes you can make it more complex, but you don't need to). Tables (In v2+) are pulled with one query (v1 requires query/resp/query/resp).

     

     

    But with iControl... It's a whole different story... I'm not sure who came up with the original idea for XML, but they should be... reeducated shall we say... Lightweight it isn't... Readable, well that's a matter of opinion... I'm sure there are some people that enjoy reading XML, but I'm not one of them. Let's see... First you open a TCP connection... Just the handshake will consume as much bandwidth and processing as a typical snmp_get/response. Then you have the SSL negotiation... Lots more processing... Then an HTTP POST. And a whole lot of XML encoded data... That you just spent precious cycles encoding... Then the server has to pass that to some process to decode the XML (Ohh boy... Lots more cycles), THEN and only then can the agent grab the data you're looking for. But it's not finished yet... A few million more cycles to encode that data into XML (And why doesn't XML have a 64-bit ordinal type anyway?) and stream all that data back to the manager...

     

     

    Hmm... It would be nice to stick a profiler on the code and see what sort of difference there is between the same queries for stats in iControl and SNMP...

     

     

    The saving grace for iControl is that in older versions it looked like the SNMP agent was having to talk to something else to pull the data. And that looked pretty heavy... But I know for sure that trolling the config every 5 minutes and pulling stats on a few dozen VS's, Pools and a few hundred poolmembers and nodes (Once a minute) used to consume up to 25-30% of a 6400 when doing it via iControl... (Oh, and blow out the host memory too. It's got better over the last few years, but like all Java programs, it's not exactly light weight...)

     

     

    H
  • yea, this is good, enjoyable, low-level stuff.

     

     

    the reason I started this conversation was because that 'use wget to populate cacti' article hit right at home with how I (anyone!) could use iControl to generate meaningful outputs that could be graphed in Cacti - outputs that I could not get with SNMP (either because I didn't know how or because of Cacti limitations).

     

     

    That is, using iControl in this manner (I think) is unique and maybe even 'the only way available' right now to generate this kind of data for graphing 'what is going on' inside your network. In other words, while SNMP may be leaner, it's impossible to use (at least with Cacti?) to generate the types of graphs I am generating now.

     

     

    There were two graphs I was specifically interested in:

     

    1. graph usage of pool members in a virtual server.

     

    2. graph total connections to our website vs. total database connections from the web server to handle those website connections.

     

     

    Regarding 1, the business question for us is how often is our primary web server up? How many connections does it handle? When are we failing over to the backup server? How many times does that happen? Why does this happen? Who takes it down? etc, etc...we all know how the questions cascade once they start going...

     

     

    Well, we could answer that lots of ways, but nothing beats a historical graph showing the data, showing the peaks and valleys of the pool members' usage. I couldn't find how to graph all of the pool member usage via SNMP or (as Hamish pointed out) overcome the Cacti limitations to create that graph.

     

     

    2, though, is the real gem. Like others, we're focused on application tuning, lean development and we recognize that we have a lot of 'clean up' to do in our code throughout the company (along with lots of legacy code to update!) and we ran into an issue several months ago where our web apps were not closing data base connections when they were done (sadly, we discovered this the hard way).

     

     

    Well, since I know what virtual servers each connection comes into (1, the external conns coming to our websites and 2, the internal conns from the servers to our database), I built a graph that area/stacks all of the web connections (representing the total number of simultaneous external connections) and is overlaid with a line of the database connections (representing the total number of database connections).

     

     

    So in one snapshot we can see how many database connections (X) are being used to handle the external web connections (Y), and therefore we can see the trending. In fact, this morning I noticed that at one point the total number of database connections (X) was HIGHER than the total number of external web connections (Y). This was not the norm, so I zoomed in and sent the screenshot to the web developer, showing the exact time this happened (on the graph), so now they can go back and pull the web logs for that time period, see what apps were being accessed, and determine whether that was a result of poor code or normal behavior.

    Boom, right to the heart of a problem. "Sr Developer, yesterday morning, between 11:15 and 11:30 am, we had 110 server database connections open while servicing 96 external web connections. This behavior is abnormal to our normal operations, can you see why?" is way more effective than "Sr Developer, we think we had more connections to the backend database than the front end server sometime yesterday morning, can you see why?" or worse, no email at all because we have no idea!!!
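The check done by eye on the graph (database connections exceeding web connections) is easy to express in code once both series are collected together. This is only a sketch: the sample list is illustrative, with the 96 and 110 figures borrowed from the example email above and the other rows invented.

```python
def find_anomalies(samples):
    """Return the samples where database connections exceed web connections.

    samples: iterable of (time_label, web_connections, db_connections).
    """
    return [(ts, web, db) for ts, web, db in samples if db > web]


if __name__ == "__main__":
    # Illustrative data only; not real measurements.
    samples = [
        ("11:10", 120, 80),
        ("11:15", 96, 110),
        ("11:20", 101, 90),
    ]
    for ts, web, db in find_anomalies(samples):
        print(f"{ts} web={web} db={db}")  # prints: 11:15 web=96 db=110
```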

     

     

    While there are probably several tools you could 'run' to assess and evaluate this, these are already in our toolbag (and free) and very effective.

     

     

    Again, maybe there is some way to make this second graph work with SNMP, but consider the next iteration I'll be working on:

     

    3. Adding another database connection line from our second data center from an entirely different LTM cluster.

     

    4. Adding another stack for external web connections from our second data center from an entirely different LTM cluster.

     

     

    That is, my current graph shows the total external web requests and database connections in one data center, but we have another data center that serves the same web apps, with database connections coming back to DC 1 on a different virtual. So 'the REAL' company picture of how many external web apps (Y) are running with how many database connections (X) is the sum of data running in two data centers (therefore on two different LTMs)...but with this app, it doesn't matter. I can get all of the virtual server and pool member active connections from any active LTM anywhere in 1 query to my iControl app, which can then be graphed in 1 data input method.

     

     

    And I can repeat this over and over using the name of the virtual or the pool (as opposed to researching the MIBs for each particular virtual and/or pool member, and as opposed to having to create multiple data input methods to go to multiple devices, then merge the graphs somehow).

     

     

    One query, one data input method, one graph - almost unlimited combination of virtual and pool members from any LTM cluster.
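The "one query, one data input method" idea amounts to merging per-cluster results into a single output line. A minimal Python sketch follows; the `dc1`/`dc2` labels, the field names, and all numbers except the 96/110 pair quoted earlier are invented for illustration, and the per-cluster dicts stand in for whatever iControl calls the real app makes.

```python
def merge_sources(sources):
    """Flatten {label: {field: value}} into one dict with label-prefixed fields."""
    merged = {}
    for label, stats in sources.items():
        for field, value in stats.items():
            merged[f"{label}_{field}"] = value
    return merged


def total(merged, suffix):
    """Sum every merged field whose name ends with the given suffix."""
    return sum(value for field, value in merged.items() if field.endswith(suffix))


def format_for_cacti(stats):
    """Render a dict as space-separated name:value pairs on one line."""
    return " ".join(f"{field}:{value}" for field, value in stats.items())


if __name__ == "__main__":
    # Each inner dict would come from a different LTM cluster in practice.
    sources = {
        "dc1": {"web": 96, "db": 110},
        "dc2": {"web": 40, "db": 12},
    }
    merged = merge_sources(sources)
    print(format_for_cacti(merged))
    print("total web:", total(merged, "_web"), "total db:", total(merged, "_db"))
```

Prefixing each field with its source keeps the series separate for stacking, while the totals give the company-wide picture in the same pass.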

     

     

    When I 'got that' I just had to share it, but maybe I'm alone on the excitement of it! haha!!
  • Ok, this is very NON scientific, so chalk it up in the category of passing interest to this discussion.

     

     

    Run from localhost, pulling client SSL native stat information:

     

    [root@bigip1:Active:avc(0)] tmp time snmpget -Ont localhost -c public .1.3.6.1.4.1.3375.2.1.1.2.9.6.0
    .1.3.6.1.4.1.3375.2.1.1.2.9.6.0 = Counter64: 0

    real 0m0.167s
    user 0m0.049s
    sys 0m0.109s

     

     

    From my remote system (windows XP running pycontrol)

     

    In [22]: timeit s.get_client_ssl_statistics()
    1 loops, best of 3: 197 ms per loop

     

     

    The 197ms includes serialization of the XML request, tcp-ip stand up, SSL negotiation, the SOAP request/response, and de-serialization of the XML into a python object. 167 vs. 197 isn't too shabby. Also the iControl call pulls much, much more data.

     

     

    System wise, if I do something like this:

     

    import time

    while True:
        pass  # hammer SNMP or iControl here
        time.sleep(1)

     

     

    I don't see a major difference between the two. I'm not claiming this is definitive, but it's certainly of interest...
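For anyone wanting to repeat this kind of rough comparison, a small harness along these lines might do. It is a sketch only: `fake_poll` is a placeholder workload, not a real SNMP or iControl call, and it measures client-side wall-clock time only, so it says nothing about load on the BIG-IP itself.

```python
import time


def best_of(func, repeat=3):
    """Return the fastest wall-clock time, in seconds, over `repeat` calls."""
    timings = []
    for _ in range(repeat):
        start = time.perf_counter()
        func()
        timings.append(time.perf_counter() - start)
    return min(timings)


def poll_loop(func, iterations, interval=1.0, sleep=time.sleep):
    """Call func once per interval and return the duration of each call."""
    durations = []
    for _ in range(iterations):
        start = time.perf_counter()
        func()
        durations.append(time.perf_counter() - start)
        sleep(interval)
    return durations


if __name__ == "__main__":
    def fake_poll():
        # Placeholder workload standing in for one SNMP get or iControl call.
        sum(range(10_000))

    print(f"best of 3: {best_of(fake_poll) * 1000:.3f} ms")
    print([f"{d * 1000:.3f} ms" for d in poll_loop(fake_poll, 3, interval=0.1)])
```

Swapping `fake_poll` for a function that performs the actual query would give numbers comparable to the two timings quoted above, with the same caveat that one was run locally and the other remotely.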

     

     

    -Matt
  • Hamish
    Cirrocumulus
    I'd like to see Agent/Server/BigIP side resource usage for equivalent tasks... I'm not certain it would be very easy though...

     

     

    And Derek, yes cycles DO matter. Just ask your data centre manager whether he cares if you can run your monitoring app on a 16-way i7 with 64GB of memory or a 2 way 1U box with 4GB... if he's anything like ANY of the DC managers I've ever worked with, he'll care a LOT. Because it costs him an absolute truckload of money to provision rackspace and power to these puppies. Now you may have a DC next to your own private nuclear power station, but when they run out of power in the DC and you have to re-negotiate with the owners for more current, you'll find he's pretty receptive to ANYTHING that will lighten the load... (Been there, done that).

     

     

    Cycles on both the F5 and the manager matter a lot... Just in different ways... if you consume 20% of your F5 just to monitor it, then that's 20% of the cash you spent to buy it that isn't being used to service customers.

     

     

    Oh. Just in case you think I'm anti iControl and pro SNMP... I'm not... There's a place for both... Management of the F5... That's iControl. Polling for stats, that's SNMP (I've used iControl for this, it is a lot heavier but I'd still like to see a proper set of testing done. I may have to think for a bit how we could profile both the iControl server and the SNMP daemon on an F5 to discover how hard each has to work to gather say full stats on 100 VS's, 200 pools and associated profiles).

    H