
Ray_330743
Dec 04, 2017
Solved

parsing health monitor events

I've set up a system that ships member up/down events to Logstash, which then writes them to InfluxDB, giving us visualization and alerting (specifically for excessive flapping in pools). We do periodic application resets and expect each member of a given pool to "flap" (become unavailable and then return to available) only once per reset. Sometimes members flap more than once, and we'd like to know why.
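For readers who want to see the "more than one flap" check spelled out, here is a minimal sketch in Python. It is not the Logstash/InfluxDB setup described above. It assumes events have already been reduced to hypothetical `(timestamp, member, state)` tuples, and the function names and the `expected` threshold are illustrative only.

```python
from collections import Counter
from datetime import datetime, timedelta


def count_flaps(events, window_start, window):
    """Count "down" events per member inside [window_start, window_start + window).

    `events` is an iterable of (timestamp, member, state) tuples; this shape
    is an assumption made for the sketch, not the format BIG-IP emits.
    """
    window_end = window_start + window
    flaps = Counter()
    for ts, member, state in events:
        if state == "down" and window_start <= ts < window_end:
            flaps[member] += 1
    return flaps


def excessive_flappers(events, window_start, window, expected=1):
    """Return the members that went down more often than `expected`."""
    counts = count_flaps(events, window_start, window)
    return sorted(member for member, n in counts.items() if n > expected)


if __name__ == "__main__":
    start = datetime(2017, 11, 15, 2, 0, 0)
    sample = [
        (datetime(2017, 11, 15, 2, 23, 40), "10.0.0.1:80", "down"),
        (datetime(2017, 11, 15, 2, 24, 10), "10.0.0.1:80", "up"),
        (datetime(2017, 11, 15, 2, 25, 5), "10.0.0.1:80", "down"),
        (datetime(2017, 11, 15, 2, 23, 50), "10.0.0.2:80", "down"),
    ]
    print(excessive_flappers(sample, start, timedelta(hours=1)))
```

The member addresses in the sample are made up. In practice the window would be aligned with each application reset.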

 

I figure I can check the log line that reports a member going down to see what the health monitor observed during its check, or perhaps during its last three checks. Here's an example line:

 

/Common/pool_foo: down; last error: /Common/pool_foo: Tcp read: Connection refused; Unable to connect;  Response Code: 200 (OK);  Response Code: 404 (Not Found); No successful responses received before deadline.;  Response Code: 503 (Service Unavailable);  Response Code: 302 (Found);  Response Code: 403 (Forbidden) @2017/11/15 02:23:40.

This is a log line for a single down event. I know you're not going to get a 404, a 302, and a 503 all together in a single HTTP request, so I'm thinking the monitor is sharing the last n checks. But to make sense of what's happening and at which point, I need to be able to pull this line apart and understand which messages associate with which health check attempt and in what order.
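Pulling the line apart mechanically is the easy part. Below is a rough Python sketch that splits the example above into its name, state, error fragments, and timestamp. The layout it relies on (fragments separated by `;`, a trailing `@timestamp.`) is inferred from this one sample line rather than from any documented format, and the function and field names are made up for illustration. It does not tell you which fragment belongs to which check attempt.

```python
import re
from datetime import datetime

# The example line from this post, reproduced verbatim.
SAMPLE = (
    "/Common/pool_foo: down; last error: /Common/pool_foo: Tcp read: "
    "Connection refused; Unable to connect;  Response Code: 200 (OK);  "
    "Response Code: 404 (Not Found); No successful responses received "
    "before deadline.;  Response Code: 503 (Service Unavailable);  "
    "Response Code: 302 (Found);  Response Code: 403 (Forbidden) "
    "@2017/11/15 02:23:40."
)

# Assumed layout: "<name>: <state>; last error: <fragments> @<timestamp>."
LINE_RE = re.compile(
    r"^(?P<name>\S+): (?P<state>\w+); last error: (?P<errors>.*) "
    r"@(?P<ts>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})\.?\s*$"
)
CODE_RE = re.compile(r"Response Code: (\d{3})")


def parse_down_event(line):
    """Split one monitor status line into fields; return None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    errors = m.group("errors")
    # The error text repeats the object name as a prefix; drop it if present.
    prefix = m.group("name") + ": "
    if errors.startswith(prefix):
        errors = errors[len(prefix):]
    # Fragments keep the order in which they appear in the line.
    fragments = [f.strip() for f in errors.split(";") if f.strip()]
    codes = [int(c) for f in fragments for c in CODE_RE.findall(f)]
    return {
        "name": m.group("name"),
        "state": m.group("state"),
        "fragments": fragments,
        "response_codes": codes,
        "timestamp": datetime.strptime(m.group("ts"), "%Y/%m/%d %H:%M:%S"),
    }


if __name__ == "__main__":
    event = parse_down_event(SAMPLE)
    for fragment in event["fragments"]:
        print(fragment)
```

Keeping the fragments as an ordered list preserves whatever order the log line used, but the sample alone gives no way to confirm that this order reflects the order of the checks.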

 

Anyone know?

 

  • The answer is you don't.

     

    You're not supposed to get multiple results like that.

     

    F5 support let me know that you're only supposed to get the last check's result and that the presence of other check results is due to a bug. See https://support.f5.com/kb/en-us/products/big-ip_ltm/releasenotes/related/relnote-supplement-hotfix-bigip-12-1-2.html: "636149-3 : Multiple monitor response codes to single monitor probe failure"

     

    This bug goes away in 12.1.2HF2.

     

    The details in the release notes drop some hints that might help you parse these log lines if you're really interested and haven't yet upgraded.

     

    The support technician let me know that I could enable "Monitor Logging" for particular members, and that might be a good way to get more detail on what the monitor sees. (Thanks, Chad!)

     

3 Replies

  • If anyone was interested in this topic, I've filed a support case. I'll keep you updated.

     
