Decrypting the /var/log/ltm health monitor

Question

When a site goes down, the log shows the pool member that went down, along with error code 01070638. I need to know what each delimited part of the message means, because I get a string of text and I don't know what it's telling me.
This is an HTTP monitor doing a HEAD request to the root (/) and expecting a 200 OK response.
This is how I'm reading it: "No successful responses received before deadline" is self-explanatory.
Response Code: 200 (OK): I assume that's the expected response,
then Response Code: 500,
then 302,
then a 404,
then a 503...
Which is it? I'm confused. There are many of these entries in which I see a 200, then a 500, then a 503, so it isn't always a 302 and a 404 in between as in the example below. Is there documentation that explains what each section of the log below is telling us?
notice mcpd[7586]: 01070638:5: Pool /Common/ member /Common/ monitor status down. [ /Common/: down; last error: /Common/: Tcp read: No route to host; Unable to connect; No successful responses received before deadline.;  Response Code: 200 (OK);  Response Code: 500 (Internal Server Error);  Response Code: 302 (Found);  Response Code: 404 (Not Found);  Response Code: 503 (Service Unavailable) @2016/12/01 04:03:47.  ]  [ was up for 5hrs:49mins:58sec ]
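
For anyone reading along, the delimited parts of a line like this can at least be pulled apart mechanically. What follows is a minimal Python sketch, not anything from F5: the regular expression, the function name parse_monitor_down, and the field names are all assumptions inferred from the sample line above. The digit after the message ID appears to be the syslog severity (5 corresponds to notice), but treat that as an assumption too. The sketch only separates the pieces; it does not say whether the Response Code entries come from one probe or from several.

```python
import re

# Layout inferred from the sample line in this thread, not from F5 documentation.
LINE_RE = re.compile(
    r"(?P<process>\w+)\[(?P<pid>\d+)\]: "
    r"(?P<msg_id>[0-9a-f]{8}):(?P<level>\d): "
    r"Pool (?P<pool>\S*) member (?P<member>\S*) monitor status (?P<status>\w+)\. "
    r"\[ (?P<detail>.*?) \]\s+\[ (?P<duration>.*?) \]"
)
TIMESTAMP_RE = re.compile(r" @(\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})\.$")


def parse_monitor_down(line):
    """Split an mcpd 01070638 line into named parts (hypothetical helper)."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    detail = fields.pop("detail").strip()

    # First half: per-monitor state. Second half: the "last error" text.
    states, _, last_error = detail.partition("; last error: ")
    stamp = TIMESTAMP_RE.search(last_error)
    if stamp:
        fields["timestamp"] = stamp.group(1)
        last_error = last_error[: stamp.start()]

    fields["monitor_states"] = [s.strip() for s in states.split(",")]
    fields["error_parts"] = [p.strip() for p in last_error.split(";") if p.strip()]
    fields["response_codes"] = [
        int(code) for code in re.findall(r"Response Code: (\d{3})", last_error)
    ]
    return fields


# The sample line from the question, unchanged.
SAMPLE = (
    "notice mcpd[7586]: 01070638:5: Pool /Common/ member /Common/ monitor status down. "
    "[ /Common/: down; last error: /Common/: Tcp read: No route to host; Unable to connect; "
    "No successful responses received before deadline.;  Response Code: 200 (OK);  "
    "Response Code: 500 (Internal Server Error);  Response Code: 302 (Found);  "
    "Response Code: 404 (Not Found);  Response Code: 503 (Service Unavailable) "
    "@2016/12/01 04:03:47.  ]  [ was up for 5hrs:49mins:58sec ]"
)

if __name__ == "__main__":
    for key, value in parse_monitor_down(SAMPLE).items():
        print(f"{key}: {value}")
```

Running this over a day of /var/log/ltm entries and comparing the response_codes lists would show whether the order of codes varies between events, which is a starting point for working out what the list represents.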

psilva · Answer

Hi Darren~
I'm not sure about the 200 to 500 to 300 issue, but this article seems to at least explain the error code you are seeing:
SOL10524: Error Message: Per-invocation log rate exceeded - https://support.f5.com/kb/en-us/solutions/public/10000/500/sol10524.html?sr=59308171
and specifically: mcpd[2239]: 01070638:6: Resuming log processing at this invocation; held 22 messages.
Hope that helps.
ps

jrahm · Answer

The error codes themselves are not published, and I'm not sure they'd be terribly helpful to you anyway. A better option is to enable monitor debugging:
tmsh modify sys db bigd.debug value enable
This will give you detailed logs in /var/log/bigdlog, for example:
[0][18226] 2016-12-29 19:42:08.333578: ID 12    :(_do_ping): time to ping, now=[1482980316.191702][2016-12-28 20:58:36], status=DOWN [ addr=::ffff:192.168.103.101:80 mon=/Common/http fd=-1 pend=0 conn=0 up_intvl=5 dn_intvl=5 timeout=16 time_until_up=0 immed=0 next_ping=[1482980316.171756][2016-12-28 20:58:36] last_ping=[1482980311.184895][2016-12-28 20:58:31] deadline=[1482980317.397351][2016-12-28 20:58:37] on_service_list=True snd_cnt=23728 rcv_cnt=0 ]
[0][18226] 2016-12-29 19:42:08.333614: ID 12    :(_send_active_service_ping): pinging [ addr=::ffff:192.168.103.101:80 srcaddr=none ]
[0][18226] 2016-12-29 19:42:08.333622: ID 12    :(_connect_to_service): creating new socket (rd0) [ addr=::ffff:192.168.103.101:80 ]
[0][18226] 2016-12-29 19:42:08.333725: ID 12    :(_connect_to_service): connect: Operation now in progress [ addr=::ffff:192.168.103.101:80 srcaddr=::ffff:192.168.103.5%0:58627 ]
[0][18226] 2016-12-29 19:42:08.333755: ID 12    :(_do_ping): post ping, status=DOWN [ addr=::ffff:192.168.103.101:80 mon=/Common/http fd=10 pend=1 conn=1 up_intvl=5 dn_intvl=5 timeout=16 time_until_up=0 immed=0 next_ping=[1482980321.171756][2016-12-28 20:58:41] last_ping=[1482980316.191702][2016-12-28 20:58:36] deadline=[1482980317.397351][2016-12-28 20:58:37] on_service_list=True snd_cnt=23729 rcv_cnt=0 ]
[0][18226] 2016-12-29 19:42:09.845594: ID 12    :(adjust_deadline): from [1482980317.397351][2016-12-28 20:58:37] to [1482980322.703494][2016-12-28 20:58:42] [ addr=::ffff:192.168.103.101:80 mon=/Common/http fd=10 pend=1 conn=1 up_intvl=5 dn_intvl=5 timeout=16 time_until_up=0 immed=0 next_ping=[1482980321.171756][2016-12-28 20:58:41] last_ping=[1482980316.191702][2016-12-28 20:58:36] deadline=[1482980322.703494][2016-12-28 20:58:42] on_service_list=True snd_cnt=23729 rcv_cnt=0 ]
[0][18226] 2016-12-29 19:42:09.845602: ID 12    :(_analyze_pings): visit DOWN, now=[1482980317.703494][2016-12-28 20:58:37] [ addr=::ffff:192.168.103.101:80 mon=/Common/http fd=10 pend=1 conn=1 up_intvl=5 dn_intvl=5 timeout=16 time_until_up=0 immed=0 next_ping=[1482980321.171756][2016-12-28 20:58:41] last_ping=[1482980316.191702][2016-12-28 20:58:36] deadline=[1482980322.703494][2016-12-28 20:58:42] on_service_list=True snd_cnt=23729 rcv_cnt=0 ]

Just make sure you disable the debug when you are done:
tmsh modify sys db bigd.debug value disable
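
The debug lines above are dense, so it can help to break each one into its key=value pairs before reading them. Here is a rough Python sketch that does that. It is based only on the sample lines in this post: the function name parse_bigd_line, the regular expressions, and the names prefix_a and prefix_b for the two leading bracketed numbers are my own assumptions, since the thread does not say what those numbers mean.

```python
import re

# Header layout inferred from the sample bigdlog lines above (an assumption).
HEADER_RE = re.compile(
    r"^\[(?P<prefix_a>\d+)\]\[(?P<prefix_b>\d+)\] "
    r"(?P<logged_at>\d{4}-\d{2}-\d{2} [\d:.]+): "
    r"ID (?P<id>\d+)\s*:\((?P<func>\w+)\): (?P<rest>.*)$"
)
# A value is either one or more [bracketed] groups or a run of non-space characters.
PAIR_RE = re.compile(r"(\w+)=((?:\[[^\]]*\])+|\S+)")


def parse_bigd_line(line):
    """Split one bigd debug line into header, message, and fields (hypothetical helper)."""
    match = HEADER_RE.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    rest = record.pop("rest")

    # The trailing "[ key=value ... ]" block starts at the last " [ " in the line.
    message, sep, block = rest.rpartition(" [ ")
    if not sep:
        message, block = rest, ""
    record["message"] = message
    record["fields"] = dict(PAIR_RE.findall(block))
    return record


# First sample line from the post above, unchanged.
SAMPLE = (
    "[0][18226] 2016-12-29 19:42:08.333578: ID 12    :(_do_ping): time to ping, "
    "now=[1482980316.191702][2016-12-28 20:58:36], status=DOWN "
    "[ addr=::ffff:192.168.103.101:80 mon=/Common/http fd=-1 pend=0 conn=0 "
    "up_intvl=5 dn_intvl=5 timeout=16 time_until_up=0 immed=0 "
    "next_ping=[1482980316.171756][2016-12-28 20:58:36] "
    "last_ping=[1482980311.184895][2016-12-28 20:58:31] "
    "deadline=[1482980317.397351][2016-12-28 20:58:37] "
    "on_service_list=True snd_cnt=23728 rcv_cnt=0 ]"
)

if __name__ == "__main__":
    parsed = parse_bigd_line(SAMPLE)
    print(parsed["func"], parsed["message"])
    for key, value in parsed["fields"].items():
        print(f"  {key} = {value}")
```

All values are kept as strings so nothing is lost or misread; convert fields such as timeout or snd_cnt to numbers only where you need to compare them.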

fishballball · Answer

I also have this problem. What is the meaning of 'Response Code: 404 (Not Found);  Response Code: 500 (Internal Server Error);  Response Code: 200 (OK);' in the alert below? Were three responses received within the same monitoring interval, or is this a record of the last three monitoring intervals before the member was marked down? Thanks.
Apr 10 03:13:24 LB1 notice mcpd[4800]: 01070638:5: Pool /Common/pool_34541 member /Common/172.19.155.11:34541 monitor status down. [ /Common/mon_http1: down, /Common/mon_http2: up; last error: /Common/mon_http1: Tcp read: Connection refused; Unable to connect;  Response Code: 404 (Not Found);  Response Code: 500 (Internal Server Error);  Response Code: 200 (OK); No successful responses received before deadline. @2018/04/10 03:13:24.  ]  [ was up for 78hrs:30mins:32sec ]

boneyard · Answer

As suggested by Jason Rahm above, enable debugging and see what turns up. Also, please post your TMOS version.
