Forum Discussion

Darren_Walker_2
Dec 01, 2016

Decrypting the /var/log/ltm health monitor

When a site goes down, the log shows the pool member that went down with error code 01070638. I need to know what each delimited part of the message means, because I get a string of text and I don't know what it's telling me.

 

This is an HTTP monitor doing a HEAD request to the root (/) and expecting a 200 OK response. This is how I'm reading it: "No successful responses received before deadline" is self-explanatory.

Response Code: 200 (OK): I assume that's the expected response.

Then comes Response Code: 500, then 302, then 404, then 503...

Which is it? I'm confused. In many of these entries I see a 200, then a 500, then a 503, so it isn't always a 302 and a 404 in between as in the example below. Is there documentation that explains what each section of the log entry below is telling us?

notice mcpd[7586]: 01070638:5: Pool /Common/ member /Common/ monitor status down. [ /Common/: down; last error: /Common/: Tcp read: No route to host; Unable to connect; No successful responses received before deadline.; Response Code: 200 (OK); Response Code: 500 (Internal Server Error); Response Code: 302 (Found); Response Code: 404 (Not Found); Response Code: 503 (Service Unavailable) @2016/12/01 04:03:47. ] [ was up for 5hrs:49mins:58sec ]
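
To make the delimited parts easier to see, here is a minimal Python sketch that splits an entry like the one above into named pieces. The regular expression and the field names (`parse_monitor_down`, `error_parts`, and so on) are assumptions inferred only from the sample lines in this thread, not from any published F5 format, and the sketch does not say what order or time window the fragments after "last error:" represent.

```python
import re

# Pattern inferred from the sample lines in this thread, not from F5 documentation.
LINE_RE = re.compile(
    r"01070638:\d+: Pool (?P<pool>\S+) member (?P<member>\S+) "
    r"monitor status (?P<status>\w+)\. "
    r"\[ (?P<monitors>.*?); last error: (?P<last_error>.*?) "
    r"@(?P<logged_at>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2})\. \] "
    r"\[ was up for (?P<uptime>[^\]]+?) \]"
)


def parse_monitor_down(line):
    """Split one 01070638 entry into named parts; return None if it does not match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    # "last error" begins with the monitor name, followed by ";"-separated fragments.
    monitor, _, detail = m["last_error"].partition(": ")
    # The first bracket lists every monitor on the member with its state.
    monitors = {}
    for item in m["monitors"].split(", "):
        name, _, state = item.rpartition(": ")
        monitors[name] = state
    return {
        "pool": m["pool"],
        "member": m["member"],
        "status": m["status"],
        "monitors": monitors,
        "failing_monitor": monitor,
        "error_parts": [p.strip() for p in detail.split(";") if p.strip()],
        "response_codes": [int(c) for c in re.findall(r"Response Code: (\d{3})", detail)],
        "logged_at": m["logged_at"],
        "uptime": m["uptime"],
    }


# The entry from the post above, with pool and member names blank as posted.
SAMPLE = (
    "notice mcpd[7586]: 01070638:5: Pool /Common/ member /Common/ monitor status down. "
    "[ /Common/: down; last error: /Common/: Tcp read: No route to host; Unable to connect; "
    "No successful responses received before deadline.; Response Code: 200 (OK); "
    "Response Code: 500 (Internal Server Error); Response Code: 302 (Found); "
    "Response Code: 404 (Not Found); Response Code: 503 (Service Unavailable) "
    "@2016/12/01 04:03:47. ] [ was up for 5hrs:49mins:58sec ]"
)

parsed = parse_monitor_down(SAMPLE)
print(parsed["response_codes"])  # [200, 500, 302, 404, 503]
print(parsed["uptime"])          # 5hrs:49mins:58sec
```

Read this way, the entry has three bracketed or delimited regions: the per-monitor state list, the "last error" fragments for the failing monitor, and the time the member had been up before this change.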

 

8 Replies

  • The error codes themselves are not published, and I'm not sure they'd be terribly helpful to you anyway. A better option is to enable monitor debugging:

    tmsh modify sys db bigd.debug value enable

    This will give you detailed logs in /var/log/bigdlog

    [0][18226] 2016-12-29 19:42:08.333578: ID 12    :(_do_ping): time to ping, now=[1482980316.191702][2016-12-28 20:58:36], status=DOWN [ addr=::ffff:192.168.103.101:80 mon=/Common/http fd=-1 pend=0 conn=0 up_intvl=5 dn_intvl=5 timeout=16 time_until_up=0 immed=0 next_ping=[1482980316.171756][2016-12-28 20:58:36] last_ping=[1482980311.184895][2016-12-28 20:58:31] deadline=[1482980317.397351][2016-12-28 20:58:37] on_service_list=True snd_cnt=23728 rcv_cnt=0 ]
    [0][18226] 2016-12-29 19:42:08.333614: ID 12    :(_send_active_service_ping): pinging [ addr=::ffff:192.168.103.101:80 srcaddr=none ]
    [0][18226] 2016-12-29 19:42:08.333622: ID 12    :(_connect_to_service): creating new socket (rd0) [ addr=::ffff:192.168.103.101:80 ]
    [0][18226] 2016-12-29 19:42:08.333725: ID 12    :(_connect_to_service): connect: Operation now in progress [ addr=::ffff:192.168.103.101:80 srcaddr=::ffff:192.168.103.5%0:58627 ]
    [0][18226] 2016-12-29 19:42:08.333755: ID 12    :(_do_ping): post ping, status=DOWN [ addr=::ffff:192.168.103.101:80 mon=/Common/http fd=10 pend=1 conn=1 up_intvl=5 dn_intvl=5 timeout=16 time_until_up=0 immed=0 next_ping=[1482980321.171756][2016-12-28 20:58:41] last_ping=[1482980316.191702][2016-12-28 20:58:36] deadline=[1482980317.397351][2016-12-28 20:58:37] on_service_list=True snd_cnt=23729 rcv_cnt=0 ]
    [0][18226] 2016-12-29 19:42:09.845594: ID 12    :(adjust_deadline): from [1482980317.397351][2016-12-28 20:58:37] to [1482980322.703494][2016-12-28 20:58:42] [ addr=::ffff:192.168.103.101:80 mon=/Common/http fd=10 pend=1 conn=1 up_intvl=5 dn_intvl=5 timeout=16 time_until_up=0 immed=0 next_ping=[1482980321.171756][2016-12-28 20:58:41] last_ping=[1482980316.191702][2016-12-28 20:58:36] deadline=[1482980322.703494][2016-12-28 20:58:42] on_service_list=True snd_cnt=23729 rcv_cnt=0 ]
    [0][18226] 2016-12-29 19:42:09.845602: ID 12    :(_analyze_pings): visit DOWN, now=[1482980317.703494][2016-12-28 20:58:37] [ addr=::ffff:192.168.103.101:80 mon=/Common/http fd=10 pend=1 conn=1 up_intvl=5 dn_intvl=5 timeout=16 time_until_up=0 immed=0 next_ping=[1482980321.171756][2016-12-28 20:58:41] last_ping=[1482980316.191702][2016-12-28 20:58:36] deadline=[1482980322.703494][2016-12-28 20:58:42] on_service_list=True snd_cnt=23729 rcv_cnt=0 ]
    

    Just make sure you disable the debug when you are done:

    tmsh modify sys db bigd.debug value disable
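
The debug lines above are dense, so a small Python sketch for pulling the `key=value` pairs out of one line may help when reading them. The helper name `parse_bigd_line` and the regular expressions are hypothetical and are based only on the sample lines in this reply. Reading `snd_cnt` and `rcv_cnt` as send and receive counters is an assumption from the field names, not something the thread confirms.

```python
import re

# Header layout inferred from the sample bigdlog lines in this thread.
HEADER_RE = re.compile(
    r"^\s*\[\d+\]\[\d+\] (?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+): "
    r"ID \d+\s*:\((?P<func>\w+)\): (?P<rest>.*)$"
)
# A value is either one or two bracketed groups (the timestamps) or a run of non-spaces.
PAIR_RE = re.compile(r"(\w+)=(\[[^\]]*\](?:\[[^\]]*\])?|\S+)")


def _maybe_int(text):
    """Return an int when the text is a plain integer, else the original string."""
    try:
        return int(text)
    except ValueError:
        return text


def parse_bigd_line(line):
    """Return the logging function, timestamp, and key=value fields of one debug line."""
    m = HEADER_RE.match(line)
    if m is None:
        return None
    fields = {key: _maybe_int(value) for key, value in PAIR_RE.findall(m["rest"])}
    return {"timestamp": m["timestamp"], "func": m["func"], "fields": fields}


# One of the "post ping" lines from the reply above.
SAMPLE = (
    "[0][18226] 2016-12-29 19:42:08.333755: ID 12    :(_do_ping): post ping, status=DOWN "
    "[ addr=::ffff:192.168.103.101:80 mon=/Common/http fd=10 pend=1 conn=1 up_intvl=5 "
    "dn_intvl=5 timeout=16 time_until_up=0 immed=0 "
    "next_ping=[1482980321.171756][2016-12-28 20:58:41] "
    "last_ping=[1482980316.191702][2016-12-28 20:58:36] "
    "deadline=[1482980317.397351][2016-12-28 20:58:37] "
    "on_service_list=True snd_cnt=23729 rcv_cnt=0 ]"
)

entry = parse_bigd_line(SAMPLE)
f = entry["fields"]
print(entry["func"], f["status"], f["mon"])  # _do_ping DOWN /Common/http
print(f["timeout"], f["snd_cnt"], f["rcv_cnt"])  # 16 23729 0
```

With the fields in a dictionary it becomes easier to compare values such as `deadline`, `timeout`, and the two counters across consecutive lines for the same `addr`.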

  • I also have this problem. Please tell me what 'Response Code: 404 (Not Found); Response Code: 500 (Internal Server Error); Response Code: 200 (OK);' means in the alert below. Did the monitor receive three responses within the same monitoring interval, or is this a record of the last three monitoring intervals before the member went DOWN? Thanks.

     

    Apr 10 03:13:24 LB1 notice mcpd[4800]: 01070638:5: Pool /Common/pool_34541 member /Common/172.19.155.11:34541 monitor status down. [ /Common/mon_http1: down, /Common/mon_http2: up; last error: /Common/mon_http1: Tcp read: Connection refused; Unable to connect; Response Code: 404 (Not Found); Response Code: 500 (Internal Server Error); Response Code: 200 (OK); No successful responses received before deadline. @2018/04/10 03:13:24. ] [ was up for 78hrs:30mins:32sec ]

     

    • boneyard (MVP)

      As suggested by Jason Rahm above, enable the debugging and see what turns up. Also, please post your TMOS version.

       

  • Hi, is there any more information about this yet? We currently have the same issue. We have not yet been able to capture a tcpdump or debug logs because of change windows.

     

    Oct 18 14:22:14 slot1/F5 notice mcpd[6750]: 01070638:5: Pool /Common/pool member /Common/pool-member:80 monitor status down. [ /Common/monitor_http: down; last error: /Common/monitor_http: Response Code: 200 (OK); Unable to connect; No successful responses received before deadline.; Response Code: 503 (Service Unavailable); Response Code: 500 (Internal Server Error) @2018/10/18 14:22:14. ] [ was up for 17hrs:53mins:4sec ]

     

    Br, Ben

     

    • boneyard (MVP)

      What info do you expect? In general, this just points to a health monitor getting an unexpected HTTP response code. The fact that one person might have some weird bug going on isn't something to focus on.
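
For anyone who cannot enable debugging yet, one way to check whether the server really returns mixed codes is to repeat the monitor's request from another host and tally the results. The Python sketch below is only an approximation under stated assumptions: `probe_head` is a hypothetical helper, it sends a plain `HEAD /` like the monitor described in the original post, and it does not reproduce how bigd schedules probes or applies its deadline.

```python
import http.client
import time
from collections import Counter


def probe_head(host, port=80, attempts=10, interval=5.0, timeout=5.0):
    """Send HEAD / repeatedly and count the status codes (or connection errors) seen."""
    seen = Counter()
    for attempt in range(attempts):
        conn = http.client.HTTPConnection(host, port, timeout=timeout)
        try:
            conn.request("HEAD", "/")
            seen[conn.getresponse().status] += 1
        except (OSError, http.client.HTTPException):
            # Refused, unreachable, timed out, or a malformed reply.
            seen["connect_error"] += 1
        finally:
            conn.close()
        if attempt < attempts - 1:
            time.sleep(interval)
    return seen
```

Calling something like `probe_head("192.0.2.10", 80)` (the address is a placeholder) returns a counter of status codes. A mix of 200, 500, and 503 in that counter would suggest the application itself is answering inconsistently, which is worth confirming with a tcpdump when a change window allows.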