Stan_Ward
Jun 20, 2018

RESOLV::lookup failure handling

We have an iRule that dynamically selects a node via a DNS lookup. The host name and DNS server are hard-coded in the iRule event, yet we occasionally see errors like this in the LTM log:

Jun 20 04:16:38 bigip1a err tmm1[20105]: 01220001:3: TCL error: /Common/iRule_Logon_Page  - bad IP address format (line 1)TCL error (line 1) (line 1)     invoked from within "node [RESOLV::lookup @$DNS "server1.company.com"]"

$DNS is set statically at the beginning of the HTTP_REQUEST event, and the host name that gets looked up depends solely on whether the request involves our production, QA, or dev logon server. The purpose of the RESOLV call is purely to avoid hard-coding IPs in case a server ever moves. Since both the DNS server IP and the host names are static, and each host is known to exist with a single IP, this is clearly not an ordinary DNS lookup failure such as a nonexistent host. It happens perhaps 1-5 times a day, out of thousands of daily requests.

Could this be due to a timeout waiting for a DNS response during times of heavy load on the network or the DNS server? If so, how can it be detected? There is no viable fallback for a failed lookup, since a genuine failure would represent a major infrastructure outage (it's our employee logon server). So my guess is that every once in a while an employee gets some type of connection error while attempting to log on, and probably just retries and gets in.

Reading the RESOLV description in the iRules Wiki, it's not clear what a failure should look like. If there is no response, is there an error code to catch, or does the command just return a zero-length list, the same as for a host that is not found?
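
For concreteness, this is the kind of guard I have in mind. It is only a minimal sketch, and it assumes that a timed-out or failed query makes RESOLV::lookup return an empty result rather than raise an error, which is exactly the part I have not been able to confirm. The resolver address, the 503 response, and the log messages are placeholders, not what our production iRule does today.

```tcl
when HTTP_REQUEST {
    # Placeholder resolver address (assumption); the real iRule sets $DNS statically here.
    set DNS "192.0.2.53"

    # Capture the lookup result instead of passing it straight to node.
    set result [RESOLV::lookup @$DNS "server1.company.com"]
    set ip [lindex $result 0]

    if { $ip eq "" } {
        # Assumption: an unanswered or failed query yields an empty result.
        log local0.warning "RESOLV::lookup via $DNS returned no address for server1.company.com"
        HTTP::respond 503 content "Logon service temporarily unavailable" "Retry-After" "5"
        return
    }

    # catch traps any remaining error from node, such as a malformed address.
    if { [catch { node $ip } err] } {
        log local0.err "node rejected '$ip': $err"
        HTTP::respond 503 content "Logon service temporarily unavailable" "Retry-After" "5"
    }
}
```

If the empty-result assumption holds, the warning line would show up in the LTM log at roughly the same rate as the current TCL errors, which would at least confirm that the lookup itself is coming back empty.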