jburns
Nov 30, 2021

What is the 'correct' way to retry a request on each pool member after a 404?

I am attempting to modify this F5 example so that requests to a virtual server that result in a 404 response are retried on every member node.

My setup:

A virtual server with the supplied iRule applied to it. The pool associated with the virtual server has 2 member nodes and a load balancing method of 'Round Robin'. One member node is 'good' (it responds with a 200 to the test request) and the other member node is 'bad' (it responds with a 404).

Here are the relevant logs from the iRule when I issued a single request from my computer to the virtual server.

696168:Nov  4 12:15:25 f5_address info tmm2[13000]: Rule /Common/retry_next_on_404 <HTTP_REQUEST>: Saving HTTP request headers: GET /images/cool-image.svg HTTP/1.1  Host: client_ip  User-Agent: curl/7.47.0  Accept: */*  X-Forwarded-For: 10.34.160.46
696169:Nov  4 12:15:25 f5_address info tmm2[13000]: Rule /Common/retry_next_on_404 <LB_SELECTED>: server choosen ip_bad_server on retry 0
696170:Nov  4 12:15:25 f5_address info tmm2[13000]: Rule /Common/retry_next_on_404 <HTTP_RESPONSE>: server ip_bad_server returned 404. Retry 0 out of 2
696171:Nov  4 12:15:25 f5_address info tmm2[13000]: Rule /Common/retry_next_on_404 <LB_SELECTED>: server choosen ip_good_server on retry 1
696172:Nov  4 12:15:25 f5_address info tmm2[13000]: Rule /Common/retry_next_on_404 <LB_SELECTED>: server choosen  after reselect on retry 1
696173:Nov  4 12:15:25 f5_address info tmm2[13000]: Rule /Common/retry_next_on_404 <LB_SELECTED>: server choosen ip_bad_server on retry 1
696174:Nov  4 12:15:25 f5_address info tmm2[13000]: Rule /Common/retry_next_on_404 <LB_SELECTED>: server choosen  after reselect on retry 1
696176:Nov  4 12:15:25 f5_address info tmm2[13000]: Rule /Common/retry_next_on_404 <HTTP_RESPONSE>: server  returned 404. Retry 1 out of 2

Here is the iRule that generated these logs:


# Retry requests to the virtual server's default pool if the server responds with a 404

when CLIENT_ACCEPTED {
  # On each new TCP connection track that we have not retried a request yet
  set retries 0
  # Save the name of the virtual server default pool
  set default_pool [LB::server pool]
}

when HTTP_REQUEST {
  # We only want to retry GET requests to avoid having to collect POST payloads
  # Only save the request headers if this is not a retried request
  if { [HTTP::method] eq "GET" && $retries == 0 }{
     set request_headers [HTTP::request]
     log local0. "Saving HTTP request headers: $request_headers"
  }
}

when LB_SELECTED {
  # Select a new pool member from the VS default pool if we are retrying this request
  log local0. "server chosen [LB::server addr] on retry $retries"
  if { $retries > 0 } {
     LB::reselect pool $default_pool
     log local0. "server chosen [LB::server addr] after reselect on retry $retries"
  }
}

when HTTP_RESPONSE {
  # Check for server errors
  log local0. "server [LB::server addr] returned [HTTP::status]. Retry $retries out of [active_members $default_pool]"
  if { [HTTP::status] == "404" } {

     # Server error, retry the request if we have not already retried more times than there are pool members
     incr retries

     if { $retries < [active_members $default_pool] } {
        # Retry this request
        HTTP::retry $request_headers
        # Exit this event from this iRule so we do not reset retries to 0
        return
     }
  }
  # If we are still in the rule we are not retrying this request
  set retries 0
}

The order of events appears to be:

1) We get a request and save the headers

2) A server is chosen based on LB algorithm (in this case the bad server is chosen)

3) We detect a 404 in HTTP_RESPONSE event and invoke HTTP::retry

4) We enter LB_SELECTED again because HTTP::retry triggers HTTP_REQUEST and all subsequent events. The log shows that the good server has been selected.

5) Because the variable retries is greater than 0, we invoke LB::reselect. Notice that after LB::reselect the log does not show the selected server's address ([LB::server addr] is empty). I am guessing this is because LB::reselect does not take effect immediately and likely clears properties in the LB object.

6) LB_SELECTED is entered again, and this time the log shows we have the bad server again.

7) The retries variable is still greater than 0, so we again go into the conditional block that invokes LB::reselect. The server address is again missing from the log line that follows, suggesting that the command was run.

8) We get another 404, increment the retries variable, and make no more retries (the conditional no longer passes).

So in my test setup every request returns a 404, because we keep ending up on the bad server. This seems to be because HTTP::retry causes a new node to be selected by the LB algorithm, and then the iRule explicitly tells F5 to reselect again using the same algorithm. If that is the case, then with Round Robin being the default LB algorithm the example seems misleading, as I expected this iRule to "try all members in a pool". In testing I can achieve this by just removing the LB::reselect command. But then I am confused as to why that command is even in the example iRule for HTTP::retry. Also, relying on the LB algorithm leads to an edge case: an unrelated request could 'increment' the LB algorithm between my attempts, and then I could 'miss' the 'good' server by just blindly retrying.
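To make that hypothesis concrete, here is a toy model in Python. It is only a sketch of my guess, not documented F5 behavior. It assumes that HTTP::retry causes one fresh round-robin pick and that LB::reselect inside LB_SELECTED causes one more pick. The names RoundRobin, simulate, reselect_on_retry and other_picks_between are made up for this illustration.

```python
class RoundRobin:
    """Toy round-robin picker: each pick() returns the next member in order."""

    def __init__(self, members):
        self.members = list(members)
        self.index = 0

    def pick(self):
        member = self.members[self.index % len(self.members)]
        self.index += 1
        return member


def simulate(members, bad, reselect_on_retry, other_picks_between=0):
    """Return the member that served each attempt of one client request.

    Assumptions (a guess, not documented F5 behavior):
      - every attempt, including one caused by HTTP::retry, makes one pick;
      - LB::reselect on a retried attempt makes one additional pick;
      - other_picks_between models unrelated requests that advance the
        round-robin pointer between two attempts.
    """
    lb = RoundRobin(members)
    served = []
    retries = 0
    while True:
        server = lb.pick()              # pick made for this attempt
        if retries > 0 and reselect_on_retry:
            server = lb.pick()          # extra pick modeling LB::reselect
        served.append(server)
        if server not in bad:
            break                       # non-404 response, stop retrying
        retries += 1
        if retries >= len(members):
            break                       # same retry limit as the iRule
        for _ in range(other_picks_between):
            lb.pick()                   # unrelated traffic moves the pointer
    return served


members = ["bad", "good"]
print(simulate(members, {"bad"}, reselect_on_retry=True))
print(simulate(members, {"bad"}, reselect_on_retry=False))
print(simulate(members, {"bad"}, reselect_on_retry=False, other_picks_between=1))
```

Under these assumptions the first call lands on the bad member twice, which matches my logs. The second call reaches the good member on the retry, which matches what I see after removing LB::reselect. The third call shows the edge case: a single unrelated request between attempts puts the retry back on the bad member.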

Also strange is the loss of the server address, and the fact that LB::reselect triggers the LB_SELECTED event, which in turn seemingly triggers LB::reselect again, yet this does not result in an infinite loop.

I am looking for some validation of my assumptions and my understanding of the order of events, and for suggestions on the correct way to guarantee that all member nodes are retried. Right now the best solution I have is to blindly issue retries and set the number of retries high enough that I will likely find the right server.
