Forum Discussion

kridsana
Jul 19, 2018

F5 gratuitous ARP issue on failover

Hi,

Last night we upgraded our F5 from v11.5.4 to v12.1.3.

When we failed over from the old unit (v11.5.4) to the newly upgraded unit (v12.1.3), some IPs had more request timeouts than the rest. (We pinged the IP of each virtual server, about 20 IPs, during the failover.)

From my understanding, the F5 sends gratuitous ARPs (G-ARPs) to the neighbouring devices when it becomes active.

Is it possible that some of the G-ARPs were dropped, so those IPs had a longer downtime because the neighbours were still using the old ARP entries?

Or is it because the neighbouring device did not act on some of the G-ARPs from the F5?

Or is there anything else that could keep the neighbouring device from learning the new ARP entries as expected?

Thank you

 

5 Replies

  • Surgeon
    Ret. Employee

    I am not sure what you mean by neighbour units, but take the following into account. When the BIG-IP becomes active, it sends 5 GARPs per IP. Your router or switch may block the GARPs if it treats them as a possible broadcast storm.

    K91071055: The BIG-IP system now sends gratuitous ARPs as quickly as possible

    You can change this behaviour based on your business needs:

    K11985: Overview of the arp.gratuitousrate and arp.gratuitousburst database variables

    Traffic flow also depends on the switch's MAC table, not only on the ARP table. If the switch's MAC table is not updated, you can expect delays.

    If any of your VIPs does not belong to the subnet of one of the self IPs, the BIG-IP will not send GARPs for those IPs, since there is no outgoing interface.

    K15858: BIG-IP systems do not send gratuitous ARP requests under certain conditions

    If you are using dynamic routing, it may take some time for the route table to update before the service is restored.
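
The K15858 condition mentioned above (a VIP outside every self-IP subnet gets no GARP) can be checked offline before a failover. Here is a minimal Python sketch, assuming you have copied the virtual addresses and self IPs out of the configuration by hand. The function name `vips_without_garp` and all addresses are made up for illustration and do not come from any F5 tool.

```python
import ipaddress


def vips_without_garp(vips, self_ips):
    """Return the VIPs that fall outside every self-IP subnet.

    vips:     iterable of address strings, e.g. "10.1.10.50"
    self_ips: iterable of "address/prefix" strings, e.g. "10.1.10.2/24"
    """
    # strict=False accepts a host address together with a prefix length
    networks = [ipaddress.ip_network(s, strict=False) for s in self_ips]
    outside = []
    for vip in vips:
        addr = ipaddress.ip_address(vip)
        if not any(addr in net for net in networks):
            outside.append(vip)
    return outside


if __name__ == "__main__":
    # Hypothetical example data, not taken from the thread
    self_ips = ["10.1.10.2/24", "10.1.20.2/24"]
    vips = ["10.1.10.50", "10.1.20.80", "192.0.2.10"]
    print(vips_without_garp(vips, self_ips))
```

Any address this returns is a candidate for the "no GARP" case; reachability for it would depend on routing via a floating self IP rather than on ARP for the VIP itself.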

     

  • IMHO the F5 is sending G-ARPs normally, because only some IPs had ping request timeouts during the failover.

    So the most likely possibility is that some G-ARPs were dropped (maybe due to congestion, or by the neighbouring device itself). (BTW, this is just a blind guess.)

    Are there any other reasons why the neighbouring device might not learn the new ARP entries as expected?

     

  • Hi, there is a known issue: "597978-2 : GARPs may be transmitted by active going offline". Please check the current release notes. We experienced this issue in our environment as well when applying "forced offline" to the active unit. Cheers, Stephan

     

  • For troubleshooting, I would recommend running a simultaneous tcpdump on interface "0.0" with a filter on "arp" on both machines in your device service cluster, and then triggering a failover. (Please use the "-s 0" option for full packet size and "-w /var/tmp/.pcap" to dump to a file.)

    For example:
    tcpdump -vi 0.0 -c 10000 -s 0 -w /var/tmp/trace.$HOSTNAME.pcap arp

    In Wireshark you can apply the display filter "arp.isgratuitous" to show G-ARPs only. As Surgeon wrote, some G-ARPs might get lost simply because of the sheer number of them.

    Virtual IPs configured in networks that are not locally attached won't trigger G-ARPs. Instead, you can expect to see G-ARPs for the floating self IPs that are used as next hops.

    Don't be confused if you also see G-ARPs for local-only self IPs (especially those used for device service cluster heartbeat, config sync, and mirroring).

    Make sure not to set "link down on failover" (Device Management >> Device Groups >> : Failover; just leave it at the default setting of "0.0"), as it might trigger a spanning tree reconvergence after a failover, resulting in lost G-ARPs.

    Cheers, Stephan
  • Would anyone be able to help me find documentation on the exact behavior of the standby unit when a failover occurs? More specifically, if you force the active unit to standby, so that it becomes the new standby, what should it be doing with regard to sending GARPs during the failover event? Should it send no GARPs at all, or is it expected to send GARPs while the new active unit is also sending them? This is for virtual servers and SNATs. Is there any documentation specifically on this? And if the standby unit sends GARPs when it shouldn't, why is it doing so? Thanks
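
One way to see which unit actually sent G-ARPs during a failover is to take the capture files from the tcpdump suggested earlier in the thread and count gratuitous ARPs per announced IP and sender MAC. The following is a rough Python sketch, not an F5 or Wireshark tool. It assumes a classic pcap file (not pcapng) with Ethernet link type, and it treats an ARP as gratuitous when sender and target IP are equal, which is only an approximation of Wireshark's "arp.isgratuitous". The names `parse_arp`, `is_gratuitous`, `read_pcap`, and `garp_senders` are made up for illustration.

```python
import struct
import sys
from collections import Counter

ETH_ARP = 0x0806
ETH_VLAN = 0x8100


def parse_arp(frame):
    """Return (opcode, sender_mac, sender_ip, target_ip) for an
    IPv4-over-Ethernet ARP frame, or None for anything else."""
    if len(frame) < 14:
        return None
    offset = 12
    (ethertype,) = struct.unpack_from("!H", frame, offset)
    # Skip 802.1Q tags, which can appear in captures taken on interface 0.0
    while ethertype == ETH_VLAN and len(frame) >= offset + 6:
        offset += 4
        (ethertype,) = struct.unpack_from("!H", frame, offset)
    offset += 2
    if ethertype != ETH_ARP or len(frame) < offset + 28:
        return None
    htype, ptype, hlen, plen, op = struct.unpack_from("!HHBBH", frame, offset)
    if (htype, ptype, hlen, plen) != (1, 0x0800, 6, 4):
        return None
    sha = frame[offset + 8:offset + 14]   # sender hardware address
    spa = frame[offset + 14:offset + 18]  # sender protocol address
    tpa = frame[offset + 24:offset + 28]  # target protocol address
    mac = ":".join(f"{b:02x}" for b in sha)
    return op, mac, ".".join(map(str, spa)), ".".join(map(str, tpa))


def is_gratuitous(arp):
    """True for an ARP request or reply whose sender IP equals its target IP."""
    op, _mac, spa, tpa = arp
    return op in (1, 2) and spa == tpa


def read_pcap(path):
    """Yield raw frames from a classic pcap file with Ethernet link type."""
    with open(path, "rb") as fh:
        header = fh.read(24)
        if header[:4] in (b"\xd4\xc3\xb2\xa1", b"\x4d\x3c\xb2\xa1"):
            endian = "<"
        elif header[:4] in (b"\xa1\xb2\xc3\xd4", b"\xa1\xb2\x3c\x4d"):
            endian = ">"
        else:
            raise ValueError("not a classic pcap file")
        (linktype,) = struct.unpack(endian + "I", header[20:24])
        if linktype != 1:
            raise ValueError("expected Ethernet link type")
        while True:
            rec = fh.read(16)
            if len(rec) < 16:
                break
            _sec, _frac, incl_len, _orig = struct.unpack(endian + "IIII", rec)
            yield fh.read(incl_len)


def garp_senders(frames):
    """Count gratuitous ARPs per (announced IP, sender MAC)."""
    counts = Counter()
    for frame in frames:
        arp = parse_arp(frame)
        if arp and is_gratuitous(arp):
            _op, mac, spa, _tpa = arp
            counts[(spa, mac)] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    for (ip, mac), n in sorted(garp_senders(read_pcap(sys.argv[1])).items()):
        print(f"{ip:15} {mac}  {n}")
```

Run it against the capture from each unit. If the same IP shows up with two different sender MACs around the failover, both units announced that address, which is the situation the known issue quoted above describes.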