Forum Discussion

Ron_Peters_2122
Sep 15, 2016

ARP/MAC Tables Not Updating on Core Switches After F5 LTM Failover (GARP Issue?)

We have two F5 LTM 5250v appliances configured with 2 vCMP instances each in an HA pair (Active/Standby). Each F5 5250v has a 10G uplink to two core switches (Cisco Nexus 7010) configured as an LACP port-channel on the F5 side and a Port-Channel/vPC on the Nexus side.

 

Port-Channel127/vPC127 = F5ADC01
Port-Channel128/vPC128 = F5ADC02

 

When I look at the MAC address tables on both 7K1 and 7K2, I can see all the individual F5 MACs for each VLAN we have configured on the F5 vCMP instances.

 

We are having an issue during automatic or manual failover where the MAC addresses for the virtual-servers are not being updated. If F5ADC01 is Active and we force it Standby, it immediately changes to Standby and F5ADC02 immediately takes over the Active role. However, the ARP tables on the Nexus 7K Core switches do not get updated so all the virtual-servers continue to have the MAC address associated with F5ADC01.

 

We have multiple partitions on each vCMP instance with several VLANs associated with each partition. Each partition only has a single route-domain the VLANs are allocated to. For traffic to virtual-servers, we are using Auto-MAP to SNAT to the floating Self-IP and using Auto-Last Hop so return traffic passes through the correct source VLAN. We are not using MAC masquerading.

 

The ARP timeout on the Nexus 7Ks is 1500 seconds (the default), so it takes up to 25 minutes after a failover for the network to fully recover. Eventually the ARP entries for all virtual servers age out and get refreshed with the correct MAC address. Obviously this is not acceptable.

 

I found an SOL article that talks about when GARPs can be missed after failover: SOL7332: Gratuitous ARPs may be lost after a BIG-IP failover event. We have confirmed the upstream core switches are not dropping any GARPs. As a test I went in and manually disabled all virtual-servers and then enabled them and all MACs updated immediately.
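
For reference, this is what the switches should be seeing from the newly Active unit: a gratuitous ARP is an ordinary ARP packet, sent to the Ethernet broadcast address, in which the sender IP and target IP are the same address. A minimal Python sketch of such a frame follows. The MAC and IP used in the usage note are hypothetical placeholders, not values from this configuration, and the sketch only illustrates the packet layout; it is not how BIG-IP generates the packet.

```python
import socket
import struct

BROADCAST_MAC = b"\xff" * 6
ETHERTYPE_ARP = 0x0806


def build_garp(mac: bytes, ip: str) -> bytes:
    """Return a 42-byte Ethernet frame carrying a gratuitous ARP request."""
    ip_bytes = socket.inet_aton(ip)
    # Ethernet header: broadcast destination, announcing unit as source.
    eth_header = BROADCAST_MAC + mac + struct.pack("!H", ETHERTYPE_ARP)
    arp_payload = struct.pack(
        "!HHBBH6s4s6s4s",
        1,            # hardware type: Ethernet
        0x0800,       # protocol type: IPv4
        6,            # hardware address length
        4,            # protocol address length
        1,            # opcode: request
        mac,          # sender MAC: the unit that now owns the address
        ip_bytes,     # sender IP: the address being announced
        b"\x00" * 6,  # target MAC: not meaningful in an announcement
        ip_bytes,     # target IP equals sender IP, which makes it gratuitous
    )
    return eth_header + arp_payload
```

Calling `build_garp(bytes.fromhex("020000000002"), "10.0.0.50")` returns the raw frame. A switch or router that already holds an ARP entry for that IP is expected to overwrite the MAC when it receives such a frame, which is the update that is not happening here.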

 

I have opened a support case with F5 and we have yet to determine where the issue lies. Does anybody have any ideas what the issue might be? If I need to provide more information about our configuration, let me know.

 

We are pretty new to the F5 platform. We recently migrated from the Cisco ACE30 platform. Failover on the ACE platform worked perfectly. Similar cabling setup (two port-channels to two separate Catalyst 6509 switches with an ACE30 module in each switch). After ACE failover, the MAC tables/ARP caches immediately updated.

 

Thank You!

 

22 Replies

  • We are running Version 11.5.3 HF2.

     

    We have not heard anything definitive back from support yet. They e-mailed me a link to the following SOL article: SOL11880: BIG-IP objects may not send gratuitous ARP requests during failover https://support.f5.com/kb/en-us/solutions/public/11000/800/sol11880.html

     

    However, we do not feel this applies. We have multiple partitions on each vCMP instance. Each partition has only one default route-domain. Each partition has multiple VLANs allocated to it. Every VLAN has 2 Self-IPs and 1 Floating IP address. All virtual-servers share the same subnet as their designated VLAN/floating-IP. We are utilizing Auto-Map for all virtual-servers instead of using SNAT pools. We are also utilizing Auto-Last Hop so return traffic passes through the original source VLAN instead of using the single default route we have tied to the single route-domain.

     

    Note, the F5s are not utilized as the default gateway by the nodes. They only send return traffic through the F5s for traffic entering through the virtual-server. Each VLAN has an SVI on both upstream Nexus switches and we are utilizing HSRP with a virtual-address. The HSRP virtual-address is used as the default gateway by the nodes.

     

    We have another maintenance window scheduled for this Wednesday evening to perform another manual failover on our DEVQA vCMP instance where I will be setting up a tcpdump on the unit that will become Active to capture all ARP traffic. This is to verify whether the unit is sending GARPs after failover or not. We were also linked the following SOL article from support but we have monitored the switch and checked the logs/statistics and have confirmed the switches are not dropping any ARP traffic: SOL7332: Gratuitous ARPs may be lost after a BIG-IP failover event https://support.f5.com/kb/en-us/solutions/public/7000/300/sol7332.html

     

    We have verified each of these points and have confirmed that the upstream 7Ks are not dropping any ARP traffic. The ARP timeout is set to the default aging period, which on this platform is 25 minutes (1500 seconds).

     

    I will respond again after we conclude the failover test tomorrow night.

     

    Thank you for the responses thus far.

     

  • What version are you running?

     

    We've seen hints of this in some of our failovers after upgrading to 12.1.0/12.1.1.

     

    Also, have you heard anything back from the case? I'm curious to see how things go for you guys.

     

    Cheers,

     

  • Unfortunately I'm not able to do this at this time as these boxes are up and running in production. We performed a failover test last night during a scheduled maintenance window in our DEVQA environment. We are, however, SPANning all network traffic (including ARP) on our network, and I'm looking into the capture file to see what was sent during that window.

     

  • Out of curiosity, can you force a failover while having the following tcpdump running on the box that will become active?

     

    tcpdump -ni 0.0:nnn -s0 -vvv arp

     

    Or optionally to capture to a pcap...

     

    tcpdump -ni 0.0:nnn -s0 -vvv -w /shared/tmp/garp.pcap arp
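
Once the capture exists, a short script can list which addresses were actually announced after the failover. The following is a rough Python sketch using only the standard library. It assumes the file is a classic libpcap capture (not pcapng) with an Ethernet link type, and it treats any IPv4 ARP packet whose sender IP equals its target IP as gratuitous. The function names are made up for this example. Any extra trailer bytes that the `:nnn` suffix adds after the packet should not matter, since only fixed header offsets are read.

```python
import socket
import struct
from typing import Iterator, List, Optional, Tuple


def iter_pcap_frames(data: bytes) -> Iterator[Tuple[int, bytes]]:
    """Yield (timestamp_seconds, frame_bytes) from a classic pcap capture."""
    magic = data[:4]
    if magic in (b"\xd4\xc3\xb2\xa1", b"\x4d\x3c\xb2\xa1"):
        endian = "<"  # little-endian file (microsecond or nanosecond variant)
    elif magic in (b"\xa1\xb2\xc3\xd4", b"\xa1\xb2\x3c\x4d"):
        endian = ">"  # big-endian file
    else:
        raise ValueError("not a classic pcap file")
    (linktype,) = struct.unpack_from(endian + "I", data, 20)
    if linktype != 1:
        raise ValueError("expected an Ethernet (DLT_EN10MB) capture")
    pos = 24  # skip the 24-byte global header
    while pos + 16 <= len(data):
        ts_sec, _frac, incl_len, _orig_len = struct.unpack_from(
            endian + "IIII", data, pos
        )
        pos += 16
        yield ts_sec, data[pos : pos + incl_len]
        pos += incl_len


def parse_garp(frame: bytes) -> Optional[Tuple[str, str]]:
    """Return (sender_mac, sender_ip) if the frame is a gratuitous ARP, else None."""
    offset = 12
    if len(frame) < offset + 2:
        return None
    (ethertype,) = struct.unpack_from("!H", frame, offset)
    # Step over any 802.1Q VLAN tags in front of the real EtherType.
    while ethertype == 0x8100 and len(frame) >= offset + 6:
        offset += 4
        (ethertype,) = struct.unpack_from("!H", frame, offset)
    if ethertype != 0x0806:
        return None
    arp = frame[offset + 2 : offset + 30]
    if len(arp) < 28:
        return None
    _htype, ptype, hlen, plen = struct.unpack_from("!HHBB", arp)
    if (ptype, hlen, plen) != (0x0800, 6, 4):
        return None
    sender_mac, sender_ip, target_ip = arp[8:14], arp[14:18], arp[24:28]
    if sender_ip != target_ip:
        return None  # ordinary ARP request or reply, not an announcement
    mac_text = ":".join(f"{b:02x}" for b in sender_mac)
    return mac_text, socket.inet_ntoa(sender_ip)


def gratuitous_arps(data: bytes) -> List[Tuple[int, str, str]]:
    """List (timestamp, sender_mac, sender_ip) for every gratuitous ARP found."""
    found = []
    for ts, frame in iter_pcap_frames(data):
        parsed = parse_garp(frame)
        if parsed is not None:
            found.append((ts, parsed[0], parsed[1]))
    return found
```

To use it, read the capture with `open("/shared/tmp/garp.pcap", "rb").read()` and pass the bytes to `gratuitous_arps`. Comparing the returned IPs against the list of virtual addresses would show whether any virtual server was never announced by the newly active unit.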

     

  • Thanks for the reply.

     

    Yes, for all virtual-servers, the Traffic-Group is set to floating and ARP is enabled. Below is a sample virtual-server config. All of them are set the same:

     

     

  • Navigate to Local Traffic->Virtual Servers->Virtual Address List and check the following:

     

    Is Traffic Group set to (floating)? Is the ARP enabled checkbox checked?