Forum Discussion

jcabezas_280360
Dec 30, 2016

Why such high ping latencies in VE LTM?

Hello,

I'm evaluating a VE LTM trial (25 Mbps), BIG-IP 12.1.1 Build 2.0.204 Hotfix HF2.
It's running on Hyper-V on Windows Server 2012 R2.

When I run ping from the Hyper-V console window of the LTM VM I can measure the following times:

ping -I 172.27.50.1 172.27.50.151  =  **7 ms .. 30 ms** 
(pinging from the LTM internal static self-IP to another VM attached to the same Virtual Switch)

ping -I 172.27.50.1 172.27.50.161  =  **7 ms .. 30 ms**  
(pinging from the LTM internal static self-IP to another VM reached through the external network, through a physical switch)

ping -I 172.27.50.1 172.27.51.1    <  1 ms
(pinging from the LTM internal static self-IP to the LTM external static self-IP)

ping -I 172.27.50.1 172.27.52.1    <  1 ms
(pinging from the LTM internal static self-IP to the LTM management address)

ping -I 172.27.50.1 172.27.51.51   = **2 ms .. 4 ms**
(pinging from the LTM internal static self-IP to any of the configured LTM Virtual Servers)

Pings between the two devices over the HA VLAN are even higher: tens of milliseconds!
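
For completeness, the numbers above come from single pings; a longer sample from the BIG-IP bash prompt gives min/avg/max and jitter in one line. This is just how I sampled it, using the addresses from my lab above:

# 100 pings, 0.2 s apart, sourced from the internal self-IP (same test as the first line above)
ping -I 172.27.50.1 -i 0.2 -c 100 172.27.50.151 | tail -n 2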

I reserved what I judge to be the recommended amounts of vCPU and memory for the LTM VE.
I have also disabled Virtual Machine Queues (VMQ) on the physical NICs and on the LTM vNICs.
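
In case it matters, this is roughly how I checked and disabled VMQ from PowerShell on the Hyper-V host (the adapter and VM names below are just placeholders for my environment):

# On the Hyper-V host: check VMQ on the physical NICs and turn it off where still enabled
Get-NetAdapterVmq
Disable-NetAdapterVmq -Name "GuestTeam-NIC1"   # placeholder adapter name
# Check the VE's vNICs; VmqWeight 0 means VMQ is off for that vNIC
Get-VMNetworkAdapter -VMName "bigip-ve" | Select-Object Name, VmqWeight   # placeholder VM name
Set-VMNetworkAdapter -VMName "bigip-ve" -VmqWeight 0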

Does anyone have suggestions for configurations to check or change, or troubleshooting procedures to reveal the cause of the high latencies above?

Many thanks!

5 Replies

  • I'm testing a mixed cluster of one physical unit (LTM 1600) and one VE running on Hyper-V.

     

    Both are running 12.1.2. I got them clustered okay and I can fail over to the VE node, but I ran across a problem with that. Specifically, if I do a continuous ping to the self-IP of the VE node while it's standby, I get consistently low response times, whether over the WAN or the LAN.

     

    The moment I make the VE the active node for any traffic group, the ping times start to vary all over the place. Even on the LAN they go from under 1 ms to anywhere between 1 and 500 ms, and a few even time out. Same over the WAN, where I was getting a consistent 65 ms response when standby but wildly varying times when active.
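
    For reference, I've been comparing the two states with nothing fancier than this from a Linux box on the LAN (the address is a placeholder for the VE self-IP in my setup):

    # 200 pings while the VE is standby, then the same again after failing the traffic group over
    ping -c 200 <ve-self-ip> | tail -n 2
    # the rtt min/avg/max/mdev summary line makes the jitter obvious once the VE is active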

     

    Any thoughts? The host machine for the VE has dual 10-core Xeons and I've set up the guest machine with 8 GB of RAM and 8 cores. The other guests on it aren't using much CPU at all when I've done my testing.

     

    When the VE goes active, I don't see the CPU doing much... it gets a little higher and I see the connection count rise, but nothing out of the ordinary.

     

    I have VMQ disabled everywhere (on the physical NICs as well as in the virtual NIC settings), so that wouldn't be the problem. Plus, it's just strange that it only happens when the VE goes active.

     

    I should try pinging the floating IP as well but it's probably the same. I first noticed it when pinging a virtual IP of one of my vservers and saw the varied latency, so the floating IP is probably doing the same thing.

     

    FYI, I was using just plain GARP at first but switched to MAC Masquerading to see if that helped, but no difference. Besides, that wouldn't/shouldn't affect ping times to the self-IP.

     

    It really just seems like something in the network stack of the underlying Linux is having a hard time... it'd be great to find out whether this is common or only affects a few people.

     

  • I don't have much experience with Hyper-V but I wanted to ask some questions that might help others help you. And I'm just curious by nature, so here we go:

     

    • Is there any way to lock a physical interface to the VE?
    • Is there any way to lock resources to the VE?
    • When setting up a VE in VMware there are some settings you can adjust concerning LRO and SR-IOV. Perhaps that could be a factor?
    • Which modules are you running?
    • You mention CPU activity on the VM, but I don't see any reference to memory usage?

    Links with information that might be useful:

     

    Considering the big difference in latency, and that the system gives low response times when it's passive, I don't think any of the links above would help, though.

     

    Have you opened a case with F5? Please update us if you do.

     

    /Patrik

     

  • Some additional info. The server is a ProLiant DL380 Gen9 running Windows Server 2012 R2 as the Hyper-V host. Two of the physical interfaces are in a network team just for guest machines, set up using the "Hyper-V Port" load-balancing mode (the other two are in a different team for host network traffic like VM replication, etc.).

     

    Those two NICs go to separate Cisco switches in a stack, using LACP for the team. That works great for all the other guest VMs (and as mentioned, VMQ is disabled... I've seen the problems it caused in the past and learned my lesson). :)

     

    The BIG-IP VE is using a trial license. In a nutshell, one of our pair of LTM 1600s failed, and it's in a remote datacenter, so I have to either get remote hands or plan a site visit to get it swapped out under the support contract. This same 1600 had died once before and was replaced with a refurb unit, and now the refurb is dead. I think it's memory... on bootup I don't even see it POST when it powers on... the AOM goes through its thing and then just shows it waiting to POST. The same thing happened when the previous unit died, and I think it was memory then as well.

     

    So, I thought I'd see if a physical/virtual cluster combo would work in the short term until I can get the unit RMA'd, just as a precaution in case the other one dies in the meantime.

     

    I originally had only a single traffic group, but for testing I created a second traffic group once the new VE was joined, so I could fail over a non-critical set of VIPs, and that's where I am now.

     

    Hyper-V doesn't have the same sorts of options as you indicate VMware has. I did notice an option to enable MAC spoofing, so I went ahead and ticked that box since I was using MAC masquerading. Otherwise, I think Hyper-V blocks network traffic from a guest that uses a MAC other than the one assigned to it.
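
    For anyone else doing the same, that checkbox corresponds to roughly this in PowerShell on the host (the VM name is a placeholder):

    # Allow the guest to send frames with a MAC other than the assigned one (needed for MAC masquerade)
    Set-VMNetworkAdapter -VMName "bigip-ve" -MacAddressSpoofing On   # placeholder VM name
    Get-VMNetworkAdapter -VMName "bigip-ve" | Select-Object Name, MacAddressSpoofing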

     

    I tried switching the virtual net adapters to legacy NICs but when I rebooted the virtual F5, it didn't recognize any NICs so I switched back. I thought maybe it would use different drivers for those that might work better, but maybe I'd have to redo some configuration or start from scratch, so I stopped that research for now.

     

    The only module provisioned is LTM (set to "nominal"); everything else is "none", just like the physical box. Management is set to "small", but since I gave it 8 GB of RAM I could probably bump that to medium or large. I don't have a lot of objects, though: 58 vservers, 41 pools, 8 nodes... "small" has worked fine so far.
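
    For reference, this is roughly how that provisioning looks from tmsh (the level shown is just what I described above):

    # From the VE's bash prompt
    tmsh list sys provision
    tmsh modify sys provision ltm level nominal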

     

    Basically, this secondary traffic group has a set of vservers that aren't actively used except by the F5 monitors themselves (those particular vservers are for a redundant set of websites in this facility). So when I say the CPU is low, I mean it's barely above zero when the VE is hosting that group. Memory usage is also very low, with several GB (6+) free for TMM.

     

    All the "real" traffic on this cluster is on the other traffic group still pointed to the physical unit... leaving that alone until I get to the bottom of this virtual edition problem.

     

    I haven't opened a support case... this is a trial VE after all, and hopefully this is only a temporary "just in case" setup for a month or so. :)

     

  • Ugh, I think I just realized what the problem is in my situation. I'm using the trial VE, which, I just saw in the fine print, has a maximum throughput of 1 Mbps (not even megabytes... 1 megabit per second). No wonder it always started to tank when I hit it with any kind of traffic.
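
    For anyone wanting to double-check their own VE, the rate limit is visible in the license output; something like this is how I confirmed it (the exact field names can vary by license):

    # From the VE's bash prompt
    tmsh show sys license detail | grep -i -E 'throughput|rate'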

     

    Well, now I know that it's at least possible in theory. Maybe I'll work with our F5 rep to get some kind of lab license so I can test this out further.

     

    Ideally, it'd be great to know whether this works long term, because dealing with proprietary hardware that's somewhat inaccessible to me when it dies is kind of a pain. Maybe we could convert our old licenses to VE instances and not have to worry about one more piece of equipment that can fail (and our old LTM 1600s aren't supported on v13 anyway, meaning this location is now behind our other datacenter with newer F5s).

     

    Something to think about, and I hope this helps anyone else who runs into this and foolishly ignored the little blurb about what the limits are on a VE trial license... :)

     

  • Haha, that was a bit funny, thanks for sharing! :)

     

    I've tried combining a physical unit with a VE in prod before. It worked fine, although I wouldn't be comfortable doing it for a longer period of time. It's better to have the same performance on both cluster members, or you might be in for a surprise.

     

    I think the F5 reps are able to generate fully fledged licenses valid for 45 (or was it 90?) days.

     

    Good luck!

     

    /Patrik