LTM VE performance while load balancing DNS (DNS profile) and max connections
Hi, I'll be raising a case via my SE as well, but I wanted to post here in case others have ideas. This is probably going to be a bit of a long post - sorry.
We experienced a major service outage yesterday when the LTM VE that load balances our DNS traffic seemed to crumble. We haven't yet determined the initial cause, but we believe we have a good idea of the sequence of events, and we have some queries about how F5 VE and ESXi interact with regard to UDP traffic.
Under normal working conditions we average 10K cps of DNS-only traffic; with the LTM DNS profile applied this usually equates to roughly ~500 open active connections. I have always seen it stay around this figure unless the LTM stops seeing the DNS responses come back, at which point the open connection count obviously starts to rocket.
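For context, the relevant config is roughly along these lines (names, IPs and the monitor name are anonymised placeholders, simplified from what we actually run):

    # udp profile with the default-ish 31s idle timeout
    tmsh create ltm profile udp udp_dns defaults-from udp idle-timeout 31
    # mon_dns stands in for our custom DNS monitor
    tmsh create ltm pool pool_dns monitor gateway_icmp and mon_dns members add { 10.0.0.11:53 10.0.0.12:53 }
    tmsh create ltm virtual vs_dns destination 192.0.2.53:53 ip-protocol udp profiles add { udp_dns dns } pool pool_dns
    # with the dns profile the flow is reaped once the response is seen,
    # which is why ~10K cps only shows ~500 active connections; if responses
    # stop coming back, each flow lingers for the full udp idle-timeout and
    # the connection table rockets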
What we've now seen, both in load testing and now in live, is that at around 40-50K cps the LTM/ESXi 5 combination seems to start dropping packets; however, the LTM does not show any dropped packets on its interfaces.
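Since the drops clearly aren't being counted by the guest, one way to look for them (assuming I've got the esxtop columns right) is on the hypervisor side:

    # on the ESXi host shell
    esxtop              # then press 'n' for the network view
    # watch the %DRPRX / %DRPTX columns for the VE's vNIC ports:
    # non-zero drops here, with clean counters inside the guest,
    # would put the loss between the vSwitch and the VM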
Regarding the outage: we basically experienced a cascading failure. It seems cps shot up to ~45K (we haven't yet been able to determine whether this was cause or effect), and the consequence was that the LTM started losing packets - most importantly the health probes to the DNS servers (both the ICMP and DNS monitors) - so the LTM took all our DNS servers offline. That made the situation worse by swamping us with a storm of DNS requests from client retries, so the probes flapped continually and the LTM could never recover. We recovered by rate limiting the virtual server and removing the probes from the pool so DNS was simply forwarded regardless of server health (tmsh sketch below). My experience from load testing is that the LTM copes well while the number of open connections stays low, but as soon as we hit some threshold the open connection count rockets because we begin to drop traffic.
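For anyone who hits the same cascade, the recovery was essentially these two changes (names again placeholders for ours, and the rate-limit value is illustrative):

    # cap new connections/sec on the virtual server to something survivable
    tmsh modify ltm virtual vs_dns rate-limit 20000
    # remove the flapping monitors so members stay up and DNS is
    # forwarded regardless of measured server health
    tmsh modify ltm pool pool_dns monitor none
    tmsh save sys config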
So, does anyone know the expected cps limit for a VE on the ESXi hypervisor? Ours is literally only doing DNS load balancing: v11.4, default VE deployment from the template. We only seemed to hit 50% CPU across the 2 vCPUs.
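One caveat I'm aware of (and it's an assumption on my part about how VE schedules TMM): the 50% aggregate could in theory be one vCPU flat out on TMM and the other mostly idle, so from the bash shell I'd sanity-check the per-TMM numbers:

    # per-TMM CPU rather than the hypervisor's aggregate view
    tmsh show sys tmm-info
    # rolling throughput / utilisation stats
    tmsh show sys performance all-stats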
I've read a bit about UDP buffer tuning on Linux hosts running under ESXi, but is it possible to tune the UDP buffers on the F5 LTM the same way, e.g. something like sysctl -w net.core.rmem_max=26214400?
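My (possibly wrong) understanding is that on a VE the dataplane NICs are owned by TMM rather than the Linux kernel, so kernel sysctls like these would presumably only affect management-plane traffic; happy to be corrected. What I had in mind, plus a check for kernel-visible drops:

    # standard linux-side tuning - assumption: on BIG-IP VE this likely
    # only touches the management plane, since TMM bypasses the kernel stack
    sysctl -w net.core.rmem_max=26214400
    sysctl -w net.core.rmem_default=26214400
    # kernel-visible UDP drops show up as "packet receive errors"
    netstat -su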
Does anyone know if there is an F5 best-practices guide for ESXi?
Sorry for the long post - hopefully someone will read it.