Syslog-V_277002
Oct 19, 2017 · Nimbostratus
Load balancing a Redis master/slave setup
I have an F5 in front of a couple of Redis servers configured as master/slave.
I have a READ pool with round robin across both servers.
I have a WRITE pool with the master at priority 100 and the slave at a lower priority, so the slave kicks in only when the master is down (the slave is configured to be writable). I do not expect keys written to the slave to be replicated to the master when the master comes back; this is intended. I have a health check based on the Redis PING command, and it works as expected in identifying dead hosts.
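For context, the setup looks roughly like the following (object names, IPs and ports are made up for illustration; the health check is a plain TCP monitor that sends PING and expects +PONG, and the slave is made writable in its redis.conf with `slave-read-only no`):

```
ltm monitor tcp redis_ping {
    send "PING\r\n"
    recv "+PONG"
}
ltm pool redis_write_pool {
    monitor redis_ping
    min-active-members 1
    members {
        10.0.0.1:6379 { priority-group 100 }
        10.0.0.2:6379 { priority-group 50 }
    }
}
```

With `min-active-members 1`, the F5 activates the lower priority group (the slave) only when no higher-priority member is up.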
The problem is that my app server does not behave as expected. When everything is up, the app writes to the WRITE pool, so keys go to the master. Keys are replicated to the slave, and GET commands sent to the READ pool are therefore split between the master and the slave.
When the master goes down the F5 detects it, and if I open a redis-cli against the balanced address I connect to the writable slave and everything works. The application, however, goes into "retry" mode when issuing commands through the F5, and no keys are written on the slave host. After recycling the application, its connections show up on the slave host, but commands are still not forwarded. Even if I bring the master back, no SET commands are sent to it: they simply disappear and the application gets "retry". Sniffing traffic on the servers shows they do NOT receive any commands.
If I bring down the slave instead, everything works. If I then bring down the master as well, the app fails (expected), and when I bring the master back up the app resumes sending commands to it. To sum up, during normal operation: the F5 detects the master going down, but existing connections are NOT automatically redirected to the slave. New connections from redis-cli are, yet the app does not work even after a recycle. After the recycle, if the master comes back up, connections are not forwarded to it either; closing and reopening the redis-cli works as expected and goes to the master.
It seems connections are stuck to the dead server and do not switch.
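When it is in that state, the connection table seems to confirm it; something along these lines (addresses are placeholders):

```
# List connections still tied to the old master
tmsh show sys connection cs-server-addr 10.0.0.1
# Flushing them manually might tear down the stale flows
tmsh delete sys connection cs-server-addr 10.0.0.1
```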
The virtual server is doing FastL4 balancing.
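The virtual looks roughly like this (names hypothetical). If I understand correctly, the relevant knobs may be the FastL4 idle timeout and the pool's `service-down-action`, which defaults to `none` (existing connections are left untouched when a member is marked down):

```
ltm profile fastl4 fastl4_redis {
    defaults-from fastl4
    idle-timeout 300
}
ltm virtual redis_vs {
    destination 10.0.0.100:6379
    ip-protocol tcp
    profiles { fastl4_redis { } }
    pool redis_write_pool
}
# Possible fix to try: reset existing connections when a member fails
# modify ltm pool redis_write_pool service-down-action reset
```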
Any suggestions on how to cleanly switch between the backend servers?