Forum Discussion

vince_231497
Nimbostratus
Feb 10, 2016

Configsync problem

Hi guys,

 

I am testing two trial BIG-IP VE 11.3 instances and am currently having a problem with configsync. HA itself seems to be working: the units are in an active/standby state, and each one moves to the appropriate state whenever I force the other machine offline.

 

Here's what I currently have:

 

I am getting the following warning on both machines when trying to sync: "One or more devices are unreachable. Resolve any communication problems before attempting to sync"

 

I am getting the following errors in the local traffic log on both machines:

 

(machine 1)

 

Fri Feb 8 16:23:55 AST 2002 notice big2 mcpd[4064] 01071431 Attempting to connect to CMI peer 192.168.30.11 port 6699

 

Fri Feb 8 16:23:55 AST 2002 notice big2 mcpd[4064] 01071432 CMI peer connection established to 192.168.30.11 port 6699

 

Fri Feb 8 16:23:55 AST 2002 notice big2 mcpd[4020] 0107143c Connection to CMI peer 192.168.30.11 has been removed

 

(machine 2)

 

Fri Feb 8 16:23:55 AST 2002 notice big2 mcpd[4064] 01071431 Attempting to connect to CMI peer 192.168.30.10 port 6699

 

Fri Feb 8 16:23:55 AST 2002 notice big2 mcpd[4064] 01071432 CMI peer connection established to 192.168.30.10 port 6699

 

Fri Feb 8 16:23:55 AST 2002 notice big2 mcpd[4020] 0107143c Connection to CMI peer 192.168.30.10 has been removed

 

Here's what I have currently tried:

 

  • Verified that the license is good (note: I am using the 90 day trial)

     

  • recreated device trust

     

  • clocks are in sync

     

  • changed port lockdown to allow all

     

  • created another VLAN just for HA

     

  • rebuilt mcpd

     

  • verified that both devices can ping each other's IP

     

  • tried telnet: I cannot get through on port 6699

     

I think it has something to do with that port flapping, but I have no idea how to resolve it.

 

I am completely new to F5 and my only resource is this site. I have also read somewhere that the trial version does not support configsync, but that post was almost a year ago, and I downloaded the latest trial just this January. I've been tinkering with this issue for almost a week now with no luck. I hope someone can point me in the right direction.

 

1 Reply

  • 11.3.0 is well out of support by now, but having said that, I don't think anyone has updated the 90-day trial to a more recent version. Speak with sales if you would like to trial a more recent version.

    The CMI channel is the connection used to perform configsync, which is separate from the HA failover capability.

    The things you've tried are all good ideas, although you should be attempting to connect to tcp/4353, not 6699. The connection arrives on 4353 and, based on the SNI value in the certificate, is translated internally to 6699, and vice versa.
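As a rough sketch of that connectivity test, here is a small bash helper that tries to open a TCP connection with a timeout. It assumes a shell with bash's /dev/tcp support and the coreutils `timeout` command (which may not be present on every BIG-IP release); the function name `check_tcp_port` is made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper (not an F5 tool): returns 0 if a TCP connection to
# host:port can be opened within the timeout (default 3 seconds),
# and non-zero otherwise.
check_tcp_port() {
    local host="$1" port="$2" secs="${3:-3}"
    # bash opens a TCP socket when a redirection targets /dev/tcp/HOST/PORT;
    # the child shell exits right away, which closes the socket again.
    timeout "$secs" bash -c "exec 3<>/dev/tcp/$host/$port" 2>/dev/null
}

# Example, using the peer address from the log lines in the question:
# check_tcp_port 192.168.30.11 4353 && echo "4353 reachable" || echo "4353 not reachable"
```

A successful connect only shows that something is listening on 4353; it says nothing about whether the certificate exchange on top of it succeeds.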

    One of the more common causes of CMI dying in those older versions of software was running out of RAM due to leaks or overcommitment, which caused inactive TCP connections, including the CMI channel, to be killed off. Does the problem go away if you reboot both LTMs? Do you see many occurrences of the message "sweeper_update: aggressive mode activated" in the ltm log?
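A quick way to answer the second question is to count the matching lines. This is only a sketch: it assumes the current log is /var/log/ltm (the path mentioned further down), the function name is hypothetical, and any rotated copies of the log would need checking separately.

```shell
#!/bin/bash
# Hypothetical helper: print how many lines in a log file contain the
# sweeper message. The file defaults to /var/log/ltm (an assumption).
count_aggressive_sweeper() {
    local logfile="${1:-/var/log/ltm}"
    # grep -c prints the number of matching lines; it prints 0 and returns
    # a non-zero exit status when nothing matches.
    grep -c 'sweeper_update: aggressive mode activated' "$logfile"
}

# Example:
# count_aggressive_sweeper /var/log/ltm
```

A count of zero makes the memory-pressure theory less likely, at least for the period the current log covers.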

    I've also seen expired device certificates cause this sort of problem. Did the LTMs have NTP time sync with an external clock when the device certificates were first generated? (Check the expiry under System / Device Certificates / Device Certificate.)
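The same expiry check can be sketched from the command line with openssl. The function names are made up for illustration, and the certificate path in the example is an assumption about where the device certificate is stored, so confirm it on your own system first.

```shell
#!/bin/bash
# Hypothetical helper: succeeds (exit 0) if the certificate in the given
# PEM file is still valid N seconds from now (default 0, meaning right now).
cert_valid_for() {
    local cert="$1" secs="${2:-0}"
    openssl x509 -in "$cert" -noout -checkend "$secs" >/dev/null
}

# Hypothetical helper: print the notBefore and notAfter dates so they can
# be compared against the system clock.
show_cert_dates() {
    openssl x509 -in "$1" -noout -startdate -enddate
}

# Example (the path is an assumption, not confirmed in this thread):
# show_cert_dates /config/httpd/conf/ssl.crt/server.crt
# cert_valid_for /config/httpd/conf/ssl.crt/server.crt || echo "certificate has expired"
```

If the clock was wrong when the certificate was generated, the notBefore date is also worth a look, since a certificate that is not yet valid fails in much the same way as an expired one.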

    If that doesn't help, try turning on debugging and see if that sheds any light on the matter:

    tmsh modify sys db log.ssl.level value Debug
    tmsh modify sys db log.tmm.level value Debug
    

    Set them both back to value 'notice' after collecting the data, and check both /var/log/ltm and /var/log/tmm for messages.

    You could also run "tcpdump -i0.0:nnn -s0 tcp and port 4353" to see what the tcp reset cause is when the connection is being closed.