High Availability Troubleshooting
This document describes troubleshooting steps for high availability (HA) issues. It covers the common symptoms, gives an overview of the system, and describes the commands used for diagnostics.
Troubleshooting Steps
Access the CE Terminal
Log in to the CE (Customer Edge) terminal and gain superuser privileges:
sudo su -
Examine Keepalived Configuration
uci show keepalived
Example Response
- Master node:
root@Demo_CE:~# uci show keepalived
keepalived.globals=globals
keepalived.globals.router_id='Master_node'
keepalived.@ipaddress[0]=ipaddress
keepalived.@ipaddress[0].name='eth0'
keepalived.@ipaddress[0].address='172.20.10.7'
keepalived.@ipaddress[0].device='eth0'
keepalived.@ipaddress[0].label_suffix='ha'
keepalived.@ipaddress[0].scope='link'
keepalived.@ipaddress[1]=ipaddress
keepalived.@ipaddress[1].name='eth3'
keepalived.@ipaddress[1].address='172.30.1.254'
keepalived.@ipaddress[1].device='eth3'
keepalived.@ipaddress[1].label_suffix='ha'
keepalived.@ipaddress[1].scope='link'
keepalived.@track_interface[0]=track_interface
keepalived.@track_interface[0].name='eth0_ha'
keepalived.@track_interface[0].value='eth0'
keepalived.@track_interface[0].weight='100'
keepalived.@track_interface[1]=track_interface
keepalived.@track_interface[1].name='eth3_ha'
keepalived.@track_interface[1].value='eth3'
keepalived.@track_interface[1].weight='100'
keepalived.@peer[0]=peer
keepalived.@peer[0].name='Backup_node'
keepalived.@peer[0].address='100.100.100.2'
keepalived.@vrrp_instance[0]=vrrp_instance
keepalived.@vrrp_instance[0].name='Master'
keepalived.@vrrp_instance[0].state='MASTER'
keepalived.@vrrp_instance[0].interface='eth1'
keepalived.@vrrp_instance[0].virtual_router_id='100'
keepalived.@vrrp_instance[0].priority='100'
keepalived.@vrrp_instance[0].advert_int='1'
keepalived.@vrrp_instance[0].nopreempt='0'
keepalived.@vrrp_instance[0].virtual_ipaddress='eth0' 'eth3'
keepalived.@vrrp_instance[0].unicast_src_ip='100.100.100.1'
keepalived.@vrrp_instance[0].unicast_peer='Backup_node'
keepalived.@vrrp_instance[0].auth_type='PASS'
keepalived.@vrrp_instance[0].auth_pass='admin'
keepalived.@vrrp_instance[0].track_interface='eth0_ha' 'eth3_ha'
keepalived.@vrrp_instance[0].garp_master_delay='1'
keepalived.@vrrp_instance[0].garp_master_repeat='1'
keepalived.@vrrp_instance[0].garp_master_refresh='1'
keepalived.@vrrp_instance[0].garp_master_refresh_repeat='1'
- Backup node:
root@Backup_node:~# uci show keepalived
keepalived.globals=globals
keepalived.globals.router_id='Backup_node'
keepalived.@ipaddress[0]=ipaddress
keepalived.@ipaddress[0].name='eth0'
keepalived.@ipaddress[0].address='172.20.10.7'
keepalived.@ipaddress[0].device='eth0'
keepalived.@ipaddress[0].label_suffix='ha'
keepalived.@ipaddress[0].scope='link'
keepalived.@ipaddress[1]=ipaddress
keepalived.@ipaddress[1].name='eth3'
keepalived.@ipaddress[1].address='172.30.1.254'
keepalived.@ipaddress[1].device='eth3'
keepalived.@ipaddress[1].label_suffix='ha'
keepalived.@ipaddress[1].scope='link'
keepalived.@track_interface[0]=track_interface
keepalived.@track_interface[0].name='eth0_ha'
keepalived.@track_interface[0].value='eth0'
keepalived.@track_interface[0].weight='100'
keepalived.@track_interface[1]=track_interface
keepalived.@track_interface[1].name='eth3_ha'
keepalived.@track_interface[1].value='eth3'
keepalived.@track_interface[1].weight='100'
keepalived.@peer[0]=peer
keepalived.@peer[0].name='Master_node'
keepalived.@peer[0].address='100.100.100.1'
keepalived.@vrrp_instance[0]=vrrp_instance
keepalived.@vrrp_instance[0].interface='eth0'
keepalived.@vrrp_instance[0].advert_int='1'
keepalived.@vrrp_instance[0].nopreempt='0'
keepalived.@vrrp_instance[0].virtual_ipaddress='eth0' 'eth3'
keepalived.@vrrp_instance[0].auth_type='PASS'
keepalived.@vrrp_instance[0].garp_master_delay='1'
keepalived.@vrrp_instance[0].garp_master_repeat='1'
keepalived.@vrrp_instance[0].garp_master_refresh='1'
keepalived.@vrrp_instance[0].garp_master_refresh_repeat='1'
keepalived.@vrrp_instance[0].unicast_src_ip='100.100.100.2'
keepalived.@vrrp_instance[0].unicast_peer='Master_node'
keepalived.@vrrp_instance[0].auth_pass='admin'
keepalived.@vrrp_instance[0].track_interface='eth0_ha' 'eth3_ha'
keepalived.@vrrp_instance[0].virtual_router_id='100'
keepalived.@vrrp_instance[0].priority='50'
keepalived.@vrrp_instance[0].name='Backup'
keepalived.@vrrp_instance[0].state='BACKUP'
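This command displays the Keepalived configuration stored in UCI. Examine the output carefully for the virtual IP addresses, the tracked interfaces, and the VRRP instance settings, and compare the two nodes: values such as virtual_router_id, advert_int, auth_type, and auth_pass should match, and each node's unicast_peer should point to the other node. Take note of the priority of each node, because it determines which node becomes master and the order in which nodes take over on failover.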
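To make the comparison between nodes easier, the VRRP settings can be reduced to a short sorted list. The following is a minimal sketch, not part of the product tooling: the function name vrrp_summary is hypothetical, and it assumes the output format shown in the example responses above. The sample input is copied from the master node's output.

```shell
# Print selected VRRP options from `uci show keepalived` output read on stdin.
# vrrp_summary is a hypothetical helper name, not a command shipped on the CE.
vrrp_summary() {
    grep -E '^keepalived\.@vrrp_instance\[0\]\.(state|priority|virtual_router_id|advert_int|auth_type|interface)=' \
        | sed -e 's/^keepalived\.@vrrp_instance\[0\]\.//' -e "s/'//g" \
        | LC_ALL=C sort
}

# Assumed usage on the device: uci show keepalived | vrrp_summary
# Demonstration with lines taken from the master node's example response:
vrrp_summary <<'EOF'
keepalived.@vrrp_instance[0].state='MASTER'
keepalived.@vrrp_instance[0].interface='eth1'
keepalived.@vrrp_instance[0].virtual_router_id='100'
keepalived.@vrrp_instance[0].priority='100'
keepalived.@vrrp_instance[0].advert_int='1'
EOF
```

Running the same helper on both nodes and comparing the two lists side by side makes differences easy to spot. In the example responses above, for instance, the master lists interface eth1 while the backup lists interface eth0; a difference like this is worth confirming as intentional.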
Check Keepalived Status
/etc/init.d/keepalive status
Example Response
- Both nodes:
root@Backup_node:~# /etc/init.d/keepalive status
running
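This command checks the status of the Keepalived service. The output indicates whether Keepalived is running; the example output above reports only that the service is running and does not show which node is currently master. If the service is not running, or the command prints an error message, investigate that node first.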
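Because the status output does not name the master, one way to find it is to check which node currently holds a virtual IP address, since in VRRP the master is the node that owns the virtual addresses. The following is a sketch under assumptions: the function name holds_vip is hypothetical, the ip command is assumed to be available on the CE, and 172.20.10.7 is the eth0 virtual address from the example configuration.

```shell
# Succeed if the IPv4 address given as $1 appears in `ip addr show` output on stdin.
# holds_vip is a hypothetical helper name. grep -F treats the dots literally, and the
# trailing "/" prevents 172.20.10.7 from matching 172.20.10.70.
holds_vip() {
    grep -qF "inet $1/"
}

# Assumed usage on each node:
#   ip addr show | holds_vip 172.20.10.7 && echo "this node holds the virtual IP"
```

Under normal operation exactly one node should report the virtual IP. If both nodes or neither node hold it, the nodes are probably not receiving each other's VRRP advertisements, or Keepalived is not running.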
Review Network Configuration
cat /etc/config/network
Example Response
- Master node:
root@Demo_CE:~# cat /etc/config/network
config interface 'loopback'
option device 'lo'
option proto 'static'
option ipaddr '127.0.0.1'
option netmask '255.0.0.0'
config globals
option packet_steering '1'
config interface 'eth0'
option device 'eth0'
option default_wan '1'
option disabled '0'
option proto 'static'
option ipaddr '172.20.10.6'
option netmask '255.255.255.0'
list dns '172.20.10.1'
config interface 'eth3'
option device 'eth3'
option proto 'static'
option netmask '255.255.255.0'
option disabled '0'
option ipaddr '172.30.1.2'
config rule
option priority '901'
option lookup 'main'
config interface 'wlm0'
option disabled '1'
option proto '3g'
option pppname 'wlm0'
option device 'ttyUSB0'
option apn 'comgt'
option ipv6 '0'
option delegate '0'
option metric '2'
option ip4table '2'
config route 'f85c71f21c3040bdb4abcd168fa8e900'
option target '172.30.2.0'
option netmask '255.255.255.0'
option gateway '172.31.0.2'
option table 'main'
option proto 'static'
option metric '1'
option interface 'br25'
config route '1777530465de4eafada07376f1239abf'
option target '172.30.1.0'
option netmask '255.255.255.0'
option gateway '172.31.0.1'
option table 'main'
option proto 'static'
option metric '1'
config interface 'eth1'
option proto 'static'
option device 'eth1'
option ipaddr '100.100.100.1'
option netmask '255.255.255.0'
list dns '172.20.10.1'
- Backup node:
root@Backup_node:~# cat /etc/config/network
config interface 'loopback'
option device 'lo'
option proto 'static'
option ipaddr '127.0.0.1'
option netmask '255.0.0.0'
config globals
option packet_steering '1'
config interface 'eth0'
option device 'eth0'
option default_wan '1'
option disabled '0'
option proto 'static'
option ipaddr '172.20.10.8'
option netmask '255.255.255.0'
list dns '172.20.10.1'
config interface 'eth3'
option device 'eth3'
option proto 'static'
option netmask '255.255.255.0'
option disabled '0'
option ipaddr '172.30.1.1'
config rule
option priority '901'
option lookup 'main'
config interface 'wlm0'
option disabled '1'
option proto '3g'
option pppname 'wlm0'
option device 'ttyUSB0'
option apn 'comgt'
option ipv6 '0'
option delegate '0'
option metric '2'
option ip4table '2'
config route 'f85c71f21c3040bdb4abcd168fa8e900'
option target '172.30.2.0'
option netmask '255.255.255.0'
option gateway '172.31.0.2'
option table 'main'
option proto 'static'
option metric '1'
option interface 'br25'
config route '1777530465de4eafada07376f1239abf'
option target '172.30.1.0'
option netmask '255.255.255.0'
option gateway '172.31.0.1'
option table 'main'
option proto 'static'
option metric '1'
config interface 'eth1'
option proto 'static'
option device 'eth1'
option ipaddr '100.100.100.2'
option netmask '255.255.255.0'
list dns '172.20.10.1'
This command shows the network configuration. Check that the interfaces, IP addresses, and other network settings are configured correctly on both nodes. Also check that each virtual IP address is on the same subnet as the real IP address that the nodes have on the same interface.
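The subnet check can be done by hand, but a small calculation removes guesswork. The sketch below uses hypothetical helper names (ip_to_int, same_subnet) and takes its addresses from the example output: the eth0 virtual IP 172.20.10.7 against the master's eth0 address 172.20.10.6 with netmask 255.255.255.0. It assumes plain dotted-quad input without leading zeros.

```shell
# Convert a dotted-quad IPv4 address to a single integer.
# Both function names are hypothetical helpers for illustration.
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the address into its four octets
    IFS=$old_ifs
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
}

# Succeed when addresses $1 and $2 are in the same subnet under netmask $3.
same_subnet() {
    a=$(ip_to_int "$1")
    b=$(ip_to_int "$2")
    m=$(ip_to_int "$3")
    [ $(( a & m )) -eq $(( b & m )) ]
}

# Virtual IP on eth0 versus the master's real eth0 address (values from the example).
if same_subnet 172.20.10.7 172.20.10.6 255.255.255.0; then
    echo "eth0: virtual IP and real IP are in the same subnet"
else
    echo "eth0: virtual IP and real IP are in different subnets"
fi
```

The same check applies to eth3 with 172.30.1.254 as the virtual IP and the node's own eth3 address as the second argument.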
Check Firewall Rules
uci show firewall
Example Response
root@Backup_node:/etc/config# uci show firewall
firewall.@defaults[0]=defaults
firewall.@defaults[0].syn_flood='1'
firewall.@defaults[0].input='ACCEPT'
firewall.@defaults[0].output='ACCEPT'
firewall.@defaults[0].forward='REJECT'
firewall.@zone[0]=zone
firewall.@zone[0].name='lan'
firewall.@zone[0].network='eth3'
firewall.@zone[0].input='ACCEPT'
firewall.@zone[0].output='ACCEPT'
firewall.@zone[0].forward='ACCEPT'
firewall.@zone[1]=zone
firewall.@zone[1].name='wan'
firewall.@zone[1].network='eth0' 'wlm0'
firewall.@zone[1].input='REJECT'
firewall.@zone[1].output='ACCEPT'
firewall.@zone[1].forward='REJECT'
firewall.@zone[1].masq='1'
firewall.@zone[1].mtu_fix='1'
firewall.@forwarding[0]=forwarding
firewall.@forwarding[0].src='lan'
firewall.@forwarding[0].dest='wan'
firewall.@rule[0]=rule
firewall.@rule[0].name='Allow-DHCP-Renew'
firewall.@rule[0].src='wan'
firewall.@rule[0].proto='udp'
firewall.@rule[0].dest_port='68'
firewall.@rule[0].target='ACCEPT'
firewall.@rule[0].family='ipv4'
firewall.@rule[1]=rule
firewall.@rule[1].name='Allow-Ping'
firewall.@rule[1].src='wan'
firewall.@rule[1].proto='icmp'
firewall.@rule[1].icmp_type='echo-request'
firewall.@rule[1].family='ipv4'
firewall.@rule[1].target='ACCEPT'
firewall.@rule[2]=rule
firewall.@rule[2].name='Allow-IPSec-ESP'
firewall.@rule[2].src='wan'
firewall.@rule[2].proto='esp'
firewall.@rule[2].target='ACCEPT'
firewall.@rule[3]=rule
firewall.@rule[3].name='Allow-ISAKMP'
firewall.@rule[3].src='wan'
firewall.@rule[3].dest_port='500'
firewall.@rule[3].proto='udp'
firewall.@rule[3].target='ACCEPT'
firewall.@include[0]=include
firewall.@include[0].path='/etc/firewall.user'
firewall.libreswan=include
firewall.libreswan.path='/etc/libreswan_firewall.sh'
firewall.libreswan.reload='1'
firewall.@rule[4]=rule
firewall.@rule[4].name='Allow-SSH'
firewall.@rule[4].src='*'
firewall.@rule[4].proto='tcp'
firewall.@rule[4].dest_port='25321'
firewall.@rule[4].target='ACCEPT'
firewall.@rule[5]=rule
firewall.@rule[5].name='Allow-HTTPS'
firewall.@rule[5].src='*'
firewall.@rule[5].proto='tcp'
firewall.@rule[5].dest_port='443'
firewall.@rule[5].target='ACCEPT'
firewall.@rule[6]=rule
firewall.@rule[6].name='Allow-BGP'
firewall.@rule[6].src='*'
firewall.@rule[6].proto='tcp'
firewall.@rule[6].dest_port='179'
firewall.@rule[6].target='ACCEPT'
firewall.@rule[7]=rule
firewall.@rule[7].name='Allow-IPSEC-NAT'
firewall.@rule[7].src='*'
firewall.@rule[7].proto='udp'
firewall.@rule[7].dest_port='4500'
firewall.@rule[7].target='ACCEPT'
firewall.@rule[8]=rule
firewall.@rule[8].name='Allow-VXLAN'
firewall.@rule[8].src='*'
firewall.@rule[8].proto='udp'
firewall.@rule[8].dest_port='4789'
firewall.@rule[8].target='ACCEPT'
firewall.TO_CN=ipset
firewall.TO_CN.name='TO_CN'
firewall.TO_CN.match='dst_net'
firewall.TO_CN.storage='hash'
firewall.TO_CN.enabled='1'
firewall.TO_CN.loadfile='/usr/local/share/ipsets/CN.txt'
firewall.FROM_CN=ipset
firewall.FROM_CN.name='FROM_CN'
firewall.FROM_CN.match='src_net'
firewall.FROM_CN.storage='hash'
firewall.FROM_CN.enabled='1'
firewall.FROM_CN.loadfile='/usr/local/share/ipsets/CN.txt'
firewall.TO_IN=ipset
firewall.TO_IN.name='TO_IN'
firewall.TO_IN.match='dst_net'
firewall.TO_IN.storage='hash'
firewall.TO_IN.enabled='1'
firewall.TO_IN.loadfile='/usr/local/share/ipsets/IN.txt'
firewall.FROM_IN=ipset
firewall.FROM_IN.name='FROM_IN'
firewall.FROM_IN.match='src_net'
firewall.FROM_IN.storage='hash'
firewall.FROM_IN.enabled='1'
firewall.FROM_IN.loadfile='/usr/local/share/ipsets/IN.txt'
firewall.TO_RFC1918=ipset
firewall.TO_RFC1918.name='TO_RFC1918'
firewall.TO_RFC1918.match='dst_net'
firewall.TO_RFC1918.storage='hash'
firewall.TO_RFC1918.enabled='1'
firewall.TO_RFC1918.loadfile='/usr/local/share/ipsets/RFC1918.txt'
firewall.FROM_RFC1918=ipset
firewall.FROM_RFC1918.name='FROM_RFC1918'
firewall.FROM_RFC1918.match='src_net'
firewall.FROM_RFC1918.storage='hash'
firewall.FROM_RFC1918.enabled='1'
firewall.FROM_RFC1918.loadfile='/usr/local/share/ipsets/RFC1918.txt'
firewall.CGW_ALLOWED_IPADDRESS=ipset
firewall.CGW_ALLOWED_IPADDRESS.name='CGW_ALLOWED_IPADDRESS'
firewall.CGW_ALLOWED_IPADDRESS.match='dst_net'
firewall.CGW_ALLOWED_IPADDRESS.storage='hash'
firewall.CGW_ALLOWED_IPADDRESS.enabled='1'
firewall.CGW_BLOCKED_IPADDRESS=ipset
firewall.CGW_BLOCKED_IPADDRESS.name='CGW_BLOCKED_IPADDRESS'
firewall.CGW_BLOCKED_IPADDRESS.match='dst_net'
firewall.CGW_BLOCKED_IPADDRESS.storage='hash'
firewall.CGW_BLOCKED_IPADDRESS.enabled='1'
firewall.CGW_ALLOWED_IPSUBNETS=ipset
firewall.CGW_ALLOWED_IPSUBNETS.name='CGW_ALLOWED_IPSUBNETS'
firewall.CGW_ALLOWED_IPSUBNETS.match='dst_net'
firewall.CGW_ALLOWED_IPSUBNETS.storage='hash'
firewall.CGW_ALLOWED_IPSUBNETS.enabled='1'
firewall.CGW_BLOCKED_IPSUBNETS=ipset
firewall.CGW_BLOCKED_IPSUBNETS.name='CGW_BLOCKED_IPSUBNETS'
firewall.CGW_BLOCKED_IPSUBNETS.match='dst_net'
firewall.CGW_BLOCKED_IPSUBNETS.storage='hash'
firewall.CGW_BLOCKED_IPSUBNETS.enabled='1'
firewall.CGW_ALLOWED_DOMAIN=ipset
firewall.CGW_ALLOWED_DOMAIN.name='CGW_ALLOWED_DOMAIN'
firewall.CGW_ALLOWED_DOMAIN.match='dst_net'
firewall.CGW_ALLOWED_DOMAIN.storage='hash'
firewall.CGW_ALLOWED_DOMAIN.enabled='1'
firewall.CGW_BLOCKED_DOMAIN=ipset
firewall.CGW_BLOCKED_DOMAIN.name='CGW_BLOCKED_DOMAIN'
firewall.CGW_BLOCKED_DOMAIN.match='dst_net'
firewall.CGW_BLOCKED_DOMAIN.storage='hash'
firewall.CGW_BLOCKED_DOMAIN.enabled='1'
firewall.CGW_ALLOWED=ipset
firewall.CGW_ALLOWED.name='CGW_ALLOWED'
firewall.CGW_ALLOWED.match='dst_set'
firewall.CGW_ALLOWED.storage='list'
firewall.CGW_ALLOWED.enabled='1'
firewall.CGW_ALLOWED.entry='CGW_ALLOWED_IPADDRESS' 'CGW_ALLOWED_IPSUBNETS' 'CGW_ALLOWED_DOMAIN'
firewall.CGW_BLOCKED=ipset
firewall.CGW_BLOCKED.name='CGW_BLOCKED'
firewall.CGW_BLOCKED.match='dst_set'
firewall.CGW_BLOCKED.storage='list'
firewall.CGW_BLOCKED.enabled='1'
firewall.CGW_BLOCKED.entry='CGW_BLOCKED_IPADDRESS' 'CGW_BLOCKED_IPSUBNETS' 'CGW_BLOCKED_DOMAIN'
firewall.snat_cgw_range=include
firewall.snat_cgw_range.path='/etc/firewall.snat_range_cgw-iptables'
firewall.snat_cgw_range.reload='1'
firewall.ss_rules=include
firewall.ss_rules.path='/etc/firewall.ss-rules-iptables'
firewall.ss_rules.reload='1'
firewall.Allow_DNS_From=ipset
firewall.Allow_DNS_From.name='Allow_DNS_From'
firewall.Allow_DNS_From.match='src_net'
firewall.Allow_DNS_From.storage='hash'
firewall.Allow_DNS_From.entry='0.0.0.0/0'
firewall.Allow_DNS=rule
firewall.Allow_DNS.name='Allow_DNS'
firewall.Allow_DNS.src='*'
firewall.Allow_DNS.proto='udp'
firewall.Allow_DNS.dest_port='53'
firewall.Allow_DNS.target='ACCEPT'
This command shows the firewall rules. Make sure that the firewall is not blocking necessary traffic, such as the VRRP advertisements (IP protocol 112) that Keepalived exchanges between the nodes, or traffic to the virtual IP addresses.
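The firewall output is long, so it can help to pull out just the input policy of each zone before reading individual rules. This is a minimal sketch: zone_input_policies is a hypothetical helper name, and it assumes that each zone's name line precedes its input line, as in the example response above.

```shell
# List each zone's name and input policy from `uci show firewall` output on stdin.
# zone_input_policies is a hypothetical helper name.
zone_input_policies() {
    awk -F"[=']" '
        /^firewall\.@zone\[[0-9]+\]\.name=/  { name = $3 }
        /^firewall\.@zone\[[0-9]+\]\.input=/ { print name ": input=" $3 }
    '
}

# Assumed usage on the device: uci show firewall | zone_input_policies
```

A zone whose input policy is REJECT drops VRRP advertisements arriving on its interfaces unless a rule explicitly accepts them. In the example response, eth1 is not listed in either zone, so the defaults section is also worth checking for the interface that carries VRRP.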
Examine System Logs
logread
This command shows the system log. Look for error messages or warnings related to Keepalived, the network interfaces, or other relevant services. The logs are usually the most informative source about the cause of a problem; focus on the entries from around the time the problem occurred.
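Since the full log can be large, filtering it down to HA-related entries is a practical first pass. The sketch below assumes that the relevant entries mention Keepalived or VRRP by name; the function name ha_log_lines is hypothetical.

```shell
# Keep only log lines that mention Keepalived or VRRP (case-insensitive).
# ha_log_lines is a hypothetical helper name.
ha_log_lines() {
    grep -iE 'keepalived|vrrp'
}

# Assumed usage on the device: logread | ha_log_lines
```

If the filtered output is empty, fall back to reading the unfiltered log around the time of the incident, because interface or link events may not mention Keepalived at all.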