Failover Troubleshooting
This document outlines troubleshooting steps for Multi-WAN Failover mode. In failover mode, traffic flows through a preferred (primary) WAN link and switches to a backup link only when the primary fails. Use this guide to confirm that your failover settings are correctly configured and working.
Summary Table for Common Issues and Fixes
| Issue | Possible Cause | Recommended Action |
|---|---|---|
| Wrong WAN used as primary | Incorrect metric/weight or disabled interface | Check that the primary WAN has the lowest metric, correct the weights, and enable all intended interfaces |
| Failover not happening | mwan3 service stopped or misconfigured | Start/restart mwan3 service; verify config files |
| Ping fails on specific WAN | WAN connection problem or tracking IP unreachable | Check physical connection and ISP status; verify track IPs |
| Traceroute shows the same WAN after the primary fails | Routing or policy misconfiguration | Check mwan3 policies and rules; ensure the backup WAN is part of the failover policy |
| Logs show errors related to mwan3 | Config errors or interface issues | Review and fix errors from logs |
Troubleshooting Steps
- Cloud
- UCI
- Run-Time
- Testing
- Log
Cloud Configuration Verification
Verify the Current Failover Configuration:
Log into the CE device and gain root access:
sudo su -
Run the following command to display the last applied multi-WAN configuration, and check that the mode is set to FAIL_OVER.
cat /tmp/last_config_response.json | jq .multiWanV2
The output below is only an example; the values on your device will differ.
Example Response
{
"enable": true,
"mode": "FAIL_OVER",
"notificationEmails": [
"apex_connect.ltd1@gmail.com"
],
"wanInterfaces": null,
"wanInterfacesConfig": {
"pppoe0": {
"interfaceName": "pppoe0",
"targetIps": [
"8.8.8.8",
"4.2.2.2"
],
"failureInterval": 5,
"recoveryInterval": 5,
"pingInterval": 5,
"pingTimeout": 2,
"multiWANMetric": 3,
"multiWANWeight": 2,
"enable": false
},
"eth0": {
"interfaceName": "eth0",
"targetIps": [
"8.8.8.8",
"4.2.2.2"
],
"failureInterval": 5,
"recoveryInterval": 5,
"pingInterval": 5,
"pingTimeout": 2,
"multiWANMetric": 1,
"multiWANWeight": 2,
"enable": true
},
"wlm0": {
"interfaceName": "wlm0",
"targetIps": [
"8.8.8.8",
"4.2.2.2"
],
"failureInterval": 5,
"recoveryInterval": 5,
"pingInterval": 5,
"pingTimeout": 2,
"multiWANMetric": 2,
"multiWANWeight": 2,
"enable": false
}
}
}
Check carefully that the mode is FAIL_OVER. In failover mode, the multiWANMetric value of each WAN interface sets its priority: the interface with the lowest metric is the primary WAN, and interfaces with higher metrics are the secondary and tertiary backups. Make sure the metrics produce the failover order you want, and confirm that every WAN interface you intend to use has "enable": true. In the example response, only eth0 is enabled, so pppoe0 and wlm0 cannot act as backups.
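As a quick cross-check of the failover order, the jq filter below lists the WAN interfaces from the same file sorted by multiWANMetric, so the first line is the primary and the following lines are the backups. It is a minimal sketch that assumes the multiWanV2.wanInterfacesConfig layout shown in the example response; failover_order is a hypothetical helper name, not part of the device software.

```sh
#!/bin/sh
# failover_order (hypothetical helper): print "<metric> <interface> enable=<bool>"
# for each WAN in the last applied cloud config, lowest metric (primary) first.
# Assumes the multiWanV2.wanInterfacesConfig layout from the example response.
failover_order() {
  jq -r '.multiWanV2.wanInterfacesConfig
         | to_entries
         | map(.value)
         | sort_by(.multiWANMetric)
         | .[]
         | "\(.multiWANMetric) \(.interfaceName) enable=\(.enable)"' \
    "${1:-/tmp/last_config_response.json}"
}

# Run it only when the config file exists (i.e. on the CE device).
if [ -r /tmp/last_config_response.json ]; then
  failover_order
fi
```

For the example response this prints eth0 (metric 1, enable=true) first, followed by wlm0 and pppoe0, both with enable=false.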
Q:1 How do I check the failover multi-WAN configuration?
Q:2 What is Failover mode?
Q:3 How do I check whether a WAN interface is enabled?
UCI Configuration Verification
Check network config:
The network configuration defines the interfaces that mwan3 monitors and fails over between:
cat /etc/config/network
The output below is only an example; the values on your device will differ.
Example Response
config interface 'loopback'
option device 'lo'
option proto 'static'
option ipaddr '127.0.0.1'
option netmask '255.0.0.0'
config globals
option packet_steering '1'
config interface 'eth0'
option device 'eth0'
option proto 'dhcp'
option metric '1'
option ip4table '1'
option peerdns '0'
option default_wan '1'
list dns '8.8.8.8'
list dns '4.2.2.2'
option disabled '0'
option mtu '1500'
config interface 'eth2'
option device 'eth2'
option disabled '0'
option mtu '1500'
option proto 'static'
option ipaddr '172.1.30.3'
option netmask '255.255.255.0'
config rule
option priority '901'
option lookup 'main'
config interface 'eth1'
option disabled '0'
option device 'eth1'
option proto 'dhcp'
option metric '2'
option ip4table '2'
option mtu '1500'
config interface 'wlm0'
option disabled '1'
option proto '3g'
option pppname 'wlm0'
option device 'ttyUSB0'
option apn 'comgt'
option ipv6 '0'
option delegate '0'
option metric '3'
option ip4table '3'
config route '4f8253ad3b144cfca9f81e3223664117'
option target '172.1.30.3'
option netmask '255.255.255.0'
option gateway '172.1.30.1'
option metric '0'
option proto 'static'
option interface 'eth2'
This file defines each network interface and its settings (protocol, metric, routing table), which mwan3 relies on. Make sure every interface intended for the failover configuration is defined here and is not disabled (option disabled '0'). In the example, wlm0 has option disabled '1', so it cannot serve as a backup WAN.
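To review the interfaces without reading the whole file, the sketch below summarises each interface section from uci show network (the same configuration in flat form) and prints its disabled flag and metric. summarize_network is a hypothetical helper; an unset disabled option is reported as 0 because UCI treats a missing disabled option as enabled.

```sh
#!/bin/sh
# summarize_network (hypothetical helper): read `uci show network` output on
# stdin and print "<name> disabled=<0|1> metric=<value>" per interface section.
summarize_network() {
  awk -F'[.=]' -v q="'" '
    # Section header line, e.g. network.eth0=interface
    NF == 3 && $3 == "interface" { order[++n] = $2; next }
    # Option lines such as network.eth0.disabled and network.eth0.metric
    $3 == "disabled" { v = $4; gsub(q, "", v); dis[$2] = v }
    $3 == "metric"   { v = $4; gsub(q, "", v); met[$2] = v }
    END {
      for (i = 1; i <= n; i++) {
        name = order[i]
        d = (name in dis) ? dis[name] : "0"   # unset disabled means enabled
        printf "%s disabled=%s metric=%s\n", name, d, met[name]
      }
    }'
}

# On the device, summarise the live network configuration.
if command -v uci >/dev/null 2>&1; then
  uci show network | summarize_network
fi
```

An interface without a metric (such as loopback) is printed with an empty metric= value.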
Check mwan3 config:
The mwan3 configuration is stored in /etc/config/mwan3:
cat /etc/config/mwan3
The output below is only an example; the values on your device will differ.
Example Response
config globals 'globals'
option mmx_mask '0x3F00'
option enabled '1'
option local_source 'lan'
option mode 'FAIL_OVER'
list notificationEmails 'apex_connect.ltd1@gmail.com'
config rule 'DEFAULT_HTTPS'
option family 'ipv4'
option sticky '1'
option proto 'tcp'
option dest_ip '0.0.0.0/0'
option dest_port '443'
option use_policy 'FAIL_OVER'
config rule 'DEFAULT_ANY'
option family 'ipv4'
option dest_ip '0.0.0.0/0'
option use_policy 'FAIL_OVER'
config interface 'eth0'
option enabled '1'
list track_ip '8.8.8.8'
list track_ip '4.2.2.2'
option interval '5'
option timeout '2'
option failure_interval '5'
option recovery_interval '5'
option down '1'
option up '3'
option initial_state 'offline'
option track_method 'ping'
option reliability '1'
option count '1'
option size '56'
option max_ttl '60'
option check_quality '0'
config member 'eth0_m1_w1'
option interface 'eth0'
option metric '1'
option weight '1'
config policy 'FAIL_OVER'
list use_member 'eth0_m1_w1'
option last_resort 'default'
config interface 'wlm0'
option enabled '0'
list track_ip '8.8.8.8'
list track_ip '4.2.2.2'
option interval '5'
option timeout '2'
option failure_interval '5'
option recovery_interval '5'
option down '1'
option up '3'
option initial_state 'offline'
option track_method 'ping'
option reliability '1'
option count '1'
option size '56'
option max_ttl '60'
option check_quality '0'
mwan3 Configuration via the UCI Command Line
The uci command prints the same mwan3 configuration in flat form:
uci show mwan3
The output below is only an example; the values on your device will differ.
Example Response
mwan3.globals=globals
mwan3.globals.mmx_mask='0x3F00'
mwan3.globals.enabled='1'
mwan3.globals.local_source='lan'
mwan3.globals.mode='FAIL_OVER'
mwan3.globals.notificationEmails='apex_connect.ltd1@gmail.com'
mwan3.DEFAULT_HTTPS=rule
mwan3.DEFAULT_HTTPS.family='ipv4'
mwan3.DEFAULT_HTTPS.sticky='1'
mwan3.DEFAULT_HTTPS.proto='tcp'
mwan3.DEFAULT_HTTPS.dest_ip='0.0.0.0/0'
mwan3.DEFAULT_HTTPS.dest_port='443'
mwan3.DEFAULT_HTTPS.use_policy='FAIL_OVER'
mwan3.DEFAULT_ANY=rule
mwan3.DEFAULT_ANY.family='ipv4'
mwan3.DEFAULT_ANY.dest_ip='0.0.0.0/0'
mwan3.DEFAULT_ANY.use_policy='FAIL_OVER'
mwan3.eth0=interface
mwan3.eth0.enabled='1'
mwan3.eth0.track_ip='8.8.8.8' '4.2.2.2'
mwan3.eth0.interval='5'
mwan3.eth0.timeout='2'
mwan3.eth0.failure_interval='5'
mwan3.eth0.recovery_interval='5'
mwan3.eth0.down='1'
mwan3.eth0.up='3'
mwan3.eth0.initial_state='offline'
mwan3.eth0.track_method='ping'
mwan3.eth0.reliability='1'
mwan3.eth0.count='1'
mwan3.eth0.size='56'
mwan3.eth0.max_ttl='60'
mwan3.eth0.check_quality='0'
mwan3.eth0_m1_w1=member
mwan3.eth0_m1_w1.interface='eth0'
mwan3.eth0_m1_w1.metric='1'
mwan3.eth0_m1_w1.weight='1'
mwan3.FAIL_OVER=policy
mwan3.FAIL_OVER.use_member='eth0_m1_w1'
mwan3.FAIL_OVER.last_resort='default'
mwan3.wlm0=interface
mwan3.wlm0.enabled='0'
mwan3.wlm0.track_ip='8.8.8.8' '4.2.2.2'
mwan3.wlm0.interval='5'
mwan3.wlm0.timeout='2'
mwan3.wlm0.failure_interval='5'
mwan3.wlm0.recovery_interval='5'
mwan3.wlm0.down='1'
mwan3.wlm0.up='3'
mwan3.wlm0.initial_state='offline'
mwan3.wlm0.track_method='ping'
mwan3.wlm0.reliability='1'
mwan3.wlm0.count='1'
mwan3.wlm0.size='56'
mwan3.wlm0.max_ttl='60'
mwan3.wlm0.check_quality='0'
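Each line shows one option as package.section.option=value, which is easier to search or script against than the raw file. To see the help for the mwan3 command itself, including all of its subcommands, run mwan3 with no arguments.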
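Because the policy decides which members can carry traffic, it is worth checking that FAIL_OVER lists a backup member at all. In the example above, FAIL_OVER uses only eth0_m1_w1, so there is no backup WAN to fail over to. The hypothetical helper below reads uci show mwan3 output, prints each member of a policy with its interface and metric, and warns when fewer than two members are listed.

```sh
#!/bin/sh
# policy_members (hypothetical helper): read `uci show mwan3` output on stdin and
# list the members used by a policy (default FAIL_OVER) with interface and metric.
policy_members() {
  awk -F'[.=]' -v pol="${1:-FAIL_OVER}" -v q="'" '
    BEGIN { n = 0 }
    # e.g. mwan3.FAIL_OVER.use_member=<quoted member names separated by spaces>
    $2 == pol && $3 == "use_member" {
      v = $0; sub(/^[^=]*=/, "", v); gsub(q, "", v)
      n = split(v, use, " ")
    }
    # Member options, e.g. mwan3.eth0_m1_w1.interface and mwan3.eth0_m1_w1.metric
    NF == 4 && $3 == "interface" { v = $4; gsub(q, "", v); ifc[$2] = v }
    NF == 4 && $3 == "metric"    { v = $4; gsub(q, "", v); met[$2] = v }
    END {
      for (i = 1; i <= n; i++)
        printf "%s interface=%s metric=%s\n", use[i], ifc[use[i]], met[use[i]]
      if (n < 2)
        print "WARNING: policy " pol " has fewer than two members (no backup WAN)"
    }'
}

# On the device, inspect the FAIL_OVER policy.
if command -v uci >/dev/null 2>&1; then
  uci show mwan3 | policy_members FAIL_OVER
fi
```

With the example configuration, this prints the single eth0_m1_w1 member (interface eth0, metric 1) followed by the warning.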
Q:1 How do I check the network configuration?
Q:2 What does uci show mwan3 display?
Run-Time Configuration Verification
Check the current mwan3 status:
mwan3 status
The output below is only an example; the values on your device will differ.
Example Response:
Interface status:
interface eth0 is unknown and tracking is down (15)
interface wlm0 is unknown and tracking is down (31)
Current ipv4 policies:
Current ipv6 policies:
Directly connected ipv4 networks:
Directly connected ipv6 networks:
Active ipv4 user rules:
Active ipv6 user rules:
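This gives a detailed run-time overview of mwan3: the tracking state of each interface, the active IPv4 and IPv6 policies, directly connected networks, and active user rules. It is more informative than the basic service status check. A working WAN is normally reported as online with tracking active; in the example, both interfaces are unknown with tracking down and no policies are listed, which means mwan3 is not tracking either WAN and has no active failover policy.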
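While a failover test is running, it can help to reduce mwan3 status to one state per interface. The hypothetical helper below parses the interface lines, assuming the "interface <name> is <state> ..." format shown in the example, and returns a non-zero exit status if any interface is not online.

```sh
#!/bin/sh
# wan_states (hypothetical helper): read `mwan3 status` output on stdin, print
# "<interface> <state>" per tracked interface, and exit 1 if any is not online.
wan_states() {
  awk 'BEGIN { bad = 0 }
       $1 == "interface" && $3 == "is" {
         print $2, $4
         if ($4 != "online") bad = 1
       }
       END { exit bad }'
}

# On the device, report the state of every tracked WAN.
if command -v mwan3 >/dev/null 2>&1; then
  mwan3 status | wan_states || echo "At least one WAN interface is not online"
fi
```

For the example output, it prints eth0 unknown and wlm0 unknown and exits with status 1.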
mwan3 Service Status and Control
Check the status of the mwan3 service:
/etc/init.d/mwan3 status
The output below is only an example; the values on your device will differ.
Example Response
running
This reports whether the mwan3 service is running. Any result other than running, or an error message, indicates a problem with the service.
If the service is not running, start it:
/etc/init.d/mwan3 start
The start command itself usually prints nothing. Run /etc/init.d/mwan3 status again to confirm that the service is up:
Example Response:
running
Starting the service enables monitoring of the WAN connections and performs failover according to the configured settings.
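The status check and the start step can be combined into one check that is safe to repeat. This sketch assumes that a running service reports exactly running, as in the example; ensure_mwan3_running and its optional init-script argument are hypothetical (the argument only exists so the logic can be tried against a stand-in script).

```sh
#!/bin/sh
# ensure_mwan3_running (hypothetical helper): start mwan3 only if its init
# script does not already report "running", then show the resulting status.
# The optional argument overrides the init script path (default /etc/init.d/mwan3).
ensure_mwan3_running() {
  init=${1:-/etc/init.d/mwan3}
  if "$init" status 2>/dev/null | grep -qx running; then
    echo "mwan3 already running"
  else
    echo "mwan3 not running, starting it"
    # Give the service a moment to come up before checking again.
    "$init" start && sleep 2 && "$init" status
  fi
}
```

On the device, run ensure_mwan3_running with no arguments.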
For troubleshooting, you can stop the service (and later start it again), or restart it in one step:
/etc/init.d/mwan3 stop
/etc/init.d/mwan3 restart
After a restart, /etc/init.d/mwan3 status should again report:
Example Response
running
Restarting the mwan3 service can clear transient issues, and it also applies configuration changes made since the service was started.
Q:1 How do I check whether the mwan3 service is running?
Q:2 How do I use the information from the mwan3 status command?
Testing Verification
Validate WAN Selection with Traceroute:
Use the traceroute command to see the actual path traffic takes to a destination; this shows which WAN interface is currently in use.
traceroute -n x.x.x.x
In a failover setup, the traceroute output should consistently show traffic exiting via the primary WAN interface as long as it's online and functional. If the primary WAN fails, a subsequent traceroute should show traffic exiting via the next available backup WAN. If it always shows only one WAN even when the primary is down, it indicates a problem in the mwan3 failover rules or routing configuration.
- When the primary WAN is up, traffic should go through it (e.g., eth0).
- When the primary WAN is down, traffic should shift to the backup WAN (e.g., eth1).
To run traceroute through a particular WAN interface, use:
traceroute -i eth0 google.com
traceroute -i eth1 google.com
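The before/after comparison described above can be scripted. This is a sketch under several assumptions: first_hop and failover_test are hypothetical helpers, each WAN is assumed to have a different gateway (so the first hop identifies the WAN in use), and OpenWrt's ifdown/ifup are used to take the primary down. ifdown is expected to trigger mwan3's interface-down handling directly rather than exercising ping tracking; unplugging the primary cable tests tracking instead. Run it from the device console or the LAN side, because the primary WAN is briefly taken down.

```sh
#!/bin/sh
# first_hop (hypothetical helper): print the hop-1 address from
# `traceroute -n` output on stdin ("*" if hop 1 did not answer).
first_hop() {
  awk '$1 == "1" { print $2; exit }'
}

# failover_test (hypothetical helper): compare the first hop before and
# after taking the primary WAN down, then bring it back up.
# Arguments: target (default 8.8.8.8), primary logical interface (default eth0),
# seconds to wait for mwan3 to switch (default 30).
failover_test() {
  target=${1:-8.8.8.8}
  primary=${2:-eth0}
  wait_s=${3:-30}
  echo "before: $(traceroute -n -m 1 "$target" 2>/dev/null | first_hop)"
  ifdown "$primary"
  sleep "$wait_s"
  echo "after:  $(traceroute -n -m 1 "$target" 2>/dev/null | first_hop)"
  ifup "$primary"
}
```

For example, failover_test 8.8.8.8 eth0 30. If the before and after addresses are the same, traffic did not move to the backup WAN.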
Check Email Notification
In failover mode, the configured notification email address (for example, apex_connect.ltd1@gmail.com) should receive a notification:
- When the primary WAN comes online (up/active)
- When the primary WAN goes offline (fails/down)
Each email should state whether the WAN interface has come up or has failed.
Ping via specific WAN:
- Test connectivity through a specific interface using ping:
mwan3 use eth0 ping -4 google.com
The output below is only an example; the values on your device will differ.
Example Response
could not find family for eth0. Using ipv4.
Running 'ping -4 google.com' with DEVICE=eth0 SRCIP=192.168.1.4 FWMARK=0x3f00 FAMILY=ipv4
PING google.com (142.250.76.206): 56 data bytes
64 bytes from 142.250.76.206: seq=0 ttl=60 time=14.012 ms
64 bytes from 142.250.76.206: seq=1 ttl=60 time=13.390 ms
64 bytes from 142.250.76.206: seq=2 ttl=60 time=14.927 ms
64 bytes from 142.250.76.206: seq=3 ttl=60 time=14.979 ms
64 bytes from 142.250.76.206: seq=4 ttl=60 time=12.831 ms
64 bytes from 142.250.76.206: seq=5 ttl=60 time=14.128 ms
--- google.com ping statistics ---
6 packets transmitted, 6 packets received, 0% packet loss
round-trip min/avg/max = 12.831/14.044/14.979 ms
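mwan3 use runs the command with the device, source IP, and firewall mark of the given interface (as shown in the Running line of the example output), so the ping leaves through eth0 only and isolates the connectivity of that WAN link. Without a count option, the ping runs until you press Ctrl+C, which then prints the statistics. Examine the output for errors or packet loss. If ping fails on an interface even though all the configuration looks correct, that WAN connection itself probably has a problem that will prevent proper failover.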
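To compare all WAN links in one pass, the sketch below sends a fixed number of pings through each interface with mwan3 use and extracts the packet-loss figure from the summary line (in the format shown in the example output). check_wans and loss_pct are hypothetical helpers, the interface names you pass must exist in your mwan3 configuration, and 8.8.8.8 is used only because it is one of the tracking IPs in the examples.

```sh
#!/bin/sh
# loss_pct (hypothetical helper): print the packet-loss percentage from a
# ping summary on stdin, e.g. "6 packets transmitted, 6 packets received, 0% packet loss".
loss_pct() {
  sed -n 's/.* \([0-9.]*\)% packet loss.*/\1/p'
}

# check_wans (hypothetical helper): send 3 pings through each named WAN
# using `mwan3 use` and report the loss ("unknown" if ping gave no summary).
check_wans() {
  for iface in "$@"; do
    loss=$(mwan3 use "$iface" ping -4 -c 3 -W 2 8.8.8.8 2>/dev/null | loss_pct)
    echo "$iface packet_loss=${loss:-unknown}"
  done
}
```

On the device, check_wans eth0 eth1 prints one line per interface; a high or unknown loss on the backup means it will not be able to take over.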
Q:1 Why is traceroute used to verify WAN selection?
Q:2 How do I test connectivity through a specific WAN interface with ping?
Log Verification
Checking logs can help you diagnose specific issues, such as failed authentication attempts or service errors.
System Log Inspection:
The system log is a good source of information about the operation of the Multi-WAN service. Reviewing it can reveal interface configuration errors or routing problems that may be affecting failover.
Check the system log for messages related to mwan3:
logread -e mwan3
Look through the output for error messages, warnings, or unusual activity. These entries can point to specific issues with failure detection or with switching between WANs.
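To focus on problems, the mwan3 log lines can be filtered further for common failure keywords. mwan3_problems and its keyword list are assumptions, since the exact message wording depends on the mwan3 version; adjust the pattern to what your device actually logs. During a failover test, logread -f -e mwan3 follows new mwan3 messages live.

```sh
#!/bin/sh
# mwan3_problems (hypothetical helper): keep only log lines on stdin that
# mention a likely problem or a link going down (case-insensitive).
mwan3_problems() {
  grep -iE 'error|fail|offline|disconnect'
}

# On the device: filter the mwan3 messages already in the log buffer.
if command -v logread >/dev/null 2>&1; then
  logread -e mwan3 | mwan3_problems
fi
```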