How to Avoid Redundant WAN SNAFUs. Why a Perfectly Configured Sonicwall WAN Failover may not Work
It can be frustrating… you have redundant Internet links to make sure that you are always connected, and at the worst possible time your primary ISP goes down and the backup fails to connect. I have seen this scenario play out numerous times, so in this article I will cover the most common reasons that redundant WAN links fail to work.
First, we will assume that you have properly configured a Sonicwall failover and have tested it by disconnecting the primary WAN link and checking that it does in fact fail over to the secondary WAN. Note that this article is for situations where you have configured and tested Sonicwall WAN redundancy successfully during a test exercise, but it failed to work during a live situation.
Ping Responder Probe is not Checked
One of the most common issues is that a ping responder is not set up. Sonicwall will ping responder.global.sonicwall.com to gauge whether or not the Internet is accessible. If this option is not checked, Sonicwall will not roll over unless the WAN port shows as disconnected. This means that if your ISP router fails or the WAN cable is unplugged, it will fail over; but if the issue is with your ISP’s gateway or routing (which is usually the case), Sonicwall will not fail over to the secondary WAN because the primary link still shows as active.
To avoid this type of situation, place a checkmark in the failover setting’s probe responder option as shown above. If the ISP has a routing or gateway problem, the FQDN will be inaccessible and the firewall will switch to the secondary WAN. If you don’t want to use Sonicwall’s responder, you can change the probe target manually in the firewall’s advanced settings. To do this, log in to the firewall as admin, then change the URL to https://FQDNorIPaddress/diag.html as shown below.
In the internal settings, look for the Network and Failover Load Balancing settings and you will see a dialog box where you can input an IP address to be used as the failover health monitor.
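The difference the probe makes can be sketched in a few lines of Python. This is only an illustration of the decision described above, not Sonicwall's actual implementation; the names WanStatus and should_fail_over are made up for the example.

```python
from dataclasses import dataclass


@dataclass
class WanStatus:
    link_up: bool   # state of the WAN port itself (cable and ISP router)
    probe_ok: bool  # whether the probe target answered through this WAN


def should_fail_over(status: WanStatus, probe_enabled: bool) -> bool:
    """Return True when traffic should move to the secondary WAN."""
    if not status.link_up:
        # Unplugged cable or dead ISP router: caught with or without probing.
        return True
    if probe_enabled and not status.probe_ok:
        # Port is up but the probe target is unreachable: upstream problem.
        return True
    return False


# ISP routing problem: the port is up, but nothing beyond the gateway answers.
upstream_outage = WanStatus(link_up=True, probe_ok=False)
print(should_fail_over(upstream_outage, probe_enabled=False))  # False
print(should_fail_over(upstream_outage, probe_enabled=True))   # True
```

With the probe disabled, the upstream outage goes unnoticed because the only input is the port state. That is exactly the case a disconnect test in the office never exercises.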
The Secondary ISP is not Working
Another common reason for redundant WAN failure is that at some point the secondary ISP circuit stopped working and nobody noticed!
Oftentimes, secondary WAN ISP circuits stop working due to unpaid bills, changes in configuration, cabling problems and myriad other reasons. You don’t want to wait until the primary fails to realize that the secondary is not working. The best way to address this issue is to use a program such as OpenNMS to monitor your WAN IPs and alert you when there is a failure. In the graph below, a WAN link has failed to respond and OpenNMS will send an outage alert email to apprise the admin of the situation.
If you do not have a way to monitor your WAN links, you should periodically test your failover by simulating an outage. Unplug your ISP’s modem/router and check that the failover works. Do this at least once per month so that a failed secondary ISP does not go unnoticed.
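If a full monitoring system is more than you need, a small script run on a schedule can cover the basics. The sketch below is a rough stand-in, not a replacement for OpenNMS. It assumes a Unix-like ping command, that your WAN interfaces answer ping, and that the script runs on a host outside your own network so that each WAN IP is reached over its own circuit. The function names are made up for the example.

```python
import subprocess
from typing import Callable, Dict, List


def ping_once(host: str, timeout_s: int = 5) -> bool:
    """Send one ICMP echo with the system ping command; True if it answered."""
    try:
        result = subprocess.run(
            ["ping", "-c", "1", host],  # Unix-like syntax (assumption)
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,          # give up if ping itself hangs
        )
    except (subprocess.TimeoutExpired, OSError):
        # No reply in time, or the ping command could not be started.
        return False
    return result.returncode == 0


def find_down_links(
    links: Dict[str, str],
    check: Callable[[str], bool] = ping_once,
) -> List[str]:
    """Return the names of the links whose address did not answer."""
    return [name for name, address in links.items() if not check(address)]
```

For example, find_down_links({"primary": "203.0.113.10", "secondary": "198.51.100.20"}) returns the names of the links that did not respond. The addresses here are placeholders, so substitute your own WAN IPs, and send yourself an email or a text whenever the list is not empty.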
ISP Failover and Last Mile Redundancy
There are many ISPs but not many last mile providers. A last mile provider owns the link or links from the nearest COLO/CO to the customer’s premises, whether copper, fiber or wireless. To avoid running a dizzying array of costly cabling along telephone poles, ISPs often share each other’s last mile infrastructure at a predetermined price. This means that if a car crashes into one of those telephone closets you see on street corners, or an excavator cuts a cable, both of your ISPs will go down if they share the last mile.
To avoid this situation, check that your primary and secondary WAN links are not using the same provider’s last mile.