NBMA network tunnel bouncing: root cause research

Publication Date:  2017-02-28 Views:  62 Downloads:  0
Issue Description

Original issue approach

On January 27th, 2017, a Chilean customer contacted Huawei TAC about instability on multiple tunnels in an NBMA network being deployed for an end user's video network.

In this network, the hub connects to the Internet over a wired link, while the spokes connect over wireless links such as 3G and 4G.

The IPSec Monitoring List showed that the tunnels had been bouncing, which is why the customer reported instability:


Based on the customer's evidence, the IPSec tunnels had already been set up, as seen in the IKE Security Associations:


 

In addition, the tunnels were carrying sessions continuously, with no sign of bounces:

 


A couple of days after this initial investigation, we connected to the customer's network via TeamViewer to provide remote support. While we were connected, we observed no bounces and no IKE traps indicating a VPN bouncing or going down.

Theoretical foundation

There are 4 possible causes for a tunnel to bounce once the NHRP neighborship has been established.

1.       NHRP Shortcut.

 

This applies if DSVPN is used. DSVPN is designed to build dynamic P2MP networks: using the information from the configured routing protocol, it always attempts to find the shortest route to the destination.

To be more precise, what this technology does is:

After spoke A registers with the hub, the hub forwards spoke A's traffic to any other spoke that spoke A wants to communicate with (Figure 1).


Figure 1 Spoke to Hub to Spoke

Once the hub has forwarded this traffic, it tells spoke A how to reach the other spoke directly, without having to query the hub. The original tunnel is then torn down (Figure 2) so that the network can converge again, this time spoke to spoke. This direct path is called a shortcut (Figure 3), and further communication between the two spokes uses it.


Figure 2 Spoke to Hub tear down


Figure 3 Spoke to Spoke smart stable convergence
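The sequence in Figures 1 to 3 can be sketched as a toy model. This is only an illustration of the idea, not the device's implementation: the class names, the addresses, and the "via-hub"/"direct" labels are assumptions made up for the example, and real NHRP messages, timers and encryption are left out.

```python
class Hub:
    """Toy hub: stores tunnel address -> public (NBMA) address registrations."""

    def __init__(self):
        self.registrations = {}

    def register(self, tunnel_ip, nbma_ip):
        self.registrations[tunnel_ip] = nbma_ip

    def resolve(self, tunnel_ip):
        # Returns None for a spoke that never registered.
        return self.registrations.get(tunnel_ip)


class Spoke:
    """Toy spoke: registers with the hub and caches learned shortcuts."""

    def __init__(self, tunnel_ip, nbma_ip, hub):
        self.tunnel_ip = tunnel_ip
        self.nbma_ip = nbma_ip
        self.hub = hub
        self.shortcuts = {}  # peer tunnel address -> peer NBMA address
        hub.register(tunnel_ip, nbma_ip)

    def send(self, dest_tunnel_ip):
        """Return the path taken by one packet to dest_tunnel_ip."""
        if dest_tunnel_ip in self.shortcuts:
            return "direct"
        # The first packet goes through the hub. The hub's answer is cached
        # so that later packets can reach the peer without the hub.
        peer_nbma = self.hub.resolve(dest_tunnel_ip)
        if peer_nbma is not None:
            self.shortcuts[dest_tunnel_ip] = peer_nbma
        return "via-hub"


# Hypothetical addresses, chosen only for the example.
hub = Hub()
spoke_a = Spoke("10.0.0.2", "198.51.100.2", hub)
spoke_b = Spoke("10.0.0.3", "198.51.100.3", hub)

paths = [spoke_a.send("10.0.0.3") for _ in range(3)]
print(paths)  # ['via-hub', 'direct', 'direct']
```

The change of path between the first and second packet is the point at which, in the real network, a tunnel change would be observed.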

2.       Configuration modification or internal/external tunnel encapsulation

Another factor that might affect communication is human intervention, that is, changes to a stable configuration, as well as any additional encryption encapsulation that the traffic passes through. For example, if the tunnel is transported through a VPN tunnel formed by an external VPN gateway, any configuration change on that outer tunnel may affect the original tunnel.

3.       WAN network provisioning failure.

 

Nowadays, many business networks have redundant WAN service providers, regardless of the type of WAN (dedicated links or Internet). However, when the service goes down or bounces, the tunnel to the NHRP neighbor bounces as well. This either drops the service for the whole branch or, if there is redundancy, makes the tunnel flap (and trigger traps).

4.       No production traffic on the tunnel.

As normal behavior of a P2MP VPN, when no production traffic is detected on a tunnel, the tunnel is torn down. Since the tunnel carries no production traffic, there is no service impact.


Handling Process

Conclusions and justification.

 

This handling is carried out by elimination, assessing each possible root cause described in the Theoretical foundation.

1.       NHRP Shortcut. Not a likely root cause.

 

For a tunnel to work with NHRP shortcuts, the nhrp shortcut command must be configured (according to the USG6000 series HedEx), for example:


However, this command does not appear anywhere in the configuration. Therefore, this is not a valid root cause.
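A check of this kind can be sketched as a small script that scans an exported configuration for the nhrp shortcut line. The function name and the sample text are hypothetical; the sample is illustrative only and is not the customer's configuration. The sketch also assumes that each interface section starts with an "interface" line and ends at a "#" separator line.

```python
def interfaces_with_nhrp_shortcut(config_text):
    """Return the names of interfaces whose section contains 'nhrp shortcut'."""
    found = []
    current = None
    for raw_line in config_text.splitlines():
        line = raw_line.strip()
        if line.startswith("interface "):
            current = line[len("interface "):].strip()
        elif line == "#":
            # Assumption: '#' separates sections in the saved configuration.
            current = None
        elif line == "nhrp shortcut" and current is not None:
            found.append(current)
    return found


# Illustrative configuration text (hypothetical, not from the case).
SAMPLE = """
#
interface Tunnel1
 ip address 10.0.0.2 255.255.255.0
 tunnel-protocol gre p2mp
#
interface GigabitEthernet1/0/1
 ip address 198.51.100.2 255.255.255.0
#
"""

print(interfaces_with_nhrp_shortcut(SAMPLE))  # []
```

An empty result matches the situation described here: no interface has the command, so the shortcut mechanism cannot be what is tearing the tunnels down.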

 

2.       Configuration modification or internal/external tunnel encapsulation. Not a likely root cause.

We were not informed of any configuration changes on the devices, nor of any external VPN gateways operating in the network. Therefore, this is unlikely to be the root cause.

3.       WAN network provisioning failure.

This is a possible root cause

 

When packets are to be routed over a route that does not exist, they are discarded. Taking this into account, if the WAN service is down, the traffic that matches the policy directing it into the tunnel cannot be forwarded and is therefore dropped.

 

Clear evidence for this is that the statistics for the physical interface that carries the tunnel show no drops (Figure 4), whereas the IPSec statistics show a considerable amount of dropped traffic (Figures 5 and 6).


Figure 4 Physical interface not reflecting any kind of drop


Figure 5 IPSec tunnels have had traffic


Figure 6 IPSec tunnels have had plenty of drops

It is extremely important to note that, according to the customer, the spokes connect to the network via 3G or 4G. These connections may be unstable, since the air card is subject to the same instability as any other wireless technology.

This is explained in detail below, under the reasons why a 3G or 4G network might be untrustworthy.

 

4.       No production traffic on the tunnel.

This is a possible root cause

 

Reasons why a 3G or 4G network might be untrustworthy.

 

To connect a device (or a whole network) to a 3G/4G network, an air card is required. When the air card works indoors, the external signal may have difficulty reaching it, depending on the distance to the RBS, the number of users saturating the network, the material of the building's walls, and the electromagnetic devices in the area. Even the weather can degrade the wireless signal. The physical effects involved include reflection, refraction, diffraction, scattering, and absorption.

All of these may cause drops, jitter, and latency on the network.

For this reason, I strongly recommend implementing wired connections for business networks.
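To put numbers on these symptoms, the results of a series of probes (for example, pings from a spoke to the hub) could be summarised roughly as follows. This is a minimal sketch: the function name and the sample values are hypothetical, and jitter is taken here as the mean absolute difference between consecutive replies, which is one simple definition among several.

```python
from statistics import mean


def link_quality(rtts_ms):
    """Summarise probe results; None marks a probe that got no reply."""
    replies = [r for r in rtts_ms if r is not None]
    lost = len(rtts_ms) - len(replies)
    loss_pct = 100.0 * lost / len(rtts_ms) if rtts_ms else 0.0
    latency = mean(replies) if replies else None
    # Simplification: differences are taken between consecutive replies,
    # even when a lost probe sits between them.
    diffs = [abs(b - a) for a, b in zip(replies, replies[1:])]
    jitter = mean(diffs) if diffs else 0.0
    return {"loss_pct": loss_pct, "latency_ms": latency, "jitter_ms": jitter}


# Hypothetical round-trip times in milliseconds; the third probe was lost.
samples = [40.0, 60.0, None, 50.0, 80.0]
print(link_quality(samples))
# {'loss_pct': 20.0, 'latency_ms': 57.5, 'jitter_ms': 20.0}
```

Collecting such figures on a wireless spoke and on a wired link at the same site would give a direct comparison of the two access types.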

Solution

Migrate all the access networks on the spokes from wireless technologies to wired ones.

Suggestions

Always make sure to contract a wired service, such as DSL or cable, for WAN networks.
