Load Balancing over TE Tunnels Fails Because of the Uneven Hash Results on an LPUF-21.

Publication Date: 2013-01-07
Issue Description

As shown in the following topology, the devices at the ALXP site and at the DMN-TE site are all NE40E-X8s. The NE40E-X8s at the ALXP site and the NE40E-X8s at the DMN-TE site are connected through POS links. The NE40E-X8s within each site are connected through an Eth-Trunk interface formed by bundling GE interfaces. The RNC accesses the ALXP site and the NodeB accesses the DMN-TE site using VRRP.

Fully meshed MP-BGP peer relationships are established among the four NE40E-X8s to transmit wireless VPN routes. Two TE tunnels are established on ALXP-1 and are destined for DMN-TE-1. The path of TE 1 is ALXP-1 -> DMN-TE-1. The path of TE 2 is ALXP-1 -> ALXP-2 -> DMN-TE-2 -> DMN-TE-1. Traffic is forwarded through TE 1 and TE 2 in load balancing mode. Similarly, two TE tunnels are established on ALXP-1 and are destined for DMN-TE-2, and these two tunnels also take part in load balancing. ALXP-2, DMN-TE-2, and DMN-TE-1 likewise forward traffic to the peer sites by load balancing it over TE tunnels.

 

According to the preceding configurations, VPN traffic is load balanced over the TE tunnels by binding multiple TE tunnels to a tunnel policy. Therefore, the traffic from the RNC to the ALXP site is expected to be evenly distributed over the two POS links. However, according to customer feedback, the traffic volume over the POS link between ALXP-1 and DMN-TE-1 is nearly twice as large as that over the POS link between ALXP-2 and DMN-TE-2.

Handling Process

1. The on-site Huawei technical support personnel analyze the traffic model over the POS links. Generally, traffic carried on a network is Layer 3 traffic, and each packet is forwarded over one of the TE tunnels based on its encapsulated outer label. They therefore check the traffic distribution over the TE tunnels using the display interface tunnel x/x/x command. The traffic distribution is as follows.

 

Based on the traffic distribution diagram, the traffic forwarded from ALXP-1 to POS 1/2/1 of DMN-TE-1 is 43 Mbit + 0 Mbit + 17 Mbit + 0 Mbit = 60 Mbit. The traffic forwarded from ALXP-2 to POS 1/2/1 of DMN-TE-2 is 18 Mbit + 0 Mbit + 15 Mbit + 0 Mbit = 33 Mbit. The former is nearly twice the latter, which is consistent with the customer feedback. The traffic volume on tunnel 0/0/500 is 43 Mbit and the traffic volume on tunnel 0/0/501 is 18 Mbit, which is a large difference. Therefore, the NE40E-X8 at ALXP-1 hashes the traffic unevenly.
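The arithmetic in this step can be checked with a short sketch. The values are the ones quoted in the step; the variable names are only illustrative.

```python
# Per-tunnel traffic in Mbit, as quoted in the analysis above.
alxp1_to_dmn_te1 = [43, 0, 17, 0]   # carried over the ALXP-1 -> DMN-TE-1 POS link
alxp2_to_dmn_te2 = [18, 0, 15, 0]   # carried over the ALXP-2 -> DMN-TE-2 POS link

pos_link_1 = sum(alxp1_to_dmn_te1)
pos_link_2 = sum(alxp2_to_dmn_te2)

# Ratio between the two POS links, and between tunnels 0/0/500 and 0/0/501.
link_ratio = pos_link_1 / pos_link_2
tunnel_ratio = 43 / 18

print(f"POS links: {pos_link_1} vs {pos_link_2} Mbit, ratio {link_ratio:.2f}")
print(f"Tunnel 0/0/500 vs 0/0/501: ratio {tunnel_ratio:.2f}")
```

The link ratio is about 1.82 and the tunnel ratio about 2.39, so the imbalance is already present at the tunnel level on ALXP-1.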

2. The on-site Huawei technical support personnel analyze the traffic from the RNC to ALXP-1 as follows:

 

The main control board of the RNC is connected to the NE40E at ALXP-2, which is a backup device in the VRRP backup group. After entering ALXP-2, the traffic reaches the NE40E at ALXP-1 through the Layer 2 VLAN channel on the Eth-Trunk interface. The on-site Huawei technical support personnel check the traffic on GE 8/0/0 and GE 1/0/3, the member interfaces of the Eth-Trunk interface, and find that most of the traffic reaches GE 8/0/0. The NE40Es at the ALXP site are also connected to NE40Es at other sites, which are omitted here. Therefore, the total traffic volume on the Eth-Trunk interface is larger than the traffic volume from the ALXP site to the DMN-TE site.

Eth-Trunk0                  up    up         12%  9.73%          0          0

  GigabitEthernet1/0/3      up    down     7.81%    10%          0          0

  GigabitEthernet8/0/0      up    down       18%  9.39%          0          0

Using the display device 8 command, they find that the interface is located on an LPUF-21-A. This board uses the 588 chip, which causes the hash problem described in the Issue Description. To resolve the problem, the hash algorithm must be optimized so that the traffic is load balanced.

3. The on-site Huawei technical support personnel capture and analyze the packets that the RNC sends to the NodeBs using the display pe-probe 8 0 iphp-data command. They find that the packets sent to different NodeBs carry different TTL values, with some odd and some even. The command output is as follows.

[ALXP-RNC-NE40E-01-hidecmd]display pe-probe 8 0 iphp-data

  01480000  00005e00  013f0025  9e9b0007

  8100e03f  080045b8  0021e22b  0000fe11  //TTL value

  4fc70aa1  dc940aa1  eb091422  f6150039  //source and destination IP addresses

……

[ALXP-RNC-NE40E-01-hidecmd]display pe-probe 8 0 iphp-data

  01880000  00005e00  013f0025  d4630223

  8100032c  080045b8  00258d39  0000fd11

  b8760aa1  115e0a43  7a0e07d2  0400011d

……

[ALXP-RNC-NE40E-01-hidecmd]display pe-probe 8 0 iphp-data

  01580000  00005e00  0128286e  d4630223

  81002006  08004528  0053057b  00003f11

  06800aa1  eb090aa1  6f2c4306  13b4003f

……
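The three captures can be decoded with a short sketch to read the TTL values and IP addresses directly. The frame layout used here is an assumption inferred from the dumps themselves (a 4-byte internal header, two MAC addresses, one 802.1Q tag, the EtherType 0x0800, then the IPv4 header); it is not taken from the pe-probe documentation. The names IPV4_OFFSET and decode_ipv4_fields are illustrative.

```python
import ipaddress

# Assumed layout (inferred from the captures, not from vendor documentation):
# 4-byte internal header, 6-byte destination MAC, 6-byte source MAC,
# 4-byte 802.1Q tag, 2-byte EtherType, then the IPv4 header.
IPV4_OFFSET = 4 + 6 + 6 + 4 + 2

def decode_ipv4_fields(dump: str) -> dict:
    """Extract TTL, protocol, and addresses from one hex dump."""
    raw = bytes.fromhex("".join(dump.split()))
    if raw[IPV4_OFFSET - 2:IPV4_OFFSET] != b"\x08\x00":
        raise ValueError("no IPv4 EtherType at the assumed offset")
    ip = raw[IPV4_OFFSET:IPV4_OFFSET + 20]
    return {
        "ttl": ip[8],
        "protocol": ip[9],
        "src": str(ipaddress.IPv4Address(ip[12:16])),
        "dst": str(ipaddress.IPv4Address(ip[16:20])),
    }

# The three captures shown above, with the trailing comments removed.
captures = [
    "01480000 00005e00 013f0025 9e9b0007 8100e03f 080045b8 "
    "0021e22b 0000fe11 4fc70aa1 dc940aa1 eb091422 f6150039",
    "01880000 00005e00 013f0025 d4630223 8100032c 080045b8 "
    "00258d39 0000fd11 b8760aa1 115e0a43 7a0e07d2 0400011d",
    "01580000 00005e00 0128286e d4630223 81002006 08004528 "
    "0053057b 00003f11 06800aa1 eb090aa1 6f2c4306 13b4003f",
]

decoded = [decode_ipv4_fields(c) for c in captures]
for fields in decoded:
    print(fields)
```

Under this assumed layout, the TTL values are 254, 253, and 63, all three packets are UDP (protocol 17), and every source address starts with 10.161.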

Root Cause

The traffic from the RNC to the NodeBs traverses the LPUF-21s on the NE40Es at the ALXP site. The LPUF-21s use 588 chips, and the default hash algorithm based on source and destination addresses is used. However, the default hash algorithm does not use all bits of an IP address; it uses only the left-most bits. In this case, the left-most bits of the NodeB IP addresses are the same, so the hash results are uneven. As a result, the NE40Es forward the tunnel traffic over the POS links unevenly.
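The effect can be illustrated with a toy model. The real hash function of the 588 chip is not described in this case, so everything below is an assumption made for illustration: toy_hash, the 16-bit prefix length, the XOR mixing, and the two-path modulo are all hypothetical. The flows are the source address, destination address, and TTL of the three captured packets.

```python
import ipaddress

def toy_hash(src, dst, ttl=None, prefix_bits=16, paths=2):
    """Hypothetical hash: XOR the left-most bits of both addresses,
    optionally mix in the TTL, then pick one of `paths` paths."""
    s = int(ipaddress.IPv4Address(src)) >> (32 - prefix_bits)
    d = int(ipaddress.IPv4Address(dst)) >> (32 - prefix_bits)
    key = s ^ d
    if ttl is not None:
        key ^= ttl
    return key % paths

# (source, destination, TTL) taken from the three captures above.
flows = [
    ("10.161.220.148", "10.161.235.9", 254),
    ("10.161.17.94", "10.67.122.14", 253),
    ("10.161.235.9", "10.161.111.44", 63),
]

prefix_only = [toy_hash(s, d) for s, d, _ in flows]
with_ttl = [toy_hash(s, d, ttl) for s, d, ttl in flows]

print("left-most bits only:", prefix_only)
print("with TTL mixed in:  ", with_ttl)
```

In this toy model, all three flows land on the same path when only the left-most address bits are hashed, and they spread over both paths once the TTL is mixed in. The exact split depends entirely on the toy design and says nothing about the distribution on the real hardware.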

Solution

Run the load-balance avoid-degradation ipv4 slot 8 command so that the LPUF-21 in slot 8 also hashes the TTL value. After the command is run, the output rates of the two tunnels are as follows.

<ALXP-RNC-NE40E-01>display interface Tunnel0/0/500 | include rate

    300 seconds output rate 33602392 bits/sec, 37530 packets/sec

    12 seconds output rate 35500928 bits/sec, 39650 packets/sec

<ALXP-RNC-NE40E-01>display interface Tunnel0/0/501 | include rate

    300 seconds output rate 38856632 bits/sec, 40216 packets/sec

    1 seconds output rate 29764184 bits/sec, 30806 packets/sec

The command output shows that, although the total traffic volume over the two tunnels has increased, the traffic is now load balanced over the two tunnels. Therefore, the problem is resolved.
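The balance can be quantified from the rate lines with a small sketch. The regular expression matches only the line format shown in the output above; output_rates and the variable names are illustrative.

```python
import re

RATE_RE = re.compile(r"(\d+) seconds output rate (\d+) bits/sec, (\d+) packets/sec")

def output_rates(text):
    """Return {interval_in_seconds: bits_per_second} for each rate line."""
    return {int(m.group(1)): int(m.group(2)) for m in RATE_RE.finditer(text)}

tunnel_500 = """\
    300 seconds output rate 33602392 bits/sec, 37530 packets/sec
    12 seconds output rate 35500928 bits/sec, 39650 packets/sec
"""
tunnel_501 = """\
    300 seconds output rate 38856632 bits/sec, 40216 packets/sec
    1 seconds output rate 29764184 bits/sec, 30806 packets/sec
"""

# Compare the 300-second averages, which cover the same interval on both tunnels.
rate_500 = output_rates(tunnel_500)[300]
rate_501 = output_rates(tunnel_501)[300]
share_500 = rate_500 / (rate_500 + rate_501)

print(f"Tunnel0/0/500 carries {share_500:.1%} of the traffic")
print(f"Tunnel0/0/501 carries {1 - share_500:.1%} of the traffic")
```

Based on the 300-second averages, the split is roughly 46% to 54%, compared with 43 Mbit against 18 Mbit (roughly 70% to 30%) before the change.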

Suggestions
1. To implement route load balancing or load balancing based on forwarding tables, analyze the traffic model and optimize the hash algorithm.

2. In this case, the RNC is directly connected to the LPUF-10 on ALXP-1. The hash algorithm on the LPUF-10, which includes the source port and destination port, is better than that on the LPUF-21. Therefore, switch the traffic on the RNC to the active link connected to the NE40E on ALXP-1 so that the traffic is sent to the LPUF-10 for processing. This prevents traffic from being forwarded unevenly.
