Traffic on TE Tunnels Is Uneven Due to a Defect in the Hash Algorithm on LPUF-21 Boards

Publication Date: 2013-09-29
Issue Description

As shown in the following topology, NE40E-X8s were deployed at the ALXP and DMN-TE sites. These sites were connected using POS links. GE ports were bound to form an Eth-Trunk at each site. The RNC and NodeBs were connected to the sites through VRRP. Full-mesh MP-BGP neighbor relationships were established between the four NE40Es for transmitting VPN routes from the wireless network. There were two TE tunnels from ALXP-1 to DMN-TE-1. One was ALXP-1 -> DMN-TE-1, and the other was ALXP-1 -> ALXP-2 -> DMN-TE-2 -> DMN-TE-1. Load sharing was implemented on the two TE tunnels. Similarly, there were two TE tunnels from ALXP-1 to DMN-TE-2, and load sharing was implemented on them. Each of the other three routers also had two TE tunnels to each router at the peer site, with load sharing implemented.

[Figure: network topology (image not included)]

A tunnel policy bound multiple TE tunnels so that VPN traffic was shared among them. The traffic from the RNC to the ALXP sites was shared by the POS links between ALXP-1 and DMN-TE-1 and between ALXP-2 and DMN-TE-2 before it reached the DMN-TE sites. However, the customer complained that the traffic on the POS link between ALXP-1 and DMN-TE-1 was twice the traffic on the POS link between ALXP-2 and DMN-TE-2.
Handling Process

Huawei performed the following operations to diagnose the fault:

1. Analyzed the traffic model on the POS links. Because the network carried Layer 3 VPN traffic whose outer labels were TE tunnel labels, Huawei ran the display interface tunnel x/x/x command to check the traffic distribution on the TE tunnels.

[Figure: traffic distribution on the TE tunnels (image not included)]

As shown in the preceding figure, the traffic on the POS1/2/1 link between ALXP-1 and DMN-TE-1 was 60 Mbit/s (43 + 0 + 17 + 0 = 60), while the traffic on the POS1/2/1 link between ALXP-2 and DMN-TE-2 was 33 Mbit/s (18 + 0 + 15 + 0 = 33). The former was almost twice the latter. The traffic carried by tunnel0/0/500 and tunnel0/0/501 on ALXP-1 differed greatly: 43 Mbit/s and 18 Mbit/s respectively. It was therefore concluded that the hash algorithm on the NE40E at ALXP-1 was not distributing traffic evenly.

2. Analyzed the traffic from the RNC to ALXP-1.

[Figure: traffic path from the RNC to ALXP-1 (image not included)]

The active board on the RNC was connected to the NE40E at ALXP-2, but that NE40E was the backup router in the VRRP group. Therefore, after reaching ALXP-2, traffic was transmitted to the NE40E at ALXP-1 through the Layer 2 VLAN channels in the Eth-Trunk. Huawei checked the traffic on the member ports GE8/0/0 and GE1/0/3 of the Eth-Trunk at ALXP-1 and found that most traffic entered ALXP-1 through port GE8/0/0, as shown in the following output. (The NE40Es at the ALXP sites were also connected to several other NE40E sites that are not drawn in the figure. Therefore, the total traffic carried by the Eth-Trunk was greater than the total traffic on the TE tunnels from the ALXP sites to the DMN-TE sites.)

Eth-Trunk0                 up   up        12% 9.73%         0         0

 GigabitEthernet1/0/3     up   down    7.81%   10%         0         0

 GigabitEthernet8/0/0     up   down      18% 9.39%         0         0

Running the display device 8 command showed that the board on the NE40E at ALXP-2 was an LPUF-21-A board. This board used the 588 chip, which had a defect in its hash algorithm. Therefore, the hash algorithm had to be optimized to resolve this issue.

3. Captured packets on the routers using commands provided by R&D engineers to analyze the packets sent from the RNC to the NodeBs. The TTL values carried by packets sent from the RNC to the NE40E at ALXP-1 and then to different NodeBs differed (some were even numbers and some were odd numbers).

[ALXP-RNC-NE40E-01-hidecmd]display pe-probe 8 0 iphp-data

 01480000 00005e00 013f0025 9e9b0007

 8100e03f 080045b8 0021e22b 0000fe11 //TTL value

 4fc70aa1 dc940aa1 eb091422 f6150039 //Source and destination IP addresses

……

[ALXP-RNC-NE40E-01-hidecmd]display pe-probe 8 0 iphp-data

 01880000 00005e00 013f0025 d4630223

 8100032c 080045b8 00258d39 0000fd11

 b8760aa1 115e0a43 7a0e07d2 0400011d

……

[ALXP-RNC-NE40E-01-hidecmd]display pe-probe 8 0 iphp-data

 01580000 00005e00 0128286e d4630223

 81002006 08004528 0053057b 00003f11

 06800aa1 eb090aa1 6f2c4306 13b4003f

……
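
The captures above can be decoded offline to confirm the TTL spread. The following Python sketch is illustrative only: it assumes each dump is a raw Ethernet frame in which the EtherType 0x0800 is immediately followed by a 20-byte IPv4 header starting with 0x45, which is consistent with the TTL and address annotations in the first capture. The function name parse_ipv4_fields and the CAPTURES list are hypothetical helpers, not part of any Huawei tool.

```python
import ipaddress

# Hex words copied from the three captures above (the elided tail is not needed).
CAPTURES = [
    "01480000 00005e00 013f0025 9e9b0007 8100e03f 080045b8 "
    "0021e22b 0000fe11 4fc70aa1 dc940aa1 eb091422 f6150039",
    "01880000 00005e00 013f0025 d4630223 8100032c 080045b8 "
    "00258d39 0000fd11 b8760aa1 115e0a43 7a0e07d2 0400011d",
    "01580000 00005e00 0128286e d4630223 81002006 08004528 "
    "0053057b 00003f11 06800aa1 eb090aa1 6f2c4306 13b4003f",
]


def parse_ipv4_fields(dump: str) -> dict:
    """Extract TTL, protocol, and addresses from a hex dump of a frame.

    Assumption: the IPv4 header is located by searching for the EtherType
    0x0800 followed by the version/IHL byte 0x45.
    """
    raw = bytes.fromhex("".join(dump.split()))
    marker = raw.find(b"\x08\x00\x45")
    if marker < 0:
        raise ValueError("no IPv4 header found in dump")
    ip = raw[marker + 2:]  # skip the 2-byte EtherType
    if len(ip) < 20:
        raise ValueError("dump is too short to hold an IPv4 header")
    return {
        "ttl": ip[8],
        "protocol": ip[9],
        "src": str(ipaddress.IPv4Address(ip[12:16])),
        "dst": str(ipaddress.IPv4Address(ip[16:20])),
    }


for dump in CAPTURES:
    fields = parse_ipv4_fields(dump)
    parity = "even" if fields["ttl"] % 2 == 0 else "odd"
    print(fields["src"], "->", fields["dst"], "TTL", fields["ttl"], parity)
```

Under these assumptions the three packets carry TTL values 254, 253, and 63, that is, one even and two odd values, which matches the observation in step 3.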
Root Cause
Traffic from the RNC to the NodeBs traversed the LPUF-21 boards on the NE40Es at the ALXP sites. The LPUF-21 boards used the 588 chip. By default, the chip used a 2-tuple hash algorithm based on the source and destination IP addresses. However, the algorithm did not use all the bits of the IP addresses; it used only the most significant bits. The most significant bits of the NodeB IP addresses were almost the same, so most flows produced the same hash result. As a result, traffic was distributed unevenly across the TE tunnels and therefore across the POS links.
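
The effect can be illustrated with a toy model. The following Python sketch is not the algorithm of the 588 chip, whose details are not given in this case. It assumes a hypothetical hash that XORs only the upper 16 bits of the source and destination addresses, and it uses made-up flows in the 10.161.0.0/16 range with alternating TTL values. It only shows why addresses that share their most significant bits collapse onto one path, and why mixing in a field that varies per flow (here the TTL) spreads them.

```python
import ipaddress
from collections import Counter


def msb_hash(src: str, dst: str, ttl=None, paths: int = 2) -> int:
    """Toy hash (an assumption): XOR of the top 16 bits of each address.

    If ttl is given, it is XORed into the key as well.
    """
    key = (int(ipaddress.IPv4Address(src)) >> 16) ^ (
        int(ipaddress.IPv4Address(dst)) >> 16
    )
    if ttl is not None:
        key ^= ttl
    return key % paths


# Hypothetical flows: one source, eight destinations that share the
# same upper 16 bits, with TTL alternating between 63 and 64.
flows = [
    ("10.161.235.9", f"10.161.111.{host}", 63 if host % 2 else 64)
    for host in range(1, 9)
]

without_ttl = Counter(msb_hash(src, dst) for src, dst, _ in flows)
with_ttl = Counter(msb_hash(src, dst, ttl) for src, dst, ttl in flows)

print("2-tuple on MSBs only:", dict(without_ttl))
print("with TTL mixed in:   ", dict(with_ttl))
```

In this toy model all eight flows land on path 0 when only the upper address bits are hashed, and they split four and four once the TTL participates.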
Solution

Huawei ran the load-balance avoid-degradation ipv4 slot 8 command on the LPUF-21 board to make the TTL values participate in the hash algorithm. Then the traffic on the two tunnels at ALXP-1 was as follows: 

<ALXP-RNC-NE40E-01>display interface Tunnel0/0/500 | include rate

   300 seconds output rate 33602392 bits/sec, 37530 packets/sec

12 seconds output rate 35500928 bits/sec, 39650 packets/sec

<ALXP-RNC-NE40E-01>display interface Tunnel0/0/501 | include rate

300 seconds output rate 38856632 bits/sec, 40216 packets/sec

1 seconds output rate 29764184 bits/sec, 30806 packets/sec

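According to the preceding information, the total traffic on the two tunnels was higher than before because the measurement was taken during busy hours, but the load was now shared between the tunnels (about 33.6 Mbit/s and 38.9 Mbit/s averaged over 300 seconds).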
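
The improvement can be quantified by comparing the ratio between the busier and the less busy tunnel before and after the change. The following Python sketch is an illustration only: it parses the 300-second lines from the output above and uses the 43 Mbit/s and 18 Mbit/s figures from diagnosis step 1 as the baseline. The helper names output_rate_bps and imbalance are hypothetical.

```python
import re

# 300-second average lines copied from the display output above.
POST_FIX_OUTPUT = {
    "Tunnel0/0/500": "300 seconds output rate 33602392 bits/sec, 37530 packets/sec",
    "Tunnel0/0/501": "300 seconds output rate 38856632 bits/sec, 40216 packets/sec",
}

RATE_RE = re.compile(r"(\d+) seconds output rate (\d+) bits/sec")


def output_rate_bps(line: str) -> int:
    """Return the output rate in bits/sec from one 'output rate' line."""
    match = RATE_RE.search(line)
    if match is None:
        raise ValueError(f"unexpected line format: {line!r}")
    return int(match.group(2))


def imbalance(rates) -> float:
    """Ratio of the busiest member to the least busy one (1.0 = even)."""
    return max(rates) / min(rates)


before = imbalance([43, 18])  # Mbit/s on tunnel0/0/500 and tunnel0/0/501
after = imbalance([output_rate_bps(line) for line in POST_FIX_OUTPUT.values()])

print(f"imbalance before: {before:.2f}, after: {after:.2f}")
```

With these figures the ratio drops from roughly 2.4 to roughly 1.16.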
Suggestions

1. If load sharing is implemented for routes on the control plane or in the label forwarding table but traffic is not actually shared, analyze the traffic model and develop a hash algorithm optimization policy.

2. In this case, the RNC is directly connected to the LPUF-10 board at ALXP-1. The hash algorithm on the LPUF-10 board is better than that on the LPUF-21 board (specifically, the source and destination port numbers participate in the algorithm). If traffic traverses the LPUF-10 board at ALXP-1, the issue described in this case will not occur. Therefore, it is recommended that the link from the RNC to the NE40E at ALXP-1 be the active link so that traffic is transmitted directly to the LPUF-10 board instead of traversing the LPUF-21 board.
