The possible causes of the large number of duplicate packets were as follows:
1. TCP packets were discarded on the service layer.
2. The carrier network sent duplicate TCP packets.
Huawei further identified the cause of the issue:
1. Found that the duplicate ACK packets had the same IP identification field, which indicated that they were copies of a single packet rather than separate retransmissions. Therefore, suspected that the duplicate packets were generated during transmission and that the issue was most probably caused by a fault on the carrier network.
2. Checked the traffic transmit and receive paths and found that they were different, as indicated by the red and blue lines respectively. It is common for traffic to be transmitted and received over different paths on live networks. Therefore, Huawei did not focus the analysis on this aspect.
3. Checked the forwarding entries for the traffic receive path and found that NE40E-1 did not have a MAC forwarding entry for the SGSN.
Continued to analyze how the traffic was forwarded on NE40E-1 and NE40E-2:
(1) VLANIF 2020 on NE40E-1 and NE40E-2 and the SGSN belonged to the same network segment, which meant that they were in the same broadcast domain. Therefore, when the return traffic arrived at NE40E-2, NE40E-2 encapsulated the packets with the MAC address of the SGSN as the destination MAC address based on its ARP entry and then sent the packets out through Eth-trunk 2.
(2) When the return traffic arrived at NE40E-1, the packets were forwarded at Layer 2 based on the MAC forwarding table.
(3) Checked and found that NE40E-1 did not have the MAC forwarding entry corresponding to the SGSN. NE40E-1 therefore treated the packets as unknown unicast traffic, replicated them, and sent them out through three ports. As a result, the SGSN received a large number of duplicate TCP packets.
4. Analyzed why NE40E-1 did not have the MAC forwarding entry.
(1) A MAC forwarding entry is learned from the source MAC address and ingress port of a packet when the packet is forwarded at Layer 2.
(2) When a packet was transmitted to NE40E-1, Layer 2 was terminated on NE40E-1 and the packet was forwarded at Layer 3. Therefore, no MAC forwarding entry corresponding to the packet could be created on NE40E-1.
(3) The SGSN, however, sent an ARP request packet every second for address resolution, and the ARP request packet should have been broadcast by NE40E-1 to NE40E-2 over Eth-trunk 2. This broadcast should also have allowed the MAC forwarding entry to be learned, but the learning failed.
(4) Checked the ARP entries on NE40E-1 and NE40E-2. Found that the ARP entry aging time stayed at 20 minutes on NE40E-1 but kept decreasing on NE40E-2, which indicated that the ARP request packets were not reaching NE40E-2. Huawei concluded that the ARP request packet was not broadcast by NE40E-1 over Eth-trunk 2.
(5) Performed an Ethernet packet debugging test on NE40E-1 and found that the ARP request packet sent by the SGSN was a unicast packet whose destination MAC address was the virtual MAC address of the VRRP group. The unicast ARP request packet was therefore terminated at NE40E-1 without being forwarded at Layer 2. As a result, NE40E-1 failed to learn the MAC forwarding entry corresponding to the SGSN.
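
The forwarding behavior described in steps 3 and 4 can be illustrated with a small Python sketch. It is a simplified toy model and not the NE40E implementation: the class ToyBridge, the function replay, all MAC addresses, the VRRP virtual MAC value, and the port names other than Eth-trunk 2 are hypothetical. The model assumes, as this case describes, that a frame addressed to a locally terminated MAC address (such as the VRRP virtual MAC address) is handled at Layer 3 and creates no MAC forwarding entry, and that a frame with an unknown destination MAC address is flooded to all ports except the ingress port.

```python
from dataclasses import dataclass, field

BROADCAST = "ff:ff:ff:ff:ff:ff"


@dataclass
class ToyBridge:
    """Toy Layer 2 forwarding model of one device (hypothetical, simplified)."""
    ports: list
    local_macs: set = field(default_factory=set)   # MACs terminated locally
    mac_table: dict = field(default_factory=dict)  # learned MAC -> port

    def receive(self, src_mac, dst_mac, in_port):
        """Return the ports the frame is sent out of at Layer 2."""
        if dst_mac in self.local_macs:
            # Assumption from the case: the frame is terminated and handled
            # at Layer 3, so no MAC forwarding entry is learned from it.
            return []
        # Frame is forwarded at Layer 2: learn source MAC and ingress port.
        self.mac_table[src_mac] = in_port
        if dst_mac != BROADCAST and dst_mac in self.mac_table:
            return [self.mac_table[dst_mac]]  # known unicast: one port
        # Broadcast or unknown unicast: flood to every other port.
        return [p for p in self.ports if p != in_port]


# Hypothetical addresses and port names, for illustration only.
SGSN_MAC = "00:11:22:33:44:55"
NE40E_2_MAC = "00:aa:bb:cc:dd:02"
VRRP_VMAC = "00:00:5e:00:01:01"
PORTS = ["Eth-trunk2", "port-a", "port-b", "port-c"]


def replay(arp_dst_mac):
    """Send one ARP request from the SGSN, then one return packet from
    NE40E-2, and report where the model of NE40E-1 sends the return packet."""
    ne40e_1 = ToyBridge(ports=list(PORTS), local_macs={VRRP_VMAC})
    ne40e_1.receive(SGSN_MAC, arp_dst_mac, "port-a")
    return ne40e_1.receive(NE40E_2_MAC, SGSN_MAC, "Eth-trunk2")


print("unicast ARP to VRRP virtual MAC:", replay(VRRP_VMAC))
print("broadcast ARP:", replay(BROADCAST))
```

In this model, an ARP request sent as a unicast packet to the VRRP virtual MAC address leaves the MAC forwarding table empty, so the return packet is flooded to three ports. A broadcast ARP request lets the model learn the SGSN entry, and the return packet leaves through a single port.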