Packet loss in STP ring

Publication Date: 2013-05-27
Issue Description
S6506R, VRP3.1 release 3132. S3528G, VRP3.1 release 0025p03.
The customer injected bidirectional traffic from four SMB interfaces at 90% of bandwidth into two STP rings at the same time (see the figure below). When a link in one STP ring was broken, a small amount of packet loss appeared in the other STP ring, in one direction only.
Alarm Information
N/A
Handling Process
The underlying cause is that the test VLANs are laid out across both rings, so a topology change in one STP ring can push a momentary burst of flooded traffic into the other ring. If each STP ring has its own VLANs and the link ports in each ring trunk only the VLANs that belong to that ring (no overlap), the packet loss does not occur.
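The "no overlap" condition can be checked mechanically. The following is a minimal sketch, assuming the permitted VLANs of each ring's trunk ports have already been collected by hand into Python sets. The function name, the ring labels, and the placeholder VLAN IDs 601 and 602 are illustrative assumptions, not values taken from the devices in this case.

```python
def overlapping_vlans(ring_trunks):
    """Return the VLAN IDs that are trunked in more than one ring.

    ring_trunks maps a ring name to the set of VLAN IDs permitted on that
    ring's trunk ports (hypothetical input, not read from a device).
    The result maps each shared VLAN ID to the sorted list of rings using it.
    """
    seen = {}
    for ring, vlans in ring_trunks.items():
        for vlan in vlans:
            seen.setdefault(vlan, set()).add(ring)
    return {vlan: sorted(rings) for vlan, rings in seen.items() if len(rings) > 1}


# Layout resembling the failing test: VLANs 501 and 502 span both rings.
shared = overlapping_vlans({"RING1": {501, 502}, "RING2": {501, 502}})

# Layout without overlap: each ring trunks only its own VLAN.
# 601 and 602 are placeholder IDs chosen for illustration.
separate = overlapping_vlans({"RING1": {601}, "RING2": {602}})
```

An empty result means that a MAC flush in one ring cannot flood traffic onto ports of the other ring, which is the condition described above.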
Root Cause
1. We inject bidirectional Layer 3 traffic flows from four SMB interfaces at 90% of bandwidth (90M) into the two STP rings at the same time. In RING 1, the flow enters at AGG-2 g6/0/48 (vlan 502) and leaves at SGRO e0/1 (vlan 501); the reverse direction follows the same path. In RING 2, the flow enters at AGG-1 g6/0/48 (vlan 501) and leaves at CHEW e0/1 (vlan 502); the reverse direction follows the same path. All SMB interfaces are forced to full duplex at 100M.
2. With both rings carrying the bidirectional Layer 3 flows correctly, we raise the traffic load in steps from 50% to 90% and shut down the link between AGG-1 and CHEW. The STP topology in RING 2 changes, so the alternate link between CHEW and CROC moves from blocking to forwarding immediately. The new path for the RING 2 flow should be AGG-1 -> AGG-2 -> CROC -> CHEW; the RING 1 flow keeps its original path.
3. Because the STP topology in RING 2 has changed, the devices send STP TC (topology change) packets. As the standard requires, these cause the old MAC address entries in the network to be deleted so that MAC addresses are re-learned correctly on the new path. Until re-learning completes, the traffic injected from SMB 4 is flooded to the whole VLAN 502 broadcast domain. Port AGG-2 g6/0/48, which connects to SMB 1 in RING 1, receives this momentary flood (90M in volume) from RING 2 because the port belongs to vlan 502.
4. Two traffic streams now arrive at port AGG-2 g6/0/48: the normal 90M traffic from SMB 2 in RING 1, and the 90M of flooded traffic from SMB 4 in RING 2. The total output traffic toward SMB 1 exceeds the maximum bandwidth of AGG-2 g6/0/48 (set to 100M full duplex), and the resulting congestion causes packet loss. That is why a small number of packets are lost from SMB 2 to SMB 1. The packet loss in RING 2 itself is caused by the STP topology change. The reverse traffic in RING 1, from SMB 1 to SMB 2, is forwarded into vlan 501 by the Layer 3 module, so it is not affected by RING 2. That is why the packet loss is unidirectional.
5. If the traffic from each SMB interface is less than 50M, there is no packet loss, because the total traffic does not exceed the 100M maximum bandwidth.
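The arithmetic in steps 4 and 5 can be written out as a short sketch. It assumes an idealized port with no buffering and no framing overhead, so it only shows whether the offered load exceeds the line rate; it does not predict the actual loss count seen in the test. The function name and its parameters are illustrative.

```python
def egress_excess_mbps(normal_mbps, flooded_mbps, port_capacity_mbps=100):
    """Offered load beyond the port capacity in Mbit/s, or 0 if it fits.

    normal_mbps:  steady traffic already leaving through the port
    flooded_mbps: extra traffic flooded to the port after the MAC flush
    """
    offered = normal_mbps + flooded_mbps
    return max(0, offered - port_capacity_mbps)


# Step 4: 90M from SMB 2 plus 90M flooded from SMB 4 on a 100M port.
excess_at_90 = egress_excess_mbps(90, 90)   # 180M offered, 80M over capacity

# Step 5: below 50M per stream the combined load stays within 100M.
excess_at_40 = egress_excess_mbps(40, 40)   # 80M offered, nothing over capacity
```

Under this model the excess exists only while the flood lasts, that is, until the MAC addresses are re-learned on the new path, which is consistent with the small, momentary loss observed.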
Suggestions
The Huawei devices are working properly according to the STP protocol. If AGG-2 were replaced with another vendor's equipment, the same issue would be seen.
