S7700 (V2R7): Packet loss and service impact in the network

Publication Date: 2016-12-31
Issue Description

The customer reported packet loss and service impact in the network. The topology is shown below:

Handling Process

1. According to the customer, some packets were lost when a wireless user (IP not shown) pinged its gateway (CORE-01).

Checking switch CORE-01, we found that the ARP entry for this user should be learned on Eth-Trunk 30, but it was sometimes learned on Eth-Trunk 1. When that happens, the core switch forwards the return packets toward the peer device instead of back to the user, so the user receives no response and packets are lost.

In a stable network, each ARP entry is learned on a fixed port. However, the ARP track information showed that a large number of ARP entries had at some point been learned on Eth-Trunk 1, so we suspected a loop or flapping in the network.
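The check described above, looking for ARP entries that were learned on more than one interface, can be sketched in a few lines of Python. This is only an illustration: the `(ip, interface)` event format, the function name `find_flapping_arp`, and the sample addresses (taken from the documentation range 192.0.2.0/24) are assumptions, not output from the device.

```python
from collections import defaultdict


def find_flapping_arp(events):
    """Return {ip: [interfaces]} for every IP learned on more than one interface.

    `events` is an iterable of (ip, interface) pairs in the order they were
    observed. This input format is hypothetical, not a device log format.
    """
    seen = defaultdict(list)
    for ip, interface in events:
        # Record each interface once, in the order it was first seen.
        if interface not in seen[ip]:
            seen[ip].append(interface)
    # In a stable network every IP maps to exactly one interface.
    return {ip: ifs for ip, ifs in seen.items() if len(ifs) > 1}


# Hypothetical sample data: placeholder addresses, not taken from the case.
events = [
    ("192.0.2.10", "Eth-Trunk30"),
    ("192.0.2.10", "Eth-Trunk1"),
    ("192.0.2.10", "Eth-Trunk30"),
    ("192.0.2.20", "Eth-Trunk30"),
]

print(find_flapping_arp(events))
# → {'192.0.2.10': ['Eth-Trunk30', 'Eth-Trunk1']}
```

An IP that appears in the result has moved between interfaces, which is the symptom that pointed toward a loop in this case.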


2. Check the logs of CORE-01.

We found MAC address flapping in VLAN 1, and every flapping record involved Eth-Trunk 1. We deduced that the packets were being looped back through Eth-Trunk 1, so the peer device, CORE-02, needed to be checked.

3. Check the logs of CORE-02. Here too, every flapping record involved GE1/0/47.

Based on this information, the packets that caused the MAC flapping were looped back through port GE1/0/47, which means the loop was behind GE1/0/47. The peer device connected to GE1/0/47 had to be checked next.
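The reasoning used on CORE-01 and CORE-02 is the same on each hop: find the port that appears in every MAC flapping record, then move to the device behind that port. A minimal Python sketch of that step follows. The record layout `(mac, vlan, ports)`, the function name `common_flapping_ports`, the MAC addresses, and the port Eth-Trunk20 are hypothetical and chosen only for illustration.

```python
def common_flapping_ports(records):
    """Return the set of ports that appear in every flapping record.

    `records` is a list of (mac, vlan, ports) tuples, where `ports` holds
    the port names a MAC address moved between. The layout is hypothetical.
    """
    port_sets = [set(ports) for _mac, _vlan, ports in records]
    if not port_sets:
        return set()
    # A port shared by all records is the likely direction of the loop.
    return set.intersection(*port_sets)


# Hypothetical records for one device; MACs and Eth-Trunk20 are placeholders.
core01_records = [
    ("0000-5e00-5301", 1, ("Eth-Trunk30", "Eth-Trunk1")),
    ("0000-5e00-5302", 1, ("Eth-Trunk20", "Eth-Trunk1")),
]

print(common_flapping_ports(core01_records))
# → {'Eth-Trunk1'}
```

If the result is empty, no single port explains all the flapping and the loop may involve more than one path.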

4. The port description of GE1/0/47 is BH-02, which shows that it connects to a device named BH-02. This port is a member of VLAN 1.

BH-02 is a router. The switch connects to many office sites in different locations through the L2VPN deployed on router BH-02. The office sites are isolated from each other, so there was no loop between them. We can therefore deduce that the loop was inside one site.
Because the ports of a LAN switch are added to VLAN 1 by default, the most likely explanation is a loop inside one site caused by incorrect cabling. The customer needs to check the office sites.

The flapping and loop diagram:

Root Cause
A loop existed behind port GE1/0/47 of switch CORE-02. It caused MAC and ARP flapping, which led to packet loss and service impact.
To find the root cause, it is recommended that the customer check for MAC flapping and port up/down events on the devices in each office site at the time of the issue.
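The recommended check, matching MAC flapping and port up/down events per site against the time of the issue, could be done along these lines once the logs have been collected. This is a sketch under stated assumptions: the `(site, timestamp, kind)` event format, the function name `sites_with_events_near`, the 10-minute window, and all sample values are hypothetical.

```python
from datetime import datetime, timedelta


def sites_with_events_near(events, issue_time, window_minutes=10):
    """Group event kinds by site for events close to the issue time.

    `events` is an iterable of (site, timestamp, kind) tuples. The format
    and the default window are assumptions, not values from the case.
    """
    window = timedelta(minutes=window_minutes)
    suspects = {}
    for site, when, kind in events:
        # Keep only events within the window on either side of the issue.
        if abs(when - issue_time) <= window:
            suspects.setdefault(site, []).append(kind)
    return suspects


# Hypothetical sample data: dates, sites, and event kinds are placeholders.
issue_time = datetime(2016, 12, 1, 10, 0)
events = [
    ("site-A", datetime(2016, 12, 1, 9, 55), "port-down"),
    ("site-A", datetime(2016, 12, 1, 9, 56), "mac-flapping"),
    ("site-B", datetime(2016, 12, 1, 7, 0), "port-up"),
]

print(sites_with_events_near(events, issue_time))
# → {'site-A': ['port-down', 'mac-flapping']}
```

Sites that show both a port state change and MAC flapping near the issue time are the first candidates for an on-site cabling check.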
Optimization suggestions for this issue:
1) Deploy STP separately on the access devices of each site.
2) Deploy single-port loop detection (loop-detect enable in the port view).
3) Configure "stp edged-port enable" on the ports that connect to PCs and servers.
4) A major risk is that a large number of services run in VLAN 1. Because switch ports belong to VLAN 1 by default, incorrect cabling at any office site can easily create a loop. With all services in one VLAN, faults are hard to isolate, and a loop on a single device affects the whole network. We therefore suggest that the customer optimize the network by deploying different services in different VLANs.