The customer reported packet loss and service impact in the network. The topology is shown below:
1. According to the customer's feedback, a wireless user (IP: 10.44.96.168) experienced packet loss when pinging its gateway (CORE-01).
Checking switch CORE-01, we found that the ARP entry for 10.44.96.168 should be learned from Eth-trunk 30, but it was sometimes learned from Eth-trunk 1. When that happens, the core switch sends its replies out of the wrongly learned interface toward the peer device instead of back to the user, which causes the packet loss. In a stable network, an ARP entry is always learned from the same port. However, the ARP track information showed that a large number of ARP entries had at some point been learned from Eth-trunk 1, so we suspected a loop or flapping in the network.
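The check described above can be sketched in a few lines of Python. This is only an illustration, assuming the ARP track history has already been exported as (IP, interface) pairs; the function name `find_unstable_arp`, the sample list, and the second IP address are hypothetical and do not come from the device output.

```python
from collections import defaultdict

def find_unstable_arp(observations):
    """Return {ip: sorted interfaces} for IPs learned on more than one interface."""
    seen = defaultdict(set)
    for ip, interface in observations:
        seen[ip].add(interface)
    # A stable entry is always learned on exactly one interface.
    return {ip: sorted(ifs) for ip, ifs in seen.items() if len(ifs) > 1}

# Hypothetical samples standing in for the ARP track history of CORE-01.
observations = [
    ("10.44.96.168", "Eth-Trunk30"),
    ("10.44.96.168", "Eth-Trunk1"),
    ("10.44.96.168", "Eth-Trunk30"),
    ("10.44.96.20", "Eth-Trunk30"),  # made-up host that stays on one port
]

unstable = find_unstable_arp(observations)
print(unstable)
```

Any IP that shows up in the result has moved between interfaces and is worth tracing further, as is done in the following steps.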
2. Check the log information of CORE-01:
We found MAC address flapping in VLAN 1, and every flapping event involved Eth-trunk 1. We can therefore deduce that the packets were looped back through Eth-trunk 1, so the peer device, CORE-02, needs to be checked.
3. Checking the log of switch CORE-02, we found that every flapping event there involved port GE1/0/47.
Based on this information, the packets that caused the MAC flapping were looped back through port GE1/0/47, which means there is a loop behind GE1/0/47. The peer device connected to GE1/0/47 must be checked.
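The reasoning used on CORE-01 and CORE-02 (find the port that appears in every flapping event) can be expressed as a small set intersection. The sketch below assumes the MAC flapping log entries have already been reduced to records holding the VLAN and the ports the MAC address moved between; the record layout, the function name `common_flapping_ports`, and every port other than GE1/0/47 are hypothetical.

```python
def common_flapping_ports(events):
    """Return the set of ports that appear in every flapping event."""
    if not events:
        return set()
    common = set(events[0]["ports"])
    for event in events[1:]:
        # Keep only ports that are also present in this event.
        common &= set(event["ports"])
    return common

# Hypothetical records standing in for the MAC flapping log of CORE-02.
events = [
    {"vlan": 1, "ports": ("GE1/0/47", "GE1/0/1")},
    {"vlan": 1, "ports": ("GE1/0/2", "GE1/0/47")},
    {"vlan": 1, "ports": ("GE1/0/47", "Eth-Trunk1")},
]

suspects = common_flapping_ports(events)
print(suspects)
```

If exactly one port survives the intersection, the loop is most likely behind that port; an empty result would suggest more than one loop source or incomplete logs.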
4. The description of port GE1/0/47 is BH-02, which shows that it connects to a device named BH-02. This port is a member of VLAN 1.
BH-02 is a router. Through L2VPN deployed on BH-02, the switch connects to many office sites in different locations. The office sites are isolated from one another, so there is no loop between them. We can therefore deduce that the loop is inside one of the sites.
Because LSW ports are added to VLAN 1 by default, the most likely cause is a loop within one office site created by incorrect cabling. The customer needs to check the office sites.
The flapping and loop diagram: