The topology is as follows: the Cisco core switch is the gateway, the CE12800 is the aggregation switch, and the CE6851 is the access switch. The CE12800 connects to the Cisco core switch through Eth-Trunk 501, which has two member ports.
The problem is that service across the network suddenly went down completely. After a few minutes, service was restored.
1. The CE12800 log shows that the STP root bridge changed, which indicates a topology change in the network.
Mar 7 2017 10:34:33+03:00 CE12800 %%01MSTP/4/MSTPLOG_PROROOT_CHANGED(l):CID=0x80542723;The root bridge of MSTP process changed. (ProcessID=0, InstanceID=0, RootPortName=-, PreviousRootBridgeID=6c50-4dae-7d40, NewRootBridgeID=34a2-a2f5-8d21), RootPwName=-)
2. The Cisco core switch log shows that ports Gi2/1/5 and Gi1/1/16 were put into the err-disabled state because UDLD (a Cisco proprietary protocol) detected a neighbor mismatch. UDLD err-disabled all the ports connecting to the CE12800, so service was affected.
Mar 7 2017 10:34:40.782 GMT: %UDLD-SW2-4-UDLD_PORT_DISABLED: UDLD disabled
Mar 7 2017 10:34:41.754 GMT: %UDLD-SW2-4-UDLD_PORT_DISABLED: UDLD disabled interface Gi1/1/16, neighbor mismatch detected.
Mar 7 2017 10:34:40.782 GMT: %PM-SW2-4-ERR_DISABLE: udld error detected on Gi2/1/5, putting Gi2/1/5 in err-disable state
Mar 7 2017 10:34:41.754 GMT: %PM-SW2-4-ERR_DISABLE: udld error detected on Gi1/1/16, putting Gi1/1/16 in err-disable state
3. About 10 minutes later, the err-disable recovery timer on the Cisco core switch expired and the ports that UDLD had err-disabled were re-enabled. Eth-Trunk 501 then recovered and service resumed.
Mar 7 2017 10:44:41.814 GMT: %PM-SW2-4-ERR_RECOVER: Attempting to recover from udld err-disable state on Gi1/1/16
Mar 7 2017 10:44:41.926 GMT: %PM-SW1_STBY-4-ERR_RECOVER: Attempting to recover from udld err-disable state on Gi1/1/16
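The outage window can be cross-checked directly from the Cisco log lines above. The following is a minimal Python sketch, not part of the original case handling: the function name `recovery_delays`, the regular expression, and the assumption that every log line follows the timestamp format shown above are illustrative choices. It pairs each ERR_DISABLE event with the first ERR_RECOVER event on the same port and reports the delay in seconds.

```python
import re
from datetime import datetime

# Log lines copied from the Cisco core switch output quoted above.
LOG_LINES = [
    "Mar 7 2017 10:34:40.782 GMT: %PM-SW2-4-ERR_DISABLE: udld error detected on Gi2/1/5, putting Gi2/1/5 in err-disable state",
    "Mar 7 2017 10:34:41.754 GMT: %PM-SW2-4-ERR_DISABLE: udld error detected on Gi1/1/16, putting Gi1/1/16 in err-disable state",
    "Mar 7 2017 10:44:41.814 GMT: %PM-SW2-4-ERR_RECOVER: Attempting to recover from udld err-disable state on Gi1/1/16",
    "Mar 7 2017 10:44:41.926 GMT: %PM-SW1_STBY-4-ERR_RECOVER: Attempting to recover from udld err-disable state on Gi1/1/16",
]

# Assumed line layout: "<Mon D YYYY HH:MM:SS.mmm> GMT: %PM-<unit>-<sev>-ERR_<EVENT>: ... on <port>"
PATTERN = re.compile(
    r"^(?P<ts>\w{3} +\d+ \d{4} \d{2}:\d{2}:\d{2}\.\d{3}) GMT: "
    r"%PM-\S+-ERR_(?P<event>DISABLE|RECOVER): .*?on (?P<port>Gi[\d/]+)"
)


def recovery_delays(lines):
    """Return {port: seconds between err-disable and the first recovery attempt}."""
    disabled = {}  # port -> time it was err-disabled
    delays = {}
    for line in lines:
        match = PATTERN.match(line)
        if not match:
            continue  # skip lines that are not PM err-disable/recover events
        ts = datetime.strptime(match["ts"], "%b %d %Y %H:%M:%S.%f")
        port = match["port"]
        if match["event"] == "DISABLE":
            disabled[port] = ts
        elif port in disabled:
            # Only the first recovery message per port is counted; the
            # duplicate from the standby unit is ignored.
            delays[port] = (ts - disabled.pop(port)).total_seconds()
    return delays


if __name__ == "__main__":
    for port, seconds in recovery_delays(LOG_LINES).items():
        print(f"{port}: recovered after {seconds:.1f} s")
```

For the lines quoted here, Gi1/1/16 shows a delay of roughly 600 seconds, which is consistent with the 10-minute interruption. Gi2/1/5 has no recovery message in the excerpt, so it is not reported.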
Cisco UDLD is configured on the ports connecting to the CE12800 (Eth-Trunk 501). This is not recommended because UDLD is a Cisco proprietary protocol; on a link to a non-Cisco device it can cause abnormal link status and, as a result, service interruption.
Removed the UDLD configuration on the Cisco core switch side.
The CE12800 CSS should be dual-homed to the Cisco core switches. Currently, CE12800-2 has no links to the Cisco core switches, so deploying such links is recommended.