Huawei S Series Campus Switches Troubleshooting Guide (V100 and V200)
How Do I Analyze Spanning Tree Protocol Failure Causes
STP defines a number of concepts, such as root bridge, root port, designated port, and path cost. These concepts are used to build a loop-free tree topology that blocks redundant links while still providing link backup and path optimization. The algorithm used to build the tree is called the spanning tree algorithm (STA).
These functions are implemented by exchanging bridge protocol data units (BPDUs) between bridges. BPDUs are Layer 2 frames whose destination MAC address is the multicast address 01-80-C2-00-00-00. All bridges that support STP receive and process BPDUs, and the data field of a BPDU carries all the information used for STP calculation. BPDUs are processed and regenerated hop by hop rather than forwarded transparently, so a port that does not support STP simply discards the BPDUs it receives.
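To make the BPDU contents concrete, the following is a minimal Python sketch that checks the destination MAC address named above and decodes the fixed fields of a configuration BPDU. The field layout follows the standard IEEE 802.1D configuration BPDU format (35 bytes, with timer fields in units of 1/256 second); the function and key names are hypothetical and chosen only for illustration, and the sketch assumes the payload starts at the BPDU itself, after the LLC header.

```python
import struct

# Destination MAC address that the text gives for BPDUs (01-80-C2-00-00-00).
STP_MULTICAST_MAC = bytes.fromhex("0180c2000000")

# IEEE 802.1D configuration BPDU, network byte order, 35 bytes in total:
# protocol ID, version, BPDU type, flags, root ID, root path cost,
# bridge ID, port ID, message age, max age, hello time, forward delay.
_CONFIG_BPDU = struct.Struct("!HBBB8sI8sHHHHH")


def is_bpdu_destination(dst_mac: bytes) -> bool:
    """Return True if the frame is addressed to the STP multicast MAC."""
    return dst_mac == STP_MULTICAST_MAC


def parse_config_bpdu(payload: bytes) -> dict:
    """Decode the fixed fields of a configuration BPDU (illustrative helper)."""
    if len(payload) < _CONFIG_BPDU.size:
        raise ValueError("payload too short for a configuration BPDU")
    (protocol_id, version, bpdu_type, flags, root_id, root_path_cost,
     bridge_id, port_id, message_age, max_age, hello_time,
     forward_delay) = _CONFIG_BPDU.unpack(payload[:_CONFIG_BPDU.size])
    return {
        "protocol_id": protocol_id,
        "version": version,
        "bpdu_type": bpdu_type,
        "flags": flags,
        "root_id": root_id.hex(),
        "root_path_cost": root_path_cost,
        "bridge_id": bridge_id.hex(),
        "port_id": port_id,
        # Timer fields are encoded in units of 1/256 second.
        "message_age_s": message_age / 256,
        "max_age_s": max_age / 256,
        "hello_time_s": hello_time / 256,
        "forward_delay_s": forward_delay / 256,
    }
```

A capture taken on a suspect port can be run through such a decoder to confirm that BPDUs are still arriving and that the advertised root ID and timers match expectations.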
STP can fail in some cases, for example, when the network design is improper. If STP fails, a network loop may occur. The following sections describe common causes of STP failure.
Duplex Mismatch
Duplex mismatch on a point-to-point link is a very common configuration error. As shown in Figure 21-56, SwitchA serves as the root bridge. Port1 on SwitchA works in half-duplex mode, whereas Port2 on SwitchB works in full-duplex mode. This duplex mismatch can lead to a network loop. Because Port2 works in full-duplex mode, it can receive and transmit data at the same time, so it keeps sending data even while Port1 is sending.
This is a problem for SwitchA. Because Port1 of SwitchA works in half-duplex mode, it can either send or receive at any given moment, but not both. As a result, every BPDU that SwitchA sends is deferred or collides with incoming traffic and is eventually dropped.
From the STP point of view, SwitchB no longer receives BPDUs from SwitchA and therefore concludes that the root bridge has been lost. SwitchB then unblocks its port connected to SwitchC, which creates a loop.
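The delay between the loss of BPDUs and the appearance of the loop can be estimated with a short Python sketch. It assumes classic STP behavior, in which the stored root information expires after Max Age and the port then spends one Forward Delay in each of the Listening and Learning states; RSTP and MSTP age out information differently. The function name is hypothetical, and the default values are the STP defaults of 20 seconds and 15 seconds.

```python
def seconds_until_forwarding(max_age: float = 20.0,
                             forward_delay: float = 15.0,
                             message_age: float = 0.0) -> float:
    """Estimate the time from the last received BPDU until a blocked port forwards.

    Assumes classic STP timers: the stored information expires after the
    remaining Max Age, then the port stays in Listening and in Learning
    for one Forward Delay each.
    """
    aging = max_age - message_age   # stored BPDU information times out
    listening = forward_delay       # Listening state
    learning = forward_delay        # Learning state
    return aging + listening + learning
```

With the default timers the result is 50 seconds, so a loop caused by lost BPDUs typically shows up some time after the underlying fault begins, not immediately.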
Unidirectional Link
A unidirectional link is a common cause of an STP loop. In Figure 21-57, suppose that the link between SwitchA and SwitchB is unidirectional, that is, traffic can be transmitted only from SwitchB to SwitchA. Assume that Port1 of SwitchB was blocked before the link became unidirectional. A port remains blocked only while it keeps receiving BPDUs from a bridge that has a higher priority. Because all BPDUs from SwitchA are now lost, SwitchB eventually moves Port1, which faces SwitchA, to the Forwarding state and starts forwarding traffic. This creates a loop, and STP does not converge correctly.
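The blocking rule described above can be sketched in Python as a comparison of BPDU priority vectors. In standard STP, a numerically lower vector (root ID, then root path cost, then sender bridge ID, then sender port ID) is the better one, which is what "higher priority" means here. The class, helper names, priorities, MAC addresses, and path cost below are hypothetical values chosen for illustration only.

```python
from typing import NamedTuple, Optional


class PriorityVector(NamedTuple):
    """Fields compared in order; the numerically lower vector is better."""
    root_id: int
    root_path_cost: int
    bridge_id: int
    port_id: int


def bridge_id(priority: int, mac: str) -> int:
    """Build a 64-bit bridge ID from a 16-bit priority and a MAC address."""
    return (priority << 48) | int(mac.replace("-", ""), 16)


def port_stays_blocked(received: Optional[PriorityVector],
                       own: PriorityVector) -> bool:
    """A port stays blocked only while a better BPDU is still being received.

    `received` is None once the stored BPDU information has aged out,
    for example because the link has become unidirectional.
    """
    return received is not None and received < own


# Hypothetical values: SwitchA is the root, SwitchB has the default priority.
switch_a = bridge_id(0, "00-00-00-00-00-0a")
switch_b = bridge_id(32768, "00-00-00-00-00-0b")
from_switch_a = PriorityVector(switch_a, 0, switch_a, 0x8001)
switch_b_own = PriorityVector(switch_a, 20000, switch_b, 0x8001)
```

While `from_switch_a` is still being received, `port_stays_blocked(from_switch_a, switch_b_own)` is true; once the BPDUs are lost and `received` becomes `None`, it is false and the port moves toward forwarding.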
To detect unidirectional links before a loop occurs, Huawei switches support the Device Link Detection Protocol (DLDP). On an STP network, DLDP can detect unidirectional links and break the resulting loops by automatically disabling the affected ports or prompting users to disable them manually.
For details about DLDP, see DLDP Configuration in product documentation.
Packet Corruption
Faulty cables or cables of an incorrect length can corrupt packets. If BPDUs are corrupted and dropped as a result, STP may fail.
High CPU Usage
If CPU usage is too high for any reason, the spanning tree algorithm (STA) and STP calculation are affected.
Improper STP Parameter Tuning and Diameter Issues
STP uses three timers: Hello Time, Max Age, and Forward Delay. Their default values are 2 seconds, 20 seconds, and 15 seconds, respectively. As a general rule, you are advised to run the stp bridge-diameter command to configure the network diameter rather than tuning the timers directly. The switch then automatically calculates the optimal Hello Time, Forward Delay, and Max Age values based on the network diameter.
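The relationship between the three timers and the network diameter can be illustrated with a short Python sketch. The consistency check is the standard IEEE 802.1D constraint between the timers. The diameter-based formulas are commonly cited approximations derived from the same standard; they are an assumption here and are not confirmed as the exact calculation that the switch performs. The function names are hypothetical.

```python
import math


def timers_consistent(hello: float, max_age: float, forward_delay: float) -> bool:
    """Check the IEEE 802.1D constraint between the three timers (seconds):
    2 x (Forward Delay - 1) >= Max Age >= 2 x (Hello Time + 1).
    """
    return 2 * (forward_delay - 1) >= max_age >= 2 * (hello + 1)


def timers_for_diameter(diameter: int, hello: int = 2) -> tuple:
    """Approximate (hello, max_age, forward_delay) for a network diameter.

    Assumption: commonly cited 802.1D-derived formulas, not necessarily
    the values a particular switch calculates.
    """
    max_age = 4 * hello + 2 * diameter - 2
    forward_delay = math.ceil((4 * hello + 3 * diameter) / 2)
    return hello, max_age, forward_delay
```

With a diameter of 7 and a Hello Time of 2 seconds, these formulas give a Max Age of 20 seconds and a Forward Delay of 15 seconds, which match the default values. Manually chosen timers that fail `timers_consistent` are a sign that direct tuning has gone wrong.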
Software Errors
Multiple factors may cause software errors and affect STP convergence.