Spanning Tree Protocol Issues and Related Design Considerations of S5700 Switches


Introduction

The Spanning Tree Protocol (STP) builds a loop-free tree topology on a Layer 2 network by blocking redundant links, and provides link backup if an active link fails.

This document discusses how to analyze the causes of STP failures, how to troubleshoot a spanning tree fault, and how to design an STP network to avoid faults.

How Do I Analyze Spanning Tree Protocol Failure Causes

STP defines many concepts, for example, root bridge, root port, designated port, and path cost. These concepts are used to construct a tree to cut redundant loops and implement link backup and path optimization. The algorithm used to construct the tree is called the spanning tree algorithm (STA).

The preceding functions are implemented by exchanging bridge protocol data units (BPDUs) between bridges. BPDUs are Layer 2 frames whose destination MAC address is the multicast address 01-80-C2-00-00-00. All bridges that support STP receive and process BPDUs. The data field of a BPDU carries all the information used for STP calculation. BPDUs are exchanged hop by hop between neighboring bridges, and ports that do not support STP discard the BPDUs they receive.
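
The following is a minimal Python sketch of how a script might recognize the BPDU destination address regardless of how a capture tool or a switch prints it. The helper names normalize_mac and is_stp_bpdu_destination are hypothetical and are not part of any Huawei tool.

```python
# Destination MAC address of STP BPDUs, as stated in this document.
STP_BPDU_MAC = "01-80-C2-00-00-00"

def normalize_mac(mac: str) -> str:
    """Return the MAC as 12 lowercase hex digits, ignoring '-', ':' and '.' separators."""
    digits = "".join(ch for ch in mac if ch not in "-:.").lower()
    if len(digits) != 12 or any(ch not in "0123456789abcdef" for ch in digits):
        raise ValueError(f"not a MAC address: {mac!r}")
    return digits

def is_stp_bpdu_destination(mac: str) -> bool:
    """True if the given destination MAC is the STP BPDU multicast address."""
    return normalize_mac(mac) == normalize_mac(STP_BPDU_MAC)
```

For example, is_stp_bpdu_destination("0180-c200-0000") is true, whereas a bridge MAC address such as 00e0-fc0e-a421 does not match.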

STP can fail in specific cases, for example, when the network design is improper. If STP fails, a network loop may occur. The following sections describe the causes of STP failure.

Duplex Mismatch

Duplex mismatch on a point-to-point link is a very common configuration error that can lead to a network loop. As shown in Figure 1-1, SwitchA serves as the root bridge. Port1 on SwitchA works in half-duplex mode, whereas Port2 on SwitchB works in full-duplex mode. Because Port2 works in full-duplex mode, it can receive and transmit data at the same time, so it sends data even while Port1 is sending data.

This is a problem for SwitchA. Because Port1 works in half-duplex mode, it can either send or receive data at any given time, but not both. As a result, the BPDUs that SwitchA sends are deferred or collide, and are eventually dropped.

From an STP point of view, because SwitchB no longer receives BPDUs from SwitchA, SwitchB considers the root bridge lost. SwitchB therefore unblocks its port connected to SwitchC, which creates a loop.

Figure 1-1 Networking diagram of a duplex mismatch

Unidirectional Link

A unidirectional link is a common cause of an STP loop. In Figure 1-2, suppose that the link between SwitchA and SwitchB is unidirectional, that is, traffic can only be transmitted from SwitchB to SwitchA. Assume that Port1 of SwitchB is blocked before the link becomes unidirectional. A port remains blocked only while it receives BPDUs from a bridge that has a higher priority. In this case, because all the BPDUs from SwitchA are lost, SwitchB eventually transitions Port1, which faces SwitchA, to the Forwarding state and forwards traffic. This creates a loop, and STP does not converge correctly.

To detect unidirectional links before a loop occurs, Huawei switches support the Device Link Detection Protocol (DLDP). DLDP detects unidirectional links and breaks the resulting loops by automatically disabling the affected ports or by prompting users to disable them manually.

For details about DLDP, see DLDP Configuration in product documentation.

Figure 1-2 Networking diagram of a unidirectional link

Packet Corruption

Bad cables or incorrect cable length can cause packet corruption, which may lead to an STP failure.

High CPU Usage

If the CPU is overutilized for any reason, the spanning tree algorithm and STP calculation are affected.

Improper STP Parameter Tuning and Diameter Issues

There are three types of timers in STP: Hello Time, Max Age, and Forward Delay. The default values are 2 seconds, 20 seconds, and 15 seconds respectively. As a general rule, you are advised to run the stp bridge-diameter command to configure the network diameter, rather than tuning the timers directly. Switches can automatically calculate the optimal values for the Hello Time, Forward Delay, and Max Age timers based on the network diameter.
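
If the timers must be tuned by hand, it helps to check a candidate set of values before applying it. The sketch below assumes the relationship that IEEE 802.1D requires among the three timers, 2 x (Forward Delay - 1) >= Max Age >= 2 x (Hello Time + 1), with all values in seconds. That formula comes from the standard rather than from this document, and the function name timers_are_consistent is hypothetical.

```python
def timers_are_consistent(hello: float, max_age: float, forward_delay: float) -> bool:
    """Check the IEEE 802.1D timer relationship (all values in seconds).

    Assumed constraint: 2 * (forward_delay - 1) >= max_age >= 2 * (hello + 1)
    """
    return 2 * (forward_delay - 1) >= max_age >= 2 * (hello + 1)

# Default values quoted in this document: Hello 2s, Max Age 20s, Forward Delay 15s.
DEFAULTS_OK = timers_are_consistent(hello=2, max_age=20, forward_delay=15)
```

With the default values the check passes (28 >= 20 >= 6). Lowering Forward Delay to 10 seconds while keeping Max Age at 20 seconds fails the check, which is one reason to let the switch derive the timers from the network diameter.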

Software Errors

Multiple factors may cause software errors and affect STP convergence.

How Do I Troubleshoot a Spanning Tree Fault

Before troubleshooting a spanning tree fault, you need to know these items, at minimum:

  • STP network topology
  • Location of the root bridge
  • Locations of blocked ports and redundant links

Identify a Network Loop

A loop may be caused by a spanning tree fault. For details about how to check whether a loop occurs, see "Troubleshooting: Layer 2 Loop" in Huawei S Series Campus Switches Maintenance Guide.

Restore Connectivity Quickly

Network loops have extremely severe consequences on an STP network. Administrators generally do not have time to look for the cause of the loop and prefer to restore connectivity as soon as possible. The easy way out in this case is to manually disable every port that provides redundancy in the network. If you can identify a part of the network that is affected most, disable ports in this area. Or, if possible, initially disable ports that should be blocked. Each time you disable a port, check to see if you have restored connectivity in the network. After the loop is eliminated by disabling a port, you need to further locate the loop cause.

Check Ports

Check Whether Blocked Ports Receive BPDUs

You can run the display stp command to check whether blocked ports and root ports can receive BPDUs periodically. You can run the command multiple times to check whether the ports receive BPDUs. The BPDU Received field in the command output indicates the number of received BPDUs.

<HUAWEI> display stp instance 0 interface gigabitethernet 1/0/1
-------[CIST Global Info][Mode MSTP]-------
CIST Bridge         :32768.00e0-fc0e-a421
Config Times        :Hello 2s MaxAge 20s FwDly 15s MaxHop 20
Active Times        :Hello 2s MaxAge 20s FwDly 15s MaxHop 20
CIST Root/ERPC      :32768.00e0-fc0e-a421 / 0 (This bridge is the root)
CIST RegRoot/IRPC   :32768.00e0-fc0e-a421 / 0 (This bridge is the root)
CIST RootPortId     :0.0
BPDU-Protection     :Disabled
TC or TCN received  :0
TC count per hello  :0
STP Converge Mode   :Normal
Share region-configuration :Enabled
Time since last TC  :0 days 23h:9m:30s
Number of TC        :1
Last TC occurred    :GigabitEthernet1/0/1
----[Port3(GigabitEthernet1/0/1)][FORWARDING]----
 Port Protocol       :Enabled
 Port Role           :Designated Port
 Port Priority       :128
 Port Cost(Legacy)   :Config=auto / Active=19
 Designated Bridge/Port   :32768.00e0-fc0e-a421 / 128.1229
 Port Edged          :Config=disabled / Active=disabled
 Point-to-point      :Config=auto / Active=true
 Transit Limit       :3 packets/hello-time
 Protection Type     :None
 Port STP Mode       :MSTP
 Config-digest-snoop :snooped=false
 Port Protocol Type  :Config=auto / Active=dot1s
 BPDU Encapsulation  :Config=stp / Active=stp
 PortTimes           :Hello 2s MaxAge 20s FwDly 15s RemHop 0
 TC or TCN send      :0
 TC or TCN received  :0
 BPDU Sent           :147
          TCN: 0, Config: 0, RST: 0, MST: 147
 BPDU Received       :0
          TCN: 0, Config: 0, RST: 0, MST: 0 
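
Comparing the counter by eye across several runs is error-prone, so the comparison can be scripted. The following Python sketch assumes that you have already saved the text of two display stp runs for the same port. How the output is collected is outside the scope of the sketch, and the function names are hypothetical.

```python
import re

def bpdu_received(display_stp_output: str) -> int:
    """Extract the 'BPDU Received' counter from saved 'display stp' output for one port."""
    match = re.search(r"BPDU Received\s*:\s*(\d+)", display_stp_output)
    if match is None:
        raise ValueError("no 'BPDU Received' field found")
    return int(match.group(1))

def port_is_receiving_bpdus(earlier_output: str, later_output: str) -> bool:
    """True if the counter increased between two runs of the command."""
    return bpdu_received(later_output) > bpdu_received(earlier_output)
```

A blocked port or root port whose counter does not increase between runs taken a few Hello intervals apart is a candidate for further checks. In the sample output above the counter is 0, which is expected there because the port is a designated port on the root bridge.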

Check for a Duplex Mismatch

You can run the display interface interface-type interface-number command to check the information about a specified port. The Duplex field in the command output indicates the duplex mode of the port. The value FULL indicates the full-duplex mode, and the value HALF indicates the half-duplex mode. If the duplex modes of the ports at both ends of a link do not match, modify the settings to make them match.

<HUAWEI> display interface gigabitethernet 1/0/1
GigabitEthernet 1/0/1 current state : UP                                       
Line protocol current state : UP                                              
Description:                                                                   
Switch Port, Link-type : access(negotiated),  
PVID :    1, TPID : 8100(Hex), The Maximum Frame Length is 9216    
IP Sending Frames' Format is PKTFMT_ETHNT_2, Hardware address is 0025-9ef4-abcd 
Last physical up time   : -
Last physical down time : 2016-01-15 15:58:32 UTC-01:00
Current system time: 2012-06-05 18:56:41                                        
Port Mode: COMMON FIBER, Transceiver: 1000_BASE_SX_SFP                                                         
Speed : 1000,   Loopback: NONE                                                   
Duplex: FULL,   Negotiation: ENABLE                                              
Mdi   : -, Flow-control: DISABLE                                                                  
Last 300 seconds input rate 0 bits/sec, 0 packets/sec                           
Last 300 seconds output rate 0 bits/sec, 0 packets/sec                          
Input peak rate 0 bits/sec, Record time: -                                      
Output peak rate 0 bits/sec, Record time: - 
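
Because a duplex mismatch shows up only when both ends of the link are compared, a small script can help. The Python sketch below assumes that the display interface output of the local port and of the peer port has been saved as text. The function names duplex_mode and duplex_matches are hypothetical.

```python
import re

def duplex_mode(display_interface_output: str) -> str:
    """Return the value of the Duplex field (for example FULL or HALF)."""
    match = re.search(r"^\s*Duplex\s*:\s*(\w+)", display_interface_output, re.MULTILINE)
    if match is None:
        raise ValueError("no 'Duplex' field found")
    return match.group(1).upper()

def duplex_matches(local_output: str, peer_output: str) -> bool:
    """True if both ends of the link report the same duplex mode."""
    return duplex_mode(local_output) == duplex_mode(peer_output)
```

If duplex_matches returns False for the two ends of a link, modify the settings so that both ports use the same duplex mode.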

Check Port Utilization

A port with traffic overload may fail to transmit BPDUs. You can run the display interface brief command to check the bandwidth utilization in inbound and outbound directions of a specified port in the latest period. The InUti field in the command output indicates the bandwidth utilization in the inbound direction, and the OutUti field indicates the bandwidth utilization in the outbound direction. If the bandwidth utilization in the inbound or outbound direction of a port is close to 100%, the port utilization is too high.

<HUAWEI> display interface brief
PHY: Physical                                                                   
*down: administratively down                                                    
#down: LBDT down                                                   
(l): loopback                                                                   
(s): spoofing                                                                   
(E): E-Trunk down   
(b): BFD down                                                                   
(e): ETHOAM down                                                                
(dl): DLDP down                                                                 
(lb): LBDT block                                                                
(ms): MACsec down
InUti/OutUti: input utility/output utility                                      
Interface                   PHY   Protocol InUti OutUti   inErrors  outErrors   
GigabitEthernet0/0/1        up    up       0.06%   100%          0   21217388   
GigabitEthernet0/0/2        up    up        100%   100%          0          0   
GigabitEthernet0/0/3        up    up          0%   100%          0          0
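
On a switch with many ports, the rows of interest can be filtered with a script. The Python sketch below assumes that the display interface brief output has been saved as text and that its rows follow the layout shown above. The 95% default threshold is an illustrative assumption for "close to 100%" and is not a Huawei-defined value, and the function name overloaded_ports is hypothetical.

```python
import re

# Interface, PHY, Protocol, InUti%, OutUti%, inErrors, outErrors
ROW = re.compile(r"^(\S+)\s+\S+\s+\S+\s+([\d.]+)%\s+([\d.]+)%\s+\d+\s+\d+\s*$")

def overloaded_ports(display_interface_brief_output: str, threshold: float = 95.0) -> list[str]:
    """Return interfaces whose inbound or outbound utilization is at or above the threshold."""
    ports = []
    for line in display_interface_brief_output.splitlines():
        match = ROW.match(line)
        if match is None:
            continue  # legend lines, header line, or rows without utilization values
        in_uti, out_uti = float(match.group(2)), float(match.group(3))
        if max(in_uti, out_uti) >= threshold:
            ports.append(match.group(1))
    return ports
```

Applied to the sample output above, all three interfaces are reported because each has an outbound utilization of 100%.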

Check Packet Corruption

You can run the display interface interface-type interface-number command to check information about a specified port. The Total Error field in the command output indicates the total number of error packets detected at the physical layer.

<HUAWEI> display interface gigabitethernet 1/0/1
GigabitEthernet 1/0/1 current state : UP                                       
Line protocol current state : UP                                              
Description:                                                                   
Switch Port, Link-type : access(negotiated),  
PVID :    1, TPID : 8100(Hex), The Maximum Frame Length is 9216    
IP Sending Frames' Format is PKTFMT_ETHNT_2, Hardware address is 0025-9ef4-abcd 
Last physical up time   : -
Last physical down time : 2016-01-15 15:58:32 UTC-01:00
Current system time: 2012-06-05 18:56:41                                        
Port Mode: COMMON FIBER, Transceiver: 1000_BASE_SX_SFP                                                         
Speed : 1000,   Loopback: NONE                                                   
Duplex: FULL,   Negotiation: ENABLE                                              
Mdi   : -, Flow-control: DISABLE                                                                  
Last 300 seconds input rate 0 bits/sec, 0 packets/sec                           
Last 300 seconds output rate 0 bits/sec, 0 packets/sec                          
Input peak rate 0 bits/sec, Record time: -                                      
Output peak rate 0 bits/sec, Record time: -                                     

Input:  0 packets, 0 bytes                                                      
  Unicast:                          0,  Multicast:                           0  
  Broadcast:                        0,  Jumbo:                               0  
  Discard:                          0,  Pause:                               0  
  Frames:                           0  

  Total Error:                      0 
  CRC:                              0,  Giants:                              0  
  Jabbers:                          0,  Fragments:                           0  
  Runts:                            0,  DropEvents:                          0  
  Alignments:                       0,  Symbols:                             0  
  Ignoreds:                         0

Output:  0 packets, 0 bytes                                                     
  Unicast:                          0,  Multicast:                           0  
  Broadcast:                        0,  Jumbo:                               0  
  Discard:                          0,  Pause:                               0  

  Total Error:                      0                                           
  Collisions:                       0,  ExcessiveCollisions:                 0  
  Late Collisions:                  0,  Deferreds:                           0  
  Buffers Purged:                   0                

    Input bandwidth utilization threshold : 80.00%                             
    Output bandwidth utilization threshold: 80.00%                             
    Input bandwidth utilization  :    0%                                        
    Output bandwidth utilization :    0% 
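
The output contains two Total Error fields, one under Input and one under Output, so a script must keep track of the section it is reading. The Python sketch below assumes that the display interface output has been saved as text in the layout shown above. The function name total_errors is hypothetical.

```python
import re

def total_errors(display_interface_output: str) -> dict[str, int]:
    """Return the Total Error counters keyed by direction ('input' / 'output')."""
    errors: dict[str, int] = {}
    direction = None
    for raw_line in display_interface_output.splitlines():
        line = raw_line.strip()
        if line.startswith("Input:"):
            direction = "input"
        elif line.startswith("Output:"):
            direction = "output"
        else:
            match = re.match(r"Total Error:\s*(\d+)", line)
            if match and direction is not None:
                errors[direction] = int(match.group(1))
    return errors
```

A counter that is not zero and keeps increasing between runs suggests that the cable or optical module on that link is worth checking.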

Check CPU Usage

High CPU usage may affect a system that runs the spanning tree algorithm. You can run the display cpu-usage command every few seconds and check whether the CPU Usage field stays high. Generally, the CPU usage of a switch does not exceed 80% over a long period. If the CPU usage stays below 95% during a short peak, the switch can also be considered to be running properly.

<HUAWEI> display cpu-usage
CPU Usage Stat. Cycle: 10 (Second)
CPU Usage         : 88% Max: 92%
CPU Usage Stat. Time : 2010-12-18  15:35:56
CPU utilization for five seconds: 68%: one minute: 60%: five minutes: 55%.
Max CPU Usage Stat. Time : 2015-01-27 10:08:10. 

TaskName        CPU  Runtime(CPU Tick High/Tick Low)  Task Explanation           
VIDL                 82%         8/ 4c8b1ff       DOPRA IDLE                     
OS                   12%         1/2c684bff       Operation System  
……

For details about how to deal with high CPU usage, see "Troubleshooting: High CPU Usage" in Huawei S Series Campus Switches Maintenance Guide.
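
The repeated check of the CPU Usage field can be sketched as follows. The Python code assumes that several display cpu-usage outputs, taken a few seconds apart, have been saved as text. It uses the 80% long-term figure quoted above as the default threshold, and the function names are hypothetical.

```python
import re

def cpu_usage(display_cpu_usage_output: str) -> int:
    """Extract the percentage from the 'CPU Usage : NN%' line."""
    match = re.search(r"^\s*CPU Usage\s*:\s*(\d+)%", display_cpu_usage_output, re.MULTILINE)
    if match is None:
        raise ValueError("no 'CPU Usage' field found")
    return int(match.group(1))

def cpu_stays_high(outputs: list[str], threshold: int = 80) -> bool:
    """True if every collected sample exceeds the threshold."""
    return bool(outputs) and all(cpu_usage(text) > threshold for text in outputs)
```

The sample output above yields 88, which exceeds 80%. Whether this indicates a problem depends on whether later samples stay at that level.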

Disable Unnecessary Functions

Disabling unnecessary functions helps simplify the network structure and eases the identification of the problem. As a general rule, making the configuration as simple as possible makes troubleshooting the problem easier.

How Do I Design STP for Trouble Avoidance

Know Where the Root Bridge Is

During network design, you need to identify which switch can best serve as the root bridge. Generally, choose a powerful switch in the center of the network as the root bridge. If you put the root bridge in the center of the network with direct connection to servers and routers, the average distance from clients to servers and routers can be reduced.

For example, SwitchA is selected as the root bridge in Figure 1-3. The reasons are as follows:

  • If SwitchB serves as the root bridge, Port2 of SwitchA or Port1 of SwitchC needs to be blocked. Assume that Port1 of SwitchC is blocked. PC1 can access the server and router in two hops. PC2 can access the server and router in three hops. The average distance is two and one-half hops.
  • If SwitchA serves as the root bridge, Port1 of SwitchB or Port2 of SwitchC needs to be blocked. Assume that Port1 of SwitchB is blocked. The router and the server are reachable in two hops for PC1 and PC2.
Figure 1-3 Select the root bridge location
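
The hop counts in this comparison can be reproduced with a short breadth-first search. The Python sketch below assumes a topology inferred from the text, because the figure is not reproduced here: the three switches form a triangle, the server and router attach to SwitchA, PC1 attaches to SwitchB, and PC2 attaches to SwitchC. A hop is counted as one switch traversed. The function names are hypothetical.

```python
from collections import deque

def switch_hops(active_links, start, goal):
    """Count switches traversed from start to goal (inclusive) over unblocked links."""
    neighbors = {}
    for a, b in active_links:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    queue = deque([(start, 1)])
    seen = {start}
    while queue:
        node, hops = queue.popleft()
        if node == goal:
            return hops
        for nxt in neighbors.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, hops + 1))
    raise ValueError(f"{goal} is unreachable from {start}")

def average_hops(active_links, client_switches, server_switch):
    """Average number of switches between each client's access switch and the server's switch."""
    hops = [switch_hops(active_links, s, server_switch) for s in client_switches]
    return sum(hops) / len(hops)

clients = ["SwitchB", "SwitchC"]  # access switches of PC1 and PC2 (assumed)

# Root = SwitchB: the SwitchC-SwitchA link is blocked (assumed from the text).
root_b = average_hops([("SwitchA", "SwitchB"), ("SwitchB", "SwitchC")], clients, "SwitchA")

# Root = SwitchA: the SwitchB-SwitchC link is blocked (assumed from the text).
root_a = average_hops([("SwitchA", "SwitchB"), ("SwitchA", "SwitchC")], clients, "SwitchA")
```

Under these assumptions, root_b is 2.5 and root_a is 2.0, which matches the two cases described in the list.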

Know Where Redundancy Is

You can run the stp cost command to tune the STP path cost and so determine which ports are blocked. This tuning is usually not necessary if you have a hierarchical design and a root bridge in a good location.

Knowledge of the location of redundant links helps you identify an accidental bridging loop and the cause. In addition, knowledge of the location of blocked ports allows you to determine the error location.

Minimize the Number of Blocked Ports

A critical action that STP takes is blocking ports. A single blocked port that mistakenly transitions to the forwarding state can melt down the network. A good way to limit the risk inherent in the use of STP is to reduce the number of blocked ports as much as possible.

  • Prune VLANs that you do not use

    You do not need more than two redundant links between two nodes in a bridge network. However, this kind of configuration is common:

    Figure 1-4 VLANs not in use are not pruned

    As shown in Figure 1-4, access switches Switch 2 and Switch 3 are connected to core switches Core A and Core B. Packets of all VLANs are allowed to be transmitted between Core A and Core B. The links between access switches and between an access switch and a core switch are all trunk links. Switch 2 connects only users in VLAN 2, and Switch 3 connects only users in VLAN 3. By default, VLAN Central Management Protocol (VCMP) is enabled. Therefore, Switch 2 and Switch 3 both allow packets from VLAN 2 and VLAN 3 to pass through. The result is three redundant paths between Core A and Core B. This redundancy results in more blocked ports and a higher likelihood of a loop.

    VCMP can synchronize VLAN configurations, but this function is not necessary in the core of the network. Therefore, VCMP needs to be disabled on devices at the core layer.

    As shown in Figure 1-5, after VCMP is disabled on Core A and Core B, only an access VLAN is used to connect each access switch to the core. In this design, only one port is blocked per VLAN. Also, with this design, you can remove all redundant links in just one step if you shut down Core A or Core B.

    Figure 1-5 VCMP is disabled
  • Use Layer 3 switching

    Layer 3 switching means routing at approximately the speed of switching. A router performs two main functions:

    • A router builds a forwarding table. The router generally exchanges information with peers by way of routing protocols.
    • A router receives packets and forwards them to the correct interface based on the destination address.
    Figure 1-6 Layer 3 switching is used

    As shown in Figure 1-6, Core A and Core B are Layer 3 switches. Only packets from VLAN 10 are allowed to be transmitted between Core A and Core B, so there is no possibility for an STP loop.

    • Redundancy is still present, but it relies on Layer 3 routing protocols. The design ensures reconvergence that is even faster than reconvergence with STP.
    • STP no longer blocks any port. Therefore, there is no potential for a bridging loop.

    There is a single drawback with this design. Migration to this kind of design generally implies a rework of the networking scheme.

Keep STP Even If It Is Unnecessary

Even if you have removed all the blocked ports from your network and you do not have any physical redundancy, do not disable STP. STP is generally not very processor-intensive. In addition, the few BPDUs that are sent on each link do not significantly reduce the available bandwidth.

However, a bridge network without STP can melt down in a fraction of a second if an operator makes an error and creates a loop. Generally, disabling STP in a bridge network is not worth the risk.

Updated: 2019-06-29

Document ID: EDOC1100088110
