Troubleshooting an Interface Error-Down Event
Introduction
The Error-Down mechanism is a protection mechanism provided by CE series switches and involves multiple features including interface, stacking, super virtual fabric (SVF), and security. If any of these features is configured on an interface and the switch detects that the interface or relevant service fails, the switch shuts down the interface and sets the interface status to Error-Down to prevent the fault from affecting the entire network.
When an interface is in Error-Down state, it cannot receive or send packets, its indicator is off, and the switch generates the ERROR-DOWN_1.3.6.1.4.1.2011.5.25.257.2.1 hwErrordown alarm.
You can run the display interface command to check the cause of an interface Error-Down event.
# Check the status of 10GE1/0/1.
<HUAWEI> display interface 10ge 1/0/1
10GE1/0/1 current state : ERROR DOWN(link-flap) (ifindex: 53)
Line protocol current state : DOWN
Description:
Route Port,The Maximum Transmit Unit is 1500,The Maximum Frame Length is 9216
Internet protocol processing : disabled
IP Sending Frames' Format is PKTFMT_ETHNT_2, Hardware address is 04f9-388d-e682
Port Mode: AUTO, Port Split/Aggregate: -
Speed: AUTO, Loopback: NONE
Duplex: FULL, Negotiation: -
Input Flow-control: DISABLE, Output Flow-control: DISABLE
Mdi: -, Fec: -
Last physical up time : -
Last physical down time : 2019-03-24 18:28:31
Current system time: 2019-05-15 03:07:30
Statistics last cleared:never
...
In the preceding example, the cause of an Error-Down event on 10GE1/0/1 is link flapping. That is, when the switch detects that 10GE1/0/1 has frequently alternated between Up and Down states, it shuts down the interface and sets the interface status to ERROR DOWN(link-flap).
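When many interfaces need to be checked, the cause can be extracted from the command output with a script. The following Python sketch is an illustration only: it assumes the status line has the same layout as in the preceding example, and the function name error_down_cause is hypothetical.

```python
import re

# Matches the first status line of `display interface`, for example:
# "10GE1/0/1 current state : ERROR DOWN(link-flap) (ifindex: 53)"
# The layout is assumed from the sample output in this document.
STATE_LINE = re.compile(
    r"^(?P<ifname>\S+) current state\s*:\s*ERROR DOWN\((?P<cause>[^)]+)\)",
    re.MULTILINE,
)


def error_down_cause(output: str):
    """Return (interface, cause) if the output reports Error-Down, else None."""
    match = STATE_LINE.search(output)
    if match is None:
        return None
    return match.group("ifname"), match.group("cause")


sample = """\
10GE1/0/1 current state : ERROR DOWN(link-flap) (ifindex: 53)
Line protocol current state : DOWN
"""
result = error_down_cause(sample)
print(result)  # ('10GE1/0/1', 'link-flap')
```

The "Line protocol current state" line is not matched because the pattern allows only a single token (the interface name) before "current state".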
How Can I Trigger an Interface to Enter the Error-Down State?
There are many possible causes of an interface Error-Down event on CE series switches. Regardless of the cause of the interface Error-Down event, the switch must have detected an exception. When does the switch detect an exception?
- Some exception detection, such as link flapping detection, starts as soon as the switch has started properly.
- After basic functions of a feature are configured, the system detects exceptions related to the feature, for example, resource-mismatch and stack-config-conflict related to the stacking feature.
- After you configure an independent exception detection function or sub-function, the system starts to detect exceptions, for example, BPDU protection and MAC address flapping detection.
This section uses link flapping as an example to describe how an interface is triggered to enter the Error-Down state.
Link flapping on an interface means that the physical status of the interface frequently alternates between Up and Down. Each change alters the network topology. For example, when two links work in active/standby mode and the interface of the active link frequently alternates between Up and Down, services are repeatedly switched between the active and standby links. Frequent service switchovers increase the load on the switch and may cause service data loss. To resolve this problem, the switch provides the link flapping protection function: when an interface frequently alternates between Up and Down states, the switch shuts down the interface and sets it to the ERROR DOWN(link-flap) state.
- Run system-view
The system view is displayed.
- Run port link-flap trigger error-down
Link flapping protection is enabled on the interface.
By default, link flapping protection is enabled on an interface.
- Run interface interface-type interface-number
The interface view is displayed.
- Run port link-flap { [ interval interval-value ] [ threshold threshold-value ] }
The link flapping detection interval and the number of link flaps that triggers protection are configured.
By default, the link flapping interval for an interface is 10 seconds and the number of link flaps is 5.
- Run commit
The configuration is committed.
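The relationship between the interval and the number of link flapping times can be illustrated with a small simulation. The following Python sketch is a toy model, not the switch's actual algorithm: it assumes a sliding time window and assumes that the caller reports each flap with a timestamp. The class name LinkFlapDetector and its methods are hypothetical. Only the default values (10 seconds, 5 flaps) come from this document.

```python
from collections import deque


class LinkFlapDetector:
    """Toy model of link flapping protection (assumed sliding-window logic)."""

    def __init__(self, interval: float = 10.0, threshold: int = 5):
        self.interval = interval    # seconds; default taken from this document
        self.threshold = threshold  # flaps; default taken from this document
        self._flaps = deque()       # timestamps of flaps inside the window
        self.error_down = False

    def record_flap(self, now: float) -> bool:
        """Record one flap at time `now`; return True once Error-Down is set."""
        if self.error_down:
            return True
        self._flaps.append(now)
        # Drop flaps that have fallen out of the sliding window.
        while self._flaps and now - self._flaps[0] >= self.interval:
            self._flaps.popleft()
        if len(self._flaps) >= self.threshold:
            self.error_down = True
        return self.error_down


detector = LinkFlapDetector()
# Five flaps within 10 seconds: protection trips on the fifth flap.
states = [detector.record_flap(t) for t in (0.0, 2.0, 4.0, 6.0, 8.0)]
print(states)  # [False, False, False, False, True]
```

In this model, the same five flaps spread over more than 10 seconds (for example, one every 3 seconds) never reach the threshold, so the interface stays Up.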
How Do I Restore an Interface in Error-Down State?
Because interface Error-Down events have different causes, the recovery measures vary accordingly. Typically, there are three recovery measures for an interface Error-Down event:
- Troubleshoot service faults. After troubleshooting, the interface automatically recovers from the Error-Down state without any manual intervention.
- Manually restart the interface.
- Configure the automatic recovery function before an exception is detected.
Deleting the configuration of the function that triggered the Error-Down state does not restore the interface.
Of the preceding recovery measures, the first requires no manual intervention and applies to interfaces that enter the Error-Down state due to specific causes, such as ERROR DOWN(dual-active-fault-event) and ERROR DOWN(no-stack-link-event). For other causes, the switch provides two methods to recover the interface: manual recovery and automatic recovery. Before taking recovery measures, you are advised to eliminate the underlying fault (for example, a loop on the network) to prevent the interface from entering the Error-Down state again.
- Manual recovery: Run the shutdown and undo shutdown commands or run the restart command in the interface view to restart the interface.
- Automatic recovery: Run the error-down auto-recovery cause { auto-defend | bpdu-protection | crc-statistics | dual-active | fabric-link-failure | forward-engine-buffer-failed | forward-engine-interface-failed | link-flap | loopback-detect | m-lag | mac-address-flapping | no-stack-link | portsec-reachedlimit | spine-member-exceed-limit | spine-type-unsupported | stack-config-conflict | stack-member-exceed-limit | stack-packet-defensive | storm-control | transceiver-power-low } interval interval-value command in the system view to set the delay for an interface in Error-Down state to automatically restore to Up. After the delay expires, the interface in Error-Down state automatically goes Up.
The configured delay applies to all interfaces that enter the Error-Down state due to the same cause. Compared with the manual mode, the automatic mode is more efficient and ensures that all interfaces that entered the Error-Down state due to the same cause go Up.
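If the automatic recovery command is generated by a script, the cause keyword can be validated before the command is sent to the switch. The following Python sketch is an illustration only: the function name auto_recovery_command is hypothetical, the keyword set is copied from the command syntax shown above, and the interval value 30 is an arbitrary example. The valid interval range is not stated in this document, so the sketch only rejects non-positive values.

```python
# Cause keywords accepted by `error-down auto-recovery cause`,
# copied from the command syntax in this document.
AUTO_RECOVERY_CAUSES = frozenset({
    "auto-defend", "bpdu-protection", "crc-statistics", "dual-active",
    "fabric-link-failure", "forward-engine-buffer-failed",
    "forward-engine-interface-failed", "link-flap", "loopback-detect",
    "m-lag", "mac-address-flapping", "no-stack-link", "portsec-reachedlimit",
    "spine-member-exceed-limit", "spine-type-unsupported",
    "stack-config-conflict", "stack-member-exceed-limit",
    "stack-packet-defensive", "storm-control", "transceiver-power-low",
})


def auto_recovery_command(cause: str, interval: int) -> str:
    """Build the system-view command that sets the auto-recovery delay."""
    if cause not in AUTO_RECOVERY_CAUSES:
        raise ValueError(f"unsupported Error-Down cause keyword: {cause!r}")
    if interval <= 0:
        # The real valid range is device-specific and not given here.
        raise ValueError("interval must be a positive number of seconds")
    return f"error-down auto-recovery cause {cause} interval {interval}"


command = auto_recovery_command("link-flap", 30)  # 30 is an example value
print(command)  # error-down auto-recovery cause link-flap interval 30
```

Because the delay is configured per cause rather than per interface, one generated command covers every interface that enters the Error-Down state for that cause.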
Related Information
For details about causes of an interface Error-Down event and corresponding recovery methods, see CloudEngine Series Switches Error-Down Mechanism.