Troubleshooting Fixed-Configuration Router Hardware Faults
- About This Document
- Introduction
- Troubleshooting Board Faults
- Troubleshooting Optical Module Faults
- A Device Cannot Display Any Optical Module Information But Services Are Running Normally
- An Optical Module Does Not Transmit Optical Signals or Its Transmit Optical Power Is Too Low
- An Optical Module Has Normal Transmit Power But Many Packets Are Dropped on the Interface Due to Bit Errors
- Optical Modules on the Local and Remote Devices Cannot Communicate
- Related Information
About This Document
Intended Audience
This document is intended for router management and maintenance engineers who have a basic knowledge of networking and router hardware.
Symbol Conventions
The symbols that may be found in this document are defined as follows.
Symbol | Description |
---|---|
NOTICE | Indicates a potentially hazardous situation which, if not avoided, could result in equipment damage, data loss, performance deterioration, or unanticipated results. NOTICE is used to address practices not related to personal injury. |
NOTE | Calls attention to important information, best practices and tips. NOTE is used to address information not related to personal injury, equipment damage, and environment deterioration. |
Command Conventions
The command conventions that may be found in this document are defined as follows.
Convention | Description |
---|---|
Boldface | The keywords of a command line are in boldface. |
Italic | Command arguments are in italics. |
[ ] | Items (keywords or arguments) in square brackets [ ] are optional. |
{ x \| y \| ... } | Optional items are grouped in braces and separated by vertical bars. One item is selected. |
[ x \| y \| ... ] | Optional items are grouped in brackets and separated by vertical bars. One item is selected or no item is selected. |
{ x \| y \| ... } * | Optional items are grouped in braces and separated by vertical bars. A minimum of one item or a maximum of all items can be selected. |
[ x \| y \| ... ] * | Optional items are grouped in brackets and separated by vertical bars. Several items or no item can be selected. |
&<1-n> | The parameter before the & sign can be repeated 1 to n times. |
# | A line starting with the # sign is a comment. |
Interface Number Notes
Interface numbers used in this document are examples and may not exist on devices. In device configuration, use the existing interface numbers on devices.
Change History
Changes between document issues are cumulative. The latest document issue contains all the changes made in earlier issues.
- Issue 01 (2019-06-29)
Introduction
This document provides procedures for identifying and rectifying common hardware faults on fixed-configuration routers.
Most common fixed-configuration router hardware faults are related to boards and optical modules. The device notifies you of such faults by turning on the STAT indicator on the MPU. If the indicator is steady red, an alarm has been generated on the device. In this case, run the display alarm hardware command to check whether the alarm was generated due to a board or optical module fault. Board and optical module faults are described in separate chapters.
If the STAT indicator on the MPU is not steady red, search the following two chapters for the fault based on the fault symptoms.
Troubleshooting Board Faults
The System Fails to Start
Symptom
The STAT indicators on the two MPUs are steady off or blinking green.
Common Causes
- The two MPUs are not properly installed.
- Memory is unavailable.
- Read and write operations on the CF cards fail.
- The system software package is not stored on the CF cards.
Troubleshooting Procedure
- Check that the MPUs are properly installed.
Connect the console port of the MPU to the COM port of a PC. Power on the device. If full in or POST failed is displayed in the console port information, check whether the MPU is properly installed.
Press Ctrl+B to enter bootload Menu…0
reset mode reg value = 0x7
reset mode = 0x2
full in . . . . . . . . . . . . . . . . . . . . . . . .fail
dev_name: eTSEC1
waiting for auto negotiation to complete...
Speed 1000, full duplex
recv seq num:1
!!!POST failed, reset board...
If the MPUs are properly installed, go to step 2.
- When the message "Press CTRL+T for full memory test" is displayed in the console port information, press Ctrl+T immediately to fully check the memory. If pass is displayed after the check, the memory is normal. In this case, go to step 3. Otherwise, the memory is faulty. In this case, replace the MPUs.
- Check that the CF card works properly.
Generally, if the following information is displayed, the CF card has encountered a read/write fault. In this case, replace the MPUs.
scan usb storage failed, reset usb and try again
fail_count = 1
USB: Register 10011 NbrPorts 1
USB EHCI 1.00
scanning bus for devices... 1 USB Device(s) found
scanning bus for storage devices... 0 Storage Device(s) found
If the CF card has no read/write fault, go to step 4.
- Check that the system software package exists on the CF card.
When the message "Press Ctrl+B to enter bootload Menu" is displayed in the console port information, press Ctrl+B immediately. On the displayed menu, enter 6 to select List file in CFcard and list the files on the CF card. If the file list contains a .cc file, the system software package exists on the CF card. If it does not, the package is missing. In this case, go to step 5.
Bootload Menu(Hiboot Version: 04.00)
1. Boot with default mode
2. Boot from CFcard
3. Enter ethernet submenu
4. Set boot file and path
5. Modify boot ROM password
6. List file in CFcard
7. Modify System and Chassis Parameters
8. Modify start mode
9. Clear password for console user
10. Reboot
11. Enter TPM Submenu
Enter your choice(1-11): 6
- Contact Huawei and provide the following information:
- Results of the troubleshooting procedure
- Configuration, log, and alarm files
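The console checks in steps 1 and 3 can be sketched as a small script that scans a captured console log for the failure markers quoted above. This is only an illustration: the function name `scan_console_log` and the hint strings are hypothetical, and the patterns assume the markers appear exactly as in the excerpts in this section.

```python
import re

# Markers copied from the console excerpts in this section; the hint text
# paraphrases the procedure step that each marker points to.
CONSOLE_MARKERS = [
    (re.compile(r"full in[\s.]*fail"), "check MPU installation (step 1)"),
    (re.compile(r"POST failed"), "check MPU installation (step 1)"),
    (re.compile(r"scan usb storage failed"),
     "CF card read/write fault, replace the MPU (step 3)"),
    (re.compile(r"\b0 Storage Device\(s\) found"),
     "CF card read/write fault, replace the MPU (step 3)"),
]


def scan_console_log(text):
    """Return the distinct hints whose markers appear in the captured text."""
    hints = []
    for pattern, hint in CONSOLE_MARKERS:
        if pattern.search(text) and hint not in hints:
            hints.append(hint)
    return hints


if __name__ == "__main__":
    sample = "recv seq num:1\n!!!POST failed, reset board...\n"
    for hint in scan_console_log(sample):
        print(hint)
```

An empty result only means that none of these markers were found. It does not prove that the MPU or CF card is healthy, so the memory test and file-list checks still apply.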
The Slave MPU Fails to Be Registered
Symptom
The STAT indicator on the slave MPU is steady off or blinking green, and the display device command output shows that the status of the slave MPU is Abnormal and Unregistered.
Common Causes
- The slave MPU is starting.
- The system software packages loaded to the master and slave MPUs differ.
- The slave MPU is not properly installed.
- The memory on the slave MPU fails.
- A system software fault occurs.
Troubleshooting Procedure
- Check that the slave MPU has finished startup.
Startup begins when an MPU is powered on and ends when the MPU is successfully registered.
MPU startup usually takes less than 3 minutes. If you restart a device after upgrading system software, startup takes at most 20 minutes.
If the MPU is starting, wait until startup finishes. If the MPU is still unregistered, go to step 2.
- Check that the slave MPU is powered on.
If it is not powered on, run the power on slot slot-id command in the user view to power on the slave MPU. If the slave MPU remains unregistered after executing this command, go to step 3.
- Check that the system software loaded to the slave MPU is the same as that loaded to the master MPU.
Connect the console port of the slave MPU to a PC, and check whether the system software package shown after "The start file is" in the console port information is the same as that of the master MPU.
If the system software packages of the two MPUs are different, go to step 8. If they are the same but the slave MPU cannot be registered, go to step 4.
- Check that the MPU backplane is normal.
Remove and reinstall the slave MPU.
- Remove the MPU and check whether any pins on the MPU connector are bent. If pins are bent, replace the backplane.
- Reinstall the MPU and ensure that the MPU connector and backplane are properly connected. Then check whether the MPU can start.
If the MPU cannot start, go to step 5.
- Check that the slot that houses the slave MPU is normal.
Remove the slave MPU and install it in another slot to check whether it can be registered. If the MPU still cannot be registered, go to step 6.
NOTE:
This operation may interrupt services and should be performed only when no services are running on the device.
- Check that the memory on the slave MPU works properly.
Power off and remove the slave MPU. Wait 30 seconds, and connect the console port of the MPU to the COM port of a PC. Reinstall the MPU into the chassis, and power it on. When the message "Press CTRL+T for full memory test ................2" is displayed in the console port information, press Ctrl+T to fully check the memory of the MPU.
If an error message including fail is displayed after the check, the memory is faulty. In this case, replace the MPU.
NOTE:
If no information is displayed on the PC, the memory is faulty. In this case, replace the MPU.
If the MPU still cannot be registered, go to step 7.
- Check that the system software is correct.
Run the startup system-software system-file command to specify the system software file to use for the next startup and check whether the current system software is normal. Restart the device. If the slave MPU still cannot be registered, go to step 8.
- Contact Huawei and provide the following information:
- Outputs of the following commands:
- display diagnostic-information
- display alarm hardware
- display elabel
- Indicator status of power modules and the slave MPU
- User log files and diagnostic log files
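The package comparison in step 3 can be sketched as follows, given console captures from both MPUs. The exact console format is not shown in this document, so the sketch assumes that the package path follows "The start file is" on the same line as a single token without spaces. The function names and the file names in the usage example are hypothetical.

```python
import re

# Assumption: the package path is the first whitespace-free token after
# the phrase "The start file is" (an optional colon is tolerated).
START_FILE_RE = re.compile(r"The start file is\s*:?\s*(\S+)")


def start_file(console_text):
    """Return the start file named in a console capture, or None."""
    match = START_FILE_RE.search(console_text)
    return match.group(1) if match else None


def same_start_file(master_text, slave_text):
    """Return True/False if both captures name a start file, else None."""
    master, slave = start_file(master_text), start_file(slave_text)
    if master is None or slave is None:
        return None  # not enough information to compare
    return master == slave


if __name__ == "__main__":
    # Hypothetical file names, used only to show the comparison.
    master = "The start file is cfcard:/example_a.cc\n"
    slave = "The start file is cfcard:/example_b.cc\n"
    print(same_start_file(master, slave))
```

A result of `None` means that at least one capture did not contain the phrase, in which case the console output should be checked manually.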
A PIC Fails to Be Registered
Symptom
The STAT indicator on the master MPU is steady red, the STAT indicator on a PIC is steady yellow or off, and the display device command output shows that the status of the PIC is Abnormal and Unregistered.
Common Causes
- The PIC is in the startup period.
- The PIC is not properly installed.
- The PIC is faulty.
- The PIC model is not supported on the device.
Troubleshooting Procedure
- Check that the PIC has finished startup.
Startup begins when a PIC is powered on and ends when the PIC is successfully registered.
PIC startup takes at most 5 minutes when no system software package or file needs to be updated. With updates, startup takes up to 10 minutes.
If the PIC is starting, wait until startup finishes. If the PIC is still unregistered, go to step 2.
- Check that no voltage or component alarms are generated.
Run the display alarm hardware command to check whether an alarm has been generated for the PIC. If the command output contains a voltage or component alarm, the PIC is faulty. In this case, replace the PIC.
If the fault persists after the PIC is replaced, go to step 3.
- Check that the PIC model is supported on the device.
Obtain the PIC model from the paper label on the upper right corner of its front panel or from its electronic label. Run the display version command to query the device software version. Check whether the software version supports the PIC model.
For the PIC models supported by the software version, see "Boards" in the Hardware Description of your product.
If the PIC model is supported by the software version, go to step 4.
- Check that the PIC has been powered on.
Run the power on slot slot-id command in the user view to power on the PIC. If the PIC remains unregistered after executing this command, go to step 5.
- Remove and reinstall the PIC.
The PIC connects to an MPU through the inter-board communication channel. If this channel fails, the PIC cannot start. To resolve this problem, perform the following operations:
- Remove the PIC from the slot and check whether the PIC connector is intact. If there are any idle pin holes on the PIC connector, the connector is damaged. Check whether any pins are bent. If any are, replace the backplane.
- Install the PIC and ensure that the PIC connector and backplane are correctly connected. Then check whether the PIC can be registered.
If the fault persists after the PIC is reinstalled, go to step 6.
- Check that the slot housing the PIC is correct.
Reinstall the PIC in another slot.
After installation, check whether the PIC can be registered.
- If it cannot be, go to step 7.
- If it can be, install in the faulty slot another PIC that can successfully register and check whether the PIC can register in that slot. If it cannot register, a fault has occurred on the backplane or MPU.
- Check that the MPU is properly connected to the backplane.
Reinstall the MPU. If two MPUs are installed, reinstall the slave MPU first and check the MPU connector according to step 5. After performing a master/slave MPU switchover, reinstall the other MPU. If the fault is rectified, the MPU was not properly connected to the backplane. If the problem persists, go to step 8. This operation may interrupt services and is not recommended in scenarios where key services are running.
- After powering on the PIC, enter the diagnostic view immediately and run the set output-mode board open slot slot-id [ mbus-sol | 2400 | 9600 | 19200 | 38400 | 57600 | 115200 | 187500 ] command to collect startup information.
The set output-mode board open slot slot-id [ mbus-sol | 2400 | 9600 | 19200 | 38400 | 57600 | 115200 | 187500 ] command must be executed during the startup of the PIC. That is, run this command after powering on or reseating the PIC.
<HUAWEI> system-view
[~HUAWEI] diagnose
[~HUAWEI-diagnose] set output-mode board open slot 9 9600
************************************************************
* Welcome To Enter Slot(9) SOL SERVER *
* If you want to quit, please press CTRL+K *
* All rights reserved (2010-2011) *
************************************************************
Boot area 0
Reset times is 3
Reset cause :cpu reset,scc b reset,power on reset,
Last fiq: not ocurred[0x0]
Totem C CLUSTER L1/L2 Cache Mbist end!
Totem C LLC Mbist OK!
Totem C HHA:OK
Boot firmware (version iWare uniBIOS V2R1 SPC021B010)
CPU info for Socket 0
Nimbus PLL0 : 1000MHz
PLL1 : 800MHz
PLL2 : 1200MHz
PLL3 : 625MHz
PLL4 : 650MHz
CPU info for Socket 0
TC CPU : 2000MHz
SC PLL2 : 933MHz
SC PLL3 : 933MHz
SC PLL4 : 800MHz
SC Wafer ID: 17
- Contact Huawei and provide the following information:
- Outputs of the following commands:
- display diagnostic-information
- display power slot
- display alarm hardware
- display elabel
- Indicator status of power modules and the faulty PIC
- User log files and diagnostic log files
- Startup information collected in step 8
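The startup windows quoted in this chapter (MPU: usually under 3 minutes, at most 20 minutes after a system software upgrade; PIC: at most 5 minutes, up to 10 minutes with updates) can be captured in a small lookup. The sketch below shows one way to decide whether a board has been unregistered for longer than expected. The names `STARTUP_LIMITS` and `startup_overdue` are hypothetical, and treating the 3-minute MPU figure as a hard limit is an assumption.

```python
from datetime import timedelta

# Limits taken from the startup times stated in this chapter.
# Key: (board type, whether a software upgrade or update is involved).
STARTUP_LIMITS = {
    ("mpu", False): timedelta(minutes=3),
    ("mpu", True): timedelta(minutes=20),
    ("pic", False): timedelta(minutes=5),
    ("pic", True): timedelta(minutes=10),
}


def startup_overdue(board, elapsed, after_upgrade=False):
    """Return True if the board has been starting longer than expected."""
    return elapsed > STARTUP_LIMITS[(board.lower(), after_upgrade)]


if __name__ == "__main__":
    # A PIC that is still unregistered 7 minutes after power-on, no updates.
    print(startup_overdue("pic", timedelta(minutes=7)))
```

If the function returns `True`, waiting longer is unlikely to help and the next step of the relevant procedure applies.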
The MPU or PIC Unexpectedly Restarts
Symptom
Services are affected, and a historical board restart alarm is reported to the NMS.
Common Causes
- The board is restarted using a command or is powered off.
- The software or hardware of the board is faulty.
Troubleshooting Procedure
- Run the display board-reset slot-id command in the diagnostic view to check the restart cause.
<HUAWEI> system-view
[~HUAWEI] diagnose
[~HUAWEI-diagnose] display board-reset 9
Board 9 reset information:
-- 1. DATE:2018-01-21 TIME:03:20:52+04:00 BARCODE:030QAF10D7000030 RESET Num:0
-- Reason:Board register, BarCode is 030QAF10D7000030.
-- BootMode:NORMAL
-- BootCode:0x060100ff
- Handle the problem using the cause-specific operation recommended in Table 1-1.
Table 1-1 Board restart causes and troubleshooting instructions
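Cause Category | Cause | Description | Recommended Operation |
---|---|---|---|
User operations | Power off board from command. | A user has reset the board using a command or through the NMS. | Check whether the board is restarted using a command or from being powered off. |
User operations | Reset board from PIC command. | A user has reset the board using a command or through the NMS. | Check whether the board is restarted using a command or from being powered off. |
User operations | Reset the chassis from command. | A user has reset the board using a command or through the NMS. | Check whether the board is restarted using a command or from being powered off. |
User operations | Canbus request to power off the board. | A user has reset the board using a command or through the NMS. | Check whether the board is restarted using a command or from being powered off. |
System loading | EPLD is upgrade,and reset board. | The board restarts after the EPLD is loaded. | No operation is required. |
System loading | Board update by JTAG, and reset board. | The board restarts after it is upgraded using the JTAG channel. | No operation is required. |
System loading | Board update mbus, and reset board. | The board restarts after the MBus is upgraded. | No operation is required. |
Software exceptions | Board task exception occurs and reset lpu. | The system detects a software exception. | Collect user logs, diagnostic logs, and the display service-diagnostic-information command output, and contact Huawei. |
Software exceptions | Board task deadloop occurs and reset lpu. | The system detects a deadloop on the board. | Collect user logs, diagnostic logs, and the display service-diagnostic-information command output, and contact Huawei. |
Software exceptions | Component report failure. | The component fails to be reported. | Run the display reportfailure show_num [ begin_num ] [ verbose ] [ all \| slave \| slot slot-id ] command to check the cause. Collect user logs, diagnostic logs, and the display service-diagnostic-information command output, and contact Huawei. |
Software exceptions | Multiple-bit ECC check error, and reset board. | A soft error occurs on the chip. | Collect user logs, diagnostic logs, and the display service-diagnostic-information command output, and contact Huawei. |
Device management | The heartbeat lost and reset board. | The MPU does not receive heartbeat messages from the PIC. | Collect user logs, diagnostic logs, and the display service-diagnostic-information command output, and contact Huawei. |
Device management | Semls register failed, and reset board. | A board remains unregistered for a long time. | Collect the console port information recorded during board startup and contact Huawei. |
Hardware components | Slave mpu is not compatible with the master, and power off board. | Hardware is incompatible. | Replace the slave MPU so that two MPUs are compatible. |
Hardware components | This version does not support the type of board, and power off the board. | Hardware is incompatible. | Remove the PIC and replace it with a PIC supported by the device and slot. For details about the PIC models supported by the slots, see "Boards" in the Hardware Description of your product. |
Hardware components | Board is incompatible with chassis, and power off board. | Hardware is incompatible. | Remove the PIC and replace it with a PIC supported by the device and slot. For details about the PIC models supported by the slots, see "Boards" in the Hardware Description of your product. |
Hardware components | Board is incompatible with slot, and power off board. | Hardware is incompatible. | Remove the PIC and replace it with a PIC supported by the device and slot. For details about the PIC models supported by the slots, see "Boards" in the Hardware Description of your product. |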
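When many restart records need to be reviewed, the lookup in Table 1-1 can be sketched in code. The example below extracts the Reason field from saved display board-reset output and maps it to the cause category in the table. The function names are hypothetical, the parser assumes the "-- Reason:" layout shown in the sample output above, and matching by prefix is an assumption about how closely real reason strings follow the table.

```python
import re

# Cause strings and categories copied from Table 1-1 (trailing periods removed).
RESET_CAUSES = {
    "Power off board from command": "User operations",
    "Reset board from PIC command": "User operations",
    "Reset the chassis from command": "User operations",
    "Canbus request to power off the board": "User operations",
    "EPLD is upgrade,and reset board": "System loading",
    "Board update by JTAG, and reset board": "System loading",
    "Board update mbus, and reset board": "System loading",
    "Board task exception occurs and reset lpu": "Software exceptions",
    "Board task deadloop occurs and reset lpu": "Software exceptions",
    "Component report failure": "Software exceptions",
    "Multiple-bit ECC check error, and reset board": "Software exceptions",
    "The heartbeat lost and reset board": "Device management",
    "Semls register failed, and reset board": "Device management",
    "Slave mpu is not compatible with the master, and power off board": "Hardware components",
    "This version does not support the type of board, and power off the board": "Hardware components",
    "Board is incompatible with chassis, and power off board": "Hardware components",
    "Board is incompatible with slot, and power off board": "Hardware components",
}

# Capture the text after "Reason:" up to the next "--" field or end of line.
REASON_RE = re.compile(r"Reason:\s*(.*?)\s*(?:--|$)", re.MULTILINE)


def reset_reasons(output):
    """Return every Reason field found in display board-reset output."""
    return REASON_RE.findall(output)


def classify_reset(reason):
    """Return the Table 1-1 category for a reason, or None if unlisted."""
    key = reason.strip().rstrip(".").casefold()
    for cause, category in RESET_CAUSES.items():
        if key.startswith(cause.casefold()):
            return category
    return None


if __name__ == "__main__":
    sample = "-- Reason:The heartbeat lost and reset board.\n-- BootMode:NORMAL\n"
    for reason in reset_reasons(sample):
        print(reason, "->", classify_reset(reason))
```

A reason that is not in the table, such as the "Board register" record in the sample output, yields `None` and needs to be assessed manually.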
Troubleshooting Optical Module Faults
General Optical Module Fault Locating Procedure
- Check the model of an optical module. If it is not Huawei certified, replacing it with a Huawei-certified module is recommended.
- Run the display interface [ main | interface-type1 [ interface-number ] | slot slot-number ] command in the interface view to check the optical module interface information, such as its rate and wavelength. Check whether the information is consistent with the optical module specification described in the product manual.
- Use an optical power meter to measure the receive power on the interface.
NOTE:
If no optical power meter is available, replace the optical module with another one of the same model and check whether the new optical module works normally on the interface. Alternatively, run the display interface [ main | interface-type1 [ interface-number ] | slot slot-number ] command to check whether the transmit power and receive power of the optical module are consistent with the values in the product manual.
- Connect the transmitter and receiver of the optical module using the same fiber to create a loopback and check whether the interface can go up.
NOTE:
Optical fibers no longer than 10 km can be directly used for loopback. Longer optical fibers require optical attenuators for loopback.
- Check the remote interface configuration, such as the auto-negotiation state.
- Take actions based on the cause of the fault. If the optical module is not Huawei certified, replacing it with a Huawei-certified optical module is recommended. If the fault is caused by environmental factors or configurations, improve the environment or modify the configurations. If the problem persists, contact Huawei.
A Device Cannot Display Any Optical Module Information But Services Are Running Normally
Symptom
The optical module information cannot be queried using the display interface command, or the electronic label information of an optical module cannot be queried using the display elabel optical-module interface interface-number command.
Common Causes
- The optical module is not Huawei certified.
- The optical module is not properly installed.
- The optical module or device encounters exceptions.
Troubleshooting Procedure
- Check whether the optical module is Huawei certified. If not, contact the manufacturer of the optical module.
- Reinstall the optical module if possible, and check whether the optical module information can be displayed.
- If the problem persists, reboot or power off the device if possible. If the problem remains, check the system software version running on the device. If the device is not running the latest software version, you are advised to upgrade the system software to the latest version.
NOTICE:
Reinstalling optical modules, rebooting the device, or upgrading the system software affects services. Exercise caution before performing these operations.
- If all the preceding operations fail to rectify the fault, contact Huawei.
An Optical Module Does Not Transmit Optical Signals or Its Transmit Optical Power Is Too Low
Symptom
The actual transmit power of an optical module, as measured by an optical power meter, is lower than the nominal transmit power of the module. The display interface command output shows that the transmit power of the optical module is lower than the alarm threshold.
Common Causes
- The optical interface is dirty.
- The optical module is faulty.
Troubleshooting Procedure
- Check whether the optical interface is dirty. If it is, use a cotton swab to gently clean it. Then, re-test the transmit power. Use dust-proof caps to protect unused optical modules.
- If the transmit power of the optical module is still abnormal, remove the optical module and install it on another optical interface. If the fault persists, the optical module is faulty. Replace the optical module and send back the faulty one for repair or contact Huawei.
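The device reports optical power in dBm. If the optical power meter reports milliwatts instead, the reading must be converted before it is compared with the nominal transmit power. The sketch below uses the standard relation P(dBm) = 10 · log10(P(mW) / 1 mW). The function names are hypothetical.

```python
import math


def dbm_to_mw(dbm):
    """Convert optical power from dBm to milliwatts."""
    return 10 ** (dbm / 10)


def mw_to_dbm(mw):
    """Convert optical power from milliwatts to dBm."""
    if mw <= 0:
        raise ValueError("power in mW must be positive")
    return 10 * math.log10(mw)


if __name__ == "__main__":
    # 0 dBm corresponds to 1 mW by definition.
    print(dbm_to_mw(0.0))
    # Convert a meter reading in mW to dBm for comparison with CLI values.
    print(round(mw_to_dbm(0.25), 3))
```

Because the scale is logarithmic, a drop of about 3 dB corresponds to roughly half the optical power.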
An Optical Module Has Normal Transmit Power But Many Packets Are Dropped on the Interface Due to Bit Errors
Symptom
The peer NMS reports a CRC bit error alarm, and Output in the display interface interface-type interface-number extensive command output shows that excessive error packets are dropped on the local interface.
Common Causes
- The optical fiber has high connector loss or is bent too tightly (the bend radius is too small).
- The optical module is faulty.
Troubleshooting Procedure
- Use an optical power meter to measure the receive power of the optical module and check whether it is consistent with the receive power described in the product manual.
NOTE:
To measure the receive power of a fiber, place the optical power meter close to its receive end.
- If the receive power is below the threshold, check whether the fiber link is normal. Replace the optical fiber and check whether the peer end still reports the bit error alarm.
- If the problem persists, remove the optical module and install it on another optical interface. If the problem persists, the optical module is faulty. Return the optical module for repair or contact Huawei.
Optical Modules on the Local and Remote Devices Cannot Communicate
Symptom
A device is connected to a remote device through optical interfaces and fibers. The two interfaces are down and the devices cannot communicate with each other.
Common Causes
- The optical module used on the device is not Huawei certified.
- The optical module and fiber types are incompatible.
- One or both of the optical interfaces are shut down.
- The configurations on the local and remote interfaces are inconsistent.
- The transmit power of the optical module is too low or too high.
- The receive power of the optical module is too low or too high.
- The optical module fails.
Troubleshooting Procedure
- Check that the optical module is Huawei certified.
Using Huawei-certified optical modules on the device is recommended because other modules cannot provide required reliability and may prevent the interface from going up.
NOTE:
You can check whether a module is Huawei certified by checking its label. If the label has a Huawei logo, the optical module is Huawei certified. Otherwise, send the optical module model to Huawei to check whether it is certified.
- Check that the optical module matches the optical fiber.
If they do not match, replace one of them.
- A single-mode optical module (typically with a center wavelength of 1310 nm or 1550 nm) must be used with single-mode optical fibers (typically yellow).
- A multimode optical module (typically with a center wavelength of 850 nm) must be used with multimode optical fibers (typically orange).
- Check that the interface configurations of the local and peer devices are the same.
Run the display this interface command on both interfaces to check the bandwidth and auto-negotiation mode of both ends. If the bandwidth or auto-negotiation mode differs, modify the configurations so that they match.
- Check that the optical module works properly.
Run the display optical-module { extend | base } information interface { interface-type | interface-number } or display optical-module { extend | base } information slot slot-id pic pic-id command to check the working status of the optical module. Run the display alarm hardware command to check the alarm information of the optical module.
<HUAWEI> system-view
[HUAWEI] diagnose
[HUAWEI-diagnose] display optical-module extend information interface GigabitEthernet 0/7/0
==============================================================================
Transceiver Digital Diagnostic Monitoring (DDM), Externally Calibrated
=============|=================================================================
Card7-Port0 + Value HighAlarm HighWarn LowWarn LowAlarm Status
-------------|-----------------------------------------------------------------
Temperature(C) 48.500 80.000 75.000 -5.000 -10.000 Normal
Supply Voltage(V) 3.293 3.700 3.630 2.970 2.850 Normal
Tx Bias(mA) 21.600 58.091 48.091 4.762 4.762 Normal
Tx Power(avg dBm) -6.106 -3.103 -4.102 -8.102 -9.104 Normal
Rx Power(avg dBm) -4.025 -3.000 -3.000 -19.030 -19.030 Normal
=============|=================================================================
<HUAWEI> display alarm hardware
Index Level Date Time Info
1 critical 2018-10-27 14:58:18 GigabitEthernet0/4/4 is failed, the optical module on card was removed
-------------------------------------------------------------------
In the optical module alarm information, check the alarm causes.
- Low receive power
The strength of received signals is too low. A possible reason is that the remote interface is down or packets sent from the remote interface are dropped during transmission. In this case, check whether the distance between the two devices exceeds the maximum transmission distance of the optical module. If it does not, check whether the optical module on the remote interface or the optical fiber is damaged. If one is damaged, replace it.
- High receive power
The strength of received signals is too high. A possible reason is that the local and remote ends are nearby but a long-distance optical module is used on the remote end. In this case, install an optical attenuator on the remote module to reduce the transmit power.
- Low transmit power
The strength of signals sent from the local optical module is too low, or the optical module is faulty. The receive power on the remote end may be low, preventing the remote interface from going up, or causing the interface to drop packets. In this case, contact Huawei.
- High transmit power
The strength of signals sent from the local optical module is too high. This may cause a high receive power on the remote optical module. High receive power for a long time may burn the remote optical module. Another possible cause is that the local optical module has failed. In this case, replace the optical module.
To ensure normal communication between two optical interfaces, after the two interfaces are connected using optical modules and optical fibers, check for transmit and receive power alarms. Ensure that the transmit and receive power values of the two optical modules are within the normal ranges. Otherwise, traffic forwarding on the optical interfaces may be abnormal and the optical modules may become damaged.
- If neither end has reported any alarms, use a single fiber to loop back the interface.
NOTE:
Optical fibers no longer than 10 km can be directly used for loopback. Longer optical fibers require optical attenuators for loopback.
- If the looped-back interface does not go up, use another optical module and retry the loopback. If the problem persists, the device is faulty. In this case, contact Huawei.
- If the looped-back interface goes up, check whether the peer device and link are faulty.
- Replace the optical fiber or optical module and check whether the interface goes up. If the interface goes up, the original optical fiber or optical module is faulty. In this case, replace the faulty fiber or module. Otherwise, contact Huawei.
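The threshold check in step 4 can be sketched as a parser for the DDM rows of the display optical-module output. It assumes the column order shown in the sample output (Value, HighAlarm, HighWarn, LowWarn, LowAlarm, Status) with one row per line. The function names are hypothetical, and treating a value exactly equal to a threshold as normal is an assumption.

```python
import re

# One DDM row: a free-text name, five decimal numbers, and a status word.
ROW_RE = re.compile(
    r"^\s*(?P<name>\S.*?)\s+"
    r"(?P<value>-?\d+\.\d+)\s+(?P<high_alarm>-?\d+\.\d+)\s+"
    r"(?P<high_warn>-?\d+\.\d+)\s+(?P<low_warn>-?\d+\.\d+)\s+"
    r"(?P<low_alarm>-?\d+\.\d+)\s+(?P<status>\w+)\s*$"
)

NUMERIC_FIELDS = ("value", "high_alarm", "high_warn", "low_warn", "low_alarm")


def parse_ddm(text):
    """Return {row name: {numeric fields as float, 'status': str}}."""
    rows = {}
    for line in text.splitlines():
        match = ROW_RE.match(line)
        if match:
            row = {key: float(match.group(key)) for key in NUMERIC_FIELDS}
            row["status"] = match.group("status")
            rows[match.group("name")] = row
    return rows


def classify(row):
    """Compare a value with its thresholds, alarms before warnings."""
    if row["value"] > row["high_alarm"]:
        return "high alarm"
    if row["value"] < row["low_alarm"]:
        return "low alarm"
    if row["value"] > row["high_warn"]:
        return "high warning"
    if row["value"] < row["low_warn"]:
        return "low warning"
    return "normal"


if __name__ == "__main__":
    sample = (
        "Tx Power(avg dBm) -6.106 -3.103 -4.102 -8.102 -9.104 Normal\n"
        "Rx Power(avg dBm) -4.025 -3.000 -3.000 -19.030 -19.030 Normal\n"
    )
    for name, row in parse_ddm(sample).items():
        print(name, classify(row))
```

A "low alarm" or "high alarm" result on the Tx Power or Rx Power row corresponds to the low or high transmit and receive power cases described in step 4.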
Related Information
For more information about the fixed-configuration routers, see the following documents: