Handling NMI Alarms for the Optical Channel Diagnosis Panel in an RH5485

Publication Date: 2015-06-18
Issue Description
Figure 1 and Figure 2 show the optical channel diagnosis panel. The NMI alarm indicator is marked with a red box. When it is lit, the system has recorded a hardware error.

Figure 1 Optical channel diagnosis panel



Figure 2 Diagram of the optical channel diagnosis panel

Handling Process
1.  Log in to the integrated management module (IMM) to view the IMM event log, and find the following alarm logs:

18. I --  -- 8/12/2011:10:8:50 -- System "SN# 99C8637" has recovered from an Uncorrectable Bus Error
19. E --  -- 8/12/2011:10:8:38 -- A Uncorrectable Bus Error has occurred on system "SN# 99C8637"

2.  Remove the alternating current (AC) power supplies from the server and wait 15 minutes. Then power on and restart the system. The alarms are cleared.
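The check in step 1 can be sketched in Python as follows. This is a minimal illustration, not a Huawei or IMM tool: it assumes the event log has been exported as plain text in the same layout as the two sample entries above, and that the timestamp reads month/day/year (the sample alone does not settle the order). The names `parse_entry` and `has_recovered` are hypothetical.

```python
import re
from datetime import datetime

# Pattern modelled on the two sample entries in this article.
# Other IMM firmware versions may format log lines differently.
ENTRY = re.compile(
    r'^\s*(?P<index>\d+)\.\s+(?P<severity>[A-Z])\s+--\s+--\s+'
    r'(?P<stamp>\d+/\d+/\d+:\d+:\d+:\d+)\s+--\s+(?P<message>.*)$'
)
SERIAL = re.compile(r'"SN# (?P<serial>[^"]+)"')


def parse_entry(line):
    """Split one log line into its fields, or return None if it does not match."""
    m = ENTRY.match(line)
    if m is None:
        return None
    serial = SERIAL.search(m["message"])
    return {
        # Assumption: month/day/year order; fields are not zero-padded.
        "time": datetime.strptime(m["stamp"], "%m/%d/%Y:%H:%M:%S"),
        "severity": m["severity"],
        "serial": serial["serial"] if serial else None,
        "message": m["message"],
    }


def has_recovered(lines, error_name="Uncorrectable Bus Error"):
    """Return True if the newest entry naming the error is a recovery message.

    Returns None when no entry mentions the error at all.
    """
    entries = [e for e in map(parse_entry, lines) if e and error_name in e["message"]]
    if not entries:
        return None
    latest = max(entries, key=lambda e: e["time"])
    return "has recovered from" in latest["message"]
```

Passing the two sample entries to `has_recovered` reports that the error was followed by a recovery, which is the condition under which the power-cycle in step 2 is used to clear the indicator.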
Root Cause
Once a hardware error occurs and is recorded in the system log, the NMI alarm indicator lights up and stays lit even if the system recovers. The PCI or MEM indicator may be lit at the same time. If neither the PCI nor the MEM indicator is lit, the system may already have recovered from the error, but the system log still holds the record. In this case, restart the server.

Peripheral Component Interconnect Express (PCIe) devices are installed in the server's PCIe slots. Poor contact caused a transient PCIe bus error that was recorded in the system log, so the NMI indicator is lit.

If a hardware error is recorded in the system log, the NMI indicator is lit.

NOTE:
Poorly seated memory modules and PCIe devices are prone to causing hardware errors. Such an error may clear automatically, or it may clear only after maintenance personnel handle it.
Solution
If the NMI indicator is lit and the system log shows that the system has recovered from the hardware error, clear the alarm by using any of the following methods:
  • Shut down the system, remove the AC PSUs from the server, and wait 15 minutes. Then restart the server.
  • Reseat the devices that are prone to causing the hardware error (including PCIe devices and memory modules).
  • Delete the alarm logs that trigger the NMI alarm from the system log.
If the NMI alarm indicator is lit and the MEM or PCI alarm indicator is also lit, resolve the problem based on the information indicated by the MEM or PCI indicator.
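The decision described in this section can be summarized as a small lookup. The sketch below is only an illustration: the function name, parameter names, and returned strings are hypothetical, and the final branch (NMI lit, no MEM or PCI indicator, no recovery entry in the log) is not covered by this article, so it is marked as such.

```python
def recommend_action(nmi_lit, mem_lit, pci_lit, log_shows_recovery):
    """Map indicator states and the log check to the action this article describes."""
    if not nmi_lit:
        # No NMI alarm, so nothing in this article applies.
        return "no NMI alarm: no action needed"
    if mem_lit or pci_lit:
        # A second indicator points at the failing subsystem.
        return "follow the MEM or PCI indicator information"
    if log_shows_recovery:
        # Recovered error: the indicator only reflects the stale log record.
        return ("clear the alarm: power-cycle after removing the AC PSUs for "
                "15 minutes, reseat PCIe devices and memory, or delete the alarm logs")
    # Assumption: this case is outside the article's scope.
    return "not covered by this article: investigate further"
```

For the case reported here, `recommend_action(True, False, False, True)` returns the "clear the alarm" branch.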
Suggestions
On the RH5485, PCIe devices are held in their slots by latches, which makes them prone to contact problems that trigger NMI alarms. To clear the alarms, reseat the PCIe devices, or remove the AC PSUs and restart the server.
