Handling MEM Alarms for the Optical Channel Diagnosis Panel in an RH5485

Publication Date: 2015-06-18
Issue Description
Hardware configuration:
RH5485 server

Symptom:

1.  Start the RH5485. The power-on self-test (POST) enters the memory initialization phase, as shown in Figure 1.

Figure 1 Memory initialization



2.  The system reports that memory initialization has failed. The MEM indicator on the optical channel diagnosis panel (in the red box in Figure 2) is lit, and the system error indicator on the operator information panel (Figure 3) is also lit.

Figure 2 MEM indicator on the optical channel diagnosis panel



Figure 3 Operator information panel



3.  On the memory expansion card, the memory expansion card/dual in-line memory module (DIMM) error indicator and the DIMMx error indicator are lit, as shown in Figure 4.

NOTE:
x is an integer ranging from 1 to 8.

Figure 4 Indicators and button of the memory expansion card




Handling Process
1.  Check the fault type for the memory system.

If alarms occur on a memory expansion card, install the card in another slot. If the fault symptom changes, the card or its DIMMs have a physical fault. If the fault symptom persists, the card and its DIMMs have no physical fault.

2.  If no physical fault occurs on the memory expansion card or its DIMMs, modify the memory configuration.

For example, if the memory expansion card that raises alarms is configured with four DIMMs, remove or add DIMMs. If the server is configured with more than two memory expansion cards, remove or add memory expansion cards. After modifying the memory configuration, restart the server. The system then recalculates the power consumption of the server, verifies the DIMMs or memory expansion cards again, and re-enables disabled memory slots, which clears the alarms. Then restore the original DIMM configuration.

3.  If a physical fault occurs on the memory expansion card or its DIMMs, continue to locate the fault.

If DIMMx error indicators are lit in pairs, for example the DIMM 1 and DIMM 8 error indicators, exchange DIMM 1 and DIMM 3.
  • If only the DIMM 3 error indicator is lit, the original DIMM 1 (that is, DIMM 3 after the exchange) is faulty.
  • If the DIMM 1 and DIMM 8 error indicators are still lit, exchange DIMM 6 and DIMM 8. If only the DIMM 6 error indicator is lit, the original DIMM 8 (that is, DIMM 6 after the exchange) is faulty.
  • If the DIMM 1 and DIMM 8 error indicators are still lit after DIMM 1 and DIMM 3 are exchanged back, replace the memory expansion card.
If only one DIMMx error indicator is lit, for example the DIMM 1 error indicator, exchange DIMM 1 and DIMM 3.
  • If only the DIMM 3 error indicator is lit, the original DIMM 1 (that is, DIMM 3 after the exchange) is faulty.
  • If the DIMM 1 error indicator is still lit, replace the memory expansion card.
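
The single-indicator swap test above can be written down as a small decision function. This is only an illustrative sketch for recording the result of a manual check: the names Verdict and interpret_single_swap are hypothetical, the function does not talk to the server, and it covers only the case where one DIMMx error indicator was lit before the exchange.

```python
from enum import Enum
from typing import AbstractSet


class Verdict(Enum):
    DIMM_FAULTY = "replace the DIMM that was moved"
    CARD_FAULTY = "replace the memory expansion card"
    INCONCLUSIVE = "inconclusive; recheck the indicators"


def interpret_single_swap(suspect: int, partner: int,
                          lit_after_swap: AbstractSet[int]) -> Verdict:
    """Interpret the DIMMx error indicators after exchanging two DIMMs.

    suspect        -- slot whose indicator was lit before the exchange
    partner        -- slot the suspect DIMM was exchanged with
    lit_after_swap -- slot numbers whose indicators are lit afterwards
    """
    if lit_after_swap == {partner}:
        # The alarm followed the module to its new slot: the DIMM is faulty.
        return Verdict.DIMM_FAULTY
    if suspect in lit_after_swap:
        # The alarm stayed on the original slot: the card is at fault.
        return Verdict.CARD_FAULTY
    # Any other pattern is not covered by the procedure above.
    return Verdict.INCONCLUSIVE


# Example from the text: DIMM 1 was lit, DIMM 1 and DIMM 3 were exchanged,
# and afterwards only the DIMM 3 indicator is lit.
print(interpret_single_swap(1, 3, {3}).value)
```

The paired-indicator case needs the additional DIMM 6 and DIMM 8 exchange described above, so it is deliberately left out of this sketch.
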
Solution
For a memory expansion card that is physically damaged, replace the damaged parts with spare parts.

For a memory expansion card that is not physically damaged, common solutions are as follows:

  • Modify the memory configuration (that is, change the number of memory expansion cards or DIMMs). When the server restarts, the system recalculates the power consumption, verifies the DIMMs or memory expansion cards, and re-enables disabled memory slots.
  • Remove the complementary metal oxide semiconductor (CMOS) battery from the main board, reset the system clock, and reinstall the battery. When the server restarts, the system verifies the DIMMs or memory expansion cards again and re-enables disabled memory slots.
  • Upgrade the firmware. In 2011, IBM resolved the Q2 firmware problem in which memory slots could not be enabled in the Unified Extensible Firmware Interface (UEFI). With the latest firmware, the UEFI can enable a memory slot without the CMOS battery being removed from the main board.
In rare cases, a DIMM compatibility bug causes memory alarms even though the memory system has no physical fault, and the alarm symptom changes depending on which memory expansion card the DIMMs are installed on. In this case, rearrange the order of the DIMMs on the memory expansion card, and then remove and reinsert the card to clear the alarms.
Suggestions
Physical faults on a memory expansion card or its DIMMs are uncommon. Most memory alarms are caused by poor contact: a verification error occurs during the memory self-test, and the system automatically disables the DIMMs in the affected slots.

END