Wrong Slot Display in Memory Configuration Error

Publication Date: 2016-09-29

Issue Description

The BMC interface reports the following error: Memory(DIMM020) Configuration error.

The problem persists after swapping the module in DIMM020 with the module in DIMM120, and after swapping CPU1 and CPU2.

In both cases the same error, Memory(DIMM020) Configuration error, is still reported.

The engineer therefore concluded that the main board was faulty and requested a spare part to replace it. However, the same error is still reported after the main board was replaced.


Alarm Information

Memory(DIMM600) Configuration error

Handling Process

The problem was reproduced in the lab. With a faulty memory module installed in DIMM600 and a good module in DIMM601, the reported error is Memory(DIMM600) Configuration error.

With a good module installed in DIMM600 and a faulty module in DIMM601, the system reports the same error: Memory(DIMM600) Configuration error.

Log with the faulty module in DIMM600 and the good module in DIMM601:


Log with only the faulty module installed, in DIMM600:


Log with the good module in DIMM600 and the faulty module in DIMM601:


In all three scenarios, the warning code is the same.

The analysis result from the memory manufacturer is shown below:





Root Cause

The memory manufacturer found that the problem is caused by a VrefCA short circuit, which in turn is caused by a faulty capacitor.

VrefCA is shared by the DIMMs on the same channel. When the VrefCA short circuit occurs, the BIOS reports the error against the first DIMM in that channel, regardless of which DIMM in the channel is actually faulty.
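
The reporting behaviour described above can be sketched as a small Python model. This is only an illustration, not the actual BIOS logic. It assumes that a slot name is "DIMM" followed by three digits, that the last digit is the position within the channel, and that position 0 is the "first" slot. These assumptions are inferred from the DIMM600/DIMM601 lab results, and the function name reported_alarms is hypothetical.

```python
def reported_alarms(faulty_slots):
    """Return the alarm strings that channel-level faults would produce.

    Assumption (not from vendor documentation): the slot name without its
    last digit identifies the channel, and the alarm always names
    position 0 of that channel.
    """
    channels = {slot[:-1] for slot in faulty_slots}
    return sorted(f"Memory({channel}0) Configuration error" for channel in channels)


# Lab scenarios: the fault may sit in either slot of the channel.
print(reported_alarms(["DIMM600"]))
print(reported_alarms(["DIMM601"]))
```

Under these assumptions both calls return the same single alarm against DIMM600, which matches the lab result that the alarm alone cannot tell the two slots apart.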

Solution

DIMM020 and DIMM021 belong to the same channel, as do DIMM120 and DIMM121. Swap the modules between these two channels to verify which module is actually faulty.

In this case, the module in DIMM021 was found to be faulty, and the error disappeared after that module was replaced.
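
The swap test can be read as a simple rule: if the alarm moves to the other channel after a swap, the swapped module is the likely culprit; if it stays, the fault remained in the original channel. The sketch below expresses that rule in Python. The helper names channel_of and alarm_moved are hypothetical, and the mapping from slot name to channel (slot name without its last digit) is an assumption based on the slot names in this case.

```python
def channel_of(slot):
    # Assumption: the slot name without its last digit identifies the channel.
    return slot[:-1]


def alarm_moved(alarm_slot_before, alarm_slot_after):
    """Return True if the alarm moved to another channel after a swap.

    If it moved, the swapped module is the likely culprit. If it did not,
    the fault stayed behind in the original channel.
    """
    return channel_of(alarm_slot_before) != channel_of(alarm_slot_after)


# From the case: swapping DIMM020 with DIMM120 left the alarm on DIMM020,
# so the module originally in DIMM020 was not the cause.
print(alarm_moved("DIMM020", "DIMM020"))  # False

# Hypothetical outcome, not recorded in the case: the alarm now names DIMM120.
print(alarm_moved("DIMM020", "DIMM120"))  # True
```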

Suggestions

Because the data, control, address, and power lines are all shared by the DIMMs on one channel, a single fault can cause the whole channel to be isolated.

Therefore, consider every DIMM on the channel, not only the reported slot, when handling Memory Configuration Error issues.
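
As a minimal sketch of this suggestion, the Python function below expands a reported slot into the slots that should be checked. It assumes two DIMM slots per channel and that the last digit of the slot name is the position within the channel. Both assumptions are inferred from the slot names in this case and may not hold on other platforms. The function name suspect_slots and the parameter dimms_per_channel are hypothetical.

```python
def suspect_slots(reported_slot, dimms_per_channel=2):
    """List every slot on the same channel as the reported slot.

    Assumption: the slot name without its last digit identifies the
    channel, and positions are numbered from 0.
    """
    channel = reported_slot[:-1]
    return [f"{channel}{position}" for position in range(dimms_per_channel)]


print(suspect_slots("DIMM020"))  # ['DIMM020', 'DIMM021']
```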
