The BMC interface reports an error like this: Memory(DIMM020) Configuration error.
The problem persists after swapping DIMM020 with DIMM120 and swapping CPU1 with CPU2.
The error Memory(DIMM020) Configuration error is still reported.
The engineer therefore concluded that the main board was faulty and requested a spare part to replace it. However, the same error was still reported after the main board was replaced.
Memory(DIMM600) Configuration error
To reproduce the problem in the lab, install one faulty memory module in DIMM600 and one good memory module in DIMM601. The reported error is Memory(DIMM600) Configuration error.
If we install one good memory module in DIMM600 and one faulty memory module in DIMM601, the system shows the same error: Memory(DIMM600) Configuration error.
This is the log when only the faulty memory module is installed in DIMM600:
This is the log when the good memory module is installed in DIMM600 and the faulty memory module in DIMM601:
In all three of these scenarios, the warning code is the same.
Below is the analysis result from the memory manufacturer:
The memory manufacturer found that the problem is caused by a VrefCA short circuit, and that the short circuit is caused by a faulty capacitor.
However, VrefCA is shared by all DIMMs in the same channel. When a VrefCA short circuit occurs, the BIOS reports the first memory slot in that channel as faulty, regardless of which module actually has the fault.
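The reporting behavior observed above can be written down as a small model. This is only a sketch of what the lab test showed, not the actual BIOS logic. It assumes that a label is "DIMM" followed by three digits, that the last digit is the slot within the channel, and that the preceding digits identify the channel. The function names are hypothetical.

```python
def channel_of(label: str) -> str:
    """Return the channel prefix of a DIMM label.

    Assumption: the label is 'DIMM' plus three digits, and the last
    digit is the slot number within the channel (e.g. DIMM600, DIMM601).
    """
    return label[:-1]


def reported_label(faulty_label: str) -> str:
    """Model of the observed behavior: a fault anywhere in the channel
    is reported against the first slot (slot 0) of that channel."""
    return channel_of(faulty_label) + "0"


# Reproduce the cases described in this document.
for faulty in ("DIMM600", "DIMM601", "DIMM021"):
    print(faulty, "->", reported_label(faulty))
```

Under these assumptions, a faulty module in DIMM601 maps to a report against DIMM600, which matches the lab result.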
DIMM020 and DIMM021 share one channel, as do DIMM120 and DIMM121. Swap the modules between these slots to verify which one is actually faulty.
In this case DIMM021 was found to be faulty, and the error disappeared after DIMM021 was replaced.
Because the data, control, address, and power lines are all shared within one multiplexed channel, a single fault may cause the whole channel to be isolated.
So you should consider the whole channel, not only the reported slot, when handling Memory Configuration Error issues.
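As a troubleshooting aid, the reported slot can be expanded into the full list of slots to check. The sketch below parses a BMC message in the format shown in this document and lists every slot in the same channel. It assumes two slots per channel and the same label layout as the examples here (last digit is the slot number). Both are assumptions that should be checked against the server's hardware guide, and the names SLOTS_PER_CHANNEL and suspect_slots are hypothetical.

```python
import re
from typing import List

# Assumption: two DIMM slots per channel, as in the DIMM600/DIMM601 example.
SLOTS_PER_CHANNEL = 2

# Matches messages such as "Memory(DIMM020) Configuration error".
_PATTERN = re.compile(r"Memory\((DIMM\d{3})\) Configuration error")


def suspect_slots(bmc_message: str) -> List[str]:
    """Return every slot in the channel of the reported DIMM."""
    match = _PATTERN.search(bmc_message)
    if match is None:
        raise ValueError("not a Memory Configuration error message")
    # Drop the last digit (the slot number) to get the channel prefix.
    channel_prefix = match.group(1)[:-1]
    return [f"{channel_prefix}{slot}" for slot in range(SLOTS_PER_CHANNEL)]


print(suspect_slots("Memory(DIMM020) Configuration error"))
```

For the error in this case, the list contains both DIMM020 and DIMM021, so the module in DIMM021 would be tested before replacing CPUs or the main board.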