Issue Description
An RH2288 server cannot enter the OS. The system displays the following information:
CPU 35: Machine Check Exception: 4 Bank 4: 0000000000000000
TSC 0
Kernel panic - not syncing: Uncorrected machine check
(Kernel panic - not syncing: nmi watchdog)
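The first line of this output can be decoded to see what the CPU actually reported. The sketch below assumes the line follows the Linux kernel's print_mce() layout, "CPU <n>: Machine Check Exception: <MCG_STATUS> Bank <n>: <MCi_STATUS>" (both values in hex). The bit names come from the Intel SDM machine-check architecture chapter. The helper names (decode_mce_line, set_flags) are hypothetical and exist only for illustration.

```python
import re

# Architectural bit positions (Intel SDM Vol. 3, machine-check architecture).
MCG_STATUS_BITS = {0: "RIPV", 1: "EIPV", 2: "MCIP"}
MCI_STATUS_BITS = {63: "VAL", 62: "OVER", 61: "UC", 60: "EN",
                   59: "MISCV", 58: "ADDRV", 57: "PCC"}

# Assumption: the console line uses the print_mce() layout shown above.
MCE_LINE = re.compile(
    r"CPU (\d+): Machine Check Exception: ([0-9a-fA-F]+) Bank (\d+): ([0-9a-fA-F]+)"
)


def set_flags(value, bits):
    """Return the names of the set bits, highest bit first."""
    return [name for bit, name in sorted(bits.items(), reverse=True)
            if (value >> bit) & 1]


def decode_mce_line(line):
    """Split an MCE console line into CPU, bank and decoded status flags."""
    m = MCE_LINE.search(line)
    if m is None:
        raise ValueError("not a recognized MCE line")
    cpu, mcg, bank, status = m.groups()
    return {
        "cpu": int(cpu),
        "bank": int(bank),
        "mcg_flags": set_flags(int(mcg, 16), MCG_STATUS_BITS),
        "bank_flags": set_flags(int(status, 16), MCI_STATUS_BITS),
    }


if __name__ == "__main__":
    print(decode_mce_line(
        "CPU 35: Machine Check Exception: 4 Bank 4: 0000000000000000"))
```

Under this assumption, MCG_STATUS 0x4 sets only MCIP (machine check in progress). RIPV is clear, so the interrupted code cannot be safely restarted and the kernel panics. A bank 4 status of all zeros has VAL clear, meaning the bank holds no valid error record. This is consistent with the console output alone not identifying the faulty component.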
Handling Process
The problem persists even after the mainboard has been replaced three times. During the second replacement, the on-site engineer finds a bent pin in the CPU socket. After the third replacement, the system still stops at the preceding information.
The BMC logs show that a CAT error (catastrophic error) alarm was generated for CPU2 on August 9.
Inspection of the faulty mainboard shows a bent pin in the CPU2 socket. After the faulty mainboard is replaced with a new one, the problem is resolved.
All three mainboards used in the earlier replacements also have a bent pin in the CPU socket, which explains why the problem persisted after each of them was installed.
Root Cause
A pin in the CPU socket is bent. As a result, the server cannot enter the OS after it is powered on.