Two physical CPUs are configured on the
RH5885H V3. After Oracle Linux 6.5 has run for a period of time, the following
alarms are reported: "smpboot: CPU1: Not responding; smpboot: CPU3: Not
responding; smpboot: CPU5: Not responding". When the server is started from
the FusionServer Tools-Toolkit, the message "CPU1: Not responding" is still
displayed. However, the BMC shows that the hardware is normal, and no alarm
log is generated. In addition, the self-check upon server startup is normal.
The Oracle Linux 6.5 startup alarm is shown in the following figure.
The following figure shows the alarm displayed when
FusionServer Tools-Toolkit-V119 is used.
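To list which logical CPUs are affected, the alarm text quoted above can be
matched in a saved boot log. The following is a minimal sketch, assuming the
log lines use the same "smpboot: CPUn: Not responding" wording as in this
case; the function name unresponsive_cpus and the sample text are
illustrative only.

```python
import re

# Pattern for the kernel message quoted in this case,
# e.g. "smpboot: CPU1: Not responding".
NOT_RESPONDING = re.compile(r"smpboot: CPU(\d+): Not responding")


def unresponsive_cpus(log_text):
    """Return the sorted logical CPU numbers reported as not responding."""
    return sorted({int(n) for n in NOT_RESPONDING.findall(log_text)})


# Sample text built from the alarms quoted in this case.
sample = (
    "smpboot: CPU1: Not responding\n"
    "smpboot: CPU3: Not responding\n"
    "smpboot: CPU5: Not responding\n"
)
print(unresponsive_cpus(sample))  # prints [1, 3, 5]
```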
1. Use the minimization
test method to locate the faulty physical CPU.
2. Test the two physical CPUs one after another.
The test result shows that the CPUs are normal.
3. Replace the mainboard (PCBA) to rule out a
mainboard fault. After the mainboard is replaced, the system starts normally
with a single CPU but still fails with two CPUs, and the message
"smpboot: CPU1: Not responding" is displayed.
4. Remove all cables from the rear panel. The
system starts properly.
5. Remove and reconnect the cables one at a
time. The host starts properly after the USB cable of the KVM is removed.
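The cable isolation in steps 4 and 5 can be sketched as a simple
one-at-a-time elimination. This is only an illustration of one reading of
the procedure, not a tool used in this case: the cable names and the
boots_ok callback are hypothetical, and in practice each "boot check" is a
manual restart of the host.

```python
def find_suspect_cables(cables, boots_ok):
    """Return the cables whose removal alone lets the host start properly.

    cables   -- names of the cables on the rear panel (hypothetical labels)
    boots_ok -- hypothetical callback: takes the set of connected cables and
                returns True if the host starts without the smpboot alarm
    """
    all_connected = set(cables)
    if boots_ok(all_connected):
        return []  # the fault does not reproduce, nothing to isolate
    suspects = []
    for cable in cables:
        # Remove exactly one cable, keep the others connected, and retest.
        if boots_ok(all_connected - {cable}):
            suspects.append(cable)
    return suspects


# Simulated host, for illustration only: startup fails whenever the
# KVM USB cable is connected.
def simulated_boot(connected):
    return "kvm-usb" not in connected


print(find_suspect_cables(["network", "vga", "kvm-usb"], simulated_boot))
# prints ['kvm-usb']
```

Testing with all cables removed first, as in step 4, confirms that the fault
is external before any single cable is singled out.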
The BMC information collected during the
minimization test shows that the host hardware is normal. The fault lies in
the USB cable connected to the external KVM: with this cable connected, the
physical CPU cannot respond to the OS during startup. As a result, the
"smpboot: CPU1: Not responding" alarm is displayed on the system startup
screen.
Coordinate with the customer to replace the
external KVM and its USB cable.
Log in to the BMC through the management port
and check the fault alarms and event logs. If no alarm is generated, use the
minimization test method to locate the fault.