An RH5885H V3 server is configured with two physical CPUs. After Oracle Linux 6.5 has been running for a period of time, the following alarms are reported: "smpboot: CPU1: Not responding; smpboot: CPU3: Not responding; smpboot: CPU5: Not responding". When FusionServer Tools-Toolkit is used to mount the system, the message "CPU1: Not responding" is still displayed. However, the BMC shows that the hardware is normal and no alarm log is generated, and the self-check at server startup is also normal.
The following figure shows the alarm reported during Oracle Linux 6.5 startup.
The following figure shows the alarm displayed when FusionServer Tools-Toolkit-V119 is used to mount the system.
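To confirm which CPUs the kernel reports as not responding, the saved startup log can be scanned for the message quoted above. The sketch below is a minimal example that assumes the kernel log has been saved to a text file (for example with `dmesg`); the function name `non_responding_cpus` is illustrative only. The numbers it returns are logical CPU IDs. Which physical CPU (socket) they belong to depends on how the kernel enumerates cores, which this case does not specify.

```python
import re
import sys
from typing import Iterable, List

# Matches kernel messages such as "smpboot: CPU1: Not responding".
# Any prefix (a dmesg timestamp, the "smpboot: " tag) is tolerated.
NOT_RESPONDING = re.compile(r"CPU(\d+): Not responding")


def non_responding_cpus(log_lines: Iterable[str]) -> List[int]:
    """Return the sorted, de-duplicated logical CPU IDs reported as not responding."""
    cpus = set()
    for line in log_lines:
        # finditer handles several messages on one line, e.g. "...; smpboot: CPU3: ..."
        for match in NOT_RESPONDING.finditer(line):
            cpus.add(int(match.group(1)))
    return sorted(cpus)


if __name__ == "__main__":
    # Read a saved kernel log from standard input.
    print(non_responding_cpus(sys.stdin))
```

Saved as a script, it can be fed the kernel log directly, for example `dmesg | python3 check_smpboot.py` (the script name is hypothetical).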
1. Use the minimization test method to locate the faulty physical CPU.
2. Test each of the two physical CPUs separately. Both CPUs are normal.
3. Replace the PCBA (mainboard) to rule out a mainboard fault. After the mainboard is replaced, the system starts normally with a single CPU but still fails to start with two CPUs, and the message "smpboot: CPU1: Not responding" is displayed.
4. Disconnect all cables from the rear panel. The system then starts properly.
5. Disconnect and reconnect the cables one at a time. The host starts properly when the USB cable of the KVM is disconnected.
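Steps 4 and 5 amount to a leave-one-out search over the rear-panel cables. The sketch below models that search under stated assumptions: `boots_ok` is a hypothetical stand-in for a manual boot test (connect exactly the listed cables, power on, and report whether the OS starts without "Not responding" messages), and the cable names are illustrative.

```python
from typing import Callable, List, Sequence


def find_suspect_cables(
    cables: Sequence[str],
    boots_ok: Callable[[List[str]], bool],
) -> List[str]:
    """Leave-one-out search for cables whose presence prevents a normal startup.

    boots_ok(connected) is a hypothetical manual test: connect exactly the
    cables in `connected`, power on, and return True if the OS starts normally.
    Cable names are assumed to be unique.
    """
    connected = list(cables)
    if not boots_ok([]):
        # Step 4 analogue: the bare system must start, otherwise the cables are not the cause.
        raise RuntimeError("fault persists with no cables connected")
    if boots_ok(connected):
        # The fault is not reproduced with every cable connected; nothing to isolate.
        return []
    # Step 5 analogue: remove one cable at a time; a cable whose removal alone
    # lets the system start is a suspect.
    return [c for c in connected
            if boots_ok([other for other in connected if other != c])]
```

With a simulated fault such as `boots_ok = lambda connected: "KVM USB" not in connected`, the search over `["power", "network", "KVM USB"]` returns `["KVM USB"]`. Leave-one-out only finds a cable whose removal alone cures the fault; if either of two cables can independently cause it, no single removal helps and the result is empty, so the one-at-a-time add-back variant would be needed instead.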
The BMC information and the minimization test results show that the host hardware is normal. The fault is caused by the USB cable connected to the external KVM: while this cable is connected, a physical CPU fails to respond to the OS during startup, so the "smpboot: CPU1: Not responding" alarm is displayed on the system startup screen.
After communicating with the customer, replace the backplane KVM and its USB cable.
Log in to the BMC through its management port to query fault alarms and event logs. If no alarm is recorded, use the minimization test method to locate the fault.