1. Check that the BMC has generated a watchdog timeout log, as shown in Figure 1.
Figure 1 Watchdog timeout log
In the log, timer use:SMS/OS indicates that the timer that timed out was the watchdog timer used by the OS. The log severity and the Timer expired description show that no action was executed on timeout and that the BMC did not generate an alarm. As a result, the server was not reset.
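The reading of the log description above can be sketched as a small shell helper. It is only an illustration: it assumes the log lines use the IPMI Watchdog 2 event wording as printed by ipmitool sel elist (Timer expired, Hard reset, Power down, Power cycle), and the function name classify_watchdog_event is hypothetical. The BMC web interface or other firmware may word these entries differently.

```shell
#!/bin/sh
# Classify one watchdog log line by the action named in its description.
# Assumption: the description uses the IPMI Watchdog 2 event wording.
# "Timer expired" means the timer ran out and no action was taken.
classify_watchdog_event() {
    case "$1" in
        *"Timer expired"*)               echo "expired-no-action" ;;
        *"Hard reset"*|*"Hard Reset"*)   echo "hard-reset" ;;
        *"Power down"*)                  echo "power-down" ;;
        *"Power cycle"*)                 echo "power-cycle" ;;
        *)                               echo "unknown" ;;
    esac
}

# Hypothetical usage on the affected server (requires ipmitool and BMC access):
#   ipmitool sel elist | grep -i watchdog | while IFS= read -r line; do
#       classify_watchdog_event "$line"
#   done
```

A result of expired-no-action corresponds to the situation in Figure 1: the timeout is logged, but the server is not reset.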
2. The server starts properly after a manual reset. Check the watchdog status: the watchdog timer is running on the OS, and the timeout action is Hard Reset, as shown in Figure 2.
Figure 2 Watchdog status
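One way to read individual fields from the watchdog status is sketched below. It assumes the status comes from ipmitool mc watchdog get and is printed as "Label: value" lines with the labels quoted in this case (Watchdog Timer Actions, Initial Countdown). Label names can differ between ipmitool versions, and the helper name watchdog_field is hypothetical.

```shell
#!/bin/sh
# Print the value of one field from "Label: value" status text on stdin.
# $1 = field label, that is, the text before the colon.
watchdog_field() {
    awk -F':[ \t]*' -v key="$1" '$1 == key { print $2; exit }'
}

# Hypothetical usage (requires ipmitool and access to the BMC):
#   ipmitool mc watchdog get | watchdog_field 'Watchdog Timer Actions'
#   ipmitool mc watchdog get | watchdog_field 'Initial Countdown'
```

On a healthy server, the first call would be expected to report Hard Reset, matching Figure 2.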
3. To find out why the watchdog took no action, run the history | grep "ipmitool" command on Linux to check whether any watchdog command was run manually. The history contains a command that stops the watchdog, as shown in Figure 3.
Figure 3 Watchdog operation history
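The history search can be narrowed to watchdog-related commands, as in the following sketch. It rests on several assumptions: the history is available as a file (for example ~/.bash_history, because the history builtin only works in an interactive shell), raw requests are written with the 0x06 0x2N spelling used in this case, and the function name find_watchdog_commands is hypothetical. Commands typed with other spellings (such as 6 0x22) or run by other users would not be matched.

```shell
#!/bin/sh
# Print ipmitool commands in a history file that touch the watchdog:
# any "watchdog" subcommand, or a raw request to NetFn 0x06 with
# command 0x22 (Reset), 0x24 (Set) or 0x25 (Get Watchdog Timer).
# $1 = path to the history file.
find_watchdog_commands() {
    grep -E 'ipmitool.*(watchdog|raw +0x06 +0x2[245])' "$1"
}

# Hypothetical usage:
#   find_watchdog_commands "$HOME/.bash_history"
```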
4. Figure 4 shows the watchdog status after it is stopped. The watchdog timer is still assigned to the OS but is stopped, the timeout action is No action, and Initial Countdown is 300 seconds. During this period, the system runs the ipmitool raw 0x06 0x22 command every 2 minutes. This command (Reset Watchdog Timer) only restarts the countdown and does not change the configured timeout action, so after it is run, Watchdog Timer Actions is still No action and Initial Countdown is still 300 seconds, as shown in Figure 5. As a result, the hardware watchdog performs no action when the OS stops responding, and the server is not reset. Only the watchdog timeout log shown in Figure 1 is generated.
Figure 4 Stopped watchdog
Figure 5 Resetting the watchdog timer after the watchdog is stopped
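The failure mode in step 4 could be detected by checking the watchdog configuration before each periodic timer reset. The sketch below is not the method used on this server. It assumes the status text has the "Label: value" layout of ipmitool mc watchdog get with the labels Watchdog Timer Is and Watchdog Timer Actions, and the function name watchdog_is_armed is hypothetical.

```shell
#!/bin/sh
# Return 0 if the status text on stdin describes a watchdog that is running
# AND has a timeout action other than "No action"; return 1 otherwise.
watchdog_is_armed() {
    awk -F':[ \t]*' '
        $1 == "Watchdog Timer Is"      { running = ($2 ~ /^Started/) }
        $1 == "Watchdog Timer Actions" { action  = (tolower($2) !~ /^no action/) }
        END { exit !(running && action) }
    '
}

# Hypothetical usage in the periodic job (requires ipmitool and BMC access):
#   if ipmitool mc watchdog get | watchdog_is_armed; then
#       ipmitool raw 0x06 0x22        # Reset Watchdog Timer
#   else
#       echo "watchdog not armed: timeout action or timer state is wrong" >&2
#   fi
```

With a check like this, a watchdog left in the No action state would be reported on the next cycle instead of being reset silently.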