The CH121 V3 node in the E9000 chassis rebooted without any apparent reason.
After the node had rebooted several times, the following alarm was found in the logs: System Restart [Unknown][IPMB]
After the BIOS and iBMC were upgraded to the latest version, the issue persisted: the CH121 V3 node kept rebooting.
The logs were checked and nothing in them was related to a Major alarm. In the fdm_log file, the CPU reported some errors:
The root cause of the system restart is a kernel BUG:
What this means: the kernel BUG points to the driver of your Emulex MZ910 card. The problem is hit in a kworker thread on the CPU: CPU: 11 PID: 10375 Comm: kworker/u80:0 Not tainted 3.10.0-327.28.3.el7.x86_64 #1, where 3.10.0-327.28.3.el7.x86_64 is the kernel version.
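The header line quoted above can be split into its fields with a small shell helper. This is only a sketch: the function name parse_oops_header is made up for illustration, and the pattern assumes the line has exactly the layout shown in this case (CPU, PID, Comm, taint state, kernel version, build number).

```shell
# Hypothetical helper: split a kernel "CPU: ... PID: ... Comm: ..." header line
# into named fields. Assumes the field order shown in the log line of this case.
parse_oops_header() {
    # $1: the header line as it appears in the kernel log
    printf '%s\n' "$1" | sed -n \
        's/^.*CPU: \([0-9][0-9]*\) PID: \([0-9][0-9]*\) Comm: \([^ ]*\) \(.*\) \([^ ]*\) #[0-9][0-9]*.*$/cpu=\1 pid=\2 comm=\3 taint="\4" kernel=\5/p'
}

parse_oops_header 'CPU: 11 PID: 10375 Comm: kworker/u80:0 Not tainted 3.10.0-327.28.3.el7.x86_64 #1'
# prints: cpu=11 pid=10375 comm=kworker/u80:0 taint="Not tainted" kernel=3.10.0-327.28.3.el7.x86_64
```

If the line does not match the assumed layout, the helper prints nothing, which makes it easy to spot log lines in a different format.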
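What is kworker? A kworker is a Linux kernel worker thread that does deferred "work" on behalf of the kernel (for example servicing interrupts, timers and I/O). You can have several of them in your process list: kworker/0:1 is the one on your first CPU core, kworker/1:1 is the one on the second, and so on.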
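The naming scheme can be decoded with a few lines of shell. This is a sketch under one assumption that goes beyond the text above: in the kernel's naming convention, a "u" after the slash (as in kworker/u80:0 from the log line of this case) marks an unbound worker that is not tied to a single CPU core, and the number that follows is a pool number rather than a core number. The function name describe_kworker is hypothetical.

```shell
# Hypothetical helper: explain what a kworker thread name refers to.
# Assumption: "kworker/<cpu>:<id>" is bound to one core, while
# "kworker/u<pool>:<id>" is an unbound worker that may run on any core.
describe_kworker() {
    case "$1" in
        kworker/u*:*)
            pool=${1#kworker/u}   # drop the "kworker/u" prefix
            pool=${pool%%:*}      # keep everything before the first ":"
            echo "unbound worker, pool $pool (not tied to one CPU core)" ;;
        kworker/*:*)
            cpu=${1#kworker/}     # drop the "kworker/" prefix
            cpu=${cpu%%:*}        # keep everything before the first ":"
            echo "bound worker on CPU core $cpu" ;;
        *)
            echo "not a kworker name" ;;
    esac
}

describe_kworker kworker/0:1
describe_kworker kworker/u80:0
```

Under that assumption, the thread named in the BUG line is an unbound worker, so "CPU: 11" only tells you where it happened to be running when the fault was hit.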
Why does kworker hog your CPU? To find out why a kworker is wasting CPU time, you need to create CPU backtraces: watch your processor load (with top or a similar tool) and, in moments of high load caused by kworker, execute echo l > /proc/sysrq-trigger to create a backtrace. (On Ubuntu, you need to be logged in as root with sudo -s for this.)
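The backtrace step can be wrapped in a small function so that it is easy to repeat during a load spike. This is a minimal sketch, not a vendor procedure: the function name capture_cpu_backtrace and its two optional parameters (the trigger path and the number of log lines to show) are introduced here for illustration, and it assumes the backtrace is written to the kernel log that dmesg reads.

```shell
# Hypothetical helper: request a backtrace of all active CPUs and show the
# end of the kernel log, where the backtrace is expected to appear.
capture_cpu_backtrace() {
    trigger=${1:-/proc/sysrq-trigger}   # overridable so the function can be dry-run
    lines=${2:-100}                     # how many kernel log lines to print
    if [ ! -w "$trigger" ]; then
        echo "cannot write to $trigger (run as root)" >&2
        return 1
    fi
    echo l > "$trigger"                 # "l" = backtrace of all active CPUs
    dmesg | tail -n "$lines"
}
```

Run capture_cpu_backtrace as root while top shows a kworker at the top of the list, and repeat it a few times: functions that show up in several backtraces are the likely source of the load.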
The malfunctioning MZ910-2*10GE+2*8G FC card is wasting CPU resources. The MZ910 PCIe card and CPU2 need to be replaced:
After replacing the MZ910 PCIe card, install the latest drivers from the Huawei Support website:
It is clear that the issue had also occurred in the past without being noticed: