Node CH121 V3 rebooted without any reason

Publication Date: 2018-02-20
Issue Description

A CH121 V3 compute node in an E9000 chassis rebooted for no apparent reason.

Alarm Information

After the node had rebooted several times, the following alarm was found in the logs: System Restart [Unknown][IPMB]
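To see how often the node has restarted, the alarm text can be searched for in an exported log. The following Python sketch counts occurrences of the alarm per day. It assumes a hypothetical plain-text export in which each line starts with a YYYY-MM-DD date; the real iBMC log format may differ, so adjust the date pattern accordingly.

```python
import re
from collections import Counter

# Alarm text as it appears in the log entry described above.
ALARM = "System Restart [Unknown][IPMB]"
# Hypothetical export format (assumption): every line starts with a YYYY-MM-DD date.
DATE_RE = re.compile(r"^(\d{4}-\d{2}-\d{2})")


def restarts_per_day(lines):
    """Count restart alarms per calendar day; undated lines go to 'unknown'."""
    counts = Counter()
    for line in lines:
        if ALARM in line:
            m = DATE_RE.match(line)
            counts[m.group(1) if m else "unknown"] += 1
    return counts


def restarts_in_file(path):
    """Run restarts_per_day over an exported log file (hypothetical path)."""
    with open(path, encoding="utf-8", errors="replace") as f:
        return restarts_per_day(f)
```

Several restarts on the same day, or restarts spread over weeks, point to a recurring fault rather than a one-off event.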

 

Handling Process

After the BIOS and iBMC were upgraded to the latest versions, the issue persisted: the CH121 V3 node kept rebooting.

The logs were checked and no Major alarm was found. However, the fdm_log file showed errors reported by the CPU:

 

Root Cause

The root cause of the system restarts is a kernel bug:

In other words, the kernel bug lies in the driver for the Emulex MZ910 card. The fault surfaces in a kworker thread, as this line from the kernel log shows: CPU: 11 PID: 10375 Comm: kworker/u80:0 Not tainted 3.10.0-327.28.3.el7.x86_64 #1, where 3.10.0-327.28.3.el7.x86_64 is the kernel version.
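When many logs have to be checked for the same failure, the fields of that line can be extracted programmatically. The following Python sketch parses the CPU number, PID, command name, taint state, and kernel version from a header line of the format shown above. The regular expression is an assumption based on that one line, not a complete parser for kernel oops or BUG reports.

```python
import re
from typing import Optional

# Pattern modelled on the header line quoted above (assumption: other kernels
# may print extra fields or a different layout).
HEADER_RE = re.compile(
    r"CPU: (?P<cpu>\d+) PID: (?P<pid>\d+) Comm: (?P<comm>\S+) "
    r"(?P<taint>Not tainted|Tainted: .*?) (?P<kernel>\S+) #\d+"
)


def parse_oops_header(line: str) -> Optional[dict]:
    """Extract CPU, PID, command, taint state and kernel version, or None."""
    m = HEADER_RE.search(line)  # search() tolerates a timestamp prefix
    if m is None:
        return None
    return {
        "cpu": int(m.group("cpu")),
        "pid": int(m.group("pid")),
        "comm": m.group("comm"),
        "tainted": m.group("taint") != "Not tainted",
        "kernel": m.group("kernel"),
    }
```

For the log line above, parse_oops_header returns CPU 11, PID 10375, command kworker/u80:0, an untainted kernel, and kernel version 3.10.0-327.28.3.el7.x86_64.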

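What is kworker? A kworker is a Linux kernel worker thread that executes deferred work items queued on kernel workqueues, for example by device drivers. You can have several of them in your process list: kworker/0:1 runs on the first CPU core, kworker/1:1 on the second, and so on. Names with a "u" prefix, such as kworker/u80:0, belong to unbound worker pools that are not tied to a single CPU.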
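The naming scheme can be decoded mechanically. The following Python sketch parses kworker thread names and lists the kworker threads on a Linux system via /proc/<pid>/comm. The naming rules in the comments reflect the mainline kernel's workqueue code and are an assumption for any particular kernel build; the function names are hypothetical.

```python
import os
import re

# Assumed naming (mainline workqueue code): per-CPU workers are
# "kworker/<cpu>:<id>", with an "H" suffix for high-priority pools; unbound
# workers are "kworker/u<pool>:<id>". Newer kernels may append "-<workqueue>".
KWORKER_RE = re.compile(
    r"^kworker/(?:u(?P<pool>\d+)|(?P<cpu>\d+)):(?P<worker>\d+)(?P<highpri>H)?"
)


def parse_kworker(name):
    """Describe a kworker thread name, or return None for other names."""
    m = KWORKER_RE.match(name)
    if m is None:
        return None
    bound = m.group("cpu") is not None
    return {
        "bound_to_cpu": int(m.group("cpu")) if bound else None,
        "unbound_pool": None if bound else int(m.group("pool")),
        "worker_id": int(m.group("worker")),
        "high_priority": m.group("highpri") is not None,
    }


def list_kworkers(proc="/proc"):
    """Yield (pid, name) for every kworker thread listed under /proc."""
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "comm")) as f:
                name = f.read().strip()
        except OSError:  # the thread exited while we were scanning
            continue
        if name.startswith("kworker/"):
            yield int(entry), name
```

Applied to the thread in the log above, parse_kworker("kworker/u80:0") reports an unbound worker from pool 80, which is consistent with the fault not being pinned to a single core.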

Why does a kworker hog your CPU? To find out, capture CPU backtraces: watch the processor load (with top or a similar tool) and, at a moment when a kworker thread is causing high load, run echo l > /proc/sysrq-trigger. This dumps a stack backtrace for every active CPU to the kernel log, where it can be read with dmesg. Writing to /proc/sysrq-trigger requires root privileges (on Ubuntu, for example, open a root shell with sudo -s first).
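The watch-and-trigger procedure can be automated. The following Python sketch samples the utime and stime fields of /proc/<pid>/stat for every kworker thread, computes each thread's CPU usage over a short interval, and writes "l" to /proc/sysrq-trigger (the same action as the echo command) once the busiest kworker crosses a threshold. The 50% default threshold and all function names are hypothetical; the script assumes Linux and must run as root to trigger the dump.

```python
import os
import time


def parse_stat(text):
    """Return (comm, utime + stime in clock ticks) from a /proc/<pid>/stat line."""
    # comm is wrapped in parentheses and may itself contain spaces or ")",
    # so split on the last ")" instead of on whitespace.
    lpar = text.index("(")
    rpar = text.rindex(")")
    comm = text[lpar + 1:rpar]
    fields = text[rpar + 2:].split()
    # fields[0] is stat field 3 (state); utime is field 14, stime is field 15.
    return comm, int(fields[11]) + int(fields[12])


def kworker_ticks():
    """Map pid -> (comm, cumulative CPU ticks) for all kworker threads."""
    ticks = {}
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/stat") as f:
                comm, total = parse_stat(f.read())
        except OSError:  # the thread exited between listdir() and open()
            continue
        if comm.startswith("kworker"):
            ticks[int(entry)] = (comm, total)
    return ticks


def busiest_kworkers(interval=1.0):
    """Return [(cpu_percent, pid, comm), ...] sorted from busiest to idlest."""
    clk_tck = os.sysconf("SC_CLK_TCK")  # clock ticks per second
    before = kworker_ticks()
    time.sleep(interval)
    after = kworker_ticks()
    usage = []
    for pid, (comm, total) in after.items():
        if pid in before:
            pct = 100.0 * (total - before[pid][1]) / (clk_tck * interval)
            usage.append((pct, pid, comm))
    return sorted(usage, reverse=True)


def dump_cpu_backtraces():
    """Equivalent to 'echo l > /proc/sysrq-trigger'; requires root."""
    with open("/proc/sysrq-trigger", "w") as f:
        f.write("l")


def watch_kworkers(threshold=50.0, interval=1.0):
    """Poll until one kworker uses >= threshold % of a CPU, then dump backtraces."""
    while True:
        usage = busiest_kworkers(interval)
        if usage and usage[0][0] >= threshold:
            print("High kworker load:", usage[0])
            dump_cpu_backtraces()
            return usage[0]
```

Call watch_kworkers() from a root shell and leave it running; once it fires, the backtraces can be read with dmesg. Sampling per-thread ticks, rather than overall load, ties the dump to the moment a kworker itself is busy.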

Solution

The malfunctioning MZ910-2*10GE+2*8G FC card wastes CPU resources. The solution is to replace the MZ910 PCIe card and CPU2:

 

After replacing the MZ910 PCIe card, install the latest drivers from the Huawei Support website:

http://support.huawei.com/enterprise/en/server/ch121-v3-pid-21070741/software/22923491/?idAbsPath=fixnode01%7C7919749%7C9856522%7C21782478%7C19955021%7C21070741
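After the new drivers are installed, it helps to confirm which versions the running kernel has actually loaded. Kernel modules that declare a version expose it in /sys/module/<name>/version. The following Python sketch reads those files. The driver names are an assumption (Emulex-based cards are commonly served by be2net for Ethernet and lpfc for Fibre Channel), so confirm them against the Huawei driver package before relying on the output.

```python
from pathlib import Path

# Assumption: the MZ910's 10GE ports use the Emulex be2net driver and its
# 8G FC ports use the Emulex lpfc driver. Verify against your system.
DRIVERS = ("be2net", "lpfc")


def loaded_driver_versions(drivers=DRIVERS, sys_module="/sys/module"):
    """Return {driver: version string, or None if not loaded / no version file}."""
    versions = {}
    for name in drivers:
        version_file = Path(sys_module) / name / "version"
        if version_file.is_file():
            versions[name] = version_file.read_text().strip()
        else:
            versions[name] = None
    return versions
```

A None entry means the module is not loaded or does not publish a version; compare the reported strings with the versions listed on the support page.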



 

 

Suggestions

It is clear that the issue had also occurred in the past without being noticed:

 

 

END