The 3X CX912 switch board restarted abnormally, and the customer manually switched services to the 2X switch board to recover

Publication Date:  2018-03-23 Views:  89 Downloads:  0
Issue Description

The 3X CX912 switch board restarted abnormally. The customer manually switched services to the 2X switch board to recover the service.

Handling Process

The switchboard CPU has four cores (CPU0 to CPU3). CPU3 usage on the 2X switchboard stayed at 100% from 2018-03-22 07:46 onward (CPU usage % = User % + Kernel %). Here is the log screenshot:

 

The stacking module's thread is bound to CPU3. Because CPU3 usage on the 2X switchboard frequently reached 100%, stack protocol communication between the active and standby stack members became abnormal. As a result, the stack split, which triggered a restart of the lower-priority switchboard (the 3X CX912 has a lower priority than the 2X, so the 3X switchboard restarted).
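As a rough illustration of the CPU3 observation above, the following Python sketch computes per-sample usage as User % + Kernel % and finds the longest run of consecutive samples at 100%. The `CpuSample` type, the function name, and the sample values are hypothetical; they are not taken from the switchboard logs or from any Huawei tool.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class CpuSample:
    """One usage reading for a single core (hypothetical structure)."""
    timestamp: str
    user_pct: float
    kernel_pct: float

    @property
    def usage_pct(self) -> float:
        # CPU usage % = User % + Kernel %, as defined in this case.
        return self.user_pct + self.kernel_pct


def longest_saturated_run(samples: Iterable[CpuSample], threshold: float = 100.0) -> int:
    """Return the length of the longest run of consecutive samples at or above threshold."""
    longest = 0
    current = 0
    for sample in samples:
        if sample.usage_pct >= threshold:
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    return longest


# Made-up readings for CPU3, for illustration only.
cpu3_samples = [
    CpuSample("07:45", 20.0, 10.0),
    CpuSample("07:46", 62.5, 37.5),
    CpuSample("07:47", 75.0, 25.0),
    CpuSample("07:48", 50.0, 50.0),
]
saturated_samples = longest_saturated_run(cpu3_samples)
```

A long run of saturated samples on the core that hosts the stacking thread is the pattern this case associates with abnormal stack protocol communication.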

The 2X switchboard has priority 150 and the 3X switchboard uses the default priority of 100, so the 2X switchboard has the higher priority (the default priority of 100 on the 3X switchboard is not shown in the log), as shown in the following figure:

The switchboard currently runs software version 3.10, an old version that has an issue of writing logs too frequently. When logs are written frequently, the log files are repeatedly compressed, replaced, and deleted, and these operations consume a large amount of CPU resources. Because the CPU stays occupied for a long time, stack protocol packets cannot be processed. As a result, the mechanism that restarts the lower-priority switchboard is triggered.
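The restart decision described above (on a stack split, the member with the lower priority restarts) can be sketched in Python as follows. This is a simplified model under stated assumptions, not the switch firmware's actual election logic: the function name is hypothetical, and tie-breaking between equal priorities is not described in this case, so the sketch rejects ties instead of guessing.

```python
from typing import Dict


def member_to_restart(priorities: Dict[str, int]) -> str:
    """Return the name of the stack member with the lowest priority.

    priorities maps a member name to its stack priority.
    Raises ValueError for fewer than two members or for a tie at the
    lowest priority, because tie-breaking is not covered by this case.
    """
    if len(priorities) < 2:
        raise ValueError("a stack split involves at least two members")
    lowest = min(priorities.values())
    candidates = [name for name, prio in priorities.items() if prio == lowest]
    if len(candidates) > 1:
        raise ValueError("tie at the lowest priority; tie-breaking is not modeled")
    return candidates[0]


# Priorities reported in this case: 2X is 150, 3X uses the default 100.
restarted = member_to_restart({"2X": 150, "3X": 100})
```

With the priorities from this case, the sketch selects the 3X switchboard, which matches the observed restart.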

Why the stack could not be re-established after the 3X switchboard restarted, and why services were affected:
Because the 2X switchboard was still writing logs frequently, stack communication remained abnormal after the 3X switchboard restarted. When the 3X switchboard tried to establish the stack, it wrongly concluded that the stack configurations conflicted, so the stack protocol stopped working and the stack system could not be set up. With no stack established between the 2X and 3X switchboards, both operated as master, and services were affected.
The log below shows the configuration conflict that prevented the stack from being set up.

 

 

Root Cause

The switchboard software version is too old and needs to be upgraded.

Solution

Upgrade the switch software to the latest version, 5.52, and upgrade the CPLD to the compatible version as well.
Download link:
http://support.huawei.com/enterprise/en/software/22468874-SW1000282975
Upgrade guide:
http://support.huawei.com/enterprise/en/doc/EDOC1000081379?idPath=7919749%257C9856522%257C21782478%257C19955021%257C19961380

Configure dual-active detection for the stack switchboards. Make sure that each E9000 uses a different stack domain ID: if two or more E9000 chassis share the same stack domain ID, services will be affected. Because of this risk, we do not recommend including this step in today's plan.
http://support.huawei.com/enterprise/en/doc/EDOC1000038842/?idPath=7919749%257C9856522%257C21782478%257C19955021%257C19961380
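
Before enabling dual-active detection, the uniqueness requirement for stack domain IDs can be checked offline. The Python sketch below assumes the domain IDs have already been collected by hand into a mapping; the chassis names, the ID values, and the function name are hypothetical, and the sketch does not query any device.

```python
from collections import defaultdict
from typing import Dict, List


def find_duplicate_domain_ids(domain_ids: Dict[str, int]) -> Dict[int, List[str]]:
    """Return {stack_domain_id: [chassis names]} for every ID used by more than one chassis.

    domain_ids maps a chassis name to its configured stack domain ID.
    An empty result means every chassis has a unique ID.
    """
    chassis_by_id = defaultdict(list)
    for chassis, domain_id in domain_ids.items():
        chassis_by_id[domain_id].append(chassis)
    return {
        domain_id: sorted(names)
        for domain_id, names in chassis_by_id.items()
        if len(names) > 1
    }


# Hypothetical inventory, for illustration only.
inventory = {"E9000-A": 10, "E9000-B": 20, "E9000-C": 10}
duplicates = find_duplicate_domain_ids(inventory)
```

Any chassis listed in the result would need a new stack domain ID before dual-active detection is configured.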
