MPU in an NE5000E Cluster Failed to Register Due to Voltage Drop

Publication Date: 2013-09-30
Issue Description
The standby MPU ccc2/10 in the slave CCC of an NE5000E cluster was reset due to heartbeat loss and then failed to register.
Handling Process

Huawei performed the following operations to diagnose the fault:

Collected onsite information and found that the working voltage of ccc2/10 dropped 1 minute before ccc2/10's heartbeat loss.

aug 27 2010 21:45:31 r1-c-xpf-ne5000e %%01srm/1/volbelowfatalfail(l):slotid ccc2/10, address64, channel1 voltage below fatal threshold, voltage is 0.00v.

aug 27 2010 21:45:31 r1-c-xpf-ne5000e %%01srm/1/volbelowfatalfail(l):slotid ccc2/10, address64, channel0 voltage below fatal threshold, voltage is 0.00v.

aug 27 2010 21:45:31 r1-c-xpf-ne5000e %%01srm/1/volbelowfatalfail(l):slotid ccc2/10, address64, channel11 voltage below fatal threshold, voltage is 0.19v.

… …
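
Alarm lines in this format can be filtered out of a collected log with a short script. The following Python sketch is only an illustration: the regular expression is inferred from the three sample lines above, and the function name parse_voltage_alarms is hypothetical, not part of any Huawei tool.

```python
import re

# Pattern inferred from the sample alarm lines in this case; other software
# versions may format the alarm differently, so treat it as an assumption.
ALARM_RE = re.compile(
    r"volbelowfatalfail\(l\):\s*slotid\s*(?P<slot>\S+),\s*"
    r"address\s*(?P<address>\d+),\s*channel\s*(?P<channel>\d+)\s+"
    r"voltage below fatal threshold, voltage is (?P<volts>\d+(?:\.\d+)?)v",
    re.IGNORECASE,
)


def parse_voltage_alarms(lines):
    """Return (slot, channel, volts) for every voltage-below-fatal alarm."""
    alarms = []
    for line in lines:
        match = ALARM_RE.search(line)
        if match:
            alarms.append(
                (match["slot"], int(match["channel"]), float(match["volts"]))
            )
    return alarms
```

Lines that are not voltage alarms, such as the power-on messages, are skipped.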

The power supply was unstable, and the voltage drop powered off the standby MPU ccc2/10. The active MPU ccc2/9 proactively initiated a heartbeat check. Because the standby MPU ccc2/10 did not respond within 1 minute, it was reset.
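
The heartbeat check described above amounts to a timeout rule. The Python sketch below illustrates that rule only; the 1-minute limit comes from this case, while the function name and the use of timestamps are assumptions for illustration and do not represent the NE5000E implementation.

```python
from datetime import datetime, timedelta

# 1-minute limit taken from the case description.
HEARTBEAT_TIMEOUT = timedelta(minutes=1)


def should_reset_standby(last_response, now, timeout=HEARTBEAT_TIMEOUT):
    """Return True when the standby MPU has been silent for at least `timeout`."""
    return now - last_response >= timeout
```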

The system then attempted several times to power on ccc2/10 again.

aug 27 2010 21:51:48 r1-c-xpf-ne5000e %%01srm/3/pwronfinish(l):slotidccc2/10, board power-on finish!

aug 27 2010 21:51:48 r1-c-xpf-ne5000e %%01srm/3/mbusreg(l):slotidccc2/10, monitorbus node register!

 

However, ccc2/10 failed to power on and therefore failed to register.
Root Cause

Each MPU has voltage monitoring logic and power-on control logic. When a voltage on the MPU drops below the lower threshold of the normal working voltage, the voltage monitoring logic powers off the MPU. In this case, a voltage on ccc2/10 dropped abnormally. As a result, ccc2/10 was powered off and then reset due to heartbeat loss.

When the system attempted to power on ccc2/10 again, the power-on control logic checked the voltages and found that one was outside the normal working voltage range. Therefore, ccc2/10 failed to power on and register.
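
The two checks described above can be modeled as one range test applied in two places. The Python below is a simplified sketch under stated assumptions: the channel numbers and threshold values are hypothetical (this case does not give the real ones), and the actual logic runs on the board itself, not in software like this.

```python
# Hypothetical normal working ranges in volts, keyed by monitoring channel.
# The real channels and thresholds are not given in this case.
NORMAL_RANGE = {0: (3.0, 3.6), 1: (1.7, 1.9)}


def out_of_range_channels(readings, limits=NORMAL_RANGE):
    """Return the channels whose measured voltage is outside its normal range."""
    return sorted(
        ch for ch, volts in readings.items()
        if not (limits[ch][0] <= volts <= limits[ch][1])
    )


def monitor_should_power_off(readings, limits=NORMAL_RANGE):
    """Voltage monitoring: power off when any voltage is below its lower threshold."""
    return any(volts < limits[ch][0] for ch, volts in readings.items())


def power_on_allowed(readings, limits=NORMAL_RANGE):
    """Power-on control: allow power-on only when every voltage is in range."""
    return not out_of_range_channels(readings, limits)
```

With a reading of 0.00 V on any channel, as in the alarms above, the first check powers the board off and the second check keeps refusing power-on until the voltage returns to its normal range.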

This issue occurred only on a few boards.

The faulty board needed to be returned to Huawei headquarters so that the faulty chip could be located.
Solution
The standby MPU was replaced. An NE5000E cluster has four MPUs and can function properly even if only one of them works normally. Therefore, replacing the MPU carried no risk.
Suggestions
Huawei checks board running status and registration every day and traces equipment running status in real time so that potential faults can be found in a timely manner.
