Slot 3 LPU board of an NE40E becomes unregistered due to a faulty LPU power module

Publication Date:  2014-06-28
Issue Description

Version & patch:  NE40E&80E V600R006C00SPC300, v600r006sph013
Problem description:
At one commercial NE40E site, the LPU board in slot 3 suddenly became unregistered, and all services on this slot went down. The on-site engineer first tried removing and reinserting the board, which did not restore it. The engineer then powered off the board, pulled it out, waited 2 minutes, reinserted it, and powered it on. The board returned to normal status and services were restored.
However, the problem recurred about 4 hours later, and the engineer restored the board using the same method.
The following alarms were logged when the problem occurred:
Jun 22 2014 16:26:47-05:00 bga-avi-ne40e-01 %%01SRM/3/LPULOSHEARTBEATRESET(l)[72]:LPU3 reset because of the heartbeat loss.
Jun 22 2014 16:26:28-05:00 bga-avi-ne40e-01 %%01SRM/1/VOLBELOWFATALFAIL(l)[84]:SlotID 3, i2c1, address0, channel3 voltage below fatal threshold, voltage is  0.07V.
Jun 22 2014 16:26:28-05:00 bga-avi-ne40e-01 %%01SRM/1/VOLBELOWFATALFAIL(l)[85]:SlotID 3, i2c1, address0, channel2 voltage below fatal threshold, voltage is  0.07V.
Jun 22 2014 16:26:28-05:00 bga-avi-ne40e-01 %%01SRM/1/VOLBELOWFATALFAIL(l)[86]:SlotID 3, i2c1, address0, channel1 voltage below fatal threshold, voltage is  0.14V.
Jun 22 2014 16:26:27-05:00 bga-avi-ne40e-01 %%01SRM/1/VOLBELOWFATALFAIL(l)[87]:SlotID 3, i2c2, address64, channel7 voltage below fatal threshold, voltage is  0.25V.
Jun 22 2014 16:26:27-05:00 bga-avi-ne40e-01 %%01SRM/1/VOLBELOWFATALFAIL(l)[88]:SlotID 3, i2c2, address64, channel6 voltage below fatal threshold, voltage is  0.00V.
Jun 22 2014 16:26:27-05:00 bga-avi-ne40e-01 %%01SRM/1/VOLBELOWFATALFAIL(l)[89]:SlotID 3, i2c2, address64, channel5 voltage below fatal threshold, voltage is  0.00V.
Jun 22 2014 16:26:27-05:00 bga-avi-ne40e-01 %%01SRM/1/VOLBELOWFATALFAIL(l)[90]:SlotID 3, i2c2, address64, channel4 voltage below fatal threshold, voltage is  0.00V.
Jun 22 2014 16:26:27-05:00 bga-avi-ne40e-01 %%01SRM/1/VOLBELOWFATALFAIL(l)[91]:SlotID 3, i2c2, address64, channel3 voltage below fatal threshold, voltage is  0.00V.
Jun 22 2014 16:26:27-05:00 bga-avi-ne40e-01 %%01SRM/1/VOLBELOWFATALFAIL(l)[92]:SlotID 3, i2c2, address64, channel2 voltage below fatal threshold, voltage is  0.00V.
Jun 22 2014 16:26:27-05:00 bga-avi-ne40e-01 %%01SRM/1/VOLBELOWFATALFAIL(l)[93]:SlotID 3, i2c2, address64, channel1 voltage below fatal threshold, voltage is  0.00V.
Jun 22 2014 16:26:27-05:00 bga-avi-ne40e-01 %%01SRM/1/VOLBELOWFATALFAIL(l)[94]:SlotID 3, i2c2, address64, channel0 voltage below fatal threshold, voltage is  0.03V.
Jun 22 2014 16:26:27-05:00 bga-avi-ne40e-01 %%01SRM/1/VOLBELOWFATALFAIL(l)[95]:SlotID 3, i2c1, address0, channel4 voltage below fatal threshold, voltage is  0.11V.
Jun 22 2014 11:51:01-05:00 bga-avi-ne40e-01 %%01SRM/3/LPULOSHEARTBEATRESET(l)[369571]:LPU3 reset because of the heartbeat loss.
Jun 22 2014 11:50:42-05:00 bga-avi-ne40e-01 %%01SRM/1/VOLBELOWFATALFAIL(l)[369542]:SlotID 3, i2c2, address64, channel0 voltage below fatal threshold, voltage is  0.03V.
Jun 22 2014 11:50:42-05:00 bga-avi-ne40e-01 %%01SRM/1/VOLBELOWFATALFAIL(l)[369543]:SlotID 3, i2c2, address64, channel1 voltage below fatal threshold, voltage is  0.03V.
Jun 22 2014 11:50:42-05:00 bga-avi-ne40e-01 %%01SRM/1/VOLBELOWFATALFAIL(l)[369544]:SlotID 3, i2c2, address64, channel2 voltage below fatal threshold, voltage is  0.00V.
Jun 22 2014 11:50:42-05:00 bga-avi-ne40e-01 %%01SRM/1/VOLBELOWFATALFAIL(l)[369545]:SlotID 3, i2c2, address64, channel3 voltage below fatal threshold, voltage is  0.00V.
Jun 22 2014 11:50:42-05:00 bga-avi-ne40e-01 %%01SRM/1/VOLBELOWFATALFAIL(l)[369546]:SlotID 3, i2c2, address64, channel4 voltage below fatal threshold, voltage is  0.00V.
Jun 22 2014 11:50:42-05:00 bga-avi-ne40e-01 %%01SRM/1/VOLBELOWFATALFAIL(l)[369547]:SlotID 3, i2c2, address64, channel5 voltage below fatal threshold, voltage is  0.00V.
Jun 22 2014 11:50:42-05:00 bga-avi-ne40e-01 %%01SRM/1/VOLBELOWFATALFAIL(l)[369548]:SlotID 3, i2c2, address64, channel6 voltage below fatal threshold, voltage is  0.00V.
Jun 22 2014 11:50:42-05:00 bga-avi-ne40e-01 %%01SRM/1/VOLBELOWFATALFAIL(l)[369549]:SlotID 3, i2c2, address64, channel7 voltage below fatal threshold, voltage is  0.28V.


  ===============display device===============
==================================================
NE40E's Device status:
Slot #    Type       Online    Register      Status      Primary   
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
1         LPU        Present   Registered    Normal      NA        
2         LPU        Present   Registered    Normal      NA        
3         LPU/SPU    Present   Unregistered  Abnormal    NA        
4         LPU        Present   Registered    Normal      NA        
5         LPU        Present   Registered    Normal      NA        
6         LPU        Present   Registered    Normal      NA        
7         LPU        Present   Registered    Normal      NA        
9         MPU        Present   NA            Normal      Master    
10        MPU        Present   Registered    Normal      Slave     
11        SFU        Present   Registered    Normal      NA        
12        SFU        Present   Registered    Normal      NA        
13        SFU        Present   Registered    Normal      NA        
14        SFU        Present   Registered    Normal      NA        
15        CLK        Present   Registered    Normal      Master    
16        CLK        Present   Registered    Normal      Slave     
17        PWR        Present   Registered    Normal      NA        
18        PWR        Present   Registered    Normal      NA        
19        FAN        Present   Registered    Normal      NA        
                                         
=============================================================
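
When this check has to be repeated across many devices, the table above can be scanned automatically. The following is a minimal Python sketch, assuming the six-column layout shown in this output (Slot, Type, Online, Register, Status, Primary). The function name `abnormal_slots` is hypothetical and not part of any Huawei tool.

```python
def abnormal_slots(display_device_output):
    """Return (slot, type, register, status) for boards that look unhealthy.

    Assumption: data rows start with a numeric slot ID and contain exactly
    six whitespace-separated columns, as in the output shown above.
    """
    flagged = []
    for line in display_device_output.splitlines():
        fields = line.split()
        # Skip headers, separators and blank lines.
        if len(fields) != 6 or not fields[0].isdigit():
            continue
        slot, board_type, _online, register, status, _primary = fields
        # The master MPU reports "NA" in the Register column, so only an
        # explicit "Unregistered" or a non-Normal status is flagged.
        if register == "Unregistered" or status != "Normal":
            flagged.append((int(slot), board_type, register, status))
    return flagged
```

Applied to the output above, only the slot 3 row meets either condition.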

Handling Process
The alarm shows that LPU3 was reset because of heartbeat loss:
Jun 22 2014 16:26:47-05:00 bga-avi-ne40e-01 %%01SRM/3/LPULOSHEARTBEATRESET(l)[72]:LPU3 reset because of the heartbeat loss.

Why did LPU3 lose its heartbeat? The following alarm, logged about 19 seconds before the reset, shows that voltages measured on LPU3 dropped below the fatal threshold. This indicates that the power module on LPU3 was faulty.

Jun 22 2014 16:26:28-05:00 bga-avi-ne40e-01 %%01SRM/1/VOLBELOWFATALFAIL(l)[84]:SlotID 3, i2c1, address0, channel3 voltage below fatal threshold, voltage is  0.07V.
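
The reasoning above (fatal low-voltage alarms on a slot shortly before a heartbeat-loss reset of the same slot) can be checked mechanically on a saved log. The following is a minimal Python sketch, assuming the log line format shown in this case. The function names and the 60-second window are illustrative assumptions, and the time zone offset is ignored because all lines come from the same device.

```python
import re
from datetime import datetime, timedelta

# Header format assumed from the alarms in this case:
# "<Mon DD YYYY HH:MM:SS><offset> <host> %%01<module>/<sev>/<name>(l)[<seq>]:<text>"
LOG_RE = re.compile(
    r"^(?P<ts>\w{3} +\d{1,2} \d{4} \d{2}:\d{2}:\d{2})[+-]\d{2}:\d{2} "
    r"\S+ %%\d{2}\w+/\d/(?P<name>\w+)\(\w\)\[\d+\]:(?P<text>.*)$"
)
VOLT_RE = re.compile(
    r"SlotID (\d+), (\w+), address(\d+), channel(\d+) .*voltage is\s+([\d.]+)V"
)
RESET_RE = re.compile(r"LPU(\d+) reset because of the heartbeat loss")


def parse(line):
    """Return (timestamp, alarm name, text) or None if the line does not match."""
    m = LOG_RE.match(line.strip())
    if not m:
        return None
    ts = datetime.strptime(m.group("ts"), "%b %d %Y %H:%M:%S")
    return ts, m.group("name"), m.group("text")


def correlate(lines, window_seconds=60):
    """For each heartbeat-loss reset, list same-slot voltage alarms that
    occurred within window_seconds before it (window is an assumption)."""
    volts, resets = [], []
    for event in filter(None, map(parse, lines)):
        ts, name, text = event
        if name == "VOLBELOWFATALFAIL":
            m = VOLT_RE.search(text)
            if m:
                sensor = f"{m.group(2)}/address{m.group(3)}/channel{m.group(4)}"
                volts.append((ts, int(m.group(1)), sensor, float(m.group(5))))
        elif name == "LPULOSHEARTBEATRESET":
            m = RESET_RE.search(text)
            if m:
                resets.append((ts, int(m.group(1))))
    window = timedelta(seconds=window_seconds)
    report = []
    for reset_ts, slot in resets:
        preceding = [v for v in volts
                     if v[1] == slot and timedelta(0) <= reset_ts - v[0] <= window]
        report.append({"slot": slot, "reset_time": reset_ts,
                       "voltage_alarms": preceding})
    return report
```

A reset entry with a non-empty `voltage_alarms` list points to a power problem on the board as the likely trigger, rather than a software hang.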
Root Cause
The slot 3 LPU board itself has a faulty power converter. The board's input is -48 V, and some chips use a lower voltage such as -5 V, so the on-board power converter steps -48 V down to these lower values. The converter fault caused the LPU board to lose its heartbeat and reset.
Solution
The solution is to replace the LPU3 motherboard with a spare part. Because the fault is not related to the PIC, the PIC does not need to be replaced and can remain in use.

END