LPU1 reset due to ECM heartbeat loss

Publication Date: 2014-06-05
Issue Description

LPU1 on HUAWEINE40E was reset at 18:06:21 on Aug 18 2012 because of heartbeat loss.

Aug 18 2012 10:13:44 HUAWEINE40E %%01SRM/3/CHANNELFAULT(l): LPU-1 No.1 channel faulted.

Aug 18 2012 10:13:44 HUAWEINE40E %%01SRM/3/CHANNELFAULT(l): LPU-1 No.0 channel faulted.

Aug 18 2012 10:13:31 HUAWEINE40E %%01IFNET/4/BOARD_DISABLE(l): Board 1 has been unavailable.

Handling Process

The historical logs show that the heartbeat was lost some time after the board registered. This heartbeat belongs to LPU board management and is used to check the state of the LPU management channel. The MPU sends one hello packet to the LPU every second, and the LPU returns a response packet for each hello packet it receives. If the MPU does not receive a response from the LPU within 3 seconds, the LPU management channel is considered abnormal and a channel fault alarm is printed. In parallel, the LPU sends one heartbeat packet to the MPU every second; if the MPU does not receive a heartbeat packet for 20 seconds, it resets the LPU.
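The two detection mechanisms described above can be sketched as a small MPU-side model. This is only an illustration of the timing logic: the class and method names are hypothetical, and treating the timeout as "elapsed time greater than or equal to the limit" is an assumption, since the exact boundary behaviour of the device is not stated here.

```python
from dataclasses import dataclass
from typing import List

HELLO_TIMEOUT_S = 3       # from the text: no response within 3 s -> channel fault alarm
HEARTBEAT_TIMEOUT_S = 20  # from the text: no heartbeat within 20 s -> LPU reset


@dataclass
class LpuMonitor:
    """Hypothetical MPU-side view of one LPU; times are seconds on any clock."""
    last_response: float = 0.0   # last time a hello response arrived from the LPU
    last_heartbeat: float = 0.0  # last time an LPU heartbeat arrived

    def on_response(self, now: float) -> None:
        self.last_response = now

    def on_heartbeat(self, now: float) -> None:
        self.last_heartbeat = now

    def check(self, now: float) -> List[str]:
        """Return the actions the MPU would take at time `now`."""
        actions = []
        # Assumption: the alarm fires once the silence reaches the timeout.
        if now - self.last_response >= HELLO_TIMEOUT_S:
            actions.append("channel_fault_alarm")
        if now - self.last_heartbeat >= HEARTBEAT_TIMEOUT_S:
            actions.append("reset_lpu")
        return actions


# An LPU that goes silent at t=0 triggers the alarm first and the reset later.
mon = LpuMonitor()
mon.on_response(0.0)
mon.on_heartbeat(0.0)
for t in (2.0, 3.0, 20.0):
    print(t, mon.check(t))
```

In this model a failed LPU CPU stops both the hello responses and the heartbeats at the same moment, so the channel fault alarm appears roughly 17 seconds before the reset.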

The channel fault alarms were printed at 18:06:02, which means the MPU detected the LPU abnormality between 18:05:59 and 18:06:00. The LPU was reset at 18:06:21, which means the MPU had received no heartbeat packets from the LPU since 18:06:00 and therefore reset the board. Both detection mechanisms described above detected the LPU1 failure. The heartbeat transmission channel between the LPU and the MPU uses 1+1 backup, so it is very unlikely that both channels failed at the same time, and the other LPUs were working normally at that moment. We can therefore conclude that the failure point is the CPU of LPU1.
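The timeline can be cross-checked by subtracting each timeout from the logged timestamp. The sketch below assumes the date of the first reset record (Mar 10 2012) and uses the 3-second and 20-second timeouts from the description; the variable names are illustrative.

```python
from datetime import datetime, timedelta

HELLO_TIMEOUT = timedelta(seconds=3)       # response timeout from the text
HEARTBEAT_TIMEOUT = timedelta(seconds=20)  # heartbeat timeout from the text

# Timestamps quoted in the analysis; the date is assumed from the reset log.
alarm_printed = datetime(2012, 3, 10, 18, 6, 2)
lpu_reset = datetime(2012, 3, 10, 18, 6, 21)

# Latest moments at which the MPU can still have heard from the LPU.
last_response_by = alarm_printed - HELLO_TIMEOUT
last_heartbeat_by = lpu_reset - HEARTBEAT_TIMEOUT

print(last_response_by.time())   # 18:05:59
print(last_heartbeat_by.time())  # 18:06:01
```

Both results fall within about a second of 18:06:00, which is consistent with a single failure silencing both packet streams at once, given that packets are sent at one-second intervals.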

The following logs record the LPU1 resets.

Mar 10 2012 18:06:21-04:00 NB40INDE01 %%01SRM/3/LPULOSHEARTBEATRESET(l)[5864]:

LPU1 reset because of the heartbeat loss.

Mar 10 2012 18:43:31-04:00 NB40INDE01 %%01SRM/3/LPULOSHEARTBEATRESET(l)[6797]:

LPU1 reset because of the heartbeat loss.

Mar 10 2012 20:01:21-04:00 NB40INDE01 %%01SRM/3/LPULOSHEARTBEATRESET(l)[7258]:

LPU1 reset because of the heartbeat loss.

Mar 11 2012 06:31:28-04:00 NB40INDE01 %%01SRM/3/LPULOSHEARTBEATRESET(l)[8773]:

LPU1 reset because of the heartbeat loss.

Mar 11 2012 12:44:49-04:00 NB40INDE01 %%01SRM/3/LPULOSHEARTBEATRESET(l)[9233]:

LPU1 reset because of the heartbeat loss.
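To see how often the board was resetting, the header lines of the records above can be parsed and the gaps between consecutive resets computed. This is a minimal sketch: the regular expression is inferred only from the lines shown here (it is not an official log grammar), the fourth timestamp is read as 06:31:28, and `%b` parsing assumes an English locale.

```python
import re
from datetime import datetime

# Header lines copied from the reset records above.
LOG_LINES = [
    "Mar 10 2012 18:06:21-04:00 NB40INDE01 %%01SRM/3/LPULOSHEARTBEATRESET(l)[5864]:",
    "Mar 10 2012 18:43:31-04:00 NB40INDE01 %%01SRM/3/LPULOSHEARTBEATRESET(l)[6797]:",
    "Mar 10 2012 20:01:21-04:00 NB40INDE01 %%01SRM/3/LPULOSHEARTBEATRESET(l)[7258]:",
    "Mar 11 2012 06:31:28-04:00 NB40INDE01 %%01SRM/3/LPULOSHEARTBEATRESET(l)[8773]:",
    "Mar 11 2012 12:44:49-04:00 NB40INDE01 %%01SRM/3/LPULOSHEARTBEATRESET(l)[9233]:",
]

# Pattern inferred from the sample lines: timestamp, host, module/severity/event.
HEADER = re.compile(
    r"^(?P<ts>\w{3} \d{1,2} \d{4} \d{2}:\d{2}:\d{2}[+-]\d{2}:\d{2}) "
    r"(?P<host>\S+) %%01(?P<module>\w+)/(?P<severity>\d)/(?P<event>\w+)\(l\)"
)


def reset_times(lines):
    """Return the timestamps of all heartbeat-loss reset records."""
    times = []
    for line in lines:
        m = HEADER.match(line)
        if m and m.group("event") == "LPULOSHEARTBEATRESET":
            times.append(datetime.strptime(m.group("ts"), "%b %d %Y %H:%M:%S%z"))
    return times


times = reset_times(LOG_LINES)
# Gap between consecutive resets, rounded to whole minutes.
gaps_min = [round((b - a).total_seconds() / 60) for a, b in zip(times, times[1:])]
print(gaps_min)  # [37, 78, 630, 373]
```

The irregular gaps, from about half an hour to more than ten hours, show a board that repeatedly registers, runs for a while, and then loses its heartbeat again.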

Root Cause

The CPU module of LPU1 has failed. Possible causes include a CPU chip failure, a memory failure, or a clock crystal oscillator failure. To determine the exact cause, return the LPU board to R&D engineers for further analysis.

Solution

The CPU module of LPU1 has failed. Replace it as soon as possible.

Suggestions
NA
