J vendor switch power unit incident causing NE40E downstream PS core service interruption

Publication Date:  2012-07-27 Views:  67 Downloads:  0
Issue Description
NE40E version: V003R003C02B608+SPC60
Networking overview:

Fault details:
The NE40E (KPGCE01-Gi) works as an MCE that connects to the GGSN (KPGGG02) and uplinks to the J vendor switch. The PS core team reported an incident on the Gi interface: 3G subscribers could not reach the Internet.


Alarm Information
The NE40E logged the following alarms (newest first):
Sep 11 2011 22:05:42 KPGCE01-Gi %%01SRM/2/NODEFAULT(l):-Slot=6; PIC1 of LPU6 checked interrupt, because RXPowLowAlarm of XFP ALARM is detected. (Reason="XFP Low Rx power alarming!")
Sep 11 2011 22:05:12 KPGCE01-Gi %%01SRM/2/NODERESUME(l):-Slot=6; In PIC1 of LPU6, RXPowLowAlarm of XFP ALARM resumed from failure.
Sep 11 2011 22:04:41 KPGCE01-Gi %%01HWCM/4/TRAPLOG(l): OID 1.3.6.1.4.1.2011.6.10.2.1 configure changed. (EventIndex=2538, CommandSource=1, ConfigSource=3, ConfigDestination=2)
Sep 11 2011 22:02:43 KPGCE01-Gi %%01SRM/2/NODEFAULT(l):-Slot=6; PIC0 of LPU6 checked interrupt, because RXPowLowAlarm of XFP ALARM is detected. (Reason="XFP Low Rx power alarming!")
Sep 11 2011 22:02:21 KPGCE01-Gi %%01HWCM/4/TRAPLOG(l): OID 1.3.6.1.4.1.2011.6.10.2.1 configure changed. (EventIndex=2537, CommandSource=1, ConfigSource=3, ConfigDestination=2)
Sep 11 2011 22:01:53 KPGCE01-Gi %%01SRM/2/NODERESUME(l):-Slot=6; In PIC0 of LPU6, RXPowLowAlarm of XFP ALARM resumed from failure.
Sep 11 2011 22:01:24 KPGCE01-Gi %%01HWCM/4/TRAPLOG(l): OID 1.3.6.1.4.1.2011.6.10.2.1 configure changed. (EventIndex=2536, CommandSource=1, ConfigSource=3, ConfigDestination=2)
Sep 11 2011 21:46:39 KPGCE01-Gi %%01HWCM/4/EXIT(l): Exit from configure mode.
Sep 11 2011 21:35:03 KPGCE01-Gi %%01SHELL/4/TELNETFAILED(l): Failed to login through telnet. (Ip=10.239.41.12, Times=1)
Sep 11 2011 21:32:49 KPGCE01-Gi %%01HWCM/4/EXIT(l): Exit from configure mode.
Sep 11 2011 20:18:43 KPGCE01-Gi %%01SRM/2/NODEFAULT(l):-Slot=6; PIC0 of LPU6 checked interrupt, because RXPowLowAlarm of XFP ALARM is detected. (Reason="XFP Low Rx power alarming!")
Sep 11 2011 20:18:42 KPGCE01-Gi %%01SRM/2/NODEFAULT(l):-Slot=6; PIC1 of LPU6 checked interrupt, because RXPowLowAlarm of XFP ALARM is detected. (Reason="XFP Low Rx power alarming!")
Sep 11 2011 20:18:41 KPGCE01-Gi %%01IFNET/4/LINKNO_STATE(l): The line protocol on the interface Eth-Trunk5.13 has entered the DOWN state.
Sep 11 2011 20:18:41 KPGCE01-Gi %%01IFNET/4/LINKNO_STATE(l): The line protocol on the interface Eth-Trunk5.12 has entered the DOWN state.
Sep 11 2011 20:18:40 KPGCE01-Gi %%01IFNET/4/LINKNO_STATE(l): The line protocol on the interface Eth-Trunk5.11 has entered the DOWN state.
Sep 11 2011 20:18:40 KPGCE01-Gi %%01IFNET/4/LINKNO_STATE(l): The line protocol on the interface Eth-Trunk5.10 has entered the DOWN state.
Sep 11 2011 20:18:40 KPGCE01-Gi %%01PHY/4/PHY_STATUS_UP2DWN(l):-Slot=6; GigabitEthernet6/1/0: change status to down.
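
When correlating alarms like these, it can help to parse each entry into fields and sort by time, since the device prints the newest entry first. The following is a minimal Python sketch, not a Huawei tool: the regular expression is an assumption inferred from the log lines in this case (timestamp, host name, `%%01MODULE/severity/MNEMONIC(l):` and free text), and the names `parse_line` and `first_event` are hypothetical helpers. It expects each log entry on a single line, with any terminal wrapping already joined.

```python
import re
from datetime import datetime

# Assumed layout, inferred from the entries in this case only:
#   "<Mon DD YYYY HH:MM:SS> <host> %%NN<MODULE>/<severity>/<MNEMONIC>(l):<text>"
LOG_RE = re.compile(
    r"^(?P<ts>\w{3} +\d{1,2} \d{4} \d{2}:\d{2}:\d{2}) "
    r"(?P<host>\S+) %%\d{2}(?P<module>\w+)/(?P<severity>\d)/(?P<mnemonic>\w+)"
    r"\(\w\):\s*(?P<text>.*)$"
)


def parse_line(line):
    """Split one log entry into a dict, or return None if it does not match."""
    m = LOG_RE.match(line.strip())
    if m is None:
        return None
    rec = m.groupdict()
    rec["ts"] = datetime.strptime(rec["ts"], "%b %d %Y %H:%M:%S")
    rec["severity"] = int(rec["severity"])
    return rec


def first_event(lines, module, mnemonic):
    """Return the earliest record with the given module and mnemonic, or None."""
    hits = [
        r for r in map(parse_line, lines)
        if r and r["module"] == module and r["mnemonic"] == mnemonic
    ]
    return min(hits, key=lambda r: r["ts"]) if hits else None


# Three entries copied from the alarm list above.
SAMPLE = [
    'Sep 11 2011 20:18:43 KPGCE01-Gi %%01SRM/2/NODEFAULT(l):-Slot=6; PIC0 of LPU6 '
    'checked interrupt, because RXPowLowAlarm of XFP ALARM is detected. '
    '(Reason="XFP Low Rx power alarming!")',
    'Sep 11 2011 20:18:41 KPGCE01-Gi %%01IFNET/4/LINKNO_STATE(l): The line protocol '
    'on the interface Eth-Trunk5.13 has entered the DOWN state.',
    'Sep 11 2011 20:18:40 KPGCE01-Gi %%01PHY/4/PHY_STATUS_UP2DWN(l):-Slot=6; '
    'GigabitEthernet6/1/0: change status to down.',
]

if __name__ == "__main__":
    # Print the sample in chronological order to see what failed first.
    for rec in sorted(filter(None, map(parse_line, SAMPLE)), key=lambda r: r["ts"]):
        print(rec["ts"], rec["module"], rec["mnemonic"])
```

Sorted this way, the physical port going down (PHY_STATUS_UP2DWN) appears before the Eth-Trunk sub-interface and low Rx power entries, which matches the order of events described in this case.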


Handling Process
1. The alarm first appeared on the GGSN as a loss of Internet access on the Gi interface. The PS core engineers checked their devices and found them normal. They then identified a routing issue: packets reached 10.223.7.100, which is the NE40E, but the next hop was unreachable, as the tracert result below shows.
Tracing route to google-public-dns-a.google.com [8.8.8.8] over a maximum of 30 hops:
1 * * * Request timed out.
2 366 ms 379 ms 399 ms 10.223.7.110
3 * * * Request timed out.
4 * * * Request timed out.
5 * * * Request timed out.
This confirmed that the GGSN was working normally.
2. On the NE40E, all Eth-Trunk5.X sub-interfaces had gone down. Further troubleshooting showed that the physical interface had also gone down, with the 'XFP Low Rx power alarming!' alarm listed above.
The NE40E's uplinks were therefore down because they were not receiving sufficient optical power from the uplink device. We suspected that the uplink switch was abnormal and asked the customer to have the J vendor engineer check the device state.
3. The J vendor engineer found that the switch had lost power because its power breaker had tripped abnormally. The switch has 1+1 AC power supplies, but both were connected to a non-industrial-grade power breaker. After power was restored, the service recovered.
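
The fault localization in step 1 comes down to finding the last hop that answered in the tracert output. A minimal Python sketch of that check follows. It assumes Windows-style tracert lines like the ones quoted in step 1, and `last_responding_hop` is a hypothetical helper name, not part of any existing tool.

```python
import re

# A hop line starts with the hop number; the header line does not.
HOP_RE = re.compile(r"^\s*(\d+)\s+(.*)$")
# A responding hop ends with an IPv4 address.
IP_RE = re.compile(r"(\d{1,3}(?:\.\d{1,3}){3})\s*$")


def last_responding_hop(tracert_lines):
    """Return (hop_number, ip) of the last hop that replied, or None."""
    last = None
    for line in tracert_lines:
        m = HOP_RE.match(line)
        if not m:
            continue  # header or blank line
        hop, rest = int(m.group(1)), m.group(2)
        ip = IP_RE.search(rest)
        if ip and "timed out" not in rest:
            last = (hop, ip.group(1))
    return last


# Output copied from step 1 of this case.
TRACE = [
    "Tracing route to google-public-dns-a.google.com [8.8.8.8] over a maximum of 30 hops:",
    "1 * * * Request timed out.",
    "2 366 ms 379 ms 399 ms 10.223.7.110",
    "3 * * * Request timed out.",
    "4 * * * Request timed out.",
    "5 * * * Request timed out.",
]

if __name__ == "__main__":
    print(last_responding_hop(TRACE))
```

Every hop after the last responding one timed out, so the investigation moves to that device and its uplink, which is what step 2 does.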


Root Cause
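Based on the networking shown above, the possible causes were:
1. A GGSN fault.
2. An NE40E fault.
3. A fault on the J vendor switch or firewall.
The handling process ruled out the GGSN and the NE40E. The cause was the J vendor switch losing power after its power breaker tripped.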

Suggestions
None.
