NE40E was continuously rebooting due to insufficient memory

Publication Date: 2013-04-04
Issue Description

The NE40E was rebooting repeatedly. At first the reboots were rare, about once every two weeks, but the situation grew worse day by day. In the end, any operation on the device caused it to reboot.

The software version was NE40E&80E V600R001C00SPCe00.
The memory configuration was:

  SDRAM Memory Size   : 1024M bytes
  FLASH Memory Size   : 32M  bytes
  NVRAM Memory Size   : 512K bytes
  CFCARD Memory Size : 487M bytes

Feb 14 2012 16:45:50 rnd-r %%01SRM/4/MASTERREGISTER(l):Master MPU5 registered successfully.
Feb 14 2012 16:45:48 rnd-r %%01SRM/4/BOARDRESET(l):MPU6 will be reset, the reason is MPU has no memory.
[rnd-r-diagnose]


============================================================
  ===============display mpu-switch-cause===============
============================================================
 SlotNo      Date       Time    Cause                       Result           
------------------------------------------------------------------------------
6        2012/02/14  16:45:49  VRP Master No Memory        Success          

Handling Process

A check of the device showed that memory usage was 89%, which is very high.

Total Size: 899742920, Free Size: 92538236, Usage: 89%
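
The reported percentage can be cross-checked from the two byte counts. The following is a minimal Python sketch; the variable names are illustrative, and it assumes the device truncates the percentage rather than rounding it, which is a guess and not documented behavior.

```python
# Values copied from the memory usage line above.
total_size = 899_742_920  # bytes
free_size = 92_538_236    # bytes

used_size = total_size - free_size

# Integer division truncates the result. That the device derives the
# displayed percentage this way is an assumption.
usage_percent = used_size * 100 // total_size

print(f"Used: {used_size} bytes, usage: {usage_percent}%")
```

Under that assumption the result agrees with the 89% shown by the device.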


The number of routes in vpn-instance VPN_In was more than 700,000:

==================================================================================
  ===============display bgp vpnv4 all routing-table statistics===============
==================================================================================
 
 Total number of routes from all PE: 790347
 Total routes of vpn-instance VPN_In: 789778
 Total routes of vpn-instance VPN_Mg: 155
 Total routes of vpn-instance VPN_ONM: 39
 Total routes of vpn-instance VPN_STP: 30
 Total routes of vpn-instance VPN_Tv: 197
 Total routes of vpn-instance VPN_Vo: 85
 
 
==============================================================
  ===============display bgp vpnv4 all peer===============
==============================================================
 
 BGP local router ID : 95.169.126.1
 Local AS number : 65121
 Total number of peers : 8                            Peers in established state : 8
 
  Peer            V          AS  MsgRcvd  MsgSent  OutQ  Up/Down       State PrefRcv
 
  10.1.0.0        4       65000     3179      771     0 04:47:48 Established     339
  10.1.93.0       4       65093      424      494     0 04:47:17 Established      16
  10.1.123.0      4       65123      393      459     0 04:47:33 Established      25
  10.1.125.0      4       65125      431      487     0 04:47:35 Established      22
 
  Peer of vpn instance :
 
  vpn instance VPN_In :
  83.229.242.133  4        6854    92395      642     0 04:47:48 Established  392965
  87.245.241.25   4        9002   103498      668     0 04:47:48 Established  396713
  vpn instance VPN_Tv :
  10.6.1.5        4       65000      642      701     0 04:47:48 Established      82
  10.6.1.25       4       65000      642      701     0 04:47:48 Established      82
 
While routes are being learned, up to 200 MB of memory is used for internal system processing. If memory usage exceeds the 90% threshold, the device reboots.
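
To illustrate why route learning alone was enough to trigger a reboot, the headroom below the 90% threshold can be compared with the memory needed during learning. This is only a sketch under stated assumptions: it reads the 200M figure as 200 MiB, reuses the Total Size and Free Size values from the memory check, and the function name `headroom_bytes` is hypothetical.

```python
MIB = 1024 * 1024

def headroom_bytes(total: int, free: int, threshold_percent: int) -> int:
    """Bytes that can still be allocated before usage reaches the threshold."""
    used = total - free
    limit = total * threshold_percent // 100
    return limit - used

total_size = 899_742_920       # bytes, from the memory check
free_size = 92_538_236         # bytes, from the memory check
learning_overhead = 200 * MIB  # assumption: the 200M figure read as 200 MiB

room = headroom_bytes(total_size, free_size, threshold_percent=90)
print(f"Headroom below 90%: {room / MIB:.1f} MiB")
print(f"Needed while learning routes: {learning_overhead / MIB:.0f} MiB")
print("Threshold would be exceeded:", learning_overhead > room)
```

With these figures the headroom is only about 2.4 MiB, far less than what route learning can require.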

The first reboot occurred on February 27 (October 5 according to the incorrect timestamps in the logs):


Oct  5 2011 12:43:00 rnd-r %%01VOSMEM/4/MEM_MAIN_USAGE_HIGH(l):The memory usage of mainboard exceeded the threshold. (Usage=86%, Threshold=85%)
Oct  5 2011 12:43:00 rnd-r SRM_BASE/2/STORAGEUTILIZEALARM: OID 1.3.6.1.4.1.2011.5.25.129.2.6.1 Storage utilization exceeded the prealarm threshold. (EntityPhysicalIndex=17170432, EntityPhysicalIndex=17170432, BaseUsageType=2, BaseUsageIndex=1, BaseTrapSeverity=3, BaseTrapProbableCause=75264, BaseTrapEventType=5, EntPhysicalName="SRU slot 6", RelativeResource="memory", BaseUsageValue=86, BaseUsageUnit=1, BaseUsageThreshold=85)

Oct  5 2011 12:43:03 rnd-r OSPF/4/OGNLSA:OID 1.3.6.1.2.1.14.16.2.12 An LSA is generated. (LsdbAreaId=0.0.0.0, LsdbType=2, LsdbLsid=10.177.1.45, LsdbRouterId=10.1.121.0, RouterId=10.1.121.0)
Oct  5 2011 12:43:09 rnd-r OSPF/4/OGNLSA:OID 1.3.6.1.2.1.14.16.2.12 An LSA is generated. (LsdbAreaId=0.0.0.0, LsdbType=2, LsdbLsid=10.177.1.45, LsdbRouterId=10.1.121.0, RouterId=10.1.121.0)
Oct  5 2011 12:43:14 rnd-r OSPF/4/OGNLSA:OID 1.3.6.1.2.1.14.16.2.12 An LSA is generated. (LsdbAreaId=0.0.0.0, LsdbType=2, LsdbLsid=10.177.1.45, LsdbRouterId=10.1.121.0, RouterId=10.1.121.0)
Oct  5 2011 12:43:18 rnd-r OSPF/4/OGNLSA:OID 1.3.6.1.2.1.14.16.2.12 An LSA is generated. (LsdbAreaId=0.0.0.0, LsdbType=2, LsdbLsid=10.177.1.45, LsdbRouterId=10.1.121.0, RouterId=10.1.121.0)
Oct  5 2011 12:43:24 rnd-r OSPF/4/OGNLSA:OID 1.3.6.1.2.1.14.16.2.12 An LSA is generated. (LsdbAreaId=0.0.0.0, LsdbType=2, LsdbLsid=10.177.1.45, LsdbRouterId=10.1.121.0, RouterId=10.1.121.0)
Oct  5 2011 12:43:30 rnd-r OSPF/4/OGNLSA:OID 1.3.6.1.2.1.14.16.2.12 An LSA is generated. (LsdbAreaId=0.0.0.0, LsdbType=2, LsdbLsid=10.177.1.45, LsdbRouterId=10.1.121.0, RouterId=10.1.121.0)
 

Oct  5 2011 12:45:05 rnd-r SRM_BASE/1/ENTITYRESET: OID 1.3.6.1.4.1.2011.5.25.129.2.1.5 Physical entity reset. (EntityPhysicalIndex=17170432, BaseTrapSeverity=3, BaseTrapProbableCause=66579, BaseTrapEventType=5, EntPhysicalContainedIn=16777216, EntPhysicalName="SRU slot 6", RelativeResource="", ReasonDescription="Because of fatal error or exception occur, the entity of Slot6 is resetting, not ready")
Oct  5 2011 12:45:05 rnd-r %%01SRM/4/BOARDRESET(l):MPU6 will be reset, the reason is MPU has no memory.
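
The sequence in these logs (a memory usage alarm followed shortly by a board reset) can be picked out of a longer log mechanically. The following is a rough Python sketch, not a vendor tool: the function name `find_events` and the regular expression are illustrative, and it assumes every log line starts with a timestamp in the form shown above.

```python
import re
from datetime import datetime

MONTHS = {name: number for number, name in enumerate(
    "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split(), start=1)}

# Timestamp layout assumed from the log excerpts, e.g. "Oct  5 2011 12:45:05".
LINE_RE = re.compile(
    r"^(\w{3})\s+(\d{1,2})\s+(\d{4})\s+(\d{2}):(\d{2}):(\d{2})\s+(.*)$")

def find_events(lines, keywords):
    """Return (timestamp, keyword) pairs for lines containing a keyword."""
    events = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match or match.group(1) not in MONTHS:
            continue  # skip lines without a recognizable timestamp
        mon, day, year, hh, mm, ss, rest = match.groups()
        stamp = datetime(int(year), MONTHS[mon], int(day),
                         int(hh), int(mm), int(ss))
        for keyword in keywords:
            if keyword in rest:
                events.append((stamp, keyword))
    return events

# Two lines copied from the log excerpt above.
log_lines = [
    "Oct  5 2011 12:43:00 rnd-r %%01VOSMEM/4/MEM_MAIN_USAGE_HIGH(l):"
    "The memory usage of mainboard exceeded the threshold. "
    "(Usage=86%, Threshold=85%)",
    "Oct  5 2011 12:45:05 rnd-r %%01SRM/4/BOARDRESET(l):"
    "MPU6 will be reset, the reason is MPU has no memory.",
]

events = find_events(log_lines, ["MEM_MAIN_USAGE_HIGH", "BOARDRESET"])
alarm_time, reset_time = events[0][0], events[1][0]
seconds_between = (reset_time - alarm_time).total_seconds()
print(f"Reset followed the memory alarm by {seconds_between:.0f} seconds")
```

For the two lines used here, the reset follows the alarm by 125 seconds.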

Root Cause

The first reboot was probably caused by route flapping. Over the next several weeks, the system rebooted while learning routes.

Solution
The solution is to expand the memory. 2 GB of memory is necessary.
R&D also suggested installing the latest patch.
After this operation, the problem did not recur.
Suggestions
The problem was caused by insufficient memory on the device, which can be determined from the logs. Adding 1 GB of memory, for a total of 2 GB, solved the problem.
