BGP Peers Could Not Be Established Within 10 Minutes After Master/Slave MPU Switchover Because of the Overlong Wait-time Set for OSPF GR

Publication Date: 2012-07-27
Issue Description
In an on-site test, after the master/slave MPU switchover on an NE40E, the DMS lost contact with the NE40E for 8 to 10 minutes. During this period, the IP address of the NE40E could not be pinged from the DMS.
Alarm Information
Jul 28 2008 13:53:32 BPLNE40E01MALAD %%01HA/5/S2M(D): Slave board in slot 10 changed to master. /// Master/slave switchover
Jul 28 2008 13:53:55 BPLNE40E01MALAD %%01BGP/6/GR_NOTIFY_ENTER(D): BGPentered into graceful restart state./// BGP enters into the GR state
Jul 28 2008 13:53:55 BPLNE40E01MALAD %%01BGP/6/TIMER_GR_PROTECT_CR(D): GR protection timer is created for IPv4-UNC instance, and initial time is 5 minutes: 0 seconds./// Create the timer for the P
Jul 28 2008 13:54:00 BPLNE40E01MALAD %%01BGP/6/TIMER_CR_EXPIRED(D): Connect retry timer expired for peer 10.201.60.132. (BGP address family=public) /// Constantly attempt to get connected
Jul 28 2008 13:54:00 BPLNE40E01MALAD %%01BGP/6/FSM_EVENT_CHANGED(D): Peer 10.201.60.132 receives CRTimerExpired event when in IDLE state. (BGP address family=public)
Jul 28 2008 13:54:00 BPLNE40E01MALAD %%01BGP/6/FSM_EVENT_CHANGED(D): Peer 10.201.60.132 receives Start event when in IDLE state. (BGP address family=public)
Jul 28 2008 13:54:00 BPLNE40E01MALAD %%01BGP/6/TIMER_CR_EXPIRED(D): Connect retry timer expired for peer 10.201.60.130. (BGP address family=public)
Jul 28 2008 13:54:00 BPLNE40E01MALAD %%01BGP/6/FSM_EVENT_CHANGED(D): Peer 10.201.60.130 receives CRTimerExpired event when in IDLE state. (BGP address family=public)
Jul 28 2008 13:54:00 BPLNE40E01MALAD %%01BGP/6/FSM_EVENT_CHANGED(D): Peer 10.201.60.130 receives Start event when in IDLE state. (BGP address family=public)
Jul 28 2008 13:54:00 BPLNE40E01MALAD %%01BGP/6/TIMER_CR_EXPIRED(D): Connect retry timer expired for peer 10.201.60.129. (BGP address family=public)
Jul 28 2008 13:54:00 BPLNE40E01MALAD %%01BGP/6/FSM_EVENT_CHANGED(D): Peer 10.201.60.129 receives CRTimerExpired event when in IDLE state. (BGP address family=public)
Jul 28 2008 13:54:00 BPLNE40E01MALAD %%01BGP/6/FSM_EVENT_CHANGED(D): Peer 10.201.60.129 receives Start event when in IDLE state. (BGP address family=public)
Jul 28 2008 13:58:54 BPLNE40E01MALAD %%01BGP/6/GR_PROTECTION_EXPIRE(D): GR protection timer expired for IPv4-VPN instance. ///The timer times out
Jul 28 2008 14:03:55 BPLNE40E01MALAD %%01BGP/6/TIMER_CR_EXPIRED(D): Connect retry timer expired for peer 10.201.60.132. (BGP address family=public) /// Constantly attempt to get connected
Jul 28 2008 14:03:55 BPLNE40E01MALAD %%01BGP/6/FSM_EVENT_CHANGED(D): Peer 10.201.60.132 receives CRTimerExpired event when in IDLE state. (BGP address family=public)
Jul 28 2008 14:03:55 BPLNE40E01MALAD %%01BGP/6/FSM_EVENT_CHANGED(D): Peer 10.201.60.132 receives Start event when in IDLE state. (BGP address family=public)
Jul 28 2008 14:03:55 BPLNE40E01MALAD %%01BGP/6/TIMER_CR_EXPIRED(D): Connect retry timer expired for peer 10.201.60.130. (BGP address family=public)
Jul 28 2008 14:03:55 BPLNE40E01MALAD %%01BGP/6/FSM_EVENT_CHANGED(D): Peer 10.201.60.130 receives CRTimerExpired event when in IDLE state. (BGP address family=public)
Jul 28 2008 14:03:55 BPLNE40E01MALAD %%01BGP/6/FSM_EVENT_CHANGED(D): Peer 10.201.60.130 receives Start event when in IDLE state. (BGP address family=public)
Jul 28 2008 14:03:55 BPLNE40E01MALAD %%01BGP/6/TIMER_CR_EXPIRED(D): Connect retry timer expired for peer 10.201.60.129. (BGP address family=public)
Jul 28 2008 14:03:55 BPLNE40E01MALAD %%01BGP/6/FSM_EVENT_CHANGED(D): Peer 10.201.60.129 receives CRTimerExpired event when in IDLE state. (BGP address family=public)
Jul 28 2008 14:03:55 BPLNE40E01MALAD %%01BGP/6/FSM_EVENT_CHANGED(D): Peer 10.201.60.129 receives Start event when in IDLE state. (BGP address family=public)
Jul 28 2008 14:03:57 BPLNE40E01MALAD %%01BGP/6/RECV_GRNOTIFY_MSG(D): Received GR end notification from RM. (Instance ID=0, Protocol ID=13, Message type=1) /// GR terminates
Jul 28 2008 14:04:00 BPLNE40E01MALAD %%01BGP/6/FSM_EVENT_CHANGED(D): Peer 10.201.60.132 receives RecvKeepAliveMessage event when in OPENCONFIRM state. (BGP address family=public) /// Get connected
Jul 28 2008 14:04:00 BPLNE40E01MALAD %%01BGP/6/SESSION_UP_INCREASE(D): The number of sessions in up state increased to 1.
Jul 28 2008 14:04:00 BPLNE40E01MALAD %%01BGP/6/SESSION_UP_INCREASE(D): The number of sessions in up state increased to 3. 
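
The length of the outage can be read directly from the log timestamps. The following is a minimal Python sketch, not part of the original case, that computes the interval between the switchover record (HA/5/S2M) and the first SESSION_UP_INCREASE record. The helper names log_time and elapsed_seconds are hypothetical, and the two log lines are shortened copies of the entries above.

```python
from datetime import datetime

# Timestamp layout used by the log excerpt, e.g. "Jul 28 2008 13:53:32".
LOG_TIME_FORMAT = "%b %d %Y %H:%M:%S"

def log_time(line: str) -> datetime:
    """Parse the leading timestamp (first four fields) of a log line."""
    stamp = " ".join(line.split()[:4])
    return datetime.strptime(stamp, LOG_TIME_FORMAT)

def elapsed_seconds(start_line: str, end_line: str) -> int:
    """Return the whole seconds between the timestamps of two log lines."""
    return int((log_time(end_line) - log_time(start_line)).total_seconds())

# Shortened copies of the switchover and session-up records.
switchover = "Jul 28 2008 13:53:32 BPLNE40E01MALAD %%01HA/5/S2M(D): Slave board in slot 10 changed to master."
session_up = "Jul 28 2008 14:04:00 BPLNE40E01MALAD %%01BGP/6/SESSION_UP_INCREASE(D): The number of sessions in up state increased to 1."

outage = elapsed_seconds(switchover, session_up)
print(f"{outage} s ({outage // 60} min {outage % 60} s)")  # 628 s (10 min 28 s)
```

The result, a little over 10 minutes, is consistent with the 600s OSPF GR wait-time identified under Root Cause.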

 
Handling Process
By default, the wait-time of BGP GR is 300s. As the analysis of the GR principle under Root Cause shows, the wait-time of OSPF GR must be shorter than the wait-time of BGP GR, so that the IGP routes needed for the BGP TCP connections are available before the BGP GR protection timer expires.
Based on field experience, when the OSPF routing table contains no more than 2,000 entries, the default OSPF GR wait-time of 120s is sufficient. On site, the engineer ran the following commands to set the wait-time of OSPF GR to 120s, which solved the problem.
ospf 1
graceful-restart wait-time 120 
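
The timer rule stated above (the OSPF GR wait-time must be shorter than the BGP GR wait-time) can be expressed as a simple pre-change check. The following Python sketch is illustrative only: the function name ospf_gr_wait_is_safe is hypothetical, and the values are the ones reported in this case (BGP GR default 300s, OSPF GR default 120s, faulty setting 600s). It does not query the device.

```python
# Values taken from this case; adjust them to the actual configuration.
BGP_GR_WAIT_S = 300           # default BGP GR wait-time
OSPF_GR_DEFAULT_WAIT_S = 120  # default OSPF GR wait-time
OSPF_GR_FAULTY_WAIT_S = 600   # OSPF GR wait-time configured before the fix

def ospf_gr_wait_is_safe(ospf_wait_s: int, bgp_wait_s: int = BGP_GR_WAIT_S) -> bool:
    """Return True if the OSPF GR wait-time is shorter than the BGP GR wait-time."""
    return ospf_wait_s < bgp_wait_s

for wait_s in (OSPF_GR_FAULTY_WAIT_S, OSPF_GR_DEFAULT_WAIT_S):
    verdict = "OK" if ospf_gr_wait_is_safe(wait_s) else "too long"
    print(f"OSPF GR wait-time {wait_s}s vs BGP GR wait-time {BGP_GR_WAIT_S}s: {verdict}")
```

With the values from this case, the 600s setting fails the check and the 120s setting passes it.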

 
Root Cause
1. The DMS accessed the NE40E through a BGP/MPLS L3VPN. The NE40E was configured with BGP GR and OSPF GR. The wait-time of OSPF GR was set to 600s (10 minutes).
2. Checking the alarms, the engineer found that the BGP peers were established about 10 minutes after the master/slave switchover. Because the DMS relied on the BGP VPNv4 routing table to communicate with the devices, the engineer concluded that the 10-minute disconnection was caused by the failure to establish the BGP peers during that period, which in turn prevented the BGP VPNv4 routing table from being created.
3. The network contains only four NE40Es. The routing table therefore holds only a small number of entries, and the BGP peers should have been established within one minute. The log shows repeated attempts to establish the BGP peers after the master/slave switchover, but none of them succeeded.
4. The BGP peers were established between loopback interfaces, so the TCP connection for each peer could be set up only after the IGP routing table was available. After the master/slave switchover, the OSPF neighbor relationship could be re-established at the IP layer over the directly connected interfaces. However, because the wait-time of OSPF GR was 10 minutes, the OSPF routing table was created and the FIB on the LPU was updated only after the wait-time expired. During the wait-time, no IGP routing table existed, so the TCP connections required by the BGP peers could not be set up and the BGP routing table could not be created. As a result, the DMS could not reach the devices.
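
The dependency described in item 4 can be turned into a rough timeline estimate. The following Python sketch assumes a simplified model in which the IGP routes become usable only when the OSPF GR wait-time expires, counted from the switchover time in the log. The function name earliest_igp_ready is hypothetical, and the model ignores protocol start-up and convergence delays, so it gives a lower bound rather than an exact prediction.

```python
from datetime import datetime, timedelta

# Timestamp layout used by the log excerpt, e.g. "Jul 28 2008 13:53:32".
LOG_TIME_FORMAT = "%b %d %Y %H:%M:%S"

def earliest_igp_ready(switchover: str, ospf_gr_wait_s: int) -> datetime:
    """Assumed model: IGP routes are usable no earlier than switchover + OSPF GR wait-time."""
    start = datetime.strptime(switchover, LOG_TIME_FORMAT)
    return start + timedelta(seconds=ospf_gr_wait_s)

SWITCHOVER = "Jul 28 2008 13:53:32"  # HA/5/S2M record in the log

for wait_s in (600, 120):
    ready = earliest_igp_ready(SWITCHOVER, wait_s)
    print(f"OSPF GR wait-time {wait_s}s: BGP peers can come up no earlier than {ready:%H:%M:%S}")
```

For the 600s setting, the estimate is 14:03:32, which is close to the 14:04:00 at which the log shows the BGP sessions coming up. For 120s, the estimate is 13:55:32.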

 
Suggestions
None

END