An MTU Issue on an NE40E-X3 Causes BGP Peer Flapping and Service Interruptions

Publication Date: 2019-04-19

Issue Description

NE40E-X3: V600R007C00SPC300; patch: V600R007SPH066;

S7712: V200R008C00SPC500

Networking topology:

Alarm Information

May  3 2017 13:09:22+08:00 DC-1-NE40EX8 %%01BGP/3/STATE_CHG_UPDOWN(l)[149]:The status of the peer 11.249.48.254 changed from OPENCONFIRM to ESTABLISHED. (InstanceName=Public, StateChangeReason=Up)
May  3 2017 13:06:07+08:00 DC-1-NE40EX8 %%01BGP/3/STATE_CHG_UPDOWN(l)[151]:The status of the peer 11.249.48.254 changed from ESTABLISHED to IDLE. (InstanceName=Public, StateChangeReason=CEASE/BFD Session Down)
May  3 2017 13:05:11+08:00 DC-1-NE40EX8 %%01BGP/3/STATE_CHG_UPDOWN(l)[153]:The status of the peer 11.249.48.254 changed from OPENCONFIRM to ESTABLISHED. (InstanceName=Public, StateChangeReason=Up)
May  3 2017 13:04:20+08:00 DC-1-NE40EX8 %%01BGP/3/STATE_CHG_UPDOWN(l)[154]:The status of the peer 11.249.48.254 changed from ESTABLISHED to IDLE. (InstanceName=Public, StateChangeReason=Socket Read Failed)
May  3 2017 13:01:20+08:00 DC-1-NE40EX8 %%01BGP/3/STATE_CHG_UPDOWN(l)[157]:The status of the peer 11.249.48.254 changed from OPENCONFIRM to ESTABLISHED. (InstanceName=Public, StateChangeReason=Up)
May  3 2017 13:00:43+08:00 DC-1-NE40EX8 %%01BGP/3/STATE_CHG_UPDOWN(l)[158]:The status of the peer 11.249.48.254 changed from ESTABLISHED to IDLE. (InstanceName=Public, StateChangeReason=Socket Read Failed)
May  3 2017 12:57:43+08:00 DC-1-NE40EX8 %%01BGP/3/STATE_CHG_UPDOWN(l)[161]:The status of the peer 11.249.48.254 changed from OPENCONFIRM to ESTABLISHED. (InstanceName=Public, StateChangeReason=Up)
May  3 2017 12:57:06+08:00 DC-1-NE40EX8 %%01BGP/3/STATE_CHG_UPDOWN(l)[162]:The status of the peer 11.249.48.254 changed from ESTABLISHED to IDLE. (InstanceName=Public, StateChangeReason=Socket Read Failed)
May  3 2017 12:54:06+08:00 DC-1-NE40EX8 %%01BGP/3/STATE_CHG_UPDOWN(l)[165]:The status of the peer 11.249.48.254 changed from OPENCONFIRM to ESTABLISHED. (InstanceName=Public, StateChangeReason=Up)
May  3 2017 12:53:29+08:00 DC-1-NE40EX8 %%01BGP/3/STATE_CHG_UPDOWN(l)[166]:The status of the peer 11.249.48.254 changed from ESTABLISHED to IDLE. (InstanceName=Public, StateChangeReason=Socket Read Failed)
May  3 2017 12:50:29+08:00 DC-1-NE40EX8 %%01BGP/3/STATE_CHG_UPDOWN(l)[169]:The status of the peer 11.249.48.254 changed from OPENCONFIRM to ESTABLISHED. (InstanceName=Public, StateChangeReason=Up)
May  3 2017 12:49:52+08:00 DC-1-NE40EX8 %%01BGP/3/STATE_CHG_UPDOWN(l)[170]:The status of the peer 11.249.48.254 changed from ESTABLISHED to IDLE. (InstanceName=Public, StateChangeReason=Socket Read Failed)
May  3 2017 12:46:52+08:00 DC-1-NE40EX8 %%01BGP/3/STATE_CHG_UPDOWN(l)[173]:The status of the peer 11.249.48.254 changed from OPENCONFIRM to ESTABLISHED. (InstanceName=Public, StateChangeReason=Up)
May  3 2017 12:46:15+08:00 DC-1-NE40EX8 %%01BGP/3/STATE_CHG_UPDOWN(l)[174]:The status of the peer 11.249.48.254 changed from ESTABLISHED to IDLE. (InstanceName=Public, StateChangeReason=Socket Read Failed)
May  3 2017 12:43:15+08:00 DC-1-NE40EX8 %%01BGP/3/STATE_CHG_UPDOWN(l)[177]:The status of the peer 11.249.48.254 changed from OPENCONFIRM to ESTABLISHED. (InstanceName=Public, StateChangeReason=Up)
May  3 2017 12:42:38+08:00 DC-1-NE40EX8 %%01BGP/3/STATE_CHG_UPDOWN(l)[178]:The status of the peer 11.249.48.254 changed from ESTABLISHED to IDLE. (InstanceName=Public, StateChangeReason=Socket Read Failed)
May  3 2017 12:39:38+08:00 DC-1-NE40EX8 %%01BGP/3/STATE_CHG_UPDOWN(l)[181]:The status of the peer 11.249.48.254 changed from OPENCONFIRM to ESTABLISHED. (InstanceName=Public, StateChangeReason=Up)
May  3 2017 12:39:01+08:00 DC-1-NE40EX8 %%01BGP/3/STATE_CHG_UPDOWN(l)[182]:The status of the peer 11.249.48.254 changed from ESTABLISHED to IDLE. (InstanceName=Public, StateChangeReason=Socket Read Failed)
May  3 2017 12:36:01+08:00 DC-1-NE40EX8 %%01BGP/3/STATE_CHG_UPDOWN(l)[185]:The status of the peer 11.249.48.254 changed from OPENCONFIRM to ESTABLISHED. (InstanceName=Public, StateChangeReason=Up)
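
The cadence of the flaps is easier to see once the alarms are reduced to session lifetimes. The Python sketch below is illustrative only and is not part of the original case. It pairs each transition to ESTABLISHED with the next transition out of ESTABLISHED and reports how long the session stayed up. The regular expression, the names `parse_event` and `session_lifetimes`, and the four-line `SAMPLE` (copied from the alarms above) are assumptions made for the example.

```python
import re
from datetime import datetime

# Assumed pattern for the STATE_CHG_UPDOWN lines shown above; any time zone
# suffix such as "+08:00" is ignored because only differences are needed.
LINE_RE = re.compile(
    r"(?P<ts>[A-Z][a-z]{2}\s+\d{1,2}\s+\d{4}\s+\d{2}:\d{2}:\d{2})"
    r".*?changed from (?P<old>\w+) to (?P<new>\w+)\."
    r".*?StateChangeReason=(?P<reason>[^)]*)\)"
)


def parse_event(line):
    """Return (timestamp, old_state, new_state, reason), or None if no match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    # Collapse the padded day ("May  3") before parsing.
    # "%b" expects English month names (C or English locale).
    ts = datetime.strptime(" ".join(m["ts"].split()), "%b %d %Y %H:%M:%S")
    return ts, m["old"], m["new"], m["reason"]


def session_lifetimes(lines):
    """Return [(seconds_up, down_reason), ...] in chronological order."""
    events = sorted(e for e in map(parse_event, lines) if e is not None)
    lifetimes, up_since = [], None
    for ts, old, new, reason in events:
        if new == "ESTABLISHED":
            up_since = ts
        elif old == "ESTABLISHED" and up_since is not None:
            lifetimes.append((int((ts - up_since).total_seconds()), reason))
            up_since = None
    return lifetimes


SAMPLE = [
    "May  3 2017 13:06:07+08:00 DC-1-NE40EX8 %%01BGP/3/STATE_CHG_UPDOWN(l)[151]:The status of the peer 11.249.48.254 changed from ESTABLISHED to IDLE. (InstanceName=Public, StateChangeReason=CEASE/BFD Session Down)",
    "May  3 2017 13:05:11+08:00 DC-1-NE40EX8 %%01BGP/3/STATE_CHG_UPDOWN(l)[153]:The status of the peer 11.249.48.254 changed from OPENCONFIRM to ESTABLISHED. (InstanceName=Public, StateChangeReason=Up)",
    "May  3 2017 13:04:20+08:00 DC-1-NE40EX8 %%01BGP/3/STATE_CHG_UPDOWN(l)[154]:The status of the peer 11.249.48.254 changed from ESTABLISHED to IDLE. (InstanceName=Public, StateChangeReason=Socket Read Failed)",
    "May  3 2017 13:01:20+08:00 DC-1-NE40EX8 %%01BGP/3/STATE_CHG_UPDOWN(l)[157]:The status of the peer 11.249.48.254 changed from OPENCONFIRM to ESTABLISHED. (InstanceName=Public, StateChangeReason=Up)",
]

for seconds, reason in session_lifetimes(SAMPLE):
    print(f"up for {seconds}s, then down: {reason}")
```

On the sample, the session that ended with Socket Read Failed stayed up for 180 seconds, and the one ended by the BFD session going down stayed up for 56 seconds. Feeding the full alarm list to `session_lifetimes` would show whether every Socket Read Failed flap follows the same 180-second pattern.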

Handling Process

1. Check the log of the HQ NE40E-X8. BGP peer flapping started at 12:09 and continued until the upstream interface G 1/0/0 on the master NE40E-X3 was shut down.

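2. During troubleshooting, maintenance engineers shut down the upstream interface G 1/0/0 on the branch master NE40E-X3 four times, at 12:09, 13:04, 13:06, and 13:10. A comparison of the logs at both ends shows that the BGP peer relationship kept going up and down between 12:09 and 13:04. The NE40E-X3 log excerpt below carries no time zone offset, and its timestamps appear to be 8 hours behind the +08:00 timestamps on the HQ device (05:04:20 corresponds to 13:04:20).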

May  3 2017 05:04:20 NE40E-X3-1 %%01BGP/3/STATE_CHG_UPDOWN(l)[19]:The status of the peer 11.249.48.253 changed from ESTABLISHED to IDLE. (InstanceName=Public, StateChangeReason=Hold Timer Expired)
May  3 2017 05:01:20 NE40E-X3-1 %%01BGP/3/STATE_CHG_UPDOWN(l)[21]:The status of the peer 11.249.48.253 changed from OPENCONFIRM to ESTABLISHED. (InstanceName=Public, StateChangeReason=Up)
May  3 2017 05:00:43 NE40E-X3-1 %%01BGP/3/STATE_CHG_UPDOWN(l)[22]:The status of the peer 11.249.48.253 changed from ESTABLISHED to IDLE. (InstanceName=Public, StateChangeReason=Hold Timer Expired)
May  3 2017 04:57:43 NE40E-X3-1 %%01BGP/3/STATE_CHG_UPDOWN(l)[25]:The status of the peer 11.249.48.253 changed from OPENCONFIRM to ESTABLISHED. (InstanceName=Public, StateChangeReason=Up)
May  3 2017 04:57:06 NE40E-X3-1 %%01BGP/3/STATE_CHG_UPDOWN(l)[26]:The status of the peer 11.249.48.253 changed from ESTABLISHED to IDLE. (InstanceName=Public, StateChangeReason=Hold Timer Expired)
May  3 2017 04:54:06 NE40E-X3-1 %%01BGP/3/STATE_CHG_UPDOWN(l)[28]:The status of the peer 11.249.48.253 changed from OPENCONFIRM to ESTABLISHED. (InstanceName=Public, StateChangeReason=Up)
May  3 2017 04:53:29 NE40E-X3-1 %%01BGP/3/STATE_CHG_UPDOWN(l)[29]:The status of the peer 11.249.48.253 changed from ESTABLISHED to IDLE. (InstanceName=Public, StateChangeReason=Hold Timer Expired)

3. Between 12:09 and 13:04, every BGP peer down event on the NE40E-X3 was caused by Hold Timer Expired. According to the BGP implementation, after a BGP peer relationship is set up, if the device has sent its routes but receives no Update or Keepalive message from the peer within the hold time (180s here), it reports Hold Timer Expired, considers the peer unreachable, and tears the session down. The log matches this behavior: each session goes down exactly 180s after it comes up (for example, up at 04:54:06 and down at 04:57:06).

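4. Check the MTU. If a BGP Update packet is larger than the MTU of the transmission devices on the path, the packet is discarded. In the ping tests below, every request with a 467-byte payload times out, whereas requests with a 466-byte payload all get replies.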

[NE40E-X3-1-GigabitEthernet1/0/0] ping -s 467 10.249.48.253
  PING 10.249.48.253: 467  data bytes, press CTRL_C to break
    Request time out
    Request time out
    Request time out
    Request time out
    Request time out
   
  --- 10.249.48.253 ping statistics ---
    5 packet(s) transmitted
    0 packet(s) received
    100.00% packet loss

[NE40E-X3-1-GigabitEthernet1/0/0] ping -s 466 10.249.48.253
  PING 10.249.48.253: 466  data bytes, press CTRL_C to break
    Reply from 10.249.48.253: bytes=466 Sequence=1 ttl=255 time=8 ms
    Reply from 10.249.48.253: bytes=466 Sequence=2 ttl=255 time=9 ms
    Reply from 10.249.48.253: bytes=466 Sequence=3 ttl=255 time=9 ms
    Reply from 10.249.48.253: bytes=466 Sequence=4 ttl=255 time=9 ms

  --- 10.249.48.253 ping statistics ---
    4 packet(s) transmitted
    4 packet(s) received
    0.00% packet loss
    round-trip min/avg/max = 8/8/9 ms
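
The two ping results bracket the largest packet the path can carry. The Python sketch below is illustrative only. It assumes that `-s` sets the ICMP payload length (excluding the 8-byte ICMP header) and that the IPv4 header is 20 bytes with no options. It converts a payload size to an IP packet size and shows how a binary search could find the boundary with fewer probes. `fake_probe` is a hypothetical stand-in for actually running a ping with the given payload, and the other names are also chosen for the example.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
ICMP_HEADER = 8   # bytes


def ip_packet_size(payload):
    """IP packet length for an ICMP echo request with this payload size."""
    return payload + ICMP_HEADER + IPV4_HEADER


def largest_passing_payload(probe, low=0, high=1500 - IPV4_HEADER - ICMP_HEADER):
    """Binary-search the largest payload for which probe(payload) is True.

    Assumes probe is monotonic: once a size fails, all larger sizes fail.
    Returns None if even the smallest payload fails.
    """
    if not probe(low):
        return None
    while low < high:
        mid = (low + high + 1) // 2
        if probe(mid):
            low = mid        # mid passes, so the answer is mid or larger
        else:
            high = mid - 1   # mid fails, so the answer is below mid
    return low


def fake_probe(payload):
    """Hypothetical stand-in for a ping: mimics the results shown above."""
    return payload <= 466


best = largest_passing_payload(fake_probe)
print(best, ip_packet_size(best))
```

Under these assumptions, the 466-byte payload corresponds to a 494-byte IP packet, far below the usual 1500-byte Ethernet MTU, so any BGP Update packet larger than that would be dropped on this path.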

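Root Cause

The MTU of China Telecom's transport devices between NE40E-X3-1 and the HQ NE40E-X8 is too small, so BGP Update packets larger than that MTU are discarded in transit. After the BGP peer relationship is set up, the device sends its routes but receives nothing from the peer. The hold timer expires after 180s (Hold Timer Expired), the peer is considered unreachable, and the session goes down. The session then re-establishes and the cycle repeats, which shows up as BGP peer flapping.
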
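
The hold-timer behavior described here can be modelled in a few lines. The Python sketch below is a simplified illustration, not the device's implementation. It assumes the timer starts when the session is established and restarts whenever a Keepalive or Update message is received. The function name `hold_timer_expiry` and the 60s message spacing in the usage example are assumptions.

```python
from datetime import datetime, timedelta

HOLD_TIME = timedelta(seconds=180)  # hold time seen in this case


def hold_timer_expiry(established_at, received_at, hold_time=HOLD_TIME):
    """Return the moment the hold timer fires.

    received_at lists the arrival times of Keepalive/Update messages.
    Each message that arrives before the current deadline restarts the
    timer; a message arriving at or after the deadline is too late.
    """
    deadline = established_at + hold_time
    for t in sorted(received_at):
        if t >= deadline:
            break  # the timer has already fired
        deadline = t + hold_time
    return deadline


up = datetime(2017, 5, 3, 4, 54, 6)  # session up time from the NE40E-X3 log

# Nothing arrives from the peer: the timer fires 180s after the session is up.
print(hold_timer_expiry(up, []))

# Messages arrive 60s and 120s after the session is up: the deadline moves out.
print(hold_timer_expiry(up, [up + timedelta(seconds=60), up + timedelta(seconds=120)]))
```

With no messages received, the model expires at 04:57:06, the same time the NE40E-X3 log reports Hold Timer Expired for the session that came up at 04:54:06.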
Solution

Workaround: Shut down the link between NE40E-X3-1 and NE40E-X8.

Solution: Change the MTU of China Telecom's transport devices to 1500.
