NE40E BGP: Disconnecting the Backup E1 Path Interrupts Service for 2-3 Minutes

Published: 2016-06-20
Problem Description

The topology is as follows:

[Topology figure not available]


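Symptom: When the backup E1 path from the plant station to the master-station NE40E-X3 router is disconnected, service is interrupted for 2 to 3 minutes.

The intermediate level-2 RR is the NE40E-X3. Under normal conditions, the path to the 10.21.177 network segment runs between R2 and the NE40E-X3, and the link between R1 and the NE40E-X3 is the backup link. Disconnecting this backup E1 path on the NE40E-X3 should not affect traffic, yet service is interrupted for 2 to 3 minutes.

Routing information on the NE40E-X3 under normal conditions: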

<SGDnet-LN[CY]-DD.R1[NE40E-X3]>disp ip rout vpn vpn-rt 10.21.177.236
Route Flags: R - relay, D - download to fib
------------------------------------------------------------------------------
Routing Table : vpn-rt
Summary Count : 1
Destination/Mask    Proto   Pre  Cost      Flags NextHop         Interface

  10.21.177.224/28  IBGP    255  0          RD   10.21.246.64    Serial3/2/1/13:0


After the backup E1 path from the plant station to the master-station NE40E-X3 router is disconnected:
[SGDnet-LN[CY]-DD.R1[NE40E-X3]-Serial3/2/0/15:0]shutdown  

disp bgp vpnv4 all rout 10.21.177.224 28


BGP local router ID : 10.21.249.12
Local AS number : 30021

Total routes of Route Distinguisher(30021:101): 2
BGP routing table entry information of 10.21.177.224/28:
Label information (Received/Applied): 1025/NULL
From: 10.21.247.64 (10.21.247.64)
Route Duration: 18h35m05s 
Relay IP Nexthop: 10.21.204.106
Relay IP Out-Interface: Pos3/0/1
Relay Tunnel Out-Interface:
Relay token: 0x0
Original nexthop: 10.21.247.64     <-- the next hop has changed to 10.21.247.64
Qos information : 0x0
Ext-Community:RT <30021 : 101>, RT <40421 : 301>
AS-path Nil, origin incomplete, MED 0, localpref 100, pref-val 0, valid, internal, best, select, pre 255
Not advertised to any peer yet

 

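Troubleshooting Process

Analysis: After Serial3/2/0/15:0 is shut down, the NE40E cannot learn a route to 10.21.247.64 through plant-station R2, because plant-station R1 and plant-station R2, both attached below the NE40E, have no OSPF neighbor relationship with each other. The NE40E does, however, still hold a /16 route that covers 10.21.247.64, with outbound interface Pos3/0/1. Because of its IGP cost, the BGP route whose next hop resolves through Pos3/0/1 is preferred. That route cannot be iterated to a tunnel or a tunnel outbound interface, so traffic from the NE40E to the plant station fails.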


[SGDnet-LN[CY]-DD.R1[NE40E-X3]-Serial3/2/0/15:0]shutdown  

disp bgp vpnv4 all rout 10.21.177.224 28


BGP local router ID : 10.21.249.12
Local AS number : 30021

Total routes of Route Distinguisher(30021:101): 2
BGP routing table entry information of 10.21.177.224/28:
Label information (Received/Applied): 1025/NULL
From: 10.21.247.64 (10.21.247.64)
Route Duration: 18h35m05s 
Relay IP Nexthop: 10.21.204.106
Relay IP Out-Interface: Pos3/0/1
Relay Tunnel Out-Interface:     <-- no tunnel outbound interface or corresponding token can be resolved
Relay token: 0x0
Original nexthop: 10.21.247.64
Qos information : 0x0
Ext-Community:RT <30021 : 101>, RT <40421 : 301>
AS-path Nil, origin incomplete, MED 0, localpref 100, pref-val 0, valid, internal, best, select, pre 255
Not advertised to any peer yet
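
The next-hop resolution described in the analysis can be sketched as a longest-prefix match. This is a minimal illustration in Python, not the NE40E implementation. The table holds only the prefixes named in this case, and it assumes that the /32 route to R1's loopback was learned over Serial3/2/0/15:0.

```python
import ipaddress

def longest_prefix_match(table, address):
    """Return the (prefix, out_interface) pair with the longest mask covering address, or None."""
    addr = ipaddress.ip_address(address)
    candidates = [(net, ifc) for net, ifc in table.items() if addr in net]
    if not candidates:
        return None
    return max(candidates, key=lambda entry: entry[0].prefixlen)

# Simplified IGP view on the NE40E before the shutdown (illustrative entries only).
igp_table = {
    ipaddress.ip_network("10.21.247.64/32"): "Serial3/2/0/15:0",  # R1 loopback (assumed learned over the E1)
    ipaddress.ip_network("10.21.246.64/32"): "Serial3/2/1/13:0",  # R2 loopback
    ipaddress.ip_network("10.21.0.0/16"): "Pos3/0/1",             # /16 route received from upstream
}

before = longest_prefix_match(igp_table, "10.21.247.64")

# Shutting down the E1 removes the /32 that was learned over it.
del igp_table[ipaddress.ip_network("10.21.247.64/32")]
after = longest_prefix_match(igp_table, "10.21.247.64")

print(before[1])  # Serial3/2/0/15:0
print(after[1])   # Pos3/0/1
```

With the /32 gone, 10.21.247.64 is still covered by 10.21.0.0/16, so the next hop remains reachable from the router's point of view even though no LSP exists along that path.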


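After the OSPF cost of Serial3/2/1/13:0 is changed, the service route from the NE40E to the plant station immediately switches to the path through Serial3/2/1/13:0, because that path now has the smaller cost.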

[SGDnet-LN[CY]-DD.R1[NE40E-X3]-Serial3/2/0/15:0]int ser 3/2/1/13:0
[SGDnet-LN[CY]-DD.R1[NE40E-X3]-Serial3/2/1/13:0]ospf cost 1

disp bgp vpnv4 all rout 10.21.177.224 28


BGP local router ID : 10.21.249.12
Local AS number : 30021

Total routes of Route Distinguisher(30021:101): 2
BGP routing table entry information of 10.21.177.224/28:
Label information (Received/Applied): 1025/156388
From: 10.21.246.64 (10.21.246.64)
Route Duration: 10d17h43m06s
Relay IP Nexthop: 10.21.211.158
Relay IP Out-Interface: Serial3/2/1/13:0
Relay Tunnel Out-Interface: Serial3/2/1/13:0
Relay token: 0x600080b
Original nexthop: 10.21.246.64

Qos information : 0x0
Ext-Community:RT <30021 : 101>, RT <40421 : 301>
AS-path Nil, origin incomplete, MED 0, localpref 100, pref-val 0, valid, internal, best, select, pre 255
Advertised to such 3 peers:
    10.21.249.1
    10.21.249.2
    10.21.249.5
BGP routing table entry information of 10.21.177.224/28:
Label information (Received/Applied): 1025/NULL
From: 10.21.247.64 (10.21.247.64)
Route Duration: 18h35m39s 
Relay IP Nexthop: 10.21.204.106
Relay IP Out-Interface: Pos3/0/1
Relay Tunnel Out-Interface:
Relay token: 0x0 
Original nexthop: 10.21.247.64
Qos information : 0x0
Ext-Community:RT <30021 : 101>, RT <40421 : 301>
AS-path Nil, origin incomplete, MED 0, localpref 100, pref-val 0, valid, internal, pre 255, not preferred for IGP metric
Not advertised to any peer yet



Relevant configuration of plant-station R1:
interface LoopBack0
ip address 10.21.247.64 255.255.255.255

interface GigabitEthernet0/0.103     <-- presumably the interface connected to plant-station R2
vlan-type dot1q vid 103
ip address xx.xx.1.97 255.255.255.248

interface Serial1/0
fe1 unframed
link-protocol ppp
ip address 10.21.211.154 255.255.255.252
mpls
mpls ldp
#
ospf 1
area 0.0.0.12
  network 10.21.211.0 0.0.0.255
  network 10.21.247.0 0.0.0.255
#

Relevant configuration of plant-station R2:
interface Serial1/0
fe1 unframed
link-protocol ppp
ip address 10.21.211.158 255.255.255.252
mpls
mpls ldp
#
interface LoopBack0
ip address 10.21.246.64 255.255.255.255
#
interface GigabitEthernet0/1.103     <-- the interface connected to plant-station R1
vlan-type dot1q vid 103
ip address xx.xx.2.98 255.255.255.248
#
ospf 1
area 0.0.0.12
  network 10.21.211.0 0.0.0.255
  network 10.21.246.0 0.0.0.255

Root Cause

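On the NE40E, any BGP route whose next hop is reachable takes part in next-hop iteration and route selection, whether or not the route is locally originated. The router then decides whether to advertise the selected best route according to whether it meets the advertisement conditions. In this case the NE40E receives a 10.21.0.0/16 route through Pos3/0/1. After the E1 interface is shut down, BGP VPNv4 route selection runs again and prefers the route whose next hop has the smaller IGP metric. That route cannot be iterated to a tunnel, so although it is selected as best in the VPNv4 table, it is not advertised because no tunnel outbound interface can be resolved. The remote devices therefore do not receive the route, and service is affected. Once the BGP peer relationship between the NE40E and plant-station R1 times out and goes down, the route with next hop 10.21.247.64 (plant-station R1) no longer takes part in BGP VPNv4 route selection. The NE40E then prefers the route from plant-station R2 (next hop 10.21.246.64), which iterates to a tunnel outbound interface and is advertised normally. The remote devices learn the route again and service forwarding recovers. The interruption therefore lasts from the moment the E1 link is shut down until the BGP peer relationship between the NE40E and R1 times out and the route advertised by plant-station R2 is selected again.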
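
The three phases described above can be modeled with a small Python sketch. It is a simplification under stated assumptions, not the router's algorithm. Local preference, AS path, origin, and MED are treated as equal (as they are in the outputs above), so only the IGP metric decides. The class name, helper names, and metric values (200, 100, 50) are hypothetical and chosen only to reproduce the ordering seen in this case.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Vpnv4Route:
    next_hop: str
    peer_up: bool               # BGP session to the originating peer is established
    igp_metric: Optional[int]   # None means the next hop is unreachable
    has_tunnel: bool            # next hop iterates to a tunnel out-interface and token

def select_best(routes):
    """Pick the valid route with the lowest IGP metric to its next hop."""
    valid = [r for r in routes if r.peer_up and r.igp_metric is not None]
    return min(valid, key=lambda r: r.igp_metric) if valid else None

def is_advertised(route):
    """A best route that cannot be iterated to a tunnel is not advertised."""
    return route is not None and route.has_tunnel

def snapshot(r1, r2):
    """Return (best next hop, advertised?) for one point in time."""
    best = select_best([r1, r2])
    return best.next_hop, is_advertised(best)

r2 = Vpnv4Route("10.21.246.64", True, 100, True)

# Normal state: R1 is reachable over the backup E1 with a higher metric (hypothetical value).
normal = snapshot(Vpnv4Route("10.21.247.64", True, 200, True), r2)
# E1 shut down: R1's next hop now resolves through 10.21.0.0/16 on Pos3/0/1, with no tunnel.
after_shutdown = snapshot(Vpnv4Route("10.21.247.64", True, 50, False), r2)
# BGP session to R1 has timed out: its route leaves the selection.
after_peer_down = snapshot(Vpnv4Route("10.21.247.64", False, 50, False), r2)

print(normal)           # ('10.21.246.64', True)
print(after_shutdown)   # ('10.21.247.64', False)
print(after_peer_down)  # ('10.21.246.64', True)
```

The middle phase is the outage: a best route exists but nothing is advertised. If the session to R1 used a 180-second hold time with 60-second keepalives (an assumption; the case does not state the timers), the session would drop roughly 120 to 180 seconds after the failure, which would be consistent with the observed 2 to 3 minutes.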

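If the NE40E did not receive the 10.21.0.0/16 route, then after the E1 link is shut down the route with next hop 10.21.247.64 would become invalid because its next hop is unreachable, and it would no longer take part in route selection. The NE40E would still prefer the route whose next hop is plant-station R2, the routes on the remote devices would not change, and traffic would continue along the original path, so this problem would not occur.
Solution
Filter 10.21.0.0/16 on the upstream device so that this route is not advertised to the NE40E-X3.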
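
The effect of the filter can be checked with a short Python sketch. It is only an illustration of the reasoning, not device configuration. The prefix lists are reduced to the entries named in this case, and the helper names are hypothetical.

```python
import ipaddress

def resolve(prefixes, address):
    """Return the longest prefix covering address, or None if the address is unreachable."""
    addr = ipaddress.ip_address(address)
    matches = [net for net in prefixes if addr in net]
    return max(matches, key=lambda net: net.prefixlen) if matches else None

def valid_next_hops(prefixes, next_hops):
    """Keep only the BGP next hops that still resolve to some route."""
    return [nh for nh in next_hops if resolve(prefixes, nh) is not None]

R1_LOOPBACK = "10.21.247.64"
R2_LOOPBACK = "10.21.246.64"
AGGREGATE = ipaddress.ip_network("10.21.0.0/16")

# Routes on the NE40E after the E1 to R1 is shut down (R1's /32 is already gone).
unfiltered = [ipaddress.ip_network("10.21.246.64/32"), AGGREGATE]
# Same table when the upstream device no longer advertises 10.21.0.0/16.
filtered = [net for net in unfiltered if net != AGGREGATE]

print(valid_next_hops(unfiltered, [R1_LOOPBACK, R2_LOOPBACK]))
# ['10.21.247.64', '10.21.246.64']
print(valid_next_hops(filtered, [R1_LOOPBACK, R2_LOOPBACK]))
# ['10.21.246.64']
```

With the /16 route filtered, R1's next hop no longer resolves, so its route becomes invalid at once and the route through R2 stays best without waiting for the BGP session to time out.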
