XX Bank: A Member-Port Change in the Eth-Trunk Between Two S9706 Switches Briefly Blocks All Designated Ports (DP) on the Standby S9706

Published: 2016-01-25
Problem Description

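After one of the interconnect links between the two core S9706 switches went down, the ports on the secondary root (S9706-2) (xge0/0/0, xge0/0/1, xge0/0/2, xge0/0/3, xge0/0/4) went through a brief blocking phase. All designated ports (DP) on the secondary root S9706 were briefly blocked, affecting services.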

Alarm Information

#Dec 17 2015 23:00:44+08:00 ZJSM9706-HX-2 MSTP/1/TOPOC:OID 1.3.6.1.2.1.17.0.2 Bridge topology change.

#Dec 17 2015 23:00:44+08:00 ZJSM9706-HX-2 MSTP/1/TOPOC:OID 1.3.6.1.2.1.17.0.2 Bridge topology change.

#Dec 17 2015 23:01:43+08:00 ZJSM9706-HX-2 LLDP/4/NBRCHGTRAP:OID: 1.0.8802.1.1.2.0.0.1 Neighbor information is changed. (LldpStatsRemTablesInserts=1, LldpStatsRemTablesDeletes=0, LldpStatsRemTablesDrops=0, LldpStatsRemTablesAgeouts=0)

#Dec 17 2015 23:01:43+08:00 ZJSM9706-HX-2 MSTP/4/PDISC:OID 1.3.6.1.4.1.2011.5.25.42.4.2.2 The port has been set to discarding state. (InstanceID=0, PortInstanceID=0, PortID=54, IfIndex=58, PortName=XGigabitEthernet6/0/4)

#Dec 17 2015 23:01:43+08:00 ZJSM9706-HX-2 MSTP/4/PDISC:OID 1.3.6.1.4.1.2011.5.25.42.4.2.2 The port has been set to discarding state. (InstanceID=0, PortInstanceID=0, PortID=53, IfIndex=57, PortName=XGigabitEthernet6/0/3)

#Dec 17 2015 23:01:43+08:00 ZJSM9706-HX-2 MSTP/4/PDISC:OID 1.3.6.1.4.1.2011.5.25.42.4.2.2 The port has been set to discarding state. (InstanceID=0, PortInstanceID=0, PortID=52, IfIndex=56, PortName=XGigabitEthernet6/0/2)

#Dec 17 2015 23:01:43+08:00 ZJSM9706-HX-2 MSTP/4/PDISC:OID 1.3.6.1.4.1.2011.5.25.42.4.2.2 The port has been set to discarding state. (InstanceID=0, PortInstanceID=0, PortID=51, IfIndex=55, PortName=XGigabitEthernet6/0/1)

#Dec 17 2015 23:01:43+08:00 ZJSM9706-HX-2 MSTP/4/PDISC:OID 1.3.6.1.4.1.2011.5.25.42.4.2.2 The port has been set to discarding state. (InstanceID=0, PortInstanceID=0, PortID=50, IfIndex=54, PortName=XGigabitEthernet6/0/0)

#Dec 17 2015 23:01:43+08:00 ZJSM9706-HX-2 IFNET/1/IF_LINKUP:OID 1.3.6.1.6.3.1.1.5.4 Interface 120 turned into UP state.(AdminStatus=1,OperStatus=1,InterfaceName=XGigabitEthernet4/0/15)

When the aggregation group member port was removed:

#Dec 17 2015 23:01:15+08:00 ZJSM9706-HX-2 BASETRAP/4/ENTITYINSERT:OID 1.3.6.1.4.1.2011.5.25.129.2.1.2 Physical entity is inserted.(Index=68273102, Severity=6, ProbableCause=65541, EventType=5, ContainedIn=68157449, PhysicalName="XGigabitEthernet4/0/15 Optical module")

#Dec 17 2015 23:01:06+08:00 ZJSM9706-HX-2 BASETRAP/4/ENTITYREMOVE:OID 1.3.6.1.4.1.2011.5.25.129.2.1.1 Physical entity is removed.(Index=68273102, Severity=6, ProbableCause=65541, EventType=5, ContainedIn=68157449, PhysicalName="XGigabitEthernet4/0/15 Optical module")

#Dec 17 2015 23:00:44+08:00 ZJSM9706-HX-2 MSTP/4/PFWD:OID 1.3.6.1.4.1.2011.5.25.42.4.2.1 The port has been set to forwarding state. (InstanceID=0, PortInstanceID=0, PortID=54, IfIndex=58, PortName=XGigabitEthernet6/0/4)

#Dec 17 2015 23:00:44+08:00 ZJSM9706-HX-2 MSTP/4/PFWD:OID 1.3.6.1.4.1.2011.5.25.42.4.2.1 The port has been set to forwarding state. (InstanceID=0, PortInstanceID=0, PortID=53, IfIndex=57, PortName=XGigabitEthernet6/0/3)

#Dec 17 2015 23:00:44+08:00 ZJSM9706-HX-2 MSTP/4/PFWD:OID 1.3.6.1.4.1.2011.5.25.42.4.2.1 The port has been set to forwarding state. (InstanceID=0, PortInstanceID=0, PortID=52, IfIndex=56, PortName=XGigabitEthernet6/0/2)

#Dec 17 2015 23:00:44+08:00 ZJSM9706-HX-2 MSTP/4/PFWD:OID 1.3.6.1.4.1.2011.5.25.42.4.2.1 The port has been set to forwarding state. (InstanceID=0, PortInstanceID=0, PortID=51, IfIndex=55, PortName=XGigabitEthernet6/0/1)

#Dec 17 2015 23:00:44+08:00 ZJSM9706-HX-2 MSTP/4/PFWD:OID 1.3.6.1.4.1.2011.5.25.42.4.2.1 The port has been set to forwarding state. (InstanceID=0, PortInstanceID=0, PortID=50, IfIndex=54, PortName=XGigabitEthernet6/0/0)

Handling Process

Topology: (figure omitted)

Version: S9706 V200R003C00SPC500

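The network is an office network in normal operation. The problem occurred that night while the interconnect ports were being adjusted. Because the work was done at night, the customer's production services were not affected, but the root cause was analyzed to prevent similar problems later. The STP state and the trap logs on the S9706 were reviewed in chronological order. They show that Eth-Trunk1 on S9706-2 is the STP root port. After one member port of Eth-Trunk1 was shut down, the STP cost of Eth-Trunk1 changed, which triggered one STP recalculation downstream. Because the network runs RSTP, one Proposal/Agreement (P/A) handshake takes place, and it blocks all designated ports on the standby S9706. This is normal behavior for Huawei RSTP convergence. After an STP cost was configured manually on the Eth-Trunk interface, verification showed no further STP convergence, and the fault was cleared.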
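Reviewing the traps in time order can be sketched in a few lines of Python. This is only an illustration of the review step, not a tool used in the case: the function names `parse_trap` and `timeline` are hypothetical, the regular expressions are assumptions fitted to the trap header format shown in the alarm section, and the sample lines are copied from that section.

```python
import re
from datetime import datetime

# Assumed header layout, taken from the traps quoted in this case:
# "#Dec 17 2015 23:01:43+08:00 ZJSM9706-HX-2 MSTP/4/PDISC:OID ..."
TRAP_RE = re.compile(
    r"#\s*(?P<ts>\w{3} \d{1,2} \d{4} \d{2}:\d{2}:\d{2})[+-]\d{2}:\d{2}\s+"
    r"(?P<host>\S+)\s+(?P<module>\w+)/(?P<sev>\d)/(?P<name>\w+):"
)
PORT_RE = re.compile(r"(?:PortName|InterfaceName)=([\w/]+)")


def parse_trap(line):
    """Return (timestamp, trap name, port or None), or None if the line is not a trap."""
    m = TRAP_RE.search(line)
    if m is None:
        return None
    ts = datetime.strptime(m.group("ts"), "%b %d %Y %H:%M:%S")
    port = PORT_RE.search(line)
    return ts, m.group("name"), port.group(1) if port else None


def timeline(lines):
    """Parse trap lines and return the events sorted by timestamp (stable sort)."""
    events = [e for e in map(parse_trap, lines) if e is not None]
    return sorted(events, key=lambda e: e[0])


# Sample traps copied from the alarm section, in the order they appear there.
SAMPLE = [
    '#Dec 17 2015 23:01:43+08:00 ZJSM9706-HX-2 MSTP/4/PDISC:OID 1.3.6.1.4.1.2011.5.25.42.4.2.2 The port has been set to discarding state. (InstanceID=0, PortInstanceID=0, PortID=50, IfIndex=54, PortName=XGigabitEthernet6/0/0)',
    '#Dec 17 2015 23:01:43+08:00 ZJSM9706-HX-2 IFNET/1/IF_LINKUP:OID 1.3.6.1.6.3.1.1.5.4 Interface 120 turned into UP state.(AdminStatus=1,OperStatus=1,InterfaceName=XGigabitEthernet4/0/15)',
    '#Dec 17 2015 23:01:15+08:00 ZJSM9706-HX-2 BASETRAP/4/ENTITYINSERT:OID 1.3.6.1.4.1.2011.5.25.129.2.1.2 Physical entity is inserted.(Index=68273102, Severity=6, ProbableCause=65541, EventType=5, ContainedIn=68157449, PhysicalName="XGigabitEthernet4/0/15 Optical module")',
    '#Dec 17 2015 23:01:06+08:00 ZJSM9706-HX-2 BASETRAP/4/ENTITYREMOVE:OID 1.3.6.1.4.1.2011.5.25.129.2.1.1 Physical entity is removed.(Index=68273102, Severity=6, ProbableCause=65541, EventType=5, ContainedIn=68157449, PhysicalName="XGigabitEthernet4/0/15 Optical module")',
    '#Dec 17 2015 23:00:44+08:00 ZJSM9706-HX-2 MSTP/4/PFWD:OID 1.3.6.1.4.1.2011.5.25.42.4.2.1 The port has been set to forwarding state. (InstanceID=0, PortInstanceID=0, PortID=50, IfIndex=54, PortName=XGigabitEthernet6/0/0)',
]

if __name__ == "__main__":
    for ts, name, port in timeline(SAMPLE):
        print(ts.time(), name, port or "-")
```

Sorting the traps this way puts the optical-module removal and insertion on XGigabitEthernet4/0/15 between the forwarding and discarding transitions of the designated ports, which is the sequence the analysis relies on.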

Root Cause

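When the member ports of the Eth-Trunk between the two aggregation devices change, RSTP sends TC BPDUs and STP reconverges. All devices in the network are Huawei devices running RSTP, so the RSTP P/A mechanism takes effect and convergence is sped up through the Proposal and Agreement flags in RSTP BPDUs. Because only the member ports of the interconnect Eth-Trunk were adjusted and no STP priority was changed, the initial inference was that a change in STP cost triggered the reconvergence.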

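The product documentation confirmed this root cause. The S9706 uses the IEEE 802.1D calculation method by default: when the aggregation group has two member ports the STP cost is 1, and when one of them goes down the STP cost becomes 2.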

Table 1 Mapping between port speeds and path costs

Port Speed  Link Type                Path Cost 802.1D-1998  Path Cost 802.1T  Path Cost Legacy
0           -                        65,535                 200,000,000       200,000
10Gbps      Full-Duplex              2                      2,000             2
10Gbps      Aggregated Link 2 Ports  1                      1,000             1
10Gbps      Aggregated Link 3 Ports  1                      666               1
10Gbps      Aggregated Link 4 Ports  1                      500               1
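The cost change behind this case can be checked against the table with a small lookup. The sketch below is illustrative only: the names `PATH_COST_10G`, `trunk_cost` and `cost_changes` and the standard labels used as dictionary keys are hypothetical, and it assumes that a trunk left with one active member is costed like a single full-duplex 10Gbps port, which matches the 1 to 2 change described above.

```python
# Path costs for 10Gbps links, copied from Table 1.
# Key: number of active ports (1 = a single full-duplex port, an assumption
# for a trunk with one member left); value: cost under each calculation standard.
PATH_COST_10G = {
    1: {"dot1d-1998": 2, "dot1t": 2000, "legacy": 2},
    2: {"dot1d-1998": 1, "dot1t": 1000, "legacy": 1},
    3: {"dot1d-1998": 1, "dot1t": 666, "legacy": 1},
    4: {"dot1d-1998": 1, "dot1t": 500, "legacy": 1},
}


def trunk_cost(active_members, standard="dot1d-1998", manual_cost=None):
    """Return the path cost of the trunk.

    A manually configured cost, if given, overrides the value that would
    otherwise be derived from the number of active members.
    """
    if manual_cost is not None:
        return manual_cost
    return PATH_COST_10G[active_members][standard]


def cost_changes(before, after, **kwargs):
    """True if going from `before` to `after` active members changes the cost."""
    return trunk_cost(before, **kwargs) != trunk_cost(after, **kwargs)


if __name__ == "__main__":
    # Two members -> one member with automatic calculation: the cost changes.
    print(cost_changes(2, 1))
    # Same event with the cost pinned manually: the cost stays the same.
    print(cost_changes(2, 1, manual_cost=1))
```

With automatic calculation the two-member trunk moves from cost 1 to cost 2 when a member goes down, while a manually specified cost stays constant regardless of how many members are active.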

Solution

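Configure an STP cost of 1 on the Eth-Trunk interface. Specifying the cost of the aggregated interface manually prevents the STP cost from being calculated automatically. Verification confirmed that STP no longer reconverged, and the fault was cleared.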

Suggestions and Summary

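When deploying RSTP over aggregated links, plan the STP cost of each aggregated interface in advance so that automatic STP cost calculation does not cause ports to be blocked.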
