BPDU attack interrupts traffic on switch rings attached to VPLS and raises CPU usage on the NE40E VPLS PE nodes

Published: 2011-09-27
Problem Description

The network topology is as follows (or see the attachment):


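As shown in the figure, a 10GE PRP ring in the middle connects several GE rings. Three GE rings run MSTP to eliminate loops and are also connected to the VPLS network.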

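Three NE40Es were upgraded from V300R001C01B05D to V300R003C02B608. After the upgrade, CPU usage on the NE40Es rose and services were interrupted. The customer unplugged the cables connected to the blocked ports and services recovered. With the customer's consent, STP was then enabled on the morning of April 24, and services ran normally until 23:47 on April 25, when they were interrupted again. The customer immediately unplugged the cables connected to the MSTP blocked ports and disabled MSTP on the external switches, after which services recovered.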




NE40E log
The CPU usage rose at the time the flapping occurred.

Apr 25 2009 23:48:50 ATC21_NE40E %%01VOSCPU/4/CPU_USAGE_HIGH(l): The CPU is overloaded, and the tasks with top three CPU occupancy are LDP, IPCR, L2V. (CpuUsage=100%, Threshold=95%)
Apr 24 2009 00:34:29 ATC21_NE40E %%01HWCM/4/EXIT(l): Exit from configure mode.

The CPU of the NE40E in the ATC52 ring was overloaded.

Apr 25 2009 23:49:47 ATC52_NE40E %%01VOSCPU/4/CPU_USAGE_HIGH(l): The CPU is overloaded, and the tasks with top three CPU occupancy are LDP, LSPM, IPCR. (CpuUsage=98%, Threshold=95%)
Apr 25 2009 23:49:44 ATC52_NE40E %%01SRM/4/CPUMEMALARM(l): Board 10 CPU usage is Lower than threshold.
Apr 25 2009 23:49:34 ATC52_NE40E %%01SRM/4/CPUMEMALARM(l): Board 10 CPU usage is Upper than threshold.
Apr 25 2009 23:48:48 ATC52_NE40E %%01VOSCPU/4/CPU_USAGE_HIGH(l): The CPU is overloaded, and the tasks with top three CPU occupancy are LDP, IPCR, L2V. (CpuUsage=99%, Threshold=95%)

The CPU of the NE40E in the AMTC ring was overloaded.

Apr 25 2009 23:49:47 AMTC_NE40E %%01SRM/4/CPUMEMALARM(l): Board 9 CPU usage is Lower than threshold.
Apr 25 2009 23:49:32 AMTC_NE40E %%01SRM/4/CPUMEMALARM(l): Board 9 CPU usage is Upper than threshold.
Apr 25 2009 23:48:57 AMTC_NE40E %%01VOSCPU/4/CPU_USAGE_HIGH(l): The CPU is overloaded, and the tasks with top three CPU occupancy are LDP, IPCR, L2V. (CpuUsage=100%, Threshold=95%)

According to the logs, the CPU rise was caused by receiving a large number of TC (topology change) BPDUs.
Apr 25 2009 23:48:45 ATC52_NE40E %%01MSTP/6/RECEIVE_CISTTC(l): MSTP received BPDU with TC, instance 0, port name is GigabitEthernet6/0/0.
Apr 25 2009 23:48:48 ATC52_NE40E %%01MSTP/6/RECEIVE_CISTTC(l): MSTP received BPDU with TC, instance 0, port name is GigabitEthernet7/0/0.
Apr 25 2009 23:48:48 ATC52_NE40E %%01VOSCPU/4/CPU_USAGE_HIGH(l): The CPU is overloaded, and the tasks with top three CPU occupancy are LDP, IPCR, L2V. (CpuUsage=99%, Threshold=95%)
Apr 25 2009 23:48:49 ATC52_NE40E %%01MSTP/6/RECEIVE_CISTTC(l): MSTP received BPDU with TC, instance 0, port name is GigabitEthernet6/0/0.
Apr 25 2009 23:48:49 ATC52_NE40E %%01MSTP/6/RECEIVE_CISTTC(l): MSTP received BPDU with TC, instance 0, port name is GigabitEthernet7/0/0.
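
The TC-reception entries above can be tallied per port to see which interfaces the TC BPDUs arrive on. The following is a minimal Python sketch, not a Huawei tool: the function name count_tc_per_port and the regular expression are assumptions for illustration, and the pattern is written only against the log format shown in this excerpt.

```python
import re
from collections import Counter

# Log lines copied from the ATC52_NE40E excerpt above.
LOG_LINES = [
    "Apr 25 2009 23:48:45 ATC52_NE40E %%01MSTP/6/RECEIVE_CISTTC(l): MSTP received BPDU with TC, instance 0, port name is GigabitEthernet6/0/0.",
    "Apr 25 2009 23:48:48 ATC52_NE40E %%01MSTP/6/RECEIVE_CISTTC(l): MSTP received BPDU with TC, instance 0, port name is GigabitEthernet7/0/0.",
    "Apr 25 2009 23:48:48 ATC52_NE40E %%01VOSCPU/4/CPU_USAGE_HIGH(l): The CPU is overloaded, and the tasks with top three CPU occupancy are LDP, IPCR, L2V. (CpuUsage=99%, Threshold=95%)",
    "Apr 25 2009 23:48:49 ATC52_NE40E %%01MSTP/6/RECEIVE_CISTTC(l): MSTP received BPDU with TC, instance 0, port name is GigabitEthernet6/0/0.",
    "Apr 25 2009 23:48:49 ATC52_NE40E %%01MSTP/6/RECEIVE_CISTTC(l): MSTP received BPDU with TC, instance 0, port name is GigabitEthernet7/0/0.",
]

# Hypothetical pattern: matches only the RECEIVE_CISTTC format seen above.
TC_PATTERN = re.compile(
    r"MSTP/6/RECEIVE_CISTTC\(l\): MSTP received BPDU with TC, "
    r"instance (\d+), port name is (\S+)\.\s*$"
)


def count_tc_per_port(lines):
    """Return a Counter mapping port name to the number of TC-reception entries."""
    counts = Counter()
    for line in lines:
        match = TC_PATTERN.search(line)
        if match:  # other log types (for example CPU_USAGE_HIGH) are skipped
            counts[match.group(2)] += 1
    return counts


if __name__ == "__main__":
    for port, count in sorted(count_tc_per_port(LOG_LINES).items()):
        print(port, count)
```

In this excerpt the TC BPDUs arrive on both GigabitEthernet6/0/0 and GigabitEthernet7/0/0, which is consistent with the TC flood entering from the attached switch ring rather than from a single faulty link.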


Switch log
The BPDU caused the switch directly connected to the NE40E to recalculate its spanning tree (the records from April 25 were overwritten because the network kept flapping from April 25 to May 7).
[S5624P_ATE_22-hidecmd]_dis stp hist unit 1
---------- Stp Instance 0 history trace ----------

GigabitEthernet1/0/26 Desi->Root at 2009/05/07 00:12:39
{0.0018-822d-78be 0 4096.00e0-fc64-101e 0 0.0018-822d-78be 128.23}

GigabitEthernet1/0/26 Root->Desi at 2009/05/07 00:12:38
{32768.223e-50fe-aa64 0 4096.00e0-fc64-101e 0 32768.223e-50fe-aa64 128.23}

 



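Handling Process
Log analysis showed that receiving a large volume of TC BPDUs could cause the CPU rise. The symptom was also reproduced in the lab: if the NE40E receives 10 TC BPDUs per second, the CPU usage of the LDP, L2V, MSTP, and LSPM tasks rises, and the overall CPU usage of the NE40E eventually reaches 100%.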
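
The rate of 10 TC BPDUs per second from the lab reproduction can be checked against log timestamps. The sketch below is an illustration under assumptions: the function names are hypothetical, the timestamps are those of the four RECEIVE_CISTTC entries in the ATC52_NE40E excerpt, and the log has only one-second resolution, so events are simply bucketed by second.

```python
from collections import Counter
from datetime import datetime

# Rate reported from the lab reproduction (TC BPDUs per second).
TC_PER_SECOND_THRESHOLD = 10

# Timestamps of the RECEIVE_CISTTC entries in the ATC52_NE40E log excerpt.
TC_TIMESTAMPS = [
    "Apr 25 2009 23:48:45",
    "Apr 25 2009 23:48:48",
    "Apr 25 2009 23:48:49",
    "Apr 25 2009 23:48:49",
]


def peak_tc_per_second(timestamps):
    """Return the largest number of TC events logged within any single second."""
    buckets = Counter(
        datetime.strptime(ts, "%b %d %Y %H:%M:%S") for ts in timestamps
    )
    return max(buckets.values(), default=0)


def exceeds_lab_threshold(timestamps):
    """True if any one-second bucket reaches the lab-reproduced rate."""
    return peak_tc_per_second(timestamps) >= TC_PER_SECOND_THRESHOLD


if __name__ == "__main__":
    print(peak_tc_per_second(TC_TIMESTAMPS), exceeds_lab_threshold(TC_TIMESTAMPS))
```

The excerpt quoted in this case is only a short sample, so its peak says little about the real rate during the outage; the check is meant to be run over the complete log.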

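The logs showed BPDUs from an unknown device (MAC 223e-50fe-aa64) being carried across the VPLS. The switches in the GE rings therefore received two kinds of BPDUs, so MSTP recalculated repeatedly and the network flapped; ports were repeatedly blocked and unblocked, which generated TC BPDUs. Once the flapping MSTP state on the switches generated TC BPDUs, the switches continuously deleted MAC and ARP entries, which interrupted Layer 3 traffic and management access to the switches. Because it kept receiving TC BPDUs, the NE40E also continuously deleted local and remote MAC entries, and this continuous deletion drove up the NE40E CPU usage.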

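According to the STP port state changes on the switches, the BPDUs received in all three rings came from the single MAC address 223e-50fe-aa64, yet this address was not found in any of the three rings. The only thing that could be confirmed was that the BPDUs were carried across the VPLS ring. TC BPDUs were generated in all three switch rings at the same time, which also shows that the BPDUs transited the VPLS ring, because the three switch rings are not directly connected to each other.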



Root Cause
N/A
Solution

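Because the source of the attacking BPDUs could not be found, and the BPDUs might next come from a different MAC address, it is recommended to filter all BPDUs that transit the NE40E (BPDUs generated by the NE40E itself are not filtered).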

If the MSTP recalculation that these BPDUs trigger in the GE rings can be prevented, the problem is solved.

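The policy is configured on the interfaces of the three NE40Es that connect to the switches, so that BPDUs cannot be carried across the VPLS in between.
Configure an ACL rule that matches the special multicast destination MAC address used by BPDUs (0180-c200-0000):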
[Quaidway] acl number 4001
[Quaidway-acl-ethernetframe-4001] rule 10 deny dest-mac 0180-c200-0000 ffff-ffff-ffff
[Quaidway] traffic classifier test operator or
[Quaidway-classifier-test] if-match acl 4001
[Quaidway] traffic behavior test
[Quaidway-behavior-test] quit
[Quaidway] traffic policy test
[Quaidway-trafficpolicy-test] undo share-mode
[Quaidway-trafficpolicy-test] classifier test behavior test
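
To make clear which frames rule 10 is expected to match, the following Python sketch models the destination-MAC comparison. It is an illustration only, not NE40E code: the helper names are hypothetical, and it assumes the usual Layer 2 ACL convention that mask bits set to 1 must match, so that ffff-ffff-ffff means an exact match on 0180-c200-0000.

```python
def mac_to_int(mac):
    """Convert a MAC such as '0180-c200-0000' or '01:80:c2:00:00:00' to an int."""
    digits = mac.replace("-", "").replace(":", "").replace(".", "")
    if len(digits) != 12:
        raise ValueError("expected 12 hex digits in MAC address: %r" % mac)
    return int(digits, 16)


def matches_rule(dest_mac, rule_mac="0180-c200-0000", rule_mask="ffff-ffff-ffff"):
    """Return True if dest_mac matches rule_mac under rule_mask.

    Assumption: a mask bit of 1 means "this bit must match".
    """
    mask = mac_to_int(rule_mask)
    return (mac_to_int(dest_mac) & mask) == (mac_to_int(rule_mac) & mask)


if __name__ == "__main__":
    # Destination address used by STP/RSTP/MSTP BPDUs: matched by the rule.
    print(matches_rule("0180-c200-0000"))
    # The unknown device's MAC is a source address, not a destination.
    print(matches_rule("223e-50fe-aa64"))
```

Because the rule keys on the destination address rather than on the source 223e-50fe-aa64, it should keep working even if the attacking BPDUs later come from a different source MAC, which is the concern stated above.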


Apply the policy to every existing VLAN, for example on interface GigabitEthernet7/0/0 of ATC21_NE40E:
[Quaidway] interface GigabitEthernet7/0/0
[Quaidway-GigabitEthernet7/0/0] traffic-policy test outbound vlan 1 to 4094 link-layer

 



Recommendations and Summary
In Layer 2 networks, especially those where switches run STP or MSTP, pay particular attention to BPDU protection.

END