BPDU Attack Interrupted Traffic on Rings and Caused High CPU Usage on NE40Es

Publication Date: 2013-10-23
Issue Description

Network topology:

10GE RPR rings carried VPLS services and connected to three GE rings, on which MSTP was enabled to prevent loops.

Three NE40Es on the RPR rings were upgraded from V300R001C01B05D to V300R003C02B608. After the upgrade, the CPUs of these NE40Es became overloaded and services were interrupted. Services recovered after the customer disconnected the cables from the MSTP blocked ports. With the customer's consent, frontline personnel enabled STP on April 24. Services were interrupted again at 23:47 on April 25. The customer then disconnected the cables from the MSTP blocked ports and disabled MSTP on the external switches, after which services recovered.

NE40E log:

The CPU usage became high when the network flapped.

Apr 25 2009 23:48:50 ATC21_NE40E %%01VOSCPU/4/CPU_USAGE_HIGH(l): The CPU is overloaded, and the tasks with top three CPU occupancy are LDP, IPCR, L2V. (CpuUsage=100%, Threshold=95%)
Apr 24 2009 00:34:29 ATC21_NE40E %%01HWCM/4/EXIT(l): Exit from configure mode.

The CPU of the NE40E in the ATC52 ring was overloaded.

Apr 25 2009 23:49:47 ATC52_NE40E %%01VOSCPU/4/CPU_USAGE_HIGH(l): The CPU is overloaded, and the tasks with top three CPU occupancy are LDP, LSPM, IPCR. (CpuUsage=98%, Threshold=95%)
Apr 25 2009 23:49:44 ATC52_NE40E %%01SRM/4/CPUMEMALARM(l): Board 10 CPU usage is Lower than threshold.
Apr 25 2009 23:49:34 ATC52_NE40E %%01SRM/4/CPUMEMALARM(l): Board 10 CPU usage is Upper than threshold.
Apr 25 2009 23:48:48 ATC52_NE40E %%01VOSCPU/4/CPU_USAGE_HIGH(l): The CPU is overloaded, and the tasks with top three CPU occupancy are LDP, IPCR, L2V. (CpuUsage=99%, Threshold=95%)

The CPU of the NE40E in the AMTC52 ring was overloaded.

Apr 25 2009 23:49:47 AMTC_NE40E %%01SRM/4/CPUMEMALARM(l): Board 9 CPU usage is Lower than threshold.
Apr 25 2009 23:49:32 AMTC_NE40E %%01SRM/4/CPUMEMALARM(l): Board 9 CPU usage is Upper than threshold.
Apr 25 2009 23:48:57 AMTC_NE40E %%01VOSCPU/4/CPU_USAGE_HIGH(l): The CPU is overloaded, and the tasks with top three CPU occupancy are LDP, IPCR, L2V. (CpuUsage=100%, Threshold=95%)

According to the log, the CPU usage became high because a large number of TC packets were received.

Apr 25 2009 23:48:45 ATC52_NE40E %%01MSTP/6/RECEIVE_CISTTC(l): MSTP received BPDU with TC, instance 0, port name is GigabitEthernet6/0/0.
Apr 25 2009 23:48:48 ATC52_NE40E %%01MSTP/6/RECEIVE_CISTTC(l): MSTP received BPDU with TC, instance 0, port name is GigabitEthernet7/0/0.
Apr 25 2009 23:48:48 ATC52_NE40E %%01VOSCPU/4/CPU_USAGE_HIGH(l): The CPU is overloaded, and the tasks with top three CPU occupancy are LDP, IPCR, L2V. (CpuUsage=99%, Threshold=95%)
Apr 25 2009 23:48:49 ATC52_NE40E %%01MSTP/6/RECEIVE_CISTTC(l): MSTP received BPDU with TC, instance 0, port name is GigabitEthernet6/0/0.
Apr 25 2009 23:48:49 ATC52_NE40E %%01MSTP/6/RECEIVE_CISTTC(l): MSTP received BPDU with TC, instance 0, port name is GigabitEthernet7/0/0.
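
The volume of these TC notifications per port and per second can be tallied from a saved log with a short script. The following is a minimal sketch in Python, not a Huawei tool. The function name count_tc_events and the regular expression are assumptions modeled only on the log lines quoted above.

```python
import re
from collections import Counter

# Pattern modeled on the MSTP/6/RECEIVE_CISTTC lines quoted in this case
# (an assumption; other software versions may format the message differently).
TC_LINE = re.compile(
    r"^(?P<ts>\w{3} +\d{1,2} \d{4} \d{2}:\d{2}:\d{2}) (?P<host>\S+) "
    r"%%01MSTP/6/RECEIVE_CISTTC\(l\): MSTP received BPDU with TC, "
    r"instance (?P<instance>\d+), port name is (?P<port>\S+?)\.?$"
)

def count_tc_events(log_text):
    """Return (per_port, per_second) counters of TC-BPDU log entries."""
    per_port = Counter()
    per_second = Counter()
    for line in log_text.splitlines():
        m = TC_LINE.match(line.strip())
        if m is None:
            continue  # not a TC notification, for example a CPU alarm
        per_port[(m["host"], m["port"])] += 1
        per_second[m["ts"]] += 1
    return per_port, per_second

# The five log lines quoted above, used as sample input.
SAMPLE = """\
Apr 25 2009 23:48:45 ATC52_NE40E %%01MSTP/6/RECEIVE_CISTTC(l): MSTP received BPDU with TC, instance 0, port name is GigabitEthernet6/0/0.
Apr 25 2009 23:48:48 ATC52_NE40E %%01MSTP/6/RECEIVE_CISTTC(l): MSTP received BPDU with TC, instance 0, port name is GigabitEthernet7/0/0.
Apr 25 2009 23:48:48 ATC52_NE40E %%01VOSCPU/4/CPU_USAGE_HIGH(l): The CPU is overloaded, and the tasks with top three CPU occupancy are LDP, IPCR, L2V. (CpuUsage=99%, Threshold=95%)
Apr 25 2009 23:48:49 ATC52_NE40E %%01MSTP/6/RECEIVE_CISTTC(l): MSTP received BPDU with TC, instance 0, port name is GigabitEthernet6/0/0.
Apr 25 2009 23:48:49 ATC52_NE40E %%01MSTP/6/RECEIVE_CISTTC(l): MSTP received BPDU with TC, instance 0, port name is GigabitEthernet7/0/0.
"""

per_port, per_second = count_tc_events(SAMPLE)
for (host, port), count in sorted(per_port.items()):
    print(host, port, count)
```

For this sample, the script counts two TC notifications on each of GigabitEthernet6/0/0 and GigabitEthernet7/0/0 and skips the CPU alarm line.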

Switch log:

The BPDUs caused the switch directly connected to the NE40E to recalculate the spanning tree. (The record for April 25 had been overwritten because the network kept flapping from April 25 to May 7.)

[S5624P_ATE_22-hidecmd]_dis stp hist unit 1
---------- Stp Instance 0 history trace ----------

GigabitEthernet1/0/26 Desi->Root at 2009/05/07 00:12:39
{0.0018-822d-78be 0 4096.00e0-fc64-101e 0 0.0018-822d-78be 128.23}

GigabitEthernet1/0/26 Root->Desi at 2009/05/07 00:12:38
{32768.223e-50fe-aa64 0 4096.00e0-fc64-101e 0 32768.223e-50fe-aa64 128.23}
Handling Process

According to the logs, a large number of TC packets might have caused the increase in CPU usage. A lab test showed that the CPU usage of the LDP, L2V, MSTP, and LSPM tasks increased when an NE40E received 10 TC packets per second, and that the CPU usage of the NE40E eventually reached 100%.

The logs also showed that BPDUs from an unknown device (MAC address 223e-50fe-aa64) were transparently transmitted to the GE rings over the VPLS rings. As a result, the switches on the GE rings received two types of BPDUs, which triggered MSTP on the switches to recalculate the spanning tree and caused network flapping. The interfaces on the switches were repeatedly blocked and unblocked, generating a large number of TC packets. While the MSTP status flapped, the switches kept deleting MAC and ARP entries, so Layer 3 traffic was interrupted and the switches were disconnected from the NMS. The NE40Es continuously received TC packets and deleted local and remote MAC addresses, which caused the high CPU usage.
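
Spanning tree bridges elect the root by comparing bridge IDs (priority first, then MAC address), and the numerically lower ID wins. The sketch below, in Python, compares the two bridge IDs that appear in the switch history trace in the Issue Description. The helper name parse_bridge_id is hypothetical, and reading the first field of each trace record as the root bridge ID is an assumption about the trace format.

```python
def parse_bridge_id(text):
    """Parse 'priority.xxxx-xxxx-xxxx' into a (priority, mac_as_int) tuple."""
    priority, mac = text.split(".", 1)
    return int(priority), int(mac.replace("-", ""), 16)

# First field of the two history-trace records (assumed to be the root bridge ID).
seen = ["32768.223e-50fe-aa64", "0.0018-822d-78be"]

# Tuples compare priority first and MAC second; the lowest tuple is the best ID.
best = min(seen, key=parse_bridge_id)
print("superior bridge ID:", best)
```

Under these assumptions, the ID with priority 0 is superior to the ID carried by the unknown device. A port that alternately sees the two IDs has to keep re-evaluating its role, which fits the Root and Desi transitions recorded in the trace.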

The STP interface role changes recorded on the switches indicated that an unknown device with the MAC address 223e-50fe-aa64 sent BPDUs to the three GE rings over the VPLS rings. All MAC addresses in the network were then checked, but 223e-50fe-aa64 was not found.
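
One additional check that may help when an address cannot be found is to inspect the two flag bits in its first octet. The sketch below, in Python, is an illustration and not part of the original analysis; the function name mac_flags is hypothetical. It relies on the IEEE 802 convention that bit 0 of the first octet marks a group (multicast) address and bit 1 marks a locally administered address.

```python
def mac_flags(mac):
    """Return the group and locally-administered flags of a MAC address."""
    first_octet = int(mac.replace("-", "").replace(":", "")[:2], 16)
    return {
        "multicast": bool(first_octet & 0x01),             # I/G bit
        "locally_administered": bool(first_octet & 0x02),  # U/L bit
    }

for mac in ("223e-50fe-aa64", "0018-822d-78be", "0180-c200-0000"):
    print(mac, mac_flags(mac))
```

For 223e-50fe-aa64 the first octet is 0x22, so the locally administered bit is set. Such an address is usually assigned by software or configuration instead of coming from a vendor OUI, which could explain why it does not appear in a hardware address inventory. This is only a hint and does not identify the source device.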
Root Cause
None
Solution

BPDUs are sent to a dedicated multicast destination MAC address (0180-c200-0000), so configure an ACL rule that denies frames with this destination MAC address.

[Quaidway] acl number 4001
[Quaidway-acl-ethernetframe-4001] rule 10 deny dest-mac 0180-c200-0000 ffff-ffff-ffff
[Quaidway] traffic classifier test operator or
[Quaidway-classifier-test] if-match acl 4001
[Quaidway] traffic behavior test
[Quaidway-behavior-test] quit
[Quaidway] traffic policy test
[Quaidway-trafficpolicy-test] undo share-mode
[Quaidway-trafficpolicy-test] classifier test behavior test

Apply the traffic policy containing the ACL rule to every VLAN on the interface, for example on GigabitEthernet7/0/0 of ATC21_NE40E.

[Quaidway] interface GigabitEthernet7/0/0
[Quaidway-GigabitEthernet7/0/0] traffic-policy test outbound vlan 1 to 4094 link-layer
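
The effect of the dest-mac rule can be illustrated with a small mask comparison. The sketch below, in Python, assumes that a mask bit of 1 means "this bit must match", so ffff-ffff-ffff selects exactly one address. The function names are hypothetical, and the code models only the match test, not the device's forwarding behavior.

```python
def mac_to_int(mac):
    """Convert 'xxxx-xxxx-xxxx' notation to an integer."""
    return int(mac.replace("-", ""), 16)

def rule_matches(frame_dest_mac, rule_mac, rule_mask):
    """Return True if the frame's destination MAC matches the rule under the mask."""
    mask = mac_to_int(rule_mask)
    return (mac_to_int(frame_dest_mac) & mask) == (mac_to_int(rule_mac) & mask)

RULE_MAC, RULE_MASK = "0180-c200-0000", "ffff-ffff-ffff"

# A BPDU destination address matches the deny rule; an ordinary unicast address does not.
print(rule_matches("0180-c200-0000", RULE_MAC, RULE_MASK))
print(rule_matches("00e0-fc64-101e", RULE_MAC, RULE_MASK))
```

With the full mask, only frames addressed to 0180-c200-0000 match the deny rule, so other traffic on the interface is not affected by it.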
Suggestions
Ensure that a BPDU protection scheme is used on any Layer 2 network on which STP or MSTP is enabled.
