How to deal with high CPU usage generated by the MAC address learning process

Publication Date:  2014-06-28
Issue Description
I would like to talk about an interesting case encountered on the S2700 series.
From time to time, CPU usage rises from 20-30% to 50-60% and stays there for 10-30 minutes. Spanning Tree Protocol did not flap, and no other events were spotted in the LAN. Services running through the network are not impacted, and traffic is forwarded at line speed.
Let's try to find the root cause together.

Consider a topology in which the S2700 sits at the access layer, connecting hosts and servers. VLAN 1, the default VLAN, is carried on the uplink ports as a trunk and on the downlink ports as access ports. It is also the management VLAN for all the other switches in the network.

Alarm Information
none.
Handling Process
The first thing to do is to check which tasks are keeping the CPU busy. I found soft_learn and frag_add at a high level; both are used by the MAC address learning process.

soft_learn           15%         0/3a62e52a       tS0d                       
frag_add             11%         0/2c156c2a       tS0e                      

On the uplink there is a large number of MAC addresses in VLAN 1:

<LSW5>dis mac-address | include 0/0/1
-------------------------------------------------------------------------------
MAC Address    VLAN/VSI                          Learned-From        Type    
-------------------------------------------------------------------------------
aaaa-aa0f-4e3e 1/-                               GE0/0/1             dynamic 
bbbb-bb7a-4961 1/-                              GE0/0/1             dynamic  
....................................................................................................
Total items displayed = 207
207 entries for a single VLAN. This is quite a lot for the low-end S2700 series.
Also, comparing the entries at two different times, we see that many MAC addresses age out and many are learned again. Because of the large amount of broadcast traffic arriving on the uplink interface, this port keeps the learning process busy.
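
The comparison described above can be made less manual with a short script. The following is a minimal Python sketch, not part of the original case: it assumes that two captures of the MAC table output have been saved as text in the format shown above. The names parse_mac_table and mac_churn, and the address cccc-cc00-0001, are hypothetical and used only for illustration.

```python
import re

# Matches dynamic rows such as "aaaa-aa0f-4e3e 1/-   GE0/0/1   dynamic".
# Header and separator lines do not match and are skipped.
ROW = re.compile(
    r"^([0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4})\s+(\d+)/\S+\s+(\S+)\s+dynamic\s*$",
    re.IGNORECASE,
)

def parse_mac_table(text):
    """Return the set of (mac, vlan, port) tuples for dynamic entries."""
    entries = set()
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            entries.add((match.group(1).lower(), int(match.group(2)), match.group(3)))
    return entries

def mac_churn(before, after):
    """Compare two captures and report aged-out, newly learned and stable entries."""
    old, new = parse_mac_table(before), parse_mac_table(after)
    return {"aged_out": old - new, "learned": new - old, "stable": old & new}

# Two small, made-up captures (the third address is hypothetical).
snapshot_1 = """
MAC Address    VLAN/VSI                          Learned-From        Type
aaaa-aa0f-4e3e 1/-                               GE0/0/1             dynamic
bbbb-bb7a-4961 1/-                               GE0/0/1             dynamic
"""
snapshot_2 = """
MAC Address    VLAN/VSI                          Learned-From        Type
bbbb-bb7a-4961 1/-                               GE0/0/1             dynamic
cccc-cc00-0001 1/-                               GE0/0/1             dynamic
"""

for kind, entries in mac_churn(snapshot_1, snapshot_2).items():
    print(kind, len(entries))
```

A large and constantly changing "aged_out" and "learned" count between two captures taken a few minutes apart would be consistent with the churn described in this case.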


To avoid the CPU-usage spikes caused by the learning process, we can disable this function on the uplink interface.
# Disable MAC address learning on GigabitEthernet0/0/1.
<Quidway> system-view
[Quidway] interface GigabitEthernet 0/0/1
[Quidway-GigabitEthernet0/0/1] mac-address learning disable

But this generates some extra traffic in the network. Without MAC address learning on the uplink, the switch has no table entries for the addresses behind that port, so frames destined for them are treated as unknown unicast and flooded, like broadcast, to every port in the VLAN.

For instance, let's assume that in the normal situation the switch receives 30 Mbps of traffic from the uplink and forwards it to the two downlink hosts at 15 Mbps each.

Interface                   PHY   Protocol InUti OutUti   inErrors  outErrors
Ethernet0/0/1               up    up         10%    15%          0          0
Ethernet0/0/2               up    up         10%    15%          0          0
GigabitEthernet0/0/1        up    up       3.01%  2.01%          0          0

After we disable MAC address learning, the traffic statistics look like this:

Interface                   PHY   Protocol InUti OutUti   inErrors  outErrors 
Ethernet0/0/1               up    up         10%    25%          0          0
Ethernet0/0/2               up    up         10%    25%          0          0
GigabitEthernet0/0/1        up    up       3.01%  2.01%          0          0

The MAC address table now contains only the addresses learned from Ethernet0/0/1 and Ethernet0/0/2.
[Quidway]dis mac-address                                               
-------------------------------------------------------------------------------
MAC Address    VLAN/VSI                          Learned-From        Type     
-------------------------------------------------------------------------------
0000-0000-0001 1/-                              ETH0/0/1             dynamic  
0000-0000-0002 1/-                              ETH0/0/2            dynamic   
So Ethernet0/0/1 and Ethernet0/0/2 initially have 15 Mbps of outbound traffic each. With learning disabled on the uplink, the 10 Mbps that each host sends upstream is also flooded to the other downlink (the S2700 floods unicast frames when it cannot find the destination in the MAC table). The output traffic of each downlink is therefore 15 Mbps + 10 Mbps = 25 Mbps.
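
The arithmetic above can be reproduced with a few lines of Python. This is only a sketch under stated assumptions: the downlinks are taken to be 100 Mbit/s ports (so 15% utilization is read as 15 Mbps), all upstream traffic from the hosts is destined for addresses behind the uplink, and the function name downlink_output is hypothetical.

```python
def downlink_output(downstream, upstream, uplink_learning):
    """Estimate the output rate (Mbps) of each downlink port.

    downstream: Mbps forwarded from the uplink to each downlink port
    upstream:   Mbps received from the host on each downlink port
    uplink_learning: False if MAC learning is disabled on the uplink,
                     in which case upstream traffic is flooded to the
                     other downlinks as unknown unicast (assumption)
    """
    result = {}
    for port, rate in downstream.items():
        flooded = 0
        if not uplink_learning:
            # Traffic received on every *other* downlink is copied to this port.
            flooded = sum(r for p, r in upstream.items() if p != port)
        result[port] = rate + flooded
    return result

downstream = {"Ethernet0/0/1": 15, "Ethernet0/0/2": 15}
upstream = {"Ethernet0/0/1": 10, "Ethernet0/0/2": 10}

print(downlink_output(downstream, upstream, uplink_learning=True))
print(downlink_output(downstream, upstream, uplink_learning=False))
```

With more downlink ports the flooded share grows, because every port receives a copy of the upstream traffic of all the others.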

That is a high price to pay for solving the high CPU load.
The flooded traffic on the downlinks can be contained with the port isolation function. If we add all downlink interfaces to the same isolation group, they will not be able to communicate with each other, and traffic flooded from one member will not reach the other interfaces in the group.
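
To make the effect of the isolation group concrete, here is a small Python model of where a flooded frame goes. It is a simplified sketch and not the switch's actual forwarding logic: it assumes a single VLAN and plain Layer 2 isolation, and the function name flood_ports is hypothetical.

```python
def flood_ports(ingress, vlan_ports, isolation_group):
    """Return the ports that receive a frame flooded from `ingress`.

    A flooded frame goes to every port in the VLAN except the one it
    arrived on. If the ingress port belongs to the isolation group,
    the other members of that group are excluded as well.
    """
    egress = set(vlan_ports) - {ingress}
    if ingress in isolation_group:
        egress -= set(isolation_group)
    return egress

vlan_ports = ["Ethernet0/0/1", "Ethernet0/0/2", "GigabitEthernet0/0/1"]
downlinks = {"Ethernet0/0/1", "Ethernet0/0/2"}

# Without isolation, upstream traffic from a host is also copied to the other host.
print(sorted(flood_ports("Ethernet0/0/1", vlan_ports, set())))

# With both downlinks in one isolation group, it only leaves through the uplink.
print(sorted(flood_ports("Ethernet0/0/1", vlan_ports, downlinks)))
```

In this model the uplink is not a member of the group, so traffic arriving from the core can still reach both hosts.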
Root Cause
none.
Suggestions
Conclusion:
1.   “mac-address learning disable” on the uplink and port isolation on the downlinks do not modify the traffic path or its characteristics, as long as the downlink hosts only need to communicate through the uplink, and they solve the CPU-usage spikes generated by the MAC address learning process.
2.    Layer 2 forwarding then consists of flooded traffic from the S2700 towards the core network and unicast traffic from the core towards the S2700, based on the MAC address tables of the core switches.
    
I hope you enjoyed reading this case.
