Understanding CPU and CPU Usage
Introduction
This document defines the CPU and CPU usage, explains how the CPU processes packets, and describes the impact of high CPU usage.
Definition
CPU - The Core of a Switch
A switch uses a distributed architecture that consists of a forwarding plane and a control plane. The forwarding plane implements Layer 2 and Layer 3 forwarding; the control plane implements forwarding control.
As shown in Figure 1-1, the control plane runs on a general-purpose embedded CPU and the forwarding plane runs on a forwarding chip:
- The forwarding chip implements Layer 2 and Layer 3 forwarding, for example, updating the MAC address table for Layer 2 forwarding and the Layer 3 forwarding table for IP forwarding. The forwarding chip forwards data at high throughput.
- The CPU maintains software entries, such as routing and ARP entries, and configures the hardware Layer 3 forwarding table in the chip based on these software entries. The CPU can also forward Layer 3 traffic in software, but its processing capability is low compared with the forwarding chip.
Packets on a network are classified by function into control packets and data packets. If a switch has no hardware forwarding entry for a flow, the first packet reaching the switch is forwarded by the CPU, and a Layer 3 hardware forwarding entry is created. Subsequent packets enter the forwarding chip through the inbound interface and are forwarded by the chip. Figure 1-2 shows this process.
- Flow 1 (data packets) is sent out by the forwarding chip and does not pass through the CPU.
- Flow 2 (control packets and some data packets) is sent to the CPU through the forwarding chip. The CPU determines whether to send the flow out or terminate it. Flow 2 consumes CPU resources and cannot be forwarded at high speed.
The Layer 2 and Layer 3 hardware entries in the forwarding chip determine whether a switch can implement high-speed forwarding; however, the hardware entries in the forwarding chip are created based on the software entries maintained in the CPU. Therefore, the CPU is the core of a switch.
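The relationship between software entries, hardware entries, and the first-packet behavior described above can be sketched in a few lines of Python. This is a toy model, not the switch's actual implementation: the `Switch` class, its field names, and the exact-match lookup (instead of a real longest-prefix match) are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Switch:
    """Toy model: software entries live on the CPU, hardware entries on the chip."""
    software_routes: dict = field(default_factory=dict)  # dest IP -> out port (CPU)
    hardware_fib: dict = field(default_factory=dict)     # dest IP -> out port (chip)
    cpu_packets: int = 0                                  # packets handled by the CPU

    def forward(self, dest_ip: str):
        # Fast path: the forwarding chip has an entry, so the CPU is not involved.
        if dest_ip in self.hardware_fib:
            return self.hardware_fib[dest_ip]
        # Slow path: no hardware entry, so the packet is sent to the CPU.
        self.cpu_packets += 1
        out_port = self.software_routes.get(dest_ip)
        if out_port is not None:
            # The CPU installs a hardware entry so that later packets bypass it.
            self.hardware_fib[dest_ip] = out_port
        return out_port


switch = Switch(software_routes={"10.0.0.2": "port2"})
ports = [switch.forward("10.0.0.2") for _ in range(5)]
print(ports, switch.cpu_packets)
```

In this sketch only the first of the five packets reaches the CPU; the remaining four are served from the hardware table.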
CPU Usage
After a switch starts, the CPU runs more than 200 active tasks to manage the switch and monitor Layer 3 entry learning. The number of tasks may vary according to switch models. In addition, when more features are configured on a switch, more tasks run in the system.
CPU usage is the percentage of time that a CPU spends running non-idle tasks within a statistical period. It has the following characteristics:
- Constantly changing: A switch's CPU usage changes continuously with system operations and the network environment.
- Non-real-time: CPU usage data reflects the usage within a statistical period, not an instantaneous value.
- Entity-relevant: CPU usage is calculated per physical CPU. Generally, each service board on a switch has its own physical CPU, so the CPU usage of each board is calculated separately.
CPU usage reflects how tasks ran during a statistical period. In Figure 1-3, task A occupies the CPU for 10 ms, task B occupies the CPU for 30 ms, and the CPU is then idle for 60 ms. The same pattern repeats in the next cycle. The CPU is therefore busy for 40 ms out of every 100 ms, so the CPU usage in this period is 40%. A high CPU usage indicates that the CPU spends most of the period running tasks.
The CPU usage is a key indicator of switch performance.
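The 40% figure in the task A and task B example can be reproduced with a short calculation. The function name `cpu_usage` and its parameters are illustrative only; the sketch assumes usage is simply busy time divided by the length of the statistical period.

```python
def cpu_usage(busy_ms, period_ms):
    """Return CPU usage (%) as total busy time divided by the statistical period."""
    if period_ms <= 0:
        raise ValueError("period_ms must be positive")
    return 100.0 * sum(busy_ms) / period_ms


# Two 100 ms cycles: task A runs 10 ms, task B runs 30 ms, then 60 ms idle.
busy = [10, 30, 10, 30]
print(cpu_usage(busy, period_ms=200))  # 40.0
```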
How Does a CPU Process Packets (Modular Switch)
Switches forward data packets through the forwarding chip without involving the CPU. The following packets will be sent to the CPU for processing on a switch:
- Protocol packets to be terminated by the switch
  All packets destined for the switch, including:
  - Control packets of protocols, such as STP, LLDP, LACP, DLDP, EFM, GVRP, and VRRP
  - Route update packets of routing protocols, such as RIP, OSPF, BGP, and IS-IS
  - SNMP, Telnet, and SSH packets
  - ARP and ND reply packets
- Packets requiring special processing
  - ICMP packets carrying options
  - IPv6 packets with the hop-by-hop option
  - IPv4/IPv6 packets with a TTL value smaller than or equal to 1
  - Packets with the switch's local IP address as the destination address
  - ARP/ND/FIB Miss packets
- Packets forwarded to the CPU by matching an ACL
  - Packets discarded by the deny action in ACL rules after the logging function is enabled
  - Packets redirected to the CPU by traffic policies
- Multicast-related packets
  - IGMP protocol packets
  - Unknown IP multicast packets
- Packets related to other features
  - DHCP packets
  - ARP and ND broadcast request packets
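A few of the conditions in the list above can be expressed as a simple decision function. This is a minimal sketch covering only four of the listed cases; the `Packet` fields and the `punt_reason` helper are hypothetical names, and a real forwarding chip makes this decision in hardware.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Packet:
    dst_ip: str
    ttl: int
    has_options: bool = False      # e.g. an IPv6 hop-by-hop option
    hw_entry_found: bool = True    # False models an ARP/ND/FIB miss


def punt_reason(pkt: Packet, local_ips: set) -> Optional[str]:
    """Return why the packet is sent to the CPU, or None if the chip forwards it."""
    if pkt.dst_ip in local_ips:
        return "destined for the switch"
    if pkt.ttl <= 1:
        return "TTL <= 1"
    if pkt.has_options:
        return "carries options"
    if not pkt.hw_entry_found:
        return "ARP/ND/FIB miss"
    return None


local = {"192.0.2.1"}
print(punt_reason(Packet("192.0.2.1", ttl=64), local))
print(punt_reason(Packet("198.51.100.7", ttl=64), local))
```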
Switches classify packets sent to the CPU into queues with different priorities based on the weights of the packets, so that important packets are processed first. Additionally, the rate of packets sent to the CPU can be limited to cap the number of packets the CPU receives within a given period. This ensures that the CPU can properly process services.
On a stable network, the number of packets sent to the CPU is limited within a specified range, and therefore the CPU usage remains within a proper range. If a large number of packets are sent to the CPU within a short period, the CPU is busy processing these packets, resulting in a high CPU usage.
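The combination of priority queues and rate limiting described above might look roughly like the following. This is a sketch under stated assumptions, not the switch's actual mechanism: the packet types, the priority values, the per-interval limits, and the `CpuQueue` class are all invented for illustration.

```python
import heapq
from collections import Counter
from itertools import count

# Hypothetical priorities (lower number = served first) and per-interval limits.
PRIORITY = {"bpdu": 0, "ospf": 1, "arp": 2, "dhcp": 3}
RATE_LIMIT = {"bpdu": 100, "ospf": 100, "arp": 50, "dhcp": 50}  # packets per interval


class CpuQueue:
    def __init__(self):
        self._heap = []
        self._seq = count()         # tie-breaker keeps FIFO order within a priority
        self._accepted = Counter()  # packets admitted per type in this interval
        self.dropped = Counter()    # packets rejected by the rate limit

    def enqueue(self, ptype, payload):
        """Admit a packet unless its type has exceeded the limit for this interval."""
        if self._accepted[ptype] >= RATE_LIMIT[ptype]:
            self.dropped[ptype] += 1
            return False
        self._accepted[ptype] += 1
        heapq.heappush(self._heap, (PRIORITY[ptype], next(self._seq), ptype, payload))
        return True

    def dequeue(self):
        """Return the highest-priority pending packet as (type, payload), or None."""
        if not self._heap:
            return None
        _, _, ptype, payload = heapq.heappop(self._heap)
        return ptype, payload

    def new_interval(self):
        """Reset the admission counters; call once per rate-limit period."""
        self._accepted.clear()


q = CpuQueue()
for i in range(60):
    q.enqueue("arp", i)      # an ARP burst: only 50 packets are admitted
q.enqueue("bpdu", "hello")   # arrives last but is served first
print(q.dequeue(), q.dropped["arp"])
```

In this sketch the ARP burst cannot starve the BPDU: the excess ARP packets are dropped before they reach the CPU, and the BPDU is dequeued ahead of the ARP packets already waiting.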
Impact of High CPU Usage
The CPU on a switch will be overloaded if the forwarding plane sends packets to the CPU at high speeds (for example, the CPU receives a large number of packets within a short time due to a loop on the network) or a task consumes CPU resources for a long time. When this occurs, the CPU may be unable to process other tasks in a timely manner, which may cause exceptions in services.
High CPU usage adversely affects the system processing capability and may result in the following network problems:
- No response to management requests
- Failure to set up a Telnet or SSH session with the switch, slow response of the switch, or delayed command execution, which makes the switch difficult or impossible to manage
- SNMP timeout
- Long delay or even timeout of MAC/IP ping operations
- DHCP or 802.1X service failures caused by the switch's failure to forward or respond to requests from clients
- Changes in the STP topology or even loops
A switch maintains root and alternate ports based on the BPDUs periodically received on its CPU. If the upstream device cannot send BPDUs in a timely manner because its CPU is busy or the switch's CPU is too busy to process received BPDUs, the switch considers the original path to the root bridge to have failed and selects a new root port, causing network reconvergence. If the switch also has an alternate port, the switch uses the alternate port as the new root port. In this situation, a loop may occur on the network.
- Changes in the routing topology
Hello packets of dynamic routing protocols are processed by the CPU. If the CPU is too busy to process the received Hello packets or send Hello packets, route flapping occurs. For example, OSPF flapping, BGP flapping, or VRRP flapping may occur in this situation.
- Flapping of reliability detection protocols
The CPU is responsible for keepalive of detection protocols such as 802.3ah, 802.1ag, DLDP, BFD, and MPLS OAM. If a busy CPU cannot transmit or receive protocol packets promptly, protocol flapping occurs, which affects service traffic forwarding.
- LACP Eth-Trunk link flapping
LACP packets are processed by the CPU. If the CPU is too busy to receive and send LACP packets, the Eth-Trunk link will flap between Up and Down states.
- Dropped or increasingly delayed software-forwarded packets
- Increased memory usage on the switch
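The flapping described in the routing, detection-protocol, and LACP items above follows from a common pattern: a neighbor is declared down when no keepalive is processed within a timeout. The sketch below illustrates that pattern only. The function `count_flaps` is hypothetical, and the 10-second Hello interval and 40-second dead interval are assumed values chosen for illustration, not values taken from this document.

```python
def count_flaps(hello_times, dead_interval):
    """Count how often a neighbor would be declared down.

    hello_times: sorted times (in seconds) at which the CPU actually
    processed a Hello or keepalive packet from the neighbor.
    """
    flaps = 0
    for prev, cur in zip(hello_times, hello_times[1:]):
        # A gap longer than the dead interval means the neighbor timed out.
        if cur - prev > dead_interval:
            flaps += 1
    return flaps


# Assumed timers: Hello every 10 s, dead interval 40 s.
healthy = [0, 10, 20, 30, 40, 50, 60]
busy_cpu = [0, 10, 20, 70, 80, 130]  # some Hellos were not processed in time
print(count_flaps(healthy, 40), count_flaps(busy_cpu, 40))
```

With the busy-CPU timeline, two gaps of 50 seconds exceed the dead interval, so the neighbor relationship would go down twice even though the link itself never failed.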
Normal High CPU Usage Situations
High CPU usage can cause service faults, for example, Border Gateway Protocol (BGP) route flapping, frequent Virtual Router Redundancy Protocol (VRRP) switchovers, or even user login failures. In some situations, however, high CPU usage does not affect the network. For example, the CPU usage may rise sharply when a switch reads optical transceiver information or when traffic bursts occur. This is normal and acceptable, so high CPU usage is not necessarily caused by a fault. If a switch cannot process services for a long time, check whether a fault has occurred.
High CPU usage resulting from the following events is normal. If the CPU usage returns to a normal range on its own, no action is required.
- Traffic bursts.
- A board starts.
- The NMS frequently operates the switch.
- Information about all optical modules on the switch is queried at one time. For example, the display interface transceiver command is executed or optical module information is queried through the NMS.
- The switch is executing the copy flash:/ command or commands that produce a large amount of output and take a long time to run, for example, the debugging and display diagnostic-information commands.
- The switch is calculating the spanning tree.
  On a device running Multiple Spanning Tree Protocol (MSTP), the CPU usage is proportional to the number of instances and active ports. On a device running VLAN-based Spanning Tree (VBST), each VLAN runs an independent instance. Therefore, VBST uses more CPU resources than MSTP when they have the same number of VLANs and active ports.
- The switch updates a large number of routing table entries after receiving route update messages.
  When a switch receives a route update message, the switch updates routing information and delivers it to the control plane, which consumes CPU resources. In a cluster/stack system, the switch also needs to synchronize routing information to other member switches.
  During a routing table update, the following factors affect the CPU usage:
  - Number of entries in the routing table
  - Update frequency
  - Number of routing processes receiving the update messages
  - Number of member switches in a stack
- Other events
  - Many ports are added to many VLANs (for example, a user performs configuration in a port group to add many ports to many VLANs or change the link types of the ports).
  - The switch frequently receives a large number of IGMP request messages.
  - The switch processes a large number of concurrent DHCP requests (for example, a switch that functions as a DHCP server restores connections with a large number of users).
  - An ARP broadcast storm occurs.
  - An Ethernet broadcast storm occurs.
  - A large number of concurrent protocol packets are forwarded in software (for example, L2PT transparently transmits a large number of BPDUs or the DHCP relay/snooping module forwards a large number of DHCP packets within a short time).
  - A large number of data packets cannot be forwarded through the forwarding chip and are sent to the CPU (such as ARP Miss packets).
  - Ports alternate between Up and Down.
CPU-related Tasks and Functions
Task Name | Description |
---|---|
AAA | Authentication, Authorization, and Accounting |
AM | Address Management |
ARP | Address Resolution Protocol |
BGP | Border Gateway Protocol |
CMF | Configuration Management Framework |
CSPF | Constrained Shortest Path First |
DEVICE | Device Management |
DHCP | Dynamic Host Configuration Protocol |
ETRUNK | Inter-chassis Trunk Protocol |
EUM | Ethernet User Management |
EVPN | Ethernet Virtual Private Network |
FEA | Function Entity Action |
FEC | Function Entity Control |
FIBRESM | Resource Management |
IFM | Interface Management |
IGMP | Internet Group Management Protocol |
IP STACK | IP Protocol Stack |
ISIS | Intermediate System-to-Intermediate System |
L2VPN | Layer 2 Virtual Private Network |
LDP | Label Distribution Protocol |
LLDP | Link Layer Discovery Protocol |
LOCAL PKT | Local Packet Receiving and Transmitting |
MACM | Static MAC Management |
MSTP | Multiple Spanning Tree Protocol |
ND | ICMPv6 Neighbor Discovery |
NETSTREAM | Network Flow Sampling |
OAM | Operation, Administration, and Maintenance |
OSPF | Open Shortest Path First |
PEM | Energy-saving Management |
PIM | Protocol Independent Multicast |
PNP | Plug-and-play |
RBS | Remote Backup Service |
RGM | Redundancy Gateway Management |
RM | Routing Management |
SFLOW | Sampled Flow |
SLA | Service Level Agreement |
SMLK | Smart Link Protocol |
STACKMNG | Stack Management |
SYSTEM | System Management |
TNLM | Tunnel Management |
TRILL | Transparent Interconnection of Lots of Links |
TUNNEL | Tunnel |
VLAN | Virtual LAN |
VRRP | Virtual Router Redundancy Protocol |
VXLAN | Virtual Extensible LAN |