CPU Is Overloaded on an NE5000E Due to BGP Static Route Failure

Publication Date: 2013-09-13
Issue Description

Networking:

Five 10 Gbit/s links connect an NE5000E (the egress router on the MAN) to a group router.

Networking description:

The NE5000E (V200R003C02B609) and the group router establish EBGP connections through loopback addresses. Five static host routes are configured on the NE5000E and the group router to ensure that the loopback addresses are reachable.

Symptom:

When one link goes up or down, the CPU usage of the NE5000E becomes high.

Alarms indicating a high CPU usage:

Jan  6 2011 15:10:48 r1-c-gddg-dggc %%01VOSCPU/4/CPU_USAGE_HIGH(l)[4462757]: The CPU is overloaded, and the tasks with top three CPU occupancy are FIB, ROUT, IPCR. (CpuUsage=100%, Threshold=95%)
Jan  6 2011 15:11:48 r1-c-gddg-dggc %%01VOSCPU/4/CPU_USAGE_HIGH(l)[4462792]: The CPU is overloaded, and the tasks with top three CPU occupancy are FIB, ROUT, IPCR. (CpuUsage=100%, Threshold=95%)
Jan  6 2011 15:19:38 r1-c-gddg-dggc %%01VOSCPU/4/CPU_USAGE_HIGH(l)[4462978]: The CPU is overloaded, and the tasks with top three CPU occupancy are FIB, ROUT, IPCR. (CpuUsage=100%, Threshold=95%)
Jan  6 2011 15:20:38 r1-c-gddg-dggc %%01VOSCPU/4/CPU_USAGE_HIGH(l)[4463020]: The CPU is overloaded, and the tasks with top three CPU occupancy are FIB, ROUT, IPCR. (CpuUsage=100%, Threshold=95%)
Jan  6 2011 15:21:38 r1-c-gddg-dggc %%01VOSCPU/4/CPU_USAGE_HIGH(l)[4463049]: The CPU is overloaded, and the tasks with top three CPU occupancy are FIB, ROUT, IPCR. (CpuUsage=100%, Threshold=95%)
Jan  6 2011 15:22:39 r1-c-gddg-dggc %%01VOSCPU/4/CPU_USAGE_HIGH(l)[4463078]: The CPU is overloaded, and the tasks with top three CPU occupancy are FIB, IPCR, ROUT. (CpuUsage=100%, Threshold=95%)
Jan  6 2011 15:23:39 r1-c-gddg-dggc %%01VOSCPU/4/CPU_USAGE_HIGH(l)[4463093]: The CPU is overloaded, and the tasks with top three CPU occupancy are FIB, ROUT, IPCR. (CpuUsage=100%, Threshold=95%)

Alarms indicating that a port entered the up or down state:

Jan  6 2011 15:09:17 r1-c-gddg-dggc %%01IFNET/4/LINKNO_STATE(l)[4462744]: The line protocol on the interface Pos12/0/1 has entered the DOWN state.
Jan  6 2011 15:17:52 r1-c-gddg-dggc %%01IFNET/4/LINKNO_STATE(l)[4462946]: The line protocol on the interface Pos12/0/1 has entered the UP state.
Jan  6 2011 15:19:32 r1-c-gddg-dggc %%01IFNET/4/LINKNO_STATE(l)[4462975]: The line protocol on the interface Pos12/0/1 has entered the DOWN state.
Jan  6 2011 15:20:41 r1-c-gddg-dggc %%01IFNET/4/LINKNO_STATE(l)[4463025]: The line protocol on the interface Pos12/0/1 has entered the UP state.
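
The relationship between the two alarm types can be checked by lining up their timestamps. The following Python sketch is illustrative only and is not part of the original case: it parses a few of the log lines shown above and reports how many seconds passed between each CPU alarm and the most recent protocol state change. The names `SAMPLE_LOG`, `parse`, and `seconds_since_last_flap` are hypothetical, and the regular expression assumes that all log lines follow the layout of the samples.

```python
import re
from datetime import datetime

# A subset of the alarms quoted above, copied verbatim.
SAMPLE_LOG = """\
Jan  6 2011 15:09:17 r1-c-gddg-dggc %%01IFNET/4/LINKNO_STATE(l)[4462744]: The line protocol on the interface Pos12/0/1 has entered the DOWN state.
Jan  6 2011 15:10:48 r1-c-gddg-dggc %%01VOSCPU/4/CPU_USAGE_HIGH(l)[4462757]: The CPU is overloaded, and the tasks with top three CPU occupancy are FIB, ROUT, IPCR. (CpuUsage=100%, Threshold=95%)
Jan  6 2011 15:17:52 r1-c-gddg-dggc %%01IFNET/4/LINKNO_STATE(l)[4462946]: The line protocol on the interface Pos12/0/1 has entered the UP state.
Jan  6 2011 15:19:32 r1-c-gddg-dggc %%01IFNET/4/LINKNO_STATE(l)[4462975]: The line protocol on the interface Pos12/0/1 has entered the DOWN state.
Jan  6 2011 15:19:38 r1-c-gddg-dggc %%01VOSCPU/4/CPU_USAGE_HIGH(l)[4462978]: The CPU is overloaded, and the tasks with top three CPU occupancy are FIB, ROUT, IPCR. (CpuUsage=100%, Threshold=95%)
Jan  6 2011 15:20:38 r1-c-gddg-dggc %%01VOSCPU/4/CPU_USAGE_HIGH(l)[4463020]: The CPU is overloaded, and the tasks with top three CPU occupancy are FIB, ROUT, IPCR. (CpuUsage=100%, Threshold=95%)
Jan  6 2011 15:20:41 r1-c-gddg-dggc %%01IFNET/4/LINKNO_STATE(l)[4463025]: The line protocol on the interface Pos12/0/1 has entered the UP state.
"""

MONTHS = "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split()

# Assumed layout: "<Mon> <day> <year> <hh:mm:ss> <host> %%<nn><MODULE>/<sev>/<NAME>(l)[<seq>]: <text>"
HEADER = re.compile(
    r"^(\w{3})\s+(\d{1,2})\s+(\d{4})\s+(\d{2}):(\d{2}):(\d{2})\s+\S+\s+"
    r"%%\d+\w+/\d/(\w+)\(l\)\[\d+\]:\s*(.*)$"
)
LINK = re.compile(r"interface (\S+) has entered the (UP|DOWN) state")


def parse(text):
    """Return (link_events, cpu_alarms) extracted from raw log text."""
    links, cpu = [], []
    for line in text.splitlines():
        m = HEADER.match(line.strip())
        if not m:
            continue  # skip lines that do not match the assumed layout
        mon, day, year, hh, mm, ss, name, msg = m.groups()
        when = datetime(int(year), MONTHS.index(mon) + 1, int(day),
                        int(hh), int(mm), int(ss))
        if name == "LINKNO_STATE":
            lm = LINK.search(msg)
            if lm:
                links.append((when, lm.group(1), lm.group(2)))
        elif name == "CPU_USAGE_HIGH":
            cpu.append(when)
    return links, cpu


def seconds_since_last_flap(links, cpu):
    """For each CPU alarm, seconds since the latest earlier state change (or None)."""
    delays = []
    for alarm in cpu:
        earlier = [t for t, _, _ in links if t <= alarm]
        delays.append(int((alarm - max(earlier)).total_seconds()) if earlier else None)
    return delays


links, cpu = parse(SAMPLE_LOG)
for when, delay in zip(cpu, seconds_since_last_flap(links, cpu)):
    print(when.time(), "CPU alarm,", delay, "s after the last state change")
```

In this sample, every CPU alarm follows a state change on Pos12/0/1 within about a minute and a half, which is consistent with the suspicion recorded in the handling process.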
Handling Process

1. Log in to the NE5000E.
Commands sometimes respond slowly.

2. Check the CPU usage.
The CPU usage reaches 100%.

3. Check the CPU tasks.
The FIB, ROUT, and IPCR tasks consume a large share of CPU resources.

4. Check the logs.
The logs contain alarms indicating high CPU usage and show that the 10 Gbit/s links between the NE5000E and the group router go up and down repeatedly. The unstable links are suspected of causing repeated route calculation and FIB refreshes, which results in the high CPU usage.

5. Disable the unstable ports.
The CPU usage becomes normal.

6. Rectify the link fault and re-enable the ports.
The CPU usage remains normal.
Root Cause
The NE5000E has about 350,000 BGP routes. When the protocol on a port goes down and the static routes through that port become unavailable, the route management module instructs BGP to update its routes and the FIB, and 350,000 FIB entries are deleted. Similarly, when the protocol on the port comes back up, 350,000 FIB entries are added. Both operations consume a large amount of CPU resources and drive the CPU usage high.
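
To give a sense of scale, the FIB churn can be estimated with simple arithmetic. The Python sketch below is illustrative only: it assumes, as the root cause states, that every protocol state change deletes or adds all 350,000 entries, and it uses the four state changes of Pos12/0/1 recorded in the alarms between 15:09:17 and 15:20:41. The function name `fib_operations` is hypothetical.

```python
BGP_ROUTES = 350_000  # approximate BGP route count given in the root cause


def fib_operations(transitions, routes=BGP_ROUTES):
    """Estimate FIB entry deletions plus additions.

    Assumption: each up/down transition touches every BGP-derived FIB entry.
    """
    if transitions < 0:
        raise ValueError("transitions must be non-negative")
    return transitions * routes


# DOWN, UP, DOWN, UP on Pos12/0/1 in the alarm log: four transitions.
print(fib_operations(4))  # 1400000
```

Under these assumptions, about 1.4 million FIB entry updates were triggered in under twelve minutes.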
Solution
The link fault is rectified.
Suggestions

Impact of such faults on services:

- If a physical port is down and the protocol is also down, no traffic is forwarded to this port even if the port's upstream forwarding table entries have not been completely updated. Traffic is shared by the other ports (in this example, it is distributed across the other four 10 Gbit/s links) and services are not affected.
- If only the protocol on a port is down, traffic is still forwarded to the port until the routes are completely updated, so some traffic is lost. After route convergence completes, no traffic is forwarded to the port.
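
The first case can be illustrated with a quick capacity check. The Python sketch below is illustrative only: it assumes that traffic is shared equally among the links that are up, and the offered load of 30 Gbit/s is a hypothetical figure that does not come from this case. The names `per_link_load` and `fits` are also hypothetical.

```python
LINK_CAPACITY_GBPS = 10.0  # each link in this case is 10 Gbit/s


def per_link_load(total_gbps, links_up):
    """Load on each link, assuming equal sharing across the links that are up."""
    if links_up <= 0:
        raise ValueError("at least one link must be up")
    return total_gbps / links_up


def fits(total_gbps, links_up, capacity=LINK_CAPACITY_GBPS):
    """True if the remaining links can carry the offered load without congestion."""
    return per_link_load(total_gbps, links_up) <= capacity


offered = 30.0  # hypothetical total load in Gbit/s
print(per_link_load(offered, 5))  # 6.0 with all five links up
print(per_link_load(offered, 4))  # 7.5 after one link goes down
print(fits(offered, 4))           # True
```

With equal sharing, the four remaining links carry the traffic without loss as long as the total load does not exceed 40 Gbit/s.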

Upgrade the NE5000E to V300R007 or later.
