High CPU Usage on a Network Interface Board on ME60

Publication Date: 2013-09-30
Issue Description

1. Version: ME60 V600R002C06SPC100

2. Network topology:

[Figure: network topology diagram (image not available)]

3. Service topology:

[Figure: service topology diagram (image not available)]

4. Symptom

I. Uneven load sharing:

Four existing ME60s in the customer's ISP network functioned as LNSs, and newly deployed ME60s functioned as LACs. After services were cut over, users went online successfully. The customer reported that the number of sessions on each tunnel had been approximately equal before the cutover but differed dramatically after it. Session statistics on one LAC were as follows:

<ALP:ANSR_BRAS_ME60-X8_01>display l2tp tunnel

 ---------------------------------------------------------

 -----------tunnel information in LAC----------------------

LocalTID RemoteTID RemoteAddress    Port   Sessions RemoteName

4764     6985      82.137.200.5     1701   417      LNS

4761     6845      82.137.200.227   1701   443      LNS

//82.137.200.5 and 82.137.200.227 belonged to the first LNS.

4762     9265      82.137.200.228   1701   394      LNS

4765     9326      82.137.200.6     1701   421      LNS

//82.137.200.228 and 82.137.200.6 belonged to the second LNS.

4771     53504     82.137.200.11    1701   1127     da02iplns01

//Third LNS

5872     16004     82.137.200.14    1701   372      LNS

//Fourth LNS

This problem was resolved after LAC and RADIUS-delivered configurations were changed.
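
The imbalance is easier to see when the tunnel sessions are summed per LNS. The following is a minimal sketch that uses only the session counts from the output above; the grouping of tunnels into LNSs follows the // annotations, and the labels "LNS 1" to "LNS 4" are illustrative names, not device names.

```python
# Session counts copied from the `display l2tp tunnel` output above.
# Tunnels are grouped per LNS according to the // annotations.
sessions_by_lns = {
    "LNS 1": [417, 443],   # 82.137.200.5 and 82.137.200.227
    "LNS 2": [394, 421],   # 82.137.200.228 and 82.137.200.6
    "LNS 3": [1127],       # 82.137.200.11
    "LNS 4": [372],        # 82.137.200.14
}

# Total sessions per LNS and across all tunnels on this LAC.
totals = {name: sum(counts) for name, counts in sessions_by_lns.items()}
grand_total = sum(totals.values())

for name, total in totals.items():
    print(f"{name}: {total} sessions ({total / grand_total:.1%} of all sessions)")
```

With these figures, the third LNS carries roughly three times as many sessions as the fourth.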

II. High CPU usage on the board in slot 1 of the LNS when the LNS had about 26000 online users:

On an ME60 functioning as an LNS, the LPUK boards in slots 1, 2, 5, and 7 functioned as tunnel boards. However, only one 10G interface on the LPUK board in slot 1 connected to the LACs as the network-side egress interface. Therefore, echo request and reply packets for all L2TP users were transparently transmitted from the tunnel boards in slots 2, 5, and 7 to the board in slot 1, and then to the LACs.

When one LNS had about 25000 online users, a large number of echo packets were transmitted to the board in slot 1. As a result, the CPU usage of the board in slot 1 often reached about 70%.


5. The alarm information was as follows:

<HBRAS06> display cpu-usage slot 1

CPU Usage Stat. Cycle: 60 (Second)

CPU Usage            : 75% Max: 99%

CPU Usage Stat. Time : 2012-06-07  00:17:03

CPU utilization for five seconds: 75%: one minute: 75%: five minutes: 75%.

TaskName        CPU  Runtime(CPU Tick High/Tick Low)  Task Explanation

TICK             1%         0/ 1bb9eb0                                    

VPR             27%         0/296495df       VPR                          

IPCQ             2%         0/ 3c8a401       IPCQIPC task for single queue

PES              2%         0/ 3f5c8b8       PES                          

LNS             13%         0/139b651d       LNS                          

SRM              1%         0/ 22ca41d       SRM         

VIDL            25%         0/262c17fc       System idle                  

OS              29%         0/2b2b8f38       Operation System              

 <HBRAS06> 

<HBRAS06>display access-user

  ------------------------------------------------------------------------------

  Total users                        : 26699

  IPv4 users                         : 26648

  IPv6 users                         : 0

  Dual-Stack users                   : 0

  Wait authen-ack                    : 23

  Authentication success             : 26676

  Accounting ready                   : 1

  Accounting state                   : 26645

  Wait leaving-flow-query            : 0

  Wait accounting-start              : 3

  Wait accounting-stop               : 9

  Wait authorization-client          : 0

  Wait authorization-server          : 0

  ------------------------------------------------------------------------------

  Domain-name                        Online-user

  ------------------------------------------------------------------------------

  default0                           : 0

  default1                           : 0

  default_admin                      : 1

  default                            : 0

  tarassul.sy                        : 26657

#
Handling Process

1. Checked configurations on the live network.
Four boards were bound to the LNS group, but only interface GigabitEthernet1/0/0 on the board in slot 1 connected to the LACs.

#
lns-group group1

 bind slot 1
 bind slot 2
 bind slot 5
 bind slot 7
 bind source GigabitEthernet1/0/0
 bind source LoopBack0
#
interface GigabitEthernet1/0/0
 description To_HwCore02-L2TP
 undo shutdown
 ip address 82.137.200.6 255.255.255.224
 ospf dr-priority 0
#

2. Checked the CPU usage of the board in slot 1 when the LNS had about 26000 online users.

<HBRAS06> display cpu-usage slot 1
CPU Usage Stat. Cycle: 60 (Second)

CPU Usage            : 75% Max: 99%
CPU Usage Stat. Time : 2012-06-07  00:17:03
CPU utilization for five seconds: 75%: one minute: 75%: five minutes: 75%.
 <HBRAS06>display access-user
  ------------------------------------------------------------------------------
  Total users                        : 26699
  IPv4 users                         : 26648
  IPv6 users                         : 0
  Dual-Stack users                   : 0
  Wait authen-ack                    : 23
  Authentication success             : 26676
  Accounting ready                   : 1
  Accounting state                   : 26645
  Wait leaving-flow-query            : 0
  Wait accounting-start              : 3
  Wait accounting-stop               : 9
  Wait authorization-client          : 0
  Wait authorization-server          : 0
  ------------------------------------------------------------------------------
  Domain-name                        Online-user
  ------------------------------------------------------------------------------
  default0                           : 0
  default1                           : 0
  default_admin                      : 1
  default                            : 0
  tarassul.sy                        : 26657
#

3. Checked the CPU usage of the board in slot 1 when the LNS had 15981 online users.

The CPU usage was 61%, of which the VPR task accounted for 15%. The board in slot 1 had 5803 users, and the other boards had 10178 users in total: 4022 on the board in slot 2, 2234 on the board in slot 5, and 3922 on the board in slot 7.
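
The per-board figures can be cross-checked against the total with a few lines of arithmetic. This is a sketch using only the numbers quoted in this step; the variable names are illustrative.

```python
# Online users per tunnel board, as quoted in step 3.
users_by_slot = {1: 5803, 2: 4022, 5: 2234, 7: 3922}

total_users = sum(users_by_slot.values())
users_on_other_boards = total_users - users_by_slot[1]
slot1_share = users_by_slot[1] / total_users

print(f"Total online users:     {total_users}")
print(f"Users on slots 2, 5, 7: {users_on_other_boards}")
print(f"Share hosted on slot 1: {slot1_share:.1%}")
```

The board in slot 1 hosts only about a third of the users, yet it is the single network-side egress toward the LACs for the users on all four boards.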

4. Checked the VP packets that the board in slot 1 received.

The information was as follows:

[HBRAS06-hidecmd]display vp packet slot 1
ChID       ulSend     ulReceive  ulDiscard  ulQueFull

1          1343728586 1969670579 0          0        
-----------------------------------------------------
Slot       ulSend     ulReceive  ulDiscard  ulQueFull
0          783613772  203172957  0          0        
2          560114712  856525592  0          0        
5          29         360835390  0          0        
7          73         549136641  0          0        
-----------------------------------------------------

According to the preceding information, the board in slot 1 received more than 1000 VP packets per second, so the VPR task consumed a large share of CPU resources. Generally, a board receives fewer than 500 VP packets per second.
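
To see which boards contribute most to the VP traffic received by slot 1, the cumulative ulReceive counters can be compared. The sketch below assumes that the per-slot rows break down the channel total by source slot, which the output suggests but this case does not state explicitly. The counters are cumulative, so the result is a share of traffic, not a packet rate.

```python
# ulReceive counters copied from `display vp packet slot 1`.
channel_receive_total = 1969670579
receive_by_slot = {0: 203172957, 2: 856525592, 5: 360835390, 7: 549136641}

# The per-slot rows should add up to (nearly) the channel total; a small
# difference is expected because the counters keep running while displayed.
slot_sum = sum(receive_by_slot.values())
difference = slot_sum - channel_receive_total

shares = {slot: count / slot_sum for slot, count in receive_by_slot.items()}

print(f"Sum of per-slot ulReceive: {slot_sum} (difference from total: {difference})")
for slot, share in shares.items():
    print(f"Slot {slot}: {share:.1%} of VP packets received by slot 1")
```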

5. Analyzed the VP task packets.

After going online, L2TP users and the device exchanged PPP echo packets. The transmission interval was determined by the terminal if the terminal initiated detection, and was 20s if the device initiated detection.

The following shows the statistics about echo packets. Based on the counts within 1s, about 1000 echo packets were transmitted per second when about 10000 users were online. This indicates that the echo packets were transmitted at an interval of about 10s, much shorter than 20s.
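
The interval estimate follows from dividing the number of online users by the observed echo packet rate. The sketch below assumes that each online user contributes one echo packet per interval in the measured direction; the function name is illustrative.

```python
def echo_interval_seconds(online_users: int, echo_packets_per_second: float) -> float:
    """Estimate the average echo interval, assuming one echo per user per interval."""
    return online_users / echo_packets_per_second


# Figures quoted in step 5: about 10000 users and about 1000 packets per second.
observed_interval = echo_interval_seconds(10000, 1000)

# For comparison: the rate the same users would generate at the 20s device interval.
rate_at_device_interval = 10000 / 20

print(f"Estimated echo interval: {observed_interval:.0f}s")
print(f"Expected rate at a 20s interval: {rate_at_device_interval:.0f} packets per second")
```

At a 20s interval the same users would generate about 500 packets per second, half of the observed rate.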

[HBRAS06-hidecmd]display ppp lpu 1 statistics

  --------------------------------------------------------------
                    PPP Packet Statics    

  --------------------------------------------------------------
  SEND_ECHO_REQ        : 2048478873   RECV_ECHO_REP      : 1965967957
  SEND_ECHO_REP_FAST   : -1747269047   RECV_ECHO_REQ_FAST : 0
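
The negative SEND_ECHO_REP_FAST value is most likely a display artifact rather than a real count. Assuming the counter is a 32-bit unsigned value printed as a signed integer (an assumption; this case does not document the counter width), the actual value can be recovered as follows:

```python
def as_unsigned32(value: int) -> int:
    """Reinterpret a signed 32-bit integer as an unsigned 32-bit integer."""
    return value & 0xFFFFFFFF


# Value copied from the `display ppp lpu 1 statistics` output.
send_echo_rep_fast = as_unsigned32(-1747269047)

print(f"SEND_ECHO_REP_FAST as unsigned 32-bit: {send_echo_rep_fast}")
```

Under that assumption the counter has passed 2^31, which is consistent with the very large number of echo replies sent by this board.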

Conclusion: The PPP echo packet transmission interval configured on user terminals was short. In addition, because all users in the LNS group shared one network interface board, the CPU usage of the network interface board became high when a large number of users were online.
Root Cause
The PPP echo packet transmission interval configured on user terminals was short. In addition, because all users in the LNS group shared one network interface board, the CPU usage of the network interface board became high when a large number of users were online.
Solution

To resolve the problem, add network-side egress interfaces connecting to the LACs and plan routes between the LACs and LNSs for L2TP users, so that the load of the board in slot 1 is shared.

Original planning:

[Figure: original planning diagram (image not available)]

Optimized planning:

[Figure: optimized planning diagram (image not available)]

Perform the following operations:

1. Add network-side egress interfaces on the LNS. The added interfaces must be located on boards other than the current network interface board.

2. Change the configurations on the LAC side and the configurations that the RADIUS server delivers to the LAC so that user packets from the LACs are shared between two links.

3. Modify the routes from the LAC to the intermediate switch or core router and outbound routes on the LNS to ensure that replies from the LNS to the LAC can be transmitted to interfaces on two boards.
Suggestions
None
