
CPU Occupation Rate Was High on an NNI-side Interface Board Because Lots of LNS Users on the Live Network Got Online

Publication Date:  2013-10-08
Issue Description

Product version: ME60 V600R002C06SPC100
The network topology was as follows:
[Figure: network topology]

The simulated service flow was as follows:
[Figure: simulated service flow]

Symptom:
1. The load balancing was uneven.
On a customer's ISP network, four ME60s were used for LNS access, and new ME60s were deployed as the LACs. After the service cutover, users got online successfully. However, the customer reported that before the cutover the number of sessions on each tunnel was almost the same, whereas after the cutover the number of sessions varied greatly between tunnels. The session statistics on one LAC were as follows:

<ALP:ANSR_BRAS_ME60-X8_01>display l2tp tunnel
 ---------------------------------------------------------
 -----------tunnel information in LAC----------------------
LocalTID RemoteTID RemoteAddress    Port   Sessions RemoteName
------------------------------------------------------------------------------
4764     6985      82.137.200.5     1701   417      LNS
4761     6845      82.137.200.227   1701   443      LNS
//82.137.200.5 and 82.137.200.227 belonged to the first LNS.
4762     9265      82.137.200.228   1701   394      LNS
4765     9326      82.137.200.6     1701   421      LNS
//82.137.200.228 and 82.137.200.6 belonged to the second LNS.
4771     53504     82.137.200.11    1701   1127     da02iplns01
//82.137.200.11 belonged to the third LNS.
5872     16004     82.137.200.14    1701   372      LNS
//82.137.200.14 belonged to the fourth LNS.

This issue was solved after the LAC-side configuration and the configuration delivery mode of the RADIUS server were changed.
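The uneven distribution in the preceding statistics can be quantified with a short script. This is a hypothetical sketch; the grouping of tunnel endpoints per LNS is taken from the comments in the display output:

```python
# Hypothetical sketch: quantify the per-LNS session imbalance from the
# "display l2tp tunnel" output. Session counts per tunnel are grouped by
# the LNS each tunnel endpoint belongs to.
sessions_per_lns = {
    "LNS1 (82.137.200.5, .227)": 417 + 443,
    "LNS2 (82.137.200.228, .6)": 394 + 421,
    "LNS3 (82.137.200.11)": 1127,
    "LNS4 (82.137.200.14)": 372,
}

total = sum(sessions_per_lns.values())
for name, count in sessions_per_lns.items():
    share = 100 * count / total
    print(f"{name}: {count} sessions ({share:.1f}%)")
# With even balancing each LNS would carry ~25% of the sessions;
# here LNS3 carries ~35.5% while LNS4 carries only ~11.7%.
```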

2. After the issue of uneven load balancing was solved, lots of users (about 26000 for each LNS) got online on the two LNSs running V6R2. As a result, the following issue occurred:
The ME60s functioned as the LNSs. The LPUK boards in slots 1, 2, 5, and 7 served as the tunnel boards. However, only one 10G port on the board in slot 1 was used as the network-side port connecting to all LACs. The ECHO probe request and response packets of all L2TP users were transparently transmitted from the tunnel boards in slots 2, 5, and 7 to the board in slot 1, and then to the LACs.
When about 25000 users were online on one NE, the number of probe packets transmitted to the board in slot 1 became very high, and the CPU occupation rate of the board increased to 70%, as shown in the following figure.
[Figure: CPU usage of the board in slot 1]

<HBRAS06> display cpu-usage slot 1
CPU Usage Stat. Cycle: 60 (Second)
CPU Usage            : 75% Max: 99%
CPU Usage Stat. Time : 2012-06-07  00:17:03
CPU utilization for five seconds: 75%: one minute: 75%: five minutes: 75%.
 
TaskName        CPU  Runtime(CPU Tick High/Tick Low)  Task Explanation
TICK             1%         0/ 1bb9eb0                                    
VPR             27%         0/296495df       VPR                          
IPCQ             2%         0/ 3c8a401       IPCQIPC task for single queue
PES              2%         0/ 3f5c8b8       PES                          
LNS             13%         0/139b651d       LNS                          
SRM              1%         0/ 22ca41d       SRM         
VIDL            25%         0/262c17fc       System idle                  
OS              29%         0/2b2b8f38       Operation System              
 
<HBRAS06> 
<HBRAS06>display access-user
  ------------------------------------------------------------------------------
  Total users                        : 26699
  IPv4 users                         : 26648
  IPv6 users                         : 0
  Dual-Stack users                   : 0
  Wait authen-ack                    : 23
  Authentication success             : 26676
  Accounting ready                   : 1
  Accounting state                   : 26645
  Wait leaving-flow-query            : 0
  Wait accounting-start              : 3
  Wait accounting-stop               : 9
  Wait authorization-client          : 0
  Wait authorization-server          : 0
  ------------------------------------------------------------------------------
  Domain-name                        Online-user
  ------------------------------------------------------------------------------
  default0                           : 0
  default1                           : 0
  default_admin                      : 1
  default                            : 0
  tarassul.sy                        : 26657
#
Handling Process
To address the issue, Huawei performed the following operations and observed the following information:
1. The live network configuration was as follows. An lns-group was bound to four boards, and only port 1/0/0 on the board in slot 1 was connected to the LACs.
#
lns-group group1
 bind slot 1
 bind slot 2
 bind slot 5
 bind slot 7
 bind source GigabitEthernet1/0/0
 bind source LoopBack0
#
interface GigabitEthernet1/0/0
 description To_HwCore02-L2TP
 undo shutdown
 ip address 82.137.200.6 255.255.255.224
 ospf dr-priority 0
#
2. When the number of users grew to about 26000, the CPU occupation rate of the board in slot 1 was as follows:
<HBRAS06> display cpu-usage slot 1
CPU Usage Stat. Cycle: 60 (Second)
CPU Usage            : 75% Max: 99%
CPU Usage Stat. Time : 2012-06-07  00:17:03
CPU utilization for five seconds: 75%: one minute: 75%: five minutes: 75%.
 <HBRAS06>display access-user
  ------------------------------------------------------------------------------
  Total users                        : 26699
  IPv4 users                         : 26648
  IPv6 users                         : 0
  Dual-Stack users                   : 0
  Wait authen-ack                    : 23
  Authentication success             : 26676
  Accounting ready                   : 1
  Accounting state                   : 26645
  Wait leaving-flow-query            : 0
  Wait accounting-start              : 3
  Wait accounting-stop               : 9
  Wait authorization-client          : 0
  Wait authorization-server          : 0
  ------------------------------------------------------------------------------
  Domain-name                        Online-user
  ------------------------------------------------------------------------------
  default0                           : 0
  default1                           : 0
  default_admin                      : 1
  default                            : 0
  tarassul.sy                        : 26657
#
3. When the number of users grew to 15981, the CPU occupation rate of the board in slot 1 was 61%, and the CPU occupation rate of the VPR task was 15%.
a. At this time, the board in slot 1 had 5803 users, the board in slot 2 had 4022 users, the board in slot 5 had 2234 users, and the board in slot 7 had 3922 users. That is, the boards in slots 2, 5, and 7 had 10178 users in total.
The record of VP packets received by the board in slot 1 was as follows. The detection interval was about 1s.
[HBRAS06-hidecmd]display vp packet slot 1
ChID       ulSend     ulReceive  ulDiscard  ulQueFull
1          1343728586 1969670579 0          0        
-----------------------------------------------------
Slot       ulSend     ulReceive  ulDiscard  ulQueFull
0          783613772  203172957  0          0        
2          560114712  856525592  0          0        
5          29         360835390  0          0        
7          73         549136641  0          0        
-----------------------------------------------------
Analysis of the above data yielded the following result:
Each second, the board in slot 1 received more than 1000 VP packets, leading to a high CPU occupation rate of the VPR task. Normally, no more than 500 VP packets are received per second.
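The per-second rate described above can be estimated by sampling the cumulative ulReceive counter twice, about one second apart. A minimal sketch, assuming a hypothetical read_counter callable that returns the current counter value:

```python
import time

def vp_receive_rate(read_counter, interval=1.0):
    """Estimate the per-second VP packet receive rate by sampling a
    cumulative counter twice, `interval` seconds apart.
    `read_counter` is a hypothetical callable returning the current
    ulReceive value from 'display vp packet'."""
    first = read_counter()
    time.sleep(interval)
    second = read_counter()
    # The counters are cumulative 32-bit values; handle wraparound.
    delta = (second - first) % (1 << 32)
    return delta / interval

# Example with simulated counter readings (values are illustrative):
samples = iter([1969670579, 1969671703])
rate = vp_receive_rate(lambda: next(samples), interval=1.0)
print(f"{rate:.0f} VP packets/s")  # well above the normal ceiling of ~500/s
```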
b. Analysis of VP task packets: After an L2TP user gets online, the user terminal and the device exchange probe packets for keepalive confirmation. When a user terminal sends probe packets actively, the probe interval is set on the user terminal; a common value is 20s.
The following information showed the increase of ECHO probe packets within 1s. The highlighted counter (shown in red in the original output; the coloring is not preserved here) indicated the number of response packets the equipment sent to the user terminals. With about 10000 users online, about 1000 packets were sent per second, meaning that the probe interval configured on the user terminals was short (about 10s), which was uncommon. The collected information was as follows:
[HBRAS06-hidecmd]display ppp lpu 1 statistics
  --------------------------------------------------------------
                    PPP Packet Statics    
  --------------------------------------------------------------
  SEND_ECHO_REQ        : 2048478873   RECV_ECHO_REP      : 1965967957
SEND_ECHO_REP_FAST   : -1747269047   RECV_ECHO_REQ_FAST : 0 
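The inference above (about 1000 echo packets per second for roughly 10000 users implies a probe interval of about 10s, versus a common 20s) can be sketched as a simple calculation; the figures are approximations from this case:

```python
def inferred_probe_interval(online_users, echo_pps):
    """If each online user triggers one ECHO exchange per probe interval,
    the aggregate rate is online_users / interval packets per second;
    solving for the interval gives online_users / echo_pps."""
    return online_users / echo_pps

# Approximate figures from this case:
interval = inferred_probe_interval(online_users=10000, echo_pps=1000)
print(f"inferred probe interval: {interval:.0f} s")
```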

The preceding analysis led to the following conclusion: The user terminals were special in that the PPP ECHO probe interval was short. The lns-group configuration caused all users to share one network-side interface board. When lots of users got online, the CPU occupation rate of the network-side interface board was therefore high.
Root Cause
The user terminals were special in that the PPP ECHO probe interval was short. The lns-group configuration caused all users to share one network-side interface board. When lots of users got online, the CPU occupation rate of the network-side interface board was high.
Solution
The live network provided only one network-side egress port, so more egress ports were required to connect to the LACs.
Plan routes from the LACs to the LNSs for L2TP users to reduce the bearer pressure on the board in slot 1.
The following changes are required:
1. Add a network-side egress port on each LNS. The new egress port must be on a board different from that of the existing egress port.
2. Change the LAC-side configurations and the LAC configurations delivered by the RADIUS server, so that users from different LACs take different links.
3. Change the routes from the LACs to the intermediate switch or core router, and change the egress routes on the LNSs, to ensure that packets from the LNSs to the LACs can take the two different ports.
After the preceding changes are made, the CPU occupation rate decreases.
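The revised lns-group configuration described in step 1 might look as follows. This is a minimal sketch, assuming the lns-group supports binding an additional source interface; the new interface GigabitEthernet2/0/0 (on the board in slot 2), its description, and its IP address are hypothetical:

```
#
lns-group group1
 bind slot 1
 bind slot 2
 bind slot 5
 bind slot 7
 bind source GigabitEthernet1/0/0
 bind source GigabitEthernet2/0/0
 bind source LoopBack0
#
interface GigabitEthernet2/0/0
 description To_HwCore02-L2TP-2
 undo shutdown
 ip address 82.137.200.38 255.255.255.224
#
```

With two network-side egress ports on different boards, the ECHO traffic of L2TP users is split across two interface boards instead of converging on the board in slot 1.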
Suggestions
None

END