Memory usage upper limit causes accounting packets to fail to be sent to RADIUS

Publication Date:  2013-12-31
Issue Description

Two ME60s show similar issues:

The CPU usage of the master MPU exceeds 80% and reaches up to 100% during the service peak period. According to packet statistics on the RADIUS server, the number of accounting request packets from the ME60 also drops sharply, and some PPPoE user sessions may be disconnected.

The service peak period usually occurs from 20:00 to 22:00.

Handling Process

Issue 1: Accounting packets decrease sharply during the service peak period

Step 1: Check the number of users and the memory usage

When the issue occurred, the user logout reason was “AAA with start accounting fail”. This means that the ME60 sent an Accounting-Start packet to the AAA server but did not receive an acknowledgement from the server, so the user was logged out when accounting timed out.
The following output shows the historical maximum number of concurrent online users on each device. In both cases the maximum was reached during the service peak period.

[xx-hidecmd]disp max-onlineusers

 Max online users since startup       : 106007

 Time of max online users             : 2013/10/07 20:33:57

 

[xx-hidecmd]disp max-onlineusers

 Max online users since startup       : 108657

 Time of max online users             : 2013/10/13 20:27:40

  

Slot          CPU Usage     Memory Usage (Used/Total)

-----------------------------------------------------

17 MPU(Master)  90%           86% 1595MB/1855MB

Step 2: Check the error statistics of the RADIUS module

The variable g_ulRDFailNumForMem records the number of memory-related failures in the RADIUS module. If the memory usage of the master MPU exceeds 85%, the ME60 stops generating and sending accounting packets, and g_ulRDFailNumForMem counts each such failure.
Accounting packets are stopped in order to preserve free memory.

[xx-hidecmd]disp radius statistics error

  g_ulRDFailNumForMem   = 1084729
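The threshold check described in this step can be illustrated with the memory figures from the Step 1 output. This is only a minimal sketch: the function names and the strict greater-than comparison are assumptions made for illustration, not the ME60's internal implementation. Only the 85% threshold and the 1595MB/1855MB figures come from this case.

```python
# Illustrative sketch only; the names below are hypothetical, not ME60 internals.
MEMORY_THRESHOLD_PERCENT = 85.0  # threshold stated in this case


def memory_usage_percent(used_mb: float, total_mb: float) -> float:
    """Return memory usage as a percentage of total memory."""
    if total_mb <= 0:
        raise ValueError("total_mb must be positive")
    return used_mb / total_mb * 100.0


def accounting_suppressed(used_mb: float, total_mb: float) -> bool:
    """Assumed rule: accounting packets stop once usage exceeds the threshold."""
    return memory_usage_percent(used_mb, total_mb) > MEMORY_THRESHOLD_PERCENT


if __name__ == "__main__":
    # Figures taken from the master MPU line in Step 1 (1595MB/1855MB).
    usage = memory_usage_percent(1595, 1855)
    print(f"usage = {usage:.1f}%, suppressed = {accounting_suppressed(1595, 1855)}")
```

With the Step 1 figures the usage works out to about 86%, which is just above the threshold and is consistent with the large g_ulRDFailNumForMem count.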

 

Issue 2: CPU usage overload on the master MPU

Step 1: Query the tasks when the CPU usage is overloaded. The main CPU consumers include AAA (authentication, authorization, and accounting) and VSM (value-added services).

[xx]dis cpu-usage

CPU Usage Stat. Cycle: 60 (Second)

CPU Usage            : 81% Max: 100%

CPU Usage Stat. Time : 2013-10-14  11:09:53

CPU Usage Stat. Tick : 0x4d6dc(CPU Tick High) 0xe54837c2(CPU Tick Low)

TaskName        CPU        Runtime(CPU Tick High/CPU Tick Low)

AAA            14%               0/119485e0

RDS             4%               0/ 54378be

VSM            11%               0/ e1615b1
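When many tasks are listed, the task table can be sorted offline to find the main CPU consumers. The following Python sketch parses the task lines shown above. The function name and the regular expression are assumptions written for this output layout, not a Huawei tool.

```python
import re

# Task lines copied from the output above.
CPU_OUTPUT = """\
TaskName        CPU        Runtime(CPU Tick High/CPU Tick Low)
AAA            14%               0/119485e0
RDS             4%               0/ 54378be
VSM            11%               0/ e1615b1
"""

# A task line is assumed to be: task name, whitespace, integer percentage.
TASK_LINE = re.compile(r"^[ \t]*(\w+)[ \t]+(\d+)%", re.MULTILINE)


def top_tasks(output: str) -> list:
    """Return (task name, CPU percent) pairs sorted by CPU usage, highest first."""
    tasks = [(name, int(cpu)) for name, cpu in TASK_LINE.findall(output)]
    return sorted(tasks, key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for name, cpu in top_tasks(CPU_OUTPUT):
        print(f"{name}: {cpu}%")
```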

Step 2: Check the RADIUS response packet. The authentication accept packet contains the HW-Portal-URL(27) attribute, so the ME60 has to send the attribute value to a DNS server for resolution. The AAA task does not release the CPU even while it is only waiting for the DNS response, so the CPU usage of the task increases whenever the DNS response is delayed.

Radius Received a Packet

  Server Template: 14

  Server IP   : 10.xx.1xx.201

  Vpn-Instance: startip

  Server Port : 1814

  NAS Port    : 50000

  Protocol: Standard

  Code    : Authentication accept

  Len     : 272

  ID      : 191

  [HW-Policy-Name(Huawei-95) ] [24] [12345678_noacc_any_64k]

  [HW-Portal-URL(Huawei-27) ] [68] [http://navigator.xx.xx.xx/other/information/low_speed.html]

  [HW-Portal-Mode(Huawei-85) ] [6 ] [1]

 

Step 3: Measure the delay between the DNS servers and the ME60

We pinged the DNS servers to test the delay. The maximum delay to each of the two DNS servers is larger than 300 ms.

This means that for a single authentication accept packet carrying this attribute, the AAA task may wait 350 ms or more for the DNS response before it continues with the remaining user login steps.
However, a packet capture on the DNS server shows that the server itself responds in less than 0.2 ms.

We therefore conclude that link delay is the primary cause.

<xx>ping -c 100 2xx.1xx.9x.1

  --- 2xx.1xx.9x.1 ping statistics ---

    100 packet(s) transmitted

    100 packet(s) received

    0.00% packet loss

    round-trip min/avg/max = 2/31/369 ms

 

<xx>ping -c 100 2xx.1xx.9y.1

  --- 2xx.1xx.9y.1  ping statistics ---

    100 packet(s) transmitted

    100 packet(s) received

    0.00% packet loss

    round-trip min/avg/max = 2/35/474 ms
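The round-trip summary lines above can be checked against a delay limit automatically. The following Python sketch assumes the summary format shown in the ping output. The function names are hypothetical, and the default 300 ms limit is simply the figure quoted in this step, not a product threshold.

```python
import re

# Matches the summary line format shown in the ping output above.
RTT_LINE = re.compile(r"round-trip min/avg/max = (\d+)/(\d+)/(\d+) ms")


def parse_rtt(ping_output: str) -> dict:
    """Extract min/avg/max round-trip times (in ms) from ping output."""
    match = RTT_LINE.search(ping_output)
    if match is None:
        raise ValueError("no round-trip summary line found")
    rtt_min, rtt_avg, rtt_max = (int(value) for value in match.groups())
    return {"min": rtt_min, "avg": rtt_avg, "max": rtt_max}


def max_rtt_exceeds(ping_output: str, limit_ms: int = 300) -> bool:
    """Return True if the maximum round-trip time is above limit_ms."""
    return parse_rtt(ping_output)["max"] > limit_ms


if __name__ == "__main__":
    for line in ("round-trip min/avg/max = 2/31/369 ms",
                 "round-trip min/avg/max = 2/35/474 ms"):
        print(parse_rtt(line), max_rtt_exceeds(line))
```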

Root Cause

Issue 1: Accounting packets decrease sharply during the service peak period

The ME60 has to allocate memory for each accounting packet. Because so many users are online during the service peak period, memory usage increases.

As a system protection mechanism, the ME60 stops generating accounting packets when the memory usage of the master MPU exceeds 85%, so that the remaining free memory is preserved.

We observed the system for several days and found that memory usage is about 85% during the service peak period. That is why the ME60 fails to send accounting packets.

Issue 2: CPU usage overload on the master MPU

By checking the authentication accept packet, we found that it contains the HW-Portal-URL(Huawei-27) attribute. Handling such a packet costs the ME60 more CPU resources, because it has to send the attribute value to DNS for resolution. If the DNS response delay increases, the CPU usage of the AAA and VSM tasks rises.
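The effect of the DNS wait can be estimated with simple arithmetic. The following Python sketch rests on a simplifying assumption that is not confirmed by this case: logins are handled strictly one at a time by a single task, and the DNS wait dominates the handling time. Under that assumption, the wait per login caps the number of logins the task can complete per second. The function name is hypothetical.

```python
def max_logins_per_second(dns_wait_ms: float) -> float:
    """Upper bound on logins per second if one task blocks dns_wait_ms per login.

    Assumption (not from the case): logins are processed serially and the
    DNS wait is the dominant cost of each login.
    """
    if dns_wait_ms <= 0:
        raise ValueError("dns_wait_ms must be positive")
    return 1000.0 / dns_wait_ms


if __name__ == "__main__":
    # 0.2 ms: response time seen on the DNS server itself
    # 31 ms:  average round trip from the first ping test
    # 350 ms: worst-case wait quoted in Step 3
    for wait_ms in (0.2, 31, 350):
        print(f"{wait_ms} ms wait -> at most {max_logins_per_second(wait_ms):.1f} logins/s")
```

Under this simplified model, a 350 ms wait allows fewer than three logins per second, which illustrates why delayed DNS responses keep the task busy during the login peak.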

Solution

Issue 1: Accounting packets decrease sharply during the service peak period

The memory usage must be kept below 85% to ensure that accounting packets are generated and sent during the service peak period. The suggestion is as follows:

Move some users to other ME60s to reduce the number of users on this device during the service peak period.

Issue 2: CPU usage overload on the master MPU

Set the HW-Portal-URL(Huawei-27) attribute on the RADIUS server to a URL that uses an IP address, in the following format:
http://2xx.1xx.1xx.1xx/

For example:
http://10.10.10.10/other/information/low_speed.html

If the URL contains an IP address instead of a domain name, the ME60 does not send it to DNS for resolution, which avoids the delay.
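Before changing the attribute on the RADIUS server, the portal URLs can be checked to confirm that their host part is an IP address. The following Python sketch uses only the standard library. The function name is hypothetical, and the check reflects standard URL and IP parsing, not the ME60's own parsing logic.

```python
import ipaddress
from urllib.parse import urlparse


def host_is_ip_literal(url: str) -> bool:
    """Return True if the host part of the URL is an IPv4 or IPv6 address."""
    host = urlparse(url).hostname
    if host is None:
        return False
    try:
        ipaddress.ip_address(host)
    except ValueError:
        # The host is a domain name (or malformed), so it would need DNS.
        return False
    return True


if __name__ == "__main__":
    for url in ("http://10.10.10.10/other/information/low_speed.html",
                "http://navigator.xx.xx.xx/other/information/low_speed.html"):
        print(url, "->", host_is_ip_literal(url))
```

A URL for which this check returns False still contains a domain name and would require a DNS lookup.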

END