L2TP Users Drop Offline Because the ADSL Modem Repeatedly Powers Off and On

Publication Date: 2012-07-27
Issue Description
Quidway R2631E and R2611 routers acting as LNS are interconnected with a Huawei MA5200G acting as LAC. L2TP users dial in over PPPoE using client software, and after being online for a while, the users drop offline.
Alarm Information
Handling Process
This kind of drop-offline problem can have multiple causes. We can analyze it by printing debugging information, by tracing the function call stack, and by capturing packets.
Root Cause
I. The users drop offline because the MA5200 (LAC) sends CDN packets to the R2631E (LNS), while the PC sends no PPP Termination Request.
Packets were captured and analyzed at a hospital node. The capture confirmed that the users dropped offline because unstable supply voltage made the ADSL modem power off and on repeatedly, which in turn took the L2TP sessions down. When the modem powers off, the PC sends a PADT packet to the MA5200 to tear down the PPPoE link. After receiving the PADT, the MA5200 sends a CDN to the LNS (R2631E), which disconnects the user.
II. After the MA5200 (LAC) forwards a PPP Termination Request to the peer R2631E (LNS), the MA5200 sends a CDN packet to the LNS to terminate the connection, and the L2TP users go offline on their own. This happens when users disconnect the link themselves, for example to restart a computer that has hung.
 LNS ----- LAC ----- PC
(1) The PC sends a TermReq to the LNS and a PADT to the LAC;
(2) After receiving the TermReq, the LNS replies to the PC with a TermAck;
(3) After receiving the PADT, the LAC deletes the VPDN session and sends a CALL_DISCONNECT_NOTIFY (CDN) to the LNS.
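Cases I and II look alike from the LNS side, because both end with a CDN sent by the LAC. What separates them is whether the PC sent a PPP Termination Request first. The following Python sketch classifies a simplified packet trace on that basis. The `Msg` record and the `classify_lac_teardown` function are hypothetical: they only model the message names used in this case and do not parse real capture files.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Msg:
    """One captured control message (hypothetical, simplified capture record)."""
    src: str   # "PC", "LAC" or "LNS"
    dst: str
    kind: str  # e.g. "TermReq", "TermAck", "PADT", "CDN"


def classify_lac_teardown(trace: list[Msg]) -> str:
    """Tell case I (PADT only) from case II (user-initiated TermReq).

    Both cases end with the LAC sending a CDN to the LNS; they differ only
    in whether the PC sent a PPP Termination Request beforehand.
    """
    lac_cdn = any(m.src == "LAC" and m.dst == "LNS" and m.kind == "CDN" for m in trace)
    if not lac_cdn:
        return "not a LAC-initiated teardown"
    term_req = any(m.src == "PC" and m.kind == "TermReq" for m in trace)
    if term_req:
        return "II: user-initiated logoff"
    return "I: PADT without TermReq (e.g. modem power loss)"


# Case II: the message sequence listed in steps (1) to (3).
case2 = [
    Msg("PC", "LNS", "TermReq"),
    Msg("PC", "LAC", "PADT"),
    Msg("LNS", "PC", "TermAck"),
    Msg("LAC", "LNS", "CDN"),
]
# Case I: the modem powers off, the PC sends only a PADT.
case1 = [Msg("PC", "LAC", "PADT"), Msg("LAC", "LNS", "CDN")]

print(classify_lac_teardown(case2))
print(classify_lac_teardown(case1))
```

A CDN sent by the LNS instead of the LAC (cases III and IV below) falls outside this check and is reported as "not a LAC-initiated teardown".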
III. The LNS sends a CDN to the MA5200 (LAC).
In this case, the LNS maintains the PPP link with PPP Echo-Request packets: if it receives no Echo-Reply after sending 10 Echo-Requests (the retry count is set to 10), it tears down the PPP link. Packets captured on the uplink port of the MA5200 show that the MA5200 forwarded the Echo-Requests sent by the LNS, but the PC never replied. The reason is that the unstable voltage at the hospital node cut power to the computer and the ADSL modem. Once the ADSL modem loses power without the MA5200 being informed, the PC cannot answer the Echo-Requests from the LNS, and after 10 unanswered Echo-Requests, the L2TP session is disconnected.
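The echo-based link check in III can be sketched as a simple miss counter. This Python sketch assumes that any Echo-Reply resets the counter, so the 10 misses must be consecutive; `EchoKeepalive` and `first_drop` are hypothetical names, not the router's actual implementation.

```python
from typing import Optional


class EchoKeepalive:
    """Sketch of an LNS-side PPP echo keepalive (assumed semantics).

    Each call to on_echo_result() records whether one Echo-Request was
    answered. A reply resets the miss counter (assumption); after
    max_misses unanswered requests in a row, the link is torn down.
    """

    def __init__(self, max_misses: int = 10) -> None:
        self.max_misses = max_misses
        self.misses = 0
        self.link_up = True

    def on_echo_result(self, replied: bool) -> bool:
        """Record one Echo-Request outcome; return True while the link stays up."""
        if not self.link_up:
            return False
        if replied:
            self.misses = 0
        else:
            self.misses += 1
            if self.misses >= self.max_misses:
                # The LNS cuts the PPP link, which ends the L2TP session.
                self.link_up = False
        return self.link_up


def first_drop(replies: list[bool], max_misses: int = 10) -> Optional[int]:
    """Return the index of the Echo-Request at which the link drops, or None."""
    ka = EchoKeepalive(max_misses)
    for i, replied in enumerate(replies):
        if not ka.on_echo_result(replied):
            return i
    return None


# Five answered requests, then the modem loses power and ten go unanswered.
print(first_drop([True] * 5 + [False] * 10))
```

Here `first_drop([True] * 5 + [False] * 10)` returns 14, the tenth unanswered request, while nine misses followed by a reply keep the link up.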
IV. The LNS initiates a CDN message that cuts the user's session, taking the user offline.
When the LNS performs RADIUS authentication, the RADIUS server returns the Session-Timeout attribute with a value of 24 hours. After each user dials in successfully, a 24-hour timer is started. When the user logs off or drops offline for any reason, the system does not delete this timer. When a later user comes online with the same user ID that the old timer refers to, a second timer is started, so the new user is disconnected when the first timer expires.
The analysis above shows that this problem is caused by a code defect.
Because the timer of the previous user with the same ID is not deleted when that user goes offline, it remains in effect for the next user with that ID and eventually takes that user offline.
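The timer defect in IV can be reproduced with a toy model that uses a simulated clock in hours. In this Python sketch, which is hypothetical and not the LNS code, each timer records a user ID, and on expiry it disconnects whichever session currently holds that ID. With `fixed=False`, the departing session's timer is never deleted, as in the defect; `fixed=True` shows one plausible fix (an assumption, not the documented patch): delete the timer when the session ends.

```python
import heapq
from dataclasses import dataclass, field

SESSION_TIMEOUT_H = 24  # Session-Timeout value returned by RADIUS in this case


@dataclass
class Lns:
    """Toy LNS with a simulated clock in hours; fixed=False reproduces the defect."""
    fixed: bool = False
    timers: list[tuple[int, int, str]] = field(default_factory=list)  # heap of (expiry, session_id, user_id)
    online: dict[str, int] = field(default_factory=dict)              # user_id -> current session_id
    kicked: list[tuple[int, str, int]] = field(default_factory=list)  # (hour, user_id, session_id) cut by a timer
    next_sid: int = 1

    def login(self, now: int, user_id: str) -> int:
        sid = self.next_sid
        self.next_sid += 1
        self.online[user_id] = sid
        heapq.heappush(self.timers, (now + SESSION_TIMEOUT_H, sid, user_id))
        return sid

    def logout(self, user_id: str) -> None:
        sid = self.online.pop(user_id)
        if self.fixed:
            # Fix: delete the departing session's timer so it cannot fire later.
            self.timers = [t for t in self.timers if t[1] != sid]
            heapq.heapify(self.timers)
        # Defect (fixed=False): the timer is left in place.

    def advance(self, now: int) -> None:
        """Fire every timer that has expired by hour `now`."""
        while self.timers and self.timers[0][0] <= now:
            expiry, _sid, user_id = heapq.heappop(self.timers)
            if user_id in self.online:
                # The timer is matched by user ID only, so a stale timer
                # cuts whichever session currently uses that ID.
                self.kicked.append((expiry, user_id, self.online.pop(user_id)))


def scenario(fixed: bool) -> list[tuple[int, str, int]]:
    lns = Lns(fixed=fixed)
    lns.login(0, "alice")   # session 1, timer expires at hour 24
    lns.logout("alice")     # the user goes offline; timer kept unless fixed
    lns.login(20, "alice")  # session 2 with the same user ID
    lns.advance(24)         # the first session's timer expires
    return lns.kicked


print(scenario(False))
print(scenario(True))
```

With the defect, `scenario(False)` returns `[(24, "alice", 2)]`: the second session is cut after only 4 hours. With the fix, `scenario(True)` returns an empty list, and session 2 runs until its own timer expires at hour 44.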
a) To locate a complex problem, we need to list all the possible factors and analyze them one by one; if necessary, we can trace the function call stack.
b) When troubleshooting an L2TP problem, it is best to capture packets at several different points for analysis.
[Figure: packet capture points, labeled "Capture packets", at three locations]