Service Switchback Between CE Devices (NE40Es) Takes a Long Time on an IPRAN Network

Publication Date: 2013-09-03
Issue Description

1. Version
NE40E&80E V600R001C00SPC800 + V600R001C00SPC026 patches
2. Network topology
The attachment shows the network topology. CE1 and CE2 connect to 1400 downstream NodeBs through transmission devices, terminate the VLAN tags of the NodeBs on Layer 3 subinterfaces, and have Virtual Router Redundancy Protocol (VRRP) configured. CE1 is the active device, and CE2 is the standby device. Both CE1 and CE2 connect to the upstream RNC at Layer 3.
3. Network configuration
On CE1:
#
interface GigabitEthernet2/0/3
 negotiation auto
 undo shutdown
 mode user-termination
#
interface GigabitEthernet2/0/3.1
 description to Nodeb_xxx
 control-vid 13 dot1q-termination
 dot1q termination vid 1 to 128
 dot1q vrrp vid 13
 ip binding vpn-instance Iub
 ip address 172.x.x.129 255.255.255.128
 vrrp vrid 114 virtual-ip 172.x.x.129
 vrrp vrid 114 priority 254
 arp broadcast enable
#
On CE2:
#
interface GigabitEthernet2/0/3
 negotiation auto
 undo shutdown
 mode user-termination
#
interface GigabitEthernet2/0/3.1
 description to Nodeb_xxx
 control-vid 13 dot1q-termination
 dot1q termination vid 1 to 128
 dot1q vrrp vid 13
 ip binding vpn-instance Iub
 ip address 172.x.x.129 255.255.255.128
 vrrp vrid 114 virtual-ip 172.x.x.129
 vrrp vrid 114 priority 254
 arp broadcast enable
#

4. Symptom
a. Before the switchover, services run properly on CE1.
b. After port G2/0/3 on CE1 is shut down, services are switched to CE2.
c. Services are interrupted for 30 minutes during a switch from CE2 back to CE1.
Handling Process
Subinterface 2/0/3.1 on CE1 is connected to 128 NodeBs and is configured for dot1q VLAN tag termination. After the undo shutdown command is run on port G2/0/3 of CE1 to switch services from CE2 back to CE1, CE1 must relearn its ARP entries.
Because the lower layer has no ARP entries, forwarded packets trigger ARP Miss messages. On receiving an ARP Miss message, the dot1q termination subinterface generates a fake ARP entry to prevent further ARP Miss messages from being sent, and then writes the ARP Miss message to the packet sending queue Q2PK. When scheduled by the system, the CE device reads the ARP Miss messages from the Q2PK queue, replicates an ARP request message for each one, and sends it in every VLAN.
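The queue behavior described above can be sketched in a few lines of Python. This is only an illustrative model, not the device implementation: the `drain` function, the per-second request budget handling, and the `192.0.2.x` target addresses are all assumptions introduced for the example. It assumes that one ARP Miss entry is only served when the remaining budget covers a request in every terminated VLAN.

```python
from collections import deque

# VLAN IDs terminated by the subinterface ("dot1q termination vid 1 to 128").
VLAN_IDS = range(1, 129)


def drain(queue, budget):
    """Serve ARP Miss entries in FIFO order within a request budget.

    Each entry is replicated into one ARP request per VLAN. Serving stops
    when the remaining budget cannot cover a full fan-out (an assumption
    of this sketch, not a documented device behavior).
    """
    sent = []
    while queue and budget >= len(VLAN_IDS):
        target_ip = queue.popleft()          # oldest entry goes first
        sent.extend((vid, target_ip) for vid in VLAN_IDS)
        budget -= len(VLAN_IDS)
    return sent


# Ten hypothetical unresolved targets waiting in the queue.
q2pk = deque("192.0.2.%d" % i for i in range(1, 11))
sent = drain(q2pk, budget=1024)              # one second at 1024 requests/s

print(len(sent), "requests sent;", len(q2pk), "entries still queued")
```

With a budget of 1024 requests and a fan-out of 128, only eight entries are served per second in this model, so anything written to the queue later has to wait behind every older entry.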
Information collected on the live network shows that more than 9000 ARP Miss messages remain in the Q2PK queue for a long time (the reason is explained below). During the switchback, new ARP Miss messages are triggered and written to the Q2PK queue, and the CE device must send ARP request messages for them. The Q2PK queue is first in, first out, so the CE device sends the new ARP request messages only after it has sent the requests for the 9000 earlier ARP Miss messages. With 128 VLANs configured, the CE device must send 9000 x 128 ARP request messages for those 9000 ARP Miss messages. Assuming that the CE device sends 1024 (the threshold) ARP request messages per second, it takes about 20 minutes to send them, during which ARP entries cannot be learned and services are interrupted.
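The 20-minute estimate can be reproduced with a short calculation. The sketch below uses only the figures given in this case (9000 queued ARP Miss messages, 128 VLANs, 1024 ARP requests per second); the function and variable names are illustrative.

```python
def backlog_drain_seconds(backlog_misses, vlans, requests_per_second):
    """Time to send one ARP request per VLAN for every queued ARP Miss."""
    total_requests = backlog_misses * vlans
    return total_requests / requests_per_second


seconds = backlog_drain_seconds(backlog_misses=9000, vlans=128,
                                requests_per_second=1024)
minutes = seconds / 60

print(f"{seconds:.0f} s, about {minutes:.1f} minutes")
```

The result is 1125 seconds, or 18.75 minutes, which matches the "about 20 minutes" stated above.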
Why the Q2PK queue holds more than 9000 ARP Miss messages: the live network has a large number of NodeBs, about 15 of which are frequently disconnected. The RNC sends three ping packets every 5s to each NodeB to check its status. The CE device looks up the ARP entry when forwarding each packet. When no ARP entry matches a disconnected NodeB, the CE device triggers an ARP Miss message and broadcasts ARP request messages. The subinterface broadcasts each ARP request message to all configured VLANs, that is, 128 VLANs. The CE device therefore needs to send 1152 (15 x 3/5 x 128) ARP request messages every second, but the card on the CE device sends at most 1024 ARP request messages every second. The remaining 128 (1152 - 1024) ARP Miss messages for which ARP request messages are not sent are written to the Q2PK queue. After about 70s (9000/128), more than 9000 ARP Miss messages have accumulated in the Q2PK queue.
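The backlog arithmetic can be checked the same way. The sketch below follows the accounting used in this case, in which the 128 surplus per second is counted directly as queue entries; the names are illustrative, and the inputs are the figures quoted above.

```python
def offered_requests_per_second(down_nodebs, pings, interval_s, vlans):
    """ARP requests per second caused by pings to unreachable NodeBs."""
    misses_per_second = down_nodebs * pings / interval_s
    return misses_per_second * vlans


CARD_LIMIT = 1024          # maximum ARP requests the card sends per second
BACKLOG = 9000             # queue depth observed on the live network

offered = offered_requests_per_second(down_nodebs=15, pings=3,
                                      interval_s=5, vlans=128)
surplus = offered - CARD_LIMIT
fill_seconds = BACKLOG / surplus

print(f"offered={offered:.0f}/s surplus={surplus:.0f}/s "
      f"fill time={fill_seconds:.1f} s")
```

Under these assumptions the offered load exceeds the card limit by 128 per second, so the queue reaches 9000 entries in roughly 70 seconds and never drains while the 15 NodeBs remain unreachable.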
Root Cause
N/A
Solution

Solutions:

1. Decrease the number of NodeBs connected to CE1 and CE2 to fewer than 1000. R&D tests confirm that services are then switched within seconds.
2. Install the SPC036 patch on CE1 and CE2. The SPC036 patch optimizes the Q2PK queue to improve its efficiency so that service switchback completes within 2 minutes.
3. Upgrade CE1 and CE2 to V600R003 and configure ARP hot standby so that ARP entries on CE1 and CE2 are synchronized to each other and services can be switched within seconds. Note that the numbers of NodeBs and ARP entries are limited. For the specifications, see the MBB solution.

The customer has implemented solution 2 for now and will implement solutions 1 and 3 later as required.
