
HRP traffic issue on Eudemon 1000E causing service outage

Publication Date:  2013-04-22  |   Views:  371  |   Downloads:  0  |   Author:  y00732219  |   Document ID:  EKB1000026958


Issue Description

WAP services at the Delhi hub were down for 10 minutes.

Alarm Information

The circle team informed the TAC team about an outage in the North region.

WAP users were unable to browse the Internet.

< DEL-1000E-FW-B1 >

2012-07-06 10:26:02 DEL-1000E-FW-B1 %%01SHELL/4/LOGIN(l): vrf:public user:huawei login from 10.176.14.27
2012-07-06 10:25:53 DEL-1000E-FW-B1 %%01HRPI/4/LOG(l): HRP config state change: MASTER -> SLAVE
2012-07-06 10:25:53 DEL-1000E-FW-B1 %%01VRRP/4/StateWarning(l):
Eth-Trunk3 | Virtual Router 44 :  MASTER --> BACKUP

2012-07-06 10:25:53 DEL-1000E-FW-B1 %%01VRRP/4/StateWarning(l):
Eth-Trunk7 | Virtual Router 3 :  MASTER --> BACKUP

2012-07-06 10:25:53 DEL-1000E-FW-B1 %%01VRRP/4/StateWarning(l):
Ethernet1/0/0 | Virtual Router 254 :  MASTER --> BACKUP

2012-07-06 10:25:53 DEL-1000E-FW-B1 %%01VRRP/4/StateWarning(l):
Ethernet1/0/2 | Virtual Router 77 :  MASTER --> BACKUP

2012-07-06 10:25:53 DEL-1000E-FW-B1 %%01VRRP/4/StateWarning(l):
GigabitEthernet0/0/0 | Virtual Router 11 :  MASTER --> BACKUP

2012-07-06 10:25:53 DEL-1000E-FW-B1 %%01VRRP/4/StateWarning(l):
GigabitEthernet0/0/1 | Virtual Router 33 :  MASTER --> BACKUP

2012-07-06 10:25:53 DEL-1000E-FW-B1 %%01ARP/4/ARP_DUPLICATE_IPADDR(l): Receive an ARP packet with duplicate ip address 10.181.186.75 from Eth-Trunk3, source MAC is 0000-5e00-012c
2012-07-06 10:25:53 DEL-1000E-FW-B1 %%01HRPI/4/LOG(l): HRP config state change: SLAVE -> MASTER
2012-07-06 10:25:53 DEL-1000E-FW-B1 %%01VRRP/4/StateWarning(l):
Eth-Trunk3 | Virtual Router 44 :  BACKUP --> MASTER

2012-07-06 10:25:53 DEL-1000E-FW-B1 %%01VRRP/4/StateWarning(l):
Eth-Trunk7 | Virtual Router 3 :  BACKUP --> MASTER

2012-07-06 10:25:53 DEL-1000E-FW-B1 %%01VRRP/4/StateWarning(l):
Ethernet1/0/0 | Virtual Router 254 :  BACKUP --> MASTER

2012-07-06 10:25:53 DEL-1000E-FW-B1 %%01VRRP/4/StateWarning(l):
Ethernet1/0/2 | Virtual Router 77 :  BACKUP --> MASTER

2012-07-06 10:25:53 DEL-1000E-FW-B1 %%01VRRP/4/StateWarning(l):
GigabitEthernet0/0/0 | Virtual Router 11 :  BACKUP --> MASTER

2012-07-06 10:25:53 DEL-1000E-FW-B1 %%01VRRP/4/StateWarning(l):
GigabitEthernet0/0/1 | Virtual Router 33 :  BACKUP --> MASTER
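
The HRP state changes can be pulled out of a log like the one above with a short script. This is only a sketch: the regular expression is written against the line format shown in this article, the helper names `hrp_transitions` and `flapped` are hypothetical, and treating two state changes at the same timestamp as a flap is an assumption made for illustration.

```python
import re

# Pattern for the HRP state-change lines shown in the log above.
HRP_RE = re.compile(
    r"(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) (?P<host>\S+) "
    r"%%01HRPI/4/LOG\(l\): HRP config state change: "
    r"(?P<old>\w+) -> (?P<new>\w+)"
)

def hrp_transitions(lines):
    """Return (timestamp, host, old_state, new_state) for each HRP state change."""
    result = []
    for line in lines:
        m = HRP_RE.search(line)
        if m:
            result.append((m["ts"], m["host"], m["old"], m["new"]))
    return result

def flapped(transitions):
    """True if a host changed HRP state more than once at the same timestamp."""
    seen = set()
    for ts, host, _, _ in transitions:
        if (ts, host) in seen:
            return True
        seen.add((ts, host))
    return False

# Three lines copied from the log in this article.
log = [
    "2012-07-06 10:26:02 DEL-1000E-FW-B1 %%01SHELL/4/LOGIN(l): vrf:public user:huawei login from 10.176.14.27",
    "2012-07-06 10:25:53 DEL-1000E-FW-B1 %%01HRPI/4/LOG(l): HRP config state change: MASTER -> SLAVE",
    "2012-07-06 10:25:53 DEL-1000E-FW-B1 %%01HRPI/4/LOG(l): HRP config state change: SLAVE -> MASTER",
]

changes = hrp_transitions(log)
print(changes)
print("flap detected:", flapped(changes))
```

Both HRP transitions on DEL-1000E-FW-B1 carry the timestamp 10:25:53, so the sketch reports a flap for this sample.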

Handling Process

| Date (day/month/year) | Time (24h) | Operational situation | Performed activities | Owner | Result |
| --- | --- | --- | --- | --- | --- |
| 06-07-2012 | 11:32 | WAP services were down for 10 minutes | WAP services restored automatically. Circle team opened a ticket for RCA | Huawei | TAC team analyzing the issue |
| 06-07-2012 | 11:40 | WAP service restored | TAC found an abnormality in the VRRP state of the firewalls | Huawei | Issue escalated to GTAC |
| 06-07-2012 | 11:50 | WAP service restored | GTAC analyzing the issue | Huawei | GTAC asked for logs of the downtime |
| 06-07-2012 | 12:00 | WAP services restored | Logs shared with GTAC | Huawei | GTAC analyzing the issue |
| 06-07-2012 | 12:30 | WAP services restored | GTAC analyzing the issue | Huawei | GTAC suspects the issue is in HRP backup |
| 06-07-2012 | 13:00 | WAP services restored | GTAC confirmed that the issue occurs when HRP interface utilization exceeds the threshold | | |

Root Cause

At the time of the outage, the backup firewall became master because it failed to receive three heartbeat packets in a row from the master firewall. It therefore concluded that the master firewall was down and took over the master role.
Log analysis showed that the backup firewall missed the heartbeat packets because utilization of the HRP link (which carries heartbeat packets between the firewalls) exceeded the threshold.
As a result, there were two masters at the same time, communication between the core switch and the firewalls broke down, and the outage occurred.
The heartbeat interface between the firewalls is Ethernet1/0/0, a 100M interface. Because the heartbeat packets forwarded between master and slave are only about 300 bytes each, the actual throughput of the link is about 70M, so whenever traffic exceeds 70M the link drops packets.
If traffic stays above 70M for 3 seconds, three consecutive heartbeat packets may be dropped, causing the backup firewall to become master.
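
The switchover condition described above can be sketched as a counter of consecutive missed heartbeats. This is a minimal illustration, not the HRP implementation: the function name `backup_takes_over` and the sample sequences are assumptions made for the example, and only the limit of three misses comes from the article.

```python
MISS_LIMIT = 3  # from the article: three heartbeats missed in a row

def backup_takes_over(received, miss_limit=MISS_LIMIT):
    """Return the interval index at which the backup declares the master down.

    `received` holds one boolean per heartbeat interval: True if the
    heartbeat arrived, False if it was dropped. Returns None if the
    limit of consecutive misses is never reached.
    """
    misses = 0
    for i, ok in enumerate(received):
        # Any received heartbeat resets the counter.
        misses = 0 if ok else misses + 1
        if misses >= miss_limit:
            return i
    return None

# Isolated drops do not trigger a switchover.
print(backup_takes_over([True, False, True, False, False, True]))
# Three drops in a row do.
print(backup_takes_over([True, False, False, False, True]))
```

The first call returns `None` and the second returns `3`, the index of the third consecutive miss. This is why short bursts above 70M are harmless while a sustained burst is not.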
Overruns on the interface:
The overrun counter on the interface keeps increasing, which shows that utilization of the HRP Ethernet link exceeds the threshold.
Around 13:00, the Ethernet1/0/0 interface statistics were as follows:
Ethernet1/0/0 current state : UP 
Line protocol current state : UP
Ethernet1/0/0 current firewall zone : dmz
Description : For heartbeat
The Maximum Transmit Unit is 1500 bytes, Hold timer is 10(sec)
Internet Address is 10.181.185.41/29
IP Sending Frames' Format is PKTFMT_ETHNT_2, Hardware address is 0018-8299-1c84
Media type is twisted pair, loopback not set, promiscuous mode not set
Output flow-control is unsupported, input flow-control is unsupported
Output queue : (Urgent queue : Size/Length/Discards)  0/50/0
Output queue : (Protocol queue : Size/Length/Discards) 0/1000/0
Output queue : (FIFO queuing : Size/Length/Discards)  0/75/0
no negotiation auto, 100Mb/s-speed mode, duplex full, loopback not set
                      Last 5 minutes input rate 6879126 bytes/sec, 20600 packets/sec  // about 52M as the 5-minute average rate
                      Last 5 minutes output rate 669 bytes/sec, 6 packets/sec
                       Input: 355239941 packets, 2705769619 bytes
                   434 broadcasts, 355237670 multicasts
                   1008 errors, 0 runts, 0 giants,
                   0 CRC, 1 align errors, 1007 overruns,   // although the 5-minute average is below 70M, the overrun counter shows that the real-time rate has at times exceeded 70M, so some packets have been lost
                      Output:23427203 packets,  2929072642 bytes
                   1842 broadcasts, 23424979 multicasts
                   0 errors, 0 underruns, 0 collisions
                   0 deferred

Around 16:00, the Ethernet1/0/0 interface statistics were as follows:
HRP_S[DEL-1000E-FW-B1]display interface Ethernet1/0/0
16:03:47  2012/07/06
Ethernet1/0/0 current state : UP 
Line protocol current state : UP
Ethernet1/0/0 current firewall zone : dmz
Description : For heartbeat
The Maximum Transmit Unit is 1500 bytes, Hold timer is 10(sec)
Internet Address is 10.181.185.41/29
IP Sending Frames' Format is PKTFMT_ETHNT_2, Hardware address is 0018-8299-1c84
Media type is twisted pair, loopback not set, promiscuous mode not set
Output flow-control is unsupported, input flow-control is unsupported
Output queue : (Urgent queue : Size/Length/Discards)  0/50/0
Output queue : (Protocol queue : Size/Length/Discards) 0/1000/0
Output queue : (FIFO queuing : Size/Length/Discards)  0/75/0
no negotiation auto, 100Mb/s-speed mode, duplex full, loopback not set
                      Last 5 minutes input rate 7366539 bytes/sec, 22059 packets/sec  // about 56M as the 5-minute average rate
                      Last 5 minutes output rate 404 bytes/sec, 4 packets/sec
                       Input: 579642108 packets, 331555803 bytes
                   438 broadcasts, 579639828 multicasts
                   1092 errors, 0 runts, 0 giants,
                   0 CRC, 1 align errors, 1091 overruns,   // as above, the overrun counter is still increasing
                       Output:23496409 packets,  2936050000 bytes
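
The figures annotated in the two captures can be checked with a little arithmetic. The sketch below uses only numbers taken from the outputs above. The helper names are hypothetical, and it assumes that "M" in the annotations means 2^20 bits per second, which is the reading that reproduces the 52M and 56M values.

```python
MIB = 1024 * 1024  # assumption: "M" in the annotations means 2**20 bits

def rate_mbit(bytes_per_sec):
    """Convert a bytes/sec counter to Mbit/s (binary mega)."""
    return bytes_per_sec * 8 / MIB

def avg_packet_size(bytes_per_sec, packets_per_sec):
    """Average packet size in bytes over the sampling window."""
    return bytes_per_sec / packets_per_sec

# (input bytes/sec, input packets/sec, overruns) from the two captures
capture_1300 = (6879126, 20600, 1007)
capture_1603 = (7366539, 22059, 1091)

for label, (bps, pps, _) in (("13:00", capture_1300), ("16:03", capture_1603)):
    print(label, round(rate_mbit(bps), 1), "Mbit/s,",
          round(avg_packet_size(bps, pps)), "bytes/packet")

# Overruns added between the two captures
new_overruns = capture_1603[2] - capture_1300[2]
print("new overruns:", new_overruns)
```

Under that assumption the averages come out at roughly 52.5 and 56.2 Mbit/s, with an average packet size of about 334 bytes in both captures, which matches the "around 300 bytes" stated in the root cause. The overrun counter grew by 84 between the two captures.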

Suggestions

The firewall currently runs an old software version, in which heartbeat traffic between the master and backup firewalls is heavy. One suggestion is therefore to upgrade the firewall to E1000E V100R002C01SPC200 plus patch SPH204, which can reduce the heartbeat traffic between the firewalls by 20%. This lowers the likelihood of the issue recurring.
Another suggestion, which resolves the issue completely, is to move the heartbeat link from the 100M interface to a GE interface.
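
As a rough check of the first suggestion, the sketch below applies the 20% reduction to the 5-minute averages annotated in the captures and compares the result with the 70M limit from the root cause. The constant and function names are hypothetical, and the calculation assumes the reduction applies evenly to all traffic on the heartbeat link.

```python
THRESHOLD_M = 70        # usable rate of the 100M heartbeat link, per the article
OBSERVED_M = [52, 56]   # 5-minute averages annotated in the captures
REDUCTION = 0.20        # heartbeat traffic reduction stated for the upgrade

def after_upgrade(rate_m, reduction=REDUCTION):
    """Estimated average rate once heartbeat traffic is reduced."""
    return rate_m * (1 - reduction)

for rate in OBSERVED_M:
    new_rate = after_upgrade(rate)
    print(rate, "->", round(new_rate, 1),
          "headroom", round(THRESHOLD_M - new_rate, 1))
```

The estimate covers averages only. Short bursts can still exceed the limit, which is consistent with the article describing the upgrade as reducing the risk and the GE interface as the complete fix.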