S9700 (V2R6): bearer services (internet, email, and file server) are interrupted occasionally

Publication Date: 2016-12-31
Issue Description

The customer reported that internet, email, and file-server services for some users were interrupted for about 30 minutes on the mornings of 30 Nov and 1 Dec.
Topology:

Handling Process
1. Based on the customer's feedback on 1 Dec and our on-site troubleshooting on 2 Dec, we found that ping packets were lost intermittently (for a couple of seconds at a time) during the issue, which sometimes interrupted services.
2. From the information and logs of the Core-Switch, we found that the number of ARP entries exceeded the specification of the EH1D2X16SFC0 board. There are 8415 ARP entries, but the EH1D2X16SFC0 supports only 8184. Every board must store the full set of ARP entries for services to work normally:
 

===============================================
  ===============display arp===============
===============================================
IP ADDRESS      MAC ADDRESS     EXPIRE(M) TYPE        INTERFACE   VPN-INSTANCE
                                          VLAN/CEVLAN
------------------------------------------------------------------------------
172.17.0.254    9404-9cdd-c542            I -         Vlanif1
172.17.1.28     74a0-63f5-5602  2         D-0/0       Eth-Trunk2
                                          1/-
172.17.1.26     74a0-63f5-565a  2         D-0/0       Eth-Trunk2
......
192.168.120.1   9404-9cdd-c541            I -         Vlanif2100
192.168.120.2   085b-0e47-372a  5         D-0/0       Eth-Trunk12
                                          2100/-
------------------------------------------------------------------------------
Total:8415      Dynamic:8299    Static:0     Interface:116

==================================================
  ===============display device===============
==================================================
Chassis 1 (Master Switch)
S9706's Device status:
Slot  Sub Type         Online    Power      Register       Status     Role
-------------------------------------------------------------------------------
1     -   EH1D2X16SFC0 Present   PowerOn    Registered     Normal     NA
2     -   EH1D2X16SFC0 Present   PowerOn    Registered     Normal     NA
3     -   EH1D2X16SFC0 Present   PowerOn    Registered     Normal     NA
4     -   EH1D2X16SFC0 Present   PowerOn    Registered     Normal     NA
5     -   EH1D2G24SEC0 Present   PowerOn    Registered     Normal     NA
6     -   EH1D2G48TEC0 Present   PowerOn    Registered     Normal     NA
7     -   EH1D2SRUDC00 Present   PowerOn    Registered     Normal     Master
8     -   EH1D2SRUDC00 Present   PowerOn    Registered     Normal     Slave
PWR1  -   -            Present   PowerOn    Registered     Normal     NA
PWR2  -   -            Present   PowerOn    Registered     Normal     NA
CMU1  -   EH1D200CMU00 Present   PowerOn    Registered     Normal     Master
FAN1  -   -            Present   PowerOn    Registered     Normal     NA
FAN2  -   -            Present   PowerOn    Registered     Normal     NA
Chassis 2 (Standby Switch)
S9706's Device status:
Slot  Sub Type         Online    Power      Register       Status     Role
-------------------------------------------------------------------------------
1     -   EH1D2X16SFC0 Present   PowerOn    Registered     Normal     NA
2     -   EH1D2X16SFC0 Present   PowerOn    Registered     Normal     NA
3     -   EH1D2X16SFC0 Present   PowerOn    Registered     Normal     NA
4     -   EH1D2X16SFC0 Present   PowerOn    Registered     Normal     NA
5     -   EH1D2G24SEC0 Present   PowerOn    Registered     Normal     NA
6     -   EH1D2G48TEC0 Present   PowerOn    Registered     Normal     NA
7     -   EH1D2SRUDC00 Present   PowerOn    Registered     Normal     Master
8     -   EH1D2SRUDC00 Present   PowerOn    Registered     Normal     Slave
PWR1  -   -            Present   PowerOn    Registered     Normal     NA
PWR2  -   -            Present   PowerOn    Registered     Normal     NA
CMU1  -   EH1D200CMU00 Present   PowerOn    Registered     Normal     Master
FAN1  -   -            Present   PowerOn    Registered     Normal     NA
FAN2  -   -            Present   PowerOn    Registered     Normal     NA

The ARP specification of EH1D2X16SFC0 is only 8184 entries.

The ARP specification of EH1D2G24SEC0 is 16376 entries.

The ARP specification of EH1D2G48TEC0 is 16376 entries.
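
The comparison between the ARP total and the per-board specifications can be sketched as a small check. This is only an illustration: the capacities and the total of 8415 are the figures quoted in this case, while the names `BOARD_ARP_SPEC` and `arp_overflow` are hypothetical and not part of any Huawei tool.

```python
# ARP capacity per board model, as quoted in this case.
BOARD_ARP_SPEC = {
    "EH1D2X16SFC0": 8184,
    "EH1D2G24SEC0": 16376,
    "EH1D2G48TEC0": 16376,
}


def arp_overflow(total_arp: int, specs: dict) -> dict:
    """Return, per board model, how many ARP entries exceed its capacity."""
    return {
        model: total_arp - capacity
        for model, capacity in specs.items()
        if total_arp > capacity
    }


# "Total:8415" is taken from the display arp output in this case.
overflow = arp_overflow(8415, BOARD_ARP_SPEC)
for model, excess in overflow.items():
    print(f"{model}: {excess} ARP entries over capacity")
```

With the figures from this case, only EH1D2X16SFC0 is reported, with 231 entries more than it can hold.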

3. We also found several risks on the current site (IP conflicts, ARP track changes, MAC flapping, STP TC packets, and so on). These risks can cause packet loss, so they need to be confirmed with the customer and resolved.
(1) IP conflict: a possible cause is that some users configure static IP addresses on their PCs or devices without knowing whether those addresses duplicate others. All IP addresses should therefore be assigned by the same DHCP server.
 

#2016-12-1 12:11:15-03:00 CORE_HUAWEI ARP/4/ARP_IPCONFLICT_TRAP:OID 1.3.6.1.4.1.2011.5.25.123.2.6 ARP detects IP conflict. (IP address=172.30.152.99, Local interface=Eth-Trunk13, Local MAC=e098-613f-c411, Local vlan=304, Local CE vlan=0, Receive interface=Eth-Trunk23, Receive MAC=3cbb-fd84-ec12, Receive vlan=304, Receive CE vlan=0, IP conflict type=Remote IP conflict).

#2016-12-1 12:11:07-03:00 CORE_HUAWEI ARP/4/ARP_IPCONFLICT_TRAP:OID 1.3.6.1.4.1.2011.5.25.123.2.6 ARP detects IP conflict. (IP address=172.30.155.255, Local interface=Eth-Trunk19, Local MAC=e850-8b10-43ba, Local vlan=304, Local CE vlan=0, Receive interface=Eth-Trunk30, Receive MAC=2cf0-eeed-25f5, Receive vlan=304, Receive CE vlan=0, IP conflict type=Remote IP conflict).

#2016-12-1 12:11:06-03:00 CORE_HUAWEI ARP/4/ARP_IPCONFLICT_TRAP:OID 1.3.6.1.4.1.2011.5.25.123.2.6 ARP detects IP conflict. (IP address=172.30.153.17, Local interface=Eth-Trunk13, Local MAC=646c-b25d-8e8f, Local vlan=304, Local CE vlan=0, Receive interface=Eth-Trunk30, Receive MAC=c01a-da3e-aafa, Receive vlan=304, Receive CE vlan=0, IP conflict type=Remote IP conflict).
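
When there are many such traps, it can help to extract the conflicting address, the two MAC addresses, and the two interfaces from each record so the affected hosts can be located. The following is a minimal sketch, assuming the field layout seen in the three records above; the regular expression and the function name `parse_conflicts` are hypothetical and may need adjusting for other software versions.

```python
import re
from collections import Counter

# Three ARP_IPCONFLICT_TRAP records copied from the log excerpt in this case.
LOG = """\
#2016-12-1 12:11:15-03:00 CORE_HUAWEI ARP/4/ARP_IPCONFLICT_TRAP:OID 1.3.6.1.4.1.2011.5.25.123.2.6 ARP detects IP conflict. (IP address=172.30.152.99, Local interface=Eth-Trunk13, Local MAC=e098-613f-c411, Local vlan=304, Local CE vlan=0, Receive interface=Eth-Trunk23, Receive MAC=3cbb-fd84-ec12, Receive vlan=304, Receive CE vlan=0, IP conflict type=Remote IP conflict).
#2016-12-1 12:11:07-03:00 CORE_HUAWEI ARP/4/ARP_IPCONFLICT_TRAP:OID 1.3.6.1.4.1.2011.5.25.123.2.6 ARP detects IP conflict. (IP address=172.30.155.255, Local interface=Eth-Trunk19, Local MAC=e850-8b10-43ba, Local vlan=304, Local CE vlan=0, Receive interface=Eth-Trunk30, Receive MAC=2cf0-eeed-25f5, Receive vlan=304, Receive CE vlan=0, IP conflict type=Remote IP conflict).
#2016-12-1 12:11:06-03:00 CORE_HUAWEI ARP/4/ARP_IPCONFLICT_TRAP:OID 1.3.6.1.4.1.2011.5.25.123.2.6 ARP detects IP conflict. (IP address=172.30.153.17, Local interface=Eth-Trunk13, Local MAC=646c-b25d-8e8f, Local vlan=304, Local CE vlan=0, Receive interface=Eth-Trunk30, Receive MAC=c01a-da3e-aafa, Receive vlan=304, Receive CE vlan=0, IP conflict type=Remote IP conflict).
"""

# Assumed field layout, based only on the records above.
CONFLICT_RE = re.compile(
    r"IP address=(?P<ip>[\d.]+), "
    r"Local interface=(?P<local_if>[^,]+), "
    r"Local MAC=(?P<local_mac>[0-9a-f-]+), "
    r"Local vlan=(?P<vlan>\d+), .*?"
    r"Receive interface=(?P<recv_if>[^,]+), "
    r"Receive MAC=(?P<recv_mac>[0-9a-f-]+)"
)


def parse_conflicts(text: str) -> list:
    """Return one dict per IP-conflict trap found in the text."""
    return [match.groupdict() for match in CONFLICT_RE.finditer(text)]


conflicts = parse_conflicts(LOG)
per_vlan = Counter(c["vlan"] for c in conflicts)

for c in conflicts:
    print(f'{c["ip"]}: {c["local_mac"]} on {c["local_if"]} '
          f'vs {c["recv_mac"]} on {c["recv_if"]} (VLAN {c["vlan"]})')
```

In this excerpt all three conflicts fall in VLAN 304, which matches the 172.30.15x.xxx subnet called out in the suggestions.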

(2) ARP tracks and MAC flapping: these are probably caused by loops in the network, so the same ARP and MAC address entries flap among the ports in the loop.

 

=====================================================
  ===============display arp track===============
=====================================================
Operate Flags: M - Modify, D - Delete
--------------------------------------------------------------------------------
Op IP-Address      MAC-Address    VLAN Old-Port     New-Port     System-Time
--------------------------------------------------------------------------------
M  172.30.154.131  2cf0-eef1-4914 304  Eth-Trunk30  Eth-Trunk13  12-01 12:15:38-03:00
M  172.30.153.245  8022-759d-9efa 304  Eth-Trunk14  Eth-Trunk13  12-01 12:15:40-03:00
M  172.30.154.47   2cf0-eef0-ad26 304  Eth-Trunk30  Eth-Trunk13  12-01 12:15:41-03:00
M  172.30.154.47   c065-9907-cf55 304  Eth-Trunk13  Eth-Trunk30  12-01 12:15:41-03:00
M  172.25.1.11     b047-bfc0-e787 320  Eth-Trunk30  Eth-Trunk28  12-01 12:15:41-03:00
M  172.30.158.10   f8cf-c50d-5c8f 304  Eth-Trunk30  Eth-Trunk13  12-01 12:15:42-03:00
M  172.30.152.14   9839-8e1d-fb49 304  Eth-Trunk13  Eth-Trunk30  12-01 12:15:42-03:00
M  172.30.158.10   f8cf-c50d-5c8f 304  Eth-Trunk13  Eth-Trunk30  12-01 12:15:42-03:00
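
Two patterns in this output are worth separating: an entry whose IP and MAC bounce back and forth between the same two ports (consistent with a loop), and an IP address that appears with more than one MAC (consistent with an IP conflict). The sketch below is one possible way to flag both, assuming the eight-column row layout shown above; the function names are hypothetical.

```python
from collections import defaultdict

# Rows copied from the display arp track output in this case.
TRACK = """\
M  172.30.154.131  2cf0-eef1-4914 304  Eth-Trunk30  Eth-Trunk13  12-01 12:15:38-03:00
M  172.30.153.245  8022-759d-9efa 304  Eth-Trunk14  Eth-Trunk13  12-01 12:15:40-03:00
M  172.30.154.47   2cf0-eef0-ad26 304  Eth-Trunk30  Eth-Trunk13  12-01 12:15:41-03:00
M  172.30.154.47   c065-9907-cf55 304  Eth-Trunk13  Eth-Trunk30  12-01 12:15:41-03:00
M  172.25.1.11     b047-bfc0-e787 320  Eth-Trunk30  Eth-Trunk28  12-01 12:15:41-03:00
M  172.30.158.10   f8cf-c50d-5c8f 304  Eth-Trunk30  Eth-Trunk13  12-01 12:15:42-03:00
M  172.30.152.14   9839-8e1d-fb49 304  Eth-Trunk13  Eth-Trunk30  12-01 12:15:42-03:00
M  172.30.158.10   f8cf-c50d-5c8f 304  Eth-Trunk13  Eth-Trunk30  12-01 12:15:42-03:00
"""


def parse_track(text):
    """Return (ip, mac, old_port, new_port) for each M/D row."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Assumed layout: Op, IP, MAC, VLAN, Old-Port, New-Port, date, time.
        if len(fields) == 8 and fields[0] in ("M", "D"):
            rows.append((fields[1], fields[2], fields[4], fields[5]))
    return rows


def bouncing_entries(rows):
    """Entries (ip, mac) that moved A->B and also B->A."""
    moves = defaultdict(set)
    for ip, mac, old, new in rows:
        moves[(ip, mac)].add((old, new))
    return sorted(
        key for key, seen in moves.items()
        if any((new, old) in seen for old, new in seen)
    )


def multi_mac_ips(rows):
    """IP addresses seen with more than one MAC address."""
    macs = defaultdict(set)
    for ip, mac, _, _ in rows:
        macs[ip].add(mac)
    return sorted(ip for ip, seen in macs.items() if len(seen) > 1)


rows = parse_track(TRACK)
print("bouncing:", bouncing_entries(rows))
print("multiple MACs:", multi_mac_ips(rows))
```

On the rows above, 172.30.158.10 is flagged as bouncing between Eth-Trunk30 and Eth-Trunk13, and 172.30.154.47 is flagged as having two MAC addresses.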

There are many MAC flapping alarms in the logs:

 

#2016-12-1 12:06:45-03:00 CORE_HUAWEI L2IFPPI/4/MFLPVLANALARM:OID 1.3.6.1.4.1.2011.5.25.160.3.7 MAC move detected, VlanId = 304, MacAddress = 1432-d11a-e390, Original-Port = Eth-Trunk3, Flapping port = Eth-Trunk23. Please check the network accessed to flapping port.

#2016-12-1 12:05:54-03:00 CORE_HUAWEI L2IFPPI/4/MFLPVLANALARM:OID 1.3.6.1.4.1.2011.5.25.160.3.7 MAC move detected, VlanId = 304, MacAddress = 8065-6d67-03dd, Original-Port = Eth-Trunk19, Flapping port = Eth-Trunk17 and Eth-Trunk16. Please check the network accessed to flapping port.

#2016-12-1 12:01:54-03:00 CORE_HUAWEI L2IFPPI/4/MFLPVLANALARM:OID 1.3.6.1.4.1.2011.5.25.160.3.7 MAC move detected, VlanId = 304, MacAddress = 8065-6d67-03dd, Original-Port = Eth-Trunk19, Flapping port = Eth-Trunk17 and Eth-Trunk16. Please check the network accessed to flapping port.

#2016-12-1 11:55:54-03:00 CORE_HUAWEI L2IFPPI/4/MFLPVLANALARM:OID 1.3.6.1.4.1.2011.5.25.160.3.7 MAC move detected, VlanId = 320, MacAddress = 1889-5b16-4c44, Original-Port = Eth-Trunk14, Flapping port = GE1/6/0/46 and Eth-Trunk12. Please check the network accessed to flapping port.

#2016-12-1 11:52:41-03:00 CORE_HUAWEI L2IFPPI/4/MFLPVLANALARM:OID 1.3.6.1.4.1.2011.5.25.160.3.7 MAC move detected, VlanId = 320, MacAddress = 1889-5b16-4c44, Original-Port = Eth-Trunk12, Flapping port = Eth-Trunk13 and GE1/6/0/46. Please check the network accessed to flapping port.

#2016-12-1 11:47:37-03:00 CORE_HUAWEI L2IFPPI/4/MFLPVLANALARM:OID 1.3.6.1.4.1.2011.5.25.160.3.7 MAC move detected, VlanId = 304, MacAddress = 0016-980a-d0c2, Original-Port = Eth-Trunk10, Flapping port = Eth-Trunk13. Please check the network accessed to flapping port.
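
To decide which links to inspect first for loops, the alarms can be tallied by the ports they mention. This is a rough sketch, assuming the message layout of the six alarms above; the pattern `ALARM_RE` and the helper `parse_alarms` are hypothetical, and the sample keeps only the message part of each alarm for brevity.

```python
import re
from collections import Counter

# Message part of the six MFLPVLANALARM records from this case.
ALARMS = """\
MAC move detected, VlanId = 304, MacAddress = 1432-d11a-e390, Original-Port = Eth-Trunk3, Flapping port = Eth-Trunk23. Please check the network accessed to flapping port.
MAC move detected, VlanId = 304, MacAddress = 8065-6d67-03dd, Original-Port = Eth-Trunk19, Flapping port = Eth-Trunk17 and Eth-Trunk16. Please check the network accessed to flapping port.
MAC move detected, VlanId = 304, MacAddress = 8065-6d67-03dd, Original-Port = Eth-Trunk19, Flapping port = Eth-Trunk17 and Eth-Trunk16. Please check the network accessed to flapping port.
MAC move detected, VlanId = 320, MacAddress = 1889-5b16-4c44, Original-Port = Eth-Trunk14, Flapping port = GE1/6/0/46 and Eth-Trunk12. Please check the network accessed to flapping port.
MAC move detected, VlanId = 320, MacAddress = 1889-5b16-4c44, Original-Port = Eth-Trunk12, Flapping port = Eth-Trunk13 and GE1/6/0/46. Please check the network accessed to flapping port.
MAC move detected, VlanId = 304, MacAddress = 0016-980a-d0c2, Original-Port = Eth-Trunk10, Flapping port = Eth-Trunk13. Please check the network accessed to flapping port.
"""

# Assumed message layout, based only on the alarms above.
ALARM_RE = re.compile(
    r"VlanId = (?P<vlan>\d+), MacAddress = (?P<mac>[0-9a-f-]+), "
    r"Original-Port = (?P<orig>[^,]+), Flapping port = (?P<flap>.+?)\. Please"
)


def parse_alarms(text):
    """Return (vlan, mac, [original port + flapping ports]) per alarm."""
    alarms = []
    for match in ALARM_RE.finditer(text):
        ports = [match["orig"]] + match["flap"].split(" and ")
        alarms.append((match["vlan"], match["mac"], ports))
    return alarms


alarms = parse_alarms(ALARMS)
per_mac = Counter(mac for _, mac, _ in alarms)
per_port = Counter(port for _, _, ports in alarms for port in ports)

for port, count in per_port.most_common():
    print(f"{port}: mentioned in {count} alarm(s)")
```

A port that appears in many alarms for different MAC addresses is a reasonable place to start checking the downstream network, although a tally alone does not prove where the loop is.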

(3) STP TC packets: these indicate that the network topology is not stable; some device ports go down and up frequently.

 

==================================================================
  ===============display stp tc-bpdu statistics===============
==================================================================
 -------------------------- STP TC/TCN information --------------------------
 MSTID Port                        TC(Send/Receive)      TCN(Send/Receive)
 0     GigabitEthernet1/5/0/5      17/11                 0/0
 0     GigabitEthernet1/5/0/12     11142/2               0/0
 0     GigabitEthernet1/5/0/23     15982/0               0/0
 0     GigabitEthernet1/6/0/45     40662/279             0/0
 0     Eth-Trunk1                  421140/19             0/0
 0     Eth-Trunk2                  430252/1562           0/0
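
To see at a glance which ports carry the most TC activity, the statistics can be parsed and ranked. The sketch below ranks ports by the sum of sent and received TC BPDUs; that ranking key is a choice made for illustration, not something the case prescribes, and the names `TcStat` and `parse_tc_stats` are hypothetical.

```python
from typing import NamedTuple

# Rows copied from the display stp tc-bpdu statistics output in this case.
STATS = """\
 0     GigabitEthernet1/5/0/5      17/11                 0/0
 0     GigabitEthernet1/5/0/12     11142/2               0/0
 0     GigabitEthernet1/5/0/23     15982/0               0/0
 0     GigabitEthernet1/6/0/45     40662/279             0/0
 0     Eth-Trunk1                  421140/19             0/0
 0     Eth-Trunk2                  430252/1562           0/0
"""


class TcStat(NamedTuple):
    port: str
    tc_sent: int
    tc_recv: int


def parse_tc_stats(text):
    """Parse rows of the form: MSTID, port, TC send/recv, TCN send/recv."""
    stats = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 4 and fields[0].isdigit():
            sent, recv = fields[2].split("/")
            stats.append(TcStat(fields[1], int(sent), int(recv)))
    return stats


# Rank by total TC BPDUs (sent + received), highest first.
ranked = sorted(
    parse_tc_stats(STATS),
    key=lambda s: s.tc_sent + s.tc_recv,
    reverse=True,
)
for s in ranked:
    print(f"{s.port}: sent={s.tc_sent} received={s.tc_recv}")
```

With these counters, Eth-Trunk2 and Eth-Trunk1 come out far ahead of the other ports, so the devices behind those trunks would be the first candidates to check for flapping ports.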

 

Root Cause
The major cause of the packet loss and partial service interruption is that the number of current ARP entries exceeded the specification of the EH1D2X16SFC0 board.
The other risks (IP conflicts, ARP track changes, MAC flapping, STP TC packets, and so on) also affected traffic forwarding.
Solution
Reduce the number of ARP entries on the Core-Switch. We suggest moving some of the IP gateways from the Core-Switch to the aggregation or access switches.
Suggestions
The following risks also need to be resolved:
1. There are many IP conflicts in the IP subnet 172.30.15x.xxx. Please find out why different devices have the same IP address.
2. Please check the cause of the STP TC packets: are there ports that go up and down frequently? If so, check whether they need to be configured as STP edge ports.
3. Please check the cause of the ARP track changes and MAC flapping. If the network is stable, each ARP and MAC address entry should be learned from a fixed port. However, the ARP track information shows that a large number of ARP entries were learned from different ports. In this case, the Core-Switch may send packets to the wrong peer device, which results in packet loss. We suspect that there are loops or oscillation in the network.
4. The current patch on the Core-Switch is only V200R006SPH003. Please upgrade to the newest patch, V200R006SPH009. If possible, we recommend upgrading directly to the newer version V200R008C00SPC500.
5. Interface GigabitEthernet1/6/0/11 carries heavy traffic, more than 90% of its bandwidth. Please try to expand the links.
