Multiple Servers Connected to an S7700 Cannot Communicate

Publication Date: 2015-11-02
Issue Description
As shown in the figure, an S7706 running V200R001C00SPC300 with patch V200R001SPH007 functions as the aggregation switch in the server area and serves as the gateway for all servers. Multiple VLANs are configured on the S7700, multiple servers are deployed in each VLAN, and servers in different VLANs need to communicate with each other.

Figure  Networking



The administrator finds that servers intermittently fail to communicate. For example, communication between the server at 10.1.2.6 and the server at 10.1.4.11 in VLAN 500 sometimes works, but at other times services are interrupted and ping packets are discarded.

The configuration is as follows (the interfaces are access interfaces, and their configuration is omitted here):

vlan 100
 description ==hongruan==
vlan 101
 description ==hongruan-sub==
vlan 200
 description ==tianyu==
vlan 300
 description ==xiweier==
vlan 400
 description ==UT==
vlan 500
 description ==xike==
vlan 600
 description ==dongfangwangxin==
vlan 700
 description ==guanyong==
vlan 900
 description ==shiboyun==
vlan 1000
 description ==wangguan==

interface Vlanif100
 description ==hongruan==
 ip address 10.1.2.3 255.255.255.192

interface Vlanif101
 ip address 10.1.2.67 255.255.255.192

interface Vlanif200
 description ==tianyu==
 ip address 10.1.2.131 255.255.255.128

interface Vlanif300
 description ==xiweier==
 ip address 10.1.3.3 255.255.255.128

interface Vlanif400
 ip address 10.1.3.131 255.255.255.128
 vrrp vrid 7 virtual-ip 10.1.3.129

interface Vlanif500
 ip address 10.1.4.3 255.255.255.128

interface Vlanif600
 ip address 10.1.4.131 255.255.255.128

interface Vlanif700
 ip address 10.1.5.3 255.255.255.128

interface Vlanif900
 ip address 10.1.6.3 255.255.255.128

interface Vlanif1000
 ip address 10.1.254.2 255.255.255.128
#
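
Because the S7700 is the gateway for every server, it can help to confirm which VLANIF subnet a given server address belongs to. The following is a minimal Python sketch, not part of the original case: the addresses and masks are copied from the configuration above, while the helper name gateway_for is a hypothetical name chosen for illustration.

```python
import ipaddress

# VLANIF addresses and masks copied from the configuration above.
VLANIF_INTERFACES = {
    "Vlanif100": "10.1.2.3/255.255.255.192",
    "Vlanif101": "10.1.2.67/255.255.255.192",
    "Vlanif200": "10.1.2.131/255.255.255.128",
    "Vlanif300": "10.1.3.3/255.255.255.128",
    "Vlanif400": "10.1.3.131/255.255.255.128",
    "Vlanif500": "10.1.4.3/255.255.255.128",
    "Vlanif600": "10.1.4.131/255.255.255.128",
    "Vlanif700": "10.1.5.3/255.255.255.128",
    "Vlanif900": "10.1.6.3/255.255.255.128",
    "Vlanif1000": "10.1.254.2/255.255.255.128",
}


def gateway_for(server_ip):
    """Return (vlanif_name, network) for the subnet containing server_ip, or None."""
    address = ipaddress.ip_address(server_ip)
    for name, text in VLANIF_INTERFACES.items():
        network = ipaddress.ip_interface(text).network
        if address in network:
            return name, network
    return None


# Look up the two servers mentioned in the issue description.
for server in ("10.1.2.6", "10.1.4.11"):
    print(server, gateway_for(server))
```

Two servers whose addresses map to different VLANIF interfaces can reach each other only through the S7700, so the switch needs a valid ARP entry for both of them.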
Handling Process
1. Check the ARP table. When service forwarding fails, no ARP entry exists for the affected IP address. Run the display arp track command on the S7700. The output contains records of ARP entry deletions, and the deletion times match the times at which the servers lost packets.

[S7700] display arp track
Operate Flags: M - Modify, D - Delete
--------------------------------------------------------------------------------
Op IP-Address      MAC-Address    VLAN Old-Port     New-Port     System-Time   
--------------------------------------------------------------------------------
M  10.1.3.180      e41f-1360-0710 400  GE1/0/39     GE1/0/40     09-05 12:34:35
D  10.1.2.6        f01f-afd2-9cd6 300  GE2/0/30                  09-05 12:34:59
D  10.1.4.11       e0db-5524-f9d8 500  GE2/0/10                  09-05 12:35:33

The preceding output indicates that ping packets are lost because the servers' ARP entries are deleted on the S7700. The S7700 cannot process all of the excess ARP Request packets it receives, so it does not send ARP Reply packets to the servers in time, and the servers' ARP entries are deleted when they reach the aging time.
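
When the track log is long, filtering out the delete records makes it easier to compare deletion times with the times of packet loss. The sketch below is only an illustration: it assumes the column layout shown in the output above, and the function name deleted_entries is hypothetical.

```python
# Records copied from the display arp track output above.
ARP_TRACK_OUTPUT = """\
M  10.1.3.180      e41f-1360-0710 400  GE1/0/39     GE1/0/40     09-05 12:34:35
D  10.1.2.6        f01f-afd2-9cd6 300  GE2/0/30                  09-05 12:34:59
D  10.1.4.11       e0db-5524-f9d8 500  GE2/0/10                  09-05 12:35:33
"""


def deleted_entries(output):
    """Return (ip, vlan, port, timestamp) for every 'D' (delete) record."""
    deleted = []
    for line in output.splitlines():
        fields = line.split()
        # Delete records have no New-Port value, so they split into 7 fields.
        if len(fields) == 7 and fields[0] == "D":
            _, ip, _mac, vlan, old_port, date, time = fields
            deleted.append((ip, int(vlan), old_port, f"{date} {time}"))
    return deleted


for entry in deleted_entries(ARP_TRACK_OUTPUT):
    print(entry)
```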

2. Run the display cpu-defend statistics packet-type arp-request all command. The output shows that ARP Request packets are dropped on the mainboard and on slot 2:

[S7700] display cpu-defend statistics packet-type arp-request all 
Statistics on mainboard: 
------------------------------------------------------------------------------- 
Packet Type         Pass(Bytes)  Drop(Bytes)   Pass(Packets)   Drop(Packets) 
------------------------------------------------------------------------------- 
arp-request            79785920     13193856         1246655          206154 
------------------------------------------------------------------------------- 
Statistics on slot 1: 
------------------------------------------------------------------------------- 
Packet Type         Pass(Bytes)  Drop(Bytes)   Pass(Packets)   Drop(Packets) 
------------------------------------------------------------------------------- 
arp-request             3730112            0           58283               0 
------------------------------------------------------------------------------- 
Statistics on slot 2: 
------------------------------------------------------------------------------- 
Packet Type         Pass(Bytes)  Drop(Bytes)   Pass(Packets)   Drop(Packets) 
------------------------------------------------------------------------------- 
arp-request            73818304     20585792         1153411          321653 
------------------------------------------------------------------------------- 
Statistics on slot 3: 
------------------------------------------------------------------------------- 
Packet Type         Pass(Bytes)  Drop(Bytes)   Pass(Packets)   Drop(Packets) 
------------------------------------------------------------------------------- 
arp-request              531264            0            8301               0 
------------------------------------------------------------------------------- 
Statistics on slot 5: 
------------------------------------------------------------------------------- 
Packet Type         Pass(Bytes)  Drop(Bytes)   Pass(Packets)   Drop(Packets) 
------------------------------------------------------------------------------- 
arp-request                 N/A          N/A               0               0 
------------------------------------------------------------------------------- 
Statistics on slot 6: 
------------------------------------------------------------------------------- 
Packet Type         Pass(Bytes)  Drop(Bytes)   Pass(Packets)   Drop(Packets) 
------------------------------------------------------------------------------- 
arp-request            15580920            0          232981               0 
-------------------------------------------------------------------------------
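
To see how severe the loss is, the pass and drop counters can be turned into drop ratios. The following is a small Python sketch using the packet counters from the output above; the dictionary and function names are illustrative only.

```python
# (passed packets, dropped packets) copied from the command output above.
ARP_REQUEST_COUNTERS = {
    "mainboard": (1246655, 206154),
    "slot 1": (58283, 0),
    "slot 2": (1153411, 321653),
    "slot 3": (8301, 0),
    "slot 5": (0, 0),
    "slot 6": (232981, 0),
}


def drop_ratio(passed, dropped):
    """Fraction of ARP Request packets dropped; 0.0 when no packets were seen."""
    total = passed + dropped
    return dropped / total if total else 0.0


for location, (passed, dropped) in ARP_REQUEST_COUNTERS.items():
    print(f"{location:<10} dropped {dropped:>7} of {passed + dropped:>8} "
          f"({drop_ratio(passed, dropped):.1%})")
```

With these counters, roughly 14% of ARP Request packets are dropped on the mainboard and roughly 22% on slot 2, while the other slots drop none. This points to the servers connected to slot 2 as the place to look for the source.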

3. Configure a static ARP entry for the server on the S7706 so that the server's ARP entry cannot be aged out or deleted during testing, and then ping the server again. No packets are discarded.

This confirms the cause: the S7706 receives too many ARP Request packets from downstream devices and therefore randomly discards ARP Request packets from the servers. The servers' ARP entries are then deleted when they reach the aging time, and ping packets of the downstream servers are discarded.

4. Configure a CPU defense policy with attack source tracing on the S7700 to identify the MAC address of the server that sends too many ARP Request packets.

cpu-defend policy test
 auto-defend enable
 auto-defend attack-packet sample 5  //Sample one packet out of every five for identification. A smaller sampling value consumes more CPU resources.
 auto-defend threshold 30  //Checking threshold for attack source tracing
 auto-defend trace-type source-mac  //Trace attack sources based on source MAC addresses
 auto-defend protocol arp  //Monitor ARP packets in attack source tracing
cpu-defend-policy test global  //Apply the CPU defense policy globally.

5. Run the display auto-defend attack-source slot 2 command to view the MAC address of the server that sends excessive ARP Request packets.

Attack Source User Table (MPU):  

------------------------------------------------------------------------------------------------ 
  MacAddress       InterfaceName      Vlan:Outer/Inner      TOTAL 
------------------------------------------------------------------------------------------------ 
0000-0000-00db   GigabitEthernet2/0/22         193           416

You can also run the display logbuffer command to find the MAC address of the server whose ARP Request packets are discarded.
Root Cause
A downstream server sends a large number of ARP Request packets to the S7700, but the S7700 can process only a limited number of ARP Request packets. As a result, CPCAR discards normal ARP Request packets before they reach the CPU of the S7700.

The servers' ARP entries then age out on the S7700. Consequently, servers in different VLANs cannot communicate.
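
The effect can be illustrated with a deliberately simplified model. The Python sketch below is not how CPCAR is implemented: it replaces the real rate limiter with a plain per-second packet budget, and the limit, the number of well-behaved servers, and all names are hypothetical values chosen for illustration. Only the "dozens of packets per second" figure for the misbehaving server comes from this case.

```python
FLOOD_RATE = 50      # ARP Requests per second from the misbehaving server ("dozens")
NORMAL_SERVERS = 10  # hypothetical well-behaved servers, one ARP Request per second each
LIMIT = 30           # hypothetical per-second budget, NOT the real CPCAR value


def build_arrivals():
    """Interleave the flooder's packets with one packet from each normal server."""
    arrivals = []
    for n in range(NORMAL_SERVERS):
        arrivals.extend(["flooder"] * (FLOOD_RATE // NORMAL_SERVERS))
        arrivals.append(f"server-{n}")
    return arrivals


def simulate_second(arrivals, limit):
    """Admit the first `limit` packets of a one-second window and drop the rest.

    The limiter does not look at the source, so it cannot tell the flooding
    server from the well-behaved ones.
    """
    passed, dropped = {}, {}
    for index, source in enumerate(arrivals):
        bucket = passed if index < limit else dropped
        bucket[source] = bucket.get(source, 0) + 1
    return passed, dropped


passed, dropped = simulate_second(build_arrivals(), LIMIT)
print("passed: ", passed)
print("dropped:", dropped)
```

In this model, half of the well-behaved servers lose their only ARP Request in the window even though they send just one packet per second, because the shared budget is consumed mostly by the flooding server.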
Solution
The faulty server was automatically sending dozens of ARP packets every second. After its sending rate is reduced to one ARP packet per second, services are restored.
Suggestions
When a fault occurs on the forwarding plane of a switch, check the corresponding entries on the control plane. Most Layer 2 problems occur because the rate of protocol packets exceeds the CPCAR value.
