Due to Inconsistency Between the Inbound Path and the Outbound Path, FTP Network Connections are Down

Publication Date: 2012-07-17
Issue Description
Symptom
All clients can ping the servers successfully, but some clients fail to access some servers over FTP.
Service Introduction
1. FTP services. Clients are on MDCN, and servers are connected to the SE800.
Network diagram
Alarm Information
None.
Handling Process
1. All servers on the live network can be pinged, while some FTP packets are not allowed through. Routing problems can therefore be ruled out, and the fault must originate from the networking design.
2. Identify the difference between ping packets and FTP packets. Ping uses ICMP. A ping exchange consists only of a request packet and a response packet, and the packets carry no state information. FTP, however, runs over TCP, and a TCP connection must complete a three-way handshake. The fault is therefore suspected to be related to session state information.
3. Analyze the networking of the live network. The firewalls on the live network implement dual-system hot backup. From previous experience, such a fault may be caused by inconsistency between the inbound path and the outbound path. According to the network diagram, each server is connected to two switches, and the two links work in active/active mode. The inbound path may therefore differ from the outbound path. Verify TCP services in the lab on a network whose inbound and outbound paths are inconsistent.
4. The verification result shows that, on a network with inconsistent inbound and outbound paths, FTP services may fail while ping works normally.
To conclude, the FTP service on the live network is abnormal because packets are lost when the inbound path and the outbound path are inconsistent.
Root Cause
1. All internal servers are directly connected to the two SE800 devices through two paths for redundancy. The SE800_A interface of path 1 is at 10.26.100.81/28, and the SE800_B interface of path 2 is at 10.26.100.97/28. An internal server selects either SE800 device at random, because both paths reach the Cisco switches on floor 2. VRRP is implemented from the SE800 devices to the Cisco switches and from the Cisco switches to the SE800 devices. The SE800 VRRP heartbeat packets pass through the Cisco switches.

2. If SE800_A is the master device, each Cisco switch can receive packets from clients. Because the next hop is the SE800 VRRP master device, these packets are sent to SE800_A through E1000E_A. Packets from servers to clients are received by either SE800 device at random. If SE800_A receives these packets, it sends them to the Cisco VRRP master device through the E1000E_A link. In this case, the inbound path and outbound path through the firewall are consistent. If SE800_B receives these packets, it sends them to the Cisco VRRP master device through the E1000E_B link. In this case, the inbound path and outbound path through the firewall are inconsistent.
3. The output of the traceroute command on an onsite server reveals the inconsistency between the inbound path and the outbound path. The first hop is 10.26.100.97, the SE800_B interface of path 2, so outbound packets from this server leave through SE800_B.
gprsadm@GGSNWH08_RE0> traceroute 10.25.5.71
traceroute to 10.25.5.71 (10.25.5.71), 30 hops max, 40 byte packets
 1  10.26.100.97 (10.26.100.97)  0.379 ms  0.310 ms  0.348 ms
 2  10.26.97.236 (10.26.97.236)  2.411 ms  2.350 ms  2.524 ms
 3  10.25.253.1 (10.25.253.1)  0.529 ms  0.556 ms  0.463 ms
……
4. When the inbound path is consistent with the outbound path, services are normal. When the inbound path is inconsistent with the outbound path, the session backup packets are delayed by the aging thread scan (the latency is several seconds). If a server is pinged from a client, the ICMP packets that establish the session go through E1000E_A, and the response packets go through the E1000E_B link. When E1000E_B receives the ICMP response packets, it discards them because it cannot find a matching session. Several seconds later, the backup packets reach E1000E_B. From then on, E1000E_B has a corresponding session, and subsequent ICMP packets are answered normally. Therefore, no response to the first ping packet is received, while responses to the subsequent ping packets are received (the lab verification result is the same).
5. An FTP service differs from a ping command because it is a TCP service. Changes to its session table entry strictly follow the three-way handshake. After E1000E_A creates an FTP session, the backup packets reach E1000E_B several seconds later. When E1000E_B receives the SYN-ACK packets, it backs up the session to E1000E_A only if it can find the session. E1000E_A allows ACK packets through only when its session is in the SYN-ACK state. Therefore, the following occurs:

As a result, FTP packets are not allowed through (lab verification result).  
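The ping behaviour described in item 4 can be illustrated with a small Python sketch. It is a simplified model, not the firewall's actual logic: requests are assumed to cross E1000E_A, replies are assumed to cross E1000E_B, and the backup delay and ping interval are illustrative values (the case only states that the delay is several seconds). The function name and parameters are hypothetical.

```python
# Toy model of item 4: two firewalls in hot backup with delayed session backup.
# BACKUP_DELAY and PING_INTERVAL are illustrative assumptions, not measured values.
BACKUP_DELAY = 3.0   # seconds until a session created on E1000E_A reaches E1000E_B
PING_INTERVAL = 4.0  # seconds between echo requests (a lost ping waits for its timeout)


def simulate_pings(count, backup_delay=BACKUP_DELAY, interval=PING_INTERVAL):
    """Return a list of booleans: True if the echo reply for ping i gets through.

    Requests cross E1000E_A, which creates the session. Replies cross
    E1000E_B, which knows the session only after the backup has arrived.
    """
    session_on_b_at = None  # time at which E1000E_B learns the session
    results = []
    for i in range(count):
        now = i * interval
        if session_on_b_at is None:
            # The first request creates the session on E1000E_A and
            # schedules the delayed backup toward E1000E_B.
            session_on_b_at = now + backup_delay
        # The reply reaches E1000E_B right after the request in this model.
        results.append(now >= session_on_b_at)
    return results


if __name__ == "__main__":
    for seq, ok in enumerate(simulate_pings(4), start=1):
        status = "reply received" if ok else "reply dropped by E1000E_B"
        print(f"ping {seq}: {status}")
```

With these assumed values only the first reply is dropped. If the interval between pings is shorter than the backup delay, the model drops more than one reply before the session appears on E1000E_B.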
Suggestions
To solve such a problem caused by inconsistent inbound and outbound paths, use either of the following methods:
1. Enable the firewall quick backup function to eliminate the latency of session backup.
[Eudemon]hrp mirror session enable
2. Disable the link-state check function.
[Eudemon]undo firewall session link-state check
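The effect of the two methods can be illustrated with a toy Python model of one TCP handshake across the two firewalls. This is a sketch under stated assumptions, not the device implementation: SYN and ACK are assumed to cross E1000E_A, SYN-ACK is assumed to cross E1000E_B, retransmissions are ignored, a backup delay of 0 stands in for quick backup, and a disabled link-state check is modelled as letting every packet through. The function name, parameters, and timings are hypothetical.

```python
# Toy model of one TCP three-way handshake across two hot-backup firewalls.
# All timings are illustrative assumptions; real devices are more detailed.
ORDER = ["SYN", "SYN-ACK", "ACK"]                 # handshake packets in order
PATH = {"SYN": "A", "SYN-ACK": "B", "ACK": "A"}   # asymmetric: SYN-ACK crosses E1000E_B


def handshake_passes(backup_delay, link_state_check, gap=0.01):
    """Return (ok, log) for a single handshake attempt.

    backup_delay: seconds before a session change on one firewall reaches the
                  peer (0 models 'hrp mirror session enable').
    link_state_check: False models 'undo firewall session link-state check'.
    gap: assumed time between consecutive handshake packets.
    """
    # updates[fw] holds (visible_from_time, index_of_handshake_stage) entries.
    updates = {"A": [], "B": []}
    log = []
    for i, pkt in enumerate(ORDER):
        now = i * gap
        fw = PATH[pkt]
        # Latest handshake stage this firewall knows about at time 'now'.
        known = max((idx for t, idx in updates[fw] if t <= now), default=-1)
        # With the check on, a packet passes only if the previous stage is known.
        allowed = (not link_state_check) or known == i - 1
        log.append(f"{pkt} at E1000E_{fw}: {'pass' if allowed else 'drop'}")
        if not allowed:
            return False, log
        peer = "B" if fw == "A" else "A"
        updates[fw].append((now, i))                   # local state changes at once
        updates[peer].append((now + backup_delay, i))  # the peer learns it later
    return True, log


if __name__ == "__main__":
    cases = {
        "delayed backup, check on": (3.0, True),
        "quick backup, check on": (0.0, True),
        "delayed backup, check off": (3.0, False),
    }
    for name, (delay, check) in cases.items():
        ok, log = handshake_passes(delay, check)
        print(f"{name}: {'handshake completes' if ok else 'handshake fails'}")
        for line in log:
            print("   ", line)
```

In this model the handshake fails only when the backup is delayed and the link-state check is on, which matches the fault scenario. Either method alone lets the handshake complete.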
