Analysis of High Latency and Packet Loss When a USG2200 at a Customer Site Pings Its Directly Connected Public-Network Next Hop

Published: 2014-12-26
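Problem Description
Live network topology: Intranet---------SW---------USG2250---------ISP
The USG2250 is the Internet egress device. Its uplink connects to the carrier, and its downlink connects to the intranet through a switch. The customer reports that the Internet egress bandwidth is 50 Mbit/s.
During the fault, pings from the USG2250 to its directly connected next hop show a latency of about 300 ms and a high packet loss rate. After one intranet server is disconnected, the ping returns to normal. The task is to find the cause of the latency and packet loss when the USG2250 pings its directly connected public-network next hop.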
[USG2250-1]ping -c 100 182.x.y.1
15:28:21  2014/07/01
  PING 182.x.y.1: 56  data bytes, press CTRL_C to break
    Reply from 182.x.y.1: bytes=56 Sequence=1 ttl=255 time=310 ms
    Reply from 182.x.y.1: bytes=56 Sequence=2 ttl=255 time=300 ms
    Request time out
    Reply from 182.x.y.1: bytes=56 Sequence=4 ttl=255 time=290 ms
    Reply from 182.x.y.1: bytes=56 Sequence=5 ttl=255 time=290 ms
    Reply from 182.x.y.1: bytes=56 Sequence=6 ttl=255 time=290 ms
    Reply from 182.x.y.1: bytes=56 Sequence=7 ttl=255 time=280 ms
    Reply from 182.x.y.1: bytes=56 Sequence=8 ttl=255 time=290 ms
    Request time out
    Request time out
    Reply from 182.x.y.1: bytes=56 Sequence=11 ttl=255 time=300 ms
    Reply from 182.x.y.1: bytes=56 Sequence=12 ttl=255 time=300 ms
    Reply from 182.x.y.1: bytes=56 Sequence=13 ttl=255 time=300 ms
    Reply from 182.x.y.1: bytes=56 Sequence=14 ttl=255 time=300 ms
    Reply from 182.x.y.1: bytes=56 Sequence=15 ttl=255 time=310 ms
    Request time out
    Request time out
    Reply from 182.x.y.1: bytes=56 Sequence=18 ttl=255 time=290 ms
  --- 182.x.y.1 ping statistics ---
    18 packet(s) transmitted
    13 packet(s) received
    27.77% packet loss
    round-trip min/avg/max = 280/296/310 ms
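The loss figure in the statistics above can be cross-checked from the transmitted and received counters. The following is a minimal Python sketch, not part of the original case: the function name `ping_loss` is made up for illustration, and the regular expressions assume the exact wording of the summary lines shown above.

```python
import re


def ping_loss(summary: str) -> tuple[int, int, float]:
    """Return (sent, received, loss_percent) from a ping statistics block.

    Assumes the summary uses the 'N packet(s) transmitted' and
    'N packet(s) received' wording seen in the output above.
    """
    sent = int(re.search(r"(\d+) packet\(s\) transmitted", summary).group(1))
    received = int(re.search(r"(\d+) packet\(s\) received", summary).group(1))
    loss = 100.0 * (sent - received) / sent
    return sent, received, loss


# Summary lines copied from the first ping in this case.
SUMMARY = """
    18 packet(s) transmitted
    13 packet(s) received
    27.77% packet loss
"""

sent, received, loss = ping_loss(SUMMARY)
print(f"{sent - received} of {sent} packets lost = {loss:.2f}%")
```

Five lost packets out of 18 is 27.78% when rounded. The device shows 27.77%, which suggests it truncates rather than rounds, although that is only an inference from this one output.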
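Handling Process
1. With the server still connected to the intranet, log in to the device and check the interfaces and traffic. G0/0/0 connects to the public network and auto-negotiates to 1000 Mb/s. G0/0/1 connects to the intranet and auto-negotiates to 100 Mb/s. Traffic from the intranet to the public network is about 11 Mbit/s, far below the 50 Mbit/s egress bandwidth stated by the customer.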
[USG2250-1]dis ip i b
15:28:12  2014/07/01
*down: administratively down
(s): spoofing
Interface                   IP Address      Physical Protocol Description
Cellular0/1/0               unassigned      down     up(s)    Huawei, USG2200
GigabitEthernet0/0/0        182.x.y.34      up       up       to-dianxing
GigabitEthernet0/0/1        192.168.220.5   up       up       to-neiwang
GigabitEthernet6/0/0        192.168.13.1    down     down     xintiaokou
[USG2250-1] dis inter g 0/0/1
15:28:59  2014/07/01
GigabitEthernet0/0/1 current state : UP 
Line protocol current state : UP
GigabitEthernet0/0/1 current firewall zone : trust
Description : to-neiwang, Route Port
The Maximum Transmit Unit is 1500 bytes, Hold timer is 10(sec)
Internet Address is 192.168.220.5/24
IP Sending Frames' Format is PKTFMT_ETHNT_2, Hardware address is 0022-a110-5a3a
Media type is twisted pair, loopback not set, promiscuous mode not set
100Mb/s-speed mode, Full-duplex mode, link type is auto negotiation
flow control is disable
QoS max-bandwidth : 100000 Kbps
Output queue : (Urgent queue : Size/Length/Discards)  0/50/0
Output queue : (Frag queue : Size/Length/Discards)  0/1000/0
Output queue : (Protocol queue : Size/Length/Discards) 0/1000/0
Output queue : (FIFO queue : Size/Length/Discards)  0/256/0
    Last 300 seconds input rate 11324600 bits/s, 1674 packets/s
    Last 300 seconds output rate 6334768 bits/s, 1404 packets/s
    Input: 13866756 packets, 11447758127 bytes
           19910 broadcasts, 10629 multicasts
           0 errors, 0 runts, 0 giants, 0 FCS
           0 length error, 0 code error, 0 align errors
    Output:11924224 packets,  7260684460 bytes
           258 broadcasts, 0 multicasts
           0 errors, 0 collisions, 0 late collisions
           0 ex. collisions, 0 FCS error
           0 deferred, 0 runts, 0 giants
[USG2250-1]dis inter g 0/0/0
15:51:24  2014/07/01
GigabitEthernet0/0/0 current state : UP 
Line protocol current state : UP
GigabitEthernet0/0/0 current firewall zone : untrust
Description : to-dianxing, Route Port
The Maximum Transmit Unit is 1500 bytes, Hold timer is 10(sec)
Internet Address is 182.x.y.34/26
IP Sending Frames' Format is PKTFMT_ETHNT_2, Hardware address is 0022-a110-5a39
Media type is twisted pair, loopback not set, promiscuous mode not set
1000Mb/s-speed mode, Full-duplex mode, link type is auto negotiation
flow control is disable
QoS max-bandwidth : 1000000 Kbps
Output queue : (Urgent queue : Size/Length/Discards)  0/50/0
Output queue : (Frag queue : Size/Length/Discards)  0/1000/0
Output queue : (Protocol queue : Size/Length/Discards) 0/1000/0
Output queue : (FIFO queue : Size/Length/Discards)  0/256/0
    Last 300 seconds input rate 6424568 bits/s, 1324 packets/s
    Last 300 seconds output rate 11533496 bits/s, 1559 packets/s
    Input: 13983466 packets, 8599891892 bytes
           1161 broadcasts, 0 multicasts
           0 errors, 0 runts, 0 giants, 0 FCS
           0 length error, 0 code error, 0 align errors
    Output:16031673 packets,  13370293991 bytes
           1809 broadcasts, 0 multicasts
           0 errors, 0 collisions, 0 late collisions
           0 ex. collisions, 0 FCS error
           0 deferred, 0 runts, 0 giants
[USG2250-1] dis i b
15:46:19  2014/07/01
PHY: Physical
*down: administratively down
^down: standby down
(s): spoofing
InUti/OutUti: input utility/output utility
Interface                   PHY   Protocol InUti OutUti   inErrors  outErrors
Cellular0/1/0               down  up(s)       0%     0%          0          0
GigabitEthernet0/0/0        up    up       0.39%     1%          0          0
GigabitEthernet0/0/1        up    up         11%     3%          0          0
GigabitEthernet6/0/0        down  down        0%     0%          0          0
NULL0                       up    up(s)       0%     0%          0          0
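The utilization percentages can be reproduced from the 300-second rates in the interface output. The sketch below is illustrative only: `utilization_pct` is a made-up helper, and the 50 Mbit/s figure is the customer's claim, not a value read from the device. The displays were captured at slightly different times, so only an approximate match with InUti/OutUti is expected.

```python
def utilization_pct(rate_bps: float, capacity_bps: float) -> float:
    """Link utilization in percent for a given rate and capacity."""
    return 100.0 * rate_bps / capacity_bps


# "Last 300 seconds" rates copied from the interface output above.
IN_RATE_G0_0_1 = 11_324_600    # bits/s, input on G0/0/1 (intranet side)
OUT_RATE_G0_0_0 = 11_533_496   # bits/s, output on G0/0/0 (public side)

lan_util = utilization_pct(IN_RATE_G0_0_1, 100e6)       # 100 Mb/s negotiated
wan_util = utilization_pct(OUT_RATE_G0_0_0, 1000e6)     # 1000 Mb/s negotiated
claimed_util = utilization_pct(OUT_RATE_G0_0_0, 50e6)   # 50 Mbit/s claimed by the customer

print(f"G0/0/1 input vs 100 Mb/s:   {lan_util:.2f}%")
print(f"G0/0/0 output vs 1000 Mb/s: {wan_util:.2f}%")
print(f"G0/0/0 output vs 50 Mbit/s: {claimed_util:.2f}%")
```

Even against the claimed 50 Mbit/s, the outbound load is under a quarter of the stated capacity, so neither the interfaces nor the advertised egress bandwidth should be congested at this rate.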
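2. Check the CPU usage of the USG2250. Neither the management-plane nor the forwarding-plane CPUs are heavily loaded, so high CPU usage is not the cause of the packet loss.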
[USG2250-1]dis cpu-usage
15:45:14  2014/07/01
===== Current CPU usage info =====
CPUID  CPUNAME  %CPU  STATUS   RUN_CNT
0      MGMT      10%  Running  0x00000000
1      FPATH      4%  Waiting  0x006e4772
2      AGE      100%  Running  0x00000000
3      FPATH      4%  Running  0x006e43e9
4      FPATH      1%  Waiting  0x0038173c
5      FPATH      1%  Waiting  0x00381758
6      FPATH      1%  Waiting  0x0038173c
7      FPATH      1%  Waiting  0x003816ea
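The CPU check in this step looks only at the management (MGMT) and forwarding (FPATH) cores. A rough Python sketch of that check follows. The helper name `busy_cores` and the 80% threshold are assumptions chosen for illustration, and the AGE core is left out of the default role list only because the judgement in this case is made on the management and forwarding planes.

```python
# CPU rows copied from the output above (columns: ID, name, %CPU, status, run count).
CPU_OUTPUT = """\
0 MGMT 10% Running 0x00000000
1 FPATH 4% Waiting 0x006e4772
2 AGE 100% Running 0x00000000
3 FPATH 4% Running 0x006e43e9
4 FPATH 1% Waiting 0x0038173c
5 FPATH 1% Waiting 0x00381758
6 FPATH 1% Waiting 0x0038173c
7 FPATH 1% Waiting 0x003816ea
"""


def busy_cores(output, roles=("MGMT", "FPATH"), threshold_pct=80):
    """List (cpu_id, name, percent) for cores in `roles` at or above the threshold.

    The 80% default is an arbitrary illustrative threshold, not a vendor value.
    """
    busy = []
    for line in output.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        cpu_id, name, pct = int(fields[0]), fields[1], int(fields[2].rstrip("%"))
        if name in roles and pct >= threshold_pct:
            busy.append((cpu_id, name, pct))
    return busy


print(busy_cores(CPU_OUTPUT))
```

With the values from this case, no management or forwarding core reaches the threshold, which matches the conclusion of this step.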
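3. Configure flow statistics on the firewall and compare them with the ping result. The firewall itself drops no packets: fewer replies arrive from the peer than requests were sent. Judging from the earlier information and the flow statistics, the firewall has not reached a performance bottleneck and is not dropping packets, so the problem should not be related to the firewall.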
[USG2250-1] acl 3333
[USG2250-1-acl-adv-3333]rule permit icmp destination 182.x.y.1 0
[USG2250-1-acl-adv-3333]rule permit icmp source 182.x.y.1 0
[USG2250-1-diagnose]firewall statistic acl 3333 enable
[USG2250-1]ping -c 100 182.x.y.1
15:33:00  2014/07/01
  PING 182.x.y.1: 56  data bytes, press CTRL_C to break
    Request time out
    Request time out
    Reply from 182.x.y.1: bytes=56 Sequence=3 ttl=255 time=300 ms
    ……
    Reply from 182.x.y.1: bytes=56 Sequence=100 ttl=255 time=290 ms
  --- 182.x.y.1 ping statistics ---
    100 packet(s) transmitted
    77 packet(s) received
    23.00% packet loss
    round-trip min/avg/max = 270/299/320 ms
[USG2250-1-diagnose] dis firewall statistic acl
15:47:22  2014/07/01
Current Show sessions count: 7
 
Protocol(ICMP) SourceIp(182.x.y.34) DestinationIp(182.x.y.1)
SourcePort(44001) DestinationPort(2048) VpnIndex(public) 
           Receive           Forward           Discard 
Obverse : 100        pkt(s) 100        pkt(s) 0          pkt(s) 
Reverse : 77         pkt(s) 77         pkt(s) 0          pkt(s)
 
Discard detail information:
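The reasoning behind the flow statistics can be written out explicitly: compare what the firewall forwarded in the request direction with what it received in the reply direction, and check the discard counters on both. The Python sketch below is illustrative. The class and function names are made up, and the counter values are the ones shown in the output above.

```python
from dataclasses import dataclass


@dataclass
class DirectionCounters:
    """Receive/Forward/Discard packet counters for one flow direction."""
    received: int
    forwarded: int
    discarded: int


def localize_loss(obverse: DirectionCounters, reverse: DirectionCounters) -> dict:
    """Split the loss into packets dropped by the firewall and packets missing beyond it.

    'missing_beyond_firewall' counts requests that left the firewall with no
    reply ever arriving back at it (lost on the way out, at the peer, or on
    the way back).
    """
    return {
        "dropped_by_firewall": obverse.discarded + reverse.discarded,
        "missing_beyond_firewall": obverse.forwarded - reverse.received,
    }


# Counters from the flow statistics above.
requests = DirectionCounters(received=100, forwarded=100, discarded=0)  # Obverse
replies = DirectionCounters(received=77, forwarded=77, discarded=0)     # Reverse

result = localize_loss(requests, replies)
print(result)
```

The 23 missing replies match the 23.00% loss in the ping statistics, and none of them are attributable to the firewall.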
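4. However, the Internet egress bandwidth is supposedly 50 Mbit/s, only about 11 Mbit/s actually passes through the firewall, and the ping returns to normal once the intranet server is disconnected. So how does that intranet server affect pings from the USG2250 itself to the public-network next hop?
5. Ask the customer to disconnect the intranet server for a comparison test. From the firewall's point of view, the only difference is that traffic falls from about 11 Mbit/s to about 0.1 Mbit/s, and the ping becomes normal. Could the 50 Mbit/s egress bandwidth stated by the customer be wrong?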
[USG2250-1]dis i b
18:27:34  2014/07/01
PHY: Physical
*down: administratively down
^down: standby down
(s): spoofing
InUti/OutUti: input utility/output utility
Interface                   PHY   Protocol InUti OutUti   inErrors  outErrors
Cellular0/1/0               down  up(s)       0%     0%          0          0
GigabitEthernet0/0/0        up    up       0.01%  0.01%          0          0
GigabitEthernet0/0/1        up    up       0.16%  0.06%          0          0
GigabitEthernet6/0/0        down  down        0%     0%          0          0
NULL0                       up    up(s)       0%     0%          0          0
[USG2250-1]  ping -c 100 182.x.y.1
18:30:36  2014/07/01
  PING 182.x.y.1: 56  data bytes, press CTRL_C to break
    Reply from 182.x.y.1: bytes=56 Sequence=1 ttl=255 time=1 ms
    Reply from 182.x.y.1: bytes=56 Sequence=2 ttl=255 time=1 ms
    ……
    Reply from 182.x.y.1: bytes=56 Sequence=99 ttl=255 time=1 ms
    Reply from 182.x.y.1: bytes=56 Sequence=100 ttl=255 time=1 ms
  --- 182.x.y.1 ping statistics ---
    100 packet(s) transmitted
    100 packet(s) received
    0.00% packet loss
    round-trip min/avg/max = 1/1/10 ms
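6. Configure CAR on the firewall to rate-limit the outbound traffic as a test. With a 5 Mbit/s limit, the ping immediately returns to normal. When the rate-limiting policy is disabled, the ping latency rises again. Adjusting the CAR configuration step by step shows that the ping is still normal at a 7 Mbit/s limit, and that the latency rises again once the limit is raised beyond that.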
#
car-class test type per-ip
car max 5000
#
#
traffic-policy interzone trust untrust outbound per-ip
policy 0
  action car
  policy source 192.168.220.8 0
  policy car-type source-ip
  policy car-class test
#
[USG2250-1]ping -c 100 182.x.y.1
18:38:33  2014/07/01
  PING 182.x.y.1: 56  data bytes, press CTRL_C to break
    Reply from 182.x.y.1: bytes=56 Sequence=1 ttl=255 time=1 ms
    Reply from 182.x.y.1: bytes=56 Sequence=2 ttl=255 time=10 ms
    ……
    Reply from 182.x.y.1: bytes=56 Sequence=46 ttl=255 time=10 ms
    Reply from 182.x.y.1: bytes=56 Sequence=47 ttl=255 time=1 ms
  --- 182.x.y.1 ping statistics ---
    47 packet(s) transmitted
    47 packet(s) received
    0.00% packet loss
    round-trip min/avg/max = 1/3/10 ms
[USG2250-1-traffic-policy-interzone-trust-untrust-outbound-per-ip]policy 0 disable 
18:39:29  2014/07/01
Info: The policy is disabled.

[USG2250-1-traffic-policy-interzone-trust-untrust-outbound-per-ip]q
18:39:32  2014/07/01
[USG2250-1]ping -c 100 182.x.y.1
18:39:37  2014/07/01
  PING 182.x.y.1: 56  data bytes, press CTRL_C to break
    Reply from 182.x.y.1: bytes=56 Sequence=1 ttl=255 time=160 ms
    Reply from 182.x.y.1: bytes=56 Sequence=2 ttl=255 time=110 ms
    Reply from 182.x.y.1: bytes=56 Sequence=3 ttl=255 time=110 ms
[USG2250-1-per-ip-car-class-test]car max 7000
18:47:20  2014/07/01
[USG2250-1-per-ip-car-class-test]q
18:47:20  2014/07/01
[USG2250-1]ping -c 100 182.x.y.1
18:47:26  2014/07/01
  PING 182.x.y.1: 56  data bytes, press CTRL_C to break
    Reply from 182.x.y.1: bytes=56 Sequence=1 ttl=255 time=1 ms
    Reply from 182.x.y.1: bytes=56 Sequence=2 ttl=255 time=1 ms
    Reply from 182.x.y.1: bytes=56 Sequence=3 ttl=255 time=1 ms
    Reply from 182.x.y.1: bytes=56 Sequence=4 ttl=255 time=10 ms
    Reply from 182.x.y.1: bytes=56 Sequence=5 ttl=255 time=10 ms
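The step-by-step CAR adjustment amounts to a search for the largest limit at which the ping stays healthy. One way to organize it is a bisection between a known-good and a known-bad limit, sketched below in Python. This is not how the original test was scripted. `find_max_healthy_limit` and `simulated_probe` are hypothetical names, the 7500 kbit/s hidden capacity is an invented value used only to make the example runnable, and the search assumes the ping degrades monotonically as the limit rises. In practice each probe is the manual step of changing `car max` and repeating the ping.

```python
from typing import Callable


def find_max_healthy_limit(
    is_healthy: Callable[[int], bool],
    low_kbps: int,
    high_kbps: int,
    step_kbps: int = 500,
) -> int:
    """Bisect for the largest CAR limit (kbit/s) at which the ping is healthy.

    Assumes is_healthy(low_kbps) is True, is_healthy(high_kbps) is False,
    and that health degrades monotonically as the limit increases.
    Stops once the bracket is no wider than step_kbps.
    """
    while high_kbps - low_kbps > step_kbps:
        mid = (low_kbps + high_kbps) // 2
        if is_healthy(mid):
            low_kbps = mid    # still healthy: the threshold is higher
        else:
            high_kbps = mid   # degraded: the threshold is lower
    return low_kbps


def simulated_probe(limit_kbps: int, hidden_capacity_kbps: int = 7500) -> bool:
    """Stand-in for 'set car max, then ping'. The capacity value is invented."""
    return limit_kbps <= hidden_capacity_kbps


# 5000 kbit/s was healthy in the test; about 11000 kbit/s of unrestricted load was not.
best = find_max_healthy_limit(simulated_probe, low_kbps=5000, high_kbps=11000)
print(f"largest healthy limit found: {best} kbit/s")
```

Bisection keeps the number of configuration changes small: here four probes narrow a 6000 kbit/s range to within 500 kbit/s.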
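7. The test results clearly confirm that the actual Internet egress bandwidth of the USG2250 is not the 50 Mbit/s stated by the customer: once traffic exceeds about 7 Mbit/s, packets start to be dropped on the carrier side. The root cause is therefore insufficient bandwidth on the carrier side.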
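Suggestions and Summary
1. When a device's own pings show latency and packet loss, the likely causes are the usual few (interfaces, CPU, traffic), and the troubleshooting methods are also the usual ones. Check them one by one and judge from the actual information on the live network. After enough such cases, handling this class of problem becomes routine.
2. Information provided by the customer should be neither dismissed nor accepted wholesale. Trusting it completely can lead straight into a trap, so treat it with healthy skepticism.
3. When a carrier bandwidth limit is suspected, rate limiting on the firewall is a good test method and provides clear evidence that the carrier bandwidth is insufficient.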
