S9300: Discard counter keeps increasing in the outbound direction of ports

Published: 2015-09-24
Problem Description

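1. Networking: Two S9300 devices are interconnected through an Eth-Trunk. Servers are attached to each device, and some services need to be accessed across the two devices.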

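2. Symptom: The NMS reported alarms for heavy packet loss on ports, but monitoring showed that services were not actually affected. After logging in to the device and refreshing the port counters several times, the Discard count was found to be increasing on multiple interfaces, for example: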

 

GigabitEthernet2/0/31

Output:  19040059771 packets, 15564833213514 bytes

  Unicast:                19033747559,  Multicast:                     4349024

  Broadcast:                  1963188,  Jumbo:                               0

  Discard:                    8140109,  Total Error:                         0

 

  Collisions:                       0,  ExcessiveCollisions:                 0

  Late Collisions:                  0,  Deferreds:                           0

  Buffers Purged:                   0

Handling Process

1. When discards occur, first check the interface status:

 

GigabitEthernet2/0/31 current state : UP

Line protocol current state : UP

Description:HUAWEI, Quidway Series, GigabitEthernet2/0/31 Interface

Switch Port, TPID : 8100(Hex), The Maximum Frame Length is 9216

IP Sending Frames' Format is PKTFMT_ETHNT_2, Hardware address is 4cb1-6c7e-da30

Last physical up time   : 2015-09-16 07:06:52

Last physical down time : 2015-09-16 06:57:30

Port Mode: COMMON COPPER

Speed : 1000,  Loopback: NONE

Duplex: FULL,  Negotiation: ENABLE

Mdi   : AUTO

Last 300 seconds input rate 112067448 bits/sec, 27120 packets/sec

Last 300 seconds output rate 130123712 bits/sec, 27490 packets/sec

Input peak rate 985727944 bits/sec, Record time: 2015-09-12 05:29:10    

Output peak rate 974035880 bits/sec, Record time: 2015-09-15 09:52:04    //historical outbound peak rate

 

Input:  16553717087 packets, 9444065199935 bytes

  Unicast:                16550336032,  Multicast:                     1938229

  Broadcast:                  1442826,  Jumbo:                               0

  Discard:                          0,  Total Error:                         0

 

  CRC:                              0,  Giants:                              0

  Jabbers:                          0,  Fragments:                           0

  Runts:                            0,  DropEvents:                          0

  Alignments:                       0,  Symbols:                             0

  Ignoreds:                         0,  Frames:                              0

 

Output:  19040059771 packets, 15564833213514 bytes

  Unicast:                19033747559,  Multicast:                     4349024

  Broadcast:                  1963188,  Jumbo:                               0

  Discard:                    8140109,  Total Error:                         0

 

  Collisions:                       0,  ExcessiveCollisions:                 0

  Late Collisions:                  0,  Deferreds:                           0

  Buffers Purged:                   0

 

    Input bandwidth utilization threshold : 100.00%

    Output bandwidth utilization threshold: 100.00%

    Input bandwidth utilization  : 11.21%                                

    Output bandwidth utilization : 13.01%                                //current outbound utilization of the interface

 

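The historical peak rate on the interface is close to line rate, whereas the current average rate is far below the interface bandwidth. Across repeated checks, the Discard count kept increasing.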
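
As a quick cross-check of the figures in the output above, the following minimal sketch recomputes the average utilization, the peak utilization, and the share of discarded packets. The constants are copied from the output; the function names are illustrative only. Whether the Output packet total already includes discarded packets is not stated in the source, so the ratio should be read as approximate.

```python
# Figures copied from the "display interface" output above.
LINE_RATE_BPS = 1_000_000_000      # Speed : 1000 (Mbit/s)
OUTPUT_RATE_BPS = 130_123_712      # Last 300 seconds output rate
OUTPUT_PEAK_BPS = 974_035_880      # Output peak rate
OUTPUT_PACKETS = 19_040_059_771    # Output: ... packets
OUTPUT_DISCARDS = 8_140_109        # Output Discard


def utilization_percent(rate_bps: float, line_rate_bps: float) -> float:
    """Return a rate as a percentage of the line rate."""
    return 100.0 * rate_bps / line_rate_bps


def discard_ratio_percent(discards: int, packets: int) -> float:
    """Return discards as a percentage of the packet count (approximate)."""
    return 100.0 * discards / packets


if __name__ == "__main__":
    avg = utilization_percent(OUTPUT_RATE_BPS, LINE_RATE_BPS)
    peak = utilization_percent(OUTPUT_PEAK_BPS, LINE_RATE_BPS)
    ratio = discard_ratio_percent(OUTPUT_DISCARDS, OUTPUT_PACKETS)
    print(f"300 s average utilization: {avg:.2f}%")
    print(f"historical peak utilization: {peak:.2f}%")
    print(f"discarded share of output packets: {ratio:.4f}%")
```

The average utilization matches the 13.01% reported by the device, while the recorded peak is close to line rate, which is consistent with short bursts on an otherwise lightly loaded port.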

 

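2. Per RFC 2863, outbound discards are packets that were dropped even though no error was detected, for example to free up buffer space. On a switch port this usually points to congestion caused by traffic bursts. The rate shown on the interface is an average over the past 300 seconds and does not reflect instantaneous peaks. To confirm that bursts were the cause, port mirroring was configured while the Discard count was increasing, and the traffic was captured to analyze the trend in a traffic graph.

The mirroring configuration commands are as follows: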

observe-port 1 interface GigabitEthernet3/0/24

interface GigabitEthernet2/0/31

port-mirroring to observe-port 1 outbound

 

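The traffic graph from the capture shows that the interface traffic reached line rate at the millisecond level. The growth of the port Discard count is therefore caused by congestion, which is normal behavior in this situation.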
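
The millisecond-level analysis can be sketched as follows. This is a minimal illustration, not the tool used in the case: it assumes the capture has been exported as (timestamp in microseconds, frame length in bytes) pairs, and the function names and the synthetic burst are hypothetical. Preamble and inter-frame gap are ignored, so the result slightly understates the rate on the wire.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple

Packet = Tuple[int, int]  # (timestamp in microseconds, frame length in bytes)


def peak_rate_bps(packets: Iterable[Packet], bin_us: int = 1000) -> float:
    """Return the highest rate in bit/s seen in any bin of bin_us microseconds."""
    bits_per_bin = defaultdict(int)
    for timestamp_us, length_bytes in packets:
        bits_per_bin[timestamp_us // bin_us] += length_bytes * 8
    if not bits_per_bin:
        return 0.0
    # Convert bits per bin to bits per second.
    return max(bits_per_bin.values()) * 1_000_000 / bin_us


def average_rate_bps(packets: List[Packet], duration_us: int) -> float:
    """Return the average rate in bit/s over the whole observation window."""
    total_bits = sum(length_bytes * 8 for _, length_bytes in packets)
    return total_bits * 1_000_000 / duration_us


if __name__ == "__main__":
    # Synthetic example: 100 frames of 1250 bytes sent within a single millisecond
    # of an otherwise idle one-second window.
    burst = [(500_000 + i, 1250) for i in range(100)]
    print("1 s average:", average_rate_bps(burst, 1_000_000), "bit/s")
    print("1 ms peak:  ", peak_rate_bps(burst), "bit/s")
```

In this synthetic case the one-second average is a small fraction of a gigabit link, yet the busiest millisecond runs at line rate, which is the pattern the traffic graph revealed.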

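Root Cause

Traffic bursts exceed the interface bandwidth and cause congestion. Once the interface buffer can no longer absorb the excess, packets are discarded.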

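Solution

1. Expand capacity to increase bandwidth, for example by replacing the GE interface with a 10GE interface.

2. Locate the source of the burst traffic and resolve the problem at its root.

3. Shape the traffic on the upstream device to limit its burstiness.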
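
To illustrate what shaping does to a burst, here is a minimal token-bucket sketch. A token bucket is one common shaping model; this is not the S9300 implementation, and the function name, parameters, and example values are all hypothetical. It assumes every packet is no larger than the bucket size.

```python
from typing import List, Tuple


def shape(arrivals: List[Tuple[float, int]], rate_bps: float, burst_bits: int) -> List[float]:
    """Return the departure time (seconds) of each packet after token-bucket shaping.

    arrivals: (arrival time in seconds, size in bits), sorted by arrival time.
    Assumes size <= burst_bits for every packet.
    """
    tokens = float(burst_bits)  # the bucket starts full
    last_departure = 0.0
    departures = []
    for arrival, size in arrivals:
        # FIFO order: a packet cannot leave before the previous one has left.
        start = max(arrival, last_departure)
        # Refill tokens for the time elapsed since the last departure.
        tokens = min(burst_bits, tokens + (start - last_departure) * rate_bps)
        if tokens < size:
            # Delay the packet until enough tokens have accumulated.
            start += (size - tokens) / rate_bps
            tokens = float(size)
        tokens -= size
        last_departure = start
        departures.append(start)
    return departures


if __name__ == "__main__":
    # Three 10,000-bit packets arrive at the same instant; shape to 100 Mbit/s.
    print(shape([(0.0, 10_000)] * 3, rate_bps=100_000_000, burst_bits=10_000))
```

The packets that arrived back to back leave 0.1 ms apart, so the downstream port sees a steady 100 Mbit/s instead of a line-rate burst. The trade-off is added delay and buffering on the upstream device.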

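Suggestions and Summary

1. The outbound Discard count results from burst traffic exceeding the interface bandwidth and buffer. RFC 2863 gives freeing up buffer space as a typical reason for outbound discards: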

RFC 2863                The Interfaces Group MIB               June 2000

ifOutDiscards OBJECT-TYPE

    SYNTAX      Counter32

    MAX-ACCESS  read-only

    STATUS      current

    DESCRIPTION

            "The number of outbound packets which were chosen to be

            discarded even though no errors had been detected to prevent

            their being transmitted.  One possible reason for discarding

            such a packet could be to free up buffer space.

 

            Discontinuities in the value of this counter can occur at

            re-initialization of the management system, and at other

            times as indicated by the value of

            ifCounterDiscontinuityTime."
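
Because ifOutDiscards is a Counter32, a monitoring script that polls it over SNMP has to allow for the counter wrapping at 2^32. The sketch below shows one way to compute the increase between two samples. The function names are illustrative, it handles at most one wrap between samples, and it does not check ifCounterDiscontinuityTime.

```python
COUNTER32_MODULUS = 2 ** 32


def counter32_delta(previous: int, current: int) -> int:
    """Return the increase between two Counter32 samples, allowing for one wrap."""
    return (current - previous) % COUNTER32_MODULUS


def discards_per_second(previous: int, current: int, interval_seconds: float) -> float:
    """Return the average discard rate between two samples."""
    return counter32_delta(previous, current) / interval_seconds


if __name__ == "__main__":
    print(counter32_delta(100, 150))            # no wrap
    print(counter32_delta(4_294_967_290, 10))   # counter wrapped between samples
```

A nonzero rate across successive polls confirms that discards are still occurring, as opposed to a large but static historical total.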

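2. Common scenarios in which congestion occurs:

       a. The attached servers are very bursty. Some servers do not send traffic at a steady rate but instead send a large number of packets within a very short time. For example, 1000 Mbit/s averages out to 1 Mbit per millisecond. If a server sends more than 1 Mbit within 1 ms, the millisecond-level peak reaches or exceeds 1 Gbit/s. The board's buffer must then absorb the burst, temporarily storing the traffic in excess of 1 Gbit/s and sending it afterwards. If the burst lasts long enough (more than a few milliseconds), the port buffer is exhausted and congestion drops occur.

       b. Traffic enters through a high-bandwidth port and leaves through a low-bandwidth port, or traffic from multiple inbound ports converges on a single outbound port.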
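
The buffer behavior described in these scenarios can be sketched with a simple per-millisecond queue model. This is a simplified tail-drop model, not the actual S9300 queueing implementation, and the 2 Mbit buffer size is a hypothetical value chosen only for illustration; real per-port buffer sizes depend on the board.

```python
from typing import List


def simulate_queue(arrivals_bits_per_ms: List[int],
                   line_rate_bits_per_ms: int,
                   buffer_bits: int) -> List[int]:
    """Return the number of bits dropped in each millisecond.

    Each millisecond, arriving bits join the queue, the port transmits up to
    its line rate, and whatever still exceeds the buffer is dropped.
    """
    queue = 0
    dropped = []
    for arriving in arrivals_bits_per_ms:
        queue += arriving
        queue -= min(queue, line_rate_bits_per_ms)  # transmit at line rate
        overflow = max(0, queue - buffer_bits)      # tail drop beyond the buffer
        queue -= overflow
        dropped.append(overflow)
    return dropped


if __name__ == "__main__":
    GE_BITS_PER_MS = 1_000_000   # 1 Gbit/s = 1 Mbit per millisecond
    BUFFER_BITS = 2_000_000      # hypothetical buffer size
    # Two GE inputs bursting toward one GE output: 2 Mbit arrive every millisecond.
    print(simulate_queue([2_000_000] * 2, GE_BITS_PER_MS, BUFFER_BITS))
    print(simulate_queue([2_000_000] * 5, GE_BITS_PER_MS, BUFFER_BITS))
```

With these assumed values the buffer absorbs a 2 ms burst without loss, but a 5 ms burst fills it after 2 ms and every further millisecond drops the excess. In general the buffer lasts for buffer size divided by the difference between arrival rate and line rate.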

3. How to display the traffic graph in Wireshark after capturing packets:

 

[Figure: Wireshark traffic graph for the discard case (screenshot not included)]

 

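4. On live networks, most interfaces that currently show discards are GE interfaces. When handling such cases, it is recommended to use a 10GE interface as the observing port, so that the traffic trend can be observed more accurately.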
