Handling High ICMP Reply Latency on the S9703

Published: 2016-09-22
Problem Description

1) Devices and versions involved:

Device type    Version              Patch
S9703          V200R007C00SPC500    V200R007C00SPH003

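2) Network topology:

Two Huawei S9703 switches are deployed with VRRP. Two encryption devices from another vendor are connected below the S9703s and are deployed in active/standby mode. Both the active and the standby encryption device use ping to check connectivity to the VRRP virtual address of the S9703s. If the VRRP master switch does not answer a ping request within 2 s, or the reply takes longer than 2 s, the encryption devices perform an active/standby switchover, which affects services.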
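The detection rule described above can be sketched in a few lines of Python. This is only a model of the behavior as the article states it, not the encryption device's actual implementation; the function names `probe_failed` and `should_switch_over` are hypothetical, and treating a reply at exactly 2 s as a success is an assumption.

```python
from typing import Iterable, Optional

TIMEOUT_S = 2.0  # detection window stated in the article


def probe_failed(rtt_s: Optional[float], timeout_s: float = TIMEOUT_S) -> bool:
    """Return True when one ping probe counts as failed.

    rtt_s is None when no reply arrived at all; otherwise it is the
    measured round-trip time in seconds. A reply at exactly timeout_s is
    treated as a success (assumption, the article does not say).
    """
    return rtt_s is None or rtt_s > timeout_s


def should_switch_over(rtts: Iterable[Optional[float]],
                       timeout_s: float = TIMEOUT_S) -> bool:
    """Return True if any probe in the sequence would trigger a switchover."""
    return any(probe_failed(rtt, timeout_s) for rtt in rtts)


# Hypothetical samples: normal replies, one slow reply, one lost reply.
print(should_switch_over([0.004, 0.006, 0.005]))
print(should_switch_over([0.004, 2.7, 0.005]))
print(should_switch_over([0.004, None, 0.005], timeout_s=15.0))
```

The point of the model is that a single delayed reply is enough to trigger a switchover, so latency jitter on the switch matters as much as outright packet loss.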

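3) Fault description:

Recently the NMS has frequently reported large numbers of alarms and network interruptions. Each time, troubleshooting showed that the active encryption device's ping check of the S9703 virtual address had timed out, which caused the encryption devices to perform an active/standby switchover.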


 

Alarm Information

Handling Process

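Packet captures taken by traffic mirroring on the encryption device's uplink interface and on the S9703's inbound interface showed the following:

1) The capture on the encryption device shows that the ICMP reply packets arrive slowly.

2) The capture on the switch likewise shows that the ICMP reply packets are slow.

3) The switch logs contain no abnormal records at the times when ping replies were delayed. The cpu-defend statistics show that, historically, a large number of packets have been sent to the CPU: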

<ZJ_HUAWEI_9703_1>dis cpu-d  stat all

Warning: This feature is not supported on slot mainboard.

 Statistics on slot 1:

--------------------------------------------------------------------------------

Packet Type          Pass(Packet/Byte)   Drop(Packet/Byte)  Last-dropping-time 

--------------------------------------------------------------------------------

arp-miss                       7343836              960106  2016-09-14 04:10:06

                             891503214           325494473

arp-reply                      4876208               23516  2016-08-29 18:20:02

                             334333392             1599104

arp-request                  232496454             4233354  2016-04-09 11:21:47

                           15804222740           287865652

bgp                            3181563                 440  2016-08-03 00:51:46

                             371346685              260423

fib-hit                         161734               72703  2016-09-10 23:40:05

                              29547653           107473804

ftp                               2102                   0  -                  

                                138596                   0

gre-keepalive                        0                   0  -                  

                                     0                   0

http                            274865                1212  2016-07-14 02:51:49

                              31326864              137559

https                            34238                   0  -                  

                               3527913                   0

hw-tacacs                      2716845                   0  -                  

                             212097479                   0

icmp                         287418486                  53  2016-08-23 17:30:00

                           22470675492               44572

isis                          45789391               40256  2015-09-30 06:21:42

                           26074346660            60818641

lnp                            2043171                   0  -                  

                             138935628                   0

mpls-fib-hit                  32739182                   0  -                  

                            3874680204                   0

mpls-ldp                      11232207                   0  -                  

                             856159527                   0

ntp                             115109                   0  -                  

                              11281711                   0

portal                               3                   0  -                  

                                  1379                   0

                                  1841                   0

snmp                          95751077                  53  2015-11-27 17:11:42

                           15349243345                5837

ssh                            1203605                   0  -                  

                             186181287                   0

tcp                             731760                1196  2016-08-09 00:00:00

                              51531196              206724

telnet                        13985488                4188  2016-09-06 16:00:04

                             905218755              281034

ttl-expired                   33014539              389563  2016-09-14 10:30:06

                            3707542127           415230981

vbst                         633733009                   0  -                  

                           43545519488                   0

vrrp                          79982015               12425  2016-08-12 16:40:00

                            5168775986              795200

wapi                                 0                   0  -                  

                                     0                   0

--------------------------------------------------------------------------------

 Statistics on slot 2:

--------------------------------------------------------------------------------

Packet Type          Pass(Packet/Byte)   Drop(Packet/Byte)  Last-dropping-time 

--------------------------------------------------------------------------------

arp-miss                        880393                 642  2016-06-26 22:51:49

                              74735581              194991

arp-reply                        53388                   0  -                  

                               3416832                   0

arp-request                    1543151                   0  -                  

                              98761664                   0

bgp                                179                   0  -                  

                                 11492                   0

fib-hit                          54886                  51  2016-06-02 15:41:49

                               7495294               23376

ftp                               1161                   0  -                  

                                 78740                   0

http                             13253                   0  -                   

                                888195                   0

https                             2870                   0  -                  

                                190520                   0

hw-tacacs                            0                   0  -                  

                                     0                   0

icmp                          33472905                  11  2015-12-21 06:31:42

                            3247336545               13846

ntp                               2380                   0  -                  

                                278338                   0

snmp                               931                   0  -                  

                                100534                   0

ssh                              11396                   0  -                   

                                770188                   0

tcp                             256108                   0  -                  

                              17004307                   0

telnet                          128531                   0  -                  

                              10159550                   0

ttl-expired                       1689                   0  -                  

                                111392                   0

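4) Assessment based on the information above: when ping replies were delayed, the switch was not dropping packets. The cpu-defend statistics show that many kinds of protocol packets are sent to the CPU. The analysis is that at the times of high delay a large volume of protocol packets was being sent to the CPU, and because the switch handles ICMP packets at a relatively low priority, the ICMP packets were not scheduled promptly, which made the ICMP replies slow.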
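When reviewing output like the cpu-defend statistics above, it can help to turn the text into records and sort them. The following is a minimal sketch that assumes the layout shown in this article (a packet-count line followed by a byte-count line for each packet type); the function names `parse_cpu_defend` and `busiest` are hypothetical, and the sample rows are copied from the slot 1 output.

```python
import re

# One statistics row: packet type, passed packets, dropped packets, and the
# last drop time ("-" when nothing was ever dropped).
ROW_RE = re.compile(
    r"^(?P<type>[a-z][\w-]*)\s+(?P<passed>\d+)\s+(?P<dropped>\d+)\s+"
    r"(?P<last>-|\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})$"
)
SLOT_RE = re.compile(r"Statistics on slot (\d+):")


def parse_cpu_defend(text):
    """Return one dict per packet-count row; byte-count lines are skipped."""
    rows, slot = [], None
    for raw in text.splitlines():
        line = raw.strip()
        slot_match = SLOT_RE.search(line)
        if slot_match:
            slot = int(slot_match.group(1))
            continue
        row = ROW_RE.match(line)
        if row:
            rows.append({
                "slot": slot,
                "type": row["type"],
                "passed": int(row["passed"]),
                "dropped": int(row["dropped"]),
                "last_drop": None if row["last"] == "-" else row["last"],
            })
    return rows


def busiest(rows, count=3):
    """Packet types with the most packets passed to the CPU, largest first."""
    return sorted(rows, key=lambda r: r["passed"], reverse=True)[:count]


SAMPLE = """
 Statistics on slot 1:
arp-request                  232496454             4233354  2016-04-09 11:21:47
                           15804222740           287865652
icmp                         287418486                  53  2016-08-23 17:30:00
                           22470675492               44572
vbst                         633733009                   0  -
                           43545519488                   0
"""

rows = parse_cpu_defend(SAMPLE)
for r in busiest(rows):
    print(r["slot"], r["type"], r["passed"], r["dropped"])
```

These counters are cumulative, so they show which protocols load the CPU over time but not what happened at one specific moment of high delay.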

Root Cause

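Fault location and analysis showed that when a large number of protocol packets are sent to the CPU for processing, the switch handles ICMP packets at a relatively low priority. Because of the CPU scheduling mechanism, the ICMP packets are not scheduled promptly, so ICMP replies are slow and the delay jitter is large. This makes the encryption device's ping check fail and triggers the active/standby switchover.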
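The effect described in the root cause can be illustrated with a toy strict-priority queue. This is an illustrative model only and does not reflect the S9703's actual CPU scheduler: the function `simulate`, the priority numbers, and the one-tick service time are all assumptions chosen to show how a low-priority packet waits behind a burst of higher-priority ones.

```python
import heapq


def simulate(arrivals, service_time=1):
    """Serve packets one at a time in strict priority order.

    arrivals: list of (arrival_tick, priority, name); a lower priority
    number is served first. Returns {name: delay}, where delay is the
    completion tick minus the arrival tick.
    """
    pending = sorted(arrivals)  # ordered by arrival tick
    queue, delays = [], {}
    now, index, seq = 0, 0, 0
    while index < len(pending) or queue:
        # Admit every packet that has arrived by the current tick.
        while index < len(pending) and pending[index][0] <= now:
            tick, prio, name = pending[index]
            heapq.heappush(queue, (prio, seq, tick, name))
            seq += 1
            index += 1
        if not queue:
            # CPU is idle: jump ahead to the next arrival.
            now = pending[index][0]
            continue
        _, _, tick, name = heapq.heappop(queue)
        now += service_time
        delays[name] = now - tick
    return delays


# Idle CPU: the ICMP request is answered after one service interval.
idle = simulate([(0, 5, "icmp")])

# Busy CPU: ten higher-priority protocol packets arrive in the same tick.
burst = [(0, 1, f"proto-{i}") for i in range(10)] + [(0, 5, "icmp")]
busy = simulate(burst)

print("idle delay:", idle["icmp"])
print("busy delay:", busy["icmp"])
```

In this model the ICMP packet is never dropped, only served last, which matches the observation that the drop counters stay low while the reply delay grows.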

Solution

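 Enable the ICMP fast reply function on the switch (this does not affect live network services). At the same time, increase the probe failure time on the encryption devices; try setting it to 15 s and observe the result.

 Command to enable ICMP fast reply on the S9703: icmp-reply fast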

Suggestions and Summary

END