Downstream Switch Stack Issue Causes BFD Flapping on ME60

Published: 2016-11-14
Problem Description

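As shown in the figure, the sub-interfaces of two ME60s establish a BFD session through two stacked S5720 switches downstream. The configurations are as follows: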

Master ME60

interface GigabitEthernet1/1/0.1
 vlan-type dot1q 1001
 ip address 1.1.0.2 255.255.255.248      
 vrrp vrid 1 virtual-ip 1.1.0.1
 admin-vrrp vrid 1
 vrrp vrid 1 priority 120
 vrrp vrid 1 preempt-mode timer delay 1200
 vrrp vrid 1 track interface Eth-Trunk1 reduced 50
 vrrp vrid 1 track bfd-session 1001 peer

#

bfd rbp1 bind peer-ip 1.1.0.3 source-ip 1.1.0.2
 discriminator local 1001
 discriminator remote 2001
 commit

Backup ME60

interface GigabitEthernet1/1/0.1
 vlan-type dot1q 1001
 ip address 1.1.0.3 255.255.255.248
 vrrp vrid 1 virtual-ip 1.1.0.1
 admin-vrrp vrid 1
 vrrp vrid 1 track bfd-session 2001 peer

#

bfd rbp1 bind peer-ip 1.1.0.2 source-ip 1.1.0.3
 discriminator local 2001
 discriminator remote 1001
 commit
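This is a statically configured BFD session, so the two ends must mirror each other. Each side's peer-ip is the other side's source-ip, and each side's local discriminator is the other side's remote discriminator. The following Python sketch checks these pairings against the values in the two configurations above. `StaticBfd` and `mirror_problems` are hypothetical helpers, not a Huawei tool. The sketch also assumes that the ID in `vrrp vrid 1 track bfd-session <id> peer` refers to the local discriminator, which is how the configuration above is written.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class StaticBfd:
    """One end of a static BFD session plus the ID VRRP tracks it by."""
    peer_ip: str
    source_ip: str
    local_discr: int
    remote_discr: int
    tracked_id: int  # ID in "vrrp vrid 1 track bfd-session <id> peer"


def mirror_problems(a: StaticBfd, b: StaticBfd) -> List[str]:
    """List every way in which the two ends fail to mirror each other."""
    problems = []
    # Addresses must be swapped: A's peer is B's source and vice versa.
    if a.peer_ip != b.source_ip or b.peer_ip != a.source_ip:
        problems.append("peer-ip/source-ip are not mirrored")
    # Discriminators must be swapped as well.
    if a.local_discr != b.remote_discr:
        problems.append(f"A local {a.local_discr} != B remote {b.remote_discr}")
    if b.local_discr != a.remote_discr:
        problems.append(f"B local {b.local_discr} != A remote {a.remote_discr}")
    # Assumption: VRRP tracks the session by its local discriminator.
    for name, end in (("A", a), ("B", b)):
        if end.tracked_id != end.local_discr:
            problems.append(f"{name} tracks {end.tracked_id}, local is {end.local_discr}")
    return problems


# Values copied from the master and backup ME60 configurations above.
master = StaticBfd("1.1.0.3", "1.1.0.2", 1001, 2001, tracked_id=1001)
backup = StaticBfd("1.1.0.2", "1.1.0.3", 2001, 1001, tracked_id=2001)
print(mirror_problems(master, backup))  # → []
```

An empty list means the two static sessions are paired correctly, so the flapping is not caused by a discriminator or address mismatch.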

 

Alarm Information

The display logbuffer output shows a large number of BFD session Up/Down records on the device:

Nov  8 2016 13:39:59+08:00 dgd21amghw05 %%01BFD/4/STACHG_TODWN(l)[190]:Slot=1;BFD session changed to Down. (SlotNumber=1, Discriminator=1001, Diagnostic=NeighborDown, Applications=VRRP, ProcessPST=False, BindInterfaceName=None, InterfacePhysicalState=None, InterfaceProtocolState=None)
Nov  8 2016 13:39:59+08:00 dgd21amghw05 %%01RBP/5/VRRP_NOTIFY(l)[191]:The VRRP status changed. (VrId=1, If=GigabitEthernet1/1/0.1, OldStatus=MASTER, NewStatus=PEERBFDDOWN, ChangeReason=PEERBFDDOWN)
Nov  8 2016 13:11:29+08:00 dgd21amghw05 %%01BFD/4/STACHG_TOUP(l)[192]:Slot=1;BFD session changed to Up. (SlotNumber=1, Discriminator=1001, FormerStatus=Init, Applications=VRRP, BindInterfaceName=None, ProcessPST=False)
Nov  8 2016 13:11:29+08:00 dgd21amghw05 %%01BFD/4/STACHG_TODWN(l)[193]:Slot=1;BFD session changed to Down. (SlotNumber=1, Discriminator=1001, Diagnostic=DetectDown, Applications=VRRP, ProcessPST=False, BindInterfaceName=None, InterfacePhysicalState=None, InterfaceProtocolState=None)
Nov  8 2016 13:11:29+08:00 dgd21amghw05 %%01RBP/5/VRRP_NOTIFY(l)[194]:The VRRP status changed. (VrId=1, If=GigabitEthernet1/1/0.1, OldStatus=MASTER, NewStatus=PEERBFDDOWN, ChangeReason=PEERBFDDOWN)
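To see how often the session flaps and why, you can tally the BFD records in the log buffer by Diagnostic. Below is a minimal Python sketch. It assumes the record format shown above: `BFD/4/STACHG_TODWN` or `STACHG_TOUP`, with a `Diagnostic=` field on Down records. The regular expression and the `summarize_bfd_changes` helper are illustrative and are not part of any Huawei tool.

```python
import re
from collections import Counter

# Matches BFD state-change records in the log format shown above;
# the Diagnostic field is optional because Up records do not carry it.
BFD_RE = re.compile(
    r"BFD/4/STACHG_TO(?P<state>UP|DWN)"
    r"(?:.*?Diagnostic=(?P<diag>\w+))?"
)


def summarize_bfd_changes(lines):
    """Return (number of Up events, Counter of Down events by Diagnostic)."""
    ups, downs = 0, Counter()
    for line in lines:
        m = BFD_RE.search(line)
        if m is None:
            continue  # VRRP_NOTIFY and other records are ignored
        if m.group("state") == "UP":
            ups += 1
        else:
            downs[m.group("diag") or "Unknown"] += 1
    return ups, downs


# Excerpts of the logbuffer records above (trailing fields trimmed).
log_lines = [
    "Nov  8 2016 13:39:59+08:00 dgd21amghw05 %%01BFD/4/STACHG_TODWN(l)[190]:Slot=1;"
    "BFD session changed to Down. (SlotNumber=1, Discriminator=1001, Diagnostic=NeighborDown, Applications=VRRP)",
    "Nov  8 2016 13:39:59+08:00 dgd21amghw05 %%01RBP/5/VRRP_NOTIFY(l)[191]:The VRRP status changed. "
    "(VrId=1, If=GigabitEthernet1/1/0.1, OldStatus=MASTER, NewStatus=PEERBFDDOWN, ChangeReason=PEERBFDDOWN)",
    "Nov  8 2016 13:11:29+08:00 dgd21amghw05 %%01BFD/4/STACHG_TOUP(l)[192]:Slot=1;"
    "BFD session changed to Up. (SlotNumber=1, Discriminator=1001, FormerStatus=Init, Applications=VRRP)",
    "Nov  8 2016 13:11:29+08:00 dgd21amghw05 %%01BFD/4/STACHG_TODWN(l)[193]:Slot=1;"
    "BFD session changed to Down. (SlotNumber=1, Discriminator=1001, Diagnostic=DetectDown, Applications=VRRP)",
]
ups, downs = summarize_bfd_changes(log_lines)
print(ups, dict(downs))  # → 1 {'NeighborDown': 1, 'DetectDown': 1}
```

On a real device, feed it the full display logbuffer text, split into lines. A large share of DetectDown points at lost BFD packets on the path rather than at a configuration problem.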

Troubleshooting Procedure

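1、Check the BFD session status. The transmit and receive intervals are both 10 ms and the detection multiplier is 3. These values were confirmed to be identical on both ME60s.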

<dgd21amghw05>dis bfd session peer-ip 1.1.0.3 verbose
--------------------------------------------------------------------------------
Session MIndex : 16391     (Multi Hop) State : Up        Name : rbp1          
--------------------------------------------------------------------------------
  Local Discriminator    : 1001             Remote Discriminator   : 2001     
  Session Detect Mode    : Asynchronous Mode Without Echo Function            
  BFD Bind Type          : Peer IP Address                                    
  Bind Session Type      : Static                                             
  Bind Peer IP Address   : 1.1.0.3                                            
  Bind Interface         : -                                                  
  Track Interface        : -                                                  
  Bind Source IP Address : 1.1.0.2                                            
  FSM Board Id           : 2                TOS-EXP                : 7        
  Min Tx Interval (ms)   : 10               Min Rx Interval (ms)   : 10       
  Actual Tx Interval (ms): 10               Actual Rx Interval (ms): 10       
  Local Detect Multi     : 3                Detect Interval (ms)   : 30       
  Echo Passive           : Disable          Acl Number             : -        
  Destination Port       : 3784             TTL                    : 254      
  Proc Interface Status  : Disable          Process PST            : Disable  
  WTR Interval (ms)      : -                                                  
  Active Multi           : 3                DSCP                   : -        
  Last Local Diagnostic  : No Diagnostic                                      
  Bind Application       : VRRP
  Session TX TmrID       : -                Session Detect TmrID   : -        
  Session Init TmrID     : -                Session WTR TmrID      : -        
  Session Echo Tx TmrID  : -                                                  
  PDT Index              : FSM-4000004 | RCV-0 | IF-0 | TOKEN-0               
  Session Description    : -                                                  
--------------------------------------------------------------------------------
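You can also extract the negotiated timers from the verbose output and compare them between the two ME60s. The Python sketch below is hypothetical: `bfd_timers`, `check_session` and `mismatched` are not Huawei tools. It matches the field labels shown above. It also checks that Detect Interval equals Actual Rx Interval × Local Detect Multi. That check holds here only because both ends use the same multiplier, as step 1 confirms. In general, RFC 5880 derives the local detection time from the multiplier that the remote end advertises.

```python
import re

# Field labels as they appear in the verbose output above.
PATTERNS = {
    "tx": r"Actual Tx Interval \(ms\)\s*:\s*(\d+)",
    "rx": r"Actual Rx Interval \(ms\)\s*:\s*(\d+)",
    "mult": r"Local Detect Multi\s*:\s*(\d+)",
    "detect": r"Detect Interval \(ms\)\s*:\s*(\d+)",
}


def bfd_timers(verbose_output: str) -> dict:
    """Extract the negotiated timers from 'display bfd session ... verbose'."""
    timers = {}
    for key, pattern in PATTERNS.items():
        m = re.search(pattern, verbose_output)
        if m is None:
            raise ValueError(f"{key!r} not found in output")
        timers[key] = int(m.group(1))
    return timers


def check_session(timers: dict) -> list:
    """Flag a Detect Interval that differs from Rx interval x multiplier
    (valid only when both ends use the same multiplier)."""
    issues = []
    expected = timers["rx"] * timers["mult"]
    if timers["detect"] != expected:
        issues.append(f"detect {timers['detect']} ms != {timers['rx']} x {timers['mult']}")
    return issues


def mismatched(a: dict, b: dict) -> list:
    """Timer fields that differ between the two ME60s."""
    return [k for k in ("tx", "rx", "mult") if a[k] != b[k]]


# Excerpt of the verbose output above.
sample = """
  Min Tx Interval (ms)   : 10               Min Rx Interval (ms)   : 10
  Actual Tx Interval (ms): 10               Actual Rx Interval (ms): 10
  Local Detect Multi     : 3                Detect Interval (ms)   : 30
"""
t = bfd_timers(sample)
print(t, check_session(t))  # → {'tx': 10, 'rx': 10, 'mult': 3, 'detect': 30} []
```

Run `mismatched()` on the parsed output from both ME60s. An empty result confirms what step 1 establishes by inspection.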

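2、Analyze the logs. Diagnostic=DetectDown means that the device received no BFD packet from the peer within the 30 ms detection time (10 ms × 3), so it declared the BFD session Down.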

Nov  8 2016 12:50:28+08:00 dgd21amghw05 %%01BFD/4/STACHG_TODWN(l)[196]:Slot=1;BFD session changed to Down. (SlotNumber=1, Discriminator=1001, Diagnostic=DetectDown, Applications=VRRP, ProcessPST=False, BindInterfaceName=None, InterfacePhysicalState=None, InterfaceProtocolState=None)
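A simplified model shows the detection logic behind DetectDown. The detection timer restarts whenever a packet is received. If the next packet arrives later than the detection time, the session goes Down. This Python sketch assumes a fixed 10 ms arrival interval and the 30 ms detection time from step 1, and it ignores session re-establishment. The helper names are hypothetical.

```python
def detect_down_times(arrivals_ms, detect_ms=30):
    """Times (ms) at which a simplified BFD detection timer would expire.

    The timer restarts on every received packet; if the next packet arrives
    more than detect_ms later, the session is declared Down (DetectDown).
    Session re-establishment (Down -> Init -> Up) is not modelled.
    """
    downs = []
    for prev, cur in zip(arrivals_ms, arrivals_ms[1:]):
        if cur - prev > detect_ms:
            downs.append(prev + detect_ms)
    return downs


def received(lost, end_ms=120, interval_ms=10):
    """Arrival times for one packet every interval_ms, minus the lost ones."""
    return [t for t in range(0, end_ms, interval_ms) if t not in lost]


print(detect_down_times(received(lost={40})))          # → []
print(detect_down_times(received(lost={40, 50, 60})))  # → [60]
```

With these settings, the session absorbs a single lost packet. Three consecutive losses, however, leave a 40 ms gap and trip the timer. With such a short detection time, even brief bursts of loss on the path are enough to bring the session down.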

3、Run a continuous ping between the source and destination IP addresses of the BFD session. No packet loss is observed.

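4、Check the interfaces of each device for errors. The display interface brief output shows a large number of inErrors on stack port XG0/0/1 of the downstream switch, and the counter keeps growing slowly but steadily.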

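5、The inErrors counter grows slowly, at about 2 per second. The BFD packets pass through this stack port, so some of them may be dropped, which causes BFD detection to fail. Refer to case KB1000359435 to resolve the inErrors on the stack port.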
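To measure how fast the error counter grows, take two snapshots of display interface brief a known interval apart. The Python sketch below assumes that each data row ends with the inErrors and outErrors columns. `parse_error_counters` and `in_error_rates` are hypothetical helpers. The snapshot text and counter values are made up for illustration and only reproduce a rate of about 2 per second like the one in this case.

```python
def parse_error_counters(brief_output):
    """Map interface name -> (inErrors, outErrors).

    Assumes each data row of 'display interface brief' ends with the
    inErrors and outErrors columns; header and legend lines are skipped
    because their last two fields are not numbers.
    """
    counters = {}
    for line in brief_output.splitlines():
        cols = line.split()
        if len(cols) >= 7 and cols[-1].isdigit() and cols[-2].isdigit():
            counters[cols[0]] = (int(cols[-2]), int(cols[-1]))
    return counters


def in_error_rates(before, after, interval_s):
    """inErrors increase per second for interfaces whose counter grew."""
    a, b = parse_error_counters(before), parse_error_counters(after)
    return {
        name: (b[name][0] - a[name][0]) / interval_s
        for name in sorted(a.keys() & b.keys())
        if b[name][0] > a[name][0]
    }


# Hypothetical snapshots taken 60 s apart (counter values are illustrative).
HEADER = "Interface                   PHY   Protocol  InUti OutUti   inErrors  outErrors\n"
before = HEADER + (
    "GigabitEthernet0/0/1        up    up        0.01%  0.01%          0          0\n"
    "XGigabitEthernet0/0/1       up    up        0.05%  0.04%       1000          0\n"
)
after = HEADER + (
    "GigabitEthernet0/0/1        up    up        0.01%  0.01%          0          0\n"
    "XGigabitEthernet0/0/1       up    up        0.05%  0.04%       1120          0\n"
)
print(in_error_rates(before, after, interval_s=60))  # → {'XGigabitEthernet0/0/1': 2.0}
```

A steady non-zero rate on a stack port, even a small one, matches the symptom described in this step.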

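Root Cause
The stack port of the downstream stacked S5720 switches keeps generating inErrors. As a result, BFD packets are occasionally lost and the BFD session goes Up and Down at irregular intervals.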
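Solution
A major cause of BFD session flapping is poor quality on the intermediate links. When a BFD session flaps, check the ping quality between the source and destination addresses, and check the inErrors and outErrors counters on every interface along the path. In a scenario like this one, also check the quality of the stack links between the stacked switches.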
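You can combine the checks recommended above into a simple triage helper. This Python sketch assumes the VRP ping statistics lines `N packet(s) transmitted` and `N packet(s) received`. `ping_loss_percent` and `path_findings` are hypothetical helpers, and the sample output and counter values are made up for illustration. As this case shows, a clean ping does not rule out a faulty link, so the helper reports interface error counters even when ping shows no loss.

```python
import re


def ping_loss_percent(ping_output):
    """Packet loss (%) from the statistics block of a VRP ping.

    Assumes the 'N packet(s) transmitted' / 'N packet(s) received' lines.
    """
    sent = re.search(r"(\d+) packet\(s\) transmitted", ping_output)
    recv = re.search(r"(\d+) packet\(s\) received", ping_output)
    if sent is None or recv is None:
        raise ValueError("ping statistics not found")
    tx, rx = int(sent.group(1)), int(recv.group(1))
    return 100.0 * (tx - rx) / tx if tx else 0.0


def path_findings(loss_pct, error_counters):
    """Combine ping loss with per-interface (inErrors, outErrors) counters."""
    findings = []
    if loss_pct > 0:
        findings.append(f"ping loss {loss_pct:.2f}%")
    for name, (in_err, out_err) in sorted(error_counters.items()):
        if in_err or out_err:
            findings.append(f"{name}: inErrors={in_err}, outErrors={out_err}")
    return findings


# Hypothetical ping statistics and counters for illustration only.
stats = """
  --- 1.1.0.3 ping statistics ---
    1000 packet(s) transmitted
    1000 packet(s) received
    0.00% packet loss
"""
counters = {"XGigabitEthernet0/0/1": (1120, 0), "GigabitEthernet0/0/1": (0, 0)}
print(path_findings(ping_loss_percent(stats), counters))
# → ['XGigabitEthernet0/0/1: inErrors=1120, outErrors=0']
```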
