At one site, after a link cutover, an S9306 switch showed high CPU usage, and the multicast services attached below it were affected.
Oct 28 2015 05:13:13+08:00 klyg_core_S_s9306-1 %%01VOSCPU/4/CPU_USAGE_HIGH(l)[5980137]:The CPU is overloaded(CpuUsage=100%, Threshold=80%), and the tasks with top three CPU occupancy are:
Aug 28 2015 05:14:12+08:00klyg_core_S_s93061 %%01MCAST/6/SUPPRESS_LEAVE(l)[842071]:Suppress leave packet. (VlanID=207, GroupIp=237.0.80.1, ReceiveInterface=Eth-Trunk22)
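1. After the reset multicast routing-table all command was run during the cutover, multicast services did not recover for a long time, and a CPU overload alarm was raised.
2. Checked the related multicast entries; the entries were present: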
<klyg_core_S_s9306-1>dis igmp group verbose
The port information of Group 239.1.32.90 on VLAN 70:
Time of this group has been up : 09:23:19
The port information of (0.0.0.0, 239.1.32.90):
Time of this source has been up : 09:23:19
3. Checked CPU usage. The tasks consuming the most CPU were VPR, bcmRX, and IPCR, which handle packet reception and the messages that deliver table entries.
dis cpu-usage
VPR total : 22%
bcmRX total : 14%
IPCR total : 9%
4. Checked the logs. After the operation finished, there were a large number of log entries for suppressed multicast leave packets.
Aug 28 2015 05:14:12+08:00klyg_core_S_s93061 %%01MCAST/6/SUPPRESS_LEAVE(l)[842071]:Suppress leave packet. (VlanID=207, GroupIp=237.0.80.1, ReceiveInterface=Eth-Trunk22)
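To judge how widespread the suppressed leaves are, the exported log can be tallied per VLAN, group, and receiving interface. The following is a minimal Python sketch, assuming the log has been saved as text lines in the format shown above; the function name count_suppressed_leaves and the sample list are illustrative and not part of any Huawei tool.

```python
import re
from collections import Counter

# Field names (VlanID, GroupIp, ReceiveInterface) are taken from the log line above.
SUPPRESS_LEAVE = re.compile(
    r"MCAST/6/SUPPRESS_LEAVE.*?"
    r"VlanID=(?P<vlan>\d+),\s*GroupIp=(?P<group>[\d.]+),"
    r"\s*ReceiveInterface=(?P<iface>[\w/-]+)"
)


def count_suppressed_leaves(lines):
    """Count SUPPRESS_LEAVE entries per (VLAN, group, receive interface)."""
    counts = Counter()
    for line in lines:
        match = SUPPRESS_LEAVE.search(line)
        if match:
            key = (int(match["vlan"]), match["group"], match["iface"])
            counts[key] += 1
    return counts


# Sample input: the leave log from this case repeated twice, plus an unrelated line.
leave_log = (
    "Aug 28 2015 05:14:12+08:00klyg_core_S_s93061 "
    "%%01MCAST/6/SUPPRESS_LEAVE(l)[842071]:Suppress leave packet. "
    "(VlanID=207, GroupIp=237.0.80.1, ReceiveInterface=Eth-Trunk22)"
)
sample_lines = [leave_log, "unrelated log line", leave_log]

leave_counts = count_suppressed_leaves(sample_lines)
for (vlan, group, iface), n in leave_counts.most_common():
    print(f"VLAN {vlan} group {group} on {iface}: {n} suppressed leaves")
```

Sorting by count shows which VLANs and interfaces contribute most of the leave traffic, which helps decide the order in which ports are re-enabled later.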
<klyg_core_S_s9306-1>display igmp control-message counters int vlan 207
Vlanif207(10.1.207.254):
Message Type Sent Valid Invalid Ignore
------------------------------------------------------------------
General Query 121657 0 0 0
Group Query 28816321 0 0 0
Source Group Query 0 0 0 0
------------------------------------------------------------------
IGMPV1V2
Report ASM 0 325886320 6063423 0
Report SSM 0 0 0 2
------------------------------------------------------------------
LEAVE ASM 0 27620055 15 54781 //A large number of leave packets were received.
LEAVE SSM 0 0 0 0
------------------------------------------------------------------
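The counters above can be compared programmatically. The sketch below, in Python, assumes the four numeric columns are Sent, Valid, Invalid, and Ignore as in the header row; parse_counters is a hypothetical helper, not a device feature. It computes the ratio of valid leaves received to group-specific queries sent. A ratio near 1 is consistent with the switch answering almost every leave with a group query, though the counters alone do not prove that pairing.

```python
import re

# One counter row: a text label followed by four integers
# (Sent, Valid, Invalid, Ignore). Trailing comments are ignored.
ROW = re.compile(
    r"^(?P<name>[A-Za-z][A-Za-z0-9 ]*?)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)"
)


def parse_counters(text):
    """Return {row label: {'sent', 'valid', 'invalid', 'ignore'}}."""
    counters = {}
    for line in text.splitlines():
        match = ROW.match(line.strip())
        if match:
            sent, valid, invalid, ignore = (int(v) for v in match.groups()[1:])
            counters[match["name"]] = {
                "sent": sent,
                "valid": valid,
                "invalid": invalid,
                "ignore": ignore,
            }
    return counters


# Rows copied from the display output in this case.
output = """
General Query 121657 0 0 0
Group Query 28816321 0 0 0
Source Group Query 0 0 0 0
------------------------------------------------------------------
IGMPV1V2
Report ASM 0 325886320 6063423 0
LEAVE ASM 0 27620055 15 54781
"""

counters = parse_counters(output)
leave_to_query = counters["LEAVE ASM"]["valid"] / counters["Group Query"]["sent"]
print(f"valid leaves per group query sent: {leave_to_query:.3f}")
```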
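6. Configured an attack defense policy, shut down the physical interfaces carrying multicast services, and then re-enabled them one at a time with a delay between each.
a. Configure the attack defense policy:
acl number 3300
rule 5 permit igmp destination 224.0.0.2 0 ===>matches IGMP leave packets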
#
cpu-defend policy igmp-deny
blacklist 1 acl 3300
#
cpu-defend-policy igmp-deny global
b. Enable the physical ports one at a time
7. CPU usage dropped and services recovered.
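The live network carries a large volume of multicast services, and a large number of recording servers are attached downstream. After all multicast entries were cleared, the recording servers that lost their streams sent leave packets periodically and frequently. Under the normal IGMP exchange, each leave sets off a chain reaction in which the switch sends Query packets and receives Report packets, so the switch had to add and delete entries constantly. The resulting flood of protocol packets drove the CPU usage too high, the device could not process IGMP packets in time, and IGMP entries could not be established normally.
Block the leave packets to keep the device CPU from overloading, then enable the physical interfaces one at a time to reduce the burst: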
a. Configure the attack defense policy:
acl number 3300
rule 5 permit igmp destination 224.0.0.2 0 ===>matches IGMP leave packets
#
cpu-defend policy igmp-deny
blacklist 1 acl 3300
#
cpu-defend-policy igmp-deny global
b. Enable the physical ports one at a time
int gx/x/x
undo shutdown
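The staggered re-enable can be planned ahead as a list of timed command batches. The Python sketch below only generates text and does not connect to any device. The interface names and the 60-second spacing are assumptions for illustration, since the case does not state the delay that was used, and staggered_enable_plan is a hypothetical helper.

```python
def staggered_enable_plan(interfaces, delay_seconds):
    """Return [(offset_seconds, [cli lines])], one interface per step."""
    plan = []
    for index, name in enumerate(interfaces):
        commands = [
            f"interface {name}",  # enter the interface view
            "undo shutdown",      # re-enable the port
            "quit",               # return to the system view
        ]
        plan.append((index * delay_seconds, commands))
    return plan


# Hypothetical interface names; replace with the ports carrying multicast services.
ports = ["GigabitEthernet1/0/1", "GigabitEthernet1/0/2", "GigabitEthernet1/0/3"]
plan = staggered_enable_plan(ports, delay_seconds=60)

for offset, commands in plan:
    print(f"# at T+{offset}s")
    for command in commands:
        print(command)
```

Spacing the steps gives the switch time to rebuild IGMP entries for one port before the next burst of Report and Leave packets arrives; CPU usage can be checked with dis cpu-usage between steps.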
END