USG9520 hot standby switches over frequently after a power-loss reboot

Published: 2016-12-21
Problem Description

The networking is shown below:




Version Information

USG9520: V300R001C20SPC200


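Problem description: OSPF runs between the USG9520 and the NE40E and ME60. During a power outage in the Education Bureau equipment room, the UPS failed to take over. After power was restored, the USG9520 began switching between active and standby frequently.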

Alarm Information

The HRP active/standby switchover history on the device shows that switchovers are occurring frequently:


HRP_S<DY-JYJ-USG9520_1>display hrp history-information
18:58:14  2016/11/15
 2016-11-15 18:56:52 HRP core state changed, old_state = normal, new_state = abnormal(slave), local_priority = 47000, peer_priority = 47002.
 2016-11-15 18:55:33 HRP core state changed, old_state = abnormal(slave), new_state = normal, local_priority = 47002, peer_priority = 47002.
 2016-11-15 18:52:32 HRP core state changed, old_state = normal, new_state = abnormal(slave), local_priority = 47000, peer_priority = 47002.
 2016-11-15 18:51:58 HRP core state changed, old_state = abnormal(slave), new_state = normal, local_priority = 47002, peer_priority = 47002.
 2016-11-15 18:48:57 HRP core state changed, old_state = normal, new_state = abnormal(slave), local_priority = 47000, peer_priority = 47002.
 2016-11-15 18:47:55 HRP core state changed, old_state = abnormal(slave), new_state = normal, local_priority = 47002, peer_priority = 47002.
 2016-11-15 18:44:55 HRP core state changed, old_state = normal, new_state = abnormal(slave), local_priority = 47000, peer_priority = 47002.
 2016-11-15 18:42:03 HRP core state changed, old_state = abnormal(slave), new_state = normal, local_priority = 47002, peer_priority = 47002.
 2016-11-15 18:39:02 HRP core state changed, old_state = normal, new_state = abnormal(slave), local_priority = 47000, peer_priority = 47002.
 2016-11-15 18:36:33 HRP core state changed, old_state = abnormal(slave), new_state = normal, local_priority = 47002, peer_priority = 47002.
 2016-11-15 18:33:32 HRP core state changed, old_state = normal, new_state = abnormal(slave), local_priority = 47000, peer_priority = 47002.
 2016-11-15 18:32:15 HRP core state changed, old_state = abnormal(slave), new_state = normal, local_priority = 47002, peer_priority = 47002.
 2016-11-15 18:29:14 HRP core state changed, old_state = normal, new_state = abnormal(slave), local_priority = 47000, peer_priority = 47002.
 2016-11-15 18:27:44 HRP core state changed, old_state = abnormal(slave), new_state = normal, local_priority = 47002, peer_priority = 47002.
 2016-11-15 18:24:42 HRP core state changed, old_state = normal, new_state = abnormal(slave), local_priority = 47000, peer_priority = 47002.
 2016-11-15 18:21:57 HRP core state changed, old_state = abnormal(slave), new_state = normal, local_priority = 47002, peer_priority = 47002.
 2016-11-15 18:18:55 HRP core state changed, old_state = normal, new_state = abnormal(slave), local_priority = 47000, peer_priority = 47002.
 2016-11-15 18:18:06 HRP core state changed, old_state = abnormal(slave), new_state = normal, local_priority = 47002, peer_priority = 47002.
 2016-11-15 18:15:04 HRP core state changed, old_state = normal, new_state = abnormal(slave), local_priority = 47000, peer_priority = 47002.
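
The history above can be summarized with a short script. The following is a minimal sketch, not a Huawei tool: the function names `parse_hrp_history` and `abnormal_durations` are hypothetical, and the regular expression assumes the exact line format shown in this capture. It extracts each state change and measures how long the device stayed in the abnormal (slave) state before returning to normal.

```python
import re
from datetime import datetime

# Matches one entry of the history output, in the format shown in this capture.
ENTRY = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) HRP core state changed, "
    r"old_state = (.+?), new_state = (.+?), "
    r"local_priority = (\d+), peer_priority = (\d+)\."
)

def parse_hrp_history(text):
    """Return the state-change entries as dicts, oldest first."""
    entries = []
    for m in ENTRY.finditer(text):
        ts, old, new, local, peer = m.groups()
        entries.append({
            "time": datetime.strptime(ts, "%Y-%m-%d %H:%M:%S"),
            "old_state": old,
            "new_state": new,
            "local_priority": int(local),
            "peer_priority": int(peer),
        })
    # The device prints newest first; sort so the timeline reads forward.
    entries.sort(key=lambda e: e["time"])
    return entries

def abnormal_durations(entries):
    """Seconds spent in each completed abnormal period."""
    durations = []
    start = None
    for e in entries:
        if e["new_state"].startswith("abnormal"):
            start = e["time"]
        elif e["new_state"] == "normal" and start is not None:
            durations.append(int((e["time"] - start).total_seconds()))
            start = None
    return durations

# Three lines copied from the capture above.
SAMPLE = """
 2016-11-15 18:18:55 HRP core state changed, old_state = normal, new_state = abnormal(slave), local_priority = 47000, peer_priority = 47002.
 2016-11-15 18:18:06 HRP core state changed, old_state = abnormal(slave), new_state = normal, local_priority = 47002, peer_priority = 47002.
 2016-11-15 18:15:04 HRP core state changed, old_state = normal, new_state = abnormal(slave), local_priority = 47000, peer_priority = 47002.
"""

if __name__ == "__main__":
    print(abnormal_durations(parse_hrp_history(SAMPLE)))
```

Run over the full capture, this kind of summary makes the pattern easier to see: the local priority drops from 47002 to 47000 at every switch to abnormal(slave), and each abnormal period lasts about three minutes.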

Handling Process

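1. Analysis of the firewall logs showed that ip-link 1 kept going up and down, and ip-link 1 is an object monitored by hot standby. Each up/down transition of ip-link 1 therefore changes the HRP priority (local_priority alternates between 47002 and 47000 in the history above), which causes the frequent active/standby switchovers.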

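2. While the active/standby state was normal, pinging the next-hop address (the NE) of interface 2/2/0 from the firewall showed packet loss. Tracing showed that some packets were not delivered to the main control board, which caused the loss.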

3. The firewall's packet-drop statistics showed a very large number of drops caused by bandwidth limiting.

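4. Checking the configuration showed that a car-policy was configured, but only for specific applications. Further checking showed that the device had not loaded the SA library. In this version, when the SA library is not loaded, the application match condition is ignored, so the car-policy rate-limits all traffic, and the firewall drops packets.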


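5. After the SA library was reloaded, retesting showed no further problems. Loading the SA library depends on the file sa_lu_last_type_file.txt on the CF card. This file may have been corrupted during the power loss, causing the SA library load to fail.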
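
The effect described in step 4 can be illustrated with a toy model. This is only a sketch of the behavior as reported in this case, not the device's actual implementation: the function `car_policy_applies`, the set `POLICY_APPS`, and the application names are all hypothetical.

```python
def car_policy_applies(flow_app, policy_apps, sa_library_loaded):
    """Toy model: return True if a flow is subject to the car-policy rate limit."""
    if not sa_library_loaded:
        # Behavior reported in step 4: without the SA library the application
        # condition is ignored, so every flow falls under the rate limit.
        return True
    # With the SA library loaded, only flows of the configured applications match.
    return flow_app in policy_apps

# Hypothetical application category that the car-policy is meant to limit.
POLICY_APPS = {"p2p"}

if __name__ == "__main__":
    for loaded in (True, False):
        limited = car_policy_applies("icmp", POLICY_APPS, loaded)
        print(f"SA library loaded={loaded}: icmp rate-limited={limited}")
```

Under this model, traffic outside the configured applications, such as the ping to the next hop in step 2, becomes rate-limited only when the SA library is missing. That is consistent with the bandwidth-limit drops seen in step 3 and with the ip-link 1 flapping in step 1.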











Root Cause
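The file sa_lu_last_type_file.txt was probably corrupted by the power loss or a similar event. As a result, the SA library failed to load and the car-policy matched traffic incorrectly, which caused the packet loss.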
Solution
Reload the SA library.
