S5700EI Cannot Synchronize with the Clock Server over NTP

Published: 2014-12-31
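Problem Description
With ntp-service unicast-server 10.10.203.50 configured, the switch does not synchronize at all.
After the configuration is changed to ntp-service unicast-peer 10.10.203.50, the switch still does not synchronize, although 10.10.203.50 responds to ping.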
Alarm Information
NTP debugging output collected from the switch:
Nov  3 2014 01:37:17.850.1-05:13 Core_Switch_1 NTP/7/NTP_DBG_Pkt4_Send:
FileId [ 084 ], LineNo [ 00170 ]-
packet to 10.10.203.50 (port: 123) from 10.122.86.14 (port: 123)
leap: 3, version: 3, mode: 1
stratum: 0, poll: 64, precision: 2^18
rdel: 0.000, rdsp: 5.722, refid: 0.0.0.0
reftime: 00:00:00.000 UTC Jan 1 1900(00000000.00000000)
orgtime: 00:00:00.000 UTC Jan 1 1900(00000000.00000000)
rectime: 00:00:00.000 UTC Jan 1 1900(00000000.00000000)
xmttime: 06:50:37.857 UTC Nov 3 2014(D801A43D.DBA3DB3B)
<Core_Switch_1>
Nov  3 2014 01:37:17.900.1-05:13 Core_Switch_1 NTP/7/NTP_DBG_Pkt4_Recv:
FileId [ 065 ], LineNo [ 00180 ]-
packet from 10.10.203.50 (port: 123) to 10.122.86.14 (port: 123) on Vlanif1
leap: 0, version: 3, mode: 4
stratum: 2, poll: 64, precision: 2^6
rdel: 0.000, rdsp: 10862.427, refid: 10.10.203.201
reftime: 10:31:11.984 UTC Nov 2 2014(D800866F.FC1D125F)
orgtime: 06:50:37.857 UTC Nov 3 2014(D801A43D.DBA3DB3B)
rectime: 06:50:37.872 UTC Nov 3 2014(D801A43D.DF710A2E)
xmttime: 06:50:37.872 UTC Nov 3 2014(D801A43D.DF710A2E)
inptime: 06:50:37.909 UTC Nov 3 2014(D801A43D.E8DCCA70)
<Core_Switch_1>
Nov  3 2014 01:38:22.860.2-05:13 Core_Switch_1 NTP/7/NTP_DBG_Event:
FileId [ 084 ], LineNo [ 02553 ]- Poll interval expired for 10.10.203.50
<Core_Switch_1>id
Nov  3 2014 01:38:22.870.1-05:13 Core_Switch_1 NTP/7/NTP_DBG_Event:
FileId [ 014 ], LineNo [ 00127 ]- Event: 0x83, EventNum: 0x0a, Peer: 10.10.203.50
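
The four timestamps in the received packet above can be decoded to check whether the reply itself was timely. The following is a minimal sketch in Python, not part of the original case: the helper name ntp_hex_to_seconds is hypothetical, and the delay and offset expressions are the standard NTP on-wire calculations applied to the hex values copied from the debug output.

```python
# Each NTP timestamp is 32 bits of seconds since 1900-01-01 UTC
# followed by 32 bits of binary fraction, printed here as hex.
NTP_FRACTION = 2 ** 32


def ntp_hex_to_seconds(stamp: str) -> float:
    """Convert 'SSSSSSSS.FFFFFFFF' (hex) to seconds since 1900-01-01 UTC.

    A Python float keeps roughly microsecond resolution at this magnitude,
    which is enough for a millisecond-level check.
    """
    seconds_hex, fraction_hex = stamp.split(".")
    return int(seconds_hex, 16) + int(fraction_hex, 16) / NTP_FRACTION


# Values copied from the NTP_DBG_Pkt4_Recv output above.
t1 = ntp_hex_to_seconds("D801A43D.DBA3DB3B")  # orgtime: request left the switch
t2 = ntp_hex_to_seconds("D801A43D.DF710A2E")  # rectime: request reached the server
t3 = ntp_hex_to_seconds("D801A43D.DF710A2E")  # xmttime: reply left the server
t4 = ntp_hex_to_seconds("D801A43D.E8DCCA70")  # inptime: reply reached the switch

delay = (t4 - t1) - (t3 - t2)          # round-trip delay on the wire
offset = ((t2 - t1) + (t3 - t4)) / 2   # server clock minus switch clock

print(f"round-trip delay: {delay * 1000:.1f} ms")
print(f"clock offset:     {offset * 1000:.1f} ms")
```

With these values the delay comes out at roughly 52 ms and the offset at roughly -11 ms, so the exchange itself looks healthy. That leaves the header fields of the reply, such as rdsp, as the place to look for the reason the packet was rejected.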
NTP status:
<Core_Switch_1>dis ntp-service status
clock status: unsynchronized
clock stratum: 16
reference clock ID: none
nominal frequency: 60.0002 Hz
actual frequency: 60.0002 Hz
clock precision: 2^18
clock offset: 0.0000 ms
root delay: 0.00 ms
root dispersion: 5.73 ms
peer dispersion: 0.00 ms
reference time: 00:00:00.000 UTC Jan 1 1900(00000000.00000000)
Reason for loss of synchronization:
<Core_Switch_1>display ntp-service event clock-unsync
  1. Clock source   : 10.10.203.50
     Session type   : active, configured
     Unsync reason  : Peer reachability lost
     Unsync time    : 2014-11-03 00:50:54-05:13

  2. Clock source   : 10.10.203.50
     Session type   : active, configured
     Unsync reason  : Peer reachability lost
     Unsync time    : 2014-11-03 01:27:38-05:13
Packet statistics:
<Core_Switch_1>display ntp-service statistics packet
NTP IPv4 Packet Statistical Information
---------------------------------------
Sent                                  : 63
    Send failures                      : 0
Received                              : 63             //the server replied to every request
    Processed                          : 16
    Dropped                            : 47            //but most of the replies were dropped
       Validity test failures          : 0
          Authentication failures      : 0
       Invalid packets                 : 47
       Access denied                   : 0
       Rate-limited                    : 0
       Processing delay                : 0
       Interface disabled              : 0
       Max dynamic association reached : 0
       Server disabled                 : 0
       Others                          : 0
Last 10 packets drop reasons:
   [2014-11-03 01:29:46-05:13]Global drop: Received invalid packet.
   [2014-11-03 01:30:50-05:13]Global drop: Received invalid packet.
   [2014-11-03 01:31:54-05:13]Global drop: Received invalid packet.
   [2014-11-03 01:32:58-05:13]Global drop: Received invalid packet.
   [2014-11-03 01:34:04-05:13]Global drop: Received invalid packet.
   [2014-11-03 01:35:08-05:13]Global drop: Received invalid packet.
   [2014-11-03 01:36:14-05:13]Global drop: Received invalid packet.
   [2014-11-03 01:37:17-05:13]Global drop: Received invalid packet.
   [2014-11-03 01:38:22-05:13]Global drop: Received invalid packet.
   [2014-11-03 01:39:28-05:13]Global drop: Received invalid packet.
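Handling Process
The collected information shows that the server replied to every request packet, but the switch discarded most of the replies as invalid.
The debugging output shows that the received NTP packets carry an rdsp (root dispersion) value of 10862 ms.
RFC 1305 sets the maximum synchronization distance to 1 second, and the distance calculation includes the value of peer.rootdispersion.
With rdsp at 10862 ms, the resulting distance is certain to exceed the 1-second threshold, so the switch treats the received packet as invalid and discards it without processing.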
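
The threshold check described above can be illustrated with a short calculation. This is a sketch under stated assumptions, not the switch's actual code: the function sync_distance is hypothetical, and it uses a simplified form of the synchronization distance (root dispersion plus peer dispersion plus half of the root and peer delays). The exact terms in the device's implementation may differ, but every term is non-negative, so the distance can never be smaller than the root dispersion alone.

```python
MAX_DISTANCE_S = 1.0  # maximum synchronization distance cited in the text


def sync_distance(root_dispersion_s: float,
                  root_delay_s: float,
                  peer_dispersion_s: float = 0.0,
                  peer_delay_s: float = 0.0) -> float:
    """Simplified synchronization distance in seconds (assumed form)."""
    return (root_dispersion_s + peer_dispersion_s
            + (abs(root_delay_s) + abs(peer_delay_s)) / 2.0)


# rdsp and rdel from the received packet, converted from ms to seconds.
before_fix = sync_distance(root_dispersion_s=10.862427, root_delay_s=0.0)
# The same packet if the server advertised a root dispersion of 0.
after_fix = sync_distance(root_dispersion_s=0.0, root_delay_s=0.0)

for label, distance in (("before", before_fix), ("after", after_fix)):
    verdict = "rejected" if distance > MAX_DISTANCE_S else "acceptable"
    print(f"{label}: distance = {distance:.3f} s -> {verdict}")
```

Even with the peer dispersion and delay terms left at zero, the advertised root dispersion alone puts the distance more than ten times over the limit.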

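The peer is a blade server running Windows 2008 (Windows Server is commonly used for clock servers at present). Make the following changes on the blade server:
1. Change the value of HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\services\W32Time\Config\LocalClockDispersion from 10 to 0.
2. Run "w32tm /config /update" in a cmd window so that the time service applies the changed configuration. Running this command does not affect normal services.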
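
After the two steps above, one way to confirm the change is to read the root dispersion field directly from a reply of the time server. The following Python sketch is an illustration and was not part of the original case: the function names parse_ntp_header and query_root_dispersion are hypothetical, and the code assumes the standard 48-byte NTP header layout, in which root delay and root dispersion are 32-bit fixed-point values with 16 fractional bits.

```python
import socket
import struct


def parse_ntp_header(data: bytes) -> dict:
    """Decode the first 16 bytes of an NTP packet header."""
    if len(data) < 48:
        raise ValueError("NTP packet must be at least 48 bytes")
    flags, stratum, _poll, _precision, root_delay, root_disp, _refid = \
        struct.unpack("!BBbbiI4s", data[:16])
    return {
        "leap": flags >> 6,
        "version": (flags >> 3) & 0x7,
        "mode": flags & 0x7,
        "stratum": stratum,
        "root_delay_s": root_delay / 65536.0,      # 16.16 fixed point
        "root_dispersion_s": root_disp / 65536.0,  # 16.16 fixed point
    }


def query_root_dispersion(server: str, timeout: float = 2.0) -> float:
    """Send one client-mode request and return the advertised root dispersion."""
    request = bytes([0x1B]) + bytes(47)  # LI=0, version=3, mode=3 (client)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(request, (server, 123))
        data, _ = sock.recvfrom(512)
    return parse_ntp_header(data)["root_dispersion_s"]
```

Calling query_root_dispersion("10.10.203.50") from a host that can reach the server should return a value well below 1 second once the registry change has taken effect. A value near 10 seconds would suggest the time service has not yet picked up the new setting.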

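Solution
PCs do not check this value, which is why they can synchronize with the clock server successfully. The problem was resolved after the clock server was modified so that it no longer sends the large root dispersion value.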
