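Published: 2020-09-29 | Views: 2267 | Downloads: 0 | Author: qWX927369 | Document ID: EKB1100056929
Hardware configuration: CH121 V5 + 3004 + OS installed on M.2
Symptom: Several customers in the field reported that their ESXi systems stopped responding for no apparent reason and recovered after a reboot: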
Customer1:
Issue description:
ESXI host was not available from vCenter.
Host1: 210305769910J7000271, occurrence time as follows:
Customer2:
Issue description:
ESXI host was not available from vCenter.
The host reports: "boot bank cannot be found at path '/bootbank', hostd detected to be unresponsive."
The customer contacted VMware, and VMware responded as follows:
"There are such events in the log:
2018-10-11T05:44:07.606Z cpu40:66691)NMP:nmp_ThrottleLogForDevice:3576: last error status from device naa.6a4be2babed7a00023166c0367fc1d04 repeated 20480 times ...
2018-10-11T05:44:24.766Z cpu25:66081)lsi_mr3:mfi_TaskMgmt:693: ABORT request for SN 5402802 Wld 68023
2018-10-11T05:44:24.767Z cpu42:65675)ScsiDeviceIO: 2968: Cmd(0x439e0aa68340) 0x28, CmdSN 0x5270b2 from world 68023 to dev "naa.6a4be2babed7a00023166c0367fc1d04" failed H:0x5 D:0x0 P:0x0 Invalid sense data: 0x0 0x0 0x0.
2018-10-11T05:44:24.767Z cpu42:65675)ScsiDeviceIO: 2968: Cmd(0x439e017753c0) 0x28, CmdSN 0x5270b0 from world 68023 to dev "naa.6a4be2babed7a00023166c0367fc1d04" failed H:0x3 D:0x0 P:0x0 Invalid sense data: 0x80 0x41 0x0.
2018-10-11T05:44:24.767Z cpu42:65675)ScsiDeviceIO: 2968: Cmd(0x439e016e9e40) 0x28, CmdSN 0x5270b1 from world 68023 to dev "naa.6a4be2babed7a00023166c0367fc1d04" failed H:0x5 D:0x0 P:0x0 Invalid sense data: 0x7f 0x41 0x0.
These events are related to the local disk:
Local AVAGO Disk(naa.6a4be2babed7a00023166c0367fc1d04) VMW_SATP_LOCAL VMW_PSP_RR
SCSI codes for these events:
Host Status [0x5] ABORT This status is returned if the driver has to abort commands in-flight to the target. This can occur due to a command timeout or parity error in the frame.
Host Status [0x3] TIME_OUT This status is returned when the command in-flight to the array times out.
Additional Sense Data 41/00 DATA PATH FAILURE (SHOULD USE 40 NN)
That may indicate an issue with the controller or disk group. Please contact your vendor for further analysis."
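The status decoding described in the VMware response can be sketched in a few lines of Python. This is a minimal illustration, not a VMware tool: the function name `decode_scsi_failure`, the table `HOST_STATUS`, and the regular expression are hypothetical and are modeled only on the three `ScsiDeviceIO` lines quoted above. Only the two host status codes explained in the response (0x3 and 0x5) are mapped; any other value is reported as unmapped rather than guessed.

```python
import re

# Host status codes taken from the VMware response quoted above.
# Other codes are intentionally left out instead of being guessed.
HOST_STATUS = {0x3: "TIME_OUT", 0x5: "ABORT"}

# Hypothetical pattern modeled on the quoted ScsiDeviceIO lines.
# Whitespace is optional in places because copied logs often lose spaces.
FAILURE_RE = re.compile(
    r'dev\s*"(?P<dev>[^"]+)"\s*failed\s+'
    r'H:(?P<h>0x[0-9a-fA-F]+)\s+'
    r'D:(?P<d>0x[0-9a-fA-F]+)\s+'
    r'P:(?P<p>0x[0-9a-fA-F]+)'
    r'(?:\s*Invalid\s*sense\s*data:\s*'
    r'(?P<sense>0x[0-9a-fA-F]+(?:\s+0x[0-9a-fA-F]+)*))?'
)


def decode_scsi_failure(line):
    """Return the device and status fields of a failed command, or None."""
    match = FAILURE_RE.search(line)
    if match is None:
        return None
    host = int(match.group("h"), 16)
    sense_text = match.group("sense")
    sense = [int(tok, 16) for tok in sense_text.split()] if sense_text else []
    return {
        "device": match.group("dev"),
        "host_status": host,                      # H: field
        "host_status_name": HOST_STATUS.get(host, "unmapped"),
        "device_status": int(match.group("d"), 16),   # D: field
        "plugin_status": int(match.group("p"), 16),   # P: field
        "sense": sense,                           # raw bytes as logged
    }


if __name__ == "__main__":
    sample = (
        '2018-10-11T05:44:24.767Z cpu42:65675)ScsiDeviceIO: 2968: '
        'Cmd(0x439e017753c0) 0x28, CmdSN 0x5270b0 from world 68023 to dev '
        '"naa.6a4be2babed7a00023166c0367fc1d04" failed H:0x3 D:0x0 P:0x0 '
        'Invalid sense data: 0x80 0x41 0x0.'
    )
    print(decode_scsi_failure(sample))
```

Running the function over every line of a saved kernel log and counting the results per device and host status gives a quick view of whether a host logged any failed I/O at all, which is the distinction drawn between the two servers analyzed in this case.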
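Server 1, SN: XXXXX: the heartbeat was lost after 22:00 on 3-14.
According to the kernel log, the system started booting at 22:33.
The pxa log also shows an error at 21:35, when the current host status was queried.
The kernel log could not be written.
Regarding the I/O errors recorded in the kernel log, log analysis found no errors reported by the hard disks or the RAID controller.
Server 2: XXXXX
The logs record the heartbeat stopping and the host rebooting.
Server 2 also has records of the hostd service stopping, but this server has no I/O error records.
For 3004 + VMware systems where local storage stops responding, log information cannot be written, and the upper-layer VMware system becomes unresponsive, refer to the reference cases and start by addressing the RAID controller's matching driver and firmware.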