FusionCompute VM as Quorum Server with Active-Active 5300V3 Storage: VM Migrates During Testing, No Error Reported on FC

Published: 2017-01-22
Problem Description

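A VM on FusionCompute serves as the quorum server for two 5300V3 storage arrays configured as active-active. During active-active functional testing, the VMs on one of the compute nodes were migrated, but FusionCompute (FC) reported no error.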

The customer repeated the test several times and saw the same behavior every time.

Alarm Information




Handling Process

Logs were collected from the compute node and the storage side and analyzed. The relevant entries are:

Dec 28 11:00:02 CNA-01 multipathd: 8:64: mark as failed

Dec 28 11:00:02 CNA-01 multipathd: 369ce37410077ca7f000ada9400000000: remaining active paths: 1

Dec 28 11:00:02 CNA-01 kernel: [12702178.160109] sd 13:0:3:1: rejecting I/O to offline device

Dec 28 11:00:02 CNA-01 kernel: [12702178.160114] sd 13:0:3:1: [sde] killing request

Dec 28 11:00:02 CNA-01 kernel: [12702178.160119] sd 13:0:3:1: rejecting I/O to offline device

Dec 28 11:00:02 CNA-01 kernel: [12702178.160126] sd 13:0:3:1: rejecting I/O to offline device

Dec 28 11:00:02 CNA-01 kernel: [12702178.160138] sd 13:0:3:1: [sde] 

Dec 28 11:00:02 CNA-01 kernel: [12702178.160141] device-mapper: multipath: Failing path 8:64.

Dec 28 11:00:02 CNA-01 kernel: [12702178.160144] Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK

Dec 28 11:00:02 CNA-01 kernel: [12702178.160146] sd 13:0:3:1: [sde] CDB: Write(10): 2a 00 05 41 64 88 00 00 08 00

Dec 28 11:00:02 CNA-01 kernel: [12702178.160153] end_request: I/O error, dev sde, sector 88171656

Dec 28 11:00:02 CNA-01 kernel: [12702178.168102] sd 13:0:0:1: rejecting I/O to offline device

Dec 28 11:00:02 CNA-01 kernel: [12702178.168105] sd 13:0:0:1: [sdd] killing request

Dec 28 11:00:02 CNA-01 kernel: [12702178.168109] sd 13:0:0:1: rejecting I/O to offline device

Dec 28 11:00:02 CNA-01 kernel: [12702178.168110] sd 13:0:0:1: [sdd] killing request

Dec 28 11:00:02 CNA-01 kernel: [12702178.168113] sd 13:0:0:1: rejecting I/O to offline device

Dec 28 11:00:02 CNA-01 kernel: [12702178.168122] sd 13:0:0:1: [sdd] 

Dec 28 11:00:02 CNA-01 kernel: [12702178.168125] sd 13:0:0:1: [sdd]  Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK

Dec 28 11:00:02 CNA-01 kernel: [12702178.168129] sd 13:0:0:1: [sdd] CDB: Write(10)Result: hostbyte=DID_NO_CONNECT driverbyte=DRIVER_OK

Dec 28 11:00:02 CNA-01 kernel: [12702178.168133] sd 13:0:0:1: [sdd] CDB: Read(10):: 2a 28 00 00 c3 00 00 ac 08 31 80 8a 00 00 00 00 06 03 00 00

Dec 28 11:00:02 CNA-01 kernel: [12702178.168149]

Dec 28 11:00:02 CNA-01 kernel: [12702178.168151] end_request: I/O error, dev sdd, sector 12672

Dec 28 11:00:02 CNA-01 kernel: [12702178.168153] end_request: I/O error, dev sdd, sector 3282831498

Dec 28 11:00:02 CNA-01 kernel: [12702178.168157] device-mapper: multipath: Failing path 8:48.

Dec 28 11:00:03 CNA-01 multipathd: checker failed path 8:48 in map 369ce37410077ca7f000ada9400000000

Dec 28 11:00:03 CNA-01 multipathd: 369ce37410077ca7f000ada9400000000: Entering recovery mode: max_retries=18

Dec 28 11:00:03 CNA-01 multipathd: 369ce37410077ca7f000ada9400000000: remaining active paths: 0

Dec 28 11:00:03 CNA-01 multipathd: 369ce37410077ca7f000ada9400000000: Entering recovery mode: max_retries=18

Dec 28 11:00:13 CNA-01 ft-ctl: [INFO]:send HEARTBEART_MESSGE again count = 2538540

Dec 28 11:00:14 CNA-01 tapmanager: [INFO][pthread_daemon_undead:436]: send HEARTBEART_MESSGE again count = 2538540

Dec 28 11:00:27 CNA-01 kernel: [12702203.100057]  rport-1:0-2: blocked FC remote port time out: removing target and saving binding

Dec 28 11:00:27 CNA-01 kernel: [12702203.100126]  rport-1:0-6: blocked FC remote port time out: removing target and saving binding
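The log analysis above can be reproduced with a short script. The Python sketch below is not part of the original case and is offered only as an illustration: it parses the `multipathd` lines quoted above, reports when the map first reached zero active paths, converts the failed device numbers to disk names (assuming the standard Linux `sd` numbering: block major 8, 16 minors per disk), and decodes the WRITE(10) CDB using the standard SCSI layout (LBA in bytes 2 to 5, transfer length in bytes 7 to 8). The function names are hypothetical.

```python
import re
from typing import List, Optional, Tuple

# multipathd entries copied from the CNA-01 log above
LOG = """\
Dec 28 11:00:02 CNA-01 multipathd: 8:64: mark as failed
Dec 28 11:00:02 CNA-01 multipathd: 369ce37410077ca7f000ada9400000000: remaining active paths: 1
Dec 28 11:00:03 CNA-01 multipathd: checker failed path 8:48 in map 369ce37410077ca7f000ada9400000000
Dec 28 11:00:03 CNA-01 multipathd: 369ce37410077ca7f000ada9400000000: remaining active paths: 0
"""

# "<timestamp> <host> multipathd: <wwid>: remaining active paths: <n>"
REMAINING = re.compile(
    r"^(\w{3} +\d+ \d\d:\d\d:\d\d) \S+ multipathd: (\w+): remaining active paths: (\d+)$"
)


def path_timeline(log: str) -> List[Tuple[str, str, int]]:
    """Return (timestamp, wwid, remaining_paths) for each 'remaining active paths' line."""
    events = []
    for line in log.splitlines():
        m = REMAINING.match(line)
        if m:
            events.append((m.group(1), m.group(2), int(m.group(3))))
    return events


def first_total_loss(events) -> Optional[Tuple[str, str]]:
    """Timestamp and WWID of the first map that dropped to zero active paths."""
    for ts, wwid, remaining in events:
        if remaining == 0:
            return ts, wwid
    return None


def sd_name(devno: str) -> str:
    """Map an sd-driver major:minor (major 8, 16 minors per disk) to its sdX name."""
    major, minor = map(int, devno.split(":"))
    if major != 8:
        raise ValueError("this sketch only handles block major 8 (sda..sdp)")
    return "sd" + chr(ord("a") + minor // 16)


def decode_rw10(cdb_hex: str) -> Tuple[int, int]:
    """Return (LBA, transfer length in blocks) from a READ(10)/WRITE(10) CDB."""
    cdb = bytes.fromhex(cdb_hex)
    if cdb[0] not in (0x28, 0x2A):
        raise ValueError("not a READ(10)/WRITE(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")     # bytes 2..5: logical block address
    length = int.from_bytes(cdb[7:9], "big")  # bytes 7..8: transfer length
    return lba, length


if __name__ == "__main__":
    print(first_total_loss(path_timeline(LOG)))
    print(sd_name("8:64"), sd_name("8:48"))
    print(decode_rw10("2a 00 05 41 64 88 00 00 08 00"))
```

On the quoted lines, the map `369ce37410077ca7f000ada9400000000` reaches zero active paths at 11:00:03, paths 8:64 and 8:48 resolve to `sde` and `sdd` (the disks the kernel reports as offline with `DID_NO_CONNECT`), and the WRITE(10) CDB decodes to LBA 88171656, matching the `sector 88171656` I/O error on `sde`.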


Root Cause

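The logs show that when the customer powered off the storage array at the primary site during the test, host CNA1 (CNA-01 in the logs) lost access to storage: both paths of multipath device 369ce37410077ca7f000ada9400000000 failed with DID_NO_CONNECT, and the number of remaining active paths dropped to 0. As a result, the VMs on CNA1 were migrated. The underlying cause is that generic multipathing was selected when FusionCompute was installed, and generic multipathing does not support the active-active feature.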
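The effect described above can be illustrated with a deliberately simplified model. The Python sketch below is hypothetical: the `Path` and `MultipathMap` classes, the `primary`/`secondary` labels, and the secondary-array device numbers 8:80 and 8:96 are invented for illustration and do not describe how either multipathing product works internally. It only shows why a multipath device whose paths all lead to the powered-off array drops to zero paths, as the map in the log did, whereas a device that also holds paths to the surviving array keeps serving I/O.

```python
from dataclasses import dataclass, field
from typing import List

WWID = "369ce37410077ca7f000ada9400000000"  # map name taken from the log


@dataclass
class Path:
    devno: str           # host block device major:minor
    array: str           # hypothetical label: which array this path leads to
    active: bool = True


@dataclass
class MultipathMap:
    wwid: str
    paths: List[Path] = field(default_factory=list)

    def fail_array(self, array: str) -> int:
        """Mark every path to `array` as failed; return the remaining active paths."""
        for p in self.paths:
            if p.array == array:
                p.active = False
        return sum(p.active for p in self.paths)


def single_array_map() -> MultipathMap:
    # Only the two paths seen in the log, assumed here to lead to the primary array
    return MultipathMap(WWID, [Path("8:48", "primary"), Path("8:64", "primary")])


def dual_array_map() -> MultipathMap:
    # Same LUN plus two hypothetical paths to the secondary (surviving) array
    return MultipathMap(WWID, [
        Path("8:48", "primary"), Path("8:64", "primary"),
        Path("8:80", "secondary"), Path("8:96", "secondary"),
    ])


if __name__ == "__main__":
    print(single_array_map().fail_array("primary"))  # 0: every path lost, I/O fails
    print(dual_array_map().fail_array("primary"))    # 2: I/O continues via secondary
```

The point of the model is only the path count: when the primary array is powered off, a map built from primary-array paths alone ends at 0, which matches `remaining active paths: 0` in the log.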

Solution

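The problem was resolved by switching to Huawei's dedicated multipathing software. The customer had originally chosen generic multipathing because they needed to connect storage from other vendors.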

Suggestions and Summary

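The customer was informed that Huawei's multipathing software is not compatible with other vendors' storage. However, the heterogeneous takeover feature of V3 storage can be used to meet the requirement of connecting third-party storage.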

END