1. Because the customer had already run the Microsoft cluster verification tool against the network and it reported no errors in the networking configuration, we checked the SCSI-3 persistent reservation first. Failover fails if the new node cannot obtain the reservation.
As shown below, log in to the storage CLI, switch to diagnose mode, and run "scsi show reservation lun [ -l LUN ID]" to query the LUN reservation. Check the SCSI reservation state and the InitiatorWWN. In this case the reservation had moved from the old host node to the new one, so the reservation itself was working.
2. To rule out the multipath software, we tried both installing and uninstalling Huawei UltraPath, but neither made a difference.
3. We used the UltraPath hostinfo_tool to collect all Windows host logs. In the Windows system event log (systemeventlog\System.evtx) we found an alarm, as shown below:
We looked up the suggested resolution on Microsoft TechNet and tried it, but that did not work either.
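4. We collected the storage system logs to check for anything abnormal on the storage side. Both controllers had logged a large number of ping timeouts on the iSCSI links (search for "[ERR][Ping"), as shown below:
This indicates that two of the iSCSI links had a problem.
We then pinged the storage iSCSI service IP addresses from the hosts, and every ping succeeded. Note that a default ping sends only a small payload, so it cannot reveal a problem that affects large frames only.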
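The log search in step 4 can be sketched as a small script. This is only an illustration: the marker string "[ERR][Ping" comes from the case above, but the function names and the idea of passing one exported log file per controller are assumptions, and the script does not depend on any particular log line format.

```python
import sys
from typing import Dict, Iterable, List

# Search string quoted in step 4 of this case.
MARKER = "[ERR][Ping"


def count_ping_errors(lines: Iterable[str]) -> int:
    """Count the lines that contain the ping-error marker."""
    return sum(1 for line in lines if MARKER in line)


def count_ping_errors_in_files(paths: List[str]) -> Dict[str, int]:
    """Return {path: number of ping-error lines} for each log file.

    The paths are hypothetical: one exported log file per controller.
    """
    results = {}
    for path in paths:
        # errors="replace" keeps the scan going if the log has odd bytes.
        with open(path, encoding="utf-8", errors="replace") as handle:
            results[path] = count_ping_errors(handle)
    return results


if __name__ == "__main__":
    # Usage: python count_ping_errors.py <logfile> [<logfile> ...]
    for name, hits in count_ping_errors_in_files(sys.argv[1:]).items():
        print(f"{name}: {hits} ping error line(s)")
```

A high count on both controllers, as seen in this case, points at the iSCSI network path rather than at a single controller.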
5. We found that the customer had changed the MTU of the storage ports from the default 1500 to 9000, so we asked the customer to check whether the MTU was set incorrectly on the hosts or the switch.
The customer reported that all switch ports were set to 9216 (the maximum), but one host was set to 1500 while the other was set to 9000. After the customer changed that host from 1500 to 9000, the failover issue was resolved.
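The MTU check in step 5 can be sketched as follows. The MTU values are the ones reported in this case; the device labels ("host1", "host2", "switch") and the IP address 192.0.2.10 are hypothetical placeholders. The sketch assumes IPv4 with a 20-byte IP header and an 8-byte ICMP header, and uses the Windows ping options -f (do not fragment) and -l (payload size) to build a test that, unlike a default ping, only succeeds when the whole path carries full-size frames.

```python
from typing import Dict, List

IPV4_HEADER = 20  # bytes, assuming no IP options
ICMP_HEADER = 8   # bytes


def df_ping_payload(mtu: int) -> int:
    """Largest ICMP payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - ICMP_HEADER


def endpoint_mismatches(endpoints: Dict[str, int], expected: int) -> Dict[str, int]:
    """End points (hosts, storage ports) must all use the same MTU."""
    return {name: mtu for name, mtu in endpoints.items() if mtu != expected}


def undersized_switches(switches: Dict[str, int], expected: int) -> Dict[str, int]:
    """Switch ports only need an MTU at least as large as the end points."""
    return {name: mtu for name, mtu in switches.items() if mtu < expected}


def windows_df_ping_command(target: str, mtu: int) -> List[str]:
    """Build a Windows ping command with Don't Fragment set."""
    return ["ping", "-f", "-l", str(df_ping_payload(mtu)), target]


if __name__ == "__main__":
    expected = 9000  # MTU configured on the storage ports
    endpoints = {"storage": 9000, "host1": 1500, "host2": 9000}  # labels are hypothetical
    switches = {"switch": 9216}

    print("Mismatched end points:", endpoint_mismatches(endpoints, expected))
    print("Undersized switches:", undersized_switches(switches, expected))
    # 192.0.2.10 is a placeholder for a storage iSCSI service IP.
    print("Test command:", " ".join(windows_df_ping_command("192.0.2.10", expected)))
```

With the values from this case, only the host set to 1500 is flagged; the switch ports at 9216 are fine because they are larger than the end-point MTU.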
6. However, the customer still found the storage very slow. For example, scanning a disk or failing over a disk on MSFC took minutes. We checked the host configuration through a remote session and found that the customer had enabled "Jumbo Packet" and set the MTU to 9000. We changed it to 9014, and the problem was resolved.
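A likely explanation for the 9000 versus 9014 difference is that the "Jumbo Packet" value on this NIC driver counts the 14-byte Ethernet header in addition to the IP MTU. This is an assumption about the driver in this case; some drivers expect the IP MTU without the header, so check the vendor documentation before applying it elsewhere. The arithmetic, with hypothetical function and parameter names, is:

```python
# Ethernet II header: destination MAC (6) + source MAC (6) + EtherType (2).
ETHERNET_HEADER = 6 + 6 + 2


def jumbo_packet_value(ip_mtu: int, driver_counts_header: bool = True) -> int:
    """Value to enter for "Jumbo Packet" so the host carries `ip_mtu`.

    driver_counts_header is an assumption about the NIC driver: when True,
    the setting includes the Ethernet header; when False, it is the IP MTU.
    """
    return ip_mtu + ETHERNET_HEADER if driver_counts_header else ip_mtu


if __name__ == "__main__":
    print(jumbo_packet_value(9000))         # setting that includes the header
    print(jumbo_packet_value(9000, False))  # setting that is the plain IP MTU
```

Under this assumption, a "Jumbo Packet" value of 9000 leaves the host with an effective IP MTU of 8986, which is smaller than the 9000 configured on the storage ports.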