VMware datastores on OceanStor 5500 V3 enter a failed state and cannot be mounted

Publication Date: 2016-03-01
Issue Description

The customer started a massive migration of virtual machines from one datastore to several others in VMware 5.5. After several hours, some of the target datastores entered a failed state and could not be mounted. The source and target datastores were located on the same OceanStor 5500 V3 system but in different pools: the source in a SAS-tier pool and the targets in an NL-SAS-tier pool.

At the same time, the LUN devices backing the failed datastores were still visible in VMware, and all paths were active.



Alarm Information
No alarms were present on the storage device while the issue occurred.
Handling Process

During the analysis, the following log messages helped identify the cause:

1. Bad block records present in the storage log


2. Path flapping records in the VMware logs


Root Cause

The problem is a compatibility issue between the OceanStor 5500 V3 software version and 4K-sector disks, described in trouble ticket DTS2015091001112.

Trouble Ticket Number

DTS2015091001112

Symptom

When random small read and write I/Os are sent to a 4K-sector disk, services are occasionally interrupted when the stripe depth of the storage pool is greater than or equal to 128 KB.

Severity

Suggestion.

Root Cause

If the stripe depth of the storage pool is 128 KB or greater and the upper-layer services issue random small I/Os, the upper-layer module merges the I/Os sent to the 4K-sector disk into larger I/Os no bigger than the stripe depth. Occasionally, the number of memory pages in a merged I/O exceeds the maximum that the lower-layer module accepts for a single I/O, so the lower-layer module splits the merged I/O. The resulting child I/Os may not be a multiple of the I/O size accepted by the 4K-sector disk; such reads and writes fail, which occasionally interrupts services.
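To make this failure mode concrete, here is a minimal Python sketch of a split driven only by page count. It is a hypothetical model, not Huawei's implementation: a merged I/O is represented as a list of per-page byte counts, pages are assumed to be possibly only partly filled (an assumption made here to show how a page-count split can produce child sizes that are not 4 KB multiples), and `naive_split`, `misaligned`, and `max_pages` are illustrative names.

```python
from typing import List

SECTOR = 4096  # assumed I/O unit accepted by a 4K-sector disk


def naive_split(segments: List[int], max_pages: int) -> List[int]:
    """Split a merged I/O purely by page count, ignoring sector alignment.

    segments: bytes held in each memory page of the merged I/O (hypothetical
    model; pages may be partly filled). Returns the byte size of each child I/O.
    """
    return [
        sum(segments[start:start + max_pages])
        for start in range(0, len(segments), max_pages)
    ]


def misaligned(children: List[int]) -> List[int]:
    """Child I/O sizes that are not a multiple of SECTOR (rejected by the disk)."""
    return [size for size in children if size % SECTOR]


if __name__ == "__main__":
    # 24 KB merged I/O (a multiple of 4 KB) spread over 7 pages,
    # two of which are only half full.
    merged = [4096, 2048, 4096, 4096, 2048, 4096, 4096]
    children = naive_split(merged, max_pages=4)
    print(children)              # [14336, 10240]
    print(misaligned(children))  # [14336, 10240]
```

Although the merged I/O as a whole is 4 KB aligned, both child I/Os end halfway through a 4 KB unit, which in this model is exactly the case the 4K-sector disk rejects.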

Solution

When the lower-layer module splits I/Os destined for a 4K-sector disk, it checks the size of each child I/O as the child is created. If a child I/O is not a multiple of the I/O size accepted by the 4K-sector disk, the module adjusts that child to a multiple of the accepted size and then continues splitting the remaining data.
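The adjusted splitting can be illustrated with a minimal Python sketch. It models a merged I/O as a list of per-page byte counts (pages may be partly filled), assumes a 4096-byte disk I/O unit, and uses illustrative names (`aligned_split`, `max_pages`); it is not Huawei's code. Each child takes at most `max_pages` pages, and any bytes beyond the last 4 KB boundary are handed back to start the next child instead of being sent to the disk.

```python
from collections import deque
from typing import List

SECTOR = 4096  # assumed I/O unit accepted by a 4K-sector disk


def aligned_split(segments: List[int], max_pages: int) -> List[int]:
    """Split a merged I/O so that every child I/O is a multiple of SECTOR.

    segments: bytes held in each memory page of the merged I/O (hypothetical
    model; pages may be partly filled). Each child uses at most max_pages pages.
    Returns the byte size of each child I/O.
    """
    if sum(segments) % SECTOR:
        raise ValueError("merged I/O is not a multiple of the sector size")
    pending = deque(segments)
    children: List[int] = []
    while pending:
        # Take up to max_pages pages, as the lower-layer module does.
        taken = [pending.popleft() for _ in range(min(max_pages, len(pending)))]
        size = sum(taken)
        if size < SECTOR:
            # Not enough data in max_pages pages to form one aligned child.
            raise ValueError("cannot form an aligned child within max_pages")
        excess = size % SECTOR
        # Hand bytes past the last SECTOR boundary back to the next child,
        # working backwards from the last page taken.
        while excess:
            last = taken.pop()
            if last > excess:
                taken.append(last - excess)  # keep the head of this page
                pending.appendleft(excess)   # its tail starts the next child
                excess = 0
            else:
                pending.appendleft(last)     # return the whole page
                excess -= last
        children.append(sum(taken))
    return children


if __name__ == "__main__":
    merged = [4096, 2048, 4096, 4096, 2048, 4096, 4096]  # 24 KB in 7 pages
    print(aligned_split(merged, max_pages=4))  # [12288, 12288]
```

The sketch rejects inputs it cannot align, for example when `max_pages` pages together hold less than 4 KB; the ticket does not describe how the real firmware handles that case.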

Impact

There is no impact on the system.

The customer's target datastores were located in an NL-SAS pool built from large-capacity 4K-sector disks. The compatibility problem described above interrupted I/O to some of the datastore LUNs, which is why those datastores could not be mounted.

Solution

First, identify which LUNs are placed on 4K-sector disks and are therefore affected by the problem:

In the Running data log, search for Disk Sector; if its value is 4160, the disk is a 4K-sector disk.
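Where many disks are listed, the search can be scripted. The Python sketch below assumes that each entry shows the `Disk Sector` field and its value on the same line, optionally separated by `:` or `=`; this layout, the sample lines, and the name `find_4k_sector_lines` are hypothetical, so adapt the pattern to the actual Running data export. It only flags matching lines; mapping those disks to their pools and LUNs still has to be done from the storage configuration.

```python
import re
from typing import Iterable, List, Tuple

# Assumption: the value follows the "Disk Sector" field name on the same line,
# optionally separated by ":" or "=". Adjust the pattern to the real log layout.
DISK_SECTOR_RE = re.compile(r"Disk Sector\s*[:=]?\s*(\d+)")
FOUR_K_SECTOR_VALUE = 4160  # per this article, 4160 marks a 4K-sector disk


def find_4k_sector_lines(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, line) for each entry whose Disk Sector is 4160."""
    hits = []
    for number, line in enumerate(lines, start=1):
        match = DISK_SECTOR_RE.search(line)
        if match and int(match.group(1)) == FOUR_K_SECTOR_VALUE:
            hits.append((number, line.rstrip("\n")))
    return hits


if __name__ == "__main__":
    # Hypothetical sample lines, not taken from a real log.
    sample = [
        "Disk Sector : 520",
        "Disk Sector : 4160",
    ]
    for number, line in find_4k_sector_lines(sample):
        print(number, line)
```

To scan a real export, pass an open file object instead of the sample list, e.g. `find_4k_sector_lines(open(path))`.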


Then apply the following plan to resolve the problem:

1. Install patch V300R001C20SPH001.

2. Recover the fault pages:

Force a flush of all affected LUNs:

Developer:/> change lun_fault_page recover lun_id_list=<4K LUNs list>

Run this command for all LUNs located on 4K-sector disks.

3. Use the OceanStor Toolkit to run an inspection.

4. Migrate the VMs again and check whether the problem is resolved.
