A customer created multiple LUN clones for different LUNs on a 5800 V3 storage system. During the clone LUN operation, one clone LUN reported the warning: "Switchover failure occurred on the LUN in the storage pool."
The alarm information is as follows:
Why was the owner controller of the slave LUN switched on its own?
According to the logs, before the "LUN switchover failure" warning was reported, alarms were raised about a disconnection between the host initiator and the storage front-end port.
The initiator WWN confirms that the disconnected host is BackUp_DMS_Host:
The running logs confirm that slave LUN 314 is mapped to host BackUp_DMS_Host:
The multipath logs confirm that all four physical links between host BackUp_DMS_Host and the storage were interrupted at some point:
These four physical links have since recovered:
Physical link 1, which connects to target 260b8038bc1df01d (storage controller A, port p3), recovered earlier than physical link 2, which connects to target 26198038bc1df01d (storage controller B, port p1). The interval between the two recoveries was more than 5 s.
When the host multipath software detects that one link has recovered and no other path reports recovery for more than 5 s, it selects the recovered link as the optimal path and sends a command to switch the LUN's owner controller over it. After link 1 recovered, link 2 did not report recovery for more than 5 s, so the multipath software sent a command over link 1 to switch the owner controller of slave LUN 314 to controller A:
Because of the clone relationship, slave LUN 314 could not switch its owner controller, and the alarm was reported. A disconnected link between BackUp_DMS_Host and the storage was also found. A check of the cabling between the host and the storage showed that an optical fiber cable was loose, so the cable was reseated and secured.
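The path-selection behaviour described above can be illustrated with a minimal sketch. It is not the multipath software's actual implementation: the `PathRecovery` type, the `pick_switch_target` function, and the timestamps are hypothetical and exist only to show the 5 s rule. The source does not say what happens when a second path recovers within the window, so the sketch assumes no switch command is issued in that case.

```python
from dataclasses import dataclass
from typing import List, Optional

# From the case: a recovered link is chosen if no other path reports
# recovery for more than 5 s.
SETTLE_WINDOW_S = 5.0


@dataclass
class PathRecovery:
    """Hypothetical record of one path-recovery event."""
    link: str            # e.g. "link 1"
    controller: str      # storage controller the link lands on, e.g. "A"
    recovered_at: float  # seconds on an arbitrary clock


def pick_switch_target(events: List[PathRecovery],
                       window: float = SETTLE_WINDOW_S) -> Optional[PathRecovery]:
    """Return the first recovered path if every other path recovered more
    than `window` seconds later; otherwise return None.

    Returning None when another path recovers in time is an assumption
    made for this sketch.
    """
    if not events:
        return None
    ordered = sorted(events, key=lambda e: e.recovered_at)
    first = ordered[0]
    for other in ordered[1:]:
        if other.recovered_at - first.recovered_at <= window:
            return None  # a second path came back inside the window
    return first
```

With illustrative timestamps, a link to controller A recovering at t = 0 s and a link to controller B recovering at t = 7 s makes the function return the controller A link. That matches the case, where the switch command for LUN 314 targeted controller A.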
The warning "Switchover failure occurred on the LUN in the storage pool" means that switching the owner controller of a specific LUN (ID: 314) failed. The basic information of LUN 314 is as follows:
The owner controller of LUN 314 is controller 0B, and its working controller is also 0B. LUN 314 is a clone LUN whose master LUN is LUN 214:
When a LUN is cloned, the owner controllers of the master LUN and the slave LUN must stay consistent. The information of master LUN 214 is as follows:
The owner controller of clone master LUN 214 is controller 0B, and its working controller is also 0B, so clone slave LUN 314 is consistent with the clone master LUN. To change the owner controller of a LUN clone, the master and slave LUNs must switch ownership at the same time so that their owner controllers stay consistent. If only the slave LUN is switched and the master LUN is not, the management system automatically terminates the switching operation and reports the warning "Switchover failure occurred on the LUN in the storage pool".
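The consistency rule above can be sketched as follows. This is a simplified model rather than the array's management code: the `Lun` type and the `try_switch_owner` function are hypothetical names, and the controller labels are taken from the case only for illustration.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class Lun:
    """Hypothetical minimal view of a LUN in a clone pair."""
    lun_id: int
    owner: str  # owner controller, e.g. "0B"


def try_switch_owner(master: Lun, slave: Lun, targets: Dict[int, str]) -> bool:
    """Apply an owner-controller switch request to a clone pair.

    `targets` maps LUN ID -> requested owner controller. The request is
    applied only if the master and slave would end up on the same
    controller; otherwise nothing changes and False is returned, which
    models the terminated switchover that raises the warning.
    """
    new_master = targets.get(master.lun_id, master.owner)
    new_slave = targets.get(slave.lun_id, slave.owner)
    if new_master != new_slave:
        return False  # owners would diverge: operation terminated
    master.owner, slave.owner = new_master, new_slave
    return True
```

In this model, a request that names only slave LUN 314 is rejected and both LUNs stay on controller 0B, while a request that moves LUN 214 and LUN 314 together to the same controller succeeds.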
As described in the handling process, there was a disconnected link between BackUp_DMS_Host and the storage. A check of the cabling between the host and the storage showed that an optical fiber cable was loose, so the cable was reseated and secured.
In this case, the warning was a side effect of another fault. A simple alarm sometimes shows only the surface of a problem while the real cause stays hidden. Fully understand what the alarm means and what triggers it, then check whether anything else was abnormal around the same time. Sorting out how the faults relate to each other helps locate the real cause.