OceanStor 5800 V3: Storage Reports the Alarm “The link between the initiator and the host port disconnected” While LUNs Are in Use

Publication Date: 2015-12-15
Issue Description

The customer re-planned services on the 5800 V3 storage system, creating and deleting LUNs many times. During this process, the storage reported the alarm “The link between the initiator of the host and the host port disconnected”. However, the physical link between the storage and the host showed no abnormality.

Alarm Information

The alarm is as follows:

Handling Process

This alarm means that the logical link between the host initiator and the storage has been disconnected. The host and the storage are connected over Fibre Channel (FC). When a logical link fails, the front-end FC driver on the storage detects the disconnection, and the storage system prints exception messages similar to the following:

In this case, however, no such messages were printed, which means that the storage did not detect any logical link disconnection between the host and the storage port. A check of the physical link between the host and the storage also found no problems.

Huawei UltraPath is installed on the host. Check the UltraPath log:

The UltraPath log does show exception messages for multiple paths.

Check the storage event records around the time of the alarm. They show that the user deleted and then added LUN mappings before the alarm was reported:

Root Cause

UltraPath cannot distinguish between two scenarios: deletion of a LUN mapping and disconnection of a logical link. As a result, after a LUN mapping is removed, UltraPath records a link disconnection for the deleted LUN in its log. When the LUN mapping is added again, UltraPath checks every 10 seconds whether the logical link has recovered. As soon as one link recovers, UltraPath sends the recorded link disconnection alarm to the storage over the recovered logical link, and the storage then reports the alarm “The link between the initiator of the host and the host port disconnected”.
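
The sequence above can be illustrated with a small toy model. This is only a sketch of the behavior as described in this case, not UltraPath code: the class `PathMonitor`, the function `simulate`, the path names, and the `cause` strings are all hypothetical. The only value taken from the case is the 10-second detection cycle.

```python
from dataclasses import dataclass, field

POLL_INTERVAL_S = 10  # detection cycle mentioned in the root cause analysis


@dataclass
class PathMonitor:
    """Hypothetical toy model of the multipathing behavior described above."""
    pending: list = field(default_factory=list)   # recorded, not yet reported
    reported: list = field(default_factory=list)  # alarms delivered to storage

    def path_lost(self, path, cause):
        # The model deliberately ignores `cause`: a removed LUN mapping and a
        # real logical link failure are both recorded as a disconnection.
        self.pending.append(path)

    def poll(self, now_s, recovered_paths):
        # Runs once per detection cycle. As soon as any path is usable again,
        # every recorded disconnection is sent to the storage over that path.
        if recovered_paths and self.pending:
            for path in self.pending:
                self.reported.append((now_s, path))
            self.pending.clear()


def simulate(remap_at_s, end_s=60):
    """Delete the mapping at t=0, re-add it at `remap_at_s`, poll until `end_s`."""
    monitor = PathMonitor()
    monitor.path_lost("path-0", cause="lun_mapping_deleted")
    monitor.path_lost("path-1", cause="lun_mapping_deleted")
    for now_s in range(POLL_INTERVAL_S, end_s + 1, POLL_INTERVAL_S):
        recovered = ["path-0"] if now_s >= remap_at_s else []
        monitor.poll(now_s, recovered)
    return monitor.reported
```

In this model, calling `simulate(25)` delivers both recorded disconnections at the first poll after the mapping is re-added (t = 30 s), even though no link ever failed. That matches the observation that the alarm appears only after the LUN mapping has been added back.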

Solution
UltraPath currently cannot distinguish between deletion of a LUN mapping and disconnection of a logical link. With this behavior in mind, check whether any other operations or abnormal events occurred around the time of the alarm, and work out how they relate to it. If the alarm was caused by deletion of a LUN mapping, explain the cause to the customer and remind them of this behavior. If the alarm was caused by a logical link disconnection, check the physical link for problems.
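
The time correlation described above can be sketched as a small helper. This is a minimal illustration only: the function name `classify_alarm`, the result strings, and the 10-minute window are assumptions, not values from the storage system or UltraPath, and the timestamps would have to be taken manually from the storage event records.

```python
from datetime import datetime, timedelta


def classify_alarm(alarm_time, mapping_event_times, window=timedelta(minutes=10)):
    """Suggest a likely cause for a link disconnection alarm.

    alarm_time:          datetime of the alarm on the storage.
    mapping_event_times: datetimes of LUN mapping delete/add operations
                         taken from the storage event records.
    window:              how far back to look before the alarm (assumed value).
    """
    for event_time in mapping_event_times:
        elapsed = alarm_time - event_time
        # Only operations at or before the alarm, inside the window, count.
        if timedelta(0) <= elapsed <= window:
            return "likely_lun_mapping_change"
    # No mapping operation nearby: treat it as a possible real disconnection.
    return "check_logical_and_physical_link"


# Example with made-up timestamps (not taken from this case):
alarm = datetime(2015, 1, 1, 12, 0, 0)
events = [datetime(2015, 1, 1, 11, 55, 0)]
result = classify_alarm(alarm, events)
```

A result of `likely_lun_mapping_change` is only a hint that the alarm follows a mapping operation. The UltraPath log and the physical link should still be checked before the alarm is dismissed.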
Suggestions

The underlying problem is that the storage software alarm is unclear and does not effectively identify the cause. A physical link fault alarm directly identifies a physical link problem. A logical link alarm, by contrast, is generated by UltraPath and sent to the storage, and UltraPath currently cannot distinguish between deletion of a LUN mapping and disconnection of a logical link. Keeping this in mind, check whether any other operations or abnormal events occurred around the time of the alarm and work out how they relate to it. Doing so helps to find the real cause of the alarm and to troubleshoot potential risks.
