At one site (the attached image shows the network topology), an S2600 storage device runs with a single controller. The controller software version is 1.04.01.205.T01 and the SES version is S021. The servers run AIX 6100-01 with HACMP cluster software, version 22.214.171.124. Two nodes attached to controller A map two private LUNs, and the two hosts share two LUNs as cluster storage. If the cluster is started while the private LUNs are already active, one node fails to start.
The main node starts normally, but the other node fails.
Analysis of the storage log shows the following.
The log records a reservation command being issued:
Oct 14 23:25:56 AK-I kernel: Reserve (6) command for Host LUN 0, Device Lun 8 @ [jif=372919227] SCSI_PrintDebugInfo : 1382
The log also records the command that clears the reservation:
Oct 14 23:25:56 AK-I kernel: SCSI_ClearReserveExec
Oct 14 23:25:56 AK-I kernel:  @ [jif=372919957] SCSI_ClearReserveExec : 2200
Oct 14 23:25:56 AK-I kernel: This is the master controller
Oct 14 23:25:56 AK-I kernel:  @ [jif=372919957] SCSI_ClearReserveExec : 2207
Oct 14 23:25:56 AK-I kernel: Enter SCSI_ClearReserve
Oct 14 23:25:56 AK-I kernel:  @ [jif=372919957] SCSI_ClearReserve : 2286
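When checking a longer log for the same pattern, a small script can help pick out the reservation and clear-reservation entries. The following is only a sketch: the function name scan and the two regular expressions are assumptions written to match the excerpt above, not a documented log format, and other firmware versions may print these messages differently.

```python
import re

# Sample lines copied from the storage log excerpt in this case.
LOG = """\
Oct 14 23:25:56 AK-I kernel: Reserve (6) command for Host LUN 0, Device Lun 8 @ [jif=372919227] SCSI_PrintDebugInfo : 1382
Oct 14 23:25:56 AK-I kernel: SCSI_ClearReserveExec
Oct 14 23:25:56 AK-I kernel:  @ [jif=372919957] SCSI_ClearReserveExec : 2200
Oct 14 23:25:56 AK-I kernel: Enter SCSI_ClearReserve
"""

# Assumed patterns, derived only from the lines above.
RESERVE_RE = re.compile(r"Reserve \(6\) command for Host LUN (\d+), Device Lun (\d+)")
CLEAR_RE = re.compile(r"kernel: Enter SCSI_ClearReserve\b")


def scan(log_text):
    """Return reservation-related events in the order they appear in the log."""
    events = []
    for line in log_text.splitlines():
        match = RESERVE_RE.search(line)
        if match:
            # Record the host LUN and device LUN the reservation was issued for.
            events.append(("reserve", int(match.group(1)), int(match.group(2))))
        elif CLEAR_RE.search(line):
            events.append(("clear", None, None))
    return events


if __name__ == "__main__":
    for event in scan(LOG):
        print(event)
```

A "reserve" event with no later "clear" event in the log of the first node is the situation to look for when the second node refuses to start.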
We determined that the problem lies in the order in which the nodes and the private LUNs are started and shut down, so the following procedure can be used:
1. Startup order: start HA first, then activate the private LUN's volume group with the varyonvg command.
2. Shutdown order: deactivate the private LUN's volume group with the varyoffvg command, then stop HA.
When there is no private LUN and the cluster is started, the node that starts first issues a reservation and then issues a LOGOUT to clear that reservation. The other node does the same.
If the reservation is not cleared, the other node cannot start normally.
A cluster node clears its reservation by issuing a LOGOUT command, which is meant to take down the whole session. On the main node, however, the private LUN and the shared disk use the same session: they are two connections within one session. So while the private LUN is active, LOGOUT cannot take down the session and therefore cannot clear the reservation. This is why the second node cannot start.
Please pay attention to the startup order.
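The mechanism described here can be illustrated with a toy model. This is a simplified sketch of the behavior as explained in this case, not the controller's actual logic: the class Array, its attributes and the function start_cluster are all hypothetical names, and the model tracks a single shared-LUN reservation only.

```python
class Array:
    """Toy model of the storage side: one shared-LUN reservation, one session per host."""

    def __init__(self):
        self.reserved_by = None      # host currently holding the reservation, if any
        self.private_active = set()  # hosts whose private LUN is already active

    def reserve(self, host):
        # A reservation held by another host blocks the request.
        if self.reserved_by not in (None, host):
            return False
        self.reserved_by = host
        return True

    def logout(self, host):
        # Assumption taken from the explanation: the session cannot be taken
        # down while the private LUN connection in that session is still active.
        if host in self.private_active:
            return False
        if self.reserved_by == host:
            self.reserved_by = None  # session down, reservation cleared
        return True


def start_cluster(array, hosts):
    """Start nodes one after another; each reserves, then logs out to clear."""
    results = {}
    for host in hosts:
        started = array.reserve(host)
        if started:
            array.logout(host)
        results[host] = started
    return results


if __name__ == "__main__":
    clean = Array()
    print(start_cluster(clean, ["node1", "node2"]))

    busy = Array()
    busy.private_active.add("node1")  # private LUN activated before HA start
    print(start_cluster(busy, ["node1", "node2"]))
```

In the first run both nodes start. In the second run node1 keeps its reservation because its LOGOUT has no effect, so node2 is refused, which matches the failure seen at the site.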