
Deleting LUNs after capacity expansion without running hot_add causes automatic two-node cluster switchover

Publication Date:  2012-10-18  |   Views:  270  |   Downloads:  0  |   Author:  SU1001573748  |   Document ID:  EKB1000017488

Issue Description

An ATAE host is connected to an S2300 storage array and runs in a two-node cluster built with VCS 5.1. After a hard disk capacity expansion, four LUNs were assigned to the cluster. Two of these LUNs were later no longer needed and were deleted. When an fdisk operation was then run on one of the remaining LUNs, the two-node cluster switched over.

Alarm Information

According to the storage log, the storage array reported no hard disk alarms.

Handling Process

After the LUNs were deleted, hot_add was not run on the host in time. The host therefore kept sending commands to the deleted LUNs, which triggered the two-node switchover.

Root Cause

1. Storage-side analysis
On 06-30 at 15:15:10, four LUN mappings were added:
45 0 User admin:Add mapping successfully, host group ID is 1, hostLUN ID is 1, deviceLUN ID is 8. 2011-06-30 15:15:10
45 0 User admin:Add mapping successfully, host group ID is 1, hostLUN ID is 2, deviceLUN ID is 9. 2011-06-30 15:15:10
45 0 User admin:Add mapping successfully, host group ID is 1, hostLUN ID is 3, deviceLUN ID is 10. 2011-06-30 15:15:10
45 0 User admin:Add mapping successfully, host group ID is 1, hostLUN ID is 4, deviceLUN ID is 11. 2011-06-30 15:15:10
On 08-15 at 11:34:26, two LUN mappings were deleted:
45 0 User admin:Delete mapping successfully, host group ID is 1, hostLUN ID is 3, deviceLUN ID is 10. 2011-08-15 11:34:26
45 0 User admin:Delete mapping successfully, host group ID is 1, hostLUN ID is 4, deviceLUN ID is 11. 2011-08-15 11:34:26
The storage log contains entries like the following, showing that the host kept sending I/O to these two LUNs even after their mappings were deleted:
Aug 15 11:34:41 OceanStor kernel: [31863152409]Target: map has changed or hostlun(3) is not mapped, scsi cmnd = 28 @ [jif=31863152409] SCSI_CmdParse : 1185
Aug 15 11:34:41 OceanStor kernel: [31863152411]Target: map has changed or hostlun(4) is not mapped, scsi cmnd = 28 @ [jif=31863152411] SCSI_CmdParse : 1185
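The storage-side check can be sketched as a small shell filter that lists the host LUN IDs still receiving commands after being unmapped. This is only an illustration and assumes the log lines keep the exact wording shown above; the function name unmapped_hostluns is hypothetical and is not part of any OceanStor tool.

```shell
# Hypothetical helper: list host LUN IDs that still received SCSI commands
# after their mapping was removed. Reads storage log lines on stdin.
unmapped_hostluns() {
    # Extract N from "hostlun(N) is not mapped" and print each ID once,
    # in ascending numeric order.
    sed -n 's/.*hostlun(\([0-9][0-9]*\)) is not mapped.*/\1/p' | sort -n -u
}
```

Fed with the two entries above, the filter would print 3 and 4, which match the hostLUN IDs in the "Delete mapping successfully" records.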
2. Host-side analysis
The first VCS error on the host appears at 11:12 host time (the host and storage clocks are not synchronized). VCS reports a VG error, and just before it the kernel reports I/O errors on disk sde. The LUN mapping had already been deleted at this point, so the I/O could not be delivered and errors were reported:
Aug 15 11:12:55 ZJHZ-PS-CMREAD-SV-VGOP1-INT-SD kernel: end_request: I/O error, dev sde, sector 8
Aug 15 11:12:55 ZJHZ-PS-CMREAD-SV-VGOP1-INT-SD kernel: Buffer I/O error on device sde, logical block 1
Aug 15 11:12:55 ZJHZ-PS-CMREAD-SV-VGOP1-INT-SD kernel: Buffer I/O error on device sde, logical block 2
Aug 15 11:12:56 ZJHZ-PS-CMREAD-SV-VGOP1-INT-SD kernel: Buffer I/O error on device sde, logical block 3
Aug 15 11:12:56 ZJHZ-PS-CMREAD-SV-VGOP1-INT-SD Had[30951]: VCS ERROR V-16-1-30 (ZJHZ-PS-CMREAD-SV-VGOP1-INT-SD) bin:???:vg_monitor.sh:There may be some error in the VG,please check it
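How can we confirm that the mappings had already been deleted? The multipathing driver received a check condition with sense key 6 (unit attention) and ASC/ASCQ 3f/e, which in the SCSI standard means that the reported LUNs data has changed, that is, the LUN mapping on the storage array had changed. The two LUN mappings had therefore been deleted by this time: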
Aug 15 11:12:40 ZJHZ-PS-CMREAD-SV-VGOP1-INT-SD kernel: UP_D:[9513221983] UP_done:C0P0L0,r=8000002,MPP_CHECK_CONDITION,sk=6,ASC/ASCQ=3f/e,SN:317527644.
Every VCS error is preceded by I/O errors on disk sde or sdf, the two disks whose mappings were deleted.
The host kept sending I/O to the LUNs after their mappings were deleted. The likely reason is that hot_add was not run after the LUNs were deleted, so the host did not know that these two disks were gone. As the entries below show, the mappings were deleted at 11:12:40 host time, but the next hot_add was not run until 12:09:52:
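To check this pattern across a longer host log, the affected devices can be counted with a short filter. The sketch below assumes the kernel messages use the two wordings quoted above ("I/O error, dev ..." and "Buffer I/O error on device ..."); the function name io_error_devices is hypothetical.

```shell
# Hypothetical helper: print each block device that logged an I/O error,
# followed by the number of matching lines. Reads kernel log lines on stdin.
io_error_devices() {
    # The first expression matches "end_request: I/O error, dev sde, ...";
    # the second matches "Buffer I/O error on device sde, ...".
    sed -n \
        -e 's/.*I\/O error, dev \([a-z][a-z0-9]*\),.*/\1/p' \
        -e 's/.*Buffer I\/O error on device \([a-z][a-z0-9]*\),.*/\1/p' |
        sort | uniq -c | awk '{ print $2, $1 }'
}
```

If only sde and sdf appear in the output, the errors are confined to the two disks whose mappings were removed.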
Aug 15 11:11:19 ZJHZ-PS-CMREAD-SV-VGOP1-INT-SD kernel: UP_D:[9513201720] mppLnx_UpdateLUN: hot_add get ReportLun info for array 0
Aug 15 12:09:52 ZJHZ-PS-CMREAD-SV-VGOP1-INT-SD kernel: UP_D:[9514079709] mppLnx_UpdateLUN: hot_add get ReportLun info for array 0
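The length of the window during which the host still believed the deleted LUNs existed can be worked out from the two host timestamps (11:12:40 for the 3f/e notification, 12:09:52 for the next hot_add). A minimal sketch, assuming both timestamps fall on the same day; the function names are hypothetical:

```shell
# Convert an HH:MM:SS timestamp to seconds since midnight.
hms_to_seconds() {
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# Seconds elapsed from the first same-day timestamp to the second.
seconds_between() {
    echo $(( $(hms_to_seconds "$2") - $(hms_to_seconds "$1") ))
}

seconds_between 11:12:40 12:09:52   # prints 3432
```

That is 3432 seconds, or about 57 minutes, during which any I/O sent to sde or sdf could only fail.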
It is possible that vg_monitor detected errors on the paths of the two deleted disks and that this caused the two-node switchover. The vg_monitor script contains the following check; please confirm:
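# Scan all volume groups and look for the word "error" in the output
vgscan 2>&1 |grep "error"
# grep returns 0 when at least one line matched
if [ $? == 0 ]
then
    # Log a VCS error message (message ID 30)
    VCSAG_LOG_MSG "E" "There may be some error in the VG,please check it" 30
    # Exit code 100 reports the resource as offline to VCS
    exit 100
fi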
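The check above treats any line containing "error" as a VG fault, so a read failure on a single stale device path would be enough to make the monitor fail. The sketch below reproduces the same matching rule on text from stdin instead of running vgscan, so it can be tried safely. The function name vg_check and the sample line are assumptions for illustration; the sample is not taken from the incident logs.

```shell
# Sketch of the matching rule used by vg_monitor.sh, applied to stdin
# instead of live vgscan output. Returns 100 (the value vg_monitor.sh
# exits with) when any input line contains "error", otherwise 0.
vg_check() {
    if grep "error" >/dev/null; then
        return 100
    fi
    return 0
}

# Illustrative input only (assumed wording of an LVM read failure):
#   printf '%s\n' '/dev/sde: read failed after 0 of 4096 at 0: Input/output error' | vg_check
#   echo $?
```

Because the rule does not distinguish a stale path from a fault in a volume group that is actually in use, leftover device nodes for deleted LUNs could plausibly trip it.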


Suggestions

Recommendation: after expanding LUN capacity or deleting LUNs, run hot_add on the host promptly to rescan the disks and avoid unnecessary problems.
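One way to make the rescan harder to forget is to wrap it in a small function that is called as the last step of any LUN change procedure. The following is a minimal sketch, not a vendor-provided tool: the function name rescan_after_lun_change is hypothetical, and it assumes hot_add is available on the host's PATH (another command can be passed as the first argument).

```shell
# Hypothetical wrapper: run the rescan command (hot_add by default) and
# report clearly if it is missing or fails.
rescan_after_lun_change() {
    rescan_cmd="${1:-hot_add}"
    # Return 1 if the rescan command cannot be found on this host.
    if ! command -v "$rescan_cmd" >/dev/null 2>&1; then
        echo "rescan command not found: $rescan_cmd" >&2
        return 1
    fi
    # Return 2 if the rescan command itself reports a failure.
    if ! "$rescan_cmd"; then
        echo "rescan command failed: $rescan_cmd" >&2
        return 2
    fi
    echo "rescan completed: $rescan_cmd"
}
```

Calling rescan_after_lun_change with no arguments runs hot_add; a non-zero return code signals that the host may still hold stale device entries.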