N8500 Service Nodes Cannot Start Services After an Abnormal Power-Off

Published: 2014-09-04
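Problem Description
1. The service port indicators on the N8500 nodes are not lit.
2. Clients cannot mount the NAS shares.
3. The NAS conIP (console IP) cannot be pinged.
4. After logging in through the serial port, the following is observed: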
N8500_01 login: support
Password:
Last login: Fri Aug 29 16:24:48 CST 2014 on console
***********************************************************
*           N8000 Clustered NAS Storage System            *
*                                                         *
*                   Enterprise Edition                    *
* Warning: Only N8000 Clustered NAS Storage System distributed  *
*     patches & RPMs can be installed on this system!     *
*     Do not delete contents of lost+found directory      *
*        of filesystems as it may contain critical        *
* temporary N8000 Clustered NAS Storage System configuration data!  *
***********************************************************
N8500_01:~ # su - master
***********************************************************
*            N8000 Clustered NAS Storage System           *   
*                                                         *
*                                                         *
*             Warning: Authorized Access Only             *
***********************************************************

5.7P2 ENTERPRISE EDITION  (Wed Apr 25 08:20:49 2012),      Installed on Fri Jan 24 17:40:32 CST 2014
Welcome, master (Master). Today's date is Fri Aug 29 16:36:08 CST 2014.
N8500> cluster show
N8000 cluster show ERROR V-288-2136 Unable to show cluster information
N8500_01:~ # gabconfig -a
GAB Port Memberships
===============================================================
Port a gen   4e6a02 membership 01
N8500_02:~ # gabconfig -a
GAB Port Memberships
===============================================================
Port a gen   4e6a02 membership 01
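As a rough aid, the sketch below parses captured `gabconfig -a` output and reports which GAB ports are registered. It assumes the usual Veritas port letters (a = GAB membership, b = I/O fencing, h = VCS engine); in the output above only port a is present on both nodes, which is consistent with fencing and the cluster engine not having started. The helper names `parse_gab_ports` and `missing_ports` are hypothetical, not part of the N8000 CLI.

```python
import re
from typing import Dict, Iterable, List

# Matches membership lines such as "Port a gen   4e6a02 membership 01".
# "visible" lines (e.g. "Port h gen 4e6a12 visible ;1") are ignored on purpose.
PORT_RE = re.compile(r"^Port\s+(\w)\s+gen\s+(\w+)\s+membership\s+(\S+)")


def parse_gab_ports(output: str) -> Dict[str, str]:
    """Map each registered GAB port letter to its raw membership string."""
    ports: Dict[str, str] = {}
    for line in output.splitlines():
        match = PORT_RE.match(line.strip())
        if match:
            port, _generation, members = match.groups()
            ports[port] = members
    return ports


def missing_ports(ports: Dict[str, str],
                  expected: Iterable[str] = ("a", "b", "h")) -> List[str]:
    """Return the expected port letters (assumed a, b, h) that are not registered."""
    return [p for p in expected if p not in ports]


if __name__ == "__main__":
    sample = """GAB Port Memberships
===============================================================
Port a gen   4e6a02 membership 01
"""
    ports = parse_gab_ports(sample)
    print(ports)                 # {'a': '01'}
    print(missing_ports(ports))  # ['b', 'h']
```

The membership string is kept raw rather than decoded into node IDs, since the check here only needs to know whether a port exists at all.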
NAS version: N8000V200R001C00SPC500B015
Alarm Information
Red indicator alarm
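Resolution
1. Restart the nodes.
2. Disable fencing.
3. Remove the coordinator disks and rescan the disks.
4. Restart the devices and re-enable fencing.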
N8500_01:~ # vxdisk list -o alldgs
DEVICE       TYPE            DISK         GROUP        STATUS
huawei-s5500t0_6 auto:simple     huawei-s5500t0_6  sfscoorddg   online
huawei-s5500t0_7 auto:simple     huawei-s5500t0_7  sfscoorddg   online
huawei-s5500t0_8 auto:simple     -            (sfsdg)      online shared
huawei-s5500t0_9 auto:simple     huawei-s5500t0_9  sfscoorddg   online
huawei-s5500t0_10 auto:simple     -            (sfsdg)      online shared
huawei-s5500t0_11 auto:simple     -            (sfsdg)      online shared
N8500_02:~ # vxdmpadm getsubpaths
NAME  STATE[A]  PATH-TYPE[M] DMPNODENAME  ENCLR-NAME   CTLR   ATTRS
===================================================================
sdb   ENABLED(A) PRIMARY      huawei-s5500t0_6 huawei-s5500t0 c6  -
sdi   ENABLED    SECONDARY    huawei-s5500t0_6 huawei-s5500t0 c7  -
sdd   ENABLED    SECONDARY    huawei-s5500t0_7 huawei-s5500t0 c6  -
sdk   ENABLED(A) PRIMARY      huawei-s5500t0_7 huawei-s5500t0 c7  -
sde   ENABLED(A) PRIMARY      huawei-s5500t0_8 huawei-s5500t0 c6  -
sdl   ENABLED    SECONDARY    huawei-s5500t0_8 huawei-s5500t0 c7  -
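After the rescan it is worth confirming that every LUN still has a usable path through both controllers (c6 and c7 in the output above). The sketch below groups ENABLED subpaths from `vxdmpadm getsubpaths` by DMP node name and flags nodes reachable through only one controller. It assumes the seven-column layout shown above; the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def enabled_controllers(output: str) -> Dict[str, List[str]]:
    """Group ENABLED subpaths from `vxdmpadm getsubpaths` by DMP node name."""
    paths: Dict[str, List[str]] = defaultdict(list)
    for line in output.splitlines():
        fields = line.split()
        # Data rows have 7 columns: NAME STATE PATH-TYPE DMPNODENAME ENCLR-NAME CTLR ATTRS.
        # The header row and the "=====" separator are skipped.
        if len(fields) != 7 or fields[0] == "NAME":
            continue
        _name, state, _path_type, dmp_node, _enclosure, controller, _attrs = fields
        # "ENABLED(A)" marks the active path; plain "ENABLED" is a usable standby.
        if state.startswith("ENABLED"):
            paths[dmp_node].append(controller)
    return dict(paths)


def single_controller_nodes(paths: Dict[str, List[str]]) -> List[str]:
    """DMP nodes whose enabled paths all go through a single controller."""
    return sorted(node for node, ctlrs in paths.items() if len(set(ctlrs)) < 2)


if __name__ == "__main__":
    sample = """NAME  STATE[A]  PATH-TYPE[M] DMPNODENAME  ENCLR-NAME   CTLR   ATTRS
sdb   ENABLED(A) PRIMARY      huawei-s5500t0_6 huawei-s5500t0 c6  -
sdi   ENABLED    SECONDARY    huawei-s5500t0_6 huawei-s5500t0 c7  -
"""
    paths = enabled_controllers(sample)
    print(paths)                           # {'huawei-s5500t0_6': ['c6', 'c7']}
    print(single_controller_nodes(paths))  # []
```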

N8500_01:~ # gabconfig -a
GAB Port Memberships
===============================================================
Port a gen   4e6a06 membership 01
Port h gen   4e6a12 membership 0
Port h gen   4e6a12    visible ;1

N8500.Storage> fencing status
IO Fencing Status
=================
Disabled      
Disk Name               Coord Flag On  
==============          ============== 
huawei-s5500t0_6        Yes            
huawei-s5500t0_7        Yes            
huawei-s5500t0_8        Yes            
N8500.Storage> fencing on huawei-s5500t0_6,huawei-s5500t0_7,huawei-s5500t0_8
N8000 fencing INFO V-288-21002 Please ensure that no service is running in any of the file systems before this action!
Do you want to continue (y/n): y
VxVM vxprint ERROR V-5-1-582 Disk group sfscoorddg: No such disk group   
N8000 fencing Success V-288-1008 IO Fencing feature now Enabled with SCSI3 Persistent Reservations
100% [#] Enabling fencing with disks                                     
N8500.Storage> fencing status

IO Fencing Status
=================
Enabled       

Disk Name               Coord Flag On  
==============          ============== 
huawei-s5500t0_6        Yes            
huawei-s5500t0_7        Yes            
huawei-s5500t0_8        Yes            
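The before/after `fencing status` outputs above can be checked mechanically. The sketch below extracts the fencing state and the coordinator flag of each disk, then applies a simple health rule: fencing Enabled with an odd number of coordinator disks, at least three. The odd-count rule reflects general Veritas I/O fencing guidance and is an assumption here, as are the helper names.

```python
from typing import Dict, Optional, Tuple


def parse_fencing_status(output: str) -> Tuple[Optional[str], Dict[str, bool]]:
    """Return (state, {disk: coord_flag_on}) from `fencing status` output."""
    state: Optional[str] = None
    disks: Dict[str, bool] = {}
    rules_seen = 0
    for raw in output.splitlines():
        line = raw.strip()
        if not line:
            continue
        if set(line.replace(" ", "")) == {"="}:
            rules_seen += 1  # "=====" underlines separate the two sections
            continue
        if rules_seen == 1 and state is None:
            state = line  # first line under "IO Fencing Status", e.g. "Enabled"
        elif rules_seen >= 2:
            parts = line.split()
            if len(parts) >= 2:
                disks[parts[0]] = parts[1] == "Yes"
    return state, disks


def fencing_healthy(state: Optional[str], disks: Dict[str, bool]) -> bool:
    """Enabled, with an odd number (at least three) of coordinator disks."""
    coord = [name for name, on in disks.items() if on]
    return state == "Enabled" and len(coord) >= 3 and len(coord) % 2 == 1


if __name__ == "__main__":
    sample = """IO Fencing Status
=================
Disabled
Disk Name               Coord Flag On
==============          ==============
huawei-s5500t0_6        Yes
huawei-s5500t0_7        Yes
huawei-s5500t0_8        Yes
"""
    print(fencing_healthy(*parse_fencing_status(sample)))  # False
```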
N8500.Storage> fs list
FS  STATUS  SIZE  LAYOUT  MIRRORS  COLUMNS  USE%  NFS SHARED  CIFS SHARED  SECONDARY TIER  POOL LIST
=====================================================================================================
PC_FS  offline  10.00T  striped  - 3  - yes  no  no  storage_pool
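When several file systems are left offline, the names can be pulled from the `fs list` output rather than read off by eye. The sketch below only builds the `fs online <name>` command strings for an operator to run at the `N8500.Storage>` prompt; it does not execute anything. It assumes STATUS is the second whitespace-separated column, as in the output above; the helper names are hypothetical.

```python
from typing import List


def offline_filesystems(output: str) -> List[str]:
    """Names of file systems whose STATUS column reads 'offline' in `fs list` output."""
    names: List[str] = []
    for line in output.splitlines():
        fields = line.split()
        # Skip blank lines, the header row and the "=====" separator.
        if len(fields) < 2 or fields[0] == "FS" or fields[0].startswith("="):
            continue
        if fields[1] == "offline":
            names.append(fields[0])
    return names


def online_commands(names: List[str]) -> List[str]:
    """Build the Storage-mode commands that would bring each file system online."""
    return [f"fs online {name}" for name in names]


if __name__ == "__main__":
    sample = """FS  STATUS  SIZE  LAYOUT  MIRRORS  COLUMNS  USE%  NFS SHARED  CIFS SHARED  SECONDARY TIER  POOL LIST
=====================================================================================================
PC_FS  offline  10.00T  striped  - 3  - yes  no  no  storage_pool
"""
    print(online_commands(offline_filesystems(sample)))  # ['fs online PC_FS']
```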
N8500.Storage> fs online PC_FS
100% [#] Online filesystem                                               
N8500.Storage> fs online OA_FS
N8500.Storage> exit
N8500> system clock show
Fri Aug 29 18:38:14 Asia/Shanghai 2014
DST_STATUS=Disable
N8500> support services autofix
Attempting to fix service faults.................done
N8500> nfs server status
NFS Status on N8500_01 : ONLINE
NFS Status on N8500_02 : ONLINE
N8500> cifs server status
CIFS Status on N8500_01 : OFFLINE
CIFS Status on N8500_02 : OFFLINE

Homedirfs                 :
Security                  : user
Clustering Mode           : normal
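Root Cause
The customer reported that the equipment had been running normally. Because of business needs, the customer relocated the equipment. Before shutdown, the cables connecting the nodes to the storage were pulled out directly, and both nodes were powered off at the same time. Analysis indicates that powering off both nodes simultaneously corrupted the data on the three coordinator disks, so the NAS services could not start normally, which interrupted the customer's services.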
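Recommendations and Summary
This fault shows that when powering off NAS nodes, you should avoid powering off both nodes at the same time wherever possible, and you must never pull out all of the cables connecting the nodes to the storage while the nodes are still powered on.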
