OceanStor S6800T V2 Storage System Fails to Start Up

Publication Date: 2015-11-20
Issue Description

The OceanStor S6800T V2 cannot start up after the equipment was moved from the DC to another location.

 

Alarm Information

The system cannot start up, and an attempt to log in to the ISM fails with an error.


Handling Process

1. After collecting the system logs, we analyzed the ctrl_info log and found that controller startup failed at the action RecoveryProfile: NotifyEamRecoverCfg (see FAIL ACTION at the end of the trace):

diagnose>sys showtrace
      Date Time        FlowId               Setup                RunCnt   FailCnt   Status
-------------------   ------   ------------------------------   ------   -------   ------
2015-11-19 12:58:35   2        CLS_POWER_ON                     1        1         Failure
2015-11-19 12:58:38   5        CLS_LINK_CHECK                   2        0         Success
2015-11-19 12:57:12   8        NODE_POWER_ON                    1        0         Success
2015-11-19 12:58:34   9        VOTE_OVER                        1        0         Success
2015-11-19 12:59:49   22       CLS_SYNC_EXTENT                  1        0         Success
2015-11-19 12:58:35   130      CLS_BBU2+0                       1        0         Success
diagnose>sys showtrace 2
CLS_POWER_ON :
TotalRunCnt   TotalFailCnt   CurStatus
-----------   ------------   ---------
1             1              Failure
Description:
Power on node bitmap(3).
The node (nodeId 1) has been dimm shielded,so isolate it.
id         date time                       current trace             
----   -------------------   ----------------------------------------
0000   2015-11-19 12:58:35   PowerOn: CheckPowerOnRetryTime         
0001   2015-11-19 12:58:35   PowerOn: SetRunStatus                  
0002   2015-11-19 12:58:35   PowerOn: SelfCheck                     
0003   2015-11-19 12:58:35   SelfCheck: CheckMirrorCmd              
0004   2015-11-19 12:58:35   SelfCheck: CheckPcieLineByEmp          
0005   2015-11-19 12:58:35   SelfCheck: NotifyOmCheckLink           
0006   2015-11-19 12:58:35   SelfCheck: NotifyEmpDeviceIn           
0007   2015-11-19 12:58:36   SelfCheck: InitSysSn                   
0008   2015-11-19 12:58:36   SelfCheck: InitSysWWN                  
0009   2015-11-19 12:58:36   SelfCheck: NotifyUpgradeSyncVersion    
0010   2015-11-19 12:58:38   SelfCheck: CheckPcieLinkByXnet         
0011   2015-11-19 12:58:39   SelfCheck: SetBdmNormalBitmap          
0012   2015-11-19 12:58:39   SelfCheck: SetBDMOnlineBitmap          
0013   2015-11-19 12:58:39   SelfCheck: SetBdmClusterMaster         
0014   2015-11-19 12:58:39   SelfCheck: NotifyBdmAccessDisk         
0015   2015-11-19 12:58:39   SelfCheck: NotifyBdmReportSpecialDisk  
0016   2015-11-19 12:59:49   SelfCheck: GetNodeInfo                 
0017   2015-11-19 12:59:49   SelfCheck: CheckRecovDirtyData         
0018   2015-11-19 12:59:49   SelfCheck: CheckProductModel           
0019   2015-11-19 12:59:49   SelfCheck: CheckDimmShield             
0020   2015-11-19 12:59:49   SelfCheck: CheckMemSize                
0021   2015-11-19 12:59:49   SelfCheck: CheckRecovDirtyData         
0022   2015-11-19 12:59:49   SelfCheck: SyncCfgFile                 
0023   2015-11-19 12:59:49   SelfCheck: SyncExtent                  
0024   2015-11-19 12:59:49   PowerOn: ConditionCheck                
0025   2015-11-19 12:59:49   PowerOn: RecoverDirtyData              
0026   2015-11-19 12:59:49   RecoveryDirty: CreateDefaultCachePtt   
0027   2015-11-19 12:59:49   CreateDefaultCpt: GetNodeCapacity      
0028   2015-11-19 12:59:49   CreateDefaultCpt: SetRawCapacity       
0029   2015-11-19 12:59:49   CreateDefaultCpt: TaskOver             
0030   2015-11-19 12:59:49   RecoveryDirty: RecordPowerOnStep       
0031   2015-11-19 12:59:49   RecoveryDirty: SetCacheWorkMode        
0032   2015-11-19 12:59:49   RecoveryDirty: RecoverDirtyData        
0033   2015-11-19 12:59:49   SYS_VAULT: InitRecoverBitmap           
0034   2015-11-19 12:59:49   SYS_VAULT: RecoverLocalCache           
0035   2015-11-19 12:59:49   SYS_VAULT: RecoverMirrorCache          
0036   2015-11-19 12:59:49   SYS_VAULT: CheckRecovRet               
0037   2015-11-19 12:59:49   SYS_VAULT: ClearDirtyFlag              
0038   2015-11-19 12:59:49   SYS_VAULT: ClearUselessData            
0039   2015-11-19 12:59:49   SYS_VAULT: ClearMemProtectFlg          
0040   2015-11-19 12:59:49   SYS_VAULT: SyncDirtyData               
0041   2015-11-19 12:59:49   SYS_VAULT: SetMemProtectFlg            
0042   2015-11-19 12:59:49   SYS_VAULT: SetDirtyFlg                 
0043   2015-11-19 12:59:49   SYS_VAULT: RecoverTaskOver             
0044   2015-11-19 12:59:49   PowerOn: RecoverConfigure              
0045   2015-11-19 12:59:49   RecoveryProfile: LogZoneStartWork      
0046   2015-11-19 12:59:49   RecoveryProfile: SyncSysTime           
0047   2015-11-19 12:59:49   RecoveryProfile: NotifyDbStartWork     
0048   2015-11-19 12:59:55   RecoveryProfile: NotifyEmpLoadDbInfo   
0049   2015-11-19 12:59:55   RecoveryProfile: NotifyEmpRecoverDb    
0050   2015-11-19 12:59:55   RecoveryProfile: NotifyEamRecoverCfg   
0051   2015-11-19 12:59:55   PowerOn: TaskOver                      

---> FAIL ACTION: RecoveryProfile: NotifyEamRecoverCfg
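When a log bundle contains many such traces, the failed flow and the failing action can be picked out with a small script. The following is only a sketch in Python, assuming the `sys showtrace` output has been saved as plain text in the layout shown above; the function names `failed_flows` and `fail_action` are hypothetical helpers, not part of any Huawei tool.

```python
import re

# A few rows copied from the `sys showtrace` summary above.
SHOWTRACE = """\
      Date Time        FlowId               Setup                RunCnt   FailCnt   Status
-------------------   ------   ------------------------------   ------   -------   ------
2015-11-19 12:58:35   2        CLS_POWER_ON                     1        1         Failure
2015-11-19 12:58:38   5        CLS_LINK_CHECK                   2        0         Success
2015-11-19 12:58:35   130      CLS_BBU2+0                       1        0         Success
"""

# One summary row: timestamp, flow id, flow name, run count, fail count, status.
ROW = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s+"
    r"(?P<flow_id>\d+)\s+(?P<name>\S+)\s+"
    r"(?P<run>\d+)\s+(?P<fail>\d+)\s+(?P<status>\S+)\s*$"
)

# The marker line printed at the end of a detailed trace.
FAIL_ACTION = re.compile(r"^--->\s*FAIL ACTION:\s*(.+?)\s*$", re.MULTILINE)


def failed_flows(text):
    """Return (flow_id, name, status) for every row that reports a failure."""
    rows = []
    for line in text.splitlines():
        m = ROW.match(line)
        if m and (int(m["fail"]) > 0 or m["status"] != "Success"):
            rows.append((int(m["flow_id"]), m["name"], m["status"]))
    return rows


def fail_action(text):
    """Return the text after '---> FAIL ACTION:', or None if it is absent."""
    m = FAIL_ACTION.search(text)
    return m.group(1) if m else None


if __name__ == "__main__":
    print(failed_flows(SHOWTRACE))
    print(fail_action("---> FAIL ACTION: RecoveryProfile: NotifyEamRecoverCfg"))
```

The flow id returned by `failed_flows` is the value to pass to `sys showtrace <FlowId>` for the detailed trace, as was done with flow 2 above.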

2. Checking the message log, we found the cause of the failure: the enclosure ID check failed because the special enclosure ID stored in the DB is different (db inner ID 2, special id 1).

[2015-11-19 12:59:55][93524][1500000300b7e][ERR][AssginFrameId: Fail to check enclosure id, special enclosure id in DB is different, (MAC(0x51a987fb4648), db inner ID 2, special id 1).][EAM][checkSpecialFrameID,1248][TP_Eam_TPool_7]
[2015-11-19 12:59:55][93524][15000003000f5][ERR][Run Action(FrameIdAssgin CheckFrameID) of Task(AssginFrameId) return, goto Task Over.][EAM][go2NextTaskAction,459][TP_Eam_TPool_7]
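The mismatch can also be extracted from the message log programmatically. The sketch below is in Python and assumes each log line consists of eight bracketed fields, as in the two lines above. The field labels (`time`, `seq`, `code`, `level`, `message`, `module`, `location`, `thread`) are our own guesses at what each field means, not documented names, and `enclosure_mismatch` is a hypothetical helper.

```python
import re

# First log line from above, copied verbatim.
LINE = (
    "[2015-11-19 12:59:55][93524][1500000300b7e][ERR]"
    "[AssginFrameId: Fail to check enclosure id, special enclosure id in DB "
    "is different, (MAC(0x51a987fb4648), db inner ID 2, special id 1).]"
    "[EAM][checkSpecialFrameID,1248][TP_Eam_TPool_7]"
)

FIELD = re.compile(r"\[([^\]]*)\]")
MISMATCH = re.compile(
    r"MAC\((0x[0-9a-fA-F]+)\), db inner ID (\d+), special id (\d+)"
)

# Assumed labels for the eight bracketed fields (not vendor-documented).
KEYS = ("time", "seq", "code", "level", "message", "module", "location", "thread")


def parse_line(line):
    """Split one log line into its bracketed fields; None if the shape differs."""
    fields = FIELD.findall(line)
    if len(fields) != len(KEYS):
        return None
    return dict(zip(KEYS, fields))


def enclosure_mismatch(line):
    """Return MAC, DB inner ID and special ID from an enclosure ID error line."""
    rec = parse_line(line)
    if rec is None or rec["level"] != "ERR":
        return None
    m = MISMATCH.search(rec["message"])
    if m is None:
        return None
    return {
        "mac": m.group(1),
        "db_inner_id": int(m.group(2)),
        "special_id": int(m.group(3)),
    }


if __name__ == "__main__":
    print(enclosure_mismatch(LINE))
```

Lines that do not carry the mismatch details, such as the second log line above, yield `None`.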

 

Root Cause

Based on the error information, the most likely root cause is that the system is not cabled according to the system diagram, so the enclosure ID check fails during startup.

Solution

We provided two solutions to the customer:

Solution 1: Correct the cabling so that it matches the system diagram. For reference:

Step 1: Use port P0 on controller A and on controller B to connect to disk enclosure 0 (PRI port).

Step 2: Disk enclosure 0 must contain the 4 coffer disks. These disks are labeled.

 

Solution 2: If the system holds no data, clear the DB so that the system re-configures the DB space.

This operation is high risk. After the DB is cleared, data can no longer be read from the storage system. Double-check with the customer and obtain the customer's confirmation before proceeding.

1. Log in to both controllers with PuTTY or another SSH tool (you will see the minisystem prompt).

2. Run "help" and find the "cleardb" command (or a command named like "clearxxx").

3. Run the command on both controllers.

4. Reboot the system (make sure both controllers restart).

 

 

 
