Storage fails to power on after the back-end topology is changed

Publication Date: 2016-06-27
Issue Description

The data center suffered a power outage, and the storage system could not power on after power was restored. The NotifyEamRecoverCfg procedure failed to complete.


Alarm Information

Neither controller can power on.

Handling Process

The following error appears in the message logs:

AssginFrameId: Fail to check enclosure id, special enclosure id in DB is different

This means the current enclosure configuration differs from the recorded one: the MAC address of the enclosure recorded in the DB is 0x3884a77bcaa8, but the MAC address of the enclosure detected during this power-on is 0xbf088de58244. This indicates that the back-end topology was changed after the last power-on.

[2016-05-28 10:29:56][ERR][AssginFrameId: Fail to check enclosure id, special enclosure id in DB is different, (MAC(0xbf088de58244), db inner id 3, special id 1).][EAM][checkSpecialFrameID,1351]

[2016-05-28 10:29:56][ERR][AssginFrameId: Special enclosure MAC in DB is 0x3884a77bcaa8(a8-ca-7b-a7-84-38), current enclosure MAC is 0xbf088de58244(44-82-e5-8d-08-bf).][EAM][checkSpecialFrameID,1361]

[2016-05-28 10:29:56][ERR][AssginFrameId: Process failed.][EAM][assginFrameIdTaskOver,2324]

[2016-05-28 10:29:56][ERR][(powerOnAssginFrameIdActionDone): Loop Action Failed, LoopPara=(0).][EAM][defaultActionLoopDone,716]

[2016-05-28 10:29:56][ERR][EAM_POWER_ON: EAM power on task(name EamPowerOn) failed.][EAM][eamPowerOnTaskOver,6122]
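The second error line above prints each MAC address twice: once as a hexadecimal integer and once as dash-separated octets in reverse byte order. The following is a minimal Python sketch of that conversion, useful when matching addresses across log lines by hand. The function name is illustrative and is not part of the storage software.

```python
def mac_to_dashed(mac_hex: str) -> str:
    """Convert a MAC such as '0x3884a77bcaa8' to the dashed,
    byte-reversed form shown in the log ('a8-ca-7b-a7-84-38')."""
    digits = f"{int(mac_hex, 16):012x}"            # 12 hex digits, no prefix
    octets = [digits[i:i + 2] for i in range(0, 12, 2)]
    return "-".join(reversed(octets))              # log prints octets reversed


db_mac = "0x3884a77bcaa8"        # MAC recorded in the DB (from the log)
current_mac = "0xbf088de58244"   # MAC detected during this power-on (from the log)

print(mac_to_dashed(db_mac))
print(mac_to_dashed(current_mac))
print("consistent" if int(db_mac, 16) == int(current_mac, 16) else "mismatch")
```

For the two addresses in the log, the dashed forms match the ones printed in parentheses, and the comparison reports a mismatch.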

The logs show that the storage system was last powered on in January 2016. Searching the January power-on records for the keywords "receive frame event" shows that there was only one disk enclosure at that time, with WWN 0x5a8ca7ba7843803f, on loop 0 (loop number 2031616, that is, 0x1F0000).

[2016-01-07 15:32:35] [INFO][EHP receive frame event(IN), WWN(0x807060504030201) father WWN(0x102030405060708 loop number(0x0), SCSI device(65535:65535:65535:65535), WWN(0x807060504030201)]

[2016-01-07 15:32:43] [INFO][EHP receive frame event(IN), WWN(0x5a8ca7ba7843803f) father WWN(0x807060504030201 loop number(0x1f0000), SCSI device(6:0:0:0), WWN(0x5a8ca7ba7843803f), MGT typ]

[2016-01-07 15:33:18] [INFO][FRAME_EVENT: Node(id 0) receive frame event info(event type in, 0xwwn 807060504030201, fwwn 0x102030405060708, loop 0, depth 0, SN 2102350BSB10FC000023.]

[2016-01-07 15:33:18] [INFO][FRAME_EVENT: Node(id 0) receive frame event info(frame type 1, is first disk 0, mac 0x7c1cf1f4134e, time stamp 37725).]

[2016-01-07 15:33:18] [INFO][FRAME_EVENT: Node(id 0) receive frame event info(event type in, 0xwwn 5a8ca7ba7843803f, fwwn 0x807060504030201, loop 2031616, depth 1, SN 210235980810FC000360.]

[2016-01-07 15:33:18] [INFO][FRAME_EVENT: Node(id 0) receive frame event info(frame type 0, is first disk 1, mac 0xa8ca7ba78438, time stamp 37824).]
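The FRAME_EVENT records above come in pairs: one line with the WWN, loop, and SN, followed by one line with the MAC. The sketch below, in Python, pairs them up so the loop number can be read in hexadecimal next to the MAC. It assumes the detail line always directly follows its header line, which holds for this excerpt but is not confirmed as a general rule; the regular expressions and function name are illustrative.

```python
import re

HEAD_RE = re.compile(
    r"event type (\w+), 0xwwn ([0-9a-f]+), fwwn (0x[0-9a-f]+), "
    r"loop (\d+), depth (\d+), SN (\w+)"
)
DETAIL_RE = re.compile(r"frame type (\d+), is first disk (\d+), mac (0x[0-9a-f]+)")


def pair_frame_events(lines):
    """Merge each FRAME_EVENT header line with the detail line that follows it."""
    frames = []
    for line in lines:
        head = HEAD_RE.search(line)
        detail = DETAIL_RE.search(line)
        if head:
            frames.append({
                "wwn": "0x" + head.group(2),
                "loop": int(head.group(4)),   # decimal in the log
                "sn": head.group(6),
            })
        elif detail and frames:
            # Assumption: this detail line belongs to the most recent header.
            frames[-1]["mac"] = detail.group(3)
    return frames


LOG = [
    "[2016-01-07 15:33:18] [INFO][FRAME_EVENT: Node(id 0) receive frame event info(event type in, 0xwwn 807060504030201, fwwn 0x102030405060708, loop 0, depth 0, SN 2102350BSB10FC000023.]",
    "[2016-01-07 15:33:18] [INFO][FRAME_EVENT: Node(id 0) receive frame event info(frame type 1, is first disk 0, mac 0x7c1cf1f4134e, time stamp 37725).]",
    "[2016-01-07 15:33:18] [INFO][FRAME_EVENT: Node(id 0) receive frame event info(event type in, 0xwwn 5a8ca7ba7843803f, fwwn 0x807060504030201, loop 2031616, depth 1, SN 210235980810FC000360.]",
    "[2016-01-07 15:33:18] [INFO][FRAME_EVENT: Node(id 0) receive frame event info(frame type 0, is first disk 1, mac 0xa8ca7ba78438, time stamp 37824).]",
]

frames = pair_frame_events(LOG)
for frame in frames:
    print(frame["wwn"], hex(frame["loop"]), frame["sn"], frame["mac"])
```

For the disk enclosure, the MAC 0xa8ca7ba78438 corresponds to the octets a8-ca-7b-a7-84-38 that the error log reports as the special enclosure MAC in the DB.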

Searching for the keyword 0x5a8ca7ba7843803f between these two power-ons shows that enclosure 0x5a8ca7ba7843803f was moved to the P1 interface after the last power-on.

[2016-01-07 15:47:47] [INFO][EHP receive frame event(OUT), WWN(0x5a8ca7ba7843803f) father WWN(0x807060504030201 loop number(0x1f0000), SCSI device(6:0:0:0), WWN(0x5a8ca7ba7843803f), MGT ty]

[2016-01-07 15:48:47] [INFO][EHP receive frame event(IN), WWN(0x5a8ca7ba7843803f) father WWN(0x807060504030201 loop number(0x1f0001), SCSI device(6:0:0:0), WWN(0x5a8ca7ba7843803f), MGT typ]

In addition, two more disk enclosures were added to loop 0.

[2016-05-28 10:26:37][INFO][EHP receive frame event(IN), WWN(0x807060504030201) father WWN(0x102030405060708 loop number(0x0), SCSI device(65535:65535:65535:65535), WWN(0x807060504030201)][DMI]

[2016-05-28 10:26:45][INFO][EHP receive frame event(IN), WWN(0x54482e58d08bf03f) father WWN(0x807060504030201 loop number(0x1f0000), SCSI device(6:0:0:0), WWN(0x54482e58d08bf03f), MGT typ][DMI]

[2016-05-28 10:26:45][INFO][EHP receive frame event(IN), WWN(0x5a8ca7b9f713303f) father WWN(0x54482e58d08bf03f loop number(0x1f0000), SCSI device(6:0:1:0), WWN(0x5a8ca7b9f713303f), MGT ty][DMI]

[2016-05-28 10:26:46][INFO][EHP receive frame event(IN), WWN(0x5a8ca7ba7843803f) father WWN(0x807060504030201 loop number(0x1f0001), SCSI device(6:0:2:0), WWN(0x5a8ca7ba7843803f), MGT typ][DMI]
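The manual comparison of the two power-on records can be sketched in Python as follows. The script replays the "EHP receive frame event" lines from each power-on and reports which enclosures were added, removed, or moved to another loop number. The regular expression is derived only from the lines quoted in this case (including the missing closing parenthesis after the father WWN), and the function names are illustrative, so treat it as a starting point rather than a general log parser.

```python
import re

EVENT_RE = re.compile(
    r"EHP receive frame event\((IN|OUT)\), "
    r"WWN\((0x[0-9a-f]+)\) father WWN\((0x[0-9a-f]+)\)? "
    r"loop number\((0x[0-9a-f]+)\)"
)


def replay(lines):
    """Apply IN/OUT events in order; return {wwn: (father wwn, loop number)}."""
    topology = {}
    for line in lines:
        match = EVENT_RE.search(line)
        if match is None:
            continue
        direction, wwn, parent, loop = match.groups()
        if direction == "IN":
            topology[wwn] = (parent, int(loop, 16))
        else:
            topology.pop(wwn, None)
    return topology


def diff(before, after):
    """Compare two topologies and list added, removed, and moved WWNs."""
    changes = {"added": [], "removed": [], "moved": []}
    for wwn in sorted(set(before) | set(after)):
        if wwn not in before:
            changes["added"].append(wwn)
        elif wwn not in after:
            changes["removed"].append(wwn)
        elif before[wwn] != after[wwn]:
            changes["moved"].append((wwn, before[wwn][1], after[wwn][1]))
    return changes


# Log lines copied from the January and May power-on records in this case.
JANUARY = [
    "[2016-01-07 15:32:35] [INFO][EHP receive frame event(IN), WWN(0x807060504030201) father WWN(0x102030405060708 loop number(0x0), SCSI device(65535:65535:65535:65535), WWN(0x807060504030201)]",
    "[2016-01-07 15:32:43] [INFO][EHP receive frame event(IN), WWN(0x5a8ca7ba7843803f) father WWN(0x807060504030201 loop number(0x1f0000), SCSI device(6:0:0:0), WWN(0x5a8ca7ba7843803f), MGT typ]",
]
MAY = [
    "[2016-05-28 10:26:37][INFO][EHP receive frame event(IN), WWN(0x807060504030201) father WWN(0x102030405060708 loop number(0x0), SCSI device(65535:65535:65535:65535), WWN(0x807060504030201)][DMI]",
    "[2016-05-28 10:26:45][INFO][EHP receive frame event(IN), WWN(0x54482e58d08bf03f) father WWN(0x807060504030201 loop number(0x1f0000), SCSI device(6:0:0:0), WWN(0x54482e58d08bf03f), MGT typ][DMI]",
    "[2016-05-28 10:26:45][INFO][EHP receive frame event(IN), WWN(0x5a8ca7b9f713303f) father WWN(0x54482e58d08bf03f loop number(0x1f0000), SCSI device(6:0:1:0), WWN(0x5a8ca7b9f713303f), MGT ty][DMI]",
    "[2016-05-28 10:26:46][INFO][EHP receive frame event(IN), WWN(0x5a8ca7ba7843803f) father WWN(0x807060504030201 loop number(0x1f0001), SCSI device(6:0:2:0), WWN(0x5a8ca7ba7843803f), MGT typ][DMI]",
]

changes = diff(replay(JANUARY), replay(MAY))
for kind, items in changes.items():
    print(kind, items)
```

For the lines quoted in this case, the script reports two added enclosures (0x54482e58d08bf03f and 0x5a8ca7b9f713303f) and one moved enclosure (0x5a8ca7ba7843803f, from loop number 0x1f0000 to 0x1f0001).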


During power-on, the system compares the MAC address of the disk enclosure with the record in the DB; the power-on fails if the MAC addresses are not consistent.

In addition, the first 4 disks are not system disks; they are member disks of the disk domain. The data on these 4 disks will be damaged if the DB writes configuration data to them during power-on.


Root Cause
After the last successful power-on, the back-end SAS topology of the storage system was modified online: the configuration disk enclosure was moved to loop 1, and two new disk enclosures were added to loop 0. During this power-on, the current configuration is therefore inconsistent with the configuration in the DB, which causes the power-on to fail.
Solution
1. Restore the back-end topology of the storage system to what it was at the last power-on, and then restart the storage system.
2. After the storage system is powered on, check the LUN data, which may be damaged.
3. Contact R&D to recover the data if it was damaged.

END