Two Disks of VTL3500 Went Offline and the RAID 5 Group Became Faulty

Publication Date:  2013-03-21
Issue Description
All LUNs in the RAID 5 group are offline, and the storage software cannot write data to them. Two hard disks are shown as offline.
Alarm Information
see the picture:

2 Key Message
2.1 seqNum: 0x00033283
Time: Wed Mar  6 01:05:57 2013

Code: 0x00000072
Class: 0
Locale: 0x02
Event Description: State change on PD 1b(e0x08/s6) from FAILED(11) to UNCONFIGURED_BAD(1)
Event Data:
=========================

2.2 seqNum: 0x00033286
Time: Wed Mar  6 01:05:59 2013

Code: 0x00000072
Class: 0
Locale: 0x02
Event Description: State change on PD 0a(e0x08/s14) from HOT SPARE(2) to REBUILD(14)
Event Data:
=====================

2.3 seqNum: 0x000332ad
Time: Wed Mar  6 01:06:08 2013

Code: 0x00000072
Class: 0
Locale: 0x02
Event Description: State change on PD 20(e0x08/s11) from FAILED(11) to UNCONFIGURED_BAD(1)
Event Data:
=====================================
2.4 seqNum: 0x000332ae
Time: Wed Mar  6 01:06:09 2013

Code: 0x00000065
Class: 2
Locale: 0x02
Event Description: Rebuild failed on PD 0a(e0x08/s14) due to source drive error
Event Data:
==========

2.5 seqNum: 0x00033291
Time: Wed Mar  6 01:06:08 2013

Code: 0x00000051
Class: 0
Locale: 0x01
Event Description: State change on VD 01/1 from DEGRADED(2) to OFFLINE(0)
Event Data:
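
The key messages above can be turned into a timeline with a short script. The following is a minimal sketch in Python, assuming the log is available as plain text in the same layout as shown here; the names `parse_events`, `FIELD_RE`, `SLOT_RE` and `SAMPLE_LOG` are illustrative and are not part of any VTL3500 or RAID card tool. The sample contains only two of the entries, copied from this case.

```python
import re
from datetime import datetime

# Two entries copied from the key messages in this case (illustrative sample).
SAMPLE_LOG = """\
2.1 seqNum: 0x00033283
Time: Wed Mar  6 01:05:57 2013

Code: 0x00000072
Class: 0
Locale: 0x02
Event Description: State change on PD 1b(e0x08/s6) from FAILED(11) to UNCONFIGURED_BAD(1)
Event Data:
=========================

2.4 seqNum: 0x000332ae
Time: Wed Mar  6 01:06:09 2013

Code: 0x00000065
Class: 2
Locale: 0x02
Event Description: Rebuild failed on PD 0a(e0x08/s14) due to source drive error
Event Data:
==========
"""

# A field line, optionally preceded by an item number such as "2.1".
FIELD_RE = re.compile(
    r"^(?:\d+(?:\.\d+)*\s+)?(seqNum|Time|Code|Class|Locale|Event Description):\s*(.*)$"
)
# Slot number inside a physical disk identifier such as "PD 1b(e0x08/s6)".
SLOT_RE = re.compile(r"/s(\d+)\)")


def parse_events(text):
    """Parse log text into a list of dicts, one per seqNum entry."""
    events = []
    current = None
    for raw in text.splitlines():
        match = FIELD_RE.match(raw.strip())
        if not match:
            continue  # separators, blank lines, "Event Data:" lines
        key, value = match.group(1), match.group(2).strip()
        if key == "seqNum":
            current = {"seqNum": int(value, 16)}
            events.append(current)
        elif current is None:
            continue  # field seen before any seqNum line
        elif key == "Time":
            # Collapse the double space used for single-digit days.
            current["time"] = datetime.strptime(
                " ".join(value.split()), "%a %b %d %H:%M:%S %Y"
            )
        elif key == "Event Description":
            current["description"] = value
            slot = SLOT_RE.search(value)
            current["slot"] = int(slot.group(1)) if slot else None
        else:
            current[key] = value
    return events


events = sorted(parse_events(SAMPLE_LOG), key=lambda e: e["time"])
for event in events:
    print(event["time"], "slot", event["slot"], "-", event["description"])
```

Sorting by time rather than by sequence number makes it easier to compare disk events with the virtual disk state changes, since the entries above are not listed in strict sequence order.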

Handling Process
There are two ways to resolve this issue.
1 Recover the data of the RAID 5 group for the VTL3500
See the attachment for the data recovery procedure.
2 Reconfigure the VTL system
  Replace the two failed disks, and then reconfigure the VTL3500 system.
3 Final solution
The customer used the second method to restore the VTL3500.
Root Cause
The hard disk in Slot 6 went offline at 2013/3/6 01:05:57, and the hot spare disk in Slot 14 then started rebuilding. The state of the RAID group changed from ONLINE to DEGRADED. The hard disk in Slot 11 then went offline at 2013/3/6 01:06:08, about 11 seconds later, so the rebuild on the Slot 14 disk had no time to complete and failed (the log records "Rebuild failed ... due to source drive error"). RAID 5 tolerates the loss of only one member disk. Because both Slot 6 and Slot 11 were offline, the state of the RAID 5 group changed from DEGRADED to OFFLINE.
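
The sequence described above can be replayed with a small model. This is a simplified sketch, assuming that no rebuild completes between the two failures (which matches this case); the function names `raid5_state` and `replay` are hypothetical, and the model is not the RAID card firmware logic. The failure times are taken from the key messages.

```python
from datetime import datetime

# (slot, time the disk left the array), taken from the key messages in this case.
FAILURES = [
    (6, datetime(2013, 3, 6, 1, 5, 57)),
    (11, datetime(2013, 3, 6, 1, 6, 8)),
]


def raid5_state(failed_members):
    """Group state for a number of failed members that have not been rebuilt.

    RAID 5 keeps one disk's worth of parity, so it survives exactly one loss.
    """
    if failed_members == 0:
        return "ONLINE"
    if failed_members == 1:
        return "DEGRADED"
    return "OFFLINE"


def replay(failures):
    """Return (time, slot, resulting state) for each failure in time order.

    Assumption: no rebuild finishes between failures, so the count only grows.
    """
    ordered = sorted(failures, key=lambda item: item[1])
    return [
        (when, slot, raid5_state(count))
        for count, (slot, when) in enumerate(ordered, start=1)
    ]


timeline = replay(FAILURES)
gap_seconds = (FAILURES[1][1] - FAILURES[0][1]).total_seconds()

for when, slot, state in timeline:
    print(when, "slot", slot, "failed, group is", state)
print("Seconds between the two failures:", gap_seconds)
```

The short interval between the two failures is the key point: a hot spare can only protect the group if its rebuild finishes before a second member disk is lost.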

The RAID card firmware log was lost after the system was rebooted, so we could not find the hardware event messages for 2013/3/6 and could not determine why the hard disks went offline. In our experience, most cases in which two disks go offline are caused by hard disk failures.
Suggestions
None

END