A RAID Group Is Degraded Because a Hard Disk Fails and No Hot Spare Disk Is Configured

Publication Date: 2012-07-17
Issue Description
Product and version: CSS V100R001C01 Database Volume.
During RAID group initialization, the 0xB02160001 "A RAID group is degraded" alarm is reported to the ISM.
Alarm Information
None
Handling Process
Step 1     Replace the faulty hard disk with a new one according to the alarm description.
Step 2     Check whether the 0xB02160002 "RAID group reconstruction starts" alarm is reported to the ISM.
•   If yes, the RAID group is being reconstructed. Wait until the reconstruction is complete, and then go to Step 3.

The reconstruction time is proportional to the amount of data. To check the reconstruction progress of the hard disk, run the MegaCli64 -pdrbld -showprog -physdrv [E:S] -aall command, where E and S are the Enclosure Device ID and Slot Number of the disk, as shown in the red circle in Figure 1-12.
Figure 1-12  Checking the reconstruction progress of the hard disk

 
•   If no, go to Step 4.
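The progress check in Step 2 can be scripted. The following is a minimal sketch, assuming the progress line printed by the -pdrbld -showprog command contains the text "Completed NN%"; the helper name rebuild_percent and the enclosure and slot values in the usage comment are illustrative and do not come from this procedure.

```shell
# Hypothetical helper: print the rebuild percentage found in the output of
# `MegaCli64 -pdrbld -showprog -physdrv [E:S] -aall` (read from stdin).
# Assumption: the progress line contains "Completed NN%".
# Prints nothing if no such line is present.
rebuild_percent() {
    sed -n 's/.*Completed \([0-9][0-9]*\)%.*/\1/p' | head -n 1
}

# Example with placeholder values (enclosure 32, slot 1):
#   MegaCli64 -pdrbld -showprog -physdrv "[32:1]" -aall | rebuild_percent
```

If the helper prints nothing, the output did not match the assumed format and should be read manually.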
Step 3     Check whether the 0xB02160003 "RAID group reconstruction succeeds" alarm is reported to the ISM.
•   If yes, the fault is rectified.
•   If no, contact technical support engineers.
Step 4     On the maintenance terminal, log in as the root user to the device that reported the alarm.
Step 5     Run the MegaCli64 -adpautorbld -enbl -aall command to enable automatic reconstruction on the LSI RAID card.
Step 6     Set the replacement or newly inserted hard disk as the hot spare disk.
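Step 6 does not name a command. With MegaCli, a global hot spare is typically assigned with the -pdhsp -set option. The sketch below only prints the command line so that it can be reviewed before it is run. The helper name hotspare_cmd, the default adapter selection, and the enclosure and slot values in the usage comment are assumptions, not part of this procedure.

```shell
# Hypothetical helper: print (do not run) the MegaCli64 command that marks
# the disk at enclosure E, slot S as a global hot spare.
# Usage: hotspare_cmd E S [ADAPTER]
# ADAPTER defaults to "all" (-aall), matching the other commands in this article.
hotspare_cmd() {
    enclosure=$1
    slot=$2
    adapter=${3:-all}
    printf 'MegaCli64 -pdhsp -set -physdrv [%s:%s] -a%s\n' "$enclosure" "$slot" "$adapter"
}

# Example with placeholder values (enclosure 32, slot 1):
#   hotspare_cmd 32 1
```

Printing rather than executing keeps the sketch safe to try. Verify the exact option syntax against the MegaCli version installed on the device before running the printed command.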
Step 7     Run the MegaCli64 -pdinfo -physdrv [E:S] -aall command to check the status of the hot spare disk, as shown in the red circle in Figure 1-13. Here, physdrv specifies the hard disk configured as the hot spare disk, and E and S in [E:S] are its Enclosure Device ID and Slot Number, respectively.

Run the MegaCli64 -pdlist -aall | grep Enclosure -m1 command to query the Enclosure Device ID.
Figure 1-13  Checking the status of the hot spare disk

 
•   If Firmware state is Online, the reconstruction is complete.
•   If Firmware state is Rebuild, the reconstruction is in progress. Wait until it is complete.
•   Otherwise, contact technical support engineers.
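The three outcomes of Step 7 can be derived from the command output as follows. This is a sketch only, assuming the -pdinfo output contains a line that starts with "Firmware state:" followed by the state name; the helper name rebuild_status is illustrative.

```shell
# Hypothetical helper: read `MegaCli64 -pdinfo -physdrv [E:S] -aall` output
# on stdin and map the "Firmware state" field to the outcomes of Step 7.
# Assumption: the field appears as a line starting with "Firmware state:".
rebuild_status() {
    state=$(sed -n 's/^Firmware state:[[:space:]]*//p' | head -n 1)
    case $state in
        Online*)  echo "complete" ;;
        Rebuild*) echo "in progress" ;;
        *)        echo "contact support" ;;   # any other state, or no state found
    esac
}

# Example with placeholder values (enclosure 32, slot 1):
#   MegaCli64 -pdinfo -physdrv "[32:1]" -aall | rebuild_status
```

A missing or unrecognized state falls through to "contact support", which matches the final branch of the step.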
Step 8     After the reconstruction is complete, run the MegaCli64 -ldinfo -lx -aall command to check the status of the RAID group, as shown in the red circle in Figure 1-14. Here, x is the number of the virtual drive to query.
Figure 1-14  Checking the status of the RAID group

 
•   If the RAID group is in the Optimal state, the reconstruction succeeds.
•   Otherwise, contact technical support engineers.
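The state check in Step 8 can be sketched in the same way, assuming the -ldinfo output reports the virtual drive state on a line of the form "State : Optimal" (the amount of padding before the colon may vary). The helper name raid_state is illustrative.

```shell
# Hypothetical helper: print the virtual drive state from the output of
# `MegaCli64 -ldinfo -lx -aall` (read from stdin).
# Assumption: the state is reported on a line such as "State    : Optimal".
raid_state() {
    sed -n 's/^State[[:space:]]*:[[:space:]]*//p' | head -n 1
}

# Example for virtual drive 0 (placeholder number):
#   [ "$(MegaCli64 -ldinfo -l0 -aall | raid_state)" = "Optimal" ] && echo "reconstruction succeeded"
```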
Step 9     Check whether the 0xB02160001 "A RAID group is degraded" alarm has been cleared from the ISM interface.
•   If yes, the fault is rectified.
•   If no, contact technical support engineers.
----End
 
Root Cause
On the ISM interface, the alarm listed in Table 1-10 is reported.
Table 1-10  Alarm of RAID group degradation
Alarm ID: 0xB02160001
Alarm Name: A RAID group is degraded
Alarm Cause: A hard disk is faulty and there is no hot spare disk in RAID group.
Alarm Description: The hard disk ([slot-id]) of the device ([dev-name]) is faulty, and there is no hot spare disk, which results in the degrade of the RAID group ([raid-name]); location: cloud storage domain ([domain-name]), rack ID ([rack]), frame ID ([frame]).

1.         Check the disk online indicator according to the alarm information. The indicator is red.
2.         On the maintenance terminal, log in as the root user to the device that reported the alarm.
3.         Run the MegaCli64 -pdlist -aall | grep Hotspare$ -B12 command to check whether a hot spare disk is configured. No hard disk has a Firmware state of Hotspare, which indicates that no hot spare disk is specified in the system or that the hot spare disk was already used when the RAID group was reconstructed.

If the Firmware state of a hard disk is Hotspare, as shown in the red circle in Figure 1-11, a hot spare disk is specified in the system.
Figure 1-11  Viewing the system hard disk

 
Therefore, the RAID group degrades because a hard disk fails and no hot spare disk is configured.
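The hot spare check used in this analysis can also be expressed as a count. The following sketch assumes each disk in the -pdlist output has a line that starts with "Firmware state:"; the helper name hotspare_count is illustrative.

```shell
# Hypothetical helper: count the disks whose "Firmware state" is Hotspare in
# `MegaCli64 -pdlist -aall` output read from stdin.
# Assumption: each disk reports a line starting with "Firmware state:".
hotspare_count() {
    # grep -c prints 0 and returns non-zero when nothing matches;
    # "|| true" keeps the helper's exit status at 0 in that case.
    grep -c '^Firmware state:[[:space:]]*Hotspare' || true
}

# Example:
#   MegaCli64 -pdlist -aall | hotspare_count
```

A result of 0 corresponds to the situation described above: no hot spare disk is configured, or the hot spare disk has already been consumed by an earlier reconstruction.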
Suggestions
None
