A customer reported that an RH5288 V3 server failed to boot on 26 May 2018.
The BMC reported failure alarms for three hard disks, in slots 5, 16, and 27.
According to the RAID controller card logs, "Command timeout", "reset", and "media error" events were recorded for the hard disks in slots 5, 16, and 27.
Figure 1: The status of the hard disk in slot 5 became FAILED due to command timeout.
Figure 3: The status of the hard disk in slot 27 became SHIELD due to media error.
During system log collection, the hard disks did not respond, so the RAID controller card removed them from the array. This is what triggered the hard disk alarms.
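The per-slot review of the RAID controller card log can be sketched roughly as follows. This is only an illustration: it assumes the log has already been reduced to (slot, message) pairs by some other means, and the function name summarize_events and that input shape are hypothetical, not part of any RAID controller tool. Only the three keywords come from the analysis above.

```python
from collections import defaultdict

# Keywords quoted in the analysis above; matching is case-insensitive.
KEYWORDS = ("command timeout", "reset", "media error")


def summarize_events(entries):
    """Group the keywords seen in log messages by disk slot.

    entries: iterable of (slot, message) pairs. How these pairs are
    extracted from the real controller log is outside this sketch.
    Returns a dict mapping each slot to a sorted list of matched keywords.
    Slots with no matching message are left out.
    """
    seen = defaultdict(set)
    for slot, message in entries:
        text = message.lower()
        for keyword in KEYWORDS:
            if keyword in text:
                seen[slot].add(keyword)
    return {slot: sorted(keywords) for slot, keywords in seen.items()}
```

A slot that shows both "command timeout" and "reset" in the result is a candidate for the timeout behavior described here, while a slot showing only "media error" points to a different failure path.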
The low-level logs of the hard disk in slot 5 were provided to the manufacturer for analysis, and "Flash LED=BD00561F" was found in them.
IRAW: A hard disk background processing mechanism. When a hard disk is in the idle state, the mechanism checks whether the data written to the hard disk is the same as the data in the cache, ensuring the accuracy of the written data.
Seagate 6 TB SATA hard disk firmware versions earlier than SN05 do not check whether the cache is in use by an IRAW operation when running host read commands. When a cache conflict occurs, hard disk I/O operations occasionally time out, resulting in hard disk alarms.
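When auditing a group of servers, the version rule above can be expressed as a small check. The following is a minimal sketch, assuming firmware versions are reported as "SN" followed by digits (for example "SN04"); the function names parse_fw and is_affected are hypothetical, and how the version string is read from the disk is not covered.

```python
import re

# Assumption: versions look like "SN" followed by digits, e.g. "SN04".
_FW_PATTERN = re.compile(r"^SN(\d+)$")

# Per the text, versions earlier than SN05 are affected.
FIRST_UNAFFECTED = 5


def parse_fw(version: str) -> int:
    """Return the numeric part of an 'SNxx' firmware version string."""
    match = _FW_PATTERN.match(version.strip().upper())
    if match is None:
        raise ValueError(f"unrecognized firmware version: {version!r}")
    return int(match.group(1))


def is_affected(version: str) -> bool:
    """True if the version is earlier than SN05."""
    return parse_fw(version) < FIRST_UNAFFECTED
```

Raising an error on an unrecognized string, rather than returning False, avoids silently treating a disk with an unexpected version format as safe.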
It is recommended that the Seagate hard disk firmware be upgraded to SN06 to resolve the problem. The SN06 firmware can be downloaded from: http://support.huawei.com/enterprise/en/software/22458199-SW1000228987