Server RH5288 V3 failed to boot up

Publication Date: 2018-07-06

Issue Description

The customer reported that one RH5288 V3 server failed to boot on 26 May 2018.

 

Alarm Information

The BMC reported failure alarms for three hard disks. The hard disks in slots 5, 16, and 27 were affected.

Handling Process

According to the RAID controller card logs, "command timeout", "reset", and "media error" events were recorded for the hard disks in slots 5, 16, and 27.

Figure 1: The hard disk in slot 5 was faulty and the status became FAILED due to command timeout.

 

 
Figure 2: The status of the hard disk in slot 16 became FAILED due to command timeout.


Figure 3: The status of the hard disk in slot 27 became SHIELD due to media error.
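
The keyword check described above can be sketched in a few lines of Python. This is only an illustration: the log line layout (`slot=<n>` followed by a message) and the sample lines are made up for the example and do not reproduce the real RAID controller card log format, so the pattern would need to be adapted to the actual logs.

```python
import re
from collections import defaultdict

# Keywords the article reports in the RAID controller card logs.
KEYWORDS = ("command timeout", "reset", "media error")

# Hypothetical line layout: "... slot=<n> <message>". Adapt to the real log format.
SLOT_RE = re.compile(r"\bslot[ =:]*(\d+)", re.IGNORECASE)


def summarize_disk_events(lines):
    """Count keyword hits per slot number in an iterable of log lines."""
    summary = defaultdict(lambda: defaultdict(int))
    for line in lines:
        match = SLOT_RE.search(line)
        if not match:
            continue  # line does not mention a slot
        slot = int(match.group(1))
        lowered = line.lower()
        for keyword in KEYWORDS:
            if keyword in lowered:
                summary[slot][keyword] += 1
    return {slot: dict(hits) for slot, hits in summary.items()}


# Made-up sample lines, for illustration only.
sample_log = [
    "[sample] slot=5 Command timeout on PD",
    "[sample] slot=5 PD reset issued",
    "[sample] slot=16 Command timeout on PD",
    "[sample] slot=27 Media error detected",
    "[sample] controller heartbeat ok",
]

for slot, hits in sorted(summarize_disk_events(sample_log).items()):
    print(slot, hits)
```

Grouping the hits by slot makes it easy to see whether several disks show the same pattern, which was the case here for slots 5 and 16.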

During system log collection, the hard disks did not respond, so the RAID controller card removed them from the array. This is what triggered the hard disk alarms.

"Flash LED=BD00561F" was found in the low-level logs of the hard disk in slot 5, which were provided to the manufacturer for analysis.



Flash LED is an internal hard disk alarm mechanism. A Flash LED ID is generated and logged when the hard disk firmware stops working because of a processing exception. Each Flash LED ID is unique. Seagate 6 TB SATA firmware versions earlier than SN05 do not check whether the cache is in use by idle read after write (IRAW) operations when running host read commands. As a result, the BD-00561F error code is reported when a cache conflict occurs.

SN05 changes the firmware processing mechanism so that it checks whether the cache is in use before running host read commands. If the cache is in use by IRAW, the firmware terminates IRAW and releases the cache.

Note:

IRAW: A hard disk background processing mechanism. When the hard disk is idle, the mechanism checks whether the data written to the disk matches the data in the cache, ensuring the accuracy of the written data.
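
The difference between the two firmware behaviors can be pictured with a toy model. The sketch below is not Seagate firmware and makes no claim about how the firmware is actually implemented. The class, its methods, and the single "owner" field are all hypothetical; they only mirror the description above: earlier versions run a host read without checking the cache, while SN05 checks first and terminates IRAW if it holds the cache.

```python
class ToyDiskCache:
    """Toy model of a disk cache shared by host reads and IRAW (not real firmware)."""

    def __init__(self, checks_cache_before_read):
        # True models the SN05 behavior described in the article;
        # False models the earlier firmware versions.
        self.checks_cache_before_read = checks_cache_before_read
        self.owner = None  # None, "IRAW", or "HOST"

    def start_iraw(self):
        """Background verification claims the cache while the disk is idle."""
        if self.owner is None:
            self.owner = "IRAW"

    def host_read(self):
        """Serve a host read and return a status string."""
        if self.owner == "IRAW":
            if not self.checks_cache_before_read:
                # Earlier firmware: no check is made, so the read collides with IRAW.
                return "cache conflict"
            # SN05 behavior: terminate IRAW and release the cache first.
            self.owner = None
        self.owner = "HOST"
        # ... the data transfer would happen here ...
        self.owner = None
        return "ok"


old_firmware = ToyDiskCache(checks_cache_before_read=False)
old_firmware.start_iraw()
print(old_firmware.host_read())

new_firmware = ToyDiskCache(checks_cache_before_read=True)
new_firmware.start_iraw()
print(new_firmware.host_read())
```

In the model, the conflict only appears when a host read arrives while IRAW holds the cache, which is consistent with the timeouts being occasional rather than constant.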

Root Cause

Seagate 6 TB SATA firmware versions earlier than SN05 do not check whether the cache is in use by IRAW operations when running host read commands. When a cache conflict occurs, hard disk I/O operations occasionally time out, resulting in hard disk alarms.


Solution

Upgrade the Seagate hard disk firmware to SN06 to resolve the problem. The SN06 firmware can be downloaded from: http://support.huawei.com/enterprise/en/software/22458199-SW1000228987
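
When many disks need to be checked, the version comparison can be scripted. The following is a minimal sketch that assumes firmware versions always have the form "SN" followed by digits and that a higher number means a later release; neither assumption is confirmed by this article. The function names are hypothetical, and how the version string is read from a disk is left out.

```python
import re

FIXED_VERSION = "SN05"        # first version the article describes as containing the fix
RECOMMENDED_VERSION = "SN06"  # version the article recommends upgrading to


def firmware_number(version):
    """Return the numeric part of a version string such as 'SN05' (assumed format)."""
    match = re.fullmatch(r"SN(\d+)", version.strip().upper())
    if match is None:
        raise ValueError(f"unexpected firmware version format: {version!r}")
    return int(match.group(1))


def is_affected(version):
    """True if the version is earlier than the one that contains the fix."""
    return firmware_number(version) < firmware_number(FIXED_VERSION)


def below_recommended(version):
    """True if the version is earlier than the recommended one."""
    return firmware_number(version) < firmware_number(RECOMMENDED_VERSION)


for version in ("SN03", "SN05", "SN06"):
    print(version, is_affected(version), below_recommended(version))
```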



 
