
Other Error Is Reported for Hard Disks of an RH2288 V3

Publication Date: 2019-04-23
Issue Description
The OS randomly reports an alarm indicating that a hard disk failed to write I/O. The alarm clears after a period of time.
Handling Process

1. Collect fault symptoms. The I/O write failures occur randomly and are not specific to any one hard disk.

2. On the OS, query the hard disk information through the RAID controller card. The Other Error Count values are high, so the BMC and OS logs need to be analyzed.

3. Collect the BMC and OS logs.

      3.1. The system event log (SEL) in the BMC logs contains no abnormal hard disk events.

      3.2. Search the smart files in the disk directory of the OS logs for Other Error. Ten hard disks report Other Error counts.

      3.3. Open the sasraidlog file in the raid directory of the OS logs (the log file name varies with the RAID controller card model). The log contains I/O timeout records for multiple hard disks and for the hard disk backplane.

      3.4. Collect the OS logs again one day later and check the Other Error Count of the hard disks. The Other Error Count values keep increasing rapidly.

4. Replace the hard disk backplane, RAID controller card, and SAS cable.
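The log checks in steps 3.2 and 3.3 can be scripted when many servers are affected. The following is a minimal Python sketch, not a Huawei tool, and it rests on several assumptions: that the smart files contain lines such as `Other Error Count: 12` (or `Other Error Count = 12`), that the smart and sasraidlog files can be recognized by their file names, and that timeout records contain the word "timeout". The function names are hypothetical; adjust the patterns to the actual log format of your RAID controller card.

```python
import re
from pathlib import Path

# Assumed line format in the smart files, e.g. "Other Error Count: 12"
# or "Other Error Count = 12"; adjust to your controller's actual output.
OTHER_ERROR_RE = re.compile(r"Other Error Count\s*[:=]\s*(\d+)", re.IGNORECASE)


def scan_other_errors(text: str) -> list[int]:
    """Return every Other Error Count value found in the text, in order."""
    return [int(m.group(1)) for m in OTHER_ERROR_RE.finditer(text)]


def count_timeouts(text: str) -> int:
    """Count lines that mention a timeout (assumed keyword: "timeout")."""
    return sum(1 for line in text.splitlines() if "timeout" in line.lower())


def scan_log_dir(root: Path) -> None:
    """Walk an unpacked OS log directory and report both symptoms."""
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        name = path.name.lower()
        if "smart" in name:
            # Step 3.2: disks with a nonzero Other Error Count.
            text = path.read_text(errors="replace")
            nonzero = [c for c in scan_other_errors(text) if c > 0]
            if nonzero:
                print(f"{path}: nonzero Other Error Count values {nonzero}")
        elif "sasraidlog" in name:
            # Step 3.3: I/O timeout records in the RAID controller log.
            text = path.read_text(errors="replace")
            print(f"{path}: {count_timeouts(text)} line(s) mention a timeout")
```

For example, `scan_log_dir(Path("/path/to/os_log"))` (a placeholder path) prints one line per matching file. Running the same scan on logs collected a day apart, as in step 3.4, shows whether the counts are still growing.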


Root Cause

The SAS link is not functioning properly, so communication between the hard disks and the system is abnormal. As a result, I/O command delivery times out and the Other Error Count values become large.

Solution

1. Other Errors are caused by hard disk resets that result from I/O timeouts on the SAS link.

2. Collect the Other Error Count values over a specified period. For servers whose Other Error Count increases sharply, you are advised to replace the hard disk backplane, SAS cables, and RAID controller card.
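The comparison recommended in item 2 amounts to a small calculation over two snapshots of Other Error Count taken some days apart. The sketch below is a minimal illustration under stated assumptions: the disk identifiers, the snapshot dictionaries, and the threshold are hypothetical, because this case does not define a numeric limit for a "high increment".

```python
def daily_increments(
    before: dict[str, int], after: dict[str, int], days: float
) -> dict[str, float]:
    """Average daily increase of Other Error Count per disk between two snapshots."""
    if days <= 0:
        raise ValueError("days must be positive")
    rates: dict[str, float] = {}
    for disk, count in after.items():
        delta = count - before.get(disk, 0)
        if delta < 0:
            # Assumption: a drop means the disk was replaced or the counter
            # was cleared, so the new value is counted from zero.
            delta = count
        rates[disk] = delta / days
    return rates


def disks_to_check(
    before: dict[str, int], after: dict[str, int], days: float, threshold_per_day: float
) -> list[str]:
    """Disks whose daily increase reaches the (hypothetical) threshold, sorted."""
    rates = daily_increments(before, after, days)
    return sorted(disk for disk, rate in rates.items() if rate >= threshold_per_day)
```

For example, with `before = {"slot0": 3, "slot1": 0, "slot2": 10}` and `after = {"slot0": 3, "slot1": 250, "slot2": 4}`, `disks_to_check(before, after, 1, 100)` returns `["slot1"]`. Normalizing to a daily rate lets snapshots collected at different intervals be compared against the same threshold.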
