Two OceanStor 2200 V3 storage devices reported the controller failure alarm (alarm ID: 0xF00CF005), and the error code was 0x4000cf3b.
18:32:19 0xF00CF005F Major None Controller (Controller Enclosure CTE0, controller B, item 03057201, SN 210305720110GB000107) is faulty. Error code: 0x4000cf3b.
22:20:33 0xF00CF005F Major None Controller (Controller Enclosure CTE0, controller B, item 03057201, SN 210305720110GC000075) is faulty. Error code: 0x4000cf3b.
Working Principle of the Slow Disk Alarm
The SAS driver calculates the average I/O service time of the disk for every 30-minute period. If the average I/O service time of a period exceeds 100 ms, that period is considered a slow period. If slow periods add up to 21 hours (42 periods) within 24 hours, the slow disk alarm is reported (error code: 0x4000cf3b).
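The alarm rule can be modelled as a small counter over recent periods. The sketch below is a hypothetical Python model, not the SAS driver's code. The class name `SlowDiskDetector`, its method `add_period`, and the use of a sliding window over the latest 48 periods are assumptions. Only the 30-minute period, the 100 ms threshold, and the 21-hour (42-period) limit within 24 hours come from the text.

```python
from collections import deque

# Values from the text: 30-minute periods, 100 ms threshold,
# 21 slow hours (42 periods) within 24 hours (48 periods).
PERIOD_MINUTES = 30
SLOW_THRESHOLD_MS = 100.0
WINDOW_PERIODS = 24 * 60 // PERIOD_MINUTES       # 48 periods in 24 hours
ALARM_SLOW_PERIODS = 21 * 60 // PERIOD_MINUTES   # 42 periods = 21 hours


class SlowDiskDetector:
    """Hypothetical model of the slow disk rule (not the actual driver code)."""

    def __init__(self) -> None:
        # Assumption: a sliding window that keeps only the latest 48 periods.
        self.window = deque(maxlen=WINDOW_PERIODS)

    def add_period(self, avg_service_ms: float) -> bool:
        """Record one period's average service time; return True if the alarm fires."""
        self.window.append(avg_service_ms > SLOW_THRESHOLD_MS)
        return sum(self.window) >= ALARM_SLOW_PERIODS


detector = SlowDiskDetector()
fired = [detector.add_period(150.0) for _ in range(42)]
print(fired.index(True) + 1)  # 42: the alarm fires on the 42nd slow period
```

A sliding window was chosen because it is the simplest reading of "within 24 hours"; a fixed daily window would be an equally plausible assumption.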
The average I/O service time of a period is calculated as follows:
Average I/O service time = (Total time of the period – Total idle time in the period)/Number of I/Os in the period
As shown in the preceding figure, if two I/Os occur in a 30-minute period, the average I/O service time of the two I/Os is [30 minutes – (Idle1 + Idle2 + Idle3)]/2. If this value exceeds 100 ms, the period is a slow period.
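A minimal sketch of the formula, assuming idle gaps are measured in milliseconds. The function name `average_service_time_ms` and the idle values are made up for illustration, and because the source does not say how a period with zero I/Os is handled, the sketch simply raises an error in that case.

```python
from typing import Sequence

PERIOD_MS = 30 * 60 * 1000  # one 30-minute period in milliseconds
SLOW_THRESHOLD_MS = 100.0   # threshold from the text


def average_service_time_ms(idle_ms: Sequence[float], io_count: int,
                            period_ms: float = PERIOD_MS) -> float:
    """(Total time of the period - total idle time in the period) / number of I/Os."""
    if io_count <= 0:
        # Assumption: the zero-I/O case is not described in the source.
        raise ValueError("at least one I/O is needed in the period")
    busy_ms = period_ms - sum(idle_ms)
    return busy_ms / io_count


# Two I/Os and three idle gaps (Idle1, Idle2, Idle3); values are illustrative only.
idle_gaps = [600_000.0, 599_700.0, 600_000.0]
avg = average_service_time_ms(idle_gaps, io_count=2)
print(avg, avg > SLOW_THRESHOLD_MS)  # 150.0 True
```

Here the idle gaps total 1,799,700 ms, so the two I/Os share 300 ms of busy time, giving an average of 150 ms and a slow period.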
Analysis of System Logs
The system logs show that the SAS driver determined that the system disk was responding slowly to I/Os, because the calculated average I/O service time exceeded 100 ms. Therefore, the slow disk alarm was reported.
1. The system disk logs contain no I/O timeout or SMART exception information, so this slow disk alarm was not caused by a hardware fault of the system disk.
2. The system logs show that the total idle time calculated for a period exceeded the total length of the period (30 minutes), which is impossible. Code review confirmed that a logic defect in the software caused extra idle time to be counted.
As a result, the average I/O service time calculated as [30 minutes – (Idle1 + Idle2 + Idle3 + … + DeltaIdle)]/(I/O count), where DeltaIdle is the extra idle time counted because of the defect, exceeds 100 ms, which is abnormal. When this occurs in 42 periods (21 hours in total), the controller failure alarm is reported.
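The text does not explain how an over-counted idle total produces a service time above 100 ms. The sketch below shows one possible mechanism, assumed purely for illustration: if the subtraction were done in 32-bit unsigned integers, a negative busy time would wrap around to a huge value. The function names, the 32-bit width, and the 5-second DeltaIdle are all hypothetical. The sketch also encodes the sanity condition implied by the log finding, namely that total idle time can never exceed the period length.

```python
PERIOD_MS = 30 * 60 * 1000  # one 30-minute period in milliseconds
U32_MASK = 0xFFFFFFFF       # assumption: 32-bit unsigned arithmetic (not stated in the source)


def idle_total_is_valid(idle_total_ms: int, period_ms: int = PERIOD_MS) -> bool:
    """Total idle time can never legitimately exceed the length of the period."""
    return 0 <= idle_total_ms <= period_ms


def avg_service_ms_u32(idle_total_ms: int, io_count: int, period_ms: int = PERIOD_MS) -> int:
    """Hypothetical calculation done in 32-bit unsigned integers.

    If the idle total is over-counted (Idle1 + ... + DeltaIdle > period),
    the subtraction goes negative and wraps to a very large value.
    """
    busy_ms = (period_ms - idle_total_ms) & U32_MASK  # emulate unsigned wraparound
    return busy_ms // io_count


normal = avg_service_ms_u32(PERIOD_MS - 300, io_count=2)    # 150 ms busy per I/O
defect = avg_service_ms_u32(PERIOD_MS + 5_000, io_count=2)  # 5 s of DeltaIdle
print(normal, defect, idle_total_is_valid(PERIOD_MS + 5_000))  # 150 2147481148 False
```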
In summary, when the system disk of a controller handled only a few I/Os, the SAS driver incorrectly identified it as a slow disk, and the system reported a false controller failure alarm.
This issue is resolved in V300R005C00SPH301. Upgrade the storage system to this version.