TC-A2017 Disk Abnormality Causes Solaris Request Timeouts

Publication Date: 2012-07-18
Issue Description
Product and version: T3500
In Solaris, file systems cannot be read or written normally and the zpool status cannot be queried, so the file systems are unavailable. When the zpool status command is run, the system repeatedly reports errors, as shown in Figure 2-20.
Figure 2-20 Recurring system error information
Alarm Information
Handling Process

- Replace the hard disk in slot sdc. For details, see Replacing a Disk Module in Oceanspace T3000 Storage Node Troubleshooting-(V100R001C01).

Root Cause
Run the ipmitool command to access the serial port 4 interface. For details, see 3.1 Entering the Serial Port Interface Of the T3500. Then run the qerr command. The output shows that phy_chg_cnt for disk phy 02 (slot sdc) is 0xe6 (230 in decimal). This is higher than the values for the other disks, as shown in Figure 2-21, which indicates that the disk on phy 02 is abnormal.
Figure 2-21 Serial port 4 information

1. Run the qerr command to obtain the value of phy_chg_cnt. This value is the cumulative number of phy changes since the slot was powered on. It is reset to 0 when the device is powered off.
2. Hot swaps and disk faults both generate phy changes, which increase the value of phy_chg_cnt.
3. If a slot has been hot-swapped many times, its phy_chg_cnt will be high, but this does not indicate that the disk in that slot is faulty. A high phy_chg_cnt indicates a faulty disk only when no hot swaps have occurred in that slot.
4. For the mapping between phy x and the T3500 slot numbers, see Table 2-3.
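The decision rule in items 1 to 3 can be sketched in Python as follows. This is a hypothetical helper, not part of qerr or any T3500 tool. It assumes that the phy_chg_cnt values have already been copied by hand from the qerr output as hex strings keyed by phy number, and that you know which slots were hot-swapped since power-on. The threshold (a count more than 4 times the median of the other phys) is an illustrative assumption, not a documented limit. Only the 0xe6 value in the sample data comes from this case; the other counts are made up.

```python
from statistics import median

# Hypothetical threshold (assumption, not a documented T3500 value):
# flag a phy whose count exceeds this multiple of the other phys' median.
RATIO = 4


def suspect_phys(counts_hex, hot_swapped=frozenset(), ratio=RATIO):
    """Return the phy numbers whose phy_chg_cnt is abnormally high.

    counts_hex:  dict mapping phy number to phy_chg_cnt as a hex string
                 (for example "0xe6"), transcribed from qerr output.
    hot_swapped: phy numbers whose slots were hot-swapped since power-on.
                 A high count on these slots does not indicate a fault.
    """
    # int(..., 16) accepts an optional "0x" prefix, so "0xe6" -> 230
    counts = {phy: int(value, 16) for phy, value in counts_hex.items()}
    suspects = []
    for phy, count in sorted(counts.items()):
        if phy in hot_swapped:
            continue  # the high count may just reflect the hot swaps
        others = [c for p, c in counts.items() if p != phy]
        baseline = median(others) if others else 0
        # max(baseline, 1) keeps tiny counts from being flagged when
        # all other phys report 0
        if count > ratio * max(baseline, 1):
            suspects.append(phy)
    return suspects


# Illustrative data: only phy 2 = 0xe6 is taken from this case.
sample = {0: "0x2", 1: "0x3", 2: "0xe6", 3: "0x2"}
print(suspect_phys(sample))        # [2]
print(suspect_phys(sample, {2}))   # [] because slot c was hot-swapped
```

Comparing each phy against the median of the others, rather than against a fixed number, is a design choice for this sketch. It reflects the source's observation that the faulty disk's value is higher than that of the other disks.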
When Solaris sends requests to the disk, a large number of phy changes are generated, causing the disk link to time out. Solaris detects the fault but does not handle it. As a result, the file system becomes unavailable.
After inserting the replacement disk, wait three minutes and then check the disk online status indicator to determine whether the disk is working normally.

In the qerr command output, phy x maps to the T3500 slot numbers as shown in Table 2-3.
Table 2-3 Mapping between phy x and the T3500 slot numbers
phy 23 (x)   phy 17 (r)   phy 11 (l)   phy 5 (f)
phy 22 (w)   phy 16 (q)   phy 10 (k)   phy 4 (e)
phy 21 (v)   phy 15 (p)   phy 9 (j)    phy 3 (d)
phy 20 (u)   phy 14 (o)   phy 8 (i)    phy 2 (c)
phy 19 (t)   phy 13 (n)   phy 7 (h)    phy 1 (b)
phy 18 (s)   phy 12 (m)   phy 6 (g)    phy 0 (a)
The letter in parentheses identifies the slot. For example, phy 2 (c) is slot sdc.
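Table 2-3 follows a regular pattern: phy n corresponds to the (n+1)th letter of the alphabet, from phy 0 (a) to phy 23 (x). By that pattern, phy 4 is e. The Python sketch below encodes this mapping in both directions. The function names are hypothetical. The "sd" prefix is an assumption based on this case's reference to phy 02 as slot sdc.

```python
import string

NUM_PHYS = 24  # phy 0 to phy 23, as listed in Table 2-3
LETTERS = string.ascii_lowercase[:NUM_PHYS]  # "a" .. "x"


def phy_to_slot(phy, prefix="sd"):
    """Map a T3500 phy number to a slot name, for example 2 -> "sdc".

    The "sd" prefix is an assumption taken from this case (phy 02 = sdc).
    """
    if not 0 <= phy < NUM_PHYS:
        raise ValueError(f"phy must be in 0..{NUM_PHYS - 1}, got {phy}")
    return prefix + LETTERS[phy]


def slot_to_phy(slot, prefix="sd"):
    """Inverse mapping, for example "sdc" -> 2 (a bare "c" also works)."""
    letter = slot[len(prefix):] if slot.startswith(prefix) else slot
    # Require exactly one letter so that substrings like "ab" are rejected
    if len(letter) != 1 or letter not in LETTERS:
        raise ValueError(f"unknown slot: {slot!r}")
    return LETTERS.index(letter)


print(phy_to_slot(2))      # sdc
print(slot_to_phy("sdx"))  # 23
```

With this mapping, the abnormal phy 02 reported by qerr resolves to slot sdc, which is the slot whose disk the Handling Process replaces.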