We have three S5600 back-end arrays attached to the N8300 engine. Services had been running normally, but now, whenever array 2 is connected to the N8300, CPU usage exceeds 90% and the N8300 engine stops responding to all commands. In addition, users who access the other arrays through NFS shares find them very slow.
Because the array reported no alarms, we were confident that the LUN could be changed to write back.
We ran the command chglun -i xx -w 1 -m 1 to change the write policy, but when we checked the status again it was unchanged. Finally, after getting the customer's agreement, we restarted the array, and the device returned to normal.
Because the fault affects services, we disconnected the link between the faulty array and the N8300, and reconnected it for only 2 minutes at a time for troubleshooting.
1. We used the vmstat command to check the CPU and I/O data, but it did not give a clear picture of the performance problem.
2. We then opened two windows to watch top and iostat -xm 2. When the link was connected, CPU usage exceeded 90% (the normal value is 20%).
3. Only one array was faulty, so we checked the status of that S5600 with the commands showdisk -l, showrg, and showlun -i xx. We found that one LUN's running cache write policy was write through with no mirroring, which differed from the other LUNs. We confirmed with R&D that, because of the striping principle, one file system is spread across all the LUNs in the array, so a single faulty LUN affects the whole file system.
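
The check in step 3 and the striping effect can be illustrated with a small Python sketch. It is only an illustration under stated assumptions: the LUN IDs, policy values, stripe unit size, and LUN count are hypothetical, the policy table is assumed to be transcribed by hand from the showlun -i xx output (nothing is parsed from the CLI), and the simple round-robin layout is an assumption, not the actual N8300 on-disk layout.

```python
from collections import Counter


def find_policy_outliers(policies):
    """Return the LUN IDs whose (write policy, mirroring) pair differs
    from the most common pair among all LUNs."""
    if not policies:
        return []
    majority, _ = Counter(policies.values()).most_common(1)[0]
    return sorted(lun for lun, pair in policies.items() if pair != majority)


def luns_touched(file_size, stripe_unit, lun_count):
    """Return the set of LUN indexes that hold at least one stripe unit of
    a file, assuming the file starts at stripe unit 0 and units are placed
    round-robin across the LUNs (a simplifying assumption)."""
    units = -(-file_size // stripe_unit)  # ceiling division
    return {unit % lun_count for unit in range(units)}


# Hypothetical values, recorded by hand from the array CLI.
policies = {
    0: ("write back", "mirroring"),
    1: ("write back", "mirroring"),
    2: ("write through", "no mirroring"),
    3: ("write back", "mirroring"),
}

print(find_policy_outliers(policies))                      # [2]
print(sorted(luns_touched(1024 * 1024, 64 * 1024, 4)))     # [0, 1, 2, 3]
```

Under these assumptions, a 1 MiB file with a 64 KiB stripe unit already touches all four LUNs, so a single LUN stuck in write through slows down I/O to almost every file in the file system.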