[T Series] An iSCSI Link-Down During IOmeter Read/Write Operations on Asianux 3 SP2 Led to an Unresponsive Host

Publication Date: 2012-07-19
Issue Description

Product and version information:

  • S5500T V100R001 V100R002
  • S5600T V100R001 V100R002
  • S5800T V100R001 V100R002
  • S6800T V100R001 V100R002
  • Application server operating system: Asianux 3 SP2 for X86_64
  • Application server native iSCSI initiator version: iscsi-initiator-utils-6.2.0.868-0.18.1AXS3

The storage array was connected to the application server through iSCSI connections. The Asianux host used IOmeter to run read/write tests on the LUN mapped from the storage array. During the read/write operations, an iSCSI link-down occurred (caused by a cable removal or an unexpected power-off). Host CPU utilization rose to nearly 100%, and the host stopped responding, even to SSH and KVM login attempts.

Alarm Information
None
Handling Process
1. Attempt to restore the iSCSI connections and wait 5 minutes for the host to start responding.
2. If the host is still not responding, restart the host.
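The wait in step 1 can be automated from a separate management workstation. The following Python sketch is only an illustration, not part of the original procedure: it polls a probe until the host answers or 5 minutes (300 seconds) pass. The helper names `ssh_port_open` and `wait_for_host`, the 10-second polling interval, and the example address are all assumptions. A successful TCP connection to port 22 is only a rough proxy for "responding", because a host under heavy CPU load may still accept connections at the kernel level.

```python
import socket
import time


def ssh_port_open(host, port=22, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_for_host(probe, max_wait=300.0, interval=10.0,
                  clock=time.monotonic, sleep=time.sleep):
    """Poll probe() until it returns True or max_wait seconds elapse.

    Returns True if the host responded in time, and False if the
    restart in step 2 should be considered. The clock and sleep
    parameters are injectable so the function can be tested quickly.
    """
    deadline = clock() + max_wait
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

For example, `wait_for_host(lambda: ssh_port_open("192.0.2.10"))` would wait up to 5 minutes for the host at the placeholder address 192.0.2.10 (a documentation-only address; substitute the real host).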
Root Cause

Dynamo (the IOmeter workload generator) on Linux has an infinite retry mechanism. When a slave block device returned an I/O error to upper-layer applications, Dynamo retried the failed I/O immediately. Because the iSCSI link was still down, the iSCSI driver returned an error for the retried I/O as well, which triggered another retry, and so on. The result was a logical infinite loop that drove CPU utilization to nearly 100%, as shown in Figure 1.
    Figure 1 CPU utilization

 

When CPU utilization is close to 100%, the host may not respond to any external events.
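The effect of the retry behaviour described above can be illustrated with a small Python simulation. This is not Dynamo's source code; `LinkDownDevice`, `retry_immediately`, and `retry_with_backoff` are hypothetical names used only for this sketch. The first function mimics an immediate retry with no delay (capped at `max_attempts` only so that the demonstration terminates). The second shows a bounded retry with exponential backoff, a common general-purpose alternative that this article does not attribute to Dynamo or IOmeter.

```python
import time


class LinkDownDevice:
    """Stand-in for a block device whose iSCSI link is down: every I/O fails."""

    def __init__(self):
        self.attempts = 0

    def read(self):
        self.attempts += 1
        raise OSError("I/O error: iSCSI link down")


def retry_immediately(device, max_attempts):
    """Retry a failed I/O with no delay between attempts.

    The behaviour described in the article has no attempt limit;
    max_attempts exists only so that this demonstration terminates.
    """
    for _ in range(max_attempts):
        try:
            return device.read()
        except OSError:
            continue  # retry right away: the CPU never idles
    return None


def retry_with_backoff(device, max_attempts=5, base_delay=0.1,
                       sleep=time.sleep):
    """Bounded retry with exponential backoff; re-raises the last error."""
    delay = base_delay
    for attempt in range(max_attempts):
        try:
            return device.read()
        except OSError:
            if attempt == max_attempts - 1:
                raise  # give up and report the error to the caller
            sleep(delay)  # yield the CPU between attempts
            delay *= 2
```

With the link down, `retry_immediately` spends all of its time issuing I/O that fails at once, which is the pattern that saturates a CPU when no limit exists. `retry_with_backoff` sleeps between attempts and eventually surfaces the error instead of looping.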

 
We ran the same test on a Red Hat 5.4 host, and the same symptoms occurred.

Suggestions
When using IOmeter on a Linux host that is connected to the storage array through iSCSI connections, avoid iSCSI link-downs during testing; otherwise, Dynamo's retry loop may drive the Linux host CPU utilization toward 100% and leave the host unresponsive.
