Product and version information:
The storage array was connected to the application server through iSCSI connections. The Asianux host used Iometer to perform read/write testing on the LUN mapped from the storage array. During the read/write operations, an iSCSI link-down occurred (caused by a cable removal or unexpected power-off). Host CPU utilization then rose to nearly 100%, and the host stopped responding, even to SSH or KVM login attempts.
Dynamo, the Iometer workload generator, has an infinite retry mechanism on Linux. When a slave block device returned an I/O error to upper-layer applications, Dynamo immediately retried the failed I/O. Because the iSCSI link was still down, the iSCSI driver returned an error for the retry as well, which triggered yet another retry. The result was an infinite loop that drove CPU utilization to nearly 100%, as shown in Figure 1.
Figure 1 CPU utilization
With CPU utilization close to 100%, the host might not respond to any external events.
We ran the same test on a Red Hat 5.4 host, and the same symptoms occurred.
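The retry behavior described above can be illustrated with a small simulation. This is a minimal sketch, not Dynamo's actual code: the class LinkDownDevice and both retry functions are hypothetical names introduced for illustration, and the bounded retry with backoff is shown only as one common way to avoid a busy loop, not as the fix applied in this case.

```python
import time


class LinkDownDevice:
    """Hypothetical stand-in for a block device whose iSCSI link is down."""

    def __init__(self):
        self.attempts = 0

    def write(self, data):
        # Every I/O fails for as long as the link stays down.
        self.attempts += 1
        raise OSError("I/O error: iSCSI link down")


def retry_immediately(device, data, safety_cap):
    """Retry a failed I/O at once, with no delay and no give-up condition.

    safety_cap is an assumption added so that this demonstration ends;
    without it the loop would spin forever and saturate one CPU core.
    """
    while device.attempts < safety_cap:
        try:
            device.write(data)
            return True
        except OSError:
            continue  # retry immediately: this is the busy loop
    return False


def retry_with_backoff(device, data, max_retries=5, base_delay=0.001,
                       sleep=time.sleep):
    """Retry a bounded number of times, doubling the wait after each failure."""
    delay = base_delay
    for _ in range(max_retries):
        try:
            device.write(data)
            return True
        except OSError:
            sleep(delay)  # yield the CPU instead of spinning
            delay *= 2
    return False  # give up and report the error to the caller


if __name__ == "__main__":
    busy = LinkDownDevice()
    retry_immediately(busy, b"block", safety_cap=100000)
    print("immediate retry attempts:", busy.attempts)

    bounded = LinkDownDevice()
    retry_with_backoff(bounded, b"block")
    print("bounded retry attempts:", bounded.attempts)
```

In the first function the number of attempts is limited only by the artificial cap, which mirrors how the failed I/O was reissued without pause while the link was down. The second function stops after a fixed number of attempts and sleeps between them, so the process leaves the CPU free for other work such as login sessions.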