FusionCompute VMs cannot start up on one CNA host

Publication Date: 2015-05-23
Issue Description

The VMs are in the fault-recovery state and cannot be migrated to other CNA hosts through HA.

Handling Process

1. Check the state of all NICs and bonds with the ifconfig command. All NICs and bonds are up:

ifconfig

eth1      Link encap:Ethernet  HWaddr F8:4A:BF:55:6B:28
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
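Step 1 only shows that the interfaces are flagged UP, which, as the later steps show, did not rule out the fault. A minimal Python sketch for extracting those flags per interface, assuming the classic net-tools ifconfig layout shown above (interface name in column 0, indented continuation lines); parse_ifconfig_flags and interfaces_not_running are hypothetical helper names:

```python
def parse_ifconfig_flags(text):
    """Map each interface name to the set of flags on its flags line.

    Assumes the classic net-tools layout: the interface name starts in
    column 0 and continuation lines are indented.
    """
    flags = {}
    current = None
    for line in text.splitlines():
        if line and not line[0].isspace():
            # New interface block, e.g. "eth1      Link encap:Ethernet ..."
            current = line.split()[0]
            flags[current] = set()
        elif current is not None and "MTU:" in line:
            # Flags line: the words before "MTU:", e.g. UP BROADCAST RUNNING ...
            flags[current].update(line.split("MTU:")[0].split())
    return flags


def interfaces_not_running(text):
    """Interfaces whose flags line lacks UP or RUNNING."""
    return sorted(
        name
        for name, f in parse_ifconfig_flags(text).items()
        if not {"UP", "RUNNING"} <= f
    )


# Sample taken from the eth1 output in step 1.
sample = """eth1      Link encap:Ethernet  HWaddr F8:4A:BF:55:6B:28
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
"""
print(interfaces_not_running(sample))  # []
```

A check like this passes on the affected host as well, because the UP and RUNNING flags say nothing about the negotiated link speed.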

 

2. Ping the storage service IP address. It is not reachable, so the fault lies in the connection between the CNA host and the storage, and the OS disk cannot be unmounted.
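ping tests ICMP reachability only. As a complementary sketch, a TCP connection attempt to the storage's iSCSI port checks the path that iSCSI itself uses. It assumes the array listens on the IANA-registered iSCSI port 3260; iscsi_port_reachable is a hypothetical helper, and the storage service IP has to be supplied by the reader:

```python
import socket

ISCSI_PORT = 3260  # IANA-registered iSCSI target port (assumed; your array may differ)


def iscsi_port_reachable(host, port=ISCSI_PORT, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts (socket.timeout is an
        # OSError subclass) and unreachable networks.
        return False


# Usage (placeholder address): iscsi_port_reachable("<storage service IP>")
```

On the affected host in this case, one would expect it to return False, matching the failed ping, while a healthy CNA host should return True.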

3. The iSCSI bond is UP, so the link was assumed to be normal. Log in to ISM and check the storage.
Log in to another CNA host and ping the storage service IP address: it is reachable. Check the iSCSI sessions with the iscsiadm -m session command: they are normal.
So the storage side of the link should be normal.
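To compare the iSCSI sessions of a healthy host with those of the affected one, the output of iscsiadm -m session can be parsed into records. A minimal sketch, assuming the usual open-iscsi one-line-per-session format (transport: [sid] portal,tpgt target, sometimes followed by (non-flash)); the addresses and IQN in the sample are placeholders, not values from this case:

```python
import re

# Assumed line format of `iscsiadm -m session` (open-iscsi), e.g.
#   tcp: [1] 192.0.2.10:3260,1 iqn.2006-01.com.example:target0 (non-flash)
SESSION_RE = re.compile(
    r"^(?P<transport>\S+): \[(?P<sid>\d+)\] "
    r"(?P<portal>\S+),(?P<tpgt>\d+) (?P<target>\S+)"
)


def parse_sessions(text):
    """Return one dict per session line; lines that do not match are skipped."""
    sessions = []
    for line in text.splitlines():
        m = SESSION_RE.match(line.strip())
        if m:
            s = m.groupdict()
            s["sid"] = int(s["sid"])
            s["tpgt"] = int(s["tpgt"])
            sessions.append(s)
    return sessions


def portals(text):
    """Set of portals (address:port) that have a session."""
    return {s["portal"] for s in parse_sessions(text)}


sample = """tcp: [1] 192.0.2.10:3260,1 iqn.2006-01.com.example:target0 (non-flash)
tcp: [2] 192.0.2.11:3260,2 iqn.2006-01.com.example:target0 (non-flash)
"""
print(sorted(portals(sample)))  # ['192.0.2.10:3260', '192.0.2.11:3260']
```

Taking the set difference of portals() between the two hosts would list storage paths that exist on one host but not the other.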

4. Log in to the FusionCompute portal and check the storage bond. One port has been degraded to 100 Mbit/s, while the other three ports run at 1000 Mbit/s.
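The check in step 4 could also be done on the host itself if the storage bond is a Linux bonding device, which the article does not confirm. In that case /proc/net/bonding/<bond> lists each slave with its MII Status and Speed, and a slave running slower than its peers can be flagged. A sketch under that assumption; the interface names and the sample are hypothetical, modeled on the three 1000 Mbit/s ports and one 100 Mbit/s port described above. Note that the slow slave still reports MII Status: up:

```python
def slave_speeds(text):
    """Map each slave interface to its reported speed in Mbit/s (None if unknown)."""
    speeds = {}
    current = None
    for line in text.splitlines():
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "Slave Interface":
            current = value
            speeds[current] = None
        elif key == "Speed" and current is not None:
            # e.g. "1000 Mbps"; non-numeric values such as "Unknown" -> None
            first = value.split()[0] if value else ""
            speeds[current] = int(first) if first.isdigit() else None
    return speeds


def slow_slaves(text):
    """Slaves whose speed is below the fastest slave in the bond."""
    speeds = slave_speeds(text)
    known = [s for s in speeds.values() if s is not None]
    if not known:
        return []
    top = max(known)
    return sorted(n for n, s in speeds.items() if s is not None and s < top)


# Hypothetical sample; on a real host: slow_slaves(open("/proc/net/bonding/bond0").read())
sample = """Bonding Mode: load balancing (xor)
MII Status: up

Slave Interface: eth0
MII Status: up
Speed: 1000 Mbps

Slave Interface: eth1
MII Status: up
Speed: 1000 Mbps

Slave Interface: eth2
MII Status: up
Speed: 1000 Mbps

Slave Interface: eth3
MII Status: up
Speed: 100 Mbps
"""
print(slow_slaves(sample))  # ['eth3']
```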
 
5. Disable PORT4. The network connection to the storage is restored, and the VMs start up normally.

Root Cause

The storage bond uses mode 'bond2:load balancing (xor)'. In this mode the bond maps each flow onto a fixed member port, so traffic between a given server port and a given peer always leaves through the same port of the bond. When this NIC's speed was degraded to 100 Mbit/s, the NIC still reported a healthy state and kept carrying the flows assigned to it. However, because its speed no longer matched the peer port, communication over it failed even though the port appeared normal. After this port is disabled, the bond reassigns its flows to the remaining 1000 Mbit/s ports.
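The pinning described above comes from the bond's transmit hash. The sketch below follows, in simplified form, the layer2 formula given in the Linux kernel bonding documentation (hash = source MAC[5] XOR destination MAC[5] XOR packet type ID; slave number = hash modulo slave count). It assumes a Linux bonding driver with the default layer2 xmit_hash_policy, which the article does not state; the real driver's arithmetic can differ between kernel versions, so the indices are illustrative only. The destination MAC is hypothetical:

```python
ETH_P_IP = 0x0800  # EtherType (packet type ID) for IPv4


def mac_bytes(mac):
    """'F8:4A:BF:55:6B:28' -> bytes of the six octets."""
    return bytes(int(part, 16) for part in mac.split(":"))


def layer2_slave_index(src_mac, dst_mac, n_slaves, ptype=ETH_P_IP):
    """Simplified layer2 hash: last MAC octets XOR packet type, modulo slave count."""
    h = mac_bytes(src_mac)[5] ^ mac_bytes(dst_mac)[5] ^ ptype
    return h % n_slaves


src = "F8:4A:BF:55:6B:28"  # HWaddr from the step 1 output (illustrative source MAC)
dst = "00:11:22:33:44:57"  # hypothetical storage-side MAC

# Same MAC pair -> same slave every time: with 4 slaves this flow
# always uses the fourth one (index 3).
print(layer2_slave_index(src, dst, 4))  # 3
# With one slave removed, the same flow hashes onto a different slave.
print(layer2_slave_index(src, dst, 3))  # 0
```

Because the hash depends only on addresses and the packet type, a flow mapped to the degraded port keeps failing until the slave count changes, which matches the recovery seen after PORT4 was disabled.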

END