Reduced speed on some storage links causes virtual machine creation to fail intermittently

Version: R02C00SPC100
Symptom: 30 virtual machines were created through the ITA on site. Only 20 were created successfully; the other 10 failed.

1. Analyze the VDesktop log “vdesktop.log”, searching by the ID of a virtual machine that was not created successfully. The failure appears at the moment galax deletes the virtual machine, so the problem is likely on the galax side.
2. Analyze the CNA log “nc.log”:
Search for “doRunInstance”. At 07:43 the creation of the virtual machine begins, and the log reads as follows:
[Wed Oct 26 07:43:39 2011][031239][031291][EUCAINFO  ] doRunInstance() invoked (id=i-53290A61 cores=2 disk=30 memory=2048
Then search for “enter startup_thread”. The field [025382] in the log is the ID of the virtual machine:
[Wed Oct 26 07:43:39 2011][031239][025382][EUCAINFO  ] enter startup_thread.
Then continue to search for this ID [025382] in the rest of the log:
[Wed Oct 26 09:43:48 2011][031239][025382][EUCAWARN  ] system(vbsdd if=/usr/mnt/imagei-53290A61/root of=/dev/mapper/i-53290A61-root bs=1M copyspeed=30M conv=fdatasync oflag=direct>/dev/null 2>> /opt/eucalyptus/var/log/eucalyptus/nc.log) with 9
[Wed Oct 26 09:43:48 2011][031239][025382][EUCAERROR ] dd if=/usr/mnt/imagei-53290A61/root of=dev/mapper/i-53290A61-root failed.
[Wed Oct 26 09:43:48 2011][031239][025382][EUCADEBUG ] startup_image, end.
[Wed Oct 26 09:43:48 2011][031239][025382][EUCAFATAL ] Failed to prepare images for instance i-53290A61 (error=11)
[Wed Oct 26 09:43:48 2011][031239][025382][EUCADEBUG ] vm isolation id is i-53290A61.
[Wed Oct 26 09:43:48 2011][031239][025382][EUCADEBUG ] i-53290A61 isolation tag is 0.
[Wed Oct 26 09:43:48 2011][031239][025382][EUCAFATAL ] changeSelfIsolationCounterWhenFail: HVM vbsdd failed!
[Wed Oct 26 09:43:48 2011][031239][025382][EUCAINFO  ] changeStateForRebuild():vm i-53290A61 runing faild ,change state Booting to teardown!
3. The log above shows that at 09:43, two hours after creation of the virtual machine began, the dd copy of the image to the IP SAN storage had still not completed. The operation reached the 2-hour limit, and galax then deleted the virtual machine.
4. On the CNA, execute the command “” to query the name of the local device to which the IP SAN LUN is mapped. In this case it is “sdc”.
5. Run “dd if=/dev/sdc of=/dev/null bs=1M count=1000” to test the speed of reading 1 GB of data from the LUN to the local host. Over multiple runs the speed was not stable, and the slowest run reached only about 4 MB/s.
The virtual machine disk on site is 30 GB. At only 4 MB/s, copying the image to the storage LUN with dd takes more than 120 minutes, which exceeds the 2-hour limit.
6. Check the storage network on site. Two network interfaces were found to have negotiated a link speed of 100 Mbit/s.
7. Replace the network cables of these two interfaces. The link speed is then negotiated at 1 Gbit/s. Re-test the dd copy from the storage LUN to the local disk: the speed is now stable at about 30 MB/s.
8. Create virtual machines again. No abnormality occurs, so the problem is resolved.
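
The estimate in step 5 can be checked with a short calculation. The sketch below is only an illustration: it assumes that “30G” means 30 GiB and that “4M” and “30M” mean MiB per second, and the function name copy_minutes is made up for this example.

```python
# Rough estimate of how long the image copy takes at a given throughput.
# Assumption: "30G" means 30 GiB and "4M"/"30M" mean MiB per second.

LIMIT_MINUTES = 120  # the 2-hour limit described in step 3


def copy_minutes(size_gib: float, speed_mib_s: float) -> float:
    """Return the time in minutes needed to copy size_gib at speed_mib_s."""
    return size_gib * 1024 / speed_mib_s / 60


for speed in (4, 30):
    minutes = copy_minutes(30, speed)
    verdict = "exceeds" if minutes > LIMIT_MINUTES else "is within"
    print(f"{speed} MiB/s: {minutes:.0f} min, {verdict} the {LIMIT_MINUTES}-minute limit")
```

Under these assumptions the copy needs about 128 minutes at 4 MiB/s and about 17 minutes at 30 MiB/s, which matches the behaviour seen before and after the cables were replaced.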

1. A VDesktop problem causes the creation to fail;
2. The CNA fails to download files from the ESC;
3. The CNA fails to copy the image to the storage device with dd.
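
To check for the third cause, nc.log can be scanned for instances whose image preparation failed, together with the time that passed since “enter startup_thread”. The following is a minimal sketch, not a supported tool. The line layout is inferred only from the log excerpt in this case, and the names LINE_RE, TS_FORMAT, failed_copies and SAMPLE are hypothetical.

```python
import re
from datetime import datetime

# Layout inferred from the excerpt: [timestamp][number][ID][LEVEL] message
# The second numeric field is the ID that the steps above search for.
LINE_RE = re.compile(
    r"\[(?P<ts>[^\]]+)\]\[(?P<f1>\d+)\]\[(?P<tag>\d+)\]\[(?P<level>\w+)\s*\] (?P<msg>.*)"
)
TS_FORMAT = "%a %b %d %H:%M:%S %Y"  # e.g. "Wed Oct 26 07:43:39 2011"


def failed_copies(lines):
    """Yield (tag, instance_id, elapsed) for each failed image preparation."""
    started = {}  # tag -> time of "enter startup_thread"
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip lines that do not follow the assumed layout
        ts = datetime.strptime(m["ts"], TS_FORMAT)
        msg, tag = m["msg"], m["tag"]
        if "enter startup_thread" in msg:
            started[tag] = ts
        elif "Failed to prepare images for instance" in msg and tag in started:
            instance = msg.split("instance ")[1].split()[0]
            yield tag, instance, ts - started[tag]


# Two lines copied from the log excerpt in this case.
SAMPLE = [
    "[Wed Oct 26 07:43:39 2011][031239][025382][EUCAINFO  ] enter startup_thread.",
    "[Wed Oct 26 09:43:48 2011][031239][025382][EUCAFATAL ] Failed to prepare images for instance i-53290A61 (error=11)",
]

for tag, instance, elapsed in failed_copies(SAMPLE):
    print(f"{instance} (ID {tag}) failed {elapsed} after startup")
```

An elapsed time of about two hours, as in this case, suggests that the copy hit the time limit because of slow storage and did not fail immediately.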