A full pgsql database partition on the CRM prevents virtual machine creation

Publication Date: 2012-11-20
Issue Description
A desktop cloud site runs version R2C00SPC100. The maintenance staff report that they cannot create virtual machines in one cluster through either ITA or BMS, although the same image file creates virtual machines normally in the other clusters.

Alarm Information
The OMS Portal shows the emergency alarm “000015 the utilization rate of the server hard disk is over the threshold”. The alarm details show that the utilization of “/dev/drbd1” on the master CRM exceeds 80%.
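
The threshold comparison behind this alarm can be sketched as follows. This is a minimal illustration, not the OMS implementation: the function names are hypothetical, and only the 80% threshold comes from the alarm text.

```python
import shutil


def usage_percent(path):
    """Return the used share, in percent, of the filesystem that contains `path`.

    The value may differ slightly from the Use% column of "df -h", which
    excludes blocks reserved for root from its calculation.
    """
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0
    return 100.0 * usage.used / usage.total


def over_threshold(percent, threshold=80.0):
    """True when utilization exceeds the alarm threshold (80% in this case)."""
    return percent > threshold
```

On the master CRM, calling over_threshold(usage_percent("/var/lib/pgsql")) would be expected to return True in the situation described here.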

Handling Process
1. Run “df -h” on the master CRM to check disk utilization. The utilization of “/var/lib/pgsql” has reached 100%.
2. Run “du -h /var/lib/pgsql”. The “/var/lib/pgsql/backups” directory is the largest one. It contains a large number of backup files covering the whole period from the start of operation to the present, whereas under normal conditions this directory keeps only the backup files from the most recent 7 days. This indicates that the backup process is malfunctioning and that outdated backup files are not being deleted automatically.
3. R&D confirmed that version R2C00SPC100 may fail to delete outdated backup files and that the problem is fixed in version R2C00SPC200.
4. Because the site cannot be upgraded at this time, the workaround is to delete the outdated backup files manually to free some space, and then create the virtual machine on the ITA.
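
The manual cleanup in step 4 can be sketched as a small script. It is an illustration only, under these assumptions: the backups are regular files located directly in the backups directory, their modification time reflects the backup date, and the 7-day retention period mentioned in step 2 applies. The function names are hypothetical and are not part of the product.

```python
import os
import time

SECONDS_PER_DAY = 86400


def find_stale_files(directory, max_age_days=7, now=None):
    """List regular files in `directory` last modified more than `max_age_days` ago."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    stale = []
    for entry in os.scandir(directory):
        # Skip subdirectories and symlinks; compare only real files by mtime.
        if entry.is_file(follow_symlinks=False):
            if entry.stat(follow_symlinks=False).st_mtime < cutoff:
                stale.append(entry.path)
    return sorted(stale)


def delete_files(paths, dry_run=True):
    """Delete the given files; with dry_run=True only print what would be removed."""
    for path in paths:
        if dry_run:
            print("would delete:", path)
        else:
            os.remove(path)
```

Running delete_files(find_stale_files("/var/lib/pgsql/backups")) first with the default dry_run=True lets the list be reviewed before anything is removed.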

Root Cause
1. On the CRM, “/dev/drbd1” is mounted at “/var/lib/pgsql” and stores the CRM's pgsql database. When this space is full, no records can be written to the CRM's database.
2. The ITA log “vdesktop.log” shows that Galax reported the virtual machine status to the ITA as “terminated”, whereas under normal conditions it should be “pending”. The failure to create the virtual machine therefore originates in GALAX. An excerpt from “vdesktop.log”:
QueryCreateInstancesTask.java  219  getInstanceState():instanceID=i-343B066F, status=terminated
3. The “cloud-debug” log on the master ESC shows that the failure was caused by a failure to create the system volume. The corresponding log entries follow, with an annotation after the dashes on each line:
{DEBUG} {ClusterSink.75} {edu.ucsb.eucalyptus.cloud.ws.VolumeSynSender 103} generate volume Id :vol-38A30685 ------ the ID assigned to the volume is vol-38A30685
{ERROR} {ClusterSink.75} {com.huawei.galax.storage.vbs.client.BRMClient 153} BRM createVolume failed! --AxisFault ------ the BRM returned a failure for the create-volume request
4. The “vbs-message” log on the master CRM shows that the volume data could not be inserted into the BRM's vbs database. The corresponding log entries are as follows:
catch an Exception when add data in db.org.hibernate.exception.GenericJDBCException: could not insert: [com.huawei.galax.storage.vbs.brm.db.entities.VolumeInfo]
<163> 3 2012-04-17T15:36:33.614497+00:00   org.hibernate.exception.GenericJDBCException: could not insert: [com.huawei.galax.storage.vbs.brm.db.entities.VolumeInfo]
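
The log analysis above amounts to searching three logs for three error signatures. A minimal sketch of that search follows. The patterns are taken from the log excerpts quoted in this case; the labels and function names are hypothetical, and the real logs may contain other relevant messages.

```python
import re

# Signatures quoted from the log excerpts in this case, keyed by the log they came from.
SIGNATURES = {
    "ITA vdesktop.log": re.compile(r"status=terminated"),
    "ESC cloud-debug": re.compile(r"BRM createVolume failed"),
    "CRM vbs-message": re.compile(r"GenericJDBCException: could not insert"),
}


def classify(line):
    """Return the labels of every signature found in `line`."""
    return [label for label, pattern in SIGNATURES.items() if pattern.search(line)]


def scan(lines):
    """Yield (line_number, label, line) for each line that matches a signature."""
    for number, line in enumerate(lines, start=1):
        for label in classify(line):
            yield number, label, line.rstrip("\n")
```

Passing an open log file to scan() lists the matching lines with their line numbers, which helps locate the point where the volume creation failed.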

Suggestions
The OMS Portal raised the corresponding alarm well before the space became full. We suggest that maintenance staff handle such alarms promptly when they appear.

