
HUAWEI CLOUD Stack 6.5.0 Alarm and Event Reference 04

Using FusionCompute for Virtualization (Cascaded OpenStack)

Overview

When the FusionSphere OpenStack cloud platform is used, unexpected system failures (such as host reboots or process restarts) or backup restoration can leave residual resources or make resources unavailable, causing service failures. In such cases, a resource pool consistency audit is performed to ensure data consistency in the resource pool so that services run properly.

Scenarios

A system audit is required for the OpenStack-based FusionSphere system when data inconsistency occurs in the following scenarios:

  • When a service-related operation is performed, a system exception occurs. For example, when you create a VM, a host process restarts, causing the operation to fail. In this case, residual data may reside in the system or resources may become unavailable.
  • If any service-related operation is performed after a system database is backed up and before the database is restored, residual data may reside in the system or resources may become unavailable after the database is restored using the backup.

The system audit is used to help administrators detect and handle data inconsistency.

Therefore, conduct a system audit when:

  • An alarm is generated indicating that data inconsistency verification fails.
  • The system database is restored using a data backup.
  • Routine system maintenance is performed.
NOTE:
  • You are advised to conduct a system audit when the system is running stably. Do not use audit results obtained while a large number of service-related operations are in progress.
  • During the audit process, if service-related operations (for example, creating a VM or expanding the system capacity) are performed or any system exception occurs, the audit result may be distorted. In this case, conduct the system audit again after the system recovers. In addition, confirm the detected problems again based on the audit result processing procedure.

Audit Mechanism

The following illustrates how a system audit works:

  • The system obtains service data from databases, hosts, and storage devices, compares the data, and generates an audit report.
  • The audit guide and Command Line Interface (CLI) commands are provided for users to locate and handle the data inconsistency problems listed in the audit report.

You can conduct a system audit in either automatic or manual mode:

  • Automatic: The system automatically starts an audit at 04:00 every day and, if any data inconsistency is detected, reports an alarm and generates an audit report. If the alarm has already been generated, the system does not generate a second one. If no data inconsistency is detected but a data inconsistency alarm exists, the system automatically clears the alarm.
  • Manual: Log in to FusionSphere OpenStack and run the required command to start an audit.

Audit Process

If any audit alarm is generated, conduct an audit based on the process shown in Figure 18-2.

Figure 18-2 Audit process

Manual Audit

Scenarios

  • The system database is restored using a data backup.
  • Inconsistency problems have been handled, and a manual audit is performed to verify that the problems are rectified.

Prerequisites

Services in the system are running properly.

Procedure

  1. Log in to any controller node in the AZ in the cascaded FusionSphere OpenStack system. For details, see Using SSH to Log In to a Host.
  2. Import environment variables. For details, see Importing Environment Variables.

    Enter 1 to enable Keystone V3 authentication with the built-in DC administrator.

  3. Run the following command in security mode to perform manual audit: (For details, see Command Execution Methods.)

    infocollect audit --item ITEM --parameter PARAMETER --type TYPE

    Table 18-27 describes parameters in the command.

    If you do not specify the audit item, an audit alarm will be triggered when an audit problem is detected. However, if the audit item is specified, no audit alarm will be triggered when an audit problem is detected.

    Table 18-27 Parameter description

    item (Optional)

    Specifies a specific audit item. Values:

    • 1001: indicates that a VM is audited. The following audit reports are generated after an audit is complete:
      • orphan_vm.csv: Audit report about orphan VMs
      • invalid_vm.csv: Audit report about invalid VMs
      • host_changed_vm.csv: Audit report about VM location inconsistency
      • stucking_vm.csv: Audit report about stuck VMs
      • diff_property_vm.csv: Audit report about VM attribute inconsistency
      • diff_state_vm.csv: Audit report about VM status inconsistency
      • host_invalid_migration.csv: Audit report about abnormal hosts that adversely affect cold migrated VMs
    • 1002: indicates that an image is audited. The following audit report is generated after an audit is complete:

      stucking_images.csv: Audit report about stuck images

    • 1003: indicates that a zombie process is audited. The following audit report is generated after an audit is complete:

      zombie_process_hosts.csv: Audit report about zombie processes

    • 1004: This item is required only for the KVM virtualization platform but not for FusionCompute.
    • 1005: indicates that the records of migrated databases are audited. The following audit reports are generated after an audit is complete:
      • cold_cleaned.csv: Audit report about residual data after cold migration
      • live_cleaned.csv: This report is generated only in KVM scenarios and not generated in FusionCompute scenarios.
      • cold_stuck.csv: Audit report about stuck databases of the cold migration
    • 1102: indicates that the Neutron namespace is audited. The following audit report is generated after an audit is complete:

      redundant_namespaces.csv: Audit report about redundant Neutron namespaces

    • 1103: indicates that an orphan Neutron port is audited. The following audit report is generated after an audit is complete:

      neutron_wild_ports.csv: Audit report about orphan Neutron ports

    • 1201: indicates that the invalid volume, orphan volume, volume attachment status and stuck volume are audited. The following audit reports are generated after an audit is complete:
      • fakeVolumeAudit.csv: Audit report about invalid volumes
      • wildVolumeAudit.csv: Audit report about orphan volumes
      • VolumeAttachmentAudit.csv: Audit report about the volume attachment status
      • VolumeStatusAudit.csv: Audit report about stuck volumes
      • FrontEndQosAudit.csv: Audit report about front-end QoS
      • VolumeQosAudit.csv: Audit report about volume QoS
    • 1204: indicates that the invalid snapshot, orphan snapshot, and stuck snapshot are audited. The following audit reports are generated after an audit is complete:
      • fakeSnapshotAudit.csv: Audit report about invalid snapshots
      • wildSnapshotAudit.csv: Audit report about orphan snapshots
      • SnapshotStatusAudit.csv: Audit report about stuck snapshots
      • wildInstanceSnapshotAudit.csv: Audit report about residual orphan child snapshots
    • 1205: indicates that an orphan snapshot is audited. The following audit report is generated after an audit is complete:

      wildSnapshotAudit.csv: Audit report about orphan snapshots

    • 1206: indicates that the volume attachment status is audited. The following audit report is generated after an audit is complete:

      VolumeAttachmentAudit.csv: Audit report about the volume attachment status

    • 1207: indicates that a stuck volume is audited. The following audit report is generated after an audit is complete:

      VolumeStatusAudit.csv: Audit report about stuck volumes

    If the parameter is not specified, all the audit items are performed by default.

    parameter (Optional; can be specified only after item is specified)

    Specifies an additional parameter. Only one value can be specified, and it must match the specified item.

    • If item is set to 1001, you can set vm_stucking_timeout, the timeout threshold in seconds for VMs in an intermediate state (default: 14400). This value affects the audit report about stuck VMs. You can also set host_invalid_timeout, the heartbeat timeout threshold in seconds for abnormal hosts (default: 14400). This value affects the audit report about abnormal hosts that adversely affect cold migrated VMs.
    • If item is set to 1002, you can set image_stucking_timeout, the timeout period in seconds for transient images (default: 86400). This value affects the audit report about stuck images.
    • If item is set to 1005, you can set migration_stucking_timeout, the timeout period in seconds for cold migrations (default: 14400). This value affects the audit report about the intermediate state of the cold migration.
    • If item is set to any other value, no additional parameter is required.

    Example: --parameter vm_stucking_timeout=3600

    type (Optional)

    Specifies whether the audit is synchronous or asynchronous. If this parameter is not specified, the audit is synchronous. Values:

    • sync: specifies a synchronous audit. For details, see the following command.
    • async: specifies an asynchronous audit. For details, see Asynchronous Audit. The audit progress and audit result status of an asynchronous audit can be obtained by invoking the interface for querying the task status.

    Run the following command to detect VMs that have been in an intermediate state for 3600 seconds or longer:

    infocollect audit --item 1001 --parameter vm_stucking_timeout=3600

    Information similar to the following is displayed:

    +--------------------------------------+----------------------------------+  
    | Hostname                             | Path                             |  
    +--------------------------------------+----------------------------------+  
    | CCCC8175-8EAC-0000-1000-1DD2000011D0 | /var/log/audit/2015-04-22_020324 |  
    +--------------------------------------+----------------------------------+

    In the command output, Hostname indicates the ID of the host for which the audit report is generated, and Path indicates the directory containing the audit report.

    Log in to the host first and then view the audit reports by following Collecting Audit Reports.
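The report path in this output can be pulled out with standard text tools; a minimal sketch that parses the sample table above (the awk filter itself is illustrative):

```shell
# Extract the Path column from the infocollect output table shown above.
# Data rows start with "|"; the header row is skipped by excluding "Path".
audit_output='+--------------------------------------+----------------------------------+
| Hostname                             | Path                             |
+--------------------------------------+----------------------------------+
| CCCC8175-8EAC-0000-1000-1DD2000011D0 | /var/log/audit/2015-04-22_020324 |
+--------------------------------------+----------------------------------+'

report_path=$(printf '%s\n' "$audit_output" |
    awk -F'|' '/^\|/ && $3 !~ /Path/ {gsub(/ /, "", $3); print $3}')
echo "$report_path"    # /var/log/audit/2015-04-22_020324
```

In practice, pipe the real command output into the same filter instead of the sample string.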

Collecting Audit Reports

Scenarios

  • An alarm is generated.
  • Routine maintenance is performed.

Prerequisites

A local PC running the Windows operating system is available.

Procedure

  1. Log in to any controller host in an AZ in the cascaded FusionSphere OpenStack system.

    For details, see section Using SSH to Log In to a Host.

  2. Import environment variables to the host.

    For details, see Importing Environment Variables.

    Enter 1 to enable Keystone V3 authentication with the built-in DC administrator.

  3. Run the following command to obtain the External OM plane IP address of a host where the audit service is deployed. For details, see Command Execution Methods.

    cps template-instance-list --service collect info-collect-server

    Information similar to the following is displayed:

  4. Log in to a host with the info-collect-server role deployed using the OM IP address in the cascaded FusionSphere OpenStack system.

    For details, see section Logging In to a Host with a Role Deployed.

  5. Run the following command to query the time for the last audit conducted on the host:

    ls /var/log/audit -Ftr | grep /$ | tail -1

    Information similar to the following is displayed:

    2014-09-20_033137/
    NOTE:
    • The command output indicates the audit time. For example, 2014-09-20_033137 indicates 3:31:37 on September 20th, 2014.
    • If no result is returned, no audit report is available on the host.

  6. Run the following command to create a temporary directory for saving the audit report:

    mkdir -p /home/fsp/last_audit_result

  7. Run the following command to copy the latest audit report on the host to the temporary directory:

    cp -r /var/log/audit/`ls /var/log/audit -Ftr | grep /$ | tail -1` /home/fsp/last_audit_result

  8. Run the following command to modify the temporary directory and file permission:

    chmod 777 /home/fsp/last_audit_result/ -R

  9. Use WinSCP or other tools to copy the last_audit_result folder in the /home/fsp/last_audit_result directory to the local PC.

    The default username is fsp, and the default password is Huawei@CLOUD8.

  10. Run the following command to delete the temporary folder from the host:

    rm -r /home/fsp/last_audit_result
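Steps 5 to 8 above can be rehearsed end to end. The sketch below reproduces the same commands against a mock audit directory under /tmp (the timestamped directory names are illustrative):

```shell
# Mock audit directory with two timestamped report folders (names illustrative).
audit_dir=$(mktemp -d)
mkdir -p "$audit_dir/2014-09-19_010101" "$audit_dir/2014-09-20_033137"
touch "$audit_dir/2014-09-20_033137/orphan_vm.csv"
# Pin directory mtimes so the "latest" ordering is deterministic in this sketch.
touch -t 202001010000 "$audit_dir/2014-09-19_010101"
touch -t 202001020000 "$audit_dir/2014-09-20_033137"

# Step 5: -F appends "/" to directories, -tr sorts oldest-first by mtime,
# so filtering on the trailing "/" and taking the last entry yields the
# most recent audit run.
latest=$(ls "$audit_dir" -Ftr | grep /$ | tail -1)
echo "$latest"    # 2014-09-20_033137/

# Steps 6-8: copy the newest report set into a temp folder and open up
# its permissions for transfer.
dest=$(mktemp -d)
cp -r "$audit_dir/$latest" "$dest"
chmod 777 "$dest" -R
ls "$dest/${latest%/}"    # orphan_vm.csv
```

The same `ls -Ftr | grep /$ | tail -1` pattern is what step 7 embeds in backquotes inside the cp command.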

Analyzing Audit Results

  • If multiple faults are displayed, handle them one by one based on step 2.
  • If the audit alarm is generated in a cascaded FusionSphere OpenStack system, the alarm must be handled regardless of whether it is also generated in the cascading system. After the alarm is cleared, audit the cascading system (manually or using the automatic routine audit by the cascading system) to ensure data consistency between the cascading and cascaded systems.

Scenarios

Analyze the audit results in the following scenarios:

  • Audit-related alarms, such as volume, VM, snapshot, and image audit alarms, are received. Log in to the system, obtain the audit reports, and rectify the faults accordingly.
  • The backup and restoration feature has been used. Log in to the system, perform a consistency audit, obtain the audit reports, and rectify the faults accordingly.
  • Routine system maintenance is performed. Log in to the system, perform an audit, obtain the audit reports, and rectify the faults accordingly.

Prerequisites

You have obtained the audit report. For details, see Collecting Audit Reports.

Procedure

  1. For an audit-related alarm, select the target audit report based on the information available in Details in Additional Info for further analysis.
  2. Check the audit report name.

    • VM Audit
      • If the report name is orphan_vm.csv, handle the fault according to section Orphan VMs.

        Otherwise, residual resources may reside.

      • If the report name is invalid_vm.csv, handle the fault according to section Invalid VMs.

        Otherwise, unavailable VMs may be visible to users.

      • If the report name is host_changed_vm.csv, handle the fault according to section VM Location Inconsistency.

        Otherwise, VMs may become unavailable.

      • If the report name is stucking_vm.csv, handle the fault according to section Stuck VMs.

        Otherwise, VMs may become unavailable.

      • If the report name is diff_state_vm.csv, handle the fault according to section VM Status Inconsistency.

        Otherwise, tenant operation permissions on VMs may be restricted.

      • If the report name is diff_property_vm.csv, handle the fault according to section VM Attribute Inconsistency.

        Otherwise, the system data may become inconsistent.

      • If the report cold_stuck.csv is not empty, handle the fault according to section Intermediate Status of the Cold Migration.

        Otherwise, the affected VMs may fail to be maintained.

      • If the report host_invalid_migration.csv is not empty, handle the fault according to section Abnormal Hosts That Adversely Affect Cold Migrated VMs.

        Otherwise, the affected VMs may fail to be maintained.

    • Volume Audit
      • If the report name is wildVolumeAudit.csv, handle the fault according to section Orphan Volumes.

        Otherwise, volumes may be unavailable in the Cinder service but still occupy storage space.

      • If the report name is fakeVolumeAudit.csv, handle the fault according to section Invalid Volumes.

        Otherwise, unavailable volumes may be visible to users.

      • If the report name is VolumeStatusAudit.csv, handle the fault according to section Stuck Volumes.

        Otherwise, volumes may become unavailable.

      • If the report name is VolumeAttachmentAudit.csv, handle the fault according to section Handling Inconsistent Volume Attachment Information.

        Otherwise, residual resources may reside.

      • If the report name is FrontEndQosAudit.csv and the report is not empty, rectify the fault based on Frontend Qos. Otherwise, residual resources may exist.
      • If the report name is VolumeQosAudit.csv and the report is not empty, rectify the fault based on Volume Qos. Otherwise, unavailable volumes may be visible to users.
    • Snapshot Audit
      • If the report name is wildSnapshotAudit.csv, handle the fault according to section Orphan Volume Snapshots.

        Otherwise, residual resources may reside.

      • If the report name is fakeSnapshotAudit.csv, handle the fault according to section Invalid Volume Snapshots.

        Otherwise, unavailable volume snapshots may be visible to users.

      • If the report name is wildInstanceSnapshotAudit.csv, rectify the fault based on Residual Orphan Child Snapshots. Otherwise, residual volume snapshot resources exist, occupying system resources.
    • Image Audit
      • If report stucking_images.csv is not empty, handle the fault according to section Stuck Images.

        Otherwise, maintenance operations cannot be performed on required VMs.

    • Other Audit
      • If the report name is zombie_process_hosts.csv, zombie processes were generated in the nova-novncproxy service and have been automatically processed. For details, see section Nova novncproxy Zombie Process.
      • If the report cold_cleaned.csv is not empty, residual cold migration records resided in the environment and have been automatically processed. For details, see section Residual Cold Migration Data.
      • If the report live_cleaned.csv is not empty, handle the fault according to section Detecting and Deleting Residual Live Migration Data.

        Otherwise, maintenance operations cannot be performed on required VMs.
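The per-report checks in step 2 can be partially automated: the sketch below walks a collected report directory and flags every CSV that contains data rows. The report file names come from the tables above; the directory and its contents here are illustrative stand-ins.

```shell
# Flag every collected report that contains data rows and therefore needs
# handling per the sections above. A header-only (or empty) CSV is clean.
report_dir=$(mktemp -d)    # stand-in for the collected last_audit_result folder
printf 'uuid,host_id\n' > "$report_dir/orphan_vm.csv"                           # header only
printf 'uuid,host_id\nabc,fc-nova-compute001\n' > "$report_dir/invalid_vm.csv"  # one finding

flagged=""
for f in "$report_dir"/*.csv; do
    # More than one line means a header plus at least one audit finding.
    if [ "$(wc -l < "$f")" -gt 1 ]; then
        flagged="$flagged$(basename "$f") "
        echo "needs handling: $(basename "$f")"
    fi
done
```

Running this over a real collected folder narrows the manual analysis to the non-empty reports.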

Handling Audit Results

  • In this chapter, import environment variables before performing the nova (in nova xxx format), cinder (in cinder xxx format), neutron (in neutron xxx format), glance (in glance xxx format), cps (in cps xxx format), or cpssafe command. For details, see Importing Environment Variables.
  • The commands in OpenStack can be performed in either secure mode or insecure mode. For details, see section Command Execution Methods.

Orphan VMs

Context

A VM is orphaned in the following scenario: the VM is present on a host but does not exist in the system database or is in the deleted status in the database.

If an orphan VM was not created by a tenant, it is recommended that the VM be deleted to release computing and network resources.

Parameter Description

The name of the audit report for an orphan VM is orphan_vm.csv. Table 18-28 describes parameters in the report.

Table 18-28 Parameter description

uuid: Specifies the VM universally unique identifier (UUID).

hyper_vm_name: Specifies the VM Uniform Resource Name (URN) registered in FusionCompute.

host_id: Specifies the ID of the host accommodating the VM. If the compute node is deployed in active/standby mode, the value of host_id is the logical ID of the compute node.

Possible Causes

  • A database is backed up for future restoration. However, after the backup is created, one or more VMs are created. After the database is restored, the records of these VMs are absent from the database, but the VMs remain on their hosts and become orphan VMs.
  • A VM is created using FusionCompute.
  • A VM is deleted when the fc-nova-compute component is in the fault status.

Impact on the System

  • VMs orphaned by database restoration are invisible to tenants.
  • System resources are leaked.

Procedure

  1. Log in to any controller host in an AZ in the cascaded FusionSphere OpenStack system.

    For details, see section Using SSH to Log In to a Host.

  2. Import environment variables. For details, see Importing Environment Variables.
  3. Log in to a host with the info-collect-server role deployed using the OM IP address in the cascaded FusionSphere OpenStack system.

    For details, see section Logging In to a Host with a Role Deployed.

    NOTE:
    • Run the following command to obtain the ID of the host where the info-collect-server role is deployed:

      cps template-instance-list --service collect info-collect-server

    • If the host you have logged in to is the one for which the OM IP address is to be obtained, go to 4.

  4. Run the following script to check whether the VM is an orphan VM:

    python /usr/local/bin/info-collect-server/server/audit_script.py cascaded_vm_wild_confirm fc-nova-computeXXX uuid

    NOTE:
    • The value of fc-nova-computeXXX is that of host_id obtained from the audit report.
    • The value of uuid is that in the audit report.
    • If the command output displays "This VM is wild.", the VM is an orphan VM. If the command output displays "This VM is not wild.", the VM is not an orphan VM. If the command output displays "ERROR", contact technical support for assistance.
    • If yes, go to 5.
    • If no, the VM is not an orphan VM. No further action is required.

  5. Log in to the FusionCompute web client, query the VM information using UUID based on section Querying Information About a VM in FusionCompute Using UUID and check whether the VM is stopped.

    • If yes, go to 7.
    • If no, go to 6.

  6. On the VM details page, click Operation, and choose Stop.
  7. Click Hardware and select Disks.

    The VM disk list page is displayed.

  8. In the VM disk list, click More and select Detach to detach all disks from the VM.
  9. Confirm with the tenant whether to delete the invalid VM.

    • If yes, go to 10.
    • If no, contact technical support for assistance.

  10. On the VM details page, click Operation and choose Delete.
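The decision in step 4 hinges on the exact confirmation string printed by the script. A small helper that maps the three documented outputs to the next move (the strings come from the NOTE in step 4; the helper itself is illustrative):

```shell
# Map cascaded_vm_wild_confirm output strings to the next procedure step.
# Output strings per the NOTE above; this mapping function is illustrative.
next_action() {
    case "$1" in
        "This VM is wild.")     echo "orphan VM: continue with steps 5-10" ;;
        "This VM is not wild.") echo "not an orphan VM: no further action" ;;
        *ERROR*)                echo "contact technical support" ;;
        *)                      echo "unexpected output: contact technical support" ;;
    esac
}

next_action "This VM is wild."        # orphan VM: continue with steps 5-10
next_action "This VM is not wild."    # not an orphan VM: no further action
```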

Invalid VMs

Context

An invalid VM is one that is recorded as normal in the system database but is not present in FusionCompute.

For an invalid VM, confirm with the tenant whether the VM is useful. If the VM is not useful, delete the VM records from the database.

Parameter Description

The name of the audit report is invalid_vm.csv. Table 18-29 describes parameters in the report.

Table 18-29 Parameter description

uuid: Specifies the VM UUID.

tenant_id: Specifies the tenant ID.

hyper_vm_name: Specifies the VM name on the host, for example, instance_xxx.

updated_at: Specifies the last time the VM status was updated.

status: Specifies the current VM status.

task_status: Specifies the current VM task status.

host_id: Specifies the ID of the host accommodating the VM. If the compute node is deployed in active/standby mode, the value of host_id is the logical ID of the compute node.

Impact on the System

Users can query the VM using the Nova APIs, but the VM does not exist on the host.

Possible Causes

  • A database is backed up for future restoration. However, after the backup is created, one or more VMs are deleted. When the database is restored using the backup, the records of these VMs reappear in the restored database even though the VMs have been deleted.
  • The VM creation fails due to FusionCompute exceptions or network faults during the VM creation or rebuilding process, resulting in residual VM data in the database.

Procedure

  1. Log in to any controller host in an AZ in the cascaded FusionSphere OpenStack system.

    For details, see section Using SSH to Log In to a Host.

  2. Import environment variables. For details, see Importing Environment Variables.
  3. Log in to a host with the info-collect-server role deployed using the OM IP address in the cascaded FusionSphere OpenStack system.

    For details, see section Logging In to a Host with a Role Deployed.
    NOTE:
    • Run the following command to obtain the ID of the host where the info-collect-server role is deployed:

      cps template-instance-list --service collect info-collect-server

    • If the host you have logged in to is the one for which the OM IP address is to be obtained, go to 4.

  4. Run the following script to check whether the VM is an invalid VM:

    python /usr/local/bin/info-collect-server/server/audit_script.py cascaded_vm_fake_confirm fc-nova-computeXXX uuid

    NOTE:
    • The value of fc-nova-computeXXX is that of host_id obtained from the audit report.
    • The value of uuid is that in the audit report.
    • If the command output displays "This VM is fake.", the VM is an invalid VM. If the command output displays "This VM is not fake.", the VM is not an invalid VM. If the command output displays "ERROR", contact technical support for assistance.
    • If yes, go to 5.
    • If no, the VM is not an invalid VM and the fault is falsely reported due to time differences. In this case, no further action is required.

  5. Run the following command to check whether the last operation performed on the VM was a rebuild:

    nova instance-action-list uuid

    NOTE:

    The value of uuid is that in the audit report.

    Check whether the value of Action in the last row of the command output is rebuild.

    • If yes, go to 6.
    • If no, go to 7.

  6. Confirm with the tenant whether to rebuild the VM.

    • If yes, rebuild the VM by following operations provided in section Rebuilding a VM. In this case, no further action is required.
    • If no, go to 7.

  7. Confirm with the tenant whether to delete the invalid VM.

    • If yes, go to 8.
    • If no, contact technical support for assistance.

  8. Run the following script to delete the VM:

    python /usr/local/bin/info-collect-server/server/audit_script.py cascaded_vm_fake_clean fc-nova-computeXXX uuid

    NOTE:
    • The value of fc-nova-computeXXX is that of host_id obtained from the audit report.
    • The value of uuid is that in the audit report.
    • If the command output displays "SUCCESS: Clean vm information succeed.", the VM is successfully deleted.

    Check whether the VM is successfully deleted.

    • If yes, no further action is required.
    • If no, contact technical support for assistance.
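The rebuild check in step 5 can be scripted against the `nova instance-action-list` table output. A sketch using a sample table (the column layout is assumed from standard novaclient output; the request IDs and timestamps are placeholders):

```shell
# Read the Action column of the last data row from a sample
# `nova instance-action-list <uuid>` table. Column layout assumed from
# standard novaclient output; request IDs and timestamps are placeholders.
actions='+---------+------------+---------+----------------------------+
| Action  | Request_ID | Message | Start_Time                 |
+---------+------------+---------+----------------------------+
| create  | req-aaaa   | -       | 2015-04-20T02:01:00.000000 |
| rebuild | req-bbbb   | -       | 2015-04-22T09:30:00.000000 |
+---------+------------+---------+----------------------------+'

last_action=$(printf '%s\n' "$actions" |
    awk -F'|' '/^\|/ && $2 !~ /Action/ {gsub(/ /, "", $2); a = $2} END {print a}')
echo "$last_action"    # rebuild -> go to step 6; otherwise go to step 7
```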

VM Location Inconsistency

Context

The host and hypervisor accommodating a VM recorded in the database are inconsistent with the actual host and hypervisor.

If the fault is confirmed, correct the actual VM location information (host ID) in the database.

Parameter Description

The name of the audit report is host_changed_vm.csv. Table 18-30 describes parameters in the report.

Table 18-30 Parameter description

uuid: Specifies the VM UUID.

tenant_id: Specifies the tenant ID.

hyper_vm_name: Specifies the VM name registered in the hypervisor.

updated_at: Specifies the last time the VM status was updated.

status: Specifies the VM status.

task_status: Specifies the VM task status.

host_id: Specifies the ID of the host accommodating the VM as recorded in the database. If the compute node is deployed in active/standby mode, the value of host_id is the logical ID of the compute node.

hyper_host_id: Specifies the ID of the actual host accommodating the VM. If the compute node is deployed in active/standby mode, the value of hyper_host_id is the logical ID of the compute node.

hypervisor_hostname: Specifies the hypervisor name of the VM recorded in the database.

hyper_hypervisor_hostname: Specifies the name of the hypervisor in which the VM is actually running.

Possible Causes

  • A database is backed up for future restoration. However, after the backup is created, the VM specifications are adjusted or one or more VMs are cold migrated. After the database is restored, the location records of these VMs in the database are inconsistent with the actual VM locations.
  • Users manually migrate a VM to another cluster in FusionCompute, resulting in inconsistent VM location records.

Impact on the System

The VM becomes unavailable if the VM location recorded in the database is inconsistent with the actual host accommodating the VM.

Procedure

  1. Log in to any controller host in an AZ in the cascaded FusionSphere OpenStack system.

    For details, see section Using SSH to Log In to a Host.

  2. Import environment variables. For details, see Importing Environment Variables.
  3. Log in to a host with the info-collect-server role deployed using the OM IP address in the cascaded FusionSphere OpenStack system.

    For details, see section Logging In to a Host with a Role Deployed.

    NOTE:
    • Run the following command to obtain the ID of the host where the info-collect-server role is deployed:

      cps template-instance-list --service collect info-collect-server

    • If the host you have logged in to is the one for which the OM IP address is to be obtained, go to 4.

  4. Run the following script to check whether the VM is in the clusters connected to fc-nova-computeXXX:

    python /usr/local/bin/info-collect-server/server/audit_script.py cascaded_vm_host_changed_confirm fc-nova-computeXXX uuid

    NOTE:
    • The value of uuid is that in the audit report.
    • The value of fc-nova-computeXXX is that of hyper_host_id obtained from the audit report.
    • If the command output displays "The VM is in cluster of fc-nova-computeXXX.", the VM is in a cluster connected to fc-nova-computeXXX.
    • If yes, go to 5.
    • If no, contact technical support for assistance.

  5. Log in to the host accommodating the active GaussDB or gaussdb_nova service based on section Logging In to the Active GaussDB Node and run the following command to modify the information about the host accommodating the VM recorded in the database:

    sh /usr/bin/info-collect-script/audit_resume/host_changed_handle_without_hyper_name.sh uuid hyper_host_id

    NOTE:
    • The password of the gaussdba account is required during the command execution process. The default password of user gaussdba is FusionSphere123.
    • The value of uuid is that in the audit report.
    • The value of hyper_host_id is that in the audit report.

    Check whether the command is successfully executed based on the command output.

    • If yes, go to 6.
    • If no, contact technical support for assistance.

  6. Run the following command to modify the name of the hypervisor accommodating the VM recorded in the database:

    sh /usr/bin/info-collect-script/audit_resume/host_changed_handle_hypervisor_name.sh uuid hyper_hypervisor_hostname

    NOTE:
    • The password of the gaussdba account is required during the command execution process. The default password of user gaussdba is FusionSphere123.
    • The value of uuid is that in the audit report.
    • The value of hyper_hypervisor_hostname is that in the audit report.

    Check whether the command is successfully executed based on the command output.

    • If yes, no further action is required.
    • If no, contact technical support for assistance.

Stuck VMs

Context

A stuck VM is a VM that has remained in a transition status for more than 24 hours and cannot automatically recover because of a system exception (for example, a FusionCompute exception) that occurred during a VM service operation (for example, starting a VM).

Manually restore the VM based on the VM status and the task status.

Parameter Description

The name of the audit report is stucking_vm.csv. Table 18-31 describes parameters in the report.

Table 18-31 Parameter description

Parameter

Description

uuid

Specifies the VM UUID.

tenant_id

Specifies the tenant ID.

hyper_vm_name

Specifies the VM name registered in the hypervisor.

updated_at

Specifies the last time when the VM status was updated.

status

Specifies the VM status.

task_status

Specifies the VM task status.

host_id

Specifies the ID of the host accommodating the VM.

If the compute node is deployed in active/standby mode, the value of host_id is the logical ID of the compute node.
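Because the audit report is a plain CSV file, its rows can be inspected programmatically before manual recovery. A minimal sketch (the column order follows Table 18-31; the sample values below are invented for illustration):

```python
import csv
import io

# Illustrative stucking_vm.csv content; columns follow Table 18-31,
# but the UUID, tenant, and host values are made up for this example.
SAMPLE_REPORT = """uuid,tenant_id,hyper_vm_name,updated_at,status,task_status,host_id
11111111-2222-3333-4444-555555555555,tenant-demo,vm-demo,2019-03-01 02:00:00,building,scheduling,fc-nova-compute001
"""

def load_stuck_vms(report_text):
    """Parse a stucking_vm.csv audit report into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(report_text)))

for row in load_stuck_vms(SAMPLE_REPORT):
    print(row["uuid"], row["status"], row["task_status"], row["host_id"])
```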

Possible Causes

A system exception occurs when a VM service operation is in process.

Impact on the System

The VM becomes unavailable and occupies system resources.

Procedure

Restore the VM based on the VM statuses and task statuses listed in Table 18-32. For other situations, contact technical support for assistance.

Table 18-32 VM restoration methods

VM Status

Task Status

Possible Scenario

Restoration Method

building

scheduling

Creating a VM

See Method 1.

building

None

Creating a VM

See Method 1.

building

block_device_mapping

Creating a VM

See Method 1.

building

networking

Creating a VM

See Method 1.

N/A

resize_prep

Modifying VM attributes

See Method 2.

N/A

rebooting

Restarting a VM

See Method 2.

N/A

rebooting_hard

Restarting a VM

See Method 2.

N/A

pausing

Pausing a VM

See Method 2.

N/A

unpausing

Unpausing a VM

See Method 2.

N/A

suspending

Suspending a VM

See Method 2.

N/A

resuming

Resuming a VM

See Method 2.

N/A

powering_off

Stopping a VM

See Method 2.

N/A

powering_on

Starting a VM

See Method 2.

N/A

migrating

Live migrating a VM

See Method 2.

N/A

deleting

Deleting a VM

See Method 2.

N/A

resize_prep

Modifying VM attributes

See Method 3.
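When triaging many stuck VMs at once, the table above can be expressed as a small lookup. This is only a sketch of Table 18-32; for any combination it does not cover, contact technical support as stated above.

```python
# Task statuses that Table 18-32 maps to Method 1 (VM status "building").
BUILDING_TASKS = {"scheduling", "None", "block_device_mapping", "networking"}

# Task statuses that Table 18-32 maps to Method 2 regardless of VM status ("N/A").
METHOD_2_TASKS = {
    "rebooting", "rebooting_hard", "pausing", "unpausing", "suspending",
    "resuming", "powering_off", "powering_on", "migrating", "deleting",
}

def restoration_method(status, task_status):
    """Return the restoration method number from Table 18-32, or None if unlisted."""
    if status == "building" and task_status in BUILDING_TASKS:
        return 1
    if task_status in METHOD_2_TASKS:
        return 2
    if task_status == "resize_prep":
        # Table 18-32 lists resize_prep under both Method 2 and Method 3;
        # confirm which one applies to your case before acting.
        return 2
    return None

print(restoration_method("building", "networking"))
```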

Method 1

  1. Confirm with the tenant whether to delete the stuck VM.

    • If yes, go to 2.
    • If no, contact technical support for assistance.

  2. Set the VM status in the FusionSphere OpenStack system to error.

    For details, see section Setting the VM Status.

  3. Run the following command to delete the VM:

    nova delete uuid

    NOTE:

    The value of uuid is that in the audit report.
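The two steps above (set the VM status to error, then delete it) end in a single nova CLI call. The sketch below only wraps that call; the dry_run guard is an addition for this example, and the UUID shown is invented.

```python
import subprocess

def delete_stuck_vm(vm_uuid, dry_run=True):
    """Delete a stuck VM whose status has already been set to error.

    With dry_run=True the command is printed instead of executed, so the
    sketch can be reviewed before use on a live system.
    """
    cmd = ["nova", "delete", vm_uuid]
    if dry_run:
        print("would run:", " ".join(cmd))
        return 0
    return subprocess.call(cmd)

delete_stuck_vm("11111111-2222-3333-4444-555555555555")
```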

Method 2

  1. Log in to any controller host in an AZ in the cascaded FusionSphere OpenStack system.

    For details, see section Using SSH to Log In to a Host.

  2. Import environment variables. For details, see Importing Environment Variables.
  3. Log in to a host with the info-collect-server role deployed using the OM IP address in the cascaded FusionSphere OpenStack system.

    For details, see section Logging In to a Host with a Role Deployed.

    NOTE:
    • Run the following command to obtain the ID of the host where the info-collect-server role is deployed:

      cps template-instance-list --service collect info-collect-server

    • If the host you have logged in to is the one for which the OM IP address is to be obtained, go to 4.

  4. Run the following script to set the VM status in the FusionSphere OpenStack system to ensure that the VM status is consistent with that in the FusionCompute system:

    python /usr/local/bin/info-collect-server/server/audit_script.py cascaded_vm_status_reset fc-nova-computeXXX uuid

    NOTE:
    • The value of fc-nova-computeXXX is that of host_id obtained from the audit report.
    • The value of uuid is that in the audit report.
    • If the command output displays "SUCCESS: This vm's status is successfully reset.", the VM status is successfully set.

    Check whether the VM status is successfully set.

    • If yes, no further action is required.
    • If no, contact technical support for assistance.

Method 3

  1. Log in to any controller host in an AZ in the cascaded FusionSphere OpenStack system.

    For details, see section Using SSH to Log In to a Host.

  2. Import environment variables. For details, see Importing Environment Variables.
  3. Log in to a host with the info-collect-server role deployed using the OM IP address in the cascaded FusionSphere OpenStack system.

    For details, see section Logging In to a Host with a Role Deployed.

    NOTE:
    • Run the following command to obtain the ID of the host where the info-collect-server role is deployed:

      cps template-instance-list --service collect info-collect-server

    • If the host you have logged in to is the one for which the OM IP address is to be obtained, go to 4.

  4. Run the following script to check whether VM specifications obtained from the FusionSphere OpenStack system and those obtained from the FusionCompute system are consistent:

    python /usr/local/bin/info-collect-server/server/audit_script.py cascaded_vm_flavor_confirm fc-nova-computeXXX uuid

    NOTE:
    • The value of fc-nova-computeXXX is that of host_id obtained from the audit report.
    • The value of uuid is that in the audit report.
    • If the command output displays "The flavor of VM is same between OpenStack and FusionCompute.", the VM specifications in the FusionSphere OpenStack system are consistent with those in the FusionCompute system.
    • If yes, go to 5.
    • If no, log in to the FusionCompute web client, click Hardware, and query the VM CPU, memory, and CPU QoS information using the VM UUID by following the operations provided in section Querying Information About a VM in FusionCompute Using UUID. Then modify the VM specifications on the FusionCompute web client based on the command output to ensure that they are consistent with those in the FusionSphere OpenStack system, and go to 5.

  5. Update the VM status in the FusionSphere OpenStack system based on Method 2.

VM Status Inconsistency

Context

The VM status recorded in the database is inconsistent with the actual VM status in FusionCompute.

Parameter Description

The name of the audit report for the VM status inconsistency is diff_state_vm.csv. Table 18-33 describes parameters in the report.

Table 18-33 Parameter description

Parameter

Description

uuid

Specifies the VM UUID.

tenant_id

Specifies the tenant ID.

hyper_vm_name

Specifies the VM name registered in the hypervisor.

updated_at

Specifies the last time when the VM status was updated.

status

Specifies the VM status.

task_status

Specifies the VM task status.

power_status

Specifies the VM power status.

host_id

Specifies the ID of the host accommodating a VM recorded in the database.

If the compute node is deployed in active/standby mode, the value of host_id is the logical host ID of the compute node.

hyper_host_id

Specifies the ID of the host accommodating a VM.

If the compute node is deployed in active/standby mode, the value of hyper_host_id is the logical ID of the compute node.

hyper_status

Specifies the VM status in FusionCompute.

Possible Causes

  • A database is backed up for future restoration. However, after the backup is created, the VM is started or stopped. When the database is restored using the backup, the VM status recorded in the database may be inconsistent with the actual VM status.
  • If an exception occurs in the FusionCompute system or the management network, services in the FusionSphere OpenStack system are interrupted or fail, resulting in inconsistent VM statuses.
  • Other unknown errors result in inconsistent VM status.

Impact on the System

  • System data is inconsistent.
  • Tenants' operation rights on the VM are restricted.

Procedure

Handle the fault based on the processing methods for the VM statuses and scenarios listed in Table 18-34. For other situations, contact technical support for assistance.

Table 18-34 Processing methods

OpenStack VM Status

FusionCompute VM Status

Possible Scenario

Processing Method

error

running

  • FusionCompute error during VM creation
  • Restarting or deleting VMs when FusionCompute system fails

See Method 1.

error

stopped

  • FusionCompute error during VM creation
  • FusionCompute error during VM adjustment

See Method 1.

error

hibernated

Backing up the management data for future restoration

See Method 2.

error

paused

Backing up the management data for future restoration

See Method 2.

active

hibernated

FusionCompute error during the VM suspending (hibernating) process

See Method 2.

active

paused

Backing up the management data for future restoration

See Method 2.

suspended

running

  • FusionCompute error when a suspended VM is restored
  • Restoring a suspended VM on the FusionCompute web client

See Method 2.

suspended

stopped

Stopping a suspended VM on the FusionCompute web client

See Method 2.

suspended

paused

Backing up the management data for future restoration

See Method 2.

paused

running

  • FusionCompute error when a paused VM is restored
  • Restoring a paused VM on the FusionCompute web client

See Method 2.

paused

stopped

Stopping a paused VM on the FusionCompute web client

See Method 2.

paused

hibernated

Stopping, starting, and hibernating a paused VM on the FusionCompute web client

See Method 2.

shutoff

running

Starting a paused VM on the FusionCompute web client

See Method 3.

N/A

unknown

FusionCompute system error

Contact technical support for assistance.

NOTE:

If the system automatically restores VM service operations or the statuses of the FusionSphere OpenStack and FusionCompute systems during VM auditing, the audit report may be inaccurate due to the time difference. Therefore, check whether the VM status recorded in the database is consistent with the actual VM status. If the statuses are consistent, no further action is required.

Method 1

  1. Check whether the VM has been properly loaded or started based on section Querying Whether a VM Was Properly Started.

    NOTE:

    If OS-SRV-USG:launched_at is left blank, the VM became abnormal during the creation process.

    • If yes, go to 2.
    • If no, ask the tenant to delete the VM. If the tenant fails to delete the VM, contact technical support for assistance.

  2. Confirm with the tenant whether the VM is still in use.

    • If yes, go to 3.
    • If no, ask the tenant to delete the VM. If the tenant fails to delete the VM, contact technical support for assistance.

  3. Log in to any controller host in an AZ in the cascaded FusionSphere OpenStack system.

    For details, see section Using SSH to Log In to a Host.

  4. Import environment variables. For details, see Importing Environment Variables.
  5. Log in to a host with the info-collect-server role deployed using the OM IP address in the cascaded FusionSphere OpenStack system.

    For details, see section Logging In to a Host with a Role Deployed.

    NOTE:
    • Run the following command to obtain the ID of the host where the info-collect-server role is deployed:

      cps template-instance-list --service collect info-collect-server

    • If the host you have logged in to is the one for which the OM IP address is to be obtained, go to 6.

  6. Run the following script to check whether VM specifications obtained from the FusionSphere OpenStack system and those obtained from the FusionCompute system are consistent:

    python /usr/local/bin/info-collect-server/server/audit_script.py cascaded_vm_flavor_confirm fc-nova-computeXXX uuid

    NOTE:
    • The value of fc-nova-computeXXX is that of hyper_host_id obtained from the audit report.
    • The value of uuid is that in the audit report.
    • If the command output displays "The flavor of VM is same between OpenStack and FusionCompute.", the VM specifications in the FusionSphere OpenStack system are consistent with those in the FusionCompute system.
    • If yes, go to 7.
    • If no, log in to the FusionCompute web client, click Hardware, and query the VM CPU, memory, and CPU QoS information using the VM UUID by following the operations provided in section Querying Information About a VM in FusionCompute Using UUID. Then modify the VM specifications on the FusionCompute web client based on the command output to ensure that they are consistent with those in the FusionSphere OpenStack system, and go to 7.

  7. Update the VM status in the FusionSphere OpenStack system based on Method 2.

Method 2

  1. Log in to any controller host in an AZ in the cascaded FusionSphere OpenStack system.

    For details, see section Using SSH to Log In to a Host.

  2. Import environment variables. For details, see Importing Environment Variables.
  3. Log in to a host with the info-collect-server role deployed using the OM IP address in the cascaded FusionSphere OpenStack system.

    For details, see section Logging In to a Host with a Role Deployed.

    NOTE:
    • Run the following command to obtain the ID of the host where the info-collect-server role is deployed:

      cps template-instance-list --service collect info-collect-server

    • If the host you have logged in to is the one for which the OM IP address is to be obtained, go to the next step.

  4. Run the following script to set the VM status in the FusionSphere OpenStack system to ensure that the VM status is consistent with that in the FusionCompute system:

    python /usr/local/bin/info-collect-server/server/audit_script.py cascaded_vm_status_reset fc-nova-computeXXX uuid

    NOTE:
    • The value of fc-nova-computeXXX is that of hyper_host_id obtained from the audit report.
    • The value of uuid is that in the audit report.
    • If the command output displays "SUCCESS: This vm's status is successfully reset.", the VM status is successfully set.

    Check whether the VM status is successfully set.

    • If yes, no further action is required.
    • If no, contact technical support for assistance.

Method 3

Contact the tenant and log in to the VM to stop it.

VM Attribute Inconsistency

Context

The VM attributes recorded in the database are inconsistent with those recorded in FusionCompute.

This version supports consistency auditing of the following attributes:

  • VM boot device
  • VM NIC
  • VM ID

Parameter Description

The name of the audit report for the VM attribute inconsistency is diff_property_vm.csv. Table 18-35 describes parameters in the report.

Table 18-35 Parameter description

Parameter

Description

uuid

Specifies the VM UUID.

tenant_id

Specifies the tenant ID.

hyper_vm_name

Specifies the VM name registered in the hypervisor.

updated_at

Specifies the last time when the VM status was updated.

status

Specifies the VM status.

task_status

Specifies the VM task status.

host_id

Specifies the ID of the host accommodating a VM recorded in the database.

If the compute node is deployed in active/standby mode, the value of host_id is the logical ID of the compute node.

hyper_host_id

Specifies the ID of the host accommodating a VM.

If the compute node is deployed in active/standby mode, the value of hyper_host_id is the logical ID of the compute node.

property_name

Specifies the VM attributes, including the following:

  • bootDev: Boot devices of a VM, which include:
    • hd: indicates that the VM boots from hard disks.
    • network: indicates that the VM boots from the network.
  • nic: VM NICs (MAC addresses separated by slashes)
  • internal_id: the VM ID generated by FusionCompute when the VM is created

property

Specifies the VM attribute values, indicating different types of attributes.

hyper_property

Specifies the VM attributes in FusionCompute.
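The property and hyper_property values are encoded per attribute name as described above; for example, NIC MAC addresses are joined with slashes. A small helper makes the decoding explicit (the sample MAC addresses are illustrative):

```python
def parse_property(name, value):
    """Decode a property value from diff_property_vm.csv by attribute name."""
    if name == "nic":
        # NIC MAC addresses are separated by slashes.
        return value.split("/")
    if name in ("bootDev", "internal_id"):
        # bootDev is "hd" or "network"; internal_id is the FusionCompute VM ID.
        return value
    raise ValueError("unknown property name: %s" % name)

macs = parse_property("nic", "fa:16:3e:f1:cf:62/28:6e:d4:88:c6:29")
print(macs)
```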

Possible Causes

  • A database is backed up for future restoration. However, after the backup is created, the VM boot device is changed or NICs are added or deleted. When the database is restored using the backup, the VM attribute records in the database revert to the backup and are therefore inconsistent with the actual VM attributes.
  • A FusionCompute data backup is created for future restoration. However, after the backup is created, the VM is rebuilt. When the database is restored using the backup, the VM ID in the database is inconsistent with the actual VM ID.
  • The VM boot device or NIC data recorded in the database is inconsistent with the actual data because of a system fault (for example, a FusionCompute exception) during the service process.

Impact on the System

  • System data is inconsistent.
  • The VM becomes unavailable.

Procedure

Rectify the fault according to different attribute names recorded in the audit report.

  • If the attribute name is the boot device (bootDev) of a VM, rectify the fault by following operations provided in VM Boot Device Processing.
  • If the attribute name is VM NICs (nic), rectify the fault by following operations provided in VM NIC Processing.
  • If the attribute name is the VM ID (internal_id), rectify the fault by following operations provided in VM ID Processing.

VM Boot Device Processing

  1. Log in to any controller host in an AZ in the cascaded FusionSphere OpenStack system.

    For details, see section Using SSH to Log In to a Host.

  2. Import environment variables. For details, see Importing Environment Variables.
  3. Log in to a host with the info-collect-server role deployed using the OM IP address in the cascaded FusionSphere OpenStack system.

    For details, see section Logging In to a Host with a Role Deployed.

    NOTE:
    • Run the following command to obtain the ID of the host where the info-collect-server role is deployed:

      cps template-instance-list --service collect info-collect-server

    • If the host you have logged in to is the one for which the OM IP address is to be obtained, go to 4.

  4. Run the following script to set the VM boot mode in FusionCompute to ensure that the VM boot mode is consistent with that in the FusionSphere OpenStack system:

    python /usr/local/bin/info-collect-server/server/audit_script.py cascaded_vm_boot_dev_reset fc-nova-computeXXX uuid

    NOTE:
    • The value of fc-nova-computeXXX is that of hyper_host_id obtained from the audit report.
    • The value of uuid is that in the audit report.
    • If the command output displays "SUCCESS: Reset vm boot dev success.", the VM boot mode is successfully set.

    Check whether the VM boot mode is successfully set.

    • If yes, no further action is required.
    • If no, contact technical support for assistance.

VM NIC Processing

  1. Log in to any controller host in an AZ in the cascaded FusionSphere OpenStack system.

    For details, see section Using SSH to Log In to a Host.

  2. Import environment variables. For details, see Importing Environment Variables.
  3. Log in to a host with the info-collect-server role deployed using the OM IP address in the cascaded FusionSphere OpenStack system.

    For details, see section Logging In to a Host with a Role Deployed.

    NOTE:
    • Run the following command to obtain the ID of the host where the info-collect-server role is deployed:

      cps template-instance-list --service collect info-collect-server

    • If the host you have logged in to is the one for which the OM IP address is to be obtained, go to 4.

  4. Run the following script to check whether the VM NIC information is consistent in the FusionCompute and FusionSphere OpenStack systems:

    python /usr/local/bin/info-collect-server/server/audit_script.py cascaded_vm_interface_confirm fc-nova-computeXXX uuid

    NOTE:
    • The value of fc-nova-computeXXX is that of host_id obtained from the audit report.
    • The value of uuid is that in the audit report.
    • If multiple inconsistency types of NICs exist, handle the fault based on the inconsistency type.
    • If the command output displays the following contents, the VM NICs exist on both the FusionCompute and FusionSphere OpenStack systems. No further action is required.
      Nics of vm is same in Openstack and FusionCompute
    • If the command output displays the following contents, the VM NIC only exists in the FusionCompute system. 28:6e:d4:88:c6:29 indicates the MAC address of the NIC on FusionCompute. Go to 7.
      Nics of vm only exist in FusionCompute is ["28:6e:d4:88:c6:29"]
    • If the command output displays the following contents, the VM NIC only exists in the FusionSphere OpenStack system. fa:16:3e:f1:cf:62 indicates the MAC address of the NIC in the FusionSphere OpenStack system. Go to 5.
      Nics of vm only exist in Openstack is ["fa:16:3e:f1:cf:62"]

  5. Run the following command to query the VM port information and take a note of the port ID that the MAC address maps in the property attribute:

    neutron port-list --device_id uuid

    NOTE:

    The value of uuid is that in the audit report.

  6. Run the following command to delete residual NIC data:

    neutron port-delete port_id

    NOTE:

    port_id: indicates the port ID corresponding to the MAC address in the property attribute obtained in 5. If multiple port IDs correspond to the MAC address, repeat 6 to delete all residual port IDs.

  7. Confirm with the tenant whether the NIC that resides in the FusionCompute system is still in use in a VM.

    • If yes, no further action is required.
    • If no, go to 8.

  8. Log in to the FusionCompute web client to query the VM information using the VM UUID based on section Querying Information About a VM in FusionCompute Using UUID and switch to the VM details page.

    NOTE:

    The value of uuid is that in the audit report.

  9. On the VM details page, choose Hardware > NIC and locate the NIC to be deleted.

    NOTE:

    You can determine the MAC address of the NIC based on hyper_property in the audit report and delete the NIC that the MAC address maps.

  10. Click More and select Delete from the drop-down list.
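The branching in step 4 above keys off three output messages from the audit script. A sketch that classifies them (the message strings are taken from the samples in the note; adjust the patterns if your script version prints different text):

```python
import ast
import re

def classify_nic_output(line):
    """Classify a cascaded_vm_interface_confirm output line.

    Returns (kind, macs): kind is "same", "fusioncompute_only", or
    "openstack_only"; macs lists the MAC addresses named in the message.
    """
    if "same in Openstack and FusionCompute" in line:
        return "same", []
    m = re.search(r"only exist in (FusionCompute|Openstack) is (\[.*\])", line)
    if m:
        kind = ("fusioncompute_only" if m.group(1) == "FusionCompute"
                else "openstack_only")
        # The MAC list is printed in Python list syntax, so literal_eval works.
        return kind, ast.literal_eval(m.group(2))
    raise ValueError("unrecognized output: %s" % line)
```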

VM ID Processing

  1. Check whether the VM ID is left blank in the property attribute.

    • If yes, go to 2.
    • If no, go to 5.

  2. Log in to any controller host in an AZ in the cascaded FusionSphere OpenStack system.

    For details, see section Using SSH to Log In to a Host.

  3. Import environment variables. For details, see Importing Environment Variables.
  4. Run the following command to check whether the VM is deleted:

    nova instance-action-list uuid

    NOTE:

    The value of uuid is that in the audit report.

    Check whether the value of Action in the last row of the command output is delete.

    • If yes, go to 9.
    • If no, contact technical support for assistance.

  5. Log in to the alarm page of the FusionCompute web client and check whether alarm Uncontrolled VMs Detected is displayed.

    For details, see section Querying Information About a VM in FusionCompute Using UUID.
    • If yes, go to 6.
    • If no, go to 8.

  6. Query the alarm object URN to obtain the ID of the uncontrolled VM.

    NOTE:

    For example, the alarm URN is urn:sites:3B5E0684:vms:i-00000014. i-00000014 is the ID of the uncontrolled VM.

  7. Check whether the ID recorded in the property is consistent with the uncontrolled VM ID obtained in 6.

    • If yes, clear the alarm according to FusionCompute alarm information and then go to 8.
    • If no, go to 8.

  8. Rebuild a VM.

    For details, see section Rebuilding a VM.

  9. Run the following command to delete the VM:

    nova delete uuid

    NOTE:

    The value of uuid is that in the audit report.

  10. Check whether the VM is successfully deleted.

    • If yes, no further action is required.
    • If no, contact technical support for assistance.
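Step 6 above extracts the uncontrolled VM ID as the last colon-separated field of the alarm object URN, as in this sketch:

```python
def vm_id_from_urn(urn):
    """Return the VM ID from a FusionCompute alarm object URN.

    Example format from the note above:
    urn:sites:3B5E0684:vms:i-00000014 -> i-00000014
    """
    return urn.rsplit(":", 1)[-1]

print(vm_id_from_urn("urn:sites:3B5E0684:vms:i-00000014"))
```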

Orphan Volumes

Context

An orphan volume is a volume that exists on a VRM node but is not recorded in the Cinder database, or a volume whose status is error in the Cinder database but that is not bound to a VM on a VRM node.

If the management data is lost due to backup-based system restoration, contact technical support to restore the volume data as required. Then delete the volume whose status is error in the Cinder database.

Parameter Description

The name of the audit report is wildVolumeAudit.csv. Table 18-36 describes parameters in the report.

Table 18-36 Parameter description

Parameter

Description

volume_name

Specifies the unique volume identifier (volume ID) prefixed with volume- on the VRM node.

volume_type

Specifies the volume type, such as san, lvm, dsware, or vrm.
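Several later steps need the bare volume UUID, which is the volume_name value with the volume- prefix removed. A sketch (the sample UUID is invented):

```python
def volume_uuid(volume_name):
    """Strip the "volume-" prefix that the VRM node adds to the volume UUID."""
    prefix = "volume-"
    if volume_name.startswith(prefix):
        return volume_name[len(prefix):]
    return volume_name

print(volume_uuid("volume-11111111-2222-3333-4444-555555555555"))
```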

Impact on the System

An orphan volume is unavailable in the Cinder service but occupies the storage space.

Possible Causes

  • A database is backed up for future restoration. However, after the backup is created, one or more volumes are created. When the database is restored using the backup, the volume creation records are deleted from the database, but the volumes still reside on the storage devices or the VRM node.
  • Volumes on the VRM node are not created using the Cinder service.
  • The data status is faulty in the Cinder database but normal on the VRM node.
NOTE:

When you design system deployment for a site, do not create volumes through the VRM interface. Otherwise, false audit reports may be generated.

Procedure

  1. Open the audit report and query the volume information.
  2. Log in to any controller host in an AZ in the cascaded FusionSphere OpenStack system.

    For details, see section Using SSH to Log In to a Host.

  3. Import environment variables. For details, see Importing Environment Variables.
  4. Run the following command to check whether the volume exists in the Cinder service:

    cinder show volume_uuid

    NOTE:

    The volume UUID is the volume_name value excluding the prefix volume- in the audit report. If the command output displays "ERROR: No volume with a name or ID of 'XXX(volume uuid)' exists.", the volume does not exist.

    • If yes, go to 5.
    • If no, go to 9.

  5. Check whether the volume status is error.

    NOTE:

    The volume status can be obtained from the description of the status attribute.

    • If yes, go to 6.
    • If no, contact technical support for assistance.

  6. Confirm with the tenant to determine whether to delete the volume.

    • If yes, go to 7.
    • If no, contact technical support for assistance.

  7. Run the following command to delete the volume:

    cinder force-delete volume_uuid

    NOTE:

    The volume UUID is the volume_name value excluding the prefix volume- in the audit report.

  8. Run the following command to check whether the volume exists:

    cinder show volume_uuid

    NOTE:

    The volume UUID is the volume_name value excluding the prefix volume- in the audit report.

    • If yes, contact technical support for assistance.
    • If no, no further action is required.

  9. Confirm with tenant to determine whether to delete the orphan volume.

    • If yes, go to 10.
    • If no, contact technical support for assistance.

  10. Run the following command to obtain the OM IP address of the blockstorage-driver-vrmXXX-assigned host:

    cps host-list

    NOTE:

    To obtain the OM IP address, locate the host whose roles value contains blockstorage-driver-vrmXXX in the command output and take a note of the OM IP address.

  11. Log in to the blockstorage-driver-vrmXXX-assigned host based on section Using SSH to Log In to a Host and run the following command to query the volume details:

    python /usr/bin/info-collect-script/audit_resume/get_vrm_volume.py -qi uuid -cf `echo "blockstorage-driver-vrmXXX" | awk -F '-' '{print "cinder-volume-"$3}'`

    NOTE:
    • The value of uuid is the volume_name value excluding the prefix volume- in the audit report.
    • The roles information about blockstorage-driver-vrmXXX can be obtained from 10.

  12. Obtain the data store and volume UUID information based on the command output in 11.

    NOTE:
    • The data store is the value of the volume_datastore_name field returned in the command output.
    • The volume UUID is the value of the volume_uuid field returned in the command output.

  13. Query the volume details in the FusionCompute system using the volume UUID based on section Rebuilding a VM and locate the volume based on the volume data store information obtained in 12.
  14. Check whether the value of Attach VM is Bound.

    • If yes, contact technical support for assistance.
    • If no, go to 15.

  15. Click More, select Delete from the More drop-down list, and check whether the volume is deleted.

    • If yes, no further action is required.
    • If no, contact technical support for assistance.
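Step 11 above derives the cinder-volume configuration name from the blockstorage-driver role name with an inline awk. The same mapping in Python, for clarity (vrm01 is a hypothetical role suffix):

```python
def cinder_volume_name(blockstorage_role):
    """Map blockstorage-driver-vrmXXX to cinder-volume-vrmXXX.

    Equivalent to the inline awk in step 11:
    echo "blockstorage-driver-vrm01" | awk -F '-' '{print "cinder-volume-"$3}'
    """
    # awk's $3 with '-' as the separator is the third hyphen-delimited field.
    return "cinder-volume-" + blockstorage_role.split("-")[2]

print(cinder_volume_name("blockstorage-driver-vrm01"))
```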

Invalid Volumes

Context

An invalid volume is a volume that is recorded in the Cinder database but does not exist on a VRM node, or a volume whose status is error in the Cinder database but that is attached to a VM.

Delete the invalid volume from the Cinder database.

NOTE:

Rectify the faults found in orphan volume auditing before deleting an invalid volume.

Parameter Description

The name of the audit report is fakeVolumeAudit.csv. Table 18-37 describes parameters in the report.

Table 18-37 Parameter description

Parameter

Description

volume_id

Specifies the volume ID.

volume_displayname

Specifies the name of the volume created by a tenant.

volume_name

Specifies the unique volume identifier to which volume- is prefixed on the VRM node.

volume_type

Specifies the volume type, such as san, lvm, dsware, or vrm.

location

Specifies the volume location.

Impact on the System

The volume can be queried using the Cinder command but cannot be used.

Possible Causes

A database is backed up for future restoration. However, after the backup is created, one or more volumes are deleted. When the database is restored using the backup, records of these volumes reside in the database and become invalid volumes.

Volumes fail to be created from images, leaving residual volume records in the database.

Procedure

  1. Open the audit report and query the volume information.
  2. Log in to the host with blockstorage-driver-vrmXXX deployed based on section Logging In to the blockstorage-driver-vrmXXX-Assigned Host to Which a Volume Is Attached and run the following command to query details about the volume:

    python /usr/bin/info-collect-script/audit_resume/get_vrm_volume.py -qi uuid -cf `echo "blockstorage-driver-vrmXXX" | awk -F '-' '{print "cinder-volume-"$3}'`

    NOTE:
    • The value of uuid is that of volume_id obtained from the audit report.
    • Check whether the command output displays "can not find this volume". If it does, the volume does not exist on the VRM node.

  3. Confirm with tenant to determine whether to delete the invalid volume.

    • If yes, go to 4.
    • If no, contact technical support for assistance.

  4. Run the following command to check whether the volume has any snapshots:

    cinder snapshot-list --all-tenants --volume-id volume_uuid

    NOTE:

    The volume UUID is the value of volume_id obtained from the audit report. If the command output is left blank, the volume does not have snapshots.

    • If yes, go to 5.
    • If no, go to 6.

  5. Run the following command to delete all snapshots displayed in the command output in 4:

    cinder snapshot-delete snapshot_uuid

    NOTE:

    The volume snapshot UUID is the value of ID in the command output in 4.

  6. Log in to the active GaussDB node based on section Logging In to the Active GaussDB Node and run the following command to delete the volume:

    python /usr/bin/info-collect-script/audit_resume/delete_specify_volume.py volume_uuid

    NOTE:

    The volume UUID is the value of volume_id obtained from the audit report.
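Step 4 above may return several snapshots, each of which must be deleted in step 5. A sketch that pulls the ID column out of the usual OpenStack CLI ASCII table (the sample table content is invented; verify the format against your client's actual output):

```python
# Invented sample of cinder snapshot-list tabular output.
SAMPLE_TABLE = """\
+--------------------------------------+-----------+
| ID                                   | Volume ID |
+--------------------------------------+-----------+
| aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee | vol-1     |
+--------------------------------------+-----------+
"""

def snapshot_ids(table_text):
    """Extract the ID column from cinder snapshot-list tabular output."""
    ids = []
    for line in table_text.splitlines():
        if not line.startswith("|"):
            continue  # skip the +----+ border lines
        cells = [c.strip() for c in line.strip("|").split("|")]
        if cells and cells[0] and cells[0] != "ID":
            ids.append(cells[0])
    return ids

print(snapshot_ids(SAMPLE_TABLE))
```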

Orphan Volume Snapshots

Context

An orphan volume snapshot is a snapshot that exists on the VRM node but is not recorded in the Cinder database, or one that exists both in the Cinder database and on the VRM node but has been unavailable for more than 24 hours.

Delete the orphan volume snapshot from the storage device.

Parameter Description

The name of the audit report is wildSnapshotAudit.csv. Table 18-38 describes parameters in the report.

Table 18-38 Parameter description

Parameter

Description

snap_name

Specifies the volume snapshot UUID on the storage device.

snap_type

Specifies the snapshot type, such as san, lvm, dsware, or vrm.

Impact on the System

An orphan volume snapshot occupies the storage space.

Possible Causes

A database is backed up for future restoration. However, after the backup is created, one or more volume snapshots are created. When the database is restored using the backup, records of these snapshots are deleted from the database, but these snapshots reside on their storage devices and become orphan volume snapshots. Alternatively, system errors occur during the service process.

Procedure

  1. Open the audit report and view the snapshot information.
  2. Log in to any controller host in an AZ in the cascaded FusionSphere OpenStack system.

    For details, see section Using SSH to Log In to a Host.

  3. Import environment variables. For details, see Importing Environment Variables.
  4. Run the following command to check whether the snapshot exists in the Cinder service:

    cinder snapshot-show Snapshot uuid

    NOTE:

    The snapshot UUID is the value of snap_name obtained from the audit report. If the command output displays "ERROR: No snapshot with a name or ID of 'XXX(snapshot uuid)' exists.", the snapshot does not exist.

    • If yes, go to 9.
    • If no, go to 5.

  5. Run the following command to obtain the OM IP address of the blockstorage-driver-vrmXXX-assigned host:

    cps host-list

    NOTE:

    In the command output, locate the host whose roles value contains blockstorage-driver-vrmXXX and take a note of its OM IP address.

  6. Log in to the blockstorage-driver-vrmXXX-assigned host by following operations provided in section Using SSH to Log In to a Host and run the following command to query the snapshot information:

    python /usr/bin/info-collect-script/audit_resume/get_vrm_snapshot.py -qi Snapshot uuid -cf `echo "blockstorage-driver-vrmXXX" | awk -F '-' '{print "cinder-volume-"$3}'`

    NOTE:
    • The snapshot UUID is the value of snap_name obtained from the audit report.
    • The roles information about blockstorage-driver-vrmXXX can be obtained from 5.

    Check whether the command output displays "no this snapshot".

    • If yes, contact technical support for assistance.
    • If no, go to 7.

  7. Confirm with the tenant whether to delete the snapshot.

    • If yes, go to 8.
    • If no, contact technical support for assistance.

  8. Log in to the blockstorage-driver-vrmXXX-assigned host by following operations provided in section Using SSH to Log In to a Host and run the following command to delete the snapshot:

    python /usr/bin/info-collect-script/audit_resume/delete_vrm_snapshot.py -di Snapshot uuid -cf `echo "blockstorage-driver-vrmXXX" | awk -F '-' '{print "cinder-volume-"$3}'`

    NOTE:
    • The snapshot UUID is the value of snap_name obtained from the audit report.
    • The roles information about blockstorage-driver-vrmXXX can be obtained from 5.

  9. Confirm with the tenant whether to delete the snapshot.

    • If yes, go to 10.
    • If no, contact technical support for assistance.

  10. Check whether the snapshot status is error.

    • If yes, go to 13.
    • If no, go to 11.

  11. Log in to the node whose roles value is controller and run the following command to set the snapshot status to error:

    cinder snapshot-reset-state --state error Snapshot uuid

    NOTE:

    The snapshot UUID is the value of snap_name obtained from the audit report.

  12. Run the following command to query the snapshot status:

    cinder snapshot-show Snapshot uuid

    NOTE:

    The snapshot UUID is the value of snap_name obtained from the audit report.

    Check whether the snapshot status is error.

    • If yes, go to 13.
    • If no, contact technical support for assistance.

  13. Log in to the node whose roles value is controller and run the following command to delete the snapshot:

    cinder snapshot-delete Snapshot uuid

    NOTE:

    The snapshot UUID is the value of snap_name obtained from the audit report.
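Steps 10 to 13 reduce to: read the snapshot status, reset it to error if needed, then delete. The status field can be read from snapshot-show style table output as sketched below; the table text is mocked here, since the real commands assume a live controller node, and the cinder commands are only echoed:

```shell
# Read the status field from `cinder snapshot-show` style table output.
# The output is mocked; on a live node, pipe the real command instead.
show_output='| status    | available         |
| volume_id | 9c1d0000-0000-0000-0000-000000000000 |'
status=$(echo "$show_output" | awk -F '|' '$2 ~ /^ +status +$/ {gsub(/ /,"",$3); print $3}')
echo "$status"   # prints: available
# Mirror steps 10-13: reset to error first when the status is not yet error.
if [ "$status" != "error" ]; then
    echo "would run: cinder snapshot-reset-state --state error <snapshot uuid>"
fi
echo "would run: cinder snapshot-delete <snapshot uuid>"
```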

Invalid Volume Snapshots

Context

An invalid volume snapshot is one that is recorded in the Cinder database but does not exist on the VRM node.

Delete the invalid volume snapshot from the Cinder database.

Parameter Description

The name of the audit report is fakeSnapshotAudit.csv. Table 18-39 describes parameters in the report.

Table 18-39 Parameter description

Parameter

Description

snap_id

Specifies the snapshot ID.

snap_name

Specifies the snapshot UUID prefixed with snapshot-.

volume_id

Specifies the base volume ID.

snap_type

Specifies the snapshot type, such as san, lvm, dsware, or vrm.

location

Specifies the snapshot location.

Impact on the System

The invalid snapshot can be queried using Cinder commands but is unavailable to the system.

Possible Causes

A database is backed up for future restoration. However, after the backup is created, one or more volume snapshots are deleted. When the database and storage devices are restored using the backup, records of these volume snapshots reside in the database and become invalid volume snapshots. Alternatively, system errors occur during the service process.

Procedure

  1. Open the audit report and view the snapshot information.
  2. Log in to any controller host in an AZ in the cascaded FusionSphere OpenStack system.

    For details, see section Using SSH to Log In to a Host.

  3. Import environment variables. For details, see Importing Environment Variables.
  4. Run the following command to check whether the snapshot exists in the Cinder service:

    cinder snapshot-show Snapshot uuid
    NOTE:

    The snapshot UUID is the value of snap_id obtained from the audit report. If the command output displays "ERROR: No snapshot with a name or ID of 'XXX(snapshot uuid)' exists.", the snapshot does not exist.

    • If yes, go to 5.
    • If no, contact technical support for assistance.

  5. Run the following command to obtain the UUID of the volume to which the snapshot belongs:

    cinder snapshot-show Snapshot uuid | grep volume_id | awk -F '|' '{ print $3}'

    NOTE:

    The snapshot UUID is the value of snap_id obtained from the audit report.

  6. Log in to the blockstorage-driver-vrmXXX-assigned host to which the volume is attached by following operations provided in section Logging In to the blockstorage-driver-vrmXXX-Assigned Host to Which a Volume Is Attached and run the following command to check whether the snapshot exists in FusionCompute:

    python /usr/bin/info-collect-script/audit_resume/get_vrm_snapshot.py -qi Snapshot uuid -cf `echo "blockstorage-driver-vrmXXX" | awk -F '-' '{print "cinder-volume-"$3}'`

    NOTE:

    Check whether the command output displays "no this snapshot".

    • If yes, go to 7.
    • If no, contact technical support for assistance.

  7. Log in to the node whose roles value is controller and run the following command to query the snapshot status:

    cinder snapshot-show Snapshot uuid

    NOTE:

    The snapshot UUID is the value of snap_id obtained from the audit report.

    Check whether the snapshot status is available or error.

    • If yes, go to 10.
    • If no, go to 8.

  8. Run the following command to set the snapshot status to error:

    cinder snapshot-reset-state --state error Snapshot uuid

    NOTE:

    The snapshot UUID is the value of snap_id obtained from the audit report.

  9. Run the following command to query the snapshot status:

    cinder snapshot-show Snapshot uuid

    NOTE:

    The snapshot UUID is the value of snap_id obtained from the audit report.

    Check whether the snapshot status is error.

    • If yes, go to 10.
    • If no, contact technical support for assistance.

  10. Run the following command to delete the snapshot:

    cinder snapshot-delete Snapshot uuid

    NOTE:

    The snapshot UUID is the value of snap_id obtained from the audit report.
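The awk -F '|' '{print $3}' pipeline in step 5 returns the table cell with its padding spaces still attached; stripping them gives a bare UUID that can be reused directly in later commands. A runnable sketch on a mocked output row:

```shell
# The pipeline in step 5 keeps the cell's padding spaces.
# gsub() strips them so the UUID can be reused directly.
row='| volume_id | 4b6a2f0e-1234-5678-9abc-def012345678 |'
echo "$row" | awk -F '|' '{print $3}'                  # padded cell value
echo "$row" | awk -F '|' '{gsub(/ /,"",$3); print $3}' # bare UUID
```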

Stuck Volumes

Context

A stuck volume is one that remains in a transition status (creating, downloading, deleting, extending, error_extending, error_deleting, error_attaching, error_detaching, attaching, detaching, uploading, retyping, error, restoring, backing-up, or restoring-backup) and is therefore unavailable. Only volumes in the available or in-use status can be used. If a volume has been stuck in a transition status for more than 24 hours, restore it based on site conditions.

NOTE:

Audit the orphan VM before deleting a stuck volume.

Parameter Description

The name of the audit report is VolumeStatusAudit.csv. Table 18-40 describes parameters in the report.

Table 18-40 Parameter description

Parameter

Description

volume_id

Specifies the volume ID.

volume_displayname

Specifies the name of the volume created by a tenant.

volume_name

Specifies the unique volume identifier on the VRM node, prefixed with volume-.

volume_type

Specifies the volume type, such as san, lvm, dsware, or vrm.

location

Specifies the volume location.

status

Specifies the status of the volume.

last_update_time

Specifies the last time when the volume was updated.
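For a quick overview of stuck entries, the audit report can be filtered with awk. This sketch assumes VolumeStatusAudit.csv is a plain comma-separated file with a header row and columns in the order of Table 18-40; verify the actual layout before relying on it (a mocked file is used below):

```shell
# List volume IDs whose status is a transition state (neither available
# nor in-use). The column order is an assumption based on Table 18-40.
cat > /tmp/VolumeStatusAudit.csv <<'EOF'
volume_id,volume_displayname,volume_name,volume_type,location,status,last_update_time
aaa-111,data-disk,volume-aaa-111,vrm,vrm01,in-use,2023-01-01 10:00:00
bbb-222,tmp-disk,volume-bbb-222,vrm,vrm01,deleting,2023-01-01 09:00:00
EOF
awk -F ',' 'NR > 1 && $6 != "available" && $6 != "in-use" {print $1, $6}' /tmp/VolumeStatusAudit.csv
```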

Possible Causes

  • A system exception occurs when a volume service operation is in process.
  • A database is backed up for future restoration. However, after the backup is created, the statuses of one or more volumes are changed. When the database is restored using the backup, records of these volume statuses are restored to their former statuses in the database.

Impact on the System

The stuck volume becomes unavailable and occupies system resources.

Procedure

Restore the volume based on the volume statuses listed in Table 18-41. For other situations, contact technical support for assistance.

Table 18-41 Volume restoration methods

Volume Status

Description

Possible Scenario

Restoration Method

creating

The volume is being created.

A system exception occurs during the volume creation process.

See Method 1.

error_restoring

Restoration failed.

A system exception occurs when the volume data is being restored, resulting in the restoration failure.

See Method 2.

backing-up

The volume data is being backed up.

An exception occurs in the system when the volume data is being backed up.

See Method 2.

restoring-backup

The volume data is being restored.

An exception occurs in the system when the volume data is being restored.

See Method 2.

downloading

The image for creating the volume is being downloaded.

A system exception occurs when the volume is being created from an image.

See Method 2.

deleting

The volume is being deleted.

A system exception occurs during the volume deletion process.

Forcibly delete the volume. For details, see Method 3.

error_deleting

Deletion failed.

A system exception occurs when the volume is being deleted, resulting in the deletion failure.

Forcibly delete the volume. For details, see Method 3.

error_attaching

Attachment failed.

A system exception occurs when the volume is being attached to a VM, resulting in the attachment failure.

Set the volume status to available or in-use. For details, see Method 4.

error_detaching

Detachment failed.

A system exception occurs when the volume is being detached from a VM, resulting in the detachment failure.

Set the volume status to available or in-use. For details, see Method 4.

attaching

The volume is being attached to a VM.

A system exception occurs during the volume attachment process.

Set the volume status to available or in-use. For details, see Method 4.

detaching

The volume is being detached from a VM.

A system exception occurs during the volume detachment process.

Set the volume status to available or in-use. For details, see Method 4.

uploading

The image is being uploaded.

A system exception occurs when an image is being created using the volume.

See Method 5.

retyping

The volume is being migrated.

A system exception occurs during the storage migration process.

See Method 6.

extending

The volume is being expanded.

A system exception occurs during the volume expansion process.

See Method 7.

error_extending

Expansion failed.

Volume expansion failed.

See Method 7.
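The routing in Table 18-41 can be condensed into a small helper (illustrative only; not part of the product tooling):

```shell
# Map a stuck-volume status to the restoration method listed in Table 18-41.
route_status() {
    case "$1" in
        creating)                                                 echo "Method 1" ;;
        error_restoring|backing-up|restoring-backup|downloading)  echo "Method 2" ;;
        deleting|error_deleting)                                  echo "Method 3" ;;
        error_attaching|error_detaching|attaching|detaching)      echo "Method 4" ;;
        uploading)                                                echo "Method 5" ;;
        retyping)                                                 echo "Method 6" ;;
        extending|error_extending)                                echo "Method 7" ;;
        *)                                                        echo "contact technical support" ;;
    esac
}
route_status deleting   # prints: Method 3
```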

Method 1

  1. Log in to any controller host in an AZ in the cascaded FusionSphere OpenStack system.

    For details, see section Using SSH to Log In to a Host.

  2. Import environment variables. For details, see Importing Environment Variables.
  3. Run the following command to query information about the volume:

    cinder show Volume uuid
    NOTE:

    The volume UUID is the value of volume_id obtained from the audit report.

    Check whether the value of status in the command output is consistent with the volume status in the audit report.

    • If yes, go to 4.
    • If no, no further action is required.

  4. Query the value of last_update_time in the audit report and check whether the time difference between the value and the current time exceeds 24 hours.

    • If yes, go to 5.
    • If no, contact technical support for assistance.

  5. Log in to the host with blockstorage-driver-vrmXXX deployed based on section Logging In to the blockstorage-driver-vrmXXX-Assigned Host to Which a Volume Is Attached and run the following command to query details about the volume:

    python /usr/bin/info-collect-script/audit_resume/get_vrm_volume.py -qi uuid -cf `echo "blockstorage-driver-vrmXXX" | awk -F '-' '{print "cinder-volume-"$3}'`

    NOTE:

    Check whether the command output displays "can not find this volume".

    • If yes, go to 7.
    • If no, go to 6.

  6. Reset the volume status using the value of uuid based on section Resetting the Volume Status.

    NOTE:

    The value of uuid is that of volume_id obtained from the audit report.

  7. Run the following command to delete the volume:

    cinder force-delete Volume uuid

    NOTE:

    The volume UUID is the value of volume_id obtained from the audit report.
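The 24-hour check in step 4 can be scripted with GNU date, assuming last_update_time is in a format date -d accepts, such as YYYY-MM-DD HH:MM:SS; adjust the parsing if the report uses a different timestamp format:

```shell
# Flag a volume whose last_update_time is more than 24 hours in the past.
last_update_time="2023-01-01 10:00:00"   # placeholder; copy from the audit report
now=$(date -u +%s)
ts=$(date -u -d "$last_update_time" +%s)
if [ $(( now - ts )) -gt 86400 ]; then
    echo "stuck for more than 24 hours"
fi
```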

Method 2

  1. Log in to any controller host in an AZ in the cascaded FusionSphere OpenStack system.

    For details, see section Using SSH to Log In to a Host.

  2. Import environment variables. For details, see Importing Environment Variables.
  3. Run the following command to query the volume status:

    cinder show Volume uuid
    NOTE:

    The volume UUID is the value of volume_id obtained from the audit report.

    Check whether the value of status in the command output is consistent with the volume status in the audit report.

    • If yes, go to 4.
    • If no, no further action is required.

  4. Query the value of last_update_time in the audit report and check whether the time difference between the value and the current time exceeds 24 hours.

    • If yes, go to 5.
    • If no, contact technical support for assistance.

  5. Reset the volume status using the value of uuid based on section Resetting the Volume Status.

    NOTE:

    The value of uuid is that of volume_id obtained from the audit report.

Method 3

  1. Log in to any controller host in an AZ in the cascaded FusionSphere OpenStack system.

    For details, see section Using SSH to Log In to a Host.

  2. Import environment variables. For details, see Importing Environment Variables.
  3. Run the following command to query information about the volume:

    cinder show Volume uuid

    NOTE:

    The volume UUID is the value of volume_id obtained from the audit report.

    Check whether the value of status in the command output is consistent with the volume status in the audit report.

    • If yes, go to 4.
    • If no, no further action is required.

  4. Query the value of last_update_time in the audit report and check whether the time difference between the value and the current time exceeds 24 hours.

    • If yes, go to 5.
    • If no, contact technical support for assistance.

  5. Run the following command to delete the volume:

    cinder force-delete Volume uuid

    NOTE:

    The volume UUID is the value of volume_id obtained from the audit report.

Method 4

  1. Log in to any controller host in an AZ in the cascaded FusionSphere OpenStack system.

    For details, see section Using SSH to Log In to a Host.

  2. Import environment variables. For details, see Importing Environment Variables.
  3. Run the following command to query the volume status:

    cinder show Volume uuid

    NOTE:

    The volume UUID is the value of volume_id obtained from the audit report.

    Check whether the value of status in the command output is consistent with the volume status in the audit report.

    • If yes, go to 4.
    • If no, no further action is required.

  4. Query the value of last_update_time in the audit report and check whether the time difference between the value and the current time exceeds 24 hours.

    • If yes, go to 5.
    • If no, contact technical support for assistance.

  5. Run the following command to query the volume status:

    cinder show Volume uuid

    NOTE:

    The volume UUID is the value of volume_id obtained from the audit report.

    In the command output, check whether the value of attachments is left blank.

    • If yes, go to 6.
    • If no, go to 8.

  6. Run the following command to set the volume status to available:

    cinder reset-state --state available Volume uuid

    NOTE:

    The volume UUID is the value of volume_id obtained from the audit report.

  7. Run the following command to query the volume status:

    cinder show Volume uuid

    NOTE:

    The volume UUID is the value of volume_id obtained from the audit report.

    In the command output, check whether the value of status is available.

    • If yes, go to 11.
    • If no, contact technical support for assistance.

  8. Run the following command to set the volume status to in-use:

    cinder reset-state --state in-use Volume uuid

    NOTE:

    The volume UUID is the value of volume_id obtained from the audit report.

  9. Run the following command to query the volume status:

    cinder show Volume uuid

    NOTE:

    The volume UUID is the value of volume_id obtained from the audit report.

    In the command output, check whether the value of status is in-use.

    • If yes, go to 10.
    • If no, contact technical support for assistance.

  10. Query the details of the volume in FusionCompute using uuid and check whether the VM is bound. For details, see section Rebuilding a VM.

    NOTE:

    The value of uuid is that of volume_id obtained from the audit report.

  11. Query the details of the volume in FusionCompute using uuid and check whether the VM is unbound. For details, see section Rebuilding a VM.

    NOTE:

    The value of uuid is that of volume_id obtained from the audit report.
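Steps 5 to 9 of Method 4 pick the reset state from the attachments field: a blank value means available, anything else means in-use. A sketch of the decision (the attachments value is mocked, and the cinder command is only echoed; on a live node parse it from cinder show):

```shell
# Decide the target reset state from the volume's attachments field.
attachments="[]"   # placeholder; blank or [] means no VM attachment is recorded
if [ -z "$attachments" ] || [ "$attachments" = "[]" ]; then
    target_state="available"
else
    target_state="in-use"
fi
echo "would run: cinder reset-state --state $target_state <volume uuid>"
```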

Method 5

  1. Log in to any controller host in an AZ in the cascaded FusionSphere OpenStack system.

    For details, see section Using SSH to Log In to a Host.

  2. Import environment variables. For details, see Importing Environment Variables.
  3. Run the following command to query information about the volume:

    cinder show Volume uuid

    NOTE:

    The volume UUID is the value of volume_id obtained from the audit report.

    Check whether the value of status in the command output is consistent with the volume status in the audit report.

    • If yes, go to 4.
    • If no, contact technical support for assistance.

  4. Query the value of last_update_time in the audit report and check whether the time difference between the value and the current time exceeds 24 hours.

    • If yes, go to 5.
    • If no, contact technical support for assistance.

  5. Query the details of the volume in FusionCompute using uuid and check whether the volume exists in the disk list of the storage pool. For details, see section Rebuilding a VM.

    NOTE:
    • The value of uuid is that of volume_id obtained from the audit report.
    • Volume attaching status: obtained from the VM binding attribute.
    • If yes, take a note of the volume attaching status (Bound or Not bound) and go to 6.
    • If no, contact technical support for assistance.

  6. Determine whether to attach the volume to a VM.

    • If yes, go to 7.
    • If no, go to 8.

  7. In Task Center, check whether the VM to which the volume is to be attached is being exported.

    • If yes, no further action is required.
    • If no, go to 8.

  8. Switch to the controller host you have logged in to and run the following command to query the volume status:

    cinder show Volume uuid

    NOTE:

    The volume UUID is the value of volume_id obtained from the audit report.

    In the command output, check whether the value of attachments is left blank.

    • If yes, go to 9.
    • If no, go to 12.

  9. Run the following command to set the volume status to available:

    cinder reset-state --state available Volume uuid

    NOTE:

    The volume UUID is the value of volume_id obtained from the audit report.

  10. Run the following command to query the volume status:

    cinder show Volume uuid

    NOTE:

    The volume UUID is the value of volume_id obtained from the audit report.

    In the command output, check whether the value of status is available.

    • If yes, go to 11.
    • If no, contact technical support for assistance.

  11. Check whether the attaching status of the volume obtained in 5 is Not Bound.

    NOTE:

    The value of uuid is that of volume_id obtained from the audit report.

  12. Run the following command to set the volume status to in-use:

    cinder reset-state --state in-use Volume uuid

    NOTE:

    The volume UUID is the value of volume_id obtained from the audit report.

  13. Check whether the attaching status of the volume obtained in 5 is Bound.

    NOTE:

    The value of uuid is that of volume_id obtained from the audit report.

Method 6

  1. Log in to any controller host in an AZ in the cascaded FusionSphere OpenStack system.

    For details, see section Using SSH to Log In to a Host.

  2. Import environment variables. For details, see Importing Environment Variables.
  3. Run the following command to query information about the volume:

    cinder show Volume uuid

    NOTE:

    The volume UUID is the value of volume_id obtained from the audit report.

    Check whether the value of status in the command output is consistent with the volume status in the audit report.

    • If yes, go to 4.
    • If no, contact technical support for assistance.

  4. Query the value of last_update_time in the audit report and check whether the time difference between the value and the current time exceeds 24 hours.

    • If yes, go to 5.
    • If no, contact technical support for assistance.

  5. Confirm with the tenant whether the volume status is to be changed.

    • If yes, go to 6.
    • If no, no further action is required.

  6. Query the details of the volume in FusionCompute using uuid and check whether the volume exists in the disk list of the storage pool.

    For details, see section Rebuilding a VM.

    NOTE:
    • The value of uuid is that of volume_id obtained from the audit report.
    • Volume attaching status: obtained from the VM binding attribute.
    • If yes, take a note of the volume attaching status (Bound or Not Bound) and go to 7.
    • If no, contact technical support for assistance.

  7. Run the following command to query the volume status:

    cinder show Volume uuid

    NOTE:

    The volume UUID is the value of volume_id obtained from the audit report.

    In the command output, check whether the value of attachments is left blank.

    • If yes, go to 8.
    • If no, go to 11.

  8. Run the following command to set the volume status to available:

    cinder reset-state --state available Volume uuid

    NOTE:

    The volume UUID is the value of volume_id obtained from the audit report.

  9. Run the following command to query the volume status:

    cinder show Volume uuid

    NOTE:

    The volume UUID is the value of volume_id obtained from the audit report.

    In the command output, check whether the value of status is available.

    • If yes, go to 10.
    • If no, contact technical support for assistance.

  10. Check whether the attaching status of the volume obtained in 6 is Not Bound.

    NOTE:

    The value of uuid is that of volume_id obtained from the audit report.

  11. Run the following command to set the volume status to in-use:

    cinder reset-state --state in-use Volume uuid

    NOTE:

    The volume UUID is the value of volume_id obtained from the audit report.

  12. Check whether the attaching status of the volume obtained in 6 is Bound.

    NOTE:

    The value of uuid is that of volume_id obtained from the audit report.

Method 7

  1. Log in to any controller host in an AZ in the cascaded FusionSphere OpenStack system.

    For details, see section Using SSH to Log In to a Host.

  2. Import environment variables. For details, see Importing Environment Variables.
  3. Run the following command to query the volume status:

    cinder show Volume uuid

    NOTE:

    The volume UUID is the value of volume_id obtained from the audit report.

    Check whether the value of status in the command output is consistent with the volume status in the audit report.

    • If yes, go to 4.
    • If no, no further action is required.

  4. Check which of the following statuses the volume is in and perform the required operations:

    • extending: Check whether the time difference between the value of last_update_time and the current time exceeds 24 hours.
      • If yes, go to 5.
      • If no, contact technical support for assistance.
    • error_extending: Go to 5.

  5. Query the details of the volume in FusionCompute using uuid based on section Rebuilding a VM and locate the volume.

    NOTE:

    The value of uuid is that of volume_id obtained from the audit report.

    Check whether the volume status is available.

    • If yes, go to 6.
    • If no, contact technical support for assistance.

  6. Log in to the host with the blockstorage-driver-vrmXXX role assigned based on section Logging In to the blockstorage-driver-vrmXXX-Assigned Host to Which a Volume Is Attached and run the following script to restore the volume that fails to expand:

    python /usr/bin/info-collect-script/audit_resume/resume_extend_volume.py -id uuid -cf `echo "blockstorage-driver-vrmXXX" | awk -F '-' '{print "cinder-volume-"$3}'`

    NOTE:

    After the command is executed, check whether the command output is free of error information.

    • If yes, go to 7.
    • If no, contact technical support for assistance.

  7. Reset the volume status using the value of uuid based on section Resetting the Volume Status.

    NOTE:

    The value of uuid is that of volume_id obtained from the audit report.

Handling Inconsistent Volume Attachment Information

Context

Volume attachment information includes the following:

  • Volume attachment status recorded in Cinder management data
  • Volume attachment status recorded in Nova management data
  • Information about volumes recorded in VRM

The system audits the consistency between the preceding volume attachment information.

Parameter Description

The name of the audit report is VolumeAttachmentAudit.csv. Table 18-42 describes parameters in the report.

Table 18-42 Parameter description

Parameter

Description

volume_id

Specifies the volume ID.

volume_displayname

Specifies the name of the volume created by a user.

volume_type

Specifies the volume type, such as san, lvm, dsware, or vrm.

location

Specifies details about the volume.

attach_status

Specifies the volume attachment status.

Impacts on the System

  • Residual volume attachment information may reside on hosts.
  • Volume-related services may be affected.

Possible Causes

  • The database is restored using a data backup, reverting it to the state at the time the backup was created. However, after the backup was created, one or more volumes were attached to VMs. After the database is restored, records of the volume attachment information are deleted from the database, but the information resides on the storage devices.
  • If a service operation fails and is rolled back, volume-related information rollback fails.

Procedure

  1. Open the audit report and view volume attachment information.

    In the audit report:

    • location: specifies detailed volume attachment information recorded in the Cinder service, Nova service, and storage devices.
      • ATTACH_TO: specifies information recorded in Cinder management data about the VM to which the volume is attached.

        For example:

        'ATTACH_TO': [{'instance_id': u'e3fd74ba-389e-4b51-afe0-531e25978264'}]

        instance_id specifies the VM UUID.

      • BELONG_TO: specifies information about the host to which the volume belongs.
      • HYPER_USE: If the volume type is vrm, this field is left blank.
      • MAP_TO: [{'location': 'vrm'}] specifies that the volume is on the VRM node.
      • NOVA_USE: specifies information recorded in Nova management data about the VM to which the volume is attached.

        For example:

        'NOVA_USE': [{'instance_name': u'instance-00000002', 'instance_id': u'e3fd74ba-389e-4b51-afe0-531e25978264'}]

    • attach_status: specifies the volume attachment status. Its values can be:
      • management_status: the result of comparing the attachment information recorded in the Cinder service with that in the Nova service. match indicates that the information is consistent; not_match indicates that it is inconsistent.
      • cinder_status: the result of comparing the attachment information recorded in the Cinder service with that on the storage device when the VRM storage type is used. match indicates that the attachment status recorded in the Cinder service is consistent with that recorded in VRM; not_match indicates that it is inconsistent.

  2. Handle the volume based on the volume states listed in Table 18-43. For other situations, contact technical support for assistance.

    Table 18-43 Volume attachment information handling methods

    management_status

    cinder_status

    Scenario

    Solution

    not_match

    not_match

    The volume is recorded as attached in the Cinder service but is not recorded as attached in the Nova service or on the VRM node.

    See Method 1.

    not_match

    match

    The volume is recorded as attached in the Nova service but is not recorded as attached in the Cinder service or on the VRM node.

    See Method 2.

    match

    not_match

    The volume is recorded as attached in the Cinder service and Nova service but is not recorded as attached on the VRM node. Alternatively, the volume is recorded as attached on the VRM node but is not recorded as attached in the Cinder service or the Nova service.

    See Method 3.
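The instance_id embedded in the ATTACH_TO entry of the location field (step 1) can be pulled out with a quick sed filter. The field text below mirrors the example in step 1; the exact formatting in real reports may vary:

```shell
# Extract the VM UUID from the ATTACH_TO entry of the location field.
location="'ATTACH_TO': [{'instance_id': u'e3fd74ba-389e-4b51-afe0-531e25978264'}]"
vm_uuid=$(echo "$location" | sed -n "s/.*'instance_id': u'\([0-9a-f-]*\)'.*/\1/p")
echo "$vm_uuid"   # prints: e3fd74ba-389e-4b51-afe0-531e25978264
```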

Method 1

  1. In the audit report, check whether VM information is available in ATTACH_TO of location.

    • If yes, go to 2.
    • If no, contact technical support for assistance.

  2. Log in to any controller node in the AZ in the cascaded FusionSphere OpenStack system. For details, see Using SSH to Log In to a Host.
  3. Import environment variables. For details, see Importing Environment Variables.
  4. Run the following command to query VM volume attachment information:

    nova show vm-uuid
    NOTE:

    vm-uuid: specifies the instance_id value of ATTACH_TO in location in the audit report.

    In the command output, os-extended-volumes:volumes_attached is VM volume attachment information.

    If ERROR (CommandError): No server with a name or ID of 'vm-uuid' exists. is displayed, the VM does not exist.

    Check whether the VM exists.

    • If yes, go to 5.
    • If no, go to 7.

  5. Check whether VM volume attachment information obtained in 4 contains the volume UUID of the audit report.

    NOTE:

    The volume UUID is the volume_id value in the audit report.

    • If yes, the audit result is incorrect due to a time difference between data sources. No further action is required.
    • If no, go to 6.

  6. In the audit report, check whether VM information is available in NOVA_USE of location.

    • If yes, contact technical support for assistance.
    • If no, go to 7.

  7. See Rebuilding a VM to query the volume details in FusionCompute by UUID, and check in the disk list of the storage pool whether the volume is attached to a VM.

    NOTE:

    The volume UUID is the volume_id value in the audit report.

    The volume attachment status (attached or unattached) is displayed in the attached VM column of the volume list.

    • If yes, contact technical support for assistance.
    • If no, go to 8.

  8. Perform the following operations:

    • Run the following command to change the volume attribute to the available status:
      cinder reset-state --state available Volume uuid
      NOTE:

      The volume UUID is the volume_id value in the audit report.

    • Run the following command on the controller node to clear the attachment status of the volume:
      python /usr/bin/info-collect-script/audit_resume/clear_attachment_info.py Volume uuid
      NOTE:

      The volume UUID is the volume_id value in the audit report.

      Perform 4 to check whether the VM exists.
      • If yes, go to 9.
      • If no, no further action is required.

  9. Run the following command to attach the volume to the VM:

    nova volume-attach vm-uuid Volume uuid auto
    NOTE:

    The volume UUID is the volume_id value in the audit report.

    vm-uuid: specifies the instance_id value of ATTACH_TO in location in the audit report.

Method 2

  1. Log in to any controller node in the AZ in the cascaded FusionSphere OpenStack system. For details, see Using SSH to Log In to a Host.
  2. Import environment variables. For details, see Importing Environment Variables.
  3. Run the following command to query VM volume attachment information:

    nova show vm-uuid
    NOTE:

    vm-uuid: specifies the instance_id value of NOVA_USE in location in the audit report.

    In the command output, os-extended-volumes:volumes_attached is VM volume attachment information.

  4. Check whether VM volume attachment information obtained in 3 contains the volume UUID of the audit report.

    NOTE:

    The volume UUID is the volume_id value in the audit report.

    • If yes, go to the next step.
    • If no, the audit result is incorrect due to a time difference between data sources. No further action is required.

  5. Run the following command to query the VM status:

    nova show VM uuid | grep OS-SRV-USG:launched_at
    NOTE:

    VM uuid: specifies the instance_id value of NOVA_USE in location in the audit report.

    If OS-SRV-USG:launched_at is empty, the VM is faulty.

    Check whether time information is displayed in the command output.

    • If yes, go to 6.
    • If no, the VM is invalid. Check whether the VM is cleared. For details, see Invalid VMs.

  6. Log in to the host housing the active GaussDB node of the Nova database according to Logging In to the Active GaussDB Node. Then run the following script to clear residual attachment information of the volume from the VM:

    sh /usr/bin/info-collect-script/audit_resume/delete_bdm.sh VM uuid Volume uuid
    NOTE:

    VM uuid: specifies the instance_id value of NOVA_USE in location in the audit report.

    The volume UUID is the volume_id value in the audit report.
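Step 5 of Method 2 hinges on whether OS-SRV-USG:launched_at holds a timestamp. The check can be sketched as follows; the two canned lines imitate nova show output and are not real command output:

```shell
# Canned `nova show | grep OS-SRV-USG:launched_at` lines, for illustration.
started='| OS-SRV-USG:launched_at | 2019-06-01T08:00:00.000000 |'
never_started='| OS-SRV-USG:launched_at |  |'

check_launched() {
    # Extract the value column; a non-empty timestamp means the VM started properly.
    val=$(echo "$1" | awk -F '|' '{gsub(/ /, "", $3); print $3}')
    if [ -n "$val" ]; then echo "started"; else echo "invalid"; fi
}

check_launched "$started"        # prints: started
check_launched "$never_started"  # prints: invalid
```

An empty field means the VM is invalid and should be handled as described in Invalid VMs.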

Method 3

  1. In the audit report, check whether VM information is available in ATTACH_TO of location.

    • If yes, go to 2.
    • If no, go to 5.

  2. See Rebuilding a VM to use UUID to query volume details in FusionCompute and check whether volume information is attached to the VM.

    NOTE:

    The volume UUID is the volume_id value in the audit report.

    • If yes, contact technical support for assistance.
    • If no, go to 3.

  3. See Querying Information About a VM in FusionCompute Using UUID to query the VM details in FusionCompute by UUID and locate the corresponding VM.

    NOTE:

    The UUID is the instance_id value of ATTACH_TO in location in the audit report.

  4. Attach the volume to the VM on the FusionCompute web client.

    NOTE:

    Enter the volume_id value in the audit report for Name.

  5. See Rebuilding a VM to use UUID to query volume details in FusionCompute and check whether volume information is attached to the VM.

    NOTE:

    The volume UUID is the volume_id value in the audit report.

    • If yes, go to 6.
    • If no, contact technical support for assistance.

  6. In the VM list, click Name to enter the VM details page.
  7. Check whether the VM status is Stopped.

    • If yes, record the VM UUID and perform 10.
    • If no, record the VM UUID and perform 8.
    NOTE:
    • You can obtain the VM status from the status attribute on the VM details page.
    • You can obtain the VM UUID from the UUID attribute on the VM details page.

  8. Confirm with the tenant whether the VM can be paused.

    • If yes, go to 9.
    • If no, no further action is required.

  9. Stop the VM.
  10. On the VM details page, click Hardware, locate the target volume, click More, and select Detach to detach the disk from the VM.
  11. Log in to any controller node in the AZ in the cascaded FusionSphere OpenStack system. For details, see Using SSH to Log In to a Host.
  12. Import environment variables. For details, see Importing Environment Variables.
  13. Run the following command to attach the volume to the VM:

    nova volume-attach vm-uuid Volume uuid auto
    NOTE:

    The volume UUID is the volume_id value in the audit report.

    vm-uuid: specifies the VM UUID recorded in 7.

Nova novncproxy Zombie Process

Context

The Nova novncproxy service may generate zombie processes due to defects in the websockify module or the Python version in use; however, the probability of this issue is very low. To improve system stability, the system audits and automatically clears these zombie processes.

Parameter Description

The audit configuration item is max_zombie_process_num, which is stored in the /etc/info-collect.conf file on the novncproxy-deployed node. The configuration item specifies the threshold for automatically clearing zombie processes. The default value is 10. The value is explained as follows:

  • The system automatically clears the zombie processes on a compute node only when the number of zombie processes on the node exceeds the threshold.
  • If the threshold is set to -1, the system does not clear zombie processes.
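For reference, zombie processes are the ones reported in the Z state by ps. A minimal sketch of the threshold check described above (the cleanup itself is performed by the system, not by this snippet):

```shell
MAX_ZOMBIE=10   # mirrors max_zombie_process_num; -1 disables clearing

# Count processes currently in the zombie (Z) state on this node.
zombies=$(ps -eo stat= | awk '$1 ~ /^Z/' | wc -l)

if [ "$MAX_ZOMBIE" -ge 0 ] && [ "$zombies" -gt "$MAX_ZOMBIE" ]; then
    echo "threshold exceeded ($zombies > $MAX_ZOMBIE): automatic cleanup would trigger"
else
    echo "zombie count $zombies within threshold; no cleanup"
fi
```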

The name of the audit report is zombie_process_hosts.csv. Table 18-44 describes parameters in the report.

Table 18-44 Parameter description

Parameter

Description

host

Specifies the compute node name.

zombieprocess

Specifies the number of zombie processes detected on the node.

is restart

Specifies whether any automatic zombie process deletion is conducted. The default value is True.

Impact on the System

  • Excessive zombie processes may deteriorate the system performance.
  • After a zombie process is deleted, the nova-novncproxy service restarts, which interrupts in-use novnc services.

Possible Causes

  • The websockify module used by the nova-novncproxy service is defective.
  • Python 2.6 is defective.

Procedure

No operation is required. The system automatically clears excessive zombie processes based on the specified threshold.

NOTE:

Before the system can automatically clear a zombie process, the process is first reparented to process 1. Therefore, the clearing does not take effect immediately.

Residual Cold Migration Data

Context

FusionSphere OpenStack stores VM cold migration information in the database and will automatically delete it after the migration confirmation or rollback. However, if an exception occurs, residual information is not deleted from the database.

Parameters

The name of the audit report is cold_cleaned.csv. Table 18-45 describes parameters in the report.

Table 18-45 Parameters in the audit report

Parameter

Description

instance_uuid

Specifies the UUID of the VM that is cold migrated.

Impact on the System

  • This issue incurs a higher quota usage than the actual usage.
  • This issue adversely affects the execution and resource usage of subsequent VM cold migrations.

Possible Causes

  • The nova-compute service is restarted during the migration.
  • The VM status is reset after the migration.

Procedure

No operations are required.

Intermediate Status of the Cold Migration

Context

FusionSphere OpenStack stores VM cold migration information in the database. If the source node is restarted during the migration confirmation, the cold migration may be stuck in the intermediate status.

Parameters

The name of the audit report is cold_stuck.csv. Table 18-46 describes parameters in the report.

Table 18-46 Parameters in the audit report

Parameter

Description

instance_uuid

Specifies the UUID of the VM that is cold migrated.

migration_id

Specifies the ID of the cold migration record.

migration_updated

Specifies the time when the migration is confirmed.

instance_updated

Specifies the time when the VM information is updated.

Impact on the System

Maintenance operations cannot be performed on the VM.

Possible Causes

  • The fc-nova-compute service on the source node is restarted during the cold migration.
  • Network exceptions cause packet loss.

Procedure

  1. Log in to any controller host in an AZ in the cascaded FusionSphere OpenStack system.

    For details, see section Using SSH to Log In to a Host.

  2. Import environment variables. For details, see Importing Environment Variables.
  3. Run the following command to log in to the active GaussDB node based on section Logging In to the Active GaussDB Node to clear the intermediate status of the VM:

    python /usr/bin/info-collect-script/audit_resume/clean_stuck_migration.py instance_uuid migration_id

    NOTE:
    • The value of instance_uuid can be obtained from the audit report.
    • The value of migration_id can be obtained from the audit report.

  4. Check whether the VM is running properly.

    • If yes, no further action is required.
    • If no, contact technical support for assistance.
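The cleanup command in step 3 can be generated for every record in cold_stuck.csv in one pass. The CSV header and sample row below are illustrative; the script path is the one given in step 3. The snippet only prints the commands for review rather than executing them:

```shell
cat > /tmp/cold_stuck_sample.csv <<'EOF'
instance_uuid,migration_id,migration_updated,instance_updated
aaaa-1111,42,2019-06-01T08:00:00,2019-06-01T08:05:00
EOF

# Print (do not run) the cleanup command for each stuck migration record.
awk -F ',' 'NR > 1 {
    print "python /usr/bin/info-collect-script/audit_resume/clean_stuck_migration.py " $1 " " $2
}' /tmp/cold_stuck_sample.csv
```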

Abnormal Hosts That Adversely Affect Cold Migrated VMs

Context

If the source host becomes faulty during a VM cold migration, the cold migration will be adversely affected. Perform an audit to detect the cold migrated VMs that are adversely affected by faulty hosts in the system.

Parameters

The name of the audit report is host_invalid_migration.csv. Table 18-47 describes parameters in the report.

Table 18-47 Parameters in the audit report

Parameter

Description

id

Specifies the ID of the cold migration record.

instance_uuid

Specifies the UUID of the VM that is cold migrated.

source_compute

Specifies the source host in the cold migration.

source_host_state

Specifies the status of the source host.

Impact on the System

Maintenance operations cannot be performed on the VM.

Possible Causes

  • The source host is powered off.
  • The fc-nova-compute role on the source host is deleted.
  • The fc-nova-compute service on the source host runs improperly.

Before handling the audit result, ensure that no service exception alarm has been generated in the system. If any host becomes faulty, replace the host by performing operations provided in section Replacing Hosts and Accessories from HUAWEI CLOUD Stack 6.5.0 Parts Replacement. To delete a host, perform operations provided in section Deleting a Host from an AZ from HUAWEI CLOUD Stack 6.5.0 O&M Guide.

Procedure

  1. Log in to any controller host in an AZ in the cascaded FusionSphere OpenStack system.

    For details, see section Using SSH to Log In to a Host.

  2. Import environment variables. For details, see Importing Environment Variables.
  3. Perform the following operations on the source host:

    1. Run the following command to check whether the source host exists:

      cps host-list

      Check whether the command output contains a host whose ID is the same as the source_compute value in the audit report.

      • If yes, perform 4.
      • If no, go to the next step.
    2. Locate the host whose status value is fault in the command output of the previous operation and check whether its services can be restored.
      • If yes, restore the services and perform the audit again.
      • If no, go to 4.
    3. Locate the host whose ID is the same as the source_compute value in the audit report and check whether the host has the fc-nova-compute role assigned.
      • If no, go to 4.
      • If yes, run the following command to verify that the fc-nova-compute service is running properly:

        cps template-instance-list --service nova fc-nova-computeXXX

        If the fc-nova-compute service is running properly, that is, the value of status is active or standby, but no operation can be performed on the VM, contact technical support for assistance.

  4. Run the following command to clear the residual cold migration record and reset the VM status:

    python /usr/bin/info-collect-script/audit_resume/clean_stuck_migration.py instance_uuid id

    NOTE:
    • The value of instance_uuid can be obtained from the audit report.
    • The value of id can be obtained from the audit report.

Detecting and Deleting Residual Live Migration Data

Context

FusionSphere OpenStack stores VM live migration information in the database and will automatically delete it after the migration or rollback is complete. However, if an exception occurs, residual information is not deleted from the database. The cascaded FusionCompute system does not support live migration. Therefore, this audit report is ignored in this scenario.

Parameter Description

The name of the audit report is live_cleaned.csv. Table 18-48 describes parameters in the report.

Table 18-48 Parameter description

Parameter

Description

instance_uuid

Specifies the UUID of the VM that is live migrated.

Impact on the System

This issue adversely affects resource usages of subsequent VM live migrations.

Possible Causes

The nova-compute service is restarted during the migration.

Procedure

No operation is required.

Stuck Images

Context

An image in the active status is available for use. If an image is stuck in the queued, saving, or deleted status, the image is unavailable. If an image is kept stuck in a transition status for a long time (24 hours by default), delete the image.

Parameter Description

The name of the audit report is stucking_images.csv. Table 18-49 describes parameters in the report.

Table 18-49 Parameter description

Parameter

Description

id

Specifies the image ID.

status

Specifies the image status.

updated_at

Specifies the last time when the image was updated.

owner

Specifies the ID of the tenant who created the image.

Impact on the System

  • An image in the queued status does not occupy system resources, but the image is unavailable.
  • An image in the saving status has residual image files that occupy the storage space.
  • An image in the deleted status has residual image files that occupy the storage space.

Possible Causes

  • The image creation process is not complete: The image was not uploaded to the image server within 24 hours after it was created. In this case, the image is kept in the queued status.
  • During the image creation process, an exception (for example, intermittent network disconnection) occurred when the image was being uploaded. In this case, the image is kept in the queued status.
  • When an image was being uploaded, the Glance service failed. In this case, the image is kept in the saving status.
  • During the image deletion process, an exception occurred on the DB service. In this case, the image is kept in the deleted status.
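Whether an image has been stuck longer than the default 24-hour window can be checked from the updated_at value in the audit report. A sketch using GNU date (the timestamp is a made-up sample):

```shell
STUCK_AFTER_HOURS=24

# updated_at value from the audit report (hypothetical sample).
updated_at="2019-06-01T08:00:00"

updated_epoch=$(date -d "$updated_at" +%s)
now_epoch=$(date +%s)
age_hours=$(( (now_epoch - updated_epoch) / 3600 ))

if [ "$age_hours" -ge "$STUCK_AFTER_HOURS" ]; then
    echo "image stuck in transition for ${age_hours}h: candidate for deletion"
else
    echo "image in transition for ${age_hours}h: keep waiting"
fi
```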

Procedure

To handle the fault, perform the operations in section Stuck Images at the Cascading OpenStack. No operation is required at the Cascaded OpenStack (the cascaded layer).

Residual Orphan Child Snapshots

Context

An orphan child snapshot is a volume snapshot whose associated ECS snapshot object no longer exists.

Parameter Description

The name of the audit report is wildInstanceSnapshotAudit.csv. Table 18-50 describes parameters in the report.

Table 18-50 Parameters in the audit report

Parameter

Description

snapshot_id

Specifies the ID of the orphan child snapshot.

instance_id

Specifies the ID of the VM corresponding to the orphan child snapshot.

Impacts on the System

Orphan child snapshots occupy tenant quotas and storage capacity, but they cannot be used, wasting resources.

Possible Causes

  • The system is powered off or other faults occur during the ECS snapshot execution or deletion.
  • Users manually delete the ECS snapshot object.

Procedure

  1. Log in to the first host in an AZ. For details, see Using SSH to Log In to a Host.
  2. Import environment variables. For details, see Importing Environment Variables.
  3. Run the following command to obtain the name of the ECS snapshot object:

    cinder snapshot-show snapshot_id | grep sys_snapshot_ecs | awk -F '_' '{print $4}'
    NOTE:

    The value of snapshot_id is that of snapshot_id obtained from the audit report.

    Check whether the name of the ECS snapshot object corresponding to the volume snapshot is obtained.

    • If yes, go to 4.
    • If no, contact technical support for assistance.

  4. Run the following command to check the ID of the ECS snapshot object corresponding to the volume snapshot from Glance:

    glance image-list | grep image_name | awk -F '|' '{print $2}'

    NOTE:

    image_name: indicates the ECS snapshot object name obtained in 3.

    Check whether the ID of the ECS snapshot object corresponding to the volume snapshot is obtained.
    • If yes, go to 5.
    • If no, go to 6.

  5. Run the following command to check whether any command output is displayed:

    glance image-show image_id | grep snapshot_id

    NOTE:

    image_id: indicates the ECS snapshot object ID obtained in 4.

    The value of snapshot_id is that of snapshot_id obtained from the audit report.

    • If yes, the snapshot is not an orphan child snapshot. In this case, contact technical support for assistance.
    • If no, the snapshot is an orphan child snapshot. In this case, go to 6.

  6. Run the following command to delete the volume snapshot:

    cinder snapshot-delete snapshot_id

    Check whether the command is successfully executed.

    • If yes, no further action is required.
    • If no, contact technical support for assistance.
    NOTE:

    If "ERROR" is displayed, contact technical support for assistance.
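The awk filter in step 3 extracts the ECS snapshot object name from a value of the form sys_snapshot_ecs_&lt;name&gt;. A standalone illustration with an invented name:

```shell
# Canned name value as it might appear in `cinder snapshot-show` output;
# the object name after the sys_snapshot_ecs_ prefix is made up.
line='sys_snapshot_ecs_snap0001'

# Splitting on '_' puts the object name in the fourth field.
echo "$line" | awk -F '_' '{print $4}'   # prints: snap0001
```

Note that in real command output the name is embedded in a table row, so the grep in step 3 isolates the matching line first.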

Frontend QoS

Context

Frontend QoS limits the I/O rate of volumes on the Nova (frontend) side, including volume bandwidth and IOPS. If the QoS audit result is confirmed, manually rectify the fault based on the actual situation.

Parameter Description

The name of the audit report is FrontEndQosAudit.csv. Table 18-51 describes parameters in the report.

Table 18-51 Parameters in the audit report

Parameter

Description

VOLUME_ID

Specifies the volume ID.

VOLUME_TOTAL_IOPS

Specifies the volume IOPS.

VOLUME_TOTAL_BYTES

Specifies the volume bandwidth.

INSTANCE_ID

Specifies the VM ID.

HYPERVISOR_TOTAL_IOPS

Specifies the VM IOPS.

HYPERVISOR_TOTAL_BYTES

Specifies the VM bandwidth.

RESULT_TYPE

Specifies the QoS audit result. The values can be FAKE_QOS, DIFF_QOS, or WILD_QOS.

Impacts on the System

The configured bandwidth and IOPS of the volume are inconsistent with the bandwidth and IOPS actually in effect.

Possible Causes

  • During QoS association, an exception occurs in the system. As a result, QoS is not associated with the volume.
  • During QoS modification, an exception occurs in the system. As a result, QoS is not updated to the volume.
  • During QoS disassociation, an exception occurs in the system. As a result, QoS is not disassociated from the volume.

Procedure

Determine the volume QoS processing method based on Table 18-52. For other scenarios, contact technical support for assistance.

Table 18-52 QoS audit result processing methods

QoS Audit Result: FAKE_QOS
Description: Specifies fake QoS.
Processing Method: For details, see Method 1.

QoS Audit Result: DIFF_QOS
Description: Specifies a QoS that differs from the actual one.
Processing Method: For details, see Method 1.

Method 1

  1. Log in to the first controller node in the AZ. For details, see Using SSH to Log In to a Host.
  2. Import environment variables to the node. For details, see Importing Environment Variables.
  3. Run the following command to query volume information on the node:

    cinder show Volume ID

    In the command output, volume_type indicates the volume type name.

  4. Run the following command to obtain volume type information:

    cinder type-show Volume type name

    In the command output, id indicates the volume type ID.

  5. Run the following command to synchronize the QoS associated with volume_type to the volume and check the command output:

    cinder vendor-qos-specs-sync --vol_type_id Volume type ID --consumer front-end

    Check whether the command output contains OK.

    • If yes, synchronization is successful. No further action is required.
    • If no, contact technical support for assistance.
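Steps 3 and 4 chain together: the volume_type field from cinder show feeds cinder type-show. A sketch of the extraction using a canned output line (the type name highIO is invented):

```shell
# Canned excerpt of `cinder show <Volume ID>` output (values are made up).
cinder_show_output='| volume_type | highIO |'

# Pull the value column and strip padding spaces.
vol_type=$(echo "$cinder_show_output" | awk -F '|' '{gsub(/ /, "", $3); print $3}')
echo "next: cinder type-show $vol_type"   # prints: next: cinder type-show highIO
```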

Volume QoS

Context

Volume QoS limits the I/O rate of volumes on the backend storage side, including volume bandwidth and IOPS. If the QoS audit result is confirmed, manually rectify the fault based on the actual situation.

Parameter Description

The name of the audit report is VolumeQosAudit.csv. Table 18-53 describes parameters in the report.

Table 18-53 Parameters in the audit report

Parameter

Description

VOLUME_ID

Specifies the volume ID.

VOLUME_QOS_MAXIOPS

Specifies the volume IOPS.

VOLUME_QOS_MAXBANDWIDTH

Specifies the volume bandwidth.

STORAGE_QOS_NAME

Specifies the QoS name in the storage device.

STORAGE_QOS_MAXIOPS

Specifies the volume IOPS in the storage device.

STORAGE_QOS_MAXBANDWIDTH

Specifies the volume bandwidth in the storage device.

DEVICE_SN

Specifies the storage device SN.

RESULT_TYPE

Specifies the QoS audit result. The values can be FAKE_QOS, DIFF_QOS, WILD_QOS, ERROR_QOS, or NOT_ENABLE_QOS.

Impacts on the System

The configured bandwidth and IOPS of the volume are inconsistent with the bandwidth and IOPS actually in effect.

Possible Causes

  • During QoS association, an exception occurs in the system. As a result, QoS is not associated with the volume.
  • During QoS modification, an exception occurs in the system. As a result, QoS is not updated to the volume.
  • During QoS disassociation, an exception occurs in the system. As a result, QoS is not disassociated from the volume.

Procedure

Determine the volume QoS processing method based on Table 18-54. For other scenarios, contact technical support for assistance.

Table 18-54 QoS audit result processing methods

QoS Audit Result: FAKE_QOS
Description: Specifies fake QoS.
Processing Method: For details, see Method 1.

QoS Audit Result: DIFF_QOS
Description: Specifies a QoS that differs from the actual one.
Processing Method: For details, see Method 1.

Method 1

  1. Log in to the first controller node in the AZ. For details, see Using SSH to Log In to a Host.
  2. Import environment variables to the node. For details, see Importing Environment Variables.
  3. Run the following command to query volume information on the node:

    cinder show Volume ID

    In the command output, volume_type indicates the volume type name.

  4. Run the following command to obtain volume type information:

    cinder type-show Volume type name

    In the command output, id indicates the volume type ID.

  5. Run the following command to synchronize the QoS associated with volume_type to the volume and check the command output:

    cinder vendor-qos-specs-sync --vol_type_id Volume type ID --consumer back-end

    Check whether the command output contains OK.

    • If yes, synchronization is successful. No further action is required.
    • If no, contact technical support for assistance.

Common Operations

  • In this chapter, import environment variables before running nova (in nova xxx format), cinder (in cinder xxx format), neutron (in neutron xxx format), glance (in glance xxx format), cps (in cps xxx format), or cpssafe commands. For details, see section Importing Environment Variables.
  • You can run commands in OpenStack in either secure mode or insecure mode. For details, see section Command Execution Methods.

Setting the VM Status

NOTE:

This section describes how to use a script to restore a VM to a specified status. The target status is passed in from the procedure that references this section.

  1. Log in to any controller host in an AZ in the cascaded FusionSphere OpenStack system.

    For details, see section Using SSH to Log In to a Host.

  2. Run the following command to log in to the active GaussDB or gaussdb_nova node based on section Logging In to the Active GaussDB Node to clear the VM status in the database:

    sh /usr/bin/info-collect-script/audit_resume/setvmstate.sh uuid status

    NOTE:
    • The password of the gaussdba account is required during the command execution process. The default password of user gaussdba is FusionSphere123.
    • The value of uuid is that in the audit report.
    • The value of status is the required new VM status, which is passed in from the procedure that references this section.

    Check whether the command is successfully executed based on the command output.

    • If yes, no further action is required.
    • If no, contact technical support for assistance.

Logging In to the Active GaussDB Node

  1. Log in to any controller host in an AZ in the cascaded FusionSphere OpenStack system. For details, see section Using SSH to Log In to a Host.
  2. Run the following command to query the ID of the host accommodating the active GaussDB node:

    • If the GaussDB database is not deployed on an independent host, run the following command to obtain the ID of the host accommodating the active GaussDB node:

      cps template-instance-list --service gaussdb gaussdb | grep 'active' | awk -F '|' '{print $5}'

    • If the database used by Nova is deployed on an independent host, run the following command to obtain the ID of the host accommodating the active GaussDB node:

      cps template-instance-list --service gaussdb gaussdb_nova | grep 'active' | awk -F '|' '{print $5}'

    • If the database used by Cinder is deployed on an independent host, run the following command to obtain the ID of the host accommodating the active GaussDB node:

      cps template-instance-list --service gaussdb gaussdb_cinder | grep 'active' | awk -F '|' '{print $5}'

    NOTE:
    • If the database is deployed on an independent host, the name of the GaussDB database used by Nova is gaussdb_nova. The VM-related data is stored in the gaussdb_nova database.
    • If the database is deployed on an independent host, the name of the GaussDB database used by Cinder is gaussdb_cinder. The volume-related data is stored in the gaussdb_cinder database.

    The host ID of the active GaussDB node (gaussdb_host_id) can be obtained from the command output.

  3. Perform the following operation to query the OM IP address of the host accommodating the active GaussDB node:

    Run the cpssafe command to enter secure mode, enter the user password, and run the following command as prompted:

    cps host-list | grep gaussdb_host_id | awk -F '|' '{print $7}'

    The OM IP address can be obtained from the command output.

  4. Log in to the host accommodating the active GaussDB node using the OM IP address. For details, see section Using SSH to Log In to a Host.
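Steps 2 and 3 can be sketched end to end with a canned cps host-list row; the host ID and IP addresses below are invented, and the awk column index matches the command in step 3:

```shell
gaussdb_host_id="4C4C4544-0042"

# Canned `cps host-list` row for illustration; IDs and IPs are made up.
cps_host_list='| 4C4C4544-0042 | host-0042 | normal | x86 | 192.168.0.42 | 10.0.0.42 |'

# Column 7 holds the OM IP in this sample layout, as in the step 3 command.
om_ip=$(echo "$cps_host_list" | grep "$gaussdb_host_id" | awk -F '|' '{gsub(/ /, "", $7); print $7}')
echo "OM IP: $om_ip"   # prints: OM IP: 10.0.0.42
```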

Rebuilding a VM

  1. Log in to any controller host in an AZ in the cascaded FusionSphere OpenStack system.

    For details, see section Using SSH to Log In to a Host.

  2. Run the following command to query the VM information:

    nova show uuid

    NOTE:
    • The value of uuid is that in the audit report.
    • The host name is the value of OS-EXT-SRV-ATTR:host in the command output.
    • In the command output, the value of id in os-extended-volumes:volumes_attached is the ID of the volume attached to the VM.

  3. Log in to any host with the info-collect-server role assigned using the OM IP address.

    For details, see section Logging In to a Host with a Role Deployed.

    NOTE:

    Run the following command to obtain the ID of the host where the info-collect-server role is deployed:

    cps template-instance-list --service collect info-collect-server

  4. Run the following command to check whether the image is available:

    python /usr/local/bin/info-collect-server/server/audit_script.py cascaded_image_confirm fc-nova-computeXXX uuid

    NOTE:
    • If "The image of VM is active. The image ID is 47bd4481-23ef-4e52-8eca-1d5b70126c82." is displayed, the image is available.
    • If "ERROR: The server may boot from volume, no image supplied." is displayed, the VM is started from volumes.
    • The value of fc-nova-computeXXX is the host name obtained in 2.
    • The value of uuid is the VM UUID obtained in 2.
    • In the preceding message, 47bd4481-23ef-4e52-8eca-1d5b70126c82 is an example image ID.
    • If yes, go to 7.
    • If no, go to 5 if the VM is started from volumes. Otherwise, contact technical support for assistance.

  5. Run the following command to query the image ID of the VM system volume:

    cinder show uuid

    NOTE:
    • The value of uuid is the volume ID in the volume attachment information obtained in 2.
    • The drive letter of the attached volume is the value of device in attachments in the command output.
    • The volume image ID is the value of image_id in volume_image_metadata in the command output.
    • If multiple volume IDs are obtained in 2, repeatedly execute the command and obtain the image ID of the attached volume in /dev/vda, /dev/xvda, or /dev/sda.

  6. Run the following command to check whether the image status is ACTIVE:

    nova image-show image_id

    NOTE:
    • The value of image_id is the image ID.
    • The image status is the value of status in the command output.
    • If yes, go to 7.
    • If no, contact technical support for assistance.

  7. Run the following command to rebuild the VM using the image ID:

    nova rebuild uuid image_id

    NOTE:
    • The value of uuid is the VM UUID obtained in 2.
    • The value of image_id is the image ID.

  8. Check whether the VM is successfully rebuilt.

    • If yes, no further action is required.
    • If no, contact technical support for assistance.

Querying Information About a VM in FusionCompute Using UUID

  1. Log in to any controller host in an AZ in the cascaded FusionSphere OpenStack system.

    For details, see section Using SSH to Log In to a Host.

  2. Run the following command to obtain the host accommodating the VM and information about the cluster in which the VM is located:

    nova show uuid

    NOTE:
    • The name of the host accommodating the VM is the OS-EXT-SRV-ATTR:host value in the command output.
    • The information about the cluster in which the VM is located is the OS-EXT-SRV-ATTR:hypervisor_hostname value in the command output.

  3. Run the following command to query the IP address for accessing the FusionCompute web client:

    cps template-params-show --service nova fc-nova-computeXXX

    NOTE:
    • The value of fc-nova-computeXXX is the OS-EXT-SRV-ATTR:host value in the command output from 2.
    • The IP address for accessing the FusionCompute web client is the fusioncompute_fc_ip value in the command output.
    • The FusionCompute cluster in which the compute node is located is the fusioncompute_clusters value in the command output.

  4. Open a browser, enter the IP address for accessing FusionCompute in the address box, and log in to the FusionCompute web client.
  5. Click VM and Template.

  6. On the VM and Template tab, select VM in the left pane.

  7. Select UUID from the All types drop-down list, enter the VM UUID in the search box, and check whether the VM information can be obtained.

    • If yes, go to 8.
    • If no, the VM is not in the FusionCompute system. No further action is required.

  8. Click the name of the VM.

Querying Whether a VM Was Properly Started

  1. Log in to any controller host in an AZ in the cascaded FusionSphere OpenStack system.

    For details, see section Using SSH to Log In to a Host.

  2. Run the following command to query the VM information using the VM UUID:

    nova show uuid

    NOTE:
    • The time when the VM was created is the value of created in the command output.
    • The time when the VM was properly started is the value of OS-SRV-USG:launched_at in the command output.

    If the value of OS-SRV-USG:launched_at is left blank, the VM has never been started properly since it was created.
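This check can be scripted against the table output. The rows below are hypothetical samples; the assumption that a never-started VM shows "-" or an empty launched_at field follows common nova client table rendering.

```shell
# Hypothetical "nova show" rows; an empty or "-" launched_at value means
# the VM has never been started properly since creation.
sample="| created                | 2019-01-01T00:00:00Z |
| OS-SRV-USG:launched_at | -                    |"

launched=$(echo "$sample" | awk -F '|' '$2 ~ /launched_at/ {gsub(/ /, "", $3); print $3}')
if [ "$launched" = "-" ] || [ -z "$launched" ]; then
    echo "VM never properly started"
fi
```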

Querying Information About a Volume in FusionCompute

  1. Log in to any controller host in an AZ in the cascaded FusionSphere OpenStack system.

    For details, see section Using SSH to Log In to a Host.

  2. Run the following command to query details about the volume:

    cinder show uuid

    NOTE:
    • The os-vol-host-attr:host value in the command output is in the format of Logical host name@Backend storage configuration name#Data store name.
    • In this format, logic_host_name is the logical host name and datastore_name is the data store name.
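The three components of os-vol-host-attr:host can be split with plain shell parameter expansion. The value below is a hypothetical sample matching the documented Logical host name@Backend storage configuration name#Data store name format.

```shell
# Hypothetical os-vol-host-attr:host value from the "cinder show" output.
host_attr="cinder-vrm001@ipsan2#ipsan1"

logic_host_name="${host_attr%%@*}"      # text before '@'
rest="${host_attr#*@}"
backend_name="${rest%%#*}"              # text between '@' and '#'
datastore_name="${host_attr##*#}"       # text after '#'
echo "$logic_host_name $backend_name $datastore_name"
```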

  3. Run the following command to query the IP address for accessing the FusionCompute web client:

    cps template-params-show --service cinder `echo "logic_host_name" | awk -F '-' '{print $1"-volume-"$2}'`

    NOTE:
    • The value of logic_host_name is the logical host name obtained in 2.
    • The IP address for accessing the FusionCompute web client is the value of fc_ip in other_storage_cfg in the command output.
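The backtick expression in step 3 simply rewrites the logical host name into a cinder template name. A worked example, using the hypothetical sample name cinder-vrm001:

```shell
# cinder-vrm001 -> cinder-volume-vrm001
template_name=$(echo "cinder-vrm001" | awk -F '-' '{print $1"-volume-"$2}')
echo "$template_name"
```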

  4. In the address box of a browser, enter the IP address obtained from 3 to log in to the FusionCompute web client.
  5. On the FusionCompute web client, click Storage Pool.

  6. In the storage pool list in the left pane, select datastore_name obtained in 2 and click the Disks tab.

  7. Select UUID from the drop-down list, enter the volume UUID in the search box, and check whether the volume information can be obtained.

    • If yes, go to 8.
    • If no, the volume is not in the site. No further action is required.

  8. Click the down arrow on the left of the volume name.

    The volume details page is displayed.

Logging In to the blockstorage-driver-vrmXXX-Assigned Host to Which a Volume Is Attached

  1. Log in to any controller host in an AZ in the cascaded FusionSphere OpenStack system.

    For details, see section Using SSH to Log In to a Host.

  2. Run the following command to check whether the volume exists in the Cinder service:

    cinder show Volume uuid

    NOTE:
    • The volume UUID can be obtained from the required handling process of the audit report in each section.
    • If the command output displays "ERROR: No volume with a name or ID of 'XXX(volume UUID)' exists.", the volume does not exist.
    • If the volume exists, go to 3.
    • If it does not exist, contact technical support for assistance.

  3. Obtain the logical host name from the command output in 2.

    NOTE:

    The os-vol-host-attr:host value in the command output is in the format of Logical host name@Backend storage configuration name#Data store name. For example, if the value of os-vol-host-attr:host is cinder-vrm001@ipsan2#ipsan1 in the command output in 2, the logical host name is cinder-vrm001.

  4. Run the following command to query information about the blockstorage-driver-vrmXXX role and take a note of the information:

    echo Logical host name | awk -F '-' '{print "blockstorage-driver-"$2}'

    NOTE:
    • The logical host name is that obtained from 3.
    • The information about the blockstorage-driver-vrmXXX role can be obtained from the command output. For example, if the logical host name is cinder-vrm001, the name of the blockstorage-driver-vrmXXX role is blockstorage-driver-vrm001.
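A worked example of the transformation in step 4, using the sample logical host name cinder-vrm001 from the NOTE:

```shell
# cinder-vrm001 -> blockstorage-driver-vrm001
role_name=$(echo "cinder-vrm001" | awk -F '-' '{print "blockstorage-driver-"$2}')
echo "$role_name"
```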

  5. Run the following command to obtain the OM IP address of the blockstorage-driver-vrmXXX-assigned host:

    cps host-list

    NOTE:

    Locate the host whose roles value contains blockstorage-driver-vrmXXX in the command output, and take a note of its OM IP address.
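Filtering cps host-list for the role can also be scripted. The rows, column order, and IP addresses below are hypothetical samples; verify the real column layout before relying on field positions.

```shell
# Hypothetical "cps host-list" rows (columns assumed: id, status, roles, omip).
sample="| host-1 | normal | compute,blockstorage-driver-vrm001 | 192.168.1.10 |
| host-2 | normal | controller                         | 192.168.1.11 |"

# Print the OM IP of the row whose roles column contains the target role.
om_ip=$(echo "$sample" | awk -F '|' '$4 ~ /blockstorage-driver-vrm001/ {gsub(/ /, "", $5); print $5}')
echo "$om_ip"
```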

  6. Log in to the blockstorage-driver-vrmXXX-assigned host.

    For details, see section Using SSH to Log In to a Host.

Resetting the Volume Status

  1. Log in to any controller host in an AZ in the cascaded FusionSphere OpenStack system.

    For details, see section Using SSH to Log In to a Host.

  2. Run the following command to set the volume status to available:

    cinder reset-state --state available Volume uuid

    NOTE:

    The volume UUID indicates the UUID of the volume for which the status is to be reset.

  3. Run the following command to query the volume status:

    cinder show Volume uuid

    NOTE:

    The volume UUID indicates the UUID of the volume for which the status is to be reset.

    In the command output, check whether the value of status is available.

    • If yes, go to 4.
    • If no, contact technical support for assistance.

  4. Query the details of the volume in FusionCompute using UUID based on section Querying Information About a Volume in FusionCompute and check whether the volume exists.

    • If yes, go to 7.
    • If no, go to 5.

  5. Confirm with the tenant to determine whether to delete the volume.

    • If yes, go to 6.
    • If no, contact technical support for assistance.

  6. Log in to any controller host in an AZ in the cascaded FusionSphere OpenStack system based on section Using SSH to Log In to a Host and run the following command to delete the volume:

    cinder force-delete Volume uuid

    NOTE:

    The volume UUID indicates the UUID of the volume for which the status is to be reset.

  7. Run the following command on any controller host in the AZ to check whether the value of attachments is left blank in the command output:

    cinder show Volume uuid

    NOTE:

    The volume UUID indicates the UUID of the volume for which the status is to be reset.

    • If yes, go to 8.
    • If no, go to 16.
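The attachments check in step 7 can be read out of the table in the same way. The row below is a hypothetical sample, with an empty list meaning no residual attachment records.

```shell
# Hypothetical "cinder show" attachments row for illustration.
sample="| attachments | [] |"

attachments=$(echo "$sample" | awk -F '|' '$2 ~ /attachments/ {gsub(/ /, "", $3); print $3}')
[ "$attachments" = "[]" ] && echo "attachments is empty"
```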

  8. Query details of the volume in FusionCompute using UUID based on section Querying Information About a Volume in FusionCompute and check whether the volume status is available.

    • If yes, go to 9.
    • If no, contact technical support for assistance.

  9. Check whether the value of Attach VM is Bound.

    • If yes, go to 10.
    • If no, no further action is required.

  10. In the disk list, click Name of a disk to view details of the VM to which the disk is attached.
  11. Check whether the VM status is Stopped.

    • If yes, take a note of the VM UUID and go to 13.
    • If no, take a note of the VM UUID and go to 12.
    NOTE:
    • The VM status can be obtained from the status attribute on the basic information page.
    • The VM UUID can be obtained from the VM UUID attribute on the basic information page.

  12. Confirm with the tenant whether the VM can be stopped.

    • If yes, after the tenant stops the VM, go to 13.
    • If no, no further action is required.

  13. On the VM details page, click Hardware, select Disks, locate the target volume, click More, and select Detach to detach the volume from the VM.
  14. Log in to the active GaussDB node for the Nova database based on section Logging In to a Host with a Role Deployed. Then run the following script to delete the residual volume attachment information from the VM:

    sh /usr/bin/info-collect-script/audit_resume/delete_bdm.sh vm_uuid volume_uuid

    NOTE:
    • The VM UUID can be obtained from 11.
    • The volume UUID indicates the UUID of the volume for which the status is to be reset.

    Check whether the command output displays success.

    • If yes, go to 15.
    • If no, contact technical support for assistance.

  15. Run the following command to attach the volume to the VM:

    nova volume-attach vm_uuid volume_uuid auto

    NOTE:
    • The VM UUID can be obtained from 11.
    • The volume UUID indicates the UUID of the volume for which the status is to be reset.

  16. Run the following command on any controller host in the AZ to set the volume status to in-use:

    cinder reset-state --state in-use Volume uuid

    NOTE:

    The volume UUID indicates the UUID of the volume for which the status is to be reset.

  17. Run the following command to query the volume status:

    cinder show Volume uuid

    NOTE:

    The volume UUID indicates the UUID of the volume for which the status is to be reset.

    In the command output, check whether the value of status is in-use.

    • If yes, go to 18.
    • If no, contact technical support for assistance.

  18. Query the details of the volume in FusionCompute using UUID and check whether Attach VM is Bound. For details, see section Querying Information About a Volume in FusionCompute.

    • If yes, no further action is required.
    • If no, go to 19.

  19. Query details of the VM in FusionCompute using UUID based on section Querying Information About a VM in FusionCompute Using UUID and locate the VM.

    NOTE:

    The UUID is the value of server_id in attachments obtained from 3.

  20. On the VM details page, select Attach Disk from the Operation drop-down list, locate the volume using the volume UUID, and attach the volume to the VM.

Updated: 2019-08-30

Document ID: EDOC1100062365