
HUAWEI CLOUD Stack 6.5.0 Alarm and Event Reference 04

Using KVM for Virtualization (Cascaded OpenStack)

Overview

When the FusionSphere OpenStack cloud platform is used, problems such as residual resources and unavailable resources may occur because of unexpected system failures (such as host reboots or process restarts) or backup-based restoration, resulting in service failures. In this case, a resource pool consistency audit is performed to ensure data consistency in the resource pool and thereby the normal operation of services.

Scenarios

The system audit is required for the OpenStack-based FusionSphere system when data inconsistency occurs in the following scenarios:

  • A system exception occurs while a service-related operation is being performed. For example, a host process restarts while you are creating a VM, causing the operation to fail. In this case, residual data may remain in the system or resources may become unavailable.
  • A service-related operation is performed after the system database is backed up and before the database is restored. After the database is restored using the data backup, residual data may remain in the system or resources may become unavailable.

The system audit helps administrators detect and handle data inconsistency. Conduct a system audit if any of the following conditions is met, then log in to the first host in the OpenStack system to obtain the audit report, and locate and handle the data inconsistency:

  • An alarm is generated indicating that data inconsistency verification fails.
  • The system database is restored using a data backup.
  • The routine system maintenance is performed.
NOTE:
  • Perform a system audit when the system is running stably. Do not use audit results obtained while a large number of service-related operations are in progress.
  • If service-related operations (for example, creating a VM or expanding the system capacity) are performed or a system exception occurs during the audit, the audit result may be distorted. In this case, the system provides instructions for confirming the detected problems.

Audit Mechanism

The system audit consists of the audit itself and post log analysis.

The following illustrates how the system audit works:

  • The system obtains service data from databases, hosts, and storage devices, compares the data, and generates an audit report.
  • The system also provides this audit guide and Command Line Interface (CLI) commands for users to locate and handle the data inconsistency problems listed in the audit report.

You can conduct the system audit using either of the following methods:

  • The system automatically starts auditing at 04:00 every day and reports an alarm and generates an audit report if it detects any data inconsistency. If an alarm has been generated and has not been cleared, the system does not generate the alarm again. If no data inconsistency is detected but an alarm has been generated for data inconsistency, the system automatically clears this alarm. You can log in to the web client and choose Configuration > System > System Audit to change the start time and period for the system audit.
  • Log in to the FusionSphere OpenStack system and run the infocollect audit command to start the audit.

Post log analysis is used after the system database is restored using a data backup. It analyzes historical logs and generates an audit report that sorts records of tenants' operations on resources (such as VMs and volumes) in a specified time period. Based on the report and this audit guide, the administrator can locate and handle the problems listed in the audit report.

Audit Process

If any audit alarm is generated, conduct an audit based on the process shown in Figure 18-3.

Figure 18-3 Audit process

Manual Audit

Scenarios

Conduct the manual audit when:

  • The system database is restored using a data backup.
  • Inconsistency problems are handled. The manual audit is used to verify that the problems are rectified.

Prerequisites

Services in the system are running properly.

Procedure

  1. Log in to the first controller node in the AZ. For details, see Using SSH to Log In to a Host.
  2. Import environment variables. For details, see Importing Environment Variables.

    Enter 1 to enable Keystone V3 authentication with the built-in DC administrator.

  3. Perform the following operations to conduct the manual audit:

    1. Enter the security mode. For details, see Command Execution Methods.
    2. Run the following command to conduct the manual audit:

      infocollect audit --item ITEM --parameter PARAMETER --type TYPE

      Table 18-55 describes parameters in the command.

      If you do not specify the audit item, an audit alarm will be triggered when an audit problem is detected. However, if the audit item is specified, no audit alarm will be triggered when an audit problem is detected.

      Table 18-55 Parameter description

      Parameter: item

      Mandatory or Optional: Optional

      Description: Specifies a specific audit item. If the audit item is not specified, an audit alarm is reported when an audit problem is detected; if the audit item is specified, no audit alarm is reported. Values:

      • 1001: indicates that a VM is audited. The following audit reports are generated after an audit is complete:
        • orphan_vm.csv: Audit report about orphan VMs
        • invalid_vm.csv: Audit report about invalid VMs
        • host_changed_vm.csv: Audit report about VM location inconsistency
        • stucking_vm.csv: Audit report about stuck VMs
        • diff_property_vm.csv: Not available in NFV scenarios
        • diff_state_vm.csv: Not available in NFV scenarios
        • host_invalid_migration.csv: Audit report about abnormal hosts that adversely affect cold migrated VMs
      • 1002: indicates that an image is audited. The following audit report is generated after an audit is complete:

        stucking_images.csv: Audit report about stuck images

      • 1003: indicates that a zombie process is audited. The following audit report is generated after an audit is complete:

        zombie_process_hosts.csv: Audit report about zombie processes

      • 1004: indicates that the residual nova-compute service is audited. No audit report is generated after the audit is complete. This item is required when the role is deleted using CPS.
      • 1005: indicates that the records of migrated databases are audited. The following audit reports are generated after an audit is complete:
        • cold_cleaned.csv: Audit report about residual data after cold migration
        • live_cleaned.csv: Audit report about residual data after live migration
        • cold_stuck.csv: Audit report about stuck databases of the cold migration
      • 1007: indicates that Nova database transactions that have remained uncommitted for more than one hour are audited. The following audit report is generated after an audit is complete:

        nova_idle_transactions.csv: Audit report about Nova database transactions uncommitted for more than one hour

      • 1102: indicates that a redundant Neutron namespace (DHCP and router namespaces) is audited. The following audit report is generated after an audit is complete:
        • redundant_namespaces.csv: Audit report about redundant Neutron namespaces
      • 1103: indicates that an orphan Neutron port is audited. An orphan Neutron port is a port that Neutron records as being used by a VM while the VM does not actually exist. The following audit report is generated after an audit is complete:
        • neutron_wild_ports.csv: Audit report about orphan Neutron ports
      • 1201: indicates that the invalid volume, orphan volume, volume attachment status and stuck volume are audited. The following audit reports are generated after an audit is complete:
        • fakeVolumeAudit.csv: Audit report about invalid volumes
        • wildVolumeAudit.csv: Audit report about orphan volumes
        • VolumeAttachmentAudit.csv: Audit report about the volume attachment status
        • VolumeStatusAudit.csv: Audit report about stuck volumes
      • 1204: indicates that the invalid snapshot, orphan snapshot, and stuck snapshot are audited. The following audit reports are generated after an audit is complete:
        • fakeSnapshotAudit.csv: Audit report about invalid snapshots
        • wildSnapshotAudit.csv: Audit report about orphan snapshots
        • SnapshotStatusAudit.csv: Audit report about stuck snapshots
      • 1301: indicates that a bare metal server (BMS) is audited. The following audit reports are generated after an audit is complete:
        • invalid_ironic_nodes.csv: Audit report about unavailable BMSs
        • invalid_ironic_instances.csv: Audit report about BMS consistency
        • stucking_ironic_instances.csv: Audit report about stuck BMSs

      If the parameter is not specified, all the audit items are performed by default.

      Parameter: parameter

      Mandatory or Optional: Optional. This parameter can be specified only after the audit item is specified.

      Description: Specifies an additional parameter. You can specify only one value, which must match the item.

      • If item is set to 1001, you can set vm_stucking_timeout, the timeout threshold in seconds for VMs in an intermediate state (default: 14400). This value affects the audit report about stuck VMs. You can also set host_invalid_timeout, the heartbeat timeout threshold in seconds for abnormal hosts (default: 14400). This value affects the audit report about abnormal hosts that adversely affect cold migrated VMs.
      • If item is set to 1002, you can set image_stucking_timeout, the timeout period in seconds for transient images (default: 86400). This value affects the audit report about stuck images.
      • If item is set to 1005, you can set migration_stucking_timeout, the timeout period in seconds (default: 14400). This value affects the audit report about the intermediate state of the cold migration.
      • If item is set to other values, no additional parameter is required.

      Example: --parameter vm_stucking_timeout=3600

      Parameter: type

      Mandatory or Optional: Optional

      Description: Specifies whether the audit is synchronous or asynchronous. If this parameter is not specified, the audit is synchronous. Values:

      • sync: specifies a synchronous audit. For details, see the following command.
      • async: specifies an asynchronous audit. For details, see Asynchronous Audit. The audit progress and audit result status of an asynchronous audit can be obtained by invoking the interface for querying the task status.
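
      For example, a minimal asynchronous invocation looks as follows (a sketch based on the syntax above; the audit progress is then queried via the task status interface, as noted for async):

      infocollect audit --item 1001 --type async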

      Run the following command to audit VMs that have been in an intermediate state for 3600 seconds or longer:

      infocollect audit --item 1001 --parameter vm_stucking_timeout=3600

      Information similar to the following is displayed:

      +--------------------------------------+----------------------------------+  
      | Hostname                             | Path                             |  
      +--------------------------------------+----------------------------------+  
      | CCCC8175-8EAC-0000-1000-1DD2000011D0 | /var/log/audit/2015-04-22_020324 |  
      +--------------------------------------+----------------------------------+

      In the command output, Hostname indicates the ID of the host for which the audit report is generated, and Path indicates the directory containing the audit report.

      Log in to the host first and then view the audit reports. For details, see Collecting Audit Reports.
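
      As another hedged example, to audit images with a 12-hour stuck threshold instead of the 86400-second default (the item and parameter names come from Table 18-55; the threshold value itself is illustrative):

      infocollect audit --item 1002 --parameter image_stucking_timeout=43200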

Collecting Audit Reports

Scenarios

Collect audit reports when:

  • Alarms about the volume audit, VM audit, snapshot audit, and image audit are generated.
  • Routine maintenance is performed.

Prerequisites

A local PC running the Windows operating system (OS) is available.

Procedure

  1. Log in to the first host in the AZ. For details, see Using SSH to Log In to a Host.
  2. Import environment variables. For details, see Importing Environment Variables.

    Enter 1 to enable Keystone V3 authentication with the built-in DC administrator.

  3. Perform the following operations to obtain the IDs of the hosts where the active and standby audit services are deployed:

    1. Enter the secure operation mode. For details, see Command Execution Methods.
    2. Run the following command to obtain the IDs of the hosts where the active and standby audit services are deployed:

      cps template-instance-list --service collect info-collect-server

      The following information is displayed:

      +------------+---------------------+---------+--------------------------------------+---------------+ 
      | instanceid | componenttype       | status  | runsonhost                           | omip          | 
      +------------+---------------------+---------+--------------------------------------+---------------+ 
      | 0          | info-collect-server | standby | 192521E0-BAA6-11DF-90F1-ED7A0FD46E28 | 192.168.42.85 | 
      | 1          | info-collect-server | active  | 7AC6B58A-1AAD-11B5-8567-000000821800 | 162.168.42.95 | 
      +------------+---------------------+---------+--------------------------------------+---------------+

      The values of runsonhost indicate the host IDs.

  4. Enter the secure operation mode by performing steps provided in Command Execution Methods and run the following command to obtain the management IP address of the host where the active audit service is deployed:

    cps host-show host_id | grep manageip

    NOTE:
    • Select the host in the active state from the hosts obtained in 3.
    • If the host you have logged in to is the one for which the management IP address is to be obtained, go to 6.
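
    For example, using the active host ID from the sample output in 3 (a sketch; the ID is sample data):

    cps host-show 7AC6B58A-1AAD-11B5-8567-000000821800 | grep manageip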

  5. Run the following command to log in to the host where the active audit service is deployed:

    su fsp

    ssh fsp@Management IP address

    su - root

  6. Run the following command to query the time for the last audit conducted on the host:

    ls /var/log/audit -Ftr | grep /$ | tail -1

    Information similar to the following is displayed:

    2014-09-20_033137/
    NOTE:
    • The directory name indicates the audit time. For example, 2014-09-20_033137 indicates 3:31:37 on September 20th, 2014.
    • If no result is returned, no audit report is available on the host.

  7. Run the following command to create a temporary directory used for storing audit reports:

    mkdir -p /home/fsp/last_audit_result

  8. Run the following command to copy the latest audit report to the temporary directory:

    cp -r /var/log/audit/`ls /var/log/audit -Ftr | grep /$ | tail -1` /home/fsp/last_audit_result

  9. Run the following command to modify the permissions of files in the temporary directory:

    chmod 777 /home/fsp/last_audit_result/ -R
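
    Steps 6 to 9 can also be run as one hedged shell sequence (a sketch that only chains the commands above; the latest_audit variable name is illustrative):

    latest_audit=$(ls /var/log/audit -Ftr | grep /$ | tail -1)
    mkdir -p /home/fsp/last_audit_result
    cp -r /var/log/audit/${latest_audit} /home/fsp/last_audit_result
    chmod 777 /home/fsp/last_audit_result/ -R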

  10. Switch to user fsp and copy the temporary directory to the first host in the AZ.

    Run the following command to switch to user fsp:

    exit

    Run the following command to copy the temporary directory:

    scp -r /home/fsp/last_audit_result fsp@host_ip:/home/fsp

    In the command, the value of host_ip is the management IP address of the first host. If the value is 172.28.0.2, run the following command:

    scp -r /home/fsp/last_audit_result fsp@172.28.0.2:/home/fsp

    During the copy process, the password of user root is required. The default password of user root is Huawei@CLOUD8!.

  11. Run the following command to delete the temporary directory from the host where the latest audit report is saved:

    rm -r /home/fsp/last_audit_result

  12. Log in to the first host in the AZ. For details, see Using SSH to Log In to a Host.
  13. Use WinSCP or other tools to copy the folder /home/fsp/last_audit_result to the local PC.
  14. Run the following command to delete the temporary folder from the first host:

    rm -r /home/fsp/last_audit_result

Analyzing Audit Results

  • If multiple faults are displayed, handle the faults one by one based on 2.
  • If the audit alarm is generated in a cascaded FusionSphere OpenStack system, the alarm must be handled regardless of whether it is also generated in the cascading system. After the alarm is cleared, audit the cascading system (manually or using the automatic routine audit by the cascading system) to ensure data consistency between the cascading and cascaded systems.

Scenarios

Analyze the audit results when:

  • Audit-related alarms, such as volume, VM, snapshot, and image audit alarms, are received. Log in to the system, obtain the audit reports, and rectify the faults accordingly.
  • The backup and restoration feature has been used. Log in to the system, perform a consistency audit, obtain the audit reports, and rectify the faults accordingly.
  • Routine maintenance is performed. Log in to the system, perform an audit, obtain the audit reports, and rectify the faults accordingly.

Prerequisites

  • You have obtained the audit report. For details, see Collecting Audit Reports.
  • You have obtained the operation logs if the audit is conducted after the system database is restored using a data backup.

Procedure

  1. Determine the audit report name from Details in Additional Information of the alarm information and obtain the audit report for further analysis.
  2. Check the audit report name.

  3. (Optional) After the faults are handled, check the operation logs and perform operations provided in Restoring the VM HA Flag Bit and Startup Mode to prevent inconsistency between the VM HA flag and the startup flag.

    NOTE:

    Perform this step based on whether the VM HA flag bit or startup mode was changed after the management data was backed up and before the system database was restored.

Handling Audit Results

Orphan VMs

Context

A VM is orphaned in the following scenarios:

  • Common orphan VM: The VM is present on a host but does not exist in the system database or is in the deleted state in the database.
  • Orphan VM caused by an HA exception: The VM exists in the database, but two copies of the VM are present in the system, one in the paused state and the other in the running state.

Parameter Description

The name of the audit report for an orphan VM is orphan_vm.csv. Table 18-56 describes parameters in the report.

Table 18-56 Parameter description

Parameter

Description

uuid

Specifies the VM universally unique identifier (UUID).

hyper_vm_name

Specifies the VM name registered in the hypervisor.

host_id

Specifies the ID of the host accommodating the VM.

Possible Causes

  • The database is rolled back using a data backup to the state at which the backup was created. However, after the backup was created, one or more VMs were created. After the database is restored, records of these VMs are deleted from the database, but the VMs remain on their hosts and become orphan VMs.
  • Some VMs were manually created on hosts using system commands.
  • During the VM live migration or cold migration process, orphan VMs exist on both the source and destination nodes, which is a normal case. After the live migration is complete, or the cold migration is confirmed and the rollback is complete, orphan VMs are deleted.
  • During the VM HA rescheduling process, a network or system exception occurred, causing the system to fail to clear the VM resources on the source host. Therefore, VM information remains on the source host. After the VM is rebuilt on the destination host, the VM location recorded in the database is changed to the destination host. In this case, the VM exists on both the source and destination hosts, so the copy on the source host is reported in the system audit report.
  • During the VM live migration, cold migration, resize, resize-revert, or resize-confirm process, the network or system is unstable, or storage encounters an exception, resulting in a VM fault. After the faulty VM is deleted, residual data may exist on the source or destination host.

Impact on the System

  • VMs orphaned by database restoration are invisible to tenants.
  • Residual resources remain in the system.

Procedure

  1. Copy the audit report to the first controller host in the AZ. For details, see Collecting Audit Reports.
  2. Log in to the first host in the AZ. For details, see Using SSH to Log In to a Host.
  3. Import environment variables. For details, see Importing Environment Variables.
  4. Run the following command to query detailed information about the orphan VM, including the information about the host accommodating the VM and the value of instance_name:

    1. Run the following command to enter the secure mode:

      runsafe

      Information similar to the following is displayed.

      Input command:
    2. Run the following command to check whether the orphan VM exists in the database:

      python /etc/nova/nova-util/invalid_vm_result.py orphan_vm status audit_path

      audit_path indicates the path to which the audit reports were copied in 1, as shown in the following:

      /home/fsp/last_audit_result/date or /home/fsp/last_audit_result

      where date must be replaced with the date directory in the actual audit result, for example:

      /home/fsp/last_audit_result/2018-03-26_142612-02/
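
      For example, with the audit reports copied in 1 (a sketch; the date directory is sample data):

      python /etc/nova/nova-util/invalid_vm_result.py orphan_vm status /home/fsp/last_audit_result/2018-03-26_142612-02/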

    3. Run the following command, and check the output:

      ls audit_path

      Check whether the output is as follows:

      audit  data_collect.zip
    4. Check whether a VM ID list is displayed in the output of 4.b, as shown in the following example (the second row):
      host:034284EA-41E4-11B5-8567-000000821800 has orphan vms, omip:172.28.0.2
      1df5bc46-3347-49e7-9f9e-9b58c34e1815.instance-00000040;
      • If yes, go to the next step.
      • If no, the fault is falsely reported due to time differences. In this case, no further action is required.

  5. Log in to the host accommodating the orphan VM by performing steps provided in Using SSH to Log In to a Host. The management IP address of the host accommodating the VM is the omip value in the command output of 4.
  6. Run the following command to check whether an orphan VM runs on the host:

    python /etc/nova/nova-util/invalid_vm_result.py orphan_vm result hyper_vm_str

    hyper_vm_str is the VM ID list in the command output of 4. The format is as follows: vm_id1,hyper_vm_id1;vm_id2,hyper_vm_id2...

    Check whether VM IDs are displayed in the command output. The following shows an example; VM IDs are displayed from the second row.

    Those VMs at this host are orphan Vms, Please deal those VMs:
    1df5bc46-3347-49e7-9f9e-9b58c34e1815.instance-00000040
    df5bc46-3347-49e7-9f9e-9b58c34e1811,instance-00000039
    • If yes, go to 7.
    • If no, no orphan VM exists. The fault is falsely reported due to the creation time difference. No further action is required.

  7. Decide whether to delete the orphan VM.

    • If yes, go to 8.
    • If no, contact the user to process the VM. No further action is required.

  8. Run the following command to delete the orphan VM:

    python /etc/nova/nova-util/invalid_vm_result.py orphan_vm clean hyper_vm_str

    hyper_vm_str is the VM ID list in the command output of 4, in "vm_id1,hyper_vm_id1;vm_id2,hyper_vm_id2..." format. Replace hyper_vm_str with the IDs of the VMs to be deleted in the required format.
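
    For example, to delete the orphan VM from the sample output in 4 (a sketch; the IDs are sample data and the surrounding quotes are an assumption to keep the shell from splitting the argument):

    python /etc/nova/nova-util/invalid_vm_result.py orphan_vm clean "1df5bc46-3347-49e7-9f9e-9b58c34e1815,instance-00000040"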

    The orphan VM is successfully handled.

Invalid VMs

Context

An invalid VM is one that exists in the system database and is in a normal state in the database but is not present in the hypervisor.

Parameter Description

The name of the audit report is invalid_vm.csv. Table 18-57 describes parameters in the report.

Table 18-57 Parameter description

Parameter

Description

uuid

Specifies the VM UUID.

tenant_id

Specifies the tenant ID.

hyper_vm_name

Specifies the VM name on the host, for example, instance_xxx.

updated_at

Specifies the last time when the VM status was updated.

status

Specifies the current VM state.

task_status

Specifies the current VM task state.

Impact on the System

Users can query the VM using the Nova APIs, but the VM does not exist on the host.

Possible Causes

  • The database is rolled back using a data backup to the state at which the backup was created. However, after the backup was created, one or more VMs were deleted. After the database is restored, records of these VMs are present in the database, but the VMs have been deleted.
  • The system was not stable (for example, VMs were being live migrated) when the audit was conducted.
  • Some hosts are abnormal, so VMs on these hosts are incorrectly reported as invalid VMs. In this case, conduct the system audit again after the system recovers.

Procedure

  1. Copy the audit report to the first controller host in the AZ. For details, see Collecting Audit Reports.
  2. Log in to the first controller node in the AZ. For details, see Using SSH to Log In to a Host.
  3. Import environment variables. For details, see Importing Environment Variables.
  4. Perform the following operations to obtain details of the invalid VM:

    1. Run the following command to enter the secure mode:

      runsafe

      The following information is displayed:

      Input command:
    2. Run the following command to obtain information about the host accommodating the invalid VM and the instance_name value:

      python /etc/nova/nova-util/invalid_vm_result.py invalid_vm status audit_path

      audit_path is the path storing the audit report copied in 1. The following is an example:

      /home/fsp/last_audit_result

    3. Check whether VM IDs are displayed in the command output. The following shows an example; VM IDs are displayed in the second row:
      host:034284EA-41E4-11B5-8567-000000821800 has invalid vms, omip:172.28.0.2
      3cd2e2ea-830f-442f-b654-a243f03bdc57.instance-00000091;
      • If yes, go to the next step.
      • If no, the fault is falsely reported due to the time difference. In this case, no further action is required.

  5. Log in to the host accommodating the invalid VM by performing steps provided in Using SSH to Log In to a Host. The management IP address of the host accommodating the VM is the omip value in the command output of the previous step.
  6. Import environment variables. For details, see Importing Environment Variables.
  7. Run the following command to check whether an invalid VM runs on the host:

    python /etc/nova/nova-util/invalid_vm_result.py invalid_vm result hyper_vm_str

    hyper_vm_str is the VM ID list in the command output of 4. The format is as follows: vm_id1,hyper_vm_id1;vm_id2,hyper_vm_id2...

    Check whether VM IDs are displayed in the command output. The following shows an example; VM IDs are displayed from the second row.

    Those VMs at this host are invalid Vms, please deal those VMs:
    3cd2e2ea-830f-442f-b654-a243f03bdc57
    • If yes, an invalid VM exists. Go to the next step.
    • If no, no invalid VM exists. The fault is falsely reported due to the creation time difference. No further action is required.

  8. Run the following command to query the VM volume information:

    Run the runsafe command to enter the secure operation mode, enter the user password, and run the following command as prompted:

    nova show uuid | grep volumes_attached

    uuid can be obtained from the command output of the previous step. The following shows an example.

    | os-extended-volumes:volumes_attached | [{"id": "cb39e2f3-9837-4ebe-972f-59caed581021",

    The VM volume IDs are displayed in the id fields of the command output.

    If an exception occurs during the operation, contact technical support for assistance.

  9. Check whether all the volume IDs obtained in 8 are listed in the audit report of the invalid volume.

    • If yes, go to 11.
    • If no, go to 10.

  10. Contact the tenant to determine whether to delete the invalid VM.

    • If yes, contact the user to process the VM. No further action is required.
    • If no, go to 11.

  11. Determine whether to delete the VM or restore the VM.

    • To restore the VM, go to 12.
    • To delete the VM, go to 14.

  12. Perform the following operations to restore the VM:

    Run the runsafe command to enter the secure operation mode, enter the user password, and run the following command as prompted:

    /opt/cloud/services/nova/venv/bin/python2.7 /etc/nova/nova-util/reschedule_vm.py vm_uuid

  13. Run the following command to query the VM status:

    Run the runsafe command to enter the secure operation mode, enter the user password, and run the following command as prompted:

    nova show vm_uuid

    In the command output, check the value of status.

    • If the value is REBUILD, the VM is being restored. Query the VM again 1 minute later.
    • If the value is ACTIVE, the VM is restored.
    • If another value is displayed, the VM restoration failed. Contact technical support for assistance.

  14. Log in to the host accommodating GaussDB or the active gaussdb_nova node. For details, see Logging In to the Active GaussDB Node.

    Run the following command to clear the record of the invalid VM in the database. Enter the password of user gaussdba as prompted during the command execution. The default password of user gaussdba is FusionSphere123.

    sh /usr/bin/info-collect-script/audit_resume/FakeVMCleanup.sh vm_uuid
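
    For example, for the invalid VM from the sample output in 7 (a sketch; the UUID is sample data):

    sh /usr/bin/info-collect-script/audit_resume/FakeVMCleanup.sh 3cd2e2ea-830f-442f-b654-a243f03bdc57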

    Check whether the command is successfully executed.

    • If yes, no further action is required.
    • If no, contact technical support for assistance.

VM Location Inconsistency

Context

The host accommodating a VM recorded in the system database is inconsistent with the actual host.

If the fault is confirmed, correct the actual VM location information in the database.

Parameter Description

The name of the audit report is host_changed_vm.csv. Table 18-58 describes parameters in the report.

Table 18-58 Parameter description

Parameter

Description

uuid

Specifies the VM UUID.

tenant_id

Specifies the ID of the tenant who owns the VM.

hyper_vm_name

Specifies the VM name registered in the hypervisor.

updated_at

Specifies the last time when the VM status was updated.

status

Specifies the VM state.

task_status

Specifies the VM task state.

host_id

Specifies the ID of the host accommodating the VM recorded in the database.

hyper_host_id

Specifies the ID of the actual host accommodating the VM.

hypervisor_hostname

Reserved for connecting to VRM and is left blank in KVM scenarios.

hyper_hypervisor_hostname

Reserved for connecting to VRM and is left blank in KVM scenarios.

Possible Causes

The database is rolled back using the management data backup to the state when the backup was created. However, after the backup was created, one or more VMs were migrated. After the database is restored, location records of these VMs in the database are inconsistent with the actual VM locations.

Impact on the System

The VM becomes unavailable if the VM location recorded in the database is inconsistent with the actual host accommodating the VM.

Procedure

  1. Log in to the first host in the AZ. For details, see Using SSH to Log In to a Host.
  2. Import environment variables. For details, see Importing Environment Variables.
  3. Run the following command to obtain the management IP address of the host accommodating the VM:

    Run the runsafe command to enter the secure operation mode, enter the user password, and run the following command as prompted:

    cps host-list | grep hyper_host_id | awk -F ' ' '{print $12}'

    The value of hyper_host_id can be obtained from the hyper_host_id column of the audit report.

    The management IP address of the host is the value in the command output.

  4. Log in to the host accommodating the VM and run the following command to check whether the VM runs on the host:

    nova_virsh_cmd virsh-list-name | grep hyper_vm_name

    The value of hyper_vm_name can be obtained from the audit report.

    Check whether the command output contains the VM record.

    • If yes, go to 5.
    • If no, contact technical support for assistance.

  5. Log in to the host accommodating GaussDB or the active gaussdb_nova node by performing steps provided in Logging In to the Active GaussDB Node. Run the following command to correct the information about the host accommodating the VM recorded in the database. Enter the password of user gaussdba as prompted during the command execution. The default password of user gaussdba is FusionSphere123.

    sh /usr/bin/info-collect-script/audit_resume/host_changed_handle.sh uuid hyper_host_id

    uuid can be obtained from the audit report.

    The value of hyper_host_id can be obtained from the hyper_host_id column of the audit report.
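
    For example, with uuid and hyper_host_id taken from one audit report row (a sketch; both values are sample data):

    sh /usr/bin/info-collect-script/audit_resume/host_changed_handle.sh 1df5bc46-3347-49e7-9f9e-9b58c34e1815 034284EA-41E4-11B5-8567-000000821800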

    Check whether the command is successfully executed based on the command output.

    • If yes, go to 6.
    • If no, contact technical support for assistance.

  6. Run the following command to stop the VM:

    nova stop uuid

    After a few seconds, go to 7 to query the VM status.

    Check whether the value of status in the command output is SHUTOFF.

    • If yes, go to 8.
    • If no, contact technical support for assistance.

  7. Run the following command to query the VM status:

    nova show uuid | grep status

  8. Run the following command to migrate the VM:

    nova migrate uuid

    After a few seconds, go to 7 to query the VM status.

    Check whether the value of status in the command output is VERIFY_RESIZE.

    • If yes, go to 9.
    • If no, contact technical support for assistance.

  9. Run the following command to confirm the migration:

    nova resize-confirm uuid

    After a few seconds, go to 7 to query the VM status.

    Check whether the value of status in the command output is SHUTOFF.

    • If yes, no further action is required.
    • If no, contact technical support for assistance.
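
    The full sequence in 6 to 9, with the expected status value after each operation, can be sketched as follows (uuid is the VM UUID from the audit report; each nova show may need to be repeated until the status settles):

    nova stop uuid
    nova show uuid | grep status        # expect SHUTOFF
    nova migrate uuid
    nova show uuid | grep status        # expect VERIFY_RESIZE
    nova resize-confirm uuid
    nova show uuid | grep status        # expect SHUTOFF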

Stuck VMs

Context

A stuck VM is one that remains in a transition state for a long time and cannot be automatically restored if a system exception (for example, a host restart) occurs during a VM service operation (for example, starting a VM).

Manually restore the VM based on the VM status and the task status. For a VM from which the system volumes are detached, perform the operations based on section A Fault Occurs If the System Disk of a VM Is Detached in HUAWEI CLOUD Stack 6.5.0 Troubleshooting Guide.

Parameter Description

The name of the audit report is stucking_vm.csv. Table 18-59 describes parameters in the report.

Table 18-59 Parameter description

Parameter

Description

uuid

Specifies the VM UUID.

tenant_id

Specifies the tenant ID.

hyper_vm_name

Specifies the VM name registered in the hypervisor.

updated_at

Specifies the last time when the VM status was updated.

status

Specifies the VM state.

task_status

Specifies the VM task state.

Possible Causes

A system exception occurred when a VM service operation was in process.

NOTE:

If a host fault caused the VM to be stuck in a transient state, the VM can be restored using the HA mechanism. For details about the VM HA feature, see the product documentation.

Impact on the System

The VM becomes unavailable and occupies system resources.

Procedure

Restore the VM based on the VM statuses and task statuses listed in Table 18-60. For other situations, contact technical support for assistance.

Table 18-60 VM restoration methods

VM Status | Task Status | Possible Scenario | Restoration Method
building | scheduling | Creating a VM | Method 2
building | None | Creating a VM | Method 2
building | block_device_mapping | Creating a VM | Method 2
building | networking | Creating a VM | Method 2
building | spawning | Creating a VM | Method 2
N/A | image_snapshot_pending | Exporting a snapshot | Set the VM state to active. For details, see Method 1.
N/A | image_snapshot | Exporting a snapshot | Set the VM state to active. For details, see Method 1.
N/A | image_pending_upload | Exporting a snapshot | Method 4
N/A | image_uploading | Exporting a snapshot | Set the VM state to active. For details, see Method 1.
N/A | image_backup | Creating a VM backup | Set the VM state to active. For details, see Method 1.
N/A | resize_prep | Migrating a VM or modifying VM attributes | Set the VM state to active. For details, see Method 1.
N/A | resize_migrating | Migrating a VM or modifying VM attributes | Method 4
N/A | resize_migrated | Migrating a VM or modifying VM attributes | Method 4
N/A | resize_finish | Migrating a VM or modifying VM attributes | Method 4
N/A | resize_reverting | Migrating a VM or modifying VM attributes | Method 4
N/A | rebooting | Restarting a VM | Set the VM state to active. For details, see Method 3.
N/A | reboot_pending | Restarting a VM | Set the VM state to active. For details, see Method 3.
N/A | reboot_started | Restarting a VM | Set the VM state to active. For details, see Method 3.
N/A | rebooting_hard | Restarting a VM | Set the VM state to active. For details, see Method 3.
N/A | reboot_pending_hard | Restarting a VM | Set the VM state to active. For details, see Method 3.
N/A | reboot_started_hard | Restarting a VM | Set the VM state to active. For details, see Method 3.
N/A | pausing | Pausing a VM | Set the VM state to active. For details, see Method 1.
N/A | unpausing | Unpausing a VM | Set the VM state to paused. For details, see Method 1.
N/A | suspending | Suspending a VM | Set the VM state to active. For details, see Method 1.
N/A | resuming | Resuming a VM | Set the VM state to suspended. For details, see Method 1.
N/A | powering_off | Stopping a VM | Set the VM state to active. For details, see Method 1.
N/A | powering_on | Starting a VM | Set the VM state to stopped. For details, see Method 1.
N/A | rebuilding | Rebuilding a VM | Method 6
N/A | rebuild_block_device_mapping | Rebuilding a VM | Method 6
N/A | rebuild_spawning | Rebuilding a VM | Method 6
N/A | migrating | Live migrating a VM | Method 4
N/A | rescheduling | Rescheduling a VM | Method 6
N/A | deleting | Deleting a VM | Method 5

Method 1

Reset the VM status based on the stuck state, and then stop and start the VM.

  1. Reset the VM status based on the stuck state. For details, see Setting the VM Status.

  2. Log in to the first controller node in the AZ.

    For details, see Using SSH to Log In to a Host.

  3. Import environment variables. For details, see Importing Environment Variables.
  4. Perform the following operations to query the VM attributes:

    1. Run the following command to enter the secure operation mode:

      runsafe

      The following information is displayed:

      Input command:
    2. Run the following command to query the VM attributes:

      nova show uuid

      The VM UUID can be obtained from the audit report.

  5. Run the following commands to stop the VM first and then query the VM state to verify that the VM is in the stopped state:

    nova stop uuid

    nova show uuid

    The VM UUID can be obtained from the audit report.

    After the VM is stopped, check whether any exception occurred when you performed the preceding operations.

    • If yes, go to Method 4.
    • If no, go to the next step.

  6. Run the following commands to start the VM and then query the VM state to verify that the VM is in the active state:

    nova start uuid

    nova show uuid

    Check whether any exception occurs when you perform the preceding operations.

    • If yes, go to Method 4.
    • If no, go to the next step.

  7. Have the tenant log in to the VM and check whether any exception occurs.

    • If yes, go to Method 4.
    • If no, no further action is required.
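
    The stop-and-start check in 5 to 7 can be sketched as the following sequence (uuid is the VM UUID from the audit report):

    nova stop uuid
    nova show uuid      # verify that status is SHUTOFF
    nova start uuid
    nova show uuid      # verify that status is ACTIVE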

Method 2

If a VM fails to be created, ask the tenant whether the VM is still needed. If yes, trigger VM high availability (HA) to restore the VM. If no, delete the VM and create another one.

NOTE:

Only an HA-enabled VM can be restored. A non-HA-enabled VM can only be deleted in this case.

  1. Ask the tenant whether the VM is still needed and should be restored.

    • If yes, go to the next step.
    • If no, have the tenant delete the VM. No further action is required.
    NOTE:

    If the VM is stuck in the deleting state for a long time, go to Method 5.

  2. Log in to the first controller node in the AZ.

    For details, see Using SSH to Log In to a Host.

  3. Import environment variables. For details, see Importing Environment Variables.
  4. Perform the following operations to query the VM attributes:

    1. Run the following command to enter the secure operation mode:

      runsafe

      The following information is displayed:

      Input command:
    2. Run the following command to query the VM attributes:

      nova show uuid

      The VM UUID can be obtained from the audit report.

  5. Run the following command to query whether the VM has HA enabled:

    nova show uuid

    The VM has HA enabled if the metadata field does not contain the {'_ha_policy_type': 'close'} key value.
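
    A hedged shortcut for this check (the grep pattern is an assumption about how the metadata field is printed by nova show):

    nova show uuid | grep _ha_policy_type

    If the output contains '_ha_policy_type': 'close', HA is disabled; if nothing is printed, HA is enabled.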

  6. Determine the subsequent operation based on whether the VM has HA enabled.

    • If the VM has HA enabled, set the VM state to error. For details, see Setting the VM Status. No further action is required. The VM will be automatically rebuilt at the preset HA triggering time.
    • If the VM does not have HA enabled, have the tenant delete the failed VM.

Method 3

If a VM fails to restart, have the tenant perform the steps provided in Setting the VM Status to reset the VM status and then restart the VM. If the fault persists, detach the volumes from the VM and create the VM again.

  1. Have the tenant reset the VM status and then restart the VM. Check whether the VM is successfully restarted.

Method 4

Handle the failure based on the boot device of the VM.

  1. Log in to the first controller node in the AZ.

    For details, see Using SSH to Log In to a Host.

  2. Import environment variables. For details, see Importing Environment Variables.
  3. Determine the boot device of the VM.

    1. Run the following command to enter the secure operation mode:

      runsafe

      The following information is displayed:

      Input command:
    2. Run the following command to query the VM attributes:

      nova show vm_uuid

      vm_uuid specifies the UUID of the VM in the intermediate state in the audit report.

      In the command output, the image parameter provides information about the image used by the VM.

      The VM boots from a volume if the following information is displayed:

      | image                                | Attempt to boot from volume - no image supplied|

      The VM boots from an image if information similar to the following is displayed (d0bd0551-07f2-45f6-8516-f481e0152715 specifies the image ID):

      | image                                | cirros (d0bd0551-07f2-45f6-8516-f481e0152715)|

  4. Execute the rollback script for VM cold migration/resize to migrate the VM back to the source host. Run the runsafe command to enter the secure operation mode and run the following command:

    python /etc/nova/nova-util/revert_migrate_vm.py vm_uuid

    • If no command output is displayed, the VM is successfully rolled back. Go to 5.
    • If "WARNNING:xxx vm cannot be reverted." or other exception message is displayed, contact technical support for assistance.

  5. Execute the rollback script for the VM cold migration/VM image file resize.

    1. Then run the runsafe command to enter the secure operation mode and run the following command to query the host accommodating the VM:

      nova show vm_uuid

      The OS-EXT-SRV-ATTR:host parameter in the command output specifies the ID of the host accommodating the VM.

    2. Log in to the FusionSphere OpenStack web client and query the IP address of the host's External OM plane on the Summary page based on the host ID.
    3. Log in to the host as user root and execute the rollback script for the VM cold migration/VM image file resize.

      sh /etc/nova/nova-util/revert_migrate_vm_file.sh vm_uuid

      • If no command output is displayed, the image file is successfully rolled back. Go to 6.
      • If "WARNNING:xxx image file need not revert." is displayed and the VM task status is resize_migrating, go to the next step. In other situations, contact technical support for assistance.

  6. On the controller node, perform the cold migration operation. For details, see 1.

    The cold migration is mandatory. Otherwise, a resource collection error may occur on the host. Before performing the cold migration, ensure that other hosts have sufficient resources to accommodate the migrated VM.

    1. Run the runsafe command to enter the secure operation mode and run the following command:

      nova migrate vm_uuid

      If no command output is displayed, the cold migration can be performed. If "No valid host" is displayed, release host resources first.

    2. Run the nova show vm_uuid command multiple times to check whether status of the VM is VERIFY_RESIZE and task_state is -.
      • If yes, the operation is successful. Go to 6.c.
      • If no, contact technical support for assistance.
    3. Run the nova resize-confirm vm_uuid command to check the cold migration result.
    4. Run the nova show vm_uuid command multiple times to check whether status of the VM is SHUTOFF and task_state is -.
      • If yes, the cold migration is complete. Go to 7.
      • If no, contact technical support for assistance.
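
    A hedged sketch of the whole cold-migration check in 6 (vm_uuid is the stuck VM's UUID; repeat each nova show until the states settle):

      nova migrate vm_uuid
      nova show vm_uuid           # wait for status VERIFY_RESIZE and task_state -
      nova resize-confirm vm_uuid
      nova show vm_uuid           # wait for status SHUTOFF and task_state -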

  7. Start the VM.

    1. Run the runsafe command to enter the secure operation mode and run the following command:

      nova start vm_uuid

    2. Run the nova show vm_uuid command multiple times to check whether status of the VM is ACTIVE and task_state is -.
      • If yes, the VM is successfully restored. No further action is required.
      • If no, go to the next step.

  8. Rebuild the VM.

    If the VM boots from an image, rebuilding the VM may cause data loss on the system volume. Therefore, contact technical support for assistance before rebuilding the VM.

    1. Run the runsafe command to enter the secure operation mode and run the following command:

      nova show vm_uuid

      In the command output, the image parameter specifies the image ID.

      | image                                | cirros (d0bd0551-07f2-45f6-8516-f481e0152715)|

      For example, d0bd0551-07f2-45f6-8516-f481e0152715 is the image ID.

    2. Run the following command to rebuild the VM:

      nova rebuild vm_uuid image_id

    3. Run the nova show vm_uuid command multiple times to check whether status of the VM is ACTIVE and task_state is -.
      • If yes, the VM is successfully restored. No further action is required.
      • If no, contact technical support for assistance.

Method 5

If a VM is stuck in the deleting state and cannot be restored, manually delete the VM.

  1. Set the VM to the stopped state. For details, see Setting the VM Status.

  2. Log in to the first controller node in the AZ.

    For details, see Using SSH to Log In to a Host.

  3. Import environment variables. For details, see Importing Environment Variables.
  4. Perform the following operations to check whether the VM exists in the database:

    NOTE:

    An orphan VM does not exist in the system database or is in the deleted state in the database.

    1. Run the following command to enter the secure operation mode:

      runsafe

      The following information is displayed:

      Input command:
    2. Run the following command to check whether the VM exists in the database:

      nova show vm_uuid

      Check whether the command is correctly executed and whether the VM information is displayed:

      • If yes, make a note of the value of OS-EXT-SRV-ATTR:instance_name in the command output, which indicates the VM name (hyper_vm_name) registered in the hypervisor, and the value of OS-EXT-SRV-ATTR:host, which indicates the host ID (host_id), and go to the next step.
      • If no, contact technical support for assistance.

  5. Enter the secure operation mode and obtain the management IP address of the host running the VM.

    Run the runsafe command to enter the secure operation mode, enter the user password as prompted, and run the following command:

    cps host-show host_id | grep manageip | awk -F '|' '{print $3}'

  6. Log in to the host accommodating the VM and run the following command to check whether the VM runs on the host:

    nova_virsh_cmd virsh-instance-state hyper_vm_name

    hyper_vm_name is the value you obtained in 4. Check whether Non-Active is displayed when you perform the preceding operation:

    • If yes, contact technical support for assistance.
    • If no, go to the next step.

  7. Run the following command to stop the VM:

    nova_virsh_cmd virsh-instance-shutdown hyper_vm_name
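
    For example, for a VM whose hypervisor name was noted in 4 (a sketch; instance-00000040 is sample data):

    nova_virsh_cmd virsh-instance-state instance-00000040
    nova_virsh_cmd virsh-instance-shutdown instance-00000040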

  8. Log in to the controller node (for details, see 2) and run the following command in the database to check whether the VM has volumes attached:

    nova show vm_uuid

    Check whether the os-extended-volumes:volumes_attached field contains any value.

    • If yes, the VM has volumes attached. Make a note of the UUIDs of each volume and go to 9.
    • If no, no volumes are attached to the VM. Go to 10.

  9. On the controller node, run the following command for each volume attached to the VM one by one to detach the volumes:

    nova volume-detach vm_uuid volume_uuid

    NOTE:

    Do not detach the root device volume.

    Check whether any exception occurs when you perform the preceding operations.

    • If yes, contact technical support for assistance.
    • If no, go to 10.
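
    A hedged sketch for detaching several data volumes in one pass (the volume UUID is sample data from 8; the shell loop is illustrative, not a documented batch mode):

    for vol in cb39e2f3-9837-4ebe-972f-59caed581021; do
        nova volume-detach vm_uuid "$vol"
    done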

  10. Run the following command to delete the VM:

    nova delete vm_uuid

    Run the following command to check whether the VM is successfully deleted:

    nova show vm_uuid

    • If yes, no further action is required.
    • If no, contact technical support for assistance.

Method 6

Reset the VM task status and recreate the VM.

  1. Log in to the first controller node in the AZ.

    For details, see Using SSH to Log In to a Host.

  2. Import environment variables. For details, see Importing Environment Variables.
  3. Perform the following operations to reset the VM task status:

    1. Run the following command to enter the secure operation mode:

      runsafe

      The following information is displayed:

      Input command:
    2. Run the following command to reset the VM task status:

      nova reset-state vm_uuid

  4. Check whether the VM has HA enabled.

    • If yes, the VM is automatically recreated after 5 to 10 minutes. After 10 minutes, go to 6.
    • If no, go to 5.

  5. Rebuild the VM according to 3.

    If the VM boots from an image, rebuilding the VM may cause data loss on the system volume. Therefore, contact technical support for assistance before rebuilding the VM.

    1. Run the runsafe command to enter the secure operation mode, enter the user password as prompted, and run the following command:

      nova show vm_uuid

      In the command output, check the image value. If information similar to the following is displayed, the VM boots from an image:

      | image | cirros (d0bd0551-07f2-45f6-8516-f481e0152715)|
    2. Run the following command to rebuild the VM:

      nova rebuild vm_uuid image_uuid

  6. Perform the following operations to query the VM status:

    Run the runsafe command to enter the secure operation mode, enter the user password, and run the following command as prompted:

    nova show vm_uuid

    In the command output, check the value of status.

    • If the value is REBUILD, the VM is being recreated. Query the VM status again after 1 minute.
    • If the value is ACTIVE, the VM is restored. No further action is required.
    • If other values are displayed, the VM fails to be restored. Contact technical support for assistance.

Inconsistency Between the Quota in the Nova Database and the Actual Quota

Context

If the used quota in the Nova database is inconsistent with that in the actual environment, rectify the fault by referring to this section.

Parameter Description

The names of the audit reports are nova_quota_vcpus.csv, nova_quota_memory_mb.csv, and nova_quota_instance.csv.

Impact on the System

The used quota in the quota table of the database is inconsistent with the actually used data. As a result, the tenant may fail to create a VM due to resource limitation.

Possible Causes

  • Changes to the quota table and VM changes are not guaranteed to occur in the same transaction.
  • When a network exception occurs during the process of creating, resizing, or deleting a VM, the actually used quota is inconsistent with that in the quota table.

Procedure

  1. Log in to the first controller node in the AZ. For details, see Using SSH to Log In to a Host.
  2. Import environment variables. For details, see Importing Environment Variables.
  3. Switch to the node where the audit service resides.

    1. Run the cps template-instance-list --service collect info-collect-server command to query the External OM IP address of the audit service.
    2. Run the su fsp command to switch to the fsp user.

      The default username is fsp, and the default password is Huawei@CLOUD8.

    3. Run the ssh fsp@omip command to switch to the node where the audit service resides.
    4. Import environment variables by referring to Importing Environment Variables.

  4. Manually audit the quota.

    1. Run the following command to enter the secure operation mode:

      runsafe

      Information similar to the following is displayed:

      Input command:
    2. Run the following command to manually audit the quota and note the value of Path in the command output (referred to as PATH below):

      infocollect audit --item 1008

  5. Run the /usr/bin/python2.7 /etc/nova/nova-util/refresh_quota_usages.py PATH command to restore the Nova quota usage. Enter y or n when the "Please confirm recovering the quota-usages table(y/n):" prompt is displayed in the command output.

    • y: Modify the resource usage in the quota_usages table.
    • n: Confirm that the resource usage in the quota_usages table is not modified and exit the processing.

    After y is entered, check whether the command output contains "Success synchronizing the quota-usages table in resource: instance/vcpus/memory_mb".

    • If yes, the resource usage in the quota_usages table is restored successfully. No further action is required.
    • If no, contact technical support for assistance.
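
    A hedged end-to-end sketch of 4 and 5 (the audit path shown is sample data; use the actual Path value from the infocollect output):

    infocollect audit --item 1008
    /usr/bin/python2.7 /etc/nova/nova-util/refresh_quota_usages.py /var/log/audit/2015-04-22_020324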

Stuck Images

Context

An image in the active state is available. If an image is stuck in the queued or saving state, the image is unavailable. If an image remains in a transition state for more than 24 hours, process it in the Cascading OpenStack system. For details, see Stuck Images.

Orphan Volumes

Context

An orphan volume is one that exists on a storage device but is not recorded in the Cinder database.

If a volume is orphaned and the management data is lost due to backup-based system restoration, use the orphan volume to create another volume and notify the tenant to use the new volume.

NOTE:

Orphan volumes already processed in Orphan VMs can be ignored.

Parameter Description

The name of the audit report is wildVolumeAudit.csv. Table 18-61 describes parameters in the report.

Table 18-61 Parameter description

Parameter

Description

volume_name

Specifies the volume name on the storage device.

volume_type

Specifies the volume type.

Impact on the System

An orphan volume is unavailable in the Cinder service but occupies the storage space.

Possible Causes

  • The database is rolled back using a data backup to the state at which the backup was created. However, after the backup was created, one or more volumes were created. After the database is restored, records of these volumes are deleted from the database, but the volumes remain on their storage devices and become orphan volumes.
  • The storage system is shared by multiple FusionSphere systems.
  • Volumes on the storage device are not created using the Cinder service.
NOTE:

When you design system deployment for a site, do not allow multiple hypervisors to share one storage system, and use only the Cinder service to create volumes on a storage device. Otherwise, false audit reports may be generated.

Procedure

  1. Log in to the first controller node in the AZ. For details, see Using SSH to Log In to a Host.
  2. Import environment variables. For details, see Importing Environment Variables.
  3. Perform the following steps for each volume in the audit report. The volume ID is the value following "volume-" in the volume name.
  4. Obtain the volume attributes. For details, see Querying Volume Attributes.
  5. Ask the tenant whether to restore the orphan volume.

    • If yes, go to 6.
    • If no, go to 7.

  6. Use the orphan volume to create another volume and replicate the original data to the new volume. For details, see section Restoring Volume Data.

    Check whether any exception occurs when you perform the preceding operations.

    • If yes, contact technical support for assistance.
    • If no, go to 7.

  7. Delete the orphan volume. For details, see Deleting an Orphan Volume.

Invalid Volumes

Context

An invalid volume is one that is recorded in the Cinder database but does not exist on any storage device.

Delete the invalid volume from the Cinder database.

Parameter Description

The name of the audit report is fakeVolumeAudit.csv. Table 18-62 describes parameters in the report.

Table 18-62 Parameter description

  • volume_id: Specifies the volume ID.
  • volume_displayname: Specifies the name of the volume created by a tenant.
  • volume_name: Specifies the volume name on the storage device.
  • volume_type: Specifies the volume type.
  • location: Specifies the volume location.

Impact on the System

The invalid volume can still be queried using Cinder commands even though it does not exist on any storage device.

Possible Causes

The database is rolled back using a data backup to the state when the backup was created. However, after the backup was created, one or more volumes were deleted. After the database is restored, records of these volumes remain in the database, so the volumes become invalid.

Procedure

  1. Log in to the first controller node in the AZ. For details, see Using SSH to Log In to a Host.
  2. Import environment variables. For details, see Importing Environment Variables.
  3. Perform the following operations to check whether the volume exists in Cinder:

    1. Run the following command to enter the secure operation mode:

      runsafe

      The following information is displayed:

      Input command:
    2. Run the following command to check whether the volume exists in Cinder:

      cinder show Volume ID

      The volume ID can be obtained from the audit report. For example, run the following command:

      cinder show 044e14af-9d11-4ee9-9b5a-0dcbcd5033aa

      Check whether the command output contains ERROR, which indicates that the volume does not exist in Cinder. An example of this error output is shown after this list.

      • If yes, contact technical support for assistance.
      • If no, go to 4.
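      If the volume does not exist in Cinder, the error output is similar to the following (a hedged example based on the standard Cinder client error format):

      ERROR: No volume with a name or ID of '044e14af-9d11-4ee9-9b5a-0dcbcd5033aa' exists.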

  4. Perform the following operations to query the host list:

    1. Enter the secure operation mode. For details, see Command Execution Methods.
    2. Run the following command to query the management IP address of the host with the blockstorage-driver-kvm001 role assigned:

      cps template-instance-list --service cinder cinder-volume-kvm001

      Information similar to the following is displayed:

      +----------------+---------------+--------+--------------------------------------+----------------+ 
      | instanceid     | componenttype | status | runsonhost                           | omip           | 
      +----------------+---------------+--------+--------------------------------------+----------------+ 
      | agt_0000000002 | cinder-volume | active | 9099E61E-432D-E611-A2B4-042758045CB6 | 192.168.101.105 | 
      | agt_0000000001 | cinder-volume | active | 6094092C-D21D-B211-9287-0018E1C5D866 | 192.168.101.104 | 
      | agt_0000000000 | cinder-volume | active | 3E83D92A-D21D-B211-95E9-0018E1C5D866 | 192.168.101.103 | 
      +----------------+---------------+--------+--------------------------------------+----------------+

      The value of omip indicates the management IP address of the node.

  5. Run the following commands to log in to a host with the blockstorage-driver-kvm001 role assigned:

    su fsp

    ssh fsp@Management IP address

    For example, run the following command:

    ssh fsp@192.168.101.104

    After you log in to the host as user fsp, run the su - root command to switch to user root.

    The default password of user fsp is Huawei@CLOUD8.

    The default password of user root is Huawei@CLOUD8!.

  6. Run the following command to query the storage type:

    python /usr/bin/info-collect-script/audit_resume/get_host_storage_info.py

    NOTE:

    If the storage type displayed in the command output is inconsistent with the volume type, go to 5 to log in to another host with the blockstorage-driver-kvm001 role assigned and perform the subsequent operations.

    Information similar to the following is displayed:

    storage_type=dsware 
    addition info is : 
              manage_ip=172.28.0.231 
              vbs_url=172.28.6.1,172.28.6.0,172.28.0.2

    The volume storage type is dsware. The value of manage_ip is the IP address of the FusionStorage Manager node, and the value of the vbs_url is the management IP address of the compute node. Go to 7.

  7. Run the following command to query information about the volume:

    fsc_cli --ip Compute node management IP address --manage_ip FusionStorage Manager node IP address --port 10519 --op queryVolume --volName Volume name on the storage device

    For example, run the following command:

    fsc_cli --ip 172.29.6.6 --manage_ip 172.29.0.231 --port 10519 --op queryVolume --volName volume-6f2282f1-22b3-41f1-8b3f-d15aa9790388

    The volume exists on the storage device if information similar to the following is displayed:

    result=0 
    vol_name=volume-6f2282f1-22b3-41f1-8b3f-d15aa9790388,father_name=,status=0,vol_size=1024,real_size=-1,pool_id=0,create_time=2016-09-18 07:23:21

    Check whether the volume exists on the storage device attached to the host whose roles value is blockstorage-driver-kvm001 in the AZ.

    • If yes, contact technical support for assistance.
    • If no, go to next step.

  8. Log in to the node whose roles value is controller and run the following command to check whether the volume has a snapshot:

    cinder snapshot-list --all-t --volume-id Volume ID

    Check the command output to determine whether the volume has a snapshot.

    • If yes, record the snapshot ID in the command output and run the following command to delete the snapshot:

      cinder snapshot-delete Snapshot ID

    • If no, go to next step.

    After the snapshot is deleted, or if the volume has no snapshot, run the following command to delete the volume:

    python /usr/bin/info-collect-script/audit_resume/delete_specify_volume.py Volume ID

    The volume is deleted if the following information is displayed:

    INFO: delete success.
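    The complete check-and-delete sequence in 8 is sketched below with hypothetical IDs:

      cinder snapshot-list --all-t --volume-id 044e14af-9d11-4ee9-9b5a-0dcbcd5033aa
      # Suppose the output lists snapshot 1cd5c6eb-e729-4773-b846-e9f1d3467c56 (hypothetical)
      cinder snapshot-delete 1cd5c6eb-e729-4773-b846-e9f1d3467c56
      python /usr/bin/info-collect-script/audit_resume/delete_specify_volume.py 044e14af-9d11-4ee9-9b5a-0dcbcd5033aa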

  9. Import environment variables. For details, see Importing Environment Variables.
  10. Enter the secure operation mode (for details, see 3) and run the following command to query the volume status:

    cinder show Volume ID

    Check whether the deleted volume still exists.

    • If yes, contact technical support for assistance.
    • If no, no further action is required.

Handling Orphan Volume Snapshots

Context

An orphan volume snapshot is one that exists on a storage device but is not recorded in the Cinder database.

Delete the orphan volume snapshot from the storage device.

Parameter Description

The name of the audit report is wildSnapshotAudit.csv. Table 18-63 describes parameters in the report.

Table 18-63 Parameter description

  • snap_name: Specifies the volume snapshot name on the storage device.
  • snap_type: Specifies the snapshot type.

Impact on the System

An orphan volume snapshot occupies the storage space.

Possible Causes

The database is rolled back using a data backup to the state when the backup was created. However, after the backup was created, one or more volume snapshots were created. After the database is restored, records of these snapshots are deleted from the database, but the snapshots remain on their storage devices and become orphan volume snapshots.

Procedure

  1. Log in to the first controller node in the AZ. For details, see Using SSH to Log In to a Host.
  2. Import environment variables. For details, see Importing Environment Variables.
  3. Perform the following steps for each snapshot in the audit report. The snapshot ID is the value following "snapshot-" in the snapshot name.
  4. Perform the following operations to check whether the snapshot exists in Cinder:

    1. Run the following command to enter the secure operation mode:

      runsafe

      The following information is displayed:

      Input command:
    2. Run the following command to check whether the snapshot exists in Cinder:

      cinder snapshot-show Snapshot ID

      For example, run the following command:

      cinder snapshot-show 1cd5c6eb-e729-4773-b846-e9f1d3467c56

      Check whether the command output contains ERROR. If ERROR is contained, the snapshot does not exist in Cinder.

      ERROR: No snapshot with a name or ID of '1cd5c6eb-e729-4773-b846-e9f1d3467c56' exists.

      Check whether the snapshot exists in Cinder.

      • If yes, contact technical support for assistance.
      • If no, go to 5.

  5. Perform the following operations to query the host list:

    1. Enter the secure operation mode. For details, see Command Execution Methods.
    2. Run the following command to query the management IP address of the host with the blockstorage-driver-kvm001 role assigned:

      cps template-instance-list --service cinder cinder-volume-kvm001

      Information similar to the following is displayed:

      +----------------+---------------+--------+--------------------------------------+----------------+ 
      | instanceid     | componenttype | status | runsonhost                           | omip           | 
      +----------------+---------------+--------+--------------------------------------+----------------+ 
      | agt_0000000002 | cinder-volume | active | 9099E61E-432D-E611-A2B4-042758045CB6 | 192.168.101.105 | 
      | agt_0000000001 | cinder-volume | active | 6094092C-D21D-B211-9287-0018E1C5D866 | 192.168.101.104 | 
      | agt_0000000000 | cinder-volume | active | 3E83D92A-D21D-B211-95E9-0018E1C5D866 | 192.168.101.103 | 
      +----------------+---------------+--------+--------------------------------------+----------------+

      The value of omip indicates the management IP address of the node.

  6. Run the following commands to log in to a host with the blockstorage-driver-kvm001 role assigned:

    su fsp

    ssh fsp@Management IP address

    For example, run the following command:

    ssh fsp@192.168.101.103

    After you log in to the host as user fsp, run the su - root command to switch to user root.

    The default password of user fsp is Huawei@CLOUD8.

    The default password of user root is Huawei@CLOUD8!.

  7. Run the following command to query the storage type:

    python /usr/bin/info-collect-script/audit_resume/get_host_storage_info.py

    NOTE:

    If the storage type displayed in the command output is inconsistent with the snapshot type, go to 6 to log in to another host with the blockstorage-driver-kvm001 role assigned and perform the subsequent operations.

    Information similar to the following is displayed:

    storage_type=dsware 
    addition info is : 
              manage_ip=172.28.0.231 
              vbs_url=172.28.6.1,172.28.6.0,172.28.0.2

    The volume storage type is dsware. The value of manage_ip is the IP address of the FusionStorage Manager node, and the value of the vbs_url is the management IP address of the compute node. Go to 8.

  8. Run the following command to query the snapshot information:

    fsc_cli --ip Compute node management IP address --manage_ip FusionStorage Manager node IP address --port 10519 --op querySnapshot --snapName Snapshot name on the storage device obtained in 4

    For example, run the following command:

    fsc_cli --ip 172.29.6.6 --manage_ip 172.29.0.231 --port 10519 --op querySnapshot --snapName snapshot-6f2282f1-22b3-41f1-8b3f-d15aa9790388

    The snapshot exists on the storage device if information similar to the following is displayed:

    result=0 
    snap_name=snapshot-6f2282f1-22b3-41f1-8b3f-d15aa9790388,father_name=,status=0,snap_size=1024,real_size=-1,pool_id=0,create_time=2014-09-18 07:23:21

    Check whether the snapshot exists on the storage device attached to the host with the blockstorage-driver-kvm001 role assigned in the AZ.

    • If yes, go to 9.
    • If no, contact technical support for assistance.

  9. Log in to the target host.

    After you log in to the host as user fsp, run the su - root command to switch to user root.

    The default password of user fsp is Huawei@CLOUD8.

    The default password of user root is Huawei@CLOUD8!.

  10. Run the following command to delete the snapshot:

    fsc_cli --ip Compute node management IP address --manage_ip FusionStorage Manager node IP address --port 10519 --op deleteSnapshot --snapName Snapshot name on the storage device

    For example, run the following command:

    fsc_cli --ip 172.29.6.6 --manage_ip 172.29.0.231 --port 10519 --op deleteSnapshot --snapName snapshot-6f2282f1-22b3-41f1-8b3f-d15aa9790388

    Check whether the snapshot is deleted.

    • If yes, no further action is required.
    • If no, contact technical support for assistance.

Handling Invalid Volume Snapshots

Context

An invalid volume snapshot is the one that is recorded in the Cinder database but is not present to a storage device.

Delete the invalid volume snapshot from the Cinder database.

Parameter Description

The name of the audit report is fakeSnapshotAudit.csv. Table 18-64 describes parameters in the report.

Table 18-64 Parameter description

  • snap_id: Specifies the snapshot ID.
  • snap_name: Specifies the volume snapshot name on the storage device.
  • volume_id: Specifies the base volume ID.
  • snap_type: Specifies the snapshot type.
  • location: Specifies the snapshot location.

Impact on the System

The invalid snapshot can still be queried using Cinder commands even though it does not exist on any storage device.

Possible Causes

The database is rolled back using a data backup to the state when the backup was created. However, after the backup was created, one or more volume snapshots were deleted. After the database is restored, records of these volume snapshots remain in the database, and the volume snapshots become invalid.

Procedure

  1. Log in to the first controller node in the AZ. For details, see Using SSH to Log In to a Host.
  2. Import environment variables. For details, see Importing Environment Variables.
  3. Perform the following operations to check whether the snapshot exists in Cinder:

    1. Run the following command to enter the secure operation mode:

      runsafe

      Information similar to the following is displayed:

      Input command:
    2. Run the following command to check whether the snapshot exists in Cinder:

      cinder snapshot-show Snapshot ID

      For example, run the following command:

      cinder snapshot-show 1cd5c6eb-e729-4773-b846-e9f1d3467c56

      Check whether the command output contains ERROR. If ERROR is contained, the snapshot does not exist in Cinder.

      ERROR: No snapshot with a name or ID of '1cd5c6eb-e729-4773-b846-e9f1d3467c56' exists.

      Check whether the snapshot exists in Cinder.

      • If yes, go to 4.
      • If no, contact technical support for assistance.

  4. Perform the following operations to query the host list:

    1. Enter the secure operation mode. For details, see Command Execution Methods.
    2. Run the following command to query the management IP address of the host with the blockstorage-driver-kvm001 role assigned:

      cps template-instance-list --service cinder cinder-volume-kvm001

      Information similar to the following is displayed:

      +----------------+---------------+--------+--------------------------------------+----------------+ 
      | instanceid     | componenttype | status | runsonhost                           | omip           | 
      +----------------+---------------+--------+--------------------------------------+----------------+ 
      | agt_0000000002 | cinder-volume | active | 9099E61E-432D-E611-A2B4-042758045CB6 | 192.168.101.105 | 
      | agt_0000000001 | cinder-volume | active | 6094092C-D21D-B211-9287-0018E1C5D866 | 192.168.101.104 | 
      | agt_0000000000 | cinder-volume | active | 3E83D92A-D21D-B211-95E9-0018E1C5D866 | 192.168.101.103 | 
      +----------------+---------------+--------+--------------------------------------+----------------+

      The value of omip indicates the management IP address of the node.

  5. Run the following commands to log in to a host with the blockstorage-driver-kvm001 role assigned:

    su fsp

    ssh fsp@Management IP address

    For example, run the following command:

    ssh fsp@192.168.101.103

    After you log in to the host as user fsp, run the su - root command to switch to user root.

    The default password of user fsp is Huawei@CLOUD8.

    The default password of user root is Huawei@CLOUD8!.

  6. Run the following command to query the storage type:

    python /usr/bin/info-collect-script/audit_resume/get_host_storage_info.py

    NOTE:

    If the storage type displayed in the command output is inconsistent with the snapshot type, go to 5 to log in to another host with the blockstorage-driver-kvm001 role assigned and perform the subsequent operations.

    Information similar to the following is displayed:

    storage_type=dsware 
    addition info is : 
              manage_ip=172.28.0.231 
              vbs_url=172.28.6.1,172.28.6.0,172.28.0.2

    The storage type is dsware. The value of manage_ip is the IP address of the FusionStorage Manager node, and the value of vbs_url is the management IP address of the compute node. Go to 7.

  7. Run the following command to query the snapshot information:

    fsc_cli --ip Compute node management IP address --manage_ip FusionStorage Manager node IP address --port 10519 --op querySnapshot --snapName Snapshot name on the storage device

    For example, run the following command:

    fsc_cli --ip 172.29.6.6 --manage_ip 172.29.0.231 --port 10519 --op querySnapshot --snapName snapshot-6f2282f1-22b3-41f1-8b3f-d15aa9790388

    The snapshot exists on the storage device if information similar to the following is displayed:

    result=0 
    snap_name=snapshot-6f2282f1-22b3-41f1-8b3f-d15aa9790388,father_name=,status=0,snap_size=1024,real_size=-1,pool_id=0,create_time=2014-09-18 07:23:21

    Check whether the snapshot exists on the storage device attached to the host with the blockstorage-driver-kvm001 role assigned in the AZ.

    • If yes, contact technical support for assistance.
    • If no, go to 8.

  8. Enter the secure operation mode (for details, see 3) and run the following command to delete the snapshot:

    cinder snapshot-delete Snapshot ID

    For example, run the following command:

    cinder snapshot-delete 1cd5c6eb-e729-4773-b846-e9f1d3467c56

    If the command output contains ERROR, the snapshot fails to be deleted. Check whether the snapshot is deleted.

    • If yes, no further action is required.
    • If no, contact technical support for assistance.

Restoring the VM HA Flag Bit and Startup Mode

Context

If the VM HA flag bit or the startup mode was modified after the database management data was backed up and before it was restored using the backup, these two configuration items may be inconsistent after the database restoration. In this case, you need to restore the VM HA flag bit and the startup mode.

Possible Causes

The database is reverted using a data backup to the state when the backup was created. However, after the backup was created, the HA flags or startup flags of VMs were changed. After the database is restored, the flags in the database are inconsistent with the actual flags.

Impact on the System

  • The VM rescheduling time and type are inconsistent with those set before the restoration.
  • The VM startup mode is inconsistent with that set before the restoration.

Procedure

  1. Obtain the operation report (for details, see Collecting Audit Reports) and check whether the VM HA flag bit or startup mode was changed after the management data was backed up and before the system database was restored.

    If the VM HA flag bit or startup mode was changed after the management data was backed up and before the system database was restored, all of the following conditions are met in the operation report:

    • The value of tenant is the UUID of the tenant who performed the operation.
    • The value of res_id is the VM UUID.
    • The value of res_type is servers.
    • The value of time is a time after the management data was backed up and before the system database was restored.
    • In the action field, either of the following applies:
      • The HTTP request method is POST, the HTTP request URL is /v2/tenant_id/servers/instance_id/metadata, the value of tenant_id is the tenant UUID, and the value of instance_id is the VM UUID. The metadata field contains one or more of _ha_policy_time, _ha_policy_type, and __bootDev. The action is changing metadata.
      • The HTTP request method is DELETE, the HTTP request URL is /v2/tenant_id/servers/instance_id/metadata/key, the value of tenant_id is the tenant UUID, the value of instance_id is the VM UUID, and the value of key is _ha_policy_time, _ha_policy_type, or __bootDev. The action is deleting metadata.
    • If yes, go to 2.
    • If no, no further action is required.

  2. Use PuTTY to log in to the first host in the cascaded FusionSphere OpenStack system through the Cascaded-Reverse-Proxy.

    The default username is fsp, and the default password is Huawei@CLOUD8.

  3. Run the following command and enter the password Huawei@CLOUD8! of user root to switch to user root:

    su - root

  4. Import environment variables. For details, see Importing Environment Variables.
  5. Run the following command to query the management IP address of a controller node:

    cps host-list

    The node whose roles value is controller is the controller node. The value of manageip indicates the management IP address.

  6. Run the following commands to log in to the controller node:

    su fsp

    ssh fsp@Management IP address

    su - root

    The default password of user fsp is Huawei@CLOUD8.

    The default password of user root is Huawei@CLOUD8!.

  7. Import environment variables. For details, see Importing Environment Variables.
  8. If the metadata was changed, run the runsafe command to enter the secure operation mode, enter the user password, and run the following command to set the VM HA flag bit and startup mode:

    nova meta instance_id set _ha_policy_time=time _ha_policy_type=type __bootDev=dev

    instance_id specifies the instance_id value obtained in 1. Add the values of _ha_policy_time, _ha_policy_type, and __bootDev in the action field in 1 as key values to the command line. The values of time, type, and dev are the new values of _ha_policy_time, _ha_policy_type, and __bootDev, respectively.

    NOTE:

    The preceding three items may not be present concurrently. Add the item value only when the corresponding field is present. For example, if only the value of __bootDev is changed, restore only the value of __bootDev. In this case, run the following command:

    nova meta instance_id set __bootDev=dev
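
    For example, if the operation report shows that all three items were changed, run a command similar to the following. The instance ID and all values below are placeholders; use the actual values from the action field in 1.

      nova meta e32d3e98-2d61-4652-b805-afccb7fbc592 set _ha_policy_time=10 _ha_policy_type=1 __bootDev=hd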

  9. If the metadata was deleted, run the runsafe command to enter the secure operation mode, enter the user password, and run the following command to delete the VM HA flag bit and startup mode:

    nova meta instance_id delete _ha_policy_time _ha_policy_type __bootDev

    instance_id specifies the instance_id value in the action field in 1. If the metadata of the VM whose UUID is the instance_id value is deleted in 1, add the deleted metadata to the command line.

    NOTE:

    The preceding three items may not be present concurrently. Add the item value only when the corresponding field is present. For example, if only the value of __bootDev is deleted, delete only the value of __bootDev. In this case, run the following command:

    nova meta instance_id delete __bootDev

Stuck Volumes

Context

A volume in the available or in-use state is an available volume. A volume in the transient state (including creating, downloading, deleting, error_deleting, error_attaching, error_detaching, attaching, detaching, uploading, retyping, reserved, and maintenance) is an unavailable volume. If a volume is kept stuck in a transient state for more than 24 hours, restore the volume based on site conditions.

Parameter Description

The name of the audit report is VolumeStatusAudit.csv. Table 18-65 describes parameters in the report.

Table 18-65 Parameters in the audit report

  • volume_id: Specifies the volume ID.
  • volume_displayname: Specifies the name of the volume created by a user.
  • volume_name: Specifies the volume name on the storage device.
  • volume_type: Specifies the volume type.
  • location: Specifies the volume location.
  • status: Specifies the volume status.
  • last_update_time: Specifies the last time when the volume was updated.

Possible Causes

  • An exception occurred during a volume service operation, delaying the update of the volume status.
  • The database is rolled back using the management data backup to the state when the backup was created. However, after the backup was created, the states of one or more volumes were changed. After the database is restored, records of these volume states are restored to their former states in the database.

Impact on the System

The stuck volume becomes unavailable but consumes system resources.

Procedure

Handle the volume based on the volume states listed in Table 18-66. For other situations, contact technical support for assistance.

Table 18-66 Stuck volume handling methods

  • creating (transient state): Creating a volume. For details, see Method 1.
  • downloading (transient state): Creating a volume from an image. For details, see Method 2.
  • deleting (transient state): Deleting a volume. Forcibly delete the volume. For details, see Method 3.
  • error_deleting (not a transient state): Volume deletion failed. Forcibly delete the volume. For details, see Method 3.
  • error_attaching (not a transient state): Volume attachment failed. Set the volume state to available or in-use. For details, see Method 4.
  • error_detaching (not a transient state): Volume detachment failed. Set the volume state to available or in-use. For details, see Method 4.
  • attaching (transient state): Attaching a volume. Set the volume state to available or in-use. For details, see Method 4. If the volume is a DR placeholder volume, no action is required.
  • detaching (transient state): Detaching a volume. Set the volume state to available or in-use. For details, see Method 4.
  • uploading (transient state): Creating an image from a volume. For details, see Method 5.
  • retyping (transient state): A system exception occurs during storage migration. For details, see Method 6.
  • reserved (not a transient state): After VM live migration, the original volume is reserved. After the user confirms that the service is normal, the original volume is deleted. For details, see Method 7.
  • maintenance (transient state): A process exception occurs during data copy. Contact technical support for assistance.

Method 1

  1. Log in to the first controller node in the AZ. For details, see Using SSH to Log In to a Host.
  2. Import environment variables to the node. For details, see Importing Environment Variables.
  3. Perform the following operations on the node to query information about the volume:

    1. Run the following command to enter the secure operation mode:

      runsafe

      Information similar to the following is displayed:

      Input command:
    2. Run the following command to query information about the volume status:

      cinder show Volume ID

      Check whether the value of status in the command output is consistent with the volume state in the audit report.

      • If yes, go to 4.
      • If no, no further action is required.

  4. View the value of last_update_time in the audit report and check whether the time difference between the value and the current time exceeds 24 hours.

    • If yes, go to 5.
    • If no, contact technical support for assistance.

  5. Check the output of cinder show Volume ID in 3.

    • If the value of source_volid is not None, the volume is created from the source volume whose ID is the value of source_volid. In this case, go to 6.
    • If the value of snapshot_id is not None, the volume is created from the snapshot whose ID is the value of snapshot_id. In this case, go to 6.
    • If volume_image_metadata exists, the volume is created from the image. In this case, go to 10.

    For other scenarios, go to 6.

  6. Query the volume storage type, host, and management information about the storage device. For details, see Querying Volume Attributes.

    • If the storage type is dsware, go to 7.

  7. Log in to the host where the blockstorage-driver role is assigned based on 5 in Handling Invalid Volume Snapshots, and then run the following command to view volume information:

    fsc_cli --ip Management IP address of the compute node --manage_ip IP address of the DSWare management node --port 10519 --op queryVolume --volName Volume name on the storage device

    Volume name on the storage device is obtained from the vol_name field of os-vol-pro-location-attr:provider_location in the cinder show Volume ID command output in 3.

    For example, run the following command:

    fsc_cli --ip 172.29.6.6 --manage_ip 172.29.0.231 --port 10519 --op queryVolume --volName volume-6f2282f1-22b3-41f1-8b3f-d15aa9790388

    Information similar to the following is displayed:

    vol_name=volume-6f2282f1-22b3-41f1-8b3f-d15aa9790388,father_name=,status=0,vol_size=1024,real_size=-1,pool_id=0,create_time=2014-09-18 07:23:21  
    result=0

    Check the value of status in the command output.

    • If the value is 0, go to 8.
    • If the value is not 0, go to 12.

  8. Enter the secure operation mode based on 3 and run the following command to set the volume status to available:

    cinder reset-state --state available Volume ID

    For example, run the following command:

    cinder reset-state --state available 5f27b4fd-ef1b-4726-8252-c7c95b714f29

  9. Enter the secure operation mode based on 3 and run the following command to query the volume status:

    cinder show Volume ID

    In the command output, check whether the value of status is available.

    • If yes, no further action is required.
    • If no, contact technical support for assistance.

  10. Enter the secure operation mode based on 3 and run the following command to set the volume status to error:

    cinder reset-state --state error Volume ID

    For example, run the following command:

    cinder reset-state --state error 5f27b4fd-ef1b-4726-8252-c7c95b714f29

  11. Enter the secure operation mode based on 3 and run the following command to query the volume status:

    cinder show Volume ID

    In the command output, check whether the value of status is error.

    • If yes, no further action is required.
    • If no, contact technical support for assistance.

  12. Enter the secure operation mode based on 3 and run the following command to delete the volume:

    cinder force-delete Volume ID

    For example, run the following command:

    cinder force-delete 1cd5c6eb-e729-4773-b846-e9f1d3467c56

    If the command output contains ERROR, the volume fails to be deleted. Check whether the volume is deleted.

    • If yes, no further action is required.
    • If no, contact technical support for assistance.

Method 2

  1. Log in to the first controller node in the AZ. For details, see Using SSH to Log In to a Host.
  2. Import environment variables to the node. For details, see Importing Environment Variables.
  3. Perform the following operations on the node to query information about the volume:

    1. Run the following command to enter the secure operation mode:

      runsafe

      Information similar to the following is displayed:

      Input command:
    2. Run the following command to query information about the volume status:

      cinder show Volume ID

      Check whether the value of status in the command output is consistent with the volume state in the audit report.

      • If yes, go to 4.
      • If no, no further action is required.

  4. Enter the secure operation mode based on 3 and run the following command to set the volume status to available:

    cinder reset-state --state available Volume ID

    For example, run the following command:

    cinder reset-state --state available 5f27b4fd-ef1b-4726-8252-c7c95b714f29

  5. Enter the secure operation mode based on 3 and run the following command to query the volume status:

    cinder show Volume ID

    In the command output, check whether the value of status is available.

    • If yes, no further action is required.
    • If no, contact technical support for assistance.

  6. Enter the secure operation mode based on 3 and run the following command to delete the volume:

    cinder force-delete Volume ID

    For example, run the following command:

    cinder force-delete 1cd5c6eb-e729-4773-b846-e9f1d3467c56

    If the command output contains ERROR, the volume fails to be deleted. Check whether the volume is deleted.

    • If yes, no further action is required.
    • If no, contact technical support for assistance.

Method 3

  1. Log in to the first controller node in the AZ. For details, see Using SSH to Log In to a Host.
  2. Import environment variables to the node. For details, see Importing Environment Variables.
  3. Perform the following operations on the node to query information about the volume:

    1. Run the following command to enter the secure operation mode:

      runsafe

      Information similar to the following is displayed:

      Input command:
    2. Run the following command to query information about the volume:

      cinder show Volume ID

      Check whether the value of status in the command output is consistent with the volume state in the audit report.

      • If yes, go to 4.
      • If no, no further action is required.

  4. Check whether the volume is in the error_deleting state.

    • If yes, go to 6.
    • If no, go to 5.

  5. View the value of last_update_time in the audit report and check whether the time difference between the value and the current time exceeds 24 hours.

    • If yes, go to 6.
    • If no, contact technical support for assistance.

  6. Enter the secure operation mode based on 3 and run the following command to delete the volume:

    cinder force-delete Volume ID

    For example, run the following command:

    cinder force-delete 1cd5c6eb-e729-4773-b846-e9f1d3467c56

    If the command output contains ERROR, the volume fails to be deleted. Check whether the volume is deleted.

    • If yes, no further action is required.
    • If no, contact technical support for assistance.

Method 4

  1. Log in to the first controller node in the AZ. For details, see Using SSH to Log In to a Host.
  2. Import environment variables to the node. For details, see Importing Environment Variables.
  3. Perform the following operations on the node to query information about the volume:

    1. Run the following command to enter the secure operation mode:

      runsafe

      Information similar to the following is displayed:

      Input command:
    2. Run the following command to query the volume status:

      cinder show Volume ID

      Check whether the value of status in the command output is consistent with the volume state in the audit report.

      • If yes, go to 4.
      • If no, no further action is required.

  4. Check whether the volume is in the error_attaching or error_detaching state.

    • If yes, go to 6.
    • If no, go to 5.

  5. View the value of last_update_time in the audit report and check whether the time difference between the value and the current time exceeds 24 hours.

    • If yes, go to 6.
    • If no, contact technical support for assistance.

  6. Enter the secure operation mode based on 3 and run the following command to query the volume status:

    cinder show Volume ID

    In the command output, check whether the value of attachments is left blank.

    • If yes, go to 7.
    • If no, go to 10.

  7. Enter the secure operation mode based on 3 and run the following command to set the volume status to available:

    cinder reset-state --state available Volume ID

    For example, run the following command:

    cinder reset-state --state available 5f27b4fd-ef1b-4726-8252-c7c95b714f29

  8. Enter the secure operation mode based on 3 and run the following command to query the volume status:

    cinder show Volume ID

    In the command output, check whether the value of status is available.

    • If yes, go to 9.
    • If no, contact technical support for assistance.

  9. Log in to the host housing the active GaussDB or gaussdb_nova node based on Logging In to the Active GaussDB Node. Then run the following script to clear residual attachment information about the volume from the VM:

    sh /usr/bin/info-collect-script/audit_resume/delete_bdm.sh VM ID Volume ID

    Enter the database password twice as prompted in the script. If the input parameters are correct, the message "... doesn't exist in the block_device_mapping table" can be ignored. Then check whether the command output contains "...block_device_mapping failed".

    • If yes, contact technical support for assistance.
    • If no, go to 12.
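    For example, run the script with hypothetical VM and volume IDs:

      sh /usr/bin/info-collect-script/audit_resume/delete_bdm.sh 9d5f3cb2-690b-4725-a4e0-cfe96640fb37 8ce350f0-ebcd-4fce-8d14-2e797128ac74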

  10. Enter the secure operation mode based on 3 and run the following command to set the volume status to in-use:

    cinder reset-state --state in-use Volume ID

    For example, run the following command:

    cinder reset-state --state in-use 5f27b4fd-ef1b-4726-8252-c7c95b714f29

  11. Enter the secure operation mode based on 3 and run the following command to query the volume status:

    cinder show Volume ID

    In the command output, check whether the value of status is in-use.

    • If yes, no further action is required.
    • If no, contact technical support for assistance.

  12. Enter the secure operation mode based on 3 and run the following command to attach the volume:

    nova volume-attach VM ID Volume ID Mount point

    For example, run the following command:

    nova volume-attach 9d5f3cb2-690b-4725-a4e0-cfe96640fb37 8ce350f0-ebcd-4fce-8d14-2e797128ac74

    Check whether the command output contains ERROR. If it does, the volume fails to be attached.

    • If the command output does not contain ERROR, the volume is successfully attached. In this case, go to 13.
    • If "ERROR (CommandError): No server with a name or ID of 'XXXXX' exists." is displayed, the VM does not exist, and no further action is required. In other situations, contact technical support for assistance.

  13. Enter the secure operation mode based on 3 and run the following command to query the volume status:

    cinder show Volume ID

    In the command output, check whether the value of status is in-use.

    • If yes, no further action is required.
    • If no, contact technical support for assistance.

Method 5

  1. Log in to the first controller node in the AZ. For details, see Using SSH to Log In to a Host.
  2. Import environment variables to the node. For details, see Importing Environment Variables.
  3. Perform the following operations on the node to query information about the volume:

    1. Run the following command to enter the secure operation mode:

      runsafe

      Information similar to the following is displayed:

      Input command:
    2. Run the following command to query information about the volume:

      cinder show Volume ID

      Check whether the value of status in the command output is consistent with the volume state in the audit report.

      • If yes, go to 4.
      • If no, no further action is required.

  4. View the value of last_update_time in the audit report and check whether the time difference between the value and the current time exceeds 24 hours.

    • If yes, go to 5.
    • If no, contact technical support for assistance.

  5. Enter the secure operation mode based on 3 and run the following command to query the volume status:

    cinder show Volume ID

    In the command output, check whether the value of attachments is left blank.

    • If yes, go to 6.
    • If no, go to 8.

  6. Enter the secure operation mode based on 3 and run the following command to set the volume status to available:

    cinder reset-state --state available Volume ID

    For example, run the following command:

    cinder reset-state --state available 5f27b4fd-ef1b-4726-8252-c7c95b714f29

  7. Enter the secure operation mode based on 3 and run the following command to query the volume status:

    cinder show Volume ID

    In the command output, check whether the value of status is available.

    • If yes, no further action is required.
    • If no, contact technical support for assistance.

  8. Enter the secure operation mode based on 3 and run the following command to set the volume status to in-use:

    cinder reset-state --state in-use Volume ID

    For example, run the following command:

    cinder reset-state --state in-use 5f27b4fd-ef1b-4726-8252-c7c95b714f29

  9. Enter the secure operation mode based on 3 and run the following command to query the volume status:

    cinder show Volume ID

    In the command output, check whether the value of status is in-use.

    • If yes, no further action is required.
    • If no, contact technical support for assistance.

Method 6

  1. Log in to the first controller node in the AZ. For details, see Using SSH to Log In to a Host.
  2. Run the following command to query information about the volume status:

    cinder show Volume UUID

    NOTE:

    Volume UUID is the volume_id value in the audit report.

    Check whether the value of status in the command output is consistent with the volume state in the audit report.

    • If yes, go to 3.
    • If no, contact technical support for assistance.

  3. View the value of last_update_time in the audit report and check whether the time difference between the value and the current time exceeds 24 hours.

    • If yes, go to 4.
    • If no, contact technical support for assistance.

  4. Confirm with the tenant whether the volume status is to be changed.

    • If yes, go to 5.
    • If no, no further action is required.

  5. Run the following command to query the volume status:

    cinder show Volume UUID

    NOTE:

    Volume UUID is the volume_id value in the audit report.

    In the command output, check whether the value of attachments is left blank.

    • If yes, go to 6.
    • If no, go to 8.

  6. Run the following command to set the volume status to available:

    cinder reset-state --state available --reset-migration-status --attach-status detached Volume UUID

    NOTE:

    Volume UUID is the volume_id value in the audit report.
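
    For example, run the following command (the UUID is hypothetical):

      cinder reset-state --state available --reset-migration-status --attach-status detached 5f27b4fd-ef1b-4726-8252-c7c95b714f29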

  7. Run the following command to query the volume status:

    cinder show Volume UUID

    NOTE:

    Volume UUID is the volume_id value in the audit report.

    In the command output, check whether the value of status is available.

    • If yes, go to 10.
    • If no, contact technical support for assistance.

  8. Run the following command to set the volume status to in-use:

    cinder reset-state --state in-use --attach-status attached --reset-migration-status Volume UUID

    NOTE:

    Volume UUID is the volume_id value in the audit report.

  9. Run the following command to query the volume status:

    cinder show Volume UUID

    NOTE:

    Volume UUID indicates the UUID of the volume whose status needs to be reset.

    In the command output, check whether the value of status is in-use.

    • If yes, go to 10.
    • If no, contact technical support for assistance.

  10. Contact the user to change the disk type of the volume again.

Method 7

  1. Log in to any controller node in the AZ. For details, see Using SSH to Log In to a Host.
  2. Run the following command to query information about the volume status:

    cinder show Volume UUID

    NOTE:

    Volume UUID is the volume_id value in the audit report.

    Check whether the value of status in the command output is consistent with the volume state in the audit report.

    • If yes, go to 3.
    • If no, contact technical support for assistance.

  3. View the value of last_update_time in the audit report and check whether the time difference between the value and the current time exceeds 24 hours.

    • If yes, go to 4.
    • If no, contact technical support for assistance.

  4. Obtain the original volume of the reserved volume (the reserved volume is a copy of the original volume) and check whether services on the VM where the original volume resides are normal.

    Run the following command to obtain the ID of the original volume:

    cinder show uuid | grep description

    Check whether the command output contains "migration src for Original volume ID".

    • If yes, confirm with the user whether services on the VM where the original volume resides are normal. If VM services are normal, submit the migration task on the Service OM migration task page.
    • If no, contact technical support for assistance.

Handling Inconsistent Volume Attachment Information

Context

Volume attachment information includes the following:

  • Volume attachment status recorded in Cinder management data
  • Volume attachment status recorded in Nova management data
  • Volume-host attachment information recorded on storage devices
  • Volume-VM attachment information recorded in hypervisors

The system audits the preceding volume attachment information.

NOTE:

Invalid volumes that have already been handled in Invalid Volumes do not need to be handled here.

Parameter Description

The name of the audit report is VolumeAttachmentAudit.csv. Table 18-67 describes parameters in the report.

Table 18-67 Parameter description

  • volume_id: Specifies the volume ID (UUID).
  • volume_displayname: Specifies the name of the volume created by a tenant.
  • volume_type: Specifies the volume type.
  • location: Specifies the detailed volume attachment information recorded in the Cinder service, Nova service, hypervisors, or storage devices. Its values can be:

  • ATTACH_TO: indicates information recorded in Cinder management data about the VM to which the volume is attached. For example:

    'ATTACH_TO': [{'instance_id': u'e32d3e98-2d61-4652-b805-afccb7fbc592'}]

    instance_id specifies the VM UUID.

  • BELONG_TO: indicates information about the host to which the volume belongs.
  • HYPER_USE: indicates information about the VM using this volume in the hypervisor. For example:

    'HYPER_USE': [{'instance_name': u'instance-00000003', 'location': u'4709A23A-9340-1185-8567-000000821800'}]

    instance_name specifies the VM name, and location specifies the host to which the volume belongs.

  • MAP_TO: indicates information recorded on the storage device about the host to which the volume is mapped. For example:

    'MAP_TO': [{'location': u'68B81E2C-08BB-1170-8567-000000821800'}]

  • NOVA_USE: indicates information recorded in Nova management data about the VM to which the volume is attached. For example:

    'NOVA_USE': [{'instance_name': u'instance-00000004', 'instance_id': u'e32d3e98-2d61-4652-b805-afccb7fbc592'}]

  • attach_status: Specifies the volume attachment status. Its values can be:

  • management_status: Comparison result between attachment information in the Cinder service and the Nova service. match indicates that the information is consistent, and not_match indicates that information is inconsistent.
  • cinder_status: Comparison result between attachment information in the Cinder service and the storage device. match indicates that the information is consistent, and not_match indicates that information is inconsistent.
  • hyper_status: Comparison result between attachment information in the hypervisor and the storage device. match indicates that the information is consistent, and not_match indicates that information is inconsistent.

Impact on the System

  • Residual volume attachment information may reside on hosts.
  • Volume-related services may be affected. For example, if a volume has inconsistent attachment information recorded, FusionStorage Block may fail to create snapshots for the volume.

Possible Causes

  • The database is reverted using a data backup to the state when the backup was created. However, after the backup was created, one or more volumes were attached to VMs. After the database is restored, records of the volume attachment information are deleted from the database, but the information resides on the storage devices.
  • If a service operation fails and is rolled back, the rollback of volume-related information may also fail.

Procedure

  1. Handle the volume based on the volume states listed in Table 18-68. For other situations, contact technical support for assistance.

    Table 18-68 Volume attachment information handling methods

    Each entry lists the comparison results in the order management_status / cinder_status / hyper_status, followed by the possible scenario and the handling method.

    • not_match / not_match / not_match: N/A. See Method 9.
    • not_match / not_match / match: The volume is not recorded as attached in the Cinder service, but is recorded in the Nova service and on the VM. See Method 6.
    • not_match / match / not_match: The volume is recorded as attached in the Cinder service, but is not recorded in the Nova service or on the VM. For details, see Method 1.
    • not_match / match / match: Possible scenarios:
      1. The volume is recorded as attached in the Cinder service and on the VM, but is not recorded in the Nova service. For details, see Method 2.
      2. The volume is not recorded as attached in the Cinder service or on the VM, but is recorded in the Nova service. For details, see Method 5.
      3. The volume is recorded as attached in the Cinder service, but the VM to which it was attached in the Nova service and on the hypervisor has been deleted. For details, see Method 8.
    • match / not_match / not_match: Possible scenarios:
      1. The volume is attached to an orphan VM. For details, see Orphan VMs.
      2. The volume is recorded as attached in the Cinder service, the Nova service, and on the VM, but is not recorded on the host. For details, see Method 3 or Method 7.
    • match / not_match / match: The volume is not recorded as attached in the Cinder service but is recorded on the host. For details, see Method 3.
    • match / match / not_match: The volume is recorded as attached in the Cinder service and the Nova service, but is not recorded on the VM. For details, see Method 4.

Method 1

  1. Log in to the first controller node in the AZ. For details, see Using SSH to Log In to a Host.
  2. Import environment variables. For details, see Importing Environment Variables.
  3. Perform the following operations to query the VM attributes:

    1. Run the following command to enter the secure operation mode:

      runsafe

      The following information is displayed:

      Input command:
    2. Run the following command to query the attachment information about the volume:

      cinder show uuid

      In the command output:

      The value of attachments may contain multiple records. Locate the record in which the value of server_id is the faulty VM ID and obtain the value of attachment_id, which is required for detaching the volume.

  4. Run the following commands to clear the attachment information about the volume:

    token=`openstack token issue | awk '/id/{print $4}' | awk '/id/{print $1}'`

    TENANT_ID=`openstack project list | grep $OS_PROJECT_NAME | awk '{print $2}'`

    sh /usr/bin/info-collect-script/audit_resume/cinder_cmd.sh --operate detach --vol_id uuid --attachment_id attachment_id --os-token "$token" --tenant_id $TENANT_ID

    Check whether the command output contains SUCCESS.

    • If yes, no further action is required.
    • If no, contact technical support for assistance.
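    For example, run the commands with hypothetical volume and attachment IDs:

      token=`openstack token issue | awk '/id/{print $4}' | awk '/id/{print $1}'`
      TENANT_ID=`openstack project list | grep $OS_PROJECT_NAME | awk '{print $2}'`
      sh /usr/bin/info-collect-script/audit_resume/cinder_cmd.sh --operate detach --vol_id 044e14af-9d11-4ee9-9b5a-0dcbcd5033aa --attachment_id 1cd5c6eb-e729-4773-b846-e9f1d3467c56 --os-token "$token" --tenant_id $TENANT_ID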

Method 2

  1. Log in to the first controller node in the AZ. For details, see Using SSH to Log In to a Host.
  2. Import environment variables. For details, see Importing Environment Variables.
  3. Perform the following operations to query the VM attributes:

    1. Run the following command to enter the secure operation mode:

      runsafe

      The following information is displayed:

      Input command:
    2. Run the following command to query the attachment information about the volume:

      cinder show uuid

      In the command output:

      The value of attachments may contain multiple records. Locate the record in which the value of server_id is the faulty VM ID and obtain the value of attachment_id, which is required for detaching the volume.

  4. Run the following command to clear the attachment information about the volume:

    cinder volume-detach [--attachment_uuid <attachment_id>] <uuid>

    Run the cinder show command to check whether the volume is successfully detached.

    • If yes, go to 5.
    • If no, contact technical support for assistance.

  5. Run the runsafe command to enter the secure operation mode, enter the user password, and run the following command to attach the volume to the specified VM:

    nova volume-attach vm-uuid uuid

    Check whether the volume is successfully attached.

    • If yes, no further action is required.
    • If no, contact technical support for assistance.
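    A sketch of the full Method 2 sequence with hypothetical IDs:

      cinder volume-detach --attachment_uuid 1cd5c6eb-e729-4773-b846-e9f1d3467c56 8ce350f0-ebcd-4fce-8d14-2e797128ac74
      cinder show 8ce350f0-ebcd-4fce-8d14-2e797128ac74
      nova volume-attach 9d5f3cb2-690b-4725-a4e0-cfe96640fb37 8ce350f0-ebcd-4fce-8d14-2e797128ac74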

Method 3

  1. Obtain the volume attributes. For details, see Querying Volume Attributes.
  2. Check whether the host records attachment information of the volume based on MAP_TO in the audit results.

    • If yes, go to 3.
    • If no, go to 4.

  3. Detach the volume based on its status.

    Run the following command to check whether the volume is in the in-use state:

    cinder show uuid

    • If yes, contact technical support for assistance.
    • If no, perform the following operation.
      • dsware: run the following command to detach the volume.

        vbs_cli -c detachwithip -v Volume name on the storage device -i dsware_manage_ip -p 0

      • san or v3: perform the following steps to detach the volume from the host.
        1. Run the following command to perform the hash operation on the location field in HYPER_USE:

        python -c "print hash('2046B2B1-4D27-E711-8084-A08CF81DF81E')"

        Information similar to the following is displayed:

        8697324651793178011

        Then run the cinder show volume_id|grep lun command to obtain the value of lun_id.

        Information similar to the following is displayed:

        |os-volume-replication:driver_data|{"ip": "10.31.4.54", "ESN": "fa5a871550301236", "vol_name":"volume-705dfb73-aeb4-4d95-b2de-f662112527f5", "pool": 0, "lun_id": "150"} 
        2. Log in to OceanStor DeviceManager, choose Provisioning > Host, and select the host identified in 1. Then, click Properties and choose Owning Host Group to find the name of the host group to which the host belongs.

        3. Choose Provisioning > Host > Host Group, select the host group obtained in 2, and click Properties to view the mapping view of the host group.

        4. Choose Provisioning > Mapping View, select the mapping view obtained in 3, and view the corresponding LUN group.

        5. Choose Provisioning > LUN > LUN Group, select the LUN group obtained in 4, view its LUNs, and search for the LUN ID obtained in 1. (Click the icon in the upper right corner to set the information items to be displayed; the LUN ID then appears in the list.) Select the target LUN and click Remove to delete the mapping between the LUN and the host.

  4. Attach the volume based on its status.

    • dsware: run the following command to attach the volume.

      vbs_cli -c attachwithip -v Volume name on the storage device -i dsware_manage_ip -p 0

    • san or v3: perform the following steps to attach the volume to the host.
      1. Run the following command to perform the hash operation on the location field in HYPER_USE.

      python -c "print hash('2046B2B1-4D27-E711-8084-A08CF81DF81E')"

      Information similar to the following is displayed:

      8697324651793178011

      Run the cinder show volume_id|grep lun command to obtain the value of lun_id.

      Information similar to the following is displayed:

      |os-volume-replication:driver_data|{"ip": "10.31.4.54", "ESN": "fa5a871550301236", "vol_name":"volume-705dfb73-aeb4-4d95-b2de-f662112527f5", "pool": 0, "lun_id": "150"} 
      2. Log in to OceanStor DeviceManager, choose Provisioning > Host, and select the host obtained in 1. Then, click Properties and choose Owning Host Group to find the name of the host group to which the host belongs.

      3. Choose Provisioning > Host > Host Group, select the host group obtained in 2, and click Properties to view the mapping view of the host group.

      4. Choose Provisioning > Mapping View, select the mapping view obtained in 3, and view the corresponding LUN group.

      5. Choose Provisioning > LUN > LUN Group, select the target LUN group obtained in 4, click Add Object, and add the LUN to the LUN group based on the LUN ID obtained in 1 to complete the mapping between the LUN and the host.

      6. Log in to the FusionSphere OpenStack host and run the hot_add command to scan for disks to ensure that the host can use the volume.

Method 4

  1. Log in to the first controller node in the AZ. For details, see Using SSH to Log In to a Host.
  2. Import environment variables. For details, see Importing Environment Variables.
  3. Contact technical support engineers to check whether the volume can be detached from the VM.

    If there is no risk, perform the following operations to detach the volume from the VM:
    1. Run the following command to enter the secure operation mode:

      runsafe

      The following information is displayed:

      Input command:
    2. Run the following command to detach the volume from the VM:

      nova volume-detach vm-uuid uuid

      Check whether the volume is successfully detached.

      • If yes, no further action is required.
      • If no, contact technical support for assistance.

Method 5

  1. Log in to the first controller node in the AZ. For details, see Using SSH to Log In to a Host.
  2. Import environment variables. For details, see Importing Environment Variables.
  3. Run the following command to query the VM status:

    nova show VM ID | grep OS-SRV-USG:launched_at

    Check whether time information is displayed in the command output.

    • If yes, go to 4.
    • If no, the VM is an invalid VM. In this case, ensure that the invalid VM has been properly deleted. For details, see Invalid VMs.

  4. Log in to the host housing the active GaussDB or gaussdb_nova node according to Logging In to the Active GaussDB Node. Then run the following script to clear the residual attachment information of the volume from the VM:

    sh /usr/bin/info-collect-script/audit_resume/delete_bdm.sh VM ID Volume ID

    Enter the database password twice as prompted in the script and check whether success is displayed in the command output.

    • If yes, no further action is required.
    • If no, contact technical support for assistance.

Method 6

  1. Log in to the first controller node in the AZ. For details, see Using SSH to Log In to a Host.
  2. Import environment variables. For details, see Importing Environment Variables.
  3. Run the following command to query the host accommodating the VM:

    nova show VM ID | grep OS-EXT-SRV-ATTR:host

    • If no host information is displayed, contact technical support for assistance.
    • If host information is displayed, go to 4.

  4. Run the following command to query the VM instance name:

    nova show VM ID | grep OS-EXT-SRV-ATTR:instance_name

    • If no instance name information is displayed, contact technical support for assistance.
    • If instance name information is displayed, go to 5.

  5. Run the following command to query the device information of the VM volume:

    nova volume-attachments VM ID

    • If no device information is displayed, contact technical support for assistance.
    • If device information is displayed, go to 6.

  6. Run the following command to query volume information:

    cinder list --all-t

    Check whether the volumes attached to the VM have the VM information in the command output. Record all the volumes that lack the VM information, and record the associated device information based on the command output of 5 (see the sketch after this list).

    • If all the attached volumes have the VM information, no further action is required.
    • If a residual volume exists, go to 7.
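
    Conceptually, 5 and 6 amount to a set comparison, as the following minimal sketch shows (the IDs are illustrative; fill the two sets in from the actual command outputs):

    # Sketch: volumes reported as attached by 'nova volume-attachments' (5)
    # but whose cinder records (6) do not reference the VM are residual.
    attached_per_nova = set(['608a83de-7749-4014-a6e9-a683df2bd849'])  # from 5
    referencing_vm_in_cinder = set([])                                 # from 6

    residual = attached_per_nova - referencing_vm_in_cinder
    print(residual)  # these volumes need the cleanup in 7 to 9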

  7. Log in to the host accommodating the VM and run the following command to clear the residual information at the virtualization layer:

    nova_virsh_cmd virsh-detach-disk VM instance name

    Check whether the command output contains success.
    • If yes, go to 8.
    • If no, contact technical support for assistance.

  8. Run the following command to clear the residual volume attachment information of the VM:

    sh /usr/bin/info-collect-script/audit_resume/delete_bdm.sh VM ID Volume ID

    Enter the database password twice as prompted in the script and check whether success is displayed in the command output.

    • If yes, go to 9.
    • If no, contact technical support for assistance.

  9. Run the following command to attach the volume to the VM:

    nova volume-attach vm-uuid volume-uuid

    Check whether the volume is successfully attached.

    • If yes, no further action is required.
    • If no, contact technical support for assistance.

Method 7

NOTE:

If inconsistencies are found during a system audit, after the alarms generated for the Nova and Cinder components are cleared and the disk array has recovered, wait for one processing period (10 minutes), and then perform a manual audit or wait for the next round of the system audit to complete. If the audit alarms are still not cleared, perform the steps in this section.

  1. Log in to any host in the AZ. For details, see Using SSH to Log In to a Host.
  2. Import environment variables. For details, see Importing Environment Variables.
  3. Based on the hosts displayed in HYPER_USE, identify the redundant host in MAP_TO and thereby identify the correct host. Example:

    volume_id,volume_displayname,volume_type,location,attach_status
    6120ca56-12e0-4835-a0fb-1fab336bfd8f,bl_sys,v3,"{'ATTACH_TO': [{'instance_id': u'910f2975-f2f9-4376-8809-ce1d56527dba'}], 'BELONG_TO': u'cinder@IPSAN_V3', 'HYPER_USE': [{'instance_name': u'instance-00000002', 'location': u'8B13124C-7C15-11CF-8567-000000821800'}], 'MAP_TO': [{'location': u'7913694372495436396'}, {'location': u'8031912861936646597'}], 'NOVA_USE': [{'instance_name': u'instance-00000002', 'instance_id': u'910f2975-f2f9-4376-8809-ce1d56527dba'}]}","{'management_status': 'match', 'cinder_status': 'not_match', 'hyper_status': 'not_match'}"

    Run the following command to obtain the hash of the location field in HYPER_USE.

    python -c "print hash('8B13124C-7C15-11CF-8567-000000821800')",

    The result is as follows:

    7913694372495436396

    Compare the hash result with the locations in MAP_TO: the correct host is 7913694372495436396, and the redundant one is 8031912861936646597.
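
    The comparison can also be scripted. The following is a minimal sketch using the values from the audit record above, assuming the same Python 2 interpreter as the python -c command (Python 3 randomizes string hashes, so its results would not match):

    # Sketch: identify the redundant MAP_TO location for the record above.
    hyper_use_location = '8B13124C-7C15-11CF-8567-000000821800'
    map_to_locations = ['7913694372495436396', '8031912861936646597']

    correct = str(hash(hyper_use_location))
    redundant = [loc for loc in map_to_locations if loc != correct]
    print('correct host: %s, redundant: %s' % (correct, redundant))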

  4. Run the following command to check whether the VM's host is the location recorded in the audit report:

    nova show uuid

    uuid indicates instance_id in the audit report. For example,

    nova show 910f2975-f2f9-4376-8809-ce1d56527dba

    The result is as follows:

    +---------------------------------+---------------------------------------------------+
    |  Property                       |  Value                                            |
    +---------------------------------+---------------------------------------------------+
    |  OS-DCF:diskConfig              |  MANUAL                                           |
    |  OS-EXT-Az:availability_zone    |  az1.dc1                                          |
    |  OS-EXT-SRV-ATTR:host           |  8B13124C-7C15-11CF-8567-000000821800             |
    |  OS-EXT-SRV-ATTR:hostname       |  bl-vm                                            |

    Check whether the value of host is the same as the location in HYPER_USE in 3.

    • If yes, go to 5.
    • If no, contact technical support for assistance.

  5. Log in to the disk array management page and locate the LUN ID of the volume.

    Run the following command on controller nodes to convert the volume ID to LUN ID based on the volume type:

    1. volume_type: v3

      python -c "from cinder.volume.drivers.huawei.huawei_driver import huawei_utils;print huawei_utils.encode_name('volume_id')"

    2. volume_type: san

      python

      hash('volume-volume_id')
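
    For the san type, the conversion can equivalently be run in one step. The following is a minimal sketch with an illustrative volume ID (taken from the example output earlier in this section), assuming the same Python 2 interpreter used by the other python -c commands in this guide (Python 3 randomizes string hashes, so its results would not match the values on the array):

      # Sketch: compute the value used to locate the LUN of a san-type volume.
      volume_id = '705dfb73-aeb4-4d95-b2de-f662112527f5'  # illustrative
      print(hash('volume-' + volume_id))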

  6. Check whether the V3 disk array is used.

    • If yes, go to 7.
    • If no, the host in 3 is the correct host recorded in MAP_TO. Log in to the disk array and remove the mapping between the extra host recorded in MAP_TO and the LUN. No further action is required.

  7. Log in to the disk array and query the residual host ID.

    Log in to the v3 disk array and choose Provisioning > Host. Locate the name of the host group to which the extra host belongs.

  8. Identify the mapping between the host group and LUN group.

    For example, the host group name obtained in 7 is OpenStack_HostGroup_7. Choose Provisioning > Host > Host Group and check the mapping view; the mapping view that corresponds to the host group is OpenStack_Mapping_View_7. Choose Provisioning > Mapping View and check the mapping view; the LUN group that corresponds to OpenStack_Mapping_View_7 is OpenStack_LunGroup_7.

  9. Remove the mapping between the extra host and LUN.

    Choose Provisioning > LUN > LUN Group and locate the LUN group obtained in 8. Select the LUN group and click Remove Object. In the search box, enter the volume name obtained in 5 and click Search. Select the obtained LUN and remove it.

Method 8

  1. Log in to the host accommodating the first controller node in the AZ. For details, see Using SSH to Log In to a Host.
  2. Import environment variables. For details, see Importing Environment Variables.
  3. Run the following command to query volume attachment information.

    cinder show volume_id

    • If the volume status is in-use and shareable is False, perform 5.
    • If the volume status is in-use and shareable is True, perform 4.

  4. In the output obtained in 3, attachments may contain multiple server_id values.

    Check whether there is only one server_id (see the sketch after this list).

    • If yes, go to 5.
    • If no, go to 7.
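
    The check can also be done programmatically. The following is a minimal sketch; the IDs are illustrative, and attachments_text stands for the attachments value from the output of 3:

    # Sketch: count the distinct server_id values in 'attachments'.
    from ast import literal_eval

    attachments_text = ("[{'server_id': 'f2891596-36d3-48b3-8b9e-635bbf33b796'}, "
                        "{'server_id': '910f2975-f2f9-4376-8809-ce1d56527dba'}]")
    server_ids = set(r['server_id'] for r in literal_eval(attachments_text))
    print(len(server_ids))  # 1: go to 5; more than 1: go to 7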

  5. Run the following command to clear volume attachment information.

    cinder reset-state --attach-status detached volume_id

  6. Run the following command to query volume attachment information.

    cinder show volume_id

    In the output, check whether the volume status is available.

    • If yes, no further action is required.
    • If no, contact technical support for assistance.

  7. Log in to the node that hosts the database. For details, see Logging In to a Host Running the Database Service in HUAWEI CLOUD Stack 6.5.0 Troubleshooting Guide.
  8. Perform the following steps to log in to the Cinder database.

    1. Run the following command to switch to the database account.

      su gaussdba

    2. Run the following command to log in to the Cinder database.

      gsql cinder

      Default password: FusionSphere123

  9. Run the following command to clear volume attachment information.

    UPDATE VOLUME_ATTACHMENT SET ATTACH_STATUS='detached' WHERE INSTANCE_UUID='server_id' AND VOLUME_ID='volume_id' AND ATTACH_STATUS='attached';

    In the command, replace server_id with the ID of the deleted VM and volume_id with the ID of the volume.

  10. Go back to the controller node, and run the following command to query volume attachment information.

    cinder show volume_id

    In the output, check whether attachments contain server_id of the deleted VM.

    • If yes, contact technical support for assistance.
    • If no, no further action is required.

Method 9

  1. Log in to the host accommodating the first controller node in the AZ. For details, see Using SSH to Log In to a Host.
  2. Import environment variables. For details, see Importing Environment Variables.
  3. Run the following command to query the volume details:

    cinder show uuid

    Information similar to the following is displayed:

    GNC-RN01-SRN01-HOST01:~ # cinder show 608a83de-7749-4014-a6e9-a683df2bd849
    +-------------------+------------------------------------------------------------------+
    | Property          | Value                                                            |
    +-------------------+------------------------------------------------------------------+
    | attachments       | [{'server_id': 'f2891596-36d3-48b3-8b9e-635bbf33b796', 'attachment_id': '54820174-57cb-4291-8968-9f0c68d70ad6', 'attached_at': '2018-12-13T10:03:28.569558', 'host_name': None, 'volume_id': '608a83de-7749-4014-a6e9-a683df2bd849', 'device': '/dev/vda', 'id': '608a83de-7749-4014-a6e9-a683df2bd849'}] |
    | availability_zone | az1.gncdc1                                                       |
    | bootable          | true                                                             |

    Check the value of bootable.

    • If the value is true, the volume is a system disk of the VM.
    • If the value is false, the volume is a data disk of the VM.

  4. Based on server_id, obtained in 3, of the VM to which the volume is attached, run the following command to check whether the VM has been deleted:

    nova instance-action-list server_id

    Information similar to the following is displayed:

    GNC-RN01-SRN01-HOST01:~ # nova instance-action-list f2891596-36d3-48b3-8b9e-635bbf33b796
    +------------+------------------------------------------+---------+----------------------------+
    | Action     | Request_ID                               | Message | Start_Time                 |
    +------------+------------------------------------------+---------+----------------------------+
    | create     | req-6aed42ea-69d7-484b-9060-7b3db2c74ee4 | -       | 2018-12-13T10:03:20.221384 |
    | reboot     | req-613bca14-59c0-4dc6-92cb-7c17d89418fc | -       | 2019-01-17T06:51:32.423814 |
    | reboot     | req-e50dde15-2d8a-493b-a3dc-e782188d1101 | -       | 2019-01-18T04:14:31.036272 |
    | reschedule | req-62d463a0-b21d-4d16-9998-3a036885e6e9 | -       | 2019-01-18T04:18:38.157223 |
    | reschedule | req-a82ec634-f484-4f6a-8dd7-b4d6858e79ec | -       | 2019-01-18T04:48:48.389963 |
    | reschedule | req-b221f108-c2ca-4aa3-90cb-fc2aef677d13 | -       | 2019-01-18T05:18:47.891362 |
    | reschedule | req-788d648b-f3b8-4460-a9d7-442fc5a9b5cc | -       | 2019-01-18T05:43:27.167151 |
    | reschedule | req-75230bb1-f00f-46bd-b8da-e0d2f53b7c80 | -       | 2019-01-18T06:13:18.024269 |
    | reschedule | req-751b95b2-88e6-4882-b8be-0bc57a670452 | -       | 2019-01-18T06:38:40.423816 |
    | reschedule | req-a0c99c91-d01b-4112-8f09-90b686a5e30c | -       | 2019-01-18T07:03:31.775610 |
    | stop       | req-2bbbf4df-460a-49b8-97d5-3c808f23cc88 | -       | 2019-01-18T07:31:09.810141 |
    | reschedule | req-cffe4c6d-91ab-4939-8a74-4d798ab443d4 | -       | 2019-01-18T07:34:32.334318 |
    | reschedule | req-278d4de1-af3a-4ed0-bf12-916ed82a0a3f | -       | 2019-01-18T07:54:47.257292 |
    | reschedule | req-e103d7dc-59cd-4563-9520-39effd410015 | -       | 2019-01-18T08:19:30.707745 |
    | reschedule | req-e270f850-7d1a-4983-9c42-0c3ab7dcc818 | -       | 2019-01-18T08:49:21.560663 |
    | reschedule | req-da522b9e-a1ba-40f6-a6e2-ea7262c6a22c | -       | 2019-01-18T09:09:20.646063 |
    | delete     | req-66c48553-3338-478e-8965-7d878c08ac03 | -       | 2019-01-18T09:34:49.506238 |
    +------------+------------------------------------------+---------+----------------------------+
    • If the last operation record of the VM is the deletion operation and the residual volume is the system disk of the VM, the volume can be deleted.
    • If the residual volume is the data disk of the VM, confirm with the user whether the volume can be deleted.

  5. Run the following commands to reset the volume status to available and delete the volume:

    cinder reset-state --state available --attach-status detached volume_id

    cinder delete volume_id

    Information similar to the following is displayed:

    GNC-RN01-SRN01-HOST01:~ # cinder reset-state --state available --attach-status detached 608a83de-7749-4014-a6e9-a683df2bd849
    GNC-RN01-SRN01-HOST01:~ # cinder list --all-t | grep 608
    | 608a83de-7749-4014-a6e9-a683df2bd849 | f18d48fa0063461e8cac98a7231308b6 | available | vnfm_volume_4c3b8544-6c23-41c0-851b-de1b2f89cc1a | 5 | VolumeService01 | true | False | |
    GNC-RN01-SRN01-HOST01:~ # cinder delete 608a83de-7749-4014-a6e9-a683df2bd849
    Request to delete volume 608a83de-7749-4014-a6e9-a683df2bd849 has been accepted

    Check whether the command output contains "Request to delete volume volume_id has been accepted".

    • If yes, the volume has been deleted. No further action is required.
    • If no, contact technical support for assistance.

Detecting and Deleting nova-novncproxy Zombie Processes

Context

The Nova novncproxy service may generate zombie processes due to defects in the websockify module or the Python version. The probability of this issue is very low. To improve system stability, the system audits and automatically clears these zombie processes.

Parameter Description

The audit configuration item is max_zombie_process_num, which is stored in the /etc/info-collect.conf file on the novncproxy-deployed node. The configuration item specifies the threshold for automatically clearing zombie processes. The default value is 10.

  • The system automatically clears these zombie processes only when the number of zombie processes on a compute node exceeds the threshold.
  • If the threshold is set to -1, the system does not clear zombie processes.
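
The check that the audit performs can be approximated with standard tools. The following is an illustrative sketch only, not the actual audit implementation: it counts nova-novncproxy processes in the zombie (Z) state and compares the count against the threshold.

  # Illustrative sketch: count zombie nova-novncproxy processes.
  import subprocess

  MAX_ZOMBIE_PROCESS_NUM = 10  # default threshold in /etc/info-collect.conf

  p = subprocess.Popen(['ps', '-eo', 'stat,comm'], stdout=subprocess.PIPE)
  out = p.communicate()[0].decode('ascii', 'replace')
  zombies = [line for line in out.splitlines()
             if line.lstrip().startswith('Z') and 'nova-novncproxy' in line]
  if len(zombies) > MAX_ZOMBIE_PROCESS_NUM:
      print('threshold exceeded: the audit would restart nova-novncproxy')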

The name of the audit report is zombie_process_hosts.csv. Table 18-69 describes parameters in the report.

Table 18-69 Parameter description

Parameter

Description

host

Specifies the compute node name.

zombieprocess

Specifies the number of zombie processes detected on the node.

is restart

Specifies whether zombie processes have been cleared. The default value is True.

Impact on the System

  • Excessive zombie processes may deteriorate the system performance.
  • After a zombie process is deleted, the nova-novncproxy service restarts, which interrupts in-use novnc services.

Possible Causes

  • The websockify module used by the nova-novncproxy service is defective.
  • Python 2.6 is defective.

Procedure

No operation is required. The system automatically clears excessive zombie processes based on the specified threshold.

NOTE:

Before the system automatically clears a zombie process, the zombie process is re-parented to process 1. Therefore, the clearing does not take effect immediately.

Detecting and Deleting Residual Cold Migration Data

Context

FusionSphere OpenStack stores VM cold migration information in the database and will automatically delete it after the migration confirmation or rollback. However, if an exception occurs, residual information is not deleted from the database.

Parameter Description

The name of the audit report is cold_cleaned.csv. Table 18-70 describes parameters in the report.

Table 18-70 Parameter description

Parameter

Description

instance_uuid

Specifies the universally unique identifier (UUID) of the VM that is cold migrated.

Impact on the System

  • This issue incurs a higher quota usage than the actual usage.
  • This issue adversely affects the execution and resource usage of subsequent VM cold migrations.

Possible Causes

  • The nova-compute service is restarted during the migration.
  • The VM status is reset after the migration.

Procedure

No manual operations are required.

Detecting and Deleting Residual Live Migration Data

Context

FusionSphere OpenStack stores VM live migration information in the database and will automatically delete it after the migration completion or rollback. However, if an exception occurs, residual information is not deleted from the database.

Parameter Description

The name of the audit report is live_cleaned.csv. Table 18-71 describes parameters in the report.

Table 18-71 Parameter description

Parameter

Description

instance_uuid

Specifies the universally unique identifier (UUID) of the VM that is live migrated.

Impact on the System

This issue adversely affects resource usages of subsequent VM live migrations.

Possible Causes

The nova-compute service is restarted during the migration.

Procedure

No manual operations are required.

Handling the Intermediate State of the Cold Migration

Context

FusionSphere OpenStack stores VM cold migration information in the database. If the source node is restarted during the migration confirmation, the cold migration may be stuck in the intermediate state.

Parameter Description

The name of the audit report is cold_stuck.csv. Table 18-72 describes parameters in the report.

Table 18-72 Parameter description

Parameter

Description

instance_uuid

Specifies the universally unique identifier (UUID) of the VM that is cold migrated.

migration_id

Specifies the ID of the cold migration record.

migration_updated

Specifies the time when the migration is confirmed.

instance_updated

Specifies the time when the VM information is updated.

migration_status

Specifies the VM migration task status.

Impact on the System

  • Maintenance operations cannot be performed on the VM.

Possible Causes

  • The nova-compute service on the source node is restarted during the cold migration.
  • Network exceptions cause packet loss.

Procedure

  1. Log in to the first host in the AZ. For details, see Using SSH to Log In to a Host.
  2. Import environment variables. For details, see Importing Environment Variables.
  3. Perform the following operations to query the VM attributes:

    1. Run the following command to enter the secure operation mode:

      runsafe

      The following information is displayed:

      Input command:
    2. Check the migration_status value in the audit report. If the value is reverting, perform 3.c to 3.e. Otherwise, perform 3.c and then go to 3.e.
    3. Run the following commands to clear the intermediate state of the cold migration:

      python /usr/bin/info-collect-script/audit_resume/clean_stuck_migration.py instance_uuid migration_id

      instance_uuid and migration_id can be obtained from the audit report.
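
      For example, with illustrative placeholder values (replace them with the actual IDs from the audit report):

      python /usr/bin/info-collect-script/audit_resume/clean_stuck_migration.py 910f2975-f2f9-4376-8809-ce1d56527dba 25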

    4. Run the following command to migrate the VM stuck in the reverting state back to the original host:

      nova resize-revert instance_uuid

    5. Run the following command to query the VM status:

      nova show uuid

      Check whether the VM changes to the active state.

      • If yes, no further action is required.
      • If no, contact technical support for assistance.

Handling Abnormal Hosts That Adversely Affect Cold Migrated VMs

Context

A VM is running on a host. If the host becomes faulty, VM services will be interrupted. In addition, if the source host becomes faulty during a VM cold migration, the cold migration will be adversely affected. Conduct an audit to detect the cold migrated VMs that are adversely affected by faulty hosts in the system.

Parameter Description

The name of the audit report is host_invalid_migration.csv. Table 18-73 describes parameters in the report.

Table 18-73 Parameter description

Parameter

Description

id

Specifies the ID of the cold migration record.

instance_uuid

Specifies the universally unique identifier (UUID) of the VM that is cold migrated.

source_compute

Specifies the source host in the cold migration.

source_host_state

Specifies the status of the source host.

Impact on the System

Maintenance operations cannot be performed on the VM.

Possible Causes

  • The source host is powered off.
  • The compute role of the source host is deleted.
  • The nova-compute service on the source host runs improperly.

Before handling the audit result, ensure that no service exception alarm has been generated in the system. If any host becomes faulty, replace the host by performing operations provided in section Replacing Hosts and Accessories from HUAWEI CLOUD Stack 6.5.0 Parts Replacement. To delete a host, perform operations provided in section Deleting a Host from an AZ from HUAWEI CLOUD Stack 6.5.0 O&M Guide.

Procedure

  1. Log in to the first host in the AZ. For details, see Using SSH to Log In to a Host.
  2. Import environment variables. For details, see Importing Environment Variables.
  3. Perform the following operations to restore the VM status:

    1. Run the following command to enter the secure operation mode:

      runsafe

      The following information is displayed:

      Input command:
    2. Run the following command to query hosts:

      cps host-list

      Check whether the command output contains a host whose ID is the same as the source_compute value in the audit report.

      • If yes, go to the next step.
      • If no, go to 4.
    3. Locate the host whose status value is fault in the command output and check whether its services can be restored.
      • If yes, restore the services and perform the audit again.
      • If no, go to 4.
    4. Run the following command to query hosts:

      cps host-list

      Locate the host whose ID is the same as the source_compute value in the audit report and check whether the host has the compute role assigned.

      • If yes, the nova-compute service is normal and operations cannot be performed for the VM. Contact technical support for assistance.
      • If no, go to 4.

  4. Perform the following operations to clear unmaintainable cold migration information:

    1. Run the following command to delete residual cold migration data:

      python /usr/bin/info-collect-script/audit_resume/clean_stuck_migration.py instance_uuid id

      instance_uuid and id can be obtained from the audit report.

    2. Run the following command to query the VM status:

      nova show uuid

      Check whether the VM changes to the active state.

      • If yes, no further action is required.
      • If no, contact technical support for assistance.

Handling Redundant Neutron Namespaces

Context

In centralized DHCP scenarios, a network has been deleted, but its DHCP namespace still exists. This namespace is known as a redundant one. In distributed DHCP scenarios, the namespace of a network on a node is redundant if the node does not contain a port for the network.

After the user confirms that a DHCP namespace is redundant, restart the neutron-dhcp-agent to delete the namespace.

In centralized router scenarios, a router has been deleted, but its namespace still exists. This namespace is known as a redundant one. In distributed router scenarios, the namespace of a router on a node is redundant if all subnets of the router do not have VMs on the node.
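
The procedure below repeatedly maps a namespace name back to the ID of its Neutron resource by stripping the namespace prefix (qdhcp-<network_id> for DHCP namespaces, qrouter-<router_id> for router namespaces). A minimal sketch of that mapping:

  # Sketch: derive the Neutron resource ID from a namespace name.
  def resource_id(namespace_id):
      for prefix in ('qdhcp-', 'qrouter-'):
          if namespace_id.startswith(prefix):
              return namespace_id[len(prefix):]
      return None  # not a DHCP or router namespace

  print(resource_id('qdhcp-9c4c4872-af61-4fe0-9148-04324233a5e9'))
  # -> 9c4c4872-af61-4fe0-9148-04324233a5e9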

Parameter Description

The name of the audit report is redundant_namespaces.csv. Table 18-74 describes parameters in the report.

Table 18-74 Parameter description

Parameter

Description

host_id

Specifies the universally unique identifier (UUID) of the node with redundant namespaces.

namespace_list

Specifies the list of redundant namespaces.

Possible Causes

When networks are deleted in batches, the RPC messages consumed by dhcp-agent are processed serially, so messages easily pile up in the message queue. If dhcp-agent is disconnected from RabbitMQ in this case, the RPC broadcast messages are lost, causing the DHCP namespaces of some networks to fail to be deleted.

Impact on the System

The system contains residual DHCP or router namespaces.

Procedure

  1. Log in to the first controller node in the AZ. For details, see Using SSH to Log In to a Host.
  2. Import environment variables to the node. For details, see Importing Environment Variables.

    Perform the following operations to query the DHCP or router deployment mode:
    1. Enter the secure operation mode. The following information is displayed:
      Input command:
    2. Query the DHCP or router deployment mode.
      1. To query the DHCP deployment mode

        Run the following command:

        cps template-params-show --service neutron neutron-server|grep dhcp_distributed

        If True is displayed in the command output, distributed DHCP is used. Otherwise, centralized DHCP is used.

      2. To query the router deployment mode

        Run the following command:

        cps template-params-show --service neutron neutron-openvswitch-agent|grep enable_distributed_routing

        If True is displayed in the command output, distributed router is used. Otherwise, centralized router is used.

        • In centralized DHCP scenarios, perform 3 to 6.
        • In distributed DHCP scenarios, perform 7 to 9.
        • In centralized router scenarios, perform 10 to 12.
        • In distributed router scenarios, perform 13 to 16.

  3. Determine the node containing a redundant Neutron namespace based on host_id in the audit report. Then log in to the node and import environment variables by repeating 1 and 2.
  4. Enter the secure operation mode according to Command Execution Methods and check for the redundant DHCP namespace on the host.

    1. Enter the secure operation mode. The following information is displayed:
      Input command:
    2. Run the following command to check whether the DHCP namespace exists:

      ip netns | grep namespace_id

      NOTE:

      namespace_id specifies the ID of each namespace in the namespace_list field of the audit report.

      • If yes, go to the next step.
      • If no, the namespace is not redundant. No further action is required.

  5. Enter the secure operation mode and run the following command to check for the network of the redundant namespace in the system:

    neutron net-show network_id

    network_id specifies the network ID for namespace_id in 4.

    An example is provided as follows:

    If namespace_id is qdhcp-9c4c4872-af61-4fe0-9148-04324233a5e9, network_id is 9c4c4872-af61-4fe0-9148-04324233a5e9.

    Check whether the network of the redundant namespace exists in the system.

    • If yes, the namespace is not redundant. No further action is required.
    • If no, go to the next step.

  6. Enter the secure operation mode and run the following command to delete the redundant DHCP namespace from the node:

    ip netns del namespace_id

    The namespace_id value can be obtained in 4.

  7. Enter the secure operation mode according to Command Execution Methods and check whether the node accommodating the redundant namespace has a network port.

    1. Enter the secure operation mode. The following information is displayed:
      Input command:
    2. Run the following command to check whether the node contains a network port:

      neutron port-list --network_id network_id --binding:host_id host_id

      NOTE:

      network_id can be obtained in 5, and host_id is the host_id value in the audit report.

      In the command output:

      • If only one distributed_dhcp_port record is displayed, this node does not contain other network ports. Go to the next step.
      • If multiple distributed_dhcp_port records are displayed, the namespace is not redundant. No further action is required.

  8. Perform operations provided in 3 and 4 to log in to the node containing the redundant DHCP namespace and check whether the redundant DHCP namespace exists, respectively.
  9. If the node has a redundant namespace, go to 6.
  10. Enter the secure operation mode according to Command Execution Methods and check for the redundant router namespace on the host.

    1. Enter the secure operation mode. The following information is displayed:
      Input command:
    2. Run the following command to check whether the router namespace exists:

      ip netns | grep namespace_id

      NOTE:

      namespace_id specifies the ID of each namespace in the namespace_list field of the audit report.

      • If yes, go to the next step.
      • If no, the namespace is not redundant. No further action is required.

  11. Enter the secure operation mode and run the following command to check for the router of the redundant namespace in the system:

    neutron router-show router_id

    The value of router_id is the router ID corresponding to the value of namespace_id in 10.

    Check whether the router of the redundant namespace exists in the system.

    • If yes, the namespace is not redundant. No further action is required.
    • If no, go to the next step.

  12. Delete the redundant router namespaces on the node in secure mode:

    1. Run the following command to check whether a residual port is in the router namespace:

      ip netns exec namespace_id ip addr

      Determine whether a residual port is in the router namespace based on the command output:

      • If a port name starts with qr- or rfp-, perform the following operations:
        1. Run the following command to add the port with the name starting with qr- to the main namespace:

          ip netns exec namespace_id ip link set qr-xxx netns 1

        2. Run the following command to delete the port with the name starting with qr- from the br-int network bridge:

          ovs-vsctl del-port br-int qr-xxx

        3. Run the following command to delete the port with the name starting with rfp-xxx from the router namespace:

          ip netns exec namespace_id ip link delete rfp-xxx

      • If no port name starts with qr- or rfp-, go to next step.
    2. Run the following command to delete the residual router namespace:

      ip netns del namespace_id

      NOTE:

      To obtain namespace_id, see 10.

  13. Enter the secure operation mode according to Command Execution Methods and check whether the node accommodating the redundant namespace has a network port.

    1. Enter the secure operation mode. The following information is displayed:
      Input command:
    2. Query the network of the subnet connected to the router: check whether the router has ports and obtain the network ID of each port.

      Run the following command to obtain the port ID of the router:

      neutron router-port-list router_id

      If no port ID is displayed or the command output is blank, the namespace is redundant; perform 14 to 16. Otherwise, perform the remaining operations in 13.

      Run the following command to obtain the network ID of the router port:

      neutron port-show router_port_id -c network_id

    3. Run the following command to check whether the node contains a network port:

      neutron port-list --network_id network_id --binding:host_id host_id

      NOTE:

      The host_id value can be obtained from the audit report.

      In the command output, check whether the namespace is redundant.

      • If all the networks on the router have no VM ports, the router namespace is redundant. Go to the next step.
      • If the networks on the router have VM ports, the router namespace is not redundant. No further action is required.

  14. Run the following command to query whether the residual binding relationship between the router and l3 agent exists:

    neutron l3-agent-list-hosting-router router_id | grep host_id

    NOTE:

    host_id is host_id in the audit report.

    Determine whether the residual binding relationship between the router and l3 agent exists.

    • If any command output is displayed, the residual binding relationship between the router and l3 agent exists. Run the following command to delete the residual binding relationship:

      neutron l3-agent-router-remove l3_agent_id router_id

      NOTE:

      l3_agent_id indicates the value in the first column in the command output.

    • If no command output is displayed, no residual binding relationship between the router and l3 agent exists. Go to next step.

  15. Perform operations provided in 3 and 4 to log in to the node containing the redundant namespace and check whether the redundant namespace exists, respectively.
  16. If the node has a redundant namespace, go to 12.

Handling Volume Snapshots Stuck in an Intermediate State

Context

A volume snapshot in the creating, deleting, or error_deleting state is unavailable. If a volume snapshot is kept stuck in an intermediate state for more than 24 hours, restore the volume snapshot based on site conditions.
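
The 24-hour criterion used throughout this section can be checked programmatically. The following is a minimal sketch; the timestamp format string is an assumption and must be adjusted to match the last_update_time column of the actual report:

  # Sketch: check whether a snapshot has been stuck for more than 24 hours.
  from datetime import datetime, timedelta

  def is_stuck(last_update_time, fmt='%Y-%m-%d %H:%M:%S'):
      # fmt is an assumption; use the format of the actual audit report.
      age = datetime.now() - datetime.strptime(last_update_time, fmt)
      return age > timedelta(hours=24)

  print(is_stuck('2019-01-18 09:34:49'))  # illustrative timestamp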

Parameter Description

The name of the audit report is SnapshotStatusAudit.csv. Table 18-75 describes parameters in the report.

Table 18-75 Parameter description

Parameter

Description

snap_id

Specifies the snapshot ID.

snap_name

Specifies the volume snapshot name on the storage device.

snap_type

Specifies the snapshot type.

status

Specifies the snapshot status.

last_update_time

Specifies the time when the snapshot is updated.

Impact on the System

The volume snapshot becomes unavailable but consumes system resources.

Possible Causes

  • An exception occurred during a volume snapshot operation, delaying the update of the volume snapshot status.
  • The database is rolled back using a data backup to the state when a backup was created. However, after the backup was created, the states of one or more volume snapshots were changed. After the database is restored, records of these volume snapshot states are restored to their former states in the database.

Procedure

Select the handling method based on the volume snapshot statuses listed in Table 18-76. For other situations, contact technical support for assistance.

Table 18-76 Stuck volume snapshot handling methods

Snapshot Status

In Transition Mode

Description

Possible Scenario

Handling Method

creating

Y

The volume snapshot is being created.

Creating a volume snapshot

Method 1

deleting

Y

The volume snapshot is being deleted.

Deleting a volume snapshot

Method 2

error_deleting

N

The deletion failed.

Snapshot deletion failed

Method 2

Method 1

  1. Log in to the first host in the AZ. For details, see Using SSH to Log In to a Host.
  2. Import environment variables. For details, see Importing Environment Variables.
  3. Perform the following operations on the node to query information about the volume snapshot:

    1. Run the following command to enter the secure operation mode:

      runsafe

      Information similar to the following is displayed:

      Input command:
    2. Run the following command to query information about the volume snapshot:

      cinder snapshot-show Snapshot ID

      Check whether the value of status in the command output is consistent with the snapshot status in the audit report.

      • If yes, go to 4.
      • If no, contact technical support for assistance.

  4. View the value of last_update_time in the audit report and check whether the time difference between the value and the current time exceeds 24 hours.

    • If yes, go to 5.
    • If no, contact technical support for assistance.

  5. Set the volume snapshot status to error.

    Enter the secure operation mode (for details, see 3) and run the following command:

    cinder snapshot-reset-state snapshot ID --state error

  6. View the snapshot status.

    Enter the secure operation mode (for details, see 3) and run the following command:

    cinder snapshot-show Snapshot ID

    In the command output, check whether the value of status is error.

    • If yes, no further action is required.
    • If no, contact technical support for assistance.

  7. Run the following command to delete the volume snapshot:

    cinder snapshot-delete Snapshot UUID

    NOTE:

    The Snapshot UUID value is the value of snap_id obtained from the audit report.

  8. Run the following command to check whether the volume snapshot is deleted:

    cinder snapshot-show Snapshot UUID

    NOTE:

    The Snapshot UUID value is the value of snap_id obtained from the audit report.

    If information similar to the following is displayed, the volume snapshot is deleted:

    ERROR: No snapshot with a name or ID of 'e318e16e-5a1c-471f-89c2-5c76719aa346' exists.

    If the value of status in the command output is error_deleting, the volume snapshot failed to be deleted.

    If the value of status in the command output is deleting, the volume snapshot is being deleted. Wait for about one minute and perform 8 again until the volume snapshot is deleted or the deletion fails.

    Check whether the volume snapshot is successfully deleted.

    • If yes, no further action is required.
    • If no, contact technical support for assistance.

Method 2

  1. Log in to the first controller node in the AZ. For details, see Using SSH to Log In to a Host.
  2. Import environment variables. For details, see Importing Environment Variables.
  3. Perform the following operations on the node to query information about the volume snapshot:

    1. Run the following command to enter the secure operation mode:

      runsafe

      Information similar to the following is displayed:

      Input command:
    2. Run the following command to query information about the volume snapshot:

      cinder snapshot-show Snapshot ID

      Check whether the value of status in the command output is consistent with the volume snapshot status in the audit report.

      • If yes, go to 4 when the snapshot is in the deleting status, and go to 5 when the snapshot is in the error_deleting status.
      • If no, no further action is required.

  4. View the value of last_update_time in the audit report and check whether the time difference between the value and the current time exceeds 24 hours.

    • If yes, go to 5.
    • If no, contact technical support.

  5. Set the volume snapshot status to error.

    Enter the secure operation mode (for details, see 3) and run the following command:

    cinder snapshot-reset-state snapshot ID --state error

  6. Delete the volume snapshot.

    Enter the secure operation mode (for details, see 3) and run the following command:

    cinder snapshot-delete Snapshot ID

  7. Check whether the volume snapshot is successfully deleted.

    Enter the secure operation mode (for details, see 3) and run the following command:

    cinder snapshot-show Snapshot ID

    If information similar to the following is displayed, the volume snapshot is deleted successfully.

    ERROR: No snapshot with a name or ID of 'e318e16e-5a1c-471f-89c2-5c76719aa346' exists.

    In the output, if the snapshot status is error_deleting, the volume snapshot failed to be deleted.

    In the output, if the snapshot status is deleting, the volume snapshot is being deleted. Perform 7 again until the volume snapshot is successfully deleted or the deletion fails.

    Check whether the volume snapshot is successfully deleted.

    • If yes, no further action is required.
    • If no, contact technical support for assistance.

Handling Virtual Network Orphan Port

Context

The Neutron database considers a virtual network orphan port to be in use by a VM. In fact, after the port was unbound from the VM, Neutron did not receive the corresponding request from Nova, leading to an inconsistency between the Nova and Neutron databases. An orphan port cannot be used by VMs.

Parameter Description

The name of the audit report is neutron_wild_ports.csv. Table 18-77 describes parameters in the report.

Table 18-77 Parameter description

Parameter

Description

port_id

UUID of the orphan port

device_id

The VM UUID of the orphan port

Possible Causes

When Nova binds a NIC to a VM, it updates port attributes, including device_id, in Neutron. If Nova detects a problem, it enters the rollback process and clears these port attributes. If the clearing call fails, the stale device_id and other port information remain in the Neutron database.
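
The manual check in the procedure below (running nova show on the recorded device_id) can be scripted as a thin wrapper around the same CLI command. A minimal sketch, to be run after importing environment variables; the ID is the illustrative one used later in this section:

  # Sketch: a port is suspected to be an orphan when Neutron still records
  # a device_id but 'nova show' no longer finds the VM (nonzero exit code).
  import subprocess

  def vm_exists(device_id):
      p = subprocess.Popen(['nova', 'show', device_id],
                           stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
      p.communicate()
      return p.returncode == 0

  print(vm_exists('1cd5c6eb-e729-4773-b846-e9f1d3467c56'))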

Impact on the System

VMs cannot use a residual port.

Procedure

  1. Log in to any controller node host in the cascaded FusionSphere OpenStack system. For details, see Using SSH to Log In to a Host.
  2. Import environment variables. For details, see Importing Environment Variables.
  3. Run the following commands to check whether the device_id in the audit report exists:

    1. Run the following command to enter the secure mode:

      runsafe

      Information similar to the following is displayed:

      Input command:
    2. Run the following command:

      nova show device_id

      For example, run the following command:

      nova show 1cd5c6eb-e729-4773-b846-e9f1d3467c56

      If information similar to the following is displayed, the VM does not exist in Nova:

      ERROR (CommandError): No server with a name or ID of '1cd5c6eb-e729-4773-b846-e9f1d3467c56' exists.

      Check whether the VM exists in Nova.

      • If yes, this is not an orphan port; ignore it.
      • If no, go to the next step.

  4. Obtain the port_id of the cascading FusionSphere OpenStack system based on the port_id of the cascaded FusionSphere OpenStack system in the audit report.

    1. Run the following command to enter the secure mode:

      runsafe

      The following information is displayed:

      Input command:
    2. Run the following command to obtain the port_id of the cascading FusionSphere OpenStack system:

      neutron port-show port_id

      For example, run the following command:

      neutron port-show fbb94d50-8878-41c5-9492-f7c5dc362b46

      In the command output, the port name contains the UUID of the corresponding port in the cascading FusionSphere OpenStack system. If the port name contains no UUID, the corresponding port does not exist in the cascading system.

      Run the command to check whether the port_id of the cascading FusionSphere OpenStack system exists.

      • If yes, go to the next step.
      • If no, this is not an orphan port, ignore it. No further action is required.

  5. Log in to any controller node host in the cascading FusionSphere OpenStack system. For details, see Using SSH to Log In to a Host.
  6. Import environment variables to the node. For details, see Importing Environment Variables.
  7. Run the following command to enter the secure mode:

    runsafe

    Information similar to the following is displayed:

    Input command:

    Run the following command to check whether the port_id of the cascading FusionSphere OpenStack system exists:

    neutron port-show port_id

    Run the command to check whether the port_id of the cascading FusionSphere OpenStack system exists.

    • If yes, go to the next step.
    • If no, this is not an orphan port, ignore it. No further action is required.

  8. Run the following command to enter the secure mode:

    runsafe

    The following information is displayed:

    Input command:

    Run the following command to delete the VM information accommodating the port_id of the cascading FusionSphere OpenStack system:

    neutron port-update attributes port_id

    The attributes include --device-id, --device-owner, and --binding:host_id, which are related to VM binding. Set these attributes to empty strings ('').

    For example: neutron port-update --device-id='' --device-owner='' --binding:host_id='' 3162e2fe-5607-4eef-804e-f6b24068fd3e

Handling Unavailable Bare Metal Servers

Context

An unavailable bare-metal server is a bare-metal server whose OS is unavailable or that cannot be controlled by Ironic.

It is recommended that users delete unavailable bare-metal servers:

  • Delete the bare-metal server and re-create one using the same physical server.
  • If the bare-metal server has changed, delete the bare-metal server and then delete the corresponding Ironic node.
NOTE:

1. If the bare-metal server is not deleted after it becomes unavailable, the server cannot be reinstalled as a computing node using PXE.

2. The unavailable bare-metal server audit function supports only powered-on bare-metal servers. Power on the target bare-metal server before you enable the audit function. Otherwise, a powered-off bare-metal server will be audited as an unavailable bare-metal server.

Parameter Description

The name of the audit report is invalid_ironic_nodes.csv. Table 18-78 describes parameters in the report.

Table 18-78 Parameter description

Parameter

Description

ironic_node_uuid

Specifies the UUID for bare-metal servers in Ironic

instance_uuid

Specifies the UUID for bare-metal servers in Nova

cps_id

Specifies the ID for bare-metal servers in CPS

Possible Causes

  • The database is backed up before the bare-metal server is deleted and is later restored from that backup. In this case, the bare-metal server OS becomes unavailable.
  • After the bare-metal server is provisioned, if its IPMI address is changed or the server is relocated, the bare-metal server loses its mapping to the corresponding physical server.
  • If the bare-metal server is being powered on or being created during the audit, it may be audited as an unavailable bare-metal server.

Impact on the System

  • The bare-metal server becomes unavailable due to the backup and restoration.
  • Residual system resources reside.

Procedure

  1. Check whether the CPS ID in the audit report is left blank.

    • If yes, the bare-metal server failed to be created. Delete the instance_uuid through a Nova API, and no further action is required.
    • If no, use the BMC or SSH to remotely log in to the bare-metal server and check whether it is unavailable, for example, whether it cannot be started.

  2. In the BMC remote login mode, if the bare-metal server cannot be started or is stuck in the Euler boot process, the bare-metal server is unavailable and needs to be cleared.
  3. Log in to the first controller node in the AZ. For details, see Using SSH to Log In to a Host.
  4. Import environment variables. For details, see Importing Environment Variables.
  5. Enter the secure operation mode and obtain the IP address of the External OM network plane of any controller node. For details, see Command Execution Methods.

    After you enter the secure operation mode, the following information is displayed:

    Input command:

    Run the following command:

    cps host-list

    The node whose roles value is controller indicates the controller node. The value of omip indicates the IP address of External OM network plane.

  6. Run the following commands to log in to the controller node using the IP address of External OM plane obtained in 5:

    su fsp

    ssh fsp@omIP

    su - root

  7. Import environment variables. For details, see 4.
  8. Run the following command to clear unavailable bare-metal servers.

    NOTE:

    Clearing a bare-metal server takes 12 to 15 minutes because the system cleans its disks.

    1. Run the following command to enter the secure operation mode:

      runsafe

      The following information is displayed:

      Input OS_PASSWORD
    2. Enter the password of user OS_USERNAME as prompted.

      The following information is displayed after you enter the password:

      Input command:
    3. Run the following command to clear the bare-metal server:

      nova delete instance_uuid

      instance_uuid can be obtained from the audit report.

      Check whether the command output contains ERROR.

      • If yes, contact technical support for assistance.
      • If no, no further action is required.

Handling Bare-metal Server Auditing Inconsistency

Context

The auditing inconsistency between Nova and Ironic produces orphan physical servers and invalid physical servers.

Orphan physical server:

The Nova management plane does not contain any information about the physical server, but Ironic contains information about a Nova instance on it.

Invalid physical server:

The Nova management plane contains information about the physical server, but Ironic does not contain information about a Nova instance on it.

Parameter Description

The name of the audit report is invalid_ironic_instances.csv. Table 18-79 describes parameters in the report.

Table 18-79 Parameter description

Parameter

Description

orphan_instance_in_ironic

Specifies the instance UUID of orphan physical servers

fake_instance_in_nova

Specifies the ID of invalid physical servers

Possible Causes

  • The database is rolled back using the management data backup to the state when the backup was created.
  • The system was not stable (for example, VMs were being live migrated) when the audit was conducted.
  • Some hosts are abnormal, causing VMs on these hosts to be incorrectly reported as invalid VMs. In this case, conduct the system audit again after the system recovers.
  • System commands were run on a host to manually create or delete a VM or bare-metal server.

Impact on the System

  • Residual system resources reside.
  • Unavailable physical servers are displayed.

Procedure

  1. Log in to the first controller node in the AZ. For details, see Using SSH to Log In to a Host.
  2. Import environment variables. For details, see Importing Environment Variables.
  • Orphan physical server handling steps
  1. Check whether orphan servers created by the user exist (orphan_instance_in_ironic in the audit report).

    If yes, the user decides whether to retain the orphan servers, and no further action is required.

    If no, go to 2.

  2. Enter the secure operation mode according to Command Execution Methods and obtain the IP address of the External OM network plane.

    After you enter the secure operation mode, the following information is displayed:

    Input command:

    Run the following command:

    cps host-list

    The node whose roles value is controller is the controller node. The value of omip indicates the External OM plane IP address.

  3. Run the following command to log in to the controller node using the External OM plane IP address obtained in 2:

    su fsp

    ssh fsp@omIP

    su - root

  4. Import environment variables. For details, see Importing Environment Variables. Check whether the user wants to restore orphan physical servers.

    If yes, go to next step.

    If no, no further action is required. If the user needs to handle the physical server, contact technical support for assistance.

  5. Restore the orphan server to available status.

    1. Run the following command to query the server UUID:

      ironic node-list|grep instance-uuid

      The parameter instance-uuid indicates the orphan_instance_in_ironic field in the audit report.

    2. In the output, if Provisioning State is available, run the following command:

      ironic node-update node-uuid remove instance_uuid

      In the output, if Provisioning State is not available, run the following command:

      ironic node-set-provision-state node-uuid deleted

      Obtain the node-uuid from the previous step.

      Check whether the command output contains ERROR.

    • If yes, contact technical support for assistance.
    • If no, no further action is required.

  • Invalid physical server handling steps
  1. Perform the following operations to obtain details of the invalid VM and obtain the host information and instance_name:

    Run the following command to enter the secure operation mode:

    runsafe

    Information similar to the following is displayed:

    Input OS_PASSWORD

    Enter the password of user OS_USERNAME as prompted.

    Enter the password and the following information is displayed:

    Input command:

    Run the following command to obtain information about the invalid physical server:

    ironic node-list | grep uuid

    The parameter uuid can be obtained from fake_instance_in_nova in the audit report.

  2. Check whether details about the bare-metal server can be queried:

    • If no, go to the next step.
    • If yes, the fault is falsely reported due to time difference. In this case, no further action is required.

  3. Contact the tenant to determine whether to delete the invalid VM.

    • If yes, go to the next step.
    • If no, no further action is required. If the user needs to restore the physical server, contact technical support for assistance.

  4. Run the following command to delete the invalid physical server (a combined sketch follows these steps):

    nova delete uuid

    The parameter uuid can be obtained from fake_instance_in_nova in the audit report.

    Check whether the command output contains ERROR.

    • If yes, contact technical support for assistance.
    • If no, no further action is required.
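The check in 2 and the deletion in 4 can be sketched together as follows, assuming the ironic and nova CLIs are available and the invalid server ID is passed as the first argument; this is illustrative only:

#!/bin/bash
# Illustrative only: verify and delete one invalid physical server.
# $1 is the fake_instance_in_nova value from the audit report.
UUID="$1"

# If ironic can still query the server, the report entry is likely a
# timing artifact (step 2) and nothing should be deleted.
if ironic node-list | grep -q "$UUID"; then
    echo "Instance $UUID still exists in ironic; no action required."
    exit 0
fi

# Otherwise delete the invalid server after tenant confirmation (step 3).
if nova delete "$UUID" 2>&1 | grep -qi "ERROR"; then
    echo "Deletion of $UUID failed; contact technical support."
    exit 1
fi
echo "Instance $UUID deleted."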

Handling Bare Metal Servers in an Intermediate State

Context

Due to a system exception in the service process, a bare-metal instance may be stuck in an intermediate state for more than 24 hours and cannot be automatically restored.

Manually restore the bare-metal instance based on actual conditions.

Parameter Description

The name of the audit report is stucking_ironic_instances.csv. Table 18-80 describes parameters in the audit report.

Table 18-80 Parameter description

Parameter      Description
uuid           Specifies the UUID of the bare-metal instance.
tenant_id      Specifies the tenant ID of the bare-metal instance.
hyper_vm_name  Specifies the name of the bare-metal instance in the hypervisor.
updated_at     Specifies the time when the bare-metal instance status was last updated.
status         Specifies the status of the bare-metal instance.
task_status    Specifies the task status of the bare-metal instance.

Possible Causes

An exception occurred during a bare-metal instance operation, delaying the update of the bare-metal instance status.

Impact on the System

The bare-metal instance becomes unavailable but consumes system resources.

Procedure

Restore the bare-metal instance based on the statuses listed in Table 18-81 (a triage sketch follows the table). For other situations, contact technical support for assistance.

Table 18-81 Bare-metal instance status handling methods

Status    Task Status           Possible Scenario                   Handling Method
building  scheduling            Creating a bare-metal instance      Ask the tenant to delete the instance. No further action is required.
building  None                  Creating a bare-metal instance      Ask the tenant to delete the instance. No further action is required.
building  block_device_mapping  Creating a bare-metal instance      Ask the tenant to delete the instance. No further action is required.
building  networking            Creating a bare-metal instance      Ask the tenant to delete the instance. No further action is required.
building  spawning              Creating a bare-metal instance      Ask the tenant to delete the instance. No further action is required.
N/A       rebooting             Restarting a bare-metal instance    For details, see Method 2.
N/A       reboot_pending        Restarting a bare-metal instance    For details, see Method 2.
N/A       reboot_started        Restarting a bare-metal instance    For details, see Method 2.
N/A       rebooting_hard        Restarting a bare-metal instance    For details, see Method 2.
N/A       reboot_pending_hard   Restarting a bare-metal instance    For details, see Method 2.
N/A       reboot_started_hard   Restarting a bare-metal instance    For details, see Method 2.
N/A       powering_off          Stopping a bare-metal instance      Set the instance status to active. For details, see Method 1.
N/A       powering_on           Starting a bare-metal instance      Set the instance status to stopped. For details, see Method 1.
N/A       rescheduling          Rescheduling a bare-metal instance  Ask the tenant to delete the instance. No further action is required.
N/A       deleting              Deleting a bare-metal instance      See Method 3.
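For routine triage, the table can be applied to the audit report mechanically. The following shell sketch assumes the CSV columns follow the order in Table 18-80 (uuid, tenant_id, hyper_vm_name, updated_at, status, task_status) and that the first line is a header; it only prints the recommended handling and performs no changes:

#!/bin/bash
# Illustrative triage helper: map each stuck instance in the audit report
# to the handling method in Table 18-81.
REPORT="stucking_ironic_instances.csv"

tail -n +2 "$REPORT" | while IFS=',' read -r uuid _tenant _name _updated status task; do
    case "$task" in
        scheduling|None|block_device_mapping|networking|spawning|rescheduling)
            echo "$uuid: ask the tenant to delete the instance." ;;
        rebooting|reboot_pending|reboot_started|rebooting_hard|reboot_pending_hard|reboot_started_hard)
            echo "$uuid: reset to active and restart (Method 2)." ;;
        powering_off)
            echo "$uuid: set status to active (Method 1)." ;;
        powering_on)
            echo "$uuid: set status to stopped (Method 1)." ;;
        deleting)
            echo "$uuid: delete manually (Method 3)." ;;
        *)
            echo "$uuid: unknown task status '$task'; contact technical support." ;;
    esac
done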

Method 1

Reset the bare-metal instance status as listed in the preceding table, confirm with the user, and restore the instance. (A scripted sketch of the stop/start cycle follows these steps.)

  1. Reset the bare-metal instance status as listed in the preceding table and confirm with the user. For details, see Setting the VM Status. Restart the instance to ensure that it is available.
  2. Log in to the first controller node in the AZ. For details, see Using SSH to Log In to a Host.
  3. Import environment variables. For details, see Importing Environment Variables.
  4. Enter the secure operation mode based on Command Execution Methods and fetch the IP address of the External OM network plane.

    After you enter the secure operation mode, the following information is displayed:

    Input command:

    Run the following command:

    cps host-list

    The node whose roles value is controller indicates the controller node. The value of omip indicates the IP address of the External OM plane.

  5. Run the following commands to log in to the controller node using the IP address of External OM plane obtained in 4:

    su fsp

    ssh fsp@omIP

    su - root

  6. Import environment variables. For details, see 3.
  7. Run the following command to query the bare-metal instance attribute:

    • Run the following command to enter the secure operation mode:

      runsafe

      The following information is displayed:

      Input OS_PASSWORD
    • Enter the password of user OS_USERNAME as prompted.

      The following information is displayed after you enter the password:

      Input command:
    • Run the following command to query the bare-metal instance attribute:

      nova show uuid

      The parameter uuid can be obtained from the audit report.

  8. Run the following commands to stop the bare-metal instance (the uuid is obtained from the audit report) and check whether the instance status changes to stopped:

    nova stop uuid

    nova show uuid

    Check whether any exception occurs when you perform the preceding operations.

    • If yes, contact technical support for assistance.
    • If no, go to the next step.

  9. Run the following commands to start the bare-metal instance and check whether the instance status changes to active:

    nova start uuid

    nova show uuid

    Perform the preceding steps and check whether an exception occurs.

    • If yes, contact technical support for assistance.
    • If no, no further action is required.

  10. Ask users to check whether the bare-metal server can be logged in to properly.

    • If yes, no further action is required.
    • If no, contact technical support for assistance.
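The stop/start cycle in 8 and 9 can be scripted as below. This is a sketch only: it assumes the nova CLI is available, that the uuid is passed as the first argument, and that the status column of nova show reports SHUTOFF for a stopped instance and ACTIVE for a running one; it loops indefinitely on errors, so supervise it manually:

#!/bin/bash
# Illustrative stop/start cycle for one stuck bare-metal instance.
# $1 is the uuid from the audit report.
UUID="$1"

# Read the status column from nova show (assumed table layout).
get_status() {
    nova show "$UUID" | awk '$2 == "status" {print $4}'
}

# Stop the instance and wait until it reports SHUTOFF (stopped).
nova stop "$UUID"
until [ "$(get_status)" = "SHUTOFF" ]; do
    sleep 5
done

# Start it again and wait until it reports ACTIVE.
nova start "$UUID"
until [ "$(get_status)" = "ACTIVE" ]; do
    sleep 5
done
echo "Instance $UUID is active again."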

Method 2

Method: If the previous restart fails, reset the bare-metal instance status to active based on Setting the VM Status and restart the instance again.

  1. Ask the tenant to restart the instance. Then check whether the bare-metal instance is successfully restarted.

    • If yes, no further action is required.
    • If no, contact technical support for assistance.

Method 3

Method: Manually delete the bare-metal instance that is stuck in an intermediate state. (A combined sketch of the detach-and-delete steps follows these steps.)

  1. Set the target bare-metal instance to active and delete the instance again. For details, see Setting the VM Status. Perform the following steps if the instance cannot be deleted.
  2. Log in to the first controller node in the AZ. For details, see Using SSH to Log In to a Host.
  3. Import environment variables. For details, see Importing Environment Variables.
  4. Enter the secure operation mode based on Command Execution Methods and fetch the IP address of the External OM network plane.

    After you enter the secure operation mode, the following information is displayed:

    Input command:

    Run the following command:

    cps host-list

    The node whose roles value is controller indicates the controller node. The value of omip indicates the IP address of the External OM plane.

  5. Run the following commands to log in to the controller node using the IP address of External OM plane obtained in 4:

    su fsp

    ssh fsp@omIP

    su - root

  6. Import environment variables. For details, see 3.
  7. Run the following command to query the bare-metal instance attribute:

    • Run the following command to enter the secure operation mode:

      runsafe

      The following information is displayed:

      Input OS_PASSWORD
    • Enter the password of user OS_USERNAME as prompted.

      The following information is displayed after you enter the password:

      Input command:
    • Run the following command to check whether the corresponding bare metal instance exists:

      ironic node-list

      • If yes, the UUID in the query result is the server ID of the bare-metal server and is the same as that in the audit report. Record the UUID as ironic_uuid and go to 8.
      • If no, contact technical support for assistance. For details, see Deleting an Orphan Volume.

  8. Run the following command to delete a bare-metal instance:

    ironic node-set-provision-state ironic_uuid deleted

    The ironic_uuid can be obtained from 7. Check whether any exception occurs when you perform the preceding operations.

    • If yes, contact technical support for assistance.
    • If no, go to the next step.

  9. Run the following command to check whether the bare-metal instance has volumes attached in the database:

    nova show uuid

    The uuid can be obtained from the audit report. Check whether the os-extended-volumes:volumes_attached field contains any value.

    • If yes, the bare-metal instance has volumes attached. Record each volume_uuid and go to 10.
    • If no, the bare-metal instance has no volumes attached. Go to 11.

  10. If the bare-metal instance has volumes attached, run the following command to detach the volumes one by one:

    nova volume-detach uuid volume_uuid

    The uuid is obtained from the audit report and the volume_uuid is obtained from 9. Check whether an exception occurs during the execution.

    • If yes, contact technical support for assistance.
    • If no, go to 11.

  11. Delete the residual bare-metal instance.

    The uuid is obtained from the audit report.

    Preferentially run the following Nova command:

    nova delete uuid

    Run the following command to check whether the instance is deleted:

    nova show uuid

    • If the instance cannot be queried, it has been deleted, and no further action is required.
    • If the instance can still be queried, contact technical support for assistance.
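Steps 9 to 11 can be combined into one sketch, assuming the nova CLI is available, the uuid is passed as the first argument, and the volume IDs in os-extended-volumes:volumes_attached can be extracted by their 36-character UUID form; this is illustrative only:

#!/bin/bash
# Illustrative only: detach attached volumes, then delete the residual
# bare-metal instance. $1 is the uuid from the audit report.
UUID="$1"

# Step 9: list volume IDs from os-extended-volumes:volumes_attached.
VOLUMES=$(nova show "$UUID" \
    | awk -F'|' '/os-extended-volumes:volumes_attached/ {print $3}' \
    | grep -o '[0-9a-f-]\{36\}')

# Step 10: detach each attached volume.
for VOL in $VOLUMES; do
    nova volume-detach "$UUID" "$VOL" || {
        echo "Failed to detach $VOL; contact technical support."
        exit 1
    }
done

# Step 11: delete the residual instance, then confirm it is gone.
nova delete "$UUID"
if nova show "$UUID" >/dev/null 2>&1; then
    echo "Instance $UUID still exists; contact technical support."
else
    echo "Instance $UUID deleted."
fi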

Handling Transactions That the Nova Database Does Not Submit

Context

During routine maintenance using FusionSphere, a switchover between the active and standby GaussDB nodes can prevent the Nova database from submitting transactions. If the Nova database does not submit transactions for a long period of time, the database connections remain occupied and the number of available connections decreases. The system audits transactions that the Nova database has not submitted for more than one hour and asks users to manually clear them.

Parameter Description

The name of the audit report is nova_idle_transactions.csv. Table 18-82 describes parameters in the report.

Table 18-82 Parameter description

Parameter  Description
count      Specifies the number of transactions that the Nova database has not submitted for more than one hour.

Possible Causes

During VM maintenance, Nova database transactions fail to be automatically submitted when a switchover between the active and standby GaussDB nodes occurs due to excessive workload or network interruption.

Impact on the System

If the Nova database does not submit transactions for a long period of time, the database connections remain occupied, the number of available connections decreases, and transaction processing becomes slow or even fails.

Procedure

  1. Log in to the first host in the AZ. For details, see Using SSH to Log In to a Host.
  2. Import environment variables. For details, see Importing Environment Variables.
  3. Query the IP address of the host where Nova database is located. Nova database can be deployed independently or non-independently.

    • If the Nova database is deployed independently, run the following command:

      cps template-instance-list --service gaussdb_nova gaussdb_nova

    • If the output information contains Item Not Found!, run the following command:

      cps template-instance-list --service gaussdb gaussdb_nova

    • If the Nova database is deployed non-independently, run the following command:

      cps template-instance-list --service gaussdb gaussdb

    Record the ID of the host where the database is located based on the command output. In the output, active indicates the Nova database status, runsonhost indicates the host where the Nova database runs, and the IP address of that host is the IP address of the External OM network plane.

  4. Log in to the host where the nova database is deployed by performing steps provided in Logging In to a Host with a Role Deployed.
  5. Log in to the Nova database.

    Run the following commands to log in to the database:

    su - gaussdba

    gsql nova

    The database password is required while these commands run; the default password is FusionSphere123.

    If the command output contains the following information, the login to the Nova database is successful. Otherwise, contact technical support for assistance.

    Type "help" for help.
    NOVA=#

  6. Delete the transactions that the Nova database has not submitted for more than one hour. (A non-interactive sketch follows this procedure.)

    Run the following command:

    select pg_terminate_backend(pid) from pg_stat_activity where state in ('idle in transaction') and now()-xact_start > interval '60 min' and datname = 'NOVA';

    If the output contains either of the following, the transactions are deleted successfully. Otherwise, contact technical support for assistance.

    PG_TERMINATE_BACKEND
    ----------------------
    (0 rows)

    Or

    PG_TERMINATE_BACKEND
    t
    (x rows)

  7. Run the \q command to exit the database.
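The login in 5 and the cleanup in 6 can also be done in one non-interactive pass, as sketched below. This assumes that this gsql build reads SQL from standard input like psql and that it may still prompt for the database password; if that is not the case, run the statements interactively as described above:

#!/bin/bash
# Illustrative only: count, then terminate, Nova transactions that have
# been idle in transaction for more than one hour. Run as user gaussdba
# on the host where the Nova database is deployed.
gsql nova <<'EOF'
-- Inspect first: how many idle transactions are there?
select count(*) from pg_stat_activity
 where state = 'idle in transaction'
   and now() - xact_start > interval '60 min'
   and datname = 'NOVA';

-- Terminate them, as in step 6.
select pg_terminate_backend(pid) from pg_stat_activity
 where state = 'idle in transaction'
   and now() - xact_start > interval '60 min'
   and datname = 'NOVA';
EOF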

Common Operations

Detaching Volumes from a VM and Creating a VM Again

Methods

The following methods are available:

  • Detach the damaged volume from the VM and have the tenant use the damaged volume to create another VM.
  • Have the tenant create another VM of the same specifications as the original VM and stop the VM. Then replicate the data on the original VM volume to the corresponding new volume on the new VM. Delete the original volume after you confirm that the system services are not affected.

Procedure

  1. Set the target VM to the stopped state. For details, see Setting the VM Status.
  2. Use PuTTY to log in to the first host in the cascaded FusionSphere OpenStack system through the Cascaded-Reverse-Proxy.

    The default username is fsp, and the default password is Huawei@CLOUD8.

  3. Run the following command and enter the password Huawei@CLOUD8! of user root to switch to user root:

    su - root

  4. Import environment variables. For details, see Importing Environment Variables.
  5. Perform the following operations to query the management IP addresses of controller nodes:

    cps host-list

    The node whose roles value is controller indicates a controller node. The value of manageip indicates the management IP address.

  6. Run the following commands to log in to the controller node:

    su fsp

    ssh fsp@Management IP address

    su - root

  7. Import environment variables. For details, see 4.