FusionSphere OpenStack System Audit at the Cascading OpenStack

Overview

When the FusionSphere OpenStack cloud platform is used, unexpected system failures (such as a host reboot or process restart) or backup restoration can leave residual resources or make resources unavailable, causing services to fail. In this case, a resource pool consistency audit is performed to ensure data consistency in the resource pool so that services continue to run normally.

Scenarios

The system audit is required for the OpenStack-based FusionSphere system when data inconsistency occurs in the following scenarios:

  • When a service-related operation is performed, a system exception occurs. For example, when you create a VM, a host process restarts, causing the operation to fail. In this case, residual data may reside in the system or resources may become unavailable.
  • If any service-related operation is performed before a database is restored, residual data may reside in the system or resources may become unavailable.

The system audit helps administrators detect and handle data inconsistency. Conduct a system audit if either of the following conditions is met, log in to the first host in the OpenStack system to obtain the audit report, and then locate and handle the data inconsistency:

  • An alarm is generated indicating that data inconsistency verification fails.
  • Routine system maintenance is performed.
NOTE:
  • You are advised to conduct a system audit when the system is running stably. Do not use audit results when a large number of service-related operations are in progress.
  • During the audit process, if service-related operations (for example, creating a VM or expanding the system capacity) are performed or any system exception occurs (for example, a host is faulty), the audit result may be distorted. In this case, conduct the system audit again after the system recovers. In addition, confirm the detected problems again based on the audit result processing procedure.

Audit Mechanism

The following illustrates how a system audit works:

  • The system obtains service data from databases, hosts, and storage devices, compares the data, and generates an audit report.
  • This audit guide and Command Line Interface (CLI) commands are provided for users to locate and handle the data inconsistency problems listed in the audit reports.

You can conduct a system audit in either automatic or manual mode:

  • Automatic: The system automatically starts an audit at 4:00 every day. If it detects any data inconsistency, it generates an audit report and reports an alarm; if the alarm has already been generated, the system does not generate a duplicate one. If no data inconsistency is detected but a data inconsistency alarm exists, the system automatically clears the alarm.
  • Manual: Log in to FusionSphere OpenStack and run the infocollect audit command to start an audit.
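
For example, a full manual audit (all audit items) can be started as shown below. The command prints the host on which the reports are generated and the directory containing them (a minimal sketch; the detailed options are described in Manual Audit):

    infocollect audit     # run all audit items; note the Hostname and Path values in the output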

The log analysis function is an auxiliary to the audit tool for locating and handling inconsistency problems generated by backup and restoration operations. The log analysis function analyzes historical logs and then generates a report listing records of tenant operations on resources (such as VMs and volumes) in a specified time period.

Audit Process

If any audit alarm is generated, conduct an audit based on the process shown in Figure 18-1.

Figure 18-1 Audit process

Manual Audit

Scenarios

Conduct the manual audit when:

  • The system database is restored using a data backup.
  • After the inconsistency problems are automatically handled, the manual audit is used to verify that the problems have been rectified.

Prerequisites

Services in the system are running properly.

Procedure

  1. Log in to the cascading FusionSphere OpenStack system. For details, see Using SSH to Log In to a Host.
  2. Import environment variables. For details, see Importing Environment Variables.

    Enter 1 to enable Keystone V3 authentication with the built-in DC administrator.

  3. Run the following command to conduct a manual audit:

    infocollect audit --item ITEM --parameter PARAMETER --type TYPE

    If you do not specify the audit item, an audit alarm will be triggered when an audit problem is detected. However, if the audit item is specified, no audit alarm will be triggered when an audit problem is detected.

    Table 18-1 Parameter description

    • item (Optional)

      Specifies a specific audit item. If you do not specify the audit item, an audit alarm will be reported when an audit problem is detected. However, if the audit item is specified, no audit alarm will be reported when an audit problem is detected. Values:

      • 1001: audits VMs. The following audit reports are generated after the audit is complete:
        • orphan_vm.csv: audit report about orphan VMs
        • invalid_vm.csv: audit report about invalid VMs
        • host_changed_vm.csv: audit report about VM location inconsistency
        • stucking_vm.csv: audit report about stuck VMs
        • diff_property_vm.csv: audit report about VM attribute inconsistency
        • diff_state_vm.csv: audit report about VM status inconsistency
      • 1006: audits residual BDM data of VMs. The following audit report is generated after the audit is complete:
        • invalid_bdms.csv: residual BDM audit report of the VM in the cascading scenario
      • 1101: audits invalid VM ports. The following audit report is generated after the audit is complete:
        • stale_ports.csv: audit report about invalid VM ports in cascading scenarios
      • 1208: audits volumes in the cascading system. The following audit reports are generated after the audit is complete:
        • CascadeFakeVolumeAudit.csv: audit report about invalid volumes in the cascading system
        • CascadingMgmtVolumeStatusAudit.csv: audit report about volumes for management VMs in the cascading system
        • CascadeWildVolumeAudit.csv: audit report about orphan volumes in cascading scenarios
        • CascadeVolumeAttachmentAudit.csv: audit report about volume attachment statuses in cascading scenarios
        • CascadeVolumeMiddleStatusAudit.csv: audit report about stuck volumes in cascading scenarios
        • CascadeDiffStatusVolumeAudit.csv: audit report about volume status inconsistency in cascading scenarios
      • 1213: audits snapshots. The following audit reports are generated after the audit is complete:
        • CascadeFakeSnapshotAudit.csv: audit report about invalid snapshots in cascading scenarios
        • CascadeWildSnapshotAudit.csv: audit report about orphan snapshots in cascading scenarios
        • CascadeInstanceSnapshotAudit.csv: audit report about residual orphan child snapshots in cascading scenarios
        • CascadeDiffStatusSnapshotAudit.csv: audit report about volume snapshot inconsistency in cascading scenarios
      • 1702: audits ECS snapshots. The following audit report is generated after the audit is complete:
        • images_vm_snapshots.csv: audit report about residual ECS snapshots

      If this parameter is not specified, all audit items are performed by default.

    • parameter (Optional; can be specified only after the audit item is specified)

      Specifies an additional parameter. You can specify only one value, which must match the specified item.

      • If item is set to 1001, you can set vm_stucking_timeout, the timeout threshold in seconds for VMs in a transient state. The default value is 14400. The value affects the audit report about stuck VMs.
      • If item is set to other values, no additional parameter is required.

      Example: --parameter vm_stucking_timeout=10000

    • type (Optional)

      Specifies an additional parameter, which indicates whether the audit is synchronous or asynchronous. If this parameter is not specified, the audit is synchronous. Values:

      • sync: specifies a synchronous audit. For details, see the following command.
      • async: specifies an asynchronous audit. For details, see Asynchronous Audit. The audit progress and result status of an asynchronous audit can be obtained by invoking the interface for querying the task status.

    Run the following command to detect VMs that have been in a transient state for 3600 seconds or longer:

    infocollect audit --item 1001 --parameter vm_stucking_timeout=3600

    Information similar to the following is displayed:

    +--------------------------------------+----------------------------------+  
    | Hostname                             | Path                             |  
    +--------------------------------------+----------------------------------+  
    | CCCC8175-8EAC-0000-1000-1DD2000011D0 | /var/log/audit/2015-04-22_020324 |  
    +--------------------------------------+----------------------------------+

    In the command output, Hostname indicates the ID of the host for which the audit report is generated, and Path indicates the directory containing the audit report.

    Log in to the host first and then view the audit reports by following Collecting Audit Reports.
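
    For example, using the sample output above, you could inspect the generated reports directly on that host (a minimal sketch; the directory name and the .csv files present depend on the audit items that were run):

    cd /var/log/audit/2015-04-22_020324    # directory taken from the Path column of the command output
    ls -l                                  # list the generated .csv audit reports
    cat stucking_vm.csv                    # view the stuck VM report as an example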

Collecting Audit Reports

Scenario

  • Collect audit reports when an alarm indicating that data inconsistency verification fails is generated.
  • Log in to the environment and view the audit report when routine maintenance is performed.

Prerequisites

A local PC running the Windows operating system (OS) is available.

Procedure

  1. Log in to the cascading OpenStack system. For details, see Using SSH to Log In to a Host.
  2. Import environment variables. For details, see Importing Environment Variables.

    Enter 1 to enable Keystone V3 authentication with the built-in DC administrator.

  3. Run the following command to obtain the External OM plane IP address of a host where the audit service is deployed. For details, see Command Execution Methods.

    cps template-instance-list --service collect info-collect-server

    Information similar to the following is displayed:

  4. If the current host is not the active node, log in to the host where the active audit service is deployed. For details, see Logging In to a Host with a Role Deployed.

    NOTE:

    If the current host is the active node, skip this step.

  5. Run the following commands to obtain the result of the last audit:

    cd /var/log/audit

    ls -l

    Information similar to the following is displayed:

    total 64
    drwxr-xr-x 4 cps cps 4096 Apr 15 09:26 2016-04-15_012615
    drwxr-xr-x 4 cps cps 4096 Apr 15 09:30 2016-04-15_012934
    drwxr-xr-x 4 cps cps 4096 Apr 15 09:52 2016-04-15_015227
    drwxr-xr-x 4 cps cps 4096 Apr 15 09:59 2016-04-15_015808
    drwxr-xr-x 4 cps cps 4096 Apr 15 10:05 2016-04-15_020431
    NOTE:
    • The value on the last line of the command output indicates the last audit time. For example, 2016-04-15_020431 indicates 02:04:31 a.m. on April 15, 2016.
    • If no result is returned, no audit report is available on the host.

  6. Method 1: View the audit reports directly.

    Enter the directory in the returned information and run the vi command.

    Method 2: Copy the audit reports for local viewing.

    1. Run the following command to create a temporary directory for saving the audit report:

      mkdir -p /home/fsp/last_audit_result

    2. Run the following command to copy the latest audit report on the host to the temporary directory:

      cp -r /var/log/audit/`ls /var/log/audit -Ftr | grep /$ | tail -1` /home/fsp/last_audit_result

    3. Run the following command to modify the permissions of the temporary directory and its files:

      chmod 777 /home/fsp/last_audit_result/ -R

    4. Use WinSCP or another file transfer tool to copy the /home/fsp/last_audit_result folder to the local PC (for a command-line alternative, see the scp sketch below).

      The default username is fsp, and the default password is Huawei@CLOUD8.

    5. Run the following command to delete the temporary folder from the host:

      rm -r /home/fsp/last_audit_result
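
    As a command-line alternative to WinSCP in 4, the copy can also be pulled from the local PC with scp (a sketch under the assumption that the External OM IP address of the audit node is reachable from the PC; <om_ip> is a placeholder for that address):

      # Run on the local PC; enter the password of the fsp user when prompted (default: Huawei@CLOUD8)
      scp -r fsp@<om_ip>:/home/fsp/last_audit_result ./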

Analyzing Audit Results

  • If multiple faults are displayed, rectify the faults one by one based on the procedure.
  • Before addressing audit issues at the cascading layer, confirm whether the same issues exist at the cascaded layer. If audit issues exist at both the cascading and cascaded layers, handle the audit issues at the cascaded layer first and perform an audit at the cascading layer again (you can manually trigger the audit or wait until the cascading FusionSphere OpenStack system automatically starts the daily audit). If the audit issues still exist at the cascading layer, address the issues.

Scenarios

Analyze the audit results in the following scenarios:

  • When audit-related alarms, such as volume, VM, snapshot, and image audit alarms, are received, log in to the system, obtain the audit reports, and rectify the faults accordingly.
  • After the backup and restoration feature is used, log in to the system, perform a consistency audit, obtain the audit reports, and rectify the faults accordingly.
  • During routine system maintenance, log in to the system, perform an audit, obtain the audit reports, and rectify the faults accordingly.

Prerequisites

Procedure

  1. For an audit-related alarm, select the target audit report based on the information available in Details in Additional Info for further analysis.
  2. Check the audit report name.

    • VM Audit
      • If the report name is orphan_vm.csv, rectify the fault based on Orphan VMs. Otherwise, residual resources may exist.
      • If the report name is invalid_vm.csv, rectify the fault based on Invalid VMs. Otherwise, unavailable VMs may be visible to users.
      • If the report name is stucking_vm.csv, rectify the fault based on Stuck VMs. Otherwise, VMs may become unavailable.
      • If the report name is diff_property_vm.csv, rectify the fault based on VM Attribute Inconsistency. Otherwise, residual resources may exist.
      • If the report name is diff_state_vm.csv, rectify the fault based on VM Status Inconsistency. Otherwise, VMs may become unavailable.
      • If the report name is nova_quota_vcpus.csv, nova_quota_memory_mb.csv, or nova_quota_instance.csv, rectify the fault based on Inconsistency Between the Quota in the Nova Database and the Actual Quota. Otherwise, VM creation may fail.
      • If the report name is nova_service_cleaned.csv, rectify the fault based on Nova-compute Service Residual. Otherwise, user experience is affected.
    • Volume Audit
      • If the report name is CascadeWildVolumeAudit.csv, rectify the fault based on Orphan Volumes. Otherwise, volumes in the cascaded system may become unavailable in the cascading system, which results in residual resources.
      • If the report name is CascadeFakeVolumeAudit.csv, rectify the fault based on Invalid Volumes. Otherwise, unavailable volumes may be visible to users.
      • If the report name is CascadeVolumeMiddleStatusAudit.csv, rectify the fault based on Stuck Volumes. Otherwise, volumes may become unavailable.
      • If the report name is CascadeDiffStatusVolumeAudit.csv, rectify the fault based on Inconsistent Volume Status. Otherwise, volumes may become unavailable because the volume states in the cascading and cascaded systems are inconsistent.
      • If the report name is CascadeVolumeAttachmentAudit.csv, rectify the fault based on Inconsistent Volume Attachment Information. Otherwise, the volume attachment may fail because volume attachment information in the cascading system is inconsistent with that in the cascaded system.
      • If the report name is CascadingMgmtVolumeStatusAudit.csv, rectify the fault based on Stuck Management Volumes. Otherwise, volumes may be unavailable, occupying storage space and wasting resources.
    • Snapshot Audit
      • If the report name is CascadeWildSnapshotAudit.csv, rectify the fault based on Orphan Volume Snapshots. Otherwise, residual resources may exist.
      • If the report name is CascadeFakeSnapshotAudit.csv, rectify the fault based on Invalid Volume Snapshots. Otherwise, unavailable volume snapshots may be visible to users.
      • If the report name is CascadeDiffStateSnapshotAudit.csv, rectify the fault based on Inconsistent Volume Snapshot Status. Otherwise, the volume snapshots may become unavailable because the states of the volume snapshots in the cascading and cascaded systems are inconsistent.
      • If the report name is CascadeInstanceSnapshotAudit.csv, rectify the fault based on Residual Orphan Child Snapshots. Otherwise, residual orphan child snapshot resources exist, occupying system resources.
    • Virtual Network Resource Audit
      • If the report name is stale_ports.csv, rectify the fault based on Invalid VM Ports. Otherwise, VM port information in the cascading system may be inconsistent with that in the cascaded system, affecting the VM usage.
    • Image Audit
      • If the report name is stucking_images.csv, rectify the fault based on Stuck Images. Otherwise, image resources may reside, occupying system resources.
    • Residual Audit Related to VM BDM
      • If the report name is invalid_bdms.csv, rectify the fault based on Residual VM BDM Data. Otherwise, the volume fails to attach to the VM, and the number of volumes is reduced if the volume is deleted.
    • Other Audit
      • If the report name is images_vm_snapshots.csv, rectify the fault based on Residual ECS Snapshots. Otherwise, residual ECS snapshot resources exist, occupying system resources.

Handling Audit Results

  • In this chapter, import environment variables before running the nova (in nova xxx format), cinder (in cinder xxx format), neutron (in neutron xxx format), glance (in glance xxx format), cps (in cps xxx format), or cpssafe command in FusionSphere OpenStack. For details, see Importing Environment Variables.
  • The commands in OpenStack can be performed in either secure mode or insecure mode. For details, see Importing Environment Variables.

Orphan VMs

Context

A VM is orphaned if it is present in the cascaded OpenStack system but not in the cascading OpenStack system.

If an orphan VM was not created by a tenant, it is recommended that it be deleted to release computing and network resources.

Parameter Description

The name of the audit report for an orphan VM is orphan_vm.csv. Table 18-2 describes parameters in the report.

Table 18-2 Parameter description

  • uuid: Specifies the VM universally unique identifier (UUID) at the Cascading OpenStack.
  • hyper_vm_name: Specifies the VM UUID at the Cascaded OpenStack.
  • host_id: Specifies the ID of the host accommodating the VM.

Possible Causes

  • If a database at the Cascading OpenStack is restored to a previous time point, VMs created after the time point become orphan VMs.
  • Users create VMs in the cascaded FusionSphere OpenStack system.
  • The system was unstable when the audit was conducted.
  • FusionStorage Manager, FusionManager, and eBackup VMs are not created in the management AZ as required. In this case, no further action is required.

Impact on the System

  • VMs orphaned by database restoration are invisible to tenants.
  • The system contains residual resources.

Procedure

  1. Log in to the cascading OpenStack system. For details, see Using SSH to Log In to a Host.
  2. Import environment variables. For details, see Importing Environment Variables.
  3. Log in to a node with the nova-proxy001 role deployed. For details, see Logging In to a Host with a Role Deployed.
  4. Run the following script to check whether the VM is an orphan VM:

    python2.7 /usr/bin/info-collect-script/audit_resume/judge_wild_vm.py uuid
    NOTE:

    The value of uuid is that obtained from the audit report.

    • If the command output contains "This vm is normal", the VM is normal. No further action is required.
    • If the command output contains "This vm is wild", the VM is an orphan VM. Go to the next step.
    • If the command output displays other information, contact technical support for assistance.

  5. Confirm with the tenant whether the VM is in use.

    • If yes, contact technical support for assistance.
    • If no, go to the next step.

  6. Log in to the cascaded OpenStack system. For details, see Using SSH to Log In to a Host.
  7. Run the following command to query the VM status:

    nova show uuid

    NOTE:

    The value of uuid is that of hyper_vm_name obtained from the audit report.

  8. If the VM is in the deleting status, run the following command to set the VM status to error:

    nova reset-state uuid
    NOTE:

    The value of uuid is that of hyper_vm_name obtained from the audit report.

  9. Run the following command in the cascaded FusionSphere OpenStack system to delete the VM:

    nova delete uuid
    NOTE:

    The value of uuid is that of hyper_vm_name obtained from the audit report.

  10. Run the following command in the cascaded FusionSphere OpenStack system to check whether the VM is deleted:

    nova show uuid
    NOTE:

    The value of uuid is that of hyper_vm_name obtained from the audit report.

    • If the command output contains "ERROR (CommandError): No server with a name or ID of 'XXX (VM UUID)' exists", the VM is successfully deleted. No further action is required.
    • If the command output displays other information, contact technical support for assistance.
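
Putting steps 7 to 10 together, the cleanup of a confirmed orphan VM in the cascaded FusionSphere OpenStack system looks roughly as follows (a sketch; <hyper_vm_name> is a placeholder for the UUID taken from the audit report):

    nova show <hyper_vm_name>          # query the current VM status
    nova reset-state <hyper_vm_name>   # only needed if the VM is stuck in the deleting status
    nova delete <hyper_vm_name>        # delete the orphan VM
    nova show <hyper_vm_name>          # expect "No server with a name or ID of ... exists"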

Invalid VMs

Context

An invalid VM is a VM that is recorded as normal in the database at the cascading layer but is not present in the database at the cascaded layer.

For an invalid VM, confirm with the tenant whether the VM is invalid. If the VM is invalid, delete the VM record from the database.

Parameter Description

The name of the audit report is invalid_vm.csv. Table 18-3 describes parameters in the report.

Table 18-3 Parameter description

  • uuid: Specifies the VM UUID at the cascaded layer.
  • tenant_id: Specifies the tenant ID.
  • hyper_vm_name: -
  • updated_at: Specifies the latest time when the VM status was updated.
  • status: Specifies the current VM status.
  • task_status: Specifies the current VM task status.
  • host_id: Specifies the name of the host accommodating the VM at the cascading layer.

Impact on the System

Users can query the VM using Nova APIs, but the VM does not exist in the cascaded FusionSphere OpenStack system.

Possible Causes

  • A database at the cascaded layer is restored to a previous time point. As a result, VMs created after the time point become invalid VMs.
  • VMs being live migrated during the audit process may be regarded as invalid VMs.
  • An exception occurs on a host at the cascaded layer. As a result, the VMs on the host become invalid VMs.
  • During the resizing or cold migration operation, an exception occurs in the system. In this case, the VMs may not exist at the cascaded layer and are regarded as invalid VMs.
  • FusionStorage Manager, FusionManager, and eBackup VMs are not created in the management AZ. As a result, these VMs are regarded as invalid VMs.

Procedure

  1. Log in to the cascading OpenStack system. For details, see Using SSH to Log In to a Host.
  2. Import environment variables. For details, see Importing Environment Variables.
  3. Log in to a node with the nova-proxy001 role deployed. For details, see Logging In to a Host with a Role Deployed.
  4. Run the following script to check whether the VM is an invalid VM:

    python2.7 /usr/bin/info-collect-script/audit_resume/judge_fake_vm.py uuid
    NOTE:

    The value of uuid is that in the audit report.

    • If the command output contains "This VM is normal", no further action is required.
    • If the command output displays "This VM is fake", go to the next step.
    • If the command output displays other information, contact technical support for assistance.

  5. Confirm with the tenant whether the VM is in use.

    • If yes, contact technical support for assistance.
    • If no, go to the next step.

  6. Run the following command to query the VM status:

    nova show uuid

    NOTE:

    The value of uuid is that in the audit report.

  7. If the VM is in the deleting status, run the following command to set the VM status to error:

    nova reset-state uuid

    NOTE:

    The value of uuid is that in the audit report.

  8. Run the following command in the cascading FusionSphere OpenStack system to delete the VM:

    nova delete uuid
    NOTE:

    The value of uuid is that in the audit report.

  9. Run the following command in the cascading FusionSphere OpenStack system to check whether the VM is deleted:

    nova show uuid
    NOTE:

    The value of uuid is that in the audit report.

    • If the command output contains "ERROR (CommandError): No server with a name or ID of 'XXX (VM UUID)' exists", the VM is successfully deleted. No further action is required.
    • If the command output displays other information, contact technical support for assistance.
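
Putting steps 6 to 9 together, the cleanup of a confirmed invalid VM in the cascading FusionSphere OpenStack system looks roughly as follows (a sketch; <uuid> is a placeholder for the UUID taken from the audit report):

    nova show <uuid>          # query the current VM status
    nova reset-state <uuid>   # only needed if the VM is stuck in the deleting status
    nova delete <uuid>        # delete the invalid VM record
    nova show <uuid>          # expect "No server with a name or ID of ... exists"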

Stuck VMs

Context

A stuck VM is a VM that has remained in the BUILD status, or whose task_state value has remained non-empty, for more than 4 hours.

Manually restore the VM based on the VM status and the task status.

Parameter Description

The name of the audit report is stucking_vm.csv. Table 18-4 describes parameters in the report.

Table 18-4 Parameter description

  • uuid: Specifies the VM UUID at the Cascading OpenStack.
  • tenant_id: Specifies the tenant ID.
  • hyper_vm_name: Specifies the VM UUID at the Cascaded OpenStack.
  • updated_at: Specifies the last time when the VM status was updated.
  • status: Specifies the VM status.
  • task_status: Specifies the VM task status.
  • host_id: Specifies the name of the host accommodating the VM at the Cascading OpenStack.

Possible Causes

A system exception occurred when a VM service operation was in process.

Impact on the System

The VM becomes unavailable and occupies system resources.

Procedure

  1. Log in to the cascading OpenStack system. For details, see Using SSH to Log In to a Host.
  2. Import environment variables. For details, see Importing Environment Variables.
  3. Run the following command at the cascading FusionSphere OpenStack system to check whether the VM is in the deleting status:

    nova show uuid
    NOTE:

    The value of uuid is that in the audit report. In the command output, OS-EXT-SRV-ATTR:host indicates the cascaded FusionSphere OpenStack system where the VM is located.

    • If yes, go to 4.
    • If no, go to 6.

  4. Run the following command in the cascading FusionSphere OpenStack system to delete the VM:

    nova delete uuid
    NOTE:

    The value of uuid is that in the audit report.

  5. Run the following command in the cascading FusionSphere OpenStack system to check whether the VM is deleted:

    nova show uuid
    NOTE:

    If the command output contains "ERROR (CommandError): No server with a name or ID of 'XXX' exists", the VM does not exist in the OpenStack system.

    • If yes, no further action is required.
    • If no, contact technical support for assistance.

  6. Log in to the cascaded FusionSphere OpenStack system where the VM is located, based on the information about the host accommodating the VM. For details, see Using SSH to Log In to a Host.
  7. Import environment variables. For details, see Importing Environment Variables.
  8. Run the following command in the cascaded FusionSphere OpenStack system to obtain status information about the VM at the Cascaded OpenStack:

    nova show uuid
    NOTE:

    The value of uuid is that of hyper_vm_name obtained from the audit report.

  9. Check whether the VM at the Cascaded OpenStack is a stuck VM.

  10. Run the following command in the cascading FusionSphere OpenStack system to check whether the VM at the Cascading OpenStack is still in the BUILD status:

    nova show uuid
    NOTE:

    The value of uuid is that in the audit report, and status in the command output indicates the current VM status.

    • If yes, contact technical support for assistance.
    • If no, no further action is required.

  11. Run the following command in the cascading FusionSphere OpenStack system to query the VM operation records:

    nova instance-action-list uuid
    NOTE:

    The value of uuid is that in the audit report. In the command output, Action indicates the operations performed on the VM, and Start_Time indicates the operation time.

    Check whether the latest operation performed on the VM is a resizing or migration operation.

    • If yes, go to the next step.
    • If no, contact technical support for assistance.

  12. Rectify the fault based on the operations provided in Restoring VMs with Resizing or Cold Migration Exception.

VM Attribute Inconsistency

Context

If the following VM information at the Cascading OpenStack and the Cascaded OpenStack is inconsistent, the VM attribute is inconsistent:

1. VM boot mode (recorded in the VM metadata)

2. VM NICs

Parameter Description

The name of the audit report for the VM attribute inconsistency is diff_property_vm.csv. Table 18-5 describes parameters in the report.

Table 18-5 Parameter description

  • uuid: Specifies the VM UUID at the Cascading OpenStack.
  • tenant_id: Specifies the tenant ID.
  • hyper_vm_name: Specifies the VM UUID at the Cascaded OpenStack.
  • updated_at: Specifies the last time when the VM status was updated.
  • status: Specifies the VM status.
  • task_status: Specifies the VM task status.
  • host_id: Specifies the ID of the host accommodating the VM recorded in the database at the Cascading OpenStack.
  • property_name: Specifies the VM attribute, which can be:
    • bootDev: specifies the VM startup mode.
    • nic: specifies the VM NICs.
  • property: Specifies the VM attributes at the Cascading OpenStack.
  • hyper_property: Specifies the VM attributes at the Cascaded OpenStack.

Possible Causes

  • The VM boot device or NIC data recorded at the Cascading OpenStack is inconsistent with those recorded at the Cascaded OpenStack due to a system fault (cascaded system exceptions) in the service process.
  • VM attributes are modified using the cascaded FusionSphere OpenStack interfaces, resulting in inconsistent VM attributes at the cascading and Cascaded OpenStacks.

Impact on the System

System data is inconsistent.

Procedure

  1. Check the value of property_name in the audit report.
  2. If the value is bootDev, rectify the fault based on Processing VM Boot Device Inconsistency.
  3. If the value is nic, rectify the fault based on Processing Redundant VM NICs.

Processing VM Boot Device Inconsistency

  1. Log in to the cascading OpenStack system. For details, see Using SSH to Log In to a Host.
  2. Import environment variables. For details, see Importing Environment Variables.
  3. Log in to a node with the nova-proxy001 role deployed. For details, see section Logging In to a Host with a Role Deployed.
  4. Run the following script to obtain the VM boot information at the Cascaded OpenStack:

    python2.7 /usr/bin/info-collect-script/audit_resume/get_vm_boot_info.py uuid
    NOTE:

    The value of uuid is that in the audit report.

    • If the command output displays "Success, boot info is [*]", go to the next step. (Content in the square bracket indicates VM startup mode at the Cascaded OpenStack.)
    • If the command output displays "please contact engineer", contact technical support for assistance.

  5. Run the following command in the cascading FusionSphere OpenStack system to reset the VM boot device:

    nova meta uuid set __bootDev=boot_type
    NOTE:

    The value of uuid is that in the audit report, and boot_type indicates the VM boot device obtained in 4.
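
For example, if the script in 4 reports "Success, boot info is [hd]", the reset in 5 would look as follows (a sketch; the UUID is a placeholder from the audit report, and hd is an assumed boot device value):

    nova meta <uuid> set __bootDev=hd    # align the boot device recorded at the Cascading OpenStack with the cascaded system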

Processing Redundant VM NICs

  1. Log in to the cascading OpenStack system. For details, see Using SSH to Log In to a Host.
  2. Import environment variables. For details, see Importing Environment Variables.
  3. Log in to a node with the nova-proxy001 role deployed. For details, see Logging In to a Host with a Role Deployed.
  4. Run the following script to check whether a redundant NIC exists:

    python2.7 /usr/bin/info-collect-script/audit_resume/judge_vm_interface.py instance_id
    NOTE:

    The value of instance_id is that of uuid obtained from the audit report.

    • If the command output displays "No redundance port", the NIC information at the Cascading OpenStack is the same as that at the Cascaded OpenStack, and no further action is required.
    • If the command output displays other information, go to the next step.

  5. Check whether the command output displays "Redundance port on cascading [*]". (Content in the square bracket indicates the port ID of the redundant NIC at the Cascading OpenStack.)

    • If yes, go to the next step.
    • If no, go to 7.

  6. Run the following command in the cascading FusionSphere OpenStack system to delete the redundant NIC:

    nova interface-detach uuid port_id
    NOTE:

    The value of uuid is that in the audit report. The value of port_id is that obtained from 4.

  7. Check whether the command output in 4 displays "Redundance port on cascaded [*]". (Content in the square brackets indicates the port ID of the redundant NIC at the Cascaded OpenStack.)

    • If yes, go to the next step.
    • If it is not displayed, no further action is required.

  8. Confirm with the tenant whether the redundant NIC is still in use in the VM.

    • If yes, contact technical support for assistance.
    • If no, go to the next step.

  9. Log in to the cascaded OpenStack system. For details, see Using SSH to Log In to a Host.
  10. Run the following command in the cascaded FusionSphere OpenStack system to delete the redundant NIC:

    nova interface-detach uuid port_id
    NOTE:

    The value of uuid is that of hyper_vm_name obtained from the audit report. The value of port_id is that obtained from 4.

  11. Run the following command in the cascaded FusionSphere OpenStack system to check whether the redundant NIC on the VM is successfully deleted:

    nova interface-list uuid
    NOTE:

    The value of uuid is that of hyper_vm_name obtained from the audit report.

    • If yes, no further action is required.
    • If no, contact technical support for assistance.
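
In summary, removing a redundant NIC reported by the script uses the following commands (a sketch; <uuid> and <hyper_vm_name> are placeholders for the values in the audit report, and <port_id> comes from the output in 4):

    # For a redundant port reported on the cascading side, run in the cascading system:
    nova interface-detach <uuid> <port_id>
    # For a redundant port reported on the cascaded side, run in the cascaded system:
    nova interface-detach <hyper_vm_name> <port_id>
    nova interface-list <hyper_vm_name>    # verify that the redundant port is no longer listed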

VM Status Inconsistency

Context

The VM status at the Cascading OpenStack is inconsistent with that at the Cascaded OpenStack.

Parameter Description

The name of the audit report for the VM status inconsistency is diff_state_vm.csv. Table 18-6 describes parameters in the report.

Table 18-6 Parameter description

  • uuid: Specifies the VM UUID at the Cascading OpenStack.
  • tenant_id: Specifies the tenant ID.
  • hyper_vm_name: Specifies the VM UUID at the Cascaded OpenStack.
  • updated_at: Specifies the last time when the VM status was updated.
  • status: Specifies the VM status at the Cascading OpenStack.
  • task_status: Specifies the VM task status at the Cascading OpenStack.
  • power_status: Specifies the VM power supply status at the Cascading OpenStack.
  • host_id: Specifies the ID of the host accommodating the VM.
  • hyper_status: Specifies the VM status at the Cascaded OpenStack.
  • hyper_power_status: Specifies the VM power supply status at the Cascaded OpenStack.

Possible Causes

  • A backup is created for a database for future restoration. However, after the backup is created, the VM is started or stopped. When the database is restored using the backup, the status record of the VM at the Cascading OpenStack is inconsistent with that at the Cascaded OpenStack.
  • If an exception occurs in the cascaded system or management network, services at the Cascading OpenStack are interrupted or become faulty, resulting in inconsistent VM status.
  • The system time at the cascading system is inconsistent with that at the cascaded system. As a result, the VM status at the Cascaded OpenStack cannot be synchronized to the Cascading OpenStack.
  • Other unknown errors result in inconsistent VM status.

Impact on the System

  • System data is inconsistent.
  • Tenants' operation rights on the VM are restricted.

Procedure

  1. Log in to the cascading OpenStack system. For details, see Using SSH to Log In to a Host.
  2. Import environment variables. For details, see Importing Environment Variables.
  3. Log in to a node with the nova-proxy001 role deployed. For details, see Logging In to a Host with a Role Deployed.
  4. Run the following script to check whether the VM status at the Cascaded OpenStack is consistent with that at the Cascading OpenStack:

    python2.7 /usr/bin/info-collect-script/audit_resume/judge_diff_state_vm.py uuid
    NOTE:

    The value of uuid is that in the audit report.

    • If the command output displays "This VM status is same", no further action is required.
    • If the command output displays "cascading vm status:[*], power state:[*] cascaded vm status:[*], power state:[*]", the VM status or power supply status at the Cascading OpenStack is inconsistent with that at the Cascaded OpenStack. (An asterisk (*) indicates the VM status or power supply status.) If the VM status is inconsistent, go to 5. If the power supply status is inconsistent, go to 6.
    • If the command output displays other information, contact technical support for assistance.

  5. Rectify the fault based on the processing methods applied to different VM statuses and scenarios listed in the following table. For other situations, contact technical support for assistance.

    Table 18-7 Processing methods

    • VM status at the Cascading OpenStack: error
      VM status at the Cascaded OpenStack: active, paused, suspended, or stopped
      Possible scenarios:
      1. A Cascaded OpenStack error occurred during VM creation.
      2. The VM was restarted or deleted when an error occurred at the Cascaded OpenStack.
      3. The status-resetting interface was manually invoked at the Cascading OpenStack.
      Processing method: See Method 1.

    • VM status at the Cascading OpenStack: active
      VM status at the Cascaded OpenStack: paused, suspended, or stopped
      Possible scenario: The VM was paused, suspended, or stopped at the Cascaded OpenStack.
      Processing method: See Method 1.

    • VM status at the Cascading OpenStack: stopped
      VM status at the Cascaded OpenStack: paused or suspended
      Possible scenario: Different operations were performed on the VM at the Cascading and Cascaded OpenStack, resulting in inconsistent VM status.
      Processing method: See Method 1.

    • VM status at the Cascading OpenStack: suspended
      VM status at the Cascaded OpenStack: paused or stopped
      Possible scenario: Different operations were performed on the VM at the Cascading and Cascaded OpenStack, resulting in inconsistent VM status.
      Processing method: See Method 1.

    • VM status at the Cascading OpenStack: paused
      VM status at the Cascaded OpenStack: suspended or stopped
      Possible scenario: Different operations were performed on the VM at the Cascading and Cascaded OpenStack, resulting in inconsistent VM status.
      Processing method: See Method 1.

    • VM status at the Cascading OpenStack: paused, stopped, or suspended
      VM status at the Cascaded OpenStack: active
      Possible scenario: A Cascaded OpenStack error occurred during the VM suspending (hibernating) process.
      Processing method: See Method 2.

    • VM status at the Cascading OpenStack: active or stopped
      VM status at the Cascaded OpenStack: resized
      Possible scenario: The VM was migrated or resized at the Cascaded OpenStack.
      Processing method: See Method 3.

    • VM status at the Cascading OpenStack: -
      VM status at the Cascaded OpenStack: error
      Possible scenario: -
      Processing method: If the Xen hypervisor is used at the cascaded layer, handle the audit result at the cascaded layer first by following the operations provided in VM Status Inconsistency. If the KVM hypervisor is used at the cascaded layer, handle the audit result at the cascaded layer first by following the operations provided in Using KVM for Virtualization (Cascaded OpenStack).

  6. Log in to a node with the nova-proxy001 role deployed. For details, see Logging In to a Host with a Role Deployed.
  7. Run the following script to change the power supply status of the VM at the Cascading OpenStack:

    python2.7 /usr/bin/info-collect-script/audit_resume/rectify_vm_state.py --uuid uuid --power-state power_state
    NOTE:

    The value of uuid is that in the audit report, and power_state is the VM power supply status at the Cascaded OpenStack obtained from the script output.

    • If the command output displays "rectify the VM state or power state success", no further action is required.
    • If the command output displays other information, contact technical support for assistance.

Method 1

  1. Set the VM status at the Cascading OpenStack to the same as that at the Cascaded OpenStack. For details, see Setting the VM Status.

Method 2

  1. Log in to the cascading OpenStack system. For details, see Using SSH to Log In to a Host.
  2. Import environment variables. For details, see Importing Environment Variables.
  3. Run the following command in the cascading FusionSphere OpenStack system to reset the VM status:

    nova reset-state uuid --active
    NOTE:

    The value of uuid is that in the audit report.

Method 3

  1. Log in to the cascading OpenStack system. For details, see Using SSH to Log In to a Host.
  2. Import environment variables. For details, see Importing Environment Variables.
  3. Run the following command in the cascading FusionSphere OpenStack system to obtain the VM flavor information at the Cascading OpenStack:

    nova show uuid
    NOTE:

    The value of uuid is that in the audit report. In the command output, flavor indicates the current VM flavor.

  4. Log in to the cascaded FusionSphere OpenStack system where the VM is located.
  5. Run the following command in the cascaded FusionSphere OpenStack system to obtain the flavor used by the VM at the Cascaded OpenStack:

    nova show uuid
    NOTE:

    The value of uuid is that of hyper_vm_name obtained from the audit report.

  6. Check whether the VM flavor at the Cascading OpenStack is the same as that at the Cascaded OpenStack.

    • If yes, go to 7.
    • If no, go to 8.

  7. Run the following command in the cascaded FusionSphere OpenStack system to confirm the migration or resizing operation on the VM at the Cascaded OpenStack. After running the command, go to 9.

    nova resize-confirm uuid
    NOTE:

    The value of uuid is that of hyper_vm_name obtained from the audit report.

  8. Run the following command in the cascaded FusionSphere OpenStack system to perform revert operations on the VM at the Cascaded OpenStack:

    nova resize-revert uuid
    NOTE:

    The value of uuid is that of hyper_vm_name obtained from the audit report.

  9. Check whether the VM flavor and status at the Cascading OpenStack are consistent with those at the Cascaded OpenStack. For details, see 1 to 5.

    • If yes, no further action is required.
    • If no, contact technical support for assistance.

Inconsistency Between the Quota in the Nova Database and the Actual Quota

Context

If the used quota in the Nova database is inconsistent with that in the actual environment, rectify the fault by referring to this section.

Parameter Description

The names of the audit reports are nova_quota_vcpus.csv, nova_quota_memory_mb.csv, and nova_quota_instance.csv.

Impact on the System

The used quota in the quota table of the database is inconsistent with the actually used data. As a result, the tenant may fail to create a VM due to resource limitation.

Possible Causes

  • Changes to the quota table and VM changes are not guaranteed to occur in the same transaction.
  • When a network exception occurs during the process of creating, resizing, or deleting a VM, the actually used quota is inconsistent with that in the quota table.

Procedure

  1. Log in to the cascading OpenStack system. For details, see Using SSH to Log In to a Host.
  2. Import environment variables. For details, see Importing Environment Variables.
  3. Switch to the node where the audit service resides.

    1. Run the cps template-instance-list --service collect info-collect-server command to query the External OM IP address of the audit service.
    2. Run the su fsp command to switch to the fsp user.

      The default username is fsp, and the default password is Huawei@CLOUD8.

    3. Run the ssh fsp@omip command to switch to the node where the audit service resides.
    4. Import environment variables by referring to Importing Environment Variables.

  4. Manually audit the quota.

    1. Run the following command to enter the secure operation mode:

      runsafe

      Information similar to the following is displayed:

    2. Run the following command to manually audit the quota and mark the value of Path in the command output as PATH:

      infocollect audit --item 1008

  5. Run the /usr/bin/python2.7 /etc/nova/nova-util/refresh_quota_usages.py PATH command to restore the Nova quota usage. Enter y or n when the "Please confirm recovering the quota-usages table(y/n):" prompt is displayed in the command output.

    • y: Modify the resource usage in the quota_usages table.
    • n: Confirm that the resource usage in the quota_usages table is not modified and exit the processing.

    After y is entered, check whether the command output contains "Success synchronizing the quota-usages table in resource: instance/vcpus/memory_mb".

    • If yes, the resource usage in the quota_usages table is restored successfully. No further action is required.
    • If no, contact technical support for assistance.
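
The quota audit and restoration sequence in steps 4 and 5 can be summarized as follows (a sketch; PATH stands for the report path printed by the audit command):

    runsafe                            # enter the secure operation mode
    infocollect audit --item 1008      # manually audit the quota; record the Path value as PATH
    /usr/bin/python2.7 /etc/nova/nova-util/refresh_quota_usages.py PATH
    # When "Please confirm recovering the quota-usages table(y/n):" is displayed, enter y to synchronize the quota_usages table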

Orphan Volumes

Context

An orphan volume is a volume that exists at the Cascaded OpenStack but does not exist at the Cascading OpenStack.

Parameter Description

The name of the audit report is CascadeWildVolumeAudit.csv. Table 18-8 describes parameters in the report.

Table 18-8 Parameter description

  • volume_name: Specifies the name of the orphan volume at the Cascaded OpenStack.
  • volume_id: Specifies the ID of the orphan volume at the Cascaded OpenStack.
  • volume_host: Specifies the cascaded host to which the orphan volume belongs.

Impact on the System

Volumes at the Cascaded OpenStack cannot be managed at the Cascading OpenStack, resulting in the waste of resources at the Cascaded OpenStack.

Possible Causes

  • A database is backed up for future restoration. However, after the backup is created, one or more volumes are created at the Cascaded OpenStack. When the database is restored using the backup, records of these volumes are deleted from the database, but the volume information resides at the Cascaded OpenStack.
  • Volumes are created at the Cascaded OpenStack without using the commands delivered from the Cascading OpenStack.

Procedure

  1. Log in to the cascading OpenStack system. For details, see Using SSH to Log In to a Host.
  2. Import environment variables. For details, see Importing Environment Variables.
  3. Log in to a node with the cinder-proxy001 role deployed. For details, see Logging In to a Host with a Role Deployed.
  4. Run the following script to check whether the volume is an orphan volume:

    python2.7 /usr/bin/info-collect-script/audit_resume/judge_wild_volume.py volume_id
    NOTE:

    The value of volume_id is that obtained from the audit report.

    • If the command output displays "This volume is not wild", no further action is required.
    • If the command output displays "This volume is wild", go to the next step.
    • If the command output displays other information, contact technical support for assistance.

  5. Confirm with the tenant whether the volume can be deleted.

    • If yes, go to the next step.
    • If no, contact technical support for assistance.

  6. Log in to the cascaded FusionSphere OpenStack system. For details, see Using SSH to Log In to a Host.
  7. Run the following command to query the volume status:

    cinder show volume_id | grep status

    NOTE:

    The value of volume_id is that obtained from the audit report.

  8. Run the following command to obtain the ID of the server to which the volume is attached:

    cinder show volume_id | grep attachment

    NOTE:

    The value of volume_id is that obtained from the audit report.

  9. Detach the volumes at the Cascaded OpenStack.

    nova volume-detach server_id volume_id

    NOTE:

    The value of volume_id is that obtained from the audit report.

    The value of server_id is that obtained in 8.

  10. Run the following command in the cascaded FusionSphere OpenStack system to delete the volume:

    cinder delete volume_id
    NOTE:

    The value of volume_id is that obtained from the audit report.

    Check whether the command is successfully executed.

    • If yes, go to the next step.
    • If no, contact technical support for assistance.

  11. Run the following command in the cascaded FusionSphere OpenStack system to check whether the volume exists:

    cinder show volume_id
    NOTE:

    The value of volume_id is that in the audit report. If the command output displays "ERROR:No volume with a name or ID of 'XXX (Volume ID)' exists", the volume does not exist.

    • If yes, contact technical support for assistance.
    • If no, no further action is required.
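
Putting steps 7 to 11 together, the cleanup of a confirmed orphan volume in the cascaded FusionSphere OpenStack system looks roughly as follows (a sketch; <volume_id> is a placeholder for the ID in the audit report, and <server_id> comes from the attachment query):

    cinder show <volume_id> | grep status        # check the volume status
    cinder show <volume_id> | grep attachment    # find the server the volume is attached to
    nova volume-detach <server_id> <volume_id>   # detach the volume if it is attached
    cinder delete <volume_id>                    # delete the orphan volume
    cinder show <volume_id>                      # expect "No volume with a name or ID of ... exists"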

Invalid Volumes

Context

An invalid volume is a volume that exists at the Cascading OpenStack but does not exist at the Cascaded OpenStack.

Parameter Description

The name of the audit report is CascadeFakeVolumeAudit.csv. Table 18-9 describes parameters in the report.

Table 18-9 Parameter description

  • volume_id: Specifies the ID of the invalid volume at the Cascading OpenStack.
  • volume_host: Specifies the host information queried at the Cascading OpenStack. The format is host#volume_backend_name, in which the value of host before the pound sign (#) indicates the cascaded FusionSphere OpenStack system to which the volume belongs.

Impact on the System

The volume queried using the cinder command does not exist at the Cascaded OpenStack.

Possible Causes

  • A backup is created for a database for future restoration. However, after the backup is created, one or more volumes are deleted. When the database is restored using the backup, records of these volumes are deleted from the Cascaded OpenStack but reside in the database and become invalid volumes.
  • A volume is deleted from the Cascaded OpenStack without using the command delivered at the Cascading OpenStack.

Procedure

  1. Log in to the cascading FusionSphere OpenStack system. For details, see Using SSH to Log In to a Host.
  2. Import environment variables. For details, see Importing Environment Variables.
  3. Query the proxy role, in the cinder-proxyxxx format, that corresponds to the AZ of the cascaded FusionSphere OpenStack system where the invalid volume is located.

    1. Find the volume_host parameter, typically in the host#volume_backend_name format, in the audit report obtained in Collecting Audit Reports. The host part before the number sign (#) indicates the name of the AZ in the cascaded FusionSphere OpenStack system where the volume is located.

      For example, the AZ name is az2.dc1.cn-global-1--DC2.

    2. Run the following command to query the Nova serial number of the AZ whose prefix is default-aggregate_novaXXX in the cascaded FusionSphere OpenStack system:

      nova aggregate-list

      Information similar to the following is displayed:

      +----+-----------------------------------+--------------------------------------------+
      | ID | Name                              | Availability Zone                          |
      +----+-----------------------------------+--------------------------------------------+  
      | 1  | manage-aggr                       | manage-az                                  |
      | 2  | default-aggregate_nova001         | AZ1                                        |
      | 3  | nova001#AZ1                       | AZ1                                        |
      | 4  | default-aggregate_nova002         | az1.dc1.cn-global-1--DC2                   |
      | 5  | nova002#IOoptimized               | az1.dc1.cn-global-1--DC2                   |                                       
      | 80 | default-aggregate_nova003         | az2.dc1.cn-global-1--DC2                  |  
      | .. | ......                            | ......                                     |   
      +----+-----------------------------------+--------------------------------------------+          

      Query the corresponding Nova serial number in the "novaXXX" format based on the AZ obtained in 3.a.

      For example, if az2.dc1.cn-global-1--DC2 corresponds to nova003, the proxy role that corresponds to the AZ in the cascaded FusionSphere OpenStack system where the invalid volume locates is cinder-proxy003.

  4. Log in to a node with the cinder-proxyxxx role deployed. For details, see Logging In to a Host with a Role Deployed.

    xxx can be 001, 002, or 003 obtained in 3.b.

  5. Run the following script to check whether the volume is an invalid volume:

    python2.7 /usr/bin/info-collect-script/audit_resume/judge_fake_volume.py volume_id
    NOTE:

    The value of volume_id is that obtained from the audit report.

    • If the command output contains "This volume is normal", no further action is required.
    • If the command output displays "This volume is fake", go to the next step.
    • If the command output displays other information, contact technical support for assistance.

  6. Confirm with the tenant whether the volume can be deleted.

    • If yes, go to the next step.
    • If no, contact technical support for assistance.

  7. Run the following command to query the volume status and attachment information:

    cinder show volume_id

    NOTE:

    The value of volume_id is that obtained from the audit report.

  8. Log in to a node with the nova-proxyxxx role deployed. For details, see Logging In to a Host with a Role Deployed.

    xxx can be 001, 002, or 003 obtained in 3.b.

  9. Run the following script to clear the volume attachment information in the VM:

    python2.7 /usr/bin/info-collect-script/audit_resume/clear_volume_bdm.py volume_id
    NOTE:

    The value of volume_id is that in the audit report.

  10. Run the following command to reset the volume status:

    cinder reset-state --state error --attach-status detached volume_id
    NOTE:

    The value of volume_id is that in the audit report.

  11. Run the following command in the cascading FusionSphere OpenStack system to delete the invalid volume:

    cinder force-delete volume_id
    NOTE:

    The value of volume_id is that in the audit report.
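
In summary, once the volume is confirmed to be invalid and deletable, steps 7 to 11 amount to the following (a sketch; <volume_id> is a placeholder for the ID in the audit report; the script runs on the nova-proxyxxx node and the cinder commands in the cascading system):

    cinder show <volume_id>    # check the volume status and attachment information
    python2.7 /usr/bin/info-collect-script/audit_resume/clear_volume_bdm.py <volume_id>    # clear the volume attachment information in the VM
    cinder reset-state --state error --attach-status detached <volume_id>
    cinder force-delete <volume_id>    # delete the invalid volume record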

Orphan Volume Snapshots

Context

An orphan snapshot is a snapshot that exists at the Cascaded OpenStack but does not exist at the Cascading OpenStack.

Parameter Description

The name of the audit report is CascadeWildSnapshotAudit.csv. Table 18-10 describes parameters in the report.

Table 18-10 Parameter description

  • cascaded_snapshot_name: Specifies the name of the orphan volume snapshot at the Cascaded OpenStack.
  • cascaded_snapshot_id: Specifies the ID of the orphan volume snapshot at the Cascaded OpenStack.
  • host: Specifies the cascaded host on which the orphan volume snapshot is located.

Impact on the System

Snapshots at the Cascaded OpenStack cannot be managed at the Cascading OpenStack, resulting in the waste of resources at the Cascaded OpenStack.

Possible Causes

  • A backup is created for a database for future restoration. However, after the backup was created, one or more volume snapshots are created. When the database is restored using the backup, records of these snapshots are deleted from the database, but these snapshots reside at the Cascaded OpenStack and become orphan snapshots.
  • Snapshots are created at the Cascaded OpenStack without using the commands delivered from the Cascading OpenStack.

Procedure

  1. Log in to the cascading OpenStack system. For details, see Using SSH to Log In to a Host.
  2. Import environment variables. For details, see Importing Environment Variables.
  3. Log in to a node with the cinder-proxy001 role deployed. For details, see Logging In to a Host with a Role Deployed.
  4. Run the following script to check whether the snapshot is an orphan snapshot:

    python2.7 /usr/bin/info-collect-script/audit_resume/judge_wild_snapshot.py snapshot_id
    NOTE:

    The value of snapshot_id is that of cascaded_snapshot_id obtained from the audit report.

    • If the command output displays "This snapshot is not wild", no further action is required.
    • If the command output displays "This snapshot is wild", go to the next step.
    • If the command output displays other information, contact technical support for assistance.

  5. Confirm with the tenant whether the snapshot is in use.

    • If yes, contact technical support for assistance.
    • If no, go to the next step.

  6. Log in to the cascaded FusionSphere OpenStack system based on the host information in the audit report. For details, see Using SSH to Log In to a Host.
  7. Run the following command to reset the snapshot status:

    cinder snapshot-reset-state uuid --state error
    NOTE:

    The value of uuid is that of cascaded_snapshot_id obtained from the audit report.

  8. Run the following command in the cascaded FusionSphere OpenStack system to delete the snapshot:

    cinder snapshot-delete snapshot_id
    NOTE:

    The value of snapshot_id is that of cascaded_snapshot_id obtained from the audit report.

    Check whether the command is successfully executed.

    • If yes, go to the next step.
    • If no, contact technical support for assistance.

  9. Run the following command in the cascaded FusionSphere OpenStack system to check whether the snapshot exists:

    cinder snapshot-show uuid
    NOTE:

    The value of uuid is that of cascaded_snapshot_id obtained from the audit report.

    • If the command output displays "ERROR", indicates the snapshot is deleted, no further action is required.
    • If the command output does not display "ERROR", indicates the snapshot exists, contact technical support for assistance.
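
As a minimal sketch of the procedure above, the commands can be combined as follows. The snapshot ID is a hypothetical placeholder for the cascaded_snapshot_id value in the audit report; run the deletion only after the script reports "This snapshot is wild" and the tenant confirms that the snapshot is not in use.

    # Hypothetical snapshot ID; replace it with the cascaded_snapshot_id value from the audit report.
    SNAPSHOT_ID=aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee

    # Step 4 (on the cinder-proxy001 node at the Cascading OpenStack): check whether the snapshot is orphaned.
    python2.7 /usr/bin/info-collect-script/audit_resume/judge_wild_snapshot.py ${SNAPSHOT_ID}

    # Steps 7 to 9 (in the cascaded FusionSphere OpenStack system): reset, delete, and verify.
    cinder snapshot-reset-state ${SNAPSHOT_ID} --state error
    cinder snapshot-delete ${SNAPSHOT_ID}
    cinder snapshot-show ${SNAPSHOT_ID}    # "ERROR" in the output means the snapshot has been deleted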

Invalid Volume Snapshots

Context

An invalid volume snapshot is one that exists at the Cascading OpenStack but does not exist at the Cascaded OpenStack.

Parameter Description

The name of the audit report is CascadeFakeSnapshotAudit.csv. Table 18-11 describes parameters in the report.

Table 18-11 Parameter description

Parameter

Description

cascading_snapshot_name

Specifies the name of the invalid volume snapshot at the Cascading OpenStack.

cascading_snapshot_id

Specifies the ID of the invalid volume snapshot at the Cascading OpenStack.

Impact on the System

The snapshot queried using the Cinder command does not exist.

Possible Causes

  • A backup is created for a database for future restoration. However, after the backup is created, one or more snapshots are deleted. When the database is restored using the backup, these snapshots no longer exist at the Cascaded OpenStack, but their records are restored to the database and become invalid snapshots.
  • A snapshot is deleted from the Cascaded OpenStack but not the Cascading OpenStack.

Procedure

  1. Log in to the cascading FusionSphere OpenStack system. For details, see Using SSH to Log In to a Host.
  2. Import environment variables. For details, see Importing Environment Variables.
  3. Log in to a node with the cinder-proxy001 role deployed. For details, see Logging In to a Host with a Role Deployed.
  4. Run the following script to check whether the snapshot is an invalid snapshot:

    python2.7 /usr/bin/info-collect-script/audit_resume/judge_fake_snapshot.py snapshot_id
    NOTE:

    The value of snapshot_id is that of cascading_snapshot_id obtained from the audit report.

    • If the command output contains "This snapshot is normal", no further action is required.
    • If the command output displays "This snapshot is fake", go to the next step.
    • If the command output displays other information, contact technical support for assistance.

  5. Run the following command at the Cascading OpenStack to set the snapshot status to error:

    cinder snapshot-reset-state uuid --state error
    NOTE:

    The value of uuid is that of cascading_snapshot_id obtained from the audit report.

  6. Run the following command at the Cascading OpenStack to delete the invalid volume snapshot:

    cinder snapshot-delete uuid
    NOTE:

    The value of uuid is that of cascading_snapshot_id obtained from the audit report.

  7. Run the following command at the Cascading OpenStack to check whether the snapshot exists:

    cinder snapshot-show uuid
    NOTE:

    The value of uuid is that of cascading_snapshot_id obtained from the audit report.

    • If the command output displays "ERROR", indicates the snapshot is deleted, no further action is required.
    • If the command output does not display "ERROR", indicates the snapshot exists, contact technical support for assistance.
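
For illustration, the same steps can be run as the sequence below, this time entirely at the Cascading OpenStack. The snapshot ID is a hypothetical placeholder for the cascading_snapshot_id value in the audit report.

    # Hypothetical snapshot ID; replace it with the cascading_snapshot_id value from the audit report.
    SNAPSHOT_ID=bbbbbbbb-cccc-dddd-eeee-ffffffffffff

    # Step 4 (on the cinder-proxy001 node): check whether the snapshot is invalid.
    python2.7 /usr/bin/info-collect-script/audit_resume/judge_fake_snapshot.py ${SNAPSHOT_ID}

    # Steps 5 to 7 (at the Cascading OpenStack): reset, delete, and verify.
    cinder snapshot-reset-state ${SNAPSHOT_ID} --state error
    cinder snapshot-delete ${SNAPSHOT_ID}
    cinder snapshot-show ${SNAPSHOT_ID}    # "ERROR" in the output means the snapshot has been deleted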

Inconsistent Volume Snapshot Status

Context

The snapshot status at the Cascading OpenStack is inconsistent with that at the Cascaded OpenStack.

Parameter Description

The name of the audit report is CascadeDiffStatusSnapshotAudit.csv. Table 18-12 describes parameters in the report.

Table 18-12 Parameter description

Parameter

Description

cascading_snapshot_id

Specifies the volume snapshot ID at the Cascading OpenStack.

cascading_status

Specifies the volume snapshot status at the Cascading OpenStack.

cascaded_snapshot_id

Specifies the volume snapshot ID at the Cascaded OpenStack.

cascaded_status

Specifies the volume snapshot status at the Cascaded OpenStack.

Possible Causes

  • If an exception occurs in the cascaded system or on the management network, services at the Cascading OpenStack are interrupted or become faulty, resulting in inconsistent volume snapshot status at the Cascading OpenStack.
  • Other unknown errors result in inconsistent volume snapshot status.

Impact on the System

  • System data is inconsistent.
  • Tenants' operation rights on the volume snapshot are restricted.

Procedure

  1. Log in to the cascading FusionSphere OpenStack system. For details, see Using SSH to Log In to a Host.
  2. Import environment variables. For details, see Importing Environment Variables.
  3. Log in to a node with the cinder-proxy001 role deployed. For details, see Logging In to a Host with a Role Deployed.
  4. Run the following script to check whether the snapshot status at the Cascading OpenStack is the same as that at the Cascaded OpenStack:

    python2.7 /usr/bin/info-collect-script/audit_resume/judge_diff_state_snapshot.py snapshot_id
    NOTE:

    The value of snapshot_id is that of cascading_snapshot_id obtained from the audit report.

    • If the command output displays "This snapshot status is same", no further action is required.
    • If the command output displays "cascading snapshot status:[*], cascaded snapshot status:[*]" (An asterisk (*) indicates the snapshot status.), go to the next step.

  5. Rectify the fault based on the processing methods applied to different VM statuses and scenarios listed in the following table. For other situations, contact technical support for assistance.

    Table 18-13 Processing methods

    Snapshot Status at the Cascading OpenStack | Snapshot Status at the Cascaded OpenStack | Possible Scenario | Processing Method

    creating, error, or available | available, error, or creating | 1. Snapshot created successfully at the Cascaded OpenStack without being synchronized to the Cascading OpenStack; 2. Snapshot service failure at the Cascading OpenStack | See Method 1.

    deleting or error_deleting | available or error | Cascading OpenStack service error, resulting in the operation commands failing to be issued to the Cascaded OpenStack | See Method 2.

    deleting or error_deleting | deleting | Snapshot deletion failure at the Cascaded OpenStack | See Method 3.

    deleting or error_deleting | error_deleting | Snapshot deletion failure at the Cascaded OpenStack | See Method 3.

Method 1

  1. Run the following command in the cascading FusionSphere OpenStack system to reset the snapshot status at the Cascading OpenStack:

    cinder snapshot-reset-state uuid --state <state>
    NOTE:

    The value of uuid is that of cascading_snapshot_id obtained from the audit report. The value of state is that of cascaded_status obtained from the audit report.

Method 2

  1. Run the following command in the cascading FusionSphere OpenStack system to reset the snapshot status at the Cascading OpenStack:

    cinder snapshot-reset-state uuid --state error
    NOTE:

    The value of uuid is that of cascading_snapshot_id obtained from the audit report.

  2. Run the following command in the cascading FusionSphere OpenStack system to delete the snapshot:

    cinder snapshot-delete uuid
    NOTE:

    The value of uuid is that of cascading_snapshot_id obtained from the audit report.

  3. Run the following command in the cascading FusionSphere OpenStack system to check whether the snapshot exists:

    cinder snapshot-show uuid
    NOTE:

    The value of uuid is that of cascading_snapshot_id obtained from the audit report.

    • If the command output displays "ERROR", indicates the snapshot is deleted, no further action is required.
    • If the command output does not display "ERROR", indicates the snapshot exists, contact technical support for assistance.

Method 3

  1. Run the following command in the cascaded FusionSphere OpenStack system to reset the snapshot status:

    cinder snapshot-reset-state uuid --state error
    NOTE:

    The value of uuid is that of cascaded_snapshot_id obtained from the audit report.

  2. Run the following command in the cascading FusionSphere OpenStack system to reset the snapshot status:

    cinder snapshot-reset-state uuid --state error
    NOTE:

    The value of uuid is that of cascading_snapshot_id obtained from the audit report.

  3. Run the following command in the cascading FusionSphere OpenStack system to delete the snapshot:

    cinder snapshot-delete uuid
    NOTE:

    The value of uuid is that of cascading_snapshot_id obtained from the audit report.

  4. Run the following command in the cascading FusionSphere OpenStack system to check whether the snapshot exists:

    cinder snapshot-show uuid
    NOTE:

    The value of uuid is that of cascading_snapshot_id obtained from the audit report.

    • If the command output displays "ERROR", indicates the snapshot is deleted, no further action is required.
    • If the command output does not display "ERROR", indicates the snapshot exists, contact technical support for assistance.
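
As a minimal sketch of Method 3, assuming hypothetical snapshot IDs standing in for the cascaded_snapshot_id and cascading_snapshot_id values in the audit report:

    # Hypothetical IDs; replace them with the values from the audit report.
    CASCADED_SNAPSHOT_ID=cccccccc-dddd-eeee-ffff-000000000000
    CASCADING_SNAPSHOT_ID=dddddddd-eeee-ffff-0000-111111111111

    # Step 1 (in the cascaded FusionSphere OpenStack system): reset the stuck snapshot.
    cinder snapshot-reset-state ${CASCADED_SNAPSHOT_ID} --state error

    # Steps 2 to 4 (in the cascading FusionSphere OpenStack system): reset, delete, and verify.
    cinder snapshot-reset-state ${CASCADING_SNAPSHOT_ID} --state error
    cinder snapshot-delete ${CASCADING_SNAPSHOT_ID}
    cinder snapshot-show ${CASCADING_SNAPSHOT_ID}    # "ERROR" in the output means the snapshot has been deleted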

Stuck Volumes

Context

A stuck volume is one that has been kept in a transition state (including deleting, error_deleting, error_attaching, error_detaching, attaching, detaching, error_extending, uploading, retyping, backing-up, restoring-backup, error, reserved, and maintenance) for more than 24 hours.

Parameter Description

The name of the audit report is CascadeVolumeMiddleStatusAudit.csv. Table 18-14 describes parameters in the report.

Table 18-14 Parameter description

Parameter

Description

cascading_volume_id

Specifies the ID of the volume at the Cascading OpenStack.

cascading_status

Specifies the status of the volume at the Cascading OpenStack.

cascading_host

Specifies the volume host at the Cascading OpenStack.

cascaded_volume_id

Specifies the ID of the volume at the Cascaded OpenStack.

cascaded_status

Specifies the status of the volume at the Cascaded OpenStack.

cascaded_host

Specifies the volume host at the Cascaded OpenStack.

Original volume: Volume host information at the Cascading OpenStack is the same as that at the Cascaded OpenStack.

Non-original volume: Volume host information at the Cascading OpenStack is different from that at the Cascaded OpenStack.

Possible Causes

  • A system exception occurs when a volume service operation is in process.
  • A backup is created for a database for future restoration. However, after the backup is created, the statuses of one or more volumes are changed. When the database is restored using the backup, records of these volume statuses are restored to their former statuses in the database.

Impact on the System

The stuck volume becomes unavailable and occupies system resources at the Cascading OpenStack and Cascaded OpenStack.

Procedure

  • Common scenarios
  1. Log in to the cascading FusionSphere OpenStack system. For details, see Using SSH to Log In to a Host.
  2. Import environment variables. For details, see Importing Environment Variables.
  3. Run the following command in the cascading FusionSphere OpenStack system to obtain the volume information:

    cinder show uuid
    NOTE:

    The value of uuid is that of cascading_volume_id obtained from the audit report. In the command output, status indicates the volume status, and os-vol-host-attr:host indicates the cascaded FusionSphere OpenStack system to which the volume belongs.

    Check whether the volume status is consistent with that recorded in the audit report.

    • If yes, go to the next step.
    • If no, the volume was unstable when the audit was performed. No further action is required.

  4. Log in to the first cascaded FusionSphere OpenStack system. For details, see Using SSH to Log In to a Host.
  5. Run the following command in the cascaded FusionSphere OpenStack system to obtain the volume status information:

    cinder show uuid
    NOTE:

    The value of uuid is that of cascaded_volume_id obtained from the audit report.

  6. Check whether the volume status that has been obtained is consistent with that recorded in the audit report.

    • If yes, go to the next step.
    • If no, the volume was unstable when the audit was performed. No further action is required.

  7. Check whether the volume status at the Cascading OpenStack is the same as that at the Cascaded OpenStack.

  8. Perform the following operations based on the check result:

    If the status is inconsistent, run the following command:

    cinder unmanage uuid

    NOTE:

    The value of uuid is that of cascaded_volume_id obtained from the audit report.

    • If the volume at the Cascaded OpenStack is an original volume in the available or in-use status, or a non-original volume in the in-use status, go to the next step.
    • If the volume at the Cascaded OpenStack is in a status other than available or in-use, log in to the cascaded FusionSphere OpenStack system and run the following command to check whether the volume status is the same as the cascaded volume status recorded in the audit report:

      cinder show uuid

    NOTE:

    The value of uuid is that of cascaded_volume_id obtained from the audit report.

    If the status is the same, perform the operations provided in Using FusionCompute for Virtualization(Cascaded OpenStack)-Stuck Volumes or Using KVM for Virtualization(Cascaded OpenStack)-Stuck Volumes.

  9. Log in to the next cascaded FusionSphere OpenStack system and repeat step 1 to step 7 until stuck volumes at all cascaded FusionSphere OpenStack systems are handled.
  • Volumes in the reserved state

    The possible reason why a volume is in the reserved state (a transient state) is that the original volume is reserved after VM live migration. Such volumes can be deleted after the user confirms that services are normal.

    In this case, perform the following operations to rectify the fault.

  1. Log in to any controller node in the AZ. For details, see Using SSH to Log In to a Host.
  2. Import environment variables. For details, see Importing Environment Variables.
  3. Run the following command to query information about the volume status:

    cinder show Volume UUID

    NOTE:

    Volume UUID is the volume_id value in the audit report.

    Check whether the value of status in the command output is consistent with the volume state in the audit report.

    • If yes, go to 4.
    • If no, contact technical support for assistance.

  4. View the value of last_update_time in the audit report and check whether the time difference between the value and the current time exceeds 24 hours.

    • If yes, go to 5.
    • If no, contact technical support for assistance.

  5. The reserved volume is a copy of an original volume. Obtain the original volume and check whether services on the VM where the original volume resides are normal.

    Run the following command to obtain the ID of the original volume:

    cinder show uuid | grep description

    Check whether the command output contains "migration src for Original volume ID".

    • If yes, confirm with the user whether services on the VM where the original volume resides are normal. If VM services are normal, submit the migration task on the Service OM migration task page.
    • If no, contact technical support for assistance.

  • Volumes in the maintenance state

    The possible reason why a volume is in the maintenance state (a transient state) is that a process exception occurs during data copy.

    In this case, contact technical support for assistance.
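
For the reserved-state check described above, a minimal sketch (with a hypothetical volume ID standing in for the volume_id value in the audit report) looks like this:

    # Hypothetical volume ID; replace it with the volume_id value from the audit report.
    VOLUME_ID=eeeeeeee-ffff-0000-1111-222222222222

    # Confirm that the current status still matches the status recorded in the audit report.
    cinder show ${VOLUME_ID}

    # Find the original volume that this reserved copy was created from.
    cinder show ${VOLUME_ID} | grep description    # look for "migration src for <original volume ID>"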

Inconsistent Volume Status

Context

The volume status recorded in the database at the Cascading OpenStack is inconsistent with that at the Cascaded OpenStack. If a volume status is inconsistent for a long period (24 hours by default), restore the volume based on site conditions.

Parameter Description

The name of the audit report is CascadeDiffStatusVolumeAudit.csv. Table 18-15 describes parameters in the report.

Table 18-15 Parameter description

Parameter

Description

cascading_volume_id

Specifies the volume ID at the Cascading OpenStack.

cascading_status

Specifies the volume status at the Cascading OpenStack.

cascading_host

Specifies the volume host information at the Cascading OpenStack.

cascaded_volume_id

Specifies the volume ID at the Cascaded OpenStack.

cascaded_status

Specifies the volume status at the Cascaded OpenStack.

cascaded_host

Specifies the volume host information at the Cascaded OpenStack.

System at the Cascaded OpenStack

Original volume: Volume host information at the Cascading OpenStack is the same as that at the Cascaded OpenStack.

Non-original volume: Volume host information at the Cascading OpenStack is different from that at the Cascaded OpenStack.

Possible Causes

  • If an exception occurs on the management network, services at the Cascading OpenStack are interrupted or become faulty, resulting in inconsistent volume status at the Cascaded OpenStack.
  • Other unknown errors result in inconsistent volume status.

Impact on the System

  • System data is inconsistent.
  • Tenants' operation rights on the volume are restricted.

Procedure

  1. Log in to the cascading FusionSphere OpenStack system. For details, see Using SSH to Log In to a Host.
  2. Import environment variables. For details, see Importing Environment Variables.
  3. Log in to a node with the cinder-proxy001 role assigned at the Cascading OpenStack. For details, see Logging In to a Host with a Role Deployed.
  4. Run the following script to check whether the volume status at the Cascading OpenStack is the same as that at the Cascaded OpenStack:

    python2.7 /usr/bin/info-collect-script/audit_resume/judge_diff_state_volume.py volume_id
    NOTE:

    The value of volume_id is that of cascading_volume_id obtained from the audit report.

    • If the command output displays "Volume status is the same", the volume status is the same, and no further action is required.
    • If the command output displays "cascading volume id:[*] status:[*] host:[*], cascaded volume id:[*] status:[*] host:[*]" (may be multiple records), the volume statuses are different, and go to the next step.

  5. Rectify the fault based on the processing methods applied to different volume statuses and scenarios listed in Table 18-16. For other situations, contact technical support for assistance.

    Table 18-16 Processing methods

    Volume Status at the Cascading OpenStack | Volume Status at the Cascaded OpenStack | Possible Scenario | Processing Method

    All statuses | All original and non-original volumes at the Cascaded OpenStack are in the available status. | Detaching the volume from the VM by running commands at the Cascaded OpenStack | See Method 1.

    All statuses | All volumes at the Cascaded OpenStack are in either the available or in-use status, and at least one volume is in the in-use status. | Attaching the volume to a VM by running commands at the Cascaded OpenStack | See Method 2.

    All statuses | error, creating, downloading, deleting, error_deleting, error_attaching, error_detaching, attaching, detaching, uploading, retyping, backing-up, restoring-backup, extending, error_restoring, or error_extending | Cascaded FusionSphere OpenStack system error, resulting in inconsistent volume status at the cascading and cascaded FusionSphere OpenStack systems | Handle the audit result at the Cascaded OpenStack first by following operations provided in Using FusionCompute for Virtualization(Cascaded OpenStack) or Using KVM for Virtualization(Cascaded OpenStack).

Method 1

  1. Log in to the cascading FusionSphere OpenStack system. For details, see Using SSH to Log In to a Host.
  2. Import environment variables. For details, see Importing Environment Variables.
  3. Log in to a node with the cinder-proxy001 role assigned at the Cascading OpenStack. For details, see Logging In to a Host with a Role Deployed.
  4. Run the following script to reset the volume status to available at the Cascading OpenStack:

    python2.7 /usr/bin/info-collect-script/audit_resume/reset_volume_available.py volume_id
    NOTE:

    The value of volume_id is that of cascading_volume_id obtained from the audit report.

  5. Run the following script to clear the volume-related block device mapping information:

    python2.7 /usr/bin/info-collect-script/audit_resume/clear_volume_bdm.py volume_id
    NOTE:

    The value of volume_id is that of cascading_volume_id obtained from the audit report.

  6. Log in to each FusionSphere OpenStack at the Cascaded OpenStack where the non-original volume is in the available status. For details, see Using SSH to Log In to a Host.
  7. Run the following command:

    cinder unmanage uuid

    NOTE:

    The value of uuid is that of cascaded_volume_id obtained from the audit report. In the audit report, cascaded_status indicates the volume status, and cascaded_host indicates the cascaded FusionSphere OpenStack system where the volume is located.

  8. At the Cascading OpenStack, run the following command to check whether the volume is in the available status:

    cinder show volume_id
    NOTE:

    volume_id indicates the value of cascading_volume_id in the report.

    • If yes, no further action is required.
    • If no, contact technical support for assistance.

Method 2

  1. Log in to the cascading FusionSphere OpenStack system. For details, see Using SSH to Log In to a Host.
  2. Import environment variables. For details, see Importing Environment Variables.
  3. Log in to nodes with the cinder-proxy001 role assigned at the Cascading OpenStack. For details, see Logging In to a Host with a Role Deployed.
  4. Run the following script to reset the volume status to available at the Cascading OpenStack:

    python2.7 /usr/bin/info-collect-script/audit_resume/reset_volume_available.py volume_id
    NOTE:

    The value of volume_id is that of cascading_volume_id obtained from the audit report.

  5. Run the following script to obtain the UUID of the VM to which the volume is attached:

    python2.7 /usr/bin/info-collect-script/audit_resume/get_volume_attach_server.py volume_id
    NOTE:

    The value of volume_id is that of cascading_volume_id obtained from the audit report.

  6. Run the following command at the Cascading OpenStack to attach the volume to the VM again:

    nova volume-attach vm_uuid volume_uuid
    NOTE:

    The value of vm_uuid is the VM UUID in server_list at the Cascading OpenStack obtained from 5. The value of volume_uuid is that of cascading_volume_id in the audit report.

  7. Run the following command in the cascading FusionSphere OpenStack system to check whether the volume is successfully attached to the VM:

    cinder show volume_uuid
    NOTE:

    The value of volume_uuid is that of cascading_volume_id obtained from the audit report. In the command output, if the value of status is in-use, the volume is successfully attached to the VM.

    • If yes, go to the next step.
    • If no, contact technical support for assistance.

  8. Run the following command on the cinder-proxy001-assigned node at the Cascading OpenStack to query the status of the volume at the Cascaded OpenStack.

    python2.7 /usr/bin/info-collect-script/audit_resume/judge_diff_state_volume.py volume_uuid

    NOTE:
    • The value of volume_uuid is that of cascading_volume_id obtained from the audit report.
    • The value of cascaded_volume_status is the status of the volume at the Cascaded OpenStack.

  9. If a non-original volume at the Cascaded OpenStack is in the available status, log in to the cascaded FusionSphere OpenStack. For details, see Using SSH to Log In to a Host.
  10. Import environment variables. For details, see Importing Environment Variables.
  11. At the Cascaded OpenStack, run the following command:

    cinder unmanage uuid

    • If there is no command output, no further action is required.
    • If any command output is displayed, contact technical support for assistance.
    NOTE:

    The value of uuid is that of cascaded_volume_id obtained from the audit report.
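
A consolidated, hedged sketch of Method 2 follows. The volume and VM IDs are hypothetical placeholders; the VM ID stands in for the UUID returned in server_list by get_volume_attach_server.py.

    # Hypothetical IDs; replace them with real values from the audit report and script output.
    VOLUME_ID=ffffffff-0000-1111-2222-333333333333
    VM_ID=00000000-1111-2222-3333-444444444444

    # Steps 4 and 5 (on the cinder-proxy001 node): reset the volume and find the attached VM.
    python2.7 /usr/bin/info-collect-script/audit_resume/reset_volume_available.py ${VOLUME_ID}
    python2.7 /usr/bin/info-collect-script/audit_resume/get_volume_attach_server.py ${VOLUME_ID}

    # Steps 6 and 7 (at the Cascading OpenStack): re-attach the volume and verify.
    nova volume-attach ${VM_ID} ${VOLUME_ID}
    cinder show ${VOLUME_ID}    # status should become in-use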

Inconsistent Volume Attachment Information

Context

If the values of attachments at the Cascading OpenStack are inconsistent with those at the Cascaded OpenStack, the volume attachment information is inconsistent.

Parameter Description

The name of the audit report is CascadeVolumeAttachmentAudit.csv. Table 18-17 describes parameters in the report.

Table 18-17 Parameter description

Parameter

Description

cascading_volume_id

Specifies the ID of the volume at the Cascading OpenStack.

cascading_host

Specifies the information about the host to which the volume belongs at the Cascading OpenStack.

cascading_status

Specifies the status of the volume at the Cascading OpenStack.

cascading_num

Specifies the number of VMs to which the volume is attached at the Cascading OpenStack.

cascaded_volume_id

Specifies the ID of the volume at the Cascaded OpenStack.

cascaded_host

Specifies the information about the host to which the volume belongs at the Cascaded OpenStack.

cascaded_status

Specifies the status of the volume at the Cascaded OpenStack.

cascaded_num

Specifies the number of VMs to which the volume is attached at the Cascaded OpenStack.

Original volume: Volume host information at the Cascading OpenStack is the same as that at the Cascaded OpenStack.

Non-original volume: Volume host information at the Cascading OpenStack is different from that at the Cascaded OpenStack.

Impact on the System

  • Residual volume attachment information may reside on hosts.
  • Volume-related services may be affected. For example, volumes at the Cascading OpenStack may fail to be attached to or detached from a VM.

Possible Causes

A backup is created for a database for future restoration. However, after the backup is created, one or more volumes are attached to VMs. When the database is restored using the backup, records of the volume attachment information are deleted from the database at the Cascading OpenStack, but the information resides on the database at the Cascaded OpenStack.

Procedure

  1. Log in to the cascading FusionSphere OpenStack system. For details, see Using SSH to Log In to a Host.
  2. Import environment variables. For details, see Importing Environment Variables.
  3. Log in to nodes with the cinder-proxy001 role assigned at the Cascading OpenStack. For details, see Logging In to a Host with a Role Deployed.
  4. Run the following script to check whether the volume attachment information at the Cascading OpenStack is consistent with that at the Cascaded OpenStack:

    python2.7 /usr/bin/info-collect-script/audit_resume/judge_diff_volume_attachment.py volume_id
    NOTE:

    The value of volume_id is that of cascading_volume_id obtained from the audit report.

    • If the command output displays "Volume attachment is same", no further action is required.
    • If the command output displays "Volume attachment is different.cascading volume id:[*] status:[*]host:[*] attach_num [*].cascaded volume id:[*] stauts:[*]host:[*] attach_num [*]" (may be multiple records), volume attachment information at the cascaded and Cascading OpenStacks are the same, and go to the next step.

  5. Rectify the volume inconsistency fault based on the methods applied to different volume statuses listed in Table 18-18. For other situations, contact technical support for assistance.

    Table 18-18 Processing methods based on volume statuses

    Volume Status | Possible Cause | Method

    Both the volume status at the Cascading OpenStack and that at all Cascaded OpenStacks are in-use. | A database is restored, or a volume is attached to a VM at the Cascaded OpenStack. | See Method 1.

    Volume status at the Cascading OpenStack is different from that at the Cascaded OpenStack. | A fault occurs on the management network and services at the Cascading OpenStack are paused or abnormal; as a result, the volume status at the Cascading OpenStack is different from that at the Cascaded OpenStack. Other causes can also trigger the inconsistent volume status error. | See section Inconsistent Volume Status.

Method 1

  1. Log in to nodes with the cinder-proxy001 role assigned at the Cascading OpenStack. For details, see Logging In to a Host with a Role Deployed.
  2. Run the following script to reset the volume status to available at the Cascading OpenStack:

    python2.7 /usr/bin/info-collect-script/audit_resume/reset_volume_available.py volume_id
    NOTE:

    The value of volume_id is that of cascading_volume_id obtained from the audit report.

  3. Run the following script to obtain the UUID of the VM to which the volume is attached:

    python2.7 /usr/bin/info-collect-script/audit_resume/get_volume_attach_server.py volume_id
    NOTE:

    The value of volume_id is that of cascading_volume_id obtained from the audit report.

    • If the command output displays "Success, No server", contact technical support for assistance.
    • If the command output displays "Success, cascaded host :[***] server_list:[****]", go to the next step. (Content in the square bracket indicates the VM to which the volume is attached.)

  4. Run the following command at the Cascading OpenStack to attach the volume to the VM again:

    nova volume-attach vm_uuid volume_uuid
    NOTE:

    The value of vm_uuid is the VM UUID at the Cascading OpenStack obtained in 3, and the value of volume_uuid is the volume ID at the Cascading OpenStack (cascading_volume_id) recorded in the audit report.

  5. Run the following command in the cascading FusionSphere OpenStack system to check whether the volume is successfully attached to the VM:

    cinder show volume_uuid
    NOTE:

    The value of volume_uuid is that of cascading_volume_id obtained from the audit report. In the command output, if the value of status is in-use, the volume is successfully attached to the VM.

    • If yes, no further action is required.
    • If no, contact technical support for assistance.

Invalid VM Ports

Context

An invalid VM port is the port of a VM NIC that is recorded as normal at the Cascading OpenStack but is not present or is abnormal at the Cascaded OpenStack.

For an invalid VM port, confirm with the tenant whether the port is invalid. If it is invalid, delete the port.

Parameter Description

The name of the audit report is stale_ports.csv. Table 18-19 describes parameters in the report.

Table 18-19 Parameter description

Parameter

Description

id

Specifies the port UUID at the Cascading OpenStack.

name

Specifies the port name at the Cascading OpenStack.

device_id

Specifies the port device ID at the Cascading OpenStack.

device_owner

Specifies the port device owner at the Cascading OpenStack.

Possible Causes

When nova-proxy at the Cascading OpenStack invokes neutron-server at the Cascading OpenStack to create multiple ports, the creation of some ports may time out. In this case, nova-proxy retries. However, some of the timed-out requests have already been processed successfully by neutron-server at the Cascading OpenStack, and nova-proxy does not deliver the redundant NICs to the Cascaded OpenStack. Therefore, there are more ports at the Cascading OpenStack than at the Cascaded OpenStack.

Impact on the System

The data at the Cascading OpenStack is inconsistent with that at the Cascaded OpenStack.

Procedure

  1. Log in to the cascading FusionSphere OpenStack system. For details, see Using SSH to Log In to a Host.
  2. Import environment variables. For details, see Importing Environment Variables.
  3. Run the following command in the cascading FusionSphere OpenStack system to check whether the port exists at the Cascading OpenStack:

    neutron port-show port_id
    NOTE:

    The value of port_id is the port ID (id) at the Cascading OpenStack obtained from the audit report.

    • If yes, go to the next step.
    • If no, the port is not invalid. No further action is required.

  4. Log in to the cascaded FusionSphere OpenStack system based on the information about the host to which the port belongs. For details, see Using SSH to Log In to a Host.
  5. Import environment variables. For details, see Importing Environment Variables.
  6. Run the following command in the cascaded FusionSphere OpenStack system to check whether the port exists at the Cascaded OpenStack:

    neutron port-show port_name
    NOTE:

    The value of port_name is port@ followed by the port ID at the Cascading OpenStack, and the port ID at the Cascading OpenStack is the id value in the audit report.

  7. Determine whether the port exists at the Cascaded OpenStack based on the command output.

    • If yes, contact technical support for assistance if the port is in the abnormal status (DOWN). Otherwise, the port is not invalid. No further action is required.
    • If no, the port is invalid. Go to the next step.

  8. Run the following command in the cascading FusionSphere OpenStack system to delete the invalid port at the Cascading OpenStack:

    neutron port-delete port_id
    NOTE:

    The value of port_id is the port ID at the Cascading OpenStack obtained from the audit report.
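
A minimal sketch of the port check and cleanup, assuming a hypothetical port ID taken from the id column of stale_ports.csv:

    # Hypothetical port ID; replace it with the id value from the audit report.
    PORT_ID=11111111-aaaa-bbbb-cccc-222222222222

    # Step 3 (at the Cascading OpenStack): confirm that the port still exists.
    neutron port-show ${PORT_ID}

    # Step 6 (at the Cascaded OpenStack): the port name is "port@" followed by the cascading port ID.
    neutron port-show port@${PORT_ID}

    # Step 8 (at the Cascading OpenStack): delete the port only if it does not exist at the Cascaded OpenStack.
    neutron port-delete ${PORT_ID}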

Stuck Images

Context

An image in the active state is available for use. If an image is stuck in the queued, saving, or deleted state, the image is unavailable. If an image is kept in a transition state for a long time (24 hours by default), delete the image.

Parameter Description

The name of the audit report is stucking_images.csv. Table 18-20 describes parameters in the report.

Table 18-20 Parameter description

Parameter

Description

id

Specifies the image ID.

status

Specifies the image status.

updated_at

Specifies the last time when the image was updated.

owner

Specifies the ID of the tenant who created the image.

Impact on the System

  • An image in the queued state does not occupy system resources, but the image is unavailable.
  • An image in the saving state has residual image files that occupy the storage space.
  • An image in the deleted state has residual image files that occupy the storage space.

Possible Causes

  • The image creation process is not complete: The image was not uploaded to the image server within 24 hours after it was created. In this case, the image is kept in the queued state.
  • During the image creation process, an exception (for example, intermittent network disconnection) occurred when the image was being uploaded. In this case, the image is kept in the queued state.
  • When an image was being uploaded, the Glance service failed. In this case, the image is kept in the saving state.
  • During the image deletion process, an exception occurred on the DB service. In this case, the image is kept in the deleted state.

Procedure

  1. Log in to the cascading FusionSphere OpenStack system. For details, see Using SSH to Log In to a Host.
  2. Import environment variables. For details, see Importing Environment Variables.
  3. Run the following command to check whether the image is in a transition state:

    glance image-show id
    NOTE:

    The value of id is the image UUID obtained from id in the audit report. In the command output, status indicates the image status.

    Check whether the image status is in a transition state (queued, saving, or deleted).

    • If yes, go to the next step.
    • If no, the system was unstable when the audit was performed. No further action is required.

  4. Confirm with the tenant whether the image can be deleted.

    • If yes, go to the next step.
    • If no, contact technical support for assistance.

  5. Run the following command to set the image protection attribute to False:

    glance image-update id --protected False
    NOTE:

    The value of id is the image UUID obtained from id in the audit report.

  6. Run the following command to delete the stuck image:

    glance image-delete id
    NOTE:

    The value of id is the image UUID obtained from id in the audit report.

  7. Run the following commands to check whether the image is successfully deleted:

    glance image-show id
    NOTE:

    The value of id is the image UUID obtained from id in the audit report. If the command output contains "No image with a name or ID of 'XXX (Image ID)' exists", the image does not exist.

    • If yes, no further action is required.
    • If no, contact technical support for assistance.
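
A minimal sketch of the image cleanup, assuming a hypothetical image ID taken from the id column of stucking_images.csv and tenant confirmation that the image can be deleted:

    # Hypothetical image ID; replace it with the id value from the audit report.
    IMAGE_ID=22222222-bbbb-cccc-dddd-333333333333

    glance image-show ${IMAGE_ID}                        # confirm the image is still queued, saving, or deleted
    glance image-update ${IMAGE_ID} --protected False    # remove deletion protection
    glance image-delete ${IMAGE_ID}
    glance image-show ${IMAGE_ID}                        # "No image with a name or ID of ..." confirms deletion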

Residual VM BDM Data

Context

When multiple volumes are attached to a VM at the same time, BDM creation may time out, and as a result, volume attachment stops and residual VM BDM data is generated. Because the affected volumes are in the available status, you are advised to delete the residual BDM data.

Parameter Description

The name of the audit report is invalid_bdms.csv. Table 18-21 describes parameters in the report.

Table 18-21 Parameter description

Parameter

Description

instance_uuid

Specifies the ID of the VM at the Cascading OpenStack.

volume_id

Specifies the volume ID at the Cascading OpenStack.

device_name

Specifies the name of the device to which the volume is mapped.

Impact on the System

Volumes fail to be attached to the VM. If these volumes are deleted, residual BDM data remains, and the number of volumes that can be attached to the VM is reduced.

Possible Causes

BDM application times out.

Procedure

  1. Log in to the node where the nova-proxy001 role is deployed in the cascading FusionSphere OpenStack system. For details, see Using SSH to Log In to a Host.
  2. Import environment variables. For details, see Importing Environment Variables.
  3. Run the following command to check whether the volume is in the available status:

    cinder show volume_id
    NOTE:

    volume_id: indicates the value of volume_id in the audit report, and in the command output, status indicates the volume status.

    • If yes, go to the next step.
    • If no, no further action is required.
    • If the volume does not exist, go to the next step.

  4. Run the following command to delete the volume attachment information in the VM:

    python2.7 /usr/bin/info-collect-script/audit_resume/clear_volume_bdm.py volume_id instance_id
    NOTE:

    volume_id: indicates the value of volume_id in the audit report.

    instance_id: indicates the value of instance_uuid in the audit report.
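
A minimal sketch of this cleanup, assuming hypothetical IDs taken from the volume_id and instance_uuid columns of invalid_bdms.csv:

    # Hypothetical IDs; replace them with the values from the audit report.
    VOLUME_ID=33333333-cccc-dddd-eeee-444444444444
    INSTANCE_ID=44444444-dddd-eeee-ffff-555555555555

    # Step 3: proceed only if the volume is in the available status or no longer exists.
    cinder show ${VOLUME_ID}

    # Step 4: delete the residual volume attachment information.
    python2.7 /usr/bin/info-collect-script/audit_resume/clear_volume_bdm.py ${VOLUME_ID} ${INSTANCE_ID}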

Nova-compute Service Residual

Context

After a host is deleted from the cloud provisioning system (CPS), the nova-compute service of the deleted host still exists. Such a nova-compute service is called nova-compute service residual.

Parameter Description

The audit report is named nova_service_cleaned.csv. Table 18-22 describes parameters in the report.

Table 18-22 Parameter description

Parameter

Description

service_id

Specifies the Nova service ID of the host.

host_id

Specifies the CPS host ID.

binary

Specifies the Nova service name, for example, nova-compute.

Impact on the System

The host cannot be found on the FusionSphere OpenStack web client or by running CPS commands. However, the Nova-compute service of the host can still be found by running Nova commands. If FusionSphere OpenStack OM has been deployed, users can find the host in the faulty state on the FusionSphere OpenStack OM web client, affecting user experience.

Possible Causes

  • A host is deleted during the host expansion on the FusionSphere OpenStack web client.
  • A user backs up FusionSphere OpenStack data, deletes a host from the web client, and restores FusionSphere OpenStack data.

Procedure

  1. Use PuTTY to log in to the first host in the cascading FusionSphere OpenStack system through the Cascading-Reverse-Proxy.

    The default username is fsp, and the default password is Huawei@CLOUD8.

  2. Run the following command and enter the password Huawei@CLOUD8! of user root to switch to user root:

    su - root

  3. Import environment variables by running the following commands:

    export OS_AUTH_URL=https://identity.az1.dc1.domainname.com:443/identity/v2.0 
    export OS_USERNAME=dc1_admin 
    export OS_TENANT_NAME=dc_system_dc1 
    export OS_ENDPOINT_TYPE=internalURL 
    export OS_REGION_NAME=az1.dc1  
    
    export NOVA_ENDPOINT_TYPE=internalURL 
    export CINDER_ENDPOINT_TYPE=internalURL 
    export OS_VOLUME_API_VERSION=2

    Table 18-23 describes the parameters in the preceding command.

    Table 18-23 Environment variables

    Parameter

    Description

    OS_AUTH_URL

    Specifies the authentication address, which corresponds to the public URL endpoint of the Keystone service.

    OS_USERNAME

    Specifies the DC administrator account, which is automatically created during FusionSphere OpenStack installation.

    OS_TENANT_NAME

    Specifies information about the tenant who owns the DC administrator. The tenant name is the project name that is automatically generated during FusionSphere OpenStack installation.

    OS_REGION_NAME

    Specifies the AZ in which the operation is performed, for example, az1.dc1.

    OS_ENDPOINT_TYPE

    NOVA_ENDPOINT_TYPE

    CINDER_ENDPOINT_TYPE

    Specifies the endpoint type. This variable is required when you run OpenStack commands. Set the variable to internalURL.

    OS_VOLUME_API_VERSION

    Specifies the volume version. If version 2 is required, set it to 2.

  4. Perform the following operations to query the management IP address of a controller node:

    1. Run the following command to enter the secure mode:

      cpssafe

      Information similar to the following is displayed.

      please choose environment variable which you want to import: 
      (1) openstack environment variable (keystone v3) 
      (2) cps environment variable 
      (3) openstack environment variable legacy (keystone v2) 
      please choose:[1|2|3]
    2. Enter 1 and select the keystone authentication. Then enter the password of user OS_USERNAME as prompted.

      Information similar to the following is displayed.

      Input command:
    3. Run the following command to query the management IP address of a controller node:

      cps host-list

      The node whose roles value is controller indicates a controller node. The value of manageip indicates the management IP address.

  5. Run the following commands to log in to the controller node:

    su fsp

    ssh fsp@Management IP address

    su - root

  6. Import environment variables based on 3.
  7. Obtain a line of audit record from the audit report and check whether the host has been deleted or the host expansion is not started.

    Run the runsafe command to enter the secure operation mode, enter the user password as prompted, and then run the following command:

    cps host-role-list host_id

    Obtain host_id value from the audit report.

    Check whether the command output contains "Hostid is not found!" or the host role list is empty.

    • If yes, go to 8.
    • If no, the host has been restored, and no further action is required.

  8. Check whether nova-compute service residuals exist. Run the runsafe command to enter secure operation mode and run the following command:

    nova service-list|grep nova-compute|grep host_id

    Obtain host_id value from the audit report.

    Check whether the command output contains the nova-compute service record.

    • If yes, the nova-compute service residuals still exist, and go to 9.
    • If no, the nova-compute service residuals have been cleared, and no further action is required.

  9. Check whether the host houses VMs. Run the runsafe command to enter secure operation mode and run the following command:

    nova list --all-t --host host_id

    Obtain host_id value from the audit report.

    Check whether the command output shows that the host houses VMs.

    • If yes, go to 10.
    • If no, go to 11.

  10. Handle the VMs on the host. If the host is deleted, check whether the VMs need to be restored on other hosts.

    • If yes, restore VMs. For details, see section Rebuilding VMs on Other Hosts in the HUAWEI CLOUD Stack 6.5.0 Troubleshooting Guide. Run the nova list --all-t --host host_id command as described in the previous step. If no VM is displayed, go to 11.
    • If no, contact the tenant to delete all VMs from the host, and go to 11.

  11. Delete the nova-compute service residuals. Run the runsafe command to enter the secure operation mode and run the following command:

    nova service-delete service_id

    Obtain the service_id value from the audit report.

    Check whether the command output is empty or contains "Service service_id not found. (HTTP 404)".

    • If yes, the nova-compute service residuals have been cleared. No further action is required.
    • If no, contact technical support for assistance.
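
As a minimal sketch of steps 7 to 11 (each command is run in the runsafe secure operation mode as described above, with hypothetical host_id and service_id values standing in for the audit report values):

    # Hypothetical IDs; replace them with the host_id and service_id values from the audit report.
    HOST_ID=55555555-eeee-ffff-0000-666666666666
    SERVICE_ID=123

    cps host-role-list ${HOST_ID}                             # expect "Hostid is not found!" or an empty role list
    nova service-list | grep nova-compute | grep ${HOST_ID}   # a record here means the residual still exists
    nova list --all-t --host ${HOST_ID}                       # the host must not house any VMs
    nova service-delete ${SERVICE_ID}                         # delete the residual nova-compute service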

Residual ECS Snapshots

Context

The VM corresponding to the ECS snapshot does not exist, and the snapshot is in the residual state. You are advised to delete snapshots that are in the residual state for a long time (24 hours by default).

Parameter Description

The report name of this audit item is images_vm_snapshots.csv. Table 18-24 describes parameters listed in images_vm_snapshots.csv.

Table 18-24 Parameters in the audit report

Parameter

Description

id

Specifies the ID of the ECS snapshot.

owner

Specifies the ID of the tenant that creates the ECS snapshot.

updated_at

Specifies the time when the ECS snapshot is updated for the last time.

__snapshot_from_instance

Specifies the ID of the VM corresponding to the ECS snapshot.

snapshot_id

Specifies the child snapshot corresponding to the ECS snapshot.

Impacts on the System

The image file of the ECS snapshot in the residual state occupies the space of the storage system.

Possible Causes

  • The user deletes the VM corresponding to the ECS snapshot from the background.
  • The child snapshot is abnormally deleted.
  • Other unknown errors cause the ECS snapshot to reside.

Procedure

Perform the following operations to delete the ECS snapshots which are in the residual state for a long time:

  1. Log in to the cascading FusionSphere OpenStack system. For details, see Using SSH to Log In to a Host.
  2. Import environment variables. For details, see Importing Environment Variables.
  3. Check whether snapshot_id in the audit report is empty. For details, see Collecting Audit Reports.

    • If yes, go to 4.
    • If no, go to 5.

  4. Run the following command to check whether the VM corresponding to the ECS snapshot exists:

    nova show id
    NOTE:

    id indicates the VM UUID obtained from the __snapshot_from_instance field.

    • If no, go to 6.
    • If yes, the system was unstable when the audit was conducted. No further action is required.

  5. Run the following command to check whether the child snapshot exists:

    cinder snapshot-show snapshotid

    NOTE:

    snapshotid: indicates the UUID of the child snapshot obtained from the snapshot_id audit field.

    • If no, go to 6.
    • If yes, the system was unstable when the audit was conducted. No further action is required.

  6. Confirm with the user whether the ECS snapshot can be deleted.

    • If yes, go to 7.
    • If no, contact technical support for assistance.

  7. Run the following command to set the image protection attribute to False:

    glance image-update id --protected False
    NOTE:

    id indicates the image UUID obtained from the id field.

  8. Run the following command to delete the residual ECS snapshot:

    glance image-delete id
    NOTE:

    id indicates the image UUID obtained from the id field.
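
A minimal sketch of this check and cleanup, assuming hypothetical values for the id, __snapshot_from_instance, and snapshot_id fields of images_vm_snapshots.csv:

    # Hypothetical IDs; replace them with the values from the audit report.
    IMAGE_ID=66666666-ffff-0000-1111-777777777777
    VM_ID=77777777-0000-1111-2222-888888888888
    CHILD_SNAPSHOT_ID=88888888-1111-2222-3333-999999999999

    nova show ${VM_ID}                           # step 4: the VM should not exist
    cinder snapshot-show ${CHILD_SNAPSHOT_ID}    # step 5: the child snapshot should not exist
    glance image-update ${IMAGE_ID} --protected False
    glance image-delete ${IMAGE_ID}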

Residual Orphan Child Snapshots

Context

An orphan child snapshot is one whose ECS snapshot object does not exist but whose associated volume snapshot exists.

Parameter Description

The name of the audit report is CascadeInstanceSnapshotAudit.csv. Table 18-25 describes parameters in the report.

Table 18-25 Parameters in the audit report

Parameter

Description

snapshot_id

Specifies the ID of the orphan child snapshot.

instance_id

Specifies the ID of the VM corresponding to the orphan child snapshot.

Impacts on the System

Orphan child snapshots occupy tenant quotas and storage capacity, but they cannot be used, wasting resources.

Possible Causes

  • The system is powered off or other faults occur during the ECS snapshot execution or deletion.
  • Users manually delete the ECS snapshot object.

Procedure

  1. Log in to the cascading FusionSphere OpenStack system. For details, see Using SSH to Log In to a Host.
  2. Import environment variables. For details, see Importing Environment Variables.
  3. Run the following command to obtain the name of the ECS snapshot object:

    cinder snapshot-show snapshot_id | grep sys_snapshot_ecs | awk -F '_' '{print $4}'
    NOTE:

    The value of snapshot_id is that of snapshot_id obtained from the audit report.

    Check whether the name of the ECS snapshot object corresponding to the volume snapshot is obtained.

    • If yes, go to 4.
    • If no, contact technical support for assistance.

  4. Run the following command to check the ID of the ECS snapshot object corresponding to the volume snapshot from Glance:

    glance image-list | grep image_name | awk -F '|' '{print $2}'

    NOTE:

    image_name: indicates the ECS snapshot object name obtained in 3.

    Check whether the ID of the ECS snapshot object corresponding to the volume snapshot is obtained.
    • If yes, go to 5.
    • If no, go to 6.

  5. Run the following command to check whether any command output is displayed:

    glance image-show image_id | grep snapshot_id

    NOTE:

    image_id: indicates the ECS snapshot object ID obtained in 4.

    The value of snapshot_id is that of snapshot_id obtained from the audit report.

    • If yes, the snapshot is not an orphan child snapshot. Do not perform the deletion in the next step; contact technical support for assistance.
    • If no, the snapshot is an orphan child snapshot. In this case, go to 6.

  6. Run the following command to delete the volume snapshot:

    cinder snapshot-delete snapshot_id

    Check whether the command is successfully executed.

    • If yes, no further action is required.
    • If no, contact technical support for assistance.
    NOTE:

    If "ERROR" is displayed, contact technical support for assistance.

Stuck Management Volumes

Context

This audit item is used to audit volumes which are only used in the cascading system. Currently, volumes which only exist in the cascading system are used when management VMs are created in the cascading system. After the management VMs are migrated, stuck volumes may be generated.

If a volume is kept in a transient state (including deleting, error_deleting, error_attaching, error_detaching, attaching, detaching, error_extending, uploading, retyping, backing-up, restoring-backup, error, reserved, and maintenance) for more than 24 hours, restore the volume based on site conditions.

Parameter Description

The name of the audit report is CascadingMgmtVolumeStatusAudit.csv. Table 18-26 describes the parameters in the report.

Table 18-26 Parameters in the audit report

Parameter

Description

volume_id

Specifies the volume ID.

status

Specifies the volume status.

Possible Causes

  • An exception occurred during a volume service operation, delaying the update of the volume status.
  • An exception occurred during volume migration.

Impacts on the System

The volume is unavailable but occupies resources in the cascading system.

Procedure

  1. Log in to the cascading FusionSphere OpenStack system. For details, see Using SSH to Log In to a Host.
  2. Import environment variables. For details, see Importing Environment Variables.
  3. Run the following command in the cascading FusionSphere OpenStack system to obtain volume information:

    cinder show uuid
    NOTE:

    The value of uuid is that of volume_id obtained from the audit report.

    Check whether the volume status is consistent with that recorded in the audit report.

    • If yes, go to the next step.
    • If no, the volume was unstable when the audit was performed. No further action is required.

  4. Determine the handling method based on the volume status. For details, see section Stuck Volumes-Table2 Stuck volume handling methods.

Common Operations

Setting the VM Status

NOTE:

This section describes how to use a script to roll back a VM status and VM power status to specified statuses. The new statuses are transmitted from the invoking location.

  1. Log in to the first host in the AZ. For details, see Using SSH to Log In to a Host.
  2. Import environment variables. For details, see Importing Environment Variables.
  3. Run the following command to query the External OM plane IP address of a controller node. For details, see Command Execution Methods:

    cps host-list

    The node whose roles contains controller is a controller node, and its omip value is the External OM plane IP address.

  4. Perform the following operations to log in to the host with the controller role deployed:
  5. Run the following command to switch to user fsp:

    su fsp

  6. Run the following command to log in to the host with the controller role deployed:

    ssh fsp@omip

  7. If "Enter passphrase for key" is prompted, enter the default password Huawei@CLOUD8!.
  8. Run the following command to switch to user root. The default password is Huawei@CLOUD8!.

    su - root

  9. Run the following command to change the VM status in the database:

    python2.7 /usr/bin/info-collect-script/audit_resume/rectify_vm_state.py --uuid uuid --vm-state vm_state --power-state power_state
    NOTE:

    The value of uuid is that in the audit report.

    vm_state: Set to the target VM status.

    The value of power_state needs to be set to the target VM power status.

    The values of vm_state and power_state are the required VM status and power status, which are transmitted from the invoking location.

    The VM status and power status can be set at the same time or separately.

    Check whether the command output contains the following information, which indicates that the statuses are successfully set:

    rectify the vm state or power state success
    • If yes, no further action is required.
    • If no, contact technical support for assistance.
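
A hedged example invocation follows; the UUID is a hypothetical placeholder, and active/1 are example target values for the VM status and power status (set either or both, as required by the invoking procedure):

    # Hypothetical VM UUID; replace it with the uuid value from the audit report.
    python2.7 /usr/bin/info-collect-script/audit_resume/rectify_vm_state.py \
        --uuid 12345678-1234-1234-1234-123456789abc \
        --vm-state active --power-state 1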

Restoring VMs with Resizing or Cold Migration Exception

Scenario

Restore the abnormal VMs caused by resizing or cold migration failure at the Cascading OpenStack.

Procedure

  1. Log in to the cascaded FusionSphere OpenStack system where the VM is located based on the information about the host accommodating the VM. For details, see section Using SSH to Log In to a Host.
  2. Import environment variables. For details, see Importing Environment Variables.
  3. Run the following command in the cascaded FusionSphere OpenStack system to query the VM status:

    nova show uuid
    NOTE:

    The value of uuid is that of the faulty VM at the Cascaded OpenStack.

    • If the value of vm_state is resized and the value of power state is 1, go to 4.
    • If the value of vm_state is active, go to 7.
    • If the value of vm_state is other values, go to 13.

  4. Log in to the cascading FusionSphere OpenStack system. For details, see Using SSH to Log In to a Host.
  5. Run the following command in the cascading FusionSphere OpenStack system to reset the VM status:

    nova reset-state uuid --active
    NOTE:

    The value of uuid is that of hyper_vm_name obtained from the audit report.

  6. Rectify the fault based on VM Status Inconsistency. No further action is required.
  7. Log in to the cascading FusionSphere OpenStack system. For details, see Using SSH to Log In to a Host.
  8. Run the following command in the cascading FusionSphere OpenStack system to obtain the VM flavor information at the Cascading OpenStack:

    nova show uuid
    NOTE:

    The value of uuid is that in the audit report. In the command output, flavor indicates the current VM flavor.

  9. Log in to the cascaded FusionSphere OpenStack system where the VM is located.
  10. Run the following command in the cascaded FusionSphere OpenStack system to obtain the flavor used by the VM at the Cascaded OpenStack:

    nova show uuid
    NOTE:

    The value of uuid is that of hyper_vm_name obtained from the audit report.

  11. Check whether the VM flavor at the Cascading OpenStack is the same as that at the Cascaded OpenStack.

    • If yes, go to 12.
    • If no, go to 13.

  12. Run the following command in the cascading FusionSphere OpenStack system to reset the VM status:

    nova reset-state uuid --active
    NOTE:

    The value of uuid is that of the faulty VM at the Cascading OpenStack.

  13. Contact technical support for assistance.