HUAWEI CLOUD Stack 6.5.0 Troubleshooting Guide 02

Node Faults

Host OS Failure

Symptom

After a host OS fails, the OS must be reinstalled and all VMs on the host must be rebuilt. Either of the following conditions indicates a host failure:

  • The OS of the compute node is faulty.
  • The OS of a single controller node is faulty.
    NOTE:
    • This process applies only to OS faults on physical servers.
Possible Causes

One or more host disks are faulty.

Procedure

Operations in cascading scenarios

  1. On the cascading FusionSphere OpenStack web client, choose O&M > Capacity Expansion and set PXE Boot Hosts to ON.

    NOTE:
    • During installation of the server OS, if the RAID controller card supports the pass-through feature, the OS can be installed on a pass-through disk attached to the card. If the RAID controller card does not support pass-through, the OS can be installed only on the virtual disks configured on the RAID controller card.
    • Before the host OS is installed in PXE mode, ensure that JBOD mode is disabled and that the host has been restarted for the setting to take effect. After the PXE operation progress is complete or stops at 90% or higher, enable JBOD mode again and restart the host.

      For details, see section "Operation & Maintenance" > "HUAWEI V5 Server RAID Controller Card User Guide" in HUAWEI Rack Server Product Documentation-(V100R005_04).

  2. Manually start the faulty host, set it to boot from PXE during the starting process, and reinstall the host OS.

    To set the boot device to PXE, you can use the remote control function of the server BMC system or use a keyboard/video/mouse (KVM) to connect to the server.

    The installation takes about 10 to 15 minutes.

    NOTE:

    The installation will fail (the OS is installed but services are unavailable) if the disk drive letter changes after the installation. Such changes may be caused by the following operations and can be handled according to How Do I Handle Drive Letter Changes?:

    • Adjusting the host RAID arrays, including reconstructing, recreating, and deleting RAID arrays.
    • Adjusting the FusionSphere OpenStack system disk or disk partitions, including expanding the disk capacity and adding partitions.
    • Attaching the SSD card used by the MongoDB service to the host. This issue must be handled according to How Do I Handle Drive Letter Changes?.

    If the reinstallation fails although none of the preceding operations was performed, contact technical support for assistance.

    After the host OS is reinstalled, the passwords of OS accounts, including users root and fsp, will be reset to the default ones.

  3. If the operating system of the controller node is faulty, perform steps 3 to 7.

    On the FusionSphere OpenStack web client, choose O&M > Component Maintenance, locate the row that contains the host ID to be restored, and click View Component.

  4. Check whether the MongoDB or Swift service on the faulty host is in the fault state.

    • If yes, go to 5.
    • If no, go to 8.

  5. This step applies when the controller nodes are interconnected with FusionStorage and MongoDB is deployed on remote storage. Before the storage is recovered, MongoDB cannot recover and its progress cannot reach 100%, so recover the storage first. If the faulty KVM host has been connected to FusionStorage Block, recover the storage by following the instructions in "Server OS Faults" in FusionStorage V100R006C30 Product Documentation. Skip the server OS reinstallation steps in that procedure because the OS has already been reinstalled.
  6. On the FusionSphere OpenStack web client, choose Configuration > Disk. On the displayed page, select the faulty host and check whether error message "Configuration failed. Click the host state to view details." is displayed.

    • If yes, click the icon to the right of Expand Storage Capacity and go to 7.
    • If no, go to 10.

  7. Wait for 5 minutes and run the following command to check whether the MongoDB component is restored:

    cps template-instance-list --service mongodb mongodb

    • If yes, go to 10.
    • If no, contact technical support for assistance.

  8. View the installation progress on the Summary page on the FusionSphere OpenStack web client and check whether the installation is complete.

    The installation is complete if the progress reaches 100%.

    NOTE:

    If the reinstalled host is the first host in the FusionSphere OpenStack system, choose Management > Capacity Expansion to query the host installation status, because the host status displayed on the Summary page is always faulty. After the host reinstallation progress reaches 70%, the actual installation progress will be updated on the Summary page.

    • If yes, go to 9.
    • If no, contact technical support for assistance.

  9. On the web client of the cascading FusionSphere OpenStack system that you have logged in to, choose O&M > Capacity Expansion. On the Capacity Expansion page, disable PXE Boot Hosts.
  10. Select the new host and click Reboot.

    NOTE:

    The host automatically synchronizes system configuration after it is successfully installed. Some advanced functions, such as kernel configurations, will take effect only after the host restarts.

    To ensure that all configurations can take effect, restart hosts after the reinstallation.

  11. Use PuTTY to log in to the controller host in the AZ through the IP address of the External OM plane.

    The username is fsp, and the default password is Huawei@CLOUD8.
    NOTE:
    • The system supports identity authentication using both the passwords and public-private key pairs. For details about login authentication using the public-private key pairs, see Using PuTTY to Log In to a Node in Key Pair Authentication Mode.
    • To obtain the IP address of the External OM plane, search for the required parameter on the 2.1 Tool-generated IP Parameters sheet of the xxx_export_all.xlsm file exported from HUAWEI CLOUD Stack Deploy during software installation. The parameter names in different scenarios are as follows:
      • Region Type I scenario:

        Cascading system: Cascading-ExternalOM-Reverse-Proxy

        Cascaded system: Cascaded-ExternalOM-Reverse-Proxy

      • Region Type II and Region Type III scenarios: ExternalOM-Reverse-Proxy

  12. Run the following command to switch to the root user:

    su - root

    The default password of the root user is Huawei@CLOUD8!.

  13. Run the TMOUT=0 command to disable user logout upon system timeout.
  14. Import environment variables. For details, see Importing Environment Variables.
  15. Run the following command to query information about VMs on the host:

    1. Run the following command to enter the secure operation mode:

      runsafe

      Information similar to the following is displayed:

      Input command:
    2. Run the following command to query information about VMs on the host:

      nova list --all-t --host host_id

      • If the host accommodates VMs (including the service VM), take a note of the service VM ID and go to 16.
      • If the host accommodates only management VMs, such as VRM VMs, FSM VMs, and Service OM VMs, go to 19.
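
      For example, assuming a hypothetical host ID (illustrative only, not from this environment), the commands in this step might look as follows:

      runsafe
      Input command:
      nova list --all-t --host 53034D15-B603-084B-82D6-AAB15E9F3503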

  16. Run the following command to query and record the VMs configured with config_drive on the faulty host:

    nova list --all-t --host host_id --field name,config_drive

    +--------------------------------------+------------------------+--------------+   
    | ID                                   | Name                   | Config Drive |
    +--------------------------------------+------------------------+--------------+
    | e2726fe3-6a84-4f96-8023-8f8754b95174 | APIGW-PODLB-02         |              |
    | f53ee430-1a7c-4d25-b48a-653f50a8329c | APIGWLB-02             |              |
    | ca5fd371-f76d-47f4-a0a4-d6c6f5f441c8 | APIGWZK-02             |              |
    | 83f668bd-1cc8-428d-aca1-505309d1d1ab | CONSOLE-01             |              |
    | 986a5c36-1baa-4728-a61f-334ce83244d0 | HAPROXY-02             | true         |
    +--------------------------------------+------------------------+--------------+            

  17. Run the following command to determine which of the VMs recorded in 15 boot from an image:

    Run the runsafe command to enter the secure operation mode, and run the following command to query detailed information about a VM:

    nova show vm_id

    The VM boots from the image if information similar to the following is displayed:

    | os-extended-volumes:volumes_attached | []                                                |

    If information similar to the following is displayed, run the cinder show volumes_id command:

    | os-extended-volumes:volumes_attached | [{"id": "a596caae-a79c-4d61-9abe-db17f01fb7c0"}]  |

    In the command, volumes_id specifies the id value in the os-extended-volumes:volumes_attached field. If multiple IDs are available, run the cinder show volumes_id command for each ID.

    If the bootable value in the output of each command is false, the VM is booted from an image. Otherwise, the VM is booted from a volume.
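
    As a sketch, the per-volume check can be scripted. The VM ID below is hypothetical, and the parsing assumes the nova show output format shown above:

    # For each volume ID attached to the VM, print its bootable flag.
    for vol in $(nova show e2726fe3-6a84-4f96-8023-8f8754b95174 | grep volumes_attached | grep -o '"id": "[^"]*"' | cut -d '"' -f 4); do
        cinder show $vol | grep -w bootable
    done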

  18. Rebuild VMs recorded in 16 and 17 in sequence.

    For details, see Rebuilding VMs on Other Hosts.

  19. If the faulty host accommodates the VRM VM, restore the VRM VM by following the instructions provided in "OS Fault of A Single VRM VM" in the FusionSphere V100R006C10SPC600 Product Documentation (Server Virtualization, FusionCompute V100R006C10SPH105).
  20. If FSM VMs have been deployed on the faulty host, perform restoration based on "The Management Node Is Faulty" > "A Single FSM VM Fails (Deployed on a FusionSphere OpenStack Host)" in the FusionStorage V100R006C30 Block Storage Service Troubleshooting Guide.
  21. If the Service OM VM is deployed on the faulty host, perform operations provided in A Single Service OM VM Faulty in Active/Standby Deployment Mode to rebuild the Service OM VM.
  22. On Service OM, choose Services > Service OM > Centralized O&M > Alarm > Alarm List. On the displayed page, check whether there is any alarm about the installed host.

    • If yes, go to 23.
    • If no, go to 24.

  23. Locate the row that contains the alarm, click Clear in the Operation column, select the alarm to be deleted, and click Clear in the upper left corner.
  24. Log in to the FusionSphere OpenStack web client.

    For details, see Logging In to the FusionSphere OpenStack Web Client (ManageOne Mode).

  25. On the FusionSphere OpenStack web client, choose Configuration > Disk to check whether the management group of the faulty host uses a remote storage device.

    • If yes, go to 26.
    • If no, no further action is required.

  26. Use PuTTY to log in to the controller node through the reverse proxy IP address of the External OM plane.

    The default username is fsp, and the default password is Huawei@CLOUD8.

    NOTE:
    • The system supports identity authentication using both the passwords and public-private key pairs. For details about login authentication using the public-private key pairs, see Using PuTTY to Log In to a Node in Key Pair Authentication Mode.
    • To obtain the IP address of the External OM plane, search for the required parameter on the 2.1 Tool-generated IP Parameters sheet of the xxx_export_all.xlsm file exported from HUAWEI CLOUD Stack Deploy during software installation. The parameter names in different scenarios are as follows:
      • Region Type I scenario:

        Cascading system: Cascading-ExternalOM-Reverse-Proxy

        Cascaded system: Cascaded-ExternalOM-Reverse-Proxy

      • Region Type II and Region Type III scenarios: ExternalOM-Reverse-Proxy

  27. Run the following command to obtain the hash ID of the faulty host:

    python -c 'print(hash("hostid"))'

    hostid indicates the ID of the faulty host. The command output is the hash ID of the faulty host.
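
    For example, with a hypothetical host ID (replace it with the actual ID of the faulty host):

    python -c 'print(hash("45EBDB95-711D-7640-8B5C-EE3D98561285"))'

    Note that this assumes the Python 2 interpreter on the node; Python 3 randomizes string hashes per process and would produce a different value on each run.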

  28. Run the following command to obtain the WWN list of the remote LUNs reconnected to the host:

    cat /etc/huawei/fusionsphere/osConfig.storage/cfg/osConfig.storage.cfg.effect | python -mjson.tool | grep lun_wwn

    Information similar to the following is displayed:

     "lun_wwn": "6487b6b1004bc09524031c1e0000003d"

  29. Log in to DeviceManager. On the homepage, choose Provisioning > LUN. On the displayed page, query LUNs by the host hash ID in their names and check each LUN one by one. If a LUN's WWN is not in the WWN list obtained in 28, the LUN needs to be cleared. Record the to-be-cleared LUNs and then delete them from the corresponding LUN groups on DeviceManager. For details, see the product documentation for the corresponding OceanStor model.

Operations in non-cascading scenarios

  1. On the FusionSphere OpenStack web client, choose O&M > Capacity Expansion and set PXE Boot Hosts to ON.

    NOTE:
    • During installation of the server OS, if the RAID controller card supports the pass-through feature, the OS can be installed on a pass-through disk attached to the card. If the RAID controller card does not support pass-through, the OS can be installed only on the virtual disks configured on the RAID controller card.
    • Before the host OS is installed in PXE mode, ensure that JBOD mode is disabled and that the host has been restarted for the setting to take effect. After the PXE operation progress is complete or stops at 90% or higher, enable JBOD mode again and restart the host.

      For details, see section "Operation & Maintenance" > "HUAWEI V5 Server RAID Controller Card User Guide" in HUAWEI Rack Server Product Documentation-(V100R005_04).

  2. Manually start the faulty host, set it to boot from PXE during the starting process, and reinstall the host OS.

    To set the boot device to PXE, you can use the remote control function of the server BMC system or use a keyboard/video/mouse (KVM) to connect to the server.

    The installation takes about 10 to 15 minutes.

    NOTE:

    The installation will fail (the OS is installed but services are unavailable) if the disk drive letter changes after the installation. Such changes may be caused by the following operations and can be handled according to How Do I Handle Drive Letter Changes?:

    • Adjusting the host RAID arrays, including reconstructing, recreating, and deleting RAID arrays.
    • Adjusting the FusionSphere OpenStack system disk or disk partitions, including expanding the disk capacity and adding partitions.
    • Attaching the SSD card used by the MongoDB service to the host. This issue must be handled according to How Do I Handle Drive Letter Changes?.

    If the reinstallation fails although none of the preceding operations was performed, contact technical support for assistance.

    After the host OS is reinstalled, the passwords of OS accounts, including users root and fsp, will be reset to the default ones.

  3. If the operating system of the controller node is faulty, perform steps 3 to 7.

    On the FusionSphere OpenStack web client, choose O&M > Component Maintenance, locate the row that contains the host ID to be restored, and click View Component.

  4. Check whether the MongoDB or Swift service on the faulty host is in the fault state.

    • If yes, go to 5.
    • If no, go to 8.

  5. This step applies when the controller nodes are interconnected with FusionStorage and MongoDB is deployed on remote storage. Before the storage is recovered, MongoDB cannot recover and its progress cannot reach 100%, so recover the storage first. If the faulty KVM host has been connected to FusionStorage Block, recover the storage by following the instructions in "Server OS Faults" in FusionStorage V100R006C30 Product Documentation. Skip the server OS reinstallation steps in that procedure because the OS has already been reinstalled.
  6. On the FusionSphere OpenStack web client, choose Configuration > Disk. On the displayed page, select the faulty host and check whether error message "Configuration failed. Click the host state to view details." is displayed.

    • If yes, click the icon to the right of Expand Storage Capacity and go to 7.
    • If no, go to 8.

  7. Wait for 5 minutes and run the following command to check whether the MongoDB component is restored:

    cps template-instance-list --service mongodb mongodb

    • If yes, go to 8.
    • If no, contact technical support for assistance.

  8. View the installation progress on the Summary page on the FusionSphere OpenStack web client and check whether the installation is complete.

    The installation is complete if the progress reaches 100%.

    NOTE:

    If the reinstalled host is the first host in the FusionSphere OpenStack system, choose Management > Capacity Expansion to query the host installation status, because the host status displayed on the Summary page is always faulty. After the host reinstallation progress reaches 70%, the actual installation progress will be updated on the Summary page.

    • If yes, go to 9.
    • If no, contact technical support for assistance.

  9. Select the new host and click Reboot.

    NOTE:

    The host automatically synchronizes system configuration after it is successfully installed. Some advanced functions, such as kernel configurations, will take effect only after the host restarts.

    To ensure that all configurations can take effect, restart hosts after the reinstallation.

  10. On the web client that you have logged in to, choose O&M > Capacity Expansion. On the Capacity Expansion page, disable PXE Boot Hosts.
  11. Use PuTTY to log in to the controller host in the AZ through the IP address of the External OM plane.

    The username is fsp, and the default password is Huawei@CLOUD8.
    NOTE:
    • The system supports identity authentication using both the passwords and public-private key pairs. For details about login authentication using the public-private key pairs, see Using PuTTY to Log In to a Node in Key Pair Authentication Mode.
    • To obtain the IP address of the External OM plane, search for the required parameter on the 2.1 Tool-generated IP Parameters sheet of the xxx_export_all.xlsm file exported from HUAWEI CLOUD Stack Deploy during software installation. The parameter names in different scenarios are as follows:
      • Region Type I scenario:

        Cascading system: Cascading-ExternalOM-Reverse-Proxy

        Cascaded system: Cascaded-ExternalOM-Reverse-Proxy

      • Region Type II and Region Type III scenarios: ExternalOM-Reverse-Proxy

  12. Run the following command to switch to the root user:

    su - root

    The default password of the root user is Huawei@CLOUD8!.

  13. Run the TMOUT=0 command to disable user logout upon system timeout.
  14. Import environment variables. For details, see Importing Environment Variables.
  15. If the controller node is faulty and the hardware SDN is connected, reconnect the AC of the restored node. For details, see "Connecting the AC to FusionSphere" in HUAWEI CLOUD Stack 6.5.0 Network Configuration Best Practice (Region Type II).
  16. Run the following command to query information about VMs on the host:

    1. Run the following command to enter the secure operation mode:

      runsafe

      Information similar to the following is displayed:

      Input command:
    2. Run the following command to query information about VMs on the host:

      nova list --all-t --host host_id

      • If the host accommodates VMs (including the service VM), take a note of the service VM ID and go to 17.
      • If the host accommodates only management VMs, such as VRM VMs, FSM VMs, and Service OM VMs, go to 20.
      • If the host does not accommodate any VMs, go to 23.

  17. Run the following command to query and record the VMs configured with config_drive on the faulty host:

    nova list --all-t --host host_id --field name,config_drive

    +--------------------------------------+------------------------+--------------+   
    | ID                                   | Name                   | Config Drive |
    +--------------------------------------+------------------------+--------------+
    | e2726fe3-6a84-4f96-8023-8f8754b95174 | APIGW-PODLB-02         |              |
    | f53ee430-1a7c-4d25-b48a-653f50a8329c | APIGWLB-02             |              |
    | ca5fd371-f76d-47f4-a0a4-d6c6f5f441c8 | APIGWZK-02             |              |
    | 83f668bd-1cc8-428d-aca1-505309d1d1ab | CONSOLE-01             |              |
    | 986a5c36-1baa-4728-a61f-334ce83244d0 | HAPROXY-02             | true         |
    +--------------------------------------+------------------------+--------------+            

  18. Run the following command to determine which of the VMs recorded in 16 boot from an image:

    Run the runsafe command to enter the secure operation mode, and run the following command to query detailed information about a VM:

    nova show vm_id

    The VM boots from the image if information similar to the following is displayed:

    | os-extended-volumes:volumes_attached | []                                                |

    If information similar to the following is displayed, run the cinder show volumes_id command:

    | os-extended-volumes:volumes_attached | [{"id": "a596caae-a79c-4d61-9abe-db17f01fb7c0"}]  |

    In the command, volumes_id specifies the id value in the os-extended-volumes:volumes_attached field. If multiple IDs are available, run the cinder show volumes_id command for each ID.

    If the bootable value in the output of each command is false, the VM is booted from an image. Otherwise, the VM is booted from a volume.

  19. Rebuild VMs recorded in 17 and 18 in sequence.

    For details, see Rebuilding VMs on Other Hosts.

  20. If the faulty host accommodates the VRM VM, restore the VRM VM by following the instructions provided in "OS Fault of A Single VRM VM" in the FusionSphere V100R006C10SPC600 Product Documentation (Server Virtualization, FusionCompute V100R006C10SPH105).
  21. If FSM VMs have been deployed on the faulty host, perform restoration based on "The Management Node Is Faulty" > "A Single FSM VM Fails (Deployed on a FusionSphere OpenStack Host)" in the FusionStorage V100R006C30 Block Storage Service Troubleshooting Guide.
  22. If the Service OM VM is deployed on the faulty host, perform operations provided in A Single Service OM VM Faulty in Active/Standby Deployment Mode to rebuild the Service OM VM.
  23. If the OS of the compute node is faulty and the KVM host has been connected to FusionStorage Block, recover the storage by following the instructions in "Server OS Faults" in FusionStorage V100R006C30 Product Documentation. Skip the server OS reinstallation steps in that procedure because the OS has already been reinstalled.

    NOTE:

    If the blockstorage-driver, blockstorage-driver-vrmxxx, or blockstorage-driver-kvmxxx role (xxx can be 001 or 002) is assigned, install an eBackup driver by performing step 1 in "Installing Disaster Recovery Services" > "Installation and Initial Configuration (CSBS)" > "Installing and Configuring OceanStor BCManager eBackup" > "Interconnecting with FusionSphere OpenStack."

  24. On Service OM, choose Services > Service OM > Centralized O&M > Alarm > Alarm List. On the displayed page, check whether there is any alarm about the installed host.

    • If yes, go to 25.
    • If no, no further action is required.

  25. Locate the row that contains the alarm, click Clear in the Operation column, select the alarm to be deleted, and click Clear in the upper left corner.
  26. Log in to the FusionSphere OpenStack web client.

    For details, see Logging In to the FusionSphere OpenStack Web Client (ManageOne Mode).

  27. If a controller node is faulty, on the FusionSphere OpenStack web client, choose Configuration > Disk to check whether the management group of the faulty host uses a remote storage device.

    • If yes, go to 28.
    • If no, no further action is required.

  28. Use PuTTY to log in to the first FusionSphere OpenStack node through the IP address of the External OM plane.

    The default username is fsp, and the default password is Huawei@CLOUD8.

    NOTE:
    • The system supports identity authentication using both the passwords and public-private key pairs. For details about login authentication using the public-private key pairs, see Using PuTTY to Log In to a Node in Key Pair Authentication Mode.
    • To obtain the IP address of the External OM plane, search for the required parameter on the 2.1 Tool-generated IP Parameters sheet of the xxx_export_all.xlsm file exported from HUAWEI CLOUD Stack Deploy during software installation. The parameter names in different scenarios are as follows:
      • Region Type I scenario:

        Cascading system: Cascading-ExternalOM-Reverse-Proxy

        Cascaded system: Cascaded-ExternalOM-Reverse-Proxy

      • Region Type II and Region Type III scenarios: ExternalOM-Reverse-Proxy

  29. Run the following command to obtain the hash ID of the faulty host:

    python -c 'print(hash("hostid"))'

    hostid indicates the ID of the faulty host. The command output is the hash ID of the faulty host.

  30. Run the following command to obtain the WWN list of the remote LUNs reconnected to the host:

    cat /etc/huawei/fusionsphere/osConfig.storage/cfg/osConfig.storage.cfg.effect | python -mjson.tool | grep lun_wwn

    Information similar to the following is displayed:

     "lun_wwn": "6487b6b1004bc09524031c1e0000003d"

  31. Log in to DeviceManager. On the homepage, choose Provisioning > LUN. On the displayed page, query LUNs by the host hash ID in their names and check each LUN one by one. If a LUN's WWN is not in the WWN list obtained in 30, the LUN needs to be cleared. Record the to-be-cleared LUNs and then delete them from the corresponding LUN groups on DeviceManager. For details, see the product documentation for the corresponding OceanStor model.

Rectifying the Disk Partition Fault on Compute Nodes

Symptom
  • VMs fail to be started by tenant users.
  • The standby database fails to be switched to the active database.
Possible Causes

The image partition is read-only, the file system is damaged, or the data is abnormal. As a result, the VM cannot be started normally.

Procedure
  • If remote storage is used, rectify the fault based on Remote Storage Is Faulty.
  • If local storage is used, perform the steps in this section to rectify the fault.
  1. Use PuTTY to log in to the first FusionSphere OpenStack node through the IP address of the External OM plane.

    The default username is fsp, and the default password is Huawei@CLOUD8.

    NOTE:
    • The system supports identity authentication using both the passwords and public-private key pairs. For details about login authentication using the public-private key pairs, see Using PuTTY to Log In to a Node in Key Pair Authentication Mode.
    • To obtain the IP address of the External OM plane, search for the required parameter on the 2.1 Tool-generated IP Parameters sheet of the xxx_export_all.xlsm file exported from HUAWEI CLOUD Stack Deploy during software installation. The parameter names in different scenarios are as follows:
      • Region Type I scenario:

        Cascading system: Cascading-ExternalOM-Reverse-Proxy

        Cascaded system: Cascaded-ExternalOM-Reverse-Proxy

      • Region Type II and Region Type III scenarios: ExternalOM-Reverse-Proxy

  2. Run the TMOUT=0 command to disable user logout upon system timeout.
  3. Import environment variables. For details, see Importing Environment Variables.
  4. Run the following command to check whether the partition is read-only:

    cat /proc/mounts | grep -w "ro," | grep /dev/mapper/cpsVG-image

    If ro is displayed, the partition is read-only.
    • If yes, go to 5.
    • If no, contact technical support for assistance.
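
    As a sketch, the same check can be wrapped in a conditional (device path as used in this guide):

    # Exit status 0 means the image partition is currently mounted read-only.
    if cat /proc/mounts | grep -w "ro," | grep -q /dev/mapper/cpsVG-image; then
        echo "cpsVG-image is mounted read-only"
    fi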

  5. Run the following command to unmount the read-only partition:

    umount /dev/mapper/cpsVG-image

    If the partition fails to be unmounted and information similar to Figure 8-1 is displayed, the partition is being used by some programs. In this case, contact technical support for assistance.

    Figure 8-1 Command output

  6. Run the following command to restore the file system:

    fsck.ext3 /dev/mapper/cpsVG-image

    NOTE:

    If the partition uses a file system other than ext3, use the corresponding fsck command. For example, use fsck.ext2 for an ext2 file system.

    If no error information is displayed in the command output, the restoration is successful.

    • If the automatic restoration is successful, go to 7.
    • If the automatic restoration fails and the system partition is damaged, contact technical support for assistance.
      NOTE:

      If the restoration fails, you can also run the fsck -y /dev/mapper/cpsVG-image command to forcibly restore the file system. The forcible restoration may fail, which can result in data loss or corruption.

  7. Run the following command to mount the partition:

    mount /dev/mapper/cpsVG-image /opt/HUAWEI/image

    /opt/HUAWEI/image indicates the directory to which the partition is mounted, and you can specify the directory based on site requirements.
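
    To verify that the partition is mounted read-write again, a quick check (using the mount point from above) might be:

    # The mount options in the output should contain rw rather than ro.
    cat /proc/mounts | grep /dev/mapper/cpsVG-image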

Handling the Issue that a Controller Node VM in a Cascaded FusionSphere OpenStack System Is Faulty (Region Type I)

Symptom

The VM cannot be started and cannot be restored using other methods.

Possible Causes
  • The OS is faulty.
  • The VM file is damaged or the resource is abnormal.
Procedure
NOTE:

This operation applies to the scenarios where only one of the three controller node VMs in the cascaded FusionSphere OpenStack system is faulty.

  1. Use PuTTY to log in to any controller node in the cascading system through the IP address of the External OM plane.

    The username is fsp, and the default password is Huawei@CLOUD8.
    NOTE:
    • The system supports identity authentication using both the passwords and public-private key pairs. For details about login authentication using the public-private key pairs, see Using PuTTY to Log In to a Node in Key Pair Authentication Mode.
    • To obtain the IP address of the External OM plane, search for Cascading-ExternalOM-Reverse-Proxy on the 2.1 Tool-generated IP Parameters sheet of the xxx_export_all.xlsm file exported from HUAWEI CLOUD Stack Deploy during software installation.

  2. Run the following command to switch to the root user:

    su - root

    The default password of the root user is Huawei@CLOUD8!.

  3. Run the TMOUT=0 command to disable user logout upon system timeout.
  4. Import environment variables.

    For details, see Importing Environment Variables.

  5. Run the following command to query the subnet of the Internal Base plane in the corresponding cascaded system:

    neutron net-list | grep cascaded_internal_base

  6. Run the following command to check whether DHCP is enabled (true: yes, false: no):

    neutron subnet-show subnet_id | grep dhcp

    NOTE:

    In the preceding command, subnet_id indicates the subnet ID obtained in 5.

    • If yes, go to 7.
    • If no, go to 8.

  7. Run the following command to disable DHCP:

    neutron subnet-update --disable-dhcp subnet_id

    NOTE:

    In the preceding command, subnet_id indicates the subnet ID obtained in 5.
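
    For example, if the subnet ID obtained in 5 were 1a2b3c4d-5e6f-4a7b-8c9d-0e1f2a3b4c5d (a hypothetical value), the check and the update would be:

    neutron subnet-show 1a2b3c4d-5e6f-4a7b-8c9d-0e1f2a3b4c5d | grep dhcp
    neutron subnet-update --disable-dhcp 1a2b3c4d-5e6f-4a7b-8c9d-0e1f2a3b4c5d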

  8. On the web client of the cascaded FusionSphere OpenStack system that you have logged in to, choose O&M > Capacity Expansion. On the Capacity Expansion page, disable PXE Boot Hosts.
  9. On the web client of the cascading FusionSphere OpenStack system that you have logged in to, click Virtual Deploy to switch to the Environment Management page.

  10. In the Name column, click the environment name to go to the VM Management page.

    Take a note of the name and ID of the faulty VM.

  11. Perform steps 1 to 4 to log in to the first node of the cascading FusionSphere OpenStack system.
  12. Run the following command to query the flavor name of the faulty VM based on the VM ID queried in 10:

    nova show cascadedvm_id |grep flavor

    cascadedvm_id indicates the ID of the faulty VM queried in 10.

    For example, if the ID of the faulty VM is 88198e65-de15-47cc-9a89-46bfaac984ed in 10, run the following command:

    nova show 88198e65-de15-47cc-9a89-46bfaac984ed |grep flavor

  13. Run the following command to query the UUID of the host accommodating the faulty VM based on the VM ID queried in 10:

    nova show cascadedvm_id |grep hypervisor_hostname

    cascadedvm_id indicates the ID of the faulty VM queried in 10.

    For example, if the ID of the faulty VM is 88198e65-de15-47cc-9a89-46bfaac984ed in 10, run the following command:

    nova show 88198e65-de15-47cc-9a89-46bfaac984ed |grep hypervisor_hostname

  14. Run the following command to check whether a data volume is attached to the faulty VM based on the VM ID queried in 10:

    cinder list |grep cascadedvm_id

    cascadedvm_id indicates the ID of the faulty VM queried in 10.

    For example, if the ID of the faulty VM is 88198e65-de15-47cc-9a89-46bfaac984ed in 10, run the following command:

    cinder list |grep 88198e65-de15-47cc-9a89-46bfaac984ed

    If the command output shown in Figure 8-2 is displayed, only the boot volume is attached to the faulty VM, and the SSD passthrough is configured for the faulty VM.

    Figure 8-2 Command output in scenario 1

    If the command output shown in Figure 8-3 is displayed, the boot and data volumes are attached to the faulty VM. Make a note of the data volume size.

    Figure 8-3 Command output in scenario 2

  15. On the cascaded FusionSphere OpenStack web client, choose Virtual Deploy to go to the Environment Management page. On the displayed page, click the VM name to go to the VM Management page. On the displayed page, click Create.

  16. In the displayed Create VM dialog box, set VM parameters.

    NOTE:

    Ensure that the specifications and configuration of the new VM are consistent with those of the faulty VM.

    • Name: specifies the VM instance name, which must be unique. The value must start with an uppercase or lowercase letter. The name contains only letters, digits, hyphens (-), and underscores (_).
    • Nova Availability Zone: specifies the AZ where the VM is to be deployed. Set it to manage-az.
    • Cinder Availability Zone: specifies the AZ where the VM volume is to be created. Set it to manage-az.
    • Boot volume size (G): specifies the size of the volume from which the VM is booted. Set it to 300 GB.
    • Data volume size (G): specifies the data volume size of the VM. If SSD passthrough is configured for the VM queried in 14, leave this parameter blank. If a data volume is attached to the VM queried in 14, enter the original data volume size.
    • Host: specifies the ID of the host accommodating the VM. One VM maps one compute node in the cascading FusionSphere OpenStack system. You can query the host ID on the Summary page on the web client of the cascading FusionSphere OpenStack system or run the cps host-list command. You are advised to select compute nodes in the cascading FusionSphere OpenStack system.
    • Image: specifies the image name. Set it to cascaded_image.
    • Flavor: specifies the VM specifications. Select the cascaded_vm-related specifications. Select the flavor name of the faulty VM queried in 12.
    • Volume Backend Name: specifies the name of the storage backend configured for the KVM resource pool. You can query the storage backend name by choosing Resource Pool > Configure Storage Cluster on the web client of the cascading FusionSphere OpenStack system.

    • Cascaded internal base: indicates the Internal Base plane in the cascaded system.
    • Cascaded external api: indicates the External API plane in the cascaded system.
    • Cascaded external om: indicates the External OM plane in the cascaded system.
    • Cascaded tunnel bearing: indicates the Tunnel Bearing plane in the cascaded system.
    • Cascaded storage data0 (optional): indicates the Storage Data 0 plane in the cascaded system.
    • Cascaded storage data1 (optional): indicates the Storage Data 1 plane in the cascaded system.
    • Cascaded provision (optional): indicates the network plane in bare metal scenarios. In KVM scenarios, retain the default settings.
    • Cascaded BMC Base (optional): indicates the network plane in bare metal scenarios. In KVM scenarios, retain the default settings.

  17. Use PuTTY to log in to the cascading FusionSphere OpenStack system, run the following command to query the MAC address of the new VM, and make a note of the address:

    neutron port-list|grep cascadedvm_new

    cascadedvm_new indicates the name of the VM instance created in 16.

    If the VM instance name created in 16 is cascadedvm004, run the following command:

    neutron port-list|grep cascadedvm004

  18. Perform steps 1 to 4 to log in to the first node of the cascaded FusionSphere OpenStack system.
  19. On the web client of the cascaded FusionSphere OpenStack system, click Summary and query the ID of the faulty controller node.
  20. Run the following commands to configure the mapping between the ID of the faulty host and the MAC address of the new host:

    cps hostid-mac-add --mac MAC address of the new host --hostid Original host ID

    cps commit

    In the command, Original host ID indicates the ID of the faulty controller node queried in 19, and MAC address of the new host indicates the MAC address of the newly-created VM queried in 17.

    Figure 8-4 Command output

    As shown in Figure 8-4, if the command output contains "hostid already defined!", run the following commands to update the mapping between the host ID and the MAC address of the new VM:

    cps hostid-mac-update --mac MAC address of the new host --hostid Original host ID

    cps commit

    Run the following command to check whether the configured faulty host ID is consistent with the MAC address of the new host:

    cps hostid-mac-list
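
    For example, with a hypothetical MAC address and host ID (replace both with the MAC address from 17 and the host ID from 19):

    cps hostid-mac-add --mac 28:6e:d4:88:c6:a2 --hostid 564D0074-B2AA-B928-B748-B32F52C5B8B5
    cps commit
    cps hostid-mac-list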

  21. On the web client of the cascaded OpenStack system that you have logged in to, choose O&M > Capacity Expansion. The record Wait for PXE Boot is displayed on the Capacity Expansion page. After confirming the MAC address queried in 17, click the confirm icon.

  22. Click Summary and query the installation progress of the faulty host. If the blockstorage role is deployed on the faulty host, the installation is paused when the installation progress reaches 96%. For details, see "Server OS Faults" in the FusionStorage V100R006C30 Product Documentation and perform operations provided in "Install FSA for the server using the CLI" and "Restore storage resources" in sequence.
  23. On the web client of the cascaded FusionSphere OpenStack system that you have logged in to, choose O&M > Capacity Expansion. On the Capacity Expansion page, disable PXE Boot Hosts.

NAT Gateway Service Provisioning Failure (Region Type I)

Symptom

ManageOne Tenant Portal cannot deliver new NAT gateway services.

Possible Causes

The data plane of the NAT gateway is deployed in active/standby mode. When the neutron-nat-gw-data-agent control component on the active NAT gateway node becomes faulty, the data plane of the NAT gateway cannot receive new services.

Fault Diagnosis

Check whether the node where the neutron-nat-gw-data-agent component in the faulty status is located is the active NAT gateway node.

  • If yes, perform an active/standby switchover.
  • If no, contact technical support for assistance.
Procedure
  1. Use PuTTY to log in to any host in the AZ through the IP address of the External OM plane.

    The default account is fsp, and the default password is Huawei@CLOUD8.

    NOTE:
    • The system supports identity authentication using both the passwords and public-private key pairs. For details about login authentication using the public-private key pairs, see Using PuTTY to Log In to a Node in Key Pair Authentication Mode.
    • To obtain the IP address of the External OM plane, search for Cascading-ExternalOM-Reverse-Proxy on the 2.1 Tool-generated IP Parameters sheet of the xxx_export_all.xlsm file exported from HUAWEI CLOUD Stack Deploy during software installation.

  2. Run the following command to switch to the root user:

    su - root

    The default password of the root user is Huawei@CLOUD8!.

  3. Run the TMOUT=0 command to disable user logout upon system timeout.
  4. Import environment variables.

    For details, see Importing Environment Variables.

  5. Run the following command to query the node where the neutron-nat-gw-data-agent component in the fault status is located:

    cps template-instance-list --service nat-gateway neutron-nat-gw-data-agent

    +----------------+---------------------------+--------+--------------------------------------+----------------+
    | instanceid     | componenttype             | status | runsonhost                           | omip           |
    +----------------+---------------------------+--------+--------------------------------------+----------------+
    | agt_0000000001 | neutron-nat-gw-data-agent | active | 53034D15-B603-084B-82D6-AAB15E9F3503 | 192.168.43.169 |
    | agt_0000000000 | neutron-nat-gw-data-agent | fault  | 45EBDB95-711D-7640-8B5C-EE3D98561285 | 192.168.43.158 |
    +----------------+---------------------------+--------+--------------------------------------+----------------+

  6. Use PuTTY to log in to the NAT gateway node through the IP address obtained in 5.

    The default account is fsp, and the default password is Huawei@CLOUD8.
    NOTE:

    In the Region Type I scenario, the system supports login authentication using a password or private-public key pair. If a private-public key pair is used for login authentication, see Using PuTTY to Log In to a Node in Key Pair Authentication Mode.

  7. Run the following command to switch to the root user:

    su - root

    The default password of the root user is Huawei@CLOUD8!.

  8. Run the TMOUT=0 command to disable user logout upon system timeout.
  9. Import environment variables.

    For details, see Importing Environment Variables.

  10. Run the following command to check whether the node where the faulty neutron-nat-gw-data-agent component is located is the active NAT gateway node, that is, whether status is MASTER in the command output:

    /usr/sbin/ugw_shell vrrp show all

    ===============================================
    vrrpid   enable   status    ha-link   
    -----------------------------------------------
    1        yes      MASTER     eth0.2038      
    ===============================================
    • If yes, go to 11.
    • If no, contact technical support for assistance.

  11. Run the following command to perform an active/standby switchover:

    cps host-template-instance-operate --service nat-gateway neutron-nat-gw-dataplane --action stop --host hostid

    +--------------------------+--------------------------------------+--------+---------+
    | template                 | runsonhost                           | action | result  |
    +--------------------------+--------------------------------------+--------+---------+
    | neutron-nat-gw-dataplane | 45EBDB95-711D-7640-8B5C-EE3D98561285 | stop   | success |
    +--------------------------+--------------------------------------+--------+---------+

    cps host-template-instance-operate --service nat-gateway neutron-nat-gw-dataplane --action start --host hostid

    +--------------------------+--------------------------------------+--------+---------+
    | template                 | runsonhost                           | action | result  |
    +--------------------------+--------------------------------------+--------+---------+
    | neutron-nat-gw-dataplane | 45EBDB95-711D-7640-8B5C-EE3D98561285 | start  | success |
    +--------------------------+--------------------------------------+--------+---------+

    hostid is the host ID obtained in 5. If success is displayed in the command output, the active/standby switchover is successful.

Handling the Issue that Hosts Are Mistakenly Connected to an AZ

Symptom

If a host in an AZ was started and connected directly to the network without being reinstalled in subsequent operations, it is considered a mistakenly added host. Hosts that do not belong to an AZ but are connected to it by mistake cause unnecessary alarms, so you need to delete them in a timely manner, as illustrated in Figure 8-5.

Figure 8-5 Hosts mistakenly connected to an AZ
Procedure
  1. Use PuTTY to log in to any host in the AZ through the IP address of the External OM plane.

    The username is fsp, and the default password is Huawei@CLOUD8.

    NOTE:
    • The system supports identity authentication using both the passwords and public-private key pairs. For details about login authentication using the public-private key pairs, see Using PuTTY to Log In to a Node in Key Pair Authentication Mode.
    • To obtain the IP address of the External OM plane, search for the required parameter on the 2.1 Tool-generated IP Parameters sheet of the xxx_export_all.xlsm file exported from HUAWEI CLOUD Stack Deploy during software installation. The parameter names in different scenarios are as follows:
      • Region Type I scenario:

        Cascading system: Cascading-ExternalOM-Reverse-Proxy

        Cascaded system: Cascaded-ExternalOM-Reverse-Proxy

      • Region Type II and Region Type III scenarios: ExternalOM-Reverse-Proxy

  2. Run the following command and enter the password of the root user as prompted to switch to the root user:

    su - root

  3. Run the following command to disable user logout upon system timeout:

    TMOUT=0

  4. Import environment variables. For details, see Importing Environment Variables.
  5. Run the following command to query the host list:

    cps host-list

  6. Log in to the FusionSphere OpenStack web client.

    For details, see Logging In to the FusionSphere OpenStack Web Client (ManageOne Mode).

  7. On the Summary page, query the host list.
  8. Compare the two host lists.

    If the host list queried using the command contains a host in the fault status but the host list displayed on the web client does not contain that host, the host is considered mistakenly added. In this case, take a note of its hostid value.

    NOTE:

    If multiple such hosts with the same IP address exist, you need to determine whether the hosts are mistakenly added or correctly installed but faulty based on hostid.
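
    As a sketch, the command-line host list can be filtered to hosts in the fault status and saved for manual comparison with the web client list (the exact column layout of cps host-list may vary by version):

    # Save the hosts in the fault status for manual comparison.
    cps host-list | grep fault > /tmp/fault_hosts.txt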

  9. Run the following command to query the hostcfg rule list:

    cps hostcfg-list

    Information similar to the following is displayed:
    +---------------+----------------+-----------------------------------------------------------+
    | type          | name           | hosts                                                     |
    +---------------+----------------+-----------------------------------------------------------+
    | site          | default        | hostid:                                                   |
    |               |                |                                                           |
    | storage       | default        | default:all                                               |
    |               |                |                                                           |
    | storage       | default_sdi    | capability:{boardtype: SDI}                               |
    |               |                |                                                           |
    | storage       | control_group0 | hostid:564D0074-B2AA-B928-B748-B32F52C5B8B5               |
    |               |                |                                                           |
    | storage       | control_group1 | hostid:564D159D-F177-F8B6-32DC-D7E034FF0182               |
    |               |                |                                                           |
    | storage       | control_group2 | hostid:564D4412-43F7-893F-1220-2D8F9D09B788               |
    |               |                |                                                           |
    | kernel        | default        | default:all                                               |
    |               |                |                                                           |
    | kernel        | default_sdi    | capability:{boardtype: SDI}                               |
    |               |                |                                                           |
    | network       | default        | default:all                                               |
    |               |                |                                                           |
    | resgrp-define | server         | capability:{role: sys-server}                             |
    |               |                |                                                           |
    | resgrp-define | default        | default:all                                               |
    |               |                |                                                           |
    | resgrp-define | default_sdi    | capability:{boardtype: SDI}                               |
    |               |                |                                                           |
    | resgrp-define | control_group1 | hostid:564D0074-B2AA-B928-B748-B32F52C5B8B5, 564D4412-43F |
    |               |                | 7-893F-1220-2D8F9D09B788, 564D159D-F177-F8B6-32DC-        |
    |               |                | D7E034FF0182                                              |
    |               |                |                                                           |
    +---------------+----------------+-----------------------------------------------------------+

  10. Run the following command to delete information about the incorrectly connected host from the hostcfg list:

    cps hostcfg-host-delete --host hostid=hostid --type type name

    hostid indicates the ID of the host that is incorrectly connected.

    type and name indicate the values of the corresponding fields of the host that is incorrectly connected in the hostcfg list.

  11. Run the following command to query the role of the host:

    cps host-role-list hostid

  12. Run the following command to delete the role of the host:

    cps role-host-delete --host hostid role_name

    cps commit

    hostid is that recorded in 8, and role_name indicates the role name obtained in 11.
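
    For example, if the host recorded in 8 had ID 564D4412-43F7-893F-1220-2D8F9D09B788 (a hypothetical value) and 11 returned the compute role, the cleanup would be:

    cps host-role-list 564D4412-43F7-893F-1220-2D8F9D09B788
    cps role-host-delete --host 564D4412-43F7-893F-1220-2D8F9D09B788 compute
    cps commit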

  13. Run the following commands to delete the host that is incorrectly connected:

    cps host-delete --host hostid

    cps commit

Handling VM Cold Migration or Flavor Change Failure

Scenarios
  • A VM was set to boot from the local disk during VM creation, and cold migration or flavor change is performed on the VM with data volumes attached, moving it to a different host. The data volumes cannot be connected on the target host. As a result, the cold migration or flavor change fails, and the VM status changes to ERROR.
  • In this case, you can run the nova ext-resize-revert server migration_id command to roll back the VM.
  • In Region Type I scenarios, you need to perform the operations provided in this section only in cascaded scenarios.
Procedure
  1. Log in to the first controller node in the AZ using the IP address of the External OM plane. For details, see "Alarm Reference Information" > "System Audit (Region Type II&Region Type III)" > "Using KVM for Virtualization" > "Common Operations" > "Using SSH to Log In to a Host" in the HUAWEI CLOUD Stack 6.5.0 Alarm&Event Reference, or "Alarm Reference Information" > "System Audit (Region Type I)" > "FusionSphere OpenStack System Audit at the Cascading OpenStack" > "Common Operations" > "Using SSH to Log In to a Host" in the same document.

    NOTE:
    • The system supports identity authentication using both the passwords and public-private key pairs. For details about login authentication using the public-private key pairs, see Using PuTTY to Log In to a Node in Key Pair Authentication Mode.
    • To obtain the IP address of the External OM plane, search for the required parameter on the 2.1 Tool-generated IP Parameters sheet of the xxx_export_all.xlsm file exported from HUAWEI CLOUD Stack Deploy during software installation. The parameter names in different scenarios are as follows:
      • Region Type I scenario:

        Cascading system: Cascading-ExternalOM-Reverse-Proxy

        Cascaded system: Cascaded-ExternalOM-Reverse-Proxy

      • Region Type II and Region Type III scenarios: ExternalOM-Reverse-Proxy

  2. Import environment variables.

    For details, see Importing Environment Variables.

  3. Run the following command to obtain the VM ID:

    nova list --all-t|grep vm_name

    NOTE:

    vm_name indicates the name of the faulty VM.

    vm_id in the command output indicates the ID of the faulty VM.

  4. On the controller node which you have logged in to, run the following command to query the VM status:

    nova show vm_id| grep status

    • If the VM status is ERROR, go to 5.
    • If other status is displayed, contact technical support for assistance.

  5. Run the following command to check all change records of the instance and confirm that the error status is caused by cold migration or flavor change:

    nova instance-action-list vm_id

    • If the error status is caused by cold migration, information similar to the following is displayed.

    • If the error status is caused by flavor change, information similar to the following is displayed.

  6. Run the following command and check whether the command output is displayed:

    nova migration-list |grep vm_id |grep finish_resize_failed

    This command is used to obtain migration_id for cold migration and flavor change.

    NOTE:

    migration_id indicates the ID queried by running the nova migration-list command. In the example that follows, the value of migration_id queried in 6 is 19.

    • If yes, obtain migration_id and go to 7.
    • If no, contact technical support for assistance.

  7. Run the following command to restore the VM status to ACTIVE:

    nova ext-resize-revert server migration_id

    Wait for 2 minutes and go to 8.

    NOTE:

    In the preceding command, server indicates the VM ID. The value of migration_id is obtained from 6.

    For example, run the following command:

    nova ext-resize-revert 826bae25-51e1-4a05-9076-1cde12f2618a 19

  8. Run the following command to check whether the VM is in the active state:

    nova show vm_id| grep status
    • If yes, no further action is required.
    • If no, contact technical support for assistance.

Handling the Exception Occurred During the Cross-Storage Migration

Scenarios

During cross-storage migration, if a service exception occurs (for example, the Cinder service is abnormal during data copy, or the storage link is abnormal), a cross-storage copy exception occurs. As a result, the VM status changes to LIVE_VOLUME_MIGRATE_FAIL or VOLUME_MIGRATE_FAIL, and the migrated volume status changes to maintenance. In this case, you cannot perform any operation on VMs or migrated volumes in such status.

This section describes how to process such abnormal statuses of VMs and migrated volumes.

Procedure
  1. Use PuTTY to log in to the first FusionSphere OpenStack node through the IP address of the External OM plane.

    The default username is fsp, and the default password is Huawei@CLOUD8.

    NOTE:
    • The system supports identity authentication using both the passwords and public-private key pairs. For details about login authentication using the public-private key pairs, see Using PuTTY to Log In to a Node in Key Pair Authentication Mode.
    • To obtain the IP address of the External OM plane, search for the required parameter on the 2.1 Tool-generated IP Parameters sheet of the xxx_export_all.xlsm file exported from HUAWEI CLOUD Stack Deploy during software installation. The parameter names in different scenarios are as follows:
      • Region Type I scenario:

        Cascading system: Cascading-ExternalOM-Reverse-Proxy

        Cascaded system: Cascaded-ExternalOM-Reverse-Proxy

      • Region Type II and Region Type III scenarios: ExternalOM-Reverse-Proxy

  2. Run the following command to switch to the root user:

    su - root

    The default password of the root user is Huawei@CLOUD8!.

  3. Run the TMOUT=0 command to disable user logout upon system timeout.
  4. Import environment variables.

    For details, see Importing Environment Variables.

  5. Run the following command to query the ID of the VM whose status is abnormal:

    nova list --all-t

    +---------------------------------------+------------+----------------------------------+--------------------------+------------+-------------+--------------------------------------------------------------------------------+
    | ID                                    | Name       | Tenant ID                        | Status                   | Task State | Power State | Networks                                                                       |
    +---------------------------------------+------------+----------------------------------+--------------------------+------------+-------------+--------------------------------------------------------------------------------+
    | 4886f988-b332-48e6-85db-6d1d3d3a70f8  | allinonefm | 2d57a7d65b174b0c84e22887f3ce1fd6 | ACTIVE                   | -          | Running     | internal_base=172.28.13.139; external_om=192.168.43.84; external_api=192.168.42.19 |
    | 19b2843c-a2d7-4e93-97ea-ccd6e2af0b5c | vmig_test1 | 2d57a7d65b174b0c84e22887f3ce1fd6 | LIVE_VOLUME_MIGRATE_FAIL | -          | Running     | wyf_net=192.169.1.222                                                          |
    +---------------------------------------+------------+----------------------------------+--------------------------+------------+-------------+--------------------------------------------------------------------------------+
    NOTE:

    In the VM list, locate the VM whose status is LIVE_VOLUME_MIGRATE_FAIL or VOLUME_MIGRATE_FAIL and record its ID as VM_ID.

  6. Run the following command to query the ID of the migrated volume in the maintenance status and record it as Volume_ID:

    cinder list --all-t | grep VM_ID

    | 32eac8b3-db5d-4ec9-92eb-7ae994e940f9 | 2d57a7d65b174b0c84e22887f3ce1fd6 | maintenance |     vmig_vol1      |  1   |   v3-type0  |  false   |    False    | 19b2843c-a2d7-4e93-97ea-ccd6e2af0b5c |
    | 73a7ad25-d0c8-4bdf-af1c-0a02707e4142 | 2d57a7d65b174b0c84e22887f3ce1fd6 |    in-use   |     vmig_sys1      |  1   |   v3-type0  |   true   |    False    | 19b2843c-a2d7-4e93-97ea-ccd6e2af0b5c |

    VM_ID is obtained in 5.

  7. Run the following command to restore the faulty VM and the migrated volume:

    nova ext-volume-migrate-recover VM_ID Volume_ID

    For example, run the following command:

    nova ext-volume-migrate-recover 19b2843c-a2d7-4e93-97ea-ccd6e2af0b5c 32eac8b3-db5d-4ec9-92eb-7ae994e940f9

    NOTE:

    You can run the nova help ext-volume-migrate-recover command to query the VM restoration command.

    After the command takes effect, go to 8.

  8. Run the following command to check whether the VM and related volume are restored:

    nova list --all-t | grep VM_ID

    cinder list --all-t | grep Volume_ID

    NOTE:

    The VM is restored if its Status is ACTIVE.

    The migrated volume is restored if its Status is in-use.

    • If yes, no further action is required.
    • If no, contact technical support for assistance.
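
For reference, 5 to 8 can be combined as follows. This is a sketch only: it assumes that the environment variables have been imported and that exactly one migrated volume of the VM is in the maintenance state; the VM ID is the example value from the output above.

    # Reference sketch: recover a VM after a failed cross-storage migration.
    VM_ID=19b2843c-a2d7-4e93-97ea-ccd6e2af0b5c    # example value from the output in 5

    # 6: find the migrated volume in the maintenance state.
    VOLUME_ID=$(cinder list --all-t | grep "$VM_ID" | grep maintenance \
        | awk -F'|' '{print $2}' | tr -d ' ')

    # 7: restore the faulty VM and the migrated volume.
    nova ext-volume-migrate-recover "$VM_ID" "$VOLUME_ID"

    # 8: the VM should report ACTIVE and the volume in-use.
    nova list --all-t | grep "$VM_ID"
    cinder list --all-t | grep "$VOLUME_ID"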

Cross-AZ Cold Migration Exception

Scenarios

During the cold migration of a VM across AZs, if a service exception occurs (for example, the nova-compute service is restarted), the volume status changes to maintenance. In this case, you cannot perform any operation on the VM.

This section describes how to process the volume status in this scenario.

Procedure
  1. Use PuTTY to log in to the first FusionSphere OpenStack node through the IP address of the External OM plane.

    The default username is fsp, and the default password is Huawei@CLOUD8.

    NOTE:
    • The system supports identity authentication using both the passwords and public-private key pairs. For details about login authentication using the public-private key pairs, see Using PuTTY to Log In to a Node in Key Pair Authentication Mode.
    • To obtain the IP address of the External OM plane, search for the required parameter on the 2.1 Tool-generated IP Parameters sheet of the xxx_export_all.xlsm file exported from HUAWEI CLOUD Stack Deploy during software installation. The parameter names in different scenarios are as follows:
      • Cascading system in the Region Type I scenario: Cascading-ExternalOM-Reverse-Proxy; Cascaded system in the Region Type I scenario: Cascaded-ExternalOM-Reverse-Proxy
      • Region Type II and Region Type III scenarios: ExternalOM-Reverse-Proxy
    • In Region Type I scenarios:
      • If the faulty VM exists in the cascading system, log in to the first node in the cascading system.
      • If the faulty VM exists in the cascaded system, log in to the first node in the cascaded system.

  2. Run the following command to switch to the root user:

    su - root

    The default password of the root user is Huawei@CLOUD8!.

  3. Run the TMOUT=0 command to disable user logout upon system timeout.
  4. Import environment variables.

    For details, see Importing Environment Variables.

  5. Run the following command to query the name of the volume attached to the VM:

    cinder list --all-t | grep VM_ID | awk '{print $8}'

    In the preceding command, VM_ID indicates the VM UUID, which can be obtained from Service OM.

    Perform 6 to 7 for each queried volume.

  6. Run the following command to obtain the volume name and ID:

    cinder list --all-t | grep VOLUME_NAME | awk '{print $8" "$2}'

    migrated_test_vol  c2910ef2-2c02-40c0-9e27-bfe37257d83a
    test_vol  ca5ba71d-8394-432b-849b-d8e8031e8b86

    In the preceding command, VOLUME_NAME indicates the volume name obtained in 5.

  7. Run the following commands to restore the volume status and delete residual volumes (a combined loop is sketched after this procedure):

    cinder reset-state --state in-use --reset-migration-status volume_id1

    cinder reset-state --reset-migration-status --attach-status detached volume_id2

    cinder delete volume_id2

    volume_id1 indicates the ID of the volume whose name does not start with migrated_ in 6, and volume_id2 indicates the ID of the volume whose name starts with migrated_ in 6.

  8. Run the following command to check whether the volume status is in-use:

    cinder list --all-t | grep VM_ID

    • If yes, no further action is required.
    • If no, contact technical support for assistance.
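
For reference, the per-volume handling in 5 to 7 can be expressed as a loop. The sketch below is illustrative only: it assumes that the environment variables have been imported, that the placeholder is replaced with the VM UUID obtained from Service OM, that volume names contain no spaces, and that residual volumes are exactly those whose names start with migrated_.

    # Reference sketch: reset original volumes, remove residual migrated_ volumes.
    VM_ID=REPLACE_WITH_VM_UUID

    # Fields per 5 and 6: $8 is the volume name, $2 is the volume ID.
    cinder list --all-t | grep "$VM_ID" | awk '{print $8" "$2}' | while read NAME ID; do
        case "$NAME" in
        migrated_*)
            # Residual volume: clear migration status, mark detached, delete it.
            cinder reset-state --reset-migration-status --attach-status detached "$ID"
            cinder delete "$ID"
            ;;
        *)
            # Original volume: restore it to the in-use state.
            cinder reset-state --state in-use --reset-migration-status "$ID"
            ;;
        esac
    done

    # 8: every remaining volume of the VM should now be in-use.
    cinder list --all-t | grep "$VM_ID"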

Failure to Log In to a VM When Its System Volume Becomes Faulty

Scenarios

If the system volume of a VM becomes faulty, the VM may fail to be logged in to. You can perform the steps provided in this section to restore the VM.

Procedure

If the VM boots from an image, run the rescue command to restore the VM. If the VM boots from a volume, use another VM to restore it. Run the nova show VM_ID command to determine whether the VM boots from an image or a volume.

Query the VM boot mode.

  1. Use PuTTY to log in to the first FusionSphere OpenStack node through the IP address of the External OM plane.

    The default username is fsp, and the default password is Huawei@CLOUD8.

    NOTE:
    • The system supports identity authentication using both the passwords and public-private key pairs. For details about login authentication using the public-private key pairs, see Using PuTTY to Log In to a Node in Key Pair Authentication Mode.
    • To obtain the IP address of the External OM plane, search for the required parameter on the 2.1 Tool-generated IP Parameters sheet of the xxx_export_all.xlsm file exported from HUAWEI CLOUD Stack Deploy during software installation. The parameter names in different scenarios are as follows:
      • Region Type I scenario:

        Cascading system: Cascading-ExternalOM-Reverse-Proxy

        Cascaded system: Cascaded-ExternalOM-Reverse-Proxy

      • Region Type II and Region Type III scenarios: ExternalOM-Reverse-Proxy

  2. Run the following command to switch to the root user:

    su - root

    The default password of the root user is Huawei@CLOUD8!.

  3. Run the TMOUT=0 command to disable user logout upon system timeout.
  4. Import environment variables.

    For details, see Importing Environment Variables.

  5. Determine the VM boot mode.

    1. Run the following command to enter the secure operation mode:

      runsafe

      Information similar to the following is displayed:

      Input command:
    2. Run the following command to query the ID of the rescued VM:

      nova list --all-t

      Information similar to the following is displayed.

    3. Run the following command to query the VM attributes:

      nova show VM_ID

      VM_ID indicates the ID of the rescued VM.

      • The VM boots from an image if information similar to the following is displayed (d0bd0551-07f2-45f6-8516-f481e0152715 specifies the image ID). In this case, perform 6 to 10 to restore the VM.

        | image | cirros (d0bd0551-07f2-45f6-8516-f481e0152715)|

      • The VM boots from a volume if information similar to the following is displayed. In this case, perform 11 to 19 to restore the VM.

        | image | Attempt to boot from volume - no image supplied|
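
For reference, the image-or-volume decision in the last substep of 5 can also be made with a quick check on the image attribute, as in this sketch. It is illustrative only: the placeholder VM ID must be replaced, and the match string is taken from the sample output above.

    # Reference sketch: decide the rescue path from the image attribute of the VM.
    VM_ID=REPLACE_WITH_VM_ID
    # Extract the value column of the "image" row from the nova show table output.
    IMAGE_FIELD=$(nova show "$VM_ID" | awk -F'|' '$2 ~ /image/ {print $3}')
    if echo "$IMAGE_FIELD" | grep -q "Attempt to boot from volume"; then
        echo "VM boots from a volume: perform 11 to 19."
    else
        echo "VM boots from an image: perform 6 to 10."
    fi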

Rescue the VM booting from an image.

  6. Upload the rescue image and register the image. For details, see "Registering an Image" in the HUAWEI CLOUD Stack 6.5.0 O&M Guide.

    NOTE:
    • The image file must be able to guide the VM to properly boot and must meet the rescue requirement.
    • The image file should use a name that is easy to identify.

  7. Set the VM to the rescue mode.

    You can set the VM to the rescue mode with or without specifying an image.
    • If you choose not to specify an image:
      1. Run the nova rescue VM_ID command to set the VM to enter the rescue mode.

        VM_ID indicates the ID of the rescued VM.

      2. Run the nova list --all-t command to check whether the status of the rescued VM is changed to RESCUE.

    • If you choose to specify an image:
      1. Run the glance image-list command to query the available images and image IDs.

        The image whose Name is cirros is the rescue image registered in 6.

      2. Run the nova rescue --image IMAGE_ID VM_ID command to set the VM to enter the rescue mode.

        VM_ID indicates the ID of the rescued VM. IMAGE_ID indicates the image ID obtained in the preceding substep. You can select a Linux image or a Windows image.

      3. Run the nova list --all-t command to check whether the status of the rescued VM is changed to RESCUE.

  8. Run the following command to obtain the VM URL and log in to the VM using this URL (a substitution one-liner is provided after this procedure):

    nova get-vnc-console VM_ID novnc

    VM_ID indicates the ID of the rescued VM.

    Information similar to the following is displayed.

    Replace nova-novncproxy.az1.dc1.domainname.com in the URL in the command output with the reverse proxy IP address of the FusionSphere OpenStack system, such as https://192.168.11.11:8002/vnc_auto.html?token=be540cf4-0185-4992-bafc-60bf6db48191&lang=EN.

    Use the new URL to log in to the rescued VM.

  9. Enter the username and password for logging in to the rescued VM and run the commands required to repair the faulty system volume.

    • Before the VM enters the rescue mode, the system disk of the VM is vda/sda on Linux VMs or drive C on Windows VMs. The mount point is determined based on site requirements.

    • After the VM enters the rescue mode, the drive letter of the VM's system disk changes to the immediate next letter (for example, vda becomes vdb).

    NOTE:

    The rescue interface is designed for restoring the system volume. Only the system volume is displayed after the VM enters the rescue mode. If the data volume of the rescued VM also becomes faulty, first restore the system volume. Then, restore the data volume after the VM starts properly.

  10. Run the nova unrescue VM_ID command to restore the VM status to ACTIVE.

    VM_ID indicates the ID of the rescued VM.

    If you can log in to the VM and start and stop the VM properly, the VM is restored.
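
For reference, the URL substitution in 8 can be done with a one-liner. The sketch is illustrative only: the domain name below is the example from 8, and 192.168.11.11 is an example reverse proxy IP address; replace both with the values of your deployment.

    # Reference sketch: print the novnc URL with the proxy domain replaced by the
    # reverse proxy IP address of the FusionSphere OpenStack system.
    VM_ID=REPLACE_WITH_VM_ID
    PROXY_IP=192.168.11.11
    nova get-vnc-console "$VM_ID" novnc | grep -o 'https://[^ |]*' \
        | sed "s/nova-novncproxy\.az1\.dc1\.domainname\.com/$PROXY_IP/"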

Rescue the VM booting from a volume.

  11. Log in to the FusionSphere OpenStack web client in tenant view. Choose Resources > Computing > Compute Instances > VMs and locate the VM to be rescued.
  12. On the VMs page, click Create to create a rescue VM.

    Parameter settings of the rescue VM, including the AZ, flavor, and image, must be the same as those of the rescued VM.

  13. Locate the row that contains the rescued VM and choose More > Stop.
  14. Mount the system volume of the rescued VM to the rescue VM (the full detach/attach sequence is also summarized in the sketch after this procedure).

    1. Perform 1 to 4 to log in to the first host in the AZ and import environment variables based on Importing Environment Variables.
    2. Run the nova list --all-t command to check whether the status of the rescued VM is changed to SHUTOFF. Record the IDs of the rescued VM and rescue VM.

    3. Run the nova show VM_ID command to query the ID and device name of the system volume for the rescued VM.

      VM_ID indicates the ID of the rescued VM.

    4. Run the nova volume-detach VM_ID VOLUME_ID command to detach the system volume from the rescued VM.

      VM_ID indicates the ID of the rescued VM. VOLUME_ID indicates the system volume ID of the rescued VM.

    5. Run the nova volume-attachments VM_ID command to check whether the system volume is detached from the rescued VM. The volume is detached if it no longer appears in the command output.

      VM_ID indicates the ID of the rescued VM.

    6. Run the nova volume-attach VM_ID VOLUME_ID command to attach the system volume of the rescued VM to the rescue VM.

      VM_ID indicates the ID of the rescue VM. VOLUME_ID indicates the system volume ID of the rescued VM.

    7. Run the nova volume-attachments VM_ID command to check whether the system volume is attached to the rescue VM. The volume is attached if it appears in the command output.

      VM_ID indicates the ID of the rescue VM.

  15. Run the following command to obtain the VM URL and log in to the rescue VM using the URL:

    nova get-vnc-console VM_ID novnc

    VM_ID indicates the ID of the rescue VM.

    Information similar to the following is displayed.

    Replace nova-novncproxy.az1.dc1.domainname.com in the URL in the command output with the reverse proxy IP address of the FusionSphere OpenStack system, such as https://192.168.11.11:8002/vnc_auto.html?token=7c640a6b-e23a-4885-a651-e6df097f12eb&lang=EN.

    Use the new URL to log in to the rescued VM.

  16. Perform operations to repair the corrupted file system based on site requirements.
  17. After the fault is rectified, log in to the FusionSphere OpenStack web client and switch to the VMs page. Locate the row of the rescue VM and choose More > Stop.
  18. Log in to the FusionSphere OpenStack host, detach the restored system volume from the rescue VM, and attach it to the rescued VM.

    1. Run the nova volume-detach VM_ID VOLUME_ID command to detach the system volume which is successfully restored from the rescue VM.

      VM_ID indicates the ID of the rescue VM. VOLUME_ID indicates the system volume ID of the rescued VM.

    2. Run the nova volume-attachments VM_ID command to check whether the system volume which is successfully restored is detached from the rescue VM.

      VM_ID indicates the ID of the rescue VM.

    3. Run the nova volume-attach VM_ID VOLUME_ID VOLUME_NAME command to attach the restored system volume to the rescued VM.

      VM_ID indicates the ID of the rescued VM. VOLUME_ID indicates the system volume ID of the rescued VM. VOLUME_NAME indicates the name of the system volume.

    4. Run the nova volume-attachments VM_ID command to check whether the volume is attached to the rescued VM.

      VM_ID indicates the ID of the rescued VM.

  19. Log in to the FusionSphere OpenStack web client and switch to the VMs page. Locate the row of the rescued VM and click Start.

    Check the status of the rescued VM. If its status is changed to Running from Stopped, the VM is restored.
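
The detach/attach sequence in 14 and 18 can be summarized in the following reference sketch. It is illustrative only: it assumes the three IDs have been recorded as described above and uses the two-argument nova volume-attach form from 14; adapt it if your deployment requires the volume name as in 18.

    # Reference sketch: swap the system volume between the rescued and rescue VMs.
    RESCUED_VM=REPLACE_WITH_RESCUED_VM_ID
    RESCUE_VM=REPLACE_WITH_RESCUE_VM_ID
    VOLUME_ID=REPLACE_WITH_SYSTEM_VOLUME_ID

    # 14: move the faulty system volume to the rescue VM.
    nova volume-detach "$RESCUED_VM" "$VOLUME_ID"
    nova volume-attachments "$RESCUED_VM"     # VOLUME_ID should no longer be listed
    nova volume-attach "$RESCUE_VM" "$VOLUME_ID"
    nova volume-attachments "$RESCUE_VM"      # VOLUME_ID should now be listed

    # 18: after repairing the file system, move the volume back.
    nova volume-detach "$RESCUE_VM" "$VOLUME_ID"
    nova volume-attach "$RESCUED_VM" "$VOLUME_ID"
    nova volume-attachments "$RESCUED_VM"     # VOLUME_ID should be listed again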

A Fault Occurs After the System Disk of a VM Is Detached

Symptom

After the system disk is detached from the VM, the VM becomes faulty, and no new system disk can be attached to the VM.

Possible Causes
  • The host is powered off or fails.
  • The VM status is abnormal.
Procedure
  1. Use PuTTY to log in to the first FusionSphere OpenStack node through the IP address of the External OM plane.

    The default username is fsp, and the default password is Huawei@CLOUD8.

    NOTE:
    • The system supports identity authentication using both the passwords and public-private key pairs. For details about login authentication using the public-private key pairs, see Using PuTTY to Log In to a Node in Key Pair Authentication Mode.
    • To obtain the IP address of the External OM plane, search for the required parameter on the 2.1 Tool-generated IP Parameters sheet of the xxx_export_all.xlsm file exported from HUAWEI CLOUD Stack Deploy during software installation. The parameter names in different scenarios are as follows:
      • Region Type I scenario:

        Cascading system: Cascading-ExternalOM-Reverse-Proxy

        Cascaded system: Cascaded-ExternalOM-Reverse-Proxy

      • Region Type II and Region Type III scenarios: ExternalOM-Reverse-Proxy

  2. Run the following command to switch to the root user:

    su - root

    The default password of the root user is Huawei@CLOUD8!.

  3. Run the TMOUT=0 command to disable user logout upon system timeout.
  4. Import environment variables. For details, see Importing Environment Variables.
  5. Run the following command to query the host accommodating the faulty VM:

    nova show Faulty VM ID | grep host

    The faulty VM ID can be obtained from FusionSphere OpenStack Management Console.

    Record the value of OS-EXT-SRV-ATTR:host in the command output. This value is the ID of the host accommodating the faulty VM.

  6. Run the following command to check whether the host is running properly:

    cps host-list | grep Host ID

    Host ID is the value of OS-EXT-SRV-ATTR:host obtained in 5.

    In the command output, the host is running properly if its status is normal.

    • If yes, go to 9.
    • If no, go to 7.

      If the host status is abnormal, record the internal management IP address of the host.

  7. Perform either of the following operations to restore the host:

    1. Run the su - fsp command to switch to the fsp user, and run the ssh fsp@Host IP address command to log in to the host as the fsp user and then switch to the root user.

      Enter the private key password as prompted. The default password is Huawei@CLOUD8!. If you have successfully replaced the public and private key files, enter the new private key password. Alternatively, press Enter and enter the password of the fsp user.

    2. Run the reboot command to restart the host.

      If the connection to the host cannot be set up due to host faults, use another method, such as the host baseboard management controller (BMC), to restart the host.

      You can query the host status by performing 6. The host has been restored if its status changes to normal within 10 minutes.

  8. Check whether the host is restored.

    • If yes, go to 9.
    • If no, go to 13 to rebuild the faulty VM.
    NOTE:

    This rebuilding method will change the VM ID and network information. In addition, if the VM uses local disks, the disk information will be lost after VM rebuilding.

  9. Run the following command to check the VM status:

    nova show Faulty VM ID | grep status

    • If the VM is in the SHUTOFF status, go to 12.
    • If the VM is in another status, go to 10.

  10. Run the following command to stop the VM:

    nova stop Faulty VM ID

  11. Perform 9 to check the VM status again.

    • If the VM is in the SHUTOFF status, go to 12.
    • If the VM is in another status, contact technical support for assistance.

  12. Attach a new system disk to the VM.

    • If the attachment is successful, no further action is required.
    • If the attachment fails, contact technical support for assistance.

  13. Run the following command to check the VM status and take a note of the VM attributes:

    nova show Faulty VM ID

    Table 8-1 lists the VM attributes.

    Table 8-1 VM attributes

    +---------------------------------------+------------------------------------------------------------+--------------------------------------+
    | Attribute                             | Description                                                | Example Value                        |
    +---------------------------------------+------------------------------------------------------------+--------------------------------------+
    | os-extended-volumes:volumes_attached  | Specifies the ID of the disk attached to the VM. Multiple  | 8458dbff-1acd-4445-a3ea-751b6c4a8d80 |
    |                                       | disks can be attached to a VM.                             |                                      |
    | tenant_id                             | Specifies the ID of the tenant who owns the VM.            | bca6f4e8b2034d3eb93e7c94e897d619     |
    | user_id                               | Specifies the ID of the user who created the VM.           | 3cf46a44f05642149c1c9273913429cc     |
    | config_drive                          | Specifies whether the config_drive disk is used for file   | True or left blank                   |
    |                                       | injection.                                                 |                                      |
    | flavor                                | Specifies the flavor used by the VM.                       | m1.tiny (1)                          |
    | metadata                              | Specifies the metadata of the VM.                          | {}                                   |
    | name                                  | Specifies the VM name.                                     | Test01                               |
    | security_groups                       | Specifies the security group information.                  | default                              |
    | tags                                  | Specifies the tag information.                             | []                                   |
    +---------------------------------------+------------------------------------------------------------+--------------------------------------+

  14. Run the following command to import environment variables based on the tenant ID:

    export tenant_id=tenant_id

  15. Use the visual interface (vi) editor to create file d.json and write the following content into the file:

    { 
        "auth": { 
            "identity": { 
                "methods": [ 
                    "password" 
                ], 
                "password": { 
                    "user": { 
                        "domain": { 
                            "name": "vdc_name" 
                        }, 
                        "name": "username", 
                        "password": "password" 
                    } 
                } 
            }, 
            "scope": { 
                "project": { 
                    "domain": { 
                        "name": "vdc_name" 
                    }, 
                    "name": "vpc_name" 
                } 
            } 
        } 
    }     

    In the preceding content,

    • username: Enter the username based on the user_id of the VM. To obtain the username, log in to FusionSphere OpenStack Management Console as the cloud_admin user, choose System > User Management, and locate the user who has the same ID as the user_id of the VM in the ID column.
    • password: Enter the password of the user.
    • vdc_name: Contact the administrator to check whether the username belongs to a VDC by using the service provisioning tool in use. If the username belongs to a VDC, enter the VDC name. If it does not belong to any VDC, enter Default.
    • vpc_name: Contact the administrator to check the ID of each VPC in the service provisioning tool and find the VPC whose tenant_id is the same as that of the VM. Then, enter the VPC name. If no compliant VPC is found, in the FusionSphere OpenStack system you have logged in, run the openstack project list | grep tenant_id command to query the VPC. In the command output, the second field displays the target VPC name.
    NOTE:

    If username, vdc_name, and vpc_name cannot be queried, contact technical support for assistance.

  16. Run the following command to import the token environment variables:

    export TOKEN=$(curl -ki -d @d.json -H "Content-type: application/json" https://identity.localdomain.com:8023/identity/v3/auth/tokens | awk '/X-Subject-Token/ {print $2}' | tr -d '\r')

    NOTE:

    To ensure account security, delete the d.json file created in 15 after this step is complete.

  17. Run the following command to query VM network information and record the network ID:

    nova interface-list VM ID

    Information similar to the following is displayed:

     
    +------------+--------------------------------------+--------------------------------------+--------------+-------------------+  
    | Port State | Port ID                              | Net ID                               | IP addresses | MAC Addr          |  
    +------------+--------------------------------------+--------------------------------------+--------------+-------------------+  
    | ACTIVE     | 60b5961f-8afb-4735-9f3c-368d414857d2 | 04c20a58-b01f-4d18-b3e9-6c49ce78a22c | 192.168.211.6| fa:16:3e:56:d3:71 |  
    +------------+--------------------------------------+--------------------------------------+--------------+-------------------+

    If the VM uses multiple networks, record all network IDs.

  18. Run the following command to create a port on the network used by the faulty VM and record the port ID:

    curl -i --insecure -X POST https://network.localdomain.com:8020/v2.0/ports.json -H "User-Agent: python-neutronclient" -H "Content-Type: application/json" -H "Accept: application/json" -H "X-Auth-Token: $TOKEN" -d '{"port": {"network_id": "Net_ID", "tenant_id": "'$tenant_id'", "binding:vnic_type": "vNIC_type", "name": "port_name", "admin_state_up": true}}'

    In this command, the value of vNIC_type can be direct (PCI passthrough) or normal (EVS or OVS).

    Example:

    curl -i --insecure -X POST https://network.localdomain.com:8020/v2.0/ports.json -H "User-Agent: python-neutronclient" -H "Content-Type: application/json" -H "Accept: application/json" -H "X-Auth-Token: $TOKEN" -d '{"port": {"network_id": "04c20a58-b01f-4d18-b3e9-6c49ce78a22c", "tenant_id": "'$tenant_id'", "binding:vnic_type": "normal", "name": "lt-test", "admin_state_up": true}}'

    Record the port ID, for example, 3f7ebd45-9a96-474c-88e6-5e3bf6e018cc.

  19. Run the following command to create a snapshot for the disk on the faulty VM:

    curl -i --insecure -X POST https://volume.localdomain.com:8776/v2/${tenant_id}/snapshots -H "User-Agent: python-cinderclient" -H "Content-Type: application/json" -H "Accept: application/json" -H "X-Auth-Token: $TOKEN" -d '{"snapshot": {"description": description, "metadata": {metadata}, "force": "True", "name": "snapshot_name", "volume_id": "volume_id"}}'

    Example:

    curl -i --insecure -X POST https://volume.localdomain.com:8776/v2/${tenant_id}/snapshots -H "User-Agent: python-cinderclient" -H "Content-Type: application/json" -H "Accept: application/json" -H "X-Auth-Token: $TOKEN" -d '{"snapshot": {"description": null, "metadata": {}, "force": "True", "name": "data_snap01", "volume_id": "8458dbff-1acd-4445-a3ea-751b6c4a8d80"}}'

    Record the snapshot ID, for example, 1e5b7681-faa1-4de0-9d80-ee3e2a606eeb.

    Run the following command repeatedly to query the snapshot status (a scripted polling example is provided after this procedure):

    cinder snapshot-show Snapshot ID

    The snapshot is successfully created if its status changes to available. If the VM has multiple disks, create a snapshot for each disk.

  20. Run the following command to create a disk using the snapshot of the original disk:

    curl -i --insecure -X POST https://volume.localdomain.com:8776/v2/${tenant_id}/volumes -H "User-Agent: python-cinderclient" -H "Content-Type: application/json" -H "Accept: application/json" -H "X-Auth-Token: $TOKEN" -d '{"volume": {"name": "new_volume_name", "availability_zone": AZname,"metadata": {metadata},"snapshot_id": "snapshot_id"}}'

    In this command, AZname specifies the AZ to which the disk belongs. You can run the cinder show Disk ID command to query the AZ of the original VM disk. The AZ of the new disk must be the same as that of the original disk.

    Example:

    curl -i --insecure -X POST https://volume.localdomain.com:8776/v2/${tenant_id}/volumes -H "User-Agent: python-cinderclient" -H "Content-Type: application/json" -H "Accept: application/json" -H "X-Auth-Token: $TOKEN" -d '{"volume": {"name": "data01_volume", "availability_zone": null,"metadata": {},"snapshot_id": "1e5b7681-faa1-4de0-9d80-ee3e2a606eeb"}}'

    Record the ID of the new disk, for example, 2ffa5677-0a13-4ea6-bcdc-1575b40780a8.

    If multiple snapshots have been created, create a disk using each of them.

  21. In the service provisioning system, create a VM.

    The disks attached to the VM must be virtual disks, and the attributes and network information of the new VM must be the same as those of the faulty VM.

    If the VM creation fails in the service provisioning system, run the following command:

    curl -i --insecure 'https://compute.localdomain.com:8001/v2/'$tenant_id'/os-volumes_boot' -X POST -H "Accept: application/json" -H "Content-Type: application/json" -H "User-Agent: python-novaclient" -H "X-Auth-Token: $TOKEN" -d '{"server": {"name": "vm_name", "imageRef": "", "block_device_mapping_v2": [{"source_type": "source_type", "destination_type": "destination_type", "boot_index": "boot_index", "uuid": "source_id", "volume_size": "volume_size"}], "flavorRef": "flavorRef", "max_count": max_count, "min_count": min_count, "networks": [{"port": "port_id"}], "config_drive": false}}'

    Example:

    curl -i --insecure 'https://compute.localdomain.com:8001/v2/'$tenant_id'/os-volumes_boot' -X POST -H "Accept: application/json" -H "Content-Type: application/json" -H "User-Agent: python-novaclient" -H "X-Auth-Token: $TOKEN" -d '{"server": {"name": "new_vm_02", "imageRef": "", "block_device_mapping_v2": [{"source_type": "image", "destination_type": "volume", "boot_index": "0", "uuid": "5f6a4b1d-1815-4c33-9c50-710fac909cc0", "volume_size": "1"}], "flavorRef": "1", "max_count": 1, "min_count": 1, "networks": [{"port": "3f7ebd45-9a96-474c-88e6-5e3bf6e018cc"}], "config_drive": false}}'

    After the VM is created, record the VM ID. Run the following command repeatedly to query the VM status (or use the polling sketch after this procedure):

    nova show VM ID

  22. The VM is successfully created if its status changes to ACTIVE. In the service provisioning system, attach the disks created in 20 to the new VM.

    If the disk attaching fails in the service provisioning system, perform the following operations:

    1. Run the following command to import the environment variables of the VM:

      export uuid=vm_id

    2. Run the following command to attach disks to the VM:

      curl -i --insecure "https://compute.localdomain.com:8001/v2/${tenant_id}/servers/${uuid}/os-volume_attachments" -X POST -H "X-Auth-Project-Id: service" -H "X-Auth-Token: $TOKEN" -H "Content-Type: application/json" -H "Accept: application/json" -H "User-Agent: python-novaclient" -d '{"volumeAttachment": {"volumeId": "volumeId"}}'

      Example:

      curl -i --insecure "https://compute.localdomain.com:8001/v2/${tenant_id}/servers/${uuid}/os-volume_attachments" -X POST -H "X-Auth-Project-Id: service" -H "X-Auth-Token: $TOKEN" -H "Content-Type: application/json" -H "Accept: application/json" -H "User-Agent: python-novaclient" -d '{"volumeAttachment": {"volumeId": "3f28cb04-c9f9-4ae4-b350-6dd6da80f41d"}}'

  23. Delete the faulty VM after the new VM runs properly.

    You can also delete the original disks and the disk snapshots created in 19 if they are no longer needed.
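
Both 19 and 21 poll an object until it reaches a target status. The following helper is a reference sketch only: it assumes the CLI environment prepared in 1 to 4, checks at most 30 times at 10-second intervals, and parses the status row of the table output; adjust the pattern if your client formats tables differently.

    # Reference sketch: poll "<show command> <ID>" until the status field matches.
    wait_for_status() {
        # $1: show command (left unquoted below so it may contain a subcommand),
        # $2: object ID, $3: expected status value.
        for i in $(seq 1 30); do
            $1 "$2" | grep -iq "| *status *|.*$3" && { echo "$2 is $3"; return 0; }
            sleep 10
        done
        echo "$2 did not reach status $3 in time" >&2
        return 1
    }

    # 19: wait for the snapshot to become available (example snapshot ID).
    wait_for_status "cinder snapshot-show" 1e5b7681-faa1-4de0-9d80-ee3e2a606eeb available

    # 21: wait for the new VM to become active.
    wait_for_status "nova show" REPLACE_WITH_NEW_VM_ID ACTIVE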

Handling the Issue that the OS of a Controller Node VM in a Cascaded FusionSphere OpenStack System Is Faulty (Region Type I)

If the OS of the VM in the cascaded system is faulty, you need to reinstall the OS for the VM.

Symptom

The command output of cps host-show host_id shows that the host status is fault.

Prerequisites
  • You have obtained the ID of the faulty controller node in the cascaded system. This section uses 55c6f891-ae56-4cd5-b348-287d3a433ffc as an example.
  • You have logged in to the cascading and cascaded FusionSphere OpenStack web clients.
Procedure
  1. On the web client of the cascading FusionSphere OpenStack system that you have logged in to, click Virtual Deploy to switch to the Environment Management page.

  2. Click the name of the virtualization deployment environment of the cascaded FusionSphere OpenStack system and query names of components with the type of CascadedVM.

  3. Use PuTTY to log in to the first node of the cascading FusionSphere OpenStack system through the IP address of the External OM plane.

    The default username is fsp, and the default password is Huawei@CLOUD8.

    NOTE:
    • The system supports identity authentication using both the passwords and public-private key pairs. For details about login authentication using the public-private key pairs, see Using PuTTY to Log In to a Node in Key Pair Authentication Mode.
    • To obtain the IP address of the External OM plane, search for Cascading-ExternalOM-Reverse-Proxy on the 2.1 Tool-generated IP Parameters sheet of the xxx_export_all.xlsm file exported from HUAWEI CLOUD Stack Deploy during software installation.

  4. Run the following command to switch to the root user:

    su - root

    The default password of the root user is Huawei@CLOUD8!.

  5. Run the TMOUT=0 command to disable user logout upon system timeout.
  6. Import environment variables.

    For details, see Importing Environment Variables.

  7. Run the following command to query the ID of the faulty controller node VM in the cascaded system:

    nova list | grep cascaded_vm

    cascaded_vm indicates the component name queried in 2.

    Use cascade_vm_az1_dc1_0 as an example and run the following command:

    nova list | grep cascade_vm_az1_dc1_0

    NOTE:

    55c6f891-ae56-4cd5-b348-287d3a433ffc indicates the ID of the faulty controller node VM in the cascaded system.

  8. Run the following command to obtain the VNC URL for the faulty controller node VM in the cascaded system:

    nova get-vnc-console 55c6f891-ae56-4cd5-b348-287d3a433ffc novnc

    +-------+-----------------------------------------------------------------------------------------------------------------------+
    | Type  | Url                                                                                                                   |
    +-------+-----------------------------------------------------------------------------------------------------------------------+
    | novnc | https://nova-novncproxy.az1.dc1.domainname.com:8002/vnc_auto.html?token=559e04af-8036-4649-9cd2-04941ac64ba2&lang=EN  |
    +-------+-----------------------------------------------------------------------------------------------------------------------+

    55c6f891-ae56-4cd5-b348-287d3a433ffc indicates the VM ID queried in 7.

  9. Replace nova-novncproxy.az1.dc1.domainname.com in the Url column in the command output with the reverse proxy IP address of FusionSphere OpenStack. For example, https://192.168.55.21:8002/vnc_auto.html?token=559e04af-8036-4649-9cd2-04941ac64ba2&lang=EN
  10. Use the replaced URL to log in to the faulty VM through any browser.
  11. Run the following command to change the startup mode of the faulty VM to PXE:

    nova meta 55c6f891-ae56-4cd5-b348-287d3a433ffc set __bootDev=network,hd

  12. On the web client of the cascaded FusionSphere OpenStack system that you have logged in to, choose O&M > Capacity Expansion. On the Capacity Expansion page, enable PXE Boot Hosts.
  13. Run the following command to restart the faulty VM:

    nova reboot 55c6f891-ae56-4cd5-b348-287d3a433ffc

  14. Log in to the faulty VM using the VNC URL of the VM queried in 8.
  15. Check the VNC window. If the PXE installation interface is displayed, the VM is PXE-booted.

    If the VM fails to be booted from PXE, contact technical support for assistance.

  16. Run the following command to change the startup mode of the faulty VM to volume:

    nova meta 55c6f891-ae56-4cd5-b348-287d3a433ffc set __bootDev=hd,network

  17. After the faulty host is restored, push agents, including Zabbix agent and eSight agent, to the new controller node VM in the cascaded FusionSphere OpenStack system.
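
After 16, and before pushing agents in 17, you can watch the reinstalled controller node until cps reports it as normal. This sketch is illustrative only: it reuses the example node ID from this section and assumes that cps host-show prints a property/value table from which the status row can be parsed; adjust the parsing to your actual output.

    # Reference sketch: poll the host status every 30 seconds until it is normal.
    HOST_ID=55c6f891-ae56-4cd5-b348-287d3a433ffc
    while true; do
        STATUS=$(cps host-show "$HOST_ID" | awk -F'|' '$2 ~ /status/ {print $3}' | tr -d ' ')
        echo "host $HOST_ID status: $STATUS"
        [ "$STATUS" = "normal" ] && break
        sleep 30
    done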