FusionCloud 6.3.1.1 Troubleshooting Guide 02

Management Node Faults


Both the Active and Standby Service OM VMs Faulty

Scenarios

When two Service OM VMs are deployed in active/standby mode and both the active and standby Service OM VMs become faulty and cannot be restored by restart, perform the operations provided in this section to rectify faults and quickly restore services.

NOTE:

Before restoring the active and standby Service OM VMs, obtain backup files from the backup path and copy the backup files to the local PC.

For details, see "Service OM" in the FusionCloud 6.3.1.1 Backup and Restoration Guide.

Impact on the System
  • The system cannot process new services, such as VM creation.
  • Operation and maintenance services, such as alarm monitoring and service configuration, are unavailable.
Procedure

Check whether the image partition of the host uses remote storage.

If the Service OM VMs reside in the image partition of the host and that partition uses remote storage, the image partition may be mounted abnormally after the storage is interrupted or recovered. As a result, the Service OM VMs cannot be started.

  1. On the FusionSphere OpenStack web client, choose Configuration > Disk.
  2. In the Expand Storage Capacity area, check whether the image disk partition uses the remote storage.

    • If yes, go to 3.
    • If no, go to step 1 of "Reset the status of the faulty VMs".

  3. Check whether alarm ALM-6026 Faulty Fiber Channels on the Host is displayed on the host accommodating the Service OM VMs.

    • If yes, clear the alarm based on the alarm help and go to 4.
    • If no, go to 4.

  4. Log in to the host where the Service OM VMs are located as user root.
  5. Run the following command to check whether the image partition exists:

    mount | grep '/opt/HUAWEI/image '

    If the following information is displayed, the image partition exists:

    /dev/mapper/extend_vg-image on /opt/HUAWEI/image type ext3 (rw,relatime,errors=continue,user_xattr,acl,barrier=1,data=ordered)
    • If yes, go to 6.
    • If no, go to 7.

  6. Run the following command to check whether the image partition is read-only:

    mount | grep '/opt/HUAWEI/image '

    If information similar to the following is displayed (the mount options start with ro instead of rw), the image partition is read-only:

    /dev/mapper/extend_vg-image on /opt/HUAWEI/image type ext3 (ro,relatime,errors=continue,user_xattr,acl,barrier=1,data=ordered)
    • If yes, go to 7.
    • If no, contact technical support for assistance.

  7. Migrate service VMs on the host and restart the host to restore the image partition.

    No further action is required.
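
The mount checks in steps 5 and 6 above can also be scripted. The following is a minimal Python sketch, not part of the product tooling. The function name image_partition_state is hypothetical, and the parser assumes only the output shape shown above (device, "on", mount point, "type", file system, options in parentheses).

```python
def image_partition_state(mount_output, mount_point="/opt/HUAWEI/image"):
    """Classify the image partition as 'missing', 'read-only', or 'read-write'.

    mount_output is the text printed by the `mount` command.
    """
    for line in mount_output.splitlines():
        parts = line.split()
        # Expected shape: <device> on <mount point> type <fs> (<options>)
        if len(parts) >= 6 and parts[1] == "on" and parts[2] == mount_point:
            options = parts[5].strip("()").split(",")
            return "read-only" if "ro" in options else "read-write"
    # No line matched the mount point: the partition is not mounted.
    return "missing"
```

Following the procedure above, 'missing' and 'read-only' both lead to step 7 (migrate service VMs and restart the host), while 'read-write' means the fault lies elsewhere and technical support should be contacted.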

Reset the status of the faulty VMs.

  1. Use PuTTY to log in to the controller node through the reverse proxy IP address of the External OM plane.

    NOTE:
    • The system supports login authentication using a password or a private-public key pair. If a private-public key pair is used for login authentication, see Using PuTTY to Log In to a Node in Key Pair Authentication Mode.
    • For details about the IP address of the External OM plane, see the LLD generated by FCD sheet of the xxx_export_all.xlsm file exported from FusionCloud Deploy during software installation, and search for the IP addresses corresponding to VMs and nodes. The parameter names in different scenarios are as follows:
      • Cascading layer in the Region Type I scenario: Cascading-ExternalOM-Reverse-Proxy. Cascaded layer: Cascaded-ExternalOM-Reverse-Proxy.
      • Region Type II and Type III scenarios: ExternalOM-Reverse-Proxy.

    The default account is fsp and the default password is Huawei@CLOUD8.

  2. Run the following command to switch to user root and enter the password of user root as prompted:

    su - root

    The default password of user root is Huawei@CLOUD8!.

  3. Run the TMOUT=0 command to disable user logout upon system timeout.
  4. Import environment variables. For details, see Importing Environment Variables.
  5. Run the following command to set the status of the faulty VM to error:

    nova reset-state uuid

  6. Run the following commands to stop the VMs and check the VM status respectively:

    nova stop uuid

    nova show uuid | grep OS-EXT-STS:vm_state

    uuid in these commands indicates the VM universally unique identifier (UUID).

  7. When the VM status changes to stopped, run the following commands to start the VMs and check the VM status, respectively:

    nova start uuid

    nova show uuid | grep OS-EXT-STS:vm_state

  8. Check whether the VM status is active.

    • If yes, go to step 1 of "Log in to the faulty VM".
    • If no, go to step 1 of "Delete the active and standby Service OM VMs".
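
Steps 6 to 8 above repeatedly read the vm_state field until the VM reaches the expected state. A minimal Python sketch of that polling logic follows. It assumes that the nova show row has the usual two-column table shape (| OS-EXT-STS:vm_state | stopped |); the function names, the read_output callable, and the attempts and delay defaults are hypothetical and would need adapting to your environment.

```python
import time


def parse_vm_state(nova_show_output):
    """Extract the vm_state value from `nova show <uuid>` table output."""
    for line in nova_show_output.splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if len(cells) == 2 and cells[0] == "OS-EXT-STS:vm_state":
            return cells[1]
    return None


def wait_for_state(read_output, wanted, attempts=30, delay=10.0):
    """Poll until the VM reports the wanted state or attempts run out.

    read_output is any callable returning the current `nova show` text,
    for example a wrapper around subprocess (not shown here).
    """
    for _ in range(attempts):
        if parse_vm_state(read_output()) == wanted:
            return True
        time.sleep(delay)
    return False
```

For example, wait_for_state(read_output, "stopped") would gate the nova start command in step 7, and wait_for_state(read_output, "active") would answer the check in step 8.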

Log in to the faulty VM.
NOTE:

In the Region Type II scenario, if the Service OM VM is created on the VMware hypervisor, use the VM console on vSphere Client to log in to the faulty VM.

  1. Log in to the FusionSphere OpenStack web client.

    For details, see Logging In to the FusionSphere OpenStack Web Client (ManageOne Mode).

  2. Choose Cloud Service > FusionSphere OpenStack OM.
  3. Locate the row that contains the faulty Service OM VM and click to log in to the Service OM VM.

    Log in to the OS as the galaxmanager user (the default password is IaaS@OS-CLOUD9!).

  4. Run the following command to switch to user root, and enter the password of user root as prompted:

    su - root

    The default password of user root is IaaS@OS-CLOUD8!.

  5. Run the TMOUT=0 command to disable user logout upon system timeout.
  6. Check whether the login is successful.

    • If yes, go to 7.
    • If no, the VM is faulty. Reinstall the VM, starting from step 1 of "Delete the active and standby Service OM VMs".

  7. Run the following command to check whether the /opt/goku/data/db usage exceeds 98%:

    df -h

    Filesystem            Size  Used Avail Use% Mounted on 
    /dev/sda1             6.0G 1017M  4.7G  18% / 
                         ...
    /dev/sda13             36G  177M   34G   1% /opt/goku/data 
    /dev/sda11             15G  408M   14G   3% /opt/goku/data/db 
    /dev/sda2             7.9G  147M  7.4G   2% /sysback 
                         ...
    /dev/sda6             9.9G  239M  9.2G   3% /var/log/goku
    • If yes, contact technical support for assistance.
    • If no, go to step 1 of "Enable the high availability (HA) service".
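
The disk usage check in step 7 above reads one value out of the df -h table. As an illustration only, the following Python sketch extracts the Use% value for /opt/goku/data/db and compares it with the 98% threshold. The function names are hypothetical, and the parser assumes the column layout shown in the sample output.

```python
def usage_percent(df_output, mount_point="/opt/goku/data/db"):
    """Return the Use% value (as an int) for mount_point, or None if absent."""
    for line in df_output.splitlines():
        fields = line.split()
        # Data rows end with "<N>% <mount point>"; the header row ends with "on".
        if len(fields) >= 6 and fields[-1] == mount_point and fields[-2].endswith("%"):
            return int(fields[-2].rstrip("%"))
    return None


def db_partition_full(df_output, threshold=98):
    """True when usage of /opt/goku/data/db exceeds the threshold."""
    pct = usage_percent(df_output)
    return pct is not None and pct > threshold
```

The mount point is compared exactly, so the /opt/goku/data row is not mistaken for /opt/goku/data/db.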

Enable the high availability (HA) service.

  1. Run the following command to check the HA status of the node:

    QueryHaState

    Information similar to the following is displayed:

    LOCAL_HOST=FMN01 
    LOCAL_STATE=unknow 
    LOCAL_IP=192.168.0.79 
    REMOTE_HOST=FMN02 
    REMOTE_STATE=unknow 
    REMOTE_IP=192.168.0.80

  2. Check whether the LOCAL_STATE value in the command output is unknow.

    • If yes, go to 3.
    • If no, go to 10.

  3. Run the following command to enable the HA service:

    startALL

  4. Check whether a heartbeat IP address conflict occurs.

    A heartbeat IP address conflict occurs if information similar to the following is displayed:

    The heartbeat IP address {IP address} of the system conflicts with another IP address.
    • If yes, go to 5.
    • If no, go to 8.

  5. Check whether multiple Service OM systems are configured on the network and their management IP addresses are the same.

    • If yes, go to 6.
    • If no, go to 7.

  6. Stop services of other Service OM systems or power off other Service OM hosts.
  7. Press Enter to continue starting the HA service.
  8. One minute after the preceding step, run the following command to query the HA status of the node:

    QueryHaState

  9. Check whether the LOCAL_STATE value in the command output is still unknow.

    • If yes, contact technical support for assistance.
    • If no, go to 10.

  10. Restore the HA service of the standby Service OM VM.

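    For details, repeat the operations from step 1 of "Log in to the faulty VM" through step 9 of this procedure on the standby Service OM VM.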

  11. Use PuTTY to log in to the active Service OM VM.

    Ensure that the floating IP address of the External API network and the galaxmanager user are used to establish the connection.

    The default password of the galaxmanager user is IaaS@OS-CLOUD9!.

    NOTE:
    For details about the management IP address of the FusionSphere Service OM node, see the LLD generated by FCD sheet of the xxx_export_all.xlsm file exported from FusionCloud Deploy during software installation, and search for the IP addresses corresponding to VMs and nodes. The parameter names in different scenarios are as follows:
    • Region Type I scenario: Cascading-OM-externalOM-Float-IP.
    • Region Type II and Type III scenarios: OM-externalAPI-Float-IP.

  12. Run the following command to switch to user root, and enter the password of user root as prompted:

    su - root

    The default password of user root is IaaS@OS-CLOUD8!.

  13. Run the TMOUT=0 command to disable user logout upon system timeout.
  14. Ten minutes after the preceding step, run the following command to check the Service OM status:

    galaxmanager status

  15. Check whether the active and standby Service OM VMs are both in the normal state.

    • If yes, no further action is required.
    • If no, contact technical support for assistance.
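
The QueryHaState output used in steps 1, 2, 8, and 9 above is a list of KEY=VALUE lines. The following Python sketch, offered as an illustration rather than as a product tool, parses that output and answers the two questions the procedure asks. The function names are hypothetical. The value unknow is spelled exactly as the tool prints it.

```python
def parse_ha_state(output):
    """Turn QueryHaState KEY=VALUE lines into a dictionary."""
    state = {}
    for line in output.splitlines():
        key, sep, value = line.strip().partition("=")
        if sep:  # skip blank lines and lines without '='
            state[key] = value
    return state


def ha_needs_start(output):
    """True when LOCAL_STATE is 'unknow', that is, the HA service is not running."""
    return parse_ha_state(output).get("LOCAL_STATE") == "unknow"


def ha_pair_normal(output):
    """True when one node reports active and the other reports standby."""
    state = parse_ha_state(output)
    return {state.get("LOCAL_STATE"), state.get("REMOTE_STATE")} == {"active", "standby"}
```

If ha_needs_start returns True, the procedure continues with startALL (step 3). ha_pair_normal reflects the rule, stated later in this section, that the states are normal when the two nodes are active and standby.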

Delete the active and standby Service OM VMs.

  1. Log in to the FusionSphere OpenStack web client over the External API network.

    The login address format is https://FusionSphere OpenStack reverse proxy IP address:8890, for example, https://192.168.211.90:8890.

  2. Choose Cloud Service > FusionSphere OpenStack OM.
  3. Locate the row that contains the Service OM VM and click to delete the Service OM VMs.

Reinstall the faulty Service OM VMs.

  1. On the FusionSphere OpenStack web client, upload the Service OM software package again, wait until the upload is complete, and then install the faulty Service OM VMs.
  2. Back up and replace the FusionSphere OpenStack OM configuration files. For details, see Replacing FusionSphere OpenStack OM Configuration Files in "Related Tasks".
  3. Install the faulty Service node. For details, see "Installing and maintenancing Adaptation Package" in the FusionCloud 6.3.1.1 Backup and Restoration Guide.

    NOTE:
    • The reinstalled Service OM version must be the same as the original version before the fault occurred. If the versions before and after reinstallation are different, reinstall the Service OM software of the correct version or upgrade the reinstalled version to that before reinstallation, and then restore the system data.
    • In the Region Type II scenario, if Service OM is deployed in a VMware hypervisor, install Service OM by following the instructions provided in "Service OM". Service installation also includes Keystone and alarm configuration.
    • In the Region Type I scenario, when Cloud-init is used, if you plan to deploy the Service OM on a compute node in the cascaded system, make the following configurations:
      1. Select a compute node and run the following command to query cluster information of the node:

        cps cluster-list

        In the command output, the controller node cluster name is manage_cluster and the compute node (for example, host1) cluster name is compute_cluster. If the compute node is not added to any cluster, skip substep 2 and perform substep 3.

      2. Run the following command to remove the compute node from the original cluster:

        cps cluster-update --name compute_cluster  host1

      3. Run the following commands to add the compute node to the management cluster:

        cps cluster-update --name manage_cluster  host1

        cps commit

        After the compute node is added to the management cluster, VMs running on the compute node cannot use Cloud-init functions.

  4. Configure Service OM to interconnect with FusionSphere OpenStack. For details, see "Adding Resources to FusionSphere OpenStack OM".
Restore the faulty Service OM VM data.
NOTE:

In the Region Type II scenario, if Service OM is deployed in a VMware hypervisor environment, restore the data by following the instructions provided in "Restoring Service OM" in the FusionCloud 6.3.1.1 Backup and Restoration Guide.

  1. Log in to the FusionSphere OpenStack web client, choose Cloud Service > FusionSphere OpenStack OM, and query the External API network IP addresses of the active and standby Service OM VMs.
  2. Use WinSCP to log in to the active and standby Service OM VMs through the IP address of the External API network.

    Log in to the OS as the galaxmanager user (the default password is IaaS@OS-CLOUD9!).

  3. Run the following command to switch to user root, and enter the password of user root as prompted:

    su - root

    The default password of user root is IaaS@OS-CLOUD8!.

  4. Run the TMOUT=0 command to disable user logout upon system timeout.
  5. Copy the backup files to the following paths on the active and standby Service OM VMs:

    • Copy the files that are manually backed up to /opt/gmbackup/db/manualbk/.
    • Copy the files that are automatically backed up every day to /opt/gmbackup/db/.
    • Copy the files that are automatically backed up monthly to /opt/gmbackup/db/autobakm/.

  6. Log in to the active and standby Service OM VMs using VNC.

    Choose Cloud Service > FusionSphere OpenStack OM, locate the row that contains the Service OM VM, and click .

    On the VM login page, enter username galaxmanager and password as instructed.

    The default password of the galaxmanager user is IaaS@OS-CLOUD9!.

  7. Run the following command to switch to user root, and enter the password of user root as prompted:

    su - root

    The default password of user root is IaaS@OS-CLOUD8!.

  8. Run the TMOUT=0 command to disable user logout upon system timeout.
  9. On the active Service OM VM, run the following command to disable active/standby switchover:

    ha_switch -f MODIFY_DB 86400

  10. On the standby Service OM VM, run the following command to stop all processes:

    stopALL

    The processes are stopped if information similar to the following is displayed:

    Stop all services success.

    On the active Service OM VM, run the following command to stop all processes:

    stopALL

    The processes are stopped if information similar to the following is displayed:

    Stop all services success.

  11. On the active Service OM VM, run the following command to restore the data on the active VM:

    restoreFusionManager -f Directory that stores the backup file/Backup file name

    Specify the full path of the backup file, that is, the directory followed by the file name. The backup file name is in the gmdb-YYYY-MM-DD-sn.dump format, where YYYY-MM-DD indicates the backup date and sn indicates the serial number.

    For example: restoreFusionManager -f /opt/gmbackup/db/manualbk/gmdb-2014-12-02-5.dump

    The data is restored if the following information is displayed:

    [INFO ] Check conflict backup or restore task... 
    ... 
    [INFO ] Restore FusionManager ok

  12. On the active Service OM VM, run the following command to configure the HA function:

    initGmn4Restore

    The configuration is successful if the following information is displayed:

    [INFO ] configure in HA mode 
    [INFO ] check configuration success 
    [INFO ] configure ip success 
    [INFO ] start configure ha, it will take about 1~2 minuters 
    [INFO ] configure ha success 
    [INFO ] init successful 
    [INFO ] init for restore successful

  13. Restore the data and configure the HA function on the standby Service OM VM by referring to steps 11 and 12.
  14. Configure the network information of both active and standby Service OM nodes.

    On the active Service OM node, run the following command:

    modConfig initNet

    The command is successfully executed if the following information is displayed:

    init finished.

    After this command is successfully executed on the active node, run the following command on the standby node:

    modConfig initNet

    The command is successfully executed if the following information is displayed:

    init finished.

  15. After about 5 minutes, run the following command on the active Service OM node to query the states of both active and standby Service OM nodes:

    QueryHaState

    Information similar to the following is displayed:

    LOCAL_HOST=FMN01 
    LOCAL_STATE=active 
    LOCAL_IP=192.168.0.79 
    REMOTE_HOST=FMN02 
    REMOTE_STATE=standby 
    REMOTE_IP=192.168.0.80     

    If the states of the Service OM nodes are active and standby, the states are normal.

  16. On the active Service OM node, run the following command to enable active/standby switchover:

    ha_switch -c MODIFY_DB

    NOTE:

    In the Region Type I scenario, when Cloud-init is used and the Service OM is deployed on a compute node at the cascaded layer, perform the following operations to restore the original cluster configuration:

    1. Remove the compute node (for example, host1) from manage_cluster.

      cps cluster-update --name manage_cluster host1

    2. Add the compute node (for example, host1) to the original cluster (for example, compute_cluster). If the compute node was not in any cluster originally, skip this substep and perform substep 3.
    3. Run the following command to submit the modification:

      cps commit

  17. Restore the FusionSphere OpenStack OM configuration files on FusionSphere OpenStack. For details, see Replacing FusionSphere OpenStack OM Configuration Files in "Related Tasks".
  18. Run the following command on the active node of the Service OM to unlock the FSPRest user:

    unlockSysAccount FSPRest

  19. Configure FusionSphere OpenStack alarm reporting.

    For details, see "Configuring FusionSphere OpenStack Alarm Reporting" in the FusionCloud 6.3.1.1 O&M Guide.
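
When several backup files are available for the restoreFusionManager command in step 11 above, the gmdb-YYYY-MM-DD-sn.dump naming rule can be used to validate and order them. The sketch below is an illustration only. The function names are hypothetical, and treating a higher serial number as a later backup on the same day is an assumption that the guide does not state.

```python
import re
from datetime import date

# File name pattern taken from the guide: gmdb-YYYY-MM-DD-sn.dump
BACKUP_NAME = re.compile(r"^gmdb-(\d{4})-(\d{2})-(\d{2})-(\d+)\.dump$")


def backup_key(name):
    """Return (backup date, serial number) for a valid name, else None."""
    match = BACKUP_NAME.match(name)
    if not match:
        return None
    year, month, day, serial = (int(group) for group in match.groups())
    try:
        return (date(year, month, day), serial)
    except ValueError:  # for example, month 13 or day 40
        return None


def newest_backup(names):
    """Pick the name with the latest date, then the highest serial number.

    ASSUMPTION: a higher serial number means a later backup on the same day.
    """
    keyed = [(backup_key(n), n) for n in names]
    keyed = [(key, n) for key, n in keyed if key is not None]
    return max(keyed)[1] if keyed else None
```

Whether the newest file is the right one to restore depends on when the fault occurred, so the result is a candidate for review rather than a decision.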

Related Tasks

Replacing FusionSphere OpenStack OM Configuration Files

  1. Use PuTTY to log in to the first host in the AZ using the IP address of the reverse proxy of FusionSphere OpenStack.

    The username is fsp and the default password is Huawei@CLOUD8.

  2. Run the following command to switch to user root, and enter the password of user root as prompted:

    su - root

    The default password of user root is Huawei@CLOUD8!.

  3. Run the TMOUT=0 command to disable user logout upon system timeout.
  4. Import environment variables. For details, see Importing Environment Variables.

    Enter 1 to import keystone V3 environment variables.

  5. Run the following command to query the hosts accommodating the CPS service and record the active node ID as well as the IP address of the External OM plane:

    cps template-instance-list --service cps cps-server

    In the command output, runsonhost indicates the ID of the host accommodating the CPS service.

    The host with status being active is the active host accommodating the CPS service. The IP address in the omip column is that of the External OM plane of the host.

  6. Run the following commands to log in to the active host accommodating the CPS service as user fsp. Then switch to user root on that host as described in step 2.

    su - fsp

    ssh fsp@Host External OM IP address

  7. Back up and replace the FusionSphere OpenStack OM configuration files.

    Check whether Service OM has been reinstalled.

    • If yes, run the following commands and go to 8:

      cd /home/fsp

      cp /etc/huawei/fusionsphere/3rdvms/ca.crt /home/fsp/ca.crt.bak

      cp /etc/huawei/fusionsphere/3rdvms/fusionmanager-init.ini /home/fsp/fusionmanager-init.ini.bak

      cp /etc/huawei/fusionsphere/3rdvms/ca-default.crt /home/fsp/ca.crt

      cp /etc/huawei/fusionsphere/3rdvms/fusionmanager-init-default.ini /home/fsp/fusionmanager-init.ini

    • If no, run the following commands and go to 8:

      cd /home/fsp

      cp /home/fsp/ca.crt.bak /home/fsp/ca.crt

      cp /home/fsp/fusionmanager-init.ini.bak /home/fsp/fusionmanager-init.ini

  8. Run the following commands to restore the configuration files:

    ZAPPER_PATH=$(cat /etc/init.cfg|grep repo |awk -F '=' '{print $2}')

    echo $ZAPPER_PATH|grep "/$" || ZAPPER_PATH="${ZAPPER_PATH}/"

    INTERNAL_CPS_PWD=`python -c 'from FSSecurity import crypt;import ConfigParser;sys_file_parser = ConfigParser.RawConfigParser();sys_file_parser.read("/etc/huawei/fusionsphere/cfg/sys.ini");print crypt.decrypt(dict(sys_file_parser.items("system_account"))["internal_cps_password"])'`

    curl -k -i -H "X-Auth-User:internal_cps_admin" -H "X-Auth-Password:$INTERNAL_CPS_PWD" -X PUT -T /home/fsp/fusionmanager-init.ini ${ZAPPER_PATH}3rdvms/setup/fusionmanager-init.ini > /dev/null 2>&1

    curl -k -i -H "X-Auth-User:internal_cps_admin" -H "X-Auth-Password:$INTERNAL_CPS_PWD" -X PUT -T /home/fsp/ca.crt ${ZAPPER_PATH}3rdvms/setup/ca.crt

    The FusionSphere OpenStack configuration files are replaced if information similar to the following is displayed:

    HTTP/1.1 100 Continue 
     
    HTTP/1.1 201 Created 
    Last-Modified: Mon, 28 Sep 2015 14:06:55 GMT 
    Content-Length: 0 
    Etag: c57d2f13fd66905b62c8c0420a20a548 
    Content-Type: text/html; charset=UTF-8 
    X-Trans-Id: txe0257216b17347f3ba2f9-005609497e 
    Date: Mon, 28 Sep 2015 14:06:54 GMT 
    Connection: close     

    If the displayed information is abnormal, run the following commands to switch ZAPPER_PATH to HTTPS and upload the FusionSphere OpenStack configuration files again:

    ZAPPER_PATH="https$(echo ${ZAPPER_PATH}|awk -F "http" '{print $2}')"

    INTERNAL_CPS_PWD=`python -c 'from FSSecurity import crypt;import ConfigParser;sys_file_parser = ConfigParser.RawConfigParser();sys_file_parser.read("/etc/huawei/fusionsphere/cfg/sys.ini");print crypt.decrypt(dict(sys_file_parser.items("system_account"))["internal_cps_password"])'`

    curl -k -i -H "X-Auth-User:internal_cps_admin" -H "X-Auth-Password:$INTERNAL_CPS_PWD" -X PUT -T /home/fsp/fusionmanager-init.ini ${ZAPPER_PATH}3rdvms/setup/fusionmanager-init.ini > /dev/null 2>&1

    curl -k -i -H "X-Auth-User:internal_cps_admin" -H "X-Auth-Password:$INTERNAL_CPS_PWD" -X PUT -T /home/fsp/ca.crt ${ZAPPER_PATH}3rdvms/setup/ca.crt

  9. Run the following commands on each host accommodating the CPS service to delete the old configuration files in the two directories:

    rm /etc/huawei/fusionsphere/3rdvms/ca.crt > /dev/null 2>&1

    rm /etc/huawei/3rdvms/ca.crt > /dev/null 2>&1

    rm /etc/huawei/fusionsphere/3rdvms/fusionmanager-init.ini > /dev/null 2>&1

    rm /etc/huawei/3rdvms/fusionmanager-init.ini > /dev/null 2>&1
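
The ZAPPER_PATH handling in step 8 above (take the value after "=" on the repo line, make sure it ends with "/", append 3rdvms/setup/ and the file name, and fall back to HTTPS) can be expressed as the following Python sketch. It is an illustration under assumptions: the function names are hypothetical, the sample repo value used in the usage note is invented, and only the first matching line is used. Note that the awk-based HTTPS rewrite in step 8 appears to assume a value that starts with http://; the sketch leaves a value that is already HTTPS unchanged.

```python
def repo_base(init_cfg_text):
    """Mimic: grep repo | awk -F '=' '{print $2}', plus the trailing-slash fix."""
    for line in init_cfg_text.splitlines():
        if "repo" in line and "=" in line:
            value = line.split("=")[1].strip()
            return value if value.endswith("/") else value + "/"
    return None


def to_https(url):
    """Rewrite an http:// URL to https://; leave any other URL unchanged."""
    if url.startswith("http://"):
        return "https://" + url[len("http://"):]
    return url


def upload_url(base, file_name):
    """Build the PUT target used by the curl commands in step 8."""
    return base + "3rdvms/setup/" + file_name
```

For example, with a hypothetical line repo=http://192.0.2.10/zapper, upload_url(repo_base(text), "ca.crt") gives the first PUT target, and wrapping the base in to_https gives the fallback target.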

Checking Whether the Password of User system_admin Has Been Changed in Service OM

  1. Use PuTTY to log in to the first host in the AZ using the IP address of the reverse proxy of FusionSphere OpenStack.

    The username is fsp and the default password is Huawei@CLOUD8.

  2. Run the following command to switch to user root, and enter the password of user root as prompted:

    su - root

    The default password of user root is Huawei@CLOUD8!.

  3. Run the TMOUT=0 command to disable user logout upon system timeout.
  4. Import environment variables. For details, see Importing Environment Variables.
  5. Run the following command to check whether the password of the system_admin user is the default one:

    curl -i -k -d '{"auth":{"identity":{"methods":["password"],"password":{"user":{"domain":{"name":"Default"},"name":"system_admin","password":"FusionSphere123"}}},"scope":{"project":{"name":"admin","domain":{"name":"Default"}}}}}' -H "Content-type: application/json" https://identity.az1.dc1.domainname.com/identity-admin/v3/auth/tokens?nocatalog

    If information similar to the following is displayed, the password of the system_admin user is the default one:

    HTTP/1.0 200 Connection Established 
    Proxy-agent: Apache 
    HTTP/1.1 201 Created 
    Date: Fri, 31 Mar 2017 06:10:57 GMT 
    Server: Apache 
    X-Subject-Token: MIIDhgYJKoZIhvcNAQcCoIIDdzCCA3MCAQExDTALBglghkgBZQMEAgEwggHUBgkqhkiG9w0BBwGgggHFBIIBwXsidG9rZW4iOnsibWV0aG9kcyI6WyJwYXN 
    zd29yZCJdLCJyb2xlcyI6W3siaWQiOiI1ZmFlMTRlYzEzMDA0MTJmOGFmZGIzNDJiYTc0YmI2MCIsIm5hbWUiOiJhZG1pbiJ9XSwiZXhwaXJlc19hdCI 
    6IjIwMTctMDMtMzFUMTI6MTA6NTcuMDAwMDAwWiIsInByb2plY3QiOnsiZG9tYWluIjp7ImlkIjoiZGVmYXVsdCIsIm5hbWUiOiJEZWZhdWx0In0sIml 
    kIjoiNDk2MWQ4YTViYjkyNDdhNGJkZDAxZmE5M2FkNGEyMWYiLCJuYW1lIjoiYWRtaW4ifSwidXNlciI6eyJkb21haW4iOnsiaWQiOiJkZWZhdWx0Iiw 
    ibmFtZSI6IkRlZmF1bHQifSwiaWQiOiJiMDBkYTlmOTA2NjI0NGM2OWU2YjY4ZDJhMjJhOGFhZCIsIm5hbWUiOiJzeXN0ZW1fYWRtaW4ifSwiYXVkaXR 
    faWRzIjpbIkpfU3FRYlFLVFVtVno5dkF2R0MteEEiXSwiaXNzdWVkX2F0IjoiMjAxNy0wMy0zMVQwNjoxMDo1Ny4wMDAwMDBaIn19MYIBhTCCAYECAQE 
    wXDBXMQswCQYDVQQGEwJVUzEOMAwGA1UECAwFVW5zZXQxDjAMBgNVBAcMBVVuc2V0MQ4wDAYDVQQKDAVVbnNldDEYMBYGA1UEAwwPd3d3LmV4YW1wbGU 
    uY29tAgEBMAsGCWCGSAFlAwQCATANBgkqhkiG9w0BAQEFAASCAQA41Gj6-6NkD-8cJqjYhwn2Hem9Er-qZ5ynyRbMGAX77qqxW+vXesN3yG06LasW0wI 
    rUd9vKV5XbfsfVZBzsif6Cq3F4VKQ2q0zVNEx6ZnEcLu7XBEbMC9zWf9+0j3xtPx15lNLs-Hky9Jd5AIkFytLgufQUGniA2xKwfWYdd-p3eHHjN1hojC 
    75Dbr2yl9fE5HVlLllfWu2e6dSm+zLhOn37CdkYXeThhGLqGg15C0f5wM8YxbmeHC68HCF8YW3uYnvL3s9fNa8yO0951VJdXPxI9pQgGUSoF0txvNNcU 
    Afu3LzwNdo87WxpoRgsgAB6QfqMEmx62psjQuSbJ-a0rT 
    Vary: X-Auth-Token 
    Content-Length: 482 
    x-openstack-request-id: req-9e46aaff-9355-4d5d-ab72-f6b8b10037e1 
    Content-Type: application/json 
    {"token": {"methods": ["password"], "roles": [{"id": "5fae14ec1300412f8afdb342ba74bb60", "name": "admin"}],  
    "expires_at": "2017-03-31T12:10:57.000000Z", "project": {"domain": {"id": "default", "name": "Default"},  
    "id": "4961d8a5bb9247a4bdd01fa93ad4a21f", "name": "admin"}, "user": {"domain": {"id": "default", "name": "Default"}, 
     "id": "b00da9f9066244c69e6b68d2a22a8aad", "name": "system_admin"}, "audit_ids": ["J_SqQbQKTUmVz9vAvGC-xA"],  
    "issued_at": "2017-03-31T06:10:57.000000Z"}}     

    If information similar to the following is displayed, the password of the system_admin user is not the default one:

    HTTP/1.1 401 Unauthorized 
    Date: Fri, 31 Mar 2017 06:33:26 GMT 
    Server: Apache 
    WWW-Authenticate: Keystone uri="https://identity.az1.dc1.domainname.com/identity-admin" 
    Vary: X-Auth-Token 
    Content-Length: 114 
    x-openstack-request-id: req-b685c0db-c5d2-4f90-b5fd-e05c67b26530 
    Content-Type: application/json 
    {"error": {"message": "The request you have made requires authentication.", "code": 401, "title": "Unauthorized"}}     
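
The two sample responses above differ in their final HTTP status line: 201 Created when the default password is accepted, and 401 Unauthorized when it is rejected. The following Python sketch shows one way to read that result from the curl -i output. The function names are hypothetical, and the sketch relies only on the response shapes shown above. Because a proxy may add its own status line first (200 Connection Established), the last status line is the one that counts.

```python
def final_status(curl_output):
    """Return the status code of the last HTTP status line, or None."""
    status = None
    for line in curl_output.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0].startswith("HTTP/") and parts[1].isdigit():
            status = int(parts[1])
    return status


def password_is_default(curl_output):
    """True for 201, False for 401, None when the result is inconclusive."""
    code = final_status(curl_output)
    if code == 201:
        return True
    if code == 401:
        return False
    return None
```

A None result (for example, a network error or an unexpected status code) should be investigated rather than treated as either answer.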

A Single Service OM VM Faulty in Active/Standby Deployment Mode

Symptom

When two Service OM VMs are deployed in active/standby mode and one Service OM VM becomes faulty and cannot be restored by restart, perform the operations provided in this section to rectify faults and quickly restore services.

Possible causes
  • The file system of the Service OM VM is damaged.
  • The OS of the host accommodating the Service OM VM is faulty.
Procedure

Check whether the image partition of the host uses remote storage.

If the Service OM VMs reside in the image partition of the host and that partition uses remote storage, the image partition may be mounted abnormally after the storage is interrupted or recovered. As a result, the Service OM VMs cannot be started.

  1. On the FusionSphere OpenStack web client, choose Configuration > Disk.
  2. In the Expand Storage Capacity area, check whether the image disk partition uses the remote storage.

    • If yes, go to 3.
    • If no, go to step 1 of "Reset the status of the faulty VMs".

  3. Check whether alarm ALM-6026 Faulty Fiber Channels on the Host is displayed on the host accommodating the FusionSphere OpenStack OM VMs.

    • If yes, clear the alarm based on the alarm help and go to 4.
    • If no, go to 4.

  4. Log in to the host where the Service OM VMs are located as user root.
  5. Run the following command to check whether the image partition exists:

    mount | grep '/opt/HUAWEI/image '

    If the following information is displayed, the image partition exists:

    /dev/mapper/extend_vg-image on /opt/HUAWEI/image type ext3 (rw,relatime,errors=continue,user_xattr,acl,barrier=1,data=ordered)
    • If yes, go to 6.
    • If no, go to 7.

  6. Run the following command to check whether the image partition is read-only:

    mount | grep '/opt/HUAWEI/image '

    If information similar to the following is displayed (the mount options start with ro instead of rw), the image partition is read-only:

    /dev/mapper/extend_vg-image on /opt/HUAWEI/image type ext3 (ro,relatime,errors=continue,user_xattr,acl,barrier=1,data=ordered)
    • If yes, go to 7.
    • If no, contact technical support for assistance.

  7. Migrate service VMs on the host and restart the host to restore the image partition.

    No further action is required.

Reset the status of the faulty VMs.

  1. Run the following command to set the status of the faulty VM to error:

    nova reset-state uuid

    NOTE:

    You can run the nova list --all-t | grep fm command to query VM information and take a note of the VM ID based on the name of the faulty VM.

  2. Run the following commands to stop the VMs and check the VM status respectively:

    nova stop uuid

    nova show uuid | grep OS-EXT-STS:vm_state

    uuid in these commands indicates the VM UUID.

  3. When the VM status changes to stopped, run the following commands to start the VMs and check the VM status, respectively:

    nova start uuid

    nova show uuid | grep OS-EXT-STS:vm_state

  4. Check whether the VM status is active.

    • If yes, go to step 1 of "Log in to the faulty VM".
    • If no, go to step 1 of "Rebuild the faulty Service OM VM".

Log in to the faulty VM.

  1. Log in to the FusionSphere OpenStack web client.

    For details, see Logging In to the FusionSphere OpenStack Web Client (ManageOne Mode).

  2. Choose Cloud Service > FusionSphere OpenStack OM.
  3. Locate the row that contains the faulty Service OM VM and click to log in to the Service OM VM.

    Log in to the OS as the galaxmanager user (the default password is IaaS@OS-CLOUD9!).

  4. Run the following command to switch to user root, and enter the password of user root as prompted:

    su - root

    The default password of user root is IaaS@OS-CLOUD8!.

  5. Run the TMOUT=0 command to disable user logout upon system timeout.
  6. Check whether the login is successful.

    • If yes, go to 7.
    • If no, the VM is faulty. Go to step 1 of "Rebuild the faulty Service OM VM".

  7. Run the following command to check whether the /opt/goku/data/db usage exceeds 98%:

    df -h

    Filesystem            Size  Used Avail Use% Mounted on 
    /dev/sda1             6.0G 1017M  4.7G  18% / 
                          ...
    /dev/sda13             36G  177M   34G   1% /opt/goku/data 
    /dev/sda11             15G  408M   14G   3% /opt/goku/data/db 
    /dev/sda2             7.9G  147M  7.4G   2% /sysback 
                          ...
    /dev/sda6             9.9G  239M  9.2G   3% /var/log/goku
    • If yes, contact technical support for assistance.
    • If no, go to step 1 of "Enable the high availability (HA) service".

Enable the high availability (HA) service.

  1. Run the following command to check the HA status of the node:

    QueryHaState

    Information similar to the following is displayed:

    LOCAL_HOST=Service OM01 
    LOCAL_STATE=unknow 
    LOCAL_IP=192.168.0.79 
     
    REMOTE_HOST=Service OM02 
    REMOTE_STATE=unknow 
    REMOTE_IP=192.168.0.80

  2. Check whether the LOCAL_STATE value in the command output is unknow.

    • If yes, go to 3.
    • If no, go to 10.

  3. Run the following command to enable the HA service:

    startALL

    A heartbeat IP address conflict occurs if information similar to the following is displayed:

    The heartbeat IP address {IP address} of the system conflicts with another IP address.

  4. Check whether a heartbeat IP address conflict occurs.

    • If yes, go to 5.
    • If no, go to 8.

  5. Check whether multiple Service OM systems are configured on the network and their management IP addresses are the same.

    • If yes, go to 6.
    • If no, go to 7.

  6. Stop services of other Service OM systems or power off other Service OM hosts.
  7. Press Enter to continue starting the HA service.
  8. One minute after the preceding step, run the following command to query the HA status of the node:

    QueryHaState

  9. Check whether the LOCAL_STATE value in the command output is still unknow.

    • If yes, contact technical support for assistance.
    • If no, go to 10.

  10. Use PuTTY to log in to the active Service OM VM through the floating IP address of the External OM network.

    The default username is galaxmanager and the default password is IaaS@OS-CLOUD9!.

    NOTE:
    For details about the management IP address of the FusionSphere Service OM node, see the LLD generated by FCD sheet of the xxx_export_all.xlsm file exported from FusionCloud Deploy during software installation, and search for the IP addresses corresponding to VMs and nodes. The parameter names in different scenarios are as follows:
    • Region Type I scenario: Cascading-OM-externalOM-Float-IP.
    • Region Type II and Type III scenarios: OM-externalAPI-Float-IP.

  11. Ten minutes after the preceding step, run the following command to check whether the Service OM status is normal:

    galaxmanager status

    • If yes, no further action is required.
    • If no, contact technical support for assistance.

Rebuild the faulty Service OM VM.

  1. Choose Cloud Service > FusionSphere OpenStack OM and import the software package of the current version on the displayed page.
  2. Click Update Image to update the Service OM image.
  3. Choose Cloud Service > FusionSphere OpenStack OM and click next to the faulty VM on the displayed page to rebuild the faulty FusionSphere OpenStack OM VM.

    About 60 minutes later, verify that the statuses of the two VMs in the VM list are Active Node and Standby Node, respectively. If the rebuild status of the faulty VM is Success, the fault is rectified.

Updated: 2019-06-10

Document ID: EDOC1100063248
