HUAWEI CLOUD Stack 6.5.0 Troubleshooting Guide 02

OceanStor DJ

Information Collection

OceanStor DJ allows users to collect information using SmartKit. SmartKit checks OceanStor DJ nodes comprehensively and in real time, analyzes faults, and provides fault rectification suggestions.

Prerequisites
  • A maintenance terminal running Windows is available. This maintenance terminal can communicate with OceanStor DJ nodes.
  • The SmartKit package SmartKitV2R5C00RC10.zip has been obtained.
  • The floating IP address of the management plane of an OceanStor DJ node and the password of the djmanager account have been obtained.
  • The information collection package OceanStor DJ V1R3C00U1_Collect.zip for OceanStor DJ has been obtained.

In the three-node cluster deployment scenario, the primary node (the node hosting the floating IP address of the management plane entered during device addition) collects the log information of all nodes.

  • During information collection, if the log partition space usage of the primary node exceeds 80%, the system displays the message Collection Failed.
  • During information collection, if the log partition space usage of any node other than the primary node exceeds 90%, the primary node does not collect log information from that node. For example, if there are three OceanStor DJ nodes (A, B, and C) on the live network, A is the primary node, and the log partition space usage of node B exceeds 90%, the collection results include only the log information of nodes A and C.
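
Before starting a collection, the log partition space usage can be checked on each node in advance. The following is a minimal sketch; the mount point /var/log is an assumption and must be replaced with the actual log partition on the nodes:

    # Print the space usage of the (assumed) log partition; compare it with the 80%/90% thresholds above.
    df -h /var/log | awk 'NR==2 {print $5}'
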
Procedure
  1. Decompress and install SmartKit according to the installation wizard.
  2. Double-click the SmartKit icon to run SmartKit. For details, see SmartKit V2R5C00RC10 User Guide 01 and choose Deploying SmartKit > Running SmartKit.
  3. Import the OceanStor DJ information collection package.

    1. On the home page of SmartKit, click Function Management.

    2. Click Import.

    3. In the Import dialog box that is displayed, select the OceanStor DJ V1R3C00U1_Collect.zip information collection tool package and click OK.
    4. In the Verification and Installation window that is displayed, select Information Collection and click Install.
    5. After the installation is complete, the Import succeeded dialog box is displayed. Click OK.
    6. Select Storage in the navigation tree on the left and click Storage Information Collection in the function pane.

    7. In the Storage Information Collection window, click the Information Collection area.

  4. Discover OceanStor DJ nodes.

    1. Click Add Devices.

    2. In the dialog box that is displayed, enter the IP address of the OceanStor DJ primary node.

    3. Click Next.
    4. Configure authentication information. The default password of the djmanager account is CloudService@123!.

    5. Click Finish.

      If a confirmation window is displayed, click OK.

  5. Collect log information from OceanStor DJ nodes.

    Select the added device and click Collect Information.

  6. After information is collected, click Open Directory and view the collected logs.

File Storage Faults

The Active/Standby Status of GaussDB Nodes Is Abnormal
Symptom

A user fails to log in to the OceanStor DJ GUI, and OceanStor DJ services are abnormal.

Possible Causes
  • Services on all nodes were stopped for more than 10 minutes, or GaussDB nodes were powered off unexpectedly. As a result, services on the active GaussDB node cannot be restarted properly.
  • The system time was changed by more than 10 minutes. As a result, services on the active GaussDB node cannot be restarted properly.
Procedure
  1. Use PuTTY to log in to the SFS-DJ01, SFS-DJ02, or SFS-DJ03 node using the management plane IP address of the node.

    The default account and password are djmanager and CloudService@123!, respectively.

    To obtain the management plane IP address of the SFS_DJ01, SFS_DJ02, or SFS_DJ03 node, search for SFS_DJ01, SFS_DJ02, or SFS_DJ03 respectively in the Tool-generated IP Parameters sheet of xxx_export_all_EN.xlsm.

  2. Run the following command and enter the password of user root (Cloud12#$) to switch to user root:

    su root

  3. Run the following command to disable user logout upon timeout:

    TMOUT=0

  4. Run the show_service --service omm-ha command, and determine the two nodes where omm-ha is running according to the command output.

    The command output is as follows:

    [root@localhost ~]# show_service --service omm-ha  
    +-------------+---------+---------+------------+  
    | instanceid  | service | status  | runsonhost |  
    +-------------+---------+---------+------------+  
    | DJ03_omm-ha | omm-ha  | active  | DJ03       |  
    | DJ01_omm-ha | omm-ha  | standby | DJ01       |  
    +-------------+---------+---------+------------+     
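
    To extract the host that runs the active instance directly from this table, a one-line filter can be used (a sketch; it assumes the table layout shown above):

    # Print the runsonhost column of the row whose status is active.
    show_service --service omm-ha | awk -F'|' '/ active / {gsub(/ /,"",$5); print $5}'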

  5. Log in to the two nodes where omm-ha is running. Run the bash /usr/local/bin/ha/ha/config_script/sync_monitor.sh get_status command to check the last online time of GaussDB.

    The command output is as follows:

    [root@localhost ~]# bash /usr/local/bin/ha/ha/config_script/sync_monitor.sh get_status  
    DB last online role : Standby  
    DB last online time : 2018-03-21 19:14:25.      

  6. Compare the last online time of GaussDB on each node obtained in 5 with the current time (a sketch for computing the difference is shown below).

    • If the time differences are greater than 10 minutes, go to 7.
    • If the time differences are not greater than 10 minutes, go to 9.
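
    The following is a minimal sketch for computing the difference on a node. The timestamp is the example value from 5 and must be replaced with the actual DB last online time; GNU date is assumed:

    # Seconds elapsed since the DB last online time.
    last="2018-03-21 19:14:25"
    diff=$(( $(date +%s) - $(date -d "$last" +%s) ))
    # 600 seconds = 10 minutes, the threshold used in this procedure.
    if [ "$diff" -gt 600 ]; then echo "difference > 10 minutes"; fi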

  7. On the two nodes where omm-ha is running, run the bash /usr/local/bin/ha/ha/config_script/sync_monitor.sh get_status command. Determine the active GaussDB node according to the command output.

    • If the roles of the two nodes are Primary and Standby, the node with the Primary role is the active GaussDB node.
    • If the roles of both nodes are Primary, calculate the time difference between the last online time of GaussDB on each node and the current time. The node with a shorter time difference is the active GaussDB node.

  8. Run the bash /usr/local/bin/ha/ha/config_script/sync_monitor.sh reset_status command on the active GaussDB node.
  9. Wait 2 minutes, and then log in to the OceanStor DJ administrator GUI to check whether OceanStor DJ services are normal.

    • If yes, no further action is required.
    • If no, contact technical support for assistance.

Failed to Uninstall OceanStor DJ
Symptom

Uninstalling OceanStor DJ fails.

Possible Causes

A residual process from a previous uninstallation of OceanStor DJ still exists, so OceanStor DJ cannot be uninstalled again.

Procedure
  1. Use PuTTY to log in to the SFS-DJ01, SFS-DJ02, or SFS-DJ03 node using the management plane IP address of the node.

    The default account and password are djmanager and CloudService@123!, respectively.

    To obtain the management plane IP address of the SFS_DJ01, SFS_DJ02, or SFS_DJ03 node, search for SFS_DJ01, SFS_DJ02, or SFS_DJ03 respectively in the Tool-generated IP Parameters sheet of xxx_export_all_EN.xlsm.

  2. Run the following command and enter the password of user root (Cloud12#$) to switch to user root:

    su root

  3. Run the following command to disable user logout upon timeout:

    TMOUT=0

  4. Run the docker ps -a command to view the status of the Docker containers. Check whether the container corresponding to the component for which uninstalling OceanStor DJ failed is in the Exited state.

    [root@DJ182 inst]# docker ps -a 
    CONTAINER ID        IMAGE                        COMMAND                  CREATED             STATUS                        PORTS               NAMES
    a5589b3df054        dashboard:1.2.10.2           "bash /etc/dashboard/"   33 hours ago        Exited (137) 20 seconds ago                       dashboard
    1e5e7e08f6c0        oms-controller:1.2.10.2      "/bin/bash /usr/bin/i"   33 hours ago        Up 33 hours                                       oms-controller
    672d9e966363        hermes:1.2.10.2              "/bin/bash -c 'sh /et"   33 hours ago        Up 33 hours                                       hermes
    3b2e1646cbdf        heat:1.2.10.2                "bash -c 'sh /install"   33 hours ago        Up 33 hours                                       heat-engine
    1e44b9b55269        heat:1.2.10.2                "bash -c 'sh /install"   33 hours ago        Up 33 hours                                       heat-api
    29027f6ae2cc        filemeter-service:1.2.10.1   "/bin/bash /usr/bin/S"   33 hours ago        Up 33 hours                                       filemeter-service
    f2bb90699e6d        filemeter-api:1.2.10.1       "/bin/bash /usr/bin/S"   33 hours ago        Up 33 hours                                       filemeter-api
    662653dcdda7        authkeepmgt:1.2.10.2         "/bin/bash -c /usr/bi"   33 hours ago        Up 31 hours                                       authkeepmgt
    a3bf03de8c2f        oms-agent:1.2.10.2           "/bin/bash /usr/bin/i"   33 hours ago        Up 33 hours                                       oms-agent
    ac75776db2cd        manila-scheduler:1.2.10.0    "/bin/bash /usr/bin/S"   33 hours ago        Up 33 hours                                       manila-scheduler
    fd2b42f8d015        manila-api:1.2.10.0          "/bin/bash /usr/bin/S"   33 hours ago        Up 33 hours                                       manila-api_tenant
    0c43cf729dd9        manila-api:1.2.10.0          "/bin/bash /usr/bin/S"   33 hours ago        Up 33 hours                                       manila-api_admin
    fe0beebc452b        oms-api:1.2.10.2             "/bin/bash /usr/bin/i"   33 hours ago        Up 33 hours                                       oms-api
    30cda53ce979        certms:1.2.10.2              "/bin/bash -c /usr/bi"   33 hours ago        Up 31 hours                                       certms
    9bf80fcbca10        rabbitmq:1.2.10.2            "bash /usr/local/lib/"   33 hours ago        Up 33 hours                                       rabbitmq
    681eb9754aa0        keystone:1.2.10.2            "/bin/bash keystone_r"   33 hours ago        Up 33 hours                                       keystone
    1e1805ce94c4        fms:1.2.10.2                 "bash /opt/huawei/dj/"   33 hours ago        Up 33 hours                                       fms
    f37ac70e39e9        cms:1.2.10.2                 "bash /etc/cms/cms-se"   33 hours ago        Up 33 hours                                       cms
    27aa50fe68bf        zookeeper:1.2.10.2           "/bin/bash -c 'bash /"   33 hours ago        Up 33 hours                                       zookeeper
    1b2ca1fff7c8        gaussdb:1.2.10.2             "bash /home/start_gau"   33 hours ago        Up 33 hours                                       gaussdb
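
    To list only containers already in the Exited state, the status filter of docker ps can be used instead of scanning the full table:

    # Show only exited containers; the dashboard container above would appear here.
    docker ps -a --filter status=exited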

  5. Run the ps -ef | grep dashboardControl command to check whether an uninstallation process that has not been cleared exists for the dashboard.

    • If message dashboardControl -S STOP is displayed, note the process ID in the second column of the command output (for example, 10636). Go to 6.
    • If the message is not displayed, contact technical support for assistance.

  6. Run the kill -9 Process ID command to forcibly stop the dashboardControl -S STOP process. (A combined sketch of steps 5 to 7 follows this procedure.)
  7. Run the ps -ef | grep dashboardControl command to check whether the dashboardControl -S STOP process still exists.

    • If yes, contact technical support for assistance.
    • If no, go to 8.

  8. Uninstall OceanStor DJ again by referring to Uninstalling OceanStor DJ in STaaS Solution 6.5.0 SFS Software Installation Guide (Private Cloud Scenario for HUAWEI CLOUD Stack 6.5.0) 01.
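
Steps 5 to 7 can also be combined into a single pipeline. This is a minimal sketch, not a product command; the brackets in the grep pattern keep grep from matching its own process:

    # Find the residual dashboardControl -S STOP process, if any, and forcibly stop it.
    ps -ef | grep '[d]ashboardControl -S STOP' | awk '{print $2}' | xargs -r kill -9
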
ManageOne Operation Portal Displays You are not allowed to perform any operation on a deleted resource
Symptom

After GaussDB data is restored, the message You are not allowed to perform any operation on a deleted resource is displayed when you perform operations on a file system on ManageOne Operation Portal.

Possible Causes

The user permanently deleted the file system after the backup was created. As a result, the file system still exists after the GaussDB database is restored from the backup, but no operation can be performed on it.

Procedure
  1. Use PuTTY to log in to the SFS-DJ01, SFS-DJ02, or SFS-DJ03 node using the management plane IP address of the node.

    The default account and password are djmanager and CloudService@123!, respectively.

    To obtain the management plane IP address of the SFS_DJ01, SFS_DJ02, or SFS_DJ03 node, search for SFS_DJ01, SFS_DJ02, or SFS_DJ03 respectively in the Tool-generated IP Parameters sheet of xxx_export_all_EN.xlsm.

  2. Run the following command and enter the password of user root (Cloud12#$) to switch to user root:

    su root

  3. Run the following command to disable user logout upon timeout:

    TMOUT=0

  4. Run the docker exec -it -u root manila-api_tenant bash command to go to the manila container.
  5. Run the vi /home/env.sh command to check whether environment variables exist in the env.sh file.

    • If environment variables exist, go to 6. Environment variables are as follows:
      #!/bin/bash
      FULL_PATH=`readlink -f ${BASH_SOURCE}`
      CWD=`dirname ${FULL_PATH}`
      IP_ADDR=$(get_info.py --manage_float_ip)
      if [[ ${IP_ADDR} == *:* ]];then
          IP_ADDR="["${IP_ADDR}"]"
      fi
      export OS_PASSWORD=CloudService@123!
      export OS_AUTH_URL=https://${IP_ADDR}:35357/identity/v3
      export OS_USERNAME=manila
      export OS_TENANT_NAME=service
      export OS_PROJECT_DOMAIN_NAME=Default
      export OS_USER_DOMAIN_NAME=Default
      export OS_IDENTITY_API_VERSION=3
      export OS_SERVICE_ENDPOINT=https://${IP_ADDR}:35357/identity-admin/v3
      export OS_SERVICE_TOKEN=$(curl -g -k -i -X POST https://${IP_ADDR}:35357/identity-admin/v3/auth/tokens -H "Content-Type:application/json" -d '{"auth": {"identity": {"methods":[ "password" ],"password": {"user": {"name": "manila","domain": { "name": "Default" },"password": "CloudService@123!" } } }, "scope": {"project": { "name":"service", "domain": {"name":"Default" }}}}}' |grep "X-Subject-Token"|awk -F':' '{print $2}')
      export OS_REGION_NAME="az1.dc1"
      export OS_ENDPOINT_TYPE=internalURL
      export MANILA_ENDPOINT_TYPE=adminURL
      export MANILACLIENT_INSECURE=True
      
      NOTE:

      password is the password of the manila account, and it is CloudService@123! by default.

    • If no environment variable exists, copy and paste the environment variable contents above into the file, run :wq! to save the file and exit, and then go to 6.
      NOTE:

      The value of OS_SERVICE_TOKEN is a single line; it is wrapped automatically in the PDF file. After copying it, manually delete the newline characters.

  6. Run the source /home/env.sh command to import the environment variables.
  7. Log in to ManageOne Maintenance Portal. Choose System > Logs > Tenant Operation Logs, view the tenant operation logs, and record the IDs of the file systems that have been permanently deleted after the backup time.
  8. Run the manila force-delete <share_id> command to delete the file systems. (A consolidated sketch follows this procedure.)

    Replace <share_id> with the IDs of the file systems recorded in 7.

  9. Run the rm /home/env.sh command to delete the environment variables.
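
Steps 6, 8, and 9 can be run as the following sequence once the IDs from 7 are known. This is a minimal sketch; <share_id_1> and <share_id_2> are hypothetical placeholders for the recorded IDs:

    # Import the environment variables, force-delete each recorded file system, then remove env.sh.
    source /home/env.sh
    for id in <share_id_1> <share_id_2>; do
        manila force-delete "$id"
    done
    rm /home/env.sh
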
Failed to Adjust Capacities
Symptom

When the total capacity of a file system is adjusted close to its used capacity, capacity adjustment fails.

Possible Causes

For performance reasons, some capacity is pre-allocated to the file system. During capacity adjustment, the used capacity calculated by the system is the actually used capacity plus the pre-allocated capacity. As a result, the entered capacity is smaller than the used capacity calculated by the system, and the adjustment fails. For example (illustrative figures), if 80 GB is actually used and 15 GB is pre-allocated, the system calculates the used capacity as 95 GB, so adjusting the total capacity to 90 GB fails.

Procedure
  1. Wait 2 to 3 minutes and refresh the page. The status of the file system changes from Capacity reduction error to Available.

  2. Adjust the total capacity of the file system again. For details, see Resizing a File System.

    When you reduce the capacity again, the capacity you enter must be greater than the value entered during the failed capacity adjustment.

Failed to Report an Alarm
Symptom

After SNMP is configured on OceanStor DJ, an alarm is generated on OceanStor DJ, but ManageOne Maintenance Portal cannot receive the alarm.

Possible Causes

The Hermes component fails to register the SNMP station information with the ZooKeeper service.

Procedure
  1. Use PuTTY to log in to any OceanStor DJ node through the floating IP address of the OceanStor DJ management plane.

    The default account and password are djmanager and CloudService@123!, respectively.

    To obtain the floating IP address of the management plane, search for SFS_MANAGE_FLOAT_IP in the Tool-generated IP Parameters sheet of xxx_export_all_EN.xlsm.

  2. Open the log file in /var/log/huawei/dj/services/system/hermes/hermes.log.

    vi /var/log/huawei/dj/services/system/hermes/hermes.log

  3. Check whether KeeperErrorCode = AuthFailed for /alarm/snmp-site/cfg/snmp_site_info exists in the log (a grep alternative is shown below):

    • If yes, go to 4.
    • If no, contact technical support for assistance.
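
    Instead of browsing the file in vi, the same check can be done with grep, using the log path from 2:

    # A match (exit status 0) means the authentication error is present in the log.
    grep "KeeperErrorCode = AuthFailed" /var/log/huawei/dj/services/system/hermes/hermes.log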

  4. Run the following commands to trigger Hermes to register the SNMP station information with the ZooKeeper service again (a combined sketch follows):

    1. Run the stop_service --service hermes command to stop the Hermes service.
    2. Run the start_service --service hermes command to start the Hermes service.
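
    The restart can be run as one sequence. This is a minimal sketch; whether show_service accepts hermes as a service name is an assumption, inferred from its use with omm-ha earlier in this chapter:

    # Restart the Hermes service, then (assumed) check its status.
    stop_service --service hermes
    start_service --service hermes
    show_service --service hermes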

  5. Log in to ManageOne Maintenance Portal using a browser.

    • URL: https://<Address for accessing the homepage of ManageOne Maintenance Portal>:31943, for example, https://oc.type.com:31943
    • Default username: admin; default password: Huawei12#$

  6. Click Log In.
  7. Choose Alarms > Current Alarms and check whether the alarm is reported successfully.

    • If yes, no further action is required.
    • If no, contact technical support for assistance.

----End
