HUAWEI CLOUD Stack 6.5.0 Troubleshooting Guide 02

Zenith Database Faults

This section describes common faults of Zenith databases and how to restore the Zenith databases when the databases are deployed in active/standby mode.

Operations Before Troubleshooting

Precautions

If you have located the cause of the database fault before troubleshooting, skip this section. If you need to analyze the cause of the fault, perform the following operations to collect the required information.

Procedure

Database instance test101-0-999 is used as an example in the following procedure. test101-0-999 is the database instance ID.

  1. Use PuTTY to log in to the node where the abnormal database instance resides as the sopuser user.

    The default password is D4I$awOD7k.

    NOTE:

    You can log in to ManageOne Deployment Portal and choose Application > Service Management > System Monitoring > Relational Databases to view the IP address of the node where the abnormal database instance resides.

  2. Run the following command to switch to the root user:

    sudo su root

    The default password is Changeme_123.

  3. Run the following commands to back up the log and configuration files of the abnormal database instance:

    cp /opt/zenith/data/test101-0-999/log/zctl-Backup date_random number.log /opt/zenith/data/test101-0-999/log/zctl-Backup date_random number.log.bak

    cp /opt/zenith/data/test101-0-999/log/run/zengine.rlog /opt/zenith/data/test101-0-999/log/run/zengine.rlog.bak

    NOTE:

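    Backup date_random number in the first command is a placeholder for part of the actual log file name. To obtain it, run the cd /opt/zenith/data/test101-0-999/log command to go to the log directory and check the name of the zctl log file.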

  4. Run the following command to obtain the process ID of the abnormal database instance:

    ps -ef | grep test101-0-999

    Information similar to the following is displayed:

    dbuser    9955     1  1 18:14 ?        00:00:21 /opt/zenith/app/bin/zengine open -D /opt/zenith/data/test101-0-999

  5. Run the following command to save the instance startup time to the specified file:

    ps -p 9955 -o lstart > /opt/zenith/data/test101-0-999/log/test101-0-999_zenith_time

    NOTE:

    9955 is the process ID shown in the command output in step 4, on the line that contains -D /opt/zenith/data/test101-0-999.

  6. Run the following command to log out of the root user:

    exit
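
The process ID lookup and the startup-time capture above can be combined so that the PID does not have to be copied by hand. The following is a minimal sketch, not part of the product tooling: the function name zengine_pid is hypothetical, and it assumes that ps -ef output has the format shown above, with the PID in the second column and the line ending in -D followed by the instance data directory.

```shell
#!/bin/sh
# Hypothetical helper: read `ps -ef` output on stdin and print the PID of the
# zengine process whose -D argument is the given instance data directory.
zengine_pid() {
    awk -v dir="$1" '
        # Match only zengine lines whose last two fields are "-D <dir>".
        # The grep process itself never matches this condition.
        $0 ~ /zengine/ && $(NF-1) == "-D" && $NF == dir { print $2; exit }
    '
}

# Usage on the node, as the root user (instance ID from the example above):
#   inst=test101-0-999
#   pid=$(ps -ef | zengine_pid "/opt/zenith/data/$inst")
#   [ -n "$pid" ] && ps -p "$pid" -o lstart > "/opt/zenith/data/$inst/log/${inst}_zenith_time"
```

If no matching process is found, the function prints nothing, so the guard on an empty pid prevents writing a meaningless file.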

Checking Whether the ManageOne VM Is Started

Context

If a VM node is not started before the fault is rectified, start the VM and then check the fault status.

Procedure
  1. Use PuTTY to log in to the first FusionSphere OpenStack node through the IP address of the External OM plane.

    The default account is fsp, and the default password is Huawei@CLOUD8.

    NOTE:
    • The system supports both password and public-private key pair for identity authentication. If the public-private key pair is used for login authentication, see detailed operations in Using PuTTY to Log In to a Node in Key Pair Authentication Mode.
    • To obtain the IP address of External OM, search for required parameter names on the 2.1 LLD generated by FCD sheet of the xxx_export_all.xlsm file exported from HUAWEI CLOUD Stack Deploy during software installation. The parameter names in different scenarios are as follows:
      • Cascading system in the Region Type I scenario: Cascading-ExternalOM-Reverse-Proxy; Cascaded system in the Region Type I scenario: Cascaded-ExternalOM-Reverse-Proxy
      • Region Type II and Region Type III scenarios: ExternalOM-Reverse-Proxy

  2. Run the following command to switch to the root user:

    su - root

    The default password is Huawei@CLOUD8!.

  3. Run the following command to disable user logout upon system timeout:

    TMOUT=0

  4. Run the following command to import environment variables:

    source set_env

    Information similar to the following is displayed:

    please choose environment variable which you want to import:
    (1) openstack environment variable (keystone v3)
    (2) cps environment variable
    (3) openstack environment variable legacy (keystone v2)
    (4) openstack environment variable of cloud_admin (keystone v3)

  5. Select an authentication mode.

    1. To enable Keystone V3 authentication with the built-in DC administrator, enter 1, press Enter, and enter the password of OS_USERNAME as instructed.

      The default password is FusionSphere123.

    2. The environment variables are successfully imported if the command outputs of cps host-list and nova list are automatically displayed. After the environment variables are imported, the system uses Keystone V3 authentication with the built-in DC administrator to authenticate requests, and you can run both CPS and OpenStack commands.
    NOTE:

    In the Mitaka release of OpenStack, Keystone command line interfaces (CLIs) are normalized and encapsulated as OpenStack commands. When the Identity service is registered with Keystone, Keystone V3 is automatically used, but compatibility issues may occur when you run original Keystone commands related to tenants, roles, and services. Prefer Keystone V3 authentication and the OpenStack commands related to projects, roles, and services.

  6. Run the following command to query the ManageOne VM status:

    nova list | egrep "Status|ManageOne"

    Information similar to the following is displayed:

    | ID                                   | Name                             | Status | Task State | Power State | Networks                                                                           |
    | c3cc6508-4b77-4b34-85f0-3150403383d2 | ManageOne-APS-Global-DMZ-Proxy01 | ACTIVE | -          | Running     | DMZ_Service=192.168.56.16; DMZ_Tenant=192.168.57.7                                 |
    | cf7adbbe-f3cd-4432-bc31-0c666e31be34 | ManageOne-APS-Global-DMZ-Proxy02 | ACTIVE | -          | Running     | DMZ_Service=192.168.56.17; DMZ_Tenant=192.168.57.8                                 |

    If the Status is SHUTOFF or the Power State is Shutdown in the command output, the VM is powered off. Run the following command to start the VM:

    nova start uuid

    uuid is the ID in the command output.

    If a DB node is started in this step, it takes about 20 minutes after the node starts to rebuild the active/standby relationship.

  7. Use PuTTY to log in to the active Deploy node of ManageOne as the sopuser user.

    The default password is D4I$awOD7k.
    NOTE:

    For details about how to query the IP address of the Deploy node, see Querying the IP Address of a Node. For details about how to determine active and standby nodes, see Determining the Active and Standby Nodes of the Deployment System.

  8. Run the following command to switch to the ossadm user:

    su - ossadm

    The default password is Changeme_123.

  9. Run the following command to query the database instance replication status:

    /opt/oss/manager/apps/DBAgent/bin/dbsvc_adm -cmd query-db-instance -type zenith

    Information similar to the following is displayed:

    DBInstanceId                ClassId  InstNumber             Tenant   AzName         IP             Port   State  DBType  Version                Role    Rpl Status      MasterID  GuardMode  DataCheckSum  isSSL  RplPort
    admindbsvr-6-1@7-1          primary  admindbsvr-6-1         Product  mo-global-1-a  192.168.33.40  32081  Up     zenith  --                     Slave   Abnormal        --        --         --            on     26951  
    admindbsvr-6-1@7-1          primary  admindbsvr-7-1         Product  mo-global-1-a  192.168.33.41  32081  Up     zenith  V300R001C00SPC100B209  Master  Normal          --        --         1654766847    on     26951  
    backupdbsvr-6-0@7-0         primary  backupdbsvr-6-0        Product  mo-global-1-a  192.168.33.40  32080  Up     zenith  --                     Slave   Abnormal        --        --         --            on     26950  
    NOTE:

    The command output varies depending on the version of the database service. Pay attention only to the value of Rpl Status.

    • Normal indicates that the replication status is normal.
    • Abnormal indicates that the replication status is abnormal.

    If Abnormal is displayed in the command output, locate the fault by referring to Possible Causes and rectify the fault by following the instructions provided in Troubleshooting.

  10. After the database is restored, log in to ManageOne Maintenance Portal. If an internal error message is displayed or the login fails, perform 11 to 14.
  11. Use PuTTY to separately log in to the regionAlias-ManageOne-Deploy01 and regionAlias-ManageOne-Deploy02 nodes as the sopuser user.

    The default password is D4I$awOD7k.

  12. Run the following command to switch to the root user:

    sudo su root

    The default password is Changeme_123.

  13. Run the following command to restart the service:

    su - ossadm -c "/opt/oss/manager/agent/bin/ipmc_adm -cmd restartapp"

  14. Log in to ManageOne Maintenance Portal again and check whether the login is successful.

    • If yes, the fault is rectified.
    • If no, contact technical support for assistance.
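
The VM status check in the procedure above (nova list followed by nova start for each powered-off VM) can be scripted when many ManageOne VMs are involved. The following is a minimal sketch under stated assumptions: the function name shutoff_vm_ids is hypothetical, and the column order (ID, Name, Status, Task State, Power State, Networks) is taken from the sample output above and may differ in other versions.

```shell
#!/bin/sh
# Hypothetical helper: read `nova list` table output on stdin and print the ID
# of every ManageOne VM that is powered off.
# Assumed column order: | ID | Name | Status | Task State | Power State | ...
shutoff_vm_ids() {
    awk -F'|' '
        $3 ~ /ManageOne/ && ($4 ~ /SHUTOFF/ || $6 ~ /Shutdown/) {
            gsub(/[ \t]/, "", $2)   # strip the padding around the ID
            print $2
        }
    '
}

# Usage on the FusionSphere OpenStack node, after importing the environment
# variables with `source set_env`:
#   nova list | shutoff_vm_ids | while read -r uuid; do nova start "$uuid"; done
```

Printing the IDs first and reviewing them before piping them into nova start is the safer way to use this helper.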

Slave Database Instance Is Abnormal

Symptom

On ManageOne Deployment Portal, choose Application > Service Management > System Monitoring from the main menu. On the Relational Databases tab page, the Status and Replication Status of the master database instance are Running and Normal, respectively. The Status of the slave database instance is Not Running or Unknown, or the Replication Status of the slave database instance is Abnormal.

Possible Causes
  • The VM where the database instance resides is stopped.
  • The database instance is manually stopped.
  • The database process is abnormal and fails to be restarted.
  • The database data is damaged.
Prerequisites
  • Before rectifying the fault, ensure that the OS of the node that hosts the database is running properly.
  • You have obtained the IP address of the Deploy node where the faulty database instance is located.
Procedure
  1. Check whether the VM where the database instance resides is stopped.

    • If yes, start the VM and check whether the fault is rectified.

      If the fault is rectified, no further action is required. Otherwise, go to 2.

    • If no, go to 2.

  2. Check the database instance status.

    1. Log in to Deployment Portal and choose Application > Service Management > System Monitoring.
    2. In the upper left corner of the System Monitoring page, click to switch to Deployment Portal or product page.
    3. On the Relational Databases tab page of Deployment Portal or product page, check whether the status of the database instance is Not Running.
      • If yes, manually start the node where the database instance is located by referring to Starting the Database on Deployment Portal or Starting a Product Database.

        If the database instance is in the Running state, the node is started successfully, and the fault is rectified.

        NOTE:

        If the database instance is manually stopped, locate the problem and start the node after related tasks are complete.

        If the database instance is still in the Not Running state, the start failed. In this case, perform 3 through 7.

      • If no and the database instance is in the Running state, the fault is rectified. If the database instance is in the Unknown state, perform 3 through 7.

  3. Perform the following operations to start DBAgent and DeployAgent:

    1. Use PuTTY to log in to the node where the faulty database instance resides as the sopuser user in SSH mode.

      The default password is D4I$awOD7k.

    2. Run the following command to switch to the ossadm user:

      su - ossadm

      The default password is Changeme_123.

    3. Run the following commands to start DBAgent:

      cd /opt/oss/manager/bin

      . engr_profile.sh

      ipmc_adm -cmd startapp -app DBAgent -tenant manager

      If the system displays the following information, DBAgent is started. Otherwise, contact technical support for assistance.

      Starting process dbagentapp-6-0 ... success
    4. Run the following command to start DeployAgent:

      ipmc_adm -cmd startmgr -app DeployAgent

      If the system displays the following information, DeployAgent is started. Otherwise, contact technical support for assistance.

      ============================ Starting management processes...
      Starting deployagent...
      ...
      start mcwatchdog... success
      ============================ Starting management processes is complete.

  4. Use PuTTY to log in to any Deploy node as the sopuser user in SSH mode.

    The default password is D4I$awOD7k.

  5. Run the following command to switch to the ossadm user:

    su - ossadm

    The default password is Changeme_123.

  6. Run the following commands to restore the slave database instance. After the commands are executed, the database instance will be rebuilt. For details about the parameter descriptions in the command, see Table 9-1.

    cd /opt/oss/manager/apps/UniEPService/tools/DB_Recovery

    bash DBSlaveInstance_Recovery.sh -instid admindbsvr-6-1@7-1 -tenant Product

    Table 9-1 Parameter description

    Parameter  Description
    -instid    Name of the database instance. The value can be the name of a single database instance or all. all indicates restoring all database instances of the product.
    -tenant    Name of the product to which the database instance belongs. For details, see Querying the Product Name.

    Assume that the database instance name is admindbsvr-6-1@7-1. If the database instance is restored successfully, the following information is displayed. Otherwise, contact technical support for assistance.
    ...
    The result:
    admindbsvr-6-1@7-1: success
    [2018-12-22 02:29:33] [264943] Recovery DB-Instance Success.

  7. Log in to Deployment Portal and check whether the database instance is restored.

    • If the Status of the database instances is Running, and the Replication Status is Normal, the fault is rectified. No further action is required.
    • If the Status of the database instance is Not Running or Unknown, or the Replication Status is Abnormal, contact technical support for assistance.
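
When the recovery script is run from another script, its output can be checked for the per-instance success line shown in the procedure above. The following is a minimal sketch: the function name recovery_succeeded is hypothetical, and it recognizes only the success format shown in the sample output (instance name followed by ": success"). The format of failure messages is not documented here, so anything else is treated as a failure.

```shell
#!/bin/sh
# Hypothetical helper: read the output of DBSlaveInstance_Recovery.sh on stdin
# and return 0 only if it contains the line "<instance>: success".
recovery_succeeded() {
    awk -v inst="$1" '
        { sub(/^[ \t]+/, ""); sub(/[ \t\r]+$/, "") }   # trim surrounding blanks
        $0 == (inst ": success") { found = 1 }
        END { exit !found }
    '
}

# Usage on a Deploy node, as the ossadm user:
#   cd /opt/oss/manager/apps/UniEPService/tools/DB_Recovery
#   bash DBSlaveInstance_Recovery.sh -instid admindbsvr-6-1@7-1 -tenant Product \
#       | recovery_succeeded 'admindbsvr-6-1@7-1' || echo "recovery not confirmed"
```

A confirmed success line does not replace the final check on Deployment Portal; it only tells a calling script whether to continue.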

Slave Database Instance Startup Is Abnormal

Symptom

The database instance fails to start because the server or VM is powered on and off multiple times within a short period.

Troubleshooting
  1. Use PuTTY to log in to the active Deploy node of ManageOne as the sopuser user.

    The default password is D4I$awOD7k.
    NOTE:

    For details about how to query the IP address of the Deploy node, see Querying the IP Address of a Node. For details about how to determine active and standby nodes, see Determining the Active and Standby Nodes of the Deployment System.

  2. Run the following command to switch to the ossadm user:

    su - ossadm

    The default password is Changeme_123.

  3. Run the following command to check the status of the database instance:

    /opt/oss/manager/apps/DBAgent/bin/dbsvc_adm -cmd query-db-instance -type zenith

    If the value of State is Down, the database instance is not started. Rectify the fault by following the instructions provided in Troubleshooting.

Troubleshooting

The following configurations are used as examples in the troubleshooting procedure:

  • The database installation directory is /opt/zenith.
  • The root application installation directory is /opt/oss.
  • The master database instance number is admindbsvr-6-1, with server IP address 10.162.33.40 and port number 32081.
  • The slave database instance number is admindbsvr-7-1, with server IP address 10.162.33.41 and port number 32081.
NOTE:

Temporary backup files generated during troubleshooting must be deleted after the database fault is rectified.

Disabling Database Failover
Context

The failover function needs to be disabled before troubleshooting to prevent unexpected database failovers. After this function is disabled, the master/slave switchover does not take place.

Procedure
  1. Use PuTTY to log in to the active Deploy node of ManageOne as the sopuser user.

    The default password is D4I$awOD7k.
    NOTE:

    For details about how to query the IP address of the Deploy node, see Querying the IP Address of a Node. For details about how to determine active and standby nodes, see Determining the Active and Standby Nodes of the Deployment System.

  2. Run the following command to switch to the ossadm user:

    su - ossadm

    The default password is Changeme_123.

  3. Run the following command to go to the directory that contains the master/slave failover tool:

    cd /opt/oss/manager/apps/DBHASwitchService/bin

  4. Run the following command to disable the failover function between the master and slave database nodes:

    ./switchtool.sh -cmd set-ignore-nodes -nodes 6,7

    The command is successfully executed if the following information is displayed:

    Successful
    NOTE:

    In the preceding command, 6 and 7 are the IDs of the master and slave database nodes, that is, the numbers in the middle of the database instance names (for example, 6 in admindbsvr-6-1 and 7 in admindbsvr-7-1).
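
The node IDs passed to -nodes can be derived from the combined instance name instead of being read off by eye. The following is a minimal sketch: the function name node_ids_from_instance is hypothetical, and it assumes the naming pattern seen in this section, name-node-index@node-index (for example, admindbsvr-6-1@7-1).

```shell
#!/bin/sh
# Hypothetical helper: print the two node IDs embedded in a combined instance
# name, in the comma-separated form expected by -nodes.
# The ( ) function body runs in a subshell, so the variables stay local.
node_ids_from_instance() (
    inst=$1
    left=${inst%%@*}     # "admindbsvr-6-1"
    right=${inst#*@}     # "7-1"
    left=${left%-*}      # "admindbsvr-6"
    printf '%s,%s\n' "${left##*-}" "${right%%-*}"
)

# Usage on the active Deploy node, as the ossadm user:
#   cd /opt/oss/manager/apps/DBHASwitchService/bin
#   ./switchtool.sh -cmd set-ignore-nodes -nodes "$(node_ids_from_instance 'admindbsvr-6-1@7-1')"
```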

Handling Abnormal Slave Database Instances
Procedure

Database instance admindbsvr-7-1 is used as an example in the following procedure. admindbsvr-7-1 is the database instance ID.

  1. Use PuTTY to log in to the slave database node as the sopuser user.

    The default password is D4I$awOD7k.

    NOTE:

    To obtain the IP address of the slave database node, perform the following steps: Log in to ManageOne Deployment Portal, choose Application > Service Management > System Monitoring > Relational Databases, locate the abnormal instance, and click the node name. The IP address in the upper part of the node details page is the IP address of the abnormal slave database node.

  2. Run the following command to switch to the dbuser user:

    su - dbuser

    The default password is Changeme_123.

  3. Run the following command to back up database files:

    cp -r /opt/zenith/data/admindbsvr-7-1/data /opt/zenith/data/admindbsvr-7-1/data_bak

  4. Run the following command to clear the data files in the data directory of the slave database instance:

    rm -rf /opt/zenith/data/admindbsvr-7-1/data/*

  5. Run the following commands to create a tablespace directory:

    mkdir -p /opt/zenith/data/admindbsvr-7-1/data/tablespace

    chmod -R 700 /opt/zenith/data/admindbsvr-7-1/data

  6. Run the following command to check whether the database instance process has been stopped:

    ps -ef | grep -v "grep" | grep admindbsvr-7-1
    • If no command output is displayed, the instance process has been stopped. Perform 12 through 14.
    • If any command output is displayed, perform 7 through 14.

  7. Run the following command to switch to the root user:

    sudo su root

    The default password is Changeme_123.

  8. Run the following command to switch to the ossadm user:

    su - ossadm

    The default password is Changeme_123.

  9. Run the following command to stop the database process:

    /opt/oss/manager/agent/bin/ipmc_adm -cmd stopdc -instance admindbsvr-7-1

  10. Run the following command to switch to the root user:

    sudo su root

    The default password is Changeme_123.

  11. Run the following command to switch to the dbuser user:

    su - dbuser

    The default password is Changeme_123.

  12. Run the following command to start the database in the NOMOUNT state:

    python /opt/zenith/app/bin/zctl.py -t start -m NOMOUNT -D /opt/zenith/data/admindbsvr-7-1

  13. Run the following command to log in to the abnormal database:

    zsql sys/Admin@123@127.0.0.1:Abnormal database port number

    NOTE:

    Abnormal database port number is a placeholder for the port of the abnormal instance (32081 for admindbsvr-7-1 in this example). To obtain the port number, log in to ManageOne Deployment Portal, choose Application > Service Management > System Monitoring > Relational Databases, and view the Port column of the abnormal instance.

  14. Run the following command to rebuild the database:

    build database;

    If information similar to the following is displayed, the database is successfully rebuilt:

    Succeed.
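
The backup, clear, and tablespace-directory commands in the procedure above are destructive if the backup fails or the path is mistyped. The following is a minimal sketch of the same three operations with guards added. The function name reset_slave_data_dir is hypothetical and the guards are an addition, not part of the documented procedure.

```shell
#!/bin/sh
# Hypothetical helper: back up an instance's data directory, empty it, and
# recreate the tablespace subdirectory. $1 is the instance directory, for
# example /opt/zenith/data/admindbsvr-7-1.
# The ( ) function body runs in a subshell, so `exit` only ends the function.
reset_slave_data_dir() (
    inst_dir=$1
    data="$inst_dir/data"
    bak="$inst_dir/data_bak"

    [ -d "$data" ] || { echo "no data directory: $data" >&2; exit 1; }
    # Refuse to reuse an earlier backup: cp -r into an existing directory
    # would nest the new copy inside it.
    [ ! -e "$bak" ] || { echo "backup already exists: $bak" >&2; exit 1; }

    # Clear the data directory only if the backup copy succeeded.
    cp -r "$data" "$bak" || exit 1
    # ${data:?} aborts if the variable is empty, so this never expands to /*.
    rm -rf "${data:?}"/* || exit 1
    mkdir -p "$data/tablespace" && chmod -R 700 "$data"
)

# Usage on the slave database node, as the dbuser user:
#   reset_slave_data_dir /opt/zenith/data/admindbsvr-7-1
```

As with the original commands, rm -rf on data/* does not remove hidden files. Delete the data_bak directory after the fault is rectified, in line with the note on temporary backup files.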

Enabling Database Failover
Context

After the database fault is rectified, the database failover function must be enabled to ensure the database is highly reliable.

Procedure
  1. Use PuTTY to log in to the active Deploy node of ManageOne as the sopuser user.

    The default password is D4I$awOD7k.
    NOTE:

    For details about how to query the IP address of the Deploy node, see Querying the IP Address of a Node. For details about how to determine active and standby nodes, see Determining the Active and Standby Nodes of the Deployment System.

  2. Run the following command to switch to the ossadm user:

    su - ossadm

    The default password is Changeme_123.

  3. Run the following command to go to the directory that contains the master/slave failover tool:

    cd /opt/oss/manager/apps/DBHASwitchService/bin

  4. Run the following command to enable the failover function:

    ./switchtool.sh -cmd del-ignore-nodes

    If the following information is displayed, the command is successfully executed:

    Successful.

  5. Run the following command to clear the last failover time:

    ./switchtool.sh -cmd del-failover-time -instid admindbsvr-6-1@7-1

    If the following information is displayed, the command is successfully executed:

    Successful.
    NOTE:

    admindbsvr-6-1@7-1 is the database instance name. Replace it as required.

Slave Database Instance Replication Status Is Abnormal

Symptom

The master database instance is normal, but the replication status of the slave database instance is abnormal.

Troubleshooting
  1. Use PuTTY to log in to the active Deploy node of ManageOne as the sopuser user.

    The default password is D4I$awOD7k.
    NOTE:

    For details about how to query the IP address of the Deploy node, see Querying the IP Address of a Node. For details about how to determine active and standby nodes, see Determining the Active and Standby Nodes of the Deployment System.

  2. Run the following command to switch to the ossadm user:

    su - ossadm

    The default password is Changeme_123.

  3. Run the following command to query the database instance replication status:

    /opt/oss/manager/apps/DBAgent/bin/dbsvc_adm -cmd query-db-instance -type zenith

    Information similar to the following is displayed:

    DBInstanceId                ClassId  InstNumber             Tenant   AzName         IP             Port   State  DBType  Version                Role    Rpl Status      MasterID  GuardMode  DataCheckSum  isSSL  RplPort
    admindbsvr-6-1@7-1          primary  admindbsvr-6-1         Product  mo-global-1-a  192.168.33.40  32081  Up     zenith  --                     Slave   Abnormal        --        --         --            on     26951  
    admindbsvr-6-1@7-1          primary  admindbsvr-7-1         Product  mo-global-1-a  192.168.33.41  32081  Up     zenith  V300R001C00SPC100B209  Master  Normal          --        --         1654766847    on     26951  
    backupdbsvr-6-0@7-0         primary  backupdbsvr-6-0        Product  mo-global-1-a  192.168.33.40  32080  Up     zenith  --                     Slave   Abnormal        --        --         --            on     26950  
    NOTE:

    The command output varies depending on the version of the database service. Pay attention only to the value of Rpl Status.

    • Normal indicates that the replication status is normal.
    • Abnormal indicates that the replication status is abnormal.

    If Abnormal is displayed in the command output, locate the fault by referring to Possible Causes and rectify the fault by following the instructions provided in Troubleshooting.
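
Because the column set varies with the database service version, a script that filters this output should locate the Rpl Status column from the header line rather than by a fixed field number. The following is a minimal sketch: the function name abnormal_instances is hypothetical, and it assumes that values are left-aligned under their headers, as in the sample output above, and that InstNumber is the third column.

```shell
#!/bin/sh
# Hypothetical helper: read `dbsvc_adm -cmd query-db-instance` output on stdin
# and print the InstNumber of every row whose Rpl Status is Abnormal.
abnormal_instances() {
    awk '
        # Header line: remember the character offset of the Rpl Status column.
        /Rpl Status/ { col = index($0, "Rpl Status"); next }
        # Data line: read the first word that starts at that offset.
        col && NF {
            n = split(substr($0, col), rest)
            if (n > 0 && rest[1] == "Abnormal") print $3
        }
    '
}

# Usage on the active Deploy node, as the ossadm user:
#   /opt/oss/manager/apps/DBAgent/bin/dbsvc_adm -cmd query-db-instance -type zenith | abnormal_instances
```

An empty result means that no instance reported an abnormal replication status, or that the header line was not found. A calling script should treat a missing header as an error rather than as a healthy state.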

Possible Causes
  • The server network is disconnected.
  • The replication data of the slave database is incorrect.
  • The expected role is inconsistent with the actual role.
  • Data (GTID) conflict occurs after a failover.
  • Replication is interrupted because the binlog is deleted.
  • Data conflict occurs because a write operation is manually performed on the standby database.
Troubleshooting

The database management service can recover automatically. Wait about 20 minutes, and then query the database instance status and the database replication status again. If both are normal, the fault is rectified. If either is still abnormal after 30 minutes, rectify the fault by following the instructions provided in Slave Database Instance Is Abnormal.
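
The waiting period can be automated with a simple polling loop that stops as soon as the status is normal or gives up after a time limit. The following is a minimal sketch: the names wait_until and replication_ok are hypothetical, and the 60-second interval in the usage example is an arbitrary choice.

```shell
#!/bin/sh
# Hypothetical helper: run a check command every INTERVAL seconds until it
# succeeds (return 0) or TIMEOUT seconds have elapsed (return 1).
# Usage: wait_until TIMEOUT INTERVAL COMMAND [ARGS...]
# Use an INTERVAL greater than 0 when TIMEOUT is greater than 0.
wait_until() (
    timeout=$1; interval=$2; shift 2
    elapsed=0
    while :; do
        "$@" && exit 0
        [ "$elapsed" -ge "$timeout" ] && exit 1
        sleep "$interval"
        elapsed=$((elapsed + interval))
    done
)

# Usage on the active Deploy node, as the ossadm user (30 minutes, 60 s poll):
#   replication_ok() {
#       ! /opt/oss/manager/apps/DBAgent/bin/dbsvc_adm -cmd query-db-instance -type zenith | grep -q Abnormal
#   }
#   wait_until 1800 60 replication_ok || echo "still abnormal after 30 minutes"
```

The elapsed time counts only the sleep intervals, so the real wait is slightly longer when the check command itself is slow.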

Abnormal Master Database Instance Due to Abnormal VM Power-On or Power-Off

Symptom

On ManageOne Deployment Portal, choose Application > Service Management > System Monitoring from the main menu. On the Relational Databases tab page, the Role of a database instance is Slave, its Status is Running but its Replication Status is Abnormal.

Possible Causes

When the master database instance is abnormal, a switchover is performed automatically. To ensure system stability, the master and slave database instances can be switched over only once within 30 minutes. If the master and slave database instances are powered off and on multiple times within 30 minutes, the master database instance may become abnormal.

Troubleshooting

After the VM power supply is stable, wait for 30 minutes and check whether the master and slave instances are recovered.

  • If the Status of the database instances is Running, and the Replication Status is Normal, the fault is rectified.
  • Otherwise, rectify the fault by following the instructions provided in Slave Database Instance Is Abnormal.
Updated: 2019-06-01

Document ID: EDOC1100062375
