
FusionCloud 6.3.1.1 Troubleshooting Guide 02

Database Faults

This section describes common MySQL database faults and how to restore MySQL databases deployed in active/standby mode.

Common Database Exceptions

Common database exceptions include:

  • The node where the master database instance resides is normal, but the replication status of the node where the slave database instance resides is abnormal.
  • The master and slave database instances are both abnormal. The physical files of the master database instance are lost or damaged.
  • The master database instance is normal and the slave database instance is abnormal. The physical files of the slave database instance are lost or damaged.

Operations Before Troubleshooting

Precautions

If you have located the cause of the database fault before troubleshooting, skip this section. If you need to analyze the cause of the fault, perform the following operations to collect the required information.

Procedure

Database instance test101-0-999 is used as an example in the following procedure:

  1. Use PuTTY to log in to the node where the abnormal database instance resides using the IP address of that node.

    NOTE:

    You can log in to the ManageOne deployment plane and choose Deployment > Database > RDBMS to view the IP address of the node where the abnormal database instance resides.

    Default account: sopuser; default password: D4I$awOD7k

  2. Run the following command to switch to the root user:

    su - root

    The default password is Changeme_123.

  3. Run the following commands to back up the log and configuration files of the abnormal database instance:

    cp /opt/mysql/data/test101-0-999/mysql.err /opt/mysql/data/test101-0-999/mysql.err.bak

    cp /opt/mysql/data/test101-0-999/auto.cnf /opt/mysql/data/test101-0-999/auto.cnf.bak

    cp /opt/mysql/data/test101-0-999/my.cnf /opt/mysql/data/test101-0-999/my.cnf.bak

  4. Run the following command to obtain the process ID of the abnormal database instance:

    ps -ef | grep test101-0-999

    Information similar to the following is displayed:

    root      18712   2691  0 18:26 pts/4    00:00:00 grep test101-0-999
    dbuser   107956      1  0 Jan05 ?        00:00:00 /bin/sh /opt/mysql/bin/mysqld_safe --defaults-file=/opt/mysql/data/test101-0-999/my.cnf --safe-user-create --skip-symbolic-links
    dbuser   109227 107956  0 Jan05 ?        00:04:58 /opt/mysql/bin/mysqld --defaults-file=/opt/mysql/data/test101-0-999/my.cnf --basedir=/opt/mysql --datadir=/opt/mysql/data/test101-0-999 --plugin-dir=/opt/mysql/lib/plugin --safe-user-create --skip-symbolic-links --log-error=/opt/mysql/data/test101-0-999/mysql.err --pid-file=/opt/mysql/data/test101-0-999/test101-0-999.pid --socket=/opt/mysql/data/test101-0-999/test101-0-999.sock --port=32081

  5. Run the following commands to save the start times of the instance processes to files:

    ps -p 107956 -o lstart > test101-0-999_mysqld_safe_time

    ps -p 109227 -o lstart > test101-0-999_mysqld_time

    NOTE:
    • 107956 is the process ID of the mysqld_safe process obtained in 4.
    • 109227 is the process ID of the mysqld process obtained in 4.

  6. Run the following command to log out of the root user:

    exit
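Steps 3 to 5 can also be scripted. The following is a minimal POSIX shell sketch, not a tool shipped with the product. It assumes the /opt/mysql/data/<instance> layout and the ps -ef column order shown above. The function names backup_instance_files, instance_pid, and save_start_times are hypothetical. Run it as root, as in the manual procedure.

```shell
#!/bin/sh
# Hypothetical helpers for steps 3-5; not part of the product tooling.
# Assumes instance directories live under /opt/mysql/data/<instance>.

# Step 3: copy mysql.err, auto.cnf and my.cnf to *.bak inside DIR.
backup_instance_files() {
    dir=$1
    for f in mysql.err auto.cnf my.cnf; do
        if [ -f "$dir/$f" ]; then
            cp "$dir/$f" "$dir/$f.bak" || return 1
        else
            echo "skip: $dir/$f not found" >&2
        fi
    done
}

# Step 4: read `ps -ef` output on stdin and print the PID of PROG
# (mysqld_safe or mysqld) started with this instance's my.cnf.
instance_pid() {
    awk -v cnf="--defaults-file=/opt/mysql/data/$1/my.cnf" -v prog="$2" '
        index($0, cnf) {
            # The command is column 8; for mysqld_safe it is "/bin/sh",
            # so the script path in column 9 is checked as well.
            for (i = 8; i <= 9 && i <= NF; i++) {
                n = split($i, parts, "/")
                if (parts[n] == prog) { print $2; exit }
            }
        }'
}

# Step 5: save the start time of both processes to <instance>_<prog>_time.
# PSFILE holds a saved `ps -ef` listing.
save_start_times() {
    inst=$1 psfile=$2
    for prog in mysqld_safe mysqld; do
        pid=$(instance_pid "$inst" "$prog" < "$psfile")
        if [ -z "$pid" ]; then
            echo "no $prog process found for $inst" >&2
            return 1
        fi
        ps -p "$pid" -o lstart > "${inst}_${prog}_time" || return 1
    done
}

# Example (as root):
#   backup_instance_files /opt/mysql/data/test101-0-999
#   ps -ef > /tmp/ps_snapshot && save_start_times test101-0-999 /tmp/ps_snapshot
```

The sketch matches on the --defaults-file argument rather than on the bare instance name. This avoids picking up the grep process itself, which the manual ps -ef | grep command also lists.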

Slave Database Instance Is Abnormal

Exceptions
Symptom
  • The master database instance is normal, but the replication status of the slave database instance is abnormal.
  • The master database instance is normal, but the physical file of the abnormal slave database instance is missing or damaged.
Troubleshooting

The master database instance is normal, but the replication status of the slave database instance is abnormal.

  1. Use a browser to log in to the ManageOne deployment plane.

    URL: https://Floating IP address of the deployment plane:31943, for example, https://192.168.0.1:31943

    Default account: admin; default password: Huawei12#$

  2. Choose Deployment > Database > RDBMS from the main menu.
  3. View the Duplication Status column. If it shows an abnormal status, the replication status of the slave database instance is abnormal.

The master database instance is normal, but the physical file of the abnormal slave database instance is missing or damaged.

  1. Use PuTTY to log in to the node where the master database instance resides using the IP address of that node.

    NOTE:

    You can log in to the ManageOne deployment plane and choose Deployment > Database > RDBMS to view the IP address of the node where the master database instance resides.

    Default account: sopuser; default password: D4I$awOD7k

  2. Run the following command to switch to the dbuser user:

    su - dbuser

    The default password is Y7xohbheY!.

  3. Run the following command to connect to the node where the master database instance resides:

    /opt/mysql/bin/mysql -h<IP address of the node where the master database instance resides> -udbuser -p -P<Port number of the master database instance>

    Enter password:     
    NOTE:

    When prompted, enter the password of the dbuser database user. The default password is Admin@123.

    Log in to the ManageOne deployment plane and choose Deployment > Database Management > Relationship Database to check the port number of the node where the master database instance resides.

  4. Run the following command to check the status of the master database instance:

    show slave status\G

    If the following information is displayed, the instance has no replication source configured. This indicates that it is the master database instance and is running normally:

    Empty set (0.03 sec)

  5. Use PuTTY to log in to the node where the slave database instance resides using the IP address of that node.

    NOTE:

    You can log in to the ManageOne deployment plane and choose Deployment > Database > RDBMS to view the IP address of the node where the slave database instance resides.

    Default account: sopuser; default password: D4I$awOD7k

  6. Run the following command to switch to the dbuser user:

    su - dbuser

    The default password is Y7xohbheY!.

  7. Run the following command to connect to the node where the slave database instance resides:

    /opt/mysql/bin/mysql -h<IP address of the node where the slave database instance resides> -udbuser -p -P<Port number of the slave database instance>
    Enter password:   
    NOTE:

    When prompted, enter the password of the dbuser database user. The default password is Admin@123.

    Log in to the ManageOne deployment plane and choose Deployment > Database Management > Relationship Database to check the port number of the node where the slave database instance resides.

    If the following message is displayed, the slave database instance is down:

    ERROR 2003 (HY000): Can't connect to MySQL server

  8. Run the following command to exit MySQL:

    exit;

  9. Run the following commands to restart the instance:

    sudo su - ossadm

    [sudo]password for root:
    NOTE:

    The default password is Changeme_123.

    cd /opt/oss/manager/apps/DeployAgent-version/container/mysql/bin

    ./start_mysql.sh /opt/mysql/data/<Instance ID>

    • If the following information is displayed, the slave database instance is normal:
    Start <Instance ID>...
    Start <Instance ID>...done
    • If other information is displayed, the slave database instance is damaged and needs to be recovered.
    NOTE:

    version: indicates the version of DeployAgent.

    You can query the version of DeployAgent as follows:

    1. Use a browser to log in to the ManageOne deployment plane.

      URL: https://Floating IP address of the deployment plane:31943, for example, https://192.168.0.1:31943

      Default username: admin; default password: Huawei12#$

    2. Choose Deployment > Microservice Deployment > Packages.

      Check the DeployAgent version in the Latest Version column.
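A script can interpret the checks in steps 4, 7, and 9 instead of reading them by eye. The following sketch is an assumption-based example. It relies on the standard MySQL Slave_IO_Running and Slave_SQL_Running fields of SHOW SLAVE STATUS\G and on the start_mysql.sh output format shown above. The function names classify_slave_status and start_succeeded are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers for steps 4, 7 and 9; names are illustrative.

# Reads `SHOW SLAVE STATUS\G` output on stdin and prints:
#   master      - empty result, so no replication source is configured
#   replicating - Slave_IO_Running and Slave_SQL_Running are both Yes
#   broken      - replication is configured but a thread is not running
classify_slave_status() {
    awk '
        /Slave_IO_Running:/  { io = $2 }
        /Slave_SQL_Running:/ { sql = $2 }
        END {
            if (io == "" && sql == "") print "master"
            else if (io == "Yes" && sql == "Yes") print "replicating"
            else print "broken"
        }'
}

# Returns 0 if start_mysql.sh output on stdin contains a line ending in "...done".
start_succeeded() {
    grep -q '\.\.\.done[[:space:]]*$'
}

# Example (as dbuser; the mysql client prompts for the password):
#   /opt/mysql/bin/mysql -h<IP> -udbuser -p -P<port> -e 'SHOW SLAVE STATUS\G' | classify_slave_status
#   ./start_mysql.sh /opt/mysql/data/<Instance ID> | tee /dev/stderr | start_succeeded
```

When the statement is passed with -e, the mysql client typically prints nothing for an empty result instead of Empty set. The function therefore treats both cases as a master instance.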

Disabling Database Failover
Context

Disable the failover function before troubleshooting to prevent unexpected database failovers. While the function is disabled, no master/slave switchover takes place.

Procedure

Perform the following operations on the node where DBHASwitchService of the data zone resides. If active and standby DBHASwitchService nodes exist in this region, perform the steps on either node.

  1. Use PuTTY to log in to the node where DBHASwitchService is deployed using the IP address of that node.

    NOTE:

    You can perform the following operations to query the node where DBHASwitchService is deployed:

    1. Use a browser to log in to the ManageOne deployment plane.

      URL: https://Floating IP address of the deployment plane:31943, for example, https://192.168.0.1:31943

      Default username: admin; default password: Huawei12#$

    2. Choose Deployment > Environments, enter DBHASwitchService in the search box on the right, and click the search icon.
    3. Click the displayed service and view the node where DBHASwitchService is deployed in Node List.

    Default account: sopuser; default password: D4I$awOD7k

  2. Run the following command to switch to the ossadm user:

    su - ossadm

    The default password is ZJE%JLq5qx.

  3. Run the following command to go to the directory that contains the master/slave failover tool:

    cd /opt/oss/manager/apps/DBHASwitchService/bin

  4. Run the following command to disable the failover function between the master and slave database nodes:

    ./switchtool.sh -cmd set-ignore-nodes -nodes <IDs of the master and slave database nodes>

    NOTE:

    Log in to the ManageOne deployment plane and choose Resource > Server to check the IDs of the master and slave database nodes.

    For example: ./switchtool.sh -cmd set-ignore-nodes -nodes 4,5

    The failover function is successfully disabled if the following information is displayed:

    Successful
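Step 4 can be wrapped so that a missing Successful line is reported as a failure. In the following sketch, disable_failover and the SWITCHTOOL variable are hypothetical names. Only the switchtool.sh arguments come from the procedure above.

```shell
#!/bin/sh
# Hypothetical wrapper for step 4; run as ossadm on the DBHASwitchService node.
# SWITCHTOOL can be overridden; the default is the path used in step 3.
SWITCHTOOL=${SWITCHTOOL:-/opt/oss/manager/apps/DBHASwitchService/bin/switchtool.sh}

disable_failover() {
    nodes=$1    # comma-separated node IDs, for example 4,5
    case $nodes in
        '' | *[!0-9,]*)
            echo "invalid node list: '$nodes'" >&2
            return 2 ;;
    esac
    # Run the tool from its own directory, as the procedure does.
    out=$(cd "$(dirname "$SWITCHTOOL")" &&
          "./$(basename "$SWITCHTOOL")" -cmd set-ignore-nodes -nodes "$nodes") || return 1
    printf '%s\n' "$out"
    printf '%s\n' "$out" | grep -q '^Successful'
}

# Example:
#   disable_failover 4,5
```

The wrapper rejects node lists that contain anything other than digits and commas, so a typo is not passed to the tool unnoticed.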

Handling Abnormal Slave Database Instances
Procedure

Resetting the master database instance

  1. Use PuTTY to log in to the node where the master database instance resides using the IP address of that node.

    NOTE:

    You can log in to the ManageOne deployment plane and choose Deployment > Database > RDBMS to view the IP address of the node where the master database instance resides.

    Default account: sopuser; default password: D4I$awOD7k

  2. Run the following command to switch to the dbuser user:

    su - dbuser

    The default password is Y7xohbheY!.

  3. Run the following command to connect to the node where the master database instance resides:

    /opt/mysql/bin/mysql -udbuser -p -P32080 -h10.145.93.121
    Enter password:
    NOTE:
    • -u: database user.
    • -p: prompts for the password of the database user.
    • -P: port number of the master database instance.
    • -h: IP address of the node where the master database instance resides.
    • In the preceding command, 32080 and 10.145.93.121 are the port number and IP address of the master database node. Replace them with the actual values.

    When prompted, enter the password of the dbuser database user. The default password is Admin@123.

  4. Run the following command to reset the master database instance:

    reset master;

    If the following information is displayed, the master database instance is successfully reset:

    Query OK, 0 rows affected (0.00 sec)

  5. Run the following command to exit MySQL:

    exit;

  6. Run the following command to view the available space of the backup partition:

    df -H

    Information similar to the following is displayed:
    Filesystem         Size  Used Avail Use% Mounted on
    /dev/xvda3          18G   13G  4.7G  73% /
    devtmpfs           34G  144k   34G   1% /dev
    tmpfs              34G     0   34G   0% /dev/shm
    /dev/xvda1         1.1G   42M  957M   5% /boot
    ...
    /dev/mapper/oss_vg-opt_vol      53G   27G   24G  54% /opt
    ...

    The Avail value in the row whose Mounted on value is /opt indicates the remaining space of the partition where the local or remote backup directory is located.

  7. Run the following command to view the size of the files to be backed up:

    ls -lh /opt/mysql/data/<Instance ID>

    Information similar to the following is displayed:

    total 6.1G
    ...
    -rw-r-----. 1 dbuser dbgroup 250M Jun 27 18:46 ib_logfile0
    -rw-r-----. 1 dbuser dbgroup 250M Jun 15 19:14 ib_logfile1
    -rw-r-----. 1 dbuser dbgroup 250M Jun 27 18:46 ib_logfile2
    ...
    -rw-------. 1 dbuser dbgroup 802M Jun 12 17:20 mysql-bin.000001
    -rw-------. 1 dbuser dbgroup 802M Jun 13 12:07 mysql-bin.000002
    -rw-------. 1 dbuser dbgroup 344M Jun 13 17:36 mysql-bin.000003
    -rw-------. 1 dbuser dbgroup  18M Jun 14 14:22 mysql-bin.000004
    -rw-------. 1 dbuser dbgroup 521K Jun 14 14:38 mysql-bin.000005
    -rw-------. 1 dbuser dbgroup 357M Jun 19 14:46 mysql-bin.000006
    -rw-------. 1 dbuser dbgroup  114 Jun 14 14:42 mysql_bin.index
    ...
    -rw-------. 1 dbuser dbgroup  198 Jun 14 14:42 relay_bin.000001
    -rw-------. 1 dbuser dbgroup 531M Jun 21 11:25 relay_bin.000002
    -rw-------. 1 dbuser dbgroup  285 Jun 21 11:25 relay_bin.000003
    -rw-------. 1 dbuser dbgroup  22K Jun 21 11:26 relay_bin.000004
    -rw-------. 1 dbuser dbgroup  285 Jun 21 11:26 relay_bin.000005
    -rw-------. 1 dbuser dbgroup  50K Jun 21 11:29 relay_bin.000006
    -rw-------. 1 dbuser dbgroup  285 Jun 21 11:29 relay_bin.000007
    -rw-------. 1 dbuser dbgroup 558K Jun 21 11:52 relay_bin.000008
    -rw-------. 1 dbuser dbgroup  285 Jun 21 11:52 relay_bin.000009
    -rw-------. 1 dbuser dbgroup 252M Jun 23 01:00 relay_bin.000010
    -rw-------. 1 dbuser dbgroup 1.5K Jun 27 01:00 relay_bin.index
    ...

    Calculate the space occupied by the backup files by subtracting the total size of the .index files from the total value.

    NOTE:
    • The preceding information is for reference only.
    • ib_logfile*: InnoDB redo log files. mysql-bin.*: binary log files. relay_bin.*: relay log files.
    • total indicates the total size of the files in the instance directory.

  8. Compare the available space obtained in 6 with the space occupied by the backup files calculated in 7.

    • If the available space obtained in 6 is greater than or equal to the space occupied by the backup files in 7, perform 9 and 10.
    • If the available space obtained in 6 is less than the space occupied by the backup files in 7, go to the directory where the files to be deleted are stored, run the rm <File name> command to free up space, and then perform 9 and 10.
    NOTE:

    Replace the directory and file name with the actual values.

  9. Run the following command to switch to the ossadm user:

    sudo su - ossadm

    [sudo]password for root:

  10. Run the following commands to perform a physical backup of the master database instance:

    cd /opt/oss/manager/apps/DBAgent/bin

    . /opt/oss/manager/bin/engr_profile.sh

    ./dbsvc_tool -cmd backup-db-instance -instid tenantdbsvr-4-38@5-38 -method physical
    NOTE:
    • If the slave database instance or the replication relationship between the master and slave database instances is abnormal, the data of the master database instance needs to be backed up.
    • In the current backup scenario, the backup data is saved in the /opt/pub/backup_local directory of the master database instance.
    • tenantdbsvr-4-38@5-38 indicates the instance name. Replace it with the name of the instance to be backed up.

    The following information is displayed:

    Check database status, and try login.
    Login database ossdbsvr-0-999 success.
    Check database status finish.
    CheckDB success, instid:tenantdbsvr-4-38@5-38, method:physical
    Begin backup database.
    Backup success, instid:tenantdbsvr-4-38@5-38, method:physical
    Compress backup file to tenantdbsvr-4-38@5-38_abatest-0-999_20170303105515_manual_full_day_physical.tar.gz...
    Compress success, instid:tenantdbsvr-4-38@5-38, method:physical
    Sign backup files...
    Signing file for instance
    Unit sign file is tenantdbsvr-4-38@5-38_tenantdbsvr-4-38_20170303105515_manual_full_day_physical.tar.gz
    signing file error code is 0 .
    SignFiles success, instid:tenantdbsvr-4-38@5-38, method:physical
    SaveBackupFiles success, instid:tenantdbsvr-4-38@5-38, method:physical
    Count backup files number.
    Count backup files number.
    CountBackup success, instid:tenantdbsvr-4-38@5-38, method:physical
    Check backup file permission.
    CheckPermission success, instid:tenantdbsvr-4-38@5-38, method:physical
    CleanTempFiles success, instid:tenantdbsvr-4-38@5-38, method:physical
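Steps 6 to 8 compare two numbers by hand, and the archive name needed later appears only inside the step 10 output. The following sketch automates both under stated assumptions. Sizes come from du and df in KiB rather than from ls -lh, and the archive name is taken from the "Unit sign file is" line shown in the sample output. All function names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers for steps 6-8 and for reading the output of step 10.
# Sizes are in KiB; du/df report allocated blocks, so treat them as estimates.

# Available space (KiB) on the filesystem that holds PATH (df Avail column).
avail_kib() {
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Space (KiB) used by DIR minus its top-level *.index files (step 7).
backup_need_kib() {
    total=$(du -sk "$1" | awk '{ print $1 }') || return 1
    idx=$(find "$1" -maxdepth 1 -type f -name '*.index' -exec du -k {} + |
          awk '{ s += $1 } END { print s + 0 }')
    echo $((total - idx))
}

# Step 8: succeed if the backup of DATA_DIR fits on the filesystem of TARGET.
backup_fits() {
    need=$(backup_need_kib "$1") || return 1
    avail=$(avail_kib "$2") || return 1
    echo "need ${need} KiB, available ${avail} KiB"
    [ "$avail" -ge "$need" ]
}

# Step 10: print the signed archive name from dbsvc_tool output on stdin.
backup_archive_name() {
    grep -o 'Unit sign file is [^ ]*\.tar\.gz' | sed 's/^Unit sign file is //' | head -n 1
}

# Example:
#   backup_fits /opt/mysql/data/<Instance ID> /opt
```

Because du -sk also counts subdirectories of the instance directory, it usually reports more than the ls -lh total. The check therefore errs on the side of requiring more free space.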

Copying backup files to the slave database node

  1. Use PuTTY to log in to the node where the slave database instance resides using the IP address of that node.

    NOTE:

    You can log in to the ManageOne deployment plane and choose Deployment > Database > RDBMS to view the IP address of the node where the slave database instance resides.

    Default account: sopuser; default password: D4I$awOD7k

  2. Run the following command to switch to the dbuser user:

    su - dbuser

    The default password is Y7xohbheY!.

  3. Run the following command to copy the backup file and the signature file on the master database node to the /opt/pub/backup_local directory of the slave database node:

    scp -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null root@<IP address of the master database node>:/opt/pub/backup_local/tenantdbsvr-4-38@5-38_tenantdbsvr-4-38_20150709115845_manual_full_day_physical.tar.gz* /opt/pub/backup_local

    Warning: Permanently added 'IP address of the master database node' (ECDSA) to the list of known hosts.
    Authorized users only. All activities may be monitored and reported.
    root@IP address of the master database node's password:
    tenantdbsvr-4-38@5-38_tenantdbsvr-4-38_20150709115845_manual_full_day_physical.tar.gz                         100%  104MB  34.6MB/s   00:03
    tenantdbsvr-4-38@5-38_tenantdbsvr-4-38_20150709115845_manual_full_day_physical.tar.gz.sign                    100%  695     0.7KB/s   00:00
    NOTE:
    • The default password is Changeme_123.
    • If security hardening on the system makes the scp command unavailable, copy the files by another method available on site.

  4. Run the following commands to change the permission on the backup files to 600 and their owner and owner group to dbuser and dbgroup, respectively:

    chown dbuser:dbgroup /opt/pub/backup_local/tenantdbsvr-4-38@5-38_tenantdbsvr-4-38_20150709115845_manual_full_day_physical.tar.gz*

    chmod 600 /opt/pub/backup_local/tenantdbsvr-4-38@5-38_tenantdbsvr-4-38_20150709115845_manual_full_day_physical.tar.gz*
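The ownership and permission changes in step 4 can be applied and verified in one pass. This sketch assumes GNU stat (stat -c %a). The names fix_backup_perms and BACKUP_OWNER are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper for step 4: set owner and mode 600, then verify.
# BACKUP_OWNER defaults to the dbuser:dbgroup pair used in the procedure.
BACKUP_OWNER=${BACKUP_OWNER:-dbuser:dbgroup}

fix_backup_perms() {
    if [ "$#" -eq 0 ]; then
        echo "usage: fix_backup_perms FILE..." >&2
        return 2
    fi
    for f in "$@"; do
        chown "$BACKUP_OWNER" "$f" && chmod 600 "$f" || return 1
        mode=$(stat -c %a "$f") || return 1   # GNU stat syntax
        if [ "$mode" != 600 ]; then
            echo "unexpected mode $mode on $f" >&2
            return 1
        fi
    done
}

# Example:
#   fix_backup_perms /opt/pub/backup_local/<backup file name>.tar.gz*
```

If the glob matches nothing, the shell passes the pattern through literally, chown fails, and the function returns an error instead of silently doing nothing.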

Rebuilding the slave database instance

  1. Run the following command to go to the /opt/mysql/data/ directory:

    cd /opt/mysql/data/

  2. In the /opt/mysql/data/ directory, check whether a directory named after the target database instance exists. If it does not exist, perform 12 through 18 to create a database instance.
  3. Run the following command to switch to the ossadm user:

    sudo su - ossadm

    [sudo]password for root:

  4. Run the following command to go to the directory that contains the dbsvc_tool tool:

    cd /opt/oss/manager/apps/DBAgent/bin/

  5. Run the following commands to rebuild the slave database instance:

    . /opt/oss/manager/bin/engr_profile.sh

    ./dbsvc_tool -cmd repair-db-instance -method rebuild -instid tenantdbsvr-4-38@5-38 -newmaster tenantdbsvr-4-38 -masterfile /opt/pub/backup_local/tenantdbsvr-4-38@5-38_tenantdbsvr-4-38_20150709115845_manual_full_day_physical.tar.gz

    NOTE:

    Related parameters are as follows:

    • -method: indicates the repair method. rebuild rebuilds the instance from the backup file specified by -masterfile.
    • -instid: indicates the database instance name.
    • -newmaster: indicates the ID of the master instance.
    • -masterfile: indicates the data backup file of the master database.

    The commands are successfully executed if the following information is displayed:

    [2016-11-17 17:13:16] [35502] 1 Uncompress /opt/mysql/data/tenantdbsvr-4-38_tenantdbsvr-4-38_20150709115845_manual_full_day_physical_full.tar.gz to Temp dir ...
    [2016-11-17 17:13:16] [35502] 2 Stopping local db instance ossdbsvr-104-0 ...
    [2016-11-17 17:13:16] [35502] 3 Rebuilding local db ...
    [2016-11-17 17:13:16] [35502] 4 Restarting local db ...
    [2016-11-17 17:13:16] [35502] 5 Repairing GTID ...
    [2016-11-17 17:13:16] [35502] 6 Setting up replication ... 
    [2016-11-17 17:13:16] [35502] Rebuld local db Successful!

  1. Run the following command to switch to the dbuser user:

    sudo su - dbuser

    [sudo]password for root:
    NOTE:

    The default password is Changeme_123.

  2. Run the following command to delete the redundant backup files:

    rm /opt/pub/backup_local/tenantdbsvr-4-38@5-38_tenantdbsvr-4-38_20150709115845_manual_full_day_physical.tar.gz*
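The directory check in step 2 and the output check in step 5 can be scripted as follows. The sketch assumes the log format shown above: each line begins with a bracketed timestamp and process ID, stages are numbered 1 to 6, and the final line contains Successful!. The names instance_dir_exists, rebuild_succeeded, and MYSQL_DATA_ROOT are hypothetical.

```shell
#!/bin/sh
# Hypothetical checks for steps 2 and 5 of the rebuild procedure.

# Step 2: succeed if a directory named after INSTANCE exists under the
# data root (default /opt/mysql/data, override with MYSQL_DATA_ROOT).
instance_dir_exists() {
    [ -d "${MYSQL_DATA_ROOT:-/opt/mysql/data}/$1" ]
}

# Step 5: read dbsvc_tool output on stdin and succeed only if stages 1-6
# were all logged and a line containing "Successful!" appeared.
rebuild_succeeded() {
    awk '
        $3 ~ /^\[[0-9]+]$/ && $4 ~ /^[1-6]$/ { stage[$4] = 1 }
        /Successful!/ { ok = 1 }
        END {
            for (i = 1; i <= 6; i++)
                if (!(i in stage)) { print "missing stage " i; exit 1 }
            if (!ok) { print "no success line"; exit 1 }
        }'
}

# Example:
#   instance_dir_exists tenantdbsvr-4-38@5-38 || echo "create the instance first"
#   ./dbsvc_tool -cmd repair-db-instance ... | tee rebuild.log | rebuild_succeeded
```

Checking every stage, not only the final line, makes an interrupted run easier to spot, because the message names the first missing stage.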

Enabling Database Failover
Context

After the database fault is rectified, enable the database failover function again to keep the database highly available.

Procedure
  1. Use PuTTY to log in to the node where DBHASwitchService is deployed using the IP address of that node.

    NOTE:

    You can perform the following operations to query the node where DBHASwitchService is deployed:

    1. Use a browser to log in to the ManageOne deployment plane.

      URL: https://Floating IP address of the deployment plane:31943, for example, https://192.168.0.1:31943

      Default username: admin; default password: Huawei12#$

    2. Choose Deployment > Environments, enter DBHASwitchService in the search box on the right, and click the search icon.
    3. Click the displayed service and view the node where DBHASwitchService is deployed in Node List.

    Default account: sopuser; default password: D4I$awOD7k

  2. Run the following commands to switch to the ossadm user and go to the directory that contains the master/slave failover tool:

    su - ossadm

    The default password is ZJE%JLq5qx.

    cd /opt/oss/manager/apps/DBHASwitchService/bin

  3. Run the following command to enable the failover function:

    ./switchtool.sh -cmd del-ignore-nodes

    If the following information is displayed, the command is successfully executed:

    Successful.

  4. Run the following command to clear the last failover time:

    ./switchtool.sh -cmd del-failover-time -instid tenantdbsvr-4-38@5-38

    If the following information is displayed, the command is successfully executed:

    Successful.
    NOTE:

    tenantdbsvr-4-38@5-38 is the database instance name. Replace it as required.

  5. View the replication status of the master and slave database instances.

    1. Verify that the replication status of the master and slave database instances is normal. For details, see 1.
    2. Use PuTTY to log in to the active node of the ManageOne deployment plane as the sopuser user. For details about how to determine the active node, see Determine the Active and Standby Nodes of the Deployment System.

      The default password is D4I$awOD7k.

    3. Run the following command to switch to the ossadm user:

      su - ossadm

      The default password is ZJE%JLq5qx.

    4. Run the following command to view the replication status:

      bash /opt/oss/manager/apps/DataMgmtService/bin/dbsvc_adm -cmd query-db-instance

      Information similar to the following is displayed:

      DBInstanceId           Service Name      NodeId  IP             Port   DBType  Role    Rpl Status
      tenantdbsvr-4-38@5-38  tenantdbsvr-4-38  4       10.145.93.121  32080  mysql   Master  Normal
      tenantdbsvr-4-38@5-38  tenantdbsvr-5-38  5       10.145.93.122  32080  mysql   Slave   Normal
      NOTE:
      • The command output may vary depending on actual conditions.
      • The value of Rpl Status in the command output can be:
        • Normal: indicates that the replication status is normal.
        • Abnormal: indicates that the replication status is abnormal.
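Steps 3 to 5 can be combined into a short script. In the following sketch, run_switchtool, enable_failover, abnormal_replicas, and SWITCHTOOL are hypothetical names. The switchtool.sh and dbsvc_adm arguments are taken from the steps above, and the status parser assumes that Rpl Status is the last column, as in the sample output.

```shell
#!/bin/sh
# Hypothetical wrappers for steps 3-5; run as ossadm on the DBHASwitchService node.
SWITCHTOOL=${SWITCHTOOL:-/opt/oss/manager/apps/DBHASwitchService/bin/switchtool.sh}

# Run switchtool.sh from its own directory and require a "Successful" line.
run_switchtool() {
    out=$(cd "$(dirname "$SWITCHTOOL")" &&
          "./$(basename "$SWITCHTOOL")" "$@") || return 1
    printf '%s\n' "$out"
    printf '%s\n' "$out" | grep -q '^Successful'
}

# Steps 3 and 4: re-enable failover, then clear the last failover time.
enable_failover() {
    if [ -z "$1" ]; then
        echo "usage: enable_failover INSTANCE_ID" >&2
        return 2
    fi
    run_switchtool -cmd del-ignore-nodes || return 1
    run_switchtool -cmd del-failover-time -instid "$1"
}

# Step 5: read query-db-instance output on stdin, print every row whose
# last column (Rpl Status) is not Normal, and fail if any are found.
abnormal_replicas() {
    awk 'NR > 1 && NF >= 8 && $NF != "Normal" { print; bad = 1 }
         END { exit bad }'
}

# Example:
#   enable_failover tenantdbsvr-4-38@5-38
#   bash /opt/oss/manager/apps/DataMgmtService/bin/dbsvc_adm -cmd query-db-instance | abnormal_replicas
```

The failover time is cleared only if re-enabling failover reported success, which mirrors the order of steps 3 and 4.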

Rebuilding the Slave Database Instance in One-Click Mode
Context

One-click mode rebuilds the slave database instance with a few commands, which greatly simplifies the operations described in Handling Abnormal Slave Database Instances.

One-click rebuilding applies only to a slave database instance whose replication status is abnormal with error code 210, 211, 212, or 213.

Prerequisites

This rebuilding method applies only when a remote backup policy is used.

Procedure
  1. Disable the failover function. For details, see Disabling Database Failover.
  2. Use PuTTY to log in to the node where the master database instance resides using the IP address of that node.

    NOTE:

    You can log in to the ManageOne deployment plane and choose Deployment > Database > RDBMS to view the IP address of the node where the master database instance resides.

    Default account: ossadm; default password: ZJE%JLq5qx

  3. Run the following command to go to the installation directory of the dbsvc_adm tool:

    cd /opt/oss/manager/apps/DataMgmtService/bin

  4. Run the following command to rebuild the slave database instance:

    ./dbsvc_adm -cmd repair-db-instance -instid tenantdbsvr-4-38@5-38 -slave tenantdbsvr-5-38 -name remotepolicy

    The following information is displayed:

    Beginning repair db instance task.

  5. Log in to the ManageOne deployment plane and choose Deployment > Microservice Deployment > Tasks. On the Tasks page, check the status of the Manual Repair task.
  6. After the database fault is rectified, enable the failover function. For details, see Enabling Database Failover.

Master and Slave Database Instances Are Abnormal

Exceptions
Symptom

The physical files of the master and slave database instances are missing or damaged.

Troubleshooting
  1. Use PuTTY to log in to the node where the master database instance resides using the IP address of that node.

    NOTE:

    You can log in to the ManageOne deployment plane and choose Deployment > Database > RDBMS to view the IP address of the node where the master database instance resides.

    Default account: sopuser; default password: D4I$awOD7k

  2. Run the following command to switch to the dbuser user:

    su - dbuser

    The default password is Y7xohbheY!.

  3. Run the following command to connect to the node where the master database instance resides:

    /opt/mysql/bin/mysql -h<IP address of the node where the master database instance resides> -udbuser -p -P<Port number of the master database instance>
    Enter password:
    NOTE:
    • The default password is Admin@123.
    • Log in to the ManageOne deployment plane and choose Deployment > Database Management > Relationship Database to check the port number of the node where the master database instance resides.

    If the following message is displayed, the master database instance is down:

    ERROR 2003 (HY000): Can't connect to MySQL server on 'IP address of the master database node'(111)

  4. Run the following commands to restart the instance:

    sudo su - ossadm

    [sudo]password for root:
    NOTE:

    The default password is Changeme_123.

    cd /opt/oss/manager/apps/DeployAgent-version/container/mysql/bin

    ./start_mysql.sh /opt/mysql/data/<Instance ID>

    • If the following information is displayed, the database instance is normal:
    Start <Instance ID>...
    Start <Instance ID>...done
    • If other information is displayed, the database instance is damaged and needs to be recovered.
    NOTE:

    version: indicates the version of DeployAgent.

    You can query the version of DeployAgent as follows:

    1. Use a browser to log in to the ManageOne deployment plane.

      URL: https://Floating IP address of the deployment plane:31943, for example, https://192.168.0.1:31943

      Default username: admin; default password: Huawei12#$

    2. Choose Deployment > Microservice Deployment > Packages.

      Check the DeployAgent version in the Latest Version column.

  5. On the node where the slave database instance is located, perform 3 and 4 to check whether the instance is damaged.
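As a quicker first check before steps 3 and 5, you can test whether anything is listening on the instance port. This is a different technique from the mysql client login used in the procedure. A successful TCP connection only shows that mysqld is accepting connections, not that the instance is healthy. ERROR 2003 with code 111 in step 3 corresponds to a refused connection, which this probe also reports as a failure. The sketch assumes that bash (for /dev/tcp) and the timeout command are installed. The names port_open and check_instance are hypothetical.

```shell
#!/bin/sh
# Hypothetical reachability probe for steps 3 and 5. It opens a plain TCP
# connection instead of logging in with the mysql client, so it only shows
# whether something is listening, not whether the instance is healthy.
# Assumes bash (for /dev/tcp) and the coreutils timeout command.

port_open() {
    timeout 3 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2>/dev/null
}

check_instance() {
    if port_open "$1" "$2"; then
        echo "$1:$2 accepts TCP connections"
    else
        echo "$1:$2 is not reachable; the instance may be down" >&2
        return 1
    fi
}

# Example:
#   check_instance 10.145.93.121 32080
```

The 3-second timeout keeps the probe from hanging when a firewall drops packets instead of refusing the connection.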
Disabling Database Failover

For details, see Disabling Database Failover.

Handling Abnormal Database Nodes
Handling the Abnormal Master Database Node

Installing the database software

  1. Use PuTTY to log in to the node where the master database instance resides using the IP address of that node.

    NOTE:

    You can log in to the ManageOne deployment plane and choose Deployment > Database > RDBMS to view the IP address of the node where the master database instance resides.

    Default account: sopuser; default password: D4I$awOD7k

  2. Run the following command to switch to the ossadm user:

    su - ossadm

    The default password is ZJE%JLq5qx.

  3. Run the following command to delete the flag file:

    rm -rf /opt/oss/manager/var/agent/*dcprocess.flag

  4. Run the following command to switch to the dbuser user:

    sudo su - dbuser

    The default password is Changeme_123.

  5. Run the following command to view the residual database processes:

    ps -ef|grep /opt/mysql/data/ocdbsvr-2-0|grep -v grep

    • If information similar to the following is displayed, run the following command to kill the residual database processes:

    dbuser   16921     1  0 02:46 ?        00:00:00 /bin/sh /opt/mysql/bin/mysqld_safe --defaults-file=/opt/mysql/data/ocdbsvr-2-0/my.cnf --safe-user-create --skip-symbolic-links
    dbuser   22563 16921  1 02:46 ?        00:00:03 /opt/mysql/bin/mysqld --defaults-file=/opt/mysql/data/ocdbsvr-2-0/my.cnf --basedir=/opt/mysql --datadir=/opt/mysql/data/ocdbsvr-2-0 --plugin-dir=/opt/mysql/lib/plugin --safe-user-create --skip-symbolic-links --log-error=/opt/mysql/data/ocdbsvr-2-0/mysql.err --pid-file=/opt/mysql/data/ocdbsvr-2-0/ocdbsvr-2-0.pid --socket=/opt/mysql/data/ocdbsvr-2-0/ocdbsvr-2-0.sock --port=32080

    kill -9 22563 16921

    In this example, 22563 and 16921 are the IDs of the MySQL service process and the daemon process, respectively.
    • If no command output is displayed, go to 6.
    NOTE:
    • If the /opt/mysql directory is lost or damaged, perform this operation for all instances on the node.
    • If only the directory of an instance under /opt/mysql/data is lost or damaged, kill only the residual processes of that instance.

  6. Run the following command to switch to the root user:

    sudo su - root

    The default password is Changeme_123.

  7. Run the following commands to go to the mysql directory and view the directory structure:

    cd /opt/mysql

    ll

    dr-x------  2 dbuser dbgroup 4096 Jan  7 09:48 bin
    drwx------  9 dbuser dbgroup 4096 Dec 27 07:10 data
    dr-x------  3 dbuser dbgroup 4096 Dec 20 03:20 lib
    -r--------  1 dbuser dbgroup 2729 Dec 20 03:20 LICENSE.mysql
    dr-x------  4 dbuser dbgroup 4096 Oct 27 11:47 man
    -rw-------  1 dbuser dbgroup 2270 Dec 20 03:20 my_product.cnf
    -rw-------  1 dbuser dbgroup 2270 Jan  7 09:49 my_product_oc_sc.cnf
    drwxr-x---  2 dbuser dbgroup 4096 Oct 27 11:47 package
    -r--------  1 dbuser dbgroup 1449 Dec 20 03:20 README
    dr-x------  2 dbuser dbgroup 4096 Oct 27 11:47 scripts
    dr-x------ 28 dbuser dbgroup 4096 Oct 27 11:47 share
    dr-x------  2 dbuser dbgroup 4096 Oct 27 11:47 support-files
    drwx------  9 dbuser dbgroup 4096 Dec 27 07:11 trace
    • If the queried directory structure is inconsistent with the directory structure in the preceding command output, the /opt/mysql directory is lost or damaged. Run the following command to move the remaining files to a backup directory, and then go to 8.

      mv /opt/mysql /opt/mysql_bck

    • If only the directory of one instance under /opt/mysql/data is lost or damaged, run the following commands to move the remaining files of that instance to a backup directory. No further operation is required.

      mkdir -p /opt/mysql_bck/data

      chown -R dbuser:dbgroup /opt/mysql_bck/*

      mv /opt/mysql/data/ocdbsvr-2-0 /opt/mysql_bck/data/

  8. Run the following command to switch to the ossadm user:

    su - ossadm

    The default password is ZJE%JLq5qx.

  9. Run the following command to switch to the directory that contains the dbsvc_adm tool:

    cd /opt/oss/manager/apps/DataMgmtService/bin

  10. View the version of MySQL.

    1. Use a browser to log in to the ManageOne deployment plane.

      URL: https://Floating IP address of the deployment plane:31943, for example, https://192.168.0.1:31943

      Default username: admin; default password: Huawei12#$

    2. Choose Deployment > Microservice Deployment > Packages.

      View the version of MySQL. MySQL 5.6.28.0 is used as an example.

  11. Run the following command to install the database software:

    ./dbsvc_adm -cmd install-pkg -type mysql -nodes 104 -pkgname MySQL-5.6.28.0

    NOTE:

    The value following -nodes (104 in this example) is the ID of the node where the database is to be installed. Replace it with the actual node ID.

    If the following information is displayed, the command has been executed successfully:

    Successful.
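The checks in this procedure (finding leftover mysqld processes and comparing /opt/mysql with the reference listing in step 7) can be scripted before anything is moved. The following Python sketch is a hypothetical helper, not part of the product: mysqld_pids_for_instance filters ps -ef output, such as the process line shown above, by the --datadir= option, and missing_mysql_entries compares directory names with the reference listing. It assumes the standard ps -ef column layout and only reads information; stopping processes and moving directories are still done with the commands above.

```python
import os

# Reference entries of /opt/mysql, taken from the listing in step 7.
# my_product_oc_sc.cnf may be deployment-specific; adjust the set if needed.
EXPECTED_MYSQL_ENTRIES = {
    "bin", "data", "lib", "LICENSE.mysql", "man", "my_product.cnf",
    "my_product_oc_sc.cnf", "package", "README", "scripts", "share",
    "support-files", "trace",
}


def missing_mysql_entries(base="/opt/mysql", names=None):
    """Return the reference entries that are absent from base, sorted.

    Only names are compared; permissions and owners are not checked.
    """
    if names is None:
        names = os.listdir(base) if os.path.isdir(base) else []
    return sorted(EXPECTED_MYSQL_ENTRIES - set(names))


def mysqld_pids_for_instance(ps_lines, datadir):
    """Return PIDs of mysqld processes started with --datadir=<datadir>.

    Assumes the `ps -ef` layout: UID PID PPID C STIME TTY TIME CMD [ARGS...].
    """
    target = datadir.rstrip("/")
    wanted = {"--datadir=" + target, "--datadir=" + target + "/"}
    pids = []
    for line in ps_lines:
        fields = line.split()
        # CMD is the eighth column; this also skips the header and grep lines.
        if len(fields) < 8 or not fields[7].endswith("mysqld"):
            continue
        if wanted.intersection(fields[8:]):
            pids.append(int(fields[1]))
    return pids
```

For example, ps_lines can be obtained with subprocess.run(["ps", "-ef"], capture_output=True, text=True).stdout.splitlines().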

Recreating the data directory

  1. Run the following command to switch to the dbuser user:

    sudo su - dbuser

    The default password is Y7xohbheY!.

  2. Run the following command to create a data directory:

    mkdir /opt/mysql/data/ocdbsvr-2-0

  3. Run the following command to change the permission of the data directory to 700:

    chmod 700 /opt/mysql/data/ocdbsvr-2-0

    If the auto.cnf, my.cnf or my_paramgroup.cnf file does not exist in the /opt/mysql/data/ocdbsvr-2-0 directory, perform 16 to 18.

  4. Run the following command to change the permission of all files under the data directory to 600:

    chmod 600 /opt/mysql/data/ocdbsvr-2-0/*
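The directory and permission steps above can be sketched in Python as follows. This is a hypothetical helper that assumes it runs as dbuser (so that the new directory gets the correct owner); unlike the chmod 600 shell glob, it changes only regular files in the directory.

```python
import os


def recreate_data_dir(path):
    """Create an instance data directory with mode 700 and set its files to 600.

    Run as dbuser so that the directory is owned by the database user.
    Unlike `chmod 600 <dir>/*`, only regular files are changed here.
    """
    os.makedirs(path, exist_ok=True)
    # Set the mode explicitly: the mode argument of makedirs is masked by umask.
    os.chmod(path, 0o700)
    for name in os.listdir(path):
        entry = os.path.join(path, name)
        if os.path.isfile(entry):
            os.chmod(entry, 0o600)
```

For example, recreate_data_dir("/opt/mysql/data/ocdbsvr-2-0") corresponds to steps 2 to 4 for the example instance.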

Generating a configuration file

NOTE:

If the my.cnf file does not exist in the data directory, perform the operations in this section to generate it. The auto.cnf file does not need to be created manually because MySQL generates it automatically at startup. If the my.cnf file exists, skip this section.

  1. Run the following command to copy the my.cnf file under the data directory of a normal database instance to the data directory of the master database instance:

    cp /opt/mysql/data/Database instance name/my.cnf /opt/mysql/data/Name of the master database instance

  2. In the my.cnf file, replace the values of the configuration items shown in the following example with the information about the master database instance. The values below are examples only; use the actual values.

    vi /opt/mysql/data/ocdbsvr-2-0/my.cnf
    [client]
    port = 32080
    socket = /opt/mysql/data/ocdbsvr-2-0/ocdbsvr-2-0.sock
    [mysqld]
    port = 32080
    socket = /opt/mysql/data/ocdbsvr-2-0/ocdbsvr-2-0.sock
    bind-address = 10.145.93.121
    log-error = /opt/mysql/data/ocdbsvr-2-0/mysql.err
    pid-file = /opt/mysql/data/ocdbsvr-2-0/ocdbsvr-2-0.pid
    datadir = /opt/mysql/data/ocdbsvr-2-0
    [mysqld]
    server-id = 1
    log-bin = /opt/mysql/data/ocdbsvr-2-0/mysql_bin
    log-bin-index = /opt/mysql/data/ocdbsvr-2-0/mysql_bin.index
    relay-log = /opt/mysql/data/ocdbsvr-2-0/relay_bin
    relay-log-index = /opt/mysql/data/ocdbsvr-2-0/relay_bin.index
    !include /opt/mysql/my_product.cnf
    NOTE:

    Configurations that need to be modified in the my.cnf file are as follows:

    • bind-address: Set it to the IP address of the current master database node.
    • port: Set it to the port number of the current master database instance.
    • Instance name: Replace the instance name in all paths (ocdbsvr-2-0 in the example) with the actual instance name.
    • server-id: Set it to a positive integer for the master node. When a slave database instance is created, its server-id must be different from that of the master node.

  3. Run the following commands to change the file permission to 600 and change the file owner and owner group to dbuser:dbgroup:

    chmod 600 /opt/mysql/data/ocdbsvr-2-0/my.cnf

    chown dbuser:dbgroup /opt/mysql/data/ocdbsvr-2-0/my.cnf
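Steps 1 and 2 amount to copying a working my.cnf and substituting the instance-specific values. The Python sketch below is a hypothetical helper that performs the substitution on the file text: it replaces the source instance name everywhere (socket, pid-file, datadir, and log paths) and sets bind-address, port, and server-id. It assumes the key = value layout shown in the example and does not verify that server-id differs from that of the peer instance.

```python
import re


def adapt_my_cnf(text, old_instance, new_instance, bind_address, port, server_id):
    """Return my.cnf text copied from old_instance, adapted for new_instance."""
    # Paths such as socket, pid-file and datadir embed the instance name.
    text = text.replace(old_instance, new_instance)
    replacements = {
        "bind-address": bind_address,
        "port": str(port),
        "server-id": str(server_id),
    }
    for key, value in replacements.items():
        # Rewrite "key = value" lines in every section, keeping the left side.
        pattern = re.compile(rf"^([ \t]*{re.escape(key)}[ \t]*=[ \t]*).*$", re.MULTILINE)
        text = pattern.sub(lambda m: m.group(1) + value, text)
    return text
```

Read the copied file, pass its content through the function, write the result back, and then apply step 3 to set the permission and owner.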

Restoring the database

  1. Perform the physical restoration of the master database instance on the master database node by referring to "Restoring ManageOne Data" in FusionCloud 6.3.1.1 Backup and Restoration Guide.

Starting the database

  1. Run the following command to switch to the ossadm user:

    sudo su - ossadm

    The default password is ZJE%JLq5qx.

  2. Run the following command to import environment variables:

    . /opt/oss/manager/bin/engr_profile.sh

  3. Run the following command to start the master database instance:

    /opt/oss/manager/agent/container/mysql/bin/start_mysql.sh /opt/mysql/data/ocdbsvr-2-0

    If the following information is displayed, the master database instance is successfully started:

    ============================ Starting data container processes...
    Starting mysql process ocdbsvr-2-0 ... success
    ============================ Starting data container processes is complete.
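When the start script is called from automation, its output can be checked for the success line shown above. The Python sketch below is a hypothetical helper that assumes the line format in the example output; the output itself can be captured with subprocess.run(..., capture_output=True, text=True).

```python
import re


def instance_started(output, instance):
    """Return True if start_mysql.sh output reports success for instance."""
    pattern = re.compile(
        rf"^\s*Starting mysql process {re.escape(instance)} \.\.\. success\s*$",
        re.MULTILINE,
    )
    return bool(pattern.search(output))
```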

Handling Abnormal Slave Database Instances

Installing the database software

  1. Install the database software on the slave database node by performing 1 to 11 in section Handling Abnormal Database Nodes.

Recreating the data directory

  1. Recreate the data directory on the slave database node by performing 12 to 15 in section Handling Abnormal Database Nodes.

Generating a configuration file

  1. If the my.cnf file does not exist, generate the configuration file of the slave database instance by performing 16 to 18 in section Handling Abnormal Database Nodes.

Rebuilding the slave database instance

  1. Rebuild the slave database instance by following the instructions provided in 15 to 21 in Handling Abnormal Slave Database Instances.
Enabling Database Failover

For details, see Enabling Database Failover.

Failure to Log In to the GUIs Due to Database Node Restart

Symptom

After the database VM restarts, the database becomes read-only and users cannot log in to any GUI page.

Procedure
  1. Use PuTTY to log in to the node where the master database instance resides using the IP address of that node.

    NOTE:

    You can log in to the ManageOne deployment plane and choose Deployment > Database > RDBMS to view the IP address of the node where the master database instance resides.

    Default account: sopuser; default password: D4I$awOD7k

  2. Run the following command to switch to the dbuser user:

    su - dbuser

    The default password is Y7xohbheY!.

  3. Run the following command to connect to the node where the master database instance resides:

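    /opt/mysql/bin/mysql -udbuser -pPassword -P 32080 -h IP address of the master database node -D keystone

    NOTE:
    • Password: Admin@123. Enter it immediately after -p without a space; if there is a space, the client prompts for the password and treats the next word as a database name.
    • 32080 is the port number of the instance. Replace it with the actual port number.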

  4. Run the following commands to set read_only to OFF and stop replication on the instance:

    set global read_only=OFF;

    stop slave;

  5. Run the following command to check whether read_only is set to OFF:

    show variables like 'read_only';
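If the statements must be run non-interactively, for example from a script, the MySQL client's -e option can execute them in one call. The following Python sketch builds such an argument list; the function name is hypothetical, and the default port and database are the example values from step 3. Note how the password is attached directly to -p.

```python
def read_only_fix_argv(host, password, port=32080, database="keystone",
                       mysql="/opt/mysql/bin/mysql"):
    """Build argv that runs steps 4 and 5 with the MySQL command-line client."""
    statements = ("set global read_only=OFF; stop slave; "
                  "show variables like 'read_only';")
    return [
        mysql,
        "-udbuser",
        f"-p{password}",  # no space between -p and the password
        "-P", str(port),
        "-h", host,
        "-D", database,
        "-e", statements,
    ]
```

Pass the list to subprocess.run(argv, capture_output=True, text=True) and confirm that the read_only row shows OFF. The client warns that using a password on the command line can be insecure.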

Database Instance Replication Status Is Abnormal

Symptom

The node where the master database instance resides is normal, but the replication status of the node where the slave database instance resides is abnormal.

Possible Causes
  • The server network is disconnected.
  • The replication data of the slave database is incorrect.
  • The expected role is inconsistent with the actual role.
  • Data (GTID) conflict occurs after a failover.
  • Replication is interrupted because the binlog is deleted.
  • Data conflict occurs because the write operation is manually performed on the standby database.
  • Network delay occurs.
  • The write operation workload on the master database is heavy.
  • The transaction rate (transactions per second, TPS) is high, or large transactions exceed the replication processing capability of the slave database.
  • The hardware configuration of the slave database server is insufficient.
Troubleshooting
  1. Use a browser to log in to the ManageOne deployment plane.

    URL: https://Floating IP address of the deployment plane:31943, for example, https://192.168.0.1:31943.

    Default account: admin; default password: Huawei12#$

  2. Choose Deployment > Database > RDBMS from the main menu.
  3. View the Replication Status column. Move the pointer over a replication status to view the error code. For details about the error codes, see Table 8-1.
Procedure
NOTE:
  • Error codes smaller than 200 are shared by MySQL and Redis databases.
  • Error codes greater than or equal to 200 are MySQL replication exceptions. Errors with the code beginning with 21 can be rectified by rebuilding the MySQL slave database instance in one-click mode.
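The rules in the note above can be expressed as a small lookup. The Python sketch below is illustrative only: the category strings are invented labels, and the Catchup code from Table 8-1 is handled separately because it is not numeric.

```python
def classify_replication_error(code):
    """Map a Replication Status error code to the category given in the note."""
    if code == "Catchup":
        return "replication delay"
    value = int(code)
    if value < 200:
        return "shared MySQL/Redis"
    if str(value).startswith("21"):
        # Codes beginning with 21: rebuild the MySQL slave instance (one-click).
        return "MySQL replication, rebuild slave"
    return "MySQL replication"
```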
Table 8-1 Replication status error codes

Error code 101

  Description: The database instance or the node where it resides is in the DOWN state.

  Possible causes:
    1. The database node is not started.
    2. The database instance is not started, or the disk space of the database node is used up.
    3. The network communication between the master and slave database nodes is abnormal.

  Troubleshooting:
    1. Based on the status (UP/DOWN) of the faulty instance, check whether the master and slave database nodes where the instance resides are started.
    2. Check whether the database instance is started, and check the database startup logs.
    3. Check whether the network communication between the master and slave nodes is normal.

Error code 102

  Description: The roles of both the master and slave database instances have become master.

  Possible cause: The nodes where the master and slave instances reside have been manually set as ignore nodes.

  Troubleshooting: Locate the cause for ignoring the nodes by referring to "switchtool.sh -cmd get-ignore-nodes" in FusionCloud 6.3.1.1 Command Reference. Then run the switchtool.sh command to cancel the setting. For details, see "switchtool.sh -cmd del-ignore-nodes" in FusionCloud 6.3.1.1 Command Reference.

Error code 103

  Description: The roles of both the master and slave database instances have become slave.

  Possible cause: The nodes where the master and slave instances reside have been manually set as ignore nodes.

  Troubleshooting: Locate the cause for ignoring the nodes by referring to "switchtool.sh -cmd get-ignore-nodes" in FusionCloud 6.3.1.1 Command Reference. Then run the switchtool.sh command to cancel the setting. For details, see "switchtool.sh -cmd del-ignore-nodes" in FusionCloud 6.3.1.1 Command Reference.

Error code 104

  Description: The roles of the master and slave database instances are inconsistent with those recorded for the database instances in ZookeeperService.

  Possible cause: The nodes where the master and slave instances reside have been manually set as ignore nodes.

  Troubleshooting: Locate the cause for ignoring the nodes by referring to "switchtool.sh -cmd get-ignore-nodes" in FusionCloud 6.3.1.1 Command Reference. Then run the switchtool.sh command to cancel the setting. For details, see "switchtool.sh -cmd del-ignore-nodes" in FusionCloud 6.3.1.1 Command Reference.

Error code Catchup

  Description: Database replication is delayed.

  Possible causes:
    1. Heavy write operations in the database within a short period cause replication delay.
    2. All data is being synchronized to the Redis instance.

  Troubleshooting: Check whether the alarm is cleared after a few minutes. If the alarm persists, contact the DBA for troubleshooting.

  Replication delay is indicated by either of the following:
    • In the output of the show slave status command run on the MySQL slave instance, the value of Seconds_Behind_Master is greater than 0.
    • In the output of the info command run on the Redis slave instance, the value of aof_rewrite_in_progress, rdb_bgsave_in_progress, or loading is 1.

Error code 200

  Description: The network communication between the MySQL master and slave instances is abnormal.

  Possible cause: The I/O communication between the master and slave instances is abnormal, that is, the I/O thread of the MySQL slave instance is not running (the value of Slave_IO_Running is No).

  Troubleshooting:
    1. Check whether the master database instance is started, whether the disk space of the node where the master instance resides is used up, and whether the nodes where the master and slave database instances reside can communicate with each other.
      • If the network is faulty, contact the network administrator to restore the network connection.
      • If the master instance is not started, rectify the fault as described for error code 101.
      • If the disk space of the node where the master instance resides is used up, free up some space, and then perform the following operations:
        1. Stop and then restart the master instance by referring to Stopping a Database and Starting the Database.
        2. Check whether the alarm is cleared after a few minutes. If the alarm persists, manually rebuild the slave database instance.
    2. Run the show slave status command on the slave instance to query the MySQL error code and collect the error information.

Error code 201

  Description: The MySQL slave database instance synchronizes GTIDs from the master database instance with a delay.

  Possible causes:
    1. Heavy write operations in the database within a short period cause replication delay.
    2. 1000 or more GTIDs in the master database instance have not been synchronized to the slave database instance.

  Troubleshooting:
    1. Check whether the alarm is cleared after a few minutes. If the alarm persists, contact technical support for assistance.
    2. Check the GTID difference between the master and slave databases and contact technical support for assistance.

Error code 210

  Description: The SQL thread of the MySQL slave database instance is abnormal (the value of Slave_SQL_Running is No).

  Possible cause: Write operations were improperly performed on the slave instance as the dbuser user.

  Troubleshooting: Rebuild the MySQL slave database instance in one-click mode.

Error code 211

  Description: MySQL in master/slave mode: The slave database instance contains GTIDs that are not included in the GTIDs of the master database instance.

  Possible cause: Write operations were improperly performed on the slave instance as the dbuser user.

  Troubleshooting: Rebuild the MySQL slave database instance in one-click mode.

Error code 212

  Description: MySQL in master/master mode: The roles of both the master and slave database instances have become master, and the slave database instance contains GTIDs that are not included in the GTIDs of the master database instance.

  Possible cause: A MySQL master/slave failover occurred recently. Before the failover, some data had not been synchronized to the slave instance. As a result, the slave database instance contains GTIDs that are not included in the GTIDs of the master database instance.

  Troubleshooting: For details, see Disabling Database Failover, Handling Abnormal Slave Database Instances, and Enabling Database Failover.

Error code 213

  Description: MySQL in active/standby mode: After the master/slave failover is complete, the slave database instance contains GTIDs that are not included in the master database instance.

  Possible cause: The values of sync_binlog and innodb_flush_log_at_trx_commit in the MySQL configuration file my_product.cnf are not set to 1. Before the failover, data was synchronized to the slave database but was not written to the master database. As a result, a data conflict occurs.

  Troubleshooting:
    1. Rebuild the MySQL slave database instance in one-click mode.
    2. To prevent recurrence, it is recommended that both sync_binlog and innodb_flush_log_at_trx_commit be set to 1 in the [mysqld] section of the MySQL configuration file my_product.cnf:

      sync_binlog = 1

      innodb_flush_log_at_trx_commit = 1
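Several rows of Table 8-1 are identified from show slave status output. The Python sketch below parses the vertical (\G) output and maps the three fields mentioned in the table to a code: Slave_IO_Running other than Yes to 200, Slave_SQL_Running other than Yes to 210, and Seconds_Behind_Master greater than 0 to Catchup. It is a triage aid under those assumptions; the GTID-related codes (201 and 211 to 213) cannot be derived from these fields and are not detected.

```python
def parse_vertical(output):
    r"""Parse `show slave status\G` output into a dict of field name -> value."""
    fields = {}
    for line in output.splitlines():
        line = line.strip()
        if not line or line.startswith("*"):
            continue  # skip blank lines and the "*** 1. row ***" separator
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def replication_symptom(fields):
    """Map slave status fields to a Table 8-1 code, or None if nothing is flagged."""
    if not fields:
        return "no slave status"  # e.g. the command was run on a master instance
    if fields.get("Slave_IO_Running") != "Yes":
        return "200"
    if fields.get("Slave_SQL_Running") != "Yes":
        return "210"
    lag = fields.get("Seconds_Behind_Master", "NULL")
    if lag != "NULL" and int(lag) > 0:
        return "Catchup"
    return None
```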

Redis Database Instances Are Abnormal

Context

Common database exceptions of Redis include:

  • The master and slave Redis database instances are both abnormal. The physical files of the master database instance are lost or damaged.
  • The master Redis database instance is normal and the slave database instance is abnormal. The physical files of the slave database instance are lost or damaged.
  • If the sysmgrrdb instance of the Redis database is damaged, rectify the fault as a two-node cluster fault of the deployment system.
Precautions
  • The name of the database server must not be localhost.
  • Database nodes must have DBAgent deployed.
  • After the recovery, the Redis database must be redeployed.
Procedure
  1. Use PuTTY to log in to the node where the abnormal database instance resides using the IP address of that node.

    NOTE:

    You can log in to the ManageOne deployment plane and choose Deployment > Database > RDBMS to view the IP address of the node where the abnormal database instance resides.

    Default account: sopuser; default password: D4I$awOD7k

  2. Run the following command to switch to the root user:

    su - root

    The default password is Changeme_123.

  3. Run the following commands to check whether the Redis directory or the database instance directory exists:

    cd /opt (Redis directory) or cd /opt/redis/data (instance directory)

    ll

    Information similar to the following is displayed:

    -rw-------  1 root   root      7168 Oct 27 10:56 aquota.group
    -rw-------  1 root   root      7168 Oct 27 10:56 aquota.user
    drwxr-xr-x  4 root   root      4096 Oct 27 10:56 log
    drwx------  2 root   root     16384 Oct 27 10:55 lost+found
    drwxr-x--- 11 dbuser dbgroup   4096 Jan  2 08:44 mysql
    drwxr-x---  9 ossadm ossgroup  4096 Nov  2 11:15 oss
    drwxr-x---  6 root   ossgroup  4096 Oct 27 11:45 pub
    drwxr-x---  8 dbuser dbgroup   4096 Oct 27 11:50 redis
    drwxr-x---  3 ossadm ossgroup  4096 Oct 27 11:18 share
    dr-x------  3 root   root      4096 Oct 27 15:19 sudobin2
    1. If the command output does not contain the redis directory, the Redis directory does not exist.
    2. If the command output contains the redis directory, go to 4.

  4. Delete the damaged Redis instances.

    1. Use PuTTY to log in to the active node of the ManageOne deployment plane as the ossadm user. For details about how to determine the active node, see Determine the Active and Standby Nodes of the Deployment System.

      The default password is ZJE%JLq5qx.

    2. Run the following commands to delete the damaged Redis instances:

      cd /opt/oss/manager/apps/DataMgmtService/bin

      ./dbsvc_adm -cmd delete-db-instance -instid DBInstanceId

      NOTE:

      DBInstanceId indicates the instance ID of the abnormal Redis database instance.

    3. Run the ll command to check whether the database instance has been deleted.

  5. Run the following command to reinstall the Redis software:

    ./dbsvc_adm -cmd install-pkg -type redis -nodes Node ID -pkgname Redis-version

    NOTE:
    • Node ID indicates the database node ID, for example, 3 or 4. It is the server ID shown on the server management page.
    • Only one or two node IDs are allowed. If there is one node ID, a single-node instance is created. If there are two node IDs separated by a comma, the first one is the master instance and the second one is the slave instance.
    • version: indicates the Redis version. To view the version, choose Deployment > Database > Redis from the main menu on the deployment plane.

  6. Configure the database deployment template.

    If a database has been configured for a service, configure the service and database name according to the following template.

    The database deployment template is as follows:

    {
        "type": "redis",
        "storage": "1024",
        "memory": "1024",
        "dbList": {
            "dpacomputingr01db": {
                "appName": "DPAComputingService",
                "dataSize": 256
            },
            "dpastreamingr02db": {
                "appName": "DPAStreamingService",
                "dataSize": 256
            }
        }
    }

    In the preceding template, dpacomputingr01db and dpastreamingr02db are database names. Change them based on site requirements.

    appName indicates the service name.

  7. Save the template described in 6 as a file. The name dbdeploy_template_redis.json is used as an example here.

    You are advised to place the template in the /opt/pub/software directory and run the following command to change the owner and group of the file:

    chown ossadm:ossgroup dbdeploy_template_redis.json

  8. Run the following command to create database instances:

    ./dbsvc_adm -cmd create-db-instance -nodes NodeID -tenant TenantName -stage StageName -file /opt/pub/software/dbdeploy_template_redis.json

    NOTE:
    • Set TenantName to one of the following: SOP, Product, or manager.
    • StageName indicates the name of the created stage, which can be Alpha, Beta, Gamma, or Product.

  9. Run the following command to query the database instance:

    ./dbsvc_adm -cmd query-db-instance|grep DBInstanceId

    For example:

    ./dbsvc_adm -cmd query-db-instance|grep dbmgr_rdb-0-999@1-999

    Information similar to the following is displayed:

    dbmgr_rdb-0-999@1-999          primary  dbmgr_rdb-0-999          cn-global-1  manager  Product  10.90.73.120  32090  Up     redis   3.0.7.7  Slave   Normal      dbmgr_rdb-1-999          --         --            
    dbmgr_rdb-0-999@1-999          primary  dbmgr_rdb-1-999          cn-global-1  manager  Product  10.90.73.121  32090  Up     redis   3.0.7.7  Master  Normal      --                       --         --            
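Several of the steps above pass structured values to dbsvc_adm. The Python sketch below gathers hypothetical helpers (the names are illustrative, not part of the product) that build the argument lists for steps 5 and 8, produce the template of step 6, and check the query output of step 9. Two points are assumptions: that -nodes in create-db-instance takes comma-separated IDs as install-pkg does, and the column positions in the query output, which are inferred from the sample lines only. The tenant and stage checks also treat the listed values as the only valid ones.

```python
import json

TENANTS = ("SOP", "Product", "manager")
STAGES = ("Alpha", "Beta", "Gamma", "Product")


def redis_install_argv(node_ids, version, tool="./dbsvc_adm"):
    """argv for step 5: one node ID, or master and slave node IDs in that order."""
    if len(node_ids) not in (1, 2):
        raise ValueError("specify one or two node IDs")
    nodes = ",".join(str(n) for n in node_ids)
    return [tool, "-cmd", "install-pkg", "-type", "redis",
            "-nodes", nodes, "-pkgname", f"Redis-{version}"]


def redis_template(databases, storage="1024", memory="1024"):
    """Template for step 6; databases maps name -> (appName, dataSize)."""
    return {
        "type": "redis",
        "storage": storage,
        "memory": memory,
        "dbList": {name: {"appName": app, "dataSize": size}
                   for name, (app, size) in databases.items()},
    }


def create_instance_argv(node_ids, tenant, stage,
                         template="/opt/pub/software/dbdeploy_template_redis.json",
                         tool="./dbsvc_adm"):
    """argv for step 8; TenantName and StageName are checked against the listed values."""
    if tenant not in TENANTS:
        raise ValueError(f"TenantName must be one of {TENANTS}")
    if stage not in STAGES:
        raise ValueError(f"StageName must be one of {STAGES}")
    nodes = ",".join(str(n) for n in node_ids)  # assumed comma-separated, as in step 5
    return [tool, "-cmd", "create-db-instance", "-nodes", nodes,
            "-tenant", tenant, "-stage", stage, "-file", template]


def parse_instance_line(line):
    """Pick columns from one query-db-instance line (positions inferred from step 9)."""
    f = line.split()
    return {"instance": f[2], "ip": f[6], "port": int(f[7]),
            "status": f[8], "role": f[11], "replication": f[12]}


def pair_is_healthy(lines):
    """True if exactly one Master and one Slave are Up with Normal replication."""
    rows = [parse_instance_line(line) for line in lines if line.strip()]
    roles = sorted(r["role"] for r in rows)
    return roles == ["Master", "Slave"] and all(
        r["status"] == "Up" and r["replication"] == "Normal" for r in rows)


if __name__ == "__main__":
    # Print a template equivalent to the one in step 6.
    print(json.dumps(redis_template({
        "dpacomputingr01db": ("DPAComputingService", 256),
        "dpastreamingr02db": ("DPAStreamingService", 256),
    }), indent=4))
```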

Information Required for Troubleshooting Other Exceptions

If the method provided in this document cannot restore databases, collect the following information and contact technical support for assistance.

NOTE:

In this section, /var/log/oss is assumed as the root log directory.

  • Logs under the /var/log/oss/manager/DataMgmtService/ directory on the active Deploy node
  • Logs under the /var/log/oss/manager/DataMgmtService/ directory on the standby Deploy node
  • Logs under the following directories on the master database node:

    /var/log/oss/manager/DeployAgent/

    Database installation directory/data/Name of the master database/mysql.err

    Results obtained by running the following commands after the MySQL command-line tool accesses the database:

    show master status\G

    show slave status\G

  • Logs under the following directories on the slave database node:

    /var/log/oss/manager/DeployAgent/

    Database installation directory/data/Name of the slave database/mysql.err

    Results obtained by running the following commands after the MySQL command-line tool accesses the database:

    show master status\G

    show slave status\G

  • Information returned after the following commands are run:

    su - ossadm

    cd /opt/oss/manager/apps/DataMgmtService/bin

    ./dbsvc_adm -cmd query-db-instance

  • Symptom description
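Collecting the listed files into one archive makes them easier to send to technical support. The following Python sketch, a hypothetical helper, packs the given paths into a gzip-compressed tar file and reports paths that do not exist. The command outputs listed above (show master status, show slave status, and query-db-instance) need to be saved to files first, for example by redirecting the command output.

```python
import os
import tarfile


def collect_logs(paths, archive):
    """Add existing files or directories in paths to a gzip tar archive.

    Returns the paths that were missing so that they can be reported separately.
    """
    missing = []
    with tarfile.open(archive, "w:gz") as tar:
        for path in paths:
            if os.path.exists(path):
                tar.add(path)  # directories are added recursively
            else:
                missing.append(path)
    return missing
```

For example, collect_logs(["/var/log/oss/manager/DataMgmtService/", "/var/log/oss/manager/DeployAgent/"], "/tmp/db_fault_info.tar.gz") archives the log directories of a node; the archive path is only an illustration.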
Updated: 2019-06-10

Document ID: EDOC1100063248
