FusionCloud 6.3.1.1 Troubleshooting Guide 02

Abnormal Replication State of Slave Database Instance

Symptom

Fault Symptom

The master database instance is normal, but the replication status of the slave database instance is abnormal.

Precautions
  • Obtain the password of the dbuser user for the MySQL database from "Default OS User Information" in the Maintenance Guide.
  • Enter passwords interactively when prompted. Typing a password directly on the command line risks exposing it.
  • When the slave database instance is being restored, the Status of the slave database instance is on the RDBMS page of the FusionStage OM system.
Operation Description
  • Method 1 is preferred when the GUI is working properly and a remote backup policy is configured.
  • Method 2 is preferred when a remote backup policy is configured but the GUI is not working properly.
  • For details about how to configure a remote backup policy and bind it to a database instance for Method 1 or Method 2, see MySQL Backup Guide in Backup and Restoration Guide.
Fault Locating
  1. Run the following command on the master database node to access the master database instance as the dbuser user:

    /opt/mysql/bin/mysql -h{IP address of the master database node} -udbuser -p -P {Port number of the master database instance}

    Enter password:     //Enter the password when prompted. 

  2. Run the following command to check the status of the master database instance:

    mysql> show slave status\G

    If the following output is displayed, the instance has no replication source configured, which is expected for a normal master database instance:

    Empty set (0.00 sec)

  3. Run the following command to access the slave database instance:

    /opt/mysql/bin/mysql -h{IP address of the slave database node} -udbuser -p -P {Port number of the slave database instance}

    Enter password:     //Enter the password when prompted. 

  4. Run the following command to check the status of the slave database instance:

    mysql> show slave status\G

    Information similar to the following is displayed:

    *************************** 1. row ***************************
    Slave_IO_State: Waiting for master to send event
    Master_Host: 10.137.60.229
    Master_User: rplUser
    Master_Port: 32084
    Connect_Retry: 60
    Master_Log_File: mysql-bin.000004
    Read_Master_Log_Pos: 318564325
    Relay_Log_File: relay_bin.000011
    Relay_Log_Pos: 318564535
    Relay_Master_Log_File: mysql-bin.000004
    Slave_IO_Running: No
    Slave_SQL_Running: Yes
    Replicate_Do_DB:
    Replicate_Ignore_DB:
    Replicate_Do_Table:
    Replicate_Ignore_Table:
    Replicate_Wild_Do_Table:
    Replicate_Wild_Ignore_Table:
    Last_Errno: 0
    Last_Error:
    Skip_Counter: 0
    Exec_Master_Log_Pos: 318564325
    Relay_Log_Space: 318564820
    Until_Condition: None
    Until_Log_File:
    Until_Log_Pos: 0
    Master_SSL_Allowed: No
    Master_SSL_CA_File:
    Master_SSL_CA_Path:
    Master_SSL_Cert:
    Master_SSL_Cipher:
    Master_SSL_Key:
    Seconds_Behind_Master: 0
    Master_SSL_Verify_Server_Cert: No
    Last_IO_Errno: 0
    Last_IO_Error:
    Last_SQL_Errno: 0
    Last_SQL_Error:
    Replicate_Ignore_Server_Ids:
    Master_Server_Id: 461
    Master_UUID: 46c2f330-dd27-11e5-afa0-286ed488c6f5
    Master_Info_File: /opt/mysql/data/csbCommondbsrv-462-25/master.info
    SQL_Delay: 0
    SQL_Remaining_Delay: NULL
    Slave_SQL_Running_State: Slave has read all relay log; waiting for the slave I/O thread to update it
    Master_Retry_Count: 86400
    Master_Bind:
    Last_IO_Error_Timestamp:
    Last_SQL_Error_Timestamp:
    Master_SSL_Crl:
    Master_SSL_Crlpath:
    Retrieved_Gtid_Set: 46c2f330-dd27-11e5-afa0-286ed488c6f5:1-2708554
    Executed_Gtid_Set: 46c2f330-dd27-11e5-afa0-286ed488c6f5:1-2708554
    Auto_Position: 1
    1 row in set (0.00 sec)

    In the preceding command output, the replication status of the slave database instance is abnormal if Slave_IO_Running or Slave_SQL_Running has no value or has the value No. In this example, Slave_IO_Running is No, so replication is abnormal.
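The check in step 4 can be scripted when many instances must be inspected. The following is a minimal sketch, not a product tool: the function name check_slave_status is hypothetical, and it applies only the rule stated above (an empty value or No for either replication thread means replication is abnormal). It reads the show slave status\G output on standard input.

```shell
# Hypothetical helper: classify replication from "show slave status\G" output.
# Reads the output on stdin, prints "normal" or "abnormal (...)",
# and returns 0 for normal, 1 for abnormal.
check_slave_status() {
  awk '
    {
      # Each field line looks like "Slave_IO_Running: No"; split at the first ":".
      line = $0
      sub(/^[ \t]+/, "", line)
      key = line
      sub(/:.*/, "", key)
      val = line
      sub(/^[^:]*:[ \t]*/, "", val)
      sub(/[ \t]+$/, "", val)
      if (key == "Slave_IO_Running")  io  = val
      if (key == "Slave_SQL_Running") sql = val
    }
    END {
      # An empty value or "No" for either thread means replication is abnormal.
      if (io == "" || io == "No" || sql == "" || sql == "No") {
        printf "abnormal (Slave_IO_Running=%s, Slave_SQL_Running=%s)\n", io, sql
        exit 1
      }
      print "normal"
    }'
}

# Example (the password is still entered interactively at the prompt):
# /opt/mysql/bin/mysql -h<slave IP> -udbuser -p -P<slave port> -e 'show slave status\G' | check_slave_status
```

Run it against the slave instance only: on the master, the query returns Empty set, which this helper reports as abnormal because no thread values are present.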

(Method 1) Re-creating the Slave Database Instance on the GUI

Background

The OM zone provides a graphical user interface (GUI) for re-creating the slave database instance quickly, which is much simpler than the manual procedure in (Method 3) Rebuild the Slave Database Instance.

Procedure
  1. Disable the failover function. For details, see section Disabling Database Failover.
  2. Log in to the OM zone, and choose More > Database > RDBMS.
  3. Check whether a remote backup policy exists. If none exists, create one. For details, see "Configuring the Backup Policy" in OM Zone Operation Guide.
  4. Change the backup policy of the database instance to the remote backup policy:

    1. Select the target database.
    2. Click Modify Backup Policy.
    3. In the Modify Backup Policy window, select the corresponding remote backup policy.
    4. Click OK.

  5. Select the slave database instance with the abnormal replication state, and click Operation > Manual Repair to repair the slave database instance.
  6. After the database fault is rectified, enable the failover function. For details, see section Enabling Database Failover.

(Method 2) Re-creating the Slave Database Instance in One-Click

Background

The slave database instance can be rebuilt in one click by running a few commands, which is much simpler than the manual procedure in (Method 3) Rebuild the Slave Database Instance.

Procedure
  1. Disable the failover function. For details, see section Disabling Database Failover.
  2. Use PuTTY to log in to the om_core1_ip node.

    The default username is paas, and the default password is QAZ2wsx@123!.

  3. Run the following command to query the node where the datamgmtservice service is deployed:

    kubectl get pod -nom -oyaml `kubectl get pod -nom | grep datamgmtservice | awk '{print $1}'` | grep hostIP

  4. Log in to the node obtained in step 3 as the paas user.
  5. Run the following command to go to the installation directory of the dbsvc_adm tool:

    cd /opt/paas/oss/manager/apps/DBAgent/bin

  6. Run the following command to re-create the slave database instance:

    ./dbsvc_adm -cmd repair-db-instance -instid ossdbsvr-10_90_73_178-21@10_90_73_179-21 -slave ossdbsvr-10_90_73_179-21 -name remotepolicy -force true

    The following command output is displayed:

    Beginning repair db instance task.
    NOTE:

    Check whether a remote backup policy exists. If no remote backup policy exists, create a remote backup policy. For details about how to create a backup policy, see "Configuring the Backup Policy" in OM Zone Operation Guide.

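    remotepolicy is the name of the remote backup policy. Replace the instance ID (-instid) and the slave instance name (-slave) with those of your environment.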

  7. Log in to the slave database node used for instance restoration as the paas user, and run the following command to check the restoration results:

    tail -f /var/log/paas/oss/manager/DeployAgent/oss.dbrepair.trace |grep success

    The following information is displayed:

    2017-05-28 17:32:21.232(28646|140520325105408)[common:118]Slave replication status is ok, dbInstanceId:ossdbsvr-10_90_73_178-21@10_90_73_179-21, targetDC:ossdbsvr-10_90_73_179-21
    2017-05-28 17:32:21.233(28646|140520325105408)[common:118]Finished waiting for rebuilding the replication relationship of the slave database instance, dbInstanceId:wcptestsvr05171134-10_8_41_65-25@10_8_41_66-25, targetDC:ossdbsvr-10_90_73_179-21
    2017-05-28 17:32:21.233(28646|140520325105408)[common:118]repairSlave success, dbInstanceId:ossdbsvr-10_90_73_178-21@10_90_73_179-21, targetDC:ossdbsvr-10_90_73_179-21
    2017-05-28 17:32:21.254(28646|140520325105408)[proc:370]runCommand result=0

    If the message "repairSlave success" is displayed, the restoration succeeds. If the restoration fails, contact technical support.

  8. After the database fault is rectified, enable the failover function. For details, see section Enabling Database Failover.
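Instead of following the trace with tail -f, the result of step 7 can also be checked once for a specific instance. This is a minimal sketch that assumes the trace keeps the line format shown in the sample output ("repairSlave success, dbInstanceId:<id>, ..."); the function name repair_succeeded is hypothetical.

```shell
# Hypothetical helper: succeed only if the trace file contains a
# "repairSlave success" line for the given dbInstanceId.
# Usage: repair_succeeded <trace file> <dbInstanceId>
repair_succeeded() {
  trace=$1
  instid=$2
  # -F matches a fixed string, so characters in the ID are not treated as regex syntax.
  grep -q -F "repairSlave success, dbInstanceId:${instid}," "$trace"
}

# Example on the slave database node:
# trace=/var/log/paas/oss/manager/DeployAgent/oss.dbrepair.trace
# if repair_succeeded "$trace" ossdbsvr-10_90_73_178-21@10_90_73_179-21; then
#   echo "restoration succeeded"
# else
#   echo "no success record yet; check the trace or contact technical support"
# fi
```

Matching on the instance ID avoids mistaking a success record from another instance (such as the wcptestsvr line in the sample) for the one being repaired.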

(Method 3) Rebuild the Slave Database Instance

The following configurations are used as examples in the troubleshooting procedure:

  • The database instance ID (DBInstanceId) of the master and slave database instances is apmdbsvr-10_90_73_178-21@10_90_73_179-21.
  • The name of the master database instance is apmdbsvr-10_90_73_178-21, the server IP address is 10.90.73.178, the port number is 32080, and the node ID is 10_90_73_178.
  • The name of the slave database instance is apmdbsvr-10_90_73_179-21, the server IP address is 10.90.73.179, the port number is 32080, and the node ID is 10_90_73_179.
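The example values above follow a visible pattern: a node ID is the server IP address with dots replaced by underscores, and the instance ID is the master instance name, "@", and the slave instance name without its service prefix. The sketch below only reproduces that pattern as observed in these examples; it is an assumption, not a documented naming rule, so verify the results against dbsvc_adm output in your environment. Both function names are hypothetical.

```shell
# Hypothetical helper: 10.90.73.178 -> 10_90_73_178 (pattern taken from the examples).
ip_to_node_id() {
  printf '%s\n' "$1" | tr '.' '_'
}

# Hypothetical helper: apmdbsvr-10_90_73_178-21 + apmdbsvr-10_90_73_179-21
#   -> apmdbsvr-10_90_73_178-21@10_90_73_179-21
db_instance_id() {
  master_name=$1
  slave_name=$2
  # ${var#*-} removes the shortest prefix up to and including the first "-",
  # i.e. the service prefix such as "apmdbsvr-".
  printf '%s@%s\n' "$master_name" "${slave_name#*-}"
}

# Example: build the -nodes argument used when disabling failover.
# nodes="$(ip_to_node_id 10.90.73.178),$(ip_to_node_id 10.90.73.179)"
```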
Disabling Database Failover
Background

Disable the failover function before troubleshooting to prevent unexpected database failovers. While the function is disabled, no master/slave failover takes place.

Procedure
  1. Use PuTTY to log in to the om_core1_ip node.

    The default username is paas, and the default password is QAZ2wsx@123!.

  2. Run the following command to obtain the name of the pod corresponding to DBHASwitchService:

    kubectl get pod -n om | grep dbhaswitch | grep Running | awk '{ print $1 }'

    Information similar to the following is displayed:

    dbhaswitchservice-3302270813-1n452
    dbhaswitchservice-3302270813-cp154

  3. Run the following command to enter the pod corresponding to DBHASwitchService:

    kubectl exec -it dbhaswitchservice-3302270813-1n452 -n om -- sh

    NOTE:

    dbhaswitchservice-3302270813-1n452 is the pod name obtained in step 2. If multiple names are returned, use any one of them.

  4. Run the following command to go to the installation directory of the database failover tool:

    cd /opt/apps/DBHASwitchService/bin

  5. Run the following command to disable the failover function:

    ./switchtool.sh -cmd set-ignore-nodes -nodes 10_90_73_178,10_90_73_179

    If the following command output is displayed, the failover function is successfully disabled:

    Successful
    NOTE:

    -nodes: comma-separated IDs of the master and slave database nodes of the abnormal database instance.
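Steps 2 and 3 can be combined so that a Running pod is selected automatically. This is a minimal sketch that assumes the default kubectl get pod column layout (NAME READY STATUS RESTARTS AGE); the function name first_running_dbhaswitch_pod is hypothetical. It checks the STATUS column rather than grepping the whole line.

```shell
# Hypothetical helper: print the name of the first DBHASwitchService pod whose
# STATUS column is Running. Reads "kubectl get pod -n om" output on stdin.
first_running_dbhaswitch_pod() {
  awk '$1 ~ /^dbhaswitch/ && $3 == "Running" { print $1; exit }'
}

# Example on the om_core1_ip node:
# pod=$(kubectl get pod -n om | first_running_dbhaswitch_pod)
# kubectl exec -it "$pod" -n om -- sh
#
# Untested non-interactive alternative for step 5:
# kubectl exec "$pod" -n om -- sh -c \
#   'cd /opt/apps/DBHASwitchService/bin && ./switchtool.sh -cmd set-ignore-nodes -nodes 10_90_73_178,10_90_73_179'
```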

Resetting the Master Database Instance
Background

The start position in the binary log file must be reset for the master database instance.

Procedure
  1. Use PuTTY to log in to the manage_db1_ip node.

    The default username is paas, and the default password is QAZ2wsx@123!

  2. Run the following commands to switch to the root user and then to the dbuser user:

    su - root
    password:

    su - dbuser

  3. Run the following commands to access the master database instance and reset the master database instance:

    /opt/mysql/bin/mysql -u dbuser -p -P32080 -h 10.90.73.178

    Enter password: //Enter the password when prompted.

    mysql> reset master;

    If the following command output is displayed, the master database instance is successfully reset:

    Query OK, 0 rows affected (0.00 sec)
    NOTE:
    • A password needs to be entered in interactive mode when running commands. (Directly entering a password has the password leakage risk.)
    • -u: user of the database instance.
    • -P: port number of the master database instance.
    • -h: IP address of the master database node.

Backing Up the Master Database Instance
Background

Data of the master database instance must be backed up for subsequent re-creation of the slave database instance.

Prerequisites

The backup space must be at least as large as the database instance to be backed up. For details about how to check the backup space size, see "How Do I Check the Backup Space Size?" in Backup and Restoration Guide. To check the size of the database instance, perform the following steps:

  1. Use PuTTY to log in to the master database node as the paas user.

    The default username is paas, and the default password is QAZ2wsx@123!.

  2. Run the following commands to switch to the root user and then to the dbuser user:
    su - root
    password:

    su - dbuser

  3. The following uses the database instance apmdbsvr-10_90_73_178-21 as an example. Run the following commands to check the size of the instance:

    cd /opt/mysql/data/apmdbsvr-10_90_73_178-21

    du -sch *

    Information similar to the following is displayed:

    524K alsconfigdb
    4.0K apmdbsvr-10_120_175_218-25.pid
    0    apmdbsvr-10_120_175_218-25.sock
    9.2M audit.log
    ……
    743M mysql-bin.000001
    ……
    1.9G

    The preceding output shows that the size of the database instance is 1.9 GB.

    NOTE:

    MySQL binary logs (mysql-bin.*) are not backed up. Therefore, the space actually required is 1.9 GB minus the total size of the binary logs. Reserve backup space based on this value.
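The subtraction described in the note can be done in one step. This is a minimal sketch, assuming the binary logs are named mysql-bin.NNNNNN as in the du output above; the function name backup_size_estimate_kb is hypothetical, and the result is only an estimate in KiB based on disk usage reported by du.

```shell
# Hypothetical helper: estimate the backup size of an instance directory in KiB
# as (total size of the directory) - (size of the mysql-bin.NNNNNN files).
# Usage: backup_size_estimate_kb /opt/mysql/data/<instance name>
backup_size_estimate_kb() {
  dir=$1
  total_kb=$(du -sk "$dir" | cut -f1)
  binlog_kb=0
  for f in "$dir"/mysql-bin.[0-9]*; do
    [ -e "$f" ] || continue   # the glob matched nothing
    kb=$(du -k "$f" | cut -f1)
    binlog_kb=$((binlog_kb + kb))
  done
  echo $((total_kb - binlog_kb))
}

# Example (as dbuser on the master database node):
# backup_size_estimate_kb /opt/mysql/data/apmdbsvr-10_90_73_178-21
```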

Procedure
  1. Use PuTTY to log in to the manage_db1_ip node.

    The default username is paas, and the default password is QAZ2wsx@123!

  2. Run the following commands to perform the physical backup:

    cd /opt/paas/oss/manager/apps/DBAgent/bin/

    ./dbsvc_tool -cmd backup-db-instance -instid apmdbsvr-10_90_73_178-21@10_90_73_179-21 -method physical
    NOTE:
    • If the slave database instance is in an abnormal state or the replication status is abnormal, the master database instance must be backed up.
    • In this case, the backup files are stored in /opt/pub/backup_local of the master database node.
    • apmdbsvr-10_90_73_178-21@10_90_73_179-21: ID of the database instance. Replace it with the ID of the instance to be backed up.

    Information similar to the following is displayed:

    Check database status, and try login.
    Login database ossdbsvr-0-999 success.
    Check database status finish.
    CheckDB success, instid:apmdbsvr-10_90_73_178-21@10_90_73_179-21, method:physical
    Begin backup database.
    Backup success, instid:apmdbsvr-10_90_73_178-21@10_90_73_179-21, method:physical
    Compress backup file to apmdbsvr-10_90_73_178-21@10_90_73_179-21_abatest-0-999_20170303105515_manual_full_day_physical.tar.gz...
    Compress success, instid:apmdbsvr-10_90_73_178-21@10_90_73_179-21, method:physical
    Sign backup files...
    Signing file for instanceUnit sign file is apmdbsvr-10_90_73_178-21@10_90_73_179-21_ossdbsvr-0-999_20170303105515_manual_full_day_physical.tar.gz
    signing file error code is 0 .
    SignFiles success, instid:apmdbsvr-10_90_73_178-21@10_90_73_179-21, method:physical
    SaveBackupFiles success, instid:apmdbsvr-10_90_73_178-21@10_90_73_179-21, method:physical
    Count backup files number.
    Count backup files number.
    CountBackup success, instid:apmdbsvr-10_90_73_178-21@10_90_73_179-21, method:physical
    Check backup file permission.
    CheckPermission success, instid:apmdbsvr-10_90_73_178-21@10_90_73_179-21, method:physical
    CleanTempFiles success, instid:apmdbsvr-10_90_73_178-21@10_90_73_179-21, method:physical
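The file name of the new backup is needed when the backup is copied and restored on the slave node. This is a minimal sketch for finding the newest archive of an instance, assuming the naming pattern seen in this section (<DBInstanceId>_..._manual_full_day_physical.tar.gz); the function name latest_backup is hypothetical.

```shell
# Hypothetical helper: print the newest physical backup archive of an instance.
# Usage: latest_backup /opt/pub/backup_local <DBInstanceId>
latest_backup() {
  dir=$1
  instid=$2
  # ls -t sorts by modification time, newest first. The pattern ends in
  # .tar.gz, so companion files with an extra suffix (such as signature
  # files) are not matched. Prints nothing if no archive exists.
  ls -1t "$dir"/"$instid"_*_physical.tar.gz 2>/dev/null | head -n 1
}

# Example on the master database node:
# latest_backup /opt/pub/backup_local apmdbsvr-10_90_73_178-21@10_90_73_179-21
```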

Copying Backup Files to the Slave Database Node
Background

Before the re-creation of the slave database instance, the backup files on the master database node must be copied to the slave database node.

Procedure
  1. Use PuTTY to log in to the slave database node as the paas user.

    The default username is paas, and the default password is QAZ2wsx@123!.

  2. Run the following command to switch to the root user:

    su - root
    password:

  3. Run the following command to copy the backup files and their signature files from the master database node to /opt/pub/backup_local of the slave database node:

    scp dbuser@<IP address of the master database node>:/opt/pub/backup_local/apmdbsvr-10_90_73_178-21@10_90_73_179-21_apmdbsvr-10_90_73_178-21_20150709115845_manual_full_day_physical.tar.gz* /opt/pub/backup_local

    NOTE:

    If scp is forbidden because the node has been security-hardened, copy the files to the preceding directory by other means.

  4. Run the following commands to change the permission, owner, and owner group of the backup files to 600, dbuser, and dbgroup, respectively:

    chown dbuser:dbgroup /opt/pub/backup_local/apmdbsvr-10_90_73_178-21@10_90_73_179-21_apmdbsvr-10_90_73_178-21_20150709115845_manual_full_day_physical.tar.gz*

    chmod 600 /opt/pub/backup_local/apmdbsvr-10_90_73_178-21@10_90_73_179-21_apmdbsvr-10_90_73_178-21_20150709115845_manual_full_day_physical.tar.gz*
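Before rebuilding, the result of step 4 can be verified. This is a minimal sketch using GNU stat (available on typical Linux nodes; BSD stat uses different options); the function name check_backup_perms is hypothetical. It prints every matching file whose mode is not 600 or whose owner and group differ from the expected values, and prints nothing when all files are correct.

```shell
# Hypothetical helper: report backup files that are not mode 600 with the
# expected owner:group (default dbuser:dbgroup).
# Usage: check_backup_perms <directory> <file name prefix> [owner:group]
check_backup_perms() {
  dir=$1
  prefix=$2
  owner=${3:-dbuser:dbgroup}
  for f in "$dir"/"$prefix"*; do
    [ -e "$f" ] || continue   # the glob matched nothing
    actual=$(stat -c '%a %U:%G' "$f")
    [ "$actual" = "600 $owner" ] || printf '%s: %s\n' "$f" "$actual"
  done
}

# Example on the slave database node:
# check_backup_perms /opt/pub/backup_local 'apmdbsvr-10_90_73_178-21@10_90_73_179-21_apmdbsvr-10_90_73_178-21_20150709115845_manual_full_day_physical.tar.gz'
```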

Re-creating the Slave Database Instance
Background

To restore the replication status, the slave database instance must be rebuilt from the backup of the master database instance.

Procedure
  1. Use PuTTY to log in to the slave database node as the paas user.

    The default username is paas, and the default password is QAZ2wsx@123!.

  2. Run the following command to go to the installation directory of dbsvc_tool:

    cd /opt/paas/oss/manager/apps/DBAgent/bin/

  3. Run the following command to re-create the slave database instance:

    ./dbsvc_tool -cmd repair-db-instance -method rebuild -instid apmdbsvr-10_90_73_178-21@10_90_73_179-21 -newmaster apmdbsvr-10_90_73_178-21 -masterfile /opt/pub/backup_local/apmdbsvr-10_90_73_178-21@10_90_73_179-21_apmdbsvr-10_90_73_178-21_20150709115845_manual_full_day_physical.tar.gz

    Information similar to the following is displayed:

    [2016-11-17 17:13:16] [35502] 1 Uncompress /opt/mysql/data/apmdbsvr-10_90_73_178-21_apmdbsvr-10_90_73_178-21_20150709115845_manual_full_day_physical_full.tar.gz to Temp dir ... 
    [2016-11-17 17:13:16] [35502] 2 Stopping local db instance apmdbsvr-10_90_73_179-21 ... 
    [2016-11-17 17:13:16] [35502] 3 Rebuilding local db ... 
    [2016-11-17 17:13:16] [35502] 4 Restarting local db ... 
    [2016-11-17 17:13:16] [35502] 5 Repairing GTID ... 
    [2016-11-17 17:13:16] [35502] 6 Setting up replication ...  
    [2016-11-17 17:13:16] [35502] Rebuild local db Successful!

    The output may vary with versions. If the message "Rebuild local db Successful" is displayed, the slave database instance is successfully rebuilt.
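If the rebuild output is saved to a file, a failed run can be narrowed down to the last step it reached. This is a minimal sketch that assumes the step lines look like the sample above ("[date time] [pid] N message"); since the output may vary with versions, treat the parsing as an assumption. The function name rebuild_summary and any log file name are hypothetical.

```shell
# Hypothetical helper: summarize saved rebuild output.
# Returns 0 if "Rebuild local db Successful" is present; otherwise prints the
# last numbered step found and returns 1.
# Example capture: ./dbsvc_tool -cmd repair-db-instance ... 2>&1 | tee rebuild.log
rebuild_summary() {
  log=$1
  if grep -q "Rebuild local db Successful" "$log"; then
    echo "rebuild succeeded"
    return 0
  fi
  # Field 4 is the step number in lines such as "[2016-11-17 17:13:16] [35502] 2 Stopping ...".
  last=$(awk '$4 ~ /^[0-9]+$/ { s = $4; for (i = 5; i <= NF; i++) s = s " " $i; last = s } END { print last }' "$log")
  if [ -n "$last" ]; then
    echo "rebuild not confirmed; last step reached: $last"
  else
    echo "rebuild not confirmed; no numbered step found"
  fi
  return 1
}
```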

Deleting Backup Files on the Slave Database Node
Background

After the slave database instance is rebuilt, the backup files need to be deleted.

Procedure
  1. Use PuTTY to log in to the slave database node as the paas user.

    The default username is paas, and the default password is QAZ2wsx@123!.

  2. Run the following commands to switch to the root user and then to the dbuser user:

    su - root
    password:

    su - dbuser

  3. Run the following command to delete the backup files that are copied from the master database node:

    rm /opt/pub/backup_local/apmdbsvr-10_90_73_178-21@10_90_73_179-21_apmdbsvr-10_90_73_178-21_20150709115845_manual_full_day_physical.tar.gz*
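Because the rm command above uses a trailing wildcard, an empty or mistyped file name could delete more than intended. This is a minimal sketch of a more cautious variant that refuses an empty name and prints each file before removing it; the function name remove_copied_backup is hypothetical.

```shell
# Hypothetical helper: delete a copied backup archive and its companion files
# (same name plus any suffix), printing each file before removal.
# Usage: remove_copied_backup <directory> <backup file name>
remove_copied_backup() {
  dir=$1
  name=$2
  # Refuse an empty name: "$dir"/* would otherwise match every file.
  [ -n "$name" ] || { echo "backup file name is required" >&2; return 1; }
  for f in "$dir"/"$name"*; do
    [ -e "$f" ] || continue   # the glob matched nothing
    echo "removing $f"
    rm -f -- "$f"
  done
}

# Example on the slave database node:
# remove_copied_backup /opt/pub/backup_local 'apmdbsvr-10_90_73_178-21@10_90_73_179-21_apmdbsvr-10_90_73_178-21_20150709115845_manual_full_day_physical.tar.gz'
```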

Enabling Database Failover
Background

After the database fault is rectified, enable the failover function again to keep the database highly available.

Procedure
  1. Use PuTTY to log in to the om_core1_ip node.

    The default username is paas, and the default password is QAZ2wsx@123!.

  2. Run the following command to obtain the name of the pod corresponding to DBHASwitchService:

    kubectl get pod -n om | grep dbhaswitch | grep Running | awk '{ print $1 }'

    Information similar to the following is displayed:

    dbhaswitchservice-3302270813-1n452
    dbhaswitchservice-3302270813-cp154

  3. Run the following command to enter the pod corresponding to DBHASwitchService:

    kubectl exec -it dbhaswitchservice-3302270813-1n452 -n om -- sh

    NOTE:

    dbhaswitchservice-3302270813-1n452 is the pod name obtained in step 2. If multiple names are returned, use any one of them.

  4. Run the following command to go to the installation directory of the database failover tool:

    cd /opt/apps/DBHASwitchService/bin

  5. Run the following command to enable the failover function:

    ./switchtool.sh -cmd del-ignore-nodes

    If the following command output is displayed, the failover function is successfully enabled:

    Successful.

  6. Wait until the replication status of both master and slave database instances returns to normal. Log in to the database node as user paas. Run the following commands to view the replication status:

    . /opt/paas/oss/manager/bin/engr_profile.sh

    bash /opt/paas/oss/manager/apps/DBAgent/bin/dbsvc_adm -cmd query-db-instance

    Information similar to the following is returned:

    DBInstanceId  Service Name  NodeId  IP  Port  DBType  Role  Rpl Status 
    apmdbsvr-10_90_73_178-21@10_90_73_179-21 apmdbsvr-10_90_73_178-21 103 10.145.93.121 32080 mysql Master Normal 
    apmdbsvr-10_90_73_178-21@10_90_73_179-21 apmdbsvr-10_90_73_179-21 104 10.145.93.122 32080 mysql Slave Normal
    NOTE:
    • The preceding information is only a part of the actual command output.
    • In the Rpl Status column of the command output, if Normal is displayed, the replication status of the master and slave database instances is normal; if Abnormal is displayed, the replication status is abnormal.
    • If a database exception is due to a data conflict between master and slave database instances, a master/slave failover is not allowed within 30 minutes after the database exception.

Verification

  1. Use PuTTY to log in to the om_core1_ip node.

    The default username is paas, and the default password is QAZ2wsx@123!.

  2. Run the following commands to check the database instance information:

    cd /opt/paas/oss/manager/apps/DBAgent/bin

    ./dbsvc_adm -cmd query-db-instance | grep mysql

    Information similar to the following is displayed:

    DBInstanceId                             ClassId  Service Name               Region        Tenant Stage    IP          Port   State   DBType  Version  Role  Rpl Status   MasterID   
    apmdbsvr-10_90_73_163-3@10_90_73_164-3   primary  apmdbsvr-10_90_73_164-3    cn-global-1   om     Product 10.90.73.164 32082  Up      mysql   5.6.35   Master  Normal       apmdbsvr-10_90_73_163-3   
    apmdbsvr-10_90_73_178-21@10_90_73_179-21 primary  apmdbsvr-10_90_73_179-21   cn-global-1   om     Product 10.90.73.179 32080  Up      mysql   5.6.35   Slave Normal       apmdbsvr-10_90_73_178-21

    Only the value of Rpl Status matters here.

    If Rpl Status is Normal for both the master and slave database instances, the database is running properly.

    If it is Abnormal for either instance, contact technical support.
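The Rpl Status check can be automated for one instance. The two sample outputs in this section use different column layouts, so this minimal sketch looks for a field that is exactly Normal or Abnormal in each row of the given DBInstanceId instead of relying on a fixed column position; that approach is an assumption, and the function name rpl_status_ok is hypothetical.

```shell
# Hypothetical helper: read "dbsvc_adm -cmd query-db-instance" output on stdin.
# Returns 0 only if at least one row has the given DBInstanceId and every such
# row contains a "Normal" field and no "Abnormal" field.
rpl_status_ok() {
  awk -v id="$1" '
    $1 == id {
      rows++
      normal = 0
      for (i = 2; i <= NF; i++) {
        if ($i == "Abnormal") bad++
        if ($i == "Normal") normal = 1
      }
      ok += normal
    }
    END {
      if (rows > 0 && bad == 0 && ok == rows) exit 0
      exit 1
    }'
}

# Example on the om_core1_ip node:
# cd /opt/paas/oss/manager/apps/DBAgent/bin
# ./dbsvc_adm -cmd query-db-instance | rpl_status_ok apmdbsvr-10_90_73_178-21@10_90_73_179-21 && echo "replication normal"
```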

Updated: 2019-06-10

Document ID: EDOC1100063248