FusionCloud 6.3.1.1 Troubleshooting Guide 02

Checking the Database Instance Replication Status Is Abnormal

Symptom

The node where the master database instance is located is normal, but the replication status of the node where the slave database instance is located is abnormal.

Possible Causes

  • The server network is disconnected.
  • The replication data of the slave database is incorrect.
  • The expected role is inconsistent with the actual role.
  • A data (GTID) conflict occurs after a failover.
  • Replication is interrupted because the binlog has been deleted.
  • A data conflict occurs because of a manual write operation on the slave database.

Troubleshooting Methods

  1. Log in to the database node as user paas.
  2. Run the following commands to check the database replication status:

    cd /opt/paas/oss/manager/apps/DBAgent/bin

    sh dbsvc_adm -cmd query-db-instance | egrep "DBInstanceId|mysql.*Slave"

    For example:

    DBInstanceId                             ClassId  Service Name               Region        Tenant Stage    IP          Port   State   DBType  Version  Role  Rpl Status   MasterID   
    apmdbsvr-10_90_73_163-3@10_90_73_164-3   primary  apmdbsvr-10_90_73_164-3    cn-global-1   om     Product 10.90.73.164 32082  Up      mysql   5.6.35   Slave Normal       apmdbsvr-10_90_73_163-3   
    apmdbsvr-10_90_73_178-21@10_90_73_179-21 primary  apmdbsvr-10_90_73_179-21   cn-global-1   om     Product 10.90.73.179 32080  Up      mysql   5.6.35   Slave Abnormal (212)      apmdbsvr-10_90_73_178-21
    NOTE:
    • The command output varies depending on the version of DataMgmtService. Pay attention only to the value of Rpl Status.

      If the value of Rpl Status is Normal, the database replication status is normal.

      If the value of Rpl Status is Abnormal, the database replication status is abnormal.

    • In this example, the replication status of the database instance apmdbsvr-10_90_73_178-21@10_90_73_179-21 is abnormal.

    If the command output contains "Abnormal", locate the cause and troubleshoot the fault.
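
The check in step 2 can be scripted. The following is a minimal sketch, not part of the DBAgent tooling: the function name list_abnormal is hypothetical, and it assumes that abnormal rows show the status in the form Abnormal (NNN), as in the example output above. Rows in any other form are not reported.

```shell
#!/bin/sh
# Sketch: list database instances whose Rpl Status is abnormal.
# Reads the output of "dbsvc_adm -cmd query-db-instance" on stdin.
# Assumption: abnormal rows contain "Abnormal (NNN)", where NNN is the error code.
list_abnormal() {
    # Print "<DBInstanceId> <error code>" for each matching row.
    # Group 1 is the first whitespace-delimited field, group 2 is the code.
    sed -n 's/^[[:space:]]*\([^[:space:]][^[:space:]]*\)[[:space:]].*Abnormal[[:space:]]*(\([0-9][0-9]*\)).*/\1 \2/p'
}
```

After the function is defined in the current shell, it can be fed from the query command, for example sh dbsvc_adm -cmd query-db-instance | list_abnormal. For the example output above, this prints apmdbsvr-10_90_73_178-21@10_90_73_179-21 212.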

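  3. Look up the error code shown in Rpl Status in Table 23-3, which lists the replication status error codes and their troubleshooting methods. In the example above, the error code is 212.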

    Table 23-3 Replication status error description and solution

    Error Code | Symptom | Possible Cause | Troubleshooting Method

    Error code: 101
    Symptom: The database instance or the node where it resides is in the DOWN state.
    Possible causes:
    1. The database node is not started.
    2. The database instance is not started, or the disk space of the database node is used up.
    3. The network communication between the master and slave database nodes is abnormal.
    Troubleshooting method:
    1. Based on the status (UP/DOWN) of the faulty instance, check whether the nodes where the master and slave database instances reside are started.
    2. Check whether the database instance is started. Query the startup logs of the database.
    3. Check whether the network communication between the master and slave nodes is abnormal.

    Error code: 102
    Symptom: The roles of both the master and slave database instances become master.
    Possible cause: The nodes where the master and slave instances reside are set to ignore nodes.
    Troubleshooting method: Check whether the nodes where the master and slave database instances are located are set to ignore nodes. If they are, run the switchtool.sh command to cancel the setting. For details about the command, see Command Reference.

    Error code: 103
    Symptom: The roles of both the master and slave database instances become slave.
    Possible cause: The nodes where the master and slave instances reside are set to ignore nodes.
    Troubleshooting method: Check whether the nodes where the master and slave database instances are located are set to ignore nodes. If they are, run the switchtool.sh command to cancel the setting. For details about the command, see Command Reference.

    Error code: 104
    Symptom: The roles of the master and slave database instances are inconsistent with the roles recorded for these instances in ZookeeperService.
    Possible cause: The nodes where the master and slave instances reside are set to ignore nodes.
    Troubleshooting method: Check whether the nodes where the master and slave database instances are located are set to ignore nodes. If they are, run the switchtool.sh command to cancel the setting. For details about the command, see Command Reference.

    Error code: Delay(201)
    Symptom: Data replication is delayed.
    Possible causes:
    1. Heavy write operations on the database in a short period cause a replication delay.
    2. A full data synchronization to the Redis instance is in progress.
    Troubleshooting method: Check whether the alarm is cleared after a few minutes. If the alarm persists, contact the DBA for troubleshooting. A delay is indicated by the following:
    • On a MySQL instance, the output of the show slave status command contains a Seconds_Behind_Master value greater than 0.
    • On a Redis slave instance, the output of the info command contains aof_rewrite_in_progress, rdb_bgsave_in_progress, or loading with a value of 1.

    Error code: 200
    Symptom: The network communication between the MySQL master and slave instances is abnormal.
    Possible cause: The I/O communication between the master and slave instances is abnormal. The relevant I/O thread of the MySQL instance is abnormal (the value of Slave_IO_Running is No).
    Troubleshooting method:
    1. Check whether the master database instance is started, whether the disk space of the node where the master instance resides is used up, and whether the nodes where the master and slave database instances are located can communicate with each other.
      • If the fault is caused by the network, contact the network administrators to restore the network connection.
      • If the master instance is not started, rectify the fault by following the troubleshooting methods for error code 101.
      • If the disk space of the node where the master instance resides is used up, free up some space, and then perform the following actions:
        1. Stop the master instance, and then restart it. For details, see "Stopping the Database" and "Starting the Database" in Maintenance Guide.
        2. Check whether the alarm is cleared after a few minutes. If the alarm persists, rebuild the slave database instance. For details, see "Re-creating the Slave Database Instance".
    2. Run the show slave status command on the slave instance to query the MySQL database error code and collect the error information.

    Error code: 210
    Symptom: The SQL thread of the MySQL slave database instance is abnormal.
    Possible causes:
    1. The value of Slave_SQL_Running is No.
    2. Write operations are performed improperly on the slave instance as the dbuser superuser.
    Troubleshooting method: Use the one-click rebuild command of the dbsvc_adm tool to rebuild the slave database instance. For details about the command, see "Abnormal Replication State of Slave Database Instance".

    Error code: 211
    Symptom: The slave database instance contains GTIDs that are not included in the GTIDs of the master database instance.
    Possible cause: Write operations are performed improperly on the slave instance as the dbuser superuser.
    Troubleshooting method: Use the one-click rebuild command of the dbsvc_adm tool to rebuild the slave database instance. For details about the command, see "Abnormal Replication State of Slave Database Instance".

    Error code: 212
    Symptom: The roles of both the master and slave database instances become master, and the slave database instance contains GTIDs that are not included in the GTIDs of the master database instance.
    Possible cause: A MySQL master/slave failover occurred recently, and some data was not synchronized to the slave instance. As a result, the slave database instance contains GTIDs that are not included in the GTIDs of the master database instance.
    Troubleshooting method: Use the one-click rebuild command of the dbsvc_adm tool to rebuild the slave database instance. For details about the command, see "Abnormal Replication State of Slave Database Instance".

    Error code: 213
    Symptom: After the master/slave database instance failover is complete, the slave instance contains GTIDs that are not included in the master instance.
    Possible cause: sync_binlog or innodb_flush_log_at_trx_commit is not set to 1. Data had been synchronized to the slave instance before the failover started, but had not yet been written to disk on the master database instance. As a result, a data conflict occurs between the master and slave database instances.
    Troubleshooting method:
    1. Use the one-click rebuild command of the dbsvc_adm tool to rebuild the slave database instance. For details about the command, see Command Reference.
    2. It is recommended that both sync_binlog and innodb_flush_log_at_trx_commit be set to 1 in the MySQL configuration file (my_product.cnf).
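
The recommendation for error code 213 can be checked against the configuration file. The following is a minimal sketch under stated assumptions: the function name check_durability_settings is hypothetical, the location of my_product.cnf is not given in this section and must be passed as an argument, and only the underscore spelling of each option in that single file is examined (included files and hyphenated option names are ignored).

```shell
#!/bin/sh
# Sketch: report whether sync_binlog and innodb_flush_log_at_trx_commit
# are set to 1 in a given MySQL option file.
# Usage: check_durability_settings /path/to/my_product.cnf  (path is an assumption)
check_durability_settings() {
    cnf=$1
    for key in sync_binlog innodb_flush_log_at_trx_commit; do
        # Extract the value of "key = value"; if the key appears more than once,
        # the last occurrence in the file is assumed to be the effective one.
        val=$(sed -n "s/^[[:space:]]*${key}[[:space:]]*=[[:space:]]*\([^[:space:]#]*\).*/\1/p" "$cnf" | tail -n 1)
        if [ "$val" = "1" ]; then
            echo "$key ok"
        else
            echo "$key not 1 (found: ${val:-unset})"
        fi
    done
}
```

A line ending in "not 1" marks an option that differs from the recommended value. The function only reads the file. It does not change any setting.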
    NOTE:
    • Error codes smaller than 200 are common MySQL/Redis error codes.
    • Error codes greater than or equal to 200 are replication errors specific to MySQL databases.

      Errors whose codes begin with 21 (210 to 213 in this table) can be rectified by rebuilding the MySQL slave database instance in one-click mode.
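
The ranges described in the note above can be expressed as a small helper. This is a sketch only: the function name classify_rpl_code and the category labels are hypothetical and simply paraphrase the note. They are not output of the dbsvc_adm tool.

```shell
#!/bin/sh
# Sketch: classify a replication error code according to the note above.
#   below 200        -> common MySQL/Redis error
#   200 and above    -> replication error specific to MySQL
#   begins with 21   -> can be rectified by the one-click slave rebuild
classify_rpl_code() {
    case $1 in
        ''|*[!0-9]*)
            # Empty or non-numeric input is rejected.
            echo "invalid"
            return 1
            ;;
        21[0-9])
            echo "mysql-replication, one-click rebuild"
            ;;
        *)
            if [ "$1" -lt 200 ]; then
                echo "common mysql/redis"
            else
                echo "mysql-replication"
            fi
            ;;
    esac
}
```

For example, classify_rpl_code 212 prints "mysql-replication, one-click rebuild", which matches the troubleshooting method listed for error code 212 in Table 23-3.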

Updated: 2019-06-10

Document ID: EDOC1100063248
