HUAWEI CLOUD Stack 6.5.0 Troubleshooting Guide 02

Database Instance Replication Status Is Abnormal

Symptom

The node where the master database instance is located is normal, but the replication status of the node where the slave database instance is located is abnormal.

Possible Causes

  • The server network is disconnected.
  • The replication data of the slave database is incorrect.
  • The expected role of a database instance is inconsistent with its actual role.
  • A data conflict occurs because of a manual write operation on the slave database.

Troubleshooting Method

  1. Use PuTTY to log in to the manage_db1_ip node.

    The default username is paas, and the default password is QAZ2wsx@123!.

  2. Run the following commands to check the database replication status:

    cd /opt/paas/oss/manager/apps/DBAgent/bin

    sh dbsvc_adm -cmd query-db-instance | egrep "DBInstanceId|gauss.*Slave"

    For example:

    DBInstanceId                                  ClassId  InstNumber                         Tenant          IP             Port   State  DBType  Version            Role    Rpl Status  MasterID                      GuardMode  DataCheckSum  isSSL
    apmdbsvr-10_186_66_155-1@10_186_67_174-1      primary  apmdbsvr-10_186_66_155-1           fst-manage      10.186.66.155  32081  Up     gauss   V100R003C20SPC112  Master  Normal      --                            --         908781404     off
    apmdbsvr-10_186_66_155-1@10_186_67_174-1      primary  apmdbsvr-10_186_67_174-1           fst-manage      10.186.67.174  32081  Up     gauss   V100R003C20SPC112  Slave   Normal      apmdbsvr-10_186_66_155-1      --         908779606     off
    NOTE:

    The command output varies depending on the version of DataMgmtService. Pay attention only to the value of Rpl Status.

    • If the value of Rpl Status is Normal, the database replication status is normal.
    • If the value of Rpl Status is Abnormal, the database replication status is abnormal.

    If the command output contains "Abnormal", locate the cause and troubleshoot the fault.
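The check in step 2 can be scripted. The following is a minimal sketch and not part of the product: report_abnormal_replication is a hypothetical helper name, and it assumes the layout shown in the example output above (whitespace-separated fields, with InstNumber, Role, and Rpl Status columns present). Because the output varies by DataMgmtService version, the column positions are looked up from the header row instead of being hard-coded.

```shell
#!/bin/sh
# Hypothetical helper (not shipped with the product): reads the output of
# "dbsvc_adm -cmd query-db-instance" on stdin and prints one line per
# instance whose Rpl Status is anything other than "Normal".
# Returns 1 if at least one such instance is found, otherwise 0.
report_abnormal_replication() {
    awk '
        /DBInstanceId/ {
            # Header row: join the two-word column name "Rpl Status" so that
            # header fields line up one-to-one with data fields.
            sub(/Rpl Status/, "Rpl_Status")
            for (i = 1; i <= NF; i++) col[$i] = i
            next
        }
        ("Rpl_Status" in col) && NF >= col["Rpl_Status"] {
            status = $(col["Rpl_Status"])
            if (status != "Normal") {
                # Assumes the InstNumber and Role columns exist, as in the sample.
                printf "%s %s %s\n", $(col["InstNumber"]), $(col["Role"]), status
                found = 1
            }
        }
        END { exit (found ? 1 : 0) }
    '
}
```

For example, from the directory used in step 2: sh dbsvc_adm -cmd query-db-instance | report_abnormal_replication. The function prints nothing and returns 0 when every listed instance reports Normal.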

  3. For the replication status error codes and troubleshooting methods, see Table 20-3.

    Table 20-3 Replication status error codes

    Error Code: 101
    Description: The database instance or the node where it resides is in the DOWN state.
    Possible Cause:
      1. The corresponding data node is not started.
      2. The corresponding database instance is not started, or the disk space of the data node is insufficient.
      3. The network communication between the master and slave database nodes is abnormal.
      4. Files in the corresponding database instance are damaged.
    Troubleshooting Method:
      1. Based on the running status (UP/DOWN) of the instance ID, check whether the master and slave data nodes where the instances are located are started.
      2. Check whether the database instance is started and check the database startup log.
      3. Check whether the communication between the master and slave nodes is normal.
      4. Check whether the database startup logs contain illegible characters. For details, see Database Startup Failure Caused by Database File Damage.

    Error Code: 102
    Description: The roles for both the master and slave database instances become master.
    Possible Cause: The nodes where the master and slave instances reside are manually set to ignore nodes.
    Troubleshooting Method: Confirm the reason for ignoring the node and if you want to fix it, run the switchtool.sh command to cancel the setting. For details, see "Command Reference" in FusionStage 6.5.0.SPC100 Product Documentation.

    Error Code: 103
    Description: The roles for both the master and slave database instances become slave.
    Possible Cause: The nodes where the master and slave instances reside are manually set to ignore nodes.
    Troubleshooting Method: Confirm the reason for ignoring the node and if you want to fix it, run the switchtool.sh command to cancel the setting. For details, see "Command Reference" in FusionStage 6.5.0.SPC100 Product Documentation.

    Error Code: 104
    Description: The roles of the database instances are inconsistent with those in ZooKeeper.
    Possible Cause: The nodes where the master and slave instances reside are manually set to ignore nodes.
    Troubleshooting Method: Confirm the reason for ignoring the node and if you want to fix it, run the switchtool.sh command to cancel the setting. For details, see "Command Reference" in FusionStage 6.5.0.SPC100 Product Documentation.

    Error Code: 301
    Description: Data replication is delayed.
    Possible Cause:
      1. A large number of database write operations are performed in a short period of time. As a result, the replication processing is delayed.
      2. Redis is performing full data synchronization.
    Troubleshooting Method: If the fault persists or the replication delay frequently occurs, contact the DBA to locate the fault.

    Error Code: 302
    Description: The slave instance is being started.
    Possible Cause: The instance is being started.
    Troubleshooting Method: If the fault persists or the replication delay frequently occurs, contact the DBA to locate the fault.

    Error Code: 303
    Description: The slave database needs to be rebuilt.
    Possible Cause: The nodes where the master and slave instances reside are manually set to ignore nodes.
    Troubleshooting Method: If the fault persists or the replication delay frequently occurs, contact technical support.

    Error Code: 304
    Description: The slave node is waiting for the peer end to switch over to the slave state.
    Possible Cause: The nodes where the master and slave instances reside are manually set to ignore nodes.
    Troubleshooting Method: If the fault persists or the replication delay frequently occurs, contact technical support.

    Error Code: 305
    Description: The slave node is switching over to the slave state.
    Possible Cause: The nodes where the master and slave instances reside are manually set to ignore nodes.
    Troubleshooting Method: If the fault persists or the replication delay frequently occurs, contact technical support.

    Error Code: 306
    Description: The slave node is switching over to the master state.
    Possible Cause: The nodes where the master and slave instances reside are manually set to ignore nodes.
    Troubleshooting Method: If the fault persists or the replication delay frequently occurs, contact technical support.

    Error Code: 307
    Description: An unknown error occurs.
    Possible Cause: -
    Troubleshooting Method: If the fault persists or the replication delay frequently occurs, contact technical support.

    Error Code: 310
    Description: The slave database needs to be rebuilt. DETAIL_INFORMATION is equal to WalSegmentRemoved|SystemIDNotMatched|TimeLineNotMatched.
    Possible Cause: The nodes where the master and slave instances reside are manually set to ignore nodes.
    Troubleshooting Method:
      Confirm the reason for ignoring the node and if you want to fix it, run the switchtool.sh command to cancel the setting. For details, see "Command Reference" in FusionStage 6.5.0.SPC100 Product Documentation.
      If the fault persists or the replication delay frequently occurs, contact technical support.

    NOTE:
    • Error codes smaller than 300 are common Gauss/Redis error codes.
    • Error codes greater than or equal to 300 are replication errors specific to Gauss databases.

  4. If the fault persists, contact technical support for assistance.
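
When codes from Table 20-3 are looked up repeatedly, the table can be captured in a small shell function. This is an illustrative sketch only: describe_rpl_error and rpl_error_scope are hypothetical names, the descriptions are copied from Table 20-3, and the scope text follows the NOTE at the end of step 3. How the error code is obtained from the system is not covered here.

```shell
#!/bin/sh
# Hypothetical lookup helpers built from Table 20-3; not part of the product.

# Print the Table 20-3 description for a replication status error code.
# Returns 1 for a code that is not listed in the table.
describe_rpl_error() {
    case "$1" in
        101) echo "The database instance or the node where it resides is in the DOWN state." ;;
        102) echo "The roles for both the master and slave database instances become master." ;;
        103) echo "The roles for both the master and slave database instances become slave." ;;
        104) echo "The roles of the database instances are inconsistent with those in ZooKeeper." ;;
        301) echo "Data replication is delayed." ;;
        302) echo "The slave instance is being started." ;;
        303) echo "The slave database needs to be rebuilt." ;;
        304) echo "The slave node is waiting for the peer end to switch over to the slave state." ;;
        305) echo "The slave node is switching over to the slave state." ;;
        306) echo "The slave node is switching over to the master state." ;;
        307) echo "An unknown error occurs." ;;
        310) echo "The slave database needs to be rebuilt (see DETAIL_INFORMATION)." ;;
        *)   echo "Code not listed in Table 20-3: $1" >&2; return 1 ;;
    esac
}

# Classify a code according to the NOTE in step 3:
# codes below 300 are common Gauss/Redis codes, codes from 300 up are
# replication errors specific to Gauss databases.
rpl_error_scope() {
    case "$1" in
        ''|*[!0-9]*) echo "Not a numeric error code: $1" >&2; return 2 ;;
    esac
    if [ "$1" -lt 300 ]; then
        echo "common Gauss/Redis error code"
    else
        echo "Gauss-specific replication error"
    fi
}
```

For example, describe_rpl_error 104 prints the ZooKeeper role inconsistency description, and rpl_error_scope 104 reports that the code is a common Gauss/Redis error code.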
Updated: 2019-06-01

Document ID: EDOC1100062375
