FusionCloud 6.3.1.1 Troubleshooting Guide 02

RDS Service Faults

This chapter describes how to analyze causes for common service faults based on their symptoms and troubleshoot these faults.

MySQL

MySQL Parameter Failed to Take Effect After a Modification
Symptom

The default MySQL parameter is changed on the parameter configuration page, but the modification does not take effect.

Possible Causes
  • The user does not reboot the database after modifying the parameter. The database needs to be rebooted for certain parameter modifications to take effect.
  • The user does not close the current session. The session needs to be reopened for certain parameter modifications to take effect.
  • The modified value is improper, resulting in a MySQL failure.
  • Some parameters cannot be written to the configuration file and are automatically restored.
  • The DB instance is faulty.
Locating Method

Check whether the DB instance status is normal, understand the features of the MySQL parameter to be modified, and use a correct method to change the parameter to a correct value. Reboot the MySQL database or reopen a session as required.
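
A minimal sketch of how to verify whether a modification has actually been applied, assuming you can connect to the MySQL DB instance with an account that has the required privileges (max_connections is used only as an illustration):

    # Check the current value of the parameter.
    mysql -h <DB instance IP> -P <database port> -u root -p -e "show variables like 'max_connections';"

    # Dynamic parameters can be changed at runtime; the new value applies to new sessions.
    mysql -h <DB instance IP> -P <database port> -u root -p -e "set global max_connections = 1000;"

    # Verify the value again. If it has not changed, the parameter is read-only and
    # requires a database reboot, or the modification was reverted.
    mysql -h <DB instance IP> -P <database port> -u root -p -e "show variables like 'max_connections';"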

Procedure
  1. Use PuTTY to log in to the newRDS-Database01 node.

    Default account: dbs; default password: Changeme_123

  2. Run the following command to switch to the root user:

    su - root

    The default password of the root user is Cloud12#$.

  3. Run the following command to switch to gausscore user:

    su gausscore

  4. Run the following command to log in to the CORE database:

    gsql -p Port number -q Database name -U CORE database username -W CORE database user password

    For example, if the database name is core, the database username is core, the port number is 8635, and the database user password is RDSDB@Remote123, run the following command:

    gsql -p 8635 -q core -U core -W RDSDB@Remote123

  5. Connect to GaussDB and query the DB instance status in the dbs_instance table.

    CORE=> Select name,status from dbs_instance where name='DB instance name';

    • If the DB instance status is normal, the DB instance is normal.
    • If the DB instance status is abnormal, the DB instance is faulty. Restore the instance first.

  6. Check and reboot the database, or close the current connection and reopen a session.
  7. Set the parameter again following the instructions provided at the official MySQL website.
  8. If the fault persists, contact technical support for assistance.
Failed to Access the RDS Console Page
Symptom

After the address of the RDS console is entered in the address bar of the web browser, the RDS console page cannot be accessed.

Possible Causes
  • The Nginx server is faulty.
  • The RDSConsole service is abnormal.
Locating Method

Ensure that the Nginx server is working properly. Log in to the VM where RDSConsole is deployed. Ensure that the RDSConsole process is running properly.

Procedure
  1. Check whether the Nginx server IP address can be pinged.

    • If the IP address cannot be pinged, the Nginx server is faulty. Contact technical support for assistance.
    • If the IP address can be pinged, go to 2.

  2. Use PuTTY to log in to the newRDS-Console01 and newRDS-Console02 node.

    Default account: dbs; default password: Changeme_123

  3. Run the following command to check whether the Tomcat process exists:

    ps -ef|grep tomcat

    If information similar to the following is displayed, the Tomcat process exists:
    dbs        1831      1  0 Jan16 ?        00:02:54 /opt/dbs/jre/bin/java -Djava.util.logging.config.file=/opt/dbs/tomcat/newrds-console/conf/logging.properties -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -Djdk.tls.ephemeralDHKeySize=2048 -Djava.protocol.handler.pkgs=org.apache.catalina.webresources -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/opt/dbs/tomcat/newrds-console/logs/gc.log -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/opt/dbs/tomcat/newrds-console/oom.log -Dorg.quartz.scheduler.skipUpdateCheck=true -Dorg.apache.catalina.connector.RECYCLE_FACADES=true -Dfile.encoding=UTF-8 -server -Xms2046m -Xmx2046m -Xmn1023m -XX:SurvivorRatio=10 -XX:MaxTenuringThreshold=15 -XX:NewRatio=2 -XX:+DisableExplicitGC -Dorg.apache.catalina.STRICT_SERVLET_COMPLIANCE=true -classpath /opt/dbs/tomcat/newrds-console/bin/bootstrap.jar:/opt/dbs/tomcat/newrds-console/bin/tomcat-juli.jar -Dcatalina.base=/opt/dbs/tomcat/newrds-console -Dcatalina.home=/opt/dbs/tomcat/newrds-console -Djava.io.tmpdir=/opt/dbs/tomcat/newrds-console/temp org.apache.catalina.startup.Bootstrap start
    • If the process exists, go to 4.
    • If the process does not exist, go to 5.

  4. Run the following commands to restart the Tomcat process:

    cd /opt/dbs/tomcat/newrds-console/bin/

    ./shutdown.sh

    ./startup.sh

    NOTE:

    If Tomcat cannot be stopped by running the ./shutdown.sh command, run the following commands:

    pid=$(ps -ef | grep tomcat | grep -v grep | awk '{print $2}')

    kill -9 $pid

  5. Run the following commands to start the process:

    cd /opt/dbs/tomcat/newrds-console/bin/

    ./startup.sh

  6. After the Tomcat process is started, run the ps -ef | grep tomcat command to check whether the process is started successfully.
  7. Perform 3 to 6 on the newRDS-Console02 node.
Failed to Display User Instances in the Instance List
Symptom

Users can visit the instance list page, but user instances are not displayed.

Possible Causes

The Tomcat service of the newRDS-Service is abnormal.

Locating Method

Log in to the VM where the newRDS-Service resides and ensure that the Tomcat service process is running properly.

Procedure
  1. Use PuTTY to log in to the newRDS-Service01 node.

    Default account: dbs; default password: Changeme_123

  2. Run the following command to check whether the Tomcat process exists:

    ps -ef|grep tomcat

    If information similar to the following is displayed, the Tomcat process exists:
    [dbs@dbs_mysqlmgnt2 mysql]$ ps -ef|grep tomcat
    dbs      120700      1  2 Sep21 ?        01:13:49 /opt/dbs/jre/bin/java -Djava.util.logging.config.file=/opt/dbs/tomcat/mysql-instancemanager/conf/logging.properties -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -Djdk.tls.ephemeralDHKeySize=2048 -Djava.protocol.handler.pkgs=org.apache.catalina.webresources -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/opt/dbs/tomcat/mysql-instancemanager/logs/gc.log -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/opt/dbs/tomcat/mysql-instancemanager/oom.log -Dorg.quartz.scheduler.skipUpdateCheck=true -Dfile.encoding=UTF-8 -server -Xms4094m -Xmx4094m -Xmn2047m -XX:SurvivorRatio=10 -XX:MaxTenuringThreshold=15 -XX:NewRatio=2 -XX:+DisableExplicitGC -classpath /opt/dbs/tomcat/mysql-instancemanager/bin/bootstrap.jar:/opt/dbs/tomcat/mysql-instancemanager/bin/tomcat-juli.jar -Dcatalina.base=/opt/dbs/tomcat/mysql-instancemanager -Dcatalina.home=/opt/dbs/tomcat/mysql-instancemanager -Djava.io.tmpdir=/opt/dbs/tomcat/mysql-instancemanager/temp org.apache.catalina.startup.Bootstrap start
    • If the process exists, go to 3.
    • If the process does not exist, go to 4.

  3. Run the following commands to restart the Tomcat process:

    cd /opt/dbs/tomcat/mysql-instancemanager/bin/

    ./shutdown.sh

    ./startup.sh

    NOTE:

    If Tomcat cannot be stopped by running the ./shutdown.sh command, run the following commands:

    pid=$(ps -ef | grep tomcat | grep -v grep | awk '{print $2}')

    kill -9 $pid

  4. Run the following commands to start the process:

    cd /opt/dbs/tomcat/mysql-instancemanager/bin/

    ./startup.sh

  5. Run the ps -ef |grep tomcat command to check whether the Tomcat process exists.
  6. Perform 2 to 5 on the newRDS-Service02 node to check the service status.
Invalid Deletion Operation
Symptom

A message indicating successful delivery of a deletion operation is displayed, but the DB instance is not deleted after 1000 seconds.

Possible Causes
  • The newRDSService service is abnormal.
  • The deletion check fails.
  • Failed to invoke the backup and restoration API or the resource management API.
Locating Method
  • Log in to the VM where the newRDSService is located and ensure that the newRDSService service process is running properly.
  • Log in to the database and identify the error cause and task in which the error occurs.
  • Log in to the VM where the newRDSService is located and view logs to identify the detailed error cause.
Procedure
  1. Use PuTTY to log in to the newRDS-Service01 node.

    Default account: dbs; default password: Changeme_123

  2. Run the following command to check whether the Tomcat process exists:

    ps -ef|grep tomcat

    If information similar to the following is displayed, the Tomcat process exists:
    [dbs@dbs_mysqlmgnt2 mysql]$ ps -ef|grep tomcat
    dbs      120700      1  2 Sep21 ?        01:13:49 /opt/dbs/jre/bin/java -Djava.util.logging.config.file=/opt/dbs/tomcat/mysql-instancemanager/conf/logging.properties -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -Djdk.tls.ephemeralDHKeySize=2048 -Djava.protocol.handler.pkgs=org.apache.catalina.webresources -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/opt/dbs/tomcat/mysql-instancemanager/logs/gc.log -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/opt/dbs/tomcat/mysql-instancemanager/oom.log -Dorg.quartz.scheduler.skipUpdateCheck=true -Dfile.encoding=UTF-8 -server -Xms4094m -Xmx4094m -Xmn2047m -XX:SurvivorRatio=10 -XX:MaxTenuringThreshold=15 -XX:NewRatio=2 -XX:+DisableExplicitGC -classpath /opt/dbs/tomcat/mysql-instancemanager/bin/bootstrap.jar:/opt/dbs/tomcat/mysql-instancemanager/bin/tomcat-juli.jar -Dcatalina.base=/opt/dbs/tomcat/mysql-instancemanager -Dcatalina.home=/opt/dbs/tomcat/mysql-instancemanager -Djava.io.tmpdir=/opt/dbs/tomcat/mysql-instancemanager/temp org.apache.catalina.startup.Bootstrap start
    • If the process exists, go to 3.
    • If the process does not exist, go to 4.

  3. Run the following commands to restart the Tomcat process:

    cd /opt/dbs/tomcat/mysql-instancemanager/bin/

    ./shutdown.sh

    ./startup.sh

    NOTE:

    If Tomcat cannot be stopped by running the ./shutdown.sh command, run the following commands:

    pid=$(ps -ef | grep tomcat | grep -v grep | awk '{print $2}')

    kill -9 $pid

  4. Run the following commands to start the process:

    cd /opt/dbs/tomcat/mysql-instancemanager/bin/

    ./startup.sh

  5. Run the ps -ef |grep tomcat command to check whether the Tomcat process exists.
  6. Perform 2 to 5 on the newRDS-Service02 node to check the service status.
  7. Use PuTTY to log in to the newRDS-Database01 node.

    Default account: dbs; default password: Changeme_123

  8. Run the following command to switch to the root user:

    su - root

    The default password of the root user is Cloud12#$.

  9. Run the following command to switch to gausscore user:

    su gausscore

  10. Run the following command to log in to the CORE database:

    gsql -p Port number -q Database name -U CORE database username -W CORE database user password

    For example, if the database name is core, the database username is core, the port number is 8635, and the database user password is RDSDB@Remote123, run the following command:

    gsql -p 8635 -q core -U core -W RDSDB@Remote123

  11. View the task execution status.

    • Query the dbs_action table and view the task execution status. Replace ##### with the DB instance ID.
      CORE=> select * from dbs_action where instanceId = '###########';
      • If the status value is OK_TO_RUN, the deletion is in process.
      • If the status value is DELETED, the deletion is ended and may fail.
    • Query the dbs_metering_usagerecord table and check whether the accounting stop information is recorded in the table.
      CORE=> select * from dbs_metering_usagerecord where instance_id = '###########' and module ~ 'DelInst' order by created_at desc;
      • If the accounting information is abnormal, the deletion check fails and the deletion task is not delivered. Go to 12.
      • If the accounting information is correct, you can log in to the database as a resource tenant and manually delete the user VM, backup, and network information.
    • Query dbs_alarm and dbs_alarm_log tables to check whether there is any alarm information.
      • alarm_name indicates the type of the failed task. For example, "failed to delete a task" and "failed to change specifications".
      • cause indicates the name of the failed task and the detailed cause.
      • location_info indicates the detailed error information.
      CORE=> select * from dbs_alarm where location_info ~ '###########';
      CORE=> select * from dbs_alarm_log where location_info ~ '###########';

    The common errors are as follows: failed to delete backups and policies, failed to invoke the resource management module, and the resource management module failed to delete a DB instance. Go to 12 to further identify the error.

  12. Log in to the newRDSService VM and query the DB instance management logs.

    • Query the info log, locate the DB instance ID, and obtain the workflow ID according to the deletion delivery time. Locate the workflow ID in the log to trace the task execution process.
    • Query the error log and obtain the DB instance ID and workflow ID to identify the error cause. For detailed error information, you need to log in to the server where the backup and restoration service or the resource management service is located to query related logs.
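
    A minimal sketch of this log search, assuming the newRDS-Service (mysql-instancemanager) Tomcat logs are under /opt/dbs/tomcat/mysql-instancemanager/logs; the exact log file names are not specified in this guide, so the wildcard pattern below is an assumption:

      cd /opt/dbs/tomcat/mysql-instancemanager/logs

      # Find entries for the DB instance around the deletion delivery time and note the workflow ID.
      grep "<DB instance ID>" *.log* | less

      # Trace the whole deletion task by its workflow ID, including error entries.
      grep "<workflow ID>" *.log* | less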

Failed to Perform a Failover for Primary/Standby DB Instances
Symptom

The primary DB instance is faulty, but the standby DB instance fails to be automatically promoted to the primary.

Possible Causes

The standby DB instance is faulty, or I/O operations on the primary node are frequent while I/O on the standby node is slow.

Locating Method

On the standby DB instance, check whether the Seconds_Behind_Master value in the show slave status\G output is small (for example, within 120 seconds) and whether Last_IO_Error or Last_SQL_Error contains error information.
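
For example, the fields of interest can be checked from the shell as follows (a sketch; it assumes you can log in to the standby DB instance with the database administrator account):

    # Print only the replication fields of interest on the standby DB instance.
    mysql -h <standby DB instance IP> -P <database port> -u root -p -e "show slave status\G" | grep -E "Seconds_Behind_Master|Last_IO_Error|Last_SQL_Error"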

Procedure
  1. Check the status of the standby instance. If the standby instance is faulty, the switchover cannot be performed. For example, the standby DB instance breaks down and its database process has stopped.
  2. Log in to the standby DB node and check the system environment. For example, check whether the disk space is sufficient, and whether the memory usage is normal.
  3. Use PuTTY to log in to the newRDS-Database01 node.

    Default account: dbs; default password: Changeme_123

  4. Run the following command to switch to the root user:

    su - root

    The default password of the root user is Cloud12#$.

  5. Run the following command to switch to gausscore user:

    su gausscore

  6. Run the following command to log in to the CORE database:

    gsql -p Port number -q Database name -U CORE database username -W CORE database user password

    For example, if the database name is core, the database username is core, the port number is 8635, and the database user password is RDSDB@Remote123, run the following command:

    gsql -p 8635 -q core -U core -W RDSDB@Remote123

  7. Obtain the value of the ha_monitor_id field in the dbs_instance_ha table based on the DB instance ID. Query the IP address of the service node where the HAMonitor is located in the dbs_ha_monitor table based on the ha_monitor_id value.

    select c.ip from core.dbs_ha_monitor c where c.id = (select a.ha_monitor_id from core.dbs_instance_ha a where a.instance_id = 'DB instance ID')

    Check whether the HAMonitor service is running properly and whether the service can ping the DB instance.

  8. Run the following command to query the agent log to check whether the agent process is repeatedly killed by other processes:

    less /home/Ruby/log/agent.log

  9. Check whether the directory of the agent log has been deleted or modified. You are advised to stop the HA process before deleting the agent log directory, or to restart the HA process after the directory has been deleted.
  10. On the standby DB instance, check whether the Seconds_Behind_Master value in the show slave status\G output is small (for example, within 120 seconds) and whether Last_IO_Error or Last_SQL_Error contains error information. Rectify the fault based on the error information.
  11. If the fault persists, contact technical support for assistance.
Remarks

The primary/standby DB instance failover is automatic and is usually finished within one minute. However, the duration varies depending on the volume of data that is not yet synchronized between the primary and standby DB instances.

When a large volume of data on the primary DB instance is not synchronized and the primary DB instance becomes faulty, the failover is not triggered instantly. The system waits until the data is synchronized to the standby DB instance, so it takes a long time to recover the service. You can connect to the standby DB instance and run show slave status to check whether the service recovery is slow because data has not yet been synchronized.

Failed to Restore the Primary/Standby Relationship
Symptom

The HAMonitor does not restore the instances properly, but the instances are still available. The primary/standby relationship is incorrect and data between primary and standby DB instances cannot be synchronized.

Possible Causes

The primary/standby HAagent process is faulty. The standby DB instance fails to be promoted to the primary DB instance.

Locating Method

Check the HAagent process status based on its logs.
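
For example, on each DB node you can confirm that the HA agent process is alive and inspect its log (a sketch; the process name pattern is an assumption, and the log path is the one used in the procedure below):

    # Check whether the HA agent process is running (the "agent" pattern is an assumption).
    ps -ef | grep -i agent | grep -v grep

    # Check the most recent agent log entries for repeated restarts or errors.
    tail -n 200 /home/Ruby/log/agent.log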

Solution
  1. Use PuTTY to log in to the newRDS-Database01 node.

    Default account: dbs; default password: Changeme_123

  2. Run the following command to switch to the root user:

    su - root

    The default password of the root user is Cloud12#$.

  3. Run the following command to switch to gausscore user:

    su gausscore

  4. Run the following command to log in to the CORE database:

    gsql -p Port number -q Database name -U CORE database username -W CORE database user password

    For example, if the database name is core, the database username is core, the port number is 8635, and the database user password is RDSDB@Remote123, run the following command:

    gsql -p 8635 -q core -U core -W RDSDB@Remote123

  5. Obtain the value of the ha_monitor_id field in the dbs_instance_ha table based on the DB instance ID. Query the IP address of the service node where the HAMonitor is located in the dbs_ha_monitor table based on the ha_monitor_id value.

    select c.ip from core.dbs_ha_monitor c where c.id = (select a.ha_monitor_id from core.dbs_instance_ha a where a.instance_id = 'DB instance ID')

    Check whether the HAMonitor service is running properly and whether the service can ping the DB instance.

  6. Log in to the HAMonitor service node using the IP address obtained in 5, and go to the directory of the HAMonitor log. Query the instance error log information based on the alarm occurrence time and instance ID on the OperationCenter platform.
  7. Log in to the primary and standby DB instances, respectively. View the show slave status\G output and check whether Last_IO_Error and Last_SQL_Error contain error information. Record the error information.
  8. Contact technical support for assistance.
Failed to Perform a Point-In-Time Restoration
Symptom

The automated backup policy has been set for the DB instance, but the restoration button on the GUI is still dimmed.

Possible Causes
  • The mysql-backupmanager service cannot connect to the DB instance through SSH.
  • The DB instance fails to connect to OBS.
  • The backup file fails to be uploaded to the OBS bucket.
  • The DB instance backup fails and no backup file is available for restoration.
  • The volume of data is huge and the service is busy, resulting in a slow response.
Locating Method

Ensure that the SSH connection from mysql-backupmanager to the DB instance and the network between the DB instance and OBS are normal. Then, log in to mysql-backupmanager and the DB instance, view the logs, and analyze the fault causes based on the logs.
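
The connectivity checks can be sketched roughly as follows (IP addresses and the OBS domain name are placeholders; the SSH account and port are assumptions):

    # From the mysql-backupmanager node: check the network path to the DB instance.
    ping -c 3 <DB instance manageIp>

    # Check that SSH to the DB instance works (account and port are assumptions).
    ssh -o ConnectTimeout=5 dbs@<DB instance manageIp> "echo ssh_ok"

    # From the DB instance: check the network path to OBS.
    ping -c 3 <OBS domain name>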

Procedure
  1. Use PuTTY to log in to the newRDS-Database01 node.

    Default account: dbs; default password: Changeme_123

  2. Run the following command to switch to the root user:

    su - root

    The default password of the root user is Cloud12#$.

  3. Run the following command to switch to gausscore user:

    su gausscore

  4. Run the following command to log in to the CORE database:

    gsql -p Port number -q Database name -U CORE database username -W CORE database user password

    For example, if the database name is core, the database username is core, the port number is 8635, and the database user password is RDSDB@Remote123, run the following command:

    gsql -p 8635 -q core -U core -W RDSDB@Remote123

  5. Search for the IP address of the corresponding instance by querying the CORE.DBS_INSTANCE and related tables.

    SELECT I.ID,N.ID,G.NAME,N.ROLE,C.IP,C.INTERNAL_VXLAN_IP

    FROM CORE.DBS_NODE N

    inner join CORE.DBS_PARENTSHIP P on P.ENTITY_ID = N.ID

    inner join CORE.DBS_INSTANCE I on I.ID = SPLIT_PART(P.PATH_FROM_ROOT, '/', 1)

    inner join CORE.DBS_GROUP G on G.ID = SPLIT_PART(P.PATH_FROM_ROOT, '/', 2)

    inner join CORE.DBS_RELATIONSHIP R on R.SOURCE_ENTITY_ID = N.ID

    inner join CORE.DBS_RELATIONSHIP_TYPE T on T.ID = R.TYPE_ID

    inner join CORE.DBS_NIC C on C.ID = R.TARGET_ENTITY_ID

    WHERE I.NAME = 'DB instance name';

  6. Check whether the mysql-backupmanager service can ping the DB instance.

    Check whether the mysql-backupmanager service can ping the manageIp value.
    • If no, the network between the mysql-backupmanager service and the DB instance is unavailable. Identify the network fault first.
    • If yes, go to 7.

  7. Use the connection tool to log in to the DB instance and run the ping command to check whether the DB instance can be connected to OBS.

    • If no, the network between the DB instance and OBS is unavailable. Identify the network fault first.
    • If yes, go to the next step.

  8. Log in to GaussDB and query the backup file status.

    1. Query backups based on the DB instance name.
      SELECT ID, CREATED_AT, NAME, STATUS FROM CORE.DBS_BACKUP_INFO_DETAILS WHERE INSTANCE_NAME = 'DB instance name';

      If Status (the backup file status) is Failed, go to 9.

    2. If the backupmanager service is deployed on multiple servers, you need to log in to multiple backupmanager nodes to query the logs and the result is as follows:
      Figure 20-1 Filter result

      You need to query the latest error information, filter the latest log information, check which step the error occurs in, and handle related issues.

  9. Back up the entire database and then upload the backup files to the OBS bucket.

    • If the log prompts that the files failed to be uploaded to the bucket, check the network.
    • If the log file does not contain the current backup information and the fault cause cannot be located based on logs, contact technical support for assistance.
      NOTE:

      If the instance service is busy or the number of concurrent services is large, the automated backup task may be deferred or the backup may be slow. Wait for a few minutes and check logs.

Failed to Replicate a Backup
Symptom

The backup replication task is delivered, but the new backup file is not displayed on the backup management page.

Possible Causes
  • The backupmanager service cannot connect to OBS.
  • The OBS file is lost.
Locating Method
  • Check whether the network is normal.
  • Check whether the backup file in the OBS bucket exists.
Procedure
  1. Check whether the mysql-backupmanager service can ping OBS.

    • If no, the network between the backupmanager service and OBS is unavailable. Identify the network fault first.
    • If yes, go to the next step.

  2. Use PuTTY to log in to the newRDS-Database01 node.

    Default account: dbs; default password: Changeme_123

  3. Run the following command to switch to the root user:

    su - root

    The default password of the root user is Cloud12#$.

  4. Run the following command to switch to gausscore user.

    su gausscore

  5. Run the following command to log in to the CORE database:

    gsql -p Port number -q Database name -U CORE database username -W CORE database user password

    For example, if the database name is core, the database username is core, the port number is 8635, and the database user password is RDSDB@Remote123, run the following command:

    gsql -p 8635 -q core -U core -W RDSDB@Remote123

  6. Log in to GaussDB and query the backup file status.

    select id,status from dbs_backup_info where name = 'Backup name'

    select file_name from dbs_backup_info_details where backup_id = 'Backup ID';

    NOTE:

    The ID in the dbs_backup_info table indicates the backup ID.

  7. Connect to the OBS bucket and check whether the backup file exists.

    • If no, contact technical support for assistance.
    • If yes, replicate the backup again.

Failed to Restore a DB Instance from a Backup
Symptom

After users click the restoration button, a message is displayed indicating that the operation fails or that the restoration fails.

Possible Causes
  • The mysql-backupmanager service fails to connect to a DB instance through SSH.
  • The DB instance fails to connect to OBS.
  • The backup file fails to be downloaded from OBS.
  • The RDS instance fails to be restored.
Locating Method

Check whether the instance status is normal. Ensure that the SSH connection from mysql-backupmanager to the DB instance and the network between the DB instance and OBS are normal. Then, log in to mysql-backupmanager and the DB instance, view the logs, and analyze the fault causes based on the logs.

Procedure
  1. Use PuTTY to log in to the newRDS-Database01 node.

    Default account: dbs; default password: Changeme_123

  2. Run the following command to switch to the root user:

    su - root

    The default password of the root user is Cloud12#$.

  3. Run the following command to switch to gausscore user:

    su gausscore

  4. Run the following command to log in to the CORE database:

    gsql -p Port number -q Database name -U CORE database username -W CORE database user password

    For example, if the database name is core, the database username is core, the port number is 8635, and the database user password is RDSDB@Remote123, run the following command:

    gsql -p 8635 -q core -U core -W RDSDB@Remote123

  5. Connect to GaussDB, join the core.dbs_instance and core.dbs_nic tables, and query the manageIp value of the DB instance.

    SELECT I.ID,N.ID,G.NAME,N.ROLE,C.IP,C.INTERNAL_VXLAN_IP

    FROM CORE.DBS_NODE N

    inner join CORE.DBS_PARENTSHIP P on P.ENTITY_ID = N.ID

    inner join CORE.DBS_INSTANCE I on I.ID = SPLIT_PART(P.PATH_FROM_ROOT, '/', 1)

    inner join CORE.DBS_GROUP G on G.ID = SPLIT_PART(P.PATH_FROM_ROOT, '/', 2)

    inner join CORE.DBS_RELATIONSHIP R on R.SOURCE_ENTITY_ID = N.ID

    inner join CORE.DBS_RELATIONSHIP_TYPE T on T.ID = R.TYPE_ID

    inner join CORE.DBS_NIC C on C.ID = R.TARGET_ENTITY_ID

    WHERE I.NAME = 'DB instance name';

  6. Check whether the mysql-backupmanager service can ping the DB instance.

    Check whether the mysql-backupmanager service can ping the manageIp value obtained in 5.
    • If no, the network between the mysql-backupmanager service and the DB instance is unavailable. Identify the network fault first.
    • If yes, go to 7.

  7. Use the connection tool to log in to the DB instance and run the ping command to check whether the DB instance can be connected to OBS.

    • If no, the network between the DB instance and OBS is unavailable. Identify the network fault first.
    • If yes, go to 8.

  8. Log in as user dbs and view the backupmanager background logs.

    1. Connect to GaussDB, query the core.dbs_restore_schedule table based on the DB instance ID to retrieve the restoration failure record and the workflow ID (workflow_id).
      select id from core.dbs_instance where name = 'DB instance name'
      select id,name,status,workflow_id from core.dbs_restore_schedule where instance_id = 'DB instance ID' order by created_at desc;
    2. Query the DB instance record based on the workflow ID and back up the record.

      Log in to the Linux server of each mysql-backupmanager node. Run the following commands on each node to filter backupmanager backend logs by DB instance name:

      cd /opt/dbs/tomcat/mysql-backupmanager/logs/backup-mysql/

      grep "Workflow ID" backup-mysql.project.log*

      NOTE:
      • The ID in the core.dbs_instance table indicates the DB instance ID.
      • Workflow ID is the workflow_id value obtained in 8.a.
    3. If the backupmanager service is deployed on multiple servers, you need to log in to multiple backupmanager nodes to query the logs and the result is as follows:
      Figure 20-2 Filter result

      If the log prompts that the backup files fail to be downloaded, check the network.

Failed to Change the Database Port
Symptom

The database port fails to be changed.

Possible Causes
  • The newRDS-Service cannot connect to the instance through SSH.
  • The submitted database port is occupied.
Locating Method
  • Check whether the newRDS-Service can connect to the instance through SSH.
  • Check whether the submitted database port has been occupied.
  • Log in to the VM where the newRDS-Service is located and view logs to locate the fault.
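
For example, to check on the DB instance whether the submitted port is already occupied (a sketch; requires OS-level login to the DB instance):

    # List listening sockets and check whether the new port is already in use.
    ss -lntp | grep -w "<new database port>"

    # If ss is unavailable, netstat provides the same information.
    netstat -lntp | grep -w "<new database port>"
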
Procedure
  1. Use PuTTY to log in to the newRDS-Database01 node.

    Default account: dbs; default password: Changeme_123

  2. Run the following command to switch to the root user:

    su - root

    The default password of the root user is Cloud12#$.

  3. Run the following command to switch to gausscore user:

    su gausscore

  4. Run the following command to log in to the CORE database:

    gsql -p Port number -q Database name -U CORE database username -W CORE database user password

    For example, if the database name is core, the database username is core, the port number is 8635, and the database user password is RDSDB@Remote123, run the following command:

    gsql -p 8635 -q core -U core -W RDSDB@Remote123

  5. Connect to GaussDB and query the manageIp value of the DB instance.

    CORE=>SELECT INTERNAL_VXLAN_IP FROM DBS_NIC WHERE ID in(

    SELECT TARGET_ENTITY_ID FROM DBS_RELATIONSHIP

    INNER JOIN DBS_RELATIONSHIP_TYPE

    ON DBS_RELATIONSHIP.TYPE_ID = DBS_RELATIONSHIP_TYPE.ID

    AND DBS_RELATIONSHIP_TYPE.NAME = 'MANAGE_NIC_IS'

    AND DBS_RELATIONSHIP.SOURCE_ENTITY_ID = 'Node ID');

  6. Check whether RDSService can ping the DB instance.

    Check whether RDSService can ping the manageIp value obtained in 5.
    • If no, the network between RDSService and the DB instance is unavailable. Identify the network fault first.
    • If yes, the network between RDSService and the DB instance is available. Go to 7.

  7. Submit a different new database port to change the database port again.

    • If the database port is changed successfully this time, the previous change failed because the previously submitted port was occupied.
    • If the change still fails, go to 8.

  8. Log in to the RDSService VM and query the DB instance management logs.

    • Connect to GaussDB. Query the DB instance ID in the dbs_instance table by the DB instance name.

      CORE=>SELECT ID FROM DBS_INSTANCE WHERE NAME = 'DB instance name';

    • Query the info log and obtain the workflow ID based on the DB instance ID and the operation delivery time.
    • Query the error log and obtain the DB instance ID and workflow ID to identify the error cause.
    • Contact technical support for assistance.

Failed to Reset the Administrator Password
Symptom

The administrator password fails to be reset.

Possible Causes
  • The password fails to pass the verification.
  • The RDSService fails to connect to a DB instance through SSH.
  • Users reset the administrator password after backups. When the DB instance is restored, the administrator password is restored to the original password.
  • The administrator password has been reset for the primary DB instance but has not been synchronized to the standby DB instance. At this moment, a switchover occurs and the original password is still used.
Locating Method
  • Check whether the password meets the password requirements.
  • Check whether the RDSService can be connected to a DB instance through SSH.
  • Check whether the DB instance is restored after the administrator password is reset.
  • Check whether a switchover occurs when the administrator password is being reset.
Procedure
  1. Use PuTTY to log in to the newRDS-Database01 node.

    Default account: dbs; default password: Changeme_123

  2. Run the following command to switch to the root user:

    su - root

    The default password of the root user is Cloud12#$.

  3. Run the following command to switch to gausscore user:

    su gausscore

  4. Run the following command to log in to the CORE database:

    gsql -p Port number -q Database name -U CORE database username -W CORE database user password

    For example, if the database name is core, the database username is core, the port number is 8635, and the database user password is RDSDB@Remote123, run the following command:

    gsql -p 8635 -q core -U core -W RDSDB@Remote123

  5. Connect to GaussDB and query the manageIp value of the DB instance.

    CORE=>SELECT INTERNAL_VXLAN_IP FROM DBS_NIC WHERE ID in(

    SELECT TARGET_ENTITY_ID FROM DBS_RELATIONSHIP

    INNER JOIN DBS_RELATIONSHIP_TYPE

    ON DBS_RELATIONSHIP.TYPE_ID = DBS_RELATIONSHIP_TYPE.ID

    AND DBS_RELATIONSHIP_TYPE.NAME = 'MANAGE_NIC_IS'

    AND DBS_RELATIONSHIP.SOURCE_ENTITY_ID = 'Node ID');

  6. Check whether RDSService can ping the DB instance.

    Check whether RDSService can ping the manageIp value obtained in 5.
    • If no, the network between RDSService and the DB instance is unavailable. Identify the network fault first.
    • If yes, the network between RDSService and the DB instance is available. Go to 7.

  7. Log in to the RDS console and choose More > Reset Password on the Instance Management page to reset the administrator password again.
Failed to Reboot the Database
Symptom

The database fails to be rebooted.

Possible Causes
  • The RDSService fails to connect to a DB instance through SSH.
  • The DB instance is faulty, and the agent fails to start the database.
Locating Method
  • Check whether the RDSService can be connected to a DB instance through SSH.
  • Log in to the MySQL DB instance and check the MySQL process status.
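
A minimal check after logging in to the MySQL DB instance node (a sketch; the process name and port are examples and may differ in the actual deployment):

    # Check whether the MySQL server process is running.
    ps -ef | grep mysqld | grep -v grep

    # Check whether the database port is listening (3306 is only an example).
    ss -lnt | grep 3306
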
Procedure
  1. Use PuTTY to log in to the newRDS-Database01 node.

    Default account: dbs; default password: Changeme_123

  2. Run the following command to switch to the root user:

    su - root

    The default password of the root user is Cloud12#$.

  3. Run the following command to switch to gausscore user:

    su gausscore

  4. Run the following command to log in to the CORE database:

    gsql -p Port number -q Database name -U CORE database username -W CORE database user password

    For example, if the database name is core, the database username is core, the port number is 8635, and the database user password is RDSDB@Remote123, run the following command:

    gsql -p 8635 -q core -U core -W RDSDB@Remote123

  5. Connect to GaussDB and query the manageIp value of the DB instance.

    CORE=>SELECT INTERNAL_VXLAN_IP FROM DBS_NIC WHERE ID in(

    SELECT TARGET_ENTITY_ID FROM DBS_RELATIONSHIP

    INNER JOIN DBS_RELATIONSHIP_TYPE

    ON DBS_RELATIONSHIP.TYPE_ID = DBS_RELATIONSHIP_TYPE.ID

    AND DBS_RELATIONSHIP_TYPE.NAME = 'MANAGE_NIC_IS'

    AND DBS_RELATIONSHIP.SOURCE_ENTITY_ID = 'Node ID');

  6. Check whether RDSService can ping the DB instance.

    Ping the manageIp retrieved in 5 on the RDSService.
    • If no, the network between RDSService and the DB instance is unavailable. Identify the network fault first.
    • If yes, the network between RDSService and the DB instance is available. Go to 7.

  7. Log in to the console and reboot the database.

    If the restart still fails, contact technical support for assistance.

Failed to Insert Data to a MySQL Table (ERROR 1114)
Symptom

You cannot insert data into a table, and ERROR 1114 is reported.

Troubleshooting
  1. Query metadata information of the table.

    mysql> select table_schema,table_name,table_type,engine,table_rows,concat(round(DATA_LENGTH/1024/1024,4),'M') current_size from information_schema.tables where table_name='test';

    The command output indicates that this table uses the MEMORY storage engine and the table size is 16.0625 MB.

  2. Query the maximum size of MEMORY tables.

    mysql> show variables like 'max_heap_table_size';

    The size of a table whose storage engine is MEMORY cannot exceed the value of max_heap_table_size. Otherwise, ERROR 1114 is reported.

    The max_heap_table_size parameter on the MySQL official website is described as follows: "This variable sets the maximum size to which user-created MEMORY tables are permitted to grow. The value of the variable is used to calculate MEMORY table MAX_ROWS values. Setting this variable has no effect on any existing MEMORY table, unless the table is re-created with a statement such as CREATE TABLE or altered with ALTER TABLE or TRUNCATE TABLE. A server restart also sets the maximum size of existing MEMORY tables to the global max_heap_table_size value."

    Therefore, modifying the value of max_heap_table_size has no effect on existing MEMORY tables. You need to perform subsequent operations after modifying this parameter.

  3. Modify the value of max_heap_table_size as required. The value 67108864 (64 MB) is used as an example.

    mysql> set global max_heap_table_size='67108864';
    Query OK, 0 rows affected (0.00 sec)
  4. Check whether the value of max_heap_table_size is changed.

    mysql> show variables like 'max_heap_table_size';

  5. Perform subsequent operations.
    • Method 1:
      1. Create a temporary table.

        mysql> use test
        Reading table information for completion of table and column names
        You can turn off this feature to get a quicker startup with -A

        mysql> create table test_temp like test;
        Query OK, 0 rows affected (0.01 sec)
      2. Insert the original table data to the temporary table.

        mysql> insert into test_temp select * from test;
        Query OK, 63570 rows affected (0.18 sec)
        Records: 63570  Duplicates: 0  Warnings: 0
      3. Change the temporary table name to the original table name.

        mysql> alter table test rename to test_backup;
        Query OK, 0 rows affected (0.00 sec)
        mysql> alter table test_temp rename to test;
        Query OK, 0 rows affected (0.01 sec)
    • Method 2: Change the table storage engine to InnoDB.

      alter table test engine innodb;

Failed to Use mysqldump to Import Data to an RDS MySQL DB Instance (ERROR 1227)
Symptom

When you use mysqldump to export local data and import it to an RDS MySQL DB instance, the following error may occur:

ERROR 1227 (42000) at line 18: Access denied; you need (at least one of) the SUPER privilege(s) for this operation

Possible Causes

You do not have the super permission to import data to a MySQL database.

RDS for MySQL grants the highest permissions to user root. However, for security purposes, neither user root nor self-created users have the super permission. SQL statements produced by mysqldump require the super permission to be executed. If you do not have this permission, data import fails and ERROR 1227 is reported.

If you enable the global transaction identifier (GTID) feature in the source database and do not set set-gtid-purged to OFF when using mysqldump to export data, SQL statements produced by mysqldump require the super permission to be executed.

SET @@SESSION.SQL_LOG_BIN= 0;
SET @@GLOBAL.GTID_PURGED='18f9a804-343b-11e5-a21d-b083fed01601:1-2';
Troubleshooting

Method 1:

Set set-gtid-purged to OFF when you use mysqldump to generate a .sql file that contains SQL statements. Then, reload the .sql file.

mysqldump -uroot -p -h192.168.0.50 -P8635 --databases test --set-gtid-purged=OFF --master-data=2 --single-transaction --order-by-primary -r dump.sql

Method 2:

Run the source command to import data. The remaining SQL statements can still be executed even if a permission error is reported, which does not affect the final data.

mysql>source /tmp/dump.sql

Ensure that the path of the .sql file is correct and that you have the required permissions.

Insufficient Storage Space Because of Too Many Binlogs
Symptom
  • The MySQL master database can be logged in to, but operations on the master database hang when SQL statements are executed.

  • The slave replication I/O thread is in the Connecting status and lags behind the master by 718 binlogs.

Fault Locating
  • Go to the log directory to view error logs. Bash programmable completion fails and the system displays a message indicating that storage space is insufficient.

  • Check whether the storage space is insufficient. In this case, the command output indicates that the storage space is fully occupied.
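
For example, the following commands show the overall disk usage and how much space the binlogs occupy (a sketch; the binlog directory path is an assumption and must be replaced with the actual data directory):

    # Check the overall file system usage.
    df -h

    # Check how much space the binlogs occupy (the path is an assumption).
    du -ch /data/mysql/mysql-bin.* | tail -n 1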

Possible Causes

In this test environment, the size of the database directory is only 100 GB, but the size of the data to be tested is huge, resulting in increased response latency. A large number of binlogs are generated, resulting in insufficient storage space. Master database disks are not monitored. Therefore, the master database cannot be accessed and the replication I/O thread of the slave database is abnormal.

Troubleshooting
  1. Check the slave database status.

    show slave status\G

    The command output indicates that the slave database has already obtained the binlog mysql-bin.004458 and the binlogs generated earlier on the master database. Therefore, the binlogs generated earlier than mysql-bin.004458 can be removed from the master database.

  2. Remove the binlogs generated earlier than mysql-bin.004458, and check the disk status to ensure that the storage space is sufficient for accessing the master database.

    rm -rf mysql-bin.003*

    df -h

  3. Log in to the database and run the following command to view processes running in the database:

    show processlist;

    The command output indicates that the MySQL database is accessible.

  4. Delete unnecessary binlogs.

    PURGE BINARY LOGS TO 'mysql-bin.004458';

  5. Log in to the slave database and run the following command to view replication status. The command output indicates that the master-slave replication is normal, but the slave lags far behind the master.
    show slave status\G

Precautions
  • In this simulated fault scenario, data and logs are in the same directory. In a production environment, it is recommended that the database directory and the log directory be separated.
  • In a production environment, you must monitor the remaining resources of data disks, log disks, and temporary objects of the MySQL database. When the threshold of a resource is exceeded, the system sends an alarm. Handle the alarm immediately to prevent service interruption.
Failed to Join Two MySQL Tables Because of Inconsistent Index Character Sets
Simulated Scenarios
  1. Query the table structures. Tables t1 and t2 are used as an example.

    mysql> show create table t1\G

    mysql> show create table t2\G

  2. Query all records from tables.

    mysql> select * from t1;

    mysql> select * from t2;

  3. Query the execution plan.

    mysql> explain select * from t2 join t1 on t1.c3 = t2.c3 where t2.c2 = 'guangdong'\G

    The command output indicates that the index of t2.c2 is used for the filter condition t2.c2 = 'guangdong', but the index of t1.c3 is not used for the join condition t1.c3 = t2.c3. Therefore, the system performs full table scans on t1.

    A warning is displayed at the end of the command output. Run the following command to view the warning details:

    mysql> show warnings\G

    If the information convert (test.t1.c3 using utf8mb4) is displayed in the command output, the character set of t1.c3 has been converted.

    Check table character sets again. The table t1 character set is utf8, but the table t2 character set is utf8mb4. Different character sets cause full table scans on t1.

Possible Causes

Analysis based on the execution plan is as follows:

MySQL allows you to specify a character set for a whole table or a single column. If no character set is specified for a column, the column shares the table character set by default.

The t1.c3 character set is utf8, but the t2.c3 character set is utf8mb4. The t1.c3 character set needs to be converted. Otherwise, the execution plan ignores the index of t1.c3, and the system performs full table scans on t1. The number of times full table scans will be performed on t1 is equal to the number of records in t2 that satisfy the condition where t2.c2 = 'guangdong', which deteriorates the system performance.

The warning indicates that the character set of t1.c3 is converted into utf8mb4. Character set utf8mb4 is a superset of utf8. MySQL only supports conversion from a smaller character set to a larger one.
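
Before converting, you can confirm which character sets are involved by querying information_schema (a sketch; the schema name test is an assumption for this scenario):

    # Compare the character set of the join column in both tables.
    mysql -u root -p -e "select table_name, column_name, character_set_name from information_schema.columns where table_schema = 'test' and table_name in ('t1','t2') and column_name = 'c3';"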

Troubleshooting

Character sets of two tables to be joined must be unified. For example, convert the character set of table t1 to utf8mb4 in this simulated scenario.

NOTE:

The alter table command may block write operations. Perform the character set conversion in off-peak hours. In a production environment, you are advised to back up the table before converting its character set. You are advised to use pt-online-schema-change to modify character sets for large tables without blocking read or write operations.

  1. Convert the character set of table t1.

    mysql> alter table t1 convert to charset utf8mb4;

    Query OK, 9 rows affected (0.05 sec)
    Records: 9  Duplicates: 0  Warnings: 0
  2. Query the execution plan.

    mysql> explain select * from t2 join t1 on t1.c3 = t2.c3 where t2.c2 = 'guangdong'\G

    The command output indicates that the index character set of t1.c3 has been changed and the fault is rectified.

Failed to Perform MySQL Physical Backups Because of Slow Queries Blocking the FTWRL Operation
Symptom
  • SQL statements fail to be executed and no result is returned.
  • Physical backups of MySQL cannot be completed.
Possible Causes

In this case, the slow query whose ID is 111804586 and whose execution time is 26114s blocks the FLUSH TABLES WITH READ LOCK (FTWRL) operation of the innobackupex backup. All tables need to be closed before FTWRL can run, but the slow query 111804586 is still running, which makes FTWRL wait. As a result, backups fail to start or finish.

FTWRL adds global read locks to all tables except news_webpage. Subsequent data manipulation language (DML) operations are in the Waiting for global read lock state, and query operations on the table news_webpage are in the Waiting for table flush state. Therefore, no result is returned.
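
A sketch of how the blocking slow query can be located before it is killed (the time threshold and account are placeholders; 111804586 is the ID from this case):

    # List statements that have been executing for more than 600 seconds.
    mysql -u root -p -e "select id, user, time, state, left(info, 100) as query from information_schema.processlist where command <> 'Sleep' and time > 600 order by time desc;"

    # Kill the identified slow query by its ID.
    mysql -u root -p -e "kill 111804586;"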

Temporary Solutions
  • Solution 1

    If the database does not contain any user-defined MyISAM tables, or MyISAM tables exist but store a small amount of data, run the following command to kill the slow query process:

    kill 111804586;

  • Solution 2

    If the database contains a user-defined MyISAM table that stores a large amount of data, run the following commands to kill the slow query and backup processes:

    kill 111804586;

    kill 111952866;

Final Solutions
  • Optimize slow query processes. Locate and rectify the faults that cause a long SQL statement execution. Pay special attention to whether an endless loop exists in functions.
  • If a user-defined MyISAM table exists, you are advised to convert the user-defined MyISAM table to an InnoDB table.
A Large Number of MyISAM Table Sessions in the Waiting for Table Level Lock State
Symptom

A large number of MyISAM table sessions are in the Waiting for table level lock state. As a result, the quota of connections is quickly used up, data retrieval times out, and front-end applications report errors.

Run the following command to view session statuses:

show processlist

The MyISAM engine only supports table-level locking. It does not support row-level locking.

Read operations on a MyISAM table do not block read requests from other sessions, but they block write requests to the table. Write operations on a MyISAM table block both read and write operations performed by other users on the table. For a MyISAM table, read and write operations are serialized, and write operations themselves are also serialized. After a thread obtains the write lock of a table, only the thread that holds the lock can update the table. Read and write operations of other threads wait until the lock is released.
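
For example, the number of affected sessions can be counted as follows (a sketch):

    # Count the sessions that are waiting for the MyISAM table-level lock.
    mysql -u root -p -e "select count(*) from information_schema.processlist where state = 'Waiting for table level lock';"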

Troubleshooting

Speeding up read and write operations on a MyISAM table shortens the time other threads wait for the lock to be released and reduces the number of sessions in the Waiting for table level lock state.

Therefore, you need to optimize SQL statements. Perform the following procedure:

  1. Query table information.

    mysql> select table_schema,table_name,table_type, engine from information_schema.tables where table_name='userinfo';

    The command output indicates that this table uses the MyISAM storage engine.

  2. Modify the value of key_buffer_size.

    The key_buffer_size parameter specifies the index buffer size of a MyISAM table. The default value of this parameter in RDS is 16 MB. If all tables in the database are MyISAM tables, set the value to 1/4 of the server memory. Data retrieval can use the database memory, which greatly improves retrieval efficiency.

  3. Optimize SQL statements.

    If the performance of an SQL statement is poor, consider index problems first. The following is an example of a missing index. Optimization measures for other SQL statements are not described here.

    In the following example, the total number of rows in the database is less than 300,000, but the execution plan indicates that the database scans 270,000 rows of data to query a record. Indexes of the token field work poorly.

    mysql> explain

    -> SELECT * FROM `userinfo` WHERE ( `token` = 'rtbpjd1476262654' ) AND ( `weid` = 'ohGDUjmosQl1JXWa7j91gpKUcgM' ) LIMIT 1;

  4. Query index information.

    mysql> show keys from userinfo;

    The command output indicates that the weid field has no index.

  5. Create composite indexes.

    create index idx_weid_token on userinfo(weid,token);

    NOTE:

    Creating indexes locks tables. Therefore, you need to perform this operation during off-peak hours.

  6. Delete the index of the token field.

    alter table userinfo drop index token;

  7. Query the execution plan.

    mysql> explain

    -> SELECT * FROM `userinfo` WHERE ( `token` = 'rtbpjd1476262654' ) AND ( `weid` = 'ohGDUjmosQl1JXWa7j91gpKUcgM' ) LIMIT 1;

    The command output indicates that only two rows of records need to be scanned, and the execution time is greatly reduced to milliseconds.

  8. Check session statuses.

    mysql> show processlist;

  9. Query specific fields as required. Do not use select * to query all records.
  10. The InnoDB engine supports transactions and row-level locks with higher concurrency performance. You are advised to use the InnoDB engine instead of the MyISAM engine. Run the following command to change MyISAM tables to InnoDB tables:

    mysql> alter table table_name engine innodb;

Failed to Modify innodb_buffer_pool_size Without Restarting the Database (ERROR 1231 in MySQL 5.7)
Symptom

When you set innodb_buffer_pool_size to 5242880 for MySQL 5.7 DB instances, ERROR 1231 (42000): Variable 'innodb_buffer_pool_size' can't be set to the value of '5242880' is reported.

Possible Causes

When the value of innodb_buffer_pool_size is less than 1 GB, the value of innodb_buffer_pool_instances must be 1 to prevent performance problems caused by an excessive number of innodb buffer pool instances. However, innodb_buffer_pool_instances is read-only, and modifications to this parameter take effect only after the database is restarted. Therefore, innodb_buffer_pool_size fails to be set to 5242880.

Troubleshooting

Modify the value of innodb_buffer_pool_instances in the my.cnf file and restart the database for the modification to take effect. Then, set innodb_buffer_pool_size again.

If the memory of a MySQL DB instance is greater than or equal to 2 GB, the default value of innodb_buffer_pool_instances is 8. Therefore, the value of innodb_buffer_pool_size cannot be modified to less than 1 GB without restarting the database.
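
A minimal sketch of the check and the configuration change (the my.cnf path is an assumption; the database must be restarted for innodb_buffer_pool_instances to take effect):

    # If innodb_buffer_pool_instances is greater than 1, innodb_buffer_pool_size
    # cannot be set below 1 GB without a restart.
    mysql -u root -p -e "show variables like 'innodb_buffer_pool_%';"

    # In my.cnf (path is an assumption), set the instance count to 1 under [mysqld],
    # for example: innodb_buffer_pool_instances = 1, then restart the database.
    grep -n "innodb_buffer_pool_instances" /etc/my.cnf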

xtrabackup Backup Failed Because of Inconsistent innodb_log_file_size Values in my.cnf and the DB Instance
Symptom

After you modify the value of innodb_log_file_size in the configuration file by using a parameter group, the parameter value in the configuration file differs from the parameter value in the DB instance. As a result, backup fails.


Possible Causes

When you use xtrabackup to back up a MySQL database, xtrabackup checks whether the innodb_log_file_size value in my.cnf is equal to the value in the DB instance. If they are inconsistent, backup fails.

Troubleshooting

Modify the value of innodb_log_file_size in the configuration file to ensure it is equal to the value in the DB instance.
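
A sketch of how to compare the two values before retrying the backup (the my.cnf path is an assumption):

    # Value used by the running DB instance.
    mysql -u root -p -e "show variables like 'innodb_log_file_size';"

    # Value in the configuration file (the path is an assumption).
    grep -i "innodb_log_file_size" /etc/my.cnf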

After the two values are consistent, perform the backup again to verify that the fault is rectified.

PostgreSQL

PostgreSQL Parameter Failed to Take Effect After a Modification
Symptom

The default PostgreSQL parameter is changed on the parameter configuration page, but the modification does not take effect.

Possible Causes
  • The user does not reboot the database after modifying the parameter. The database needs to be rebooted for certain parameter modifications to take effect.
  • The user does not close the current session. The session needs to be reopened for certain parameter modifications to take effect.
  • The new value is improper, resulting in a PostgreSQL failure.
  • Some parameters cannot be written to the configuration file and are automatically restored.
  • The DB instance is faulty.
Locating Method

Check whether the DB instance status is normal, understand the features of the PostgreSQL parameter to be modified, and use a correct method to change the parameter to a correct value. Reboot the PostgreSQL database or reopen a session as required.
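
For example, the context column in pg_settings shows how a PostgreSQL parameter takes effect (a sketch, assuming psql access to the DB instance; shared_buffers is used only as an illustration):

    # context = postmaster requires a database reboot; sighup only needs a configuration
    # reload; user or superuser takes effect per session.
    psql -h <DB instance IP> -p <database port> -U <database user> -d <database name> -c "select name, setting, unit, context from pg_settings where name = 'shared_buffers';"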

Procedure
  1. Use PuTTY to log in to the newRDS-Database01 node.

    Default account: dbs; default password: Changeme_123

  2. Run the following command to switch to the root user:

    su - root

    The default password of the root user is Cloud12#$.

  3. Run the following command to switch to gausscore user:

    su gausscore

  4. Run the following command to log in to the CORE database:

    gsql -p Port number -q Database name -U CORE database username -W CORE database user password

    For example, if the database name is core, the database username is core, the port number is 8635, and the database user password is RDSDB@Remote123, run the following command:

    gsql -p 8635 -q core -U core -W RDSDB@Remote123

  5. Connect to GaussDB and query the DB instance status in the dbs_instance table.

    CORE=> Select name,status from dbs_instance where name='DB instance name';

    • If the DB instance status is normal, the DB instance is normal.
    • If the DB instance status is abnormal, the DB instance is faulty. Restore the instance first.

  6. Check and reboot the database, or close the current connection and reopen a session.
  7. Set the parameter again following the instructions provided at the official PostgreSQL website.
  8. If the fault persists, contact technical support for assistance.
PostgreSQL Processes Killed Due to Out of Memory
Symptom

The following information is displayed in PostgreSQL logs:

Feb 27 12:15:39 DBS_GaussBase2 kernel: [469644.335022] Node 0 DMA free:15876kB min:64kB low:80kB high:96kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15908kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:32kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
Feb 27 12:15:39 DBS_GaussBase2 kernel: [469644.335022] lowmem_reserve[]: 0 3360 15863 15863

The following information is displayed in operating system logs:

Feb 27 12:15:39 DBS_GaussBase2 kernel: [469644.334044] kbox catch oom event.
Feb 27 12:15:39 DBS_GaussBase2 kernel: [469644.335022] Mem-Info:
Feb 27 12:15:39 DBS_GaussBase2 kernel: [469644.335022] Node 0 DMA per-cpu:
Feb 27 12:15:39 DBS_GaussBase2 kernel: [469644.335022] CPU    0: hi:    0, btch:   1 usd:   0
Feb 27 12:15:39 DBS_GaussBase2 kernel: [469644.335022] CPU    1: hi:    0, btch:   1 usd:   0
Feb 27 12:15:39 DBS_GaussBase2 kernel: [469644.335022] CPU    2: hi:    0, btch:   1 usd:   0
Feb 27 12:15:39 DBS_GaussBase2 kernel: [469644.335022] CPU    3: hi:    0, btch:   1 usd:   0
Feb 27 12:15:39 DBS_GaussBase2 kernel: [469644.335022] CPU    4: hi:    0, btch:   1 usd:   0
Feb 27 12:15:39 DBS_GaussBase2 kernel: [469644.335022] CPU    5: hi:    0, btch:   1 usd:   0
Feb 27 12:15:39 DBS_GaussBase2 kernel: [469644.335022] CPU    6: hi:    0, btch:   1 usd:   0
Feb 27 12:15:39 DBS_GaussBase2 kernel: [469644.335022] CPU    7: hi:    0, btch:   1 usd:   0
Feb 27 12:15:39 DBS_GaussBase2 kernel: [469644.335022] Node 0 DMA32 per-cpu:
Feb 27 12:15:39 DBS_GaussBase2 kernel: [469644.335022] CPU    0: hi:  186, btch:  31 usd:   0
Feb 27 12:15:39 DBS_GaussBase2 kernel: [469644.335022] CPU    1: hi:  186, btch:  31 usd:   0
Feb 27 12:15:39 DBS_GaussBase2 kernel: [469644.335022] CPU    2: hi:  186, btch:  31 usd:   0
Feb 27 12:15:39 DBS_GaussBase2 kernel: [469644.335022] CPU    3: hi:  186, btch:  31 usd:   0
Feb 27 12:15:39 DBS_GaussBase2 kernel: [469644.335022] CPU    4: hi:  186, btch:  31 usd:   0
Feb 27 12:15:39 DBS_GaussBase2 kernel: [469644.335022] CPU    5: hi:  186, btch:  31 usd:   0
Feb 27 12:15:39 DBS_GaussBase2 kernel: [469644.335022] CPU    6: hi:  186, btch:  31 usd:   0
Feb 27 12:15:39 DBS_GaussBase2 kernel: [469644.335022] CPU    7: hi:  186, btch:  31 usd:   0
Feb 27 12:15:39 DBS_GaussBase2 kernel: [469644.335022] Node 0 Normal per-cpu:
Feb 27 12:15:39 DBS_GaussBase2 kernel: [469644.335022] CPU    0: hi:  186, btch:  31 usd: 178
Feb 27 12:15:39 DBS_GaussBase2 kernel: [469644.335022] CPU    1: hi:  186, btch:  31 usd: 171
Feb 27 12:15:39 DBS_GaussBase2 kernel: [469644.335022] CPU    2: hi:  186, btch:  31 usd: 113
Feb 27 12:15:39 DBS_GaussBase2 kernel: [469644.335022] CPU    3: hi:  186, btch:  31 usd:  76
Feb 27 12:15:39 DBS_GaussBase2 kernel: [469644.335022] CPU    4: hi:  186, btch:  31 usd: 181
Feb 27 12:15:39 DBS_GaussBase2 kernel: [469644.335022] CPU    5: hi:  186, btch:  31 usd: 201
Feb 27 12:15:39 DBS_GaussBase2 kernel: [469644.335022] CPU    6: hi:  186, btch:  31 usd: 169
Feb 27 12:15:39 DBS_GaussBase2 kernel: [469644.335022] CPU    7: hi:  186, btch:  31 usd: 166
Feb 27 12:15:39 DBS_GaussBase2 kernel: [469644.335022] active_anon:1840295 inactive_anon:339373 isolated_anon:6304
 active_file:11393 inactive_file:39907 isolated_file:72
 unevictable:20763 dirty:44366 writeback:0 unstable:0
 free:158473 slab_reclaimable:61741 slab_unreclaimable:51077
 mapped:102894 shmem:349617 pagetables:1483725 bounce:0
 free_cma:0


Troubleshooting
  • Databases
    • Change the value of max_connections based on the maximum number of concurrent connections required by a service.
    • Set shared_buffers and work_mem to proper values based on the value of max_connections and the server memory (see the sketch after this list).
  • Applications: Optimize services to reduce database operations, and optimize SQL statements to reduce memory usage.
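A minimal sketch of such an adjustment in postgresql.conf; the values below are examples only and must be sized according to the server memory and the service concurrency.

max_connections = 500
shared_buffers = 4GB
work_mem = 8MB

max_connections and shared_buffers take effect only after the database is rebooted; work_mem takes effect after the configuration is reloaded.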
Current Transaction Aborted and Commands Ignored Until End of Transaction Block
Symptom

The following error information is displayed in PostgreSQL logs:

< 2018-03-23 09:16:08.092 CST >ERROR:  current transaction is aborted, commands ignored until end of transaction block

The following error information is displayed in Java logs:

org.postgresql.util.PSQLException: ERROR: 
current transaction is aborted, commands ignored until end of transaction block
Possible Causes

An SQL statement of a transaction failed to be executed. The system caught the exception but ignored it. Then, the same transaction in the same connection ran another SQL statement. Such an operation is forbidden in PostgreSQL.

Troubleshooting
  1. Enable autocommit mode in the application.
  2. After the exception is caught, roll back the transaction and then restart it (see the example after this list).
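The behavior can be reproduced in a database session as follows. Any statement that fails inside an explicit transaction puts the transaction into the aborted state, and ROLLBACK clears it.

BEGIN;
SELECT 1/0;   -- fails: division by zero; the transaction is now aborted
SELECT 1;     -- ERROR: current transaction is aborted, commands ignored until end of transaction block
ROLLBACK;     -- ends the aborted transaction
SELECT 1;     -- succeeds again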
Failed to Restart PostgreSQL Because of Insufficient Semaphores
Symptom

The following error is reported during database startup:

[2018-01-31 17:07:56.726 CST] postmaster 104630 LOG: Starting checkDataDir (postmaster.c:1073)
[2018-01-31 17:07:56.726 CST] postmaster 104630 LOG: Starting ChangeToDataDir (postmaster.c:1079)
[2018-01-31 17:07:56.726 CST] postmaster  104630 LOG: Starting CheckDateTokenTables (postmaster.c:1125)
[2018-01-31 17:07:56.726 CST] postmaster  104630 LOG: Starting CreateDataDirLockFile (postmaster.c:1156)
[2018-01-31 17:07:56.728 CST] postmaster  104630 LOG: Starting pgaudit_agent_init (postmaster.c:1177)
[2018-01-31 17:07:56.728 CST] postmaster  104630 LOG: Starting process_shared_preload_libraries (postmaster.c:1186)
[2018-01-31 17:07:56.735 CST] postmaster 104630 LOG: could not create IPv6 socket: Address family not supported by protocol family (pqcomm.c:494)
[2018-01-31 17:07:56.736 CST] postmaster  104630 LOG: Create TCP/IP socket for Psql "*":5432 success (postmaster.c:1240)
[2018-01-31 17:07:56.736 CST] postmaster  104630 LOG: Starting reset_shared (postmaster.c:1393)
[2018-01-31 17:07:56.854 CST] postmaster 104630 FATAL: could not create semaphores: No space left on device (pg_sema.c:123)
[2018-01-31 17:07:56.854 CST] postmaster  104630 DETAIL: Failed system call was semget(5432129, 17, 03600).
[2018-01-31 17:07:56.854 CST] postmaster  104630 HINT: This error does *not* mean that you have run out of disk space. It occurs when either the system limit for the maximum number of semaphore sets (SEMMNI), or the system wide maximum number of semaphores (SEMMNS), would be exceeded. You need to raise the respective kernel parameter. Alternatively, reduce PostgreSQL's consumption of semaphores by reducing its max_connections parameter.
The PostgreSQL documentation contains more information about configuring your system for PostgreSQL.
[2018-01-31 17:07:56.884 CST] postmaster  104630 LOG: StreamDoUnlink sockpath =/tmp/.s.PGSQL.5432 (pqcomm.c:342)
[2018-01-31 17:07:56.884 CST] postmaster  104630 LOG: UnlinkLockFile socket_lock_file =/tmp/.s.PGSQL.5432.lock (miscinit.c:1232)
..............
postmaster.pid file does not exist after 15 seconds
stopped waiting
could not start server
Troubleshooting

This error occurs because the kernel parameter that controls semaphores (kernel.sem) is not configured or is set to a value that is too small. Modify the kernel parameter based on site requirements. For example, set kernel.sem to 250 256000 32 1024, and then run the sysctl -p command.
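A minimal sketch of the check and the change, assuming the example values above suit the site:

cat /proc/sys/kernel/sem                            # current SEMMSL SEMMNS SEMOPM SEMMNI values
echo "kernel.sem = 250 256000 32 1024" >> /etc/sysctl.conf
sysctl -p                                           # apply the new setting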

Failed to Delete Corresponding Files After a Table Is Dropped
Symptom

A table was dropped, but the corresponding files have not been deleted. The following shows the symptom.

WFS=# SELECT OID,RELNAME FROM PG_CLASS WHERE RELNAME='WFS_SESSION';
 
  OID  |   RELNAME   
 
-------+-------------
 
 16472 | WFS_SESSION
 
(1 row)
 
 
WFS=# drop table wfs.wfs_session;
 
DROP TABLE
 
WFS=# vacuum full;
 
VACUUM
 
 
 
[gaussbase@dbs_gauss1 16396]$ ls -la 16472
-rw------- 1 gaussbase dbgrp 0 Jan 25 10:26 16472

Troubleshooting

After a table is dropped, perform the checkpoint operation to delete the corresponding files.

WFS=# checkpoint;
CHECKPOINT
 
[gaussbase@dbs_gauss1 16396]$ ls -la 16472
ls: cannot access 16472: No such file or directory
PostgreSQL Lock Waiting
Symptom

Some SQL statements take a long time to execute and cannot be completed in PostgreSQL.

Troubleshooting
  1. Enable logging of all SQL statements.
    1. Run the following command to go to the /opt/gaussdbcore/data directory as user gaussdbcore:

      cd /opt/gaussdbcore/data

      You can also run the following command to go to the /opt/gaussdbbase/data directory as user gaussdbbase:

      cd /opt/gaussdbbase/data

    2. Run the following command to edit the postgresql.conf file:

      vim postgresql.conf

    3. Record the current value of log_statement and change it to all.
    4. Run the following command to save the configurations:

      :wq!

    5. Run the following command to reload the configurations:

      gs_ctl reload

  2. Log in to the PostgreSQL database as the database administrator and run the following SQL statements:

    SELECT blocked_locks.pid AS blocked_pid,
    blocked_activity.usename AS blocked_user,
    blocking_locks.pid AS blocking_pid,
    blocking_activity.usename AS blocking_user,
    blocked_activity.query AS blocked_statement,
    blocking_activity.query AS current_statement_in_blocking_process
    FROM pg_catalog.pg_locks blocked_locks
    JOIN pg_catalog.pg_stat_activity blocked_activity ON blocked_activity.pid = blocked_locks.pid
    JOIN pg_catalog.pg_locks blocking_locks
    ON blocking_locks.locktype = blocked_locks.locktype
    AND blocking_locks.DATABASE IS NOT DISTINCT FROM blocked_locks.DATABASE
    AND blocking_locks.relation IS NOT DISTINCT FROM blocked_locks.relation
    AND blocking_locks.page IS NOT DISTINCT FROM blocked_locks.page
    AND blocking_locks.tuple IS NOT DISTINCT FROM blocked_locks.tuple
    AND blocking_locks.virtualxid IS NOT DISTINCT FROM blocked_locks.virtualxid
    AND blocking_locks.transactionid IS NOT DISTINCT FROM blocked_locks.transactionid
    AND blocking_locks.classid IS NOT DISTINCT FROM blocked_locks.classid
    AND blocking_locks.objid IS NOT DISTINCT FROM blocked_locks.objid
    AND blocking_locks.objsubid IS NOT DISTINCT FROM blocked_locks.objsubid
    AND blocking_locks.pid != blocked_locks.pid
    JOIN pg_catalog.pg_stat_activity blocking_activity ON blocking_activity.pid = blocking_locks.pid
    WHERE NOT blocked_locks.GRANTED;

    The preceding command output is displayed as follows:
    BLOCKED_PID | BLOCKED_USER | BLOCKING_PID | BLOCKING_USER | BLOCKED_STATEMENT | CURRENT_STATEMENT_IN_BLOCKING_PROCESS 
    -------------+--------------+--------------+---------------+------------------------------------------------------------+----------------------------------------------------------------------------------
    114211 | GAUSSDBA | 114217 | GAUSSDBA | SELECT * FROM DBS_ACTION WHERE INSTANCE_ID = $1 FOR UPDATE | SELECT * FROM DBS_ENDPOINT WHERE DBS_ENDPOINT.ID IN +
    | | | | | (SELECT DBS_RELATIONSHIP.TARGET_ENTITY_ID +
    | | | | | FROM DBS_RELATIONSHIP +
    | | | | | INNER JOIN DBS_RELATIONSHIP_TYPE +
    | | | | | ON DBS_RELATIONSHIP_TYPE.ID = DBS_RELATIONSHIP.TYPE_ID +
    | | | | | AND DBS_RELATIONSHIP_TYPE.NAME = 'BIND_ENDPOINT'+
    | | | | | AND DBS_RELATIONSHIP.SOURCE_ENTITY_ID = $1)

    The command output indicates that the transaction of process 114217 blocks the transaction of process 114211.

  3. View logs in the $PGDATA/pg_log directory and search for COMMIT or ROLLBACK operation records whose process ID is 114217. In this example, there is no ROLLBACK operation record for process 114217. Only COMMIT operation records are displayed.
    Line 249: Line 102774: [2017-12-20 22:01:57.164 CST] postgres  114217 LOG: execute S_4: COMMIT (postgres.c:6326)
    Line 253: Line 104921: [2017-12-20 22:02:08.481 CST] postgres  114217 LOG: execute S_4: COMMIT (postgres.c:6326)
    Line 258: Line 106572: [2017-12-20 22:02:17.171 CST] postgres  114217 LOG: execute S_4: COMMIT (postgres.c:6326)
    Line 302: Line 123074: [2017-12-20 22:03:47.215 CST] postgres  114217 LOG: execute S_4: COMMIT (postgres.c:6326)
    Line 311: Line 126123: [2017-12-20 22:03:57.164 CST] postgres  114217 LOG: execute S_4: COMMIT (postgres.c:6326)
    Line 318: Line 129293: [2017-12-20 22:04:17.162 CST] postgres  114217 LOG: execute S_4: COMMIT (postgres.c:6326)
    Line 333: Line 136107: [2017-12-20 22:04:47.170 CST] postgres  114217 LOG: execute S_4: COMMIT (postgres.c:6326)
    Line 334: Line 136213: [2017-12-20 22:04:47.177 CST] postgres  114217 LOG: execute S_4: COMMIT (postgres.c:6326)
    Line 340: Line 137580: [2017-12-20 22:04:57.173 CST] postgres  114217 LOG: execute S_4: COMMIT (postgres.c:6326)
    Line 344: Line 138829: [2017-12-20 22:05:07.162 CST] postgres  114217 LOG: execute S_4: COMMIT (postgres.c:6326)
  4. Starting from line 138829 of the log in step 3, search for UPDATE and DELETE SQL statements performed on the DBS_ACTION table by process 114217.
    Line 138829: [2017-12-20 22:05:07.162 CST] postgres  114217 LOG: execute S_4: COMMIT (postgres.c:6326)
    Line 141308: [2017-12-20 22:05:16.888 CST] postgres  114217 LOG: execute <unnamed>: UPDATE DBS_ACTION SET status=$1, version = $2+1, updated_at = (now() at time zone 'utc') WHERE id = $3 AND version = $4 (postgres.c:6326)
    Line 141309: [2017-12-20 22:05:16.888 CST] postgres  114217 DETAIL: parameters: $1 = 'DELETED', $2 = '1', $3 = '50fa2cb5-1817-4050-8a92-f767cc8e1e50', $4 = '1'
    Line 141314: [2017-12-20 22:05:16.891 CST] postgres  114217 LOG: execute <unnamed>: UPDATE DBS_ACTION SET status=$1, version = $2+1, updated_at = (now() at time zone 'utc') WHERE id = $3 AND version = $4 (postgres.c:6326)
    Line 141315: [2017-12-20 22:05:16.891 CST] postgres  114217 DETAIL: parameters: $1 = 'DELETED', $2 = '0', $3 = 'd2ee9f6b-b8a1-442a-867e-15cde72154c3', $4 = '0'

    The preceding query result indicates that the UPDATE DBS_ACTION SQL statement of process 114217 has not been committed or rolled back. As a result, process 114211 is blocked. Based on this conclusion, check and rectify the code error in the application. (If the lock must be released immediately, see the command after this procedure.)

  5. In the postgresql.conf file, change the value of log_statement to the original value recorded in step 1.c.
  6. Run the following command to save the configurations:

    :wq!

  7. Run the following command to reload the configurations:

    gs_ctl reload
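If the blocking transaction cannot be fixed on the application side immediately, the blocking backend identified in step 2 can be terminated as a temporary measure. Terminating the backend rolls back its open transaction and releases the lock. The PID below is the one from this example.

SELECT pg_terminate_backend(114217);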

Abnormal Connections to PostgreSQL
Symptom
[gausscore@dbs_gauss1 pg_log]$ gsql -p 8635 -q core -U core -W clouddb@123
 gsql: could not fork new process for connection: Cannot allocate memory

 could not fork new process for connection: Cannot allocate memory
 could not fork new process for connection: Cannot allocate memory

Check the following items:

  1. Check the memory status (for example, by running the free command). In this example, the memory is sufficient.

  2. Check the number of file handles opened by GaussDB processes.
    [root@dbs_gauss1 rds]# lsof | grep gausscore | wc -l
     16319
  3. Check the default resource limits of the user (for example, by running the ulimit -a command).

Troubleshooting
  1. Run the following command to edit the /etc/security/limits.conf file:

    vim /etc/security/limits.conf

  2. Add the following information to the file:
    gausscore soft nproc 5500
    gausscore hard nproc 5500
    gausscore soft nofile 1000000
    gausscore hard nofile 1000000
  3. Press Esc and run the following command to save the configurations:

    :wq!
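The new limits apply only to sessions started after the change. A quick check, assuming the gausscore user:

su - gausscore
ulimit -u     # maximum number of user processes (nproc)
ulimit -n     # maximum number of open file handles (nofile)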

Microsoft SQL Server

Microsoft SQL Server Parameter Failed to Take Effect After a Modification
Symptom

A user has changed the default value of a Microsoft SQL Server parameter but the modification does not take effect.

Possible Causes
  • The user does not reboot the database after modifying the parameter. The database needs to be rebooted for certain parameter modifications to take effect.
  • The user does not close the current session. The session needs to be reopened for certain parameter modifications to take effect.
  • The new value is improper, resulting in a Microsoft SQL Server failure.
  • Some parameters cannot be written to the configuration file and are automatically restored.
  • The DB instance is faulty.
Locating Method

Check whether the instance status is normal, understand the configuration features of the Microsoft SQL Server parameter to be modified, and use a correct method to change the parameter to a proper value. Reboot Microsoft SQL Server or reopen a session as required.

Procedure
  1. Use PuTTY to log in to the newRDS-Database01 node.

    Default account: dbs; default password: Changeme_123

  2. Run the following command to switch to the root user:

    su - root

    The default password of the root user is Cloud12#$.

  3. Run the following command to switch to the gausscore user:

    su gausscore

  4. Run the following command to log in to the CORE database:

    gsql -p Port number -q Database name -U CORE database username -W CORE user password

    For example, if the database name is core, the database username is core, the port number is 8635, and the database user password is RDSDB@Remote123, run the following command:

    gsql -p 8635 -q core -U core -W RDSDB@Remote123

  5. Connect to GaussDB and query the DB instance status in the dbs_instance table.

    CORE=> Select name,status from dbs_instance where name='DB instance name';

    • If the DB instance status is normal, the DB instance is normal.
    • If the DB instance status is abnormal, the DB instance is faulty. Restore the instance first.

  6. Check and reboot the database, or close the current connection and reopen a session.
  7. Set the parameter again following the instructions provided at the official Microsoft SQL Server website. (A reference example follows this procedure.)
  8. If the fault persists, contact technical support for assistance.
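For reference, server-level Microsoft SQL Server options can be inspected and changed with sp_configure. The max degree of parallelism option below is only an example; advanced options are visible only after show advanced options is enabled, and some options take effect only after RECONFIGURE or an instance restart.

EXEC sp_configure 'show advanced options', 1;
RECONFIGURE;
EXEC sp_configure 'max degree of parallelism', 4;
RECONFIGURE;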