HUAWEI CLOUD Stack 6.5.0 Alarm and Event Reference 04

ALM-73411 Database Files Damaged

Description

The file used for storing data in the database is damaged.

Attribute

| Alarm ID | Alarm Severity | Auto Clear |
| 73411    | Critical       | Yes        |

Parameters

| Name                | Meaning                                                                     |
| Fault Location Info | host_id: specifies the ID of the host for which the alarm is generated.     |
| Additional Info     | hostname: specifies the name of the host for which the alarm is generated.  |
|                     | Details: provides detailed information about the alarm.                     |

Impact on the System

If the alarm is generated, the data table to which the damaged file belongs cannot be accessed, and therefore services may be faulty.

Possible Causes

The system was powered off unexpectedly.

Procedure

  1. Use PuTTY to log in to the first FusionSphere OpenStack node through the IP address of the External OM plane.

    The default user name is fsp. The default password is Huawei@CLOUD8.

    The system supports both password authentication and public-private key pair authentication. If a key pair is used for login, see Using PuTTY to Log In to a Node in Key Pair Authentication Mode.

    NOTE:
    To obtain the IP address of the External OM plane, search for the required parameter on the Tool-generated IP Parameters sheet of the xxx_export_all.xlsm file exported from HUAWEI CLOUD Stack Deploy during software installation. The parameter names in different scenarios are as follows:
    • Region Type I scenario:

      Cascading system: Cascading-ExternalOM-Reverse-Proxy

      Cascaded system: Cascaded-ExternalOM-Reverse-Proxy

    • Region Type II and Region Type III scenarios: ExternalOM-Reverse-Proxy

  2. Run the following command and enter the password of user root to switch to user root:

    su - root

    The default password of user root is Huawei@CLOUD8!.

  3. Run the following command to disable user logout upon system timeout:

    TMOUT=0

  4. Import environment variables. For details, see Importing Environment Variables.
  5. Obtain the name of the component which generates the alarm and the ID of the node on which the alarm is generated.

    Query the alarm details and obtain the alarm object, which is the component name (shown as xxx in the commands below). Count the alarms that have the same object.

    File damage is related to a node fault. If multiple nodes are faulty, multiple alarms are generated. Record the IDs of all nodes on which alarms are generated.

  6. Obtain information about the active/standby deployment mode.

    Run the following command to obtain the service name of the component:

    cps template-list | grep xxx

    The first column indicates the service name, and yyy indicates the obtained service name, as shown in the following figure.

    Run the following command to obtain the active/standby deployment information of the component:

    cps template-instance-list --service yyy xxx

    GaussDB is used as an example, as shown in the following figure.
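    The service-name lookup in this step can be scripted. The following is a minimal sketch, not part of the product: service_for_component is a hypothetical helper, and it assumes that cps template-list prints pipe-delimited rows in the same style as the cps template-params-show output shown in this procedure, with the service name in the first column.

```shell
# Hypothetical helper (not a cps feature): read a pipe-delimited table on
# stdin and print the first column (assumed to be the service name) of every
# row in which some column equals the given component name.
service_for_component() {
    component="$1"
    awk -F'|' -v c="$component" '
        /^[ \t]*[|]/ {
            # Trim surrounding whitespace from each cell.
            for (i = 2; i < NF; i++) {
                gsub(/^[ \t]+|[ \t]+$/, "", $i)
            }
            for (i = 2; i < NF; i++) {
                if ($i == c) { print $2; break }
            }
        }'
}

# Usage on a node (xxx is the component name from the alarm object):
#   cps template-list | service_for_component xxx
```

    Matching on a whole cell rather than using grep avoids picking up rows where xxx is only a substring of another component name.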

  7. Perform the required operation based on the results of 5 and 6.

    • If the data file damage occurs in only the standby database, go to 8.
    • If the data file damage occurs in only the active database, go to 9.
    • If the data file damage occurs in both the active and standby databases, go to 10.
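    The branching above can be written down as a small lookup. This is only an illustrative sketch: next_step is a hypothetical helper, and the operator still has to determine from the alarms and the deployment information which databases are damaged.

```shell
# Hypothetical helper: map where the data file damage was found to the
# procedure step to continue with, as listed in the bullets above.
next_step() {
    case "$1" in
        standby) echo 8 ;;   # only the standby database is damaged
        active)  echo 9 ;;   # only the active database is damaged
        both)    echo 10 ;;  # active and standby databases are damaged
        *)       echo "usage: next_step standby|active|both" >&2; return 2 ;;
    esac
}

# Usage:
#   next_step standby
```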

  8. Rebuild the standby database.

    a. Log in to the node where the standby database is deployed.
    b. Import environment variables. For details, see Importing Environment Variables.
    c. Run the following command to check whether environment variables related to the certificate need to be imported:

      cps template-params-show --service yyy xxx | grep -E "database_use_ssl|force_cert_check"

      Information similar to the following is displayed:

      A074D328-D21D-B211-8FAE-001823E5F68B:~ # cps template-params-show --service gaussdb gaussdb | grep -E "database_use_ssl|force_cert_check"
      | database_use_ssl         | true                                               |
      | force_cert_check         | true                                               |
      Check whether the returned values are true in the command output.
      • If yes, run the following commands to import required environment variables:

        export PGSSLCERT="/opt/fusionplatform/data/gaussdb_data/client_cert/client.crt"

        export PGSSLKEY="/opt/fusionplatform/data/gaussdb_data/client_cert/client.key"

        export PGSSLMODE="verify-ca"

        export PGSSLROOTCERT="/opt/fusionplatform/data/gaussdb_data/client_cert/ca.crt"

      • If no, go to 8.d.
    d. Run the following commands to rebuild the database:

      su gaussdba

      gs_ctl build

      If information similar to the following is displayed, the commands are successfully executed.

      Check whether the rebuilding is successful.

      • If yes, restart the database and go to 12.
      • If no, contact technical support for assistance.
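    The certificate check and the conditional export can be combined in a script. The following is a minimal sketch under stated assumptions: ssl_env_required and import_ssl_env are hypothetical helper names, the input is the pipe-delimited output shown above, and "the returned values are true" is read as both database_use_ssl and force_cert_check being true.

```shell
# Hypothetical helper: succeed (exit status 0) only when both
# database_use_ssl and force_cert_check are "true" in the table on stdin.
ssl_env_required() {
    awk -F'|' '
        {
            key = $2; val = $3
            gsub(/[ \t]/, "", key)
            gsub(/[ \t]/, "", val)
            if ((key == "database_use_ssl" || key == "force_cert_check") && val == "true") {
                n++
            }
        }
        END { exit (n == 2 ? 0 : 1) }'
}

# Hypothetical helper: export the four variables listed in the procedure.
import_ssl_env() {
    cert_dir="/opt/fusionplatform/data/gaussdb_data/client_cert"
    export PGSSLCERT="$cert_dir/client.crt"
    export PGSSLKEY="$cert_dir/client.key"
    export PGSSLMODE="verify-ca"
    export PGSSLROOTCERT="$cert_dir/ca.crt"
}

# Usage on the database node:
#   cps template-params-show --service yyy xxx | ssl_env_required && import_ssl_env
```

    Because import_ssl_env runs after the pipeline rather than inside it, the exported variables remain set in the current shell for the commands that follow.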

  9. Rebuild the active database.

    a. Log in to the node where the active database is deployed.
    b. Import environment variables. For details, see Importing Environment Variables.
    c. Run the following command to perform an active/standby switchover:

      cps host-template-instance-operate --service yyy xxx --action swap

    d. Wait for 1 to 3 minutes, run the following command to query the status of the components, and check whether the switchover is successful:

      cps template-instance-list --service yyy xxx

      NOTE:

      If standby changes to active and active changes to standby, the switchover is successful.

      The switchover takes a certain period of time. If the switchover is not complete, wait for a period of time and then check whether the switchover is successful.

      • If yes, go to 9.e.
      • If no, contact technical support for assistance.
    e. Run the following command to check whether environment variables related to the certificate need to be imported:

      cps template-params-show --service yyy xxx | grep -E "database_use_ssl|force_cert_check"

      Information similar to the following is displayed:

      A074D328-D21D-B211-8FAE-001823E5F68B:~ # cps template-params-show --service gaussdb gaussdb | grep -E "database_use_ssl|force_cert_check"
      | database_use_ssl         | true                                               |
      | force_cert_check         | true                                               |
      Check whether the returned values are true in the command output.
      • If yes, run the following commands to import required environment variables:

        export PGSSLCERT="/opt/fusionplatform/data/gaussdb_data/client_cert/client.crt"

        export PGSSLKEY="/opt/fusionplatform/data/gaussdb_data/client_cert/client.key"

        export PGSSLMODE="verify-ca"

        export PGSSLROOTCERT="/opt/fusionplatform/data/gaussdb_data/client_cert/ca.crt"

      • If no, go to 9.f.
    f. Run the following commands to rebuild the database:

      su gaussdba

      gs_ctl build

      If information similar to the following is displayed, the commands are successfully executed.

      Check whether the rebuilding is successful.

      • If yes, restart the database and go to 12.
      • If no, contact technical support for assistance.
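    The switchover in this step takes 1 to 3 minutes, so the status query usually has to be repeated. A generic polling loop is one way to do this. The sketch below is an assumption-laden illustration: wait_until is a hypothetical helper, and the check it runs (called switchover_done in the usage comment) must be supplied by the operator, because the exact layout of the cps template-instance-list output is not reproduced here.

```shell
# Hypothetical helper: run COMMAND repeatedly until it succeeds or until
# TIMEOUT seconds of waiting have accumulated.
# Usage: wait_until TIMEOUT_SECONDS INTERVAL_SECONDS COMMAND [ARGS...]
wait_until() {
    timeout="$1"
    interval="$2"
    shift 2
    elapsed=0
    while ! "$@"; do
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1    # gave up: the condition never became true
        fi
        sleep "$interval"
        elapsed=$((elapsed + interval))
    done
    return 0
}

# Usage (switchover_done is a hypothetical operator-written check that
# inspects the output of: cps template-instance-list --service yyy xxx):
#   wait_until 180 15 switchover_done || echo "switchover not confirmed"
```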

  10. Restore the active and standby databases.

    If both the active and standby databases are abnormal, this operation will cause data loss. Therefore, exercise caution when performing this operation. If you have any questions, contact technical support for assistance.

    If you decide to perform the restoration, you are advised to perform a system audit after the restoration.

    a. Log in to the node where the active database is deployed.
    b. Find the damaged database.

      Check additional information about the alarm and find the damaged database (multiple damaged databases may exist).

    c. Run the following commands to log in to the faulty database (the Nova database is used as an example):

      su gaussdba

      gsql nova

      NOTE:

      When you run the gsql nova command, the system prompts you to enter the password. The default password is FusionSphere123. If the database password has been changed, use the actual password.

    d. Run the following command so that damaged pages are zeroed out instead of causing read errors (the data on those pages is lost):

      set zero_damaged_pages=on;

    e. Reconstruct the data table.

      vacuum full;

      This operation may take several minutes. The duration of this operation depends on the amount of data in the data table.

      Check whether any error message is displayed during the command execution.

      • If no, go to 10.f.
      • If yes, contact technical support for assistance.
    f. If multiple databases are damaged, repeat 10.c to 10.e on each database.
    g. Rebuild the standby database.
      i. Log in to the node where the standby database is deployed.
      ii. Import environment variables. For details, see Importing Environment Variables.
      iii. Run the following command to check whether environment variables related to the certificate need to be imported:

        cps template-params-show --service yyy xxx | grep -E "database_use_ssl|force_cert_check"

        Information similar to the following is displayed:

        A074D328-D21D-B211-8FAE-001823E5F68B:~ # cps template-params-show --service gaussdb gaussdb | grep -E "database_use_ssl|force_cert_check"
        | database_use_ssl         | true                                               |
        | force_cert_check         | true                                               |
        Check whether the returned values are true in the command output.
        • If yes, run the following commands to import required environment variables:

          export PGSSLCERT="/opt/fusionplatform/data/gaussdb_data/client_cert/client.crt"

          export PGSSLKEY="/opt/fusionplatform/data/gaussdb_data/client_cert/client.key"

          export PGSSLMODE="verify-ca"

          export PGSSLROOTCERT="/opt/fusionplatform/data/gaussdb_data/client_cert/ca.crt"

        • If no, go to 10.g.iv.
      iv. Run the following commands to rebuild the database:

        su gaussdba

        gs_ctl build

        If information similar to the following is displayed, the commands are successfully executed.

        Check whether the rebuilding is successful.

        • If yes, go to 11.
        • If no, contact technical support for assistance.

  11. Restart all component instances.

    Run the cps host-template-instance-operate --service yyy xxx --action stop command to stop the component.

    Run the cps template-instance-list --service yyy xxx command to query the status. Wait until fault is displayed for status and then start the component.

    Run the cps host-template-instance-operate --service yyy xxx --action start command to start the component.

    Run the cps template-instance-list --service yyy xxx command to query the statuses again. Wait until the component statuses become active and standby.

    Check whether all component instances are normal.

    • If yes, go to 13.
    • If no, contact technical support for assistance.
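    The stop, wait, and start sequence of this step can be sketched as a script. This is an illustration only: restart_component, CPS, MAX_TRIES, and POLL_SECONDS are hypothetical names, and the sketch assumes that the word fault appears in the cps template-instance-list output once the component has stopped, as described above. Checking that the statuses return to active and standby afterwards is left to the operator.

```shell
# CPS can be pointed at a stub for dry runs; it defaults to the real command.
CPS="${CPS:-cps}"

# Hypothetical helper: stop a component, wait until "fault" shows up in the
# status listing, then start the component again.
# Usage: restart_component SERVICE TEMPLATE
restart_component() {
    service="$1"
    template="$2"
    "$CPS" host-template-instance-operate --service "$service" "$template" --action stop || return 1

    tries=0
    until "$CPS" template-instance-list --service "$service" "$template" | grep -q 'fault'; do
        tries=$((tries + 1))
        if [ "$tries" -ge "${MAX_TRIES:-20}" ]; then
            return 1    # the component never reported the fault status
        fi
        sleep "${POLL_SECONDS:-15}"
    done

    "$CPS" host-template-instance-operate --service "$service" "$template" --action start
}

# Usage on a node (yyy is the service name, xxx the component name):
#   restart_component yyy xxx
```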

  12. Restart a single component instance.

    Run the cps host-template-instance-operate --service yyy xxx --host host_id --action stop command to stop the component.

    Replace host_id with the actual node ID.

    Run the cps template-instance-list --service yyy xxx command to query the component status. Wait until the status of the stopped instance becomes fault.

    Run the cps host-template-instance-operate --service yyy xxx --host host_id --action start command to start the component.

    Run the cps template-instance-list --service yyy xxx command to query the status again. Wait until the component status becomes normal.

    Check whether the component status is normal.

    • If yes, go to 13.
    • If no, contact technical support for assistance.

  13. After 5 to 10 minutes, check whether the alarm is cleared.

    • If yes, no further action is required.
    • If no, contact technical support for assistance.

Related Information

None

Updated: 2019-08-30

Document ID: EDOC1100062365
