HUAWEI CLOUD Stack 6.5.0 Alarm and Event Reference 04

ALM-37006 Coordinator Process Is Abnormal


Description

This alarm is generated if:
  • The hardware of the computer where the CN is located is faulty (for example, the power is cut off or the hard disk is damaged).
  • The postgresql.conf configuration file does not exist in the Coordinator instance data directory, or a parameter in the file is incorrectly configured.
  • The Coordinator instance thread cannot listen on its IP address or bind to its listening port.
  • The CN instance process does not have read/write permission on its data directory, or the data directory is lost.
  • The virtual IP address to which the Coordinator instance is bound is abnormal.

Attribute

Alarm ID    Alarm Severity    Auto Clear
37006       Major             No

Parameters

Name           Meaning
ServiceName    Identifies the service for which the alarm is generated.
RoleName       Identifies the role for which the alarm is generated.
HostName       Identifies the host for which the alarm is generated.
Instance       Identifies the instance for which the alarm is generated.

Impact on the System

If the Coordinator instance fails to start, the cluster reports a start failure and the database system does not support data definition language (DDL) statements. Data manipulation language (DML) statements still work normally.

After about 30 minutes, the system automatically deletes the faulty CN. You can run the gs_om -t status --detail command to confirm that the CN status is Deleted. From this point on, both DDL and DML statements can be used.

In this case, do not restart the MPPDB service directly. Follow the instructions provided in Procedure.
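The status check described above can be scripted. The following is a minimal sketch, not a documented tool: cn_is_deleted is a hypothetical helper, and because this document does not show the layout of the gs_om output, the sketch only assumes that the faulty node's name and the word Deleted appear on the same line.

```shell
# Hypothetical helper: reads the output of `gs_om -t status --detail` on stdin
# and succeeds if any line that mentions the given node name also contains
# the whole word "Deleted".
# Assumption: node name and status share one output line; adjust the matching
# if the real output is laid out differently.
cn_is_deleted() {
    local node="$1"
    grep -F -- "$node" | grep -qw "Deleted"
}
```

On the LibrA server, after loading the environment variables as user omm, it could be used as: gs_om -t status --detail | cn_is_deleted "<faulty node name>" && echo "CN deleted".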

Possible Causes

  • The hardware of the computer where the CN is located is faulty (for example, the power is cut off or the hard disk is damaged).
  • The postgresql.conf configuration file does not exist in the Coordinator instance data directory, or a parameter in the file is incorrectly configured.
  • The Coordinator instance thread cannot listen on its IP address or bind to its listening port.
  • The CN instance process does not have read/write permission on its data directory, or the data directory is lost.
  • The virtual IP address to which the Coordinator instance is bound is abnormal.
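Two of the causes above (a missing postgresql.conf and a lost or inaccessible data directory) can be checked quickly from the command line. The sketch below is illustrative only: check_cn_datadir is a hypothetical helper, and the data directory path must be supplied by the operator because this document does not state where it is located. Run it as the user that owns the CN process so that the permission test is meaningful.

```shell
# Hypothetical helper: checks a Coordinator data directory for three of the
# conditions listed above and prints the first problem it finds.
# Argument: path to the CN data directory (an assumption; supply your own).
check_cn_datadir() {
    local datadir="$1"
    if [ ! -d "$datadir" ]; then
        echo "data directory missing"
        return 1
    fi
    # -r / -w test the permissions of the user running this function.
    if [ ! -r "$datadir" ] || [ ! -w "$datadir" ]; then
        echo "no read/write permission on data directory"
        return 1
    fi
    if [ ! -f "$datadir/postgresql.conf" ]; then
        echo "postgresql.conf missing"
        return 1
    fi
    echo "ok"
}
```

The function does not validate parameter values inside postgresql.conf, and it does not test IP addresses or ports; those causes still need to be investigated separately.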

Procedure

  1. Log in to the FusionInsight Manager.

    1. Log in to the ManageOne OM plane using a browser, then choose Alarms.
      • Login address: https://URL for the homepage of the ManageOne OM plane:31943. Example: https://oc.type.com:31943.
      • Default username: admin, default password: Huawei12#$.
    2. In the alarm list, locate and click the target alarm name in the Name column. The Alarm Details and Handling Recommendations dialog box is displayed.
    3. Locate the value in the IP Address/URL/Domain Name column, which is the floating IP address of the FusionInsight Manager.
    4. Log in to the FusionInsight Manager using a browser.
      • Login address: https://floating IP address of the FusionInsight Manager:28443/web. Example: https://10.10.192.100:28443/web.
      • Default username: admin. Obtain the password from the system administrator.

  2. If a device partition loss alarm is generated shortly before or after this alarm, rectify the fault by referring to "Hard Disk Troubleshooting" in the Product Documentation.
  3. After the alarm is reported, wait 5 minutes, then click Alarms on FusionInsight Manager to check whether the alarm persists.

    • If yes, go to 4.
    • If no, no further action is required.

  4. Log in to the LibrA server as user omm and run the source ${BIGDATA_HOME}/mppdb/.mppdbgs_profile command to load the environment variables. Then run the following command to check whether the CN status of the faulty node is Deleted.

    Default user: omm, default password: Bigdata123@.

    gs_om -t status --detail

    • If yes, go to 5.
    • If no, go to 11.

  5. On FusionInsight Manager, choose Services > MPPDB. Click Instances. In the instance list, select the MPPDBServer of the faulty node.
  6. Click Instance Configuration, set Type to All, and enter parameter mppdb.coo.number in the search box.
  7. Set the value of mppdb.coo.number to 0 and click Save Configuration.
  8. In the dialog box that is displayed, click OK. When the system displays Operation succeeded, click Finish. Check whether the operation is successful.

    • If yes, manually clear the CN process alarm after deleting the faulty CN.
    • If no, go to 11.

  9. (Optional) To restore the CN of the faulty node, repeat steps 5 to 6, set mppdb.coo.number to 1, and click Save Configuration.
  10. (Optional) In the dialog box that is displayed, click OK. When the system displays Operation succeeded, click Finish. Check whether the operation is successful.

    • If yes, the CN is added.
    • If no, go to 11.

  11. On FusionInsight Manager, choose System > Log Download.
  12. Select MPPDB from the Services drop-down list box and click OK.
  13. Set Start Time for log collection to 1 hour before the alarm generation time and End Time to 1 hour after the alarm generation time, and click Download.
  14. Contact Technical Support and send the collected logs.
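The log collection window in step 13 is simply the alarm generation time minus and plus one hour. As a convenience, the values can be computed as in the sketch below. log_window is a hypothetical helper, and it assumes GNU date (the -d option), which is not something this document specifies.

```shell
# Hypothetical helper: prints the Start Time and End Time for log collection,
# one per line, given the alarm generation time.
# Assumption: GNU date is available; times are interpreted in the local zone.
log_window() {
    local alarm_time="$1" epoch
    # Convert to seconds since the epoch so the offsets are plain arithmetic.
    epoch=$(date -d "$alarm_time" +%s) || return 1
    date -d "@$((epoch - 3600))" '+%Y-%m-%d %H:%M:%S'   # Start Time
    date -d "@$((epoch + 3600))" '+%Y-%m-%d %H:%M:%S'   # End Time
}
```

For example, log_window "2019-08-30 10:15:00" prints 2019-08-30 09:15:00 followed by 2019-08-30 11:15:00. Enter these values manually in the Log Download page.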

Alarm Clearing

After the fault is rectified, the system does not clear this alarm automatically. You must clear it manually.

Related Information

None

Updated: 2019-08-30

Document ID: EDOC1100062365
