HUAWEI CLOUD Stack 6.5.0 Alarm and Event Reference 04

ManageOne Management


ALM-101209 Switchover Triggered by a Faulty Node

Description

An OMMHA switchover is triggered either automatically, as a failover when the active OMMHA node is faulty, or manually by a user. This alarm is generated when the failover or manual switchover is complete. The alarm is automatically cleared after the switchover is complete and the nodes are normal.

Attribute

Alarm ID    Alarm Severity    Alarm Type
101209      Critical          Protection switching

Parameters

Name         Meaning
Node Name    Name of the deployment node
Site Name    Name of the site for which the alarm is generated

Impact on the System

  • The floating IP address is switched from the active node to the standby node.
  • Users cannot log in to the management plane while the floating IP address is being switched.
  • The single-instance processes, including the critical process, are stopped on the original active node. They are started on the standby node as it becomes the active node.

Possible Causes

  • The hardware of the original active VM is faulty.
  • The original active node is powered off unexpectedly.
  • The critical process on the original active node is abnormal.
  • The CPU and memory usage of the original active node is too high.
  • The network of the original active node is faulty.
  • The floating IP address fails to be mounted.
  • The user performed a manual switchover.

Procedure

  1. Check whether the hardware of the original active VM is faulty and whether the original active node is powered off unexpectedly.

    If the OMMHA switchover occurs because the hardware of the original active VM is faulty or the original active node is powered off abnormally, contact the administrator to rectify the fault.

  2. Check whether the processes on the active node are running properly.

    1. Use PuTTY to log in to the node for which this alarm is generated in SSH mode as the sopuser user. Obtain the IP address of the node from Related Information.

      The default password of the sopuser user is D4I$awOD7k.

    2. Run the following command to switch to the ossadm user:

      su - ossadm

      The default password of the ossadm user is Changeme_123.

    3. Run the following commands to query the statuses of the active and standby nodes:

      > cd /opt/oss/Product/apps/OMMHAService/bin

      > bash status.sh

      NOTE:

      For the management node, replace /opt/oss/SOP with /opt/oss/manager.

      Information similar to the following is displayed:

      NodeName    HostName    ...    HAAllResOK    HARunPhase
      ha2         Service-2   ...    normal        Actived
      ha1         Service-1   ...    normal        Deactived
      
      NodeName    ResName     ...    ResHAStatus    ResType
      ha2         RMCritical  ...    Normal         Active_standby
      ha2         RMFloatIp   ...    Normal         Single_active
      ha2         RMIrNic     ...    Normal         Single_active
      ha1         RMCritical  ...    Normal         Active_standby
      ha1         RMFloatIp   ...    Normal         Single_active
      ha1         RMIrNic     ...    Normal         Single_active
      • If all values in the HAAllResOK and ResHAStatus columns are Normal, the switchover is complete and the nodes are normal. This alarm is automatically cleared.
      • If any value in the HAAllResOK or ResHAStatus column is not Normal, the node is faulty or the switchover is not complete.

  3. Wait 3 minutes. Check whether the alarm is cleared.

    • If yes, no further action is required.
    • If no, collect the alarm handling information and contact technical support.
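
The check in step 2 can be scripted when many nodes need to be inspected. The following is a minimal sketch, assuming the output of status.sh consists of whitespace-separated tables like the sample shown in the procedure, with no spaces inside any value. The function name ha_nodes_normal is hypothetical, and the comparison ignores case because the sample shows both "normal" and "Normal".

```python
def ha_nodes_normal(status_output: str) -> bool:
    """Return True if every HAAllResOK and ResHAStatus value is 'normal'."""
    checked = 0
    column = None  # index of the status column in the current table
    for line in status_output.splitlines():
        fields = line.split()
        if not fields:
            column = None  # a blank line ends the current table
            continue
        # A header row names the column to check in the rows that follow.
        for name in ("HAAllResOK", "ResHAStatus"):
            if name in fields:
                column = fields.index(name)
                break
        else:
            if column is None or column >= len(fields):
                continue  # not a data row of a table we recognise
            if fields[column].lower() != "normal":
                return False
            checked += 1
    return checked > 0  # no rows checked means nothing was verified


# Shortened copy of the sample output from step 2 (the "..." columns are
# elided in the source and are kept here as literal placeholders).
SAMPLE = """\
NodeName    HostName    ...    HAAllResOK    HARunPhase
ha2         Service-2   ...    normal        Actived
ha1         Service-1   ...    normal        Deactived

NodeName    ResName     ...    ResHAStatus    ResType
ha2         RMCritical  ...    Normal         Active_standby
ha1         RMFloatIp   ...    Normal         Single_active
"""

print(ha_nodes_normal(SAMPLE))
```

A result of False corresponds to the second bullet of step 2: the node is faulty or the switchover is not complete.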

Related Information

The following describes how to query the management IP address of the node:

  1. In the address box of the web browser, enter https://Client IP address of the Deployment Portal:31945 and press Enter.
  2. Enter the username and password, and click Log In.
  3. Choose Application > System Monitoring from the main menu. In the upper left corner of the System Monitoring page, move the pointer to and select the product. On the System Monitoring page, click the Nodes tab.
  4. In the Node Name column, click the name of the node whose management IP address you need to query.
  5. On the node details page, check the IP address in the upper part. This is the management IP address of the node.

ALM-51023 NTP service abnormal

Description

By default, the management plane checks the status of the NTP server on the deployment node and product nodes every five minutes. This alarm is generated when three consecutive checks find that the NTP server exists but is not functioning properly. The alarm is automatically cleared when three consecutive checks find that the NTP server exists and is functioning properly.
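
The three-consecutive-checks rule can be illustrated with a small state holder. This is only a sketch of the rule as described above, not the management plane's actual implementation. The class name NtpAlarmState and its methods are hypothetical, and the case where no NTP server exists is not modelled.

```python
class NtpAlarmState:
    """Raise after N consecutive failed checks, clear after N consecutive good ones."""

    def __init__(self, threshold: int = 3) -> None:
        self.threshold = threshold
        self.active = False  # whether the alarm is currently raised
        self._streak = 0     # consecutive checks that contradict the current state

    def record_check(self, ntp_ok: bool) -> bool:
        """Feed one check result; return whether the alarm is active afterwards."""
        contradicts = ntp_ok if self.active else not ntp_ok
        self._streak = self._streak + 1 if contradicts else 0
        if self._streak >= self.threshold:
            self.active = not self.active
            self._streak = 0
        return self.active


state = NtpAlarmState()
checks = [False, False, True, False, False, False, True, True, True]
history = [state.record_check(ok) for ok in checks]
print(history)
```

In this run the two early failures do not raise the alarm because a successful check interrupts them. The alarm is raised on the third failure in a row and cleared on the third success in a row.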

Attribute

Alarm ID    Alarm Severity    Alarm Type
51023       Critical          Processing error alarm

Parameters

Name           Meaning
Node Name      Name of the node for which the alarm is generated
Site Name      Name of the site for which the alarm is generated
NTP Address    Address of the abnormal NTP service on the node

Impact on the System

If ManageOne services fail to synchronize time from the upper-layer NTP server, the time on devices in the network may be inaccurate. Operations that record timestamps, such as backup and restoration and operation logging, may then go wrong: an incorrect backup package may be restored, or logs may be obtained incorrectly. As a result, service processing efficiency may be affected.

Possible Causes

The possible causes of this alarm are as follows:

  • The NTP service of the NTP server is abnormal.
  • The time on the NTP servers is inconsistent.
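
The second cause can be checked by comparing clock readings taken from each NTP server at the same moment. The sketch below assumes such readings are already available. The function name, the server names, and the one-second tolerance are all assumptions made for illustration, because this document does not state how large a difference counts as inconsistent.

```python
from datetime import datetime, timedelta
from typing import Mapping


def ntp_times_consistent(
    readings: Mapping[str, datetime],
    tolerance: timedelta = timedelta(seconds=1),  # assumed value, not from the product
) -> bool:
    """Return True if all server clocks lie within `tolerance` of each other."""
    if len(readings) < 2:
        return True  # a single server cannot disagree with itself
    times = list(readings.values())
    return max(times) - min(times) <= tolerance


# Hypothetical readings taken at the same instant from two servers.
now = datetime(2019, 8, 30, 12, 0, 0)
readings = {"ntp-a": now, "ntp-b": now + timedelta(seconds=5)}
print(ntp_times_consistent(readings))
```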

Procedure

  1. Use a browser to log in to ManageOne Deployment Portal.

    URL: https://Floating IP address of ManageOne Deployment Portal:31945, for example, https://192.168.0.1:31945

    Default account: admin; default password: Huawei12#$

  2. Choose Maintenance > Time Management > Configure NTP from the management plane main menu. On the Configure NTP page, check the number of NTP servers.

    • If there are multiple NTP servers, check whether the time between the NTP servers is consistent.
      • If yes, go to 4 and 6.
      • If no, go to 3.
    • If there is only one NTP server, go to 4 and 6.

  3. Adjust the system time of the NTP server whose time is incorrect by referring to its configuration document. Check whether the alarm is cleared.

    • If yes, no further action is required.
    • If no, go to 4 and 6.

  4. Use PuTTY to log in to the deployment node as the sopuser user in SSH mode.

    The default password of the sopuser user is D4I$awOD7k.

  5. Run the following command to switch to the ossadm user:

    su - ossadm

    The default password of the ossadm user is Changeme_123.

  6. Check whether the NTP service has started.

    On EulerOS, run the following command:

    > service ntpd status

    If the command output contains "active (running)", the NTP service has been started on the deployment node. Otherwise, the NTP service is not started. Switch to the root user and run the following commands to start it:

    > su - root

    Password: Password of the root user

    # service ntpd start

  7. Check whether the alarm is cleared.

    • If yes, no further action is required.
    • If no, contact technical support.
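
The check in step 6 can be scripted as follows. This is a minimal sketch, assuming it runs on the deployment node as a user who is allowed to run the status command. The function names are hypothetical. The only rule taken from this document is that the output must contain "active (running)".

```python
import subprocess


def ntpd_is_running(status_output: str) -> bool:
    """Apply the rule from step 6 to the text printed by `service ntpd status`."""
    return "active (running)" in status_output


def query_ntpd_status() -> str:
    """Run the status command from step 6 and return everything it printed."""
    result = subprocess.run(
        ["service", "ntpd", "status"],
        capture_output=True,
        text=True,
        check=False,  # a stopped service returns a non-zero exit code
    )
    return result.stdout + result.stderr
```

Calling ntpd_is_running(query_ntpd_status()) on the node returns False when the service still needs to be started as described in step 6.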

Related Information

None

ALM-101211 Database Instance Failover

Description

This alarm is generated when a database instance failover occurs. After the fault is rectified, this alarm needs to be manually cleared.

Attribute

Alarm ID    Alarm Severity    Alarm Type
101211      Major             Processing error alarm

Parameters

Name                Meaning
Host                Hostname of the abnormal database instance.
Database service    Name of the database instance for which this alarm is generated.
DB type             Type of the database instance.

Impact on the System

Abnormal status of the replication between database instances may cause data loss.

Possible Causes

  • The node where the master database instance is located is faulty or powered off.
  • The master database instance is faulty or stops running.

Procedure

  1. On the Current Alarms page, click the alarm name to view the host name, instance name, and type of the faulty database instance.
  2. Check whether the alarm "The database local copy status is abnormal" is generated.

  3. On the Current Alarms page, select the alarm "Database Instance Failover" and click Clear to manually clear the alarm.

Clearing

Manually clear this alarm on the Alarms page.

Related Information

None

ALM-101212 Failed to connect ZooKeeper

Description

This alarm is generated when DBHASwitchService fails to connect to ZooKeeper. While the connection is down, the database switchover is abnormal. This alarm is automatically cleared when the connection between DBHASwitchService and ZooKeeper recovers.

Attribute

Alarm ID    Alarm Severity    Alarm Type
101212      Major             Processing error alarm

Parameters

Name                 Meaning
Host                 Hostname of the node where the abnormal ZooKeeper service is located
Zookeeper service    Name of the service

Impact on the System

The database access route cannot be updated because the database switchover cannot be used.

Possible Causes

  • More than half of the ZooKeeper services are faulty.
  • More than half of the nodes where ZooKeeper is deployed are faulty, or the network is faulty.
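
Both causes come down to ZooKeeper's majority rule: an ensemble can serve requests only while a strict majority of its servers is healthy and can reach each other. The sketch below expresses that rule. The function name is hypothetical, and the ensemble sizes in the example are illustrative rather than taken from a ManageOne deployment.

```python
def zookeeper_has_quorum(ensemble_size: int, healthy: int) -> bool:
    """Return True if `healthy` servers form a strict majority of the ensemble."""
    if ensemble_size <= 0 or not 0 <= healthy <= ensemble_size:
        raise ValueError("healthy must be between 0 and a positive ensemble_size")
    return 2 * healthy > ensemble_size


# A three-server ensemble tolerates one failure but not two.
for healthy in (3, 2, 1):
    print(healthy, zookeeper_has_quorum(3, healthy))
```

With an even number of servers, losing exactly half of them also breaks the majority, so zookeeper_has_quorum(4, 2) is False.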

Procedure

  1. Use a browser to log in to ManageOne Deployment Portal.

    URL: https://Floating IP address of ManageOne Deployment Portal:31945, for example, https://192.168.0.1:31945

    Default account: admin; default password: Huawei12#$

  2. Check the running status of the MCZKService service process.

    1. On the main menu of the management plane, choose Application > Service Management > System Monitoring.
    2. On the upper left corner of the System Monitoring page, move the pointer to and select CloudSOP-UniEP.
    3. On the Services tab page, click UniEPMgr.
    4. In the Process area, view the Status of all processes starting with mczkapp.
      • If the Status is Running, the MCZKService service process is running properly. Go to 3.
      • If the Status is Starting or Stopping, wait for the state to change. Starting or stopping a service normally takes less than 1 minute. If the service remains in this state for a long time, contact technical support.
      • If the Status is Faulty, Unknown, or Not Running, the MCZKService service process is abnormal. Contact technical support.

  3. In the Process area, view and record the Node Name specified by the MCZKService service process.
  4. Check the Connection Status of the nodes where the MCZKService service is deployed.

    1. On the upper left corner of the System Monitoring page, move the pointer to and select CloudSOP-UniEP.
    2. On the Nodes tab page, check the Connection Status of the node recorded in 3.
      • If the Connection Status is Normal, the node where the MCZKService service is deployed is running properly. Go to 5.
      • If the Connection Status is Disconnect, the node where the MCZKService service is deployed is not running properly. Contact technical support.

  5. Check whether the alarm is cleared.

    • If yes, no further action is required.
    • If no, collect the alarm handling information and contact technical support.
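
The decisions in steps 2 and 4 can be summarized as two lookups. This sketch only restates the procedure above. The function names are hypothetical, and the status strings are the ones listed in the steps.

```python
def action_for_process_status(status: str) -> str:
    """Map the Status of an mczkapp process (step 2) to the next action."""
    if status == "Running":
        return "go to step 3"
    if status in ("Starting", "Stopping"):
        return "wait; contact technical support if the state persists"
    if status in ("Faulty", "Unknown", "Not Running"):
        return "contact technical support"
    raise ValueError(f"unexpected process status: {status!r}")


def action_for_connection_status(status: str) -> str:
    """Map the Connection Status of a node (step 4) to the next action."""
    if status == "Normal":
        return "go to step 5"
    if status == "Disconnect":
        return "contact technical support"
    raise ValueError(f"unexpected connection status: {status!r}")


print(action_for_process_status("Running"))
print(action_for_connection_status("Disconnect"))
```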

Clearing

This alarm is automatically cleared when the connection between DBHASwitchService and ZooKeeper recovers.

Related Information

None

Updated: 2019-08-30

Document ID: EDOC1100062365
