HUAWEI CLOUD Stack 6.5.0 Alarm and Event Reference 04

Alarm Reference

ALM-servicemonitor_agent_heartbeat Node Disconnection Alarm

Alarm Description

MOICAgent is deployed on the node where cloud services and ManageOne are located for monitoring cloud services and ManageOne, collecting cloud service logs, and backing up the ManageOne database. In normal cases, MOICAgent periodically reports heartbeat messages. If no heartbeat message is reported within 30 minutes, this alarm is reported, indicating that the monitored node is disconnected.
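
The 30-minute heartbeat window described above can be sketched as a small shell check. This is only an illustration: the function name `heartbeat_status`, the variable `HEARTBEAT_TIMEOUT_SECONDS`, and the use of epoch seconds are assumptions, not part of ManageOne, and the actual detection logic is not documented here.

```shell
#!/bin/sh
# Hypothetical helper, not a ManageOne tool.
# 1800 seconds = the 30-minute window from the alarm description.
HEARTBEAT_TIMEOUT_SECONDS=1800

# heartbeat_status LAST_HEARTBEAT_EPOCH NOW_EPOCH
# Prints "disconnected" if the last heartbeat is at least 30 minutes old,
# otherwise prints "connected".
heartbeat_status() {
    last=$1
    now=$2
    if [ $((now - last)) -ge "$HEARTBEAT_TIMEOUT_SECONDS" ]; then
        echo "disconnected"
    else
        echo "connected"
    fi
}
```

For example, `heartbeat_status "$last_seen" "$(date +%s)"` compares a stored timestamp (here the assumed variable `last_seen`) with the current time.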

Alarm Attribute

Alarm ID                          Alarm Severity    Alarm Type
servicemonitor_agent_heartbeat    Minor             Communication alarm

Alarm Parameters

Parameter                        Description
IP Address of Monitored Node     IP address of the monitored node that is disconnected from MOICAgent
IP Address of Monitoring Node    IP address of the MOICAgentMgmtService node

Impact on the System

  • On the abnormal node, cloud service monitoring, cloud service log collection, and ManageOne database backup fail.
  • If the disconnected node is netcluster_elb_lvs_vm_x_x, ELB traffic will be interrupted for 10 seconds.
  • If the disconnected node is netcluster_elb_nginx_vm_x_x, ELB traffic will be interrupted for 15 seconds.

Possible Causes

  • The network between the monitored node and the node where MOICAgentMgmtService of ManageOne is located is faulty, or the MOICAgent process is abnormal.
  • The certificate of the monitored node and that of the node where MOICAgentMgmtService of ManageOne is located are different.

Procedure

  1. Obtain the IP addresses of the faulty node and the MOICAgentMgmtService service.

    1. If the alarm is a root alarm, click the Details tab page on ManageOne Maintenance Portal and check the value of Location Info to obtain the host IP address of the abnormal node and the IP address of the node where MOICAgentMgmtService resides.
    2. If the alarm is an aggregation alarm, click the Original Alarms tab page on ManageOne Maintenance Portal and check the value of Location Info to obtain the host IP address of the abnormal node and the IP address of the node where MOICAgentMgmtService resides.

  2. Check whether network faults exist.

    1. Use PuTTY to log in to the node where MOICAgentMgmtService resides as a system user with the required permission, and then perform the following steps to switch to the root user.
      1. Log in to the node where MOICAgentMgmtService resides as the sopuser user.
      2. Run the following command to switch to the root user:

        sudo su root

    2. Run the following command to check whether the network between the abnormal node and the node where MOICAgentMgmtService resides is normal:

      ping IP address of the abnormal node

      • If the ping fails or packets are lost, go to 5.
      • If the network is normal, go to 3.

  3. Collect process information.

    1. Use PuTTY to log in to the node where the alarm is generated as a system user who has the login permission and switch to the root user.
      • To log in to the ManageOne node as the root user, perform the following steps:
        1. Log in to the node where the alarm is generated as the sopuser user.
        2. Run the following command to switch to the root user:

          sudo su root

      • For details about how to log in to a non-ManageOne node as the root user, see the Type A sheet in HUAWEI CLOUD Stack 6.5.0 Account List.
    2. Run the following command to check whether the MOICAgent process is normal. If no command output is displayed, the process is abnormal:

      ps -ef | grep moicagent | grep python

    3. If the process is normal, go to 5. Otherwise, run the required command to start the MOICAgent process:
      • ManageOne node:

        su ossadm -c ". /opt/oss/manager/agent/bin/engr_profile.sh;ipmc_adm -cmd restartapp -tenant manager -app MOICAgent"

      • Non-ManageOne node:

        sh /home/moicagent/bin/manual/mstart.sh

    4. Repeat 3.b to check whether the MOICAgent process is normal.
      • If yes, wait for 2 to 3 minutes and check whether the alarm is automatically cleared. If the alarm is cleared, no further action is required. Otherwise, go to 5.
      • If no, go to 4.

  4. Install MOICAgent. For details, see section "Installing MOICAgent on a New Physical Server" of HUAWEI CLOUD Stack 6.5.0 Capacity Expansion Guide.
  5. Contact technical support for assistance.
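
The decision flow in steps 2 and 3 can be sketched as follows. This is a rough illustration under stated assumptions: the function names are hypothetical, `ping -c 3 -W 2` assumes the Linux iputils `ping`, and the two checks run on different nodes (the ping on the MOICAgentMgmtService node, the process check on the node that raised the alarm), so `next_step` takes their results as plain yes/no arguments.

```shell
#!/bin/sh
# Illustrative sketch only; these functions are not shipped with ManageOne.

# node_reachable IP: succeeds if the node answers at least one of 3 pings.
node_reachable() {
    ping -c 3 -W 2 "$1" >/dev/null 2>&1
}

# moicagent_running: succeeds if the process check from step 3 prints anything.
moicagent_running() {
    [ -n "$(ps -ef | grep moicagent | grep python)" ]
}

# next_step REACHABLE RUNNING: maps the two check results (yes/no)
# to the next action in the procedure.
next_step() {
    if [ "$1" != "yes" ]; then
        echo "network fault: go to step 5 (contact technical support)"
    elif [ "$2" != "yes" ]; then
        echo "process abnormal: restart MOICAgent (step 3.c)"
    else
        echo "process normal: go to step 5 (contact technical support)"
    fi
}
```

Keeping `next_step` free of side effects means the results gathered on each node can be combined afterwards, for example `next_step yes no` once the ping succeeded and the process check printed nothing.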

Alarm Clearance

This alarm is automatically cleared when MOICAgent reports a new heartbeat message.

Related Information

None

ALM-servicemonitor_heartbeat ServiceMonitor's heartbeat abnormal

Alarm Description

This alarm is generated when the heartbeat of MOSMAccessService is abnormal.

Alarm Attribute

Alarm ID                    Alarm Severity                        Alarm Type
servicemonitor_heartbeat    Critical, Major, Minor, or Warning    QoS

Alarm Parameters

Parameter         Description
Service Name      Indicates the name of the service for which the alarm is generated.
Component Name    Indicates the IP address or name of the VM for which the alarm is generated.
Component Type    Indicates the type of the instance for which the alarm is generated.

Impact on the System

Monitoring data in the region where MOSMAccessService is deployed cannot be reported, causing monitoring data errors and alarm data errors.

Possible Causes

  • The process of MOSMAccessService is stopped.
  • The power supply status of the VM where MOSMAccessService is located is abnormal.

Handling Procedure

  1. Check the VM where MOSMAccessService is located.

    1. Open the Details page of the corresponding alarm. Check the value of IP Address/URL/Domain Name to obtain the IP address of the host for which the alarm is generated.
    2. Give the host IP address to the administrator, who can look up the host name of the VM, for example, ManageOne-Service01.

  2. Check whether Power Status of the VM where MOSMAccessService is located is Running.

    1. Use a browser to log in to FusionSphere OpenStack Management Console.
      • Login method 1

      The login address is https:// followed by the management plane IP address of FusionSphere OpenStack (single-node deployment) or its floating IP address (active/standby deployment), for example, https://192.168.1.1.

      Default username: cloud_admin; default password: FusionSphere123

      • Login method 2
        1. On the main menu of ManageOne Maintenance Portal, click [icon].
        2. In the Quick Links area on the right, click Service OM to enter FusionSphere OpenStack Management Console.
        3. Enter the username and password.

          Default username: cloud_admin; default password: FusionSphere123

    2. Choose Resources > Computing > Compute Instances from the main menu.
    3. On the VMs tab page, check whether the value of Power Status of the VM corresponding to the alarm source is Running.
      • If yes, go to 3.
      • If no, click More in the Operation column and choose Restart so that the value of Power Status of the VM becomes Running.

  3. Check whether the process of MOSMAccessService has been stopped. If it has been stopped, restart it. If the alarm is not cleared, contact technical support for assistance.

    1. Use PuTTY to log in to the VM determined in 1.b as the sopuser user in SSH mode and run the following command to switch to the root user:

      sudo su root

    2. Run the following command to check whether the MOSMAccessService process is running. If information similar to that shown in Figure 4-1 is displayed, the service process is running.

      ps -ef | grep "mosmaccessservice" | grep -v grep

      • If yes, contact technical support for assistance.
      • If no, go to 3.c.
        Figure 4-1 Message indicating that MOSMAccessService is running
    3. Run the following command to start the main process:

      su ossadm -c ". /opt/oss/manager/agent/bin/engr_profile.sh;ipmc_adm -cmd startapp -app MOSMService"

      Figure 4-2 Message indicating that MOSMAccessService is started
    4. Check whether the alarm is cleared. If the alarm is not cleared, contact technical support for assistance.
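
The checks in steps 2 and 3 can be sketched as two small shell functions. The names `process_listed` and `next_action` are hypothetical, and the sketch assumes the process shows up in `ps -ef` output under the lowercase name used in the command above. Note that a plain `ps -ef | grep NAME` can match the grep command itself, so the sketch filters such lines out.

```shell
#!/bin/sh
# Illustrative sketch only; these functions are not shipped with ManageOne.

# process_listed NAME: reads `ps -ef` style output on stdin and succeeds
# if a line other than a grep command line mentions NAME.
process_listed() {
    grep -- "$1" | grep -v grep >/dev/null
}

# next_action POWER_STATUS PROCESS_RUNNING
# POWER_STATUS is the value shown on the VMs tab page;
# PROCESS_RUNNING is yes or no.
next_action() {
    if [ "$1" != "Running" ]; then
        echo "restart the VM from FusionSphere OpenStack Management Console"
    elif [ "$2" = "yes" ]; then
        echo "contact technical support"
    else
        echo "start the service process (step 3.c)"
    fi
}
```

On the VM, the process check could be run as `ps -ef | process_listed mosmaccessservice && echo running`.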

Alarm Clearance

This alarm is automatically cleared when the heartbeat of MOSMAccessService returns to normal.

Related Information

None

Updated: 2019-08-30

Document ID: EDOC1100062365
