HUAWEI CLOUD Stack 6.5.0 Alarm and Event Reference 04

ALM-2012 Service Fails in Health Check and Needs to Be Restarted and Isolated

Description

This alarm is generated when the service is suspended because of incorrect configuration or insufficient resources, and cse-service-center is abnormal as a result.

Attribute

Alarm ID    Alarm Severity    Alarm Type
2012        Major             Communications alarm

Parameters

Parameter       Description
Namespace       Indicates the namespace of the service for which the alarm is generated.
ServiceName     Indicates the name of the service for which the alarm is generated.
InstanceName    Indicates the name of the service instance for which the alarm is reported.
NodeIp          Indicates the IP address of the host where the microservice instance is deployed.

Impact on the System

If the service is suspended because of incorrect configuration or insufficient resources, the service center and related service functions do not work properly.

Possible Causes

  • The service is suspended due to incorrect configuration or insufficient resources.
  • The permission or content of the attached key file is incorrect.

Procedure

  1. Use PuTTY to log in to the manage_lb1_ip node.

    The default username is paas, and the default password is QAZ2wsx@123!.

  2. Run the following command and enter the password of the root user to switch to the root user:

    su - root

    Default password: QAZ2wsx@123!

  3. In general, Kubernetes automatically restarts a service when the service becomes abnormal. If the service keeps restarting, check the etcd address configuration and view logs to locate the fault.

    Check the address configuration of etcd.

    kubectl edit deployment -n fst-manage cse-service-center

    Check whether the deployment file of cse-service-center contains a name/value pair in which name is CSE_REGISTRY_ADDRESS and value is https://cse-etcd-client.fst-manage.svc.cluster.local:30101. If not, correct the value. After the change takes effect, the system automatically deletes the original pod and creates a new pod whose name starts with cse-service-center.

  4. Check whether the authentication information of the key file is correct.

    Run the following commands to access the pod of the CSE service for which the alarm is generated (cse-service-center-3608034112-ht6lx is used as an example):

    kubectl -n fst-manage get pod | grep cse-service-center

    kubectl -n fst-manage exec -it cse-service-center-3608034112-ht6lx -- sh

    Run the following command in the container:

    ls -al /opt/CSE/etc/ssl/

    The authentication information is correct if the following information is displayed.

  5. View logs to locate the fault.

    1. Query the IP addresses of the nodes where the cse-etcd pods are located.

      For example, to query the IP address of the node where the cse-etcd-0 pod resides, run the following command:

      kubectl -n fst-manage describe pod cse-etcd-0 | grep Node

    2. Log in to the node where the pod of cse-etcd-0 resides and view the /var/paas/sys/log/cse-service-center/service-center.log file of cse-service-center.

      Check the service or resource configuration.

      • If the configuration is incorrect, search for err in the log file and locate the fault based on the error information.
      • If resources are insufficient, check the memory and CPU usage of the node for which the alarm is generated. For details about how to locate and rectify the fault, see sections "Full Disk Space" and "Full etcd Data Space" in FusionStage CSE 6.5.0.SPC100 Product Documentation.

  6. Check whether the alarm is cleared.

    • If yes, no further action is required.
    • If no, contact technical support for assistance.
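
The checks in steps 3 and 5 can also be done without opening an editor. The following is a minimal sketch, not part of the official procedure. The function names are hypothetical, and it assumes that the CSE_REGISTRY_ADDRESS name/value pair is stored as a container environment variable in the deployment and that a case-insensitive search for err is acceptable. The namespace, deployment name, expected address, and log path are taken from the steps above.

```shell
#!/bin/sh
# Hypothetical helper functions; only the namespace, deployment name,
# variable name, expected address, and log path come from the procedure.

EXPECTED_ADDR="https://cse-etcd-client.fst-manage.svc.cluster.local:30101"

# Compare a value against the expected etcd address (step 3).
# Prints "ok" and returns 0 on a match, prints "mismatch" and returns 1 otherwise.
check_registry_address() {
    if [ "$1" = "$EXPECTED_ADDR" ]; then
        echo "ok"
    else
        echo "mismatch"
        return 1
    fi
}

# Read the configured value from the deployment (requires cluster access).
# Assumption: the pair is a container env entry; if several containers
# define it, the values are printed separated by spaces.
get_registry_address() {
    kubectl -n fst-manage get deployment cse-service-center \
        -o jsonpath='{.spec.template.spec.containers[*].env[?(@.name=="CSE_REGISTRY_ADDRESS")].value}'
}

# Print log lines containing "err" with their line numbers (step 5).
# Assumption: matching is case-insensitive so that ERROR is also found.
find_log_errors() {
    grep -n -i "err" "$1"
}
```

On a node with cluster access, the address check could be run as check_registry_address "$(get_registry_address)", and the log search as find_log_errors /var/paas/sys/log/cse-service-center/service-center.log on the node that holds the log file. If the check reports a mismatch, correct the value with the kubectl edit command in step 3.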

Alarm Clearing

This alarm will be automatically cleared after the fault is rectified.

Related Information

None

Updated: 2019-08-30

Document ID: EDOC1100062365
