HUAWEI CLOUD Stack 6.5.0 Alarm and Event Reference 04

ALM-21 Node Is Abnormal


Description

This alarm is reported when a node is abnormal.

Attribute

Alarm ID    Alarm Severity    Alarm Type
21          Critical          Environmental alarm

Parameters

Parameter Name       Parameter Description
kind                 Resource type.
namespace            Name of the project to which the resource belongs.
name                 Resource name.
uid                  Unique ID of the resource.
OriginalEventTime    Event generation time.
EventSource          Name of the component that reports an event.
EventMessage         Supplementary information about an event.

Impact on the System

The node is unavailable, and the applications originally running on it are migrated to another node. As a result, some functions do not work properly.

System Actions

The system detects the node exception and reschedules the pods on the abnormal node to a normal node.

Possible Causes

  • The node breaks down, the network is disconnected, or the system malfunctions.
  • The kubelet component is not working properly.

Procedure

  1. Obtain the name and namespace of the abnormal node.

    1. Use a browser to log in to the FusionStage OM zone console.
      1. Log in to ManageOne Maintenance Portal.
        • Login address: https://<address for accessing the homepage of ManageOne Maintenance Portal>:31943, for example, https://oc.type.com:31943.
        • The default username is admin, and the default password is Huawei12#$.
      2. On the O&M Maps page, click the FusionStage link under Quick Links to go to the FusionStage OM zone console.
    2. Choose Application Operations > Application Operations from the main menu.
    3. In the navigation pane on the left, choose Alarm Center > Alarm List and query the alarm by setting query criteria.
    4. Click to expand the alarm information. Record the values of name and namespace in Location Info, that is, podname and namespace.

  2. Use PuTTY to log in to the manage_lb1_ip node.

    The default username is paas, and the default password is QAZ2wsx@123!.

  3. Run the following command and enter the password of the root user to switch to the root user:

    su - root

    Default password: QAZ2wsx@123!

  4. Run the following command to obtain the IP address of the abnormal node:

    kubectl get node name -n namespace -oyaml | grep -i address

    In the preceding command, name and namespace are the name and namespace obtained in step 1.

  5. Use PuTTY to log in to the node at the IP address obtained in step 4, and check whether the node has broken down or the network is disconnected.

    • If yes, go to 6.
    • If no, go to 7.

  6. Contact O&M personnel to rectify the fault and then check whether the alarm is cleared.

    • If yes, no further action is required.
    • If no, go to 7.

  7. Log in to the faulty node as the paas user and run the following command to check whether the kubelet component works properly:

    monit summary
    • If yes, go to 8.
    • If no, go to 9.

  8. Run the following command to open the kubelet log and check whether a system time jump occurred when the last node exception was reported:

    vi /var/paas/sys/log/kubernetes/kubelet.log

    Search the log for the keyword node_status. If information similar to the following is displayed, the system time has changed; go to 9. Otherwise, go to 10.

    I0213 15:14:04.920823  32201 kubelet_node_status.go:919] Node 10.116.246.113-lweg-eac93411 status is changed
    I0213 15:18:17.739407    497 kubelet_node_status.go:919] Node 10.116.246.113-lweg-eac93411 status is changed
    I0213 15:21:14.038594   3466 kubelet_node_status.go:919] Node 10.116.246.113-lweg-eac93411 status is changed
    I0213 15:32:17.784517   3486 kubelet_node_status.go:919] Node 10.116.246.113-lweg-eac93411 status is changed
    I0213 15:42:52.855925   3463 kubelet_node_status.go:919] Node 10.116.246.113-lweg-eac93411 status is changed
    I0213 22:19:39.507252   3456 kubelet_node_status.go:919] Node 10.116.246.113-lweg-eac93411 status is changed
    I0213 22:26:53.216912   3463 kubelet_node_status.go:919] Node 10.116.246.113-lweg-eac93411 status is changed

  9. Run the following command to restart the kubelet component and then check whether the alarm is cleared:

    monit restart kubelet
    • If yes, no further action is required.
    • If no, go to 10.

  10. Contact technical support for assistance.
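
The command-line checks in steps 4, 7, and 8 can be scripted so that their output is easier to read. The following is a minimal sketch under stated assumptions, not part of the product. The function names (extract_addresses, kubelet_healthy, node_status_events) are hypothetical. The sketch assumes that the node YAML lists addresses as "address: <value>" entries, that monit summary reports a healthy process as Running or OK (this may vary with the Monit version), and that kubelet.log lines begin with a severity letter and date followed by the time, as in the sample in step 8.

```shell
#!/bin/bash
# Hypothetical helpers for steps 4, 7, and 8. Each one reads command output
# on standard input, so it can be tried on saved output first.

# Step 4: print the value of every "address:" entry in node YAML.
extract_addresses() {
    # Strip optional indentation and list dash, then the "address:" key.
    sed -n 's/^[[:space:]-]*address:[[:space:]]*//p'
}

# Step 7: succeed (exit status 0) only if a line naming kubelet also shows
# a healthy status. Assumption: healthy is printed as "Running" or "OK".
kubelet_healthy() {
    awk '/kubelet/ && /Running|OK/ { found = 1 } END { exit (found ? 0 : 1) }'
}

# Step 8: print the date (MMDD) and time of every node_status log line.
node_status_events() {
    # Field 1 is the severity letter plus date (for example I0213);
    # field 2 is the time of day.
    awk '/node_status/ { print substr($1, 2), $2 }'
}
```

For example, on the management node, `kubectl get node name -n namespace -oyaml | extract_addresses` prints one address per line. On the faulty node, `monit summary | kubelet_healthy` returns success only when kubelet is reported as healthy, and `node_status_events < /var/paas/sys/log/kubernetes/kubelet.log` lists the times of node status changes without opening the log in an editor.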

Alarm Clearing

This alarm will be automatically cleared after the fault is rectified.

Related Information

None

Updated: 2019-08-30

Document ID: EDOC1100062365
