HUAWEI CLOUD Stack 6.5.0 Alarm and Event Reference 04

ALM-34997 Handle Threshold Alarm

Description

This alarm is generated when the ICAgent detects that the number of handles of the current node reaches 80% of the upper limit.

Attribute

Alarm ID    Alarm Severity    Alarm Type
34997       Major             Environment alarm

Parameters

Parameter Name    Parameter Description
hostName          Indicates the host name.
hostIP            Indicates the host IP address.
fdNum             Indicates the total number of the current user's handles.
fdLimit           Indicates the upper limit of the current user's handles set in the OS.
threadLimit       Indicates the upper limit of the current user's threads set in the OS.

Impact on the System

If the number of handles reaches the upper limit, the system cannot create new handles.

System Actions

None

Possible Causes

The number of handles created by the service process on the current node is too large.

Procedure

  1. Check the location information of the alarm.

    1. Use a browser to log in to the FusionStage OM zone console.
      1. Log in to ManageOne Maintenance Portal.
        • Login address: https://Address for accessing the homepage of ManageOne Maintenance Portal:31943, for example, https://oc.type.com:31943.
        • The default username is admin, and the default password is Huawei12#$.
      2. On the O&M Maps page, click the FusionStage link under Quick Links to go to the FusionStage OM zone console.
    2. Choose Application Operations > Application Operations from the main menu.
    3. In the navigation pane on the left, choose Alarm Center > Alarm List and query the alarm by setting query criteria.
    4. Click to expand the alarm information. Record the values of hostIP and hostName.

  2. Use PuTTY to log in to the manage_lb1_ip node.

    The default username is paas, and the default password is QAZ2wsx@123!.

    Run the following command to ping the recorded hostIP address and check whether the network is normal:

    ping hostIP

    • If the network is normal, go to 9.
    • If the network is abnormal, perform 3 to 8.
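The branch in this step can be scripted. The following is a minimal sketch and not part of the product: next_step is a hypothetical helper, and the PING_CMD variable is introduced here only so that the probe command can be swapped out. It assumes the Linux ping options -c (packet count) and -W (reply timeout in seconds).

```shell
# Hypothetical helper: decide which step to follow based on a ping probe.
# PING_CMD is an illustrative override and defaults to the system ping.
next_step() {
    host_ip=$1
    # Send 3 probes and wait at most 2 seconds for each reply.
    if "${PING_CMD:-ping}" -c 3 -W 2 "$host_ip" >/dev/null 2>&1; then
        echo "network normal: go to step 9"
    else
        echo "network abnormal: perform steps 3 to 8"
    fi
}
```

Call it as next_step hostIP, replacing hostIP with the value recorded from the alarm.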

  3. Use PuTTY to log in to the manage_lb1_ip node.

    The default username is paas, and the default password is QAZ2wsx@123!.

  4. Run the following command and enter the password of the root user to switch to the root user:

    su - root

    Default password: QAZ2wsx@123!

  5. Run the following command to check whether the node corresponding to the queried hostIP is a management zone node:

    kubectl get node hostName -oyaml | grep 'com.huawei.project/name'

    Information similar to the following is displayed:

    labels:
     com.huawei.project/name: fst-manage

    Check whether the value of com.huawei.project/name is fst-manage.

    • If yes, the node is a management zone node. Go to 6.1.
    • If no, the node is a data zone node. Go to 6.2.
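The label check in this step can also be done without reading the output by eye. The following is a sketch under stated assumptions: node_zone is a hypothetical helper that reads the YAML printed by the kubectl command above on standard input, and it treats any value other than fst-manage as a data zone node, as this step does.

```shell
# Hypothetical helper: reads `kubectl get node ... -oyaml` output on stdin
# and classifies the node by its com.huawei.project/name label.
node_zone() {
    # Accept optional quotes and trailing whitespace around the label value.
    if grep -Eq 'com\.huawei\.project/name:[[:space:]]*"?fst-manage"?[[:space:]]*$'; then
        echo "management zone node"
    else
        echo "data zone node"
    fi
}
```

For example: kubectl get node hostName -oyaml | node_zone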

  6. Query the IP address for logging in to the node.

    1. Run the following command to query InternalIP of the node:

      kubectl get node hostName -oyaml | grep -B 2 InternalIP

    2. Run the following command to query ExternalIP of the node:

      kubectl get node hostName -oyaml | grep ExternalIP
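To print only the address instead of the surrounding grep context, the node YAML can be filtered with awk. This is a sketch: node_address is a hypothetical helper, and it assumes that each "- address:" line precedes its "type:" line in the YAML, which is what the grep -B 2 in this step implies.

```shell
# Hypothetical helper: reads node YAML on stdin and prints the address whose
# type matches $1 (for example InternalIP or ExternalIP).
# Assumes each "- address:" line comes before its matching "type:" line.
node_address() {
    awk -v want="$1" '
        $1 == "-" && $2 == "address:" { addr = $3 }    # remember the latest address
        $1 == "type:" && $2 == want   { print addr }   # print it when the type matches
    '
}
```

For example: kubectl get node hostName -oyaml | node_address InternalIP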

  7. Run the following command to switch to the paas user:

    su - paas

  8. Run the following command to log in to the node using the IP address queried in 6:

    ssh IP

  9. Use PuTTY to log in to the faulty node based on the value of the hostIP parameter.

    Default username: paas. Default password: QAZ2wsx@123!

  10. Run the following command to list the 10 processes that hold the most handles on the current node:

    lsof|awk 'tolower($3) !=$3 || toupper($3)!=$3'|awk '{if($4~/^([0-9])+[a-zA-Z]/)print $2}'|sort|uniq -c|sort -nr|head -n 10

    Information similar to the following is displayed:

       1222 3636
        632 7975
        473 1975
        288 47442
        268 3863
        268 13336
        231 21226
        229 17447
        179 5888
        178 30167

    In the preceding information, the first column indicates the number of handles created by the process, and the second column indicates the process ID.
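Because the list contains only process IDs, it can help to attach a command name to each row before collecting the information. The following is a sketch: annotate_pids is a hypothetical helper that reads "count pid" rows on standard input and looks up each name in /proc/<pid>/comm, which is available on Linux only. A process that has already exited is marked as (exited).

```shell
# Hypothetical helper: reads "count pid" lines on stdin and appends the
# command name taken from /proc/<pid>/comm (Linux only).
annotate_pids() {
    while read -r count pid; do
        # The lookup fails if the process is gone or /proc is unavailable.
        name=$(cat "/proc/$pid/comm" 2>/dev/null) || name="(exited)"
        echo "$count $pid $name"
    done
}
```

Pipe the output of the command in this step into annotate_pids to obtain a third column with the command name.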

  11. Run the following command to view the handles opened by a process (process 3636 in this example):

    lsof -p 3636|awk '$4~/^([0-9])+[a-zA-Z]/'

    Information similar to the following is displayed:

    COMMAND  PID USER    FD    TYPE             DEVICE SIZE/OFF      NODE NAME
    docker  3636 root    0r     CHR                1,3      0t0      1028 /dev/null
    docker  3636 root    1u    unix 0xffff8807ee030000      0t0     36393 socket
    docker  3636 root    2u    unix 0xffff8807ee030000      0t0     36393 socket
    docker  3636 root    3r     CHR                1,9      0t0      1033 /dev/urandom
    docker  3636 root    4u    unix 0xffff8807ee028800      0t0     39122 socket
    docker  3636 root    5u a_inode                0,9        0      7854 [eventpoll]
    docker  3636 root    6u    unix 0xffff880036799000      0t0     35435 /var/run/docker.sock
    docker  3636 root    7u    unix 0xffff8807ef844400      0t0     41638 socket
    docker  3636 root    8u     CHR             10,236      0t0     12305 /dev/mapper/control
    docker  3636 root   9uW     REG              202,1    65536   1441822 /var/lib/docker/volumes/metadata.db
    ...
    docker  3636 root 1215u    unix 0xffff880074dd6800      0t0 260992488 /var/run/docker.sock
    docker  3636 root 1216u    FIFO               0,18      0t0 260992490 /run/docker/libcontainerd/f67a4da18c97db5dd2e06e2b5430c93e42e41539a0249260d337a19879bb5165/b1abdc8b1e999b227ffd09609ce9b11055591bfd0e3d560d30b2255ef6590b65-stdin (deleted)
    docker  3636 root 1217u    unix 0xffff8805a9f31400      0t0 260993522 /var/run/docker.sock
    docker  3636 root 1218u    FIFO               0,18      0t0 260993524 /run/docker/libcontainerd/b6f0e7fe5429677502643cfb8ccaa94d06eb8128e2e2f01f5ddc59fdaf525d78/2e63686fbe6881086932dddfca346c274d956c622d4d7c2f9d5145102b1fcb34-stdin (deleted)
    docker  3636 root 1219u    unix 0xffff880125759400      0t0 260996507 /var/run/docker.sock
    docker  3636 root 1220u    FIFO               0,18      0t0 260996509 /run/docker/libcontainerd/03ae5fc7c8707e4a2edec637bb1494564f18d63515284e62f8cb6b0c7801a107/69fe49d409577720dc347f278d14990922860dfef332c8572ba259dedbabc667-stdin (deleted)
    docker  3636 root 1221u    unix 0xffff880074dd0400      0t0 260995482 /var/run/docker.sock
    docker  3636 root 1222u    FIFO               0,18      0t0 260995484 /run/docker/libcontainerd/0d61b4d6af88084fe273692bfc90bc5f352417853f4fbc2401483cfcc7623b38/1f7394775db7bc04338a150db0ab1b7fdcc30fc5ff417d3faed68758ec512f49-stdin (deleted)
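A long listing like this one is easier to assess as a per-type summary. The following is a sketch: handles_by_type is a hypothetical helper that reads lsof output on standard input, applies the same FD filter as the command in this step, and counts rows by the TYPE column.

```shell
# Hypothetical helper: reads `lsof -p PID` output on stdin and counts open
# handles per TYPE column (field 5), keeping only rows whose FD column
# (field 4) starts with digits followed by a letter.
handles_by_type() {
    awk '$4 ~ /^[0-9]+[a-zA-Z]/ { n[$5]++ } END { for (t in n) print n[t], t }' |
        sort -k1,1nr -k2,2    # highest count first, ties ordered by type name
}
```

For example: lsof -p 3636 | handles_by_type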

  12. The appropriate handling depends on the function of the identified processes. Collect the output of 10 and 11 and contact technical support.

Alarm Clearing

When the ICAgent detects that the number of handles of the current node is less than 60% of the upper limit, the system automatically clears the alarm.
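The alarm is generated at 80% and cleared below 60%, so the two thresholds can be read as a hysteresis band. The following is a sketch of that reading and not the ICAgent implementation: alarm_state is a hypothetical helper, and keeping the previous state between 60% and 80% is an assumption drawn from the two thresholds.

```shell
# Hypothetical helper: applies the 80% raise and 60% clear thresholds.
# Arguments: fdNum, fdLimit, previous state ("active" or "cleared").
alarm_state() {
    fd_num=$1 fd_limit=$2 previous=$3
    if [ $((fd_num * 100)) -ge $((fd_limit * 80)) ]; then
        echo active          # usage has reached 80% of the limit
    elif [ $((fd_num * 100)) -lt $((fd_limit * 60)) ]; then
        echo cleared         # usage has dropped below 60% of the limit
    else
        echo "$previous"     # assumption: state is unchanged inside the band
    fi
}
```

For example, alarm_state 700 1000 active keeps the alarm active, because 70% is not below the 60% clearing threshold.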

Related Information

None

Updated: 2019-08-30

Document ID: EDOC1100062365
