HUAWEI CLOUD Stack 6.5.0 Alarm and Event Reference 04

ALM-73019 Too Many Processes Are Running

Description

FusionSphere OpenStack checks the number of system processes on a host every 60 seconds. This alarm is generated when the number of system processes on the host is greater than or equal to the alarm threshold, which is the larger of two values: 90% of the maximum number of processes in the system, and the alarm threshold (the ALARM value) configured in the /etc/sysmonitor/pscnt configuration file. This alarm is cleared when the number of system processes falls below the clearance threshold, which is the larger of two values: 80% of the maximum number of processes in the system, and the recovery threshold (the RESUME value) configured in the /etc/sysmonitor/pscnt configuration file.
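
The threshold rule above can be sketched in shell as follows. This is a minimal illustration, not the monitor's actual implementation: the function names are hypothetical, and rounding 90% and 80% down to a whole number is an assumption.

```shell
#!/bin/sh
# Sketch of the alarm and clearance threshold rule (hypothetical helper names).

# larger_of A B: print the larger of two integers.
larger_of() {
    if [ "$1" -ge "$2" ]; then
        echo "$1"
    else
        echo "$2"
    fi
}

# alarm_threshold PID_MAX ALARM: larger of 90% of PID_MAX and the configured ALARM value.
# Assumption: the percentage is rounded down (integer arithmetic).
alarm_threshold() {
    larger_of $(( $1 * 90 / 100 )) "$2"
}

# clear_threshold PID_MAX RESUME: larger of 80% of PID_MAX and the configured RESUME value.
clear_threshold() {
    larger_of $(( $1 * 80 / 100 )) "$2"
}
```

For example, with a maximum of 32768 processes and a configured alarm value of 1600, `alarm_threshold 32768 1600` prints 29491, so the 90% limit rather than the configured value decides when the alarm is raised.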

Attribute

Alarm ID    Alarm Severity    Auto Clear
73019       Minor             Yes

Parameters

Name                  Meaning
Fault Location Info   • host_id: specifies the ID of the host for which the alarm is generated.
Additional Info       • error_info: provides alarm exception information.
                      • host_id: specifies the ID of the host for which the alarm is generated.
                      • hostname: specifies the name of the host for which the alarm is generated.
                      • HostIP: specifies the IP address of the host for which the alarm is generated.

Impact on the System

If too many processes are running, the system may fail to create new processes, which affects running services and can cause login failures. If login fails, faults cannot be located on the host.

Possible Causes

An exception occurs in the system.

Procedure

  1. Log in to the FusionSphere OpenStack web client.

    For details, see Logging In to the FusionSphere OpenStack Web Client (ManageOne Mode).

  2. On the Summary page, obtain the management IP address of the host in the OM IP Address column based on the host ID or host name in the alarm additional information.
  3. Use PuTTY to log in to the host for which the alarm is generated, using its management IP address.

    The default user name is fsp. The default password is Huawei@CLOUD8.

    The system supports both password authentication and public-private key pair authentication. To log in with a key pair, see Using PuTTY to Log In to a Node in Key Pair Authentication Mode.

  4. Run the following command and enter the password of user root to switch to user root:

    su - root

    The default password of user root is Huawei@CLOUD8!.

  5. Run the following command to disable user logout upon system timeout:

    TMOUT=0

  6. Run the following command to import environment variables:

    source set_env

    Information similar to the following is displayed:

      please choose environment variable which you want to import: 
      (1) openstack environment variable (keystone v3) 
      (2) cps environment variable 
      (3) openstack environment variable legacy (keystone v2) 
      (4) openstack environment variable of cloud_admin (keystone v3) 
      please choose:[1|2|3|4] 

  7. Enter 1 to enable Keystone V3 authentication and enter the password of OS_USERNAME as prompted.

    Default account format: DCname_admin; default password: FusionSphere123.

  8. Run the following command to check the number of processes in the current system:

    ps -xH | sed '1d' | wc -l

    564D8324-F2F0-5440-DFE7-B5B21D5323A0:/home/fsp # ps -xH | sed '1d' | wc -l
    553

  9. Run the following command to check the alarm upper limit configured in the /etc/sysmonitor/pscnt file:

    cat /etc/sysmonitor/pscnt

    564D8324-F2F0-5440-DFE7-B5B21D5323A0:/home/fsp # cat /etc/sysmonitor/pscnt
    # Ceiling percentage of processes(threads) number alarm(Maximum between this value and 90% pid_max)
    ALARM="1600"
    
    # Floor percentage of processes(theads) number alarm(Maximum between this value and 80% pid_max)
    RESUME="1500"
    
    # Periodic of monitor (second)
    PERIOD="60"

  10. Run the following command to check the maximum number of processes. The value depends on the server; 32768 is used as an example here.

    cat /proc/sys/kernel/pid_max

    564D8324-F2F0-5440-DFE7-B5B21D5323A0:/home/fsp # cat /proc/sys/kernel/pid_max
    32768

  11. Check whether the number of current system processes is greater than or equal to the larger of two values: 90% of the maximum number of system processes, and the alarm threshold configured in the /etc/sysmonitor/pscnt file.

    • If yes, go to 12.
    • If no, go to 15.

  12. Run the following command to query the processes:

    ps -aux

    564D8324-F2F0-5440-DFE7-B5B21D5323A0:/home/fsp # ps -aux
    USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
    root         1  1.6  0.0 191268  4332 ?        Ss   Nov09 172:05 /usr/lib/systemd/systemd --switched-root --system --deserialize 21
    root         2  0.0  0.0      0     0 ?        S    Nov09   0:05 [kthreadd]
    root         3  0.1  0.0      0     0 ?        S    Nov09  19:13 [ksoftirqd/0]
    root         5  0.0  0.0      0     0 ?        S<   Nov09   0:00 [kworker/0:0H]
    root         8  0.1  0.0      0     0 ?        S    Nov09  13:34 [migration/0]
    root         9  0.0  0.0      0     0 ?        S    Nov09   0:00 [rcu_bh]
    root        10  1.3  0.0      0     0 ?        S    Nov09 138:35 [rcu_sched]
    root        11  0.0  0.0      0     0 ?        S    Nov09   1:05 [watchdog/0]
    root        12  0.0  0.0      0     0 ?        S    Nov09   1:05 [watchdog/1]
    root        13  0.2  0.0      0     0 ?        S    Nov09  23:28 [migration/1]
    root        14  0.2  0.0      0     0 ?        S    Nov09  20:27 [ksoftirqd/1]
    root        16  0.0  0.0      0     0 ?        S<   Nov09   0:00 [kworker/1:0H]
    root        17  0.0  0.0      0     0 ?        S    Nov09   1:03 [watchdog/2]

  13. Run the following command to kill unnecessary or repeated processes:

    kill -9 PID

    In the preceding command, PID is the ID of the process to kill, taken from the PID column of the output in step 12.

  14. After the number of system processes falls below the alarm clearance threshold, wait for 1 to 2 minutes and check whether the alarm is cleared.

    • If yes, no further action is required.
    • If no, go to 15.

  15. Contact technical support for assistance.
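
The checks in steps 8 to 12 can be combined into a small helper script. The following is a sketch under stated assumptions, not a vendor tool: all function names are hypothetical, the parser assumes the KEY="value" layout shown in step 9, and 90% of pid_max is rounded down. The process count uses the same ps command as step 8.

```shell
#!/bin/sh
# Hypothetical helpers that mirror steps 8 to 12 of the procedure.

# read_pscnt_value KEY FILE: print the numeric value of a KEY="value" line.
read_pscnt_value() {
    sed -n "s/^$1=\"\([0-9]*\)\".*/\1/p" "$2" | head -n 1
}

# over_alarm_threshold COUNT PID_MAX ALARM: succeed if COUNT is greater than
# or equal to the larger of 90% of PID_MAX and ALARM.
over_alarm_threshold() {
    limit=$(( $2 * 90 / 100 ))
    if [ "$3" -gt "$limit" ]; then
        limit=$3
    fi
    [ "$1" -ge "$limit" ]
}

# summarize_commands [N]: read command names on stdin, one per line, and
# print the N most frequent names with their counts (default 10).
summarize_commands() {
    sort | uniq -c | sort -rn | head -n "${1:-10}"
}

# check_host [PSCNT_FILE] [PID_MAX_FILE]: compare the current process count
# with the alarm threshold and report which step to continue with.
check_host() {
    pscnt_file=${1:-/etc/sysmonitor/pscnt}
    pid_max_file=${2:-/proc/sys/kernel/pid_max}
    count=$(ps -xH | sed '1d' | wc -l | tr -d ' ')
    alarm=$(read_pscnt_value ALARM "$pscnt_file")
    pid_max=$(cat "$pid_max_file")
    if over_alarm_threshold "$count" "$pid_max" "$alarm"; then
        echo "at or over threshold: $count processes (continue with step 12)"
    else
        echo "below threshold: $count processes (continue with step 15)"
    fi
}
```

After loading these functions in the root shell, run `check_host` to perform the comparison. To see which commands account for the most processes before deciding what to kill, one option is `ps -eo comm= | summarize_commands 5`; the `-eo comm=` option is standard in procps but is not part of the original procedure.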

Related Information

None

Updated: 2019-08-30

Document ID: EDOC1100062365
