HUAWEI CLOUD Stack 6.5.0 Alarm and Event Reference 04

ALM-5002 Threshold-Crossing Alarm

Description

This alarm is reported when the value of the specified metric reaches the preset threshold of a threshold rule whose Alarm Severity parameter was set to Warning when the rule was configured.

Attribute

Alarm ID    Alarm Severity    Alarm Type
5002        Warning           Physical violation

Parameters

Parameter Name       Parameter Description
thresholdRuleName    Name of the threshold rule.
dimensions           Dimension information of the metric. In a shared cluster, clusterId has the fixed value 11111111-1111-1111-1111-111111111111.
                     Example: appName:cse-ber, clusterId:11111111-1111-1111-1111-111111111111, nameSpace:fst-manage
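The dimensions value is a comma-separated list of key:value pairs. When a single field such as nameSpace is needed later in the procedure, it can be pulled out with a short POSIX shell function. This is a minimal sketch that assumes the exact layout of the example above; the function name get_dimension is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the value of one key from a dimensions string
# laid out as "key1:value1, key2:value2, ..." (format assumed from the example).
# Usage: get_dimension KEY "DIMENSIONS"
get_dimension() {
    # One entry per line, leading/trailing spaces trimmed, then print
    # everything after the first colon of the entry whose key matches.
    printf '%s\n' "$2" | tr ',' '\n' | sed 's/^ *//; s/ *$//' |
        awk -F: -v k="$1" '$1 == k { sub(/^[^:]*:/, ""); print; exit }'
}

dims='appName:cse-ber, clusterId:11111111-1111-1111-1111-111111111111, nameSpace:fst-manage'
get_dimension nameSpace "$dims"   # prints fst-manage
get_dimension clusterId "$dims"   # prints 11111111-1111-1111-1111-111111111111
```

A key that is not present produces empty output, so the caller can test the result with [ -n ... ].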

Impact on the System

The impact of this alarm on the system varies with the specific resources for which this alarm is generated.

System Actions

None

Possible Causes

This alarm is reported when the specified metric value reaches the preset threshold.

Procedure

  1. Query the location and additional information of the threshold-crossing alarm.

    1. Use a browser to log in to the FusionStage OM zone console.
      1. Log in to ManageOne Maintenance Portal.
        • Login address: https://<address of the ManageOne Maintenance Portal homepage>:31943, for example, https://oc.type.com:31943.
        • The default username is admin, and the default password is Huawei12#$.
      2. On the O&M Maps page, click the FusionStage link under Quick Links to go to the FusionStage OM zone console.
    2. On the main menu, choose Application Operations > Application Operations. In the navigation pane, choose Alarm Center > Alarm List. Locate the alarm and click the icon in front of it to view the location and additional information.
    3. In the location information, record the value of thresholdRuleName, which is the name of the threshold rule, for example, NODE-paas-100-107-227-193-memUsedRate. Also record the value of nameSpace, for example, fst-manage.

  2. Check whether the threshold rule is properly set and query the physical resource information based on the alarm.

    1. On the main menu, choose Application Operations > Application Operations. In the navigation pane, choose Alarm Center > Threshold Rules.
    2. On the displayed Threshold Rules page, search for the threshold rule in the search box using the rule name. The rule name is the value recorded in 1.c, for example, CPUUsageofpaas20.

    3. Click the icon next to the threshold rule to view its details.

    4. Click the metric trend chart on the right. The Metric Monitoring page is displayed.

    5. The Metric Monitoring page displays the physical resource information.

    6. Check whether the threshold rule is set properly.
      • If yes, handle the alarm as follows:

        If the cpuUsage metric value exceeds the alarm threshold and an alarm is generated, go to 4. If the memUsedRate metric value exceeds the alarm threshold and an alarm is generated, go to 5. If the diskUsedRate metric value exceeds the alarm threshold and an alarm is generated, go to 6. If the diskIOUtil metric value exceeds the alarm threshold and an alarm is generated, go to 7. If the alarm is caused by other threshold rules, go to 8.

      • If no, go to 3.
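The branching in step 2.f maps the metric of the threshold rule to a follow-up step. As a minimal sketch, the mapping can be written as a shell case statement; the function name next_step is hypothetical, and the metric names are taken from steps 4 to 7.

```shell
#!/bin/sh
# Hypothetical helper: print the procedure step that handles a given metric.
# Metric names come from steps 4 to 7; anything else falls through to step 8.
next_step() {
    case "$1" in
        cpuUsage)     echo 4 ;;
        memUsedRate)  echo 5 ;;
        diskUsedRate) echo 6 ;;
        diskIOUtil)   echo 7 ;;
        *)            echo 8 ;;
    esac
}

next_step memUsedRate   # prints 5
next_step diskIOUtil    # prints 7
```

If a rule name ends with the metric name, as in NODE-paas-100-107-227-193-memUsedRate, then next_step "${rule##*-}" strips everything up to the last hyphen. Names such as CPUUsageofpaas20 do not follow that pattern, so in general read the metric from the rule details.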

  3. Reset the threshold rule.

    1. On the main menu, choose Application Operations > Application Operations. In the navigation pane, choose Alarm Center > Threshold Rules. In the Switch Project list box in the upper right corner of the Threshold Rules page, select the value of nameSpace from the alarm location information, for example, fst-manage.
    2. On the Threshold Rules page, search for the threshold rule by name and choose Modify to modify it as required. Check whether the threshold is now greater than the metric value. If it is, wait for one collection period (which is user-defined) and check whether the alarm is cleared.
      • If yes, no further action is required.
      • If no, handle the alarm as follows:

        If the cpuUsage metric value exceeds the alarm threshold and an alarm is generated, go to 4. If the memUsedRate metric value exceeds the alarm threshold and an alarm is generated, go to 5. If the diskUsedRate metric value exceeds the alarm threshold and an alarm is generated, go to 6. If the diskIOUtil metric value exceeds the alarm threshold and an alarm is generated, go to 7. If the alarm is caused by other threshold rules, go to 8.
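Step 3.b asks whether the modified threshold is greater than the current metric value. Usage metrics often have decimals, and the shell's own [ -gt ] test only compares integers, so a minimal sketch can delegate the comparison to awk. The function name threshold_above_value is hypothetical, and both values are assumed to be read by hand from the Threshold Rules and Metric Monitoring pages.

```shell
#!/bin/sh
# Hypothetical helper: exit 0 if THRESHOLD is strictly greater than VALUE.
# The alarm is reported when the value reaches the threshold, so equality
# still counts as crossing. awk handles decimals; [ -gt ] is integer-only.
# Usage: threshold_above_value THRESHOLD VALUE
threshold_above_value() {
    awk -v t="$1" -v v="$2" 'BEGIN { exit !(t + 0 > v + 0) }'
}

if threshold_above_value 90 85.7; then
    echo "threshold 90 is above the current value 85.7"
fi
```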

  4. If the cpuUsage metric value exceeds the alarm threshold and an alarm is generated, you are advised to locate the cause as follows:

    1. Log in to the node based on the node IP address in the location information.

    2. Run the following command to identify service processes with high CPU usage.

      top -c

      By default, top lists the service processes in descending order of CPU usage.

    3. Press Ctrl+C to exit top. Run the ps -ef | grep PID command to check the five service processes with the highest CPU usage, and contact technical support for assistance. PID is the value in the first column of the top output in 4.b.
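Steps 4.b and 4.c can also be combined into one non-interactive command, which is convenient when copying results for technical support. This sketch assumes the procps version of ps shipped with most Linux distributions (BusyBox ps does not accept these options); the function name top_cpu_processes is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: list the five processes with the highest CPU usage,
# with PID, %CPU, and full command line, so no separate "ps -ef | grep PID"
# lookup is needed. Assumes procps ps.
top_cpu_processes() {
    # An empty header after each "=" suppresses the header line; separate -o
    # options are used because text after "=" would otherwise become the header.
    # --sort=-pcpu sorts by CPU usage in descending order.
    ps -e -o pid= -o pcpu= -o args= --sort=-pcpu | head -n 5
}

top_cpu_processes
```

ps reports CPU usage averaged over each process's lifetime, while top shows usage over its refresh interval, so the two rankings can differ for processes with short bursts of activity.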

  5. If the memUsedRate metric value exceeds the alarm threshold and an alarm is generated, you are advised to locate the cause as follows:

    1. Log in to the node based on the node IP address in the location information.

    2. Run the following command to identify service processes with high memory usage. Press Shift+M to sort the service processes in descending order of memory usage.

      top -c

    3. Press Ctrl+C to exit top. Run the ps -ef | grep PID command to check the five service processes with the highest memory usage, and contact technical support for assistance. PID is the value in the first column of the top output in 5.b.
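The Shift+M view used in step 5.b can also be captured non-interactively with top in batch mode. This sketch assumes procps-ng top (version 3.3 or later, which accepts -o to choose the sort field); the function name top_mem_snapshot is hypothetical. It prints the column header and the five processes using the most memory.

```shell
#!/bin/sh
# Hypothetical helper: batch-mode equivalent of "top -c" followed by Shift+M.
# -b: batch mode, -n 1: one iteration, -c: full command lines,
# -o %MEM: sort by memory usage (procps-ng top 3.3 or later).
# awk skips the summary area, then prints the column header line
# and the five rows that follow it.
top_mem_snapshot() {
    top -b -n 1 -c -o %MEM |
        awk '$1 == "PID" { found = NR } found && NR <= found + 5'
}

top_mem_snapshot
```

Locating the header by its PID column, rather than skipping a fixed number of lines, keeps the output correct even if the summary area has a different number of lines on a given node.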

  6. If the diskUsedRate metric value of the file system exceeds the alarm threshold and an alarm is generated, you are advised to locate the cause as follows:

    1. Log in to the node based on the node IP address and the mount point in the location information.

    2. Run the following command to identify the files or directories with high disk usage. Check whether the directory with high disk usage is the /opt directory of the Paas-manage-db01 or Paas-manage-db02 node.

      du -a {mountPoint}/{mountPointsubdirectory} | sort -n -r | head -n 6

      If the node with high disk usage is Paas-manage-db01 or Paas-manage-db02, go to 6.c. If the node with high disk usage is neither Paas-manage-db01 nor Paas-manage-db02, go to 6.h.

    3. Use PuTTY to log in to the manage_lb1_ip node.

      The default username is paas, and the default password is QAZ2wsx@123!.

    4. Run the following command and enter the password of the root user to switch to the root user:

      su - root

      Default password: QAZ2wsx@123!

    5. Run the following command to modify the metric aging time in the tenant management zone:

      kubectl edit configmap -n fst-manage amscalc-configmap

      Press i to enter insert mode and change the value of aggagingperiod to the estimated metric aging time (unit: day).

      Press Esc to return to command mode, and run the :wq! command to save the file and exit.

      By default, the AMS metric aging time is 15 days. Set aggagingperiod to a value less than 15, for example, 10.

      Contact the system administrator to expand the disk capacity, and then set totaldbcapacity to 80% of the total disk size in MB, discarding any digits after the decimal point. For example, if the total disk size is 500 GB, set totaldbcapacity to 409600 (500 x 1024 x 80%).

    6. Run the following command to query the ams-calc service pods in the management zone, and record the names of the pods that start with ams-calc.

      kubectl get pod -n fst-manage |grep ams-calc

      Output example:

      ams-calc-7d59496fdb-2x66q                    1/1       Running   0          7h
      ams-calc-7d59496fdb-8mlhf                    1/1       Running   0          7h
    7. Run the following command to redeploy pods of the ams-calc service in the tenant management zone:

      kubectl delete pod -n fst-manage {ams-calc pod name}

      If there are multiple pods starting with ams-calc in the tenant management zone, perform this step for each pod respectively.

    8. Identify the service processes to which the five files or directories with the highest disk usage belong, and contact technical support for assistance.
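The checks and calculations in step 6 can be wrapped in small shell functions. This is a minimal sketch rather than a supported tool: all function names are hypothetical, largest_entries assumes GNU du, and restart_ams_calc assumes kubectl is configured for the management zone as in 6.f and 6.g. The capacity formula is the one given in 6.e (total disk size in MB multiplied by 80%, fractional part dropped).

```shell
#!/bin/sh
# 6.b: list the largest files/directories under a path, sizes in KB.
# -k fixes the unit so sort -n compares like with like; -x keeps du on the
# file system of the mount point. The first line is the path itself, so
# N+1 lines are printed to show the top N entries below it.
# Usage: largest_entries PATH [N]   (N defaults to 5)
largest_entries() {
    du -akx "$1" 2>/dev/null | sort -n -r | head -n $(( ${2:-5} + 1 ))
}

# 6.e: totaldbcapacity = total disk size in MB x 80%, decimals dropped.
# Argument: total disk size in GB. Multiply before dividing, because shell
# arithmetic is integer-only and 80 / 100 alone would evaluate to 0.
totaldbcapacity_mb() {
    echo $(( $1 * 1024 * 80 / 100 ))
}

# 6.f: read "kubectl get pod" output on stdin and print ams-calc pod names.
ams_calc_pods() {
    awk '$1 ~ /^ams-calc/ { print $1 }'
}

# 6.g: delete each ams-calc pod so that it is redeployed.
# Requires kubectl access to the management zone; not run automatically here.
restart_ams_calc() {
    kubectl get pod -n fst-manage --no-headers | ams_calc_pods |
        while read -r pod; do
            kubectl delete pod -n fst-manage "$pod"
        done
}

totaldbcapacity_mb 500   # prints 409600
```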

  7. If the diskIOUtil metric value exceeds the alarm threshold and an alarm is generated, you are advised to locate the cause as follows:

    1. Obtain the node IP address and diskDevice from the location information.

    2. Use PuTTY to log in to the node at the IP address obtained in 7.a.

      The default username is paas, and the default password is QAZ2wsx@123!.

    3. Run the following commands to switch to the root user and view the disk I/O usage of each process:

      su root

      iotop -P

      The I/O column shows the disk I/O percentage of each process. Record the PID of the process with the highest percentage.

    4. Run the following command to view the detailed command corresponding to the PID. Contact technical support for assistance.

      ps -ef|grep {PID}

      {PID} is the value recorded in 7.c.
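Steps 7.c and 7.d can be sketched as two functions with hypothetical names. iotop_snapshot takes a single batch-mode sample instead of the interactive screen and must run as root; show_process replaces ps -ef | grep {PID} with ps -p, which matches only the PID column. Both assume the procps ps and iotop commands are installed on the node.

```shell
#!/bin/sh
# One non-interactive iotop sample (run as root): -b batch mode, -o only
# processes actually doing I/O, -P processes rather than threads,
# -n 1 a single iteration. Not run automatically here.
iotop_snapshot() {
    iotop -b -o -P -n 1
}

# Print the PID, owner, and full command line of one process. Unlike
# "ps -ef | grep PID", this matches the PID column only, so lines where the
# same digits appear elsewhere (or the grep itself) are not shown.
show_process() {
    case "$1" in
        ''|*[!0-9]*) echo "not a PID: $1" >&2; return 1 ;;
    esac
    ps -p "$1" -o pid= -o user= -o args=
}

show_process $$
```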

  8. Solve the problem based on your O&M experience and the alarm information, application monitoring details, and physical resource information. If the problem persists, collect the alarm information and contact technical support for assistance.

Alarm Clearing

When the status of the threshold rule changes from Exceeded to OK, the threshold-crossing alarm is automatically cleared. When the status of the threshold rule changes from Exceeded to Insufficient, the threshold-crossing alarm is automatically cleared, and an insufficient data event is generated.
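The clearing rule above amounts to a small status-transition check. The sketch below, with a hypothetical function name, encodes only the two transitions this section describes and reports any other transition as not covered here.

```shell
#!/bin/sh
# Hypothetical sketch of the clearing rule: given the previous and the new
# threshold-rule status, print what happens to the threshold-crossing alarm.
# Usage: on_status_change OLD_STATUS NEW_STATUS
on_status_change() {
    case "$1->$2" in
        "Exceeded->OK")           echo "alarm cleared" ;;
        "Exceeded->Insufficient") echo "alarm cleared; insufficient data event generated" ;;
        *)                        echo "not described in this section" ;;
    esac
}

on_status_change Exceeded OK
```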

Related Information

None

Updated: 2019-08-30

Document ID: EDOC1100062365
