HUAWEI CLOUD Stack 6.5.0 Alarm and Event Reference 04

ALM-70129 The GPU Usage Exceeds the Threshold

Description

FusionSphere OpenStack periodically (every 5 minutes by default) checks the GPU usage of each host group. This alarm is generated when the GPU usage of the hosts in a host group reaches 85%.

This alarm is cleared when the GPU usage of the hosts in the host group drops to 70%.
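Because the alarm is raised at 85% but only cleared at 70%, the two thresholds form a hysteresis band: usage between 70% and 85% keeps whatever state the alarm is already in. The following is a minimal sketch of that behavior, not FusionSphere OpenStack's implementation. It assumes usage is given as a percentage per periodic check and that both boundary comparisons are inclusive; the function name evaluate and the constant names are hypothetical.

```python
from typing import Iterable, List

RAISE_PERCENT = 85.0  # alarm is generated when usage reaches this value
CLEAR_PERCENT = 70.0  # alarm is cleared when usage drops to this value (assumed inclusive)


def evaluate(samples: Iterable[float], active: bool = False) -> List[bool]:
    """Return the alarm state after each periodic check (e.g. every 5 minutes).

    Hypothetical helper: usage values are percentages for one host group.
    """
    states = []
    for usage in samples:
        if not active and usage >= RAISE_PERCENT:
            active = True   # threshold reached: generate the alarm
        elif active and usage <= CLEAR_PERCENT:
            active = False  # usage fell back to the clear level: clear the alarm
        # Otherwise usage is inside the 70-85% band, so the state is unchanged.
        states.append(active)
    return states


if __name__ == "__main__":
    print(evaluate([80, 86, 78, 72, 70, 84]))
```

For the sample sequence, the alarm is raised at 86%, stays active at 78% and 72%, clears at 70%, and is not raised again at 84% because that value is still below the 85% threshold.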

Attribute

Alarm ID    Alarm Severity    Auto Clear
70129       Warning           Yes

Parameters

Name                   Meaning
Fault Location Info    • aggregate_id: specifies the ID of the host group where the GPUs are located.
                       • gpu_type: specifies the GPU type.
Additional Info        • Service: specifies the service for which the alarm is generated.
                       • MicroService: specifies the microservice for which the alarm is generated.
                       • aggregate_name: specifies the name of the host group where the GPUs are located.
                       • gpu_total: specifies the total number of GPUs.
                       • gpu_used: specifies the number of GPUs in use.
                       • threshold: specifies the GPU usage threshold.
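When processing this alarm in a script, it can help to keep the Fault Location Info and Additional Info fields together and derive the usage figure from them. The sketch below is a hypothetical container, not a Huawei API. It assumes that GPU usage equals gpu_used divided by gpu_total for the host group and that threshold is expressed as a percentage; the class name, the Python attribute names, and the example values are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class GpuUsageAlarm:
    """Hypothetical container for the ALM-70129 parameters."""

    # Fault Location Info
    aggregate_id: str
    gpu_type: str
    # Additional Info
    service: str
    micro_service: str
    aggregate_name: str
    gpu_total: int
    gpu_used: int
    threshold: float  # assumption: a percentage such as 85

    @property
    def usage_percent(self) -> float:
        # Assumption: usage = GPUs in use / total GPUs in the host group.
        if self.gpu_total <= 0:
            return 0.0
        return 100.0 * self.gpu_used / self.gpu_total

    def exceeds_threshold(self) -> bool:
        return self.usage_percent >= self.threshold


if __name__ == "__main__":
    alarm = GpuUsageAlarm("agg-1", "example-gpu", "svc", "msvc",
                          "gpu-group", gpu_total=8, gpu_used=7, threshold=85.0)
    print(alarm.usage_percent, alarm.exceeds_threshold())
```

With 7 of 8 GPUs in use, usage is 87.5%, which is above the 85% threshold.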

Impact on the System

VMs that require GPU cards may fail to be created because not enough GPU cards are available.

Possible Causes

The GPU usage is too high in the host group.

Procedure

  1. Obtain the GPU type (gpu_type) from the alarm information and expand the capacity of GPU cards of that type.
  2. After the GPU usage drops below 70%, check whether this alarm is cleared.

    • If yes, no further action is required.
    • If no, go to 3.

  3. If any of the preceding operations fails, contact technical support for assistance.
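To estimate how much capacity step 1 needs so that usage drops below 70% in step 2, the required total follows from gpu_used / (gpu_total + n) < 70%, which gives gpu_total + n > gpu_used × 100 / 70. For example, a host group with 20 GPUs and 18 in use (90%) needs at least 26 GPUs in total, that is, 6 more. The sketch below assumes usage is gpu_used divided by gpu_total, that newly added GPUs start unused, and that the target is an integer percentage; the function name gpus_to_add is hypothetical.

```python
def gpus_to_add(gpu_used: int, gpu_total: int, target_percent: int = 70) -> int:
    """Smallest number of extra GPUs so that gpu_used / total < target_percent.

    Hypothetical helper. Integer arithmetic avoids floating-point rounding
    at exact boundaries such as 7 / 10 = 70%.
    """
    if gpu_used <= 0:
        return 0
    # total must be strictly greater than gpu_used * 100 / target_percent,
    # so the smallest valid total is floor(gpu_used * 100 / target) + 1.
    min_total = gpu_used * 100 // target_percent + 1
    return max(0, min_total - gpu_total)


if __name__ == "__main__":
    print(gpus_to_add(gpu_used=18, gpu_total=20))
```

Note that at exactly 70% the strict "below 70%" condition is not met, so one more GPU is still required.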

Related Information

None

Updated: 2019-08-30

Document ID: EDOC1100062365