FusionServer Pro E9000 Server iBMC (Earlier than V250) Alarm Handling 02
ALM-0149FFFF/0149FF01 Above Upper Major Threshold (GPUN Temp)

Description

Alarm message:

Above upper major threshold

This alarm is generated when a temperature sensor detects that the temperature of a graphics processing unit (GPU) is higher than the major alarm upper threshold. This alarm is cleared when the system detects that the temperature is restored to the acceptable range.

This alarm is generated by the following sensors:

  • GPUN Temp (N indicates a GPU number.)
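
The assert-and-clear behavior described above can be sketched as a small state machine. This is only an illustration, not the iBMC implementation: the class name, the 90.0 °C threshold, and the sample readings are hypothetical, and real firmware may apply hysteresis or debouncing that this document does not describe.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UpperMajorAlarm:
    """Tracks one 'above upper major threshold' alarm for a single sensor."""

    sensor: str            # for example "GPU1 Temp"
    upper_major: float     # hypothetical threshold, in degrees Celsius
    active: bool = False

    def update(self, reading: float) -> Optional[str]:
        """Return 'asserted' or 'cleared' on a state change, otherwise None."""
        if not self.active and reading > self.upper_major:
            self.active = True
            return "asserted"
        if self.active and reading <= self.upper_major:
            self.active = False
            return "cleared"
        return None


# Hypothetical readings: normal, over threshold twice, then back in range.
alarm = UpperMajorAlarm(sensor="GPU1 Temp", upper_major=90.0)
events = [alarm.update(t) for t in (75.0, 91.5, 93.0, 88.0)]
```

Only the transitions produce an event, which matches the Auto Clear attribute of this alarm: it is raised once when the threshold is crossed and cleared once when the reading returns to range.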

Attribute

Alarm ID                        Alarm Severity    Auto Clear
iBMC: 0149FFFF, MM: 0149FF01    Major             Yes

Parameters

Name          Meaning
Time          Time when an alarm is generated.
Sensor        Name of the sensor that generates an alarm.
Event         Details about an alarm.
Severity      Severity of an alarm.
Event Code    Event code that corresponds to an alarm.
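
For scripts that process exported alarm lists, the five parameters can be modeled as one record. The sketch below is an assumption-laden illustration: the `AlarmRecord` type, the helper function, the timestamp, and the idea that the Event Code field carries the alarm ID verbatim are all hypothetical and should be checked against real iBMC output.

```python
from dataclasses import dataclass

# Alarm IDs as listed in the Attribute section of this alarm.
ALARM_IDS = {"iBMC": "0149FFFF", "MM": "0149FF01"}


@dataclass(frozen=True)
class AlarmRecord:
    """One alarm entry, with one field per documented parameter."""

    time: str        # when the alarm was generated
    sensor: str      # sensor that generated the alarm
    event: str       # details about the alarm
    severity: str    # severity of the alarm
    event_code: str  # assumed here to match the alarm ID


def is_gpu_temp_major(record: AlarmRecord) -> bool:
    """True if the record's event code is one of this alarm's IDs."""
    return record.event_code.upper() in ALARM_IDS.values()


# Hypothetical sample entry; the timestamp and sensor number are made up.
sample = AlarmRecord(
    time="2019-11-19 10:00:00",
    sensor="GPU1 Temp",
    event="Above upper major threshold",
    severity="Major",
    event_code="0149ffff",
)
matched = is_gpu_temp_major(sample)
```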

Impact on the System

The GPU cannot operate stably, which interrupts services.

Possible Causes

  • A fan module is faulty.
  • The ambient temperature is excessively high.
  • The air intake vent is blocked.
  • The air exhaust vent is blocked.
  • The heat sink is not properly connected to the mainboard.
  • The GPU is faulty.

Procedure

  1. Check whether a critical alarm is generated for a low fan speed.

    Log in to the iBMC command-line interface (CLI) or WebUI, and check whether an alarm is generated for the fan module.

    • If yes, go to Step 2.
    • If no, go to Step 4.

  2. Reinstall the fan module. Then check whether the alarm is cleared after five minutes.

    • If yes, no further action is required.
    • If no, go to Step 3.

  3. Replace the fan module. Then check whether the alarm is cleared after five minutes.

    For details about how to replace a fan module, see the server user guide.

    • If yes, no further action is required.
    • If no, go to Step 4.

  4. Check whether the ambient temperature is excessively high.

    • If yes, go to Step 5.
    • If no, go to Step 6.

  5. Ensure that the ambient temperature is in the range of 10°C to 30°C (50°F to 86°F). Then check whether the alarm is cleared after five minutes.

    • If yes, no further action is required.
    • If no, go to Step 6.

  6. Check whether the air intake vent or air exhaust vent is blocked.

    • If yes, go to Step 7.
    • If no, go to Step 8.

  7. Remove barriers. Then check whether the alarm is cleared.

    • If yes, no further action is required.
    • If no, go to Step 8.

  8. Power off the server, and open the chassis. Then check whether the heat sink is properly connected to the GPU.

  9. Remove and then reinstall the heat sink, and power on the server. After five minutes, check whether the alarm is cleared.

    • If yes, no further action is required.
    • If no, go to Step 10.

  10. Replace the GPU, and then check whether the alarm is cleared.

    • If yes, no further action is required.
    • If no, go to Step 11.

  11. Contact Huawei technical support for help.
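
The procedure above tries one remedy at a time and escalates only when the alarm persists. A minimal sketch of that ordering, together with the ambient range check from Step 5, might look like the following. The function names and the `REMEDIES` list are hypothetical conveniences for illustration; only the 10°C to 30°C range and the order of remedies come from the procedure.

```python
# Acceptable ambient range from Step 5, in degrees Celsius.
AMBIENT_RANGE_C = (10.0, 30.0)

# Remedies in the order the procedure tries them.
REMEDIES = [
    "Reinstall the fan module",
    "Replace the fan module",
    "Bring the ambient temperature into range",
    "Remove barriers from the air vents",
    "Reinstall the GPU heat sink",
    "Replace the GPU",
]


def c_to_f(celsius: float) -> float:
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32


def ambient_ok(celsius: float) -> bool:
    """True if the ambient temperature is inside the Step 5 range."""
    low, high = AMBIENT_RANGE_C
    return low <= celsius <= high


def next_action(remedies_tried: int) -> str:
    """Return the next remedy, or the escalation once all have failed."""
    if remedies_tried < len(REMEDIES):
        return REMEDIES[remedies_tried]
    return "Contact Huawei technical support"


# The Fahrenheit bounds quoted in Step 5 follow from the Celsius bounds.
range_f = tuple(c_to_f(c) for c in AMBIENT_RANGE_C)
```

Here `range_f` evaluates to `(50.0, 86.0)`, which agrees with the 50°F to 86°F range stated in Step 5. The sketch leaves out the diagnostic checks in Steps 1, 4, 6, and 8, which decide whether a remedy applies at all.
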
Updated: 2019-11-19

Document ID: EDOC1100035007
