FusionServer Pro E9000 Server iBMC (Earlier than V250) Alarm Handling 02

ALM-0C0AFFFF Major Alarm Caused by High Memory Temperature

Description

Alarm message:

Critical overtemperature

This alarm is generated when the temperature of the Compute Node, Switch Module, or Pass Through Module register is higher than the major alarm threshold.

This alarm is generated by one of the following sensors:

  • DIMMn
  • CPUn Memory
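
Step 1 of the procedure branches on whether the sensor name is CPUn Memory, so a script that processes alarm records may want to tell the two sensor families apart. The following is a minimal sketch, assuming that "n" stands for one or more decimal digits; the exact sensor naming on a given iBMC release may differ, and the function name classify_sensor is hypothetical.

```python
import re

# Assumption: "n" in the sensor names is one or more decimal digits.
# Adjust the patterns if your iBMC release names sensors differently.
DIMM_PATTERN = re.compile(r"^DIMM\d+$")
CPU_MEMORY_PATTERN = re.compile(r"^CPU\d+ Memory$")


def classify_sensor(name: str) -> str:
    """Return 'cpu_memory', 'dimm', or 'other' for a sensor name."""
    name = name.strip()
    if CPU_MEMORY_PATTERN.match(name):
        return "cpu_memory"
    if DIMM_PATTERN.match(name):
        return "dimm"
    return "other"
```

A result of 'cpu_memory' corresponds to the "yes" branch of step 1 (check the power supply status first); any other result corresponds to the "no" branch.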

Attribute

Alarm ID    Alarm Severity    Auto Clear
0C0AFFFF    Major             Yes

Parameters

Name          Meaning
Time          Time when an alarm is generated.
Sensor        Name of the sensor that generates an alarm.
Event         Details about an alarm.
Severity      Severity of an alarm.
Event Code    Event code that corresponds to an alarm.
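
The five parameters can be held in a small record type when alarms are processed by a script. The sketch below is illustrative only: the class AlarmRecord, the helper matches_alarm, and the field names are hypothetical, all values are kept as plain strings because the source does not specify their formats, and it is an assumption that the event code, once a leading "0x" is stripped, equals the alarm ID 0C0AFFFF.

```python
from dataclasses import dataclass

ALARM_ID = "0C0AFFFF"  # alarm ID listed under Attribute


@dataclass
class AlarmRecord:
    """One alarm entry; fields mirror the Parameters list."""
    time: str        # time when the alarm is generated (format not specified)
    sensor: str      # name of the sensor that generates the alarm
    event: str       # details about the alarm
    severity: str    # severity of the alarm
    event_code: str  # event code that corresponds to the alarm


def matches_alarm(record: AlarmRecord, alarm_id: str = ALARM_ID) -> bool:
    """Compare the event code with the alarm ID, ignoring case and a '0x' prefix.

    Assumption: the event code and the alarm ID are the same hex string.
    """
    code = record.event_code.strip().upper()
    if code.startswith("0X"):
        code = code[2:]
    return code == alarm_id.upper()
```

For example, a record built with event_code="0x0c0affff" would be treated as this alarm, while a record with any other code would not.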

Impact on the System

The mainboard service performance deteriorates:

  • The operating system (OS) cannot operate properly.
  • The mainboard may restart or stop responding.

Possible Causes

The possible causes of this alarm are as follows:

  • The ambient temperature in the equipment room is high.
  • The chassis fan module is faulty.
  • The service load is heavy.
  • The air intake vent is blocked.
  • The air exhaust vent is blocked.
  • Power supply to the entire chassis is insufficient.

Procedure

  1. Check whether the sensor name is CPUn Memory.

    • If yes, go to 2.
    • If no, go to 4.

  2. Log in to the MM CLI, run the smmget -d smalert command, and check whether the status is abnormal.

    • If yes, go to 3.
    • If no, go to 4.

  3. Install six PSUs in the chassis, and check whether the alarm is cleared.

    • If yes, no further action is required.
    • If no, go to 4.

  4. Check whether the fan module is running properly. A red indicator on the fan module means that the module is not running properly.

    • If yes, go to 6.
    • If no, go to 5.

  5. Replace the fan module. Then check whether the alarm is cleared. For details about how to replace the fan module, see the E9000 Server User Guide.

    • If yes, no further action is required.
    • If no, go to 6.

  6. Check whether the server is running a heavy service load.

    • If yes, go to 7.
    • If no, go to 8.

  7. Stop non-critical services to reduce the service load on the server. Then check whether the alarm is cleared.

    • If yes, no further action is required.
    • If no, go to 8.

  8. Check whether the ambient temperature is higher than 40°C (104°F).

    • If yes, go to 9.
    • If no, go to 10.

  9. Reduce the ambient temperature. Then check whether the alarm is cleared.

    • If yes, no further action is required.
    • If no, go to 10.

  10. Check whether the air intake vent or air exhaust vent is blocked. If either is blocked, remove the obstruction. Then check whether the alarm is cleared.

    • If yes, no further action is required.
    • If no, go to 11.

  11. Contact Huawei technical support for help.
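
The branching in the procedure above can be written as a transition table, which makes it easy to see which path a given set of answers takes. This is a sketch for illustration, not a Huawei tool: the names TRANSITIONS, FINAL_STEP, and walk are hypothetical, and the answers are supplied by the operator (here, by a callback) rather than read from the hardware.

```python
from typing import Callable, Dict, List, Optional, Tuple

# step -> (next step if the answer is "yes", next step if it is "no").
# None means "no further action is required".
TRANSITIONS: Dict[int, Tuple[Optional[int], Optional[int]]] = {
    1: (2, 4),      # sensor name is CPUn Memory?
    2: (3, 4),      # smmget -d smalert reports an abnormal status?
    3: (None, 4),   # alarm cleared after installing six PSUs?
    4: (6, 5),      # fan module running properly?
    5: (None, 6),   # alarm cleared after replacing the fan module?
    6: (7, 8),      # service load heavy?
    7: (None, 8),   # alarm cleared after stopping non-critical services?
    8: (9, 10),     # ambient temperature above 40°C (104°F)?
    9: (None, 10),  # alarm cleared after reducing the temperature?
    10: (None, 11), # alarm cleared after unblocking the vents?
}
FINAL_STEP = 11     # contact Huawei technical support


def walk(answer: Callable[[int], bool], start: int = 1) -> List[int]:
    """Return the list of steps visited, given a yes/no answer per step."""
    path: List[int] = []
    step: Optional[int] = start
    while step is not None:
        path.append(step)
        if step == FINAL_STEP:
            break
        yes_next, no_next = TRANSITIONS[step]
        step = yes_next if answer(step) else no_next
    return path
```

Calling walk with a callback that always answers "no" visits steps 1, 4, 5, 6, 8, 10, and 11, which is the path that ends in contacting technical support.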
Updated: 2019-11-19

Document ID: EDOC1100035007