E9000 Server Compute Node and Switch Module iMana 200 Alarm Handling 01

ALM-0x0149FFFF Overtemperature Major Alarm (CPUN Prochot)


Description

Alarm message: Above upper major threshold

This alarm is generated when the iMana 200 detects a signal from the CPU indicating that the CPU core temperature is excessively high. The alarm is cleared when the system detects that the temperature has returned to the acceptable range.

This alarm is generated by the sensor: CPUN Prochot

Attribute

Alarm ID      Alarm Severity    Auto Clear
0x0149FFFF    Major             Yes

Parameters

Name    Meaning
N       Slot number of the CPU.
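
The attributes and parameter above can be pictured as a small record with a raise/clear state. The following is a minimal sketch for illustration only: the class name, the `update` method, and the assumption that N is substituted directly into the sensor name are hypothetical and do not describe how the iMana 200 is implemented.

```python
from dataclasses import dataclass
from typing import Optional

ALARM_ID = 0x0149FFFF  # value from the Attribute table


@dataclass
class ProchotAlarm:
    """Illustrative model of the alarm; not an iMana 200 data structure."""
    cpu_slot: int            # the parameter N (slot number of the CPU)
    severity: str = "Major"  # from the Attribute table
    auto_clear: bool = True  # from the Attribute table
    active: bool = False

    @property
    def sensor_name(self) -> str:
        # Assumption: N replaces the letter N in "CPUN Prochot".
        return f"CPU{self.cpu_slot} Prochot"

    def update(self, prochot_asserted: bool) -> Optional[str]:
        """Return 'generated' or 'cleared' on a state change, else None."""
        if prochot_asserted and not self.active:
            self.active = True
            return "generated"
        if not prochot_asserted and self.active and self.auto_clear:
            self.active = False
            return "cleared"
        return None
```

For example, `ProchotAlarm(cpu_slot=1).sensor_name` evaluates to `CPU1 Prochot` under the naming assumption stated above.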

Impact on the System

When the CPU core temperature is excessively high, the system forcibly powers off the mainboard to protect itself. As a result, the services on the mainboard are interrupted, and data is lost.

Possible Causes

  • A fan module is faulty.
  • The service load is excessively heavy.
  • The temperature in the equipment room is excessively high.
  • The air intake vent is blocked.
  • The air exhaust vent is blocked.
  • The heat sink is not properly connected to the mainboard.
  • The power supply to the device may be insufficient. An insufficient power supply can cause this alarm only when the sensor reporting it is CPUN Prochot.
  • The mainboard is faulty.
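
As a cross-reference, each possible cause can be paired with the steps of the procedure in this document that check for it. The mapping below is a reading aid derived from the step descriptions, not an official Huawei table; the dictionary name and the shortened cause labels are illustrative.

```python
# Possible cause -> steps of the procedure that check for or fix it.
CAUSE_TO_STEPS = {
    "Insufficient power supply": (1, 2),
    "Faulty fan module": (3, 4, 5, 6),
    "Excessive service load": (7, 8),
    "High equipment room temperature": (9, 10),
    "Blocked air intake or exhaust vent": (11, 12),
    "Heat sink not properly installed": (17, 18),
    "Faulty mainboard": (19,),
}


def steps_for(cause: str) -> tuple:
    """Return the procedure steps that address the given cause label."""
    return CAUSE_TO_STEPS[cause]
```

Steps 13 to 16 (filler panels and air duct) have no entry because the cause list does not name them, although the procedure still checks them.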

Procedure

  1. Log in to the MM CLI, run the smmget -d smalert command, and check whether the status is abnormal.

    • If yes, go to 2.
    • If no, go to 3.

  2. Install six PSUs in the chassis, and check whether the alarm is cleared.

    • If yes, no further action is required.
    • If no, go to 3.

  3. Check whether a fan alarm is generated.

    • If yes, go to 4.
    • If no, go to 5.

  4. Clear the fan fault alarm. Then check whether the alarm is cleared.

    • If yes, no further action is required.
    • If no, go to 5.

  5. Check whether all fans are properly installed.

    • If yes, go to 7.
    • If no, go to 6.

  6. Install fans properly in the vacant fan slots. Then check whether the alarm is cleared.

    For details about how to install the fans, see the E9000 Server User Guide.

    • If yes, no further action is required.
    • If no, go to 7.

  7. Check whether the service load on the server is excessively heavy.

    • If yes, go to 8.
    • If no, go to 9.

  8. Stop non-critical services to reduce the service load on the server. Then check whether the alarm is cleared.

    • If yes, no further action is required.
    • If no, go to 9.

  9. Check whether the ambient temperature is higher than 40°C (104°F).

    • If yes, go to 10.
    • If no, go to 11.

  10. Reduce the ambient temperature in the equipment room by using air conditioners and fans (for example, lower the air conditioner temperature setting and increase the fan speed; if the air conditioners cannot work properly, open doors and windows for ventilation). Then check whether the alarm is cleared.

    • If yes, no further action is required.
    • If no, go to 11.

  11. Check whether the air intake vent or air exhaust vent is blocked.

    • If yes, go to 12.
    • If no, go to 13.

  12. Remove the obstructions. Then check whether the alarm is cleared.

    • If yes, no further action is required.
    • If no, go to 13.

  13. Check whether filler panels are installed in all vacant slots of the chassis.

    • If yes, go to 15.
    • If no, go to 14.

  14. Insert a filler panel into each vacant slot. Then check whether the alarm is cleared.

    • If yes, no further action is required.
    • If no, go to 15.

  15. Power off the node and remove it from the chassis. Then check whether the air duct is properly installed in the server.

    • If yes, go to 17.
    • If no, go to 16.

  16. Install the air duct properly, insert the node into the chassis, and power it on. Then check whether the alarm is cleared.

    For details, see the server user guide.
    • If yes, no further action is required.
    • If no, go to 17.

  17. Power off the node and remove it from the chassis. Then check whether the CPU heat sink is properly installed.

    • If yes, go to 19.
    • If no, go to 18.

  18. Install the CPU heat sink properly, insert the node into the chassis, and power it on. Then check whether the alarm is cleared.

    For details, see the server user guide.
    • If yes, no further action is required.
    • If no, go to 19.

  19. Power off the node and remove it from the chassis. Replace the mainboard, insert the node into the chassis, and power it on. Then check whether the alarm is cleared.

    • If yes, no further action is required.
    • If no, go to 20.

  20. On the HMM WebUI, choose System Management > Information Collection, and collect logs.
  21. Contact Huawei technical support.
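
The yes/no branching in the procedure above can be sketched as a lookup table, which makes it easier to see which steps are visited for a given set of answers. This is an illustrative aid only: the names `STEPS` and `walk` and the shortened step summaries are not part of any Huawei tool, and the sketch performs no checks on real hardware.

```python
# step -> (summary, next step if "yes", next step if "no").
# DONE means no further action is required.
DONE = None
ESCALATE = 20  # steps 20 and 21: collect logs, then contact technical support

STEPS = {
    1: ("smmget -d smalert status abnormal?", 2, 3),
    2: ("Six PSUs installed; alarm cleared?", DONE, 3),
    3: ("Fan alarm generated?", 4, 5),
    4: ("Fan fault cleared; alarm cleared?", DONE, 5),
    5: ("All fans properly installed?", 7, 6),
    6: ("Fans installed; alarm cleared?", DONE, 7),
    7: ("Service load excessively heavy?", 8, 9),
    8: ("Non-critical services stopped; alarm cleared?", DONE, 9),
    9: ("Ambient temperature above 40°C (104°F)?", 10, 11),
    10: ("Room cooled; alarm cleared?", DONE, 11),
    11: ("Air intake or exhaust vent blocked?", 12, 13),
    12: ("Obstructions removed; alarm cleared?", DONE, 13),
    13: ("Filler panels in all vacant slots?", 15, 14),
    14: ("Filler panels inserted; alarm cleared?", DONE, 15),
    15: ("Air duct properly installed?", 17, 16),
    16: ("Air duct reinstalled; alarm cleared?", DONE, 17),
    17: ("CPU heat sink properly installed?", 19, 18),
    18: ("Heat sink reinstalled; alarm cleared?", DONE, 19),
    19: ("Mainboard replaced; alarm cleared?", DONE, ESCALATE),
}


def walk(answers):
    """Return the steps visited for a dict of {step: True/False} answers.

    A step missing from `answers` is treated as answered "no".
    """
    path, step = [], 1
    while step is not DONE and step != ESCALATE:
        path.append(step)
        _, if_yes, if_no = STEPS[step]
        step = if_yes if answers.get(step, False) else if_no
    if step == ESCALATE:
        path += [20, 21]
    return path
```

For instance, `walk({3: True, 4: True})` follows the path for a fan fault whose repair clears the alarm, and `walk({})` answers "no" at every step and ends with log collection and a support call.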
Updated: 2018-08-16

Document ID: EDOC1100035007
