FusionServer Pro Rack Server iBMC Alarm Handling 30

This document describes iBMC alarms in terms of the meaning, impact on the system, possible causes, and handling suggestions.
ALM-0x0701FFFF Critical Alarm for CPU Temperature (Thermal Trip) (CPUN Status)

Description

Alarm message:

Critical alarm for CPU temperature (thermal trip)

This alarm is generated when the iBMC detects a signal from the CPU indicating that the CPU core temperature is excessively high. The alarm is cleared when the system detects that the temperature has returned to the acceptable range.

Sensor triggering the alarm: CPUN Status

Attribute

Alarm ID      Alarm Severity   Auto Clear
0x0701FFFF    Critical         Yes

Parameters

Name   Meaning
N      Serial number of the CPU.
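
As an illustration of the N parameter, the following Python sketch extracts the CPU number from a sensor name. It assumes that sensor names appear in the form "CPU1 Status", "CPU2 Prochot", or "CPU1 DTS", which follows the CPUN naming used in this document. The function name cpu_sensor is hypothetical and is not part of the iBMC.

```python
import re

# Assumed pattern: "CPU<N> <kind>", where kind is one of the CPU sensor
# types named in this document (Status, Prochot, DTS).
_CPU_SENSOR = re.compile(r"CPU(\d+) (Status|Prochot|DTS)")


def cpu_sensor(name):
    """Return (cpu_number, sensor_kind) for a CPU sensor name, or None."""
    match = _CPU_SENSOR.fullmatch(name.strip())
    if match is None:
        return None
    return int(match.group(1)), match.group(2)
```

For example, cpu_sensor("CPU2 Status") identifies CPU 2 as the processor that reported the thermal trip, while a name that does not follow the assumed pattern yields None.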

Impact on the System

When the CPU core temperature is excessively high, the system forcibly powers off the mainboard for self-protection. As a result, services running on the mainboard are interrupted and data is lost.

NOTE:

The CPUN Prochot and CPUN DTS sensors generate alarms before this alarm is triggered. When those alarms appear, cool down the server before the rising temperature triggers this alarm.

Possible Causes

  • The fan module is faulty.

  • The ambient temperature exceeds the normal range.

  • The air inlet or outlet is blocked.

  • Filler panels are not installed in idle disk bays.

  • Air ducts are not installed properly.

  • The heat sink is in poor contact with the mainboard.

  • The mainboard is faulty.

  • The CPU is faulty.

Procedure

  1. Check whether a low fan speed alarm is generated for the fan module.

    You can obtain alarm information in either of the following ways:
      • View alarm information on the Current Alarms page of the iBMC WebUI.
      • Run the ipmcget -d healthevents command on the iBMC CLI.

    • If yes, go to 2.

    • If no, go to 5.

  2. Remove and then reinstall the fan module. After 5 minutes, check whether the fan module alarm is cleared.

    • If yes, go to 4.

    • If no, go to 3.

  3. Replace the fan module. After 5 minutes, check whether the fan module alarm is cleared.

    For details about how to replace the fan module, see the server user guide.

    • If yes, go to 4.

    • If no, go to 17.

  4. Check whether the CPU overheating alarm is cleared.

    • If yes, no further action is required.

    • If no, go to 5.

  5. Check whether the ambient temperature exceeds the normal range.

    • If yes, go to 6.

    • If no, go to 7.

  6. Lower the ambient temperature to the normal range. After 5 minutes, check whether the alarm is cleared.

    • If yes, no further action is required.

    • If no, go to 7.

  7. Check whether the air inlet or outlet is blocked.

    • If yes, go to 8.

    • If no, go to 9.

  8. Remove the blockage. Then, check whether the alarm is cleared.

    • If yes, no further action is required.

    • If no, go to 9.

  9. Check whether any idle slots are missing filler panels.

    • If yes, go to 10.

    • If no, go to 11.

  10. Install filler panels in idle slots. After 5 minutes, check whether the alarm is cleared.

    • If yes, no further action is required.

    • If no, go to 11.

  11. Check whether air ducts are installed properly.

    • If yes, go to 13.

    • If no, go to 12.

  12. Install air ducts properly. After 5 minutes, check whether the alarm is cleared.

    • If yes, no further action is required.

    • If no, go to 13.

  13. Power off the server and open the chassis. Check whether the heat sink is in poor contact with the mainboard.

    • If yes, go to 14.

    • If no, go to 15.

  14. Remove and reinstall the heat sink, and power on the server. After 5 minutes, check whether the alarm is cleared.

    • If yes, no further action is required.

    • If no, go to 15.

  15. Replace the mainboard and check whether the alarm is cleared.

    • If yes, no further action is required.

    • If no, go to 16.

  16. Replace the CPU and check whether the alarm is cleared.

    • If yes, no further action is required.

    • If no, go to 17.

  17. Contact Huawei technical support.
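
The branching in the procedure above can be summarized as a transition table. The following Python sketch is only an illustration for reading the flow, not a Huawei tool. The names FLOW, DONE, ESCALATE, and walk are hypothetical, and the yes/no answers must come from an operator performing each check.

```python
# Transition table transcribed from steps 1 to 16 of the procedure:
# step -> (next step if the answer is yes, next step if the answer is no).
DONE = 0        # hypothetical marker for "no further action is required"
ESCALATE = 17   # step 17: contact Huawei technical support

FLOW = {
    1: (2, 5),       # low fan speed alarm present?
    2: (4, 3),       # fan alarm cleared after reseating the fan module?
    3: (4, 17),      # fan alarm cleared after replacing the fan module?
    4: (DONE, 5),    # CPU overheating alarm cleared?
    5: (6, 7),       # ambient temperature out of range?
    6: (DONE, 7),    # alarm cleared after lowering ambient temperature?
    7: (8, 9),       # air inlet or outlet blocked?
    8: (DONE, 9),    # alarm cleared after removing the blockage?
    9: (10, 11),     # idle slots missing filler panels?
    10: (DONE, 11),  # alarm cleared after installing filler panels?
    11: (13, 12),    # air ducts installed properly?
    12: (DONE, 13),  # alarm cleared after reinstalling air ducts?
    13: (14, 15),    # heat sink in poor contact with the mainboard?
    14: (DONE, 15),  # alarm cleared after reseating the heat sink?
    15: (DONE, 16),  # alarm cleared after replacing the mainboard?
    16: (DONE, 17),  # alarm cleared after replacing the CPU?
}


def walk(answers, start=1):
    """Return the list of steps visited, given answers as {step: bool}.

    Steps missing from answers are treated as answered "no".
    """
    path = [start]
    step = start
    while step not in (DONE, ESCALATE):
        yes_next, no_next = FLOW[step]
        step = yes_next if answers.get(step, False) else no_next
        path.append(step)
    return path
```

For example, walk({1: True, 2: True, 4: True}) follows the case where reseating the fan module clears both the fan alarm and the CPU alarm, and ends at DONE. Calling walk({}) answers "no" at every step and ends at step 17.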
Updated: 2019-08-05

Document ID: EDOC1000054724
