KunLun Mission Critical Server V100R001 CMC Alarm Handling 09

This document describes KunLun 9016 and 9032 alarms in the CMC, in terms of their meanings, impact on the system, possible causes, and handling suggestions.

ALM-0000000F A CPU Is Overheating and the Server Will Be Powered Off

Description

Alarm message:

CPU # is overheating and the server will be powered off. 

Attribute

  • Alarm ID: 0000000F
  • Alarm Severity: Critical
  • Auto Clear: Yes

Parameters

  • Alarm Severity: Indicates the alarm severity.
  • Alarm Source: Indicates the alarm source.
  • Subject: Indicates the object for which the alarm is generated.
  • Time: Indicates the time when the alarm is generated.
  • Description: Provides a description of the alarm.
  • Event Code: Indicates the event code of the alarm.
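If alarms are exported from the CMC for scripted triage, the parameters above map naturally onto a small record type. The following is a minimal sketch, not a CMC API: the field names, the `AlarmRecord` class, and the two helper functions are hypothetical. It also assumes that the Event Code field carries the Alarm ID (0000000F), possibly with a "0x" prefix, and it reads the fan module source "SCE#-BPUA/B" from Step 1 of the procedure as SCE<number>-BPUA or SCE<number>-BPUB.

```python
import re
from dataclasses import dataclass

# Attributes of ALM-0000000F as listed in the Attribute section.
ALARM_ID = "0000000F"
ALARM_SEVERITY = "Critical"
AUTO_CLEAR = True

# Assumption: "SCE#-BPUA/B" means SCE<enclosure number>-BPUA or -BPUB.
FAN_SOURCE_PATTERN = re.compile(r"^SCE\d+-BPU[AB]$")


@dataclass
class AlarmRecord:
    """One alarm entry carrying the parameters above (hypothetical field names)."""
    severity: str     # Alarm Severity
    source: str       # Alarm Source
    subject: str      # Subject: object for which the alarm is generated
    time: str         # Time the alarm was generated
    description: str  # Description
    event_code: str   # Event Code (assumed to carry the Alarm ID)


def is_cpu_overheat_alarm(alarm: AlarmRecord) -> bool:
    """True if the record is ALM-0000000F, tolerating an optional 0x prefix."""
    code = alarm.event_code.upper().removeprefix("0X")
    return code == ALARM_ID


def fan_module_alarms(alarms: list[AlarmRecord]) -> list[AlarmRecord]:
    """Keep only alarms whose Alarm Source matches the fan module pattern."""
    return [a for a in alarms if FAN_SOURCE_PATTERN.match(a.source)]
```

Keeping the source pattern in one compiled constant makes it easy to adjust if the actual Alarm Source format on a given CMC differs from this reading.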

Impact on the System

The server automatically powers off, which interrupts services.

NOTE:

Before this alarm is generated, the CPUN Prochot and CPUN DTS sensors generate alarms. When those sensor alarms appear, cool down the server as soon as possible so that the CPU core temperature does not rise further and trigger this alarm.

Possible Causes

  • A fan module is faulty.
  • The service load is heavy.
  • The ambient temperature is higher than 30°C (86°F).
  • The air intake vent is blocked.
  • The air exhaust vent is blocked.
  • The heat sink is in poor contact with the CPU.
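The temperature figures used in this section and in Step 7 of the procedure (a 30°C upper limit and a 10°C to 30°C operating range) can be checked with a few lines of arithmetic. This is a minimal sketch; the function and constant names are hypothetical, and the ambient reading is assumed to come from whatever thermometer or sensor is available on site.

```python
# Ambient range used in this document: 10°C to 30°C (50°F to 86°F).
AMBIENT_MIN_C = 10.0
AMBIENT_MAX_C = 30.0


def c_to_f(celsius: float) -> float:
    """Convert degrees Celsius to degrees Fahrenheit: F = C * 9/5 + 32."""
    return celsius * 9 / 5 + 32


def ambient_too_hot(ambient_c: float) -> bool:
    """True if the ambient temperature exceeds the 30°C limit listed as a cause."""
    return ambient_c > AMBIENT_MAX_C


def ambient_in_range(ambient_c: float) -> bool:
    """True if the ambient temperature lies within the 10°C to 30°C target range."""
    return AMBIENT_MIN_C <= ambient_c <= AMBIENT_MAX_C
```

For example, `c_to_f(30)` gives 86.0 and `c_to_f(10)` gives 50.0, matching the Fahrenheit values quoted in the text.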

Procedure

  1. Check whether a fault alarm is generated for a fan module.

    On the CMC WebUI, check for fan module fault alarms whose Alarm Source value is SCE#-BPUA/B.

    • If yes, go to Step 2.
    • If no, go to Step 4.

  2. Remove and reinstall the fan module for which a fault alarm is generated. Five minutes later, check whether the fan module fault alarm is cleared.

  3. Replace the fan module. Five minutes later, check whether the fan module fault alarm is cleared.

    For details about how to replace a fan module, see the KunLun 90xx V100R001 User Guide.

  4. Check whether the service load on the server is heavy.

    • If yes, go to Step 5.
    • If no, go to Step 6.

  5. Stop non-critical services to reduce the service load on the server. Then check whether the alarm is cleared.

    • If yes, no further action is required.
    • If no, go to Step 6.

  6. Check whether the ambient temperature is higher than 30°C (86°F).

    • If yes, go to Step 7.
    • If no, go to Step 8.

  7. Lower the ambient temperature to a range of 10°C to 30°C (50°F to 86°F). Five minutes later, check whether the alarm is cleared.

    • If yes, no further action is required.
    • If no, go to Step 8.

  8. Check whether the air intake or exhaust vent is blocked.

    • If yes, go to Step 9.
    • If no, go to Step 10.

  9. Remove any obstructions from the air intake or exhaust vent. Then check whether the alarm is cleared.

    • If yes, no further action is required.
    • If no, go to Step 10.

  10. Power off the server, remove the system compute module (SCM) from the system compute enclosure (SCE), and remove the CPU board module from the SCM. Check whether the heat sink is in poor contact with the CPU.

    • If yes, go to Step 11.
    • If no, go to Step 12.

  11. Remove and reinstall the heat sink, and power on the server. Five minutes later, check whether the alarm is cleared.

    • If yes, no further action is required.
    • If no, go to Step 12.

  12. Replace the CPU. Five minutes later, check whether the alarm is cleared.

    For details about how to replace a CPU, see the KunLun 90xx V100R001 User Guide.

    • If yes, no further action is required.
    • If no, go to Step 13.

  13. Contact Huawei technical support.
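The procedure above is essentially a decision tree: each check step either leads to corrective steps or moves on to the next check, and Steps 12 and 13 are the fallback. The following is a minimal sketch of that flow for recording or replaying a troubleshooting session; it is not a Huawei tool. `walk_procedure` and its two callbacks are hypothetical. It assumes that a "no" at a check step moves on to the next check, treats every "check whether the alarm is cleared" as one yes/no answer, and assumes that a cleared result at Step 2 or Step 3 ends the procedure while an uncleared result after Step 3 continues with Step 4. The procedure does not state those Step 2 and Step 3 branches explicitly.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class CheckStep:
    """A diagnostic step and the corrective steps it leads to."""
    step: int                        # step number of the check
    finding: str                     # what the check looks for
    corrective_steps: tuple[int, ...]  # steps performed if the finding is present


# Check/corrective pairs taken from the procedure.
CHECKS = (
    CheckStep(1, "fan module fault alarm generated", (2, 3)),
    CheckStep(4, "heavy service load", (5,)),
    CheckStep(6, "ambient temperature above 30 C (86 F)", (7,)),
    CheckStep(8, "air intake or exhaust vent blocked", (9,)),
    CheckStep(10, "heat sink in poor contact with the CPU", (11,)),
)
REPLACE_CPU_STEP = 12
CONTACT_SUPPORT_STEP = 13


def walk_procedure(
    finding_present: Callable[[int], bool],
    cleared_after: Callable[[int], bool],
) -> list[int]:
    """Return the step numbers visited, in order.

    finding_present(step) answers a check step (hypothetical callback).
    cleared_after(step) reports whether the alarm is cleared after a
    corrective step (hypothetical callback).
    """
    visited: list[int] = []
    for check in CHECKS:
        visited.append(check.step)
        if not finding_present(check.step):
            continue  # assumption: a "no" moves on to the next check
        for step in check.corrective_steps:
            visited.append(step)
            if cleared_after(step):
                return visited  # alarm cleared: no further action
    # No cause fixed the problem: replace the CPU, then escalate.
    visited.append(REPLACE_CPU_STEP)
    if cleared_after(REPLACE_CPU_STEP):
        return visited
    visited.append(CONTACT_SUPPORT_STEP)
    return visited
```

For example, if only the vents are blocked and clearing them fixes the problem, `walk_procedure(lambda s: s == 8, lambda s: s == 9)` visits Steps 1, 4, 6, 8, and 9. Encoding the steps as data rather than nested conditionals keeps the step numbers in one place if the procedure is revised.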

Updated: 2018-12-29

Document ID: EDOC1000111849