KunLun Mission Critical Server V100R001 CMC Alarm Handling 09

This document describes KunLun 9016 and 9032 alarms in the CMC, in terms of their meanings, impact on the system, possible causes, and handling suggestions.
ALM-27000001 The PCH Temperature Reaching the Overtemperature Threshold

Description

Alarm message:

PCH temperature (xx degree C) reaches the overtemperature threshold (xx degree C).
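
When alarm text is processed by a script, the two values in this message can be extracted with a regular expression. The following is a minimal Python sketch, assuming the message follows the template above exactly and that each xx placeholder is replaced by a plain integer or decimal number. The function name parse_pch_alarm and the sample values are hypothetical and are not part of any CMC tool.

```python
import re

# Pattern built from the alarm message template shown above.
# Assumption: the "xx" placeholders are replaced by plain numbers.
_PCH_ALARM = re.compile(
    r"PCH temperature \((?P<temp>-?\d+(?:\.\d+)?) degree C\) reaches "
    r"the overtemperature threshold \((?P<limit>-?\d+(?:\.\d+)?) degree C\)\."
)


def parse_pch_alarm(message):
    """Return (temperature, threshold) in degrees Celsius, or None if no match."""
    match = _PCH_ALARM.search(message)
    if match is None:
        return None
    return float(match.group("temp")), float(match.group("limit"))


if __name__ == "__main__":
    # Hypothetical sample values, for illustration only.
    sample = ("PCH temperature (110 degree C) reaches "
              "the overtemperature threshold (108 degree C).")
    print(parse_pch_alarm(sample))
```

Returning None for text that does not match lets a caller skip unrelated alarm messages without raising an exception.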

Attribute

Alarm ID      Alarm Severity    Auto Clear
27000001      Minor             Yes

Parameters

Name              Meaning
Alarm Severity    Indicates the alarm severity.
Alarm Source      Indicates the alarm source.
Subject           Indicates the event body for which an alarm is generated.
Time              Indicates the time when an alarm is generated.
Description       Provides an alarm description.
Event Code        Indicates the event code of an alarm.
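
If alarm entries are handled in a script, the parameters above map naturally onto a small record type. The following Python sketch is illustrative only: the class name CmcAlarm, the helper is_pch_overtemperature, and every sample value are hypothetical, and the match rule (severity Minor plus the documented message prefix) is an assumption rather than a documented CMC interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CmcAlarm:
    """One alarm entry; each field mirrors a row of the parameter table."""
    alarm_severity: str
    alarm_source: str
    subject: str
    time: str  # kept as text because the timestamp format is not specified
    description: str
    event_code: str


def is_pch_overtemperature(alarm):
    """Heuristic match on severity and on the documented message prefix."""
    return (alarm.alarm_severity == "Minor"
            and alarm.description.startswith("PCH temperature"))


if __name__ == "__main__":
    # Every value below is a made-up placeholder, not real CMC output.
    example = CmcAlarm(
        alarm_severity="Minor",
        alarm_source="example-source",
        subject="example-subject",
        time="example-time",
        description=("PCH temperature (110 degree C) reaches the "
                     "overtemperature threshold (108 degree C)."),
        event_code="example-code",
    )
    print(is_pch_overtemperature(example))
```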

Impact on the System

The PCH bridge cannot operate stably. If the alarm persists, the server may automatically power off or restart, which will interrupt services and cause data loss.

Possible Causes

  • A fan module is faulty.
  • The service load on the server is excessively heavy.
  • The ambient temperature is higher than 30°C (86°F).
  • The air intake vent is blocked.
  • The air exhaust vent is blocked.
  • The heat sink is in poor contact with the PCH bridge.
  • The upper threshold for minor alarms is not properly set for the sensor.

Procedure

  1. Check whether a fault alarm is generated for a fan module.

    On the CMC WebUI, check for any fan module fault alarm whose Alarm Source is SCE#-BPUA/B.

    • If yes, go to Step 2.
    • If no, go to Step 4.

  2. Remove and reinstall the fan module for which the fault alarm is generated. Five minutes later, check whether the fan module fault alarm is cleared.

  3. Replace the fan module. Five minutes later, check whether the fan module fault alarm is cleared.

    For details about how to replace a fan module, see the KunLun 90xx V100R001 User Guide.

  4. Check whether the server is running an unusually heavy service load.

    • If yes, go to Step 5.
    • If no, go to Step 6.

  5. Stop non-critical services to reduce the service load on the server. Then check whether the alarm is cleared.

    • If yes, no further action is required.
    • If no, go to Step 6.

  6. Check whether the ambient temperature is higher than 30°C (86°F).

    • If yes, go to Step 7.
    • If no, go to Step 8.

  7. Lower the ambient temperature to between 10°C and 30°C (50°F to 86°F). Five minutes later, check whether the alarm is cleared.

    • If yes, no further action is required.
    • If no, go to Step 8.

  8. Check whether the air intake or exhaust vent is blocked.

    • If yes, go to Step 9.
    • If no, go to Step 10.

  9. Remove any obstructions from the air intake and exhaust vents. Then check whether the alarm is cleared.

  10. Run the smmget -l bladeN -t sensorname -d thresholdall command on the CMC CLI, and check whether the upper threshold for minor alarms is set properly.

    The proper threshold for the sensor is as follows:

    PCH Temp: 108°C

    If the command output is as follows, the threshold is set properly:

    root@SMM:/#smmget -l blade2 -t PCH Temp -d thresholdall
    Upper Non-recoverable: Not supported.
    Upper Critical       : Not supported.
    Upper Non-critical   : 108.000 Celsius
    Lower Non-critical   : Not supported.
    Lower Critical       : Not supported.
    Lower Non-recoverable: Not supported.

  11. Run the smmset -l bladeN -t sensor -d uppernoncritical -v value command (value indicates the upper threshold for minor alarms) on the CMC CLI to change the threshold. Then check whether the alarm is cleared.

  12. Power off the server, remove the local partition management module (LPM) from the system compute enclosure (SCE), and remove the PCH bridge from the LPM. Check whether the heat sink is in poor contact with the PCH bridge.

    • If yes, go to Step 13.
    • If no, go to Step 14.

  13. Remove and reinstall the heat sink, and power on the server. Five minutes later, check whether the alarm is cleared.

  14. Contact Huawei technical support.
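
The threshold check in Step 10 can also be done by a script once the smmget output has been captured as text. The following Python sketch assumes the output has the same layout as the sample in Step 10 (one "name : value" pair per line, with values given either as a number followed by Celsius or as "Not supported."). The function names parse_thresholds and pch_threshold_ok are hypothetical; the sketch only parses text and does not run any CMC command.

```python
EXPECTED_PCH_UPPER_NONCRITICAL = 108.0  # degrees Celsius, from Step 10

# Threshold names exactly as they appear in the sample output in Step 10.
THRESHOLD_NAMES = (
    "Upper Non-recoverable", "Upper Critical", "Upper Non-critical",
    "Lower Non-critical", "Lower Critical", "Lower Non-recoverable",
)


def parse_thresholds(output):
    """Map each threshold name to a float in Celsius, or None if unsupported."""
    thresholds = {}
    for line in output.splitlines():
        name, sep, value = line.partition(":")
        name = name.strip()
        # Skip the prompt line and anything that is not a known threshold.
        if not sep or name not in THRESHOLD_NAMES:
            continue
        parts = value.split()
        if len(parts) == 2 and parts[1] == "Celsius":
            thresholds[name] = float(parts[0])
        else:
            thresholds[name] = None  # for example "Not supported."
    return thresholds


def pch_threshold_ok(output):
    """True if Upper Non-critical equals the value given in Step 10."""
    upper = parse_thresholds(output).get("Upper Non-critical")
    return upper == EXPECTED_PCH_UPPER_NONCRITICAL
```

If pch_threshold_ok returns False for the captured output, the threshold differs from the 108°C value in Step 10 and should be corrected as described in Step 11.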
Updated: 2018-12-29

Document ID: EDOC1000111849