KunLun Mission Critical Server V100R001 CMC Alarm Handling 09

This document describes KunLun 9016 and 9032 alarms in the CMC, in terms of their meanings, impact on the system, possible causes, and handling suggestions.

ALM-17000067 CMC Temperature Exceeding the Major Alarm Threshold

Description

Alarm message:

The CMC temperature (xx degrees Celsius) exceeds the major alarm threshold (xx degrees Celsius).
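When alarm messages are collected from logs, the two values in the message text can be extracted programmatically. The following is a minimal sketch, not a Huawei-provided tool: the function name `parse_alarm_message` is hypothetical, the sample values 60 and 57 are made up for illustration, and the assumption that each `xx` placeholder is an integer or decimal number is not confirmed by this document.

```python
import re
from typing import Optional, Tuple

# Pattern mirrors the alarm message text quoted above. The numeric format
# (integer or decimal, optional minus sign) is an assumption.
ALARM_RE = re.compile(
    r"The CMC temperature \((-?\d+(?:\.\d+)?) degrees Celsius\) "
    r"exceeds the major alarm threshold \((-?\d+(?:\.\d+)?) degrees Celsius\)"
)


def parse_alarm_message(message: str) -> Optional[Tuple[float, float]]:
    """Return (temperature, threshold) in degrees Celsius, or None if no match."""
    match = ALARM_RE.search(message)
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))


if __name__ == "__main__":
    # Hypothetical message; real values depend on the sensor reading.
    sample = (
        "The CMC temperature (60 degrees Celsius) exceeds the major "
        "alarm threshold (57 degrees Celsius)."
    )
    print(parse_alarm_message(sample))
```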

Attribute

Alarm ID    Alarm Severity    Auto Clear
17000067    Major             Yes

Parameters

Name              Meaning
Alarm Severity    Indicates the alarm severity.
Alarm Source      Indicates the alarm source.
Subject           Indicates the event body for which an alarm is generated.
Time              Indicates the time when an alarm is generated.
Description       Provides an alarm description.
Event Code        Indicates the event code of an alarm.

Impact on the System

If the overtemperature condition persists, device components may be damaged and the mainboard may burn out, causing unrecoverable damage to the system hardware. In most cases, the system automatically increases the fan speed to lower the temperature, and the alarm is then cleared automatically. If the alarm persists, an underlying fault may exist that keeps raising the system temperature and can trigger more severe alarms. Therefore, when a temperature alarm is generated, locate the problem as soon as possible to prevent it from worsening.

System Actions

None

Possible Causes

  • The fan module on a power and fan integrity module (PFM) is faulty.
  • The fan module cannot be detected.
  • The ambient temperature of the equipment room is high.
  • The upper threshold for major alarms is not properly set for the sensor.

Procedure

  1. Check whether the alarm "ALM-3600002D Front Fan of Counter Rotary Fan 1 Failed" is generated.

    • If yes, go to Step 2.
    • If no, go to Step 3.

  2. Clear the alarm "ALM-3600002D Front Fan of Counter Rotary Fan 1 Failed". Then check whether alarm ALM-17000067 is cleared.

    • If yes, no further action is required.
    • If no, go to Step 3.

  3. Check whether all PFMs are properly installed.

    • If yes, go to Step 5.
    • If no, go to Step 4.

  4. Install PFMs in required slots. Then check whether the alarm is cleared.

    • If yes, no further action is required.
    • If no, go to Step 5.

  5. Check whether the ambient temperature of the equipment room is within the normal range.

    For details about the normal range, see the KunLun 90xx V100R001 User Guide.

    • If yes, go to Step 7.
    • If no, go to Step 6.

  6. Reduce the ambient temperature of the equipment room by using the air conditioners and fans in the room. (For example, lower the air conditioner temperature setting and increase the fan speed. If air conditioners are unavailable, open doors and windows to improve ventilation.) Then check whether the alarm is cleared.

    • If yes, no further action is required.
    • If no, go to Step 7.

  7. Run the smmget [-l smm] -t sensorname -d thresholdall command on the CMC CLI, and check whether the upper threshold for major alarms is set properly.

    The proper threshold for the sensor is as follows:

    Ambient Temp: 57°C

    If the command output is as follows, the threshold is set properly:

    root@SMM:/# smmget -t Ambient Temp -d thresholdall  
    Upper Non-recoverable: 62.000 Celsius 
    Upper Critical       : 57.000 Celsius 
    Upper Non-critical : 47.000 Celsius 
    Lower Non-critical : Not supported. 
    Lower Critical       : Not supported. 
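    Lower Non-critical : Not supported.

    • If the threshold is set properly, go to Step 9.
    • If the threshold is not set properly, go to Step 8.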

  8. Run the smmset -t sensor -d uppercritical -v value command (value indicates the upper threshold for major alarms) on the CMC CLI to change the threshold. Then check whether the alarm is cleared.

    • If yes, no further action is required.
    • If no, go to Step 9.

  9. Check whether the fan modules on all PFMs are running properly. If an alarm indicating a low fan speed is generated for the fan module on a PFM, remove and reinstall the PFM. Then check whether the alarm is cleared.

    • If yes, no further action is required.
    • If no, go to Step 10.

  10. Replace the PFM. Then check whether the alarm is cleared.

    • If yes, no further action is required.
    • If no, go to Step 11.

  11. Stop non-critical services to reduce the service load on the server. Then check whether the alarm is cleared.

    • If yes, no further action is required.
    • If no, go to Step 12.

  12. Check whether the air intake or exhaust vent is blocked. If yes, remove the obstruction. Then check whether the alarm is cleared.

    • If yes, no further action is required.
    • If no, go to Step 13.

  13. Contact Huawei technical support.
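
The threshold check in Step 7 can be sketched in code for cases where the command output has been saved to a file or captured by a script. This is an illustrative sketch only: the function names are hypothetical, the sample text is copied from the output shown in Step 7, and the parser assumes that every threshold line has the form `<Upper|Lower> <level> : <value> Celsius` or `Not supported.`, which may not hold for other firmware versions or sensors.

```python
import re
from typing import Dict, Optional

# Expected major alarm (Upper Critical) threshold for Ambient Temp, per Step 7.
EXPECTED_UPPER_CRITICAL = 57.0

# Matches lines such as "Upper Critical       : 57.000 Celsius".
LINE_RE = re.compile(r"^\s*((?:Upper|Lower) [A-Za-z-]+)\s*:\s*(.*?)\s*$")

# Sample text copied from the command output shown in Step 7.
SAMPLE_OUTPUT = """\
root@SMM:/# smmget -t Ambient Temp -d thresholdall
Upper Non-recoverable: 62.000 Celsius
Upper Critical       : 57.000 Celsius
Upper Non-critical : 47.000 Celsius
Lower Non-critical : Not supported.
Lower Critical       : Not supported.
Lower Non-critical : Not supported.
"""


def parse_thresholds(output: str) -> Dict[str, Optional[float]]:
    """Map each threshold name to its value in Celsius (None if unsupported)."""
    thresholds: Dict[str, Optional[float]] = {}
    for line in output.splitlines():
        match = LINE_RE.match(line)
        if match is None:
            continue  # prompt line or anything that is not a threshold line
        name, value = match.group(1), match.group(2)
        number = re.match(r"-?\d+(?:\.\d+)?", value)
        thresholds[name] = float(number.group()) if number else None
    return thresholds


def upper_critical_is_expected(
    output: str, expected: float = EXPECTED_UPPER_CRITICAL
) -> bool:
    """Return True if the Upper Critical threshold equals the expected value."""
    value = parse_thresholds(output).get("Upper Critical")
    return value is not None and abs(value - expected) < 1e-6


if __name__ == "__main__":
    print(parse_thresholds(SAMPLE_OUTPUT))
    print(upper_critical_is_expected(SAMPLE_OUTPUT))
```

If the check returns False, Step 8 describes how to change the threshold with the smmset command.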

Clearing

This alarm is cleared when the sensor detects that the temperature is lower than the major alarm threshold.

After the fault is rectified, the system automatically clears the alarm.
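
The raise and clear conditions can be modeled as a small state tracker. This sketch follows only the wording of this document (raised when the temperature exceeds the threshold, cleared when it is lower than the threshold). The function name `track_alarm` and the sample readings are hypothetical, and the actual CMC firmware may apply hysteresis or debounce logic that is not described here.

```python
from typing import Iterable, List


def track_alarm(readings_c: Iterable[float], major_threshold_c: float) -> List[bool]:
    """Return the alarm state (True = active) after each temperature reading."""
    active = False
    states: List[bool] = []
    for temperature in readings_c:
        if temperature > major_threshold_c:
            active = True   # temperature exceeds the major alarm threshold
        elif temperature < major_threshold_c:
            active = False  # temperature is lower than the threshold: cleared
        # Assumption: a reading equal to the threshold leaves the state unchanged.
        states.append(active)
    return states


if __name__ == "__main__":
    # Hypothetical readings against the 57 degrees Celsius Ambient Temp threshold.
    print(track_alarm([50.0, 58.0, 57.0, 56.0], 57.0))
```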

Updated: 2018-12-29

Document ID: EDOC1000111849