E9000 Server V100R001 HMM Alarm Handling 19

This document describes E9000 server alarms in terms of the meaning, impact on the system, possible causes, and solutions.
Slot: RAID Controller Card MCE/AER Error (Critical, RAIDN PCIE ERR)

Description

Alarm message:

RAID or NIC card fault

or

The arg3 RAID controller card arg1 triggered an uncorrectable error, arg2.

This alarm is generated when an uncorrectable error occurs on a RAID controller card.

This alarm is generated by the following sensors:

RAIDN PCIE ERR

NOTE:

If the server supports only one RAID controller card, the corresponding sensor names do not include N.

Attribute

Alarm ID      Alarm Severity   Auto Clear
0x0341FF17    Critical         Yes

Parameters

Name      Meaning
N, arg1   Slot number of the RAID controller card.
arg2      Error code of the alarm.
arg3      Front I/O module or compute node, for example, FM or CM.
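
When alarm text is processed by a script, the parameters above can be recovered from the second form of the alarm message. The following is a minimal sketch, not part of the HMM software: the function names, the returned field names, and the sample values are assumptions made for illustration, and the sensor name without N is inferred from the note on single-card servers.

```python
import re

# Pattern built from the documented message template:
# "The arg3 RAID controller card arg1 triggered an uncorrectable error, arg2."
ALARM_PATTERN = re.compile(
    r"^The (?P<arg3>\S+) RAID controller card (?P<arg1>\S+) "
    r"triggered an uncorrectable error, (?P<arg2>.+?)\.?$"
)


def parse_alarm(message):
    """Return the alarm parameters as a dict, or None if the text does not match.

    The key names (location, slot, error_code) are illustrative only.
    """
    match = ALARM_PATTERN.match(message.strip())
    if match is None:
        return None
    return {
        "location": match["arg3"],    # for example, FM or CM
        "slot": match["arg1"],        # slot number of the RAID controller card
        "error_code": match["arg2"],  # error code of the alarm
    }


def sensor_name(slot=None):
    """Build the sensor name; N is omitted when only one card is supported."""
    if slot is None:
        return "RAID PCIE ERR"
    return f"RAID{slot} PCIE ERR"


if __name__ == "__main__":
    # Hypothetical sample message; the slot and error code are made up.
    sample = "The CM RAID controller card 1 triggered an uncorrectable error, 0x1234."
    print(parse_alarm(sample))
    print(sensor_name(1))
```

The short message form ("RAID or NIC card fault") carries no parameters, so the parser returns None for it.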

Impact on the System

Services related to the RAID controller card are affected.

Possible Causes

  • The RAID controller card is faulty.
  • The mainboard is faulty.

Procedure

  1. Power off the compute node and remove it from the chassis. Then check whether the RAID controller card is damaged or makes poor contact with its slot.

    • If yes, go to 5.
    • If no, go to 2.

  2. Power on the server to start the power-on self-test (POST) and then run test software. Check whether the POST succeeds and the test software finds no fault.

    • If yes, go to 3.
    • If no, go to 4.

  3. Check whether the alarm is cleared.

    • If yes, no further action is required.
    • If no, go to 4.

  4. Replace the RAID controller card. Then, check whether the alarm is cleared.

    • If yes, no further action is required.
    • If no, go to 5.

  5. Replace the mainboard. Then, check whether the alarm is cleared.

    • If yes, no further action is required.
    • If no, go to 6.

  6. On the HMM WebUI, choose System Management > Information Collection, and collect logs.
  7. Contact Huawei technical support.
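
The branching in the procedure above can be easier to follow as a transition table. The following is a minimal sketch that only models the yes/no decisions as written; it does not talk to the HMM or the server, and the names TRANSITIONS, DONE, and walk are assumptions made for illustration.

```python
DONE = "done"

# (step, answer) -> next step, transcribed from steps 1 to 5 of the procedure.
# True means "yes", False means "no".
TRANSITIONS = {
    (1, True): 5, (1, False): 2,     # damage or poor contact found?
    (2, True): 3, (2, False): 4,     # POST succeeds and test software finds no fault?
    (3, True): DONE, (3, False): 4,  # alarm cleared?
    (4, True): DONE, (4, False): 5,  # alarm cleared after replacing the RAID controller card?
    (5, True): DONE, (5, False): 6,  # alarm cleared after replacing the mainboard?
}


def walk(answers):
    """Return the list of steps visited, given a dict {step: bool} of answers."""
    path = []
    step = 1
    while step != DONE:
        path.append(step)
        if step == 6:
            step = 7  # collect logs, then contact technical support
        elif step == 7:
            break     # last step of the procedure
        else:
            step = TRANSITIONS[(step, answers[step])]
    return path


if __name__ == "__main__":
    # No damage found, POST and tests pass, alarm already cleared.
    print(walk({1: False, 2: True, 3: True}))
    # Damage found, and replacing the mainboard does not clear the alarm.
    print(walk({1: True, 5: False}))
```

Each path ends either when the alarm is cleared or at step 7, after logs have been collected in step 6.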
Updated: 2018-08-16

Document ID: EDOC1000015902
