E9000 alarm - FRU communication lost View 0X2C07FFFF

Publication Date: 2018-02-26
Issue Description

             Description

Alarm message:

FRU communication lost.

 

This alarm is generated if a running board (compute node or switch module) loses communication for over five minutes.

This alarm is generated by the following sensors:

  • SlotN Monitor (N indicates the board slot number and ranges from 1 to 32.)
  • M Manage (M indicates the switch module slot number. The values are 1E, 2X, 3X, and 4E.)

Alarm Information

Major    MM1    Thu Jan 18 13:22:54 2018    Slot12 Manage    FRU communication lost    View    0X2C07FFFF
Major    MM1    Thu Jan 18 12:57:11 2018    Slot10 Manage    FRU communication lost    View    0X2C07FFFF
Major    MM1    Fri Jan 12 14:31:50 2018    Slot7 Manage     FRU communication lost    View    0X2C07FFFF

 

The alarms were present on three different E9000 chassis in the same data center.
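When the same alarm shows up on several chassis, it can help to turn the alarm records into structured data before comparing them. The following is a minimal sketch that parses one record in the layout shown under Alarm Information. The field names (`severity`, `source`, `sensor`, and so on) and the `parse_alarm` helper are illustrative assumptions, not part of any Huawei tool, and the pattern is fitted only to the three records above.

```python
import re
from datetime import datetime
from typing import NamedTuple, Optional


class Alarm(NamedTuple):
    severity: str        # e.g. "Major"
    source: str          # reporting management module, e.g. "MM1"
    timestamp: datetime
    sensor: str          # e.g. "Slot12 Manage"
    description: str     # e.g. "FRU communication lost"
    event_code: str      # e.g. "0X2C07FFFF"


# Assumed layout: severity, source, "Day Mon DD HH:MM:SS YYYY",
# two-word sensor name, description, the literal word "View", hex event code.
_ALARM_RE = re.compile(
    r"^(?P<severity>\S+)\s+"
    r"(?P<source>\S+)\s+"
    r"(?P<ts>\w{3}\s+\w{3}\s+\d{1,2}\s+\d{2}:\d{2}:\d{2}\s+\d{4})\s+"
    r"(?P<sensor>\S+\s+\S+)\s+"
    r"(?P<description>.+?)\s+View\s+"
    r"(?P<code>0[xX][0-9A-Fa-f]+)$"
)


def parse_alarm(line: str) -> Optional[Alarm]:
    """Parse one alarm record; return None if the line does not match."""
    match = _ALARM_RE.match(line.strip())
    if match is None:
        return None
    # Collapse runs of whitespace so strptime sees single spaces.
    ts_text = " ".join(match.group("ts").split())
    return Alarm(
        severity=match.group("severity"),
        source=match.group("source"),
        timestamp=datetime.strptime(ts_text, "%a %b %d %H:%M:%S %Y"),
        sensor=" ".join(match.group("sensor").split()),
        description=match.group("description"),
        event_code=match.group("code"),
    )
```

Parsed records can then be grouped by `event_code` or `source` to confirm that the same event is being raised across chassis.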



 

Handling Process

Procedure


1. Check whether the power indicator of the board is on.
        If yes, go to 3.
        If no, go to 2.
2. Directly remove the board and stop using it. Do not reinstall the board or power it off and on again. Then go to 4.
3. Power off the board, remove and reinstall it, and power it on again. Then check whether the alarm is cleared.
        If yes, no further action is required.
        If no, go to 4.
4. Contact Huawei technical support.
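The branching in steps 1 to 4 can be sketched as a small decision function. This is only an illustration of the flow described above; the `Action` names and the `next_action` helper are hypothetical and do not correspond to any Huawei tool or API.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    REMOVE_BOARD_AND_CONTACT_SUPPORT = "steps 2 and 4"
    RESEAT_BOARD_AND_RECHECK = "step 3"
    NO_FURTHER_ACTION = "alarm cleared"
    CONTACT_SUPPORT = "step 4"


def next_action(
    power_indicator_on: bool,
    alarm_cleared_after_reseat: Optional[bool] = None,
) -> Action:
    """Return the next action for a board that raised the alarm.

    alarm_cleared_after_reseat is None until step 3 has been carried out.
    """
    if not power_indicator_on:
        # Step 2: remove the board, do not reinstall it, then go to step 4.
        return Action.REMOVE_BOARD_AND_CONTACT_SUPPORT
    if alarm_cleared_after_reseat is None:
        # Step 3: power off, reseat, power on, then check the alarm.
        return Action.RESEAT_BOARD_AND_RECHECK
    if alarm_cleared_after_reseat:
        return Action.NO_FURTHER_ACTION
    # Reseating did not help: step 4.
    return Action.CONTACT_SUPPORT
```

For example, `next_action(True, False)` returns `Action.CONTACT_SUPPORT`, which is the path taken in this case.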

---------------

Because the power indicator was on, the customer followed steps 1, 3, and 4: reinstalling the board did not clear the alarm, so the case was escalated to technical support.


We continued the troubleshooting process with the following steps:

5. Try failing over the MM910 (switch from the active MM to the passive MM):

root@SMM:/# smmset -l smm -d failover -v 1

This operation will make SMM switch over to standby.Continue?:y

Then check if the alarm has cleared. If the alarm is still present, go to step 6.

6. Connect a high-density cable to position 2 (image below) on the front of the CH121V3. Then use port 3 on the cable (the serial port, image below) to log in to the BMC and check the BMC status.

How to connect to the serial port:

(Figure: high-density cable. Port 3 is the serial port, RJ45.)


How to log in to the BMC: http://localhost:7890/newhdx.cgi?fe=1&lib=30005320&v=08&tocLib=30005320&tocV=08&id=it%5fserver%5fserialport%5fputty&tocURL=resources%252fen%252fx6800%252fout%252fx6800%255fuser%255fguide%252fSemiXML%2528x6800%255fsemi%2529%252fcollection%255ffile%252fit%255fserver%255fserialport%255fputty%252ehtml&p=t&ui=3&keyword=serial%2520port

How to check status:


iBMC:/->ipmcget -d health

System in health state.


If the command returns any message other than “System in health state.”, please let us know what that message is.

7. If step 6 doesn’t solve the issue and you have an unoccupied slot in the E9000, please move the blade to that slot and check whether the issue is resolved.
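
The health check in step 6 comes down to comparing the command output against one expected message. Below is a minimal sketch that does this for output captured from the serial session. It assumes the captured text contains the `iBMC:/->` prompt line followed by message lines; the helper names are hypothetical, and the form that an unhealthy response takes is not shown in this case, so anything other than the healthy message is simply collected for reporting.

```python
from typing import List

HEALTHY_MESSAGE = "System in health state."
PROMPT_PREFIX = "iBMC:/->"  # prompt as it appears in the session above


def health_messages(output: str) -> List[str]:
    """Return the non-empty output lines, excluding echoed prompt lines."""
    lines = [line.strip() for line in output.splitlines()]
    return [
        line for line in lines
        if line and not line.startswith(PROMPT_PREFIX)
    ]


def is_healthy(output: str) -> bool:
    """True only if the sole message is the expected healthy message."""
    return health_messages(output) == [HEALTHY_MESSAGE]


def messages_to_report(output: str) -> List[str]:
    """Return every message that differs from the healthy message."""
    return [m for m in health_messages(output) if m != HEALTHY_MESSAGE]
```

If `is_healthy` returns `False`, the list from `messages_to_report` is what should be passed on to technical support.
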
Root Cause

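The alarm cleared at step 5, after the switchover from the active MM to the passive MM. This suggests that the communication fault lay with the previously active MM rather than with the boards themselves.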

Solution

Switched from the active MM to the passive MM:

root@SMM:/# smmset -l smm -d failover -v 1

This operation will make SMM switch over to standby.Continue?:y

END