The GSCC board reports a HARD_BAD alarm through the T2000 and Navigator. The connection to this NE is interrupted during ECC communication, and users intermittently cannot log in.
1. Identify which board is reporting the alarm by analyzing the alarm parameters. If the last four parameters of HARD_BAD are 0xff (hexadecimal), the first parameter gives the failed slot number in hexadecimal. In some cases the parameters carry no meaningful reading, or parameters 4 and 5 show a value. Readings in parameters 4 and 5 are ignored because they are meaningless by default.
The parameters can be checked by browsing the alarm in the T2000, or in Navigator with the command :alm-get-curdata:0,0;
2. If parameter 1 has no reading, remove and reinsert the GSCC. The alarm should clear; if it does not, replace the GSCC.
3. In this case, parameter 1 was 0x0d, which is 13 in decimal (hex D equals decimal 13), that is, slot 13. Slot 13 holds a BPA board.
4. Removing the BPA from slot 13 cleared the alarm, which confirmed that the BPA was the source of the problem. When the board was reinserted, the alarm reappeared, so reseating the board did not fix the fault. The cause could be either software or hardware.
5. Check for a version mismatch. Look up the board that is reporting the alarm in the version match table.
6. The version match table showed a mismatch between the GSCC version and the BPA board version. After the BPA was upgraded, the alarm cleared.
HARD_BAD occurs when communication between a particular board and the GSCC is broken. It means that a hardware exception has occurred and the board has lost its internal interconnection.
1. The T2000 may show that a board is affected by a hardware failure, but it is not obvious which slot is reporting the HARD_BAD alarm.
For troubleshooting, you therefore need to find out which board has the hardware fault.
2. Check the alarm parameters of HARD_BAD (0x0d 0xff 0xff 0xff 0xff). When the last four parameters are 0xff, the first parameter indicates the slot that is reporting the alarm. Here only the first parameter has a reading: 0x0d (hexadecimal) = 13 (decimal), so the alarm most likely originates from slot 13, which holds a BPA board. This means that the BPA has lost communication with the GSCC, and the alarm is reported on the GSCC.
3. Once the board related to the alarm has been identified, the cause could be either a board failure or a software failure. Reset the board first. If the alarm persists, check for a version mismatch; if the versions match, the board itself may have failed.
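The parameter check in step 2 can be sketched in Python. This is only an illustration and not part of the T2000 or Navigator: the function names parse_params and hard_bad_slot are hypothetical, the five parameter values are assumed to be copied by hand from the alarm display as hex text, and the ignore_params_4_5 switch is one possible reading of the remark that values in parameters 4 and 5 are not counted.

```python
def parse_params(text):
    """Turn hex text such as '0x0d 0xff 0xff 0xff 0xff' into a list of ints."""
    return [int(token, 16) for token in text.split()]


def hard_bad_slot(params, ignore_params_4_5=False):
    """Return the slot number from parameter 1, or None if the rule does not apply.

    Rule from the case: when the last four parameters are 0xff, the first
    parameter is the slot number in hexadecimal.
    Assumption: with ignore_params_4_5=True, only parameters 2 and 3 are
    required to be 0xff, because readings in parameters 4 and 5 are ignored.
    """
    if len(params) != 5:
        raise ValueError("expected five alarm parameters")
    tail = params[1:3] if ignore_params_4_5 else params[1:]
    if all(value == 0xFF for value in tail):
        return params[0]
    return None


print(hard_bad_slot(parse_params("0x0d 0xff 0xff 0xff 0xff")))  # 13
```

For the parameters in this case the sketch returns 13, matching the manual conversion of 0x0d to slot 13.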
When you face a HARD_BAD alarm, check the following:
1. The alarm parameters. Normally the first parameter identifies the board that is reporting the alarm.
2. If parameter 1 has no reading, the problem may be in the GSCC. Try a warm reset, a cold reset, or removing and reinserting the GSCC board (use the slot number shown in the alarm). If the alarm still does not clear, replace the board in the slot that reports the alarm.
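The checklist above can also be written down as a small decision helper. The sketch below is a paraphrase of the steps in this case, not a vendor procedure: the function name next_steps is hypothetical, and representing "no reading in parameter 1" as None is an assumption made for the example.

```python
from typing import List, Optional


def next_steps(slot: Optional[int]) -> List[str]:
    """Return the suggested actions, in order, for a HARD_BAD alarm.

    slot is the decimal slot number decoded from parameter 1, or None when
    parameter 1 has no reading (assumed representation).
    """
    if slot is None:
        # No slot reading: the case treats the GSCC itself as the suspect.
        return [
            "Warm reset the GSCC",
            "Cold reset the GSCC",
            "Remove and reinsert the GSCC",
            "Replace the GSCC if the alarm does not clear",
        ]
    # A slot reading: work on the board in that slot.
    return [
        f"Reset the board in slot {slot}",
        f"Check the board in slot {slot} against the version match table",
        "Upgrade the board if the versions do not match",
        "Replace the board if the alarm does not clear",
    ]


for step in next_steps(13):
    print(step)
```

Each action is tried in order, and the later ones are needed only if the alarm has not cleared.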