ACTIVE BOARD SWAPPED WHILE FAULTY STANDBY SCC BOARD WAS NOT SYNCHRONIZED, 32 GSM SITES WERE DOWN

Publication Date: 2012-07-25
Issue Description
It was reported to Huawei that 32 GSM sites using OSN2500 for backhaul were still down even though the fibre cut had been repaired.







Alarm Information
The fiber cut occurred at 09:41:41, and the node was offline from 09:43:55 until 23:14:33.

Alarm Name     | NE                     | Occur Time | Restore Time | Remarks
NE_NOT_LOGIN   | Local NM-24-NAKURU CXR | 09:43:43   | 23:14:33     | OSN disconnects from NMS.
NE_COMMU_BREAK | Local NM-24-NAKURU CXR | 09:43:55   | 23:14:33     | NMS cannot manage the OSN.

NMS U2000 alarm

Alarm Name  | NE                          | Time | Remarks
COMMUN_FAIL | 24-NAKURU CXR-83-GSCC-OTHER | 4:02 | Communication with the standby board failed.
BD_STATUS   | 24-NAKURU CXR-83-GSCC-OTHER | 4:05 | Board goes into offline state.
SYNC_FAIL   | 24-NAKURU CXR-82-GSCC-OTHER | 4:05 | Synchronization of active/standby failed.
OSN Alarm
The OSN GSCC board stores the configuration. From 04:02:36, slots 10/81/83 reported the COMMUN_FAIL alarm, the offline alarm and the SYNC_FAIL alarm, which means that communication between the active and standby boards had failed.
As a result, 82-GSCC held the latest configuration, while 83-GSCC still held the old configuration.




Handling Process
1. The standby board was faulty, so it was not synchronized with the active board and still held the old configuration.
2. The active board was reset/swapped, which made it the standby; the system then loaded the old configuration from the other board, which had become active.
3. Service was restored once it was reconfigured to use the new BSC STM-1.




Root Cause
From the OSN black box of slot 8, we found that the service had rolled back to the VC4-level configuration at 13:06:15 and recovered to VC12 at 18:54:20, after manual reconfiguration by Huawei engineers.
387628  13:6:15 0xA9 08 3C 01 00 00 01 00 00
388165  18:54:20 0xA9 08 3C 01 00 00 01 03 01
Thus, between 13:06:15 and 18:54:20 the OSN was using the old configuration (from before the cutover of GSM E1 to BSC STM-1), which is why service did not recover even after the fiber was restored.
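As a quick check on the timeline, the length of the window during which the old configuration was in effect can be worked out from the two black box timestamps. The sketch below is illustrative only: the helper name `window` is hypothetical, and it assumes both timestamps fall on the same day.

```python
from datetime import datetime, timedelta


def window(start: str, end: str) -> timedelta:
    """Return the elapsed time between two same-day HH:MM:SS timestamps."""
    fmt = "%H:%M:%S"
    return datetime.strptime(end, fmt) - datetime.strptime(start, fmt)


# Timestamps taken from the black box records quoted in this case.
old_config_window = window("13:06:15", "18:54:20")
print(old_config_window)  # 5:48:05
```

In other words, the old configuration was in effect for a little under six hours.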
At 13:06:15 the standby board became active and the active board became standby. This was caused by a manual physical swap of the board by TKL maintenance staff.
The black box logs show that 82-GSCC was the active board and was reset at 2011-09-24 13:05:43:
Init OCP Log OK  13: 5:43 82 0
After this swap, slot 82 became standby and slot 83 became active, so all the boards re-downloaded their service configuration from 83-GSCC, which held the old configuration.
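The failure mode described here can be illustrated with a toy model. This is only a sketch of the reasoning, not a representation of the actual GSCC software: the class `ControlBoard`, the functions `save_config` and `switchover`, and the `sync_ok` flag are all hypothetical names, and the configurations are reduced to the labels "VC4" (old) and "VC12" (new).

```python
from dataclasses import dataclass


@dataclass
class ControlBoard:
    slot: int
    config: str
    active: bool


def save_config(active: ControlBoard, standby: ControlBoard,
                new_config: str, sync_ok: bool) -> None:
    """Store a new configuration; it reaches the standby only if sync works."""
    active.config = new_config
    if sync_ok:
        standby.config = new_config


def switchover(a: ControlBoard, b: ControlBoard) -> ControlBoard:
    """Exchange the active/standby roles and return the new active board."""
    a.active, b.active = b.active, a.active
    return a if a.active else b


b82 = ControlBoard(slot=82, config="VC4", active=True)
b83 = ControlBoard(slot=83, config="VC4", active=False)

# The cutover is saved while active/standby synchronization is failing.
save_config(b82, b83, "VC12", sync_ok=False)

# The swap makes slot 83 active; the other boards now load its stale data.
new_active = switchover(b82, b83)
print(new_active.slot, new_active.config)  # 83 VC4
```

In this model the new configuration still exists on slot 82 after the swap, but the role change alone is enough to put the stale configuration into service.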




Suggestions
1. When a major alarm occurs, especially one related to the GSCC, which stores the configuration, handle it immediately.
2. Avoid manually resetting or swapping control boards unless you are sure that the active and standby boards are synchronized.
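The second suggestion can be turned into a simple pre-check on the current alarm list before anyone touches a control board. The sketch below is a minimal illustration under stated assumptions: alarms are assumed to be available as (alarm name, alarm source) pairs, for example copied from the NMS alarm browser, and the function name `swap_is_safe` is hypothetical. The three blocking alarm names are the ones seen in this case; a real check would need to follow the product's own alarm documentation.

```python
# Alarm names from this case that indicate the active and standby
# control boards may not be synchronized.
BLOCKING_ALARMS = {"COMMUN_FAIL", "BD_STATUS", "SYNC_FAIL"}


def swap_is_safe(active_alarms):
    """Return (ok, blockers) for an iterable of (alarm_name, source) pairs.

    ok is False if any blocking alarm is currently raised on a GSCC board.
    """
    blockers = [
        (name, source)
        for name, source in active_alarms
        if name in BLOCKING_ALARMS and "GSCC" in source
    ]
    return (not blockers, blockers)


# Alarms as listed in the Alarm Information section of this case.
alarms = [
    ("COMMUN_FAIL", "24-NAKURU CXR-83-GSCC-OTHER"),
    ("BD_STATUS", "24-NAKURU CXR-83-GSCC-OTHER"),
    ("SYNC_FAIL", "24-NAKURU CXR-82-GSCC-OTHER"),
]

ok, blockers = swap_is_safe(alarms)
print(ok)  # False
```

With the alarms from this case the check fails, which is the signal to repair the standby board and confirm synchronization before any reset or swap.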



