32 GSM SITES WERE DOWN BECAUSE THE ACTIVE BOARD WAS SWAPPED WHILE THE FAULTY STANDBY SCC BOARD WAS NOT SYNCHRONIZED
Publication Date: 2012-07-25
It was reported to Huawei that, although a fibre cut had been repaired, 32 GSM sites using an OSN2500 for backhaul were still down.
The fiber cut occurred at 09:41:41, and the node was offline from 09:43:55 until 23:14:33.
Alarm Name NE Occur Time Restore Remarks
Local NM-24-NAKURU CXR
OSN disconnects from NMS.
Local NM-24-NAKURU CXR
NMS cannot manage the OSN.
NMS U2000 alarm
Alarm Name NE Time Remarks
Communication with standby board failure
Board goes into offline state
Synchronization of active/standby failed.
The OSN GSCC board stores the configuration. From 04:02:36, slots 10/81/83 had the COMMUN_FAIL alarm, the offline alarm and the SYNC_FAIL alarm, which means that communication between the active and standby boards had failed.
Because synchronization had failed, the 82-GSCC board had the latest configuration while 83-GSCC still had the old configuration.
1. The standby board was faulty, was not synchronized with the active board, and still held the old configuration.
2. The active board was reset/swapped, which made it the standby; the system then loaded the old configuration from the other board, which had become active.
3. Service was restored when it was reconfigured to use the new BSC STM-1.
From the OSN black box of slot 8, we found that the service had rolled back to the VC4-level configuration at 13:06:15 and recovered to VC12 at 18:54:20, after manual reconfiguration by Huawei engineers.
387628 13:6:15 0xA9 08 3C 01 00 00 01 00 00
388165 18:54:20 0xA9 08 3C 01 00 00 01 03 01
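The two records above can be compared mechanically. The following is a minimal sketch that assumes each record is a sequence number, a clock time and a run of hexadecimal bytes separated by spaces. The record layout is not documented in this case, so the helper names `parse_record` and `differing_offsets` are hypothetical and the code only reports which payload bytes differ, not what they mean.

```python
# Raw black-box records copied from the case above.
RECORDS = [
    "387628 13:6:15 0xA9 08 3C 01 00 00 01 00 00",
    "388165 18:54:20 0xA9 08 3C 01 00 00 01 03 01",
]


def parse_record(line):
    """Split a record into (sequence number, zero-padded time, payload bytes)."""
    seq, clock, *payload = line.split()
    h, m, s = (int(part) for part in clock.split(":"))
    # int(..., 16) accepts both "0xA9" and bare tokens such as "3C".
    data = [int(tok, 16) for tok in payload]
    return int(seq), f"{h:02d}:{m:02d}:{s:02d}", data


def differing_offsets(a, b):
    """Return the payload offsets at which two records disagree."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


first, second = (parse_record(r) for r in RECORDS)
print(first[1], second[1])                         # 13:06:15 18:54:20
print(differing_offsets(first[2], second[2]))      # [7, 8]
```

Only the last two payload bytes change between the rollback record and the recovery record; the first seven bytes are identical.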
Thus between 13:06:15 and 18:54:20 the OSN was using the old configuration (from before the cutover of the GSM E1 to the BSC STM-1). This is why service did not recover even after the fiber was restored.
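The durations implied by these timestamps can be worked out directly. This is a small sketch that assumes all clock readings fall on the same day, as they do in this case; the function name `elapsed` is illustrative only.

```python
from datetime import datetime


def elapsed(start, end):
    """Time between two same-day HH:MM:SS clock readings."""
    fmt = "%H:%M:%S"
    return datetime.strptime(end, fmt) - datetime.strptime(start, fmt)


# Node unreachable from the NMS (times from the alarm summary).
node_offline = elapsed("09:43:55", "23:14:33")
# Service running on the old VC4-level configuration (times from the black box).
old_config = elapsed("13:06:15", "18:54:20")

print(node_offline)  # 13:30:38
print(old_config)    # 5:48:05
```

The service spent 5 hours 48 minutes on the old configuration, which lies entirely inside the 13.5-hour period during which the node was offline from the NMS.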
At 13:06:15 the standby board became active and the active board became standby. This was because of a manual physical swap of the board by TKL maintenance staff.
From the black box logs, board 82-GSCC was active and was reset at 2011-9-24 13:05:43.
Init OCP Log OK 13: 5:43 82 0
After this swap, slot 82 became standby and slot 83 became active, so all the boards re-downloaded the service configuration from 83-GSCC, which had the old configuration.
1. When a major alarm is raised, especially one related to the GSCC, which stores the configuration, handle it immediately.
2. Avoid manually resetting or swapping control boards unless you are sure that the active and standby boards are synchronized.
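The second suggestion can be expressed as a simple pre-swap check. The sketch below is not a Huawei tool or API: `safe_to_swap` is a hypothetical helper, and treating COMMUN_FAIL and SYNC_FAIL (the alarm names seen in this case) as blocking conditions is an assumption made for illustration. The list of currently active alarms would have to be read from the NMS by the operator.

```python
# Alarm names taken from this case; treating them as blocking is this sketch's assumption.
BLOCKING_ALARMS = {"COMMUN_FAIL", "SYNC_FAIL"}


def safe_to_swap(active_alarms):
    """Return (ok, blockers); ok is False if any blocking alarm is active."""
    blockers = sorted(BLOCKING_ALARMS & set(active_alarms))
    return (not blockers, blockers)


# State of the GSCC boards before the swap in this case.
ok, blockers = safe_to_swap(["COMMUN_FAIL", "SYNC_FAIL"])
print(ok, blockers)  # False ['COMMUN_FAIL', 'SYNC_FAIL']
```

With the alarms that were active in this case the check fails, which is the condition under which the board swap should have been postponed until synchronization was restored.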