In one project, two MADM nodes, Hyderabad and Jamshoro, joined two rings, Ring 3A and Ring 3B, as shown in the attached Figure-1. A fiber cut occurred between these two nodes. Because there was only one physical fiber between them, both Ring 3A and Ring 3B should have switched to protection. However, the customer complained that their services went down at the time of the fiber cut and recovered only after the fiber was restored.
The temporary solution was to delete the database on the SCC and re-download the configuration from the T2000. After this the problem was resolved, and when we checked MSP switching again it was successful. The permanent solution is to upgrade to version V100R003C02B032SP01, which fixes the 24-byte memory block overwrite problem. This version allocates the memory statically.
To analyze the problem, we examined the alarm logs of both NEs, Jamshoro and Hyderabad. The Jamshoro node had reported RLOS, APS_INDI and MS_APS_INDI_EX alarms, but the Hyderabad node had no such protection switching alarms. I then checked the MSP configuration of the Hyderabad node. When I queried the MSP switching status of the Hyderabad site, the T2000 returned an error, as shown in the attached Figure-2.
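The alarm comparison described above can be sketched in a few lines of Python. This is only an illustration: the `alarm_log` dictionary is hand-entered from the alarms named in this case, and the function name `nodes_without_switching` is hypothetical, not part of the T2000 or any vendor tool.

```python
# Alarm names that indicate MSP protection switching (taken from this case).
PROTECTION_ALARMS = {"APS_INDI", "MS_APS_INDI_EX"}


def nodes_without_switching(alarms_by_ne):
    """Return the NEs that logged none of the protection switching alarms."""
    return sorted(
        ne
        for ne, alarms in alarms_by_ne.items()
        if not PROTECTION_ALARMS & set(alarms)
    )


# Hypothetical, hand-entered alarm logs for the two ends of the cut span.
alarm_log = {
    "Jamshoro": ["RLOS", "APS_INDI", "MS_APS_INDI_EX"],
    "Hyderabad": [],
}

suspects = nodes_without_switching(alarm_log)
print(suspects)
```

A node at one end of a cut span that shows no protection switching alarms, while its peer does, is the one whose MSP configuration should be checked first.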
I then used the command line to check the status, and it also returned an error in the Navigator, as shown in the attached Figure-3. The software version of the nodes was 220.127.116.11. I checked the known issues of this version but could not find any MSP bug. Further investigation showed that the problem was caused by a 24-byte memory block overwrite. The overwrite corrupts the class's v-ptr, which resets the SCC when the database is saved and finally leaves the configuration database in an abnormal state. After that, some configurations, such as enabling the MSP protocol, were not sent to the EXCS board. The 24-byte block was dynamically allocated; we changed it to static allocation, which solves the problem.
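To show why overrunning a 24-byte block is so damaging when a pointer sits next to it, here is a toy model in Python. It is not the SCC software and does not model the static allocation fix: the memory layout, the 8-byte pointer width, the pointer value and all function names are assumptions made purely for illustration.

```python
import struct

BLOCK_SIZE = 24  # size of the data block named in this case
PTR_FORMAT = "<Q"  # assumed 8-byte little-endian pointer, illustration only


def make_memory(vptr):
    """Lay out a 24-byte block immediately followed by a pointer field."""
    return bytearray(BLOCK_SIZE) + bytearray(struct.pack(PTR_FORMAT, vptr))


def read_vptr(mem):
    """Read back the pointer stored right after the block."""
    return struct.unpack_from(PTR_FORMAT, mem, BLOCK_SIZE)[0]


def unchecked_write(mem, data):
    """Copy data into the block without a length check (models the overrun)."""
    mem[0:len(data)] = data


mem = make_memory(0x1000)  # hypothetical original pointer value
unchecked_write(mem, b"\xff" * 24)  # exactly fills the block
intact = read_vptr(mem)

unchecked_write(mem, b"\xff" * 28)  # 4 bytes too long
corrupted = read_vptr(mem)

print(hex(intact), hex(corrupted))
```

A write that fits the block leaves the neighbouring pointer untouched, while a write only four bytes too long changes it. Any later call through such a pointer then jumps to an invalid address, which is consistent with the SCC reset described above.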
This is a special case in which MSP switching failed and services went down. Whenever you encounter such an issue, check your software version. If it is the same version as mentioned above, upgrade the NEs to V100R003C02B032SP01.
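The version check can be sketched as follows. The `inventory` dictionary, the NE name `NE-3` and the function name `nes_to_upgrade` are hypothetical; in practice the version strings would be read from the network management system. The affected version string is copied as reported in this case.

```python
AFFECTED_VERSION = "220.127.116.11"  # version string as reported in this case
FIXED_VERSION = "V100R003C02B032SP01"  # version that resolves the overwrite


def nes_to_upgrade(inventory):
    """Return the NEs still running the affected software version."""
    return sorted(
        ne
        for ne, version in inventory.items()
        if version.strip() == AFFECTED_VERSION
    )


# Hypothetical inventory: NE name -> software version currently running.
inventory = {
    "Hyderabad": "220.127.116.11",
    "Jamshoro": "220.127.116.11",
    "NE-3": "V100R003C02B032SP01",
}

for ne in nes_to_upgrade(inventory):
    print(f"{ne}: upgrade to {FIXED_VERSION}")
```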