The standby CXL fails to start and reports no alarm after reset because of an active/standby SCC software synchronization anomaly during a V1R1 NE upgrade

Publication Date: 2012-07-24
Issue Description

After the standby CXL of an OSN 1500 is reset during a remote upgrade, the board fails to start.

Alarm Information
The local end raises no alarm, while the remote optical board reports an R_LOS alarm and the service is switched over.
Handling Process
1. On site, the red PROG indicator on the standby CXL is lit, which indicates that the NE software is faulty.
2. Run :cm-set-ftp:open to enable the FTP function.
3. Telnet to the standby SCC board and run sp dwFormatOFS,0 to format the flash of the board (this erases the NE software, configuration files, FPGA, extended BIOS, and database). Then run sp dwEraseExtBios to erase the extended BIOS in the non-file area, and run reboot to reset the board.
4. Telnet to the standby SCC board again and run sp dwEraseSysParaArea,1 (instead of performing a cold reset) to erase the main system parameter area. Because the original basic BIOS of the SCC board is not damaged, it does not need to be reloaded.
5. Log in to the standby SCC board through FTP and load the software. Load the relevant files and extbios.hwx from nesoft to the hwx directory of ofs1/ofs2, and load the fpga.pga file to the fpga directory of ofs1/ofs2. Remember to rename ne1500q1.ini to ne.ini and place it under /ofs1/hwx.
6. Remove and reinsert the board, then perform a cold reset of the standby SCC board. The board starts normally, the R_LOS alarm on the remote NE clears, and the service no longer switches.
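Before the cold reset in step 6, it can help to confirm that every file loaded in step 5 actually reached the board. The following is a minimal Python sketch of such a check, not a vendor tool. The directory and file names are taken from this case, but the exact set of required files per release, the helper names missing_files and fetch_listings, and the assumption that the board's FTP server answers a standard directory listing request are all illustrative.

```python
from ftplib import FTP
from typing import Dict, Iterable, List

# Assumed layout: names come from this case; the complete list of files
# required by a given release is an assumption and must be adapted.
EXPECTED: Dict[str, List[str]] = {
    "/ofs1/hwx": ["ne.ini", "extbios.hwx", "ne1500.hwx"],
    "/ofs1/fpga": ["fpga.pga"],
}


def missing_files(listings: Dict[str, Iterable[str]],
                  expected: Dict[str, List[str]] = EXPECTED) -> Dict[str, List[str]]:
    """Return, per directory, the expected files absent from the listing."""
    missing: Dict[str, List[str]] = {}
    for directory, required in expected.items():
        # Listings may return bare names or full paths; compare base names only.
        present = {name.rsplit("/", 1)[-1].lower()
                   for name in listings.get(directory, [])}
        absent = [name for name in required if name.lower() not in present]
        if absent:
            missing[directory] = absent
    return missing


def fetch_listings(host: str, user: str, password: str,
                   directories: Iterable[str]) -> Dict[str, List[str]]:
    """Collect directory listings over FTP (hypothetical helper; assumes the
    board's FTP server supports the NLST command)."""
    with FTP(host) as ftp:
        ftp.login(user, password)
        return {d: ftp.nlst(d) for d in directories}
```

An empty result from missing_files means that every expected file name is present. It says nothing about whether a file is complete, so the packet counts reported during loading still need to be checked.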
Root Cause
1. We first run :cfg-get-backup-info. The returned value is 0, which indicates that communication between the active and standby SCC boards is interrupted. We then run :cfg-get-phybd, which returns no information about the boards in slots 83, 81, and 5. The local end also reports no BD_STATUS alarm for the boards in slot 83 or 5.
2. Because the boards ran normally before the problem occurred, we suspect a problem with the loading of the NE software. We check the types and number of files in the /ofs1/hwx, /ofs1/fpga, /ofs2/hwx, and /ofs2/fpga directories of the standby SCC board before the reset and find no anomaly. The /ofs1/hwx directory also contains the ne.ini file.
3. Because the active CXL is working normally, it can be inferred that nothing is wrong with the loading of the NE software or other files. We therefore suspect an anomaly in the software synchronization between the active and standby SCC boards.
4. We check the upgrade logs and find that when the file ne1500.hwx was synchronized from the active SCC board to the standby SCC board, the number of loaded packets returned was 5374 instead of the normal 9401, and the following error was displayed:
                 ERROR-CODE  Destination-NEID  Destination-BoardID                
                 0x90b6      589956            83 
5. It is now clear that the standby SCC board stays out of service after the reset because the ne1500.hwx file was loaded incompletely.
The NE software of the NG-SDH series consists of many files, and the R1 series of NEs must be upgraded using the Navigator. It is therefore necessary to watch the information that the board returns in the command line window during loading and active/standby synchronization.
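Watching the command line output can be partly automated by scanning a saved copy of it. The following is a minimal Python sketch that picks out error rows laid out like the one in this case (ERROR-CODE, Destination-NEID, Destination-BoardID) and compares a loaded packet count with the expected one. The table layout is copied from this case; how the Navigator formats other messages is not known here, so the names LoadError, find_load_errors, and transfer_complete are hypothetical and the parsing is an assumption to adapt.

```python
import re
from typing import List, NamedTuple


class LoadError(NamedTuple):
    error_code: str
    ne_id: int
    board_id: int


# One data row: a hex error code followed by two decimal fields.
ROW = re.compile(r"^\s*(0x[0-9a-fA-F]+)\s+(\d+)\s+(\d+)\s*$")


def find_load_errors(log_text: str) -> List[LoadError]:
    """Collect rows that directly follow an ERROR-CODE header line."""
    errors: List[LoadError] = []
    in_table = False
    for line in log_text.splitlines():
        if "ERROR-CODE" in line:
            in_table = True
            continue
        if in_table:
            match = ROW.match(line)
            if match:
                errors.append(LoadError(match.group(1),
                                        int(match.group(2)),
                                        int(match.group(3))))
            else:
                in_table = False  # the table ends at the first non-matching line
    return errors


def transfer_complete(loaded_packets: int, expected_packets: int) -> bool:
    """A file counts as fully loaded only if the packet counts are equal."""
    return loaded_packets == expected_packets
```

For the values in this case, transfer_complete(5374, 9401) returns False, and find_load_errors applied to the error table above yields one entry for board 83 with error code 0x90b6.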