LCM channel lost packet exceed 5 percent

Publication Date: 2014-06-11
Issue Description

On December 1, 2012, all LPUs and SFUs reported packet loss on LCM channel 0. On December 4, 2012, some boards reported checksum errors on LCM channel 0.

===============display alarm all===============
----------------------------------------------------------------------------
Index  Level      Date      Time      Info
1      Emergency  12-12-04  10:16:25  Slot 4 is failed, LCM channel 0 has received checksum error packet
2      Emergency  12-12-04  10:16:23  Slot 11 is failed, LCM channel 0 has received checksum error packet
3      Emergency  12-12-04  10:15:55  Slot 3 is failed, LCM channel 0 has received checksum error packet
4      Emergency  12-12-04  10:15:22  Slot 1 is failed, LCM channel 0 has received checksum error packet
5      Emergency  12-12-01  16:06:20  Slot 4 is failed, LCM channel 0 lost packet exceed 5 percent
6      Emergency  12-12-01  09:56:35  Slot 2 is failed, LCM channel 0 lost packet exceed 5 percent
7      Emergency  12-12-01  09:34:38  Slot 3 is failed, LCM channel 0 lost packet exceed 5 percent
8      Emergency  12-12-01  09:33:01  Slot 6 is failed, LCM channel 0 lost packet exceed 5 percent
9      Emergency  12-12-01  09:28:35  Slot 5 is failed, LCM channel 0 lost packet exceed 5 percent
10     Emergency  12-12-01  08:45:50  Slot 13 is failed, LCM channel 0 lost packet exceed 5 percent
11     Emergency  12-12-01  08:43:18  Slot 12 is failed, LCM channel 0 lost packet exceed 5 percent
12     Emergency  12-12-01  08:43:08  Slot 11 is failed, LCM channel 0 lost packet exceed 5 percent
13     Emergency  12-12-01  08:31:34  Slot 1 is failed, LCM channel 0 lost packet exceed 5 percent
----------------------------------------------------------------------------

Handling Process

The ECM channel is a management channel for implementing inter-board communication, such as delivering FIB entries, sending and receiving heartbeat packets, delivering chip configurations, creating interfaces, and loading patch files.

This management channel has two reliability mechanisms. First, to detect packets that a chip has modified in transit, a CRC checksum is used to compare the sent packet with the received packet.

Second, to detect packet loss, an alarm is raised based on statistics collected on the received packets.
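The two mechanisms can be illustrated with a minimal sketch. It is not the device's implementation: the frame layout (4-byte sequence number, payload, 4-byte CRC-32), the names `frame` and `ChannelStats`, and the way the loss percentage is computed are all assumptions made for illustration. Only the 5 percent threshold comes from the alarm text.

```python
import zlib
from dataclasses import dataclass
from typing import Optional

LOSS_ALARM_PERCENT = 5.0  # threshold quoted in the alarm text


def frame(seq: int, payload: bytes) -> bytes:
    """Sender side (hypothetical layout): sequence number + payload + CRC-32."""
    body = seq.to_bytes(4, "big") + payload
    return body + zlib.crc32(body).to_bytes(4, "big")


@dataclass
class ChannelStats:
    """Receiver-side counters for one channel (illustrative names)."""
    ok: int = 0
    fail: int = 0
    seq_lost: int = 0
    expected_seq: Optional[int] = None

    def receive(self, data: bytes) -> bool:
        body, crc = data[:-4], int.from_bytes(data[-4:], "big")
        if zlib.crc32(body) != crc:
            # The packet was modified in transit; its sequence number
            # cannot be trusted, so only the failure is counted here.
            self.fail += 1
            return False
        seq = int.from_bytes(body[:4], "big")
        if self.expected_seq is not None and seq > self.expected_seq:
            # A gap in sequence numbers means packets never arrived intact.
            self.seq_lost += seq - self.expected_seq
        self.expected_seq = seq + 1
        self.ok += 1
        return True

    def loss_percent(self) -> float:
        # In this sketch a corrupted packet also shows up as a missing
        # sequence number, so ok + seq_lost approximates the packets sent.
        total = self.ok + self.seq_lost
        return 100.0 * self.seq_lost / total if total else 0.0


# Simulate 100 packets: six are dropped and one is corrupted on the way.
stats = ChannelStats()
for seq in range(100):
    if 5 <= seq <= 10:
        continue  # dropped packets
    data = frame(seq, b"heartbeat")
    if seq == 50:
        data = data[:6] + bytes([data[6] ^ 0xFF]) + data[7:]  # flip one byte
    stats.receive(data)

alarm = stats.loss_percent() > LOSS_ALARM_PERCENT
print(stats.ok, stats.fail, stats.seq_lost, stats.loss_percent(), alarm)
```

In this run the loss rate is 7 percent, so the sketch would raise the loss alarm, and the single corrupted packet is counted as a checksum failure.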

The following figure shows the ECM structure. Each LPU and SFU has two LCM channels: channel 0 is shown in blue lines and channel 1 in red lines. The alarms on December 1, 2012 indicate packet loss on LCM channel 0, shown by the blue lines. The alarms on December 4, 2012 indicate that the LPUs and SFUs detected checksum errors in the packets they received from MPU9.

 

The following output shows checksum errors in the packets that MPU9 received from each LPU and SFU. Because the counters keep increasing, a chip is modifying packets continuously. Because the packets from all LPUs and SFUs flow into the LAN switch chip, we can conclude that the LAN switch chip on MPU9 is corrupting packets.

[TLT_BNG_NE40_1-diagnose]debugging lcm display checksum-sequence 9
The checksum and sequence information:
The MPU's checksum flag=1, alarm threshold: checksum=30, sequence=5
     checksum           ok  fail   sequence  seq_lost  mac_fail
---------------------------------------------------------------
slot= 1,port=0,         22     0    8387228     35353         0
slot= 1,port=1,     669442     0   48617932         0         0
slot= 2,port=0,        157    19   24654334     33127         0
slot= 2,port=1,     485536     0   10363441         0         0
slot= 3,port=0,        154    16   23694140     37759         0
slot= 3,port=1,     479782     0   10282757         0         0
slot= 4,port=0,        207    23  135150356     38072         0
slot= 4,port=1,    1434188     0   22993599         0         0
slot= 5,port=0,         58     6   23395610     38721         0
slot= 5,port=1,     477364     0   10252530         0         0
slot= 6,port=0,         38     4   23396119     38554         0
slot= 6,port=1,     477425     0   10252601         0         0
slot=10,port=0,    1344470     0  136719124         0         0
slot=10,port=1,     406129     0   17113233         0         0
slot=11,port=0,        128    23   12396675     39682         0
slot=11,port=1,     371717     0    8853122         0         0
slot=12,port=0,         83    13   12396679     39838         0
slot=12,port=1,     371726     0    8853089         0         0
slot=13,port=0,        213    28   12396631     39021         0
slot=13,port=1,     371729     0    8853139         0         0
---------------------------------------------------------------
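When inspecting output like this across many boards, a small script can list the ports whose error counters are nonzero. The sketch below is an illustration only: it assumes the five numeric columns are, in order, checksum ok, checksum fail, sequence, seq_lost and mac_fail, and the helper names `parse_rows` and `suspect_ports` are hypothetical. The sample rows are copied from the output above.

```python
import re

# One data row: "slot= 1,port=0," followed by five counters.
ROW = re.compile(
    r"slot=\s*(\d+),port=(\d+),\s*(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)"
)
FIELDS = ("slot", "port", "ok", "fail", "sequence", "seq_lost", "mac_fail")


def parse_rows(text):
    """Return one dict per data row found in the command output."""
    return [dict(zip(FIELDS, map(int, m.groups()))) for m in ROW.finditer(text)]


def suspect_ports(rows):
    """List (slot, port) pairs with checksum failures or lost sequences."""
    return [(r["slot"], r["port"]) for r in rows if r["fail"] or r["seq_lost"]]


SAMPLE = """
slot= 1,port=0,    22          0     8387228     35353       0
slot= 1,port=1,   669442       0     48617932    0           0
slot= 2,port=0,    157         19    24654334    33127       0
slot=10,port=0,    1344470     0     136719124   0           0
"""

rows = parse_rows(SAMPLE)
print(suspect_ports(rows))
```

Running the command twice and comparing the parsed counters would show whether the errors are still increasing, which is the observation the analysis above relies on.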

Root Cause

The LAN switch chip on MPU9 corrupts packets, which causes the packet loss and checksum error alarms.

Solution

Keeping MPU9 in service is risky: it may cause the management channel to fail or all the boards to reset. The solution is to replace MPU9 with a spare part as soon as possible.

Suggestions

Perform an inspection every quarter so that potential risks can be found early.
