Troubleshooting Video Freezing on S5700 (V200R001C00SCP300)

Publication Date: 2016-06-12
Issue Description
The video shows mosaic artifacts. When the customer pings the cameras, packets are lost at random.
Alarm Information

Ping packets are lost.

Handling Process

1. Ping tests from the access switch lose packets.

2. Check the log of the access switch.

Set the ports connected to terminals (PCs, servers, APs, cameras, and so on) as edge ports:
#
[SW3700] interface ethernet X/X/X
[SW3700-EthernetX/X/X] stp edged-port enable
#
BPDU protection must also be enabled:
#
[SW3700] stp bpdu-protection
#
Display the STP TC-BPDU statistics (collect this information 5 times at 10-minute intervals).
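
One way to compare the five collections is to look at how much each port's TC counter grew between the first and the last sample. The following is a minimal Python sketch, not part of the original case: the dictionary layout, the port names, and the counter values are all hypothetical, and the counters would have to be copied from the collected output by hand or by a separate parser.

```python
def tc_growth(samples):
    """Return {port: increase} for ports whose cumulative TC count grew.

    `samples` is a list of dicts mapping a port name to the cumulative number
    of TC BPDUs received on it, oldest sample first (hypothetical layout).
    """
    first, last = samples[0], samples[-1]
    growth = {}
    for port, count in last.items():
        delta = count - first.get(port, 0)
        if delta > 0:
            growth[port] = delta
    return growth


# Hypothetical values for illustration only, not taken from this case.
samples = [
    {"Ethernet0/0/7": 10, "GigabitEthernet0/0/1": 4},
    {"Ethernet0/0/7": 25, "GigabitEthernet0/0/1": 4},
    {"Ethernet0/0/7": 41, "GigabitEthernet0/0/1": 4},
]
print(tc_growth(samples))
```

A port whose counter keeps growing between collections is a candidate source of topology changes and is worth checking first.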

3. The customer reported that ETH0/0/7 and ETH0/0/8 are not connected to cameras. The camera 10.102.X.12 is connected to ETH0/0/25. The issue persisted after step 2, and no useful information was found.
4. Check the interface status and counters of ETH0/0/25.
Ethernet0/0/25 current state : UP
Line protocol current state : UP
Description:connect to IVS Terminal
Switch Port, PVID :  820, TPID : 8100(Hex), The Maximum Frame Length is 1600
IP Sending Frames' Format is PKTFMT_ETHNT_2, Hardware address is d4b1-10c4-9520
Port Mode: COMMON FIBER
Speed :  100,  Loopback: NONE
Duplex: FULL,  Negotiation: DISABLE
Last 300 seconds input rate 1227640 bits/sec, 117 packets/sec
Last 300 seconds output rate 736704 bits/sec, 270 packets/sec
Input peak rate 23387464 bits/sec, Record time: 2008-01-10 05:11:24
Output peak rate 30238432 bits/sec, Record time: 2008-01-09 00:34:16
Input:  140003585 packets, 184635823426 bytes
Unicast        :           136805886, Multicast          :             3087824
Broadcast      :              109875, Jumbo              :                   0
CRC            :                   0, Giants             :                   0
Jabbers        :                   0, Fragments          :                   0
Runts          :                   0, DropEvents         :                   0
Alignments     :                   0, Symbols            :                   0
Ignoreds       :                   0, Frames             :                   0
Discard        :                   0, Total Error        :                   0
Output:  247155338 packets, 79555940234 bytes
Unicast        :            80616485, Multicast          :           159060537
Broadcast      :             7478316, Jumbo              :                   0
Collisions     :                   0, Deferreds          :                   0
Late Collisions:                   0, ExcessiveCollisions:                   0
Buffers Purged :                   0
Discard        :              215818, Total Error        :                   0
    Input bandwidth utilization threshold : 100.00%
    Output bandwidth utilization threshold: 100.00%
    Input bandwidth utilization  : 1.23%
    Output bandwidth utilization : 0.74%
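
The counters above already hint at bursts: the average output rate is a small fraction of the 100M link, yet the port has discarded output packets. The short Python sketch below only redoes that arithmetic with the values copied from the output. It assumes that the discard counter and the packet counter cover the same period, which the output itself does not confirm.

```python
# Values copied from the interface output for Ethernet0/0/25.
output_packets = 247_155_338
output_discards = 215_818
output_rate_bps = 736_704      # "Last 300 seconds output rate"
link_bps = 100_000_000         # Speed : 100 (Mbit/s)

discard_ratio = output_discards / output_packets
avg_utilization = output_rate_bps / link_bps

print(f"output discard ratio:       {discard_ratio:.4%}")
print(f"average output utilization: {avg_utilization:.2%}")
```

A low average utilization together with a non-zero discard count is consistent with short bursts that a 300-second average cannot show.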

5. Configure traffic statistics on port ETH0/0/25 in the inbound direction (the S3700 supports traffic statistics in the inbound direction only).
6. No packet loss was found on port ETH0/0/25 in the inbound direction.
7. Capture packets on port ETH0/0/25.
The capture shows that most of the traffic by volume is unicast and that multicast accounts for very little of it (the green line in the original capture graph). Although there are many multicast packets, each is only 62 bytes, whereas the unicast packets are 1514 bytes.

So the packet loss is caused by unicast traffic using up the bandwidth. That is why configuring “multicast-suppression 20” did not help.
The bandwidth of Eth0/0/25 is 100M.
The capture shows that the traffic reached 100M at 18:56:15, 18:56:31, and 18:56:47.
The interval is about 10 to 20 s, which coincides with the ping test results.
Each time the traffic reaches 100M, packets are discarded and lost.
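
The moments at which the traffic reaches line rate can be located by summing the captured frame sizes per second. The following is a minimal Python sketch under stated assumptions: the capture is assumed to have been exported as (timestamp in seconds, frame length in bytes) pairs, the function name and the 95% threshold are illustrative choices, and the sample data is synthetic.

```python
from collections import defaultdict

LINK_BPS = 100_000_000  # Eth0/0/25 runs at 100 Mbit/s


def busy_seconds(packets, threshold=0.95):
    """Return the whole seconds in which traffic reached `threshold` of line rate.

    `packets` is an iterable of (timestamp_seconds, frame_length_bytes) pairs,
    a hypothetical export format assumed for this sketch.
    """
    bits_per_second = defaultdict(int)
    for timestamp, length in packets:
        bits_per_second[int(timestamp)] += length * 8
    limit = threshold * LINK_BPS
    return sorted(sec for sec, bits in bits_per_second.items() if bits >= limit)


# Synthetic data: one saturated second of 1514-byte frames, one quiet second.
sample = [(10.0, 1514)] * 8000 + [(11.0, 1514)] * 100
print(busy_seconds(sample))
```

One-second bins are coarse. A burst shorter than a second can overflow the port buffer without showing up here, so a smaller bin may be needed in practice.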

8. The S3700 does not support “qos burst-mode”, so this issue cannot be resolved by configuration on the S3700.
The topology shows that the cameras are connected through a fiber receiver.
Please confirm whether the fiber receiver connects to the S3700 through 100M ports and to the cameras through 1000M ports.
Please configure a rate limit on the cameras or on the fiber receiver. This should resolve the issue.
9. Change the camera bit rate setting to CBR (constant bit rate) and 2000.

10. The issue was still not resolved.
11. Deeper analysis of the packets captured on the ACC-02 switch, which connects directly to the camera, showed many TCP retransmissions. When burst traffic reaches 100M, the camera video shows mosaic artifacts and ping packets are lost, because Ethernet 0/0/25, which connects to the camera, has only 100M of bandwidth.

12. We also found many unicast packets from other cameras, which should not be forwarded to Ethernet 0/0/25. The capture contains packets from the other cameras 10.102.X.18 and 10.102.X.16.

13. We then suspected a loop or MAC flapping on the ACC-02 switch, so we suggested enabling the “loop-detect eth-loop alarm-only” command in VLAN 820. After that, MAC flapping traps were reported between ports GE0/0/1 and GE0/0/2 on the ACC-02 switch:
#Jan 31 2008 03:43:15-05:13 dsbr3accsw02 L2IFPPI/4/MFLPVLANALARM:OID 1.3.6.1.4.1.2011.5.25.160.3.7 Loop exists in vlan 820, for flapping mac-address 001c-XXXX-dab1 between port GE0/0/1 and port GE0/0/2.
#Jan 31 2008 03:42:44-05:13 dsbr3accsw02 L2IFPPI/4/MFLPVLANALARM:OID 1.3.6.1.4.1.2011.5.25.160.3.7 Loop exists in vlan 820, for flapping mac-address 001c-XXXX-da4f between port GE0/0/1 and port GE0/0/2.
#Jan 31 2008 03:42:12-05:13 dsbr3accsw02 L2IFPPI/4/MFLPVLANALARM:OID 1.3.6.1.4.1.2011.5.25.160.3.7 Loop exists in vlan 820, for flapping mac-address 001c-XXXX-737f between port GE0/0/1 and port GE0/0/2.
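
When many such traps are logged, it helps to summarize which port pairs and MAC addresses are involved. The following Python sketch parses the three trap lines above with a regular expression. The pattern is derived only from these samples and may need adjusting for other log variants, and the function name is an illustrative choice.

```python
import re
from collections import Counter

# Pattern derived from the sample traps above; other variants may differ.
FLAP_RE = re.compile(
    r"Loop exists in vlan (?P<vlan>\d+), for flapping mac-address "
    r"(?P<mac>\S+) between port (?P<port_a>\S+) and port (?P<port_b>\S+?)\.?$"
)


def parse_flap(line):
    """Return a dict with vlan, mac, port_a and port_b, or None if no match."""
    match = FLAP_RE.search(line.strip())
    return match.groupdict() if match else None


PREFIX = "dsbr3accsw02 L2IFPPI/4/MFLPVLANALARM:OID 1.3.6.1.4.1.2011.5.25.160.3.7 "
log_lines = [
    "#Jan 31 2008 03:43:15-05:13 " + PREFIX
    + "Loop exists in vlan 820, for flapping mac-address 001c-XXXX-dab1 between port GE0/0/1 and port GE0/0/2.",
    "#Jan 31 2008 03:42:44-05:13 " + PREFIX
    + "Loop exists in vlan 820, for flapping mac-address 001c-XXXX-da4f between port GE0/0/1 and port GE0/0/2.",
    "#Jan 31 2008 03:42:12-05:13 " + PREFIX
    + "Loop exists in vlan 820, for flapping mac-address 001c-XXXX-737f between port GE0/0/1 and port GE0/0/2.",
]

flaps = [f for f in map(parse_flap, log_lines) if f]
pairs = Counter((f["vlan"], f["port_a"], f["port_b"]) for f in flaps)
print(pairs)
```

Here all three traps point at the same pair of uplink ports in VLAN 820, which suggests a problem on the uplinks rather than on a single camera port.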
14. We then checked the STP status on the ACC-02 switch and found that GE0/0/2 is discarding, but its role is DESI, not ALTE. This means that STP has reached a wrong state in this network.
MSTID  Port                        Role  STP State     Protection
   0    Ethernet0/0/3               DESI  FORWARDING      NONE
   0    Ethernet0/0/4               DESI  FORWARDING      NONE
   0    Ethernet0/0/7               DESI  FORWARDING      NONE
   0    GigabitEthernet0/0/1        ROOT  FORWARDING      NONE
   0    GigabitEthernet0/0/2        DESI  DISCARDING      NONE
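
The same check can be applied to the STP table of every access switch. The following Python sketch flags ports whose role is DESI while their state is DISCARDING, using the table above as input. It assumes the five-column layout shown here, and the function name is an illustrative choice. A designated port can be discarding briefly while STP converges, so a hit is only suspicious if it persists.

```python
def designated_but_discarding(stp_brief_text):
    """Return the ports whose role is DESI while their STP state is DISCARDING.

    Assumes the five-column layout shown in this case:
    MSTID, Port, Role, STP State, Protection.
    """
    hits = []
    for line in stp_brief_text.splitlines():
        fields = line.split()
        if len(fields) != 5 or not fields[0].isdigit():
            continue  # skip the header line and anything that is not a port row
        _mstid, port, role, state, _protection = fields
        if role == "DESI" and state == "DISCARDING":
            hits.append(port)
    return hits


stp_brief = """\
MSTID  Port                        Role  STP State     Protection
   0    Ethernet0/0/3               DESI  FORWARDING      NONE
   0    Ethernet0/0/4               DESI  FORWARDING      NONE
   0    Ethernet0/0/7               DESI  FORWARDING      NONE
   0    GigabitEthernet0/0/1        ROOT  FORWARDING      NONE
   0    GigabitEthernet0/0/2        DESI  DISCARDING      NONE
"""
print(designated_but_discarding(stp_brief))
```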
15. We then enabled debugging on the ACC-02 and AGG-02 switches and found that the AGG-02 switch can send STP protocol packets to the ACC-02 switch but cannot receive STP protocol packets from the ACC-02 switch.
We compared the configuration and version information of AGG-01 and AGG-02. They have different configurations and run different versions.
Switch  Version                                     BPDU enable function
AGG-01  S5700V200R001C00SCP300 and later versions   Enabled by default; not shown in the configuration.
AGG-02  S5700V100R005C01SPC100                      Not enabled by default; shown in the configuration once configured.
We found that the port on AGG-02 that connects to the ACC switch has no “bpdu enable” configuration. As a result, STP periodically calculates a wrong result, which causes a loop and MAC flapping on all the ACC switches.
16. Trigger condition of this issue:
When AGG-02, which runs a V100R005 (V1R5) version, has no “bpdu enable” configuration on the port, STP periodically calculates a wrong result, which causes a loop and MAC flapping on all the ACC switches.
17. Cause Analysis:
Because of the loop, video traffic floods VLAN 820 and creates bursts that exceed the bandwidth of the ACC switch ports. The camera video then shows mosaic artifacts and ping packets are dropped at random.
18. Solutions and Measures
Add the “bpdu enable” command on all STP-enabled ports of the AGG-02 switch.
We also suggest upgrading AGG-02 to the same version as the AGG-01 switch.


Root Cause
On AGG-02 (S5700) running V100R005C01SPC100, BPDU handling is not enabled on the ports by default. As a result, STP periodically calculates a wrong result, which causes a loop and MAC flapping on all the access switches. Video traffic then floods VLAN 820 and creates bursts that exceed the bandwidth of the access switch ports, so the camera video shows mosaic artifacts and ping packets are lost at random.
Suggestions
During troubleshooting, use different methods to narrow the range of possible root causes. This makes the work more efficient.
