Traffic dropped sharply on the GE 1/0/0 interface of an ME60 at a site.
Service board: BSUF-21 (LPUK)
1. ME60 configuration and user issues;
2. Upper-layer PE_AGG device issues;
3. Processing capability issues of the BSUF-21.
1. After the issue occurred, we first checked the system configuration and alarms. The ME60 was configured correctly and there were no user failure alarms, which excluded ME60 configuration and user issues.
2. The same PE_AGG device was also connected to three other, older ME60s, and these three ME60s had no problems. This excluded issues with the upper-layer PE_AGG device.
3. Checked information about the GE 1/0/0 interface and found that the count of TxPause frames increased continuously in the outbound direction. The sending of TxPause frames indicates that the inbound traffic rate of the GE 1/0/0 interface exceeded the processing capability of the slot-1 board on the ME60. It is suspected that the peer device stopped sending traffic after receiving TxPause frames.
[JD-DSL-HU04-diagnose]dis int g1/0/0
GigabitEthernet1/0/0 current state : UP
Line protocol current state : DOWN
Description:** Connected to PE-AGGMX9-muj-214-1 xe-5/3/0 **
Route Port,The Maximum Transmit Unit is 4492
Internet protocol processing : disabled
IP Sending Frames' Format is PKTFMT_ETHNT_2, Hardware address is e024-7f96-1d39
The Vendor Name is Opnext Inc. , The Vendor PN is TRF5013FN-GA420
Transceiver max BW: 10000~11300Mbps, Transceiver Mode: Single Mode
WaveLength: 1310nm, Transmission Distance: 10km
Rx Optical Power: -3.15dBm, warning range: [-14.40, 0.50]dBm
Tx Optical Power: -2.91dBm, warning range: [-6.00, -1.00]dBm
Loopback:none, LAN full-duplex mode, Pause Flowcontrol:Receive Enable and Send Enable
Last physical up time : 2014-01-13 04:15:12 UTC+03:00
Last physical down time : 2014-01-13 04:15:04 UTC+03:00
Current system time: 2014-07-17 06:56:16+03:00
Statistics last cleared:2014-07-16 00:18:51
Last 300 seconds input rate: 1235819576 bits/sec, 121567 packets/sec
Last 300 seconds output rate: 159619960 bits/sec, 106815 packets/sec
Input: 14621401725011 bytes, 18694047910 packets
Output: 14381328841784 bytes, 18002084273 packets
Unicast: 18692669383 packets, Multicast: 516471 packets
Broadcast: 862056 packets, JumboOctets: 5375930760 packets
CRC: 0 packets, Symbol: 0 packets
Overrun: 0 packets, InRangeLength: 0 packets
LongPacket: 0 packets, Jabber: 0 packets, Alignment: 0 packets
Fragment: 0 packets, Undersized Frame: 0 packets
RxPause: 0 packets
Unicast: 17998939297 packets, Multicast: 3144887 packets
Broadcast: 89 packets, JumboOctets: 440805730 packets
Lost: 0 packets, Overflow: 0 packets, Underrun: 0 packets
System: 0 packets, Overrun: 0 packets
TxPause: 3144887 packets
Input bandwidth utilization : 13%
Output bandwidth utilization : 1.77%
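The check in step 3 depends on the TxPause counter growing between two captures of the interface output. The following is a minimal Python sketch of that comparison. It assumes that the counters appear as `TxPause: <n> packets` and `RxPause: <n> packets`, as in the output above. The function names and the value in the second capture are hypothetical and only illustrate the idea.

```python
import re

# Matches lines such as "TxPause: 3144887 packets" in the interface output.
PAUSE_RE = re.compile(r"\b(TxPause|RxPause):\s*(\d+)\s+packets")


def pause_counters(display_output: str) -> dict:
    """Extract the TxPause/RxPause counters from captured interface text."""
    return {name: int(value) for name, value in PAUSE_RE.findall(display_output)}


def pause_delta(earlier: str, later: str) -> dict:
    """Return how much each pause counter grew between two captures."""
    before, after = pause_counters(earlier), pause_counters(later)
    return {name: after[name] - before.get(name, 0) for name in after}


# First capture uses the values shown above; the second value is hypothetical.
first = "RxPause: 0 packets\nTxPause: 3144887 packets\n"
second = "RxPause: 0 packets\nTxPause: 3150000 packets\n"

delta = pause_delta(first, second)
print(delta)
```

A TxPause delta greater than zero while RxPause stays at zero matches the situation described in step 3: the ME60 is asking the peer device to slow down, not the other way around.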
4. Checked the TM chip information of the slot-1 board on the ME60 and found that FIFO overflow occurred on a TM chip. This indicates that the rate of traffic received by the TM chip exceeded its processing capability of 10 Gbit/s. As a result, the TM chip sent a backpressure signal.
TM_ReadReg:(0xc8090434)=0x0000ffff( 65535) SRX BP CTR //Here, 65535 indicates that the traffic rate exceeded the TM chip's processing capability.
TM_ReadReg:(0xc84a0438)=0x00000000( 0) STX LINK BP CTR
TM_ReadReg:(0xc80a0630)=0x00000000( 0) CTX FIC2TM BP CNT
TM_ReadReg:(0xc80a0634)=0x00000000( 0) CTX TM2FIC BP CNT
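The register dump in step 4 can be screened with a short script. The Python sketch below parses lines in the `TM_ReadReg:(address)=value( decimal) name` layout shown above and reports the backpressure counters that are not zero. Treating any non-zero counter as worth reporting is an assumption made for this sketch, and the function name is hypothetical.

```python
import re

# Address, hexadecimal value, decimal value in parentheses, then the counter name.
# An optional trailing "//" comment is ignored.
REG_RE = re.compile(
    r"TM_ReadReg:\((0x[0-9a-fA-F]+)\)=(0x[0-9a-fA-F]+)\(\s*\d+\)\s*([^/\n]*)"
)


def nonzero_backpressure(dump: str) -> dict:
    """Return {counter name: value} for every counter that is not zero."""
    counters = {}
    for _address, value, name in REG_RE.findall(dump):
        count = int(value, 16)
        if count:
            counters[name.strip()] = count
    return counters


dump = """TM_ReadReg:(0xc8090434)=0x0000ffff( 65535) SRX BP CTR
TM_ReadReg:(0xc84a0438)=0x00000000( 0) STX LINK BP CTR
TM_ReadReg:(0xc80a0630)=0x00000000( 0) CTX FIC2TM BP CNT
TM_ReadReg:(0xc80a0634)=0x00000000( 0) CTX TM2FIC BP CNT
"""

flagged = nonzero_backpressure(dump)
print(flagged)
```

With the dump from this case, only the SRX BP CTR counter is reported, which is the counter that the handling process identifies as the sign of overload.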
5. Analyzed the characteristics of live network services and found that the service boards were BSUF-21s and the users were LAC users. The user-side and network-side interfaces shared the same 10GE interface. Because L2TP traffic needs to be internally looped before being forwarded, a TM chip may have to process both the original L2TP traffic and the L2TP traffic looped back. As a result, the rate of traffic processed by a TM chip may exceed 10 Gbit/s even if the traffic rate of the 10GE interface for the L2TP service does not reach 10 Gbit/s.
6. Figure 3-1 shows how the problem happens on the GE 1/0/0 interface when the inbound traffic rate is 4.5 Gbit/s and the outbound traffic rate is 0.5 Gbit/s. Here, the network-to-user direction is the inbound direction and the user-to-network direction is the outbound direction.
Figure 3-1 Schematic diagram for inbound traffic processing
The bidirectional link bandwidth between the NP and the TM0 chip is 10 Gbit/s. Assume that the inbound traffic of the GE 1/0/0 interface is 4.5 Gbit/s (1). The traffic received by the TM0 chip is then 4.5 Gbit/s (2) + 4.5 Gbit/s (6). Because L2TP headers have been added to the traffic marked by (6), the traffic received by the TM0 chip from the NP exceeds 9 Gbit/s. Considering the 0.5 Gbit/s outbound traffic, the actual traffic processed by the TM0 chip is sure to exceed its processing capability of 10 Gbit/s. As a result, the TM0 chip generates a backpressure signal and sends TxPause frames through the GE 1/0/0 interface to the peer device. Upon receipt of the TxPause frames, the peer device stops sending traffic. The traffic received by the GE 1/0/0 interface then drops sharply.
7. Checked the configurations on the other three ME60s and found that they also had the L2TP service configured and that the inbound traffic rate of the interface used for the L2TP service exceeded 5 Gbit/s. These three ME60s, however, did not encounter a sharp traffic decrease. It was found that the three ME60s all use BSUF-40s as service boards. BSUF-40s and BSUF-21s have different L2TP internal looping mechanisms. The following schematic diagrams explain why the ME60s with BSUF-40s as service boards did not encounter a sharp traffic decrease.
Figure 3-2 Schematic diagram of L2TP internal looping for user-to-network traffic (upper) and network-to-user traffic (lower) on a BSUF-21
Figure 3-3 Schematic diagram of L2TP internal looping for network-to-user traffic on a BSUF-40
On an interface, the user-to-network traffic is about 1/10 of the network-to-user traffic. Here, only the L2TP internal looping for network-to-user traffic is analyzed and compared between the BSUF-21 and BSUF-40.
See Figure 3-1 for how the traffic between the NP and a TM chip is calculated.
Figure 3-2 shows that on a BSUF-21, the TM0 chip processes both the original network-to-user L2TP traffic and the network-to-user L2TP traffic looped back. Figure 3-3 shows that on a BSUF-40, the TM0 chip processes only the original network-to-user L2TP traffic and the TM3 chip processes the network-to-user L2TP traffic looped back. When a BSUF-21 receives 5 Gbit/s inbound traffic, the TM0 chip has to process 10 Gbit/s traffic. When a BSUF-40 receives 5 Gbit/s inbound traffic, the TM0 and TM3 chips each process 5 Gbit/s traffic. Because the processing capability of each TM chip is 10 Gbit/s, the TM chips on a BSUF-40 are not overloaded when the inbound L2TP traffic is 5 Gbit/s.
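The comparison above can be written as a small calculation. The Python sketch below is a simplified model, not a description of the board internals: it considers only network-to-user L2TP traffic, ignores the L2TP header overhead and the user-to-network traffic, and uses hypothetical function names. The per-TM capability of 10 Gbit/s and the split of original and looped traffic between TM chips are taken from the text.

```python
TM_CAPACITY_GBPS = 10.0  # processing capability of each TM chip, as stated above


def tm_loads(board: str, inbound_gbps: float) -> dict:
    """Per-TM load for network-to-user L2TP traffic (simplified model)."""
    if board == "BSUF-21":
        # TM0 processes the original traffic and the looped-back traffic.
        return {"TM0": 2 * inbound_gbps}
    if board == "BSUF-40":
        # TM0 processes the original traffic, TM3 the looped-back traffic.
        return {"TM0": inbound_gbps, "TM3": inbound_gbps}
    raise ValueError(f"unknown board: {board}")


def headroom(board: str, inbound_gbps: float) -> dict:
    """Remaining capacity per TM chip; zero or less means no margin is left."""
    loads = tm_loads(board, inbound_gbps)
    return {tm: TM_CAPACITY_GBPS - load for tm, load in loads.items()}


for board in ("BSUF-21", "BSUF-40"):
    print(board, tm_loads(board, 5.0), headroom(board, 5.0))
```

At 5 Gbit/s inbound traffic, the model leaves no headroom on TM0 of a BSUF-21, so the header overhead and outbound traffic that the model ignores are enough to push it past 10 Gbit/s. On a BSUF-40, TM0 and TM3 each keep 5 Gbit/s of headroom.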
Migrate the network-side interfaces from GE 1/0/0 and GE 7/0/0 to GE 1/1/0 and GE 7/1/0 respectively. This adjustment fully utilizes the bidirectional 10 Gbit/s forwarding capability of the channel between each TM chip and the NP on a BSUF-21, as shown in Figure 5-1. After the adjustment, the BSUF-21 allows the total bandwidth of two 10GE interfaces to be 10 Gbit/s in the inbound direction.
Figure 5-1 Schematic diagram of L2TP internal looping for network-to-user traffic on a BSUF-21 after the network-side interface is migrated to the GE 1/1/0 interface
If the GE 1/1/0 and GE 7/1/0 interfaces cannot function as network-side interfaces, add two more BSUF-21s to the ME60 and migrate the network-side interfaces from GE 1/0/0 and GE 7/0/0 to subcard 1 on each board respectively.