Networking: NE40E-1 and NE40E-2 are connected through Eth-Trunk 1, which has two member interfaces, G 3/0/1 and G 2/0/10. G 2/0/0 is connected to S8505-1. Eth-Trunk 1 and G 2/0/0 are Layer 2 interfaces and belong to VLAN 903. G 4/0/16 is connected to the left network. VLANIF 903 and G 4/0/16 are bound to the VPN instance CDMA-RP. VRRP is enabled on both NE40Es, and heartbeat packets are transmitted over the Eth-Trunk link between them. NE40E-1 is in the master state.
ip binding vpn-instance CDMA-RP
ip address 172.16.96.1 255.255.255.252
ip binding vpn-instance CDMA-RP
ip address 172.16.126.46 255.255.255.128
vrrp vrid 93 virtual-ip 172.16.126.45
vrrp vrid 93 priority 105
Symptom: The customer reports that services are affected. Technical support engineers log in to the device, run ping tests, and find that a few packets are lost in pings from the PSF (172.16.97.1) to the PDSN (172.16.126.42) during peak hours each day.
According to the information provided, a static route is configured on the PDSN that points to the virtual IP address of VLANIF 903 on NE40E-1/2. The expected path of the ping packets is indicated by the blue dotted line in the networking diagram. Because the VRRP state on NE40E-2 is slave, NE40E-2 performs only Layer 2 forwarding. When the returned ping packets reach NE40E-1, they are forwarded to G 4/0/16 through the local cross route.
Configure ACL rules on the interfaces along the packet path to match the ping packets (based on the ICMP protocol number, source IP address, and destination IP address) and find the device where packets are lost. The count of returned ping packets on Eth-Trunk 1 of NE40E-1 is correct, but the count of packets sent from G 4/0/16 is lower. The difference equals the number of lost ping packets, which confirms that the packets are discarded on NE40E-1. The statistics also show that the returned ping packets enter through G 2/0/10 of Eth-Trunk 1.
Check the discard statistics on the upstream and downstream boards. Because some of these counters increase continuously and only a few ping packets are discarded, it is difficult to identify the cause of the discards from the counters alone. Ping over another path; a few packets are lost there as well. The two paths have one thing in common: both pass through board 2 on NE40E-1. The preliminary analysis therefore suggests a fault on board 2, but this needs proof, so further analysis is required.
Because ACL counting is implemented by the forwarding engine 588, sub-card or X11 faults can be ruled out. To narrow down the fault, configure user-queue in complex traffic classification so that the packets are sent to the flow queues of 567. If packets are lost, check the packet count in the flow queues to confirm whether packets are discarded before reaching 587. This method, however, cannot determine whether the packets are lost on 587, the SFU, or the downstream NP.
traffic behavior test_1
user-queue cir 1000000 pir 1000000
[BJ-BJ-DS-CE-1.CDMA]display user-queue statistics traffic behavior test_1 inbound
Traffic behavior test_1 inbound traffic statistics:
Pass: 2,500 packets, 3,647,500 bytes
Discard: 0 packets, 0 bytes
Remark the packets, then narrow down the problem by collecting per-priority CQ statistics on 587. Because packets are forwarded at Layer 3 on NE40E-1 and no packets with the priority AF4 leave through G 4/0/16, remark the ping packets to AF4. The count of AF4 packets after CQ on G 4/0/16, however, is 0. Capture packets of a fixed length. The Qindex of the packets captured by the IPE module is 0.
[BJ-BJ-DS-CE-1.CDMA-hidecmd]display pe-entry 2 0 aclv4 3335
The entry is valid. PeNum = 0
TCAM Address: 96782(0x17a0e)
IFCT(H): 00000000 0A045608 3F155608 30808080 00000000
Mask: 00001F00 01FB8000 00000000 0000007F FFFFFFFF
Field Value Mask
TID 0 0
Rsv1 0 0x1
IsMpls 0 0x1
EXP 0 0x7
GID 2 0
DataType 1 0
IDSCP 0 0x3f
isIPv4 1 0
FM 0 0x1
FF 0 0x1
NoOption 0 0x1
SIP 172. 16.126. 42 0. 0. 0. 0
DIP 172. 16. 97. 1 0. 0. 0. 0
Prtcl 1 0
TCPSPort 0 0xffff
TCPDPort 0 0xffff
TCPFlag 0 3f
DDR Ram Address(Ingress): 265479(0x40d07)
DDR Ram Address(Egress): 527623(0x80d07)
IFIT(H): 00000004 00000002 A2800004 00000000
PriCmd: 2 DSCP: 0x22
UsrPriPass: 1 DSCPPass: 0 //DSCP is remarked in the packets.
CARCmd: 0 CARID: 0
Opcode: 2 Mirror: 0 CPUCopy: 0
ChgVRID: 0 VRID: 0
ChgFID: 0 FID: 0
EFIT(H): 00000000 00000000 00000000 04000000
Opcode: 2 CARCMD: 0
CARID: 0 Remark: 0 ODSCP: 0
Ingress Hit counts:105000 //Packets match the ACL.
Egress Hit counts:0
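The DSCP value 0x22 in the IFIT entry above can be cross-checked against the intended AF4 remark. The following is a minimal Python sketch, assuming the standard Assured Forwarding code point layout (class in the top three bits of the DSCP, drop precedence in the next two bits). The function name `af_name` is illustrative and is not part of any device tooling.

```python
def af_name(dscp: int) -> str:
    """Return the Assured Forwarding name (for example 'AF41') for a DSCP value.

    Raises ValueError if the value is not one of the twelve AF code points.
    """
    if not 0 <= dscp <= 63:
        raise ValueError(f"DSCP {dscp} is out of range")
    af_class = dscp >> 3       # top three bits: AF class, 1..4
    drop = (dscp >> 1) & 0x3   # next two bits: drop precedence, 1..3
    if dscp & 1 or af_class not in (1, 2, 3, 4) or drop not in (1, 2, 3):
        raise ValueError(f"DSCP {dscp:#x} is not an AF code point")
    return f"AF{af_class}{drop}"


# DSCP value taken from the "PriCmd: 2 DSCP: 0x22" line of the entry above.
print(af_name(0x22))
```

Under this layout, 0x22 falls in AF class 4, which is consistent with the remark configured for the ping packets.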
Analysis of the captured packets shows that the ping packets entering G 2/0/10 are not forwarded at Layer 3 to G 4/0/16 as expected. Instead, they are forwarded at Layer 2 and sent through G 2/0/0. This explains the CQ count on G 4/0/16. Because Layer 2 and Layer 3 forwarded packets have different lengths, the IPE module cannot capture these packets. After the packets are sent to the S8505, they are returned through G 2/0/0 with the destination MAC address changed to 0000-5e00-015d, which corresponds to VRRP group ID 93. This group exists on NE40E-1, so the packets are now forwarded at Layer 3. These are the packets captured by the IPE module.
// The fixed-length packets captured on G 2/0/10 have the destination MAC address 0000-5e00-0101, which corresponds to VRRP group ID 1. Because NE40E-1 has no such VRRP group, it forwards these packets at Layer 2.
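The relationship between the two MAC addresses and their VRRP group IDs can be sketched as follows. IPv4 VRRP virtual MAC addresses have the form 00-00-5E-00-01-{VRID}, so the last byte is the group ID. The helper names and the `forwarding_mode` rule below are illustrative simplifications; in particular, a real router also forwards at Layer 3 when the destination is its own interface MAC address, which this sketch ignores.

```python
VRRP_IPV4_PREFIX = "00005e0001"  # first five bytes of an IPv4 VRRP virtual MAC


def vrid_from_mac(mac: str):
    """Return the VRRP group ID encoded in a virtual MAC, or None if not a VRRP MAC."""
    digits = mac.replace("-", "").replace(":", "").replace(".", "").lower()
    if len(digits) != 12 or not digits.startswith(VRRP_IPV4_PREFIX):
        return None
    return int(digits[10:], 16)  # last byte is the VRID


def forwarding_mode(dst_mac: str, local_vrids: set) -> str:
    """Simplified decision: Layer 3 only if the VRID is configured locally."""
    return "L3" if vrid_from_mac(dst_mac) in local_vrids else "L2"


ne40e_vrids = {93}  # from "vrrp vrid 93" in the VLANIF configuration
for mac in ("0000-5e00-0101", "0000-5e00-015d"):
    print(mac, vrid_from_mac(mac), forwarding_mode(mac, ne40e_vrids))
```

The first address decodes to group 1, which NE40E-1 does not own, and the second decodes to group 93, which it does.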
[BJ-BJ-DS-CE-1.CDMA-hidecmd]display pe-probe 2 0 iphp-data
1698a000 00005e00 01010007 bad8996f
81000387 08004500 059476f2 00003f01
c82aac10 7e2aac10 61010000 c01b2354
00193174 eb3cac10 7e2a0001 02030405
06070809 0a0b0c0d 0e0f1011 12131415
SPIHead(32=4B): PkrLen(14) = 5a6, IPort(6) = a, Desc(1) = 0
// The outbound interface corresponding to the MAC address of 0000-5e00-0101 is G 2/0/0. Thus, packets are sent through G 2/0/0.
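The fields that matter in the iphp-data dump above (destination MAC address, VLAN, and IP addresses) can be read out mechanically instead of by eye. The following Python sketch assumes the layout suggested by the dump itself: a 4-byte SPI header followed by an ordinary Ethernet frame with an optional 802.1Q tag and an IPv4 header. The function name `parse_probe` is hypothetical, and the sketch does not apply to the ipe-data dump, which carries different internal headers.

```python
import socket
import struct

# First three rows of the iphp-data capture shown above.
DUMP = """
1698a000 00005e00 01010007 bad8996f
81000387 08004500 059476f2 00003f01
c82aac10 7e2aac10 61010000 c01b2354
"""


def _mac(b: bytes) -> str:
    """Format six bytes in the xxxx-xxxx-xxxx style used in this case."""
    h = b.hex()
    return "-".join(h[i:i + 4] for i in range(0, 12, 4))


def parse_probe(words: str, spi_len: int = 4) -> dict:
    """Decode Ethernet, VLAN, and IPv4 fields from a hex dump (assumed layout)."""
    frame = bytes.fromhex("".join(words.split()))[spi_len:]  # skip the SPI header
    offset = 12
    vlan = None
    (ethertype,) = struct.unpack_from("!H", frame, offset)
    if ethertype == 0x8100:  # 802.1Q tag present
        (tci,) = struct.unpack_from("!H", frame, offset + 2)
        vlan = tci & 0x0FFF
        offset += 4
        (ethertype,) = struct.unpack_from("!H", frame, offset)
    if ethertype != 0x0800:
        raise ValueError(f"not IPv4: ethertype {ethertype:#06x}")
    ip = frame[offset + 2:offset + 22]  # fixed 20-byte IPv4 header
    return {
        "dst_mac": _mac(frame[0:6]),
        "src_mac": _mac(frame[6:12]),
        "vlan": vlan,
        "ttl": ip[8],
        "protocol": ip[9],
        "src_ip": socket.inet_ntoa(ip[12:16]),
        "dst_ip": socket.inet_ntoa(ip[16:20]),
    }


for key, value in parse_probe(DUMP).items():
    print(f"{key}: {value}")
```

Printing the source interface and MAC addresses together with the IP addresses makes it harder to overlook them during analysis.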
//The following packets misled the engineers. At first, the engineers checked only the source and destination IP addresses, ICMP protocol number, destination TB, and TP, and did not check the source interface and MAC addresses carefully. The packets, in fact, come from G 2/0/0.
[BJ-BJ-DS-CE-1.CDMA-hidecmd]display pe-probe 2 0 ipe-data
16980000 0000670c e5040024 4004ac10
60020002 00004500 0594be36 00003d01 //ICMP packets
82e6ac10 7e2aac10 61010000 e2553b0e //DIP and SIP are consistent with the DIP and SIP in the ICMP packets.
0013317a b148ac10 7e2a0001 02030405
06070809 0a0b0c0d 0e0f1011 12131415
16171819 1a1b1c1d 1e1f2021 22232425
SPIHead(32=4B): PkrLen(14) = 5a6, IPort(6) = 0
Ingress TM Header(4B):IFQID(15) = 0
Mirror(1) = 0
toCPU(1) = 0
Fabric Class(2) = 3
FC_Class(3) = 7
TB(8) = c
Fram Header(10B/14B): MC/UC(1) = 1
FCinfo(4) = c
FHL(1) = 1
ContrlField(2) = 1
Qindex(4) = 0
TP(6) = 10 //The destination port is G 4/0/16.
DataType(2) = 0
L3stake(6) = 0
DSU(4) = 9
FHF(4) = 1 //U1 frame, IPv4 unicast
SP/SendTP(6) = 0
Res(2) = 0
SB(6) = 4 //The source port is G 2/0/0.
FHE(32) = ac106002
FHE_II(32) = 20000
Cause(8) = ac
Most of the packets from the PDSN enter G 2/0/0. At peak traffic the interface cannot carry this load, so packets are discarded owing to back pressure. As a result, services are affected and ping packets are lost.
[BJ-BJ-DS-CE-1.CDMA]display interface GigabitEthernet 2/0/0
GigabitEthernet2/0/0 current state : UP
Switch Port,PVID : 903,The Maximum Transmit Unit is 1500
IP Sending Frames' Format is PKTFMT_ETHNT_2, Hardware address is 0018-8287-9cb6
The Vendor PN is FTLF1521P1BCL-HW
The Vendor name is FINISAR CORP.
Port BW: 1G, Transceiver max BW: 1G, Transceiver Mode: SingleMode
WaveLength: 1550nm, Transmission Distance: 40km
Rx Power: -15.09dBm, Tx Power: -2.15dBm
The setted port type is: fiber-1000
Loopback:none, full-duplex mode, negotiation: enable, Pause Flowcontrol:Receive Enable and Send Enable
Last physical up time : 2008-11-29 00:24:54
Last physical down time : 2008-11-28 00:04:43
Statistics last cleared:2009-04-16 03:55:12
Last 300 seconds input rate: 946727096 bits/sec, 211000 packets/sec
Last 300 seconds output rate: 948213320 bits/sec, 211554 packets/sec
Input: 11903886988020 bytes, 21545409259 packets
Output: 11912139673940 bytes, 21598580426 packets
Unicast: 21545227159 packets, Multicast: 169173 packets
Broadcast: 12927 packets, JumboOctets: 0 packets
CRC: 0 packets, Symbol: 0 packets
Overrun: 0 packets
LongPacket: 0 packets, Jabber: 0 packets, Alignment: 0 packets
Fragment: 0 packets, Undersized Frame: 0 packets
RxPause: 0 packets
Unicast: 21598380710 packets, Multicast: 193581 packets
Broadcast: 6135 packets, JumboOctets: 0 packets
Lost: 0 packets, Overflow: 0 packets, Underrun: 0 packets
TxPause: 0 packets
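The load on G 2/0/0 can be estimated from the 300-second rates in the output above. The following Python sketch divides the reported bit rate by the port bandwidth (1G, from the "Port BW" line) and derives the average packet size. It assumes the reported rate excludes preamble and inter-frame gap, so the true line occupancy would be somewhat higher. The function names are illustrative.

```python
PORT_BPS = 1_000_000_000  # "Port BW: 1G" in the interface output


def utilization(rate_bps: float, port_bps: float = PORT_BPS) -> float:
    """Fraction of the port bandwidth used by the reported bit rate."""
    return rate_bps / port_bps


def avg_packet_bytes(rate_bps: float, pps: float) -> float:
    """Average packet size implied by the bit rate and packet rate."""
    return rate_bps / 8 / pps


# Values copied from the "Last 300 seconds" lines above.
rates = {
    "input": (946_727_096, 211_000),
    "output": (948_213_320, 211_554),
}
for direction, (bps, pps) in rates.items():
    print(f"{direction}: {utilization(bps):.1%} of port, "
          f"{avg_packet_bytes(bps, pps):.0f} bytes/packet on average")
```

Both directions come out at roughly 95% of the port bandwidth as a 5-minute average, which leaves very little headroom for bursts at peak hours.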
After communication with the front line, it is confirmed that the static route configured on the PDSN should point to the virtual IP address of the VRRP group configured on NE40E-1/2. The static route, however, mistakenly points to the virtual IP address of the VRRP group on the S8505. NE40E-1 and NE40E-2 carry the heartbeat packets between the two S8505s and therefore learn a MAC address entry for the virtual MAC address (0000-5e00-0101) of the S8505. Thus, when packets reach the NE40E, they are sent to the S8505 through G 2/0/0. Because the NE40E also runs VRRP, the S8505 rewrites the destination MAC address of the received packets to the virtual MAC address of the NE40E and sends them back to G 2/0/0. The NE40E then forwards the packets at Layer 3. The brown line and green line in the figure show the paths of the ping packets.
Now, the cause of the problem is clear. The incorrect configuration on the PDSN affects services.
The problem is caused by the incorrect configuration on the PDSN.
Change the static route configured on the PDSN so that it points to the virtual IP address of the VRRP group on the NE40E.