NE40E/80E VRP version: V300R002C06B511.
Topology: see the attachment, <Topology> and <SDH analysis>.
Problem description: IP-Trunk 4 between LHR-NE80E-B1 and LHR-NE40E-A1 consists of four POS interfaces (STM-1 links). The problem occurred during a hard loopback test in which the link was looped in both directions at once (b to e and c to f) at ODF point T2. The looped link is one of the four trunk members and terminates on POS 16/0/4 of LHR-NE80E-B1 and POS 5/0/1 of LHR-NE40E-A1. (In <SDH Analysis>, R1 is LHR-NE80E-B1 and R2 is LHR-NE40E-A1.) During the test the LDP session went down while ISIS stayed up on IP-Trunk 4 on both sides. Traffic over IP-Trunk 4 between LHR-NE80E-B1 and LHR-NE40E-A1 was interrupted for about 12 minutes and recovered only after the IP-Trunk 4 link between them was shut down, which rerouted the traffic to Plane B.
Jul 29 2009 17:14:40 LHR-NE80E-B1 %%01LDP/4/DELSSNSENDNOTI(l): The session was deleted and the notification SHUTDOWN was sent to the peer 172.31.255.65.
Jul 29 2009 17:14:40 LHR-NE80E-B1 %%01L2V/5/RECVLDPSTA(l): VLL received LDP session status 5 from peer 172.31.255.65.
Jul 29 2009 17:14:40 LHR-NE80E-B1 %%01RM/3/LDP_SES_STA(l): RM receive LDP session DOWN on the Ip-Trunk4.
Jul 29 2009 17:14:40 LHR-NE80E-B1 %%01LDP/4/NOFINDSOCK(l): The TCP event 6 was received, but failed to find the corresponding Socket 101.
Jul 29 2009 17:14:40 LHR-NE80E-B1 %%01LDP/6/PEERCLS(l): The message that the peer closed was received from TCP Socket ID 101.
#Jul 29 17:14:40 2009 LHR-NE80E-B1 LSPM/4/TRAP:188.8.131.52.184.108.40.206.2.0.2 LSP 57578 Changes to Down
(Note: Information deleted for brevity)
Jul 29 2009 17:14:40 LHR-NE40E-A1 %%01LDP/4/HOLDTMREXP(l): Sessions were deleted because the hello hold timer expired. (PeerId=172.31.255.73)
Jul 29 2009 17:14:40 LHR-NE40E-A1 %%01RM/3/LDP_SES_STA(l): RM receive LDP session DOWN on the Ip-Trunk1.
Jul 29 2009 17:14:40 LHR-NE40E-A1 %%01LDP/4/DELSSNSENDNOTI(l): The session was deleted and the notification SHUTDOWN was sent to the peer 172.31.255.73.
Jul 29 2009 17:14:40 LHR-NE40E-A1 %%01L2V/5/RECVLDPSTA(l): VLL received LDP session status 5 from peer 172.31.255.73.
(Note: Information deleted for brevity)
Remove the POS interface from the IP-Trunk before performing a loopback test.
The transmission expert confirmed that during a hard loopback test at the ODF on the transmission side, the transmission equipment does not send any RDI or AIS SDH alarm. The commands “alarm lrdi sensitive; alarm prdi sensitive; alarm pais sensitive” therefore have no effect, and the physical port stays up because the router receives its own optical signal back through the loop. With HDLC, the default link-layer encapsulation on POS interfaces, the line protocol can stay up even when the link is looped in hardware. As a result, both the physical and the line protocol status of the two POS ports were up.
According to RFC 3036 (LDP Specification), LHR-NE80E-B1 plays the active role: it attempts to establish the LDP TCP connection by connecting to the well-known LDP port at the address of LHR-NE40E-A1. LHR-NE40E-A1 plays the passive role: it waits for LHR-NE80E-B1 to establish the connection. (LSR1 determines whether it plays the active or passive role in session establishment by comparing addresses A1 and A2 as unsigned integers. If A1 > A2, LSR1 plays the active role; otherwise it is passive and waits for LSR2 to establish the LDP TCP connection to its well-known LDP port.) This is why there are many LDP session establishment logs on LHR-NE80E-B1 but none on LHR-NE40E-A1.
These LDP establishment attempts continued until the IP-Trunk was shut down, because both the physical and the line protocol status of the two POS ports were up and the LDP protocol packets were always hashed to this link. The attempts failed because the hard loopback test had actually cut communication between the two LDP peers.
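The active/passive rule can be illustrated with a minimal Python sketch. It assumes that the peer IDs shown in the logs (172.31.255.73 for LHR-NE80E-B1 and 172.31.255.65 for LHR-NE40E-A1) are also the transport addresses being compared, which the logs do not state explicitly. The function name ldp_role is hypothetical.

```python
import ipaddress


def ldp_role(local_addr: str, peer_addr: str) -> str:
    """Return the LDP session role of the local LSR.

    The two transport addresses are compared as unsigned integers:
    the LSR with the larger address is active, the other is passive.
    """
    local = int(ipaddress.IPv4Address(local_addr))
    peer = int(ipaddress.IPv4Address(peer_addr))
    return "active" if local > peer else "passive"


# Assumption: the peer IDs from the logs are used as transport addresses.
B1_ADDR = "172.31.255.73"  # LHR-NE80E-B1
A1_ADDR = "172.31.255.65"  # LHR-NE40E-A1

if __name__ == "__main__":
    print("LHR-NE80E-B1:", ldp_role(B1_ADDR, A1_ADDR))
    print("LHR-NE40E-A1:", ldp_role(A1_ADDR, B1_ADDR))
```

Under that assumption the larger address belongs to LHR-NE80E-B1, which matches the observation that only this router logged repeated session establishment attempts.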
A. PPP and HDLC behave differently. With PPP, the line protocol cannot come up in a looped situation like the one in this incident, so LDP would not try to establish a session over that POS interface and would use another healthy POS member of the IP-Trunk instead.
To go from physical UP to line protocol UP, PPP needs the following 3 steps:
1. LCP negotiation.
2. Authentication (optional, for example PAP or CHAP).
3. Network layer negotiation (IPCP, IPXCP).
A failure in any step causes the negotiation to fail. LCP negotiation covers parameters such as the MRU, the magic number, and the authentication mode. If the link is looped back, the magic number check detects the loop, LCP negotiation fails, and the PPP line protocol cannot come up.
Occasionally, however, the PPP line protocol may be seen up for several seconds before it goes down for good, because negotiation is still in progress at that moment.
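The magic number check can be sketched roughly as follows. This is a simplified illustration of the idea in RFC 1661, not the NE40E/80E implementation: the class name LcpEndpoint, the packet dictionaries, and the threshold of three consecutive collisions are all assumptions made for the example.

```python
import secrets


class LcpEndpoint:
    """Toy model of the LCP magic number loopback check."""

    def __init__(self, max_collisions: int = 3):
        # Assumption: the collision threshold is implementation-specific.
        self.max_collisions = max_collisions
        self.collisions = 0
        self.magic = self._new_magic()

    @staticmethod
    def _new_magic() -> int:
        # Random non-zero 32-bit value.
        return secrets.randbelow(0xFFFFFFFF) + 1

    def build_configure_request(self) -> dict:
        return {"code": "Configure-Request", "magic": self.magic}

    def on_configure_request(self, pkt: dict) -> str:
        """Compare the received magic number with the one we last sent."""
        if pkt["magic"] != self.magic:
            self.collisions = 0
            return "ack"  # a different magic number: a real peer answered
        self.collisions += 1
        if self.collisions >= self.max_collisions:
            return "loop-detected"  # we keep hearing ourselves
        self.magic = self._new_magic()  # pick a new value and try again
        return "retry"


def simulate_hard_loop(endpoint: LcpEndpoint, rounds: int = 10) -> bool:
    """Feed the endpoint its own requests, as a hard loop at the ODF would."""
    for _ in range(rounds):
        looped = endpoint.build_configure_request()
        if endpoint.on_configure_request(looped) == "loop-detected":
            return True
    return False


if __name__ == "__main__":
    print("loop detected:", simulate_hard_loop(LcpEndpoint()))
```

On a looped link every request the endpoint sends comes straight back with its own magic number, so the collision counter reaches the threshold and the link never leaves LCP negotiation. HDLC has no equivalent negotiation, which is why its line protocol stayed up in this incident.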
B. The LDP module of the NE80E/40E does not check whether a link is looped back, so when both the physical and the line protocol status are up, LDP treats the POS interface as healthy.
C. Traffic was not shifted away from that POS interface during the loopback test. The NE40E/80E has two planes: the control plane (such as ISIS), which decides whether traffic is rerouted, and the forwarding plane (such as MPLS LDP), which forwards traffic by MPLS label switching. At that moment the control plane over IP-Trunk 4 was still up, but the forwarding plane was actually down, so no reroute was triggered and the traffic was discarded.
1. We have submitted this suggestion to HQ as a customer requirement: the NE40E/80E should not select a POS interface for LDP negotiation when it detects a loopback on that interface. HQ will consider it for a future version of the NE40E/80E.
2. We strongly recommend removing the POS interface from the IP-Trunk before performing a loopback test. Otherwise the test can cause packet loss or a complete interruption without any warning. This is also the standard operating procedure.