LDP Reliability
Overview of LDP Reliability
If a node or link on a working LDP LSP fails, reliability technologies are required to set up a backup LDP LSP and switch traffic to the backup LDP LSP, while minimizing packet loss in this process.
When a node on a working LDP LSP encounters a control plane failure but the forwarding plane is still working, reliability technologies are required to ensure traffic forwarding during fault recovery on the control plane.
| Reliability Technology | Description | Function |
| --- | --- | --- |
| Fault detection | Rapidly detects faults on LDP LSPs of an MPLS network and triggers protection switching. | BFD for LDP LSP |
| Traffic protection | Ensures that traffic is switched to the backup LDP LSP and minimizes packet loss when a working LDP LSP fails. | LDP FRR, synchronization between LDP and static routes, synchronization between LDP and IGP |
| | Ensures nonstop forwarding on the forwarding plane when the control plane fails on a node. | LDP GR, LDP NSR |
BFD for LDP LSP
Bidirectional Forwarding Detection (BFD) can quickly detect faults on an LDP LSP and trigger a traffic switchover upon an LDP LSP failure to improve network reliability.
Background
If a node or link along a working LDP LSP fails, traffic is switched to the backup LSP. The path switchover time depends on the speed of fault detection and traffic switching; a slow path switchover causes prolonged traffic loss. LDP FRR ensures fast traffic switching. However, because LDP's own fault detection mechanism is slow, LDP FRR alone cannot solve this problem.
As shown in Figure 3-12, an LSR periodically sends Hello messages to its neighboring LSRs to advertise its existence on the network and maintain adjacencies. An LSR creates a Hello timer for each neighbor to maintain an adjacency. Each time the LSR receives a Hello message, the LSR resets the Hello timer. If the Hello timer expires before the LSR receives a new Hello message, the LSR considers that the adjacency is terminated. This mechanism cannot detect link faults quickly, especially when a Layer 2 device is deployed between LSRs.
BFD can quickly detect faults on an LDP LSP and trigger a traffic switchover upon an LDP LSP failure, minimizing packet loss and improving network reliability.
Implementation
BFD for LDP LSPs can rapidly detect a fault on an LDP LSP and notify the forwarding plane of the fault to ensure fast traffic switchover.
A BFD session is bound to an LSP. That is, a BFD session is set up between the ingress and egress nodes. A BFD packet is sent from the ingress node to the egress node along an LSP. Then, the egress node responds to the BFD packet. In this manner, the ingress node can detect the LSP status quickly. After BFD detects an LSP failure, BFD notifies the forwarding plane. Then, the forwarding plane switches traffic to the backup LSP.
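The following Python sketch is a minimal illustration of the detection logic described above; `BfdSession`, the per-interval callback, and the detection multiplier are assumed names, not the device implementation. The ingress expects a reply in each detection interval and, after several consecutive misses, declares the LSP down and switches forwarding to the backup LSP.

```python
# Minimal sketch of BFD-style LSP fault detection on the ingress node
# (illustrative only; names and values are assumptions).

from dataclasses import dataclass

@dataclass
class BfdSession:
    detect_multiplier: int = 3      # consecutive misses tolerated before declaring a fault
    missed: int = 0                 # replies missed so far
    lsp_up: bool = True             # ingress view of the bound LSP

    def on_interval(self, reply_received: bool) -> None:
        """Called once per BFD detection interval on the ingress node."""
        if reply_received:
            self.missed = 0
            return
        self.missed += 1
        if self.missed >= self.detect_multiplier and self.lsp_up:
            self.lsp_up = False
            self.switch_to_backup()

    def switch_to_backup(self) -> None:
        # As described above, BFD notifies the forwarding plane, which then
        # uses the backup LSP's forwarding entry instead of the primary one.
        print("primary LSP declared down -> forwarding plane uses backup LSP")

session = BfdSession()
for reply in (True, True, False, False, False):   # the egress stops answering
    session.on_interval(reply)
```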
Synchronization Between LDP and Static Route
On an MPLS network, primary and backup LSPs are established between LSRs based on static routes. If the LDP session on the primary LSP fails (for some reason other than a link failure) or the primary LSP recovers, traffic is switched between the primary and backup LSPs, causing packet loss. Synchronization between LDP and static routes can solve this problem.
As shown in Figure 3-14, LSR_1 and LSR_4 are connected using static routes. LDP establishes primary and backup LSPs between LSR_1 and LSR_4 based on static routes, and LinkA is the primary path.
Synchronization between LDP and static route implements LSP switchover in the following scenarios:
- The LDP session on the primary LSP fails (for some reason other than a link failure).
When the LDP session is Up, MPLS traffic is forwarded through LinkA. If LDP is disabled or an LDP fault occurs on LSR_2, the LDP session between LSR_1 and LSR_2 is torn down. However, the link between LSR_1 and LSR_2 is running properly and the static routes are still active. Because the static routes keep directing traffic to LinkA, the LSP cannot be switched to LinkB, and MPLS traffic between LSR_1 and LSR_4 is interrupted.
After synchronization between LDP and static route is enabled on LSR_1, static routes automatically switch to LinkB when the LDP session goes Down. This ensures uninterrupted MPLS traffic during an LSP switchover.
- The primary LSP recovers from a fault.
If the link between LSR_1 and LSR_2 fails, the LSP switches to LinkB. When the link between LSR_1 and LSR_2 recovers, the LSP switches back to LinkA. At this time, the backup LSP over LinkB can no longer be used, but the new LSP over LinkA has not yet been established. As a result, MPLS traffic between LSR_1 and LSR_4 is interrupted during this period.
After synchronization between LDP and static route is enabled on LSR_1, static routes become active only when the LDP session is Up, which ensures uninterrupted traffic.
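As a rough illustration of the rule described above, the following Python sketch (function and parameter names are assumptions, not device logic) models how the outgoing link is chosen: with synchronization enabled, the static route over LinkA is treated as usable only while the LDP session over LinkA is Up.

```python
# Minimal sketch of LDP/static-route synchronization (illustrative names).

def active_link(linkA_up: bool, ldp_session_A_up: bool, sync_enabled: bool) -> str:
    """Return which link the static route (and hence the LSP) uses."""
    if not linkA_up:
        return "LinkB"                       # physical fault: normal backup path
    if sync_enabled and not ldp_session_A_up:
        return "LinkB"                       # session fault: switch although LinkA is physically fine
    return "LinkA"

# Without synchronization the static route stays on LinkA and MPLS traffic is dropped;
# with synchronization it moves to LinkB while the LDP session is Down.
print(active_link(linkA_up=True, ldp_session_A_up=False, sync_enabled=False))  # LinkA (traffic lost)
print(active_link(linkA_up=True, ldp_session_A_up=False, sync_enabled=True))   # LinkB
```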
Synchronization Between LDP and IGP
Synchronization between LDP and IGP ensures consistent IGP and LDP traffic by suppressing IGP route advertisement. This minimizes packet loss and improves network reliability.
Background
- When the primary link fails, the IGP route of the backup link is preferred and traffic is switched to the backup LSP over the backup link (through LDP FRR). After the primary link recovers, the IGP route of the primary link is preferred again before an LDP session is established over that link. As a result, traffic is dropped while it is directed to an LSP that has not yet been established.
- When the IGP route of the primary link is reachable and an LDP session between nodes on the primary link fails, traffic is directed using the IGP route of the primary link, while the LSP over the primary link is torn down. Because a preferred IGP route of the backup link is unavailable, an LSP over the backup link cannot be established, causing traffic loss.
- When an active/standby switchover occurs on a node, the LDP session is reestablished only after IGP GR is complete. During this period, the IGP advertises the maximum cost of the link, causing route flapping.
Synchronization between LDP and IGP helps prevent traffic loss caused by these problems.
Related Concepts
- Hold-down timer: controls how long the IGP waits before establishing a neighbor relationship.
- Hold-max-cost timer: controls how long an interface keeps advertising the maximum link cost.
- Delay timer: controls how long the device waits for an LSP to be established.
Implementation
As shown in Figure 3-15, when traffic is switched between the primary and backup links, synchronization between LDP and IGP is implemented as follows:
- The primary link recovers from a physical fault.
1. The faulty link between LSR_2 and LSR_3 recovers.
2. An LDP session is set up between LSR_2 and LSR_3, and the IGP starts the Hold-down timer to suppress establishment of the neighbor relationship.
3. Traffic keeps traveling along the backup LSP.
4. LSR_2 and LSR_3 discover each other as LDP peers and reestablish an LDP session over the still-reachable route LSR_2 -> LSR_4 -> LSR_5 -> LSR_3. They exchange Label Mapping messages to establish an LSP and instruct the IGP to start LDP-IGP synchronization.
5. The IGP establishes the neighbor relationship and switches traffic back to the primary link. The LSP is reestablished, and its route converges on the primary link.
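A minimal Python sketch of the Hold-down behavior in the steps above (the function and parameters are illustrative, not device logic): the IGP neighbor relationship on the recovered link is allowed only when LDP reports that synchronization is complete or the Hold-down timer has expired, so traffic keeps using the backup LSP in the meantime.

```python
# Minimal sketch of Hold-down gating during primary-link recovery (assumed names).

def igp_neighbor_allowed(elapsed_s: float, hold_down_s: float, ldp_lsp_ready: bool) -> bool:
    """True when the IGP may establish the neighbor relationship on the recovered link."""
    if ldp_lsp_ready:
        return True                     # LDP-IGP synchronization reported completion
    return elapsed_s >= hold_down_s     # otherwise wait for the Hold-down timer

print(igp_neighbor_allowed(elapsed_s=2.0, hold_down_s=10.0, ldp_lsp_ready=False))  # False: stay on backup LSP
print(igp_neighbor_allowed(elapsed_s=2.0, hold_down_s=10.0, ldp_lsp_ready=True))   # True: switch back to primary link
```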
- IGP on the primary link is normal and the LDP session is faulty.
1. The LDP session between nodes along the primary link fails.
2. LDP notifies the IGP of the session fault. The IGP starts the Hold-max-cost timer and advertises the maximum cost on the primary link.
3. The IGP route of the backup link becomes reachable.
4. An LSP is established over the backup link, and the LDP module on LSR_2 delivers the corresponding forwarding entries.
- In Figure 3-16, when an active/standby switchover occurs, synchronization between LDP and IGP is implemented as follows:
1. An IGP on the GR Restarter advertises the actual cost of the primary link and starts the GR Delay timer. The GR Restarter does not end the GR process before the GR Delay timer expires, so an LDP session can be set up during this period.
2. Before the GR Delay timer expires, the GR Helper retains the original IGP route and the LSP. When the LDP session goes Down, LDP does not notify the IGP link of the session Down event, so the IGP still advertises the actual link cost and the IGP route is not switched to the backup link. When the GR Delay timer expires, GR is complete. If the LDP session has not been established, the IGP starts the Hold-max-cost timer and advertises the maximum cost of the primary link, so that the IGP route is switched to the backup link.
3. Once the LDP session is established or the Hold-max-cost timer expires, the IGP resumes the actual link cost of the interface, and the IGP route is switched back to the primary link.
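The following Python sketch (assumed maximum-cost value and parameter names, not router behavior) summarizes the cost-advertisement rule in these steps: the actual cost is kept while the GR Delay timer runs; afterwards the maximum cost is advertised until the LDP session comes Up or the Hold-max-cost timer expires, at which point the actual cost is restored.

```python
# Minimal sketch of the LDP-IGP synchronization cost-advertisement rule (illustrative).

MAX_COST = 65535   # assumed value standing in for the "maximum cost"

def advertised_cost(actual_cost: int,
                    gr_delay_running: bool,
                    ldp_session_up: bool,
                    hold_max_cost_expired: bool) -> int:
    if gr_delay_running:
        return actual_cost            # keep the IGP route on the primary link during GR
    if ldp_session_up or hold_max_cost_expired:
        return actual_cost            # LSP usable (or waiting is over): resume the actual cost
    return MAX_COST                   # steer the IGP route to the backup link in the meantime

print(advertised_cost(10, gr_delay_running=True,  ldp_session_up=False, hold_max_cost_expired=False))  # 10
print(advertised_cost(10, gr_delay_running=False, ldp_session_up=False, hold_max_cost_expired=False))  # 65535
print(advertised_cost(10, gr_delay_running=False, ldp_session_up=True,  hold_max_cost_expired=False))  # 10
```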
LDP FRR
LDP fast reroute (FRR) provides link backup on an MPLS network. When the primary LSP fails, traffic is quickly switched to the backup LSP, minimizing traffic loss.
Background
On an MPLS network, when the primary link fails, IP FRR ensures fast IGP route convergence and switches traffic to the backup link. However, a new LSP still needs to be established, which causes traffic loss. If the LSP fails for a reason other than a primary link failure, traffic cannot be restored until a new LSP is established, causing a lengthy traffic interruption. LDP FRR is used on an MPLS network to address these issues.
LDP FRR, using the liberal label retention mode of LDP, obtains a liberal label, assigns a forwarding entry to the label, and then delivers the forwarding entry to the forwarding plane as the backup forwarding entry for the primary LSP. When the interface goes Down (as detected by the interface itself or by BFD) or the primary LSP fails (as detected by BFD), traffic is quickly switched to the backup LSP.
Concepts
Manual LDP FRR: The outbound interface and next hop of the backup LSP must be specified using a command. When the source of the liberal label matches the outbound interface and next hop, a backup LSP can be established and its forwarding entries can be delivered.
Auto LDP FRR: This automatic approach depends on IP FRR. A backup LSP can be established and its forwarding entries can be delivered only when the source of the liberal label matches the backup route. That is, the liberal label is obtained from the outbound interface and next hop of the backup route, the backup LSP triggering conditions are met, and there is no backup LSP manually configured based on the backup route. By default, LDP LSP setup is triggered by a 32-bit backup route.
When both Manual LDP FRR and Auto LDP FRR meet the establishment conditions, the Manual LDP FRR backup LSP is established preferentially.
Implementation
In liberal label retention mode, an LSR can receive a Label Mapping message of an FEC from any neighboring LSR. However, only the Label Mapping message sent by the next hop of the FEC can be used to generate a label forwarding table for LSP setup. In contrast, LDP FRR can generate an LSP as the backup of the primary LSP based on Label Mapping messages that are not from the next hop of the FEC. Auto LDP FRR establishes forwarding entries for the backup LSP and adds the forwarding entries to the forwarding table. If the primary LSP fails, traffic is switched to the backup LSP quickly to minimize traffic loss.
In Figure 3-17, the optimal route from LSR_1 to LSR_2 is LSR_1-LSR_2. A suboptimal route is LSR_1-LSR_3-LSR_2. After receiving a label from LSR_3, LSR_1 compares the label with the route from LSR_1 to LSR_2. Because LSR_3 is not the next hop of the route from LSR_1 to LSR_2, LSR_1 stores the label as a liberal label. If a route is available for the source of the liberal label, LSR_1 assigns a forwarding entry to the liberal label as the backup forwarding entry, and then delivers this forwarding entry to the forwarding plane with the primary LSP. In this way, the primary LSP is associated with the backup LSP.
LDP FRR is triggered when an interface failure is detected by the interface itself or BFD, or a primary LSP failure is detected by BFD. After LDP FRR is complete, traffic is switched to the backup LSP using the backup forwarding entry. Then the route is converged from LSR_1-LSR_2 to LSR_1-LSR_3-LSR_2. An LSP is established on the new path (the original backup LSP) and the original primary LSP is deleted. Traffic is forwarded along the new LSP of LSR_1-LSR_3-LSR_2.
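As a conceptual illustration of liberal label retention and backup-entry installation, the following Python sketch keeps labels from non-next-hop peers, pre-installs one of them as the backup forwarding entry, and switches to it when the primary LSP fails. The `FecEntry` data model, field names, and label values are assumptions for illustration, not the router's actual data structures.

```python
# Minimal sketch of liberal label retention and an LDP FRR backup entry (assumed model).

from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class FecEntry:
    next_hop: str                              # next hop of the best IGP route for the FEC
    primary_label: Optional[int] = None        # label learned from the next hop
    liberal_labels: Dict[str, int] = field(default_factory=dict)  # labels from other peers
    backup: Optional[tuple] = None             # (peer, label) used if the primary fails
    use_backup: bool = False

    def on_label_mapping(self, peer: str, label: int) -> None:
        if peer == self.next_hop:
            self.primary_label = label         # used for the primary LSP
        else:
            self.liberal_labels[peer] = label  # liberal retention: keep it anyway
            if self.backup is None:            # LDP FRR: pre-install a backup forwarding entry
                self.backup = (peer, label)

    def on_primary_failure(self) -> None:
        """Interface down or BFD-detected LSP failure: switch to the backup entry."""
        if self.backup is not None:
            self.use_backup = True

    def outgoing(self) -> Optional[tuple]:
        if self.use_backup:
            return self.backup
        return (self.next_hop, self.primary_label) if self.primary_label is not None else None

fec = FecEntry(next_hop="LSR_2")
fec.on_label_mapping("LSR_2", 1024)   # primary LSP via LSR_2
fec.on_label_mapping("LSR_3", 2048)   # liberal label from LSR_3 -> backup entry
fec.on_primary_failure()
print(fec.outgoing())                 # ('LSR_3', 2048): traffic uses the backup LSP
```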
Usage Scenario
Figure 3-17 shows a typical application environment of LDP FRR. LDP FRR functions well in a triangle topology but may not take effect in some situations in a rectangle topology.
As shown in Figure 3-18, if the optimal route from LSR_1 to LSR_4 is LSR_1-LSR_2-LSR_4 (with no other route for load balancing), LSR_3 receives a liberal label from LSR_1 and can use it to set up an LDP FRR backup LSP. If the link between LSR_3 and LSR_4 fails, traffic is switched to the route LSR_3-LSR_1-LSR_2-LSR_4. No loop occurs in this situation.
However, if routes from LSR_1 to LSR_4 are available for load balancing (LSR_1-LSR_2-LSR_4 and LSR_1-LSR_3-LSR_4), LSR_3 may not receive a liberal label from LSR_1 because LSR_3 is a downstream node of LSR_1. Even if LSR_3 receives a liberal label and is configured with LDP FRR, LSR_1 may forward the switched traffic back to LSR_3, creating a loop. The loop persists until the route from LSR_1 to LSR_4 converges to LSR_1-LSR_2-LSR_4.
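The transient loop described above can be reasoned about with a toy forwarding walk. The Python sketch below uses a hypothetical per-node next-hop table (not a routing implementation) and checks whether any load-balanced walk from LSR_3 to LSR_4 revisits a node before routes reconverge.

```python
# Toy check for a transient forwarding loop in the rectangle topology (illustrative).

def possible_loop(next_hops: dict, src: str, dst: str) -> bool:
    """Depth-first walk over all load-balanced next hops; True if any walk revisits a node."""
    def walk(node, visited):
        if node == dst:
            return False
        if node in visited:
            return True
        return any(walk(nh, visited | {node}) for nh in next_hops.get(node, []))
    return walk(src, frozenset())

# After the LSR_3-LSR_4 link fails, LSR_3 sends traffic to LSR_1, which still
# load-balances between LSR_2 and LSR_3 until the route reconverges:
ecmp = {"LSR_3": ["LSR_1"], "LSR_1": ["LSR_2", "LSR_3"], "LSR_2": ["LSR_4"]}
print(possible_loop(ecmp, "LSR_3", "LSR_4"))   # True: a transient loop can form
```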
LDP GR
LDP Graceful Restart (GR) ensures uninterrupted traffic transmission during a protocol restart or active/standby switchover because the forwarding plane is separated from the control plane.
Background
On an MPLS network, when the GR Restarter restarts a protocol or performs an active/standby switchover, label forwarding entries on the forwarding plane are deleted, interrupting data forwarding.
LDP GR can address this issue and therefore improve network reliability. During a protocol restart or active/standby switchover, LDP GR retains label forwarding entries because the forwarding plane is separated from the control plane. The device still forwards packets based on the label forwarding entries, ensuring data transmission. After the protocol restart or active/standby switchover is complete, the GR Restarter can restore to the original state with the help of the GR Helper.
Concepts
- GR Restarter: has GR capability and restarts a protocol.
- GR Helper: assists in the GR process as a GR-capable neighbor of the GR Restarter.
The AR3260 can function as both the GR Restarter and the GR Helper, and other devices can only function as the GR Helper.
- Forwarding State Holding timer: specifies the duration of the LDP GR process.
- Reconnect timer: controls the time during which the GR Helper waits for LDP session reestablishment. After a protocol restart or active/standby switchover occurs on the GR Restarter, the GR Helper detects that the LDP session with the GR Restarter is Down. The GR Helper then starts this timer and waits for the LDP session to be reestablished before the timer expires.
- Recovery timer: controls the time during which the GR Helper waits for LSP recovery. After the LDP session is reestablished, the GR Helper starts this timer and waits for the LSP to recover before the timer expires.
Implementation
Figure 3-19 shows LDP GR implementation.
LDP GR works as follows:
- An LDP session is set up between the GR Restarter and GR Helper. The GR Restarter and GR Helper negotiate GR capabilities during LDP session setup.
- When restarting a protocol or performing an active/standby switchover, the GR Restarter starts the Forwarding State Holding timer, retains label forwarding entries, and sends an LDP Initialization message to the GR Helper. When the GR Helper detects that the LDP session with the GR Restarter is Down, the GR Helper retains the label forwarding entries of the GR Restarter and starts the Reconnect timer.
- After the protocol restart or active/standby switchover, the GR Restarter reestablishes an LDP session with the GR Helper. If an LDP session is not reestablished before the Reconnect timer expires, the GR Helper deletes label forwarding entries of the GR Restarter.
- After the GR Restarter reestablishes an LDP session with the GR Helper, the GR Helper starts the Recovery timer. Before the Recovery timer expires, the GR Restarter and GR Helper exchange Label Mapping messages over the LDP session. The GR Restarter restores forwarding entries with the help of the GR Helper, and the GR Helper restores forwarding entries with the help of the GR Restarter. After the Recovery timer expires, the GR Helper deletes all forwarding entries that have not been restored.
- After the Forwarding State Holding timer expires, the GR Restarter deletes label forwarding entries and the GR is complete.
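The following Python sketch mirrors the Reconnect/Recovery handling on the GR Helper described above; the `GrHelper` class and its event handlers are assumed names, not vendor code. Entries learned from the GR Restarter are retained while the timers run, and only entries refreshed by new Label Mapping messages survive when the Recovery timer expires.

```python
# Minimal sketch of GR Helper behavior around the Reconnect and Recovery timers.

class GrHelper:
    def __init__(self, entries: set):
        self.retained = set(entries)   # label forwarding entries learned from the Restarter
        self.restored = set()          # entries refreshed after the session comes back

    def on_session_down(self):
        print("session down: keep", len(self.retained), "entries, start Reconnect timer")

    def on_reconnect_expired(self, session_reestablished: bool):
        if not session_reestablished:
            self.retained.clear()      # give up: delete everything learned from the Restarter

    def on_label_mapping(self, entry):
        if entry in self.retained:
            self.restored.add(entry)   # refreshed during the Recovery period

    def on_recovery_expired(self):
        self.retained &= self.restored # drop entries that were never restored

helper = GrHelper({"FEC-A", "FEC-B", "FEC-C"})
helper.on_session_down()
helper.on_reconnect_expired(session_reestablished=True)
helper.on_label_mapping("FEC-A")
helper.on_label_mapping("FEC-B")
helper.on_recovery_expired()
print(sorted(helper.retained))         # ['FEC-A', 'FEC-B']; FEC-C was deleted
```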
LDP NSR
LDP Non-Stop Routing (LDP NSR) ensures nonstop data transmission on the control plane and forwarding plane when an active/standby switchover occurs on a device, without the help of neighboring nodes. For details about NSR, see NSR in Huawei AR Series Access Routers Configuration Guide - Reliability.
The AR2240 supports LDP NSR.
LDP NSR backs up the following information:
- LDP control block
- LSP forwarding entries
- Cross connect (XC) information that describes the cross connection between a forwarding equivalence class (FEC) and an LSP
- Labels, including the following types:
  - LDP LSP labels on a public network
  - VC labels in Martini Virtual Leased Line (VLL) networking
  - VC labels used by dynamic PWs in Pseudo-Wire Emulation Edge to Edge (PWE3) networking
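As a rough data-model sketch of the state listed above (all field names are hypothetical), the following Python dataclass groups the information an NSR implementation would keep synchronized to the standby control plane so that a switchover needs no help from neighboring nodes.

```python
# Minimal sketch of the state LDP NSR keeps in sync with the standby plane (assumed fields).

from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class LdpNsrBackup:
    ldp_control_blocks: List[str] = field(default_factory=list)           # per-session/peer control state
    lsp_forwarding_entries: Dict[str, int] = field(default_factory=dict)  # FEC -> outgoing label
    xc_info: Dict[str, str] = field(default_factory=dict)                 # FEC -> LSP cross-connect
    public_lsp_labels: List[int] = field(default_factory=list)            # LDP LSP labels (public network)
    vll_vc_labels: List[int] = field(default_factory=list)                # Martini VLL VC labels
    pwe3_vc_labels: List[int] = field(default_factory=list)               # dynamic PW (PWE3) VC labels

active = LdpNsrBackup(lsp_forwarding_entries={"10.1.1.1/32": 1024})
standby = LdpNsrBackup(**vars(active))     # standby plane holds a copy of the same state (shallow, for illustration)
print(standby.lsp_forwarding_entries)
```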
Local-and-Remote LDP Session
A local node can set up both local and remote LDP adjacencies with an LDP peer. That is, the peer is maintained by both local and remote LDP adjacencies.
As shown in Figure 3-20, when the local LDP adjacency is deleted because the link associated with the adjacency fails, the type of the peer may change but the peer status remains unchanged. Depending on the adjacency type, the peer type can be local, remote, or local-and-remote.
When the link is faulty or recovering, the peer type may change as well as the corresponding session type. However, the session stays Up in this process and is not deleted or set to Down.
A typical application of local-and-remote LDP is with a Layer 2 virtual private network (L2VPN). As shown in Figure 3-20, L2VPN services are configured on PE_1 and PE_2. When the direct link between PE_1 and PE_2 is disconnected and then recovers, the changes in the peer and session types are as follows:
- PE_1 and PE_2 have MPLS LDP enabled and establish a local LDP session. Then PE_1 and PE_2 are configured as remote peers and establish a remote LDP session. PE_1 and PE_2 maintain both local and remote adjacencies. In this case, a local-and-remote LDP session exists between PE_1 and PE_2. L2VPN messages are transmitted over this LDP session.
- When the physical link between PE_1 and PE_2 goes Down, the local LDP adjacency goes Down. The route between PE_1 and PE_2 is reachable through P, so the remote LDP adjacency is still Up. The session type changes to a remote session. Since the session is still Up, L2VPN is uninformed of the session type change and does not delete the session. This avoids the neighbor disconnection and recovery process and therefore reduces the service interruption time.
- When the physical link between PE_1 and PE_2 recovers, the local LDP adjacency goes Up. The session type is restored to local-and-remote, and the session remains Up. Again, L2VPN is not informed of the session type change and does not delete the session. This reduces the service interruption time.
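The type transitions described above can be summarized by a small rule. The following Python sketch (an illustrative function, not device logic) derives the session type from which adjacencies are currently Up and shows that the session itself stays Up as long as at least one adjacency remains.

```python
# Minimal sketch of local-and-remote LDP session type derivation (illustrative).

def session_type(local_adj_up: bool, remote_adj_up: bool) -> str:
    if local_adj_up and remote_adj_up:
        return "local-and-remote"
    if local_adj_up:
        return "local"
    if remote_adj_up:
        return "remote"
    return "down"   # only when both adjacencies are gone is the session torn down

print(session_type(True, True))    # direct link and remote peer both fine
print(session_type(False, True))   # direct link fails: remote session, still Up
print(session_type(True, True))    # link recovers: back to local-and-remote
```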