Traffic Forwarding in M-LAG Failure Scenarios
M-LAG raises link reliability from the card level to the device level. If a link, device, or peer-link fault occurs, M-LAG ensures that services are not affected. The following sections describe how M-LAG keeps services running in each failure scenario.
Uplink Failure
DAD packets are generally transmitted over a DAD link between management interfaces. Therefore, DAD between the M-LAG master and backup devices is not affected when an uplink fails. The dual-active system is not affected either, and the M-LAG master and backup devices continue to forward traffic properly. In Figure 4-13, the uplink of the M-LAG master device has failed, so traffic that reaches the M-LAG master device is forwarded through the peer-link.
If the DAD link is on a service network and the faulty uplink is the DAD link, M-LAG continues to work properly. However, if the peer-link then fails as well, DAD cannot be performed and packet loss occurs.
Downlink Failure
If a downlink M-LAG member interface fails, the DFS group master and backup states do not change. However, if the faulty M-LAG member interface is in master state, the backup M-LAG member interface changes to master state, and traffic is switched to the corresponding link. The link of the faulty M-LAG member interface goes Down, and the dual-homing networking changes to single-homing networking. MAC address entries that point to the faulty M-LAG member interface are updated to point to the peer-link interface.
After the faulty M-LAG member interface recovers, the states of the M-LAG member interfaces do not change again. The interface that switched from backup to master state remains in master state, and the original master M-LAG member interface stays in backup state after the fault is rectified. You can run the display dfs-group dfs-group-id node node-id m-lag command to view the status of an M-LAG member interface.
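The non-revertive behavior of M-LAG member interface roles described above can be illustrated with a small Python model. This is only a sketch: the MemberInterface class, the fail and recover helpers, and the interface names are hypothetical and do not correspond to any device software or API.

```python
from dataclasses import dataclass


@dataclass
class MemberInterface:
    """Hypothetical model of one M-LAG member interface."""
    name: str
    role: str        # "master" or "backup"
    up: bool = True


def fail(local: MemberInterface, peer: MemberInterface) -> None:
    """The local member interface goes Down; the peer takes over if needed."""
    local.up = False
    if local.role == "master":
        # The backup member interface changes to master state.
        # ASSUMPTION: the failed interface is recorded as backup right away;
        # the text only states that it is in backup state after recovery.
        local.role, peer.role = "backup", "master"


def recover(local: MemberInterface) -> None:
    """Recovery is non-revertive: the link comes Up, roles are unchanged."""
    local.up = True


a = MemberInterface("Eth-Trunk10 on device A", "master")
b = MemberInterface("Eth-Trunk10 on device B", "backup")

fail(a, b)      # dual-homing becomes single-homing
recover(a)      # dual-homing is restored, roles stay swapped
print(a.role, b.role)
```

Note that only the member interface roles change in this model. The DFS group master and backup states are not represented because a downlink failure does not affect them.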
Assume that a multicast source is on the network side and a multicast group member is on the access side. If the M-LAG member interface on the M-LAG master device fails, the device instructs the remote device to update its multicast entries through M-LAG synchronization packets. The M-LAG master and backup devices then no longer load balance traffic based on whether the last digit of the multicast group address is odd or even. Instead, all multicast traffic is forwarded by the M-LAG backup device, on which the M-LAG member interface is Up. If the M-LAG member interface on the M-LAG backup device fails, multicast traffic is handled in the same way and is forwarded by the M-LAG master device.
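The multicast forwarding decision described above can be sketched as a small Python function. The function name and parameters are hypothetical. The text states only that load balancing depends on whether the last digit of the group address is odd or even, so the mapping used here (odd to the master device, even to the backup device) is an assumption made for illustration.

```python
import ipaddress
from typing import Optional


def multicast_forwarder(group: str,
                        master_if_up: bool = True,
                        backup_if_up: bool = True) -> Optional[str]:
    """Return which M-LAG device forwards traffic for a multicast group."""
    if master_if_up and backup_if_up:
        # Normal case: load balance on the parity of the group address.
        # The parity of the address equals the parity of its last digit.
        is_odd = int(ipaddress.IPv4Address(group)) % 2 == 1
        # ASSUMPTION: odd groups go to the master, even groups to the backup.
        return "master" if is_odd else "backup"
    if master_if_up:
        return "master"      # backup member interface is Down
    if backup_if_up:
        return "backup"      # master member interface is Down
    return None              # no member interface is Up


print(multicast_forwarder("225.1.1.1"))
print(multicast_forwarder("225.1.1.1", master_if_up=False))
```

When one member interface is Down, the parity check is skipped entirely, which mirrors the statement that all multicast traffic is forwarded by the device whose M-LAG member interface is still Up.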
M-LAG Member Device Failure
If the M-LAG master device fails, the M-LAG backup device becomes the master device and continues to forward traffic, and its Eth-Trunk link is still in Up state. The Eth-Trunk link of the M-LAG master device goes Down, and the dual-homing networking changes to single-homing networking.
If the M-LAG backup device fails, the M-LAG master and backup status remains unchanged, and the Eth-Trunk link of the M-LAG backup device goes Down. The Eth-Trunk link of the M-LAG master device is still in Up state and continues to forward traffic. The dual-homing networking changes to single-homing networking.
When a faulty M-LAG member device recovers, the peer-link goes Up first, and the two M-LAG member devices renegotiate their master and backup roles. After the negotiation succeeds, the M-LAG member interface on the faulty M-LAG member device goes Up and traffic is load balanced. Both the M-LAG master and backup devices retain their original roles after recovering from a fault.
Peer-Link Failure
If the peer-link fails but the DAD heartbeat status is normal when M-LAG is used for dual-homing access on a common Ethernet, VXLAN, or IP network, all interfaces on the M-LAG backup device except logical interfaces, management interfaces, peer-link interfaces, and stack interfaces enter the Error-Down state by default. In the same situation on a TRILL network, the M-LAG member interface on the M-LAG backup device enters the Error-Down state.
When the faulty peer-link recovers, the M-LAG member interface in the Error-Down state automatically returns to the Up state after 240 seconds by default, and the other interfaces in the Error-Down state return to the Up state immediately.
You can run the dual-active detection error-down mode routing-switch command to configure logical interfaces to enter the Error-Down state when the peer-link fails but the DAD heartbeat status is normal in an M-LAG scenario. If the peer-link fails but the DAD heartbeat status is normal when M-LAG is used for dual-homing access on a VXLAN or IP network, the VLANIF interface, VBDIF interface, loopback interface, and M-LAG member interface on the M-LAG backup device enter the Error-Down state.
If logical interfaces are configured to enter the Error-Down state in this way and the faulty peer-link interface then recovers, the devices restore VLANIF interfaces, VBDIF interfaces, and loopback interfaces to the Up state 6 seconds after DFS group pairing succeeds. This delay ensures that ARP entries are synchronized properly when a large number of VLANIF interfaces exist. If a delay is configured for the Layer 3 protocol status of an interface to change to Up, VLANIF interfaces, VBDIF interfaces, and loopback interfaces go Up after the configured delay plus 6 seconds.
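As a worked example of the timing rule above, the following Python sketch computes when the logical interfaces go Up after DFS group pairing succeeds. The function and constant names are hypothetical, and the 30-second value in the example is an arbitrary illustration rather than a documented default.

```python
# Fixed wait after DFS group pairing succeeds, as stated in the text.
PAIRING_HOLD_SECONDS = 6


def logical_interface_up_delay(configured_l3_delay: int = 0) -> int:
    """Seconds until VLANIF, VBDIF, and loopback interfaces go Up.

    configured_l3_delay is the optional delay configured for the Layer 3
    protocol status to change to Up (0 if no delay is configured).
    """
    if configured_l3_delay < 0:
        raise ValueError("delay must not be negative")
    return configured_l3_delay + PAIRING_HOLD_SECONDS


print(logical_interface_up_delay())      # no Layer 3 delay configured
print(logical_interface_up_delay(30))    # hypothetical 30-second delay
```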
You can run the m-lag unpaired-port suspend and m-lag unpaired-port reserved commands to flexibly configure whether an interface enters the Error-Down state when the peer-link fails but the DAD heartbeat status is normal in an M-LAG scenario. Table 4-4 describes the interfaces in the Error-Down state when the peer-link fails, the DAD heartbeat status is normal, and the following functions are configured.
Device Configuration | M-LAG Access to a Common Ethernet, VXLAN, or IP Network |
---|---|
Default scenario | Interfaces excluding the logical interface, management interface, peer-link interface, and stack interface are in the Error-Down state. |
Suspend function enabled only | Only the M-LAG member interface and the interface configured with this function are in the Error-Down state. |
Reserved function enabled only | Interfaces excluding the interface configured with this function, logical interface, management interface, peer-link interface, and stack interface are in the Error-Down state. |
Suspend and reserved functions configured | Only the M-LAG member interface and the interface configured with the suspend function are in the Error-Down state. |
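The four rows of Table 4-4 can be expressed as one selection rule, sketched below in Python. The Interface class, its kind labels, and the interface names are hypothetical and serve only to illustrate the table. The sketch assumes that the suspend and reserved flags are set per interface, in line with the wording "the interface configured with this function".

```python
from dataclasses import dataclass
from typing import Iterable, Set

# Interface kinds that never enter the Error-Down state in the table.
EXEMPT_KINDS = {"logical", "management", "peer-link", "stack"}


@dataclass(frozen=True)
class Interface:
    """Hypothetical model of one interface on the M-LAG backup device."""
    name: str
    kind: str               # "physical", "m-lag-member", or an exempt kind
    suspend: bool = False   # m-lag unpaired-port suspend configured
    reserved: bool = False  # m-lag unpaired-port reserved configured


def error_down_interfaces(interfaces: Iterable[Interface]) -> Set[str]:
    """Names of interfaces that enter Error-Down when the peer-link fails
    and the DAD heartbeat status is normal."""
    interfaces = list(interfaces)
    if any(i.suspend for i in interfaces):
        # Rows 2 and 4: only M-LAG member and suspend-configured interfaces.
        return {i.name for i in interfaces
                if i.kind == "m-lag-member" or i.suspend}
    # Rows 1 and 3: everything except exempt kinds and reserved interfaces.
    return {i.name for i in interfaces
            if i.kind not in EXEMPT_KINDS and not i.reserved}


ports = [
    Interface("Eth-Trunk10", "m-lag-member"),
    Interface("10GE1/0/1", "physical"),
    Interface("10GE1/0/2", "physical", reserved=True),
    Interface("Vlanif100", "logical"),
    Interface("MEth0/0/0", "management"),
    Interface("Eth-Trunk0", "peer-link"),
]
print(sorted(error_down_interfaces(ports)))
```

In this example only the reserved function is in use, so the reserved port and the exempt interfaces stay Up while the M-LAG member interface and the remaining physical port enter the Error-Down state.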
M-LAG Secondary Faults (Peer-Link and M-LAG Faults)
As shown in scenario 2 in Figure 4-17, if the peer-link fails but the DAD heartbeat status is normal when M-LAG is used for dual-homing access, some interfaces on the DFS backup device enter the Error-Down state. In this case, the DFS master device continues to work. If the DFS master device cannot work because it is powered off, its MPU is damaged, or it restarts due to a fault, both the DFS master and backup devices cannot forward traffic, as shown in scenario 3 in Figure 4-17.
- Peer-link failure: If the peer-link fails but the DAD heartbeat status is normal, some interfaces (for details, see Peer-Link Failure) on the DFS backup device are triggered to enter the Error-Down state. The DFS master device continues to work.
- DFS master device failure: If the peer-link fails and the DFS master device cannot work because it is powered off, its MPU is damaged, or it restarts because of a fault, the M-LAG master and backup devices cannot forward traffic and services are interrupted.
- Enhanced DAD for secondary faults enabled: If enhanced DAD for secondary faults is enabled, the DFS backup device can detect that the DFS master device fails through the DAD mechanism (because it does not receive any heartbeat packets from the master device within a certain period). The backup device then becomes the DFS master device, restores the interfaces in Error-Down state to the Up state, and forwards traffic.
- Secondary fault rectification scenario: Faults on the original DFS master device are rectified and the peer-link failure persists.
  - If the LACP M-LAG system ID is switched to the LACP system ID of the local device within a certain period, the access device selects only one of the uplinks as the active link during LACP negotiation. The actual traffic forwarding is normal.
  - If the default LACP M-LAG system ID is used, that is, it is not switched, two M-LAG devices use the same system ID to negotiate with the access device. Therefore, links to both devices can be selected as the active link. In this scenario, because the peer-link failure persists, M-LAG devices cannot synchronize information such as the priority and system MAC address of each other. As a result, two M-LAG master devices exist, and multicast traffic forwarding may be abnormal. In this case, as shown in Figure 4-18, the HB DFS master/backup status is negotiated through heartbeat packets carrying necessary information for DFS group master/backup negotiation (such as the DFS group priority and system MAC address). Some interfaces (for details, see Peer-Link Failure) on the HB DFS backup device are triggered to enter the Error-Down state. The HB DFS master device continues to work.
If secondary faults occur on the DFS backup device after the peer-link fails, traffic forwarding is not affected. The DFS master device continues to forward traffic.
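The heartbeat-based HB DFS master/backup negotiation described for the secondary fault rectification scenario can be sketched in Python as follows. The text states only that heartbeat packets carry the DFS group priority and system MAC address. The tie-breaking rule used here (the higher priority wins, and the smaller system MAC address wins when priorities are equal) is an assumption, and the Heartbeat class, the function names, and the example values are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Heartbeat:
    """Hypothetical view of the negotiation data in a heartbeat packet."""
    device: str
    priority: int       # DFS group priority
    system_mac: str     # for example "00e0-fc00-0001"


def mac_value(mac: str) -> int:
    """Convert a MAC address string to an integer for comparison."""
    digits = mac.replace("-", "").replace(":", "").replace(".", "")
    return int(digits, 16)


def negotiate_hb_roles(local: Heartbeat, peer: Heartbeat) -> Dict[str, str]:
    """Elect the HB DFS master and backup from two heartbeats.

    ASSUMPTION: higher priority wins; on a tie, the smaller system MAC wins.
    """
    master = min((local, peer),
                 key=lambda hb: (-hb.priority, mac_value(hb.system_mac)))
    backup = peer if master is local else local
    return {"master": master.device, "backup": backup.device}


dev_a = Heartbeat("DeviceA", 100, "00e0-fc00-0001")
dev_b = Heartbeat("DeviceB", 100, "00e0-fc00-0002")
print(negotiate_hb_roles(dev_a, dev_b))
```

Because both devices evaluate the same data with the same rule, they reach the same result without the peer-link. The device elected as HB DFS backup then places the affected interfaces in the Error-Down state, which removes the dual-master condition.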