MTU Planning Suggestions for Inter-DC VXLAN Interconnection
Scenario Description
Two data centers (DCs) are connected through a WAN bearer network so that servers can communicate across DCs. The boxes in gray indicate the internal DC networks, which consist of CloudEngine series switches (CE switches for short) and have Virtual Extensible LAN (VXLAN) deployed. A VXLAN tunnel across the bearer network (highlighted in green) is established between the DCI leaf nodes in DC1 and DC2 to forward cross-DC traffic. In this scenario, the MTU must be planned across the entire network. Otherwise, packets may be lost during cross-DC forwarding.
Forwarded Packets Are Not Fragmented in a DC
A CE switch does not fragment packets whose length exceeds the interface MTU. That is, the packets sent by a server are not fragmented by a CE switch even if the packet length exceeds the MTU of the CE switch.
By default, a CE switch supports jumbo frames of up to 9216 bytes. For example, if the MTU on a server is set to 9000 bytes (the default is 1500 bytes), VXLAN encapsulation adds 50 bytes (20-byte IP header + 8-byte UDP header + 8-byte VXLAN header + 14-byte MAC header), giving 9050 bytes, which is still below 9216 bytes. The CE switch can therefore forward the packets without fragmentation.
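The arithmetic above can be checked with a minimal sketch. It follows this document's accounting of the 50-byte overhead. The constant and function names are illustrative only and are not part of any CE switch interface.

```python
# Header sizes in bytes, as listed in the text above.
OUTER_IP_HEADER = 20
OUTER_UDP_HEADER = 8
VXLAN_HEADER = 8
MAC_HEADER = 14

VXLAN_OVERHEAD = OUTER_IP_HEADER + OUTER_UDP_HEADER + VXLAN_HEADER + MAC_HEADER

CE_DEFAULT_JUMBO_FRAME = 9216  # default jumbo frame length of a CE switch


def encapsulated_length(server_packet_length: int) -> int:
    """Length of a server packet after VXLAN encapsulation (this document's accounting)."""
    return server_packet_length + VXLAN_OVERHEAD


for server_mtu in (1500, 9000):
    total = encapsulated_length(server_mtu)
    headroom = CE_DEFAULT_JUMBO_FRAME - total
    print(f"server MTU {server_mtu}: encapsulated {total} bytes, headroom {headroom} bytes")
```

With a server MTU of 9000 bytes, the encapsulated length is 9050 bytes, which leaves 166 bytes of headroom below the default jumbo frame length.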
A CE switch fragments only the packets that it originates itself (host packets, such as protocol packets), based on the interface MTU.
Why Is MTU Planning Required?
During cross-DC VXLAN interconnection, VXLAN packets need to pass through a third-party bearer network. As shown in the green part in the preceding figure, the bearer network functions as the underlay network and only forwards packets. The length of VXLAN-encapsulated packets may be greater than the MTU of some devices on the bearer network. (The MTU of some intermediate devices cannot be adjusted, or some devices are old and have a smaller MTU.) As a result, VXLAN packets are fragmented during forwarding on the underlay network. RFC 7348 specifies that VTEPs must not fragment VXLAN packets and allows the destination VTEP to silently discard VXLAN fragments created by intermediate routers. If VXLAN packets are fragmented, the receive VTEP discards the fragments, causing packet loss during inter-DC communication.
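To illustrate where fragmentation would occur, the following sketch lists the hops whose MTU is smaller than the encapsulated packet. The hop names and MTU values are hypothetical, and the 50-byte overhead follows this document's accounting. In practice, the per-hop MTUs would have to be collected from the bearer network operator.

```python
from typing import Dict, List

VXLAN_OVERHEAD = 50  # bytes, per the accounting used in this document


def hops_that_would_fragment(server_packet_length: int, hop_mtus: Dict[str, int]) -> List[str]:
    """Return the names of hops whose MTU is smaller than the encapsulated packet."""
    encapsulated = server_packet_length + VXLAN_OVERHEAD
    return [name for name, mtu in hop_mtus.items() if mtu < encapsulated]


# Hypothetical bearer-network path between the two DCI leaf nodes.
example_path = {
    "wan-router-a": 9216,
    "legacy-router-b": 1500,
    "wan-router-c": 4470,
}

print(hops_that_would_fragment(1500, example_path))  # 1550-byte VXLAN packets
print(hops_that_would_fragment(1400, example_path))  # 1450-byte VXLAN packets
```

In this made-up path, a default 1500-byte server packet becomes 1550 bytes after encapsulation and exceeds the 1500-byte MTU of the legacy hop, whereas a 1400-byte server packet passes every hop.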
RFC 7348: https://datatracker.ietf.org/doc/rfc7348/?include_text=1
Planning Suggestions
It is recommended that the MTU be globally planned before network deployment. The following two methods are available, and the first method is recommended.
- Method 1 (recommended): Adjust the length of packets sent by application-layer servers so that the packet length plus 50 bytes (the VXLAN encapsulation overhead) is smaller than the MTU of every hop on the bearer network. This method is easy to implement but requires cooperation from the IT side.
- Method 2: Change the MTU of every hop on the bearer network so that it is greater than the length of the received VXLAN packets, which prevents fragmentation. It is recommended that the MTU of each device on the bearer network be larger than the default jumbo frame length (9216 bytes) supported by a CE switch. This method is often restricted by two factors: the bearer network contains a large number of devices that are widely distributed and come from different vendors, which makes the MTU difficult to change; and in most cases, there is no permission to change the MTU.
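The two methods can be sketched as simple calculations. This is a minimal illustration that assumes the per-hop MTUs are known and uses this document's 50-byte overhead. The function names and sample values are hypothetical. The results are boundary values at which the encapsulated packet exactly fits, so a planner who wants the packet to be strictly smaller than the MTU should leave additional margin.

```python
from typing import Iterable

VXLAN_OVERHEAD = 50  # bytes, per the accounting used in this document


def max_server_packet_length(hop_mtus: Iterable[int]) -> int:
    """Method 1: largest server packet that still fits the smallest hop MTU."""
    return min(hop_mtus) - VXLAN_OVERHEAD


def min_bearer_mtu(server_packet_length: int) -> int:
    """Method 2: smallest per-hop MTU that carries the encapsulated packet."""
    return server_packet_length + VXLAN_OVERHEAD


hypothetical_hop_mtus = [9216, 1500, 4470]
print(max_server_packet_length(hypothetical_hop_mtus))  # 1450
print(min_bearer_mtu(9000))  # 9050
```

Method 1 is driven by the smallest MTU on the path, which is why a single legacy device can constrain the packet length for all servers.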