Recommended Networking
The following networking modes are recommended for AI Fabric in the distributed storage scenario:
- Two-layer networking
As shown in Figure 2-1, a two-layer Clos (spine-leaf) architecture is deployed in a PoD, and the leaf and spine nodes are fully meshed through 100GE links. Servers are connected to leaf switches through 25GE ports and can be dual-homed to leaf switches through M-LAG. The ratio of compute nodes to storage nodes is 3:1. Server NICs must support RoCEv2, and RoCEv2 traffic between servers is transmitted only within the PoD.
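As a quick illustration of the 3:1 ratio above, the following Python sketch splits a given server count into compute and storage nodes. The helper function and the 1024-server figure are illustrative assumptions, not part of any product tooling.

```python
def split_nodes(total_servers: int) -> tuple[int, int]:
    """Split servers by the 3:1 compute-to-storage ratio described above.

    Illustrative helper (hypothetical): assumes the total divides evenly
    into groups of 4 (3 compute nodes + 1 storage node).
    """
    groups = total_servers // 4
    return 3 * groups, total_servers - 3 * groups

compute, storage = split_nodes(1024)
print(compute, storage)  # 768 256
```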
In the distributed storage scenario, storage performance depends mainly on the storage media and places no heavy demands on computing performance. Typically, general-purpose CPUs are used and the number of servers is large. Table 2-1 lists the recommended device models based on the number of server nodes.
Table 2-1 Recommended device models for the two-layer networking in the distributed storage scenario

| Number of Server Nodes (N) | Leaf | Spine |
| --- | --- | --- |
| N ≤ 1024 | CE6865-48S8CQ-EI | CE8850-64CQ-EI |
| N > 1024 | CE6865-48S8CQ-EI | CloudEngine 16800 (equipped with CE-MPUE series MPUs) |
Description
- The oversubscription ratio (downlink bandwidth:uplink bandwidth) can be 4:3, 2:1, or 1:1, depending on the actual situation; the sketch after this list shows how the ratio is computed.
- Currently, only IPv4 underlay networking is supported.
- If a large Layer 2 network is required, VXLAN is recommended for building the overlay large Layer 2 network. In this case, the ECN overlay function must be configured.
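To make the first bullet concrete, here is a minimal Python sketch that computes a leaf switch's oversubscription ratio from its port counts. The function name and values are illustrative assumptions, not a product tool.

```python
from fractions import Fraction

def oversubscription(downlinks: int, downlink_gbps: int,
                     uplinks: int, uplink_gbps: int) -> Fraction:
    """Downlink bandwidth : uplink bandwidth, as an exact fraction."""
    return Fraction(downlinks * downlink_gbps, uplinks * uplink_gbps)

# Example from this section: 32 x 25GE downlinks, 6 x 100GE uplinks.
print(oversubscription(32, 25, 6, 100))  # 4/3, i.e. a 4:3 ratio
```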
The following example uses a network with 1024 servers (compute nodes and storage nodes). On this network, the CE8850-64CQ-EI serves as the spine switch, providing 64 x 100GE ports; the CE6865-48S8CQ-EI serves as the leaf switch, providing 48 x 25GE ports and 8 x 100GE ports; the oversubscription ratio is 4:3; and servers are dual-homed to leaf switches through M-LAG. Based on these switch models, the numbers of required switches are calculated as follows (a code sketch after the list reproduces the arithmetic):
- Calculate the uplink bandwidth: Of each leaf switch's eight 100GE interfaces, two are used for the M-LAG peer-link between the paired leaf switches, and the remaining six serve as uplinks to the spine switches. Therefore, the uplink bandwidth of each leaf switch is 6 x 100GE = 600GE.
- Calculate the number of leaf switches: With a 4:3 oversubscription ratio and 600GE of uplink bandwidth, the downlink bandwidth is 600GE x 4/3 = 800GE. At 25GE per access port, this corresponds to 800/25 = 32 downlinks, so each leaf switch connects to 32 servers. Because the 1024 servers are dual-homed to leaf switches through M-LAG, each server occupies a port on two leaf switches, and a total of 64 (1024/32 x 2) leaf switches are required.
- Calculate the number of spine switches: The 64 leaf switches have six 100GE uplinks each, for a total of 64 x 6 = 384 spine-facing links. Each CE8850-64CQ-EI spine switch provides 64 x 100GE interfaces, exactly one per leaf switch in a full mesh. Therefore, a total of 6 (384/64) spine switches are required.
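The arithmetic in the three steps above can be reproduced with a short Python sketch. The variable names are illustrative assumptions; the port counts match the CE6865-48S8CQ-EI / CE8850-64CQ-EI example in this section.

```python
import math

# Per-leaf ports (CE6865-48S8CQ-EI): 48 x 25GE down, 8 x 100GE up,
# of which 2 x 100GE are reserved for the M-LAG peer-link.
UPLINKS_PER_LEAF = 8 - 2                 # 6 x 100GE toward the spines
UPLINK_GBPS = UPLINKS_PER_LEAF * 100     # 600G of uplink bandwidth

# Step 1: downlink budget at a 4:3 oversubscription ratio.
DOWNLINK_GBPS = UPLINK_GBPS * 4 // 3     # 800G
SERVERS_PER_LEAF = DOWNLINK_GBPS // 25   # 32 x 25GE access ports

# Step 2: dual-homed servers consume one port on each of two leafs.
servers = 1024
leafs = math.ceil(servers / SERVERS_PER_LEAF) * 2   # 64

# Step 3: every leaf uplink terminates on a spine port;
# each CE8850-64CQ-EI spine offers 64 x 100GE ports.
spines = math.ceil(leafs * UPLINKS_PER_LEAF / 64)   # 6

print(leafs, spines)  # 64 6
```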
- Cross-PoD two-layer networking
Compute servers and storage servers are deployed in different PoDs. RoCEv2 traffic between servers can be forwarded across PoDs, with inter-PoD traffic forwarded through core switches. The core and spine switches are fully meshed through 100GE links, and the spine-leaf two-layer networking is retained within each PoD. (A hypothetical sketch at the end of this section illustrates the core-layer arithmetic.)
Figure 2-2 AI Fabric deployment topology
Table 2-2 lists the recommended device models.
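This section does not give device counts for the cross-PoD case, but under the full-mesh rule stated above the core-layer arithmetic can be sketched as follows. Every name and number below is a hypothetical assumption for illustration, not a recommendation from this document.

```python
# Hypothetical inputs: 2 PoDs, 6 spines per PoD, each spine reserving
# 8 x 100GE uplinks toward the core; each core switch offers 64 ports.
pods, spines_per_pod, uplinks_per_spine, core_ports = 2, 6, 8, 64

# Full mesh between the core and spine layers: each spine connects to
# every core switch, so with one link per core the core count equals
# the number of uplinks per spine ...
cores = uplinks_per_spine
# ... and each core switch must have a port for every spine in every PoD.
assert pods * spines_per_pod <= core_ports

print(cores)  # 8
```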