iLossless: Intelligent and Lossless Network Technology
AI Fabric provides iLossless, an intelligent and lossless network technology, to prevent packet loss caused by network congestion. It keeps excessive data from entering the network, so that device buffers and links are not overloaded. iLossless relies on both flow control and congestion control, which differ as follows: Flow control operates between one sender and one receiver. The transmission rate of the sender is suppressed so that the receiver can receive all packets. In contrast, congestion control is a global process involving all hosts, network devices, and other factors that degrade network transmission performance. On live networks, flow control and congestion control must be used together to solve network congestion.
iLossless is a set of technologies that work together to solve the packet loss problem caused by congestion on traditional Ethernet networks. Together they build a network environment with zero packet loss, low delay, and high throughput for RoCEv2 traffic, meeting the high-performance requirements of RoCEv2 applications. Intelligent and lossless network technologies are classified into the following types:
- Flow control
- Congestion control
Flow Control
Flow control, also called link-level flow control, controls the packet sending rate of outbound interfaces on upstream switches, so that inbound interfaces on downstream switches can receive packets in time, preventing packet loss in the case of congestion on the inbound interfaces.
Priority-based Flow Control (PFC) is the most widely used flow control technology. When a PFC-enabled queue on a device is congested, the upstream device stops sending traffic in the queue, implementing zero packet loss. The system does not perform PFC for a PFC-disabled queue, and instead discards packets in the queue when it is congested.
Services can be categorized into lossless or lossy services, depending on whether packets need to be transmitted with no loss.
- Lossless services require zero packet loss during transmission. A PFC-enabled queue is a lossless queue.
- Lossy services allow packet loss during transmission. A PFC-disabled queue is a lossy queue.
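The difference between a lossless (PFC-enabled) queue and a lossy (PFC-disabled) queue can be illustrated with a toy model. The following is a minimal sketch, not device behavior: the class name, the packet-count units, and the xoff/xon thresholds are all assumptions made for illustration, and the model ignores the headroom buffer that a real device needs for packets already in flight when a pause is signaled.

```python
from dataclasses import dataclass


@dataclass
class Queue:
    """Toy model of one priority queue on a downstream inbound interface."""
    capacity: int                  # buffer size in packets (hypothetical unit)
    pfc_enabled: bool              # True = lossless queue, False = lossy queue
    xoff: int = 0                  # assumed depth at which upstream is paused
    xon: int = 0                   # assumed depth at which upstream may resume
    depth: int = 0
    dropped: int = 0
    upstream_paused: bool = False

    def enqueue(self) -> bool:
        """Offer one packet; return True if it was buffered."""
        if self.depth >= self.capacity:
            self.dropped += 1      # buffer full: the packet is discarded
            return False
        self.depth += 1
        if self.pfc_enabled and self.depth >= self.xoff:
            self.upstream_paused = True   # ask upstream to stop this queue
        return True

    def dequeue(self) -> None:
        """Forward one packet and resume upstream once the queue has drained."""
        if self.depth > 0:
            self.depth -= 1
        if self.upstream_paused and self.depth <= self.xon:
            self.upstream_paused = False


def run(queue: Queue, arrivals_per_tick: int, ticks: int) -> Queue:
    """Upstream offers packets each tick unless paused; one packet drains per tick."""
    for _ in range(ticks):
        for _ in range(arrivals_per_tick):
            if queue.upstream_paused:
                break              # upstream holds traffic for this queue
            queue.enqueue()
        queue.dequeue()
    return queue


lossless = run(Queue(capacity=10, pfc_enabled=True, xoff=6, xon=2), 3, 50)
lossy = run(Queue(capacity=10, pfc_enabled=False), 3, 50)
```

With the same overload, the PFC-enabled queue pauses the upstream device before its buffer fills, while the PFC-disabled queue overflows and discards packets.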
PFC effectively prevents packet loss, but it does so by stopping some traffic, so it should be used only as a last resort. Frequent PFC triggering can also lead to a PFC deadlock. If congestion occurs on multiple network devices simultaneously due to a loop or other causes, the interface buffer usage of each device exceeds the threshold, and the devices wait for each other to release resources. As a result, data flows on these devices are permanently blocked. This network state is known as the PFC deadlock state.
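The deadlock condition described above amounts to a cycle of devices that are each waiting on the next. The sketch below checks a wait-for mapping for such a cycle. It only illustrates the condition and is not the AI Fabric detection mechanism; the function name, the mapping format, and the device names are hypothetical.

```python
def find_pause_cycle(paused_by):
    """Return a list of devices forming a cyclic pause dependency, or None.

    paused_by maps each device to the devices it is waiting on, that is,
    the devices whose PFC pause is currently blocking its traffic.
    """
    WHITE, GREY, BLACK = 0, 1, 2   # unvisited, on current path, finished
    color = {}
    path = []

    def visit(node):
        color[node] = GREY
        path.append(node)
        for nxt in paused_by.get(node, ()):
            state = color.get(nxt, WHITE)
            if state == GREY:
                # nxt is already on the current path: the devices from nxt
                # onward wait on each other in a loop.
                return path[path.index(nxt):]
            if state == WHITE:
                cycle = visit(nxt)
                if cycle:
                    return cycle
        path.pop()
        color[node] = BLACK
        return None

    for device in paused_by:
        if color.get(device, WHITE) == WHITE:
            cycle = visit(device)
            if cycle:
                return cycle
    return None


# Hypothetical topology: A waits on B, B waits on C, C waits on A.
deadlocked = find_pause_cycle({"A": ["B"], "B": ["C"], "C": ["A"]})
healthy = find_pause_cycle({"A": ["B"], "B": ["C"], "C": []})
```

In the first mapping no device can release buffer space until another one does, which is the permanently blocked state; in the second, C can drain and the pauses clear in turn.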
To eliminate PFC deadlocks, AI Fabric provides the PFC deadlock detection and PFC deadlock prevention functions. In addition, to optimize the application of the PFC function, AI Fabric provides buffer optimization of lossless queues. This function prevents packet loss caused by congestion before PFC takes effect.
Congestion Control
Congestion control is a global process that enables the network to bear existing traffic load. To mitigate and relieve congestion, the forwarding device, traffic sender, and traffic receiver need to collaborate with each other, and congestion feedback mechanisms need to be used to adjust traffic on the entire network.
Data Center Quantized Congestion Notification (DCQCN) is the most widely used congestion control algorithm on RDMA networks. For DCQCN to work, network devices only need to support Explicit Congestion Notification (ECN); the other DCQCN functions are implemented on the NICs of hosts. When congestion occurs on a forwarding device, the forwarding device marks packets with ECN and forwards them to the traffic receiver. The traffic receiver then sends Congestion Notification Packets (CNPs) to the traffic sender, and the traffic sender reduces its packet sending rate to relieve network congestion.
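The feedback loop above (forwarding device marks, receiver sends a CNP, sender slows down) can be sketched in a few lines. This is a simplified illustration under stated assumptions: the function and class names are hypothetical, the forwarding device marks on a single fixed threshold, and only the rate-decrease step from the published DCQCN description is modeled. The rate-recovery phases and the decay of alpha when no CNP arrives are omitted.

```python
from dataclasses import dataclass


@dataclass
class Sender:
    """Traffic sender (host NIC) reacting to CNPs; values are illustrative."""
    rate_gbps: float
    alpha: float = 1.0        # congestion estimate, assumed to start at 1
    g: float = 1 / 256        # assumed gain used when updating alpha

    def on_cnp(self) -> None:
        # Rate decrease on CNP: cut the rate in proportion to alpha,
        # then raise alpha toward 1 because congestion was just reported.
        self.rate_gbps *= 1 - self.alpha / 2
        self.alpha = (1 - self.g) * self.alpha + self.g


def switch_marks_ecn(queue_depth: int, ecn_threshold: int) -> bool:
    """Forwarding device: mark the packet when the queue exceeds the threshold."""
    return queue_depth > ecn_threshold


def receiver_sends_cnp(ecn_marked: bool) -> bool:
    """Traffic receiver: answer an ECN-marked packet with a CNP."""
    return ecn_marked


def deliver(sender: Sender, queue_depth: int, ecn_threshold: int) -> bool:
    """Carry one packet through the loop; return True if it was ECN-marked."""
    marked = switch_marks_ecn(queue_depth, ecn_threshold)
    if receiver_sends_cnp(marked):
        sender.on_cnp()
    return marked


sender = Sender(rate_gbps=100.0)
congested = deliver(sender, queue_depth=80, ecn_threshold=50)
rate_after_cnp = sender.rate_gbps
uncongested = deliver(sender, queue_depth=10, ecn_threshold=50)
```

Note that the sender reacts only after the marked packet has reached the receiver and the CNP has traveled back, so the reaction time grows with the length of the path.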
However, DCQCN has problems that cannot be ignored. For example, the traditional static ECN function requires you to manually configure parameters such as ECN thresholds, but traffic models on live networks are complex, and devices with static ECN configurations cannot adapt to traffic changes. In addition, the DCQCN control loop delay is already long because of the long congestion feedback path, and it becomes even longer on a larger network. As a result, the sender cannot reduce its packet sending rate in a timely manner, and congestion may even be exacerbated.
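To make the static ECN limitation concrete, the sketch below shows how a fixed set of thresholds maps queue depth to a marking probability. It assumes a common RED-style scheme with a low threshold, a high threshold, and a maximum marking probability. The parameter names and values are hypothetical and are not taken from AI Fabric.

```python
def ecn_mark_probability(queue_depth, low, high, max_prob):
    """Marking probability for a queue under statically configured thresholds.

    Assumed scheme: no marking at or below `low`, marking of every packet at
    or above `high`, and a linear ramp up to `max_prob` in between.
    """
    if queue_depth <= low:
        return 0.0
    if queue_depth >= high:
        return 1.0
    return max_prob * (queue_depth - low) / (high - low)


# One static configuration applied to three different queue depths.
STATIC = dict(low=100, high=300, max_prob=0.2)
p_shallow = ecn_mark_probability(50, **STATIC)
p_middle = ecn_mark_probability(200, **STATIC)
p_deep = ecn_mark_probability(400, **STATIC)
```

Because the thresholds are fixed, the same values apply whether the traffic is latency-sensitive or throughput-oriented. Thresholds low enough to keep queues short may cap throughput, and thresholds high enough for full throughput may let queues and delay grow.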
To solve these DCQCN issues, AI Fabric provides the AI ECN, dynamic ECN, fast ECN, and fast CNP functions. In addition, to apply DCQCN to VXLAN networks and extend its application scenarios, AI Fabric provides the ECN overlay function.